SQL Server XML String Manipulation

2021-10-01 00:00:00 xml sql-server sql-server-2008-r2

Let me state up front that I am an XML novice. That said, my issue is that I have a SQL Server that creates XML data and places it into a file that must pass through a security gate to another server. The gate has a list of several "dirty" words that will cause the file to fail if they are included. What I need is a way for SQL to search the XML data, every node, and if a "dirty" value is present, strip it out (replace it with blank). The XML is not strongly typed, and the "dirty" word could be part of a longer string. In that case, the rest of the string must remain intact.

For example, if the "dirty" word is "hold", the string "We hold these truths to be self evident" would become "We these truths to be self evident".

Again, this "dirty" word could be in any node, and the tags will not always be the same. I need to write a procedure or trigger that analyzes the XML value against the dirty word list and cleans it up.

Recommended answer

Shred the XML to a table with one row for each node. The table needs an id that corresponds to the position of the node in the shredded XML, so that the changes can be written back.

Keep your bad words in a table and, for each word, use replace to remove it from the table that holds the node values.
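As a minimal sketch of that replace step in isolation, using the "hold" example from the question (the variable names @Word and @Text are placeholders for illustration, not part of the answer's code):

```sql
-- Placeholder variables for illustration only
declare @Word nvarchar(10) = N'hold'
declare @Text nvarchar(max) = N'We hold these truths to be self evident'

-- replace() removes every occurrence of @Word from @Text
select replace(@Text, @Word, N'') as Cleaned
```

Note that replace removes only the matched characters, so the spaces on both sides of the word stay in the result. Whether the match is case sensitive depends on the collation in effect.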

Finally, you loop through the cleaned values and write them back to the XML one node at a time, only for the nodes that were actually modified.

-- A table to hold the bad words
declare @BadWords table
(
  ID int identity,
  Value nvarchar(10)
)

-- These are the bad ones.
insert into @BadWords values
('one'),
('three'),
('five'),
('hold')

-- XML that needs cleaning
declare @XML xml = '
<root>
  <itemone ID="1one1">1one1</itemone>
  <itemtwo>2two2</itemtwo>
  <items>
    <item>1one1</item>
    <item>2two2</item>
    <item>onetwothreefourfive</item>
  </items>
  <hold>We hold these truths to be self evident</hold>
</root>
'

-- A helper table to hold the values to modify
declare @T table
(
  ID int identity,
  Pos int,
  OldValue nvarchar(max),
  NewValue nvarchar(max),
  Attribute bit
)

-- Get all attributes from the XML
insert into @T(Pos, OldValue, NewValue, Attribute)
select row_number() over(order by T.N),
       T.N.value('.', 'nvarchar(max)'),
       T.N.value('.', 'nvarchar(max)'),
       1
from @XML.nodes('//@*') as T(N)

-- Get the first text node of every element in the XML
insert into @T(Pos, OldValue, NewValue, Attribute)
select row_number() over(order by T.N),
       T.N.value('text()[1]', 'nvarchar(max)'),
       T.N.value('text()[1]', 'nvarchar(max)'),
       0
from @XML.nodes('//*') as T(N)

declare @ID int
declare @Pos int
declare @Value nvarchar(max)
declare @Attribute bit

-- Remove the bad words from @T, one bad word at a time
select @ID = max(ID) from @BadWords
while @ID > 0
begin
  select @Value = Value
  from @BadWords
  where ID = @ID

  update @T
  set NewValue = replace(NewValue, @Value, '')

  set @ID -= 1
end

-- Write the cleaned values back to the XML, one node at a time
select @ID = max(ID) from @T
while @ID > 0
begin
  -- @Value stays null when the cleaning step did not change this row
  select @Value = nullif(NewValue, OldValue),
         @Attribute = Attribute,
         @Pos = Pos
  from @T
  where ID = @ID

  if @Value is not null
  begin
    if @Attribute = 1
      set @XML.modify('replace value of ((//@*)[sql:variable("@Pos")])[1]
                       with sql:variable("@Value")')
    else
      set @XML.modify('replace value of ((//*)[sql:variable("@Pos")]/text())[1]
                       with sql:variable("@Value")')
  end

  set @ID -= 1
end

select @XML

Note: the code above does not handle values where the modification itself creates a new bad value. For example,

<item>fioneve</item>

will be modified to

<item>five</item>
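One possible way to cover that case, sketched here as an assumption rather than as part of the original answer, is to repeat the whole bad word pass until a full sweep changes nothing. The sample rows and the @Changed counter are hypothetical; the @BadWords and @T tables are cut-down versions of the ones used above.

```sql
-- Cut-down versions of the tables from the answer, with sample data
declare @BadWords table (ID int identity, Value nvarchar(10))
insert into @BadWords values ('one'), ('five')

declare @T table (ID int identity, NewValue nvarchar(max))
insert into @T(NewValue) values ('fioneve'), ('2two2')

declare @ID int
declare @Value nvarchar(10)
declare @Changed int = 1   -- hypothetical counter: rows changed in the last sweep

-- Keep sweeping the whole bad word list until a full sweep changes nothing
while @Changed > 0
begin
  set @Changed = 0
  select @ID = max(ID) from @BadWords

  while @ID > 0
  begin
    select @Value = Value
    from @BadWords
    where ID = @ID

    -- Only touch rows that contain the word, so @@rowcount counts real changes
    update @T
    set NewValue = replace(NewValue, @Value, '')
    where charindex(@Value, NewValue) > 0

    set @Changed += @@rowcount
    set @ID -= 1
  end
end

select ID, NewValue from @T
```

The loop terminates because every counted update makes a value shorter. With this sample data, 'fioneve' first becomes 'five' and is then emptied on the next sweep, while '2two2' is left alone.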
