SQL Server:从 VARCHAR(MAX) 字段替换无效的 XML 字符

2022-01-03 00:00:00 xml sql tsql sql-server sql-server-2012

我有一个 VARCHAR(MAX) 字段,它以 XML 格式连接到外部系统.接口抛出以下错误:

I have a VARCHAR(MAX) field which is being interfaced to an external system in XML format. The following errors were thrown by the interface:

mywebsite.com-2015-0202.xml:413005: parser error : xmlParseCharRef: invalid xmlChar value 29
ne and Luke's family in Santa Fe. You know you have a standing invitation,
                                                                               ^
mywebsite.com-2015-0202.xml:455971: parser error : xmlParseCharRef: invalid xmlChar value 25
The apprentice nodded, because frankly, who hadnt? That diseases like chol
                                                      ^
mywebsite.com.com-2015-0202.xml:456077: parser error : xmlParseCharRef: invalid xmlChar value 28
bon mot; a sentimental love of nature and animals; the proverbial British 
                                                                               ^
mywebsite.com-2015-0202.xml:472073: parser error : xmlParseCharRef: invalid xmlChar value 20
"Andyou want that?"
          ^
mywebsite.com-2015-0202.xml:492912: parser error : xmlParseCharRef: invalid xmlChar value 25
She couldnt live like this anymore.

我们发现以下字符列表无效:

We found that the following list of characters are invalid:

�








	

























我正在尝试清理这些数据,我找到了一个 SQL 函数来清理这些字符 此处.但是,该函数将 NVARCHAR(4000) 作为输入参数,因此我已将该函数更改为使用 VARCHAR(MAX).

I am trying to clean this data, and I found a SQL function to clean these characters here. However, the function was taking NVARCHAR(4000) as input parameter, so I have changed the function to use VARCHAR(MAX) instead.

如果将 NVARCHAR(4000) 更改为 VARCHAR(MAX) 会产生错误的结果,请告知任何人吗?抱歉,我无法在本地测试此界面,因此想寻求意见/建议.

Could anyone please advise if changing the NVARCHAR(4000) to VARCHAR(MAX) would produce wrong results? Sorry, I wouldn't be able to test this interface locally so thought to seek opinion/advise.

原始功能:

CREATE FUNCTION fnStripLowAscii (@InputString nvarchar(4000))
RETURNS nvarchar(4000)
AS
BEGIN
IF @InputString IS NOT NULL
BEGIN
  DECLARE @Counter int, @TestString nvarchar(40)

  SET @TestString = '%[' + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR(5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31) + ']%'

  SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)

  WHILE @Counter <> 0
  BEGIN
    SELECT @InputString = STUFF(@InputString, @Counter, 1, NCHAR(164))
    SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)
  END
END
RETURN(@InputString)
END

修改版本:

CREATE FUNCTION [dbo].RemoveInvalidXMLCharacters (@InputString VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    IF @InputString IS NOT NULL
    BEGIN
      DECLARE @Counter INT, @TestString NVARCHAR(40)

      SET @TestString = '%[' + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR(5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31) + ']%'

      SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)

      WHILE @Counter <> 0
      BEGIN
        SELECT @InputString = STUFF(@InputString, @Counter, 1, ' ')
        SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)
      END
    END
    RETURN(@InputString)
END

推荐答案

使用 VARCHAR(MAX) 是安全的,因为我的数据列是 VARCHAR(MAX)场地.此外,如果我将 VARCHAR(MAX) 字段传递给接受 NVARCHAR(MAX) 参数的 SQL 函数.

It is safe to use VARCHAR(MAX) as my data column is a VARCHAR(MAX) field. Also, there will be an overhead of converting VARCHAR(MAX) to NVARCHAR(MAX) if I pass a VARCHAR(MAX) field to the SQL function which accepts the NVARCHAR(MAX) param.

非常感谢@RhysJones、@Damien_The_Unbeliever 的评论.

Thank you very much @RhysJones, @Damien_The_Unbeliever for your comments.

相关文章