从基于 SET 的 SQL 中的 VARCHAR 字段中提取段/值的最佳方法

以下面的数据为例:

SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [主题]联合所有SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [主题]联合所有选择帮助台呼叫参考 F0012347,记录新呼叫,40113"作为 [主题]

我想做的是按如下方式提取这些数据:

如您所见,我需要提取 Ref、Type 和OurRef 作为单独的列,以确保在处理生成的电子邮件时有效地基于集合的 SQL.

通常对于这种情况,我会使用这样的函数:

创建函数 dbo.fnParseString (@Section SMALLINT ,@分隔符 CHAR ,@Text VARCHAR(MAX))返回 VARCHAR(8000)作为开始声明@NextPos SMALLINT;声明@LastPos SMALLINT;声明@Found SMALLINT;选择 @NextPos = CHARINDEX(@Delimiter, @Text, 1) ,@LastPos = 0 ,@找到 = 1而@NextPos >0AND ABS(@Section) <>@成立选择 @LastPos = @NextPos ,@NextPos = CHARINDEX(@Delimiter, @Text, @NextPos + 1) ,@Found = @Found + 1返回 LTRIM(RTRIM(CASE当@找到<>ABS(@Section) 或 @Section = 0 THEN NULL当@Section >0 THEN SUBSTRING(@Text, @LastPos + 1, CASE WHEN @NextPos = 0 THEN DATALENGTH(@Text) - @LastPos ELSE @NextPos - @LastPos - 1 END)ELSE SUBSTRING(@Text, @LastPos + 1, CASE WHEN @NextPos = 0 THEN DATALENGTH(@Text) - @LastPos ELSE @NextPos - @LastPos - 1 END)结尾))结尾

例如,我然后将 ref 之前的空格替换为包含逗号并按如下方式拆分:

带有示例数据AS ( SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [主题]联合所有选择帮助台呼叫参考 F0012346,呼叫已解决,40112"联合所有选择帮助台呼叫参考 F0012347,记录新呼叫,40113")SELECT dbo.fnParseString(2, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Ref] ,dbo.fnParseString(3, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Type] ,dbo.fnParseString(4, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [OurRef]来自示例数据

如您所见,我有一个解决方案可以得到我想要的最终结果,但是使用凌乱的 udf 并不理想 &我想知道 sql-server有更好的方法来做这样的事情 - 也许是内联正则表达式?IE.我认为 PATINDEX() 接受正则表达式作为搜索字符串 - 这与 SUBSTRING() 结合可以满足我的需要,但我真的不知道从哪里开始?p>

请注意,这是一个简化的示例,主题是可变的,我也将采用相同的技术来解析正文,正文将包含 8 项数据,我需要使用各种分隔符进行解析,因此这排除了使用 ParseName() 因为它只允许 4 个部分,并且我不能使用固定长度(即 substring())因为长度会非常不同(特别是如果涉及不同的帮助台(它们是) - 这就是为什么我沿着 PATINDEX() & SUBSTRING()

解决方案

经过额外的工作,我们决定不使用 Art 的答案中的方法(即使它有效).

我们需要一种更强大的方法来验证和提取子字符串,所以我通过 CLR 路由使用正则表达式(感谢 Pondlife 为我指明了正确的方向).

我采取的方法如下:

首先我编译了以下 CLR:(从 C# 示例转换为 VB 这里)

导入 System.Data导入 System.Data.SqlClient导入 System.Data.SqlTypes导入 Microsoft.SqlServer.Server导入 System.Text.RegularExpressions导入 System.Text部分公共类 UserDefinedFunctions公共共享只读选项为 RegexOptions = RegexOptions.IgnorePatternWhitespace 或 RegexOptions.Multiline<SqlFunction()>_公共共享函数 RegexMatch(ByVal 输入 As SqlChars, ByVal 模式 As SqlString) As SqlBoolean将正则表达式调暗为新的正则表达式(pattern.Value,Options)返回正则表达式.IsMatch(新字符串(input.Value))结束功能<SqlFunction()>_公共共享函数 RegexReplace(ByVal 表达式 As SqlString, ByVal 模式 As SqlString, ByVal replace As SqlString) As SqlStringIf expression.IsNull OrElse pattern.IsNull OrElse replace.IsNull Then返回 SqlString.Null万一Dim r As New Regex(pattern.ToString())Return New SqlString(r.Replace(expression.ToString(), replace.ToString()))结束功能' 返回匹配的字符串.结果由第三个参数分隔<SqlFunction()>_公共共享函数 RegexSelectAll(ByVal 输入 As SqlChars, ByVal 模式 As SqlString, ByVal matchDelimiter As SqlString) As SqlString将正则表达式调暗为新的正则表达式(pattern.Value,Options)暗淡结果为 Match = regex.Match(New String(input.Value))Dim sb As New StringBuilder()虽然结果.成功sb.Append(results.Value)结果 = results.NextMatch()' 用换行符分隔结果|换行符If results.Success Thensb.Append(matchDelimiter.Value)万一结束时返回新的 SqlString(sb.ToString())结束功能' 返回匹配的字符串' matchIndex 是结果的从零开始的索引.第一场比赛为 0,第二场比赛为 1,以此类推<SqlFunction()>_公共共享函数 RegexSelectOne(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchIndex As SqlInt32) As SqlString将正则表达式调暗为新的正则表达式(pattern.Value,Options)暗淡结果为 Match = regex.Match(New String(input.Value))将 resultStr 调暗为 String = ""暗淡索引 As Integer = 0虽然结果.成功如果索引 = matchIndex 则resultStr = results.Value.ToString()万一结果 = results.NextMatch()索引 += 1结束时返回新的 SqlString(resultStr)结束功能结束类

我按如下方式安装了这个 CLR:

EXEC sp_configure'clr 启用' ,'1'去重新配置使用 [db_Utility]去从D:Program FilesMicrosoft SQL ServerMSSQL10_50.MSSQLSERVERMSSQLBinnSQL_CLR_RegExp.dll"创建程序集 SQL_CLR_RegExpPERMISSION_SET = 安全去-- ===============================================-- 如果输入匹配模式,则返回 1 或 0-- VB函数:RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlBoolean-- ===============================================创建函数 [dbo].[RegexMatch](@input [nvarchar](MAX) ,@pattern [nvarchar](MAX))返回 [位]以调用者身份执行作为外部名称[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexMatch]去-- ===============================================-- 返回一个逗号分隔的已找到对象字符串-- VB函数:RegexReplace(ByVal expression As SqlString, ByVal pattern As SqlString, ByVal replace As SqlString) As SqlString-- ===============================================创建函数 [dbo].[RegexReplace](@表达式 [nvarchar](MAX) ,@pattern [nvarchar](MAX) ,@replace [nvarchar](MAX))返回 [nvarchar](MAX)以调用者身份执行作为外部名称[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexReplace]去-- ===============================================-- 返回一个逗号分隔的已找到对象字符串-- VB函数:RegexSelectAll(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchDelimiter As SqlString) As SqlString-- ===============================================创建函数 [dbo].[RegexSelectAll](@input [nvarchar](MAX) ,@pattern [nvarchar](MAX) ,@matchDelimiter [nvarchar](MAX))返回 [nvarchar](MAX)以调用者身份执行作为外部名称[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectAll]去-- ===============================================-- 返回从零开始的索引的匹配索引-- RegexSelectOne(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchIndex As SqlInt32) As SqlString-- ===============================================创建函数 [dbo].[RegexSelectOne](@input [nvarchar](MAX) ,@pattern [nvarchar](MAX) ,@matchIndex [整数])返回 [nvarchar](MAX)以调用者身份执行作为外部名称[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectOne]去

然后我编写了以下包装函数来简化使用:

设置 ANSI_NULLS ON去设置 QUOTED_IDENTIFIER ON去-- ===============================================——作者:<乔登·皮林>-- 创建日期:<30/01/2013>-- 描述:<使用开始和结束文本调用 RegexSelectOne 并清理结果>-- ===============================================创建函数 [dbo].[RegexSelectOneWithScrub](@Haystack VARCHAR(MAX),@StartNeedle VARCHAR(MAX),@EndNeedle VARCHAR(MAX))返回 VARCHAR(MAX)作为开始声明 @ReturnStr VARCHAR(MAX)--#### 使用开始和结束针从 HayStack 中提取文本SET @ReturnStr = dbo.RegexSelectOne(@Haystack, REPLACE(@StartNeedle, ' ','s') + '((.|
)+?)' + REPLACE(@EndNeedle, ' ','s'), 0)--#### 移除针头SET @ReturnStr = REPLACE(@ReturnStr, @StartNeedle, '')SET @ReturnStr = REPLACE(@ReturnStr, @EndNeedle, '')--#### 修剪空白SET @ReturnStr = LTRIM(RTRIM(@ReturnStr))--#### 修剪换行符和回车符SET @ReturnStr = dbo.SuperTrim(@ReturnStr)返回@ReturnStr结尾去

允许的用法如下:

DECLARE @Subject VARCHAR(250) = 'HelpDesk Call Reference F0012345, Call Update, 40111'声明 @Ref VARCHAR(250) = NULLIF dbo.RegexMatch(@Subject, '^HelpDesksCallsReferencesF[0-9]{7},s(CallsResolved|CallsUpdate|Newscallslogged),(|s+)([0-9]+|未知)$') = 1SET @Ref = ISNULL(dbo.RegexSelectOneWithScrub(@Subject, 'HelpDesk Call Reference', ','), '无效 (#1)')别的SET @Ref = '无效 (#2)'选择@Ref

这对于多个搜索使用起来要快得多,并且在处理具有不同开头和结尾短语等的大量文本时更强大.

Take the following example data:

SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113' AS [Subject]

What i would like to do is extract this data as follows:

As you can see, i need to extract the Ref, Type & OurRef as seperate columns to ensure efficient set based SQL when processing the resulting emails.

Usually for this scenario i would use a function such as this:

CREATE FUNCTION dbo.fnParseString (
    @Section SMALLINT ,
    @Delimiter CHAR ,
    @Text VARCHAR(MAX)
)
RETURNS VARCHAR(8000)
AS 
    BEGIN
        DECLARE @NextPos SMALLINT;
        DECLARE @LastPos SMALLINT;
        DECLARE @Found SMALLINT;

        SELECT  @NextPos = CHARINDEX(@Delimiter, @Text, 1) ,
                @LastPos = 0 ,
                @Found = 1

        WHILE @NextPos > 0
            AND ABS(@Section) <> @Found 
            SELECT  @LastPos = @NextPos ,
                    @NextPos = CHARINDEX(@Delimiter, @Text, @NextPos + 1) ,
                    @Found = @Found + 1

        RETURN LTRIM(RTRIM(CASE
            WHEN @Found <> ABS(@Section) OR @Section = 0 THEN NULL
            WHEN @Section > 0 THEN SUBSTRING(@Text, @LastPos + 1, CASE WHEN @NextPos = 0 THEN DATALENGTH(@Text) - @LastPos ELSE @NextPos - @LastPos - 1 END)
            ELSE SUBSTRING(@Text, @LastPos + 1, CASE WHEN @NextPos = 0 THEN DATALENGTH(@Text) - @LastPos ELSE @NextPos - @LastPos - 1 END)
        END))
    END

For example i then replace the white space before the ref to include a comma and split as follows:

WITH    ExampleData
          AS ( SELECT   'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
               UNION ALL
               SELECT   'HelpDesk Call Reference F0012346, Call Resolved, 40112'
               UNION ALL
               SELECT   'HelpDesk Call Reference F0012347, New call logged, 40113'
             )
    SELECT  dbo.fnParseString(2, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Ref] ,
            dbo.fnParseString(3, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Type] ,
            dbo.fnParseString(4, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [OurRef]
    FROM    ExampleData

As you can see, i have a solution that gets the end-result i'm after, but the use of a messy udf isnt ideal & i was wondering if sql-server has a better way of doing stuff like this - perhaps inline regular expressions? I.e. i think PATINDEX() accepts regular expressions as a search string - this in conjunction with SUBSTRING() could do what i need but i dont really know where to start?

Edit: Please note that this is a simplified example, the subject is variable and i'll also be adapting the same technique to parse the body, the body will have 8 items of data that i need to parse out using a variety of delimiters, so this rules out the use of ParseName() as it only allows 4 parts, and i can’t use fixed length (i.e. substring()) as the length will be very varied (especially if different helpdesks are involved (which they are) - this is why i was thinking along the lines of PATINDEX() & SUBSTRING()

解决方案

After additional work we decided not to use the approach in Art's answer (even though it worked).

We needed a far more powerfull way of validating and extracting the substrings, So I went the Regular Expression via CLR route (Thanks to Pondlife for pointing me in the right direction).

The approach i took was as follows:

First I compiled the following CLR: (Converted to VB from a C# example Here)

Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Text.RegularExpressions
Imports System.Text

Partial Public Class UserDefinedFunctions

    Public Shared ReadOnly Options As RegexOptions = RegexOptions.IgnorePatternWhitespace Or RegexOptions.Multiline

    <SqlFunction()> _
    Public Shared Function RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlBoolean
        Dim regex As New Regex(pattern.Value, Options)
        Return regex.IsMatch(New String(input.Value))
    End Function

    <SqlFunction()> _
    Public Shared Function RegexReplace(ByVal expression As SqlString, ByVal pattern As SqlString, ByVal replace As SqlString) As SqlString
        If expression.IsNull OrElse pattern.IsNull OrElse replace.IsNull Then
            Return SqlString.Null
        End If

        Dim r As New Regex(pattern.ToString())
        Return New SqlString(r.Replace(expression.ToString(), replace.ToString()))
    End Function

    ' returns the matching string. Results are separated by 3rd parameter
    <SqlFunction()> _
    Public Shared Function RegexSelectAll(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchDelimiter As SqlString) As SqlString
        Dim regex As New Regex(pattern.Value, Options)
        Dim results As Match = regex.Match(New String(input.Value))

        Dim sb As New StringBuilder()
        While results.Success
            sb.Append(results.Value)

            results = results.NextMatch()

            ' separate the results with newline|newline
            If results.Success Then
                sb.Append(matchDelimiter.Value)
            End If
        End While

        Return New SqlString(sb.ToString())

    End Function

    ' returns the matching string
    ' matchIndex is the zero-based index of the results. 0 for the 1st match, 1, for 2nd match, etc
    <SqlFunction()> _
    Public Shared Function RegexSelectOne(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchIndex As SqlInt32) As SqlString
        Dim regex As New Regex(pattern.Value, Options)
        Dim results As Match = regex.Match(New String(input.Value))

        Dim resultStr As String = ""
        Dim index As Integer = 0

        While results.Success
            If index = matchIndex Then
                resultStr = results.Value.ToString()
            End If

            results = results.NextMatch()

            index += 1
        End While

        Return New SqlString(resultStr)

    End Function

End Class

I installed this CLR as follows:

EXEC sp_configure 
    'clr enabled' ,
    '1'

GO

RECONFIGURE
USE [db_Utility]

GO
CREATE ASSEMBLY SQL_CLR_RegExp FROM 'D:Program FilesMicrosoft SQL ServerMSSQL10_50.MSSQLSERVERMSSQLBinnSQL_CLR_RegExp.dll' WITH
PERMISSION_SET = SAFE

GO
-- =============================================
-- Returns 1 or 0 if input matches pattern
-- VB function: RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlBoolean
-- =============================================
CREATE FUNCTION [dbo].[RegexMatch]
    (
      @input [nvarchar](MAX) ,
      @pattern [nvarchar](MAX)
    )
RETURNS [bit]
    WITH EXECUTE AS CALLER
AS EXTERNAL NAME 
    [SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexMatch]
GO

-- =============================================
-- Returns a comma separated string of found objects
-- VB function: RegexReplace(ByVal expression As SqlString, ByVal pattern As SqlString, ByVal replace As SqlString) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexReplace]
    (
      @expression [nvarchar](MAX) ,
      @pattern [nvarchar](MAX) ,
      @replace [nvarchar](MAX)
    )
RETURNS [nvarchar](MAX)
    WITH EXECUTE AS CALLER
AS EXTERNAL NAME 
    [SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexReplace]
GO
-- =============================================
-- Returns a comma separated string of found objects
-- VB function: RegexSelectAll(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchDelimiter As SqlString) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectAll]
    (
      @input [nvarchar](MAX) ,
      @pattern [nvarchar](MAX) ,
      @matchDelimiter [nvarchar](MAX)
    )
RETURNS [nvarchar](MAX)
    WITH EXECUTE AS CALLER
AS EXTERNAL NAME 
    [SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectAll]
GO
-- =============================================
-- Returns finding matchIndex of a zero based index
-- RegexSelectOne(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchIndex As SqlInt32) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectOne]
    (
      @input [nvarchar](MAX) ,
      @pattern [nvarchar](MAX) ,
      @matchIndex [int]
    )
RETURNS [nvarchar](MAX)
    WITH EXECUTE AS CALLER
AS EXTERNAL NAME 
    [SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectOne]
GO 

I then wrote the following wrapping function to simplify use:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:      <Jordon Pilling>
-- Create date: <30/01/2013>
-- Description: <Calls RegexSelectOne with start and end text and cleans the result>
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectOneWithScrub]
(
    @Haystack VARCHAR(MAX),
    @StartNeedle VARCHAR(MAX),
    @EndNeedle VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @ReturnStr VARCHAR(MAX)

    --#### Extract text from HayStack using Start and End Needles
    SET @ReturnStr = dbo.RegexSelectOne(@Haystack, REPLACE(@StartNeedle, ' ','s') + '((.|
)+?)' + REPLACE(@EndNeedle, ' ','s'), 0)

    --#### Remove the Needles
    SET @ReturnStr = REPLACE(@ReturnStr, @StartNeedle, '')
    SET @ReturnStr = REPLACE(@ReturnStr, @EndNeedle, '')

    --#### Trim White Space
    SET @ReturnStr = LTRIM(RTRIM(@ReturnStr))

    --#### Trim Line Breaks and Carriage Returns
    SET @ReturnStr = dbo.SuperTrim(@ReturnStr)

    RETURN @ReturnStr

END
GO

This allowed usage as follows:

DECLARE @Subject VARCHAR(250) = 'HelpDesk Call Reference F0012345, Call Update, 40111' 
DECLARE @Ref VARCHAR(250) = NULL

IF dbo.RegexMatch(@Subject, '^HelpDesksCallsReferencesF[0-9]{7},s(CallsResolved|CallsUpdate|Newscallslogged),(|s+)([0-9]+|unknown)$') = 1
    SET @Ref = ISNULL(dbo.RegexSelectOneWithScrub(@Subject, 'HelpDesk Call Reference', ','), 'Invalid (#1)')
ELSE
    SET @Ref = 'Invalid (#2)'

SELECT @Ref

This was far quicker to use for multiple searches, and far more powerfull when dealing with lots of text with differeent start and end phrases etc.

相关文章