How do I replace HTML tags with a regular expression in SQL Server?

2021-09-10 00:00:00 regex sql tsql sql-server

I have a table in SQL Server 2005 with hundreds of rows with HTML content. Some of the content has HTML like:

<span class=heading-2>Directions</span>

where "Directions" changes depending on page name.

I need to change all the <span class=heading-2> and </span> tags to <h2> and </h2> tags.

I wrote this query to do content changes in the past, but it doesn't work for my current problem because of the ending HTML tag:

Update  ContentManager
Set ContentManager.Content = replace(Cast(ContentManager.Content AS NVARCHAR(Max)), 'old text', 'new text')

Does anyone know how I could accomplish the span to h2 replacing purely in T-SQL? Everything I found showed I would have to do CLR integration. Thanks!

Recommended answer

Indeed T-SQL does not natively support regular expressions and this is the sort of problem in which regular expressions would be the tool of choice. First, I'll say that the level of complication in the solution depends greatly on how consistent your data is. For example, suppose we search for items with the heading:

Select ..
From ...
Where HtmlContent Like '<span class="heading-2">%'
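As a rough sketch of how sensitive that pattern is to the exact markup, the batch below runs it against three hand-written variants of the heading. The table variable @Samples and its rows are made up for illustration; the third row uses the unquoted attribute form shown in the question. Only the first row should match.

```sql
-- Hypothetical sample data; not taken from the real ContentManager table
Declare @Samples Table ( HtmlContent nvarchar(max) )

Insert Into @Samples
Select '<span class="heading-2">Directions</span>'
Union All Select '<span  class="heading-2" >Directions</span>'   -- extra spaces
Union All Select '<span class=heading-2>Directions</span>'       -- unquoted attribute

-- 1 when the strict pattern matches, 0 otherwise
Select HtmlContent,
    Case When HtmlContent Like '<span class="heading-2">%' Then 1 Else 0 End As MatchesStrictPattern
From @Samples
```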

This assumes no additional spacing between span and class, and no additional spacing after the final double quote before the closing bracket. We could write '%<span%class="heading-2"%>%' to account for the spaces, but because % matches any run of characters, that would also match a div tag marked as heading-2 that appears anywhere after a span tag in the same content. If this latter scenario cannot happen but you might have varying spaces, then use the revised pattern. Where we will really run into trouble is the closing tag. Suppose our content looks like so:

<span class="heading-2"> Foo <span class="heading-3">Bar</span> And Gamma Too</span> .... <span class="heading-4">Fubar Is the right way!</span>...
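To make the problem concrete, here is a small sketch that searches a shortened copy of that sample for the first </span> after the heading-2 opening tag. The variable names are illustrative only. The captured text stops at "Bar", inside the nested heading-3 span, rather than at the end of the heading.

```sql
Declare @Html nvarchar(max)
Declare @OpenTag nvarchar(50)
Declare @OpenPos int
Declare @ClosePos int

Set @Html = '<span class="heading-2"> Foo <span class="heading-3">Bar</span> And Gamma Too</span>'
Set @OpenTag = '<span class="heading-2">'

Set @OpenPos = CharIndex(@OpenTag, @Html)
-- first </span> after the opening tag; this is the inner span's closing tag
Set @ClosePos = CharIndex('</span>', @Html, @OpenPos)

-- text between the heading-2 opening tag and that first </span>
Select Substring(@Html, @OpenPos + Len(@OpenTag), @ClosePos - @OpenPos - Len(@OpenTag)) As CapturedText
```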

It is not so simple to find the correct closing span tag to change to </h2>. You cannot simply find the first </span> and change it to </h2>. If you knew that you had no nested span tags, then you could write a user-defined function that would do it:

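Create Function ReplaceSpanToH2( @HtmlContent nvarchar(max) )
Returns nvarchar(max)
As
Begin
    Declare @OpenTag nvarchar(50)
    Declare @StartPos int
    Declare @ClosePos int

    Set @OpenTag = '<span class="heading-2">'
    Set @StartPos = CharIndex(@OpenTag, @HtmlContent)

    -- handle each heading-2 span in turn; assumes span tags are never nested
    While @StartPos > 0
    Begin
        -- swap only this opening tag, so every <h2> gets a matching </h2>
        Set @HtmlContent = Stuff(@HtmlContent, @StartPos, Len(@OpenTag), '<h2>')

        -- find next </span>; skip the swap if none exists, since Stuff at position 0 returns NULL
        Set @ClosePos = CharIndex('</span>', @HtmlContent, @StartPos)
        If @ClosePos > 0
            Set @HtmlContent = Stuff(@HtmlContent, @ClosePos, 7, '</h2>')

        -- look for another heading-2 span further along
        Set @StartPos = CharIndex(@OpenTag, @HtmlContent, @StartPos)
    End

    Return @HtmlContent
End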
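
Applying the function to the table from the question might look like the sketch below. It assumes the function was created in the dbo schema (scalar functions must be called with a schema-qualified name) and that Content needs the same cast to nvarchar(max) that the question's own query uses. Running it against a copy of the table first is advisable.

```sql
-- Assumes dbo.ReplaceSpanToH2 exists and ContentManager.Content holds the HTML
Update ContentManager
Set Content = dbo.ReplaceSpanToH2(Cast(Content As nvarchar(max)))
-- touch only rows that actually contain the heading span
Where Cast(Content As nvarchar(max)) Like '%<span class="heading-2">%'
```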
