Search for records where a varchar field contains all the words from another string

2021-09-10 00:00:00 tsql sql-server-2008 sql-server

I'm trying to write a small stored procedure without having to add full-text (freetext) indexing just for this (SQL Server 2008).

Basically, I want to find all records where a certain field contains all the words from a parameter.

So if the field contains "This is a test field" and the parameter to my SP is "this test field", the record would be returned, as it would be if the parameter were "field this test".

The table is very small (4,000 records) and the load will be low, so efficiency is not a big deal. Right now the only solution I can think of is to split both strings with a table-valued function and go from there.

Any simpler ideas?

Thanks!

Recommended answer

Here is a solution using recursive CTEs. It actually uses two separate recursions: the first splits the search string into tokens, and the second recursively filters the records using each token in turn.

-- Input: the words to look for and the character that separates them
declare
    @searchString varchar(max),
    @delimiter char;

select
    @searchString = 'This is a test field',
    @delimiter = ' ';

-- One row per word of @searchString, numbered from 1
declare @tokens table(pos int, string varchar(max));

-- First recursion: walk the delimiter positions to split @searchString
;WITH Tokens(pos, start, stop) AS (
    SELECT 1, 1, CONVERT(int, CHARINDEX(@delimiter, @searchString))
    UNION ALL
    SELECT pos + 1, stop + 1, CONVERT(int, CHARINDEX(@delimiter, @searchString, stop + 1))
    FROM Tokens
    WHERE stop > 0
)
INSERT INTO @tokens
SELECT pos,
    -- stop = 0 marks the last token; 512 caps its length
    SUBSTRING(@searchString, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS string
FROM Tokens
OPTION (MAXRECURSION 25000);

-- Second recursion: step ind keeps only rows that also contain token ind
;with filter(ind, myfield) as (
    select 1, myfield from mytable
    where myfield like '%' + (select string from @tokens where pos = 1) + '%'
    union all
    select ind + 1, myfield from filter
    where myfield like '%' + (select string from @tokens where pos = ind + 1) + '%'
)
-- Rows that reached the last step matched every token
select * from filter where ind = (select COUNT(1) from @tokens);

This took me about 15 seconds to search a table of 10k records for the search string 'this is a test field'. The more words in the string, the longer it takes.
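If the second recursion turns out to be the slow part, the same "contains every word" check can also be written as a single NOT EXISTS query over the tokens. This is only a sketch of an alternative, not the method used above. To keep it runnable on its own, it uses a hypothetical @mytable table variable in place of mytable and fills @tokens by hand instead of running the splitting step.

```sql
-- Hypothetical stand-in for mytable, with a few sample rows
declare @mytable table (myfield varchar(200));
insert into @mytable (myfield) values
    ('This is a test field'),
    ('Another test'),
    ('field of tests');

-- Tokens as the splitting step would produce them for 'this test field'
declare @tokens table (pos int, string varchar(max));
insert into @tokens (pos, string) values (1, 'this'), (2, 'test'), (3, 'field');

-- Keep a row only if there is no token that it fails to contain
select m.myfield
from @mytable m
where not exists (
    select 1
    from @tokens t
    where t.string <> ''   -- skip empty tokens caused by repeated delimiters
      and m.myfield not like '%' + t.string + '%'
);
```

With a case-insensitive collation this should return only 'This is a test field'. As in the recursive version, LIKE matches substrings, so a token such as 'is' also matches inside 'This'; padding both the field and the token with the delimiter is one way to require whole words.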

Edit
If you want a fuzzy search, i.e. to return close matches even when there is no exact match, you can change the last line of the query to:
select * from (select max(ind) as ind, myfield from filter group by myfield) t order by ind desc

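'ind' gives you the number of words from the search string found in myfield. Note that the filter applies the words in order, so it counts the leading words of the search string that matched; a row that does not contain the first word is not returned at all.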
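Because the recursive filter checks the words in order, a row that lacks the first word never gets a score. To rank rows by how many search words they contain regardless of position, one option is to join the table to the tokens and count the matches. Again this is a sketch under assumptions: @mytable is a hypothetical stand-in for mytable, and @tokens is filled by hand rather than by the splitting step.

```sql
-- Hypothetical stand-in for mytable, with a few sample rows
declare @mytable table (myfield varchar(200));
insert into @mytable (myfield) values
    ('This is a test field'),
    ('Another test'),
    ('Nothing relevant here');

-- Tokens as the splitting step would produce them for 'this test field'
declare @tokens table (pos int, string varchar(max));
insert into @tokens (pos, string) values (1, 'this'), (2, 'test'), (3, 'field');

-- One joined row per (record, token) match; count them per record
select m.myfield, count(*) as matched
from @mytable m
join @tokens t on m.myfield like '%' + t.string + '%'
group by m.myfield
order by matched desc;
```

Rows that contain none of the words drop out of the join, so they do not appear in the result. Grouping by myfield also merges records with identical text, so a real table would probably group by its key column instead.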
