在 SQL 中按子字符串查找字符串的最快方法?

我有两列的大表:ID 和标题.Id 是 bigint,我可以自由选择 Title 列的类型:varchar、char、text 等等.列标题包含随机文本字符串,如abcdefg"、q"、allyourbasebelongtous",最多 255 个字符.

I have huge table with 2 columns: Id and Title. Id is bigint and I'm free to choose type of Title column: varchar, char, text, whatever. Column Title contains random text strings like "abcdefg", "q", "allyourbasebelongtous" with maximum of 255 chars.

我的任务是通过给定的子字符串获取字符串.子字符串也有随机长度,可以是字符串的开头、中间或结尾.最明显的执行方式:

My task is to get strings by given substring. Substrings also have random length and can be start, middle or end of strings. The most obvious way to perform it:

SELECT * FROM t LIKE '%abc%'

我不关心 INSERT,我只需要进行快速选择.我该怎么做才能尽可能快地执行搜索?

I don't care about INSERT, I need only to do fast selects. What can I do to perform search as fast as possible?

我用的是MS SQL Server 2008 R2,就我所见,全文搜索没用.

I use MS SQL Server 2008 R2, full text search will be useless, as far as I see.

推荐答案

如果你想使用比 Randy 的答案更少的空间并且你的数据中有相当多的重复,你可以创建一个 N-Ary 树数据结构,其中每条边都是下一个字符并将数据中的每个字符串和尾随子字符串挂在上面.

If you want to use less space than Randy's answer and there is considerable repetition in your data, you can create an N-Ary tree data structure where each edge is the next character and hang each string and trailing substring in your data on it.

您按深度第一顺序对节点进行编号.然后,您可以为每条记录创建一个最多包含 255 行的表,其中包含记录的 ID,以及与字符串或尾随子字符串匹配的树中的节点 ID.然后,当您进行搜索时,您会找到代表您正在搜索的字符串(以及所有尾随子字符串)的节点 ID,并进行范围搜索.

You number the nodes in depth first order. Then you can create a table with up to 255 rows for each of your records, with the Id of your record, and the node id in your tree that matches the string or trailing substring. Then when you do a search, you find the node id that represents the string you are searching for (and all trailing substrings) and do a range search.

相关文章