自 2008 R2 版本以来，是否有任何 Sql Server 全文搜索 (FTS) 性能改进?

2021-12-30 00:00:00 performance lucene full-text-search solr sql-server

我们正在对包含 260 万条记录的表使用 SQL Server 2008 R2 全文搜索.搜索性能通常很差，它遵循普遍报告的模式:冷系统/首次运行 ~10+ 秒，后续运行 ~1-2 秒.这与 2013 年 2 月的以下文章中报告的结果一致:

所以你认为你可以搜索——比较 Microsoft SQL Server FTS和 Apache Lucene

文章展示了以下使用维基百科转储数据的速度比较结果:

<前>索引速度、大小和单个查询执行时间使用:Lucene MS SQL FTS索引速度 3 MB/秒 1 MB/秒指数规模 10-25% 25-30%简单查询 <20 ms <20 ms使用自定义分数查询 <4 秒 > 20 秒

<前>并行查询执行(10 个线程，每个查询的平均执行时间以毫秒为单位):MS SQL FTS Lucene(文件系统)Lucene(RAM)冷系统: 简单查询 56 643 21提升查询 19669* 859 27第二次执行: 简单查询 14 8 <5提升查询 465 17 9*平均时间，第一个查询最多可以执行 2 分钟(！)

我的问题是:

自从这篇文章于 2013 年 2 月 8 日发表以来，有几个主要的 SQL Server 版本，当他们迁移到更新的 SQL 时，有人可以报告对相同数据(最好是 1+ 百万条记录)的任何 FTS 性能改进吗?服务器版本(2012、2014 和 2016)?
最近的 SQL Server 版本是否像 solr/lucene 一样支持放置在 RAM 中的 FTS 目录/索引?

更新:在我们的场景中，我们很少将新数据插入到 FT 目录链接表中，但经常运行只读搜索.所以，我不认为 SQL 不断重建 FTS 索引是问题所在.

解决方案

SQL Server 2012 中的全文搜索改进:

<块引用>

我们查看了整个代码库，从在等待正在进行的索引更新以释放共享模式锁时查询如何阻塞，从索引片段填充期间分配了多少内存，到我们如何将查询代码库重新组织为流用于优化 TOP N 搜索查询的表值函数，我们如何维护键分布直方图以在并行线程上执行搜索，一直到我们如何更好地利用处理器计算指令(例如评分等级)……最终结果是我们能够显着提高性能(在许多情况下，当涉及具有大型查询工作负载的并发索引更新时，10 倍)和扩展，而无需更改任何存储结构或现有 API 表面.我们所有从 SQL 2008/R2 到 Denali 的客户都将从这项改进中受益.

we're using SQL Server 2008 R2 Full-Text Search over a table with 2.6 million records. The search performance often is poor, it follows the commonly reported pattern: cold system/first run ~10+ sec, subsequent runs ~1-2 sec. This is inline with results reported in the following article dated of Feb, 2013:

So You Think You Can Search – Comparing Microsoft SQL Server FTS and Apache Lucene

The article shows the following speed comparison results using Wikipedia dump data:

Indexing speed, size and single query execution time using:

                        Lucene      MS SQL FTS
Indexing Speed          3 MB/sec    1 MB/sec
Index Size              10-25%      25-30%
Simple query            < 20 ms     < 20 ms
Query With Custom Score < 4 sec     > 20 sec

Parallel Query Executions (10 threads, average execution time per query in ms):

                                     MS SQL FTS  Lucene (File System)   Lucene (RAM)
Cold System:         Simple Query    56          643                    21
                     Boost Query     19669*      859                    27
Second executions:   Simple Query    14          8                      < 5
                     Boost Query     465         17                     9

*average time, the very first query could be executed up to 2 min(!)

My questions are:

Since there were several major SQL Server releases since the article was published on Feb 8, 2013, can someone report any FTS performance improvements over same data (preferably of 1+ million records) when they migrated to more recent SQL Server versions (2012, 2014 and 2016)?
Do more recent SQL Server versions support FTS catalogs/indexes placed in RAM just as solr/lucene do?

UPDATE: in our scenario we seldom insert new data into FT catalog linked table, but run read only searches very often. So, I don't think SQL constantly rebuilding FTS index is the issue.

解决方案

Fulltext Search Improvements in SQL Server 2012:

We looked at the entire code base from how queries block while waiting an ongoing index update to release a shared schema lock, from how much memory is allocated during index fragment population, to how we could reorganize the query code base as a streaming Table Value Function to optimize for TOP N search queries, how we could maintain key distribution histograms to execute search on parallel threads, all the way to how we could take better advantage of the processor compute instructions (scoring ranks for example)… End result is that we are able to significantly boost performance (10X in many cases when it comes to concurrent index updates with large query workloads) and scale without having to change any storage structures or existing API surface. All our customers going from SQL 2008 / R2 to Denali will benefit with this improvement.

相关文章