Can I predict how large my Zend Framework index will be? (and some quick Q:s)

2021-12-29 00:00:00 sql php mysql zend-framework

I have around 100,000 rows in a MySQL table, and each row has about 8 fields.

I have finally got the hang of using Zend Lucene to index and search data from a MySQL table.

Before I fully implement this functionality on my website, I have some questions:

1- Is it possible to determine the size of an index in advance? I ask because the Zend manual says the maximum size of an index is 2 GB, and my immediate thought is that this isn't enough for my table!

2- I have read posts saying that Zend Lucene search is very slow on large indexes, taking up to minutes! Would it be faster to use MySQL queries directly (SELECT, LIKE, etc.) instead of Zend?

3- Are there any other solutions to my problem, which is to create a search engine for classifieds that has at least these functions and doesn't require full-text MySQL indexes (fields)?

Thanks

Recommended answer

SOLR is basically a Java web application, running in a servlet container such as Apache Tomcat, that implements a REST interface for querying an Apache Lucene index. Yes, you need to be able to run a Java application on your web server; this is something to work out with your hosting provider.

Clients using your web app don't need to run Java. Your PHP app could make a REST query to the SOLR service and format the results in HTML. A client sees only the HTML output; it never needs to know that the data came from a service implemented in Java.
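
To make the flow concrete, here is a minimal sketch, written in Python rather than PHP for illustration, of the server-side step: build a query against Solr's `/select` handler, then turn the JSON response into HTML. The host, port, core name `classifieds`, and the `title` field are hypothetical placeholders, not anything from the question; a PHP version would follow the same shape.

```python
import html
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical Solr host and core name ("classifieds"); replace with your own deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/classifieds/select"


def build_solr_query_url(user_query: str, rows: int = 10) -> str:
    """Build a GET URL for Solr's /select request handler, asking for a JSON response."""
    params = {"q": user_query, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


def render_results_html(solr_json: str, field: str = "title") -> str:
    """Format the documents in a Solr JSON response as an HTML list.

    `field` is a hypothetical stored field name; use whatever your schema defines.
    """
    docs = json.loads(solr_json)["response"]["docs"]
    items = "".join(f"<li>{html.escape(str(doc.get(field, '')))}</li>" for doc in docs)
    return f"<ul>{items}</ul>"


def search(user_query: str) -> str:
    """Query Solr over HTTP and return HTML; the browser only ever sees this HTML."""
    with urllib.request.urlopen(build_solr_query_url(user_query)) as resp:
        return render_results_html(resp.read().decode("utf-8"))
```

For example, `build_solr_query_url("used bike")` produces `http://localhost:8983/solr/classifieds/select?q=used+bike&rows=10&wt=json`. Escaping each field before it goes into the page matters because classified ads are user-submitted text.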

Zend_Search_Lucene is a pure-PHP implementation that is supposed to work identically to Apache Lucene. The Zend solution even uses an identical index file format. So storage-wise they should be equal.

I used Java Lucene to index the StackOverflow data dump (October 2009): 1.5 million rows, including about 1 GB of text data. The Lucene index was 1323 MB, whereas the MySQL FULLTEXT index of the same data was only 466 MB.

Of course, using SQL LIKE predicates instead of any fulltext indexing solution requires no extra space, because a pattern such as LIKE '%word%' cannot use a conventional index anyway. But in my tests, LIKE was about 200 times slower than Java Lucene, which in turn was about 40% slower than a MySQL FULLTEXT index on the same data.
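
The reason for that gap can be seen in a toy Python sketch. This is only an illustration of the idea, not how Lucene or MySQL actually lay out their indexes: `LIKE '%term%'` has to examine every row, whereas an inverted index (the structure behind both Lucene and FULLTEXT) maps each word to the rows that contain it, so a lookup skips unrelated rows. The sketch assumes a case-insensitive collation and a naive word tokenizer.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; real analyzers (stemming, stop words) do much more."""
    return re.findall(r"\w+", text.lower())


def like_scan(rows: dict[int, str], term: str) -> tuple[list[int], int]:
    """Emulate WHERE body LIKE '%term%' (case-insensitive): every row is examined."""
    hits, examined = [], 0
    for row_id, body in rows.items():
        examined += 1
        if term.lower() in body.lower():
            hits.append(row_id)
    return hits, examined


def build_inverted_index(rows: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the ids of the rows containing it (built once, ahead of queries)."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, body in rows.items():
        for token in tokenize(body):
            index[token].add(row_id)
    return dict(index)


def index_lookup(index: dict[str, set[int]], term: str) -> list[int]:
    """A single dictionary lookup: cost depends on the matches, not the table size."""
    return sorted(index.get(term.lower(), set()))
```

The trade-off is also visible here: the index costs extra space and build time, and it matches whole words, so a substring like `bik` is found by the scan but not by the word lookup.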

See my recent presentation about fulltext indexing solutions with MySQL:

http://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql

It's not surprising that Zend_Search_Lucene can't match the performance and scalability of Java Lucene. PHP's strength as a language is development efficiency, not runtime efficiency.

Update: I just tried creating an index using Zend_Search_Lucene. Indexing is far slower in PHP than with Java Lucene, so I indexed only 10,000 documents. This took almost 15 minutes, which means indexing the whole collection would take about 36 hours. Compare this to Java Lucene, which in my test indexed the full collection of 1.5 million documents in under 7 minutes.
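
The 36-hour figure is a linear extrapolation, which this short Python sketch reproduces. It assumes the cost per document stays constant as the index grows; Lucene's segment merging may make that only approximately true. Plugging in the full 15 minutes gives an upper bound of 37.5 hours, consistent with "almost 15 minutes" becoming "about 36 hours".

```python
def extrapolate_minutes(sample_minutes: float, sample_docs: int, total_docs: int) -> float:
    """Scale an indexing time measured on a sample, assuming constant cost per document."""
    return sample_minutes * total_docs / sample_docs


zend_minutes = extrapolate_minutes(15, 10_000, 1_500_000)  # "almost 15 minutes" per 10,000 docs
zend_hours = zend_minutes / 60
java_minutes = 7  # Java Lucene: full 1.5 million docs in "under 7 minutes"
ratio = zend_minutes / java_minutes

print(f"Zend_Search_Lucene: ~{zend_hours:.1f} h, roughly {ratio:.0f}x slower than Java Lucene")
```

Both inputs are approximate, so the ratio is best read as "on the order of 300 times slower", not as a precise figure.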

The size of the index I created with Zend_Search_Lucene is 8.75 MB. Extrapolating this by a factor of 150 (1.5 million / 10,000 documents), I estimate the full index would be 1312.5 MB. So I conclude that Zend_Search_Lucene creates an index of about the same size as the one produced by Java Lucene, which is as expected.
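
The same sample-and-extrapolate approach answers question 1 for your own data: index a sample of your rows, measure the index directory, and scale up. Here is a minimal Python sketch. It assumes index size grows linearly with document count and treats the 2 GB limit quoted in the question as 2048 MB; `index_dir_size_mb` and the other helper names are hypothetical.

```python
from pathlib import Path

MAX_INDEX_MB = 2 * 1024  # the 2 GB limit quoted in the question, taken here as 2048 MB (assumption)


def index_dir_size_mb(index_dir: str) -> float:
    """Total size of all files in an index directory, in MB."""
    total_bytes = sum(p.stat().st_size for p in Path(index_dir).rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)


def project_index_size_mb(sample_mb: float, sample_docs: int, total_docs: int) -> float:
    """Scale a sample index size to the full collection, assuming linear growth per document."""
    return sample_mb * total_docs / sample_docs


def fits_under_limit(projected_mb: float, limit_mb: float = MAX_INDEX_MB) -> bool:
    """True if the projected index stays below the size limit."""
    return projected_mb < limit_mb
```

With the numbers above, `project_index_size_mb(8.75, 10_000, 1_500_000)` gives 1312.5 MB, under the limit. If the question's ~100,000 rows produced index data at the same rate as these StackOverflow posts, the result would be about 87.5 MB; classified ads may be shorter or longer, so measure a sample of your own rows.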
