Using Elasticsearch as the primary source for part of my database

2021-12-13 00:00:00 elasticsearch mysql


I have seen many similar questions on this topic (including this one, which discusses how Elasticsearch version 6 has overcome many of its limitations as a primary data store), but I am still not clear on the following:

I am creating an online shopping website and I am using MySQL as my DB.

My DB (simplified) has three tables, User, Product and ProductReview; users can post products on the website for sale.

I am learning about Elasticsearch and I want to use it to search the products on my website. I don't need User and ProductReview to be searchable, only the Product table.

I can think of 2 solutions to achieve this:

  1. Periodically copy the Product table from MySQL to ES
  2. Keep User and ProductReview in MySQL and Product in ES

As far as I know, if I use option 1, I can use go-mysql-elasticsearch to sync ES with MySQL. Is this a good solution?

I am more tempted to use option 2, as it is easier and I don't need to worry about data synchronization. What concerns me about this option is:

  • Is ES reliable enough to be the primary source of data?
  • If at some point I have to modify the Product table structure, would I be able to do so without deleting and recreating the Product index?
  • With MySQL, I normally take a backup of the Prod DB and restore it on the Test DB. Is it still possible to back up and restore from Prod to Test with ES?

I have no experience with ES/NoSQL and would appreciate any advice.

Solution

Let me start by stating that Elasticsearch is NOT a database in the strict sense of the term, and ideally should not be used as one. However, nothing prevents you from doing it (and many people do), and according to the good folks at Elastic, they will never try to make ES a real database. The main goal of ES is to be a fast and reliable search and analytics engine, period.

If you can, you should always keep another primary source of truth from which you can easily (re-)build your ES indices anytime if something goes south.
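As a minimal sketch of what rebuilding an index from the source of truth can look like, the Python below reads every row of a Product table and turns it into a body for Elasticsearch's `_bulk` endpoint (NDJSON: an action line naming the index and `_id`, then a line with the document, ending with a newline). Assumptions: sqlite3 stands in for MySQL so the example runs without a server, and the column names (`id`, `name`, `price`) and the index name `products` are hypothetical.

```python
import json
import sqlite3


def read_products(conn: sqlite3.Connection) -> list[dict]:
    """Read every Product row as a dict (hypothetical columns id, name, price)."""
    cur = conn.execute("SELECT id, name, price FROM product ORDER BY id")
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, r)) for r in cur.fetchall()]


def build_bulk_body(index: str, rows: list[dict]) -> str:
    """Build an NDJSON body for POST /_bulk: one action line plus one source line per row."""
    lines = []
    for row in rows:
        doc = dict(row)
        doc_id = doc.pop("id")  # reuse the primary key as the ES _id
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline; an empty batch produces no body.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL source of truth
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(1, "Desk lamp", 19.99), (2, "Office chair", 89.5)])
    # Send this with Content-Type: application/x-ndjson
    print(build_bulk_body("products", read_products(conn)), end="")
```

Reusing the MySQL primary key as `_id` means re-running the load overwrites existing documents instead of duplicating them, so a failed rebuild can simply be retried.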

In your case, option 1 seems to be the way to go: all you want is to let users search your products, so there's no point in syncing the other tables to ES.
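One simple way to do the periodic copy of option 1 is a watermark query that only picks up rows changed since the last run. The sketch below assumes a hypothetical `updated_at` column holding ISO-8601 strings (which sort correctly as text), again with sqlite3 standing in for MySQL; the returned rows would then be sent to ES with the `_bulk` API.

```python
import sqlite3


def fetch_changed_products(conn: sqlite3.Connection, since: str) -> tuple[list[dict], str]:
    """Return Product rows with updated_at > since, plus the next watermark."""
    cur = conn.execute(
        "SELECT id, name, price, updated_at FROM product "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (since,),
    )
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    next_since = rows[-1]["updated_at"] if rows else since
    return rows, next_since


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for MySQL
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, "
                 "price REAL, updated_at TEXT)")
    conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?)", [
        (1, "Desk lamp", 19.99, "2021-12-01T10:00:00"),
        (2, "Office chair", 89.5, "2021-12-05T09:30:00"),
    ])
    watermark = "2021-12-02T00:00:00"  # a real job would persist this between runs
    changed, watermark = fetch_changed_products(conn, watermark)
    print([r["id"] for r in changed], watermark)
```

This polling approach never sees deleted rows, and rows written later with a timestamp equal to the saved watermark would be skipped by the strict `>`. Binlog-based tools such as go-mysql-elasticsearch avoid both problems by reading MySQL's replication log instead of querying the table.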

Option 2 sounds appealing, but it only makes sense if you decide to use ES alone, which you really shouldn't do if you want to rely on transactions (ES has no transaction support). Another thing you need to know is that if your data lives only in ES and your index gets corrupted for some reason (during an upgrade, because of a bug in ES or in your code, etc.), your data is gone and your business will suffer.

So to answer your questions more precisely:

  1. ES can be reliable as a primary source of truth, provided you put enough effort and money into it. However, you probably don't have millions of products and users (yet), so running an HA cluster of at least three nodes just to search a few thousand products with a few fields doesn't seem like money well spent.

  2. When your Product table changes, it is easy to reindex it into ES (even in real time), and with a few thousand products this runs fast enough that you don't really have to worry about it. If the sync fails for some reason, you can simply run the process again without wasting much time. With the zero-downtime alias technique, you can do all of this without impacting your users.

  3. ES also provides snapshot/restore capabilities so that you can take a snapshot of PROD and install it in your TEST cluster with a single REST call.
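To make items 2 and 3 more concrete, here is a sketch that only builds the REST requests (method, path, JSON body) instead of sending them, so it runs without a cluster. The zero-downtime rebuild creates a new index with the changed mappings, loads it, then repoints an alias in one atomic `POST /_aliases` call, so searches never hit a half-built index. The snapshot part registers a shared filesystem (`fs`) repository, snapshots the index on PROD and restores it on TEST. Assumptions: the application queries the alias `products`, which currently points to `products_v1`; the repository name, path and snapshot name are hypothetical; an `fs` repository location must be listed under `path.repo` in `elasticsearch.yml` on both clusters. Purely additive changes (a new field) can usually be applied in place with `PUT /<index>/_mapping`; changing the type of an existing field is what forces a new index and a reindex.

```python
import json


def zero_downtime_reindex_plan(alias, old_index, new_index, mappings):
    """Requests for rebuilding an index behind an alias (assumes alias -> old_index)."""
    return [
        # 1. Create the new index with the changed mappings.
        ("PUT", f"/{new_index}", {"mappings": mappings}),
        # 2. (Not shown) bulk-load new_index from MySQL and check the doc count.
        # 3. Swap the alias atomically: both actions apply in a single request.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]}),
        # 4. Drop the old index once nothing reads from it any more.
        ("DELETE", f"/{old_index}", None),
    ]


def snapshot_restore_plan(repo, location, snapshot, index):
    """Requests for copying `index` from PROD to TEST through a shared fs repository."""
    prod = [
        # Register the repository; `location` must be under path.repo.
        ("PUT", f"/_snapshot/{repo}", {"type": "fs", "settings": {"location": location}}),
        # Snapshot only this index and wait until it finishes.
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": index}),
    ]
    test = [
        # Register the same repository read-only on TEST.
        ("PUT", f"/_snapshot/{repo}",
         {"type": "fs", "settings": {"location": location, "readonly": True}}),
        # The single restore call; it fails if an open index with the same
        # name already exists on TEST, so delete or close that index first.
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {"indices": index}),
    ]
    return prod, test


if __name__ == "__main__":
    mappings = {"properties": {"name": {"type": "text"}, "price": {"type": "float"}}}
    prod, test = snapshot_restore_plan("prod_backup", "/mnt/es_backups", "snap_1", "products")
    plan = zero_downtime_reindex_plan("products", "products_v1", "products_v2", mappings)
    for method, path, body in plan + prod + test:
        print(method, path, "" if body is None else json.dumps(body))
```

Keeping the application pointed at the alias rather than at a concrete index name is what makes both the rebuild and a later restore invisible to users.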
