ElasticSearch updates are not immediate; how do you wait for ElasticSearch to finish updating its index?

Problem description

I'm attempting to improve the performance of a suite that tests against ElasticSearch.

The tests take a long time because ElasticSearch does not update its indexes immediately after a write. For instance, the following code runs without raising an assertion error.

from elasticsearch import Elasticsearch
elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        # ....
    }
)

# search() returns the full response; the matching documents are under hits.hits
results = elasticsearch.search()['hits']['hits']
assert not results
# results are not populated

Currently our hacked-together solution to this issue is dropping a time.sleep call into the code, to give ElasticSearch some time to update its indexes.

from time import sleep
from elasticsearch import Elasticsearch
elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        # ....
    }
)

# Don't want to use sleep functions
sleep(1)

# search() returns the full response; the matching documents are under hits.hits
results = elasticsearch.search()['hits']['hits']
assert len(results) == 1
# results are now populated

Obviously this isn't great, as it's rather failure-prone: if ElasticSearch takes longer than a second to update its indexes, however unlikely that is, the test will fail. It's also extremely slow when you're running hundreds of tests like this.

My attempt to solve the issue has been to query the pending cluster tasks to see if there are any tasks left to be done. However, this doesn't work, and this code will run without an assertion error.

from elasticsearch import Elasticsearch
elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        # ....
    }
)

# Busy-wait until the cluster reports no pending tasks
while elasticsearch.cluster.pending_tasks()['tasks']:
    pass

# search() returns the full response; the matching documents are under hits.hits
results = elasticsearch.search()['hits']['hits']
assert not results
# results are not populated

So basically, back to my original question: ElasticSearch updates are not immediate, so how do you wait for ElasticSearch to finish updating its index?


Solution

As of version 5.0.0, ElasticSearch has an option:

 ?refresh=wait_for

on the Index, Update, Delete, and Bulk APIs. This way, the request won't receive a response until the result is visible in ElasticSearch. (Yay!)
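As a rough illustration of the same option on the Bulk API, here is a minimal sketch using the Python client. The helper name bulk_index_and_wait and its arguments are hypothetical, and it assumes a client version whose bulk() accepts body= and refresh= as keyword arguments; newer clients may prefer a different name for the body parameter.

```python
def bulk_index_and_wait(client, index, documents, **extra):
    """Bulk-index documents and return once they are visible to search.

    `documents` maps a document id to its source dict. `extra` is forwarded
    to client.bulk() unchanged (for example doc_type='blog' on older client
    versions that still require a type).
    """
    body = []
    for doc_id, source in documents.items():
        body.append({'index': {'_id': doc_id}})  # action line
        body.append(source)                      # source line for that action
    # refresh='wait_for' makes the request block until a refresh has made
    # the changes searchable.
    return client.bulk(index=index, body=body, refresh='wait_for', **extra)
```

It would be called as bulk_index_and_wait(elasticsearch, 'blog', {1: {...}, 2: {...}}), after which a search should see both documents without any sleep.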

See https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-refresh.html for more information.

Edit: it seems that this functionality is already part of the latest Python elasticsearch API: https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.index

Change your elasticsearch.update to:

elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    refresh='wait_for',
    body={
        # ....
    }
)

and you shouldn't need any sleep or polling.
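One variation that may suit a large test suite, and which is not part of the answer above: wait_for waits for a refresh to happen rather than forcing one, so each request can still block for up to the index's refresh interval. A test fixture that writes several documents could instead write them all and then ask for a single explicit refresh through the indices refresh API. The sketch below assumes a client whose index() accepts body=; the helper name index_then_refresh is hypothetical.

```python
def index_then_refresh(client, index, documents, **extra):
    """Index several documents, then refresh the index once.

    `documents` maps a document id to its source dict. `extra` is forwarded
    to client.index() unchanged (for example doc_type='blog' on older client
    versions that still require a type).
    """
    for doc_id, source in documents.items():
        client.index(index=index, id=doc_id, body=source, **extra)
    # A single explicit refresh makes all of the writes above searchable.
    client.indices.refresh(index=index)
```

Forcing refreshes like this is meant for tests; on a production index it is usually better to let ElasticSearch refresh on its own schedule.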
