Extremely slow write speed to Amazon DynamoDB (PHP API)
This question has already been posted on the AWS forums, but remains unanswered: https://forums.aws.amazon.com/thread.jspa?threadID=94589
I'm trying to perform an initial upload of a long list of short items (about 120 million of them) so that I can retrieve them later by unique key, and this seems like a perfect use case for DynamoDB.
However, my current write speed is very slow (roughly 8-9 seconds per 100 writes), which makes the initial upload almost impossible (it would take about 3 months at the current pace).
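As a quick sanity check of that estimate, here is a back-of-the-envelope calculation in Python. It uses only the figures quoted above (120 million items, 8-9 seconds per 100 writes) and does not talk to AWS at all.

```python
ITEMS = 120_000_000        # total items to upload
SECONDS_PER_100 = (8, 9)   # observed: roughly 8-9 s per 100 writes


def upload_days(items: int, seconds_per_100: float) -> float:
    """Days needed at a constant rate of 100 items per `seconds_per_100` seconds."""
    seconds = items / 100 * seconds_per_100
    return seconds / 86_400  # seconds per day


for s in SECONDS_PER_100:
    print(f"{s} s per 100 writes -> {upload_days(ITEMS, s):.0f} days")
```

That comes to roughly 111 to 125 days, so three to four months at the current pace.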
I have searched the AWS forums for an answer and have already tried the following:
I switched from single "put_item" calls to batch writes of 25 items (the recommended maximum batch size), and each of my items is smaller than 1 KB (which is also recommended). Often even a whole batch of 25 items is under 1 KB, but that is not guaranteed (and, as I understand it, it shouldn't matter anyway, since only the size of individual items matters to DynamoDB).
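Whatever the client library, the batching step amounts to splitting the item stream into groups of at most 25, which is the BatchWriteItem per-request limit. The following is a minimal, language-neutral sketch in Python. `send_batch` is a hypothetical stand-in for whatever batch-write call your SDK provides; it is not a real SDK function.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def batches(items: Iterable[T], size: int = MAX_BATCH) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def upload(items: Iterable[T], send_batch: Callable[[List[T]], None]) -> int:
    """Pass each batch to `send_batch` (hypothetical SDK wrapper); return the call count."""
    calls = 0
    for chunk in batches(items):
        send_batch(chunk)
        calls += 1
    return calls


if __name__ == "__main__":
    sent: List[List[int]] = []
    n = upload(range(100), sent.append)
    print(n, [len(c) for c in sent])  # 4 [25, 25, 25, 25]
```

A real uploader would also need to resend any items that the service reports back as unprocessed.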
I use the recently introduced EU region (I'm in the UK), specifying its endpoint directly by calling set_region('dynamodb.eu-west-1.amazonaws.com'), as there is apparently no other way to do that in the PHP API. The AWS console shows that the table is in the correct region, so that works.
I have disabled SSL by calling disable_ssl() (gaining 1 second per 100 records).
Still, a test set of 100 items (4 batch write calls of 25 items each) never takes less than 8 seconds to index. Every batch write request takes about 2 seconds, so it's not that the first one is instant and subsequent requests are slow.
My table's provisioned throughput is 100 write units and 100 read units, which should be enough for now (I also tried higher limits just in case, with no effect).
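For items of 1 KB or less, each write consumes one write capacity unit, so 100 write units allow roughly 100 item writes per second. The sketch below compares that ceiling with the observed rate. It is a rough model that ignores throttling and burst behaviour, and the figures are the ones from the question.

```python
PROVISIONED_WCU = 100   # write capacity units on the table
WCU_PER_ITEM = 1        # each item <= 1 KB costs 1 WCU per write
OBSERVED_ITEMS = 100    # items written in the test run
OBSERVED_SECONDS = 8    # best observed time for those 100 items

max_rate = PROVISIONED_WCU / WCU_PER_ITEM          # items/s the table allows
observed_rate = OBSERVED_ITEMS / OBSERVED_SECONDS  # items/s actually achieved
utilisation = observed_rate / max_rate

print(f"allowed:  {max_rate:.1f} items/s")
print(f"observed: {observed_rate:.1f} items/s ({utilisation:.1%} of provisioned)")
print(f"120M items at full rate: {120_000_000 / max_rate / 86_400:.1f} days")
```

The upload uses only about an eighth of the provisioned write capacity. This is consistent with raising the limit having no effect: the bottleneck lies elsewhere. At the full 100 items/s, the whole upload would take about two weeks.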
I also know that request serialisation has some overhead, so I could probably use a queue to "accumulate" my requests, but does that really matter much for batch writes? And I don't think that is the problem, because even a single request takes too long.
I found that some people modify the cURL headers (particularly "Expect:") in the API to speed up requests, but I don't think that is the proper way, and the API has also been updated since that advice was posted.
The server my application runs on is fine as well: I've read that the CPU load sometimes goes through the roof, but in my case everything is fine; it's just the network request that takes too long.
I'm stuck now - is there anything else I can try? Please feel free to ask for more information if I haven't provided enough.
There are other recent threads, apparently about the same problem, here (no answers so far, though).
This service is supposed to be ultra-fast, so I'm really puzzled to run into this problem right at the start.
Recommended answer
If you're uploading from your local machine, the speed will be affected by all sorts of traffic, firewalls, etc. between you and the servers. When I call DynamoDB, each request takes 0.3 of a second simply because of the round trip to and from Australia.
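When each batch request is sent only after the previous one returns, round-trip latency limits throughput rather than provisioned capacity: you get at most 25 items per round trip. Here is a rough model in Python. It uses the 2 s per batch from the question and, purely for comparison, the 0.3 s quoted above. Treat the 0.3 s figure as an assumption; the real latency from an EC2 instance in the same region would need to be measured.

```python
BATCH_SIZE = 25            # items per BatchWriteItem call
TOTAL_ITEMS = 120_000_000  # size of the initial upload


def items_per_second(latency_s: float, batch_size: int = BATCH_SIZE) -> float:
    """Throughput of a single client that waits for each batch before sending the next."""
    return batch_size / latency_s


def days_to_upload(latency_s: float, total: int = TOTAL_ITEMS) -> float:
    """Days to write `total` items at the latency-bound rate."""
    return total / items_per_second(latency_s) / 86_400


for latency in (2.0, 0.3):
    print(f"{latency:.1f} s/batch -> {items_per_second(latency):.1f} items/s, "
          f"{days_to_upload(latency):.0f} days")
```

Under this model, cutting the per-batch latency from 2 s to 0.3 s takes the upload from about 111 days to about 17. The rate at 0.3 s (about 83 items/s) also stays under the table's 100 write units.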
My suggestion would be to create an EC2 instance (server) with PHP, upload the script and all the files to the EC2 server as a block, and then do the dump from there. The EC2 server should have blistering speed to the DynamoDB servers.
If you're not confident about setting up EC2 with LAMP yourself, there is a new service, "Elastic Beanstalk", that can do it all for you. When you've completed the upload, simply terminate the server, and hopefully you can do all of that within their "free tier" pricing :)
It doesn't solve the long-term connectivity issues, but it will cut down that three-month upload!