DynamoDB: Querying with more than two attributes
Problem Description
In DynamoDB, you need to specify in an index the attributes that can be used for queries.
How can I make a query using more than two attributes?
Example using boto.
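from boto.dynamodb2.fields import HashKey, RangeKey, GlobalAllIndex
from boto.dynamodb2.table import Table
from boto.dynamodb2.types import NUMBER

# conn is an existing DynamoDB connection, e.g. from boto.dynamodb2.connect_to_region()
Table.create('users',
    schema=[
        HashKey('id')  # defaults to STRING data_type
    ], throughput={
        'read': 5,
        'write': 15,
    }, global_indexes=[
        GlobalAllIndex('FirstnameTimeIndex', parts=[
            HashKey('first_name'),
            RangeKey('creation_date', data_type=NUMBER),
        ],
        throughput={
            'read': 1,
            'write': 1,
        }),
        GlobalAllIndex('LastnameTimeIndex', parts=[
            HashKey('last_name'),
            RangeKey('creation_date', data_type=NUMBER),
        ],
        throughput={
            'read': 1,
            'write': 1,
        })
    ],
    connection=conn)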
How can I use boto to find users whose first name is 'John', whose last name is 'Doe', and who were created on '3-21-2015'?
Solution
Your data modeling process has to take your data retrieval requirements into consideration: in DynamoDB you can only query by hash key or by hash + range key.
If querying by primary key is not enough for your requirements, you can certainly have alternate keys by creating secondary indexes (Local or Global).
However, in certain scenarios you can use the concatenation of multiple attributes as your primary key, avoiding the cost of maintaining secondary indexes.
If you need to get users by first name, last name and creation date, I would suggest including those attributes in the Hash and Range Key, so that no additional indexes are needed.
The Hash Key should contain a value that your application can compute and that, at the same time, provides uniform data access. For example, say you choose to define your keys as follows:
Hash Key (name): first_name#last_name
Range Key (created): MM-DD-YYYY-HH-mm-SS-milliseconds
You can always append other attributes in case the ones mentioned are not enough to make your keys unique across the whole table.
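As a minimal sketch of how an application might compute these keys, the helpers below build the hash key and the range key from plain Python values. The function names and the 3-digit zero padding of the milliseconds are assumptions for illustration; the answer only specifies the overall format.

```python
from datetime import datetime

SEPARATOR = "#"  # delimiter used in the answer's example key 'John#Doe'


def make_name_key(first_name, last_name):
    """Hash key 'first_name#last_name', e.g. 'John#Doe' (hypothetical helper)."""
    return f"{first_name}{SEPARATOR}{last_name}"


def make_created_key(ts):
    """Range key MM-DD-YYYY-HH-mm-SS-milliseconds.

    Assumption: milliseconds are zero-padded to 3 digits so keys have a fixed width.
    """
    return ts.strftime("%m-%d-%Y-%H-%M-%S-") + f"{ts.microsecond // 1000:03d}"


def make_created_prefix(day):
    """Prefix usable with a begins_with condition to match one calendar day."""
    return day.strftime("%m-%d-%Y")


if __name__ == "__main__":
    ts = datetime(2015, 3, 21, 3, 3, 2, 324000)
    print(make_name_key("John", "Doe"), make_created_key(ts), make_created_prefix(ts))
```

Because the month comes first in this format, the lexicographic order of the range key does not follow chronological order across years ('12-31-2014-...' sorts after '03-21-2015-...'). That does not affect a begins_with lookup for a single day, but if you also need time-range queries, a year-first layout such as YYYY-MM-DD-HH-mm-SS-milliseconds keeps the two orders aligned.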
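from boto.dynamodb2.fields import HashKey, RangeKey
from boto.dynamodb2.table import Table

# 'name' holds first_name#last_name; 'created' holds the timestamp string
users = Table.create('users', schema=[
    HashKey('name'),
    RangeKey('created'),
], throughput={
    'read': 5,
    'write': 15,
})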
Add users to the table:
# batch_write() buffers the puts and flushes them when the with-block exits
with users.batch_write() as batch:
    batch.put_item(data={
        'name': 'John#Doe',
        'first_name': 'John',
        'last_name': 'Doe',
        'created': '03-21-2015-03-03-02-324',
    })
The code to find the user John Doe created on '03-21-2015' should look something like this:
# Exact match on the hash key plus a begins_with condition on the range key
name_john_doe = users.query_2(
    name__eq='John#Doe',
    created__beginswith='03-21-2015'
)
for user in name_john_doe:
    print(user['first_name'])
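To check a key design without a live table, the query above can be modelled locally. The sketch below is a hypothetical in-memory stand-in (not part of boto): it keeps items whose hash key matches exactly and whose range key starts with the given prefix, then returns them in ascending range-key order, which is DynamoDB's default Query order.

```python
def query_composite(items, name, created_prefix):
    """Hypothetical local model of: name = :name AND begins_with(created, :prefix)."""
    matches = [
        item for item in items
        if item["name"] == name and item["created"].startswith(created_prefix)
    ]
    # DynamoDB returns a Query's results sorted by range key, ascending by default
    return sorted(matches, key=lambda item: item["created"])


if __name__ == "__main__":
    sample = [
        {"name": "John#Doe", "created": "03-21-2015-10-00-00-000"},
        {"name": "John#Doe", "created": "03-21-2015-03-03-02-324"},
        {"name": "Jane#Doe", "created": "03-21-2015-09-00-00-000"},
    ]
    for item in query_composite(sample, "John#Doe", "03-21-2015"):
        print(item["created"])
```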
Important notes:
i. If your queries start to get too complicated, and the Hash or Range Key gets too long because it has too many concatenated fields, then definitely use secondary indexes. That's a good sign that a primary index alone is not enough for your requirements.
ii. I mentioned that the Hash Key should provide uniform data access:
"Dynamo uses consistent hashing to partition its key space across its replicas and to ensure uniform load distribution. A uniform key distribution can help us achieve uniform load distribution assuming the access distribution of keys is not highly skewed." [DYN]
Not only does the Hash Key uniquely identify the record, it is also the mechanism that ensures load distribution. The Range Key (when used) indicates the records that will mostly be retrieved together, so storage can also be optimized for that access pattern.
The link below has a complete explanation about the topic:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.UniformWorkload
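To make the quoted idea of consistent hashing concrete, here is a toy ring in the spirit of the Dynamo paper: keys are hashed with MD5 onto a ring, and each key is assigned to the first virtual node clockwise from its position. This is an illustration only; DynamoDB's actual partitioning scheme is internal to the service, and the class, node names, and number of virtual nodes are hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a point on the ring (MD5, as described in the Dynamo paper)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hashing ring with virtual nodes (hypothetical, not DynamoDB's code)."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node owns several points on the ring to smooth out the load
        self._ring = sorted(
            (_hash(f"{node}:{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First virtual node at or after the key's position, wrapping around at the end
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    counts = {}
    for i in range(10000):
        node = ring.node_for(f"user{i}#Doe")
        counts[node] = counts.get(node, 0) + 1
    print(counts)
```

Many distinct hash keys such as first_name#last_name spread across all nodes, while every request for a single popular key lands on the same node, which is why a highly skewed access pattern can still overload one partition.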