DynamoDB: scan vs. query using Python

I have a table in DynamoDB with the following key attributes:

clientId : Primary partition Key
timeId : Sort Key

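clientId distinguishes the records of different clients, and timeId is just an epoch timestamp linked to a particular clientId. Sample output from the table looks like this: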

clientId             timeId              Bucket         dateColn
0000000028037c08     1544282940.0495     MyAWSBucket    1544282940
0000000028037c08     1544283640.119842   MyAWSBucket    1544283640

I am using the following code to fetch the records:

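import argparse

import boto3
from boto3.dynamodb.conditions import Key

ap = argparse.ArgumentParser()
ap.add_argument("-c","--clientId",required=True,help="name of the client")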
ap.add_argument("-st","--startDate",required=True,help="start date to filter")
ap.add_argument("-et","--endDate",required=True,help="end date to filter")
args = vars(ap.parse_args())

dynamodb = boto3.resource('dynamodb', region_name='us-west-1')

table = dynamodb.Table('MyAwsBucket-index')

response = table.query(
    KeyConditionExpression=Key('clientId').eq(args["clientId"]) and Key('timeId').between(args['startDate'], args['endDate'])
)

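Essentially, I am trying to filter first by clientId and then by two timestamps, a start time and an end time. I can fetch all of a client's records with just the following condition: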

KeyConditionExpression=Key('clientId').eq(args["clientId"])

However, when I include the start and end time, I get the following error:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Query operation: Query condition missed key schema element: clientId

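How can I resolve this and use clientId together with the start and end time? I know I could use scan, but I have also read somewhere that scan does not fetch records quickly. Since my table has millions of rows, I am not sure whether I should use scan. Can someone help?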

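Also, my start and end time search inputs are integers, as given in dateColn, rather than the float type given in timeId. I am not sure whether this would cause any error.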

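I read that I could use scan, but I also read somewhere that scan does not fetch records quickly. Since my table has millions of rows, I am not sure whether I should use scan.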

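A DynamoDB scan is a very expensive operation because it reads every item in the table and so consumes a large amount of provisioned throughput. For that reason, scan should be avoided as far as possible when querying a table.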
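As a rough sketch of the alternative: a query reads only the items stored under one partition key, but a single call returns at most 1 MB of data, so a large result set has to be paged through using LastEvaluatedKey and ExclusiveStartKey. The helper name query_all below is hypothetical and not part of boto3; it assumes `table` is a boto3 Table resource like the one in the question.

```python
def query_all(table, **query_kwargs):
    """Collect every item matching a Query, following pagination.

    `table` is assumed to be a boto3 DynamoDB Table resource (or any
    object exposing a compatible `query` method).
    """
    items = []
    while True:
        response = table.query(**query_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No more pages: the query has been fully consumed.
            return items
        # Resume the next page where the previous one stopped.
        query_kwargs["ExclusiveStartKey"] = last_key


# Possible usage, with `table` as in the question:
# items = query_all(table, KeyConditionExpression=Key('clientId').eq(client_id))
```

Only the items under the requested clientId are read and billed, which is what makes this cheaper than a scan on a table with millions of rows.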

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Query operation: Query condition missed key schema element: clientId

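This error means that no value for the partition key clientId was specified in the query. That is a little confusing, because the value is probably not actually empty; it may instead mean that the partition key is expected to be a number, while args["clientId"] is a string, which is not accepted. Please refer to this documentation on how to specify the expected data type of a parameter.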