I have a JSON file that contains multiple objects, each with a text field:
{
"messages":
[
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}
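I'm only interested in the "agentText" field.
I basically need to pull out every word in the agentText field and count how many times each word occurs.
So here is my Python code: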
import json

with open('20190626-101200-text-messages.json') as f:
    data = json.load(f)

for message in data['messages']:
    # Turn line breaks into spaces so the text splits into plain words
    splittext = message['agentText'].strip().replace('\n', ' ').replace('\r', ' ')
    if len(splittext) > 0:
        splittext2 = splittext.split(' ')
        print(splittext2)
This gives me:
['That', 'customer', 'was', 'great']
['That', 'customer', 'was', 'stupid', 'I', 'hope', 'they', "don't", 'phone', 'back']
['Line', 'number', '3']
How can I add each word to an array of counts?
Something like this:
That 2
customer 2
was 2
great 1
..
and so on?
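
For reference, here is a minimal sketch of one way such a tally could be built, assuming the standard library's collections.Counter is acceptable. The helper name count_words and the inline sample data are hypothetical and only mirror the agentText values from the file above; this is not necessarily the only or best approach.

```python
from collections import Counter


def count_words(messages):
    """Tally whitespace-separated words across every agentText value.

    count_words is a hypothetical helper name used only for this sketch.
    """
    counts = Counter()
    for message in messages:
        # str.split() with no argument splits on any run of whitespace
        # (spaces, \n, \r) and returns an empty list for empty text,
        # so no strip/replace/length check is needed.
        counts.update(message.get('agentText', '').split())
    return counts


# Inline sample with the same agentText values as the JSON file above
# (assumption: the other fields are irrelevant to the count).
sample = {
    "messages": [
        {"agentText": "That customer was great"},
        {"agentText": "That customer was stupid\nI hope they don't phone back"},
        {"agentText": "Line number 3"},
        {"agentText": ""},
    ]
}

word_counts = count_words(sample["messages"])

# most_common() lists the words from highest to lowest count
for word, n in word_counts.most_common():
    print(word, n)
```

With the real file, data['messages'] from json.load could be passed in place of sample["messages"]. Note that this count is case-sensitive and leaves punctuation attached to words, so "That" and "that" would be tallied separately unless the text is lowercased first.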