在python中,如何从多个列表中的单词创建和数组,单词出现次数[重复]

dragonfury2 • 4 年前 • 1008 次点击

这个问题已经有了答案:

Count Words in Python 2答

我有一个json文件,它有多个带有文本字段的对象:

{
"messages": 
[
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}

我只对“agentext”字段感兴趣。

我基本上需要去掉agentText字段中的每个单词,并对单词的出现次数进行计数。

所以我的python代码:

import json

with open('20190626-101200-text-messages.json') as f:
  data = json.load(f)

for message in data['messages']:
    splittext= message['agentText'].strip().replace('\n',' ').replace('\r',' ')
    if len(splittext)>0:
        splittext2 = splittext.split(' ')
        print(splittext2)

给我这个:

['That', 'customer', 'was', 'great']
['That', 'customer', 'was', 'stupid', 'I', 'hope', 'they', "don't", 'phone', 'back']
['Line', 'number', '3']

如何将每个单词添加到计数数组中? 如此喜欢;

That 2
customer 2
was 2
great 1
..

等等?

Python社区是高质量的Python/Django开发社区
本文地址：http://www.python88.com/topic/43286

1008 次点击

文章 [ 2 ] | 最新文章 4 年前

• 1 楼

Andrej Kesely 4 年前

data = '''{"messages":
[
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid I hope they don't phone back"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}
'''

import json
from collections import Counter
from pprint import pprint

def words(data):
    for m in data['messages']:
        yield from m['agentText'].split()

c = Counter(words(json.loads(data)))
pprint(c.most_common())

印刷品:

[('That', 2),
 ('customer', 2),
 ('was', 2),
 ('great', 1),
 ('stupid', 1),
 ('I', 1),
 ('hope', 1),
 ('they', 1),
 ("don't", 1),
 ('phone', 1),
 ('back', 1),
 ('Line', 1),
 ('number', 1),
 ('3', 1)]

• 2 楼

Bob White 4 年前

看看这个。

data = {
    "messages": 
        [
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
        ]
}

var = []

for row in data['messages']:
    new_row = row['agentText'].split()
    if new_row:
        var.append(new_row)

temp = dict()

for e in var:
    for j in e:
        if j in temp:
            temp[j] = temp[j] + 1
        else:
            temp[j] = 1

for key, value in temp.items():
    print(f'{key}: {value}')

登录后回复