
In Python, how do I build an array of words and their occurrence counts from multiple lists? [duplicate]

dragonfury2 • 6 years ago • 1800 views

This question already has an answer elsewhere.

I have a JSON file that contains multiple objects, each with a text field:

{
"messages": 
[
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}

I am only interested in the "agentText" field.

I basically need to pull out every word in the agentText field and count how many times each word occurs.

So, my Python code:

import json

with open('20190626-101200-text-messages.json') as f:
  data = json.load(f)

for message in data['messages']:
    splittext= message['agentText'].strip().replace('\n',' ').replace('\r',' ')
    if len(splittext)>0:
        splittext2 = splittext.split(' ')
        print(splittext2)

gives me this:

['That', 'customer', 'was', 'great']
['That', 'customer', 'was', 'stupid', 'I', 'hope', 'they', "don't", 'phone', 'back']
['Line', 'number', '3']

How can I add each word to an array of counts? Something like this:

That 2
customer 2
was 2
great 1
..

and so on?

Source: http://www.python88.com/topic/43286
2 replies  |  latest reply 6 years ago
Andrej Kesely • Reply #1 • 6 years ago
data = '''{"messages":
[
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid I hope they don't phone back"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}
'''

import json
from collections import Counter
from pprint import pprint

def words(data):
    for m in data['messages']:
        yield from m['agentText'].split()

c = Counter(words(json.loads(data)))
pprint(c.most_common())

Prints:

[('That', 2),
 ('customer', 2),
 ('was', 2),
 ('great', 1),
 ('stupid', 1),
 ('I', 1),
 ('hope', 1),
 ('they', 1),
 ("don't", 1),
 ('phone', 1),
 ('back', 1),
 ('Line', 1),
 ('number', 1),
 ('3', 1)]
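
The same `Counter` idea can also produce the plain `word count` layout that the question asks for. The following is only a minimal sketch under stated assumptions: the helper name `count_words` is hypothetical, and the inline `sample` stands in for the JSON file from the question, trimmed to the `agentText` field.

```python
import json
from collections import Counter


def count_words(messages):
    """Count whitespace-separated words across every 'agentText' field."""
    counts = Counter()
    for message in messages:
        # str.split() with no argument splits on any run of whitespace,
        # so '\n' and '\r' need no special handling and '' yields no words.
        counts.update(message.get('agentText', '').split())
    return counts


# Inline sample standing in for the JSON file from the question (assumption).
sample = json.loads(r'''{"messages": [
    {"agentText": "That customer was great"},
    {"agentText": "That customer was stupid\nI hope they don't phone back"},
    {"agentText": "Line number 3"},
    {"agentText": ""}
]}''')

counts = count_words(sample['messages'])

# most_common() sorts by count, highest first.
for word, n in counts.most_common():
    print(word, n)
```

To run it against the real file, load it with `json.load(f)` as in the question and pass `data['messages']` to `count_words`. Note that the count is case-sensitive and keeps punctuation attached to words, exactly as in the answer above.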
Bob White • Reply #2 • 6 years ago

Take a look at this.

data = {
    "messages": 
        [
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
        ]
}

var = []

for row in data['messages']:
    new_row = row['agentText'].split()
    if new_row:
        var.append(new_row)

temp = dict()

for e in var:
    for j in e:
        if j in temp:
            temp[j] = temp[j] + 1
        else:
            temp[j] = 1

for key, value in temp.items():
    print(f'{key}: {value}')
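
The `if j in temp ... else` branch above can be shortened with `collections.defaultdict`, which supplies a starting value of 0 for any key that has not been seen yet. This is just a sketch of that variation, not part of the original answer; the function name `tally` and the sample `rows` are made up for illustration.

```python
from collections import defaultdict


def tally(rows):
    """Count words in a list of word lists, without an explicit 'in' check."""
    counts = defaultdict(int)  # missing keys start at int() == 0
    for row in rows:
        for word in row:
            counts[word] += 1
    return dict(counts)  # plain dict, keeps first-seen order


# Sample word lists shaped like the 'var' list built above (assumption).
rows = [
    ['That', 'customer', 'was', 'great'],
    ['That', 'customer', 'was', 'stupid'],
    ['Line', 'number', '3'],
]

result = tally(rows)
for key, value in result.items():
    print(f'{key}: {value}')
```

The behaviour matches the manual dictionary loop; the only difference is that the default value is handled by the container rather than by the `if`/`else`.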