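If you want to output every sentence that contains all of the words in a search, you can build a dictionary that maps each word to the sentences containing it, then narrow the results down to only the sentences that contain every requested word.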
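from collections import defaultdict

news_sentences = '''Joe Biden is the us president
John McCain was a congressman
JPMorgan Chase is the way to go
This is an irrelevant sentence
Kanye West IS an artist'''.split('\n')

# Map each lowercased word to the set of sentences that contain it
newsDict = defaultdict(set)
for sentence in news_sentences:
    for word in sentence.split():
        newsDict[word.lower()].add(sentence)

# Read the search words and intersect their sentence sets
oov_ner_data = input('test, hit me: ').split()
if oov_ner_data:
    # .get() avoids inserting empty sets into newsDict for unknown words
    report = newsDict.get(oov_ner_data[0].lower(), set()).copy()
    for word in oov_ner_data[1:]:
        report &= newsDict.get(word.lower(), set())
else:
    # Empty input: nothing to match (avoids an IndexError on oov_ner_data[0])
    report = set()

if report:
    print(*report, sep='\n')
else:
    print("Good news")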
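You may get "fake news" if the search words happen to appear in an unrelated sentence. You could limit how much of each news sentence is scanned, but in my view a keyword can, in principle, appear anywhere. You could also discard common words, both when parsing the news and when reading the input.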