String operations with lower() and str.translate do not change the string (Python 3.7.0)

Jerry M. • 5 years ago • 1888 views

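I am trying to remove punctuation from a long string (read from a text file) and convert it to lowercase.

I have a sample text file that looks like this: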

This. this, Is, is. An; an, Example. example! Sentence? sentence.

Then I have the following script:

def get_input(filepath):
    f = open(filepath, 'r')
    content = f.read()
    return content

def normalize_text(file):
    all_words = word_tokenize(file)
    for word in all_words:
        word = word.lower()
        word = word.translate(str.maketrans('','',string.punctuation))

    return all_words

def get_collection_size(mydict):
    total = sum(mydict.values())
    return total

def get_vocabulary_size(mylist):
    unique_list = numpy.unique(mylist)
    vocabulary_size = len(unique_list)
    return vocabulary_size

myfile = get_input('D:\\PythonHelp\\example.txt')

total_words = normalize_text(myfile)
mydict = countElement(total_words)
print(total_words)
print(mydict)
print("Collection Size: {}".format(get_collection_size(mydict)))
print("Vocabulary Size: {}".format(get_vocabulary_size(total_words)))

I get the following result:

['This', '.', 'this', ',', 'Is', ',', 'is', '.', 'An', ';', 'an', ',', 'Example', '.', 'example', '!', 'Sentence', '?', 'sentence', '.']
{'This': 1, '.': 4, 'this': 1, ',': 3, 'Is': 1, 'is': 1, 'An': 1, ';': 1, 'an': 1, 'Example': 1, 'example': 1, '!': 1, 'Sentence': 1, '?': 1,
'sentence': 1}
Collection Size: 20
Vocabulary Size: 15

However, I expected:

['this', 'is', 'an', 'example', 'sentence']
{'this': 2, 'is': 2, 'an': 2, 'example': 2, 'sentence': 2}
Collection Size: 10
Vocabulary Size: 5

Why doesn't normalize_text(file), which uses str.maketrans and .lower(), work correctly?

When I run python --version I get 3.7.0.


Replies [ 2 ] | Latest reply 5 years ago

• Reply 1

Matt L. 5 years ago

The error is in the following lines of code:

for word in all_words:
    word = word.lower()
    word = word.translate(str.maketrans('','',string.punctuation))

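The loop variable word is a temporary name created by the loop. Assigning to it rebinds that name and does not replace the element in the list. See https://eli.thegreenplace.net/2015/the-scope-of-index-variables-in-pythons-for-loops/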
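A minimal sketch of that behavior, using a hypothetical three-word list in place of the tokenizer output: rebinding the loop variable leaves the list untouched.

```python
# Hypothetical sample list, standing in for the word_tokenize output.
words = ['This', 'Is', 'An']

for word in words:
    # This rebinds the local name `word` to a new string object;
    # the list still holds the original strings.
    word = word.lower()

print(words)  # ['This', 'Is', 'An']
print(word)   # an (the name survives the loop and holds the last rebound value)
```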

Instead, there are two ways to loop and replace. Option 1 is to append to a new list, like this:

all_words_new = []
for word in all_words:
    new_word = word.lower()
    newer_word = new_word.translate(str.maketrans('','',string.punctuation))
    all_words_new.append(newer_word)

Option 2 is a list comprehension, which is slightly more advanced.

all_words_new = [word.lower() for word in all_words]
# Iterate over the lowercased list so the first step is not lost
all_words_newer = [word.translate(str.maketrans('','',string.punctuation)) for word in all_words_new]

For more on list comprehensions, see https://www.pythonforbeginners.com/basics/list-comprehensions-in-python
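Note that either option turns punctuation tokens such as '.' into empty strings rather than dropping them, so the counts would still include an '' entry. A possible sketch that also filters those out, assuming the token list shown in the question's output and using collections.Counter as a stand-in for the question's countElement helper (whose definition is not shown):

```python
import string
from collections import Counter

# Token list copied from the output shown in the question.
all_words = ['This', '.', 'this', ',', 'Is', ',', 'is', '.', 'An', ';', 'an', ',',
             'Example', '.', 'example', '!', 'Sentence', '?', 'sentence', '.']

# Build the translation table once; the third argument lists characters to delete.
table = str.maketrans('', '', string.punctuation)

# Lowercase and strip punctuation, then drop tokens that became empty.
cleaned = [word.lower().translate(table) for word in all_words]
cleaned = [word for word in cleaned if word]

counts = Counter(cleaned)     # stand-in for countElement (an assumption)
print(dict(counts))           # {'this': 2, 'is': 2, 'an': 2, 'example': 2, 'sentence': 2}
print(sum(counts.values()))   # 10
print(len(counts))            # 5
```

With the empty tokens removed, the collection size and vocabulary size match the values the question expected.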

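• Reply 2

chepner 5 years ago

Assigning to word does not change the list element that the name previously referred to; it only changes what the name word now refers to.

You want to build a new list: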

def normalize_text(file):
    # This could be defined once outside the function
    table = str.maketrans('','',string.punctuation)
    all_words = word_tokenize(file)
    return [word.lower().translate(table) for word in all_words]
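To see what the table does on its own, here is a small sketch with a made-up sample string: the three-argument form of str.maketrans maps every character in the third argument to None, and translate deletes those characters.

```python
import string

# With three arguments, characters in the third string are mapped to None (deleted).
table = str.maketrans('', '', string.punctuation)

sample = "Hello, World! It's 3.7.0"   # hypothetical sample text
print(sample.translate(table))        # Hello World Its 370

# The table is a plain dict keyed by Unicode code points.
print(table[ord(',')])                # None
```

Because the table depends only on string.punctuation, building it once and reusing it avoids rebuilding the same dict for every word.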

A similar approach is to assign directly to a list element, which is different from assigning to word.

def normalize_text(file):
    all_words = word_tokenize(file)
    for i, word in enumerate(all_words):
        word = word.lower()
        all_words[i] = word.translate(str.maketrans('','',string.punctuation))

    return all_words
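One practical difference between the two versions, sketched here with a hypothetical helper named lowercase_in_place and no tokenizer: assigning to an index mutates the list object itself, so any other name referring to that list sees the change.

```python
def lowercase_in_place(words):
    # Hypothetical helper: index assignment modifies the caller's list.
    for i, word in enumerate(words):
        words[i] = word.lower()
    return words

original = ['This', 'Is']
result = lowercase_in_place(original)

print(result is original)  # True: same list object
print(original)            # ['this', 'is']
```

In normalize_text the list is created inside the function by word_tokenize, so this distinction makes no difference there; it matters only when the list is passed in from outside.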
