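The value of `j` should not change when an item is deleted from the list, because the next list item shifts into that position and must be checked on the next iteration. Setting `j = i + 1` again after each deletion would restart the inner iteration every time an item is removed, which is not what is wanted. The updated code now increments `j` only in the else branch.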
from difflib import SequenceMatcher


def filter_descriptions(descriptions):
    MAX_SIMILAR_ALLOWED = 0.6  # 40% unique and 60% similar
    i = 0
    while i < len(descriptions):
        print("Processing {}/{}...".format(i + 1, len(descriptions)))
        desc_to_evaluate = descriptions[i]
        j = i + 1
        while j < len(descriptions):
            similarity_ratio = SequenceMatcher(None, desc_to_evaluate, descriptions[j]).ratio()
            if similarity_ratio > MAX_SIMILAR_ALLOWED:
                # Too similar: delete it and keep j where it is, since the
                # next item has just shifted into index j.
                del descriptions[j]
            else:
                # Different enough: keep it and move on to the next item.
                j += 1
        i += 1
    return descriptions
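
To see why the index has to stay put after a deletion, here is a small sketch that has nothing to do with descriptions: it just removes even numbers from a list in place. The two helper functions (`remove_evens_buggy` and `remove_evens_fixed`) are hypothetical names made up for this illustration. The only difference between them is whether `j` is incremented after a `del`.

```python
def remove_evens_buggy(items):
    # Hypothetical illustration: j is incremented even after a deletion,
    # so the element that shifts into index j is never examined.
    items = list(items)  # work on a copy
    j = 0
    while j < len(items):
        if items[j] % 2 == 0:
            del items[j]
        j += 1
    return items


def remove_evens_fixed(items):
    # Same loop, but j only advances when nothing was deleted.
    items = list(items)  # work on a copy
    j = 0
    while j < len(items):
        if items[j] % 2 == 0:
            del items[j]
        else:
            j += 1
    return items


data = [2, 4, 6, 1, 8, 10]
print(remove_evens_buggy(data))  # [4, 1, 10] -> some even numbers were skipped
print(remove_evens_fixed(data))  # [1]
```

The buggy version skips every element that slides into the slot that was just deleted, which is the same thing that would happen to similar descriptions sitting next to each other in the list.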