Python: How to use regex to display the top view counts from text files

smokingpenguin • 5 years ago • 1425 views

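My task is to display the top views from two different text files. Each line of the text files has the format "file", followed by the path folder, the view count, and open/closed. What I'm having trouble with is displaying the top views, with the path folder titles sorted alphabetically whenever the view counts are the same.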

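I have already read the two files with glob, and I even used regex to make sure the files are read as expected. I also know I can use sort/sorted to sort alphabetically. My main problem is displaying the top views from the text files.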
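For reference, a minimal sketch of the "read the files with glob" step. The directory argument, the "*.txt" pattern and the helper name read_all_lines are assumptions for illustration, not the asker's actual code:

```python
import glob
import os


def read_all_lines(directory, pattern="*.txt"):
    """Collect the non-blank lines of every file matching `pattern` in `directory`.

    Hypothetical helper: the directory layout and the "*.txt" pattern are assumptions.
    """
    lines = []
    # glob returns files in arbitrary order, so sort the paths for a stable result
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path) as f:
            # drop the trailing newline and skip blank lines
            lines.extend(line.rstrip("\n") for line in f if line.strip())
    return lines
```

Reading each file line by line (instead of concatenating raw text) keeps a file without a final newline from merging its last line with the next file's first line.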

Here are my files:

file1.txt

file Marvel/GuardiansOfGalaxy 300 1
file DC/Batman 504 1
file GameOfThrones 900 0
file DC/Superman 200 1
file Marvel/CaptainAmerica 342 0

file2.txt

file Science/Biology 200 1
file Math/Calculus 342 0
file Psychology 324 1
file Anthropology 234 0
file Science/Chemistry 444 1

(As you can see from the format, the third column is the view count.)
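One way to turn a single line into typed fields is a regex with named groups. This is a sketch that assumes the fields are separated by single spaces and that the last field is 1 or 0 (read here as open/closed, as the question describes); the names LINE_RE and parse_line are hypothetical:

```python
import re

# Assumed format: "file <path> <views> <flag>", single-space separated,
# where <flag> is 1 (taken to mean open) or 0 (closed)
LINE_RE = re.compile(r"file (?P<path>\S+) (?P<views>\d+) (?P<flag>[01])")


def parse_line(line):
    """Return (path, views, is_open) for a well-formed line, else None."""
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        return None  # the line doesn't follow the expected format
    return m.group("path"), int(m.group("views")), m.group("flag") == "1"
```

Converting the view count with int() matters: compared as strings, "900" < "95", which would break any "top views" ordering.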

The output should look like this:

file GameOfThrones 900 0
file DC/Batman 504 1
file Science/Chemistry 444 1
file Marvel/CaptainAmerica 342 0
file Math/Calculus 342 0 
...
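The ordering above (views descending, ties broken alphabetically by path) can be expressed as a single sort key. A minimal sketch, assuming every line has exactly four whitespace-separated fields; sort_by_views_then_path is a hypothetical name:

```python
def sort_by_views_then_path(lines):
    """Sort lines by views (descending), breaking ties by path (ascending)."""
    def key(line):
        _, path, views, _ = line.split()
        # negating views makes an ascending sort put the largest counts first,
        # while paths still compare alphabetically
        return (-int(views), path)

    return sorted(lines, key=key)


lines = [
    "file Marvel/GuardiansOfGalaxy 300 1",
    "file DC/Batman 504 1",
    "file GameOfThrones 900 0",
    "file DC/Superman 200 1",
    "file Marvel/CaptainAmerica 342 0",
    "file Science/Biology 200 1",
    "file Math/Calculus 342 0",
    "file Psychology 324 1",
    "file Anthropology 234 0",
    "file Science/Chemistry 444 1",
]
for line in sort_by_views_then_path(lines):
    print(line)
```

Note that string comparison is case-sensitive (uppercase sorts before lowercase); all paths in the sample start with a capital letter, so this doesn't matter here.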

Apart from that, this is the function I'm currently working on to display the top views:

import re

def top_views(files):
    # files: the combined text of both input files
    records = dict(re.findall(r"file (\S+) (\d+) \d+", files))
    main_dict = {}

    for file in records:
        print(file)
        # idk how to display the top views

    return main_dict
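A possible way to finish this regex-based approach, sketched under the assumption that `text` holds the combined contents of both files with one record per line. The function name and the `n` parameter are hypothetical. It keeps a list of (path, views) pairs rather than a dict, so records with the same path in both files aren't collapsed, and uses heapq.nsmallest to pick the top n without sorting everything:

```python
import heapq
import re


def top_views(text, n=5):
    """Return the n (path, views) pairs with the most views, ties broken by path."""
    # with two groups, findall yields (path, views) string tuples;
    # the literal must be "file", not "files"
    pairs = [
        (path, int(views))
        for path, views in re.findall(r"^file (\S+) (\d+) \d+$", text, re.MULTILINE)
    ]
    # smallest (-views, path) == most views, then alphabetical path
    return heapq.nsmallest(n, pairs, key=lambda pv: (-pv[1], pv[0]))
```

For small inputs, sorted(pairs, key=...)[:n] gives the same result; nsmallest only pays off when n is much smaller than the number of records.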

Article URL: http://www.python88.com/topic/51009


Replies [ 3 ] | Latest 5 years ago

• Reply #1

DirtyBit 6 years ago

Following on from my comment above:

Read both files and store their lines in a list
Flatten the list
Sort the list by the view count in each string
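The three steps above can be sketched with itertools.chain.from_iterable doing the flattening. This is an illustrative variant, not the answerer's code; merge_and_sort is a hypothetical name, and like the steps it sorts by view count only (lines with equal views keep their input order):

```python
from itertools import chain


def merge_and_sort(*line_lists):
    """Flatten several lists of lines and sort them by view count, highest first.

    Each line is assumed to look like "file <path> <views> <flag>".
    """
    # chain.from_iterable flattens in linear time; sum(lists, []) copies the
    # growing list on every addition, which gets slow for many files
    merged = [line.rstrip("\n") for line in chain.from_iterable(line_lists)]
    return sorted(merged, key=lambda line: int(line.split()[2]), reverse=True)
```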

So:

list.txt:

file Marvel/GuardiansOfGalaxy 300 1
file DC/Batman 504 1
file GameOfThrones 900 0
file DC/Superman 200 1
file Marvel/CaptainAmerica 342 0

list2.txt:

file Science/Biology 200 1
file Math/Calculus 342 0
file Psychology 324 1
file Anthropology 234 0
file Science/Chemistry 444 1

And:

fileOne = 'list.txt'
fileTwo = 'list2.txt'

result = []
with open(fileOne, 'r') as file1Obj, open(fileTwo, 'r') as file2Obj:
    result.append(file1Obj.readlines())
    result.append(file2Obj.readlines())

result = sum(result, [])                        # flattening the nested list
result = [i.split('\n', 1)[0] for i in result]  # removing the \n char

print(sorted(result, reverse=True, key=lambda x: int(x.split()[2])))  # sorting by the view

Output:

[
 'file GameOfThrones 900 0', 'file DC/Batman 504 1', 'file Science/Chemistry 444 1', 
 'file Marvel/CaptainAmerica 342 0', 'file Math/Calculus 342 0', 
 'file Psychology 324 1', 'file Marvel/GuardiansOfGalaxy 300 1', 
 'file Anthropology 234 0', 'file DC/Superman 200 1', 'file Science/Biology 200 1'
]
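A caveat on ties: in the output above, the 342-view and 200-view lines come out alphabetically only because of the input order (Marvel/CaptainAmerica and DC/Superman happen to be in the first file). Python's sort is stable, and reverse=True still keeps equal keys in their original order, so a views-only key never looks at the path. A small demonstration with two hypothetical lines given in reverse alphabetical order:

```python
def views(line):
    # the third whitespace-separated field is the view count
    return int(line.split()[2])


# equal views, deliberately listed in reverse alphabetical order
lines = ["file Math/Calculus 342 0", "file Marvel/CaptainAmerica 342 0"]

# views-only key: the tie keeps its input order
by_views_only = sorted(lines, key=views, reverse=True)

# (-views, path) key: the tie is broken alphabetically by path
by_views_then_path = sorted(lines, key=lambda l: (-views(l), l.split()[1]))

print(by_views_only)       # ['file Math/Calculus 342 0', 'file Marvel/CaptainAmerica 342 0']
print(by_views_then_path)  # ['file Marvel/CaptainAmerica 342 0', 'file Math/Calculus 342 0']
```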

Shorter version:

with open(fileOne, 'r') as file1Obj, open(fileTwo, 'r') as file2Obj:
    result = file1Obj.readlines() + file2Obj.readlines()
print([i.split('\n', 1)[0] for i in sorted(result, reverse=True, key=lambda x: int(x.split()[2]))])  # sorting by the view

• Reply #2

Allan 6 years ago

You can use the following code:

# open the 2 files in read mode
with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    # split each file into its own list of lines; splitlines() drops the '\n'
    # and keeps a missing final newline in file1 from merging two lines
    lines = f1.read().splitlines() + f2.read().splitlines()
    # do the sorting in reverse mode, based on the 3rd word, in your case the number of views
    # (blank lines are skipped instead of assuming the last element is empty)
    print(sorted((x for x in lines if x.strip()), reverse=True, key=lambda x: int(x.split()[2])))

Output:

['file GameOfThrones 900 0', 'file DC/Batman 504 1', 'file Science/Chemistry 444 1', 'file Marvel/CaptainAmerica 342 0', 'file Math/Calculus 342 0', 'file Psychology 324 1', 'file Marvel/GuardiansOfGalaxy 300 1', 'file Anthropology 234 0', 'file DC/Superman 200 1', 'file Science/Biology 200 1']
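A caveat with joining the raw text of both files (f1.read() + f2.read()) before splitting: if the first file doesn't end with a newline, its last line is glued onto the second file's first line, and because int(x.split()[2]) still succeeds on the glued line, one record silently disappears. Likewise, slicing off the last element with [:-1] assumes the second file does end with a newline. The sketch below reproduces this with in-memory stand-ins (io.StringIO and the sample contents are assumptions for illustration) and shows a per-file alternative; read_lines is a hypothetical helper:

```python
import io


def read_lines(*files):
    """Read each file-like object separately and return all non-blank lines."""
    lines = []
    for f in files:
        # splitting each file on its own means a missing final newline
        # can't merge lines across files
        lines.extend(line for line in f.read().splitlines() if line.strip())
    return lines


# stand-ins for file1.txt and file2.txt; the first has no trailing newline
f1 = io.StringIO("file DC/Batman 504 1\nfile GameOfThrones 900 0")
f2 = io.StringIO("file Psychology 324 1\n")

glued = (f1.read() + f2.read()).split("\n")
print(glued[1])  # file GameOfThrones 900 0file Psychology 324 1

f1.seek(0)
f2.seek(0)
print(read_lines(f1, f2))
```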

• Reply #3

Arne 6 years ago

Extracting the sorting criteria

First, you need to extract the information you want to sort each line by. This regular expression pulls the views and the path out of a line:

>>> import re
>>> criteria_re = re.compile(r'file (?P<path>\S+) (?P<views>\d+) \d+')
>>> m = criteria_re.match('file GameOfThrones 900 0')
>>> res = (int(m.group('views')), m.group('path'))
>>> res
(900, 'GameOfThrones')

Sorting

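Now this just needs to be applied to your whole collection of files. Since we don't want the default sort order, we need to set the key parameter of the sorting function, which tells it exactly what to sort by: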

def sort_files(files):
    # files: an iterable of file paths; uses criteria_re from above
    lines = []
    for file in files:
        with open(file) as f:
            for line in f:
                line = line.rstrip('\n')
                m = criteria_re.match(line)
                if m is None:
                    continue  # skip lines that don't match the expected format
                # taking the negative view count makes the comparison later a
                # bit simpler: one ascending sort then gives descending views
                # and, for equal views, alphabetical path order
                lines.append((line, (-int(m.group('views')), m.group('path'))))
    # the sorting criteria were only tagging along to help with the order, so
    # we can discard them in the result
    return [line for line, criterion in sorted(lines, key=lambda x: x[1])]
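An alternative to negating the view count is to rely on sort stability and sort twice: first by the secondary key (path, ascending), then by the primary key (views, descending). This also works when a key can't simply be negated, such as sorting strings in descending order. A sketch, assuming the same four-field line format; sort_lines_two_pass is a hypothetical name:

```python
def sort_lines_two_pass(lines):
    """Order lines by views (descending), ties by path (ascending), via two stable sorts."""
    # 1st pass: secondary key, path ascending
    by_path = sorted(lines, key=lambda l: l.split()[1])
    # 2nd pass: primary key, views descending; stability preserves the
    # alphabetical order among lines with equal views
    return sorted(by_path, key=lambda l: int(l.split()[2]), reverse=True)
```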
