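Posting an answer at the behest of commenters on my answer to a similar question, where the same technique was used to mutate the last line of a file, not just get it.
For a file of significant size, mmap is the best way to do this. To improve on the existing mmap answer, this version is portable between Windows and Linux, and should run faster (though on 32-bit Python with files in the GB range it won't work without some modifications; see the other answer for hints on handling this, and for modifying to work on Python 2).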
import io  # Gets consistent version of open for both Py2.7 and Py3.x
import itertools
import mmap
import os  # Only needed for the optional os.linesep newline handling

def skip_back_lines(mm, numlines, startidx):
    '''Factored out to simplify handling of n and offset'''
    for _ in itertools.repeat(None, numlines):
        startidx = mm.rfind(b'\n', 0, startidx)
        if startidx < 0:
            break
    return startidx

def tail(f, n, offset=0):
    # Reopen file in binary mode
    with io.open(f.name, 'rb') as binf, mmap.mmap(binf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # len(mm) - 1 handles files ending w/newline by getting the prior line
        startofline = skip_back_lines(mm, offset, len(mm) - 1)
        if startofline < 0:
            return []  # Offset lines consumed whole file, nothing to return
            # If using a generator function (yield-ing, see below),
            # this should be a plain return, no empty list

        endoflines = startofline + 1  # Slice end to omit offset lines

        # Find start of lines to capture (add 1 to move from newline to beginning of following line)
        startofline = skip_back_lines(mm, n, startofline) + 1

        # Passing True to splitlines makes it return the list of lines without
        # removing the trailing newline (if any), so list mimics f.readlines()
        return mm[startofline:endoflines].splitlines(True)
        # If Windows style \r\n newlines need to be normalized to \n, and input
        # is ASCII compatible, can normalize newlines with:
        # return mm[startofline:endoflines].replace(os.linesep.encode('ascii'), b'\n').splitlines(True)
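The index arithmetic is easy to get wrong, so it can help to trace it on a small in-memory buffer. The sketch below assumes only that bytes objects offer the same rfind and slicing behaviour as an mmap; tail_bytes is a hypothetical stand-in for tail() that skips the file handling, not part of the answer's code.

```python
def skip_back_lines(buf, numlines, startidx):
    """Same backward newline scan as above, on any object with rfind()."""
    for _ in range(numlines):
        startidx = buf.rfind(b'\n', 0, startidx)
        if startidx < 0:
            break
    return startidx


def tail_bytes(buf, n, offset=0):
    """Hypothetical in-memory twin of tail(), for checking the indices."""
    startofline = skip_back_lines(buf, offset, len(buf) - 1)
    if startofline < 0:
        return []  # offset consumed the whole buffer
    endoflines = startofline + 1
    startofline = skip_back_lines(buf, n, startofline) + 1
    return buf[startofline:endoflines].splitlines(True)


data = b"one\ntwo\nthree\nfour\n"
print(tail_bytes(data, 2))            # [b'three\n', b'four\n']
print(tail_bytes(data, 2, offset=1))  # [b'two\n', b'three\n']
print(tail_bytes(data, 10))           # all four lines: n exceeds the line count
print(tail_bytes(data, 1, offset=9))  # []: offset exceeds the line count
```

Starting the first scan at len(buf) - 1 is what keeps a trailing newline from counting as an extra empty line.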
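This assumes the number of lines tailed is small enough that you can safely read them all into memory at once; you could also make this a generator function and manually read a line at a time by replacing the final line with: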
        mm.seek(startofline)
        # Call mm.readline n times, or until EOF, whichever comes first
        # Python 3.2 and earlier:
        for line in itertools.islice(iter(mm.readline, b''), n):
            yield line

        # 3.3+:
        yield from itertools.islice(iter(mm.readline, b''), n)
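One subtlety with the generator variant: readline does not know about the slice end, so with a nonzero offset and fewer than n lines ahead of the skipped ones, it can yield lines the offset was meant to exclude. Below is a minimal sketch of one way to bound it, assuming Python 3; tail_iter is a hypothetical name, and it takes a path rather than a file object to keep the example self-contained.

```python
import mmap
import os
import tempfile


def tail_iter(path, n, offset=0):
    """Yield up to n lines, ending `offset` lines before the end of the file."""
    with open(path, 'rb') as binf, \
            mmap.mmap(binf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        end = len(mm) - 1
        for _ in range(offset):
            end = mm.rfind(b'\n', 0, end)
            if end < 0:
                return  # offset consumed the whole file
        endoflines = end + 1
        start = end
        for _ in range(n):
            start = mm.rfind(b'\n', 0, start)
            if start < 0:
                break
        mm.seek(start + 1)
        # Stop at endoflines so the skipped offset lines are never yielded,
        # even when fewer than n lines precede them.
        while mm.tell() < endoflines:
            yield mm.readline()


# Demo on a throwaway file: ask for 5 lines but skip the last one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a\nb\nc\n")
print(list(tail_iter(tmp.name, 5, offset=1)))  # [b'a\n', b'b\n']
os.remove(tmp.name)
```

Because the start position is already at most n lines before endoflines, the position check replaces the islice count. Note that mmap raises ValueError on an empty file, so callers may want to handle that case separately.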
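Lastly, this reads in binary mode (necessary to use mmap), so it gives str lines (Py2) and bytes lines (Py3); if you want unicode (Py2) or str (Py3), the iterative approach could be tweaked to decode for you and/or fix newlines: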
        lines = itertools.islice(iter(mm.readline, b''), n)
        # Binary-mode files on Py3 have no encoding attribute, hence getattr
        encoding = getattr(f, 'encoding', None)
        if encoding:  # Decode if the passed file was opened with a specific encoding
            lines = (line.decode(encoding) for line in lines)
        if 'b' not in f.mode:  # Fix line breaks if passed file opened in text mode (needs import os)
            lines = (line.replace(os.linesep, '\n') for line in lines)
        # Python 3.2 and earlier:
        for line in lines:
            yield line
        # 3.3+:
        yield from lines
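The decode and newline steps can also be pulled out and tried on plain byte strings. This is only a sketch under a few assumptions: decode_lines and its parameters are hypothetical names, and it matches a literal \r\n ending rather than os.linesep, so the result depends on the file's contents instead of the platform running the script. Decoding line by line is only safe for encodings where the newline is the single byte b'\n' (ASCII, UTF-8 and similar); it would not work for UTF-16.

```python
def decode_lines(lines, encoding=None, universal_newlines=False):
    """Lazily decode byte lines and optionally turn \\r\\n endings into \\n."""
    if encoding:
        lines = (line.decode(encoding) for line in lines)
        crlf, lf = '\r\n', '\n'
    else:
        crlf, lf = b'\r\n', b'\n'
    if universal_newlines:
        lines = ((line[:-2] + lf) if line.endswith(crlf) else line
                 for line in lines)
    return lines


raw = [b'alpha\r\n', b'beta\n', b'gamma']
print(list(decode_lines(raw, 'utf-8', universal_newlines=True)))
# ['alpha\n', 'beta\n', 'gamma']
print(list(decode_lines(raw, universal_newlines=True)))
# [b'alpha\n', b'beta\n', b'gamma']
```

Both steps stay lazy, so they can wrap the readline iterator without pulling every line into memory.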
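Note: I typed this all up on a machine where I lack access to Python to test. This is similar enough to my other answer that I think it should work, but the tweaks (e.g. handling an offset) could lead to subtle errors. Please let me know in the comments if I typoed anything or if there are any mistakes.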