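For efficiency with very large files (common with log files, which is where you typically want tail), you generally want to avoid reading the whole file, even if you can do so without loading it all into memory at once. However, you do need some way to work out the offset in lines rather than in characters. One possibility is to read backwards one character at a time with seek(), but this is very slow. Instead, it is better to process the file in larger blocks.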
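To make the "offset in lines" idea concrete, here is a minimal sketch of one way to find it block by block. The names tail_offset and tail_lines are hypothetical, and the sketch assumes a seekable file opened in binary mode whose lines end in "\n". It counts newlines from the end of the file, then reads forward from the offset it finds.

```python
import os

def tail_offset(f, nlines, blocksize=4096):
    """Return the byte offset where the last `nlines` lines of f begin.

    Hypothetical helper: assumes f is seekable, opened in binary mode,
    and uses b"\n" as its line terminator.
    """
    f.seek(0, os.SEEK_END)
    end = f.tell()
    if nlines <= 0:
        return end
    pos = end
    remaining = nlines
    while pos > 0:
        # Step back one block (or whatever is left) and read it.
        step = min(blocksize, pos)
        pos -= step
        f.seek(pos)
        block = f.read(step)
        search_end = len(block)
        # A newline at the very end of the file terminates the last line;
        # it does not start a new one, so skip it.
        if pos + step == end and block.endswith(b"\n"):
            search_end -= 1
        # Walk the newlines in this block from right to left.
        while True:
            idx = block.rfind(b"\n", 0, search_end)
            if idx == -1:
                break
            remaining -= 1
            if remaining == 0:
                return pos + idx + 1
            search_end = idx
    # Fewer than nlines lines in the file: start from the beginning.
    return 0

def tail_lines(f, nlines, blocksize=4096):
    """Return the last `nlines` lines of f as a list of bytes objects."""
    f.seek(tail_offset(f, nlines, blocksize))
    return f.read().splitlines()
```

Only the blocks that contain the last few lines are read, so the cost depends on the size of the tail rather than the size of the file.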
I have a utility function that I wrote a while ago for reading files backwards, which can be used here.
import os

def rblocks(f, blocksize=4096):
    """Read file as series of blocks from end of file to start.

    The data itself is in normal order, only the order of the blocks is reversed.
    ie. "hello world" -> ["ld", "wor", "lo ", "hel"] (with blocksize=3)
    Note that the file must be opened in binary mode.
    """
    if 'b' not in f.mode.lower():
        raise ValueError("File must be opened using binary mode.")
    size = os.stat(f.name).st_size
    fullblocks, lastblock = divmod(size, blocksize)

    # The first (end of file) block will be short, since this leaves
    # the rest aligned on a blocksize boundary. This may be more
    # efficient than having the last (first in file) block be short.
    if lastblock:
        f.seek(-lastblock, os.SEEK_END)
        yield f.read(lastblock)

    for i in range(fullblocks - 1, -1, -1):
        f.seek(i * blocksize)
        yield f.read(blocksize)
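
def tail(f, nlines):
    """Return the last nlines lines of binary file f as a list of bytes."""
    buf = b''  # bytes, since the blocks come from a file opened in binary mode
    result = []
    for block in rblocks(f):
        buf = block + buf
        lines = buf.splitlines()

        if lines:
            # Keep all lines except the first, which may be incomplete.
            # These lines come before the ones already collected, so
            # prepend them to keep the result in file order.
            result = lines[1:] + result
            if len(result) >= nlines:
                return result[-nlines:]
            buf = lines[0]

    # Reached the start of the file, so the remaining buffer is a whole line.
    return ([buf] + result)[-nlines:]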
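
with open('file_to_tail.txt', 'rb') as f:
    for line in tail(f, 20):
        # Lines are bytes; decoding as UTF-8 is an assumption about the file.
        print(line.decode('utf-8', errors='replace'))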

[Edit] Added a more specific version (avoids the need to reverse twice).