If the file isn't too large, you can read it into memory all at once and use re.findall():
import re

with open("infile.txt") as finp:
    data = finp.read()

with open("outfile.txt", "w") as f:
    for item in re.findall(r">.+?[\r\n\f][AGTC]*?AATAAA[AGTC]{2,}GGAC[AGTC]*", data):
        f.write(item + "\n")
"""
+? and *? means non-greedy process;
>.+?[\r\n\f] matches a line starting with '>' and followed by any characters to the end of the line;
[AGTC]*?AATAAA matches any number of A,G,T,C characters, followed by the AATAAA pattern;
[AGTC]{2,} matches at least two or more characters of A,G,T,C;
GGAC matches the GGAC pattern;
[AGTC]* matches the empty string or any number of A,G,T,C characters.
"""