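You can use two regex operations. The first filters the list, keeping only the strings that match
^[a-zA-Z\s\(\)]*$
and the second uses a positive lookahead to collect the desired substring:
.*?(?= [A-Z])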
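import re
my_try = ['a bb Aas','aa 1 Aasdf','aa bb (cc) AA','aaa ASD','aa . ASD','aaaa 1 bb Aas']
# Step 1: keep only strings made of letters, whitespace and parentheses
filtered = [x for x in my_try if re.match(r'^[a-zA-Z\s\(\)]*$', x)]
# Step 2: take the shortest prefix that is followed by a space and an uppercase letter
result = [re.match(r'.*?(?= [A-Z])', x).group(0) for x in filtered]
print(result) # => ['a bb', 'aa bb (cc)', 'aaa']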
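If you'd rather do it in one pass, the two patterns can be folded into a single pattern used with re.fullmatch and a lazy capture group. This is a variant I'm suggesting, not the two-step approach above. It assumes Python 3.8+ (for the := inside the comprehension) and single-line strings: \s in the character class also matches newlines, whereas . in the lookahead pattern does not, so the two approaches can differ on multi-line input.

```python
import re

my_try = ['a bb Aas', 'aa 1 Aasdf', 'aa bb (cc) AA', 'aaa ASD', 'aa . ASD', 'aaaa 1 bb Aas']

# One pattern does both jobs: fullmatch() rejects any string containing characters
# other than letters, whitespace and parentheses, and the lazy group captures the
# shortest prefix that is followed by a space and an uppercase letter.
pattern = re.compile(r'([a-zA-Z\s()]*?) [A-Z][a-zA-Z\s()]*')

# fullmatch() returns None when either condition fails, so those strings are skipped.
result = [m.group(1) for x in my_try if (m := pattern.fullmatch(x))]
print(result)  # => ['a bb', 'aa bb (cc)', 'aaa']
```

Because fullmatch returns None for any string that fails either condition, nothing is left to filter afterwards, and an input such as '' is simply skipped.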
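If you expect that some strings may pass the filter (i.e., contain nothing but letters, parentheses or whitespace) but still not match the lookahead, you need to filter the intermediate results, because re.match returns None for those strings and calling .group(0) on None raises an AttributeError: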
import re
my_try = ['a bb Aas','aaa ASD','aa . ASD','aaaa 1 bb Aas', '']
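# the empty string '' above passes the filter but never matches the lookahead
filtered = [x for x in my_try if re.match(r'^[a-zA-Z\s\(\)]*$', x)]
# re.match() returns None for strings with no " <uppercase>" position
matches = [re.match(r'.*?(?= [A-Z])', x) for x in filtered]
# keep only the successful matches before calling .group()
result = [x.group(0) for x in matches if x]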
print(result) # => ['a bb', 'aaa']
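If you need this in more than one place, the same two steps can be wrapped in a small helper with precompiled patterns. This is a sketch: extract_prefixes, ALLOWED and PREFIX are hypothetical names of my own, and fullmatch is used in place of the ^...$ anchors (equivalent here, since \s in the class also covers a trailing newline).

```python
import re

# Compiled once for reuse (re also caches patterns internally, so this is mostly
# about readability and avoiding repeated cache lookups).
ALLOWED = re.compile(r'[a-zA-Z\s()]*')  # filter: only letters, whitespace, parentheses
PREFIX = re.compile(r'.*?(?= [A-Z])')   # shortest prefix followed by " <uppercase>"


def extract_prefixes(strings):
    """Hypothetical helper: return the prefix of every string that passes the
    filter and matches the lookahead.

    Strings that pass the filter but have no " <uppercase>" position (e.g. '')
    are skipped instead of raising AttributeError on None.group().
    """
    result = []
    for s in strings:
        if ALLOWED.fullmatch(s) is None:
            continue  # contains a digit, '.', or another disallowed character
        m = PREFIX.match(s)
        if m:
            result.append(m.group(0))
    return result


my_try = ['a bb Aas', 'aaa ASD', 'aa . ASD', 'aaaa 1 bb Aas', '']
print(extract_prefixes(my_try))  # => ['a bb', 'aaa']
```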