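Since I don't have your original text, I'll use the string from your example.
See whether the two regular expressions below work for you. I've also included a third, more precise one further down.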
r'(?<=\dU)[\w]+@[\w\.]+?(?=U|\s|$)'

r'(?<=\dU)[\w]+@[\w]+\.[\w]+?(?=U|\s|$)'
Sample test:
>>> import re
>>> string = '''3465Usjohnson@astate.eduUProvost instead of sjohnson@astate.edu The surround text that it is being extracted from: 870-972-3465Usjohnson@astate.eduUProvost and Vice ChancellorDr. Lynita Cooksey870-972-2 030 870-972-2036Ulcooksey@astate.edu'''
>>> re.findall(r'(?<=\dU)[\w]+@[\w\.]+?(?=U|\s|$)', string)
#Output
['sjohnson@astate.edu', 'sjohnson@astate.edu', 'lcooksey@astate.edu']
>>> re.findall(r'(?<=\dU)[\w]+@[\w]+\.[\w]+?(?=U|\s|$)', string)
#Output
['sjohnson@astate.edu', 'sjohnson@astate.edu', 'lcooksey@astate.edu']
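Both patterns give the same result on this string, but they are not interchangeable. As a rough sketch of the difference: the first allows any number of dots in the domain part, while the second expects exactly one. The input below is a made-up example (it is not from the question), and the names `LOOSE` and `ONE_DOT` are just labels chosen for this illustration.

```python
import re

# The two patterns from above, written as raw strings.
LOOSE = r'(?<=\dU)[\w]+@[\w\.]+?(?=U|\s|$)'
ONE_DOT = r'(?<=\dU)[\w]+@[\w]+\.[\w]+?(?=U|\s|$)'

# Hypothetical input: an address whose domain contains two dots.
sample = '870-972-0000Ujdoe@mail.astate.eduUDean'

loose_hits = re.findall(LOOSE, sample)
one_dot_hits = re.findall(ONE_DOT, sample)

print(loose_hits)    # ['jdoe@mail.astate.edu']
print(one_dot_hits)  # []
```

If your data might contain subdomains, the first pattern is probably the safer choice.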
Slightly more precise, since the email addresses all end in .edu:
r'(?<=\dU)[\w]+@[\w]*\.edu(?=U|\s|$)'
Sample test:
>>> string = '''3465Usjohnson@astate.eduUProvost instead of sjohnson@astate.edu The surround text that it is being extracted from: 870-972-3465Usjohnson@astate.eduUProvost and Vice ChancellorDr. Lynita Cooksey870-972-2 030 870-972-2036Ulcooksey@astate.edu'''
>>> re.findall(r'(?<=\dU)[\w]+@[\w]*\.edu(?=U|\s|$)', string)
#Output
['sjohnson@astate.edu', 'sjohnson@astate.edu', 'lcooksey@astate.edu']
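If you want to reuse this outside the interpreter, here is a minimal sketch that wraps the .edu pattern in a function. The names `EMAIL_AFTER_U` and `extract_emails` are my own, and `[\w]` is shortened to the equivalent `\w`; the pattern is otherwise the same, just spread over several lines with `re.VERBOSE` so each piece can be commented. It assumes, as in your example, that every address is preceded by a digit and a literal "U".

```python
import re

# Same pattern as above, compiled once and annotated piece by piece.
EMAIL_AFTER_U = re.compile(r'''
    (?<=\dU)     # lookbehind: a digit followed by the "U" separator
    \w+@         # local part of the address, then "@"
    \w*\.edu     # domain ending in .edu
    (?=U|\s|$)   # lookahead: next "U" separator, whitespace, or end of string
''', re.VERBOSE)


def extract_emails(text):
    """Return every address in text that directly follows a digit and 'U'."""
    return EMAIL_AFTER_U.findall(text)


# Part of the string from the example above.
text = ('870-972-3465Usjohnson@astate.eduUProvost and Vice ChancellorDr. '
        'Lynita Cooksey870-972-2 030 870-972-2036Ulcooksey@astate.edu')

emails = extract_emails(text)
print(emails)  # ['sjohnson@astate.edu', 'lcooksey@astate.edu']
```

The lookbehind and lookahead only check the surrounding characters without consuming them, which is why the "U" separators do not end up in the results.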