How to exclude email addresses from a specific domain and extract the others in a Pythonic way

py_noob • 5 years ago • 1443 views

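I have a list of email addresses. Some are from the relevant domain and others are from spam or unrelated email domains. I want to capture both, but in separate lists. I know where the relevant ones come from (always the same domain, @gmail.com), but the spam comes from many different places, and all of it needs to be captured.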

    # Extract all email ids from a JSON file
    import re
    import json

    with open("test.json", 'r') as fp:
        json_decode = json.loads(fp.read())

        line = str(json_decode)

        match = re.findall(r'[\w\.-]+@[\w.-]+', line)
        l = len(match)
        print(match)

        for i in match:
            domain = match.split('@')[i]


    Output of print(match):
    ['image001.png@01D36CD8.2A2219D0', 'arealjcl@countable.us', 'taylor.l.ingram@gmail.com']

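The first two are spam and the third is a legitimate email, so they have to go into different lists. Should I split at the @ to identify the domain, or exclude everything that is not @gmail.com and put it into another list?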
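Running the regex over str(json_decode) searches Python's repr of the whole structure rather than the actual text, and the repr spells special characters as escape sequences. A minimal sketch of an alternative, assuming the addresses sit in string values somewhere inside nested dicts and lists: walk the decoded JSON and apply the same pattern to each string. The names iter_strings, extract_emails and extract_emails_from_file are hypothetical helpers, and reading the file as UTF-8 is an assumption.

```python
import json
import re

# Same pattern as in the question.
EMAIL_RE = re.compile(r'[\w\.-]+@[\w.-]+')


def iter_strings(obj):
    """Yield every string (dict keys included) in a decoded JSON structure."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield key
            yield from iter_strings(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_strings(item)
    # Numbers, booleans and None cannot contain an address, so they are skipped.


def extract_emails(data):
    """Return all pattern matches found in the strings of data, in order."""
    found = []
    for text in iter_strings(data):
        found.extend(EMAIL_RE.findall(text))
    return found


def extract_emails_from_file(path):
    """Load a JSON file (assumed UTF-8) and extract the addresses it contains."""
    with open(path, "r", encoding="utf-8") as fp:
        return extract_emails(json.load(fp))
```

For example, a value "x\nfoo@bar.com" yields foo@bar.com here, whereas the regex over str(json_decode) returns nfoo@bar.com, because the repr writes the newline as a backslash followed by the letter n.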

Source: http://www.python88.com/topic/52692


• Answer 1

ggcarmi • 6 years ago

You can sort them into two lists based on a list of relevant domains that you define:

 # extract all email ids from a json file
 import re
 import json

 relevant_domains = ['gmail.com'] # bare domain without '@', to match split('@')[1]; you can add more

 with open("test.json", 'r') as fp:
     json_decode = json.loads(fp.read())

     line = str(json_decode)

     match = re.findall(r'[\w\.-]+@[\w.-]+', line)
     l = len(match)
     print(match)

     relevant_emails = []
     spam_emails = []

     for email in match:
         domain = email.split('@')[1]

         if domain in relevant_domains:
             relevant_emails.append(email)
         else:
             spam_emails.append(email)
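A slightly more general sketch of the same idea, assuming you want the comparison to ignore letter case (domain names are case-insensitive) and may later trust several domains. partition_emails is a hypothetical helper name; str.rpartition and str.lower are standard string methods.

```python
def partition_emails(emails, relevant_domains):
    """Split emails into (relevant, spam) lists by their domain part.

    relevant_domains holds bare domains such as "gmail.com" (no "@").
    The comparison is case-insensitive because domain names are.
    """
    wanted = {d.lower() for d in relevant_domains}  # set gives fast membership tests
    relevant, spam = [], []
    for email in emails:
        # rpartition splits on the last "@": (local part, "@", domain)
        domain = email.rpartition("@")[2].lower()
        if domain in wanted:
            relevant.append(email)
        else:
            spam.append(email)
    return relevant, spam


match = ['image001.png@01D36CD8.2A2219D0', 'arealjcl@countable.us',
         'taylor.l.ingram@gmail.com']
relevant_emails, spam_emails = partition_emails(match, ['gmail.com'])
print(relevant_emails)  # ['taylor.l.ingram@gmail.com']
print(spam_emails)      # ['image001.png@01D36CD8.2A2219D0', 'arealjcl@countable.us']
```

Returning both lists from one pass keeps the "relevant" and "everything else" groups complementary, so no address is lost or counted twice.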

• Answer 2

wwii • 6 years ago

When you split an email address on '@' you get a two-item list:

In [3]: 'image001.png@01D36CD8.2A2219D0'.split('@')
Out[3]: ['image001.png', '01D36CD8.2A2219D0']

If you want to check the domain, index the second item of the result:

In [4]: q = 'image001.png@01D36CD8.2A2219D0'.split('@')

In [5]: q[1]
Out[5]: '01D36CD8.2A2219D0'

So your for loop should look more like this:

In [9]: for thing in match:
   ...:     domain = thing.split('@')[1]
   ...:     print(domain)
   ...:     
01D36CD8.2A2219D0
countable.us
gmail.com
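The pattern in the question cannot capture an @ inside either half, so every match contains exactly one @ and split('@')[1] is safe there. If the strings come from somewhere else, split('@')[1] raises IndexError on a value with no @. A minimal sketch using str.rpartition instead, which always returns a 3-tuple and splits on the last @; get_domain is a hypothetical helper name, and returning None for "no @" is a design choice, not a standard.

```python
def get_domain(address):
    """Return the part after the last '@', or None if there is no '@'."""
    _, sep, domain = address.rpartition("@")
    if not sep:  # rpartition returns ('', '', address) when '@' is missing
        return None
    return domain


for thing in ['image001.png@01D36CD8.2A2219D0', 'arealjcl@countable.us',
              'taylor.l.ingram@gmail.com', 'not-an-address']:
    print(get_domain(thing))
# 01D36CD8.2A2219D0
# countable.us
# gmail.com
# None
```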

• Answer 3

Siddharth Dushantha • 6 years ago

I suggest using the endswith() method. Here is how you can use it:

legit = []
spam = []

# We iterate through the list of matches
for email in match:

    # This checks if the email ends with @gmail.com.
    # If it returns True, that means it is a good email.
    # But, if it returns False, then it means that the email
    # is spam.
    email_status = email.endswith("@gmail.com")

    if email_status:
        legit.append(email)
    else:
        spam.append(email)

Edit: changed the code so that it correctly answers your question.
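str.endswith also accepts a tuple of suffixes, so several trusted domains can be checked in one call. A minimal sketch, assuming LEGIT_SUFFIXES and split_by_suffix (both hypothetical names) hold and apply the domains you trust; lowercasing the address first makes Taylor@Gmail.com count as legitimate too. Keeping the leading @ in each suffix matters: endswith("gmail.com") alone would also accept an address at a look-alike domain such as notgmail.com.

```python
# Hypothetical list of trusted suffixes; write them in lowercase, with the "@".
LEGIT_SUFFIXES = ("@gmail.com",)


def split_by_suffix(emails, suffixes=LEGIT_SUFFIXES):
    """Return (legit, spam) lists based on case-insensitive suffix matching."""
    legit, spam = [], []
    for email in emails:
        # endswith with a tuple is True if the string ends with any of them
        if email.lower().endswith(suffixes):
            legit.append(email)
        else:
            spam.append(email)
    return legit, spam


legit, spam = split_by_suffix(['Taylor@Gmail.com', 'x@notgmail.com',
                               'arealjcl@countable.us'])
print(legit)  # ['Taylor@Gmail.com']
print(spam)   # ['x@notgmail.com', 'arealjcl@countable.us']
```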
