Philip

Recent replies by Philip

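Based on your two examples, I was able to build a regex using Python's non-greedy syntax, as described here. It splits each string into three groups: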

1:[123]   2:[foo]   3:[456]
1:[2]   2:[foo1c#BAR]   3:[]

Here is the regex:

^([^A-Za-z]*)(.*?)([^A-Za-z]*)$

mo.group(2) is the part you want, where mo is the match object (for example, the result of re.match).
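A minimal sketch of the regex in use. Because the pattern is anchored with ^ and $, each input is exactly its three groups concatenated, so the two example inputs are assumed to be "123foo456" and "2foo1c#BAR". The helper name strip_non_letter_ends is hypothetical, chosen only for illustration.

```python
import re

# ^ leading non-letters, then a lazy middle, then trailing non-letters $
PATTERN = re.compile(r"^([^A-Za-z]*)(.*?)([^A-Za-z]*)$")


def strip_non_letter_ends(s: str) -> str:
    """Hypothetical helper: return s without its leading/trailing non-letter runs."""
    mo = PATTERN.match(s)
    # mo can be None when a newline sits inside the middle part,
    # because '.' does not match newlines; fall back to the input then.
    return mo.group(2) if mo else s


for s in ["123foo456", "2foo1c#BAR"]:
    mo = PATTERN.match(s)
    print(f"1:[{mo.group(1)}]   2:[{mo.group(2)}]   3:[{mo.group(3)}]")
# 1:[123]   2:[foo]   3:[456]
# 1:[2]   2:[foo1c#BAR]   3:[]
```

The middle group is lazy, so it stops as soon as the rest of the string can be matched by the trailing [^A-Za-z]* group; that is why "456" goes to group 3 rather than staying in group 2. For "2foo1c#BAR" the string ends in letters, so group 3 can only be empty.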

The following code finds 63 unique URLs on the page in question:

import requests
import re

url = "https://en.wikipedia.org/wiki/Collatz_conjecture"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
text = response.text

# Note: [.com]+ is a character class (one or more of '.', 'c', 'o', 'm'),
# not the literal string ".com"
url_pattern = r"((http(s)?://)([\w-]+\.)+[\w-]+[.com]+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)"

# Get all matches of url_pattern.
# Because the pattern has several capturing groups, findall returns a list
# of tuples; we only need the first item (the outer group, i.e. the whole URL)
urls = re.findall(url_pattern, text)

# Take the first item of each tuple; the set drops duplicates
unique_urls = {x[0] for x in urls}
print(f'Number of unique URLs: {len(unique_urls)} found on {url}')

Output:

Number of unique URLs: 63 found on https://en.wikipedia.org/wiki/Collatz_conjecture
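Since the live page changes over time, here is an offline sketch of the same findall behaviour on a small HTML string. The sample_html content is made up for illustration. It shows the tuple that findall returns, and an alternative using re.finditer, where m.group(0) is always the whole match no matter how many groups the pattern has.

```python
import re

# Same pattern as in the answer above
url_pattern = r"((http(s)?://)([\w-]+\.)+[\w-]+[.com]+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)"

# Hypothetical snippet standing in for the downloaded page
sample_html = (
    '<a href="https://example.com/a">A</a> '
    '<a href="http://example.com/b">B</a> '
    '<a href="https://example.com/a">A again</a>'
)

# With several capturing groups, findall returns one tuple per match
matches = re.findall(url_pattern, sample_html)
print(matches[0][0])  # the whole URL is the first item of the tuple

# Alternative: finditer yields match objects; group(0) is the whole match
unique_urls = {m.group(0) for m in re.finditer(url_pattern, sample_html)}
print(sorted(unique_urls))
```

Using finditer with group(0) avoids depending on the tuple layout, which shifts whenever capturing groups are added to or removed from the pattern.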