社区所有版块导航
Python
python开源   Django   Python   DjangoApp   pycharm  
DATA
docker   Elasticsearch  
aigc
aigc   chatgpt  
WEB开发
linux   MongoDB   Redis   DATABASE   NGINX   其他Web框架   web工具   zookeeper   tornado   NoSql   Bootstrap   js   peewee   Git   bottle   IE   MQ   Jquery  
机器学习
机器学习算法  
Python88.com
反馈   公告   社区推广  
产品
短视频  
印度
印度  
Py学习  »  Python

使用python在字符串列表中查找unqiue子字符串模式

Xin Niu • 3 年前 • 1333 次点击  

我有一个字符串列表如下:

['/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_5097-ses-2019-05-07_09_56_14.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_4061-ses-2017-09-26_14_07_37.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-002_S_1280-ses-2017-03-13_13_38_31.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-082_S_5282-ses-2019-06-17_10_11_15.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-018_S_4399-ses-2019-08-06_13_03_58.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-123_S_0106-ses-2018-10-11_12_54_59.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_2333-ses-2018-12-26_15_31_55.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-031_S_2018-ses-2019-01-24_11_26_13.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_0679-ses-2017-07-05_09_46_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0303-ses-2017-05-11_13_39_46.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0454-ses-2017-09-06_09_41_25.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_2187-ses-2019-10-09_13_19_17.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-116_S_4043-ses-2018-03-02_10_03_10.0.txt',

我希望用“sub-??”模式提取唯一的主题id_S_u???'在名单上。

到目前为止,我可以用:

unique_subject = re.search('(.*)_sub-(.*)-ses(.*).txt', all_files[0]).group(2)

但这只适用于单个字符串。我需要做一个循环。

unique_subject = set()

for f in all_files:
    unique_subject.add(re.search('(.*)_sub-(.*)-ses(.*).txt', f).group(2))

我想知道是否有更好的方法来做到这一点。最后,我想为每个主题安排第一节课。有没有快速的方法?

Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/131618
 
1333 次点击  
文章 [ 2 ]  |  最新文章 3 年前
constantstranger
Reply   •   1 楼
constantstranger    3 年前

您可以使用相同的正则表达式(我在会话部分添加了一个连字符),并将集合更改为关键字/值为subject/first session的dict。考虑到您希望以不同的方式对待每个主题的第一行,我认为您当前使用循环列表元素的方法很好。

all_files = [
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_5097-ses-2019-05-07_09_56_14.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_4061-ses-2017-09-26_14_07_37.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-002_S_1280-ses-2017-03-13_13_38_31.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-082_S_5282-ses-2019-06-17_10_11_15.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-018_S_4399-ses-2019-08-06_13_03_58.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-123_S_0106-ses-2018-10-11_12_54_59.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_2333-ses-2018-12-26_15_31_55.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-031_S_2018-ses-2019-01-24_11_26_13.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_0679-ses-2017-07-05_09_46_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0303-ses-2017-05-11_13_39_46.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0454-ses-2017-09-06_09_41_25.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_2187-ses-2019-10-09_13_19_17.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-116_S_4043-ses-2018-03-02_10_03_10.0.txt'
]

import re
unique_subject = {}

for f in all_files:
    groups = re.search('(.*)_sub-(.*)-ses-(.*).txt', f)
    subject = groups.group(2)
    if subject not in unique_subject:
        session = groups.group(3)
        unique_subject[subject] = session

[print(f"{k} : {v}") for k, v in unique_subject.items()]

输出:

130_S_4817 : 2018-05-04_14_33_33.0
141_S_0767 : 2019-04-08_12_52_36.0
041_S_5097 : 2019-05-07_09_56_14.0
068_S_4061 : 2017-09-26_14_07_37.0
002_S_1280 : 2017-03-13_13_38_31.0
082_S_5282 : 2019-06-17_10_11_15.0
018_S_4399 : 2019-08-06_13_03_58.0
123_S_0106 : 2018-10-11_12_54_59.0
141_S_2333 : 2018-12-26_15_31_55.0
031_S_2018 : 2019-01-24_11_26_13.0
041_S_0679 : 2017-07-05_09_46_36.0
037_S_0303 : 2017-05-11_13_39_46.0
037_S_0454 : 2017-09-06_09_41_25.0
068_S_2187 : 2019-10-09_13_19_17.0
116_S_4043 : 2018-03-02_10_03_10.0
lemon
Reply   •   2 楼
lemon    3 年前

尝试使用以下方法:

l = re.findall('\d{3}_S_\d{4}', ''.join(all_files))