How to search for a string in a line in Python and extract the data between two characters

Rekha G • 5 years ago • 1689 views

File contents:

module traffic(
    green_main, yellow_main, red_main, green_first, yellow_first, 
    red_first, clk, rst, waiting_main, waiting_first
);

I need to search for the string 'module' and extract the content between the parentheses (……);.

This is the code I tried, but I can't get a result:

fp = open(file_name)
contents = fp.read()
unique_word_a = '('
unique_word_b = ');'
s = contents

for line in contents:
    if 'module' in line:
        your_string=s[s.find(unique_word_a)+len(unique_word_a):s.find(unique_word_b)].strip()
        print(your_string)


2 replies | latest reply 5 years ago

• Reply 1

rusu_ro1 6 years ago

If you want to extract the content between "(" and ")", you can do the following (but first be careful about how you handle the content):

for line in content.split('\n'):
    if 'module' in line:
        line_content = line[line.find('(') + 1: line.find(')')]

If your content is not on a single line:

import math


def find_all(your_string, search_string, max_index=math.inf, offset=0):
    # Yield every index of search_string in your_string, starting at offset
    # and stopping before max_index.
    index = your_string.find(search_string, offset)

    while index != -1 and index < max_index:
        yield index
        index = your_string.find(search_string, index + 1)


s = content.replace('\n', '')

for offset in find_all(s, 'module'):
    # The next 'module' keyword (if any) bounds the current declaration.
    # str.find only accepts the start position as a positional argument.
    max_index = s.find('module', offset + len('module'))
    if max_index == -1:
        max_index = math.inf
    print([s[start + 1: stop] for start, stop in zip(find_all(s, '(', max_index, offset), find_all(s, ')', max_index, offset))])

• Reply 2

tobias_k 6 years ago

Your code has the following problem:

for line in contents:
    if 'module' in line:

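Here, contents is a single string holding the entire contents of the file, not a list of strings (lines) or a file handle that can be looped over line by line. So your line is not actually a line but a single character of that string, which obviously can never contain the substring "module".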
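The difference can be seen in a minimal sketch. The short sample string is an assumed stand-in for what fp.read() returns, and the variable names are only illustrative:

```python
# Assumed stand-in for the string returned by fp.read().
contents = "module traffic(\n    clk, rst\n);\n"

# Iterating over a string yields single characters, not lines.
chars = [item for item in contents]
char_hits = [item for item in chars if 'module' in item]

# splitlines() yields actual lines, so the substring test can succeed.
lines = contents.splitlines()
line_hits = [item for item in lines if 'module' in item]

print(chars[:3])   # the first three items are single characters
print(char_hits)   # empty: no single character contains 'module'
print(line_hits)   # the line that holds the keyword
```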

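Since you never actually use line inside the loop, you can simply remove the loop and the condition, and the code will work. (If you changed the code to really loop over lines and called find on each line, it would not work, because ( and ) are not on the same line.)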
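With the loop removed, the code from the question might look like the sketch below. The triple-quoted string is an assumed stand-in for the file contents, and the checks against -1 are an addition that guards against a missing delimiter:

```python
# Assumed stand-in for the text read from the file in the question.
contents = """module traffic(
    green_main, yellow_main, red_main, green_first, yellow_first,
    red_first, clk, rst, waiting_main, waiting_first
);
"""

unique_word_a = '('
unique_word_b = ');'

# Search the whole string once; no loop over "lines" is needed.
start = contents.find(unique_word_a)
stop = contents.find(unique_word_b)

# find() returns -1 when the delimiter is missing, so check before slicing.
if 'module' in contents and start != -1 and stop != -1:
    your_string = contents[start + len(unique_word_a):stop].strip()
    print(your_string)
```

Note that the extracted text still contains the line break and indentation between the two lines of port names.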

Alternatively, you can use a regular expression:

>>> content = """module traffic(green_main, yellow_main, red_main, green_first, yellow_first, 
...                red_first, clk, rst, waiting_main, waiting_first);"""
>>> import re
>>> re.search(r"module \w+\((.*?)\);", content, re.DOTALL).group(1)
'green_main, yellow_main, red_main, green_first, yellow_first, \n               red_first, clk, rst, waiting_main, waiting_first'

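Here, module \w+\((.*?)\); means:

the word module followed by a space and some word-type \w characters
a literal opening (
a capturing group (...) matching anything . , including line breaks ( re.DOTALL ), non-greedy *?
a literal closing ) and ;

and group(1) returns what was matched between the (non-escaped) pair of (...).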

If you want the names as a list:

>>> list(map(str.strip, _.split(",")))
['green_main', 'yellow_main', 'red_main', 'green_first', 'yellow_first', 'red_first', 'clk', 'rst', 'waiting_main', 'waiting_first']
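The two steps above can be combined into a small helper that also handles files with several modules. This is only a sketch: the function name extract_module_ports and the extra capture group for the module name are assumptions, not part of the answer, and it expects declarations shaped like the one in the question (a single space after module, no comments inside the port list).

```python
import re

# Pattern from the answer, with an extra capture group for the module name.
MODULE_RE = re.compile(r"module (\w+)\((.*?)\);", re.DOTALL)


def extract_module_ports(text):
    """Map each module name found in text to its list of port names."""
    result = {}
    for match in MODULE_RE.finditer(text):
        name, ports = match.groups()
        # Split on commas and drop surrounding whitespace and line breaks.
        result[name] = [port.strip() for port in ports.split(",")]
    return result


# Shortened, assumed sample in the same shape as the file in the question.
sample = """module traffic(
    green_main, yellow_main, red_main,
    clk, rst
);
"""
ports = extract_module_ports(sample)
print(ports)
```

Using re.finditer instead of re.search keeps every match, so each module declaration in the text gets its own entry.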
