I have a list of the headings and sub-headings of a document.
test_list = ['heading', 'heading','sub-heading', 'sub-heading', 'heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading', 'sub-heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading','sub-sub-heading', 'heading']
I want to assign a unique index to each heading and sub-heading, as follows:
seg_ids = ['1', '2', '2_1', '2_2', '3', '3_1', '3_1_1', '3_1_2', '3_2', '3_3', '3_3_1', '3_3_2', '3_3_3', '4']
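Here is the code I wrote to produce this result, but it is messy and limited to a depth of 3. If a document has headings nested any deeper, the code gets even more complicated. Is there a more Pythonic way to do this?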
seg_ids = []
for idx, an_ele in enumerate(test_list):
    head_id = 0
    subh_id = 0
    subsubh_id = 0
    if an_ele == 'heading' and idx == 0:  # if it is the first element
        head_id = '1'
        seg_ids.append(head_id)
    else:
        last_seg_ids = seg_ids[idx-1].split('_')  # find the depth of the last element
        head_id = last_seg_ids[0]
        if len(last_seg_ids) == 2:
            subh_id = last_seg_ids[1]
        elif len(last_seg_ids) == 3:
            subh_id = last_seg_ids[1]
            subsubh_id = last_seg_ids[2]
        if an_ele == 'heading':
            head_id = str(int(head_id)+1)
            subh_id = 0  # reset sub_heading index
            subsubh_id = 0  # reset sub_sub_heading index
        elif an_ele == 'sub-heading':
            subh_id = str(int(subh_id)+1)
            subsubh_id = 0  # reset sub_sub_heading index
        elif an_ele == 'sub-sub-heading':
            subsubh_id = str(int(subsubh_id)+1)
        else:
            print('ERROR')
        if subsubh_id == 0:
            if subh_id != 0:
                seg_ids.append(head_id+'_'+subh_id)
            else:
                seg_ids.append(head_id)
        if subsubh_id != 0:
            seg_ids.append(str(head_id)+'_'+str(subh_id)+'_'+str(subsubh_id))
print(seg_ids)
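
For comparison, here is a minimal sketch of one way the same numbering might be generalized to any depth. It keeps a single list of counters, one per nesting level, instead of a separate variable per level. The function name `make_seg_ids` is hypothetical, and the sketch assumes that the depth of a label can be read from the number of `sub-` prefixes it carries, which holds for the labels in `test_list` but may not hold for other inputs.

```python
def make_seg_ids(labels):
    """Return IDs such as '3_1_2' for a flat list of heading labels."""
    counters = []  # counters[d] is the current index at depth d (0 = top level)
    seg_ids = []
    for label in labels:
        # Assumption: depth equals the number of 'sub-' prefixes in the label.
        depth = label.count('sub-')
        if not label.endswith('heading') or depth > len(counters):
            raise ValueError(f'unexpected label or skipped level: {label!r}')
        # Forget counters deeper than this level, which resets them.
        del counters[depth + 1:]
        # Start a counter for this level if it has not been seen yet.
        if depth == len(counters):
            counters.append(0)
        counters[depth] += 1
        seg_ids.append('_'.join(map(str, counters)))
    return seg_ids


test_list = ['heading', 'heading', 'sub-heading', 'sub-heading', 'heading',
             'sub-heading', 'sub-sub-heading', 'sub-sub-heading', 'sub-heading',
             'sub-heading', 'sub-sub-heading', 'sub-sub-heading',
             'sub-sub-heading', 'heading']
print(make_seg_ids(test_list))
```

Truncating the counter list whenever a shallower heading appears takes the place of the explicit "reset" assignments in the code above, so adding a fourth or fifth level needs no new branches. A label that jumps more than one level deeper than the previous one raises a `ValueError` rather than printing `ERROR`; that is a design choice of this sketch, not a requirement.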