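How do I convert a string of key-value pairs into a proper dict in Python?

eatkimchi • 2 years ago • 957 views

Suppose I have a dictionary-like string with an unknown number of spaces between the items: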

'address: 123 fake street city: new york state: new york population: 500000'

How do I get from that to

{'address': '123 fake street',
 'city': 'new york',
 'state': 'new york',
 'population': 500000}

or even to a list or tuple along these lines:

['address', '123 fake street'],
['city', 'new york'...]

(1) Assume a key is always a single word followed by a colon

key:

city:

population:

(2) Assume an edge case: the Address: value could be something like "C/O John Smith @ Building X, S/W"

Article URL: http://www.python88.com/topic/133799


Replies [ 5 ] | Latest reply 2 years ago

• Reply 1

Dominic Johnston Whiteley 2 years ago

import re
string = 'address: 123 fake street city: new york state: new york    population:        500000'
pat = r"[a-zA-Z0-9_]+:"
# findall returns each key with its trailing colon, so drop the colon
keys = [key[:-1] for key in re.findall(pat, string)]
vals = [val.strip() for val in re.split(pat, string)]
# ignore vals[0]: it is the text before the first key (an empty string here)
result = dict(zip(keys, vals[1:]))
print(result)
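
The question also asks for a list of pairs. Here is a minimal sketch of that, reusing the same findall/split idea. The function name `to_pairs` is made up for illustration, and it assumes the string starts with a key, as in the example.

```python
import re

# Same key pattern as in the answer above: one word followed by a colon.
KEY = r"[a-zA-Z0-9_]+:"

def to_pairs(text):
    """Return [key, value] lists in the order the keys appear in text."""
    # Drop the trailing colon that findall keeps on every key.
    keys = [key[:-1] for key in re.findall(KEY, text)]
    # The first split result is the text before the first key; skip it.
    vals = [val.strip() for val in re.split(KEY, text)][1:]
    return [[key, val] for key, val in zip(keys, vals)]

pairs = to_pairs('address: 123 fake street city: new york')
print(pairs)
```

Swapping the inner `[key, val]` for `(key, val)` would give tuples instead of lists.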

• Reply 2

Mark Tolonen 2 years ago

Another approach is re.split. Split on a word followed by a colon and optionally surrounded by whitespace, capturing the word:

>>> import re
>>> s = 'address: C/O John Smith @ Building X, S/W city: new york state: new york    population:        500000'
>>> re.split(r'\s*(\w+):\s*',s)
['', 'address', 'C/O John Smith @ Building X, S/W', 'city', 'new york', 'state', 'new york', 'population', '500000']

The list starts with an empty string. Slice out the alternating keys and values, zip them together, and convert to a dict:

>>> x=re.split(r'\s*(\w+):\s*',s)
>>> dict(zip(x[1::2],x[2::2]))
{'address': 'C/O John Smith @ Building X, S/W', 'city': 'new york', 'state': 'new york', 'population': '500000'}
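
The desired output in the question has population as an integer, while the split above leaves every value as a string. A possible follow-up step, as a sketch only: the helper name `parse_pairs` and the rule "convert a value to int when it consists solely of decimal digits" are assumptions, not part of the answer.

```python
import re

def parse_pairs(text):
    """Split text into a dict, converting all-digit values to int."""
    # parts looks like ['', key1, value1, key2, value2, ...]
    parts = re.split(r'\s*(\w+):\s*', text)
    pairs = zip(parts[1::2], parts[2::2])
    # str.isdecimal() is True only for non-empty strings of decimal digits,
    # which int() can always convert.
    return {key: int(val) if val.isdecimal() else val for key, val in pairs}

s = 'address: 123 fake street city: new york state: new york    population:        500000'
d = parse_pairs(s)
print(d)
```

Note that this rule would also turn values such as postal codes into integers and drop their leading zeros, so it may need to be limited to specific keys.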

• Reply 3

esquisilo 2 years ago

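If you don't want to use regular expressions, I think you could do it like this:

Split the string into a list on colons.
Initialize a dictionary.
Loop over the colon-separated pieces, splitting each on spaces; treat the last word as the next key and the remaining words as the entry for the previous key.
Then make exceptions for the first element (a key with no entry) and the last element (an entry with no key).

Here is some code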

# 1. Split on colons
blah = 'address: 123 fake street city: new york state: new york    population:        500000'
colon_split = blah.split(":")
# Assuming the string is always formatted as
#   KEY(ONE WORD): entry, maybe multiple words KEY(ONE WORD): etc. etc. etc.
# the first element of the list above is automatically a key.

# 2. Initialize your dictionary
Dictionary = {}

# 3. Split on whitespace to find the keys and populate the dict
# Since the entries are in series, there are three kinds of element:
# ....the first element is just a key
# ....the middle ones are the entry for the previous key followed by a new key
# ....the last element is just an entry

for i in range(len(colon_split)):
    if i == 0:
        key = colon_split[i].strip()
    elif i == len(colon_split) - 1:
        # strip() drops the padding around the final entry
        Dictionary[key] = colon_split[i].strip()
    else:
        # split() with no argument collapses runs of whitespace
        words = colon_split[i].split()
        # Put the current iteration's entry under the last iteration's key
        Dictionary[key] = " ".join(words[:-1])
        key = words[-1]

print(Dictionary)

• Reply 4

Dieter 2 years ago

You can try this regex: (\w+): *([\w\\\/\- \.\@\_\| ]+)([^\w:]|$) but you also have to strip the captured values

import re

my_string = 'address: 123 fake street city: new york state: new york    population:        500000'

result = {x.group(1): x.group(2).strip() for x in re.finditer(r'(\b\w+\b): *([\w\\\/\- \.\_\|\@ ]+)([^\w:]|$)', my_string)}
print(result)
    

Result:

    {'address': '123 fake street',
     'city': 'new york',
     'state': 'new york',
     'population': '500000'}
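
One thing that may be worth checking with this pattern is edge case (2) from the question. The value character class lists the allowed characters explicitly and does not include a comma, so a value such as "C/O John Smith @ Building X, S/W" appears to be cut at the comma. The sketch below reproduces that and tries a variant with a comma added to the class; the names `PATTERN`, `PATTERN_WITH_COMMA` and `parse` are made up for illustration.

```python
import re

# Pattern copied from the answer above.
PATTERN = r'(\b\w+\b): *([\w\\\/\- \.\_\|\@ ]+)([^\w:]|$)'
# Hypothetical variant: the same character class with a comma added.
PATTERN_WITH_COMMA = r'(\b\w+\b): *([\w\\\/\- \.\_\|\@, ]+)([^\w:]|$)'

def parse(text, pattern=PATTERN):
    """Build a dict from every key/value match of pattern in text."""
    return {m.group(1): m.group(2).strip() for m in re.finditer(pattern, text)}

edge = 'address: C/O John Smith @ Building X, S/W city: new york'
truncated = parse(edge)
complete = parse(edge, PATTERN_WITH_COMMA)
print(truncated)  # the address value stops before the comma
print(complete)   # the address value keeps ', S/W'
```

Any other punctuation that can occur in a value would need to be added to the class in the same way.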

• Reply 5

Andrej Kesely 2 years ago

Try this (regex101):

import re

s = "address: C/O John Smith @ Building X, S/W city: new york state: new york    population:        500000"

d = dict(re.findall(r"([^\s]+)\s*:\s*(.*?)\s*(?=[^\s]+:|$)", s))
print(d)

Prints:

{
    "address": "C/O John Smith @ Building X, S/W",
    "city": "new york",
    "state": "new york",
    "population": "500000",
}
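
For readers who find the one-line pattern hard to follow, here is the same regex spelled out with re.VERBOSE and a comment per piece. This is only an annotated restatement: `[^\s]` is written as the equivalent `\S`, and the name `PAIR` is made up for illustration.

```python
import re

PAIR = re.compile(r"""
    (\S+)          # key: a run of non-whitespace characters
    \s*:\s*        # the colon, with optional whitespace on either side
    (.*?)          # value: as few characters as possible
    \s*            # trailing whitespace, kept out of the value
    (?=\S+:|$)     # stop right before the next key or at the end of the string
""", re.VERBOSE)

s = "address: C/O John Smith @ Building X, S/W city: new york state: new york    population:        500000"

# findall returns one (key, value) tuple per match, which dict() accepts directly.
d = dict(PAIR.findall(s))
print(d)
```

Because the value is matched lazily and the lookahead only checks for the next key, the value can contain any characters, including the comma and slashes from the edge case.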
