
How do I convert a string of key-value pairs into a proper dict in Python?

eatkimchi • 3 years ago • 1735 views

Suppose I have a dictionary-like string of keys and values, with an unknown number of spaces in between:

'address: 123 fake street city: new york state: new york population: 500000'

How would I turn it into

{'address': '123 fake street',
 'city': 'new york',
 'state': 'new york',
 'population': 500000}

or even into a list or tuple along these lines:

['address', '123 fake street'],
['city', 'new york'...]

(1) Assume a key is always a single word followed by a colon

key:

city:

population:

(2) Assume edge cases where the value of Address: might be something like "C/O John Smith @ Building X, S/W"

Source: http://www.python88.com/topic/133799

5 replies  |  latest reply 3 years ago
Reply #1 • Dominic Johnston Whiteley • 3 years ago

import re
string = 'address: 123 fake street city: new york state: new york    population:        500000'
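pat = r"[a-zA-Z0-9_]+:"
# findall returns each key with its trailing colon, so strip it off
keys = [key.rstrip(":") for key in re.findall(pat, string)]
vals = [val.strip() for val in re.split(pat, string)]
# skip vals[0]: it is whatever precedes the first key (an empty string here)
result = dict(zip(keys, vals[1:]))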
print(result)
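
The desired output in the question has population as an integer, while the approach above leaves every value as a string. A minimal sketch of a post-processing step, assuming that only values made up entirely of decimal digits should become int (the helper name coerce_ints is hypothetical, not part of the original answer):

```python
import re

def coerce_ints(mapping):
    """Return a copy of mapping with digit-only string values converted to int."""
    return {k: int(v) if v.isdecimal() else v for k, v in mapping.items()}

string = 'address: 123 fake street city: new york state: new york    population:        500000'
pat = r"[a-zA-Z0-9_]+:"
# Drop the trailing colon so the keys match the desired output
keys = [key.rstrip(":") for key in re.findall(pat, string)]
vals = [val.strip() for val in re.split(pat, string)]
result = coerce_ints(dict(zip(keys, vals[1:])))
print(result)
```

Values such as '123 fake street' contain spaces and letters, so they stay strings; only '500000' is converted.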
Reply #2 • Mark Tolonen • 3 years ago

Another approach is re.split. Split on a word followed by a colon, optionally surrounded by whitespace, and capture the word:

>>> import re
>>> s = 'address: C/O John Smith @ Building X, S/W city: new york state: new york    population:        500000'
>>> re.split(r'\s*(\w+):\s*',s)
['', 'address', 'C/O John Smith @ Building X, S/W', 'city', 'new york', 'state', 'new york', 'population', '500000']

The result starts with an empty string, so skip it, zip up the alternating keys and values, and convert to a dict:

>>> x=re.split(r'\s*(\w+):\s*',s)
>>> dict(zip(x[1::2],x[2::2]))
{'address': 'C/O John Smith @ Building X, S/W', 'city': 'new york', 'state': 'new york', 'population': '500000'}
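
The question also asks for a list or tuple form. A minimal sketch using the same re.split idea, pairing the alternating keys and values without building a dict (the variable names here are illustrative):

```python
import re

s = 'address: C/O John Smith @ Building X, S/W city: new york state: new york    population:        500000'
parts = re.split(r'\s*(\w+):\s*', s)

# parts[0] is the text before the first key; after it, keys and values alternate
pairs = list(zip(parts[1::2], parts[2::2]))   # list of (key, value) tuples
as_lists = [list(pair) for pair in pairs]     # list of [key, value] lists
print(as_lists)
```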
Reply #3 • esquisilo • 3 years ago

If you don't want to use regular expressions, I think you can do it like this:

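  1. Split the string on colons into an array.
  2. Initialize the dictionary.
  3. Loop over the colon-separated pieces and split each one on spaces; the last word is the next key, and the words before it are the entry for the previous key.
  4. Treat the first element (a key with no entry) and the last element (an entry with no key) as special cases.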

Here is some code:

#1. Separate by colon
blah = 'address: 123 fake street city: new york state: new york    population:        500000'
colon_split = blah.split(":")
#Assuming string is always formatted...
# Key(ONE WORD): Entry, maybe multiple words KEY(ONE WORD): etc. etc. etc.
#Then the first element of the array above is automatically a key.

#Initialize your dictionary
Dictionary = {}

#3. Separate by space to find keys and populate the dict
#Since your entries are in series here, you'll have three kind of element.
#....Your first element will be just a key
#....The rest will be the entry for the previous key followed by a new key
#....Your last element will be just an entry

for i in range(len(colon_split)):
    if i==0:
        #First element: just a key
        key = colon_split[i].strip()
    elif i==len(colon_split)-1:
        #Last element: just an entry; strip the surrounding spaces
        entry = colon_split[i].strip()
        Dictionary[key] = entry
    else:
        #split() with no argument treats a run of spaces as one separator
        words = colon_split[i].split()
        entry = " ".join(words[:-1])
        #Put the current iteration's entry into the last iteration's key
        Dictionary[key] = entry
        key = words[-1]
print(Dictionary)
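
The loop above can also be packaged as a small function. A minimal sketch of the same colon-splitting idea, assuming every key is a single word and no value contains a colon (the name parse_pairs is hypothetical):

```python
def parse_pairs(text):
    """Parse 'key: value key: value ...' without regular expressions."""
    pieces = text.split(":")
    result = {}
    key = pieces[0].strip()                  # the first piece is only a key
    for piece in pieces[1:-1]:
        words = piece.split()                # no argument: runs of spaces count once
        # Assumption: each middle piece ends with the next one-word key
        result[key] = " ".join(words[:-1])   # words before the last one form the value
        key = words[-1]                      # the last word is the next key
    if len(pieces) > 1:
        result[key] = pieces[-1].strip()     # the last piece is only a value
    return result

print(parse_pairs('address: 123 fake street city: new york state: new york    population:        500000'))
```

Because split() collapses whitespace, repeated spaces inside a value are reduced to a single space, which may or may not be what you want.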
Reply #4 • Dieter • 3 years ago

You could try this regex: (\w+): *([\w\\\/\- \.\@\_\| ]+)([^\w:]|$) but you also have to strip the captured value.

import re

my_string = 'address: 123 fake street city: new york state: new york    population:        500000'

result = {x.group(1): x.group(2).strip() for x in re.finditer(r'(\b\w+\b): *([\w\\\/\- \.\_\|\@ ]+)([^\w:]|$)', my_string)}
print(result)

Result:

    {'address': '123 fake street',
     'city': 'new york',
     'state': 'new york',
     'population': '500000'}
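
Because the value part of this pattern is an explicit character class, any character outside the class ends the value early. A quick check against the edge case from the question, assuming the pattern exactly as written above (the comma is not in the class; the variable names are illustrative):

```python
import re

pattern = r'(\b\w+\b): *([\w\\\/\- \.\_\|\@ ]+)([^\w:]|$)'
edge_case = 'address: C/O John Smith @ Building X, S/W city: new york'

result = {m.group(1): m.group(2).strip() for m in re.finditer(pattern, edge_case)}
print(result)
# The address value stops at the comma, so ", S/W" is lost
```

Adding the comma, and any other punctuation you expect in values, to the character class is one way to cover such input.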
Reply #5 • Andrej Kesely • 3 years ago

Try this (regex101):

import re

s = "address: C/O John Smith @ Building X, S/W city: new york state: new york    population:        500000"

d = dict(re.findall(r"([^\s]+)\s*:\s*(.*?)\s*(?=[^\s]+:|$)", s))
print(d)

Prints:

{
    "address": "C/O John Smith @ Building X, S/W",
    "city": "new york",
    "state": "new york",
    "population": "500000",
}
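
The pattern above is dense, so here is the same expression spelled out with re.VERBOSE and comments. This is only an annotated restatement, written with \S in place of [^\s] (the two are equivalent); the name PAIR_RE is illustrative:

```python
import re

PAIR_RE = re.compile(r"""
    (\S+)          # key: a run of non-whitespace characters
    \s*:\s*        # the colon, with optional whitespace on either side
    (.*?)          # value: as few characters as possible
    \s*            # optional trailing whitespace, kept out of the value
    (?=\S+:|$)     # stop just before the next key-and-colon, or at the end
""", re.VERBOSE)

s = "address: C/O John Smith @ Building X, S/W city: new york state: new york    population:        500000"
d = dict(PAIR_RE.findall(s))
print(d)
```

The lazy (.*?) together with the lookahead is what lets a value contain spaces, slashes and commas: the value only ends where the next word followed by a colon begins.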