Python dataframe: convert a column of lists of dicts into columns with single elements

THill3 • 3 years ago • 1465 views

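I tried asking this question a different way, and the answers I got addressed only one specific part of the problem rather than the whole thing. To avoid confusion, I am trying again with the question worded differently.

I have a dataframe in which several columns hold regular data, but one column holds lists of dictionaries as its elements. Here is an example.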

import pandas as pd

list_of_dicts = [{'a':'sam','b':2},{'a':'diana','c':'grape', 'd':5},{'a':'jody','c':7,'e':'foo','f':9}]
list_of_dicts_2 = [{'a':'joe','b':2},{'a':'steve','c':'pizza'},{'a':'alex','c':7,'e':'doh'}]

# Build the frame in one step: each cell of lists_of_stuff holds a whole list of dicts
df4 = pd.DataFrame({
    'other1': ['Susie', 'Rachel'],
    'lists_of_stuff': [list_of_dicts, list_of_dicts_2],
    'other2': [123, 456],
})

df4
    other1  lists_of_stuff                                                                             other2
0   Susie   [{'a':'sam','b':2},{'a':'diana','c':'grape', 'd':5},{'a':'jody','c':7,'e':'foo','f':9}]   123
1   Rachel  [{'a':'joe','b':2},{'a':'steve','c':'pizza'},{'a':'alex','c':7,'e':'doh'}]                456

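I am trying to split these dictionaries out into columns so that I end up with a simpler dataframe. Something like this (column order may differ):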

    other1 a_1   b   a_2   c     d   a_3      c_2   e   f   other2
0   Susie  sam   2   diana grape 5   jody     7     foo 9   123
1   Rachel joe   2   steve pizza NaN alex     7     doh NaN 456

Or like this:

    other1 a     b   c     d   e   f   other2
0   Susie  sam   2   NaN   NaN NaN NaN 123
1   Susie  diana NaN grape 5   NaN NaN 123
2   Susie  jody  NaN 7     NaN foo 9   123
3   Rachel joe   2   NaN   NaN NaN NaN 456 
4   Rachel steve NaN pizza NaN NaN NaN 456
5   Rachel alex  NaN 7     NaN doh NaN 456

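Two things that do not work are pd.DataFrame(df4['lists_of_stuff']) (which just shows the dataframe as it is, i.e. it changes nothing) and pd.json_normalize(df4['lists_of_stuff']) (which throws an error). In addition, flatten_json and solutions involving Series have not produced workable results.

What is the correct way to turn df4 into one of the proposed outputs?

(Yes, I asked nearly the same question elsewhere: "List of variable size dicts to a dataframe". That question was unclear, so I decided to try again with a new question rather than adding a pile of material to the other one and making it hard to follow.)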

Article URL: http://www.python88.com/topic/133722
Andrej Kesely • Reply #1 • 3 years ago

Try:

import pandas as pd

# if the lists_of_stuff values are strings, parse them first with literal_eval
# from ast import literal_eval
# df4["lists_of_stuff"] = df4["lists_of_stuff"].apply(literal_eval)

# one row per dict, then expand each dict into its own columns
df = df4.explode("lists_of_stuff")
df = pd.concat([df, df.pop("lists_of_stuff").apply(pd.Series)], axis=1)
print(df)

Prints:

   other1  other2      a    b      c    d    e    f
0   Susie     123    sam  2.0    NaN  NaN  NaN  NaN
0   Susie     123  diana  NaN  grape  5.0  NaN  NaN
0   Susie     123   jody  NaN      7  NaN  foo  9.0
1  Rachel     456    joe  2.0    NaN  NaN  NaN  NaN
1  Rachel     456  steve  NaN  pizza  NaN  NaN  NaN
1  Rachel     456   alex  NaN      7  NaN  doh  NaN
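
As an aside, since the question reports that pd.json_normalize raised an error when handed the column directly, here is a hedged sketch of a variant that may work. json_normalize expects records (dicts), so the frame is first converted with to_dict(orient="records"); record_path then points at the nested list and meta carries other1 and other2 along. The name long_df is made up for illustration, and df4 is rebuilt from the question's sample data so that the snippet runs without any other code.

```python
import pandas as pd

# Sample data copied from the question
list_of_dicts = [{'a': 'sam', 'b': 2},
                 {'a': 'diana', 'c': 'grape', 'd': 5},
                 {'a': 'jody', 'c': 7, 'e': 'foo', 'f': 9}]
list_of_dicts_2 = [{'a': 'joe', 'b': 2},
                   {'a': 'steve', 'c': 'pizza'},
                   {'a': 'alex', 'c': 7, 'e': 'doh'}]

df4 = pd.DataFrame({
    'other1': ['Susie', 'Rachel'],
    'lists_of_stuff': [list_of_dicts, list_of_dicts_2],
    'other2': [123, 456],
})

# Turn each row into a plain dict, then let json_normalize unpack the
# nested list (record_path) and repeat the scalar columns (meta).
records = df4.to_dict(orient="records")
long_df = pd.json_normalize(
    records,
    record_path="lists_of_stuff",
    meta=["other1", "other2"],
)
print(long_df)
```

This should give the same six rows as the explode approach, with other1 and other2 placed after the dict-derived columns.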

Edit: to reorder the columns, do the following:

# ... same explode/concat code as above
df = df.reset_index(drop=True).reindex(
    [*df.columns[:1]] + [*df.columns[2:]] + [*df.columns[1:2]], axis=1
)
print(df)

Prints:

   other1      a    b      c    d    e    f  other2
0   Susie    sam  2.0    NaN  NaN  NaN  NaN     123
1   Susie  diana  NaN  grape  5.0  NaN  NaN     123
2   Susie   jody  NaN      7  NaN  foo  9.0     123
3  Rachel    joe  2.0    NaN  NaN  NaN  NaN     456
4  Rachel  steve  NaN  pizza  NaN  NaN  NaN     456
5  Rachel   alex  NaN      7  NaN  doh  NaN     456
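
The steps above produce the second (long) layout from the question. For the first (wide) layout, one possible sketch, which is not part of the original answer, is to flatten each list into a single dict whose keys carry the position of the dict within the list. The helper widen, the name wide_df, and the key_position naming (a_1, b_1, a_2, ...) are assumptions for illustration; the resulting column names differ slightly from those proposed in the question.

```python
import pandas as pd

# Sample data copied from the question
list_of_dicts = [{'a': 'sam', 'b': 2},
                 {'a': 'diana', 'c': 'grape', 'd': 5},
                 {'a': 'jody', 'c': 7, 'e': 'foo', 'f': 9}]
list_of_dicts_2 = [{'a': 'joe', 'b': 2},
                   {'a': 'steve', 'c': 'pizza'},
                   {'a': 'alex', 'c': 7, 'e': 'doh'}]

df4 = pd.DataFrame({
    'other1': ['Susie', 'Rachel'],
    'lists_of_stuff': [list_of_dicts, list_of_dicts_2],
    'other2': [123, 456],
})


def widen(dicts):
    """Merge a list of dicts into one dict, suffixing each key with the
    1-based position of the dict it came from (hypothetical helper)."""
    return {
        f"{key}_{pos}": value
        for pos, d in enumerate(dicts, start=1)
        for key, value in d.items()
    }


# One flattened dict per original row, expanded into columns
wide = pd.DataFrame(df4["lists_of_stuff"].map(widen).tolist(), index=df4.index)
wide_df = pd.concat([df4.drop(columns="lists_of_stuff"), wide], axis=1)
print(wide_df)
```

Keys missing from a given row (for example d_2 and f_3 for Rachel) come out as NaN, which matches the gaps shown in the question's wide example.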