
Python dataframe: convert a column of lists of dicts into columns of individual elements

THill3 • 3 years ago • 1495 views

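I tried asking this question a different way, and the answers I got addressed only one specific part of the problem rather than the whole thing. To avoid confusion, I am trying again and wording the question differently.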

I have a dataframe with several columns of regular data, plus one column whose elements are lists of dictionaries. Here is an example.

import pandas as pd

list_of_dicts = [{'a':'sam','b':2},{'a':'diana','c':'grape', 'd':5},{'a':'jody','c':7,'e':'foo','f':9}]
list_of_dicts_2 = [{'a':'joe','b':2},{'a':'steve','c':'pizza'},{'a':'alex','c':7,'e':'doh'}]

# Build the example frame: one list-of-dicts column between two ordinary columns
df4 = pd.DataFrame({
    'other1': ['Susie', 'Rachel'],
    'lists_of_stuff': [list_of_dicts, list_of_dicts_2],
    'other2': [123, 456],
})

df4
    other1  lists_of_stuff                                                                            other2
0   Susie   [{'a':'sam','b':2},{'a':'diana','c':'grape','d':5},{'a':'jody','c':7,'e':'foo','f':9}]    123
1   Rachel  [{'a':'joe','b':2},{'a':'steve','c':'pizza'},{'a':'alex','c':7,'e':'doh'}]                456

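I am trying to split those dictionaries into columns so that I end up with a simpler dataframe. Something like this (column order may differ):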

    other1 a_1   b   a_2   c     d   a_3      c_2   e   f   other2
0   Susie  sam   2   diana grape 5   jody     7     foo 9   123
1   Rachel joe   2   steve pizza NaN alex     7     doh NaN 456

Or like this:

    other1 a     b   c     d   e   f   other2
0   Susie  sam   2   NaN   NaN NaN NaN 123
1   Susie  diana NaN grape 5   NaN NaN 123
2   Susie  jody  NaN 7     NaN foo 9   123
3   Rachel joe   2   NaN   NaN NaN NaN 456
4   Rachel steve NaN pizza NaN NaN NaN 456
5   Rachel alex  NaN 7     NaN doh NaN 456

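Two things that do NOT work are pd.DataFrame(df4['lists_of_stuff']) (which just shows the dataframe as it is, i.e. it changes nothing) and pd.json_normalize(df4['lists_of_stuff']) (which throws an error). In addition, flatten_json and solutions involving Series have not produced workable results.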

What is the right way to transform df4 into one of the proposed outputs?

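(Yes, I asked nearly the same question elsewhere: List of variable size dicts to a dataframe. That question was unclear, so I decided to try again with a new question rather than pile a bunch of additions onto the other one and make it hard to follow.)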

Reply 1 • Andrej Kesely • 3 years ago

Try:

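import pandas as pd

df = df4.copy()  # df4 is the dataframe from the question

# If lists_of_stuff holds strings rather than real lists, parse them first:
# from ast import literal_eval
# df["lists_of_stuff"] = df["lists_of_stuff"].apply(literal_eval)

# explode gives one row per dict; apply(pd.Series) turns each dict's keys into columns
df = df.explode("lists_of_stuff")
df = pd.concat([df, df.pop("lists_of_stuff").apply(pd.Series)], axis=1)
print(df)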

Prints:

   other1  other2      a    b      c    d    e    f
0   Susie     123    sam  2.0    NaN  NaN  NaN  NaN
0   Susie     123  diana  NaN  grape  5.0  NaN  NaN
0   Susie     123   jody  NaN      7  NaN  foo  9.0
1  Rachel     456    joe  2.0    NaN  NaN  NaN  NaN
1  Rachel     456  steve  NaN  pizza  NaN  NaN  NaN
1  Rachel     456   alex  NaN      7  NaN  doh  NaN
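
For reference, the same explode-then-expand idea can also be written with pd.json_normalize, which works once it is handed a plain list of dicts instead of a column of lists. This is only a sketch, not part of the original answer: it assumes pandas 1.1 or later (for the ignore_index argument of explode), and the names exploded, expanded and long_df are made up for illustration.

```python
import pandas as pd

list_of_dicts = [{'a': 'sam', 'b': 2}, {'a': 'diana', 'c': 'grape', 'd': 5},
                 {'a': 'jody', 'c': 7, 'e': 'foo', 'f': 9}]
list_of_dicts_2 = [{'a': 'joe', 'b': 2}, {'a': 'steve', 'c': 'pizza'},
                   {'a': 'alex', 'c': 7, 'e': 'doh'}]

df4 = pd.DataFrame({
    'other1': ['Susie', 'Rachel'],
    'lists_of_stuff': [list_of_dicts, list_of_dicts_2],
    'other2': [123, 456],
})

# One row per dict; ignore_index=True yields a fresh 0..n-1 index to join on
exploded = df4.explode('lists_of_stuff', ignore_index=True)

# json_normalize takes a list of dicts and returns one column per key
expanded = pd.json_normalize(exploded['lists_of_stuff'].tolist())

# Both frames share the same 0..n-1 index, so a plain join lines them up
long_df = exploded.drop(columns='lists_of_stuff').join(expanded)
print(long_df)
```

Resetting the index during explode avoids the repeated 0, 0, 0, 1, 1, 1 labels seen in the output above, so no separate reset_index step is needed afterwards.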

Edit: To reset the index and move other2 to the last column, do the following:

#... code as above
df = df.reset_index(drop=True).reindex(
    [*df.columns[:1]] + [*df.columns[2:]] + [*df.columns[1:2]], axis=1
)
print(df)

Prints:

   other1      a    b      c    d    e    f  other2
0   Susie    sam  2.0    NaN  NaN  NaN  NaN     123
1   Susie  diana  NaN  grape  5.0  NaN  NaN     123
2   Susie   jody  NaN      7  NaN  foo  9.0     123
3  Rachel    joe  2.0    NaN  NaN  NaN  NaN     456
4  Rachel  steve  NaN  pizza  NaN  NaN  NaN     456
5  Rachel   alex  NaN      7  NaN  doh  NaN     456
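
The output above matches the second layout proposed in the question. For the first (wide) layout, with one row per original row, one possible sketch is to flatten each list into a single dict before building the frame. The helper widen and the column naming scheme key_position (a_1, b_1, a_2, c_2, ...) are assumptions made for illustration; they differ slightly from the column names in the question, which only number repeated keys.

```python
import pandas as pd

list_of_dicts = [{'a': 'sam', 'b': 2}, {'a': 'diana', 'c': 'grape', 'd': 5},
                 {'a': 'jody', 'c': 7, 'e': 'foo', 'f': 9}]
list_of_dicts_2 = [{'a': 'joe', 'b': 2}, {'a': 'steve', 'c': 'pizza'},
                   {'a': 'alex', 'c': 7, 'e': 'doh'}]

df4 = pd.DataFrame({
    'other1': ['Susie', 'Rachel'],
    'lists_of_stuff': [list_of_dicts, list_of_dicts_2],
    'other2': [123, 456],
})


def widen(dicts):
    """Flatten one list of dicts into a single dict keyed '<key>_<position>'."""
    return {f"{key}_{pos}": value
            for pos, d in enumerate(dicts, start=1)
            for key, value in d.items()}


# One flattened dict per original row; keys missing from a row become NaN
wide_part = pd.DataFrame([widen(dicts) for dicts in df4['lists_of_stuff']],
                         index=df4.index)
wide_df = df4.drop(columns='lists_of_stuff').join(wide_part)
print(wide_df)
```

Suffixing every key with its position keeps the column names unique without having to track which keys repeat within a row.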