Type error when importing a pandas DataFrame from an Excel file in Python

import pandas as pd

path = 'file.xlsx'
dict1 = {'a' : [3, [1, 2, 3], 'text1'],
         'b' : [4, [4, 5, 6, 7], 'text2']}
print('\n\nType 1:', type(dict1['a'][1]))

df1 = pd.DataFrame(dict1)
df1.to_excel(path, sheet_name='Sheet1')
print("\n\nSaved df:\n", df1 , '\n\n')

df2 = pd.read_excel(path, sheet_name='Sheet1')
print("\n\nLoaded df:\n", df2 , '\n\n')

dict2 = df2.to_dict(orient='list')
print("New dict:", dict2, '\n\n')
print('Type 2:', type(dict2['a'][1]))

Type 1: <class 'list'>

Saved df:
            a             b
0          3             4
1  [1, 2, 3]  [4, 5, 6, 7]
2      text1         text2

Loaded df:
            a             b
0          3             4
1  [1, 2, 3]  [4, 5, 6, 7]
2      text1         text2

New dict: {'a': [3, '[1, 2, 3]', 'text1'], 'b': [4, '[4, 5, 6, 7]', 'text2']}

Type 2: <class 'str'>

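read_excel has a dtype option that lets us change the type of the columns being read, but there is no equivalent option for changing the type of an individual row. So after reading the data in, we have to do the type conversion ourselves.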

As you showed in your question, df['a'][1] has type str, but you want it to have type list.

So, suppose we have a string l = '[1, 2, 3]'. We can convert it into a list of ints (l = [1, 2, 3]) with [int(val) for val in l.strip('[]').split(',')]. Now we can combine this with the .apply method to get what we want:

df.iloc[1] = df.iloc[1].apply(lambda x : [int(val) for val in x.strip('[]').split(',')])
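To see the string-parsing step on its own, here is a minimal sketch that wraps the list comprehension in a helper. The name parse_int_list is made up for illustration and is not part of pandas. It assumes every element is an integer and that the list is not empty; a string such as '[]' would raise a ValueError.

```python
def parse_int_list(text):
    """Turn a string such as '[1, 2, 3]' into a list of ints.

    Hypothetical helper for illustration only. Assumes a non-empty,
    comma-separated list of integers wrapped in square brackets.
    """
    # Drop the surrounding brackets, split on commas, convert each piece.
    # int() ignores the whitespace left around each piece.
    return [int(val) for val in text.strip('[]').split(',')]


parsed = parse_int_list('[1, 2, 3]')
print(parsed, type(parsed))
```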

Putting the whole example together, we have:

import pandas as pd

# Data as read in by read_excel method
df2 = pd.DataFrame({'a' : [3, '[1, 2, 3]', 'text1'],
                   'b' : [4, '[4, 5, 6, 7]', 'text2']})
print('Type: ', type(df2['a'][1]))
#Type:  <class 'str'>

# Convert strings in row 1 to lists
df2.iloc[1] = df2.iloc[1].apply(lambda x : [int(val) for val in x.strip('[]').split(',')])

print('Type: ', type(df2['a'][1]))
#Type:  <class 'list'>

dict2 = df2.to_dict(orient='list')
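If you do not know in advance which row holds the lists, or the lists might contain something other than ints, one alternative is ast.literal_eval from the standard library. This swaps out the strip/split technique above and is only a sketch: restore_lists is a hypothetical helper name, and the loaded dictionary simply copies the shape of dict2 from the question.

```python
import ast


def restore_lists(value):
    """Return a list if value is a string holding a list literal, else value unchanged.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str) and value.startswith('['):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Not a valid Python literal, so keep the original string.
            return value
        if isinstance(parsed, list):
            return parsed
    return value


# Same shape as dict2 in the question after the round trip through Excel
loaded = {'a': [3, '[1, 2, 3]', 'text1'],
          'b': [4, '[4, 5, 6, 7]', 'text2']}

restored = {key: [restore_lists(v) for v in values]
            for key, values in loaded.items()}
print(restored)
print(type(restored['a'][1]))
```

The same helper could also be applied to one DataFrame column at a time with Series.apply, so that ordinary text and numbers pass through untouched.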