>>> import pandas as pd
>>> obj1 = pd.Series(['a', 'b'])
>>> obj2 = pd.Series(['c', 'd'])
>>> pd.concat([obj1, obj2])
0    a
1    b
0    c
1    d
dtype: object
Set ignore_index=True to discard the original index values:
>>> import pandas as pd
>>> obj1 = pd.Series(['a', 'b'])
>>> obj2 = pd.Series(['c', 'd'])
>>> pd.concat([obj1, obj2], ignore_index=True)
0    a
1    b
2    c
3    d
dtype: object
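The practical difference between the two calls is easiest to see by selecting a label afterwards. The following is a minimal sketch that reuses the same two Series; the variable names `kept` and `reset` are illustrative only.

```python
import pandas as pd

obj1 = pd.Series(['a', 'b'])
obj2 = pd.Series(['c', 'd'])

# Default: each input keeps its own labels, so 0 and 1 appear twice.
kept = pd.concat([obj1, obj2])
# ignore_index=True: the result is relabelled 0..n-1.
reset = pd.concat([obj1, obj2], ignore_index=True)

print(kept.index.tolist())    # [0, 1, 0, 1]
print(kept.loc[0].tolist())   # ['a', 'c'] (a duplicated label selects two rows)
print(reset.index.tolist())   # [0, 1, 2, 3]
print(reset.loc[0])           # a (each label is now unique)
```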
Set the keys parameter to add an outermost index level:
>>> import pandas as pd
>>> obj1 = pd.Series(['a', 'b'])
>>> obj2 = pd.Series(['c', 'd'])
>>> pd.concat([obj1, obj2], keys=['s1', 's2'])
s1  0    a
    1    b
s2  0    c
    1    d
dtype: object
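One reason to add the outer level is that each original object can be recovered from the result by its key. A short sketch, assuming the same inputs as above:

```python
import pandas as pd

obj1 = pd.Series(['a', 'b'])
obj2 = pd.Series(['c', 'd'])
combined = pd.concat([obj1, obj2], keys=['s1', 's2'])

# The result carries a two-level index: (key, original label).
print(combined.index.nlevels)        # 2
# Selecting an outer key returns the rows that came from that input.
print(combined.loc['s2'].tolist())   # ['c', 'd']
# A full (outer, inner) tuple addresses a single value.
print(combined.loc[('s1', 1)])       # b
```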
Set the names parameter to name the index levels:
>>> import pandas as pd
>>> obj1 = pd.Series(['a', 'b'])
>>> obj2 = pd.Series(['c', 'd'])
>>> pd.concat([obj1, obj2], keys=['s1', 's2'], names=['Series name', 'Row ID'])
Series name  Row ID
s1           0         a
             1         b
s2           0         c
             1         d
dtype: object
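Named levels become useful once the index is turned back into ordinary columns, because the names carry over as column headers. A minimal sketch; the column name `value` is an arbitrary choice for illustration.

```python
import pandas as pd

obj1 = pd.Series(['a', 'b'])
obj2 = pd.Series(['c', 'd'])
combined = pd.concat([obj1, obj2], keys=['s1', 's2'],
                     names=['Series name', 'Row ID'])

print(list(combined.index.names))   # ['Series name', 'Row ID']

# reset_index moves both levels into columns named after the levels.
flat = combined.reset_index(name='value')
print(flat.columns.tolist())        # ['Series name', 'Row ID', 'value']
print(flat.shape)                   # (4, 3)
```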
Concatenating DataFrame objects:
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['c', 3], ['d', 4]], columns=['letter', 'number'])
>>> obj1
  letter  number
0      a       1
1      b       2
>>> obj2
  letter  number
0      c       3
1      d       4
>>> pd.concat([obj1, obj2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4
Concatenating DataFrame objects; values that do not exist in one of the inputs are filled with NaN:
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']], columns=['letter', 'number', 'animal'])
>>> obj1
  letter  number
0      a       1
1      b       2
>>> obj2
  letter  number animal
0      c       3    cat
1      d       4    dog
>>> pd.concat([obj1, obj2])
  letter  number animal
0      a       1    NaN
1      b       2    NaN
0      c       3    cat
1      d       4    dog
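The NaN placeholders can be located or replaced after concatenation. The sketch below assumes the same two frames; the replacement string `'unknown'` is a made-up value for illustration.

```python
import pandas as pd

obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
obj2 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
                    columns=['letter', 'number', 'animal'])
combined = pd.concat([obj1, obj2])

# Rows that came from obj1 have no 'animal' value.
missing = combined['animal'].isna()
print(missing.tolist())            # [True, True, False, False]

# Replace the gaps in that one column with a placeholder.
filled = combined.fillna({'animal': 'unknown'})
print(filled['animal'].tolist())   # ['unknown', 'unknown', 'cat', 'dog']
```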
Concatenating DataFrame objects with join="inner"; columns that are not present in every object are dropped:
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']], columns=['letter', 'number', 'animal'])
>>> obj1
  letter  number
0      a       1
1      b       2
>>> obj2
  letter  number animal
0      c       3    cat
1      d       4    dog
>>> pd.concat([obj1, obj2], join="inner")
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4
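Put side by side, the default (outer) join keeps the union of the column labels, while the inner join keeps their intersection. A small sketch comparing the two on the same inputs:

```python
import pandas as pd

obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
obj2 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
                    columns=['letter', 'number', 'animal'])

outer = pd.concat([obj1, obj2])                 # join='outer' is the default
inner = pd.concat([obj1, obj2], join='inner')

print(outer.columns.tolist())   # ['letter', 'number', 'animal']
print(inner.columns.tolist())   # ['letter', 'number']
# The inner result has exactly the columns shared by both inputs.
print(set(inner.columns) == set(obj1.columns) & set(obj2.columns))   # True
```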
Concatenating DataFrame objects with axis=1, which joins them along the column axis (adding columns):
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']], columns=['animal', 'name'])
>>> obj1
  letter  number
0      a       1
1      b       2
>>> obj2
   animal    name
0    bird   polly
1  monkey  george
>>> pd.concat([obj1, obj2], axis=1)
  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george
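With axis=1 the rows are matched by index label rather than by position. The sketch below is a variation on the example above in which obj2 is given the index [1, 2] (an assumption made here purely to show the alignment); labels present in only one input are filled with NaN.

```python
import pandas as pd

obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
# Hypothetical variation: obj2 is labelled 1 and 2 instead of 0 and 1.
obj2 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']],
                    columns=['animal', 'name'], index=[1, 2])

combined = pd.concat([obj1, obj2], axis=1)

print(combined.index.tolist())              # [0, 1, 2]
print(combined.loc[1, 'animal'])            # bird (label 1 exists in both inputs)
print(pd.isna(combined.loc[0, 'animal']))   # True (label 0 is missing from obj2)
print(pd.isna(combined.loc[2, 'letter']))   # True (label 2 is missing from obj1)
```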
Set verify_integrity=True to check the new index for duplicates; an error is raised if any are found:
>>> import pandas as pd
>>> obj1 = pd.DataFrame([1], index=['a'])
>>> obj2 = pd.DataFrame([2], index=['a'])
>>> obj1
   0
a  1
>>> obj2
   0
a  2
>>> pd.concat([obj1, obj2], verify_integrity=True)
Traceback (most recent call last):
    ...
ValueError: Indexes have overlapping values: ['a']
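In a script, the check is typically wrapped in a try/except so that the overlap can be handled rather than aborting the program. The following is one possible sketch; falling back to ignore_index=True is only one of several reasonable reactions and is an assumption of this example.

```python
import pandas as pd

obj1 = pd.DataFrame([1], index=['a'])
obj2 = pd.DataFrame([2], index=['a'])

try:
    combined = pd.concat([obj1, obj2], verify_integrity=True)
    overlap_detected = False
except ValueError as exc:
    # Raised because the label 'a' occurs in both inputs.
    overlap_detected = True
    print(f"concat refused: {exc}")
    # Illustrative fallback: discard the labels and renumber the rows.
    combined = pd.concat([obj1, obj2], ignore_index=True)

print(overlap_detected)          # True
print(combined.index.tolist())   # [0, 1]
```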
Set sort=True to sort the column index in the output:
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 3], ['d', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['c', 1, 'cat'], ['b', 4, 'dog']], columns=['letter', 'number', 'animal'])
>>> obj1
  letter  number
0      a       3
1      d       2
>>> obj2
  letter  number animal
0      c       1    cat
1      b       4    dog
>>> pd.concat([obj1, obj2], sort=True)
  animal letter  number
0    NaN      a       3
1    NaN      d       2
0    cat      c       1
1    dog      b       4
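Note that sort affects only the order of the column labels; the rows stay in input order. A brief sketch contrasting sort=False with sort=True on the same frames:

```python
import pandas as pd

obj1 = pd.DataFrame([['a', 3], ['d', 2]], columns=['letter', 'number'])
obj2 = pd.DataFrame([['c', 1, 'cat'], ['b', 4, 'dog']],
                    columns=['letter', 'number', 'animal'])

unsorted_cols = pd.concat([obj1, obj2], sort=False)
sorted_cols = pd.concat([obj1, obj2], sort=True)

print(unsorted_cols.columns.tolist())   # ['letter', 'number', 'animal']
print(sorted_cols.columns.tolist())     # ['animal', 'letter', 'number']
# Row order is untouched: values are not sorted.
print(sorted_cols['letter'].tolist())   # ['a', 'd', 'c', 'b']
```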
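【02x00】append
The append method appends another Series / DataFrame object to the end of a Series / DataFrame object and returns a new object; the original object is not modified. Note that append was deprecated in pandas 1.4 and removed in pandas 2.0, so the example below runs only on older versions.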
>>> import pandas as pd
>>> obj1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
>>> obj2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
>>> obj1
   A  B
0  1  2
1  3  4
>>> obj2
   A  B
0  5  6
1  7  8
>>> obj1.append(obj2)
   A  B
0  1  2
1  3  4
0  5  6
1  7  8
>>> obj1.append(obj2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8
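On pandas 2.0 and later, where DataFrame.append is no longer available, the same two results can be produced with pd.concat. A minimal sketch of the equivalent calls:

```python
import pandas as pd

obj1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
obj2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

# Counterpart of obj1.append(obj2)
appended = pd.concat([obj1, obj2])
# Counterpart of obj1.append(obj2, ignore_index=True)
renumbered = pd.concat([obj1, obj2], ignore_index=True)

print(appended.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())   # [0, 1, 2, 3]
print(renumbered['A'].tolist())    # [1, 3, 5, 7]
# As with append, the inputs are left unchanged.
print(len(obj1))                   # 2
```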
>>> import pandas as pd
>>> obj1 = pd.DataFrame({'key': ['b', 'a', 'c'], 'data1': range(3)})
>>> obj2 = pd.DataFrame({'key': ['a', 'c', 'b'], 'data2': range(3)})
>>> obj1
  key  data1
0   b      0
1   a      1
2   c      2
>>> obj2
  key  data2
0   a      0
1   c      1
2   b      2
>>> pd.merge(obj1, obj2)
  key  data1  data2
0   b      0      2
1   a      1      0
2   c      2      1
>>> import pandas as pd
>>> obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
>>> obj2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})
>>> obj1
  key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   a      5
6   b      6
>>> obj2
  key  data2
0   a      0
1   b      1
2   d      2
>>> pd.merge(obj1, obj2)
  key  data1  data2
0   b      0      1
1   b      1      1
2   b      6      1
3   a      2      0
4   a      4      0
5   a      5      0
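In the example above, each key occurs at most once in obj2 but several times in obj1. If that shape is something the code relies on, merge can be asked to verify it through its validate parameter. The sketch below is one way to use it, assuming the same two frames; an unexpected shape raises pandas.errors.MergeError.

```python
import pandas as pd

obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                     'data1': range(7)})
obj2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# Passes: keys may repeat on the left but are unique on the right.
merged = pd.merge(obj1, obj2, on='key', validate='many_to_one')
print(len(merged))   # 6

# Fails: a one-to-one merge would need unique keys on both sides.
try:
    pd.merge(obj1, obj2, on='key', validate='one_to_one')
    rejected = False
except pd.errors.MergeError:
    rejected = True
print(rejected)      # True
```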
【03x03】Many-to-many joins
A many-to-many join is one in which the key column contains duplicate values in both DataFrame objects.
>>> import pandas as pd
>>> obj1 = pd.DataFrame({'key': ['a', 'b', 'b', 'c'], 'data1': range(4)})
>>> obj2 = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'c', 'c'], 'data2': range(6)})
>>> obj1
  key  data1
0   a      0
1   b      1
2   b      2
3   c      3
>>> obj2
  key  data2
0   a      0
1   a      1
2   b      2
3   b      3
4   c      4
5   c      5
>>> pd.merge(obj1, obj2)
  key  data1  data2
0   a      0      0
1   a      0      1
2   b      1      2
3   b      1      3
4   b      2      2
5   b      2      3
6   c      3      4
7   c      3      5
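The size of a many-to-many result can be predicted: for every key, each matching left row is paired with each matching right row, so the key contributes (left count × right count) rows. The following sketch checks that arithmetic against the example above.

```python
import pandas as pd

obj1 = pd.DataFrame({'key': ['a', 'b', 'b', 'c'], 'data1': range(4)})
obj2 = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'c', 'c'], 'data2': range(6)})

# Occurrences of each key on either side.
left_counts = obj1['key'].value_counts()
right_counts = obj2['key'].value_counts()

# a: 1*2, b: 2*2, c: 1*2
expected_rows = int((left_counts * right_counts).sum())
merged = pd.merge(obj1, obj2, on='key')

print(expected_rows)   # 8
print(len(merged))     # 8
```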
>>> import pandas as pd
>>> obj1 = pd.DataFrame({'key': ['b', 'a', 'c'], 'data1': range(3)})
>>> obj2 = pd.DataFrame({'key': ['a', 'c', 'b'], 'data2': range(3)})
>>> obj1
  key  data1
0   b      0
1   a      1
2   c      2
>>> obj2
  key  data2
0   a      0
1   c      1
2   b      2
>>> pd.merge(obj1, obj2, on='key')
  key  data1  data2
0   b      0      2
1   a      1      0
2   c      2      1
To merge on multiple keys, pass a list of column names:
>>> import pandas as pd
>>> left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'], 'key2': ['one', 'two', 'one'], 'lval': [1, 2, 3]})
>>> right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'], 'key2': ['one', 'one', 'one', 'two'], 'rval': [4, 5, 6, 7]})
>>> left
  key1 key2  lval
0  foo  one     1
1  foo  two     2
2  bar  one     3
>>> right
  key1 key2  rval
0  foo  one     4
1  foo  one     5
2  bar  one     6
3  bar  two     7
>>> pd.merge(left, right, on=['key1', 'key2'])
  key1 key2  lval  rval
0  foo  one     1     4
1  foo  one     1     5
2  bar  one     3     6
If the key columns have different names in the two objects, use the left_on and right_on parameters to specify each one:
>>> import pandas as pd
>>> obj1 = pd.DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
>>> obj2 = pd.DataFrame({'rkey': ['a', 'b', 'd'], 'data2': range(3)})
>>> obj1
  lkey  data1
0    b      0
1    b      1
2    a      2
3    c      3
4    a      4
5    a      5
6    b      6
>>> obj2
  rkey  data2
0    a      0
1    b      1
2    d      2
>>> pd.merge(obj1, obj2, left_on='lkey', right_on='rkey')
  lkey  data1 rkey  data2
0    b      0    b      1
1    b      1    b      1
2    b      6    b      1
3    a      2    a      0
4    a      4    a      0
5    a      5    a      0
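Because the two key columns have different names, both are kept in the result even though they hold identical values after an inner join. A small sketch of tidying this up; the final column name `key` is chosen here for illustration only.

```python
import pandas as pd

obj1 = pd.DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                     'data1': range(7)})
obj2 = pd.DataFrame({'rkey': ['a', 'b', 'd'], 'data2': range(3)})

merged = pd.merge(obj1, obj2, left_on='lkey', right_on='rkey')
print(merged.columns.tolist())   # ['lkey', 'data1', 'rkey', 'data2']

# In an inner join the two key columns always agree row by row.
keys_agree = bool((merged['lkey'] == merged['rkey']).all())
print(keys_agree)                # True

# Drop the redundant copy and give the remaining key a neutral name.
tidy = merged.drop(columns='rkey').rename(columns={'lkey': 'key'})
print(tidy.columns.tolist())     # ['key', 'data1', 'data2']
```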
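【03x05】The how parameter
In the earlier examples, c and d and the data associated with them are missing from the result. By default, merge performs an inner join ('inner'), so the keys in the result are the intersection of the keys in the two objects. The other options are 'left', 'right', and 'outer'. Their meanings are:
'inner': inner join, uses only the keys present in both objects (intersection);
'outer': outer join, uses all the keys from both objects (union);
'left': left join, uses all the keys from the left object;
'right': right join, uses all the keys from the right object.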
>>> import pandas as pd
>>> obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
>>> obj2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})
>>> obj1
  key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   a      5
6   b      6
>>> obj2
  key  data2
0   a      0
1   b      1
2   d      2
>>> pd.merge(obj1, obj2, on='key', how='inner')
  key  data1  data2
0   b      0      1
1   b      1      1
2   b      6      1
3   a      2      0
4   a      4      0
5   a      5      0
>>> pd.merge(obj1, obj2, on='key', how='outer')
  key  data1  data2
0   b    0.0    1.0
1   b    1.0    1.0
2   b    6.0    1.0
3   a    2.0    0.0
4   a    4.0    0.0
5   a    5.0    0.0
6   c    3.0    NaN
7   d    NaN    2.0
>>> pd.merge(obj1, obj2, on='key', how='left')
  key  data1  data2
0   b      0    1.0
1   b      1    1.0
2   a      2    0.0
3   c      3    NaN
4   a      4    0.0
5   a      5    0.0
6   b      6    1.0
>>> pd.merge(obj1, obj2, on='key', how='right')
  key  data1  data2
0   b    0.0      1
1   b    1.0      1
2   b    6.0      1
3   a    2.0      0
4   a    4.0      0
5   a    5.0      0
6   d    NaN      2
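To see which side each row of an outer join came from, merge accepts indicator=True, which adds a `_merge` column holding 'left_only', 'right_only', or 'both'. A minimal sketch on the same frames (row order may differ between pandas versions, so only counts and key values are inspected):

```python
import pandas as pd

obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                     'data1': range(7)})
obj2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

merged = pd.merge(obj1, obj2, on='key', how='outer', indicator=True)

# How many rows matched on both sides, and how many on one side only.
counts = merged['_merge'].value_counts().to_dict()
print(counts['both'], counts['left_only'], counts['right_only'])   # 6 1 1

# Keys that an inner join would have dropped.
left_only_keys = merged.loc[merged['_merge'] == 'left_only', 'key'].tolist()
right_only_keys = merged.loc[merged['_merge'] == 'right_only', 'key'].tolist()
print(left_only_keys, right_only_keys)   # ['c'] ['d']
```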
>>> import pandas as pd
>>> left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'], 'key2': ['one', 'two', 'one'], 'lval': [1, 2, 3]})
>>> right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'], 'key2': ['one', 'one', 'one', 'two'], 'rval': [4, 5, 6, 7]})
>>> left
  key1 key2  lval
0  foo  one     1
1  foo  two     2
2  bar  one     3
>>> right
  key1 key2  rval
0  foo  one     4
1  foo  one     5
2  bar  one     6
3  bar  two     7
>>> pd.merge(left, right, on='key1')
  key1 key2_x  lval key2_y  rval
0  foo    one     1    one     4
1  foo    one     1    one     5
2  foo    two     2    one     4
3  foo    two     2    one     5
4  bar    one     3    one     6
5  bar    one     3    two     7
>>> pd.merge(left, right, on='key1', suffixes=('_left', '_right'))
  key1 key2_left  lval key2_right  rval
0  foo       one     1        one     4
1  foo       one     1        one     5
2  foo       two     2        one     4
3  foo       two     2        one     5
4  bar       one     3        one     6
5  bar       one     3        two     7
In the following example, the join uses the key column of left, while the join key of right is in its index, so right_index=True must be specified:
>>> import pandas as pd
>>> left = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
>>> right = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])
>>> left
  key  value
0   a      0
1   b      1
2   a      2
3   a      3
4   b      4
5   c      5
>>> right
   group_val
a        3.5
b        7.0
>>> pd.merge(left, right, left_on='key', right_index=True)
  key  value  group_val
0   a      0        3.5
2   a      2        3.5
3   a      3        3.5
1   b      1        7.0
4   b      4        7.0
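The same column-to-index lookup can also be written with DataFrame.join, whose on parameter names the column of the calling frame to match against the other frame's index. The sketch below compares the two on the data above. Note that join defaults to how='left', so how='inner' is passed to mirror merge's default.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

merged = pd.merge(left, right, left_on='key', right_index=True)
joined = left.join(right, on='key', how='inner')

# Compare after sorting by row label, since row order is not the point here.
print(merged.sort_index()['group_val'].tolist())   # [3.5, 7.0, 3.5, 3.5, 7.0]
print(joined.sort_index()['group_val'].tolist())   # [3.5, 7.0, 3.5, 3.5, 7.0]

# With join's default (left join), the unmatched key 'c' is kept with NaN.
kept_all = left.join(right, on='key')
print(kept_all['group_val'].isna().tolist())
# [False, False, False, False, False, True]
```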