既然表已经创建成功,我们就可以开始插入数据了,先从tushare.pro上面获取沪深300指数日线行情数据,用INSERT INTO这个SQL语句插入。
import psycopg2
import pandas as pd
import tushare as ts
con = psycopg2.connect(database="postgres"
, user="postgres", password="", host="127.0.0.1", port="5432")
print("Database opened successfully")
cur = con.cursor()
pro = ts.pro_api()
df = pro.index_daily(ts_code='399300.SZ', start_date='20190501', end_date='20190531')
ts_code = df['ts_code'].tolist()
trade_date = df['trade_date'].tolist()
open_p = df['open'].tolist()
high_p = df['high'].tolist()
low_p = df['low'].tolist()
close_p = df['close'].tolist()
pre_close = df['pre_close'].tolist()
pct_chg = df['pct_chg'].tolist()
count = 0
for i in range(len(ts_code)):
cur.execute("""
INSERT INTO 沪深300指数日线行情 (ts_code, trade_date, open_p, high_p, low_p, close_p, pre_close, pct_chg)
VALUES( %s, %s, %s, %s, %s, %s, %s, %s);""",
(ts_code[i],
trade_date[i],
open_p[i],
high_p[i],
low_p[i],
close_p[i],
pre_close[i],
pct_chg[i]))
con.commit()
print("已插入{0}行,共有{1}行".format(count, len(ts_code)))
count += 1
同理,将tushare.pro里面的沪深股票前复权通用行情数据插入表“沪深股票qfq日线行情”,示例中只插入两只股票,平安银行'000001.SZ' 和万科A'000002.SZ'。
这里我们介绍另一种存储数据的方法,直接用Pandas自带的df.to_sql(),将获取的DataFrame一次性插入到数据库中,比上面介绍的先建表,再一行行插入的方法要简洁很多。
from sqlalchemy import create_engine
import pandas as pd
import tushare as ts
ts.set_token('your token')
engine = create_engine('postgresql://postgres:password@localhost:5432/postgres')
print('Database opened successfully')
pro = ts.pro_api()
code_list = ['000001.SZ', '000002.SZ']
for i in code_list:
print(i)
df = ts.pro_bar(ts_code=i, adj='qfq', start_date='20190501', end_date='20190531')
df.to_sql(name='沪深股票qfq日线行情', con=engine, index=False, if_exists='append')
值得注意的一点是,这种方法在数据量小的时候一般不会出问题,但当数据量很大时,可能会因服务器无法响应而报错。这时,需要设置参数值chunksize,限制每次插入的行数。更多有关参数的说明,可到官方文档查看【1】。
有了数据,我们就可以用SQL对数据库进行一系列的操作了。
我们可以用Pandas自带的.read_sql()方法获取数据,直接返回的是DataFrame格式,非常方便,详细的参数解析请查看官方文档【2】。SQL的查询功能是很强大的,下面介绍常用的一些筛选条件。
选取某张表的特定几列:
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('postgresql://postgres:password@localhost:5432/postgres')
df_index = pd.read_sql("SELECT ts_code, trade_date, close_p FROM 沪深300指数日线行情;", con=engine)
print(df_index.head())
ts_code trade_date close_p
0 399300.SZ 2019-05-31 3629.7893
1 399300.SZ 2019-05-30 3641.1833
2 399300.SZ 2019-05-29 3663.9090
3 399300.SZ 2019-05-28
3672.2605
4 399300.SZ 2019-05-27 3637.1971
用DISTINCT选取唯一值:
df = pd.read_sql("SELECT DISTINCT ts_code FROM 沪深股票qfq日线行情;", con=engine)
print(df)
ts_code
0 000001.SZ
1 000002.SZ
用COUNT计数:
df = pd.read_sql("SELECT COUNT(DISTINCT ts_code) FROM 沪深股票qfq日线行情;", con=engine)
print(df)
count
0 2
用WHERE语句筛选数值:
df = pd.read_sql("SELECT * FROM 沪深股票qfq日线行情 WHERE trade_date = '20190528';", con=engine)
print(df)
ts_code trade_date open_p ... close_p pre_close pct_chg
0 000001.SZ 2019-05-28 12.31 ... 12.49 12.37 0.97
1 000002.SZ 2019-05-28 27.00 ... 27.62 27.00 2.30
WHERE语句搭配AND和OR一起使用:
df = pd.read_sql("SELECT ts_code, trade_date FROM 沪深股票qfq日线行情 WHERE (trade_date '20190510' OR trade_date > '20190520') AND pct_chg > 1;", con=engine)
print(df)
ts_code trade_date
0 000001.SZ 2019-05-21
1 000002.SZ 2019-05-28
2 000002.SZ 2019-05-07
和WHERE语句类似,BETWEEN也可以搭配AND和OR一起使用:
df = pd.read_sql("SELECT ts_code, trade_date FROM 沪深股票qfq日线行情 WHERE trade_date BETWEEN '20190510' AND '20190520' AND pct_chg > 1;", con=engine)
print(df)
ts_code trade_date
0 000001.SZ 2019-05-15
1 000001.SZ 2019-05-14
2 000001.SZ 2019-05-10
3 000002.SZ 2019-05-15
4 000002.SZ 2019-05-10
WHERE和IN的组合,可以简化WHERE结合多个OR进行筛选的代码:
df = pd.read_sql("SELECT ts_code, trade_date FROM 沪深股票qfq日线行情 WHERE trade_date IN ('20190510', '20190520', '20190527');", con=engine)
print(df)
ts_code trade_date
0 000001.SZ 2019-05-27
1 000001.SZ 2019-05-20
2 000001.SZ 2019-05-10
3 000002.SZ 2019-05-27
4 000002.SZ 2019-05-20
5 000002.SZ 2019-05-10
NULL的意思是空值,IS NULL代表是空值,IS NOT NULL代表不是空值:
df = pd.read_sql("SELECT COUNT(*) FROM 沪深股票qfq日线行情 WHERE close_p IS NULL ;", con=engine)
print(df)
count
0 0
可以用聚合函数对数据做一些计算,如平均值AVG(),最大值MAX(),求和SUM():
df = pd.read_sql("SELECT AVG(close_p) FROM 沪深300指数日线行情;", con=engine)
print(df)
avg
0 3659.63762
聚合函数也可以和WHERE语句结合进行筛选:
df = pd.read_sql("SELECT AVG(close_p) FROM 沪深300指数日线行情 WHERE trade_date > '20190515';", con=engine)
print(df)
avg
0 3645.740858
用AS为新列命名:
df = pd.read_sql("""SELECT MAX(close_p) AS max_close_p,
MAX(open_p) AS max_open_p FROM 沪深300指数日线行情;""", con=engine)
print(df)
max_close_p max_open_p
0 3743.9635 3775.0765
ORDER BY排序,默认为升序,降序需要在末尾加上DESC:
df = pd.read_sql("""SELECT ts_code, trade_date FROM 沪深300指数日线行情 ORDER BY trade_date;""", con=engine)
print(df)
df = pd.read_sql("""SELECT ts_code, trade_date FROM 沪深300指数日线行情 ORDER BY trade_date DESC;""", con=engine)
print(df)
ORDER BY也可以根据多个列进行排序:
df = pd.read_sql("""SELECT trade_date, ts_code FROM 沪深股票qfq日线行情 ORDER BY trade_date, ts_code;""", con=engine)
print(df)
trade_date ts_code
0 2019-05-06 000001.SZ
1 2019-05-06 000002.SZ
2 2019-05-07 000001.SZ
3 2019-05-07 000002.SZ
4
2019-05-08 000001.SZ
GROUP BY进行分组,并结合聚合函数分组计算数据:
df = pd.read_sql("""SELECT ts_code, COUNT(*) FROM 沪深股票qfq日线行情 GROUP BY ts_code;""", con=engine)
print(df)
ts_code count
0 000001.SZ 20
1 000002.SZ 20
如果要在分组GROUP BY的基础上再增加聚合函数筛选条件,可用HAVING:
df = pd.read_sql("""SELECT ts_code FROM 沪深股票qfq日线行情 GROUP BY ts_code HAVING COUNT(*) > 15 ;""", con=engine)
print(df)
ts_code
0 000001.SZ
1 000002.SZ
LIMIT限制取出的行数:
df = pd.read_sql("""SELECT * FROM 沪深股票qfq日线行情 LIMIT 3;""", con=engine)
print(df)
ts_code trade_date open_p ... close_p pre_close pct_chg
0 000001.SZ 2019-05-31 12.16 ... 12.18
12.22 -0.33
1 000001.SZ 2019-05-30 12.32 ... 12.22 12.40 -1.45
2 000001.SZ 2019-05-29 12.36 ... 12.40 12.49 -0.72
03
总结
本文介绍了数据库系统的优势,如何用Python连接数据库并用SQL进行后续的查询操作。
SQL是非常强大的查询语言,在使用Python对数据进行分析之前,可以通过筛选精准地获取想要的数据。
Python和SQL的组合能够大大提升数据分析的效率和质量,希望大家可以好好学习和利用起来!
http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html【1】http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html#pandas.read_sql【2】
https://www.datacamp.com/courses/intro-to-sql-for-data-science【Datacamp】
https://stackabuse.com/working-with-postgresql-in-python/【Working with PostgreSQL in Python】
利用Python创建不同资产组合的基本框架
利用Python玩转PDF,简单实用
利用python进行蒙特卡罗模拟
利用Python实现摘要自动提取,完美瘦身只需一行代码
利用Python搞定期货数据,获得免费API
基于Python的交互式可视化工具