
Interesting and Useful Python Libraries

Python网络爬虫与数据挖掘 • 4 years ago • 299 views

Source: 苏生不惑

segmentfault.com/a/1190000010103386


Image processing


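pip install pillow numpy

from PIL import Image
import numpy as np

# Load the image as an array of 8-bit channel values
a = np.array(Image.open("test.jpg"))
# Invert the colors: each channel value v becomes 255 - v
b = 255 - a
im = Image.fromarray(b.astype("uint8"))
im.save("new.jpg")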
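
The inversion above replaces every channel value v in the range 0..255 with 255 - v. A minimal pure-Python sketch of the same arithmetic follows; the `invert_pixels` helper and the 2x2 image are made up for illustration and are not part of Pillow or NumPy.

```python
def invert_pixels(pixels):
    """Return a new image where every RGB channel v becomes 255 - v.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    This helper is illustrative only; it is not part of Pillow or NumPy.
    """
    return [[tuple(255 - c for c in px) for px in row] for row in pixels]


# A made-up 2x2 image: black, white, pure red, mid grey.
image = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (128, 128, 128)],
]
inverted = invert_pixels(image)
print(inverted)
```

Applying the function twice returns the original image, which is a quick sanity check for any inversion routine.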


Downloading videos from overseas sites with youtube-dl


pip install youtube-dl  # install youtube-dl
pip install -U youtube-dl  # install youtube-dl, or upgrade it if already installed
youtube-dl "http://www.youtube.com/watch?v=-wNyEUrxzFU"


Inspecting all attributes and methods of an object


pip install pdir2

>>> import pdir,requests

>>> pdir(requests)

module attribute:

__cached__, __file__, __loader__, __name__, __package__, __path__, __spec__

other:

__author__, __build__, __builtins__, __copyright__, __license__, __title__,

__version__, _internal_utils, adapters, api, auth, certs, codes, compat, cookies, exceptions, hooks, logging, models, packages, pyopenssl, sessions, status_codes, structures, utils, warnings

special attribute:

__doc__
class:

NullHandler: This handler does nothing. It's intended to be used to avoid the

PreparedRequest: The fully mutable :class:`PreparedRequest <PreparedRequest>` object,

Request: A user-created :class:`Request <Request>` object.

Response: The :class:`Response <Response>` object, which contains a

Session: A Requests session.

exception:

ConnectTimeout: The request timed out while trying to connect to the remote server.

ConnectionError: A Connection error occurred.

DependencyWarning: Warned when an attempt is made to import a module with missing optional

FileModeWarning: A file was opened in text mode, but Requests determined its binary length.

HTTPError: An HTTP error occurred.

ReadTimeout: The server did not send any data in the allotted amount of time.

RequestException: There was an ambiguous exception that occurred while handling your

Timeout: The request timed out.

TooManyRedirects: Too many redirects.

URLRequired: A valid URL is required to make a request.

function:

delete: Sends a DELETE request.

get: Sends a GET request.

head: Sends a HEAD request.

options: Sends a OPTIONS request.

patch: Sends a PATCH request.

post: Sends a POST request.

put: Sends a PUT request.

request: Constructs and sends a :class:`Request <Request>`.

session: Returns a :class:`Session` for context-management.
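
The grouped listing above can be approximated with the standard library alone. The sketch below is not how pdir2 is implemented; `group_attributes` is a hypothetical helper that sorts the public names of a module into rough categories using `inspect`, and it is demonstrated on the built-in `json` module so that it runs without any third-party package.

```python
import inspect
import json


def group_attributes(obj):
    """Group the public attributes of `obj` into rough categories."""
    groups = {"class": [], "exception": [], "function": [], "other": []}
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and dunder names
        value = getattr(obj, name)
        if inspect.isclass(value) and issubclass(value, BaseException):
            groups["exception"].append(name)
        elif inspect.isclass(value):
            groups["class"].append(name)
        elif inspect.isfunction(value) or inspect.isbuiltin(value):
            groups["function"].append(name)
        else:
            groups["other"].append(name)
    return groups


groups = group_attributes(json)
for kind, names in groups.items():
    print(kind + ":", ", ".join(names))
```

Unlike pdir2, this sketch prints no docstring summaries and ignores names that start with an underscore.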


Working with NetEase Cloud Music from Python


pip install ncmbot

import ncmbot

# Log in

bot = ncmbot.login(phone="xxx", password="yyy")

bot.content # bot.json()

# Get a user's playlists

ncmbot.user_play_list(uid="36554272")


Downloading video subtitles


pip install getsub



A Python financial data interface package





    
pip install tushare

import tushare as ts

# Fetch the trading data of all stocks for the most recent trading day in one call

ts.get_today_all()



Columns: code, name, change percent, current price, open, high, low, previous close, volume, turnover rate

    code    name      changepercent  trade  open   high   low    settlement
0   002738  中矿资源  10.023         19.32  19.32  19.32  19.32  17.56
1   300410  正业科技  10.022         25.03  25.03  25.03  25.03  22.75
2   002736  国信证券  10.013         16.37  16.37  16.37  16.37  14.88
3   300412  迦南科技  10.010         31.54  31.54  31.54  31.54  28.67
4   300411  金盾股份  10.007         29.68  29.68  29.68  29.68  26.98
5   603636  南威软件  10.006         38.15  38.15  38.15  38.15  34.68
6   002664  信质电机  10.004         30.68  29.00  30.68  28.30  27.89
7   300367  东方网力  10.004         86.76  78.00  86.76  77.87  78.87
8   601299  中国北车  10.000         11.44  11.44  11.44  11.29  10.40
9   601880  大连港    10.000         5.72   5.34   5.72   5.22   5.20
10  000856  冀东装备  10.000         8.91   8.18   8.91   8.18   8.10
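
In the sample rows, `changepercent` appears to be the change of `trade` (current price) relative to `settlement` (previous close), expressed as a percentage. That relationship is inferred from the numbers shown here, not taken from the tushare documentation. A short check against the first three rows, using a hypothetical `change_percent` helper:

```python
def change_percent(trade, settlement):
    """Percentage change of the current price relative to the previous close."""
    return (trade - settlement) / settlement * 100


# (code, trade, settlement, reported changepercent) copied from the sample output
rows = [
    ("002738", 19.32, 17.56, 10.023),
    ("300410", 25.03, 22.75, 10.022),
    ("002736", 16.37, 14.88, 10.013),
]
for code, trade, settlement, reported in rows:
    computed = round(change_percent(trade, settlement), 3)
    print(code, computed, reported)
```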


An open-source vulnerability testbed


# Install pip
curl -s https://bootstrap.pypa.io/get-pip.py | python3

# Install Docker
apt-get update && apt-get install docker.io

# Start the Docker service
service docker start

# Install Compose
pip install docker-compose

# Fetch the project
git clone git@github.com:phith0n/vulhub.git
cd vulhub

# Enter the directory of a particular vulnerability/environment
cd nginx_php5_mysql

# Build the environment automatically
docker-compose build

# Start the whole environment
docker-compose up -d

# When testing is finished, remove the whole environment
docker-compose down


Beijing real-time bus data


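pip install -r requirements.txt  # install the dependencies

python manage.py build_cache  # fetch the offline data and build a local cache

# The project ships with a terminal query tool as an example; run it with: python manage.py cli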

>>> from beijing_bus import BeijingBus

>>> lines = BeijingBus.get_all_lines()

>>> lines

[122(农业展览馆-华纺易城公交场站)>, 101(广顺南大街北口-蓝龙家园)>, ...]

>>> lines = BeijingBus.search_lines('847')

>>> lines

[847(马甸桥西-雷庄村)>, 847(雷庄村-马甸桥西)>]

>>> line = lines[0]

>>> print line.id, line.name

541 847(马甸桥西-雷庄村)

>>> line.stations

[, , , ...]

>>> station = line.stations[0]

>>> print station.name, station.lat, station.lon

马甸桥西 39.967721 116.372921

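>>> line.get_realtime_data(1) # the argument is the station's sequence number, starting from 1

[
    {
        'id': bus id,
        'lat': bus position (latitude),
        'lon': bus position (longitude),
        'next_station_name': name of the next station,
        'next_station_num': sequence number of the next station,
        'next_station_distance': distance to the next station,
        'next_station_arriving_time': estimated arrival time at the next station,
        'station_distance': distance to this station,
        'station_arriving_time': estimated arrival time at this station,
    },
    ...
]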
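
Once the real-time data is available as a list of dictionaries in the shape described above, picking the bus closest to the queried station is a one-liner. The sketch below uses a hypothetical `nearest_bus` helper and made-up sample values; the unit of `station_distance` is not stated in the original, so the numbers are placeholders.

```python
def nearest_bus(buses):
    """Return the bus dict with the smallest 'station_distance', or None."""
    if not buses:
        return None
    return min(buses, key=lambda bus: bus["station_distance"])


# Made-up sample in the shape described above; values and units are hypothetical.
sample = [
    {"id": 1, "next_station_name": "A", "station_distance": 1200},
    {"id": 2, "next_station_name": "B", "station_distance": 350},
    {"id": 3, "next_station_name": "C", "station_distance": 2600},
]
closest = nearest_bus(sample)
print(closest["id"], closest["station_distance"])
```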


Article extractor


git clone https://github.com/grangier/python-goose.git

cd python-goose

pip install -r requirements.txt

python setup.py install



>>> from goose import Goose

>>> from goose.text import StopWordsChinese

>>> url = 'http://www.bbc.co.uk/zhongwen/simp/chinese_news/2012/12/121210_hongkong_politics.shtml'

>>> g = Goose({'stopwords_class': StopWordsChinese})

>>> article = g.extract(url=url)

>>> print article.cleaned_text[:150]

香港行政长官梁振英在各方压力下就其大宅的违章建筑(僭建)问题到立法会接受质询,并向香港民众道歉。

梁振英在星期二(1210日)的答问大会开始之际在其演说中道歉,但强调他在违章建筑问题上没有隐瞒的意图和动机。

一些亲北京阵营议员欢迎梁振英道歉,且认为应能获得香港民众接受,但这些议员也质问梁振英有


An artistic QR code generator in Python


pip install MyQR

myqr https://github.com

myqr https://github.com -v 10 -l Q



Spoofing the browser identity (User-Agent)


pip install fake-useragent

from fake_useragent import UserAgent

ua = UserAgent()



ua.ie

# Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);

ua.msie

# Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)

ua['Internet Explorer']

# Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)

ua.opera

# Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11

ua.chrome

# Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2
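
A generated string only helps once it is sent as the `User-Agent` request header. The sketch below shows one way to attach it with the standard library's `urllib.request`; it builds the request without sending it, so no network access is needed. The `build_request` helper is hypothetical, and the hard-coded string is simply the `ua.chrome` sample shown above.

```python
import urllib.request

# One of the sample strings shown above; a real script would take it from ua.chrome.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 "
    "(KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2"
)


def build_request(url, user_agent):
    """Build (but do not send) a request carrying the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://httpbin.org/get", CHROME_UA)
# urllib normalises header names, so the key is stored as "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header.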


A prettier curl

pip install httpstat

httpstat httpbin.org/get



Shell commands from Python


pip install sh

from sh import ifconfig

print(ifconfig("eth0"))
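
For comparison, the same idea (run a command, get its output back as a string) can be sketched with the standard library's `subprocess` module. This is not how `sh` works internally; `run_command` is a hypothetical helper, and the current Python interpreter stands in for `ifconfig` so the example runs on any machine.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of strings and return its stdout as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# Use the current Python interpreter as the command so the sketch runs anywhere.
output = run_command([sys.executable, "-c", "print('hello from a subprocess')"])
print(output.strip())
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError` instead of being silently ignored.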


Processing Chinese text


pip install -U textblob  # sentiment analysis for English text

pip install snownlp  # sentiment analysis for Chinese text

from snownlp import SnowNLP

from textblob import TextBlob

text = "I am happy today. I feel sad today."

blob = TextBlob(text)

blob
# TextBlob("I am happy today. I feel sad today.")

blob.sentiment
# Sentiment(polarity=0.15000000000000002, subjectivity=1.0)





s = SnowNLP(u'这个东西真心很赞')

s.words # [u'这个', u'东西', u'真心',
        #  u'很', u'赞']

s.tags # [(u'这个', u'r'), (u'东西', u'n'),
       #  (u'真心', u'd'), (u'很', u'd'),
       #  (u'赞', u'Vg')]

s.sentiments # 0.9769663402895832, the probability that the text is positive

s.pinyin # [u'zhe', u'ge', u'dong', u'xi',
         #  u'zhen', u'xin', u'hen', u'zan']

s = SnowNLP(u'「繁體字」「繁體中文」的叫法在臺灣亦很常見。')

s.han # u'「繁体字」「繁体中文」的叫法
      #   在台湾亦很常见。'


Crawling and distributing proxies


pip install -U getproxy

➜ ~ getproxy --help

Usage: getproxy [OPTIONS]



Options:

--in-proxy TEXT Input proxy file

--out-proxy TEXT Output proxy file

--help Show this message and exit.


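  • --in-proxy: optional; a file listing the proxies to be validated

  • --out-proxy: optional; the file to which validated proxies are written. If it is empty, the results are printed directly to the terminal

  • The --in-proxy file and the --out-proxy file use the same format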


Zhihu API


pip install git+git://github.com/lzjun567/zhihu-api --upgrade

from zhihu import Zhihu

zhihu = Zhihu()

zhihu.user(user_slug="xiaoxiaodouzi")



{'avatar_url_template': 'https://pic1.zhimg.com/v2-ca13758626bd7367febde704c66249ec_{size}.jpg',
 'badge': [],
 'name': '我是小号',
 'headline': '程序员',
 'gender': -1,
 'user_type': 'people',
 'is_advertiser': False,
 'avatar_url': 'https://pic1.zhimg.com/v2-ca13758626bd7367febde704c66249ec_is.jpg',
 'url': 'http://www.zhihu.com/api/v4/people/1da75b85900e00adb072e91c56fd9149',
 'type': 'people',
 'url_token': 'xiaoxiaodouzi',
 'id': '1da75b85900e00adb072e91c56fd9149',
 'is_org': False}
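
In the response above, `avatar_url` is the `avatar_url_template` with its `{size}` placeholder filled in as `is`. A minimal sketch of that substitution follows; the `avatar_url` helper is hypothetical, and which other size values the server accepts is not stated in the original.

```python
def avatar_url(template, size):
    """Fill the {size} placeholder of an avatar URL template."""
    return template.replace("{size}", size)


# Template copied from the sample response above.
template = "https://pic1.zhimg.com/v2-ca13758626bd7367febde704c66249ec_{size}.jpg"
print(avatar_url(template, "is"))
```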


A Python module for checking password leaks

pip install leakPasswd

import leakPasswd

leakPasswd.findBreach('taobao')



Parsing nginx access logs with formatted output


pip install ngxtop

$ ngxtop

running for 411 seconds, 64332 records processed: 156.60 req/sec



Summary:

| count | avg_bytes_sent | 2xx | 3xx | 4xx | 5xx |

|---------+------------------+-------+-------+-------+-------|

| 64332 | 2775.251 | 61262 | 2994 | 71 | 5 |



Detailed:

| request_path | count | avg_bytes_sent | 2xx | 3xx | 4xx | 5xx |

|------------------------------------------+---------+------------------+-------+-------+-------+-------|

| /abc/xyz/xxxx | 20946 | 434.693 | 20935 | 0 | 11 | 0 |

| /xxxxx.json | 5633 | 1483.723 | 5633 | 0 | 0 | 0 |

| /xxxxx/xxx/xxxxxxxxxxxxx | 3629 | 6835.499 | 3626 | 0 | 3 | 0 |

| /xxxxx/xxx/xxxxxxxx | 3627 | 15971.885 | 3623 | 0 | 4 | 0 |

| /xxxxx/xxx/xxxxxxx | 3624 | 7830.236 | 3621 | 0 | 3 | 0 |

| /static/js/minified/utils.min.js | 3031 | 1781.155 | 2104 | 927 | 0 | 0 |

| /static/js/minified/xxxxxxx.min.v1.js | 2889 | 2210.235 | 2068 | 821 | 0 | 0 |

| /static/tracking/js/xxxxxxxx.js | 2594 | 1325.681 | 1927 | 667 | 0 | 0 |

| /xxxxx/xxx.html | 2521 | 573.597 | 2520 | 0 | 1 | 0 |

| /xxxxx/xxxx.json | 1840 | 800.542 | 1839 | 0 | 1 | 0 |
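
The "Detailed" table is a per-path aggregation: a request count, the mean response size, and one counter per status class. The sketch below reproduces that aggregation on a few made-up records. It is not ngxtop's implementation; the `summarize` helper and the `(path, status, bytes_sent)` record shape are assumptions chosen for illustration.

```python
from collections import defaultdict

STATUS_CLASSES = ("2xx", "3xx", "4xx", "5xx")


def summarize(records):
    """Aggregate (path, status, bytes_sent) records per request path."""
    counts = defaultdict(int)
    total_bytes = defaultdict(int)
    by_class = defaultdict(lambda: dict.fromkeys(STATUS_CLASSES, 0))
    for path, status, bytes_sent in records:
        counts[path] += 1
        total_bytes[path] += bytes_sent
        status_class = "%dxx" % (status // 100)
        if status_class in STATUS_CLASSES:
            by_class[path][status_class] += 1
    summary = {}
    for path in counts:
        row = {"count": counts[path],
               "avg_bytes_sent": total_bytes[path] / counts[path]}
        row.update(by_class[path])
        summary[path] = row
    return summary


# Made-up records: (request path, HTTP status, bytes sent).
records = [
    ("/a", 200, 100),
    ("/a", 404, 300),
    ("/b", 301, 50),
]
summary = summarize(records)
for path, row in sorted(summary.items()):
    print(path, row)
```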


Checking remaining train tickets


pip install iquery

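Usage:

    iquery (-c|彩票)
    iquery (-m|电影)
    iquery -p <city>
    iquery -l song [singer]
    iquery -p <city> <hospital>
    iquery <city> <show> [<days>]
    iquery [-dgktz] <from> <to> <date>

Arguments:

    from        departure station
    to          arrival station
    date        query date

    city        city to query
    show        type of show
    days        look for shows within the next N days; defaults to 15 if omitted

    city        city name; given after -p, lists all Putian-affiliated hospitals in that city
    hospital    hospital name; given after city, checks whether that hospital is Putian-affiliated

Options:

    -h, --help  show this help menu
    -dgktz      train types: D-series (动车), high-speed (高铁), fast (快速), express (特快), direct (直达)
    -m          now-showing movies
    -p          Putian-affiliated hospital lookup
    -l          lyrics lookup
    -c          lottery lookup

Show (literal values accepted for the show argument):

    演唱会 音乐会 音乐剧 歌舞剧 儿童剧 话剧
    歌剧 比赛 舞蹈 戏曲 相声 杂技 马戏 魔术

    (concert, recital, musical, song-and-dance drama, children's play, play,
    opera, competition, dance, traditional opera, crosstalk, acrobatics, circus, magic)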


Article URL: http://www.python88.com/topic/37118
 