社区所有版块导航
Python
python开源   Django   Python   DjangoApp   pycharm  
DATA
docker   Elasticsearch  
aigc
aigc   chatgpt  
WEB开发
linux   MongoDB   Redis   DATABASE   NGINX   其他Web框架   web工具   zookeeper   tornado   NoSql   Bootstrap   js   peewee   Git   bottle   IE   MQ   Jquery  
机器学习
机器学习算法  
Python88.com
反馈   公告   社区推广  
产品
短视频  
印度
印度  
Py学习  »  Python

我用 Python 写了一个 PDF 转换器!

程序员大咖 • 2 年前 • 199 次点击  
👇👇关注后回复 “进群” ,拉你进程序员交流群👇👇
       

来自:CSDN,作者:不正经的kimol君

链接:https://blog.csdn.net/kimol_justdo/article/details/109267805



前言


一、思路分析

https://app.xunjiepdf.com

二、我的代码

导入相关库:

import time
import requests

定义PDF2Word类:

class PDF2Word():
    def __init__(self):
        self.machineid = 'ccc052ee5200088b92342303c4ea9399'
        self.token = ''
        self.guid = ''
        self.keytag = ''
    
    def produceToken(self):
        url = 'https://app.xunjiepdf.com/api/producetoken'
        headers = {
                'User-Agent''Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
                'Accept''application/json, text/javascript, */*; q=0.01',
                'Accept-Language''zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
                'Content-Type' 'application/x-www-form-urlencoded; charset=UTF-8',
                'X-Requested-With''XMLHttpRequest',
                'Origin''https://app.xunjiepdf.com',
                'Connection''keep-alive',
                'Referer''https://app.xunjiepdf.com/pdf2word/',}
        data = {'machineid':self.machineid}
        res = requests.post(url,headers=headers,data=data)
        res_json = res.json()
        if res_json['code'] == 10000:
            self.token = res_json['token']
            self.guid = res_json['guid']
            print('成功获取token')
            return True
        else:
            return False
    
    def uploadPDF(self,filepath):
        filename = filepath.split('/')[-1]
        files = {'file': open(filepath,'rb')}
        url = 'https://app.xunjiepdf.com/api/Upload'
        headers = {
                'User-Agent''Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
                'Accept''*/*',
                'Accept-Language''zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
                'Content-Type''application/pdf',
                'Origin''https://app.xunjiepdf.com',
                'Connection''keep-alive',
                'Referer''https://app.xunjiepdf.com/pdf2word/',}
        params = (
                ('tasktype''pdf2word'),
                ('phonenumber'''),
                ('loginkey'''),
                ('machineid', self.machineid),
                ('token', self.token),
                ('limitsize''2048'),
                ('pdfname', filename),
                ('queuekey', self.guid),
                ('uploadtime'''),
                ('filecount''1'),
                ('fileindex''1'),
                ('pagerange''all'),
                ('picturequality'''),
                ('outputfileextension''docx'),
                ('picturerotate''0,undefined'),
                ('filesequence''0,undefined'),
                ('filepwd'''),
                ('iconsize'''),
                ('picturetoonepdf'''),
                ('isshare''0'),
                ('softname''pdfonlineconverter'),
                ('softversion''V5.0'),
                ('validpagescount''20'),
                ('limituse''1'),
                ('filespwdlist'''),
                ('fileCountwater''1'),
                ('languagefrom'''),
                ('languageto'''),
                ('cadverchose'''),
                ('pictureforecolor'''),
                ('picturebackcolor'''),
                ('id''WU_FILE_1'),
                ('name', filename),
                ('type''application/pdf'),
                ('lastModifiedDate'''),
                ('size' ''),)
        res= requests.post(url,headers=headers,params=params,files=files)
        res_json = res.json()
        if res_json['message'] == '上传成功':
            self.keytag = res_json['keytag']
            print('成功上传PDF')
            return True
        else:
            return False
        
    def progress(self):
        url = 'https://app.xunjiepdf.com/api/Progress'
        headers = {
                'User-Agent''Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
                'Accept''text/plain, */*; q=0.01',
                'Accept-Language''zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
                'Content-Type''application/x-www-form-urlencoded; charset=UTF-8',
                'X-Requested-With''XMLHttpRequest',
                'Origin''https://app.xunjiepdf.com',
                'Connection''keep-alive',
                'Referer''https://app.xunjiepdf.com/pdf2word/',}
        data = {
              'tasktag': self.keytag,
              'phonenumber''',
              'loginkey''',
              'limituse''1'}
        res= requests.post(url,headers=headers,data=data)
        res_json = res.json()
        if res_json['message'] == '处理成功':
            print('PDF处理完成')
            return True
        else:
            print('PDF处理中')
            return False
        
    def downloadWord(self,output):
        url = 'https://app.xunjiepdf.com/download/fileid/%s'%self.keytag
        res = requests.get(url)
        with open(output,'wb'as f:
            f.write(res.content)
            print('PDF下载成功("%s")'%output)
            
    def convertPDF(self,filepath,outpath):
        filename = filepath.split('/')[-1]
        filename = filename.split('.')[0]+'.docx'
        self.produceToken()
        self.uploadPDF(filepath)
        while True:
            res = self.progress()
            if res == True:
                break
            time.sleep(1)
        self.downloadWord(outpath+filename)

执行主函数:

if __name__=='__main__':    
    pdf2word = PDF2Word()
    pdf2word.convertPDF('001.pdf','')

-End-

最近有一些小伙伴,让我帮忙找一些 面试题 资料,于是我翻遍了收藏的 5T 资料后,汇总整理出来,可以说是程序员面试必备!所有资料都整理到网盘了,欢迎下载!

点击👆卡片,关注后回复【面试题】即可获取

在看点这里好文分享给更多人↓↓

Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/151009