Migrating from Python 2 to Python 3 in Practice (Part 1) - pyupgrade

Preface

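Starting with this post, I will write from time to time about my experience migrating project code from Python 2.7 to the latest Python 3.7 in real-world work.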

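This post introduces pyupgrade, a tool that rewrites Python 2 syntax in your code to the modern equivalents. It can also run as a pre-commit hook, so that commits or pushes introducing old-style usage are rejected.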

Why do we need a tool like this? Three reasons:

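Replace usage left over from older Python versions, for example the percent-style string formatting '%s %s' % (a, b).
Switch to new Python 3 syntax. For example, in Python 3 super() no longer needs the class and self passed in, and from Python 3.6 onward string formatting can use f-strings directly.
After the migration there is no need to support Python 2, so uses of the six module should be removed and the code written directly in Python 3.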

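The Python projects I maintain day to day range from a few thousand to over a million lines of code. You can imagine how enormous the job would be if the replacements had to be made by hand.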

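Until now the Python world only had the lib2to3 module and its derivatives (which I will cover separately later), and their results are limited. pyupgrade is a good complement, so let's look at what it implements.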

Sets

set(())              # set()
set([])              # set()
set((1,))            # {1}
set((1, 2))          # {1, 2}
set([1, 2])          # {1, 2}
set(x for x in y)    # {x for x in y}
set([x for x in y])  # {x for x in y}

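The code on the left is the original, and the comment after the hash sign shows the result after rewriting. The set rewrites only unify style; the forms on the left are still valid in Python 3.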
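As a quick sanity check that these rewrites only change spelling and not meaning, here is a minimal sketch comparing each call form with the literal or comprehension it becomes. The sample data `y` is made up for illustration.

```python
# Hypothetical sample data with a duplicate, so deduplication is visible.
y = [3, 1, 3, 2]

empty = set(())                   # stays a call: {} would be an empty dict
pair_call = set([1, 2])
pair_literal = {1, 2}
comp_call = set([x for x in y])   # builds a temporary list first
comp_literal = {x for x in y}     # builds the set directly

print(empty == set(), pair_call == pair_literal, comp_call == comp_literal)
```

The literal and comprehension forms also skip the intermediate tuple or list, which is a small side benefit of the unified style.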

Dictionary comprehensions

dict((a, b) for a, b in y)    # {a: b for a, b in y}
dict([(a, b) for a, b in y])  # {a: b for a, b in y}

As above, this only unifies style.
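The same kind of check works for dictionaries. This sketch assumes `y` is an iterable of key/value pairs; the data is invented for the example.

```python
# Hypothetical key/value pairs.
y = [("a", 1), ("b", 2)]

from_generator = dict((a, b) for a, b in y)
from_list = dict([(a, b) for a, b in y])
comprehension = {a: b for a, b in y}

print(from_generator == from_list == comprehension)
```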

Python 2.7+ format specifiers

'{0} {1}'.format(1, 2)    # '{} {}'.format(1, 2)
'{0}' '{1}'.format(1, 2)  # '{}' '{}'.format(1, 2)

Since Python 2.7, positional indexes no longer have to be written out explicitly.
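A small sketch of why dropping the indexes is safe when they simply count up from zero, and why it would not be safe otherwise. The values are arbitrary.

```python
explicit = '{0} {1}'.format(1, 2)
automatic = '{} {}'.format(1, 2)

# Indexes that are out of order carry meaning and cannot simply be dropped.
reordered = '{1} {0}'.format(1, 2)

print(explicit, automatic, reordered)
```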

Using str.format instead of printf-style string formatting

'%s %s' % (a, b)                  # '{} {}'.format(a, b)
'%r %2f' % (a, b)                 # '{!r} {:2f}'.format(a, b)
'%(a)s %(b)s' % {'a': 1, 'b': 2}  # '{a} {b}'.format(a=1, b=2)

The form on the right is the style recommended since Python 2.7. You can pass --keep-percent-format to skip this kind of rewrite.
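To see that the two styles produce the same text for the cases listed above, here is a minimal comparison. The values of `a` and `b` are chosen arbitrarily for illustration.

```python
# Arbitrary sample values.
a, b = 'x', 1.5

percent_simple = '%s %s' % (a, b)
format_simple = '{} {}'.format(a, b)

# %r maps to the !r conversion, %2f maps to the :2f format spec.
percent_spec = '%r %2f' % (a, b)
format_spec = '{!r} {:2f}'.format(a, b)

# A mapping on the right-hand side becomes keyword arguments.
percent_named = '%(a)s %(b)s' % {'a': 1, 'b': 2}
format_named = '{a} {b}'.format(a=1, b=2)

print(percent_spec, format_spec)
```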

Unicode literals

u'foo'      # 'foo'
u"foo"      # 'foo'
u'''foo'''  # '''foo'''

In Python 3, u'foo' is already the same str as 'foo'. By default pyupgrade leaves these literals alone; it only strips the prefix when you pass --py3-plus or --py36-plus:

❯ cat unicode_literals.py
u'foo'      # 'foo'
u"foo"      # 'foo'
u'''foo'''  # '''foo'''

❯ pyupgrade --py36-plus unicode_literals.py
Rewriting unicode_literals.py

❯ cat unicode_literals.py
'foo'      # 'foo'
"foo"      # 'foo'
'''foo'''  # '''foo'''
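The prefix is safe to drop because, on Python 3, it does not change the resulting object. A two-line check:

```python
prefixed = u'foo'
plain = 'foo'

# On Python 3 both literals produce an ordinary str with the same value.
print(type(prefixed).__name__, prefixed == plain)
```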

Invalid escape sequences

flake8 now reports this kind of error (W605):

# strings with only invalid sequences become raw strings
'\d'    # r'\d'
# strings with mixed valid / invalid sequences get escaped
'\n\d'  # '\n\\d'
# `ur` is not a valid string prefix in python3
u'\d'   # u'\\d'

❯ cat escape_seq.py
'\d'    # r'\d'

❯ flake8 escape_seq.py
escape_seq.py:1:2: W605 invalid escape sequence '\d'

❯ pyupgrade escape_seq.py
Rewriting escape_seq.py

❯ cat escape_seq.py
r'\d'    # r'\d'
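The rewrite keeps the string's value intact: r'\d' and '\\d' are two spellings of the same two characters. The sketch below also compiles a snippet containing the invalid escape to show that the interpreter itself complains. The warning category depends on the Python version, so the code only records whatever is raised.

```python
import warnings

# Both spell a backslash followed by the letter d.
raw = r'\d'
escaped = '\\d'

# Source text of a literal with an invalid escape: '\d'
source = "'\\d'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    compile(source, '<example>', 'eval')

# Category varies by interpreter version, so just collect the names.
warning_names = [w.category.__name__ for w in caught]
print(raw == escaped, warning_names)
```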

`is` / `is not`

Starting with Python 3.8, comparing against a literal with is / is not emits a SyntaxWarning; use == / != instead:

❯ python
Python 3.8.0a4+ (heads/master:289f1f80ee, May  9 2019, 07:16:38)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 1 is 1
<stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
True
>>> 1 is not 1
<stdin>:1: SyntaxWarning: "is not" with a literal. Did you mean "!="?
False
>>>

pyupgrade makes the following replacements:

x is 5      # x == 5
x is not 5  # x != 5
x is 'foo'  # x == 'foo'
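The reason for the warning is that `is` tests object identity, not value. The sketch below builds two equal integers at runtime. Whether they end up being the same object is an implementation detail, so the code prints the identity result rather than relying on it.

```python
# Two integers with the same value, created separately at runtime.
a = int('1000')
b = int('1000')

same_value = (a == b)    # value comparison: always True here
same_object = (a is b)   # identity comparison: depends on the interpreter

print(same_value, same_object)
```

Code that happens to work with `is` on small numbers or short strings usually does so only because of interpreter caching, which is why `==` is the reliable choice.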

`ur` string literals

The ur'...' prefix is no longer valid in Python 3:

ur'foo'         # u'foo'
ur'\s'          # u'\\s'
# unicode escapes are left alone
ur'\u2603'      # u'\u2603'
ur'\U0001f643'  # u'\U0001f643'

The L suffix on numbers

In Python 2, long integers carry an L suffix; Python 3 no longer supports it:

5L                            # 5
5l                            # 5
123456789123456789123456789L  # 123456789123456789123456789
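A short sketch showing both halves of that statement on Python 3: the suffixed form does not compile, and a plain int handles large values without it. The helper `is_valid_expression` is a name made up for this example.

```python
def is_valid_expression(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError:
        return False
    return True

# Python 3 ints have arbitrary precision, so no suffix is needed.
big = 123456789123456789123456789

print(is_valid_expression('5L'), is_valid_expression('5'), big + 1)
```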

Octal literals

The most common use is changing file permissions. In Python 2 you can write 0755 directly, but in Python 3 that is a syntax error:

# Python 2
In : import os

In : !touch 1.txt

In : os.chmod('1.txt', 0755)

In : ll 1.txt
-rwxr-xr-x 1 dongwm 0 May  9 07:26 1.txt*  # 755 permissions applied as expected

# Python 3
In : os.chmod('1.txt', 0644)
  File "", line 1
    os.chmod('1.txt', 0644)
                         ^
SyntaxError: invalid token


In : os.chmod('1.txt', 0o644)

In : ll 1.txt
-rw-r--r-- 1 dongwm 0 May  9 07:26 1.txt

pyupgrade fixes this for you:

0755  # 0o755
05    # 5
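For reference, here is a minimal Python 3 sketch of the same permission change using the 0o prefix. It works on a temporary file rather than 1.txt, and it assumes a POSIX system where chmod modes behave as shown in the session above.

```python
import os
import stat
import tempfile

# 0o755 is the Python 3 spelling of the octal literal 0755.
mode = 0o755
print(mode, oct(mode), int('755', 8))

# Apply a mode to a throwaway file and read it back (POSIX assumed).
with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o644)
    applied = stat.S_IMODE(os.stat(tmp.name).st_mode)

print(oct(applied))
```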

super()

class C(Base):
    def f(self):
        super(C, self).f()   # super().f()

In Python 3, super no longer needs the class and self passed in by hand. Passing --py3-plus or --py36-plus applies this fix.
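A runnable sketch of the two spellings side by side. The class names are invented for the example; both subclasses reach the same parent method.

```python
class Base:
    def f(self):
        return 'Base.f'


class Explicit(Base):
    def f(self):
        # Python 2 compatible form: class and instance passed explicitly.
        return 'Explicit -> ' + super(Explicit, self).f()


class ZeroArg(Base):
    def f(self):
        # Python 3 form: the compiler supplies the class and instance.
        return 'ZeroArg -> ' + super().f()


print(Explicit().f())
print(ZeroArg().f())
```

The zero-argument form also survives a class rename, since the class name is no longer repeated inside the method body.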

New-style classes

class C(object): pass     # class C: pass
class C(B, object): pass  # class C(B): pass

Python 3 has only new-style classes. Passing --py3-plus or --py36-plus applies this fix.
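On Python 3 the explicit `object` base is redundant, which a quick look at the method resolution order confirms. The class names below are placeholders.

```python
class WithObject(object):
    pass


class WithoutObject:
    pass


# Both classes inherit from object and are instances of type.
print(WithObject.__mro__[1:], WithoutObject.__mro__[1:])
```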

Removing six compatibility code

Once you have fully migrated to Python 3 there is no need to stay compatible with Python 2. Pass --py3-plus or --py36-plus to strip the six-related code:

six.text_type             # str
six.binary_type           # bytes
six.class_types           # (type,)
six.string_types          # (str,)
six.integer_types         # (int,)
six.unichr                # chr
six.iterbytes             # iter
six.print_(...)           # print(...)
six.exec_(c, g, l)        # exec(c, g, l)
six.advance_iterator(it)  # next(it)
six.next(it)              # next(it)
six.callable(x)           # callable(x)

from six import text_type
text_type                 # str

@six.python_2_unicode_compatible  # decorator is removed
class C:
    def __str__(self):
        return u'C()'

class C(six.Iterator): pass              # class C: pass

class C(six.with_metaclass(M, B)): pass  # class C(B, metaclass=M): pass

isinstance(..., six.class_types)    # isinstance(..., type)
issubclass(..., six.integer_types)  # issubclass(..., int)
isinstance(..., six.string_types)   # isinstance(..., str)

six.b('...')                            # b'...'
six.u('...')                            # '...'
six.byte2int(bs)                        # bs[0]
six.indexbytes(bs, i)                   # bs[i]
six.iteritems(dct)                      # dct.items()
six.iterkeys(dct)                       # dct.keys()
six.itervalues(dct)                     # dct.values()
six.viewitems(dct)                      # dct.items()
six.viewkeys(dct)                       # dct.keys()
six.viewvalues(dct)                     # dct.values()
six.create_unbound_method(fn, cls)      # fn
six.get_unbound_method(meth)            # meth
six.get_method_function(meth)           # meth.__func__
six.get_method_self(meth)               # meth.__self__
six.get_function_closure(fn)            # fn.__closure__
six.get_function_code(fn)               # fn.__code__
six.get_function_defaults(fn)           # fn.__defaults__
six.get_function_globals(fn)            # fn.__globals__
six.assertCountEqual(self, a1, a2)      # self.assertCountEqual(a1, a2)
six.assertRaisesRegex(self, e, r, fn)   # self.assertRaisesRegex(e, r, fn)
six.assertRegex(self, s, r)             # self.assertRegex(s, r)

The only piece not yet implemented is six.add_metaclass; everything else is covered.
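To give a feel for what the code looks like once six is gone, here is a small sketch written purely in Python 3, with the six spelling noted in comments. It does not import six; `Meta`, `Base`, and `C` are hypothetical names used only for illustration.

```python
class Meta(type):
    """Hypothetical metaclass that tags every class it creates."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.tagged = True
        return cls


class Base:
    pass


# six: class C(six.with_metaclass(Meta, Base))
class C(Base, metaclass=Meta):
    # six: @six.python_2_unicode_compatible is no longer needed
    def __str__(self):
        return 'C()'


dct = {'a': 1}
bs = b'abc'

items = list(dct.items())   # six: six.iteritems(dct)
first_byte = bs[0]          # six: six.byte2int(bs)
first = next(iter([10]))    # six: six.next(it)
is_text = isinstance('x', str)  # six: isinstance('x', six.string_types)

print(type(C).__name__, str(C()), items, first_byte, first, is_text)
```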

f-strings

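This is my favorite feature. Migrations to Python 3 now generally target Python 3.6+, so you can pass --py36-plus directly and string formatting no longer needs str.format; it uses f-strings instead: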

'{foo} {bar}'.format(foo=foo, bar=bar)  # f'{foo} {bar}'
'{} {}'.format(foo, bar)                # f'{foo} {bar}'
'{} {}'.format(foo.bar, baz.womp)       # f'{foo.bar} {baz.womp}'
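A runnable version of the last case, requiring Python 3.6 or later. The objects `foo` and `baz` are stand-ins built with SimpleNamespace so that the attribute access has something to resolve.

```python
from types import SimpleNamespace

# Stand-in objects with the attributes used in the example above.
foo = SimpleNamespace(bar='left')
baz = SimpleNamespace(womp='right')

with_format = '{} {}'.format(foo.bar, baz.womp)
with_fstring = f'{foo.bar} {baz.womp}'

print(with_format == with_fstring)
```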

Afterword

Project: https://github.com/asottile/pyupgrade

I have already applied pyupgrade to one of the largest projects at 酱厂, and it has reached production-ready quality, so feel free to use it.