PEP 8: Python Coding Style Reference (Part 1)

August 2, 2017. Original post.
Keywords: python
Summary: A translation of the full text of PEP 8. (In progress.)

Introduction

This document gives coding conventions for the Python code comprising the standard library in the main Python distribution. Please see the companion informational PEP describing style guidelines for the C code in the C implementation of Python [1].

This document and PEP 257 (Docstring Conventions) were adapted from Guido's original Python Style Guide essay, with some additions from Barry's style guide [2].

This style guide evolves over time as additional conventions are identified and past conventions are rendered obsolete by changes in the language itself.

Many projects have their own coding style guidelines. In the event of any conflicts, such project-specific guides take precedence for that project.

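Introduction

This document gives coding conventions for the Python code that makes up the standard library in the main Python distribution. See also the companion informational PEP that describes style guidelines for the C code in the C implementation of Python.

This document and PEP 257 (Docstring Conventions) were adapted from Guido's original Python Style Guide essay, with some additions from Barry's style guide.

This style guide evolves over time: new conventions are added as they are identified, and old ones are dropped when changes in the language make them obsolete.

Many projects have their own coding style guidelines. When those conflict with this document, the project-specific guide takes precedence for that project.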

A Foolish Consistency is the Hobgoblin of Little Minds

One of Guido's key insights is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. As PEP 20 says, "Readability counts".

A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.

However, know when to be inconsistent -- sometimes style guide recommendations just aren't applicable. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!

In particular: do not break backwards compatibility just to comply with this PEP!

Some other good reasons to ignore a particular guideline:

  1. When applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.
  2. To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style).
  3. Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.
  4. When the code needs to remain compatible with older versions of Python that don't support the feature recommended by the style guide.

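Foolish consistency is the hobgoblin of little minds

One of Guido's key insights is that code is read far more often than it is written. The guidelines here aim to improve the readability of code and to keep it consistent across the wide range of Python code. As PEP 20 puts it, "Readability counts".

A style guide is about consistency. Consistency with this style guide is important, consistency within a project is more important, and consistency within a single module or function is the most important.

Still, know when to be inconsistent: sometimes a style guide recommendation simply does not apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And do not hesitate to ask!

In particular: do not break backwards compatibility just to comply with this PEP!

Some other good reasons to ignore a particular guideline:

  1. Applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.
  2. The surrounding code also breaks the guideline (perhaps for historic reasons) and you want to stay consistent with it, although this is also an opportunity to clean up someone else's mess (in true XP style).
  3. The code in question predates the guideline and there is no other reason to modify it.
  4. The code needs to remain compatible with older versions of Python that do not support the feature the style guide recommends.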

Code lay-out

Indentation

Use 4 spaces per indentation level.

Continuation lines should align wrapped elements either vertically using Python's implicit line joining inside parentheses, brackets and braces, or using a hanging indent [7]. When using a hanging indent the following should be considered: there should be no arguments on the first line, and further indentation should be used to clearly distinguish itself as a continuation line.

Yes:

# Aligned with opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# More indentation included to distinguish this from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

# Hanging indents should add a level.
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

No:

# Arguments on first line forbidden when not using vertical alignment.
foo = long_function_name(var_one, var_two,
    var_three, var_four)

# Further indentation required as indentation is not distinguishable.
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)

The 4-space rule is optional for continuation lines.

Optional:

# Hanging indents *may* be indented to other than 4 spaces.
foo = long_function_name(
  var_one, var_two,
  var_three, var_four)

When the conditional part of an if-statement is long enough to require that it be written across multiple lines, it's worth noting that the combination of a two character keyword (i.e. if), plus a single space, plus an opening parenthesis creates a natural 4-space indent for the subsequent lines of the multiline conditional. This can produce a visual conflict with the indented suite of code nested inside the if-statement, which would also naturally be indented to 4 spaces. This PEP takes no explicit position on how (or whether) to further visually distinguish such conditional lines from the nested suite inside the if-statement. Acceptable options in this situation include, but are not limited to:

# No extra indentation.
if (this_is_one_thing and
    that_is_another_thing):
    do_something()

# Add a comment, which will provide some distinction in editors
# supporting syntax highlighting.
if (this_is_one_thing and
    that_is_another_thing):
    # Since both conditions are true, we can frobnicate.
    do_something()

# Add some extra indentation on the conditional continuation line.
if (this_is_one_thing
        and that_is_another_thing):
    do_something()

(Also see the discussion of whether to break before or after binary operators below.)

The closing brace/bracket/parenthesis on multi-line constructs may either line up under the first non-whitespace character of the last line of list, as in:

my_list = [
    1, 2, 3,
    4, 5, 6,
    ]
result = some_function_that_takes_arguments(
    'a', 'b', 'c',
    'd', 'e', 'f',
    )

or it may be lined up under the first character of the line that starts the multi-line construct, as in:

my_list = [
    1, 2, 3,
    4, 5, 6,
]
result = some_function_that_takes_arguments(
    'a', 'b', 'c',
    'd', 'e', 'f',
)

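Code layout

Indentation

Use 4 spaces for each indentation level.

Continuation lines should align wrapped elements vertically, either by relying on Python's implicit line joining inside parentheses, brackets and braces, or by using a hanging indent [7]. With a hanging indent, there should be no arguments on the first line, and the continuation lines should be indented further so that they are clearly distinguishable as continuation lines.

Correct:

# Aligned with the opening delimiter.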
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# Extra indentation distinguishes the arguments from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

# A hanging indent should add one level.
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

Wrong:

# Arguments on the first line are forbidden without vertical alignment.
foo = long_function_name(var_one, var_two,
    var_three, var_four)

# Further indentation is required; the continuation is not distinguishable.
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)

The 4-space rule is optional for continuation lines.

Optional:

# A hanging indent may use an indentation other than 4 spaces.
foo = long_function_name(
  var_one, var_two,
  var_three, var_four)

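When the condition of an if-statement is long enough that it has to be written across several lines, note that a two-character keyword such as if, followed by a single space and an opening parenthesis, creates a natural 4-space indent for the remaining lines of the condition. This can conflict visually with the suite nested inside the if-statement, which is also indented by 4 spaces. This PEP takes no explicit position on how (or whether) to distinguish the condition lines from the nested suite. Acceptable options include, but are not limited to:

# No extra indentation.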
if (this_is_one_thing and
    that_is_another_thing):
    do_something()

# Add a comment, which provides some distinction in editors
# that support syntax highlighting.
if (this_is_one_thing and
    that_is_another_thing):
    # Since both conditions are true, we can frobnicate.
    do_something()

# Add some extra indentation on the conditional continuation line.
if (this_is_one_thing
        and that_is_another_thing):
    do_something()

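(See also the discussion below of whether to break before or after binary operators.)

The closing brace, bracket or parenthesis of a multi-line construct may line up under the first non-whitespace character of the last line of the list, as in: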

my_list = [
    1, 2, 3,
    4, 5, 6,
    ]
result = some_function_that_takes_arguments(
    'a', 'b', 'c',
    'd', 'e', 'f',
    )

Or it may line up under the first character of the line that starts the multi-line construct, as in:

my_list = [
    1, 2, 3,
    4, 5, 6,
]
result = some_function_that_takes_arguments(
    'a', 'b', 'c',
    'd', 'e', 'f',
)

Tabs or Spaces?

Spaces are the preferred indentation method.

Tabs should be used solely to remain consistent with code that is already indented with tabs.

Python 3 disallows mixing the use of tabs and spaces for indentation.

Python 2 code indented with a mixture of tabs and spaces should be converted to using spaces exclusively.

When invoking the Python 2 command line interpreter with the -t option, it issues warnings about code that illegally mixes tabs and spaces. When using -tt these warnings become errors. These options are highly recommended!
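
To make the "mixture of tabs and spaces" concrete, here is a minimal sketch that flags lines whose leading whitespace contains both characters. The helper name find_mixed_indentation is hypothetical, and the check is deliberately simpler than what the interpreter does: it only looks at each line in isolation.

```python
def find_mixed_indentation(source):
    """Return 1-based numbers of lines whose indent mixes tabs and spaces."""
    mixed = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            mixed.append(lineno)
    return mixed


# Illustrative input: line 3 is indented with a tab followed by a space.
sample = "if True:\n    a = 1\n\t b = 2\n"
print(find_mixed_indentation(sample))
```

A report like this only locates the problem; converting the offending lines to spaces is still a manual or tool-assisted step.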

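Tabs or spaces?

Spaces are the preferred indentation method.

Tabs should be used only to stay consistent with code that is already indented with tabs.

Python 3 does not allow tabs and spaces to be mixed for indentation.

Python 2 code indented with a mixture of tabs and spaces should be converted to use spaces exclusively.

When the Python 2 command line interpreter is invoked with the -t option, it issues warnings about code that mixes tabs and spaces; with -tt these warnings become errors. Both options are highly recommended.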

Maximum Line Length

Limit all lines to a maximum of 79 characters.

For flowing long blocks of text with fewer structural restrictions (docstrings or comments), the line length should be limited to 72 characters.

Limiting the required editor window width makes it possible to have several files open side-by-side, and works well when using code review tools that present the two versions in adjacent columns.

The default wrapping in most tools disrupts the visual structure of the code, making it more difficult to understand. The limits are chosen to avoid wrapping in editors with the window width set to 80, even if the tool places a marker glyph in the final column when wrapping lines. Some web based tools may not offer dynamic line wrapping at all.

Some teams strongly prefer a longer line length. For code maintained exclusively or primarily by a team that can reach agreement on this issue, it is okay to increase the nominal line length from 80 to 100 characters (effectively increasing the maximum length to 99 characters), provided that comments and docstrings are still wrapped at 72 characters.

The Python standard library is conservative and requires limiting lines to 79 characters (and docstrings/comments to 72).
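
As a rough illustration of the 79-character limit, the sketch below reports lines that exceed it. The names MAX_CODE_LENGTH and long_lines are hypothetical and exist only for this example; in practice a linter normally performs this check.

```python
MAX_CODE_LENGTH = 79  # limit for code lines, as stated above


def long_lines(source, limit=MAX_CODE_LENGTH):
    """Return (line_number, length) pairs for lines longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# Illustrative input: the second line is 85 characters long.
sample = "x = 1\n" + "y = " + "1 + " * 20 + "1\n"
print(long_lines(sample))
```

Passing limit=72 applies the same check to a block of comment or docstring text.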

The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.

Backslashes may still be appropriate at times. For example, long, multiple with-statements cannot use implicit continuation, so backslashes are acceptable:

with open('/path/to/some/file/you/want/to/read') as file_1, \
     open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())

(See the previous discussion on multiline if-statements for further thoughts on the indentation of such multiline with-statements.)

Another such case is with assert statements.

Make sure to indent the continued line appropriately.
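
The two continuation styles discussed above can be put side by side. This is a small sketch with a hypothetical function, total_income, showing implied continuation inside parentheses and a backslash continuation in an assert statement.

```python
def total_income(gross_wages, taxable_interest, ira_deduction):
    # Implied continuation: the parentheses let the expression span lines.
    income = (gross_wages
              + taxable_interest
              - ira_deduction)
    # An assert with a long message is one case where a backslash is used.
    assert income >= 0, \
        "income should not be negative for these illustrative inputs"
    return income


print(total_income(100, 10, 20))
```

The continued lines are indented so that they cannot be mistaken for new statements.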

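Maximum line length

Limit all lines to a maximum of 79 characters.

For long, flowing blocks of text with fewer structural restrictions (docstrings or comments), limit the line length to 72 characters.

Limiting the required editor window width makes it possible to have several files open side by side, and works well with code review tools that show two versions in adjacent columns.

The default wrapping in most tools disrupts the visual structure of the code and makes it harder to understand. The limits are chosen to avoid wrapping in editors whose window width is set to 80, even if the tool places a marker glyph in the final column when wrapping lines. Some web-based tools may not offer dynamic line wrapping at all.

Some teams strongly prefer a longer line length. For code maintained exclusively or primarily by a team that can agree on this, it is okay to raise the nominal line length from 80 to 100 characters (effectively a maximum of 99 characters), provided that comments and docstrings are still wrapped at 72 characters.

The preferred way to wrap long lines is Python's implied line continuation inside parentheses, brackets and braces. A long line can be broken over several lines by wrapping the expression in parentheses. This should be used in preference to a backslash for line continuation.

Backslashes may still be appropriate at times. For example, long, multiple with-statements cannot use implicit continuation, so backslashes are acceptable: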

with open('/path/to/some/file/you/want/to/read') as file_1, \
     open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())

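(See the earlier discussion of multiline if-statements for thoughts on indenting such multiline with-statements.)

Another such case is the assert statement.

Make sure to indent the continued line appropriately.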

Should a line break before or after a binary operator?

For decades the recommended style was to break after binary operators. But this can hurt readability in two ways: the operators tend to get scattered across different columns on the screen, and each operator is moved away from its operand and onto the previous line. Here, the eye has to do extra work to tell which items are added and which are subtracted:

# No: operators sit far away from their operands
income = (gross_wages +
          taxable_interest +
          (dividends - qualified_dividends) -
          ira_deduction -
          student_loan_interest)

To solve this readability problem, mathematicians and their publishers follow the opposite convention. Donald Knuth explains the traditional rule in his Computers and Typesetting series: "Although formulas within a paragraph always break after binary operations and relations, displayed formulas always break before binary operations" [3].

Following the tradition from mathematics usually results in more readable code:

# Yes: easy to match operators with operands
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)

In Python code, it is permissible to break before or after a binary operator, as long as the convention is consistent locally. For new code Knuth's style is suggested.

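Should a line break before or after a binary operator?

For decades the recommended style was to break after binary operators. But this can hurt readability in two ways: the operators tend to get scattered across different columns on the screen, and each operator ends up on the previous line, away from its operand. The eye then has to do extra work to tell which items are added and which are subtracted:

# Wrong: operators sit far away from their operands.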
income = (gross_wages +
          taxable_interest +
          (dividends - qualified_dividends) -
          ira_deduction -
          student_loan_interest)

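To solve this readability problem, mathematicians and their publishers follow the opposite convention. Donald Knuth explains the traditional rule in his Computers and Typesetting series: "Although formulas within a paragraph always break after binary operations and relations, displayed formulas always break before binary operations" [3].

Following the tradition from mathematics usually results in more readable code:

# Correct: easy to match operators with operands.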
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)

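In Python code, it is permissible to break before or after a binary operator, as long as the convention is consistent locally. For new code, Knuth's style (breaking before the operator) is suggested.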

Blank Lines

Surround top-level function and class definitions with two blank lines.

Method definitions inside a class are surrounded by a single blank line.

Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).

Use blank lines in functions, sparingly, to indicate logical sections.

Python accepts the control-L (i.e. ^L) form feed character as whitespace; many tools treat these characters as page separators, so you may use them to separate pages of related sections of your file. Note, some editors and web-based code viewers may not recognize control-L as a form feed and will show another glyph in its place.
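
The blank-line rules above are easier to see in a complete, if tiny, module. The following is only a sketch; the function and class names are made up for the illustration.

```python
"""Illustrative module showing the blank-line conventions."""


def top_level_function():
    return "function"


class Greeter:
    """Hypothetical class used only for this example."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        greeting = "Hello, " + self.name

        # A single blank line inside a function marks a logical section.
        return greeting + "!"


print(Greeter("Ada").greet())
```

Two blank lines surround the top-level function and the class, and one blank line separates the methods.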

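Blank lines

Surround top-level function and class definitions with two blank lines.

Surround method definitions inside a class with a single blank line.

Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (for example, a set of dummy implementations).

Use blank lines inside functions, sparingly, to indicate logical sections.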

Python accepts the control-L (i.e. ^L) form feed character as whitespace; many tools treat these characters as page separators, so you may use them to separate pages of related sections of your file. Note, some editors and web-based code viewers may not recognize control-L as a form feed and will show another glyph in its place.

Source File Encoding

Code in the core Python distribution should always use UTF-8 (or ASCII in Python 2).

Files using ASCII (in Python 2) or UTF-8 (in Python 3) should not have an encoding declaration.

In the standard library, non-default encodings should be used only for test purposes or when a comment or docstring needs to mention an author name that contains non-ASCII characters; otherwise, using \x, \u, \U, or \N escapes is the preferred way to include non-ASCII data in string literals.

For Python 3.0 and beyond, the following policy is prescribed for the standard library (see PEP 3131): All identifiers in the Python standard library MUST use ASCII-only identifiers, and SHOULD use English words wherever feasible (in many cases, abbreviations and technical terms are used which aren't English). In addition, string literals and comments must also be in ASCII. The only exceptions are (a) test cases testing the non-ASCII features, and (b) names of authors. Authors whose names are not based on the latin alphabet MUST provide a latin transliteration of their names.

Open source projects with a global audience are encouraged to adopt a similar policy.
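
For readers unfamiliar with the escapes mentioned above, this short sketch spells one non-ASCII character in three ways while keeping the source file pure ASCII. The variable names and the sample name are illustrative only.

```python
# All three escapes below produce the same character, U+00E9.
name_x = "Andr\xe9"
name_u = "Andr\u00e9"
name_n = "Andr\N{LATIN SMALL LETTER E WITH ACUTE}"

# The eight-digit form reaches code points above U+FFFF.
snake = "\U0001F40D"

print(name_x == name_u == name_n)
print(len(snake), hex(ord(snake)))
```

The \N form is the longest to type but documents the intended character by name.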

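Source file encoding

Code in the core Python distribution should always use UTF-8 (or ASCII in Python 2).

Files using ASCII (in Python 2) or UTF-8 (in Python 3) should not have an encoding declaration.

In the standard library, non-default encodings should be used only for test purposes, or when a comment or docstring needs to mention an author name that contains non-ASCII characters. Otherwise, the \x, \u, \U and \N escapes are the preferred way to include non-ASCII data in string literals.

For Python 3.0 and later, the following policy applies to the standard library (see PEP 3131): all identifiers in the Python standard library must be ASCII-only and should use English words wherever feasible (in many cases, abbreviations and technical terms are used that are not English). In addition, string literals and comments must also be in ASCII. The only exceptions are test cases for non-ASCII features and the names of authors. Authors whose names are not based on the Latin alphabet must provide a Latin transliteration of their names.

Open source projects with a global audience are encouraged to adopt a similar policy.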

 

Imports

  • Imports should usually be on separate lines, e.g.:

    Yes: import os
         import sys
    
    No:  import sys, os
    

    It's okay to say this though:

    from subprocess import Popen, PIPE
    
  • Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

    Imports should be grouped in the following order:

    1. standard library imports
    2. related third party imports
    3. local application/library specific imports

    You should put a blank line between each group of imports.

  • Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path):

    import mypkg.sibling
    from mypkg import sibling
    from mypkg.sibling import example
    

    However, explicit relative imports are an acceptable alternative to absolute imports, especially when dealing with complex package layouts where using absolute imports would be unnecessarily verbose:

    from . import sibling
    from .sibling import example
    

    Standard library code should avoid complex package layouts and always use absolute imports.

    Implicit relative imports should never be used and have been removed in Python 3.

  • When importing a class from a class-containing module, it's usually okay to spell this:

    from myclass import MyClass
    from foo.bar.yourclass import YourClass
    

    If this spelling causes local name clashes, then spell them

    import myclass
    import foo.bar.yourclass
    

    and use "myclass.MyClass" and "foo.bar.yourclass.YourClass".

  • Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn't known in advance).

    When republishing names this way, the guidelines below regarding public and internal interfaces still apply.
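
Putting the ordering rules above together, a module header might look like the sketch below. Only standard library imports are live so that the example runs anywhere; the third-party and local imports are shown as comments, and the names requests, mypkg and DEFAULT_ENCODING are placeholders rather than recommendations.

```python
"""Illustrative module docstring; imports follow it."""

# Group 1: standard library imports, one module per line.
import os
import sys
from subprocess import PIPE, Popen

# Group 2: related third-party imports would go here, for example:
# import requests

# Group 3: local application imports would go here, for example:
# from mypkg import sibling

# Module globals and constants come after all imports.
DEFAULT_ENCODING = "utf-8"

print(os.sep in os.path.join("a", "b"), sys.version_info.major)
```

Each group is separated from the next by a single blank line.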

Imports

  • Imports should usually be on separate lines, for example:

    Yes: import os
         import sys
    
    No:  import sys, os
    

    It is okay to write this, though:

    from subprocess import Popen, PIPE
    
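  • Imports always go at the top of the file, just after any module comments and docstrings, and before module globals and constants.

    Imports should be grouped in the following order:

    1. Standard library imports.
    2. Related third-party imports.
    3. Local application or library imports.

    Put a blank line between each group of imports.

  • Absolute imports are recommended, as they are usually more readable and tend to behave better (or at least give better error messages) when the import system is incorrectly configured, such as when a directory inside a package ends up on sys.path: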

    import mypkg.sibling
    from mypkg import sibling
    from mypkg.sibling import example
    

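    However, explicit relative imports are an acceptable alternative to absolute imports, especially with complex package layouts where absolute imports would be unnecessarily verbose: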

    from . import sibling
    from .sibling import example
    

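    Standard library code should avoid complex package layouts and always use absolute imports.

    Implicit relative imports should never be used and have been removed in Python 3.

  • When importing a class from a module that contains it, it is usually okay to write: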

    from myclass import MyClass
    from foo.bar.yourclass import YourClass
    

    If this spelling causes local name clashes, import the modules instead:

    import myclass
    import foo.bar.yourclass
    

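    and use "myclass.MyClass" and "foo.bar.yourclass.YourClass".

  • Wildcard imports (from <module> import *) should be avoided, because they make it unclear which names are present in the namespace, which confuses both readers and many automated tools. There is one defensible use case for a wildcard import: republishing an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module, when exactly which definitions will be overwritten is not known in advance).

    When republishing names this way, the guidelines below on public and internal interfaces still apply.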

Module level dunder names

Module level "dunders" (i.e. names with two leading and two trailing underscores) such as __all__, __author__, __version__, etc. should be placed after the module docstring but before any import statements except from __future__ imports. Python mandates that future-imports must appear in the module before any other code except docstrings.

For example:

"""This is the example module.

This module does stuff.
"""

from __future__ import barry_as_FLUFL

__all__ = ['a', 'b', 'c']
__version__ = '0.1'
__author__ = 'Cardinal Biggles'

import os
import sys

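Module level dunder names

Module level "dunders" (names with two leading and two trailing underscores) such as __all__, __author__, __version__ and so on should be placed after the module docstring but before any import statements, except for from __future__ imports. Python requires future-imports to appear in the module before any other code except docstrings.

For example: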

"""This is the example module.

This module does stuff.
"""

from __future__ import barry_as_FLUFL

__all__ = ['a', 'b', 'c']
__version__ = '0.1'
__author__ = 'Cardinal Biggles'

import os
import sys

String Quotes

In Python, single-quoted strings and double-quoted strings are the same. This PEP does not make a recommendation for this. Pick a rule and stick to it. When a string contains single or double quote characters, however, use the other one to avoid backslashes in the string. It improves readability.

For triple-quoted strings, always use double quote characters to be consistent with the docstring convention in PEP 257.
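
A brief sketch of the quoting advice: choose the quote character that avoids backslashes, and use double quotes for triple-quoted strings. All names here are illustrative.

```python
# Pick the quote style that avoids escaping the quotes inside the string.
message = "It's readable"
quoted = 'She said "hello"'

# The escaped spelling yields the same string but is harder to read.
escaped = 'It\'s readable'


def describe():
    """Return the message; the docstring uses triple double quotes."""
    return message


print(message == escaped)
print(quoted)
```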

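String quotes

In Python, single-quoted strings and double-quoted strings are the same. This PEP makes no recommendation here. Pick a rule and stick to it. When a string contains single or double quote characters, however, use the other kind of quote to avoid backslashes in the string. This improves readability.

For triple-quoted strings, always use double quote characters, to be consistent with the docstring convention in PEP 257.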