雪夜读书: September 2006

Saturday, September 02, 2006

Py 2.5 What's New: with


8 PEP 343: The 'with' statement 

The 'with' statement clarifies code that previously would use try...finally
blocks to ensure that clean-up code is executed. In this section, I'll discuss
the statement as it will commonly be used. In the next section, I'll examine the
implementation details and show how to write objects for use with this statement.

The 'with' statement is a new control-flow structure whose basic form is:


with expression [as variable]:
    with-block

The expression is evaluated, and it should result in an object that supports the
context management protocol. This object may return a value that can optionally
be bound to the name variable. (Note carefully that variable is not assigned the
result of expression.) The object can then run set-up code before with-block is
executed, and its clean-up code runs after the block is done, even if the
block raised an exception.

(Translator's note: the PEP author says "protocol" here rather than the more
familiar term "interface".)

To enable the statement in Python 2.5, you need to add the following directive
to your module:


from __future__ import with_statement

The statement will always be enabled in Python 2.6, so the directive is no longer
needed there.

Some standard Python objects now support the context management protocol and
can be used with the 'with' statement. File objects are one example:

with open('/etc/passwd', 'r') as f:
    for line in f:
        print line
        # ... more processing code ...

After this statement has executed, the file object in f will have been automatically
closed, even if the 'for' loop raised an exception part-way through the block.
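
For comparison, here is a rough sketch of the try...finally code that the 'with' form replaces. It is written for a current Python 3 interpreter rather than 2.5, and the temporary sample file and both helper function names are made up for illustration.

```python
import os
import tempfile

def read_with_finally(path):
    # Older idiom: close the file by hand in a finally clause.
    f = open(path, 'r')
    try:
        return [line.rstrip('\n') for line in f], f
    finally:
        f.close()

def read_with_statement(path):
    # Same effect: the file object's __exit__() closes it.
    with open(path, 'r') as f:
        return [line.rstrip('\n') for line in f], f

# Hypothetical sample file so the sketch is self-contained.
fd, sample_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as out:
    out.write('root\nguest\n')

lines_a, file_a = read_with_finally(sample_path)
lines_b, file_b = read_with_statement(sample_path)
os.remove(sample_path)
```

Both helpers return the same lines and leave the file closed; the 'with' version simply has less scaffolding.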

The threading module's locks and condition variables also support the 'with' statement:

(Translator's note: as shhgs reads it, "locks" here means every lock type in
threading, so Lock and RLock both already support 'with'.)


lock = threading.Lock()
with lock:
    # Critical section of code
    ...

The lock is acquired before the block is executed and always released once the block
is complete.
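
A small sketch of that guarantee, assuming a current Python 3 interpreter. The shared counter and the add_many helper are invented for the example; only threading.Lock comes from the text.

```python
import threading

lock = threading.Lock()
counter = {'value': 0}

def add_many(n):
    for _ in range(n):
        with lock:                  # acquired here ...
            counter['value'] += 1   # ... critical section ...
        # ... and released here, however the block was left

threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The lock is also released when the block exits through an exception.
try:
    with lock:
        raise ValueError('boom')
except ValueError:
    pass
released_after_error = not lock.locked()
```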

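The decimal module's contexts, which encapsulate the desired precision and rounding
characteristics for computations, provide a context_manager() method for getting a
context manager:

(Translator's note: see the comment in the code. At least in Py2.5b1 this method
does not exist and you have to call get_manager() instead. In the final 2.5 release
the supported spelling is decimal.localcontext(). "Context" here really does mean
context: it bundles up the precision and rounding rules of a computation. Now this is
what a context should be; Perl's so-called context is a cheap imitation.)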

import decimal

# Displays with default precision of 28 digits
v1 = decimal.Decimal('578')
print v1.sqrt()

ctx = decimal.Context(prec=16) 
#----------------------------------------
# In shhgs's tests the ctx object has no context_manager() method;
# you have to write
# with ctx.get_manager() :
#----------------------------------------
with ctx.context_manager():
    # All code in this block uses a precision of 16 digits.
    # The original context is restored on exiting the block.
    print v1.sqrt()
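
Since the method name changed between pre-releases, here is a hedged sketch using decimal.localcontext(), the form that shipped in the final 2.5 release and still works today. It is written for Python 3; the variable names other than v1 are made up.

```python
import decimal

original_prec = decimal.getcontext().prec

v1 = decimal.Decimal('578')
default_root = v1.sqrt()            # computed at the ambient precision

with decimal.localcontext() as ctx:
    ctx.prec = 16                   # applies only inside this block
    short_root = v1.sqrt()

restored_prec = decimal.getcontext().prec   # back to the original value
```

The precision change is local to the block; the surrounding context is restored on exit, which is exactly the behaviour described above.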



8.1 Writing Context Managers

Under the hood, the 'with' statement is fairly complicated.
Most people will only use 'with' in company with existing objects
and don't need to know these details, so you can skip the rest of
this section if you like. Authors of new objects will need to understand
the details of the underlying implementation and should keep reading.

A high-level explanation of the context management protocol is:

The expression is evaluated and should result in an object called a
"context manager". The context manager must have __enter__() and __exit__() methods.

The context manager's __enter__() method is called.
The value returned is assigned to VAR.
If no 'as VAR' clause is present, the value is simply discarded.

The code in BLOCK is executed.

If BLOCK raises an exception, __exit__(type, value, traceback) is called
with the exception details, the same values returned by sys.exc_info().
The method's return value controls whether the exception is re-raised:
any false value re-raises the exception, and True will result in suppressing it.
You'll only rarely want to suppress the exception, because if you do the author
of the code containing the 'with' statement will never realize anything went wrong.

If BLOCK didn't raise an exception, the __exit__() method is still called,
but type, value, and traceback are all None.
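
The steps above can be watched directly with a tiny tracing class. This is only an illustrative sketch for Python 3; Tracer and its suppress flag are invented names, not part of any library.

```python
class Tracer(object):
    """Hypothetical context manager that records how 'with' drives it."""

    def __init__(self, suppress=False):
        self.suppress = suppress
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return 'value for VAR'

    def __exit__(self, exc_type, exc_value, tb):
        # On a clean exit all three arguments are None.
        self.events.append(('exit', exc_type))
        return self.suppress        # a true value swallows the exception

clean = Tracer()
with clean as var:
    clean.events.append('block')

loud = Tracer()
try:
    with loud:
        raise KeyError('missing')
except KeyError:
    loud.events.append('re-raised')

quiet = Tracer(suppress=True)
with quiet:
    raise KeyError('missing')       # swallowed because __exit__ returned True
```

Note that var receives the return value of __enter__(), not the Tracer object itself.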

Let's think through an example. I won't present detailed code but will only
sketch the methods necessary for a database that supports transactions.

(For people unfamiliar with database terminology: a set of changes to the database
are grouped into a transaction. Transactions can be either committed, meaning that
all the changes are written into the database, or rolled back, meaning that the
changes are all discarded and the database is unchanged. See any database textbook
for more information.)

Let's assume there's an object representing a database connection.
Our goal will be to let the user write code like this:

db_connection = DatabaseConnection()
with db_connection as cursor:
    cursor.execute('insert into ...')
    cursor.execute('delete from ...')
    # ... more operations ...

The transaction should be committed if the code in the block runs flawlessly or
rolled back if there's an exception. Here's the basic interface for DatabaseConnection
that I'll assume:


class DatabaseConnection:
    # Database interface
    def cursor (self):
        "Returns a cursor object and starts a new transaction"
    def commit (self):
        "Commits current transaction"
    def rollback (self):
        "Rolls back current transaction"

The __enter__() method is pretty easy, having only to start a new transaction.
For this application the resulting cursor object would be a useful result,
so the method will return it. The user can then add "as cursor" to their 'with'
statement to bind the cursor to a variable name.

class DatabaseConnection:
    ...
    def __enter__ (self):
        # Code to start a new transaction
        cursor = self.cursor()
        return cursor

The __exit__() method is the most complicated
because it's where most of the work has to be done.
The method has to check if an exception occurred.
If there was no exception, the transaction is committed.
The transaction is rolled back if there was an exception.

In the code below, execution will just fall off the end of the function,
returning the default value of None. None is false, so the exception will
be re-raised automatically. If you wished, you could be more explicit and
add a return statement at the marked location.

class DatabaseConnection:
    ...
    def __exit__ (self, type, value, tb):
        if tb is None:
            # No exception, so commit
            self.commit()
        else:
            # Exception occurred, so rollback.
            self.rollback()
            # return False
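
To see the pieces work together, here is a self-contained sketch with an in-memory stand-in for the database. Everything beyond the method names from the text (FakeCursor, the committed and pending lists, the 'bad' trigger) is an assumption made for the example, and it targets Python 3.

```python
class FakeCursor(object):
    def __init__(self, log):
        self.log = log

    def execute(self, sql):
        if 'bad' in sql:
            raise RuntimeError('cannot execute: ' + sql)
        self.log.append(sql)

class DatabaseConnection(object):
    """Stand-in connection: 'committed' plays the role of the database."""

    def __init__(self):
        self.committed = []
        self.pending = []

    def cursor(self):
        self.pending = []           # start a new transaction
        return FakeCursor(self.pending)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []

    def __enter__(self):
        return self.cursor()

    def __exit__(self, type, value, tb):
        if tb is None:
            self.commit()
        else:
            self.rollback()
        return False                # never hide the caller's exception

db_connection = DatabaseConnection()
with db_connection as cursor:
    cursor.execute('insert into t values (1)')

failed = False
try:
    with db_connection as cursor:
        cursor.execute('insert into t values (2)')
        cursor.execute('bad statement')
except RuntimeError:
    failed = True
```

The second block is rolled back as a whole, so only the first insert ends up in committed.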



8.2 The contextlib module

The new contextlib module provides some functions and a decorator that are useful
for writing objects for use with the 'with' statement.

The decorator is called contextmanager (the pre-release text translated here calls it
contextfactory; the final 2.5 release uses the name contextmanager), and lets you
write a single generator function instead of defining a new class. The generator
should yield exactly one value. The code up to the yield will be executed as the
__enter__() method, and the value yielded will be the method's return value that
will get bound to the variable in the 'with' statement's as clause, if any. The code
after the yield will be executed in the __exit__() method. Any exception raised in
the block will be raised by the yield statement.

Our database example from the previous section could be written using this decorator as:

from contextlib import contextmanager  # named contextfactory in early 2.5 pre-releases

@contextmanager
def db_transaction (connection):
    cursor = connection.cursor()
    try:
        yield cursor
    except:
        connection.rollback()
        raise
    else:
        connection.commit()

db = DatabaseConnection()
with db_transaction(db) as cursor:
    ...
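
A runnable sketch of the same idea for Python 3, using contextlib.contextmanager. RecordingConnection is a made-up stand-in that only logs which methods were called, and the sketch catches Exception rather than using a bare except.

```python
from contextlib import contextmanager

class RecordingConnection(object):
    """Hypothetical connection that only records what was called."""
    def __init__(self):
        self.calls = []
    def cursor(self):
        self.calls.append('cursor')
        return self
    def commit(self):
        self.calls.append('commit')
    def rollback(self):
        self.calls.append('rollback')

@contextmanager
def db_transaction(connection):
    cursor = connection.cursor()    # runs as __enter__()
    try:
        yield cursor                # the with-block runs here
    except Exception:
        connection.rollback()       # the block's exception surfaces at yield
        raise
    else:
        connection.commit()

ok = RecordingConnection()
with db_transaction(ok) as cursor:
    pass

broken = RecordingConnection()
try:
    with db_transaction(broken):
        raise ValueError('constraint violated')
except ValueError:
    pass
```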

The contextlib module also has a nested(mgr1, mgr2, ...) function
that combines a number of context managers so you don't need to write nested
'with' statements. In this example, the single 'with' statement both starts a database
transaction and acquires a thread lock:

import threading
from contextlib import nested

lock = threading.Lock()
with nested(db_transaction(db), lock) as (cursor, locked):
    ...
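
One caveat for readers on newer interpreters: nested() was deprecated in Python 2.7 and is not present in current Python 3, where the 'with' statement itself accepts several managers separated by commas. A minimal sketch of that form follows; the step() manager is hypothetical and only logs its entry and exit.

```python
import threading
from contextlib import contextmanager

order = []

@contextmanager
def step(name):
    # Hypothetical manager that logs when it is entered and exited.
    order.append('enter ' + name)
    try:
        yield name
    finally:
        order.append('exit ' + name)

lock = threading.Lock()
with step('transaction') as cursor, lock:
    held_inside = lock.locked()     # both managers are active here
held_after = lock.locked()          # both have been exited here
```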

Finally, the closing(object) function returns object so that it can be bound
to a variable, and calls object.close() at the end of the block.

import urllib, sys
from contextlib import closing

with closing(urllib.urlopen('http://www.yahoo.com')) as f:
    for line in f:
        sys.stdout.write(line)
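
The same behaviour can be checked without a network connection. In this Python 3 sketch, Feed is an invented class that has a close() method but no __enter__()/__exit__() of its own.

```python
from contextlib import closing

class Feed(object):
    """Hypothetical resource that has close() but is not a context manager."""
    def __init__(self, lines):
        self.lines = list(lines)
        self.closed = False
    def __iter__(self):
        return iter(self.lines)
    def close(self):
        self.closed = True

feed = Feed(['first\n', 'second\n'])
seen = []
with closing(feed) as f:
    same_object = f is feed         # closing() hands back the object itself
    for line in f:
        seen.append(line)
```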


Py 2.5 What's New: yield
------------------------------

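:Date: 2006-8-31
:Author: shhgs
:Copyright: To express my strong dissatisfaction with the "Scripting Languages (Perl/Python)"
     board of the CSDN forum, I hereby declare that nobody may post this document, or a link
     to it, on that board. Apart from that, anyone may read and distribute the electronic
     version of this document or links to it. Anyone may also post it to any BBS other than
     the CSDN "Scripting Languages Perl/Python" board. Anyone may print it for their own or
     others' use, but may not charge anyone any fee under any pretext, including but not
     limited to paper, printing, or consumables costs. This copyright notice must be kept
     whenever the document is distributed or posted. Anyone who wants to publish this
     document must obtain my consent first.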
     
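Py 2.5 fundamentally enhances yield, giving Python its own first-class coroutines.

Let's first look at the traditional yield. The yield keyword, introduced in Py 2.2 and
always enabled since Py 2.3, gave Python first-class generators. A generator is a natural
extension of the enumerator/iterator. The difference is that an iterator/enumerator walks
an existing, finite collection, whereas a generator produces the elements of a collection
one at a time, and that collection may even be infinite. Algorithmically, generators, like
recursion, derive from mathematical induction, but compared with recursion the code is
clearer and there is no limit on nesting depth.

But what you intend a tool for and how others end up using it are fundamentally two
different things. As soon as generators appeared, sharp-eyed people pointed out that they
are semi-coroutines. A coroutine is a routine with multiple entry points and suspend
points. Py 2.3's yield provides multiple entry/suspend points, but because there is no way
to pass new data in each time the generator is resumed, it can only be called a
semi-coroutine.

Of course it is not completely impossible. But in general, passing new data in requires
some tricks. This article is not about those tricks. What we care about here is why the
experts went to such lengths to rework generators, what they wanted to do, and how.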

Syntax of yield in Py 2.5
=====================================

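After all this talk of passing data into a generator, how is it actually done?

Generators in Py 2.5 have a new send method, and that is what we use to pass data in.
Suppose we want to deliver message to gen ::

    ...
    gen = f()
    ...
    gen.send(message)
    ...

So how do we define the generator so that it receives message? First, note that
Py 2.5 changed the syntax of yield: it is no longer a statement but an expression.
So you can define the generator like this ::

    def f() :
        ...
        val = yield i
        ...

When the user calls gen.send(message), message is delivered into the generator and assigned to val.
Meanwhile the generator still produces i as before. So if you want to get hold of i, you need ::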
    
    ...
    gen = f()
    ...
    p = gen.send(message)
    ...

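Now that the Py 2.5 generator syntax is covered, let's see how the Py 2.3 syntax is interpreted in 2.5.
A Py 2.3 generator has only a next method ::

    gen.next()

So in 2.5 this is shorthand for ::

    gen.send(None)

And a 2.3 generator can only yield values like this ::

    yield i

So in 2.5 this means the generator ignores the value passed in.

That is all there is to the yield syntax. If you still have questions, see the What's New document in the Python manual.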
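
A short sketch of these rules, written for Python 3 (where gen.next() is spelled next(gen)). The running_average and plain generators are made-up examples, not taken from the PEP.

```python
def running_average():
    """Coroutine: receives numbers via send(), yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average       # send(x) makes this expression equal x
        total += value
        count += 1
        average = total / count

gen = running_average()
first = next(gen)                   # same as gen.send(None): run to the first yield
results = [gen.send(n) for n in (10, 20, 60)]

def plain():
    yield 1                         # old style: whatever is sent in is ignored
    yield 2

old = plain()
old_values = [next(old), old.send('ignored')]

# A generator that has not started yet only accepts None.
fresh = running_average()
try:
    fresh.send(5)
    first_send_rejected = False
except TypeError:
    first_send_rejected = True
```

Each send() both delivers a value to the suspended yield expression and returns the next value the generator yields.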


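Uses of yield
===============================

1. Cooperative multitasking
.................................


PEP342_ lists the uses of coroutines as "simulations, games, asynchronous I/O, and other
forms of event-driven programming or co-operative multitasking". So let's start with the
relatively simple one, cooperative multitasking.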

.. _PEP342: http://www.python.org/dev/peps/pep-0342/ 

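Cooperative multitasking means that a system has several tasks running at once, and these
tasks are all very cooperative: they voluntarily hand control of the system over to other
tasks. Compared with multithreading, cooperative multitasking has two very notable
characteristics. The first is deterministic ordering. Everyone knows that a multithreaded
environment is nondeterministic. When each thread starts and when it is suspended is decided
by the thread scheduler, so you can never know when a given thread will be suspended or
resumed. A cooperative multitasking program, by contrast, is essentially still a
single-threaded program. We just wrap each task in a separate function (here, a generator)
and have a scheduler call these functions in turn according to some algorithm, thereby
advancing each task.

By now you should have a feel for the "cooperative" part, which is the second characteristic.
Every task must cooperate, that is, hand control back within a short time. If one task
enters an infinite loop, the whole system hangs.

Here is an example of cooperative multitasking with generators. Suppose this is a board
game, with the computer engine and the user-interface program each written as a generator. ::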

    player = GetUserInput(...) 
    engine = Engine(...)
    
    def game(red, black) :
        ...
        move = red.next()
        while move != Move.Resign :
            if turn == black : 
                turn = red
            else :
                turn = black
            game_state.update(move)
            move = yield turn.send(move)
        game_state.update(move)

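This shows clearly the single-threaded nature of generator-based cooperative multitasking. So if our chess engine cheats, ::

    def Engine() :
        ...
        if game.LoseInevitable :
            while 1 :
                sleep(1000)
        yield Move.Resign

then your program is dead.

This is an inherent flaw of cooperative multitasking, so at design time you have to think
through whether the task is really suited to it.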
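
To make the scheduling idea concrete, here is a minimal round-robin sketch for Python 3. The worker tasks and run_round_robin are hypothetical; a real scheduler would also pass values in with send().

```python
from collections import deque

def worker(name, steps, log):
    for i in range(steps):
        log.append('%s step %d' % (name, i))
        yield                       # hand control back to the scheduler

def run_round_robin(tasks):
    """Hypothetical scheduler: resume each task in turn until all finish."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue                # task finished, drop it
        queue.append(task)

log = []
run_round_robin([worker('red', 2, log), worker('black', 3, log)])
```

The interleaving in log is the same on every run, which is the deterministic ordering mentioned above. A worker that never reaches its yield would stall the scheduler for good.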


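2. Asynchronous I/O
.................................

Another use of coroutines is asynchronous I/O. I once wrote `a message`_ about asynchronous
I/O on the mailing list; interested readers can take a look.

.. _`a message`: http://groups.google.sm/group/python-cn/browse_thread/thread/1b4903dbf21b4fcf/1c10ce45d41b9246?lnk=raot&hl=it

In an asynchronous setting, you hand a bunch of sockets to a listener. The listener is
responsible for telling you whether each socket is readable or writable.
The listener can only read the data out for you. Whether what it read is valid, and how it
should be used, is beyond it. So you have to write callbacks and have the listener dispatch
the information to them.

This is no easy job, because the listener decides which callback to call based on the
message it received, but the callback does not necessarily know how to handle that message.
For example, the listener hears a user enter a PASS command and calls do_PASS. But who
entered this password, or whether the user issued a USER command before entering it, the
listener does not know. And if the listener does not know, do_PASS has no way of finding
out either. So there is still a pile of trouble waiting inside the callbacks.

With coroutines, we can wrap each session in a generator. When the listener hears data, it
can wake the coroutine with the send method. The coroutine produces an appropriate value
based on the message it received, hands it back to the listener, and goes back to sleep.
This way of working is very similar to threads, so it is also called a pseudo-thread.

Here is a complete example. The listing is as follows: ::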

      1    #!/usr/local/bin/python2.5
      2    
      3    import socket, select, collections
      4    
      5    SOCK_TIMEOUT = 0.1
      6    BUFSIZ = 8192
      7    PORT   = 10000
      8    
      9    def get_auth_config() :
     10        return {'shhgs': 'hello', 'limodou': 'world'}
     11    
     12    def task() :
     13        authdb = get_auth_config()
     14    
     15        username = yield 'Greetings from EchoServer on %s\nUserName Please: \r\n' % socket.gethostname()
     16    
     17        username = username.strip()
     18        if username not in authdb :
     19            yield '\nInvalid user. Byebye\r\n'
     20            return
     21        else :
     22            password = yield '\nYour Password Please:\r\n'
     23    
     24        password = password.strip()
     25        if authdb[username] == password :
     26            val = yield '\nMay you enjoy the EchoServer.\r\n'
     27        else :
     28            yield '\nWrong Password\r\n'
     29            return
     30            
     31        while  val:
     32            val = val.strip()
     33            val = yield ( ">>> " + val + '\r\n')
     34    
     35    def main(proto) :
     36        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
     37        sock.bind(('' , PORT))
     38        sock.listen(5)
     39        sock.settimeout(SOCK_TIMEOUT)
     40    
     41        connPool = {}     # both dicts are essential: the main loop uses connPool to pick the pseudo-thread
     42        msgQueue = {}     # msgQueue holds, per connection, the responses waiting to be written to the socket
     43
     44        try :
     45            while 1 :
     46                try :
     47                    conn, addr = sock.accept()
     48                    connPool[conn] = proto()
     49                    greetings = connPool[conn].next() # note: the first call on a generator must be send(None) or, as here, next()
     50                    conn.sendall(greetings)
     51                except socket.timeout :
     52                    pass
     53    
     54                conns = connPool.keys()
     55                try :
     56                    i,o,e = select.select(conns, conns, (), SOCK_TIMEOUT )
     57                except :
     58                    i = o = e = []
     59    
     60                for conn in i :
     61                    try :
     62                        data = conn.recv(BUFSIZ)
     63                        if data :
     64                            response = connPool[conn].send(data)
     65                            if conn in msgQueue :  
     66                                msgQueue[conn].append(response)   # msgQueue values must be lists
     67                            else :    
     68                                msgQueue[conn] = [ response, ]
     69                    except socket.error :
     70                        try : 
     71                            connPool.pop(conn)
     72                            msgQueue.pop(conn)
     73                        except :
     74                            pass
     75                        conn.close()
     76    
     77                for conn in o :
     78                    try :
     79                        if conn in msgQueue :
     80                            msgs = msgQueue.pop(conn)
     81                            for response in msgs :
     82                                conn.sendall(response)
     83                                if response in ('\nInvalid user. Byebye\r\n', '\nWrong Password\r\n') : # now I finally see why proper protocols use error codes
     84                                    connPool.pop(conn)
     85                                    conn.close()
     86                    except socket.error :
     87                        try : 
     88                            connPool.pop(conn)
     89                            msgQueue.pop(conn)
     90                        except :
     91                            pass
     92                        conn.close()
     93    
     94        except :
     95            sock.close()
     96    
     97    if __name__ == "__main__" :
     98    #    t = task()
     99    #    input = raw_input(t.next())
    100    #    while input :
    101    #        resp = t.send(input)
    102    #        input = raw_input(resp)
    103        main(task)
    

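task is a pseudo-thread. Its debugging harness is at the end, in the commented-out lines.
With raw_input substituted in, it is a very simple program that even a beginner should be
able to write. But if you were required to do it all with callbacks, things would get complicated.

The main loop is a bit long but also simple. A few points are worth mentioning.

1)  After you obtain the generator, the first call must be send(None) or next(). If you
    want a friendlier interface, see the consumer function in PEP342_. It is a decorator
    that returns a generator you can send messages to right away.

2)  connPool and msgQueue are indispensable. For reads we do not need a list, because
    whatever the protocol, each socket is read only once per pass of the listener loop.
    But writes must use a list, because in some protocols, IM protocols for instance,
    several pseudo-threads may well need to write to the same socket within one pass.
    Then you have to keep the data in a list.

3)  This point is not about generators. Line 39 sets a timeout on sock, so the accept on
    line 47 will not wait forever. The SOCK_TIMEOUT passed to select on line 56 also
    matters. Without a timeout value, select blocks. On the first pass, sock.accept has
    probably not heard a connection yet, so conns is empty. A blocking select returns only
    when at least one socket is readable or writable, so the program hangs. You can also
    give a timeout of 0, which turns the call into a poll.

4)  A coroutine is still essentially single-threaded. Try modifying the program like this: ::

         31        while  val:
         32            val = val.strip()
         33            val = yield ( ">>> " + val + '\r\n')
        -->            if username == 'shhgs' :
        -->                time.sleep(30)    # needs "import time" at the top

    You will find that whenever shhgs types something, the EchoServer stalls for a while.
    This again shows that a coroutine is essentially single-threaded. So, once more:
    before using coroutines, think through whether your task is really suited to them.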
