Defining functions using the def statement is a cornerstone of all programs. The goal of this chapter is to present some more advanced and unusual function definition and usage patterns. Topics include default arguments, functions that take any number of arguments, keyword-only arguments, annotations, and closures. In addition, some tricky control flow and data passing problems involving callback functions are addressed.
To write a function that accepts any number of positional arguments, use a * argument. For example:
def avg(first, *rest):
    return (first + sum(rest)) / (1 + len(rest))
# Sample use
avg(1, 2)         # 1.5
avg(1, 2, 3, 4)   # 2.5
In this example, rest is a tuple of all the extra positional arguments passed. The code treats it as a sequence in performing subsequent calculations. To accept any number of keyword arguments, use an argument that starts with **. For example:
import html

def make_element(name, value, **attrs):
    keyvals = [' %s="%s"' % item for item in attrs.items()]
    attr_str = ''.join(keyvals)
    element = '<{name}{attrs}>{value}</{name}>'.format(
                name=name,
                attrs=attr_str,
                value=html.escape(value))
    return element
# Example
# Creates '<item size="large" quantity="6">Albatross</item>'
make_element('item', 'Albatross', size='large', quantity=6)

# Creates '<p>&lt;spam&gt;</p>'
make_element('p', '<spam>')
Here, attrs is a dictionary that holds the passed keyword arguments (if any). If you want a function that can accept both any number of positional and keyword-only arguments, use * and ** together. For example:
def anyargs(*args, **kwargs):
    print(args)      # A tuple
    print(kwargs)    # A dict
With this function, all of the positional arguments are placed into a tuple args, and all of the keyword arguments are placed into a dictionary kwargs. A * argument can only appear as the last positional argument in a function definition. A ** argument can only appear as the last argument. A subtle aspect of function definitions is that arguments can still appear after a * argument. For example:
def a(x, *args, y):
    pass

def b(x, *args, y, **kwargs):
    pass
Such arguments are known as keyword-only arguments, and are discussed further in Recipe 7.2.
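With the function a() just shown, for example, y can only ever be supplied by keyword, because any extra positional values are absorbed by *args:

a(1, 2, 3, y=10)   # Ok. x=1, args=(2, 3), y=10
a(1, 2, 3, 10)     # TypeError. 10 lands in args and y is never supplied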
This feature is easy to implement if you place the keyword arguments after a * argument or a single unnamed *. For example:
def recv(maxsize, *, block):
    'Receives a message'
    pass

recv(1024, True)         # TypeError
recv(1024, block=True)   # Ok
This technique can also be used to specify keyword arguments for functions that accept a varying number of positional arguments. For example:
def minimum(*values, clip=None):
    m = min(values)
    if clip is not None:
        m = clip if clip > m else m
    return m

minimum(1, 5, 2, -5, 10)           # Returns -5
minimum(1, 5, 2, -5, 10, clip=0)   # Returns 0
Keyword-only arguments are often a good way to enforce greater code clarity when specifying optional function arguments. For example, consider a call like this:
msg = recv(1024, False)
If someone is not intimately familiar with the workings of recv(), they may have no idea what the False argument means. On the other hand, it is much clearer if the call is written like this:
msg = recv(1024, block=False)
The use of keyword-only arguments is also often preferable to tricks involving **kwargs, since they show up properly when the user asks for help:
>>> help(recv)
Help on function recv in module __main__:

recv(maxsize, *, block)
    Receives a message
Keyword-only arguments also have utility in more advanced contexts. For example, they can be used to inject arguments into functions that make use of the *args and **kwargs convention for accepting all inputs. See Recipe 9.11 for an example.
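To give a flavor of that idea (a minimal sketch, not the recipe itself), a decorator can splice an extra keyword-only argument into a function that otherwise accepts anything. The optional_debug name here is purely illustrative:

from functools import wraps

def optional_debug(func):
    @wraps(func)
    def wrapper(*args, debug=False, **kwargs):
        # 'debug' is injected as a keyword-only argument; all other
        # arguments pass through to the wrapped function untouched
        if debug:
            print('Calling', func.__name__)
        return func(*args, **kwargs)
    return wrapper

@optional_debug
def add(x, y):
    return x + y

add(2, 3)              # 5
add(2, 3, debug=True)  # Prints 'Calling add', then returns 5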
You’ve written a function, but would like to attach some additional information to the arguments so that others know more about how a function is supposed to be used.
Function argument annotations can be a useful way to give programmers hints about how a function is supposed to be used. For example, consider the following annotated function:
def add(x: int, y: int) -> int:
    return x + y
The Python interpreter does not attach any semantic meaning to the attached annotations. They are not type checks, nor do they make Python behave any differently than it did before. However, they might give useful hints to others reading the source code about what you had in mind. Third-party tools and frameworks might also attach semantic meaning to the annotations. They also appear in documentation:
>>> help(add)
Help on function add in module __main__:

add(x: int, y: int) -> int

>>>
Although you can attach any kind of object to a function as an annotation (e.g., numbers, strings, instances, etc.), classes or strings often seem to make the most sense.
Function annotations are merely stored in a function’s __annotations__ attribute. For example:

>>> add.__annotations__
{'y': <class 'int'>, 'return': <class 'int'>, 'x': <class 'int'>}
Although there are many potential uses of annotations, their primary utility is probably just documentation. Because Python doesn’t have type declarations, it can often be difficult to know what you’re supposed to pass into a function if you’re simply reading its source code in isolation. An annotation gives someone more of a hint.
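As a sketch of what such a tool might do (hypothetical code, only for illustration; real type checkers are far more sophisticated), here is a decorator that consults __annotations__ to perform crude runtime checks:

from functools import wraps

def enforce_types(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Pair positional arguments with their parameter names
        names = func.__code__.co_varnames[:func.__code__.co_argcount]
        for name, value in list(zip(names, args)) + list(kwargs.items()):
            expected = func.__annotations__.get(name)
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError('%s must be %s' % (name, expected.__name__))
        return func(*args, **kwargs)
    return wrapper

@enforce_types
def add(x: int, y: int) -> int:
    return x + y

add(2, 3)        # 5
add(2, 'hello')  # TypeError: y must be int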
See Recipe 9.20 for an advanced example showing how to use annotations to implement multiple dispatch (i.e., overloaded functions).
To return multiple values from a function, simply return a tuple. For example:
>>> def myfun():
...     return 1, 2, 3
...
>>> a, b, c = myfun()
>>> a
1
>>> b
2
>>> c
3
Although it looks like myfun() returns multiple values, a tuple is actually being created. It looks a bit peculiar, but it’s actually the comma that forms a tuple, not the parentheses. For example:
>>> a = (1, 2)    # With parentheses
>>> a
(1, 2)
>>> b = 1, 2      # Without parentheses
>>> b
(1, 2)
>>>
When calling functions that return a tuple, it is common to assign the result to multiple variables, as shown. This is simply tuple unpacking, as described in Recipe 1.1. The return value could also have been assigned to a single variable:
>>> x = myfun()
>>> x
(1, 2, 3)
>>>
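If you only care about some of the returned values, nothing stops you from unpacking the rest into a throwaway variable (the _ name is merely a convention):

>>> _, b, _ = myfun()
>>> b
2
>>>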
You want to define a function or method where one or more of the arguments are optional and have a default value.
On the surface, defining a function with optional arguments is easy—simply assign values in the definition and make sure that default arguments appear last. For example:
def spam(a, b=42):
    print(a, b)

spam(1)      # Ok. a=1, b=42
spam(1, 2)   # Ok. a=1, b=2
If the default value is supposed to be a mutable container, such as a list, set, or dictionary, use None as the default and write code like this:
# Using a list as a default value
def spam(a, b=None):
    if b is None:
        b = []
    ...
If, instead of providing a default value, you want to write code that merely tests whether an optional argument was given an interesting value or not, use this idiom:
_no_value = object()

def spam(a, b=_no_value):
    if b is _no_value:
        print('No b value supplied')
    ...
Here’s how this function behaves:
>>> spam(1)
No b value supplied
>>> spam(1, 2)      # b = 2
>>> spam(1, None)   # b = None
>>>
Carefully observe that there is a distinction between passing no value at all and passing a value of None.
Defining functions with default arguments is easy, but there is a bit more to it than meets the eye.
First, the values assigned as a default are bound only once at the time of function definition. Try this example to see it:
>>> x = 42
>>> def spam(a, b=x):
...     print(a, b)
...
>>> spam(1)
1 42
>>> x = 23    # Has no effect
>>> spam(1)
1 42
>>>
Notice how changing the variable x (which was used as a default value) has no effect whatsoever. This is because the default value was fixed at function definition time.
Second, the values assigned as defaults should always be immutable objects, such as None, True, False, numbers, or strings. Specifically, never write code like this:
def spam(a, b=[]):    # NO!
    ...
If you do this, you can run into all sorts of trouble if the default value ever escapes the function and gets modified. Such changes will permanently alter the default value across future function calls. For example:
>>> def spam(a, b=[]):
...     print(b)
...     return b
...
>>> x = spam(1)
>>> x
[]
>>> x.append(99)
>>> x.append('Yow!')
>>> x
[99, 'Yow!']
>>> spam(1)    # Modified list gets returned!
[99, 'Yow!']
>>>
That’s probably not what you want. To avoid this, it’s better to assign None as a default and add a check inside the function for it, as shown in the solution.
The use of the is operator when testing for None is a critical part of this recipe. Sometimes people make this mistake:
def spam(a, b=None):
    if not b:    # NO! Use 'b is None' instead
        b = []
    ...
The problem here is that although None evaluates to False, many other objects (e.g., zero-length strings, lists, tuples, dicts, etc.) do as well. Thus, the test just shown would falsely treat certain inputs as missing. For example:
>>> spam(1)        # OK
>>> x = []
>>> spam(1, x)     # Silent error. x value overwritten by default
>>> spam(1, 0)     # Silent error. 0 ignored
>>> spam(1, '')    # Silent error. '' ignored
>>>
The last part of this recipe is something that’s rather subtle—a function that tests to see whether a value (any value) has been supplied to an optional argument or not. The tricky part here is that you can’t use a default value of None, 0, or False to test for the presence of a user-supplied argument (since all of these are perfectly valid values that a user might supply). Thus, you need something else to test against.
To solve this problem, you can create a unique private instance of object, as shown in the solution (the _no_value variable). In the function, you then check the identity of the supplied argument against this special value to see if an argument was supplied or not. The thinking here is that it would be extremely unlikely for a user to pass the _no_value instance in as an input value. Therefore, it becomes a safe value to check against if you’re trying to determine whether an argument was supplied or not.
The use of object() might look rather unusual here. object is a class that serves as the common base class for almost all objects in Python. You can create instances of object, but they are wholly uninteresting, as they have no notable methods nor any instance data (because there is no underlying instance dictionary, you can’t even set any attributes). About the only thing you can do is perform tests for identity. This makes them useful as special values, as shown in the solution.
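A quick demonstration of just how featureless such instances are:

>>> sentinel = object()
>>> sentinel is object()    # Every call creates a distinct instance
False
>>> sentinel is sentinel
True
>>> sentinel.x = 1
Traceback (most recent call last):
    ...
AttributeError: 'object' object has no attribute 'x'
>>>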
You need to supply a short callback function for use with an operation such as sort(), but you don’t want to write a separate one-line function using the def statement. Instead, you’d like a shortcut that allows you to specify the function “in line.”
Simple functions that do nothing more than evaluate an expression can be replaced by a lambda expression. For example:
>>> add = lambda x, y: x + y
>>> add(2, 3)
5
>>> add('hello', 'world')
'helloworld'
>>>
The use of lambda here is the same as having typed this:
>>> def add(x, y):
...     return x + y
...
>>> add(2, 3)
5
>>>
Typically, lambda is used in the context of some other operation, such as sorting or a data reduction:
>>> names = ['David Beazley', 'Brian Jones',
...          'Raymond Hettinger', 'Ned Batchelder']
>>> sorted(names, key=lambda name: name.split()[-1].lower())
['Ned Batchelder', 'David Beazley', 'Raymond Hettinger', 'Brian Jones']
>>>
Although lambda allows you to define a simple function, its use is highly restricted. In particular, only a single expression can be specified, the result of which is the return value. This means that no other language features, including multiple statements, conditionals, iteration, and exception handling, can be included.
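One small escape hatch is that a conditional expression is still an expression, so limited branching can appear in a lambda; actual statements (assignments, loops, try blocks) simply produce a SyntaxError:

>>> clamp = lambda x: x if x > 0 else 0
>>> clamp(-5)
0
>>> clamp(3)
3
>>>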
You can quite happily write a lot of Python code without ever using lambda. However, you’ll occasionally encounter it in programs where someone is writing a lot of tiny functions that evaluate various expressions, or in programs that require users to supply callback functions.
You’ve defined an anonymous function using lambda, but you also need to capture the values of certain variables at the time of definition. Consider the behavior of the following code:
>>> x = 10
>>> a = lambda y: x + y
>>> x = 20
>>> b = lambda y: x + y
>>>
Now ask yourself a question. What are the values of a(10) and b(10)? If you think the results might be 20 and 30, you would be wrong:
>>> a(10)
30
>>> b(10)
30
>>>
The problem here is that the value of x used in the lambda expression is a free variable that gets bound at runtime, not definition time. Thus, the value of x in the lambda expressions is whatever the value of the x variable happens to be at the time of execution. For example:
>>> x = 15
>>> a(10)
25
>>> x = 3
>>> a(10)
13
>>>
If you want an anonymous function to capture a value at the point of definition and keep it, include the value as a default value, like this:
>>> x = 10
>>> a = lambda y, x=x: x + y
>>> x = 20
>>> b = lambda y, x=x: x + y
>>> a(10)
20
>>> b(10)
30
>>>
The problem addressed in this recipe is something that tends to come up in code that tries to be just a little bit too clever with the use of lambda functions. For example, creating a list of lambda expressions using a list comprehension or in a loop of some kind and expecting the lambda functions to remember the iteration variable at the time of definition. For example:
>>> funcs = [lambda x: x + n for n in range(5)]
>>> for f in funcs:
...     print(f(0))
...
4
4
4
4
4
>>>
Notice how all functions think that n has the last value during iteration. Now compare to the following:
>>> funcs = [lambda x, n=n: x + n for n in range(5)]
>>> for f in funcs:
...     print(f(0))
...
0
1
2
3
4
>>>
As you can see, the functions now capture the value of n at the time of definition.
You have a callable that you would like to use with some other Python code, possibly as a callback function or handler, but it takes too many arguments and causes an exception when called.
If you need to reduce the number of arguments to a function, you should use functools.partial(). The partial() function allows you to assign fixed values to one or more of the arguments, thus reducing the number of arguments that need to be supplied to subsequent calls. To illustrate, suppose you have this function:
def spam(a, b, c, d):
    print(a, b, c, d)
Now consider the use of partial() to fix certain argument values:
>>> from functools import partial
>>> s1 = partial(spam, 1)    # a = 1
>>> s1(2, 3, 4)
1 2 3 4
>>> s1(4, 5, 6)
1 4 5 6
>>> s2 = partial(spam, d=42)    # d = 42
>>> s2(1, 2, 3)
1 2 3 42
>>> s2(4, 5, 5)
4 5 5 42
>>> s3 = partial(spam, 1, 2, d=42)    # a = 1, b = 2, d = 42
>>> s3(3)
1 2 3 42
>>> s3(4)
1 2 4 42
>>> s3(5)
1 2 5 42
>>>
Observe that partial() fixes the values for certain arguments and returns a new callable as a result. This new callable accepts the still unassigned arguments, combines them with the arguments given to partial(), and passes everything to the original function.
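As an aside, the fixed values remain inspectable on the resulting partial object through its func, args, and keywords attributes, which can be handy for debugging:

>>> s3.func
<function spam at 0x...>
>>> s3.args
(1, 2)
>>> s3.keywords
{'d': 42}
>>>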
This recipe is really related to the problem of making seemingly incompatible bits of code work together. A series of examples will help illustrate.
As a first example, suppose you have a list of points represented as tuples of (x,y) coordinates. You could use the following function to compute the distance between two points:
points = [(1, 2), (3, 4), (5, 6), (7, 8)]

import math
def distance(p1, p2):
    x1, y1 = p1
    x2, y2 = p2
    return math.hypot(x2 - x1, y2 - y1)
Now suppose you want to sort all of the points according to their distance from some other point. The sort() method of lists accepts a key argument that can be used to customize sorting, but it only works with functions that take a single argument (thus, distance() is not suitable). Here’s how you might use partial() to fix it:
>>> pt = (4, 3)
>>> points.sort(key=partial(distance, pt))
>>> points
[(3, 4), (1, 2), (5, 6), (7, 8)]
>>>
As an extension of this idea, partial() can often be used to tweak the argument signatures of callback functions used in other libraries. For example, here’s a bit of code that uses multiprocessing to asynchronously compute a result which is handed to a callback function that accepts both the result and an optional logging argument:
def output_result(result, log=None):
    if log is not None:
        log.debug('Got: %r', result)

# A sample function
def add(x, y):
    return x + y

if __name__ == '__main__':
    import logging
    from multiprocessing import Pool
    from functools import partial

    logging.basicConfig(level=logging.DEBUG)
    log = logging.getLogger('test')

    p = Pool()
    p.apply_async(add, (3, 4), callback=partial(output_result, log=log))
    p.close()
    p.join()
When supplying the callback function using apply_async(), the extra logging argument is given using partial(). multiprocessing is none the wiser about all of this—it simply invokes the callback function with a single value.
As a similar example, consider the problem of writing network servers. The socketserver module makes it relatively easy. For example, here is a simple echo server:
from socketserver import StreamRequestHandler, TCPServer

class EchoHandler(StreamRequestHandler):
    def handle(self):
        for line in self.rfile:
            self.wfile.write(b'GOT:' + line)

serv = TCPServer(('', 15000), EchoHandler)
serv.serve_forever()
However, suppose you want to give the EchoHandler class an __init__() method that accepts an additional configuration argument. For example:
class EchoHandler(StreamRequestHandler):
    # ack is added keyword-only argument. *args, **kwargs are
    # any normal parameters supplied (which are passed on)
    def __init__(self, *args, ack, **kwargs):
        self.ack = ack
        super().__init__(*args, **kwargs)

    def handle(self):
        for line in self.rfile:
            self.wfile.write(self.ack + line)
If you make this change, you’ll find there is no longer an obvious way to plug it into the TCPServer class. In fact, you’ll find that the code now starts generating exceptions like this:
Exception happened during processing of request from ('127.0.0.1', 59834)
Traceback (most recent call last):
    ...
TypeError: __init__() missing 1 required keyword-only argument: 'ack'
At first glance, it seems impossible to fix this code, short of modifying the source code to socketserver or coming up with some kind of weird workaround. However, it’s easy to resolve using partial()—just use it to supply the value of the ack argument, like this:
from functools import partial
serv = TCPServer(('', 15000), partial(EchoHandler, ack=b'RECEIVED:'))
serv.serve_forever()
In this example, the specification of the ack argument in the __init__() method might look a little funny, but it’s being specified as a keyword-only argument. This is discussed further in Recipe 7.2.
The functionality of partial() is sometimes replaced with a lambda expression. For example, the previous examples might use statements such as this:
points.sort(key=lambda p: distance(pt, p))

p.apply_async(add, (3, 4), callback=lambda result: output_result(result, log))

serv = TCPServer(('', 15000),
                 lambda *args, **kwargs: EchoHandler(*args, ack=b'RECEIVED:', **kwargs))
This code works, but it’s more verbose and potentially a lot more confusing to someone reading it. Using partial() is a bit more explicit about your intentions (supplying values for some of the arguments).
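One subtle behavioral difference is worth knowing: a lambda captures the surrounding variable itself (looked up when the lambda runs), while partial() captures the value handed to it. Reusing distance() and pt from the earlier example:

>>> pt = (4, 3)
>>> by_lambda = lambda p: distance(pt, p)
>>> by_partial = partial(distance, pt)
>>> pt = (0, 0)          # Rebind pt
>>> by_lambda((3, 4))    # Uses the rebound pt
5.0
>>> by_partial((3, 4))   # Still uses (4, 3)
1.4142135623730951
>>>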
You have a class that only defines a single method besides __init__(). However, to simplify your code, you would much rather just have a simple function.
In many cases, single-method classes can be turned into functions using closures. Consider, as an example, the following class, which allows a user to fetch URLs using a kind of templating scheme.
from urllib.request import urlopen

class UrlTemplate:
    def __init__(self, template):
        self.template = template
    def open(self, **kwargs):
        return urlopen(self.template.format_map(kwargs))

# Example use. Download stock data from yahoo
yahoo = UrlTemplate('http://finance.yahoo.com/d/quotes.csv?s={names}&f={fields}')
for line in yahoo.open(names='IBM,AAPL,FB', fields='sl1c1v'):
    print(line.decode('utf-8'))
The class could be replaced with a much simpler function:
def urltemplate(template):
    def opener(**kwargs):
        return urlopen(template.format_map(kwargs))
    return opener

# Example use
yahoo = urltemplate('http://finance.yahoo.com/d/quotes.csv?s={names}&f={fields}')
for line in yahoo(names='IBM,AAPL,FB', fields='sl1c1v'):
    print(line.decode('utf-8'))
In many cases, the only reason you might have a single-method class is to store additional state for use in the method. For example, the only purpose of the UrlTemplate class is to hold the template value someplace so that it can be used in the open() method.
Using an inner function or closure, as shown in the solution, is often more elegant. Simply stated, a closure is just a function, but with an extra environment of the variables that are used inside the function. A key feature of a closure is that it remembers the environment in which it was defined. Thus, in the solution, the opener() function remembers the value of the template argument, and uses it in subsequent calls.
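You can even peek at the captured environment directly, since a closure records its free variables in __closure__ cells (addresses elided here):

>>> yahoo.__closure__
(<cell at 0x...: str object at 0x...>,)
>>> yahoo.__closure__[0].cell_contents
'http://finance.yahoo.com/d/quotes.csv?s={names}&f={fields}'
>>>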
Whenever you’re writing code and you encounter the problem of attaching additional state to a function, think closures. They are often a more minimal and elegant solution than the alternative of turning your function into a full-fledged class.
You’re writing code that relies on the use of callback functions (e.g., event handlers, completion callbacks, etc.), but you want to have the callback function carry extra state for use inside the callback function.
This recipe pertains to the use of callback functions that are found in many libraries and frameworks—especially those related to asynchronous processing. To illustrate and for the purposes of testing, define the following function, which invokes a callback:
def apply_async(func, args, *, callback):
    # Compute the result
    result = func(*args)

    # Invoke the callback with the result
    callback(result)
In reality, such code might do all sorts of advanced processing involving threads, processes, and timers, but that’s not the main focus here. Instead, we’re simply focused on the invocation of the callback. Here’s an example that shows how the preceding code gets used:
>>> def print_result(result):
...     print('Got:', result)
...
>>> def add(x, y):
...     return x + y
...
>>> apply_async(add, (2, 3), callback=print_result)
Got: 5
>>> apply_async(add, ('hello', 'world'), callback=print_result)
Got: helloworld
>>>
As you will notice, the print_result() function only accepts a single argument, which is the result. No other information is passed in. This lack of information can sometimes present problems when you want the callback to interact with other variables or parts of the environment.
One way to carry extra information in a callback is to use a bound-method instead of a simple function. For example, this class keeps an internal sequence number that is incremented every time a result is received:
class ResultHandler:
    def __init__(self):
        self.sequence = 0
    def handler(self, result):
        self.sequence += 1
        print('[{}] Got: {}'.format(self.sequence, result))
To use this class, you would create an instance and use the bound method handler as the callback:
>>> r = ResultHandler()
>>> apply_async(add, (2, 3), callback=r.handler)
[1] Got: 5
>>> apply_async(add, ('hello', 'world'), callback=r.handler)
[2] Got: helloworld
>>>
As an alternative to a class, you can also use a closure to capture state. For example:
def make_handler():
    sequence = 0
    def handler(result):
        nonlocal sequence
        sequence += 1
        print('[{}] Got: {}'.format(sequence, result))
    return handler
Here is an example of this variant:
>>> handler = make_handler()
>>> apply_async(add, (2, 3), callback=handler)
[1] Got: 5
>>> apply_async(add, ('hello', 'world'), callback=handler)
[2] Got: helloworld
>>>
As yet another variation on this theme, you can sometimes use a coroutine to accomplish the same thing:
def make_handler():
    sequence = 0
    while True:
        result = yield
        sequence += 1
        print('[{}] Got: {}'.format(sequence, result))
For a coroutine, you would use its send() method as the callback, like this:
>>> handler = make_handler()
>>> next(handler)    # Advance to the yield
>>> apply_async(add, (2, 3), callback=handler.send)
[1] Got: 5
>>> apply_async(add, ('hello', 'world'), callback=handler.send)
[2] Got: helloworld
>>>
Last, but not least, you can also carry state into a callback using an extra argument and partial function application. For example:
>>> class SequenceNo:
...     def __init__(self):
...         self.sequence = 0
...
>>> def handler(result, seq):
...     seq.sequence += 1
...     print('[{}] Got: {}'.format(seq.sequence, result))
...
>>> seq = SequenceNo()
>>> from functools import partial
>>> apply_async(add, (2, 3), callback=partial(handler, seq=seq))
[1] Got: 5
>>> apply_async(add, ('hello', 'world'), callback=partial(handler, seq=seq))
[2] Got: helloworld
>>>
Software based on callback functions often runs the risk of turning into a huge tangled mess. Part of the issue is that the callback function is often disconnected from the code that made the initial request leading to callback execution. Thus, the execution environment between making the request and handling the result is effectively lost. If you want the callback function to continue with a procedure involving multiple steps, you have to figure out how to save and restore the associated state.
There are really two main approaches that are useful for capturing and carrying state. You can carry it around on an instance (attached to a bound method perhaps) or you can carry it around in a closure (an inner function). Of the two techniques, closures are perhaps a bit more lightweight and natural in that they are simply built from functions. They also automatically capture all of the variables being used. Thus, it frees you from having to worry about the exact state that needs to be stored (it’s determined automatically from your code).
If using closures, you need to pay careful attention to mutable variables. In the solution, the nonlocal declaration is used to indicate that the sequence variable is being modified from within the callback. Without this declaration, you’ll get an error.
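Specifically, without nonlocal, the += statement would make sequence a local variable of handler(), and the very first call would fail before any increment takes place:

def make_handler():
    sequence = 0
    def handler(result):
        sequence += 1    # UnboundLocalError without a nonlocal declaration
        print('[{}] Got: {}'.format(sequence, result))
    return handler

make_handler()('hello')  # Raises UnboundLocalError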
The use of a coroutine as a callback handler is interesting in that it is closely related to the closure approach. In some sense, it’s even cleaner, since there is just a single function. Moreover, variables can be freely modified without worrying about nonlocal declarations. The potential downside is that coroutines don’t tend to be as well understood as other parts of Python. There are also a few tricky bits such as the need to call next() on a coroutine prior to using it. That’s something that could be easy to forget in practice. Nevertheless, coroutines have other potential uses here, such as the definition of an inlined callback (covered in the next recipe).
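If the forgotten next() call worries you, a common idiom (sketched here; it is not part of the solution above) is a small decorator that primes the coroutine automatically:

from functools import wraps

def coroutine(func):
    'Start a coroutine by advancing it to its first yield'
    @wraps(func)
    def primer(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return primer

@coroutine
def make_handler():
    sequence = 0
    while True:
        result = yield
        sequence += 1
        print('[{}] Got: {}'.format(sequence, result))

handler = make_handler()    # Already advanced; handler.send() works immediately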
The last technique involving partial() is useful if all you need to do is pass extra values into a callback. Instead of using partial(), you’ll sometimes see the same thing accomplished with the use of a lambda:
>>> apply_async(add, (2, 3), callback=lambda r: handler(r, seq))
[1] Got: 5
>>>
For more examples, see Recipe 7.8, which shows how to use partial() to change argument signatures.
You’re writing code that uses callback functions, but you’re concerned about the proliferation of small functions and mind-boggling control flow. You would like some way to make the code look more like a normal sequence of procedural steps.
Callback functions can be inlined into a function using generators and coroutines. To illustrate, suppose you have a function that performs work and invokes a callback as follows (see Recipe 7.10):
def apply_async(func, args, *, callback):
    # Compute the result
    result = func(*args)

    # Invoke the callback with the result
    callback(result)
Now take a look at the following supporting code, which involves an Async class and an inlined_async decorator:
from queue import Queue
from functools import wraps

class Async:
    def __init__(self, func, args):
        self.func = func
        self.args = args

def inlined_async(func):
    @wraps(func)
    def wrapper(*args):
        f = func(*args)
        result_queue = Queue()
        result_queue.put(None)
        while True:
            result = result_queue.get()
            try:
                a = f.send(result)
                apply_async(a.func, a.args, callback=result_queue.put)
            except StopIteration:
                break
    return wrapper
These two fragments of code will allow you to inline the callback steps using yield statements. For example:
def add(x, y):
    return x + y

@inlined_async
def test():
    r = yield Async(add, (2, 3))
    print(r)
    r = yield Async(add, ('hello', 'world'))
    print(r)
    for n in range(10):
        r = yield Async(add, (n, n))
        print(r)
    print('Goodbye')
If you call test(), you’ll get output like this:
5
helloworld
0
2
4
6
8
10
12
14
16
18
Goodbye
Aside from the special decorator and use of yield, you will notice that no callback functions appear anywhere (except behind the scenes).
This recipe will really test your knowledge of callback functions, generators, and control flow.
First, in code involving callbacks, the whole point is that the current calculation will suspend and resume at some later point in time (e.g., asynchronously). When the calculation resumes, the callback will get executed to continue the processing. The apply_async() function illustrates the essential parts of executing the callback, although in reality it might be much more complicated (involving threads, processes, event handlers, etc.).
The idea that a calculation will suspend and resume naturally maps to the execution model of a generator function. Specifically, the yield operation makes a generator function emit a value and suspend. Subsequent calls to the __next__() or send() method of a generator will make it start again.
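A minimal interactive demonstration of this suspend/resume machinery:

>>> def step():
...     while True:
...         received = yield 'suspended'
...         print('resumed with', received)
...
>>> g = step()
>>> next(g)       # Run to the first yield
'suspended'
>>> g.send(42)    # Resume; 42 becomes the value of the yield expression
resumed with 42
'suspended'
>>>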
With this in mind, the core of this recipe is found in the inlined_async() decorator function. The key idea is that the decorator will step the generator function through all of its yield statements, one at a time. To do this, a result queue is created and initially populated with a value of None. A loop is then initiated in which a result is popped off the queue and sent into the generator. This advances to the next yield, at which point an instance of Async is received. The loop then looks at the function and arguments, and initiates the asynchronous calculation apply_async(). However, the sneakiest part of this calculation is that instead of using a normal callback function, the callback is set to the queue put() method.
At this point, it is left somewhat open as to precisely what happens. The main loop immediately goes back to the top and simply executes a get() operation on the queue. If data is present, it must be the result placed there by the put() callback. If nothing is there, the operation blocks, waiting for a result to arrive at some future time. How that might happen depends on the precise implementation of the apply_async() function.
If you’re doubtful that anything this crazy would work, you can try it with the multiprocessing library and have async operations executed in separate processes:
if __name__ == '__main__':
    import multiprocessing
    pool = multiprocessing.Pool()
    apply_async = pool.apply_async

    # Run the test function
    test()
Indeed, you’ll find that it works, but unraveling the control flow might require more coffee.
Hiding tricky control flow behind generator functions is found elsewhere in the standard library and third-party packages. For example, the @contextmanager decorator in the contextlib module performs a similar mind-bending trick that glues the entry and exit from a context manager together across a yield statement. The popular Twisted package has inlined callbacks that are also similar.
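For instance, with @contextmanager, everything before the yield runs on entry to the with block and everything after it runs on exit, so one generator function stands in for an entire context-manager class. A small example (the timed() helper is illustrative, not from the recipe):

from contextlib import contextmanager
import time

@contextmanager
def timed(label):
    start = time.time()
    try:
        yield                # The body of the with block runs here
    finally:
        print('{}: {:.3f}s'.format(label, time.time() - start))

with timed('counting'):
    sum(range(1000000))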
You would like to extend a closure with functions that allow the inner variables to be accessed and modified.
Normally, the inner variables of a closure are completely hidden to the outside world. However, you can provide access by writing accessor functions and attaching them to the closure as function attributes. For example:
def sample():
    n = 0
    # Closure function
    def func():
        print('n=', n)

    # Accessor methods for n
    def get_n():
        return n

    def set_n(value):
        nonlocal n
        n = value

    # Attach as function attributes
    func.get_n = get_n
    func.set_n = set_n
    return func
Here is an example of using this code:
>>> f = sample()
>>> f()
n= 0
>>> f.set_n(10)
>>> f()
n= 10
>>> f.get_n()
10
>>>
There are two main features that make this recipe work. First, nonlocal declarations make it possible to write functions that change inner variables. Second, function attributes allow the accessor methods to be attached to the closure function in a straightforward manner where they work a lot like instance methods (even though no class is involved).
A slight extension to this recipe can be made to have closures emulate instances of a class. All you need to do is copy the inner functions over to the dictionary of an instance and return it. For example:
import sys

class ClosureInstance:
    def __init__(self, locals=None):
        if locals is None:
            locals = sys._getframe(1).f_locals

        # Update instance dictionary with callables
        self.__dict__.update((key, value) for key, value in locals.items()
                             if callable(value))

    # Redirect special methods
    def __len__(self):
        return self.__dict__['__len__']()

# Example use
def Stack():
    items = []
    def push(item):
        items.append(item)

    def pop():
        return items.pop()

    def __len__():
        return len(items)

    return ClosureInstance()
Here’s an interactive session to show that it actually works:
>>> s = Stack()
>>> s
<__main__.ClosureInstance object at 0x10069ed10>
>>> s.push(10)
>>> s.push(20)
>>> s.push('Hello')
>>> len(s)
3
>>> s.pop()
'Hello'
>>> s.pop()
20
>>> s.pop()
10
>>>
Interestingly, this code runs a bit faster than using a normal class definition. For example, you might be inclined to test the performance against a class like this:
class Stack2:
    def __init__(self):
        self.items = []
    def push(self, item):
        self.items.append(item)
    def pop(self):
        return self.items.pop()
    def __len__(self):
        return len(self.items)
If you do, you’ll get results similar to the following:
>>> from timeit import timeit
>>> # Test involving closures
>>> s = Stack()
>>> timeit('s.push(1);s.pop()', 'from __main__ import s')
0.9874754269840196
>>> # Test involving a class
>>> s = Stack2()
>>> timeit('s.push(1);s.pop()', 'from __main__ import s')
1.0707052160287276
>>>
As shown, the closure version runs about 8% faster. Most of that is coming from streamlined access to the instance variables. Closures are faster because there’s no extra self variable involved. Raymond Hettinger has devised an even more diabolical variant of this idea.
However, should you be inclined to do something like this in your code, be aware that it’s still a rather weird substitute for a real class. For example, major features such as inheritance, properties, descriptors, or class methods don’t work. You also have to play some tricks to get special methods to work (e.g., see the implementation of __len__() in ClosureInstance).
Lastly, you’ll run the risk of confusing people who read your code and wonder why it doesn’t look anything like a normal class definition (of course, they’ll also wonder why it’s faster). Nevertheless, it’s an interesting example of what can be done by providing access to the internals of a closure.
In the big picture, adding methods to closures might have more utility in settings where you want to do things like reset the internal state, flush buffers, clear caches, or have some kind of feedback mechanism.