9. A tour of std libs

Text processing:
 string, re, difflib, textwrap, unicodedata, stringprep, readline, rlcompleter
Binary data services:
 struct, codecs
Data types:
 datetime, calendar, collections, collections.abc, heapq, bisect, array, weakref, types, copy, pprint, reprlib, enum
Numeric and Math:
 numbers, math, cmath, decimal, fractions, random, statistics
Functional programming:
 itertools, functools, operator
File and directory access:
 pathlib, os.path, fileinput, stat, filecmp, tempfile, glob, fnmatch, linecache, shutil, macpath
Data persistence:
 pickle, copyreg, shelve, marshal, dbm, sqlite3
Data compression:
 zlib, gzip, bz2, lzma, zipfile, tarfile
File formats:
 csv, configparser, netrc, xdrlib, plistlib
Cryptographic:
 hashlib, hmac
OS:
 os, io, time, argparse, getopt, logging, getpass, curses, platform, errno, ctypes
Concurrent:
 threading, multiprocessing, concurrent.futures, subprocess, sched, queue, dummy_threading
Networking:
 socket, ssl, select, selectors, asyncio, asyncore, asynchat, signal, mmap
Internet:
 email, json, mailcap, mailbox, mimetypes, base64, binhex, binascii, quopri, uu
Structured markup:
 html, xml.etree, xml.dom, xml.sax
Internet protocols:
 webbrowser, cgi, wsgiref, urllib.request, urllib.response, urllib.parse, urllib.error, urllib.robotparser, http, ftplib, poplib, imaplib, nntplib, smtplib, smtpd, telnetlib, uuid, socketserver, http.server, http.cookies, http.cookiejar, xmlrpc, ipaddress
Multimedia:
 audioop, aifc, sunau, wave, chunk, colorsys, imghdr, sndhdr, ossaudiodev
I18n:
 gettext, locale
Program frameworks:
 turtle, cmd, shlex
GUI:
 tkinter
Development tools:
 pydoc, doctest, unittest, unittest.mock, test
Debugging and profiling:
 bdb, faulthandler, pdb, profile, timeit, trace, tracemalloc
Packaging and distribution:
 distutils, ensurepip, venv
Runtime:
 sys, sysconfig, builtins, __main__, warnings, contextlib, abc, atexit, traceback, __future__, gc, inspect, site, fpectl
Custom python interpreters:
 code, codeop
Importing:
 zipimport, pkgutil, modulefinder, runpy, importlib
Language:
 parser, ast, symtable, symbol, token, keyword, tokenize, tabnanny, pyclbr, py_compile, compileall, dis, pickletools
Misc:
 formatter
MS:
 msilib, msvcrt, winreg, winsound
Unix:
 posix, pwd, spwd, grp, crypt, termios, tty, pty, fcntl, pipes, resource, nis, syslog
Superseded:
 optparse, imp

Where can you find these standard libraries?

sys.path

/usr/lib/python2.7 /usr/lib/python3.4
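
The directories above are only examples; the actual location depends on the installation. A minimal sketch of asking the running interpreter where its standard library lives, assuming a regular (non-frozen, non-zipped) installation:

```python
import os
import sys
import sysconfig

# Directory that holds the pure-Python standard library modules.
stdlib_dir = sysconfig.get_paths()['stdlib']
print(stdlib_dir)

# On a regular installation the os module's source file sits in that directory.
has_os_module = os.path.exists(os.path.join(stdlib_dir, 'os.py'))
print(has_os_module)

# sys.path is the ordered list of directories searched on import.
for entry in sys.path:
    print(entry)
```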

9.1. datetime - Basic date and time types

There are two kinds of date and time objects: “naive” and “aware”.

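An aware object has sufficient knowledge of applicable algorithmic and political time adjustments, such as time zone and daylight saving time information, to locate itself relative to other aware objects. A naive object does not contain enough information to unambiguously locate itself relative to other date/time objects.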

Available types

  • datetime.date
  • datetime.time
  • datetime.datetime
  • datetime.timedelta
  • datetime.tzinfo: An abstract base class for time zone information objects.
  • datetime.timezone: A class that implements the tzinfo abstract base class as a fixed offset from UTC.

Objects of these types are immutable.

Objects of the date type are always naive.

Naive:

>>> import datetime
>>> now = datetime.datetime.now()                   # Date and Time
>>> now
datetime.datetime(2014, 10, 21, 7, 32, 44, 964045)
>>> now.date()
datetime.date(2014, 10, 21)
>>> now.time()
datetime.time(7, 32, 44, 964045)

>>> day = datetime.date(2012, 12, 12)               # Date
>>> day.year, day.month, day.day, day.isoweekday(), day.isoformat()
(2012, 12, 12, 3, '2012-12-12')
>>> tm = datetime.time(19, 30, 00)                  # Time
>>> tm.hour, tm.minute, tm.second, tm.isoformat()
(19, 30, 0, '19:30:00')

>>> past = datetime.datetime.now() - now            # Time delta
>>> past, str(past), past.total_seconds()
(datetime.timedelta(0, 615, 431954), '0:10:15.431954', 615.431954)

>>> day.strftime('%b %d %Y %a')                     # Format
'Dec 12 2012 Wed'
>>> timestr = '[Sun Oct 19 08:10:01 2014]'          # Parse
>>> datetime.datetime.strptime(timestr, '[%a %b %d %H:%M:%S %Y]')
datetime.datetime(2014, 10, 19, 8, 10, 1)

Aware:

>>> from datetime import timedelta, timezone
>>> beijing = timezone(timedelta(hours=8), 'Asia/Shanghai')      # fixed offset, no DST rules
>>> finland = timezone(timedelta(hours=2), 'Europe/Helsinki')

>>> t1 = datetime.datetime(2014, 10, 6, 15, 0, 0, tzinfo=beijing)
>>> str(t1), t1.tzname()
('2014-10-06 15:00:00+08:00', 'Asia/Shanghai')

>>> t2 = t1.astimezone(finland)
>>> str(t2), t2.tzname()
('2014-10-06 09:00:00+02:00', 'Europe/Helsinki')
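
The difference between naive and aware objects can be checked in code. The following is a minimal sketch; the helper name is_aware is an assumption made for illustration and is not part of the datetime module.

```python
from datetime import datetime, timedelta, timezone

def is_aware(dt):
    """Hypothetical helper: True if dt carries usable time zone information."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None

naive = datetime(2014, 10, 6, 15, 0, 0)
aware = datetime(2014, 10, 6, 15, 0, 0, tzinfo=timezone(timedelta(hours=8)))

# Converting an aware value to UTC keeps the same instant in time.
utc = aware.astimezone(timezone.utc)

# Ordering a naive value against an aware one is an error.
try:
    naive < aware
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False
```

Keeping all values aware (for example in UTC) and converting only for display avoids mixing the two kinds by accident.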

9.2. collections - Container datatypes

namedtuple()   factory function for creating tuple subclasses with named fields
deque          list-like container with fast appends and pops on either end
ChainMap       dict-like class for creating a single view of multiple mappings
Counter        dict subclass for counting hashable objects
OrderedDict    dict subclass that remembers the order entries were added
defaultdict    dict subclass that calls a factory function to supply missing values

namedtuple():

>>> from collections import namedtuple
>>> Person = namedtuple('Person', ['name', 'age', 'gender'])
>>> bob = Person('Bob', 30, 'male')
>>> jane = Person(name='Jane', gender='female', age=29)
>>> bob, bob[2]
(Person(name='Bob', age=30, gender='male'), 'male')
>>> type(jane), jane.age
(<class '__main__.Person'>, 29)

>>> bob._asdict()
OrderedDict([('name', 'Bob'), ('age', 30), ('gender', 'male')])
>>> bob._replace(name='Tom', age=52)
Person(name='Tom', age=52, gender='male')
>>> class Person(namedtuple('Person', ['name', 'age', 'gender'])):
...   __slots__ = ()
...   @property
...   def lastname(self):
...     return self.name.split()[-1]
...
>>> john = Person('John Lennon', 75, 'male')
>>> john.lastname
'Lennon'

9.3. deque: double-ended queue

Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.

Operation        list            Big O   deque            Big O
Add at the head  l.insert(0, x)  O(n)    d.appendleft(x)  O(1)
Add at the tail  l.append(x)     O(1)    d.append(x)      O(1)
Del at the head  l.pop(0)        O(n)    d.popleft()      O(1)
Del at the tail  l.pop()         O(1)    d.pop()          O(1)

import time

def timing(initial, setup, testing, times=3):
    """Run `testing` several times; print each wall-clock cost and the average."""
    print('Testing the following code for {} times ...\n{}'.format(times, testing.strip()))
    namespace = {}
    exec(initial, namespace)    # one-off initialisation shared by all runs

    av = 0
    for i in range(times):
        exec(setup, namespace)  # reset state before each timed run

        begin = time.time()
        exec(testing, namespace)
        cost = time.time() - begin

        print('{}: {}'.format(i + 1, cost))
        av += cost
    print('av: {}\n'.format(av / times))

>>> timing('data = list(range(10**5))', 'l = []', '''
... for i in data:
...   l.insert(0, i)    # O(n)
... ''')
Testing the following code for 3 times ...
for i in data:
  l.insert(0, i)
1: 3.9300358295440674
2: 4.109051704406738
3: 4.1024134159088135
av: 4.04716698328654

$ python timing.py
Testing the following code for 3 times ...
for i in data:
  l.insert(0, i)        # O(n)
av: 4.171613295873006

Testing the following code for 3 times ...
for i in data:
  l.append(i)           # O(1)
av: 0.012801011403401693

Testing the following code for 3 times ...
for i in data:
  d.appendleft(i)       # O(1)
av: 0.014629840850830078

Testing the following code for 3 times ...
for i in data:
  d.append(i)           # O(1)
av: 0.014315048853556315

Testing the following code for 3 times ...
for _ in data:
  l.pop(0)              # O(n)
av: 1.6093259652455647

Testing the following code for 3 times ...
for _ in data:
  l.pop()               # O(1)
av: 0.014542102813720703

Testing the following code for 3 times ...
for _ in data:
  d.popleft()           # O(1)
av: 0.011040687561035156

Testing the following code for 3 times ...
for _ in data:
  d.pop()               # O(1)
av: 0.011482477188110352

See Time complexity
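
Beyond the timings, a short sketch of the deque operations themselves may help. The maxlen argument shown at the end is a standard deque feature that the text above does not cover; the variable names are illustrative only.

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)        # add at the head
d.append(4)            # add at the tail
first = d.popleft()    # remove from the head
last = d.pop()         # remove from the tail

# A bounded deque silently discards items from the opposite end when full,
# which makes it handy for "last N items" buffers.
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
```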

ChainMap

A ChainMap class is provided for quickly linking a number of mappings so they can be treated as a single unit.

Example of simulating Python’s internal lookup chain:

import builtins
from collections import ChainMap

pylookup = ChainMap(locals(), globals(), vars(builtins))
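
Another common use is layered configuration. The sketch below assumes three hypothetical dictionaries (command_line, environment, defaults); lookups search them from left to right, while writes only touch the first mapping.

```python
from collections import ChainMap

# Hypothetical configuration layers, highest priority first.
defaults = {'color': 'red', 'user': 'guest'}
environment = {'user': 'alice'}
command_line = {'color': 'blue'}

settings = ChainMap(command_line, environment, defaults)

color = settings['color']   # found in command_line
user = settings['user']     # not in command_line, found in environment

# Updates go to the first mapping only; the others stay untouched.
settings['debug'] = True
```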

Counter

A counter tool is provided to support convenient and rapid tallies. For example:

>>> from collections import Counter
>>> words = ['red', 'blue', 'red', 'green', 'blue', 'blue']
>>> cnt = Counter(words)
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
>>> cnt.most_common(2)
[('blue', 3), ('red', 2)]

OrderedDict

Ordered dictionaries are just like regular dictionaries but they remember the order that items were inserted. When iterating over an ordered dictionary, the items are returned in the order their keys were first added.

>>> from collections import OrderedDict
>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}

>>> # dictionary sorted by key
>>> o = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
>>> o
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

>>> assert list(o.keys()) == sorted(d.keys())

defaultdict

Dictionary with default value:

>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

This technique is simpler and faster than an equivalent technique using dict.setdefault():

>>> d = {}
>>> for k, v in s:
...     d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

9.4. heapq - Heap queue algorithm

>>> from heapq import heappush, heappop
>>> def heapsort(iterable):
...     'Equivalent to sorted(iterable)'
...     h = []
...     for value in iterable:
...         heappush(h, value)
...     return [heappop(h) for i in range(len(h))]
...
>>> heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

This module provides an implementation of the min heap queue algorithm, also known as the priority queue algorithm.
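
As a sketch of the priority-queue use mentioned above, entries can be pushed as (priority, item) tuples so that the smallest priority is popped first. The task names are made up for illustration.

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, 'write tests'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'update docs'))

# heappop() always returns the smallest entry, here the lowest priority number.
order = [heapq.heappop(tasks)[1] for _ in range(3)]

# nsmallest() picks the n smallest items without fully sorting the input.
smallest_two = heapq.nsmallest(2, [5, 1, 4, 2, 3])
```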

9.5. bisect - Array bisection algorithm

import bisect
import random

# Use a constant seed to ensure that we see
# the same pseudo-random numbers each time
# we run the loop.
random.seed(1)

# Generate 20 random numbers and
# insert them into a list in sorted
# order.
l = []
for i in range(20):
    r = random.randint(1, 100)
    position = bisect.bisect(l, r)
    bisect.insort(l, r)
    print('{:=2} {:=2} {}'.format(r, position, l))
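
Bisection is also useful for table lookups on sorted breakpoints. The following sketch maps a numeric score to a letter grade; the breakpoints and grade letters are illustrative assumptions.

```python
import bisect

def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    # bisect() returns the insertion point to the right of equal entries,
    # so a score of exactly 70 lands in the 70-79 bucket.
    i = bisect.bisect(breakpoints, score)
    return grades[i]

results = [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
```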

9.6. array - Efficient arrays of numeric values

array.array is useful when you need a homogeneous C array of data for reasons other than doing math.

from array import array

array('l')
array('u', 'hello \u2641')
array('l', [1, 2, 3, 4, 5])
array('d', [1.0, 2.0, 3.14])

See also: bytearray vs array
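
A small sketch of what "homogeneous" means in practice: every item shares one type code, values round-trip through raw bytes, and items of the wrong type are rejected. The variable names are illustrative.

```python
from array import array

values = array('d', [1.0, 2.0, 3.14])   # 'd' = C double
values.append(2.5)

# The contents can be exported as raw bytes and loaded back.
raw = values.tobytes()
restored = array('d')
restored.frombytes(raw)

# Unlike a list, an array refuses items that do not fit its type code.
try:
    array('l', [1, 2, 3]).append('four')
    rejected = False
except TypeError:
    rejected = True
```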

9.7. weakref - Weak references

A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else. However, until the object is actually destroyed the weak reference may return the object even if there are no strong references to it.

A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object not be kept alive solely because it appears in a cache or mapping.

Not all objects can be weakly referenced; those objects which can include class instances, functions written in Python (but not in C), instance methods, sets, frozensets, some file objects, generators, type objects, sockets, arrays, deques, regular expression pattern objects, and code objects.

Several built-in types such as list and dict do not directly support weak references but can add support through subclassing:

class Dict(dict):
    pass

obj = Dict(red=1, green=2, blue=3)   # this object is weak referenceable

Other built-in types such as tuple and int do not support weak references even when subclassed. (This is an implementation detail and may differ across Python implementations.)

Extension types can easily be made to support weak references; see Weak Reference Support.

weakref.ref

import weakref

class BigObject:
    def __del__(self):
        print('Deleting {}'.format(self))

o = BigObject()
r = weakref.ref(o)

print('obj: {}'.format(o))
print('ref: {}'.format(r))
print('r(): {}'.format(r()))

del o
print('r(): {}'.format(r()))

obj: <__main__.BigObject instance at 0x1036f43f8>
ref: <weakref at 0x1036e5c58; to 'instance' at 0x1036f43f8>
r(): <__main__.BigObject instance at 0x1036f43f8>
Deleting <__main__.BigObject instance at 0x1036f43f8>
r(): None

...
def callback(ref):
    print('Callback {}'.format(ref))
...
r = weakref.ref(o, callback)
...

obj: <__main__.BigObject instance at 0x10237a4d0>
ref: <weakref at 0x10236bc58; to 'instance' at 0x10237a4d0>
r(): <__main__.BigObject instance at 0x10237a4d0>
Callback <weakref at 0x10236bc58; dead>
Deleting <__main__.BigObject instance at 0x10237a4d0>
r(): None

weakref.proxy

p = weakref.proxy(o)
try:
  p.attr
except ReferenceError:
  ...

weakref.WeakValueDictionary #TODO
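
A minimal sketch of the cache use case described earlier, assuming CPython, where an object is destroyed as soon as its last strong reference goes away. The BigObject class and the 'report' key are illustrative.

```python
import gc
import weakref

class BigObject:
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()

obj = BigObject('report')
cache['report'] = obj            # the cache holds only a weak reference
hit = cache.get('report') is obj

del obj                          # drop the only strong reference
gc.collect()                     # not needed on CPython, harmless elsewhere
after = cache.get('report')      # the entry disappeared along with the object
remaining = len(cache)
```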

9.8. types - Dynamic type creation and names for built-in types

>>> import types
>>> type(lambda : ...) is types.FunctionType
True

9.9. copy - Shallow and deep copy operations

>>> import copy
>>> class Object: pass
...
>>> l1 = [1, [2, Object()]]
>>> l2 = l1
>>> l3 = copy.copy(l1)
>>> l4 = copy.deepcopy(l1)

>>> l3[0] = 3
>>> l3[1][0] = 4

>>> l1
[1, [4, <__main__.Object object at 0x107d2a278>]]
>>> l2
[1, [4, <__main__.Object object at 0x107d2a278>]]
>>> l3
[3, [4, <__main__.Object object at 0x107d2a278>]]
>>> l4
[1, [2, <__main__.Object object at 0x107d2a978>]]

9.10. os.path - Common pathname manipulations

>>> import os.path
>>> os.path.sep, os.path.extsep, os.path.pardir, os.path.curdir
('/', '.', '..', '.')

>>> os.path.dirname('/one/two/three'), os.path.basename('/one/two/three')
('/one/two', 'three')
>>> os.path.join('one', 'two', 'three')
'one/two/three'
>>> os.path.splitext('/path/file.ext')
('/path/file', '.ext')

>>> os.path.expanduser('~/file.txt')
'/Users/huanghao/file.txt'            # Mac

>>> os.getcwd()
'/tmp'
>>> os.path.abspath('file.txt')
'/tmp/file.txt'
>>> os.path.realpath('file.txt')
'/tmp/file.txt'

>>> os.path.isdir('/tmp'), os.path.isfile('/etc/hosts'), os.path.islink('/var'), os.path.exists('/dev'), os.path.ismount('/dev')
(True, True, True, True, True)

9.11. tempfile - Generate temporary files and directories

>>> import tempfile

# create a temporary file and write some data to it
>>> fp = tempfile.TemporaryFile()
>>> fp.write(b'Hello world!')
# read data from file
>>> fp.seek(0)
>>> fp.read()
b'Hello world!'
# close the file, it will be removed
>>> fp.close()

# create a temporary file using a context manager
>>> with tempfile.TemporaryFile() as fp:
...     fp.write(b'Hello world!')
...     fp.seek(0)
...     fp.read()
b'Hello world!'
>>>
# file is now closed and removed

# create a temporary directory using the context manager
>>> with tempfile.TemporaryDirectory() as tmpdirname:
...     print('created temporary directory', tmpdirname)
>>>
# directory and contents have been removed

9.12. glob - Unix style pathname pattern expansion

>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']

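Files whose names start with a dot are not matched by wildcards by default. For example, in a directory containing only card.gif and .card.gif:

>>> glob.glob('*.gif')
['card.gif']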
>>> glob.glob('.c*')
['.card.gif']

9.13. shutil - High-level file operations

  • copyfileobj
  • copyfile
  • copymode
  • copystat
  • copy
  • copy2: Identical to copy() except that copy2() also attempts to preserve all file metadata.
  • copytree
  • rmtree
  • move
  • disk_usage
  • chown
  • which
  • make_archive
  • unpack_archive
  • get_terminal_size

>>> import os
>>> import shutil
>>> shutil.disk_usage(os.path.expanduser('~'))
usage(total=120473067520, used=51554127872, free=68656795648)

>>> shutil.get_terminal_size()
os.terminal_size(columns=130, lines=34)

>>> shutil.which('python3')
'/usr/local/bin/python3'

>>> archive_name = os.path.expanduser(os.path.join('~', 'myarchive'))
>>> root_dir = os.path.expanduser(os.path.join('~', '.ssh'))
>>> shutil.make_archive(archive_name, 'gztar', root_dir)
'/Users/tarek/myarchive.tar.gz'

The resulting archive contains:

$ tar -tzvf /Users/tarek/myarchive.tar.gz
drwx------ tarek/staff       0 2010-02-01 16:23:40 ./
-rw-r--r-- tarek/staff     609 2008-06-09 13:26:54 ./authorized_keys
-rwxr-xr-x tarek/staff      65 2008-06-09 13:26:54 ./config
-rwx------ tarek/staff     668 2008-06-09 13:26:54 ./id_dsa
-rwxr-xr-x tarek/staff     609 2008-06-09 13:26:54 ./id_dsa.pub
-rw------- tarek/staff    1675 2008-06-09 13:26:54 ./id_rsa
-rw-r--r-- tarek/staff     397 2008-06-09 13:26:54 ./id_rsa.pub
-rw-r--r-- tarek/staff   37192 2010-02-06 18:23:10 ./known_hosts
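
The copy and remove functions from the list above can be tried safely inside a temporary directory. This is a sketch; the file and directory names are made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    src_dir = os.path.join(workdir, 'src')
    os.mkdir(src_dir)
    src_file = os.path.join(src_dir, 'notes.txt')
    with open(src_file, 'w') as f:
        f.write('hello')

    # copy2() copies the contents and tries to preserve metadata such as mtime.
    copied_file = shutil.copy2(src_file, os.path.join(workdir, 'notes.bak'))

    # copytree() copies a directory recursively; the target must not exist yet.
    backup_dir = shutil.copytree(src_dir, os.path.join(workdir, 'backup'))
    backup_names = sorted(os.listdir(backup_dir))

    # rmtree() removes a directory and everything below it.
    shutil.rmtree(src_dir)
    src_exists_after = os.path.exists(src_dir)

    with open(copied_file) as f:
        copied_text = f.read()
```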

9.14. netrc - netrc file processing

$ cat ~/.netrc
default login huanghao password 123456
machine company.com login hh password xxx

>>> import netrc
>>> import os
>>> rc = netrc.netrc(os.path.expanduser('~/.netrc'))

>>> rc.hosts
{'default': ('huanghao', None, '123456'), 'company.com': ('hh', None, 'xxx')}

>>> rc.authenticators('company.com')
('hh', None, 'xxx')

>>> rc.authenticators('home.me')
('huanghao', None, '123456')

See also Manual netrc

9.15. hashlib - Secure hashes and message digests

>>> import hashlib
>>> m = hashlib.md5()
>>> m.update(b"Nobody inspects")
>>> m.update(b" the spammish repetition")
>>> m.digest()
b'\xbbd\x9c\x83\xdd\x1e\xa5\xc9\xd9\xde\xc9\xa1\x8d\xf0\xff\xe9'
>>> m.digest_size
16
>>> m.block_size
64

>>> hashlib.sha224(b"Nobody inspects the spammish repetition").hexdigest()
'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'
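
The update() calls shown above also make it possible to hash a large file without loading it into memory. The sketch below uses SHA-256 rather than MD5, since MD5 is no longer considered collision-resistant; the helper name sha256_of_file and the file name are assumptions for illustration.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hypothetical helper: hash a file incrementally, one chunk at a time."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

data = b'Nobody inspects the spammish repetition' * 1000
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'payload.bin')
    with open(path, 'wb') as f:
        f.write(data)
    file_digest = sha256_of_file(path, chunk_size=1024)

# compare_digest() compares in constant time, which matters when checking secrets.
matches = hmac.compare_digest(file_digest, hashlib.sha256(data).hexdigest())
```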

9.16. os - Miscellaneous operating system interfaces

Environments

  • name
  • uname
  • umask
  • environ

Process parameters

  • getpid: current process id
  • getppid: parent’s process id
  • getpgrp: current process group id
  • getpgid(pid): process group id with process id pid
  • getuid: real user id of current process
  • getgid: real group id of current process
  • geteuid: effective user id of current process
  • getegid: effective group id of current process
  • getgroups: list of supplemental group ids associated with the current process
  • getresuid: real, effective, and saved user ids of the current process
  • getresgid: real, effective, and saved group ids of the current process
  • getsid: process session id
  • getlogin: name of the user logged in
  • set*

File descriptor operations

  • open
  • close
  • lseek
  • read
  • write
  • sendfile
  • dup
  • dup2
  • fchmod
  • fchown
  • fstat
  • fsync
  • ftruncate
  • lockf
  • isatty
  • openpty
  • pipe
  • pipe2

Files and directories

  • access
  • chdir
  • chflags
  • chmod
  • chown
  • chroot
  • getcwd
  • link
  • listdir
  • mkdir
  • makedirs
  • mkfifo
  • makedev
  • major
  • minor
  • readlink
  • remove
  • removedirs
  • rename
  • rmdir
  • stat
  • symlink
  • sync
  • truncate
  • unlink
  • utime
  • walk

Process management

  • abort
  • exec*
  • _exit
  • forkpty
  • kill
  • nice
  • popen
  • spawn*
  • system
  • times
  • wait
  • waitpid
  • wait3
  • wait4

Misc system information

  • sep
  • linesep
  • pathsep
  • devnull
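
A short sketch that exercises a few of the functions listed above (makedirs, walk, environ, getpid) inside a temporary directory. The directory layout is made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # makedirs() creates intermediate directories as needed.
    os.makedirs(os.path.join(root, 'a', 'b'))
    for rel in ('top.txt',
                os.path.join('a', 'mid.txt'),
                os.path.join('a', 'b', 'leaf.txt')):
        with open(os.path.join(root, rel), 'w') as f:
            f.write('x')

    # walk() yields (dirpath, dirnames, filenames) top-down by default.
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()   # sorting in place fixes the traversal order
        for name in sorted(filenames):
            found.append(os.path.relpath(os.path.join(dirpath, name), root))

# environ behaves like a dict of the process environment.
home = os.environ.get('HOME', os.path.expanduser('~'))
pid = os.getpid()
```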

9.17. io - Core tools for working with streams

Text and Binary I/O

import io

f = io.StringIO("some initial text data")

f = io.BytesIO(b"some initial binary data: \x00\x01")
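
Both classes behave like ordinary file objects held in memory. A brief sketch of reading, seeking, and appending, using the same initial contents as above:

```python
import io

text = io.StringIO("some initial text data")
first_word = text.read(4)          # read like a regular text file
text.seek(0, io.SEEK_END)          # move to the end before appending
text.write("\nmore text")
all_text = text.getvalue()         # the whole buffer, regardless of position

binary = io.BytesIO(b"some initial binary data: \x00\x01")
binary.seek(-2, io.SEEK_END)       # binary streams allow end-relative seeks
tail = binary.read()
```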

9.18. time - Time access and conversions

>>> import time
>>> time.time()
1413910801.16108
>>> time.ctime()
'Wed Oct 22 01:00:03 2014'

>>> time.sleep(.1)

>>> time.process_time()        # CPU time; replaces time.clock(), which was removed in Python 3.8
0.194521

9.19. argparse - Parser for command-line options, arguments and sub-commands

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('pattern')
parser.add_argument('files', nargs='*')
parser.add_argument('-n', '--line-number', action='store_true')
...
namespace = parser.parse_args()
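
For experimenting, parse_args() also accepts an explicit argument list instead of reading sys.argv. The sketch below assumes a grep-like tool; the description text and the sample arguments are illustrative.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description='Search for a pattern in files.')
    parser.add_argument('pattern')                    # required positional
    parser.add_argument('files', nargs='*')           # zero or more positionals
    parser.add_argument('-n', '--line-number', action='store_true')
    return parser

# Passing a list makes the parser easy to try out without a real command line.
args = build_parser().parse_args(['-n', 'TODO', 'a.py', 'b.py'])
```

Note that the option --line-number becomes the attribute args.line_number, with the dash converted to an underscore.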

9.20. logging - Logging facility for Python

The API design was influenced by Java's log4j.

import logging

logfile = 'log.out'

logging.basicConfig(filename=logfile, level=logging.DEBUG)

logging.debug("this message should go to the log file")

logger = logging.getLogger(__name__)
logger.info("this message too")

$ cat log.out
DEBUG:root:this message should go to the log file
INFO:__main__:this message too
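
Handlers and formatters can also be attached to individual loggers instead of using basicConfig(). The sketch below writes to an in-memory stream so the result can be inspected; the logger name demo.app is a hypothetical example.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

logger = logging.getLogger('demo.app')   # hypothetical logger name
logger.setLevel(logging.INFO)            # messages below INFO are dropped
logger.addHandler(handler)
logger.propagate = False                 # keep records away from the root logger

logger.debug('dropped: below the INFO threshold')
logger.info('kept')
logger.warning('disk usage at %d%%', 91)  # arguments are formatted lazily

captured = stream.getvalue().splitlines()
```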

9.21. platform - Access to underlying platform’s identifying data

>>> import platform
>>> platform.python_version_tuple()
('3', '4', '1')
>>> platform.platform()
'Darwin-13.2.0-x86_64-i386-64bit'
>>> platform.uname()
uname_result(system='Darwin', node='huanghao-mpa', release='13.2.0', version='Darwin Kernel Version 13.2.0: Thu Apr 17 23:03:13 PDT 2014; root:xnu-2422.100.13~1/RELEASE_X86_64', machine='x86_64', processor='i386')

9.22. errno - Standard errno system symbols

>>> import errno
>>> import os
>>> os.mkdir('/tmp')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: '/tmp'

>>> errno.EEXIST
17

>>> try:
...     os.mkdir('/tmp')
... except OSError as err:
...     if err.errno == errno.EEXIST:
...         print('File exists')
...     else:
...         raise
...
...
File exists
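
Since Python 3.3, common errno values also map to OSError subclasses such as FileExistsError, so the check can often be written without comparing codes. A sketch using a temporary directory instead of /tmp:

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as existing:
    try:
        os.mkdir(existing)               # the directory already exists
        code = None
    except FileExistsError as err:       # subclass of OSError for EEXIST
        code = err.errno

# errorcode maps numeric values back to their symbolic names.
symbol = errno.errorcode[code]
is_oserror = issubclass(FileExistsError, OSError)
```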