9. A tour of std libs¶
Category | Modules |
---|---|
Text processing | string, re, difflib, textwrap, unicodedata, stringprep, readline, rlcompleter |
Binary data services | struct, codecs |
Data types | datetime, calendar, collections, collections.abc, heapq, bisect, array, weakref, types, copy, pprint, reprlib, enum |
Numeric and math | numbers, math, cmath, decimal, fractions, random, statistics |
Functional programming | itertools, functools, operator |
File and directory access | pathlib, os.path, fileinput, stat, filecmp, tempfile, glob, fnmatch, linecache, shutil, macpath |
Data persistence | pickle, copyreg, shelve, marshal, dbm, sqlite3 |
Data compression | zlib, gzip, bz2, lzma, zipfile, tarfile |
File formats | csv, configparser, netrc, xdrlib, plistlib |
Cryptographic | hashlib, hmac |
OS | os, io, time, argparse, getopt, logging, getpass, curses, platform, errno, ctypes |
Concurrent | threading, multiprocessing, concurrent.futures, subprocess, sched, queue, dummy_threading |
Networking | socket, ssl, select, selectors, asyncio, asyncore, asynchat, signal, mmap |
Internet | email, json, mailcap, mailbox, mimetypes, base64, binhex, binascii, quopri, uu |
Structured markup | html, xml.etree, xml.dom, xml.sax |
Internet protocols | webbrowser, cgi, wsgiref, urllib.request, urllib.response, urllib.parse, urllib.error, urllib.robotparser, http, ftplib, poplib, imaplib, nntplib, smtplib, smtpd, telnetlib, uuid, socketserver, http.server, http.cookies, http.cookiejar, xmlrpc, ipaddress |
Multimedia | audioop, aifc, sunau, wave, chunk, colorsys, imghdr, sndhdr, ossaudiodev |
I18n | gettext, locale |
Program frameworks | turtle, cmd, shlex |
GUI | tkinter |
Development tools | pydoc, doctest, unittest, unittest.mock, test |
Debugging and profiling | bdb, faulthandler, pdb, profile, timeit, trace, tracemalloc |
Packaging and distribution | distutils, ensurepip, venv |
Runtime | sys, sysconfig, builtins, __main__, warnings, contextlib, abc, atexit, traceback, __future__, gc, inspect, site, fpectl |
Custom Python interpreters | code, codeop |
Importing | zipimport, pkgutil, modulefinder, runpy, importlib |
Language | parser, ast, symtable, symbol, token, keyword, tokenize, tabnanny, pyclbr, py_compile, compileall, dis, pickletools |
Misc | formatter |
MS | msilib, msvcrt, winreg, winsound |
Unix | posix, pwd, spwd, grp, crypt, termios, tty, pty, fcntl, pipes, resource, nis, syslog |
Superseded | optparse, imp |
Where can you find these standard libraries?
Look in the directories listed in sys.path, for example:
/usr/lib/python2.7 /usr/lib/python3.4
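To check the location on a particular installation, the interpreter can be asked directly. This is a minimal sketch; the variable names are illustrative, and the printed paths differ from machine to machine.

```python
import json
import sys
import sysconfig

# Directory that holds the pure-Python standard library modules.
stdlib_dir = sysconfig.get_paths()["stdlib"]
print("stdlib:", stdlib_dir)

# Every directory the interpreter searches on import, in order.
for entry in sys.path:
    print("search path:", entry)

# A module's __file__ attribute shows where it was loaded from.
json_location = json.__file__
print("json module:", json_location)
```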
9.1. datetime - Basic date and time types¶
There are two kinds of date and time objects: “naive” and “aware”.
An aware object has sufficient knowledge of applicable algorithmic and political time adjustments, such as time zone and daylight saving time information, to locate itself relative to other aware objects. A naive object does not contain enough information to unambiguously locate itself relative to other date/time objects.
Available types
- datetime.date
- datetime.time
- datetime.datetime
- datetime.timedelta
- datetime.tzinfo: An abstract base class for time zone information objects.
- datetime.timezone: A class that implements the tzinfo abc as a fixed offset from the UTC.
Objects of these types are immutable.
Objects of the date type are always naive.
Naive:
>>> import datetime
>>> now = datetime.datetime.now() # Date and Time
>>> now
datetime.datetime(2014, 10, 21, 7, 32, 44, 964045)
>>> now.date()
datetime.date(2014, 10, 21)
>>> now.time()
datetime.time(7, 32, 44, 964045)
>>> day = datetime.date(2012, 12, 12) # Date
>>> day.year, day.month, day.day, day.isoweekday(), day.isoformat()
(2012, 12, 12, 3, '2012-12-12')
>>> tm = datetime.time(19, 30, 00) # Time
>>> tm.hour, tm.minute, tm.second, tm.isoformat()
(19, 30, 0, '19:30:00')
>>> past = datetime.datetime.now() - now # Time delta
>>> past, str(past), past.total_seconds()
(datetime.timedelta(0, 615, 431954), '0:10:15.431954', 615.431954)
>>> day.strftime('%b %d %Y %a') # Format
'Dec 12 2012 Wed'
>>> timestr = '[Sun Oct 19 08:10:01 2014]' # Parse
>>> datetime.datetime.strptime(timestr, '[%a %b %d %H:%M:%S %Y]')
datetime.datetime(2014, 10, 19, 8, 10, 1)
Aware:
>>> from datetime import timedelta, timezone
>>> beijing = timezone(timedelta(hours=8), 'Asia/Shanghai')
>>> finland = timezone(timedelta(hours=2), 'Europe/Helsinki')
>>> t1 = datetime.datetime(2014, 10, 6, 15, 0, 0, tzinfo=beijing)
>>> str(t1), t1.tzname()
('2014-10-06 15:00:00+08:00', 'Asia/Shanghai')
>>> t2 = t1.astimezone(finland)
>>> str(t2), t2.tzname()
('2014-10-06 09:00:00+02:00', 'Europe/Helsinki')
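The difference between naive and aware objects shows up as soon as the two are mixed. The sketch below uses illustrative variable names: two aware datetimes in different zones compare equal when they denote the same instant, while subtracting an aware datetime from a naive one raises TypeError.

```python
from datetime import datetime, timedelta, timezone

# An aware datetime in UTC and the same instant expressed at UTC+8.
utc_noon = datetime(2014, 10, 6, 12, 0, tzinfo=timezone.utc)
beijing = timezone(timedelta(hours=8))
in_beijing = utc_noon.astimezone(beijing)

# Aware objects are compared by the instant they represent.
same_instant = (in_beijing == utc_noon)

# A naive datetime carries no offset, so arithmetic with an aware one fails.
naive = datetime(2014, 10, 6, 12, 0)
try:
    naive - utc_noon
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(in_beijing.isoformat(), same_instant, mixed_ok)
```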
9.2. collections - Container datatypes¶
namedtuple() | factory function for creating tuple subclasses with named fields |
deque | list-like container with fast appends and pops on either end |
ChainMap | dict-like class for creating a single view of multiple mappings |
Counter | dict subclass for counting hashable objects |
OrderedDict | dict subclass that remembers the order entries were added |
defaultdict | dict subclass that calls a factory function to supply missing values |
namedtuple():
>>> from collections import namedtuple
>>> Person = namedtuple('Person', ['name', 'age', 'gender'])
>>> bob = Person('Bob', 30, 'male')
>>> jane = Person(name='Jane', gender='female', age=29)
>>> bob, bob[2]
(Person(name='Bob', age=30, gender='male'), 'male')
>>> type(jane), jane.age
(<class '__main__.Person'>, 29)
>>> bob._asdict()
OrderedDict([('name', 'Bob'), ('age', 30), ('gender', 'male')])
>>> bob._replace(name='Tom', age=52)
Person(name='Tom', age=52, gender='male')
>>> class Person(namedtuple('Person', ['name', 'age', 'gender'])):
...     __slots__ = ()
...     @property
...     def lastname(self):
...         return self.name.split()[-1]
...
>>> john = Person('John Lennon', 75, 'male')
>>> john.lastname
'Lennon'
9.3. deque: double-ended queue¶
Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.
Operation | list | Big O | deque | Big O |
---|---|---|---|---|
Add in the head | l.insert(0) | O(n) | d.appendleft | O(1) |
Add in the tail | l.append() | O(1) | d.append | O(1) |
Del in the head | l.pop(0) | O(n) | d.popleft | O(1) |
Del in the tail | l.pop() | O(1) | d.pop | O(1) |
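The deque operations in the table can be tried directly. A minimal sketch with illustrative names; the maxlen argument is an extra not shown in the table, and it makes the deque discard items from the opposite end once it is full.

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)       # add at the head
d.append(4)           # add at the tail
first = d.popleft()   # remove from the head
last = d.pop()        # remove from the tail

# A bounded deque keeps only the most recent maxlen items.
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)

print(first, last, list(d), list(recent))
```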
import time

def timing(initial, setup, testing, times=3):
    print('Testing the following code for {} times ...\n{}'.format(times, testing.strip()))
    namespace = {}
    # run the one-off initialisation in a shared namespace
    exec(initial, namespace)
    av = 0
    for i in range(times):
        # reset state before each timed run
        exec(setup, namespace)
        begin = time.time()
        exec(testing, namespace)
        cost = time.time() - begin
        print('{}: {}'.format(i + 1, cost))
        av += cost
    print('av: {}\n'.format(av / times))
>>> timing('data = list(range(10**5))', 'l = []', '''
... for i in data:
...     l.insert(0, i) # O(n)
... ''')
Testing the following code for 3 times ...
for i in data:
    l.insert(0, i) # O(n)
1: 3.9300358295440674
2: 4.109051704406738
3: 4.1024134159088135
av: 4.04716698328654
$ python timing.py
Testing the following code for 3 times ...
for i in data:
    l.insert(0, i) # O(n)
av: 4.171613295873006
Testing the following code for 3 times ...
for i in data:
    l.append(i) # O(1)
av: 0.012801011403401693
Testing the following code for 3 times ...
for i in data:
    d.appendleft(i) # O(1)
av: 0.014629840850830078
Testing the following code for 3 times ...
for i in data:
    d.append(i) # O(1)
av: 0.014315048853556315
Testing the following code for 3 times ...
for _ in data:
    l.pop(0) # O(n)
av: 1.6093259652455647
Testing the following code for 3 times ...
for _ in data:
    l.pop() # O(1)
av: 0.014542102813720703
Testing the following code for 3 times ...
for _ in data:
    d.popleft() # O(1)
av: 0.011040687561035156
Testing the following code for 3 times ...
for _ in data:
    d.pop() # O(1)
av: 0.011482477188110352
See Time complexity
A ChainMap class is provided for quickly linking a number of mappings so they can be treated as a single unit.
Example of simulating Python’s internal lookup chain:
import builtins
from collections import ChainMap
pylookup = ChainMap(locals(), globals(), vars(builtins))
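A smaller sketch of the same idea, layering user overrides on top of defaults. The dictionary names and keys are hypothetical. Lookups search the mappings from left to right, while writes go to the first mapping only.

```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "admin"}

# overrides is searched first, then defaults.
settings = ChainMap(overrides, defaults)
user = settings["user"]
color = settings["color"]

# Assignment only touches the first mapping in the chain.
settings["color"] = "blue"

print(user, color, overrides, defaults)
```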
A counter tool is provided to support convenient and rapid tallies. For example:
>>> from collections import Counter
>>> words = ['red', 'blue', 'red', 'green', 'blue', 'blue']
>>> cnt = Counter(words)
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
>>> cnt.most_common(2)
[('blue', 3), ('red', 2)]
Ordered dictionaries are just like regular dictionaries but they remember the order that items were inserted. When iterating over an ordered dictionary, the items are returned in the order their keys were first added.
>>> from collections import OrderedDict
>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> o = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
>>> o
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> assert list(o.keys()) == sorted(d.keys())
Dictionary with default value:
>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
This technique is simpler and faster than an equivalent technique using dict.setdefault():
>>> d = {}
>>> for k, v in s:
...     d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
9.4. heapq - Heap queue algorithm¶
>>> from heapq import heappush, heappop
>>> def heapsort(iterable):
...     'Equivalent to sorted(iterable)'
...     h = []
...     for value in iterable:
...         heappush(h, value)
...     return [heappop(h) for i in range(len(h))]
...
>>> heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
This module provides an implementation of the min heap queue algorithm, also known as the priority queue algorithm.
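As a priority queue, the heap is usually filled with (priority, item) tuples, so that the entry with the smallest priority is popped first. A minimal sketch; the task names are made up for illustration.

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, "write docs"))
heapq.heappush(tasks, (1, "fix bug"))
heapq.heappush(tasks, (3, "refactor"))

# Tuples compare by their first element, so the lowest priority number wins.
order = [heapq.heappop(tasks)[1] for _ in range(3)]

# nsmallest returns the n smallest items without sorting the whole input.
smallest = heapq.nsmallest(3, [5, 1, 8, 3, 9, 2])

print(order, smallest)
```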
9.5. bisect - Array bisection algorithm¶
import bisect
import random
# Use a constant seed to ensure that we see
# the same pseudo-random numbers each time
# we run the loop.
random.seed(1)
# Generate 19 random numbers and
# insert them into a list in sorted
# order.
l = []
for i in range(1, 20):
    r = random.randint(1, 100)
    position = bisect.bisect(l, r)
    bisect.insort(l, r)
    print('{:=2} {:=2} {}'.format(r, position, l))
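Bisection is also handy for table lookups against sorted breakpoints. The sketch below maps a numeric score to a letter grade; the function name, the breakpoints and the grade letters are illustrative choices.

```python
import bisect

def grade(score, breakpoints=(60, 70, 80, 90), grades="FDCBA"):
    # bisect returns the number of breakpoints that are <= score,
    # which is exactly the index of the matching grade letter.
    return grades[bisect.bisect(breakpoints, score)]

results = [grade(s) for s in (33, 60, 85, 100)]
print(results)
```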
9.6. array - Efficient arrays of numeric values¶
array.array is useful when you need a homogeneous C array of data for reasons other than doing math.
from array import array
array('l')
array('u', 'hello \u2641')
array('l', [1, 2, 3, 4, 5])
array('d', [1.0, 2.0, 3.14])
See also: bytearray vs array
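A short sketch of what "homogeneous" means in practice: every element has the same C type and size, the contents can be converted to and from raw bytes, and values of the wrong type are rejected. Variable names are illustrative.

```python
from array import array

numbers = array("d", [1.0, 2.0, 3.14])
numbers.append(4.5)

# Each element occupies itemsize bytes, so the raw buffer size is predictable.
raw = numbers.tobytes()

# Rebuild an equal array from the raw bytes.
restored = array("d")
restored.frombytes(raw)

# Only values convertible to the declared C type are accepted.
try:
    numbers.append("five")
    rejected = False
except TypeError:
    rejected = True

print(numbers.itemsize, len(raw), restored == numbers, rejected)
```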
9.7. weakref - Weak references¶
A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else. However, until the object is actually destroyed the weak reference may return the object even if there are no strong references to it.
A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object not be kept alive solely because it appears in a cache or mapping.
Not all objects can be weakly referenced; those objects which can include class instances, functions written in Python (but not in C), instance methods, sets, frozensets, some file objects, generators, type objects, sockets, arrays, deques, regular expression pattern objects, and code objects.
Several built-in types such as list and dict do not directly support weak references but can add support through subclassing:
class Dict(dict):
pass
obj = Dict(red=1, green=2, blue=3) # this object is weak referenceable
Other built-in types such as tuple and int do not support weak references even when subclassed (This is an implementation detail and may be different across various Python implementations.).
Extension types can easily be made to support weak references; see Weak Reference Support.
weakref.ref
import weakref
class BigObject:
    def __del__(self):
        print('Deleting {}'.format(self))
o = BigObject()
r = weakref.ref(o)
print('obj: {}'.format(o))
print('ref: {}'.format(r))
print('r(): {}'.format(r()))
del o
print('r(): {}'.format(r()))
obj: <__main__.BigObject instance at 0x1036f43f8>
ref: <weakref at 0x1036e5c58; to 'instance' at 0x1036f43f8>
r(): <__main__.BigObject instance at 0x1036f43f8>
Deleting <__main__.BigObject instance at 0x1036f43f8>
r(): None
...
def callback(ref):
    print('Callback {}'.format(ref))
...
r = weakref.ref(o, callback)
...
obj: <__main__.BigObject instance at 0x10237a4d0>
ref: <weakref at 0x10236bc58; to 'instance' at 0x10237a4d0>
r(): <__main__.BigObject instance at 0x10237a4d0>
Callback <weakref at 0x10236bc58; dead>
Deleting <__main__.BigObject instance at 0x10237a4d0>
r(): None
weakref.proxy
p = weakref.proxy(o)
try:
    p.attr
except ReferenceError:
    ...
weakref.WeakValueDictionary #TODO
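For WeakValueDictionary, a minimal sketch of the cache use case described earlier: an entry disappears once nothing else refers to its value. The BigObject class with a name attribute and the cache key are hypothetical. Prompt removal relies on CPython's reference counting, and gc.collect() is called only as a precaution.

```python
import gc
import weakref

class BigObject:
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()

obj = BigObject("report")
cache["report"] = obj            # the cache holds only a weak reference
present_before = "report" in cache

del obj                          # drop the last strong reference
gc.collect()                     # precaution for collectors that defer cleanup
present_after = "report" in cache

print(present_before, present_after)
```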
9.8. types - Dynamic type creation and names for built-in types¶
>>> import types
>>> type(lambda : ...) is types.FunctionType
True
9.9. copy - Shallow and deep copy operations¶
>>> import copy
>>> class Object: pass
...
>>> l1 = [1, [2, Object()]]
>>> l2 = l1
>>> l3 = copy.copy(l1)
>>> l4 = copy.deepcopy(l1)
>>> l3[0] = 3
>>> l3[1][0] = 4
>>> l1
[1, [4, <__main__.Object object at 0x107d2a278>]]
>>> l2
[1, [4, <__main__.Object object at 0x107d2a278>]]
>>> l3
[3, [4, <__main__.Object object at 0x107d2a278>]]
>>> l4
[1, [2, <__main__.Object object at 0x107d2a978>]]
9.10. os.path - Common pathname manipulations¶
>>> import os.path
>>> os.path.sep, os.path.extsep, os.path.pardir, os.path.curdir
('/', '.', '..', '.')
>>> os.path.dirname('/one/two/three'), os.path.basename('/one/two/three')
('/one/two', 'three')
>>> os.path.join('one', 'two', 'three')
'one/two/three'
>>> os.path.splitext('/path/file.ext')
('/path/file', '.ext')
>>> os.path.expanduser('~/file.txt')
'/Users/huanghao/file.txt' # Mac
>>> os.getcwd()
'/tmp'
>>> os.path.abspath('file.txt')
'/tmp/file.txt'
>>> os.path.realpath('file.txt')
'/tmp/file.txt'
>>> os.path.isdir('/tmp'), os.path.isfile('/etc/hosts'), os.path.islink('/var'), os.path.exists('/dev'), os.path.ismount('/dev')
(True, True, True, True, True)
9.11. tempfile - Generate temporary files and directories¶
>>> import tempfile
# create a temporary file and write some data to it
>>> fp = tempfile.TemporaryFile()
>>> fp.write(b'Hello world!')
# read data from file
>>> fp.seek(0)
>>> fp.read()
b'Hello world!'
# close the file, it will be removed
>>> fp.close()
# create a temporary file using a context manager
>>> with tempfile.TemporaryFile() as fp:
...     fp.write(b'Hello world!')
...     fp.seek(0)
...     fp.read()
b'Hello world!'
>>>
# file is now closed and removed
# create a temporary directory using the context manager
>>> with tempfile.TemporaryDirectory() as tmpdirname:
...     print('created temporary directory', tmpdirname)
>>>
# directory and contents have been removed
9.12. glob - Unix style pathname pattern expansion¶
>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']
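If the directory contains the files card.gif and .card.gif, names beginning with a dot are matched only by patterns that also begin with a dot:
>>> glob.glob('*.gif')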
['card.gif']
>>> glob.glob('.c*')
['.card.gif']
9.13. shutil - High-level file operations¶
- copyfileobj
- copyfile
- copymode
- copystat
- copy
- copy2: Identical to copy() except that copy2() also attempts to preserve all file metadata.
- copytree
- rmtree
- move
- disk_usage
- chown
- which
- make_archive
- unpack_archive
- get_terminal_size
>>> import os
>>> import shutil
>>> shutil.disk_usage(os.path.expanduser('~'))
usage(total=120473067520, used=51554127872, free=68656795648)
>>> shutil.get_terminal_size()
os.terminal_size(columns=130, lines=34)
>>> shutil.which('python3')
'/usr/local/bin/python3'
>>> archive_name = os.path.expanduser(os.path.join('~', 'myarchive'))
>>> root_dir = os.path.expanduser(os.path.join('~', '.ssh'))
>>> shutil.make_archive(archive_name, 'gztar', root_dir)
'/Users/tarek/myarchive.tar.gz'
The resulting archive contains:
$ tar -tzvf /Users/tarek/myarchive.tar.gz
drwx------ tarek/staff 0 2010-02-01 16:23:40 ./
-rw-r--r-- tarek/staff 609 2008-06-09 13:26:54 ./authorized_keys
-rwxr-xr-x tarek/staff 65 2008-06-09 13:26:54 ./config
-rwx------ tarek/staff 668 2008-06-09 13:26:54 ./id_dsa
-rwxr-xr-x tarek/staff 609 2008-06-09 13:26:54 ./id_dsa.pub
-rw------- tarek/staff 1675 2008-06-09 13:26:54 ./id_rsa
-rw-r--r-- tarek/staff 397 2008-06-09 13:26:54 ./id_rsa.pub
-rw-r--r-- tarek/staff 37192 2010-02-06 18:23:10 ./known_hosts
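The copy, move and remove functions listed above can be tried safely inside a temporary directory. This is a minimal sketch; the directory and file names are made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Build a small source tree: src/a.txt
    src = os.path.join(workdir, "src")
    os.mkdir(src)
    with open(os.path.join(src, "a.txt"), "w") as f:
        f.write("hello")

    # Recursively copy the tree, then move one file out of the copy.
    copied = shutil.copytree(src, os.path.join(workdir, "copy"))
    shutil.move(os.path.join(copied, "a.txt"), os.path.join(workdir, "b.txt"))

    # Remove the now-empty copy, including the directory itself.
    shutil.rmtree(copied)

    remaining = sorted(os.listdir(workdir))

print(remaining)
```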
9.14. netrc - netrc file processing¶
$ cat ~/.netrc
default login huanghao password 123456
machine company.com login hh password xxx
>>> import netrc
>>> import os
>>> rc = netrc.netrc(os.path.expanduser('~/.netrc'))
>>> rc.hosts
{'default': ('huanghao', None, '123456'), 'company.com': ('hh', None, 'xxx')}
>>> rc.authenticators('company.com')
('hh', None, 'xxx')
>>> rc.authenticators('home.me')
('huanghao', None, '123456')
See also Manual netrc
9.15. hashlib - Secure hashes and message digests¶
>>> import hashlib
>>> m = hashlib.md5()
>>> m.update(b"Nobody inspects")
>>> m.update(b" the spammish repetition")
>>> m.digest()
b'\xbbd\x9c\x83\xdd\x1e\xa5\xc9\xd9\xde\xc9\xa1\x8d\xf0\xff\xe9'
>>> m.digest_size
16
>>> m.block_size
64
>>> hashlib.sha224(b"Nobody inspects the spammish repetition").hexdigest()
'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'
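Because update() can be called repeatedly, large inputs can be hashed in chunks without loading them into memory at once. A minimal sketch: the helper name sha256_of_stream and the chunk size are illustrative, and io.BytesIO stands in for a real file opened in binary mode.

```python
import hashlib
import io

def sha256_of_stream(stream, chunk_size=65536):
    digest = hashlib.sha256()
    # Read fixed-size chunks until read() returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

data = b"Nobody inspects the spammish repetition"
streamed = sha256_of_stream(io.BytesIO(data), chunk_size=8)
one_shot = hashlib.sha256(data).hexdigest()

print(streamed == one_shot)
```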
9.16. os - Miscellaneous operating system interfaces¶
Environments
- name
- uname
- umask
- environ
Process parameters
- getpid: current process id
- getppid: parent’s process id
- getpgrp: current process group id
- getpgid(pid): process group id with process id pid
- getuid: real user id of current process
- getgid: real group id of current process
- geteuid: effective user id of current process
- getegid: effective group id of current process
- getgroups: list of supplemental group ids associated with the current process
- getresuid: real, effective, and saved user ids of current process
- getresgid: real, effective, and saved group ids of current process
- getsid: process session id
- getlogin: name of the user logged in
- set*
File descriptor operations
- open
- close
- lseek
- read
- write
- sendfile
- dup
- dup2
- fchmod
- fchown
- fstat
- fsync
- ftruncate
- lockf
- isatty
- openpty
- pipe
- pipe2
Files and directories
- access
- chdir
- chflags
- chmod
- chown
- chroot
- getcwd
- link
- listdir
- mkdir
- makedirs
- mkfifo
- makedev
- major
- minor
- readlink
- remove
- removedirs
- rename
- rmdir
- stat
- symlink
- sync
- truncate
- unlink
- utime
- walk
Process management
- abort
- exec*
- _exit
- forkpty
- kill
- nice
- popen
- spawn*
- system
- times
- wait
- waitpid
- wait3
- wait4
Misc system information
- sep
- linesep
- pathsep
- devnull
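A few of the functions listed above in action. The sketch builds a small directory tree in a temporary directory and then lists its files with os.walk; the directory and file names are made up for illustration.

```python
import os
import tempfile

pid = os.getpid()  # id of the current process

with tempfile.TemporaryDirectory() as root:
    # makedirs creates intermediate directories as needed.
    os.makedirs(os.path.join(root, "a", "b"))
    for relpath in ("top.txt", os.path.join("a", "b", "deep.txt")):
        with open(os.path.join(root, relpath), "w") as f:
            f.write("x")

    # walk yields (dirpath, dirnames, filenames) for every directory in the tree.
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            found.append(os.path.relpath(full, root))

found.sort()
print(pid, found)
```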
9.17. io - Core tools for working with streams¶
Text and Binary I/O
f = io.StringIO("some initial text data")
f = io.BytesIO(b"some initial binary data: \x00\x01")
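Both classes behave like open files that live in memory, which makes them convenient for tests. A minimal sketch with illustrative contents:

```python
import io

text = io.StringIO("first line\nsecond line\n")
lines = text.readlines()        # reading moves the position to the end
text.write("third line\n")      # so this write appends
whole = text.getvalue()         # full contents, regardless of position

binary = io.BytesIO(b"\x00\x01")
binary.seek(0, io.SEEK_END)     # move to the end before appending
binary.write(b"\x02")
payload = binary.getvalue()

print(lines, repr(whole), payload)
```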
9.18. time - Time access and conversions¶
>>> import time
>>> time.time()
1413910801.16108
>>> time.ctime()
'Wed Oct 22 01:00:03 2014'
>>> time.sleep(.1)
>>> time.clock() # deprecated since Python 3.3 and removed in 3.8; use time.perf_counter() or time.process_time()
0.194521
9.19. argparse - Parser for command-line options, arguments and sub-commands¶
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('pattern')
parser.add_argument('files', nargs='*')
parser.add_argument('-n', '--line-number', action='store_true')
...
namespace = parser.parse_args()
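parse_args() reads sys.argv by default, but it also accepts an explicit list, which is handy for experimenting. A minimal sketch of a grep-like interface; the prog name and the sample arguments are made up.

```python
import argparse

parser = argparse.ArgumentParser(prog="grep-like")
parser.add_argument("pattern")                       # required positional
parser.add_argument("files", nargs="*")              # zero or more positionals
parser.add_argument("-n", "--line-number", action="store_true")

# Parse an explicit argument list instead of sys.argv[1:].
args = parser.parse_args(["-n", "TODO", "a.py", "b.py"])

# Dashes in option names become underscores in attribute names.
print(args.pattern, args.files, args.line_number)
```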
9.20. logging - Logging facility for Python¶
The API design was influenced by log4j.
import logging
logfile = 'log.out'
logging.basicConfig(filename=logfile, level=logging.DEBUG)
logging.debug("this message should go to the log file")
logger = logging.getLogger(__name__)
logger.info("this message too")
$ cat log.out
DEBUG:root:this message should go to the log file
INFO:__main__:this message too
9.21. platform - Access to underlying platform’s identifying data¶
>>> import platform
>>> platform.python_version_tuple()
('3', '4', '1')
>>> platform.platform()
'Darwin-13.2.0-x86_64-i386-64bit'
>>> platform.uname()
uname_result(system='Darwin', node='huanghao-mpa', release='13.2.0', version='Darwin Kernel Version 13.2.0: Thu Apr 17 23:03:13 PDT 2014; root:xnu-2422.100.13~1/RELEASE_X86_64', machine='x86_64', processor='i386')
9.22. errno - Standard errno system symbols¶
>>> import errno
>>> import os
>>> os.mkdir('/tmp')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: '/tmp'
>>> errno.EEXIST
17
>>> try:
...     os.mkdir('/tmp')
... except OSError as err:
...     if err.errno == errno.EEXIST:
...         print('File exists')
...     else:
...         raise
...
File exists
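Since Python 3.3, common error codes also map to OSError subclasses such as FileExistsError (the exception in the traceback above), so the errno comparison can often be replaced by a more specific except clause. A minimal sketch that uses a temporary directory as the path that already exists; the variable names are illustrative.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as existing:
    try:
        os.mkdir(existing)                      # the directory already exists
    except FileExistsError as err:
        code = err.errno                        # numeric error code
        name = errno.errorcode[err.errno]       # symbolic name for the code
        message = os.strerror(err.errno)        # human-readable description

print(code, name, message)
```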