Python

Different types of Python

Some notable implementations available on Linux:

  • CPython -> The original (reference) Python implementation, written in C
  • PyPy -> A fast Python implementation with a JIT compiler
  • Jython -> Python running on the Java Virtual Machine
  • Stackless -> A branch of CPython that supports microthreads. Its concurrency model seems similar to that of the Go programming language.
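To check which of these implementations is actually running a script, the standard library can report it. This is a minimal sketch; the variable name implementation is just for illustration.

```python
import platform
import sys

# Name of the interpreter implementation running this script,
# for example "CPython" or "PyPy".
implementation = platform.python_implementation()

print("Implementation:", implementation)
print("Version:", platform.python_version())
print("sys.implementation.name:", sys.implementation.name)
```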

Profiling and optimizing Python

Timing a function

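import random
import time
from functools import wraps

def timing(f):
    """Print how long each call to f takes, in milliseconds."""
    @wraps(f)  # keep f's name and docstring on the wrapper
    def wrap(*args, **kwargs):
        start = time.perf_counter()  # high-resolution clock meant for timing
        ret = f(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print('{:s} function took {:.3f} ms'.format(f.__name__, elapsed * 1000.0))
        return ret
    return wrap

@timing
def random_sort(n):
    return sorted([random.random() for i in range(n)])

random_sort(100000)  # prints the elapsed time for this call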

Using timeit

import timeit

def linear_search(mylist, find):
    for x in mylist:
        if x == find:
            return True
    return False

def linear_time():
    SETUP_CODE = '''
from __main__ import linear_search
from random import randint'''
     
    TEST_CODE = '''
mylist = [x for x in range(10000)]
find = randint(0, len(mylist) - 1)  # randint includes both endpoints
linear_search(mylist, find)
    '''
    # Run the measurement 3 times, executing the statement 10000 times in each run
    times = timeit.repeat(setup=SETUP_CODE, stmt=TEST_CODE, repeat=3, number=10000)

    # Print the minimum execution time
    print('Linear search time: {}'.format(min(times)))
 
if __name__ == "__main__":
    linear_time()
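timeit also accepts a callable instead of code strings, which avoids the "from __main__ import ..." setup. The following is a small sketch of that variant; the names data and target are only illustrative, and the list is built once outside the timed code so that only the search itself is measured.

```python
import timeit

def linear_search(mylist, find):
    for x in mylist:
        if x == find:
            return True
    return False

data = list(range(10_000))
target = 9_999  # worst case: the last element of the list

# Pass a zero-argument callable; each of the 3 runs calls it 100 times.
times = timeit.repeat(lambda: linear_search(data, target), repeat=3, number=100)
best = min(times)
print('Linear search time: {:.6f} s per 100 calls'.format(best))
```

Taking the minimum of the repeats is the usual choice, since higher values are typically caused by other processes interfering with the measurement.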

Timing a script

time -p python script.py

Using cProfile

python -m cProfile -s cumulative script.py
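cProfile can also be driven from inside a program when only one part of it needs profiling. Below is a rough sketch that profiles a single call and prints the five most expensive entries by cumulative time; random_sort is the same toy function used earlier, and the other names are just for illustration.

```python
import cProfile
import io
import pstats
import random

def random_sort(n):
    return sorted(random.random() for _ in range(n))

profiler = cProfile.Profile()
profiler.enable()
random_sort(100_000)
profiler.disable()

# Collect the report in a string, sorted the same way as "-s cumulative".
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```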

Profiling memory

pip install memory_profiler
pip install psutil
python -m memory_profiler script.py
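If installing extra packages is not an option, the standard library's tracemalloc module gives a rough picture of memory use. Note that this is a different tool from memory_profiler, shown here only as a sketch of the same idea; the list named data is an arbitrary workload.

```python
import tracemalloc

tracemalloc.start()

# Arbitrary workload that allocates a noticeable amount of memory.
data = [str(i) * 10 for i in range(100_000)]

# Memory currently traced and the peak since start(), both in bytes.
current, peak = tracemalloc.get_traced_memory()
print('current: {:.1f} KiB, peak: {:.1f} KiB'.format(current / 1024, peak / 1024))

# Show the three source lines responsible for the most allocated memory.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:3]:
    print(stat)

tracemalloc.stop()
```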

There is also a tool called guppy, which is a really good library for inspecting memory usage.

Some lesser-known data structures

This part of the post is taken from PyMOTW.

ChainMap

The ChainMap class manages a sequence of dictionaries and searches through them, in the order they are given, to find the value associated with a key.

import collections

a = {1: 10, 2: 20}
b = {3: 30, 4: 40, 2:200}

m1 = collections.ChainMap(a, b)
m2 = collections.ChainMap(b, a)
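print(m1[2], end=" ")
# it will print 20 (a is searched first)
print(m2[2], end="\n")
# it will print 200 (b is searched first)
print(list(m1.keys()))
print(list(m1.values()))
for k, v in m1.items():
    print('{} = {}'.format(k, v))
# reversing the list of maps changes the search order
m1.maps = list(reversed(m1.maps))
print('m1 = {}'.format(m1[2]))
# it will print m1 = 200
# changes to the underlying dicts are visible through the ChainMap
a[5] = 50
print(m1[5])
# new_child() puts an empty dict in front; writes go only to that dict
m3 = m1.new_child()
m3[2] = 2000
m3[10] = 209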
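A common use of this lookup order is layering settings, where more specific values shadow defaults. Here is a small sketch of that; the dictionaries defaults and user_settings are made up for illustration.

```python
from collections import ChainMap

# Hypothetical settings: user values should win over defaults.
defaults = {"color": "blue", "debug": False}
user_settings = {"color": "green"}

settings = ChainMap(user_settings, defaults)
print(settings["color"])  # found in user_settings, the first map
print(settings["debug"])  # not in user_settings, so taken from defaults

# new_child() adds a fresh dict in front; the original dicts are untouched.
overrides = settings.new_child()
overrides["color"] = "red"
print(overrides["color"], settings["color"])
print(user_settings)
```

Because writes only ever touch the first map, the child can be thrown away without affecting the underlying dictionaries.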

Counter

A Counter is a container that keeps track of how many times equivalent values are added.

import collections

c = collections.Counter()
print('Initial :', c)
c.update('apoorvakumarhseenvaramuk')
print('Sequence:', c)
c.update({'a': 1, 'd': 5})
print('Dict    :', c)
print('Most common:')
for letter, count in c.most_common(3):
    print('{}: {}'.format(letter, count))
c1 = collections.Counter('aaaaaaassddbcpoerqw')
print(c1 + c)
print(c1 - c)
print(c1 & c)
print(c1 | c)

defaultdict

The standard dictionary includes the method setdefault() for retrieving a value and establishing a default if the value does not exist. By contrast, defaultdict lets the caller specify the default up front when the container is initialized.

import collections


def default_factory():
    return 'default value'


d = collections.defaultdict(default_factory, foo='bar')
print('d:', d)
print('foo =>', d['foo'])
print('bar =>', d['bar'])
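To make the contrast with setdefault() concrete, here is a small sketch that groups words by their first letter both ways. The word list is arbitrary sample data.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Plain dict: the default has to be supplied on every access.
by_letter_plain = {}
for word in words:
    by_letter_plain.setdefault(word[0], []).append(word)

# defaultdict: the factory (list) is specified once, up front.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(by_letter_plain)
print(dict(by_letter))
```

One thing to keep in mind is that merely reading a missing key from a defaultdict inserts it with the default value.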

namedtuple

The standard tuple uses numerical indexes to access its members. A namedtuple assigns a name to each member as well, similar to a C struct.

import collections

Person = collections.namedtuple('Person', 'name age')

bob = Person(name='Bob', age=30)
print('\nRepresentation:', bob)

jane = Person(name='Jane', age=29)
print('\nField by name:', jane.name)

print('\nFields by index:')
for p in [bob, jane]:
    print('{} is {} years old'.format(*p))

Like a regular tuple, a namedtuple is immutable.

import collections

Person = collections.namedtuple('Person', 'name age')

bob = Person(name='Bob', age=30)
print('Representation:', bob)
print('As Dictionary:', bob._asdict())
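The immutability can be seen directly: assigning to a field fails, and _replace() returns a new instance instead of changing the existing one. A short sketch, reusing the Person type from above:

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age')
bob = Person(name='Bob', age=30)

# Assigning to a field raises AttributeError because tuples are immutable.
try:
    bob.age = 31
except AttributeError as err:
    print('Cannot modify:', err)

# _replace() builds a new Person and leaves the original untouched.
older_bob = bob._replace(age=31)
print(bob, older_bob)
```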

Some more details about the collections module

Itertools

The chain() function takes several iterators as arguments and returns a single iterator that produces the contents of all of the inputs as though they came from a single iterator.

from itertools import *

for i in chain([1, 2, 3], ['a', 'b', 'c']):
    print(i, end=' ')
print()

zip_longest()

from itertools import *

r1 = range(3)
r2 = range(2)

print('\nzip_longest processes all of the values:')
print(list(zip_longest(r1, r2)))
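For comparison, the built-in zip() stops as soon as the shortest input is exhausted, while zip_longest() keeps going and pads the missing values with None, or with a fillvalue if one is given. A minimal sketch using the same two ranges:

```python
from itertools import zip_longest

r1 = range(3)
r2 = range(2)

shortest = list(zip(r1, r2))                       # stops at the shorter input
longest = list(zip_longest(r1, r2))                # pads with None
filled = list(zip_longest(r1, r2, fillvalue=-1))   # pads with a chosen value

print(shortest)
print(longest)
print(filled)
```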

islice()

The islice() function returns an iterator which returns selected items from the input iterator, by index.

from itertools import *

print('Stop at 5:')
for i in islice(range(100), 5):
    print(i, end=' ')
print('\n')

print('Start at 5, Stop at 10:')
for i in islice(range(100), 5, 10):
    print(i, end=' ')
print('\n')

print('By tens to 100:')
for i in islice(range(100), 0, 100, 10):
    print(i, end=' ')
print('\n')

starmap()

The starmap() function is similar to map(), but instead of constructing a tuple from multiple iterators, it splits up the items in a single iterator as arguments to the mapping function using the * syntax.

from itertools import *

values = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]

for i in starmap(lambda x, y: (x, y, x * y), values):
    print('{} * {} = {}'.format(*i))
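The difference from map() is easiest to see side by side. In this sketch the same computation is done both ways: map() takes one iterable per argument, whereas starmap() takes a single iterable of already-grouped argument tuples. The sample numbers are arbitrary.

```python
from itertools import starmap

# starmap(): one iterable whose items are unpacked as pow(base, exp).
pairs = [(2, 5), (3, 2), (10, 3)]
with_starmap = list(starmap(pow, pairs))

# map(): one separate iterable for each argument of pow().
bases = [2, 3, 10]
exponents = [5, 2, 3]
with_map = list(map(pow, bases, exponents))

print(with_starmap)
print(with_map)
```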

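count() with fractions

The count() function returns an iterator that produces consecutive integers, indefinitely. The start and step values do not have to be integers; any numeric type that supports addition, such as Fraction, works.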

import fractions
from itertools import *

start = fractions.Fraction(1, 3)
step = fractions.Fraction(1, 3)

for i in zip(count(start, step), ['a', 'b', 'c']):
    print('{}: {}'.format(*i))

cycle()

The cycle() function returns an iterator that repeats the contents of its argument indefinitely. Since it has to remember the entire contents of the input iterator, it may consume quite a bit of memory if the iterator is long.
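Because cycle() never stops on its own, it has to be combined with something that ends the iteration, such as islice() or zip() with a finite input. A small sketch with made-up sample values:

```python
from itertools import cycle, islice

# Take only the first 7 items of the endless repetition.
colors = cycle(['red', 'green', 'blue'])
first_seven = list(islice(colors, 7))
print(first_seven)

# zip() stops when range(5) is exhausted, so the loop terminates.
pairs = list(zip(range(5), cycle('ab')))
print(pairs)
```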

accumulate()

The accumulate() function processes the input iterable, passing the result accumulated so far and the next item to a function and producing each return value in turn. The default function adds the two values, so accumulate() can be used to produce the cumulative sum of a series of numerical inputs.

from itertools import *

print(list(accumulate(range(5))))
print(list(accumulate('abcde')))

It is possible to combine accumulate() with any other function that takes two input values to achieve different results.

from itertools import *
def f(a, b):
    print(a, b)
    return b + a + b
print(list(accumulate('abcde', f)))
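Built-in and operator functions can be plugged in the same way. As a further sketch, here are a running maximum and a running product; the input numbers are arbitrary.

```python
import operator
from itertools import accumulate

values = [3, 1, 4, 1, 5, 9, 2, 6]

running_sum = list(accumulate(values))        # default: addition
running_max = list(accumulate(values, max))   # largest value seen so far
running_product = list(accumulate([1, 2, 3, 4, 5], operator.mul))

print(running_sum)
print(running_max)
print(running_product)
```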

permutations()

from itertools import permutations
perm = permutations([1, 2, 3], 2)
for i in perm:
    print(i)

# Answer->(1, 2),(1, 3),(2, 1),(2, 3),(3, 1),(3, 2)

combinations()

from itertools import combinations
comb = combinations([1, 2, 3], 2)
for i in comb:
    print(i)

# Answer->(1, 2),(1, 3),(2, 3)

I have left out functools, which might be one important thing still to cover.