performance - Optimizing simple CPU-bound loops using Cython and replacing a list

September 15, 2012

I'm trying to evaluate different approaches, and I'm hitting a stumbling block with performance.

Why is my Cython code so slow? I expected it to run quite a bit faster (perhaps nanoseconds for a 2D loop over 256 ** 2 entries) rather than taking milliseconds.

Here are my test results:

```
$ python setup.py build_ext --inplace; python test.py
running build_ext
             counter: 0.00236220359802 sec
           pycounter: 0.00323309898376 sec
          percentage: 73.1 %
```
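For reference, the `percentage` line is just the ratio of the two mean timings. Worked out from the numbers above, the Cython version is only about 1.37 times faster than the pure-Python one:

$$
\frac{t_{\mathrm{counter}}}{t_{\mathrm{pycounter}}} \times 100
= \frac{0.00236220359802}{0.00323309898376} \times 100 \approx 73.1\,\%,
\qquad
\frac{t_{\mathrm{pycounter}}}{t_{\mathrm{counter}}} \approx 1.37
$$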

My initial code looks like this:

```python
#!/usr/bin/env python
# encoding: utf-8
# filename: loop_testing.py

def generate_coords(dim, length):
    """Generates a list of coordinates for the dimensions and size
    provided.

    Parameters:
        dim -- dimension
        length -- size of each dimension

    Returns:
        A list of coordinates based on dim and length
    """
    values = []
    if dim == 2:
        for x in xrange(length):
            for y in xrange(length):
                values.append((x, y))

    if dim == 3:
        for x in xrange(length):
            for y in xrange(length):
                for z in xrange(length):
                    values.append((x, y, z))

    return values
```

This works for what I need, but it's slow. With dim, length = (2, 256), I see a timing of approximately 2.3 ms in IPython.

To try to speed it up, I wrote a Cython equivalent (I think it's equivalent).
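As an aside on the "replacing a list" part, the standard library's `itertools.product` builds the same tuples in the same order as the nested loops. Unlike the original, it also works for any `dim`, not just 2 or 3. The sketch below shows this. It is not the author's code and has not been benchmarked here. `itertools` does its iteration in C, so it may be somewhat faster, but time it on your own machine before relying on that.

```python
from itertools import product


def generate_coords(dim, length):
    """Return every coordinate tuple of a `dim`-dimensional grid with
    `length` points per axis, in the same order as nested for-loops."""
    # product(range(length), repeat=dim) yields (0, ..., 0), (0, ..., 1), ...
    # with the last axis varying fastest, exactly like the nested loops.
    return list(product(range(length), repeat=dim))


if __name__ == "__main__":
    coords = generate_coords(2, 256)
    print(len(coords))   # 65536
    print(coords[:3])    # [(0, 0), (0, 1), (0, 2)]
```

If the caller only iterates over the coordinates once, returning the `product(...)` iterator without wrapping it in `list` avoids materializing all 65536 tuples at all.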

```cython
#!/usr/bin/env python
# encoding: utf-8
# filename: loop_testing.pyx
# cython: boundscheck=False
# cython: wraparound=False

cimport cython
from cython.parallel cimport prange

import numpy as np
cimport numpy as np


ctypedef int dtype

# 2D point updater
cpdef inline void _counter_2d(dtype[:, :] narr, int val) nogil:
    cdef:
        dtype count = 0
        dtype index = 0
        dtype x, y

    for x in range(val):
        for y in range(val):
            narr[index][0] = x
            narr[index][1] = y
            index += 1


cpdef dtype[:, :] counter(dim=2, val=256):
    narr = np.zeros((val**dim, dim), dtype=np.dtype('i4'))
    _counter_2d(narr, val)
    return narr


def pycounter(dim=2, val=256):
    vals = []
    for x in xrange(val):
        for y in xrange(val):
            vals.append((x, y))
    return vals
```

And here is the script I use for timing:

```python
#!/usr/bin/env python
# filename: test.py
"""
Usage:
    test.py [options]
    test.py [options] <val>
    test.py [options] <dim> <val>

Options:
    -h --help       Show this message
    -n N            Number of loops [default: 10]
"""

if __name__ == "__main__":
    from docopt import docopt
    from timeit import Timer

    args = docopt(__doc__)
    dim = args.get("<dim>") or 2
    val = args.get("<val>") or 256
    n = args.get("-n") or 10
    dim = int(dim)
    val = int(val)
    n = int(n)

    tests = ['counter', 'pycounter']
    timing = {}
    for test in tests:
        code = "{}(dim=dim, val=val)".format(test)
        variables = "dim, val = ({}, {})".format(dim, val)
        setup = "from loop_testing import {}; {}".format(test, variables)
        t = Timer(code, setup=setup)
        timing[test] = t.timeit(n) / n

    for test, val in timing.iteritems():
        print "{:>20}: {} sec".format(test, val)
    print "{:>20}: {:>.3} %".format("percentage", timing['counter'] / timing['pycounter'] * 100)
```
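A possibly simpler way to time this is to pass `timeit.Timer` a zero-argument callable instead of assembling import and setup strings. `Timer` accepts a callable as its statement. The sketch below is written for Python 3. It uses a local copy of `pycounter` so that it runs without building the extension. The helper names `mean_time` and `compare` are hypothetical, not part of `timeit`.

```python
import timeit


def pycounter(dim=2, val=256):
    """Local copy of pycounter from loop_testing.pyx (dim is unused there too)."""
    vals = []
    for x in range(val):
        for y in range(val):
            vals.append((x, y))
    return vals


def mean_time(func, n=10, **kwargs):
    """Average seconds per call of func(**kwargs) over n calls."""
    # timeit.Timer accepts a zero-argument callable instead of a code string,
    # so no import/setup string has to be assembled.
    timer = timeit.Timer(lambda: func(**kwargs))
    return timer.timeit(number=n) / n


def compare(funcs, n=10, **kwargs):
    """Map each function's __name__ to its mean time per call."""
    return {f.__name__: mean_time(f, n, **kwargs) for f in funcs}


if __name__ == "__main__":
    timing = compare([pycounter], n=10, dim=2, val=256)
    for name, seconds in sorted(timing.items()):
        print("{:>20}: {} sec".format(name, seconds))
```

To time the compiled version as well, build the extension, run `from loop_testing import counter`, and call `compare([counter, pycounter], ...)`.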

For reference, here is the setup.py used to build the Cython code:

```python
from distutils.core import setup
from Cython.Build import cythonize
import numpy

include_path = [numpy.get_include()]

setup(
    name="looping",
    ext_modules=cythonize('loop_testing.pyx'),  # accepts a glob pattern
    include_dirs=include_path,
)
```

Edit: link to a working version: https://github.com/brianbruggeman/cython_experimentation

This Cython code is slow because of the `narr[index][0] = x` assignment, which relies heavily on the Python C-API. Writing `narr[index, 0] = x` instead translates to pure C and solves the issue.
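Here is one way to see why `narr[index, 0]` can reduce to plain C. A C-contiguous 2-D int array is just a flat buffer, and element `[index, k]` sits at offset `index * ncols + k`. The pure-Python sketch below makes that arithmetic explicit. It uses the stdlib `array` module as a stand-in for the NumPy array, and the helpers `fill_coords_2d` and `row` are hypothetical. It only illustrates the memory layout and will not be fast in pure Python. To see how Cython actually compiles a given indexing expression, check its annotated output.

```python
from array import array


def fill_coords_2d(val):
    """Fill a flat row-major int buffer of shape (val*val, 2) with (x, y) pairs.

    Element [index, k] of the conceptual 2-D array lives at offset
    index * ncols + k: one multiply-add per access, no intermediate row object.
    """
    ncols = 2
    buf = array('i', [0]) * (val * val * ncols)  # zero-filled, like np.zeros
    index = 0
    for x in range(val):
        for y in range(val):
            buf[index * ncols + 0] = x  # conceptually narr[index, 0] = x
            buf[index * ncols + 1] = y  # conceptually narr[index, 1] = y
            index += 1
    return buf


def row(buf, index, ncols=2):
    """Read back row `index` as a tuple (for checking only)."""
    start = index * ncols
    return tuple(buf[start:start + ncols])


if __name__ == "__main__":
    b = fill_coords_2d(4)
    print(row(b, 5))  # (1, 1), since index 5 == 1 * 4 + 1
```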

As @perimosocordiae pointed out, Cython's annotation output (`cython -a`, or `annotate=True` in `cythonize`) is the way to go for debugging issues like this.

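In some cases it can also be worth explicitly specifying the gcc compilation flags in setup.py. In distutils these are keyword arguments of `Extension` (from `distutils.extension`), so `loop_testing.pyx` is wrapped in an `Extension`, which is then passed to `cythonize`:

```python
Extension(
    "loop_testing",
    ["loop_testing.pyx"],
    [...]
    extra_compile_args=['-O2', '-march=native'],
    extra_link_args=['-O2', '-march=native'])
```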

This should not be necessary if the default compilation flags are reasonable. However, on my Linux system, for instance, the defaults appear to apply no optimization at all, and adding the above flags results in a significant performance improvement.
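To check what the defaults are on a given machine, you can ask the interpreter for the compiler settings recorded at its build time. On Unix-like systems, distutils builds extension compile commands largely from these values, and environment variables such as `CFLAGS` can add to them. The sketch below assumes that `sysconfig.get_config_var` may return `None`, as can happen on Windows. The helper name `default_opt_flags` is hypothetical.

```python
import sysconfig


def default_opt_flags():
    """Return the -O... flags recorded in this interpreter's build config.

    Looks at the CFLAGS and OPT config vars; a missing var (None) is treated
    as empty, so the result may be an empty list. Duplicates are possible
    because CFLAGS often already contains OPT.
    """
    flags = []
    for var in ("CFLAGS", "OPT"):
        value = sysconfig.get_config_var(var) or ""
        flags.extend(f for f in value.split() if f.startswith("-O"))
    return flags


if __name__ == "__main__":
    print(sysconfig.get_config_var("CFLAGS"))
    print(default_opt_flags())
```

An empty list here, or only `-O0`, would be consistent with the "no optimization" behaviour described above. Passing explicit flags to `Extension` sidesteps the question either way.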
