python - Write double (triple) sum as inner product?
Since np.dot is accelerated by OpenBLAS/OpenMPI, I was wondering if there is a possibility to write the double sum

for i in range(n):
    for j in range(n):
        b[k,l] += a[i,j,k,l] * x[i,j]

as an inner product. At the moment I am using
b = np.einsum("ijkl,ij->kl",a,x)
but unfortunately it is quite slow and only uses one processor. Any ideas?
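For reference, here is a small sanity check (with made-up sizes, not the benchmark sizes below) showing that the einsum call computes the same b as the explicit double loop:

```python
import numpy as np

n, m = 4, 3
a = np.random.random((n, n, m, m))
x = np.random.random((n, n))

# explicit double sum: b[k,l] = sum_ij a[i,j,k,l] * x[i,j]
b_loop = np.zeros((m, m))
for i in range(n):
    for j in range(n):
        b_loop += a[i, j] * x[i, j]   # a[i,j] is the (k,l) slab, x[i,j] a scalar

b_einsum = np.einsum("ijkl,ij->kl", a, x)
assert np.allclose(b_loop, b_einsum)
```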
Edit: I benchmarked the answers given so far with a simple example; they all seem to be in the same order of magnitude:
a = np.random.random([200,200,100,100])
x = np.random.random([200,200])

def b1():
    return np.einsum("ijkl,ij->kl", a, x)

def b2():
    return np.tensordot(a, x, [[0, 1], [0, 1]])

def b3():
    shp = a.shape
    return np.dot(x.ravel(), a.reshape(shp[0]*shp[1], -1)).reshape(shp[2], shp[3])

%timeit b1()
%timeit b2()
%timeit b3()

1 loops, best of 3: 300 ms per loop
10 loops, best of 3: 149 ms per loop
10 loops, best of 3: 150 ms per loop
Concluding from these results, I would choose np.einsum, since its syntax is by far the most readable and the other two are only faster by a factor of 2. I guess the next step would be to externalize the code into C or Fortran.
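As a side note on why b3 works: summing a[i,j,k,l]*x[i,j] over i and j is just a vector-matrix product once the (i,j) axes are flattened into one axis (and likewise (k,l) on the output side). A minimal sketch with small illustrative shapes:

```python
import numpy as np

a = np.random.random((5, 6, 3, 4))
x = np.random.random((5, 6))

# Flatten (i,j) -> one axis of length 30 and (k,l) -> one axis of length 12,
# then contract with an ordinary dot product and restore the (k,l) shape.
flat = a.reshape(5 * 6, -1)
b = np.dot(x.ravel(), flat).reshape(3, 4)

assert np.allclose(b, np.einsum("ijkl,ij->kl", a, x))
```

This is why b3 can piggyback on the BLAS-backed np.dot.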
You can use np.tensordot():
np.tensordot(a, x, [[0,1], [0, 1]])
which will use multiple cores.
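The second argument to tensordot lists which axis pairs to contract: here axes 0 and 1 of a against axes 0 and 1 of x, leaving the k and l axes. A quick check with small illustrative shapes:

```python
import numpy as np

a = np.random.random((5, 6, 3, 4))
x = np.random.random((5, 6))

# Contract axes (0, 1) of a with axes (0, 1) of x; the result keeps (k, l).
b_td = np.tensordot(a, x, [[0, 1], [0, 1]])

assert b_td.shape == (3, 4)
assert np.allclose(b_td, np.einsum("ijkl,ij->kl", a, x))
```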
Edit: it is interesting to see how np.einsum and np.tensordot scale when increasing the size of the input arrays:
In [18]: for n in range(1, 31):
   ....:     a = np.random.rand(n, n+1, n+2, n+3)
   ....:     x = np.random.rand(n, n+1)
   ....:     print(n)
   ....:     %timeit np.einsum('ijkl,ij->kl', a, x)
   ....:     %timeit np.tensordot(a, x, [[0, 1], [0, 1]])
   ....:
1
1000000 loops, best of 3: 1.55 µs per loop
100000 loops, best of 3: 8.36 µs per loop
...
11
100000 loops, best of 3: 15.9 µs per loop
100000 loops, best of 3: 17.2 µs per loop
12
10000 loops, best of 3: 23.6 µs per loop
100000 loops, best of 3: 18.9 µs per loop
...
21
10000 loops, best of 3: 153 µs per loop
10000 loops, best of 3: 44.4 µs per loop
and the advantage of using tensordot for larger arrays becomes clear.