Prerequisite: High-Performance Array Operations with Cython | Set 1
The code developed in the first part runs fast. In this article, we compare its performance with the clip() function from the NumPy library.
Surprisingly, our program runs faster than the NumPy version, even though NumPy is written in C.
Code #1 : Comparing the performances.
Python3
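from timeit import timeit

# numpy, sample, arr2 and arr3 must already exist in __main__
# (assumed to be set up as in Set 1).
numpy_time = timeit('numpy.clip(arr2, -5, 5, arr3)',
                    'from __main__ import arr2, arr3, numpy',
                    number=1000)
print("\nTime for NumPy clip program : ", numpy_time)

sample_time = timeit('sample.clip(arr2, -5, 5, arr3)',
                     'from __main__ import arr2, arr3, sample',
                     number=1000)
print("\nTime for our program : ", sample_time)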
Output :
Time for NumPy clip program :  8.093049556000551

Time for our program :  3.760528204000366
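To put the two timings in perspective, the ratio can be computed directly from the numbers printed above. This is only a small arithmetic sketch: the variable names are illustrative, and the timings themselves will differ from machine to machine.

```python
# Timings copied from the output above (seconds for 1000 calls each).
numpy_time = 8.093049556000551
cython_time = 3.760528204000366

# Speedup of the Cython version relative to numpy.clip().
speedup = numpy_time / cython_time
print(f"Cython clip() is about {speedup:.2f}x faster")
```

On these figures the Cython version is a little more than twice as fast as numpy.clip().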
The code in this article relies on Cython typed memoryviews, which simplify code that operates on arrays. The declaration cpdef clip() declares clip() as both a C-level and a Python-level function. This means that the function can be called more efficiently from other Cython functions (e.g., if you want to invoke clip() from a different Cython function).
Two decorators are used in the code: @cython.boundscheck(False) and @cython.wraparound(False). Both are optional performance optimizations.
@cython.boundscheck(False) : Eliminates all array bounds checking and can be used if the indexing won’t go out of range.
@cython.wraparound(False) : Eliminates the handling of negative array indices as wrapping around to the end of the array (like with Python lists). The inclusion of these decorators can make the code run substantially faster (almost 2.5 times faster on this example when tested).
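The two behaviors that these decorators switch off can be seen in plain Python. The sketch below uses an ordinary list (the name values is illustrative) to show negative-index wraparound and bounds checking. Once both are disabled in Cython, the code must only use indices from 0 to n-1, since an invalid index is no longer caught and may crash the program or corrupt data.

```python
values = [10.0, 20.0, 30.0]

# Wraparound: a negative index counts from the end of the sequence.
last = values[-1]          # same element as values[len(values) - 1]

# Bounds checking: an out-of-range index raises IndexError
# instead of reading arbitrary memory.
try:
    values[3]
    out_of_range_caught = False
except IndexError:
    out_of_range_caught = True

print(last, out_of_range_caught)
```

The clip() function only ever indexes with i in range(a.shape[0]) after checking that both arrays have the same size, which is why it is safe to turn both checks off there.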
Code #2 : Variant of the clip() function that uses conditional expressions
Python3
cimport cython

# Decorators that disable bounds checking and negative-index wraparound
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef clip(double[:] a, double min, double max, double[:] out):
    if min > max:
        raise ValueError("min must be <= max")
    if a.shape[0] != out.shape[0]:
        raise ValueError("input and output arrays must be the same size")
    for i in range(a.shape[0]):
        # Nested conditional expression: clamp to max first, then to min
        out[i] = (a[i] if a[i] < max else max) if a[i] > min else min
When tested, this version of the code runs over 50% faster than the earlier version. But how does this code stack up against a handwritten C version? In experiments, a handcrafted C extension ran more than 10% slower than the version created by Cython.
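The nested conditional expression used in Code #2 can be checked in plain Python before compiling anything. The sketch below is a pure-Python stand-in, not the Cython function itself: clip_reference, lo and hi are hypothetical names chosen for illustration, and lists take the place of typed memoryviews.

```python
def clip_reference(a, lo, hi, out):
    """Pure-Python stand-in for the Cython clip() in Code #2 (hypothetical helper)."""
    if lo > hi:
        raise ValueError("min must be <= max")
    if len(a) != len(out):
        raise ValueError("input and output arrays must be the same size")
    for i in range(len(a)):
        # Same nested conditional expression as in Code #2.
        out[i] = (a[i] if a[i] < hi else hi) if a[i] > lo else lo


data = [-7.5, -5.0, 0.0, 3.2, 5.0, 9.9]
result = [0.0] * len(data)
clip_reference(data, -5.0, 5.0, result)
print(result)   # [-5.0, -5.0, 0.0, 3.2, 5.0, 5.0]
```

Values below the lower bound are replaced by it, values above the upper bound are replaced by that bound, and everything in between is copied unchanged, which matches min(max(x, lo), hi) for each element.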