Python: Tools to profile and benchmark your code

Have you ever worked on a problem where you wanted to optimise your code or compare the performance of two different versions of a program? I was working on a problem to optimise a Python program that connects to a database, queries a table, and returns some data. The goal was to make the whole process blazing fast so that dashboard users could see results in under 2 seconds. That is when code profiling tools came into my toolbox.

On a separate note, a developer is expected to write code and test it, but a seasoned developer also tests that code for performance, because the real test of an application comes when lots of people start using it.

Enough of the fuss; now let's understand what code profiling is.

Code Profiling: 

Code profiling is the process of examining a program for various performance metrics, like memory utilisation, CPU usage, network bandwidth, and the time taken to perform a particular calculation.

Let's see that in action: 

import random
import time
from functools import reduce


# Define a complex calculation function
def some_complex_calculation(x, y):
    # Simulate a complex calculation, e.g., multiplying two numbers
    result = x * y
    return result


# Using the reduce function for aggregation
def using_reduce(func, data: list):
    t1 = time.time()
    result = reduce(func, data)
    t2 = time.time()
    return result, t2 - t1


# Using a plain for loop
def using_loop(data):
    t1 = time.time()
    result = 1
    for num in data:
        result *= num
    t2 = time.time()
    return result, t2 - t1


if __name__ == "__main__":
    # Generate a list of random numbers for demonstration
    data = [random.randint(1, 1000) for _ in range(10000)]
    result_of_reduce, time_taken_reduce = using_reduce(some_complex_calculation, data)
    print("Using reduce function:")
    print("Execution Time for reduce function is:", time_taken_reduce, "seconds")

    result_plain_loop, time_plain_loop = using_loop(data)
    print("\nUsing plain for loop:")
    print("Execution Time using loop is:", time_plain_loop, "seconds")

Using reduce function:
Execution Time for reduce function is: 0.017206430435180664 seconds

Using plain for loop:
Execution Time using loop is: 0.014813899993896484 seconds
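As a side note, for this kind of manual timing, `time.perf_counter()` is generally a better choice than `time.time()`, since it uses a high-resolution monotonic clock that is not affected by system clock adjustments. A minimal sketch of the same idea:

```python
import time

# perf_counter() is monotonic and high-resolution, so it is better
# suited to measuring short durations than time.time()
t1 = time.perf_counter()
total = sum(x * x for x in range(100_000))
t2 = time.perf_counter()
print(f"Squaring 100,000 numbers took {t2 - t1:.6f} seconds")
```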

Runtime alone is not enough, I need more info:

The above program gives only the runtime of the code, but most of the time you need a lot of other information, depending on what your program does. For that, there are plenty of profilers at your disposal.

1. cProfile: 

The first on my list is cProfile. It is a built-in Python profiler that is part of the standard library. It provides a basic but effective way to profile Python code at the function level. You can use it to measure the execution time of each function and identify potential performance bottlenecks.
import cProfile
import random


# Define a complex calculation function
def complex_calculation(x):
    # Simulate a complex calculation, e.g., squaring the number
    result = x * x
    return result


# Generate a list of random numbers for demonstration
data = [random.randint(1, 1000) for _ in range(10000)]


# Using the map function
def using_map():
    return list(map(complex_calculation, data))


# Using a plain for loop
def using_plain_loop():
    result = []
    for num in data:
        result.append(complex_calculation(num))
    return result


if __name__ == "__main__":
    print("Using map function:")
    cProfile.run("using_map()")

    print("\nUsing plain for loop:")
    cProfile.run("using_plain_loop()")


Using map function:
         10004 function calls in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        1    0.001    0.001    0.002    0.002 profiling.py:15(using_map)
    10000    0.001    0.000    0.001    0.000 profiling.py:5(complex_calculation)
        1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Using plain for loop:
         20004 function calls in 0.005 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.005    0.005 <string>:1(<module>)
        1    0.003    0.003    0.005    0.005 profiling.py:19(using_plain_loop)
    10000    0.001    0.000    0.001    0.000 profiling.py:5(complex_calculation)
        1    0.000    0.000    0.005    0.005 {built-in method builtins.exec}
    10000    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
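cProfile.run prints its report in definition order by default. When a script has many functions, it helps to sort by cumulative time so the biggest bottlenecks come first. A minimal sketch using the standard pstats module, reusing the function names from the example above:

```python
import cProfile
import io
import pstats
import random


def complex_calculation(x):
    return x * x


data = [random.randint(1, 1000) for _ in range(10000)]


def using_plain_loop():
    return [complex_calculation(num) for num in data]


# Collect stats into a Profile object instead of printing directly
profiler = cProfile.Profile()
profiler.enable()
using_plain_loop()
profiler.disable()

# Sort the report by cumulative time and show only the top 5 entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
print(stream.getvalue())
```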

2. profile: 

What if cProfile's C implementation gets in your way? The profile module is here for that case. Similar to cProfile, it is part of the standard library and exposes the same interface, but it is implemented in pure Python. That makes it noticeably slower, yet easier to extend when you need custom profiling behaviour.

import profile
import random


# Define a function that simulates some work
def my_slow_function():
    total = 0
    for _ in range(100000):
        total += random.randint(1, 1000)
    return total


# Profile the function using the profile module
if __name__ == "__main__":
    profiler = profile.Profile()
    profiler.run('my_slow_function()')
    profiler.print_stats()

         802422 function calls in 1.930 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.107    0.000    0.107    0.000 :0(bit_length)
        1    0.000    0.000    1.916    1.916 :0(exec)
   102417    0.106    0.000    0.106    0.000 :0(getrandbits)
   300000    0.362    0.000    0.362    0.000 :0(index)
        1    0.014    0.014    0.014    0.014 :0(setprofile)
        1    0.000    0.000    1.916    1.916 <string>:1(<module>)
        1    0.117    0.117    1.916    1.916 mix_profiling.py:6(my_slow_function)
        1    0.000    0.000    1.930    1.930 profile:0(my_slow_function())
        0    0.000             0.000          profile:0(profiler)
   100000    0.343    0.000    0.556    0.000 random.py:239(_randbelow_with_getrandbits)
   100000    0.649    0.000    1.567    0.000 random.py:292(randrange)
   100000    0.232    0.000    1.799    0.000 random.py:366(randint)


3. line_profiler: 

This is a third-party package that you need to install before using it. In my opinion, no other profiler matches line_profiler for the readability of its profiling statistics.
So let's get started with pip install line_profiler
import random
from functools import wraps

from line_profiler import LineProfiler

prof = LineProfiler()


def profile(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return prof(func)(*args, **kwargs)
        finally:
            prof.print_stats()

    return wrapper


@profile
def my_slow_function():
    total = 0
    for _ in range(100000):
        total += random.randint(1, 1000)
    return total


if __name__ == '__main__':
    my_slow_function()

Timer unit: 1e-09 s

Total time: 0.401355 s
File: /home/sanjeet/Documents/git_pod/basic_python/profiling/mix_profiling.py
Function: my_slow_function at line 38

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    38                                           @profile
    39                                           def my_slow_function():
    40         1       1535.0   1535.0      0.0      total = 0
    41    100001   26632008.0    266.3      6.6      for _ in range(100000):
    42    100000  374721383.0   3747.2     93.4          total += random.randint(1, 1000)
    43         1        178.0    178.0      0.0      return total

Process finished with exit code 0

4. memory_profiler:

This package is useful for memory profiling of Python code. It can help you identify memory-intensive parts of your code and potential memory leaks. It is also a third-party package, so you need to install it first:
pip install memory-profiler
from memory_profiler import profile
import random


@profile
def memory_intensive_function():
    data = [random.randint(1, 1000) for _ in range(1000000)]
    result = sum(data)
    del data  # To free up memory
    return result


if __name__ == "__main__":
    result = memory_intensive_function()
    print("Result:", result)

Filename: /home/sanjeet/Documents/git_pod/basic_python/profiling/mix_profiling.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    54     19.4 MiB     19.4 MiB           1   @profile
    55                                         def memory_intensive_function():
    56     50.2 MiB     30.9 MiB     1000003       data = [random.randint(1, 1000) for _ in range(1000000)]
    57     50.2 MiB      0.0 MiB           1       result = sum(data)
    58     21.0 MiB    -29.2 MiB           1       del data  # To free up memory
    59     21.0 MiB      0.0 MiB           1       return result

Result: 500539097

Process finished with exit code 0


5. Timeit:

timeit is a built-in module in the Python standard library and a very simple alternative to using the time module, as shown in the very first code example in this blog.
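For example, the squaring workload from the cProfile section can be benchmarked with timeit, which runs the statement many times to average out measurement noise (the number=100 repeat count here is arbitrary):

```python
import timeit

# Setup code runs once; the statement is then executed repeatedly
setup = "import random; data = [random.randint(1, 1000) for _ in range(10000)]"
stmt = "[x * x for x in data]"

# Total time for 100 executions of the statement
elapsed = timeit.timeit(stmt, setup=setup, number=100)
print(f"100 runs took {elapsed:.4f} seconds in total")
```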

Enough of static code profiling; what if I want to profile a running Python app?

For profiling your code while it is running, there are sampling profilers such as those listed below:

1. Pyflame
2. py-spy

I will cover these in the next update to this blog.
I am also working on an additional bonus tutorial for you all: cProfile with RunSnakeRun.

Till then, Happy Learning. 


SEO: Code Profiling, line_profiler, memory_profiler, cprofile, timeit, timeit example, cprofile example, python code profiling. 

