You may have had a senior developer interview for Python that felt easy, answered every question, and still not been selected.
The most likely reason is that you solved the problems like a noob, and the interview was really meant to figure out whether you are a noob in Python or a real gem who has been cut and polished for years.
To avoid looking like a noob, say goodbye to loops and welcome vectorization into your Python toolbox. This is not only about looking cool, though: vectorization has real advantages.
Advantages of Vectorization:
A plain Python loop runs every iteration through the interpreter, so it cannot benefit from the compiled, optimized routines that libraries such as NumPy use to speed up a program significantly. In short, here are the advantages of using vectorization.
1. Performance: Vectorized operations are implemented at a low level in highly optimized C code behind the scenes. This makes them faster than Python loops, and the difference becomes visible when dealing with large datasets.
2. Readability: Vectorized code is concise and readable. You will see this in the examples.
3. Ease of Use: With vectorized operations you don't need to worry about boundary conditions or loop indexes. This reduces errors.
4. Parallelization: Some libraries that support vectorization can make use of SIMD instructions or multi-core processors and parallelize the operation automatically, work that would otherwise require writing multithreading code yourself.
5. Broadcasting: Vectorized operations can handle arrays of different shapes by automatically broadcasting them to a compatible shape. This simplifies operations that would otherwise need nested loops and element-wise calculation. Here is an example of NumPy broadcasting:
import numpy as np
# Example 1: Broadcasting a scalar to an array
arr = np.array([1, 2, 3, 4])
result = arr + 10  # Scalar 10 is broadcast to [10, 10, 10, 10]
# Example 2: Broadcasting a smaller array to a larger array
a = np.array([1, 2, 3])
b = np.array([10])
result = a + b  # Array b is broadcast to [10, 10, 10]
6. Concise Code: Vectorized code requires fewer lines than the equivalent loop-based code. You will see this in the examples.
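To make the broadcasting and conciseness points concrete, here is a minimal sketch that builds a small multiplication table twice: once with nested loops and once by broadcasting a column against a row. The function names `table_with_loops` and `table_with_broadcasting` are made up for this illustration and do not come from any library.

```python
import numpy as np

def table_with_loops(n):
    # Nested loops: compute every cell one at a time.
    table = []
    for i in range(1, n + 1):
        row = []
        for j in range(1, n + 1):
            row.append(i * j)
        table.append(row)
    return table

def table_with_broadcasting(n):
    values = np.arange(1, n + 1)
    # values[:, None] has shape (n, 1) and values has shape (n,);
    # NumPy broadcasts both to (n, n) before multiplying.
    return values[:, None] * values

loop_table = table_with_loops(3)
numpy_table = table_with_broadcasting(3)
print(numpy_table)
```

Both versions produce the same 3 x 3 table, but the broadcasting version has no index bookkeeping at all.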
Let's see vectorization in Action:
Use Case -1 Creating a Nested List or Multidimensional Array
import time

def create_nested_list_using_loops(n):
    t1 = time.time()
    nested_list = []
    value = 1  # Initial value to start with
    for _ in range(n):
        row = []
        for _ in range(n):
            row.append(value)
            value += 1  # Increment the value for the next element
        nested_list.append(row)
    print(nested_list)
    t2 = time.time()
    print(f"With for loop it took {t2-t1} seconds \n")
    return nested_list
With for loop it took 5.6743621826171875e-05 seconds
def create_nested_list_using_vectorization(n):
    tv1 = time.time()
    value = 1  # Initial value to start with
    # A nested list comprehension: i is the offset of each row, j the column
    nested_list = [[value + i + j for j in range(n)] for i in range(0, n * n, n)]
    print(nested_list)
    tv2 = time.time()
    print(f"With vectorization it took {tv2-tv1} seconds")
    return nested_list
With vectorization it took 3.814697265625e-05 seconds
Now beat this: Numpy
def create_multidimensional_array_numpy(shape, start_value=0, dtype=int):
    t1 = time.time()
    total_elements = np.prod(shape)  # Number of elements the shape holds
    arr = np.arange(start_value, start_value + total_elements, dtype=dtype).reshape(shape)
    print(arr)
    t2 = time.time()
    print(f"With numpy and vectorization it took {t2 - t1} seconds")
    return arr
With numpy and vectorization it took 0.0002448558807373047 seconds
Verdict-1:
Note: If you run this code dozens of times, you may sometimes see the loop yield its result faster than vectorization. That is because of caching, the small amount of data, and our array being static; with input this small, the fixed overhead of calling into NumPy can also outweigh the work itself. In real-world scenarios with huge and dynamic data, vectorization will almost always win.
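One way to check this note yourself is to time both approaches at several sizes and keep `print` out of the measured region, since printing can easily cost more than the work being measured. The sketch below uses `timeit` for that. The helper names `build_with_loops`, `build_with_numpy` and `compare`, and the sizes chosen, are assumptions for illustration, and the numbers you get will depend on your machine.

```python
import timeit
import numpy as np

def build_with_loops(n):
    # Same n x n grid of 1..n*n as the loop example, without printing.
    value = 1
    grid = []
    for _ in range(n):
        row = []
        for _ in range(n):
            row.append(value)
            value += 1
        grid.append(row)
    return grid

def build_with_numpy(n):
    # arange creates 1..n*n in compiled code; reshape gives the n x n grid.
    return np.arange(1, n * n + 1).reshape(n, n)

def compare(n, repeats=5):
    # Total seconds for `repeats` calls of each builder.
    loop_time = timeit.timeit(lambda: build_with_loops(n), number=repeats)
    numpy_time = timeit.timeit(lambda: build_with_numpy(n), number=repeats)
    return loop_time, numpy_time

if __name__ == "__main__":
    for n in (3, 100, 500):
        loop_time, numpy_time = compare(n)
        print(f"n={n}: loops {loop_time:.6f}s, NumPy {numpy_time:.6f}s")
```

Running it at a tiny size and at a larger one is a simple way to see where the two approaches trade places on your own hardware.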
Let's see some other examples:
Use Case -2 Doing Calculations on Multidimensional Lists and NumPy Arrays
def multiply_nested_list_by_n_loop(nested_list, n):
    t1 = time.time()
    result = []
    for row in nested_list:
        new_row = [element * n for element in row]
        result.append(new_row)
    print(result)
    t2 = time.time()
    print(f"With for loop it took {t2 - t1} seconds \n")
def multiply_numpy_array_by_n_loop(arr, n):
    t1 = time.time()
    result = np.empty_like(arr)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            result[i, j] = arr[i, j] * n
    print(result)
    t2 = time.time()
    print(f"With for loop it took {t2 - t1} seconds \n")
def multiply_numpy_array_by_n_vectorized(arr, n):
    t1 = time.time()
    result = arr * n
    print(result)
    t2 = time.time()
    print(f"With vectorization it took {t2 - t1} seconds \n")
Now compare the speed of these programs with the non-NumPy, non-vectorized program, which takes around 3 seconds to get the same result. The speed factor is
3/0.000175476 ≈ 17096. This says that your Python program now runs roughly 17 thousand times faster than it did without NumPy and vectorization. It may now make sense how the Mojo programming language can be faster than Python.
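The arithmetic above can be reproduced in a few lines. This sketch only divides the two timings quoted in the text (about 3 seconds and 0.000175476 seconds); the function name `speed_factor` is made up for this illustration.

```python
def speed_factor(slow_seconds, fast_seconds):
    # How many times faster the second timing is than the first.
    return slow_seconds / fast_seconds

factor = speed_factor(3, 0.000175476)
print(f"Speed factor: {factor:.0f}")  # prints "Speed factor: 17096"
```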
What are some other uses of Vectorization?
1. List Comprehension:
import timeit
# Using list comprehension
def using_list_comprehension(n):
    return [x * 2 for x in range(n)]
# Using a for loop
def using_for_loop(n):
    result = []
    for x in range(n):
        result.append(x * 2)
    return result
if __name__ == '__main__':
    # Number of iterations for the test
    n = 10000
    # Measure execution time for list comprehension
    list_comp_time = timeit.timeit(lambda: using_list_comprehension(n), number=1000)
    # Measure execution time for a for loop
    for_loop_time = timeit.timeit(lambda: using_for_loop(n), number=1000)
    # Print the results
    print(f"Time taken by List Comprehension: {list_comp_time:.6f} seconds")
    print(f"Time taken by For Loop: {for_loop_time:.6f} seconds")
Time taken by List Comprehension: 0.495206 seconds
Time taken by For Loop: 0.699893 seconds
2. Built In Functions:
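No example accompanies this item here, so the following is only a hedged sketch of the idea: in CPython, built-ins such as `sum` and `map` run their iteration in C instead of executing Python bytecode for every element. The function names below are made up for illustration.

```python
import timeit

def total_with_loop(numbers):
    # Explicit loop: every addition is executed by the interpreter.
    total = 0
    for x in numbers:
        total += x
    return total

def total_with_builtin(numbers):
    # sum() iterates in C (in CPython), avoiding per-element bytecode.
    return sum(numbers)

def lengths_with_map(words):
    # map() applies the built-in len to every item without an explicit loop.
    return list(map(len, words))

if __name__ == "__main__":
    numbers = list(range(10000))
    loop_time = timeit.timeit(lambda: total_with_loop(numbers), number=1000)
    builtin_time = timeit.timeit(lambda: total_with_builtin(numbers), number=1000)
    print(f"Time taken by For Loop: {loop_time:.6f} seconds")
    print(f"Time taken by sum(): {builtin_time:.6f} seconds")
```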
3. Python's array Module:
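No example accompanies this item either, so this is an illustrative sketch rather than the author's code. The standard library's `array` module stores numbers of a single type compactly; the variable names here are assumptions for the example.

```python
from array import array

# 'd' is the type code for C doubles; the values are stored compactly
# as raw numbers instead of as separate Python float objects.
readings = array("d", [1.5, 2.5, 3.0])

# Built-ins such as sum() accept an array like any other iterable.
total = sum(readings)

# The array module has no element-wise arithmetic, so scaling still
# needs a generator or comprehension (NumPy would be the vectorized choice).
doubled = array("d", (x * 2 for x in readings))

print(total, doubled.tolist())
```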
4. Other Libraries:
Here I have used the timeit module to compare the runtimes. You can learn more about it in my next blog.
Thank you and Happy Learning!
References: https://www.askpython.com/python-modules/numpy/numpy-vectorization
