Learn NumPy from Scratch – Step by Step! (Part 1)
Getting Started with Data Science in Python

NumPy is one of the most essential Python libraries for data science, providing a powerful yet easy-to-use n-dimensional array. It serves as the foundation for many other data science and machine learning libraries, making it a must-learn for anyone stepping into this field. Whether working with large datasets, performing complex calculations, or just starting with numerical computing, NumPy is your gateway to efficient data manipulation in Python.
This step-by-step guide will introduce you to NumPy from scratch and help you build a solid understanding of its core features. By the end of this tutorial, you’ll know:
The fundamental concepts that make NumPy so powerful
How to create and manipulate NumPy arrays effortlessly
How to perform fast mathematical operations with NumPy
How to apply these concepts to real-world problems
To follow along smoothly, you should have basic Python knowledge. If you are new to Python, you can check out my Python eBook, which covers everything you need to get started.
Throughout this tutorial, I'll also provide hands-on code examples that you can experiment with. Feel free to tweak the code, run it yourself, and see how it works in action.
Why Use NumPy?
If you already know Python, you might be wondering: Why learn NumPy? Can’t I just use lists and loops for calculations? While Python’s built-in features are great, NumPy takes things to the next level by making numerical computing faster, cleaner, and more efficient.
Here’s why NumPy is a game-changer:
Blazing Fast Speed: NumPy’s core is written in C, so array operations run much faster than equivalent Python loops. On large arrays, the difference is often an order of magnitude or more.
No More Nested Loops: NumPy lets you work with entire arrays at once, removing the need for complicated loops and making your code simpler and easier to read.
Clean & Readable Code: With NumPy, your calculations look more like math equations, making your code more intuitive and easier to debug.
Trusted & Optimized: A large community of developers contributes to NumPy, constantly making it faster and more reliable. That’s why it’s the backbone of data science in Python!
Because of these benefits, NumPy has become the standard for numerical computing, and many popular libraries, such as pandas, SciPy, and TensorFlow, either build on it or interoperate closely with it. Learning NumPy isn’t just helpful; it’s an essential step for anyone working with data in Python.
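To make the speed and readability points concrete, here is a minimal sketch that compares a Python-level loop with the equivalent vectorized NumPy expression. The array size, the scaling factor, and the function names are illustrative assumptions. Timings vary by machine and NumPy version, so no specific numbers are claimed.

```python
import timeit

import numpy as np

# Illustrative data: 200,000 values (the size is an arbitrary choice)
values = np.arange(200_000, dtype=np.float64)
values_list = values.tolist()


def scale_with_loop(items, factor):
    """Multiply every element one at a time at the Python level."""
    return [item * factor for item in items]


def scale_with_numpy(array, factor):
    """Multiply every element with a single vectorized operation."""
    return array * factor


loop_result = scale_with_loop(values_list, 1.5)
numpy_result = scale_with_numpy(values, 1.5)

# Both approaches produce the same numbers
print(np.allclose(loop_result, numpy_result))

# Time each approach; exact figures differ from machine to machine
loop_time = timeit.timeit(lambda: scale_with_loop(values_list, 1.5), number=5)
numpy_time = timeit.timeit(lambda: scale_with_numpy(values, 1.5), number=5)
print(f"Loop: {loop_time:.4f}s, NumPy: {numpy_time:.4f}s")
```

Running this on your own machine is the most reliable way to see how large the gap is for your data.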
Installing NumPy – Quick and Easy Setup
There are several ways to install NumPy, depending on your setup and workflow. If you prefer an online, hassle-free approach, platforms like Google Colab and Repl.it let you run NumPy code without any installation.
For data science professionals, Anaconda provides a robust environment with pre-installed packages, including NumPy.
However, in this tutorial, we’ll focus on setting up NumPy on a local machine using pip and Jupyter Notebook.
We’ll follow these steps:
Create a dedicated folder for our NumPy Guide
Set up a virtual environment to manage dependencies
Install NumPy and Jupyter Notebook inside the environment
Here’s how you can do it:
1. Create a Project Directory 📂
First, open your terminal (Mac/Linux) or command prompt (Windows) and run:
mkdir numpy-guide # Create a new folder
cd numpy-guide # Move into the folder
2. Set Up a Virtual Environment 🛠️
Now, let’s create a virtual environment inside this directory:
python -m venv numpy-tutorial # Create a virtual environment
To activate it:
On Windows:
numpy-tutorial\Scripts\activate
On Mac/Linux:
source numpy-tutorial/bin/activate
Once activated, your terminal will show something like this:
(numpy-tutorial) $
This confirms that your virtual environment is active! 🎉
3. Install NumPy and Jupyter Notebook
With the virtual environment activated, install NumPy and Jupyter Notebook:
pip install numpy jupyter
To check if NumPy is installed correctly, start the Python interpreter inside the virtual environment:
python
Then, type:
import numpy as np
print(np.__version__) # printing the installed NumPy version
If you see the version number, NumPy is installed successfully! ✅
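If you would rather run the check as a script, the sketch below does the same thing and also reports whether Python is running inside a virtual environment. The file name check_install.py is a hypothetical choice. The comparison of sys.prefix with sys.base_prefix is the standard way to detect an environment created with venv.

```python
# check_install.py (hypothetical file name)
import sys

import numpy as np

# The installed NumPy version, same as in the interactive check
print("NumPy version:", np.__version__)

# Inside a venv, sys.prefix points at the environment folder,
# while sys.base_prefix points at the base Python installation
in_virtual_env = sys.prefix != sys.base_prefix
print("Running inside a virtual environment:", in_virtual_env)
print("Environment location:", sys.prefix)
```

Run it with python check_install.py while the environment is active.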
4. Start Jupyter Notebook in Your Browser 🌐
Now, let’s launch Jupyter Notebook:
jupyter notebook
This will open Jupyter in your browser, where you can start writing and running NumPy code! 🚀
Getting Started with NumPy: A Simple Example
Before exploring NumPy’s powerful features, let’s begin with a simple example to understand its core concepts.
Imagine you're analyzing temperature readings collected throughout the week. However, some of the readings are slightly off due to sensor inconsistencies. To correct this, you decide to adjust all readings by a fixed offset while ensuring they stay within a realistic range.
Here’s how NumPy makes this process simple and efficient:
import numpy as np
# Simulated temperature readings in Celsius
temperatures = np.array([15.2, 18.5, 21.1, 16.8, 14.5, 19.3, 20.0])
# Offset to adjust readings
OFFSET = 2.0
# Apply the offset to all readings
adj_temperatures = temperatures + OFFSET
# Ensure temperatures don't exceed a realistic threshold (e.g., 25°C)
final_temperatures = np.clip(adj_temperatures, temperatures, 25)
print(final_temperatures)
Key Takeaways from This Example:
Creating Arrays: We use np.array() to create a NumPy array for storing temperature values.
Vectorized Operations: Instead of looping through each element, we simply add an OFFSET to the entire array at once. This makes the operation both concise and efficient.
Broadcasting: NumPy automatically applies the scalar OFFSET to each element in the array, adjusting all values in one step.
Using Built-in Functions: The np.clip() function ensures that no temperature exceeds the maximum limit while keeping the original values as the lower bound.
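With an offset of 2.0, none of the sample readings reach 25°C, so the clipping step has no visible effect. The sketch below uses a larger offset (LARGE_OFFSET is an illustrative assumption, not part of the original example) and plain scalar bounds so you can see np.clip() actually capping values.

```python
import numpy as np

temperatures = np.array([15.2, 18.5, 21.1, 16.8, 14.5, 19.3, 20.0])

# A larger, purely illustrative offset so that some readings pass 25°C
LARGE_OFFSET = 5.0
adjusted = temperatures + LARGE_OFFSET

# Scalar bounds: nothing below 0°C, nothing above 25°C
clipped = np.clip(adjusted, 0, 25)

print(adjusted)
print(clipped)

# Boolean mask marking the readings that were capped at the upper bound
was_capped = adjusted > 25
print(was_capped)
```

Only the third reading (21.1 + 5.0) ends up above the limit, so it is the only one replaced by 25.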
Getting Into Shape: Array Shapes & Axes
Now that you’ve seen some of what NumPy can do, let’s solidify your foundation with some key concepts. When working with arrays, especially in higher dimensions, understanding their shape and axes is crucial.
At first, one-dimensional arrays (vectors) are easy to understand—they’re just lists of numbers. Two-dimensional arrays? Still manageable; think of them as spreadsheets or tables. But when you step into three dimensions, things get tricky. And four dimensions? Let’s just say visualizing them is no easy feat!
That’s why understanding array shape and axes is essential. But don’t worry, we’ll go through them step by step.
Mastering Shape
When working with multidimensional arrays, understanding their shape is crucial. At some point, visualizing complex shapes becomes impractical, and it’s easier to rely on NumPy’s tools to confirm your array’s structure.
Every NumPy array has a .shape attribute, which returns a tuple representing the size of each dimension. While the exact order of dimensions might not always be important, ensuring that your array has the expected shape before passing it into functions is critical. A simple but effective habit is to print your array’s shape as a quick sanity check.
Let’s understand this with an example:
import numpy as np
# Creating an array and reshaping it
data = np.array([
5, 12, 18, 6, 9, 15,
7, 14, 21, 8, 16, 24
]).reshape(3, 2, 2)
# Checking the shape
print("Shape of the array:", data.shape)
# Printing the array to see its structure
print(data)
# Swapping axes for better visualization
swapped = np.swapaxes(data, 1, 2)
print("\nArray after swapping axes:\n", swapped)
What’s Happening Here?
We first create a one-dimensional array of 12 values and reshape it into a (3, 2, 2) structure.
The .shape attribute confirms the dimensions.
Since printed multi-dimensional arrays can be tricky to read, we use np.swapaxes() to rearrange them, making their structure easier to interpret.
Shape manipulation is fundamental in NumPy, and it will become even more important when we discuss broadcasting later. For now, just remember—a quick shape check can save you from debugging headaches!
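As a small sketch of the sanity-check habit described above, the code below inspects arrays of one, two, and three dimensions and then guards a function input by shape. The helper names describe and require_shape are hypothetical, introduced only for illustration.

```python
import numpy as np


def describe(array):
    """Return the number of dimensions, the shape, and the element count."""
    return array.ndim, array.shape, array.size


vector = np.arange(6)                    # 1D
table = np.arange(6).reshape(2, 3)       # 2D
cube = np.arange(24).reshape(2, 3, 4)    # 3D

for name, array in [("vector", vector), ("table", table), ("cube", cube)]:
    ndim, shape, size = describe(array)
    print(f"{name}: ndim={ndim}, shape={shape}, size={size}")

# Swapping the first and last axes swaps their sizes in the shape
print(np.swapaxes(cube, 0, 2).shape)  # (4, 3, 2)


def require_shape(array, expected_shape):
    """Hypothetical guard: fail early if the shape is not the expected one."""
    if array.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {array.shape}")
    return array


require_shape(table, (2, 3))  # passes silently
```

Using a shape where every axis has a different size, such as (2, 3, 4), makes it easier to see which axis moved after a swap.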
Understanding Axes
Knowing the shape of your data is one thing, but understanding axes is just as important! In NumPy, axes are zero-indexed, meaning:
Axis 0 is the vertical axis: in a 2D array, it runs down the rows.
Axis 1 is the horizontal axis: in a 2D array, it runs across the columns.
Many NumPy functions change behavior based on the axis you specify. Let’s understand this with .sum(), which calculates the sum of values in an array:
import numpy as np
# Creating a 3x4 array
matrix = np.array([
[10, 20, 30, 40],
[5, 15, 25, 35],
[2, 4, 6, 8]
])
# Default sum (across all elements)
print("Total sum:", matrix.sum())
# Sum along axis 0 (column-wise sum)
print("Sum along axis 0:", matrix.sum(axis=0))
# Sum along axis 1 (row-wise sum)
print("Sum along axis 1:", matrix.sum(axis=1))
What’s Happening Here?
Calling .sum() without an axis sums all elements in the array.
.sum(axis=0) sums column-wise, giving one sum per column.
.sum(axis=1) sums row-wise, giving one sum per row.
Many NumPy functions, like .mean(), .min(), .max(), and .std(), follow the same logic. If no axis is provided, they process the entire dataset. Otherwise, they compute results along the specified axis.
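To see that the other aggregation methods behave the same way, here is a short sketch that reuses the 3x4 matrix from the example above. It also prints the result shapes, which shows that the axis you pass is the one that disappears from the result.

```python
import numpy as np

matrix = np.array([
    [10, 20, 30, 40],
    [5, 15, 25, 35],
    [2, 4, 6, 8]
])

# The axis you aggregate over is removed from the shape
print(matrix.shape)               # (3, 4)
print(matrix.sum(axis=0).shape)   # (4,) -> one value per column
print(matrix.sum(axis=1).shape)   # (3,) -> one value per row

print(matrix.max(axis=0))    # column maxima: 10, 20, 30, 40
print(matrix.min(axis=1))    # row minima: 10, 5, 2
print(matrix.mean(axis=1))   # row means: 25.0, 20.0, 5.0
print(matrix.mean())         # mean of all 12 values
```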
Broadcasting
Broadcasting allows NumPy to perform operations on arrays of different shapes by automatically expanding one of them. The rule is simple:
Two arrays can be broadcast together if, along every axis, their sizes match or one of them has a size of 1.
Here’s how it works:
If the arrays have the same shape, operations happen element-wise.
If one array has a size of 1 along an axis, NumPy behaves as if its values were duplicated along that axis to match the other array, without actually copying the data.
Example: Let's create two arrays and see how broadcasting works:
import numpy as np
# A: 3D array of shape (3, 1, 4)
A = np.array([
[[1, 2, 3, 4]],
[[5, 6, 7, 8]],
[[9, 10, 11, 12]]
])
# B: 3D array of shape (1, 2, 4)
B = np.array([
[[10, 20, 30, 40],
[50, 60, 70, 80]]
])
# Adding A and B
result = A + B
print(result)
What’s Happening?
Axis 0: A has 3, B has 1 → B is duplicated 3 times.
Axis 1: A has 1, B has 2 → A is duplicated 2 times.
Axis 2: Both have 4 → No need to duplicate.
After broadcasting, both arrays behave as if they had the same shape, (3, 2, 4), and NumPy performs element-wise addition.
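The sketch below checks the axis-by-axis reasoning above by asking NumPy for the broadcast shape directly, and then shows what happens when two shapes are not compatible. Building A with np.arange is a shortcut assumed here; it produces the same values as the A defined in the example.

```python
import numpy as np

A = np.arange(1, 13).reshape(3, 1, 4)    # same values as A in the example
B = np.array([[[10, 20, 30, 40],
               [50, 60, 70, 80]]])       # shape (1, 2, 4)

# np.broadcast reports the shape both arrays are stretched to
print(np.broadcast(A, B).shape)  # (3, 2, 4)

result = A + B
print(result.shape)  # (3, 2, 4)

# First block: the row [1, 2, 3, 4] is added to both rows of B
print(result[0])

# Incompatible shapes: sizes 3 and 4 differ and neither is 1
try:
    np.ones(3) + np.ones(4)
except ValueError as error:
    print("Cannot broadcast:", error)
```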
Why Use Broadcasting?
No need for explicit loops → Faster execution!
Clean, concise, and memory-efficient code.
Used in machine learning, image processing, and numerical computing.
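As one example of the kind of use mentioned above, here is a minimal sketch of centering each column of a dataset around zero, a common preprocessing step. The data and variable names are hypothetical. It combines the axis argument from the previous section with broadcasting.

```python
import numpy as np

# Hypothetical measurements: 3 samples (rows) x 4 features (columns)
samples = np.array([
    [10.0, 20.0, 30.0, 40.0],
    [5.0, 15.0, 25.0, 35.0],
    [3.0, 4.0, 5.0, 6.0],
])

column_means = samples.mean(axis=0)   # shape (4,)
centered = samples - column_means     # (3, 4) - (4,) broadcasts across rows

print(column_means)
print(centered.mean(axis=0))  # each column now averages to roughly zero

# For row means, keepdims=True keeps the shape (3, 1),
# so the result broadcasts across the columns instead
row_means = samples.mean(axis=1, keepdims=True)
print((samples - row_means).shape)  # (3, 4)
```

Without keepdims=True, the row means would have shape (3,), which does not line up with the last axis of a (3, 4) array.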
Final Words
That’s it for this part! Next, we will explore advanced NumPy operations like indexing, filtering, sorting, and aggregating. We’ll also cover data types, structured arrays, and image manipulation with Matplotlib, along with an introduction to powerful libraries like pandas and scikit-learn.
Until then, keep coding and exploring! 🚀
