My Python Toolbox

Look, I’ve been down in the Python trenches for years now, and if there’s one thing I’ve learned, it’s that the right tools can make or break your sanity. I’m not talking about the shiny new frameworks that get all the conference talks—I mean the everyday workhorses that save me hours of frustration every single week.
So grab a coffee, and let me walk you through the Python tools I actually use day-to-day and why they’ve earned their permanent spot in my workflow. I’ve included some real-world performance comparisons so you can see the actual impact these tools have had on my development process.
🚀 Ruff: Because Life’s Too Short for Slow Linters
Remember when you’d run a linter and have enough time to make a sandwich before it finished? Those dark days are behind us thanks to Ruff.
Ruff is ridiculously fast—we’re talking 10-100x faster than traditional Python linters. It’s written in Rust, which explains the speed, but what keeps me coming back is how it combines the functionality of multiple tools:
- It replaces Flake8 and its galaxy of plugins
- It handles import sorting like isort
- It auto-fixes many issues for you, and its built-in formatter works as a drop-in replacement for Black
It’s completely changed how I think about code quality—it’s so fast that I actually run it constantly instead of putting it off.
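To make that concrete, here’s a contrived sketch of the kind of file a single `ruff check --fix` pass cleans up (the file itself is invented for illustration):

```python
import sys
import json  # out of alphabetical order: Ruff's isort rules (I001) re-sort these
import os    # never used: flagged as F401, and removable with --fix


def main() -> None:
    # json and sys are actually used, so they survive the fix
    print(json.dumps({"args": sys.argv[1:]}))


if __name__ == "__main__":
    main()
```

One command, and what used to take a Flake8 run plus a separate isort pass happens in milliseconds.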
⚡ UV: The Package Manager That Makes Everything Faster
UV (pronounced “you-vee”) has become my go-to package manager for Python, and honestly, it’s a breath of fresh air. The killer feature? Speed. It’s blazing fast, like “did that actually finish already?” fast.
UV is built in Rust and handles package installation, virtual environment management, and dependency resolution in a fraction of the time traditional tools take. When you’re switching between projects multiple times a day, those minutes add up.
What sold me:
- Installation is almost instant compared to Poetry (which was my previous favorite until UV came along)
- Resolution of complex dependency trees doesn’t make my laptop fans go wild
- It plays nicely with `pyproject.toml`, so the transition was painless
Here’s a quick comparison of install times for a project with about 20 dependencies:
| Tool   | Time to Install |
|--------|-----------------|
| pip    | ~45 seconds     |
| Poetry | ~30 seconds     |
| UV     | ~5 seconds      |
The difference becomes even more dramatic with larger projects or when you’re working in CI/CD pipelines.
🔌 FastAPI: The Framework That Respects My Time
I’ve built APIs with Flask, Django REST Framework, and a handful of other frameworks, but FastAPI is the one that sticks. It delivers on its name—it’s genuinely fast—but the real value is in the developer experience.
The automatic OpenAPI docs generation has saved me countless hours of documentation work. I can just point stakeholders to `/docs` and they can explore the API themselves. Plus, the type hinting integration means I catch so many bugs at development time rather than in production.
The learning curve is remarkably gentle compared to other frameworks I’ve used.
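To show what I mean, here’s a minimal sketch (the endpoints and model are invented for illustration, not from a real project):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Item(BaseModel):
    name: str
    price: float


@app.get("/items/{item_id}")
def read_item(item_id: int, q: str | None = None) -> dict:
    # item_id is parsed and validated from the path automatically;
    # a request to /items/abc returns a 422 before this code ever runs
    return {"item_id": item_id, "q": q}


@app.post("/items/")
def create_item(item: Item) -> Item:
    # the JSON body is validated against the Item model for free
    return item
```

Serve it with `uvicorn main:app --reload`, open `/docs`, and both endpoints are already documented and clickable.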
🔥 PyTorch: Because Machine Learning Should Be Intuitive
There’s a reason PyTorch has become the standard in research and is quickly taking over production ML—it just makes sense to humans. The dynamic computation graph feels natural to anyone who’s written Python before.
What I appreciate most is the debuggability. When something goes wrong (and in ML, something always goes wrong), I can step through execution one line at a time and actually see what’s happening with my tensors.
I’ve converted several TensorFlow loyalists by showing them how much cleaner their code can be in PyTorch.
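Here’s a toy sketch of what that debuggability looks like in practice (the model and data are made up for illustration). Because PyTorch executes eagerly, you can drop a print or a breakpoint anywhere in the loop and inspect live tensor values:

```python
import torch
import torch.nn as nn

# A tiny model on fake data, purely for illustration
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(32, 4)
y = torch.randn(32, 1)

for step in range(5):
    pred = model(x)
    loss = loss_fn(pred, y)
    # Eager execution: any intermediate value is a real tensor you can inspect
    print(f"step {step}: loss={loss.item():.4f}, pred mean={pred.mean().item():.4f}")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```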
🚄 Numba: The Performance Booster That Feels Like Cheating
Numba feels like a magic spell sometimes. You add a decorator to a function, and suddenly it runs at C-like speeds? That can’t be right. But it is, and it’s glorious.
```python
import time

import numba
import numpy as np


# Define a computation-heavy function
def slow_function(x, y):
    result = np.zeros_like(x)
    # Simulate complex computation
    for i in range(len(x)):
        result[i] = np.sin(x[i]) * np.cos(y[i]) * np.sqrt(x[i]**2 + y[i]**2)
    return result


# Same function with Numba
@numba.jit(nopython=True)
def fast_function(x, y):
    result = np.zeros_like(x)
    for i in range(len(x)):
        result[i] = np.sin(x[i]) * np.cos(y[i]) * np.sqrt(x[i]**2 + y[i]**2)
    return result


# Test data
size = 10_000_000
x = np.random.random(size)
y = np.random.random(size)

# Trigger JIT compilation on a small slice so the timing below
# measures execution speed, not compile time
_ = fast_function(x[:10], y[:10])

# Time without Numba
start = time.time()
result1 = slow_function(x, y)
end = time.time()
print(f"Without Numba: {end - start:.3f} seconds")

# Time with Numba
start = time.time()
result2 = fast_function(x, y)
end = time.time()
print(f"With Numba: {end - start:.3f} seconds")
```
Running this on my machine yields:
```
Without Numba: 12.847 seconds
With Numba: 0.176 seconds
```
That’s a 73x speedup with a single decorator! This is not an exaggeration: the simple `@numba.jit` decorator has transformed functions that would take minutes into ones that complete in seconds.
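And when one core isn’t enough, Numba can parallelize the loop too. Here’s a quick sketch of the same toy computation, assuming a multi-core machine (`parallel=True` and `numba.prange` are Numba’s APIs for exactly this):

```python
import numba
import numpy as np


@numba.njit(parallel=True)
def fast_parallel(x, y):
    result = np.zeros_like(x)
    # prange tells Numba it may split these iterations across CPU cores
    for i in numba.prange(len(x)):
        result[i] = np.sin(x[i]) * np.cos(y[i]) * np.sqrt(x[i]**2 + y[i]**2)
    return result
```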
🧠 JAX: When I Need to Get Serious About Performance
JAX is my secret weapon when I need maximum performance with minimum fuss. It’s like NumPy on steroids, with automatic differentiation, GPU/TPU support, and just-in-time compilation baked in.
The transformation functions like `vmap` (vectorized map) and `pmap` (parallel map) have been game-changers for accelerating computations. Being able to parallelize operations across multiple GPUs with minimal code changes is incredibly powerful.
Here’s a simple example of using JAX with performance comparison:
```python
import time

import jax
import jax.numpy as jnp
import numpy as np

# Create large matrices
size = 5000
A_np = np.random.random((size, size))
B_np = np.random.random((size, size))

# NumPy matrix multiplication
start = time.time()
C_np = A_np @ B_np
np_time = time.time() - start
print(f"NumPy matrix multiplication: {np_time:.3f} seconds")

# Convert to JAX arrays (note: JAX defaults to float32, which is
# part of the speed difference; enable x64 for a like-for-like test)
A_jax = jnp.array(A_np)
B_jax = jnp.array(B_np)

# Warm up the JIT compiler
_ = (A_jax @ B_jax).block_until_ready()

# JAX matrix multiplication with GPU/TPU acceleration
start = time.time()
C_jax = A_jax @ B_jax
C_jax.block_until_ready()  # JAX dispatches asynchronously; wait for the result
jax_time = time.time() - start
print(f"JAX matrix multiplication: {jax_time:.3f} seconds")
print(f"Speedup: {np_time / jax_time:.1f}x")

# Let's try with vmap for batched operations
def batch_matmul(matrices_A, matrices_B):
    return matrices_A @ matrices_B

# Create batched data: 100 matrix multiplications
batch_size = 100
batched_A_np = np.random.random((batch_size, 1000, 1000))
batched_B_np = np.random.random((batch_size, 1000, 1000))

# NumPy implementation (loop)
start = time.time()
results_np = np.zeros((batch_size, 1000, 1000))
for i in range(batch_size):
    results_np[i] = batched_A_np[i] @ batched_B_np[i]
np_batch_time = time.time() - start
print(f"NumPy batched multiplication: {np_batch_time:.3f} seconds")

# JAX implementation with vmap
batched_A_jax = jnp.array(batched_A_np)
batched_B_jax = jnp.array(batched_B_np)

# Create a version of the function vectorized over the leading batch axis
vmap_matmul = jax.vmap(batch_matmul)

# Warmup
_ = vmap_matmul(batched_A_jax, batched_B_jax).block_until_ready()

start = time.time()
results_jax = vmap_matmul(batched_A_jax, batched_B_jax)
results_jax.block_until_ready()  # ensure the computation is complete
jax_vmap_time = time.time() - start
print(f"JAX vmap multiplication: {jax_vmap_time:.3f} seconds")
print(f"Speedup: {np_batch_time / jax_vmap_time:.1f}x")
```
On a system with GPU acceleration, this produces:
```
NumPy matrix multiplication: 7.842 seconds
JAX matrix multiplication: 0.412 seconds
Speedup: 19.0x
NumPy batched multiplication: 31.256 seconds
JAX vmap multiplication: 0.897 seconds
Speedup: 34.8x
```
The seamless integration with NumPy means there’s almost no learning curve if you’re already familiar with array operations.
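Since I mentioned automatic differentiation, here’s one more small sketch (the loss function is invented for illustration) showing how `jax.grad` and `jax.jit` compose:

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    # A made-up least-squares loss, just to have something to differentiate
    pred = x @ w
    return jnp.mean((pred - y) ** 2)


# grad differentiates with respect to the first argument;
# jit compiles the whole gradient computation with XLA
grad_loss = jax.jit(jax.grad(loss))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (64, 3))
w = jnp.zeros(3)
y = x @ jnp.array([1.0, -2.0, 0.5])

print(grad_loss(w, x, y))  # exact gradient, no manual calculus
```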
🐻❄️ Polars: Because Pandas Was Showing Its Age
Don’t get me wrong—Pandas revolutionized data manipulation in Python. But as datasets grew, its performance limitations became more apparent. Enter Polars, the DataFrame library that’s made me fall in love with data wrangling again.
Built in Rust on the Apache Arrow columnar memory format, Polars is blindingly fast and memory-efficient. But what really sold me was the intuitive lazy evaluation API, which lets me build up complex data transformations that only execute when I need the results.
Let’s compare performance with a real-world example:
```python
import time

import numpy as np
import pandas as pd
import polars as pl

# Generate a large dataset (10M rows)
N = 10_000_000
data = {
    'id': np.random.randint(1, 1000000, size=N),
    'value1': np.random.random(N),
    'value2': np.random.random(N),
    'category': np.random.choice(['A', 'B', 'C', 'D'], size=N)
}

# Create dataframes
df_pandas = pd.DataFrame(data)
df_polars = pl.DataFrame(data)

# Benchmark 1: Group by and aggregate
print("Group by and aggregate:")
start = time.time()
result_pandas = df_pandas.groupby('category').agg({
    'value1': 'mean',
    'value2': 'sum',
    'id': 'count'
}).reset_index()
pandas_time = time.time() - start
print(f"Pandas: {pandas_time:.3f} seconds")

start = time.time()
result_polars = df_polars.group_by('category').agg([
    pl.col('value1').mean(),
    pl.col('value2').sum(),
    pl.col('id').count()
])
polars_time = time.time() - start
print(f"Polars: {polars_time:.3f} seconds")
print(f"Speedup: {pandas_time / polars_time:.1f}x")

# Benchmark 2: Filter, join, and transform
print("\nFilter, join, and transform:")

# Additional smaller dataset for joining
join_data = {
    'category': ['A', 'B', 'C', 'D'],
    'multiplier': [1.5, 2.0, 0.8, 1.2]
}
join_df_pandas = pd.DataFrame(join_data)
join_df_polars = pl.DataFrame(join_data)

start = time.time()
result_pandas = (df_pandas[df_pandas['value1'] > 0.5]
                 .merge(join_df_pandas, on='category')
                 .assign(new_value=lambda x: x['value1'] * x['multiplier'])
                 .sort_values('new_value', ascending=False)
                 .head(1000))
pandas_time = time.time() - start
print(f"Pandas: {pandas_time:.3f} seconds")

start = time.time()
result_polars = (df_polars
                 .filter(pl.col('value1') > 0.5)
                 .join(join_df_polars, on='category')
                 # parentheses matter: the alias applies to the whole product
                 .with_columns((pl.col('value1') * pl.col('multiplier')).alias('new_value'))
                 .sort('new_value', descending=True)
                 .limit(1000))
polars_time = time.time() - start
print(f"Polars: {polars_time:.3f} seconds")
print(f"Speedup: {pandas_time / polars_time:.1f}x")
```
Typical results on my system:
```
Group by and aggregate:
Pandas: 1.247 seconds
Polars: 0.189 seconds
Speedup: 6.6x

Filter, join, and transform:
Pandas: 2.983 seconds
Polars: 0.324 seconds
Speedup: 9.2x
```
Converting Pandas pipelines to Polars typically results in dramatic performance improvements, often reducing processing times by an order of magnitude. And the lazy API goes further, optimizing the whole query plan before a single row is read:
```python
import polars as pl

# Lazily scan the CSV: nothing is read until .collect(),
# so Polars can push the filter down into the scan itself
(pl.scan_csv("huge_dataset.csv")
    .filter(pl.col("value") > 100)
    .group_by("category")
    .agg(pl.sum("amount"))
    .sort("amount", descending=True)
    .collect())
```
This sort of expressiveness combined with raw speed makes Polars my go-to for any serious data manipulation task.
Final Thoughts
The Python ecosystem is constantly evolving, and what works best today might be replaced tomorrow. That’s actually one of the things I love about this community—there’s always someone building something better.
But right now, this combination of tools—Ruff, UV, FastAPI, PyTorch, Numba, JAX, and Polars—forms the backbone of my productive Python workflow. They’re not necessarily the flashiest or newest tools, but they’re the ones that consistently deliver results and keep me sane.