Lesson 10: Pandas Basics — Market Data & APIs

Section 1

LESSON 10 · PANDAS BASICS

Why Pandas for Trading?

beginnerdata science

When you download NIFTY historical data, you get thousands of rows — dates, open, high, low, close, volume. Pandas is the standard Python library for handling exactly this kind of tabular data. It's fast, expressive, and purpose-built for financial time series.

📋
DataFrame
A table with rows and columns — like a spreadsheet. Each column can hold different types (price, volume, signals).

📈

Series

A single column or row of data with an index — like a labeled list. df["close"] returns a Series.

🔢

Index

Row labels — usually dates for market data. Enables fast time-based slicing and resampling.

⚡

Vectorised Ops

No loops needed. Pandas applies operations across all rows simultaneously — extremely fast on large datasets.

import pandas as pd

# Install: pip install pandas

# Create a DataFrame from NIFTY candle data
data = {
    "date":   ["2024-01-15", "2024-01-16", "2024-01-17", "2024-01-18"],
    "open":   [21450, 21580, 21320, 21410],
    "high":   [21620, 21710, 21490, 21560],
    "low":    [21390, 21500, 21280, 21350],
    "close":  [21580, 21340, 21460, 21510],
    "volume": [142500, 168700, 129300, 155100]
}

df = pd.DataFrame(data)
print(df)

Output

date open high low close volume 0 2024-01-15 21450 21620 21390 21580 142500 1 2024-01-16 21580 21710 21500 21340 168700 2 2024-01-17 21320 21490 21280 21460 129300 3 2024-01-18 21410 21560 21350 21510 155100

💡

Naming convention: In trading code, always name your DataFrame df for quick single-instrument data, or nifty_df, reliance_df for multi-instrument work. This keeps your code readable at a glance.

Section 2

Series — Single Column Data

When you select a single column from a DataFrame, you get a Series — a one-dimensional labeled array. This is what you'll work with most when calculating indicators.

# A Series is just one column
close = df["close"]
print(type(close))   # <class 'pandas.core.series.Series'>
print(close)

# Basic stats on the close Series
print(f"Mean  : {close.mean():.2f}")
print(f"Max   : {close.max()}")
print(f"Min   : {close.min()}")
print(f"Std   : {close.std():.2f}")

Output

0 21580 1 21340 2 21460 3 21510 Name: close, dtype: int64 Mean : 21472.50 Max : 21580 Min : 21340 Std : 101.04

Arithmetic on a Series

# Calculate candle body size (vectorised — no loop needed!)
df["body"] = (df["close"] - df["open"]).abs()

# Calculate upper & lower wicks
df["upper_wick"] = df["high"] - df[["open", "close"]].max(axis=1)
df["lower_wick"] = df[["open", "close"]].min(axis=1) - df["low"]

# Candle direction
df["bullish"] = df["close"] > df["open"]

print(df[["date", "body", "upper_wick", "lower_wick", "bullish"]])

Output

date body upper_wick lower_wick bullish 0 2024-01-15 130 40 10 True 1 2024-01-16 240 0 160 False 2 2024-01-17 140 0 30 True 3 2024-01-18 100 50 60 True

Section 3

Selecting & Slicing Data

Pandas gives you multiple ways to select data. In trading code you'll use these constantly — filtering signals, pulling last N candles, or accessing specific dates.

Method	Use	Example
df["col"]	Select column (Series)	df["close"]
df[["c1","c2"]]	Select multiple columns	df[["open","close"]]
df.iloc[n]	Row by integer position	df.iloc[-1] → last row
df.iloc[n:m]	Slice rows by position	df.iloc[-5:] → last 5
df.loc[label]	Row by index label	df.loc["2024-01-15"]
df[condition]	Filter rows by condition	df[df["close"] > 21500]

# Last candle (most recent)
last = df.iloc[-1]
print(f"Last close: {last['close']}")

# Last 3 candles
recent = df.iloc[-3:]
print(recent[["date", "close"]])

# Filter: only bullish candles
bullish_candles = df[df["bullish"] == True]
print(f"Bullish candles: {len(bullish_candles)}")

# Filter: candles with volume above 150,000
high_volume = df[df["volume"] > 150000]
print(high_volume[["date", "volume"]])

Output

Last close: 21510 date close 1 2024-01-16 21340 2 2024-01-17 21460 3 2024-01-18 21510 Bullish candles: 3 date volume 1 2024-01-16 168700 3 2024-01-18 155100

💡

Trading tip: df.iloc[-1] is the fastest way to get the latest candle. Use it instead of df.tail(1) when you only need a single row — it returns a Series with all column values directly accessible.

Section 4

DateTime Index — The Key to Market Data

Real market data always has timestamps. Setting the date column as the DataFrame index unlocks time-based slicing, resampling, and alignment — essential for multi-timeframe analysis.

# Build a proper datetime-indexed DataFrame
data_ts = {
    "datetime": pd.date_range("2024-01-15 09:15", periods=6, freq="5min"),
    "open":     [21450, 21480, 21510, 21490, 21530, 21555],
    "high":     [21495, 21520, 21540, 21520, 21570, 21580],
    "low":      [21430, 21465, 21490, 21475, 21520, 21540],
    "close":    [21480, 21510, 21495, 21530, 21555, 21570],
    "volume":   [12400, 9800, 11200, 8700, 13100, 10500]
}

df5 = pd.DataFrame(data_ts)
df5 = df5.set_index("datetime")   # Set datetime as index

print(df5)
print(f"\nIndex type: {type(df5.index)}")

# Access by time range
morning = df5.between_time("09:15", "09:25")
print(morning["close"])

Output

open high low close volume datetime 2024-01-15 09:15:00 21450 21495 21430 21480 12400 2024-01-15 09:20:00 21480 21520 21465 21510 9800 2024-01-15 09:25:00 21510 21540 21490 21495 11200 2024-01-15 09:30:00 21490 21520 21475 21530 8700 2024-01-15 09:35:00 21530 21570 21520 21555 13100 2024-01-15 09:40:00 21555 21580 21540 21570 10500 Index type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'> datetime 2024-01-15 09:15:00 21480 2024-01-15 09:20:00 21510 2024-01-15 09:25:00 21495

Resampling Timeframes

# Convert 5-min candles to 15-min candles (OHLCV aggregation)
df15 = df5.resample("15min").agg({
    "open":   "first",
    "high":   "max",
    "low":    "min",
    "close":  "last",
    "volume": "sum"
})

print(df15)

15-min Candles

open high low close volume datetime 2024-01-15 09:15:00 21450 21540 21430 21495 33400 2024-01-15 09:30:00 21490 21580 21475 21570 32300

💜

Pro Pattern

Resampling is your multi-timeframe engine. Fetch 1-min data from the broker API and resample to 5, 15, or 60 minutes on the fly — no need to make separate API calls for each timeframe.

Section 5

Adding Columns — Signal Engine Pattern

In a real trading system, you'll keep adding computed columns to your DataFrame — moving averages, signals, P&L. This is the core DataFrame enrichment pattern.

import pandas as pd

# Simulate 10 NIFTY daily candles
closes = [21450, 21580, 21320, 21410, 21550,
          21620, 21480, 21390, 21510, 21640]

df = pd.DataFrame({"close": closes})

# 1. Simple Moving Average (3-period)
df["sma3"] = df["close"].rolling(3).mean().round(2)

# 2. Percentage change candle-to-candle
df["pct_change"] = df["close"].pct_change() * 100

# 3. Signal: close above SMA = BUY, else SELL
df["signal"] = df.apply(
    lambda row: "BUY" if row["close"] > row["sma3"] else "SELL",
    axis=1
)

print(df.tail(5).to_string())

Output

close sma3 pct_change signal 5 21620 21526.67 0.65 BUY 6 21480 21516.67 -0.65 SELL 7 21390 21496.67 -0.42 SELL 8 21510 21460.00 0.56 BUY 9 21640 21513.33 0.61 BUY

💡

rolling(n).mean() is how every moving average in pandas is calculated. The first n-1 rows will be NaN — always drop or handle NaN before using the signal in live trading.

Section 6

Essential DataFrame Methods

These methods come up in nearly every algo trading script. Know them by heart.

Method	What it does	Trading use
df.head(n)	First n rows	Inspect loaded data
df.tail(n)	Last n rows	View recent candles
df.shape	(rows, cols) tuple	Verify data loaded
df.dtypes	Column data types	Check float vs object
df.isnull().sum()	Count NaN per column	Find missing data
df.dropna()	Remove rows with NaN	Clean data after indicators
df.describe()	Min/max/mean/std stats	Quick data sanity check
df.sort_values(col)	Sort by column	Sort by date or P&L
df.reset_index()	Move index to column	After resampling
df.to_csv(path)	Save to CSV file	Export trade journal

# Common pre-trade data checks
print(f"Rows loaded    : {df.shape[0]}")
print(f"Columns        : {list(df.columns)}")
print(f"NaN in close   : {df['close'].isnull().sum()}")
print(f"Date range     : {df.index[0]} → {df.index[-1]}")

# Drop NaN rows (important after adding rolling indicators)
df_clean = df.dropna()
print(f"Clean rows     : {len(df_clean)}")

Section 7

Quiz — Test Your Knowledge

Q1. You have a DataFrame df with NIFTY candles. Which code returns only the last 5 rows?

Q2. What does df["close"].rolling(5).mean() calculate?

Q3. You want to convert 5-minute NIFTY candles to 15-minute candles. Which method should you use?

Section 8

Exercises

Exercise 01

BANKNIFTY DataFrame Builder

Create a Pandas DataFrame with 7 BANKNIFTY daily candles (use realistic prices near 48000–49500). Add columns for: body size (abs of close − open), candle direction (bullish/bearish), and whether volume exceeds 200,000. Print the full DataFrame.

Exercise 02

Rolling SMA Signal

Using a NIFTY close price Series of 15 values, compute 3-period and 5-period SMAs. Add a "signal" column: "BUY" when 3-SMA is above 5-SMA, "SELL" otherwise. Drop NaN rows and print the last 8 rows.

Exercise 03

Timeframe Resampler

Create a DataFrame of 30 one-minute NIFTY candles (use pd.date_range("2024-01-15 09:15", periods=30, freq="1min") as the index with made-up OHLCV values). Resample to 5-minute candles using proper OHLCV aggregation and print the result.

Section 9

Lesson Summary

DataFrame
2D table — rows are candles, columns are OHLCV + computed signals.

Series

Single column. Most indicator functions (rolling, pct_change) operate on a Series.

DatetimeIndex

Essential for market data — enables between_time(), resample(), and date slicing.

rolling().mean()

Core of moving average calculation. First n-1 rows will be NaN — always dropna() after.

resample()

Convert 1-min → 5-min → 15-min candles instantly with proper OHLCV aggregation.

iloc[-1]

Fastest way to access the last (most recent) candle in live trading code.

Lesson 11: Reading Market Data

Pandas BasicsDataFrames & Series

Series — Single Column Data

Arithmetic on a Series

Selecting & Slicing Data

DateTime Index — The Key to Market Data

Resampling Timeframes

Adding Columns — Signal Engine Pattern

Essential DataFrame Methods

Quiz — Test Your Knowledge

Exercises

Lesson Summary

Pandas Basics
DataFrames & Series