Learning Hub Market Data & APIs Lesson 10
Phase 2 — Lesson 1 of 8
Hub
Phase 2 · Market Data & APIs · Lesson 10

Pandas Basics
DataFrames & Series

Pandas is the backbone of every Python trading system. Learn to create, slice, filter, and manipulate market data using DataFrames — with real NIFTY OHLCV examples throughout.

~40 min
Beginner-Friendly
3 Quiz Questions
3 Exercises
Phase 2 Progress13%
📊
🎉 Phase 1 Complete — Welcome to Phase 2
Market Data & APIs
8 lessons covering Pandas, data sources, technical indicators, Zerodha KiteConnect, WebSockets & live pipelines.
Section 1
LESSON 10 · PANDAS BASICS
Why Pandas for Trading?
beginnerdata science

When you download NIFTY historical data, you get thousands of rows — dates, open, high, low, close, volume. Pandas is the standard Python library for handling exactly this kind of tabular data. It's fast, expressive, and purpose-built for financial time series.

📋
DataFrame
A table with rows and columns — like a spreadsheet. Each column can hold different types (price, volume, signals).
📈
Series
A single column or row of data with an index — like a labeled list. df["close"] returns a Series.
🔢
Index
Row labels — usually dates for market data. Enables fast time-based slicing and resampling.
Vectorised Ops
No loops needed. Pandas applies operations across all rows simultaneously — extremely fast on large datasets.
import pandas as pd

# Install: pip install pandas

# Create a DataFrame from NIFTY candle data
data = {
    "date":   ["2024-01-15", "2024-01-16", "2024-01-17", "2024-01-18"],
    "open":   [21450, 21580, 21320, 21410],
    "high":   [21620, 21710, 21490, 21560],
    "low":    [21390, 21500, 21280, 21350],
    "close":  [21580, 21340, 21460, 21510],
    "volume": [142500, 168700, 129300, 155100]
}

df = pd.DataFrame(data)
print(df)
Output
date open high low close volume 0 2024-01-15 21450 21620 21390 21580 142500 1 2024-01-16 21580 21710 21500 21340 168700 2 2024-01-17 21320 21490 21280 21460 129300 3 2024-01-18 21410 21560 21350 21510 155100
💡
Naming convention: In trading code, always name your DataFrame df for quick single-instrument data, or nifty_df, reliance_df for multi-instrument work. This keeps your code readable at a glance.
Section 2

Series — Single Column Data

When you select a single column from a DataFrame, you get a Series — a one-dimensional labeled array. This is what you'll work with most when calculating indicators.

# A Series is just one column
close = df["close"]
print(type(close))   # <class 'pandas.core.series.Series'>
print(close)

# Basic stats on the close Series
print(f"Mean  : {close.mean():.2f}")
print(f"Max   : {close.max()}")
print(f"Min   : {close.min()}")
print(f"Std   : {close.std():.2f}")
Output
0 21580 1 21340 2 21460 3 21510 Name: close, dtype: int64 Mean : 21472.50 Max : 21580 Min : 21340 Std : 101.04

Arithmetic on a Series

# Calculate candle body size (vectorised — no loop needed!)
df["body"] = (df["close"] - df["open"]).abs()

# Calculate upper & lower wicks
df["upper_wick"] = df["high"] - df[["open", "close"]].max(axis=1)
df["lower_wick"] = df[["open", "close"]].min(axis=1) - df["low"]

# Candle direction
df["bullish"] = df["close"] > df["open"]

print(df[["date", "body", "upper_wick", "lower_wick", "bullish"]])
Output
date body upper_wick lower_wick bullish 0 2024-01-15 130 40 10 True 1 2024-01-16 240 0 160 False 2 2024-01-17 140 0 30 True 3 2024-01-18 100 50 60 True
Section 3

Selecting & Slicing Data

Pandas gives you multiple ways to select data. In trading code you'll use these constantly — filtering signals, pulling last N candles, or accessing specific dates.

MethodUseExample
df["col"]Select column (Series)df["close"]
df[["c1","c2"]]Select multiple columnsdf[["open","close"]]
df.iloc[n]Row by integer positiondf.iloc[-1] → last row
df.iloc[n:m]Slice rows by positiondf.iloc[-5:] → last 5
df.loc[label]Row by index labeldf.loc["2024-01-15"]
df[condition]Filter rows by conditiondf[df["close"] > 21500]
# Last candle (most recent)
last = df.iloc[-1]
print(f"Last close: {last['close']}")

# Last 3 candles
recent = df.iloc[-3:]
print(recent[["date", "close"]])

# Filter: only bullish candles
bullish_candles = df[df["bullish"] == True]
print(f"Bullish candles: {len(bullish_candles)}")

# Filter: candles with volume above 150,000
high_volume = df[df["volume"] > 150000]
print(high_volume[["date", "volume"]])
Output
Last close: 21510 date close 1 2024-01-16 21340 2 2024-01-17 21460 3 2024-01-18 21510 Bullish candles: 3 date volume 1 2024-01-16 168700 3 2024-01-18 155100
💡
Trading tip: df.iloc[-1] is the fastest way to get the latest candle. Use it instead of df.tail(1) when you only need a single row — it returns a Series with all column values directly accessible.
Section 4

DateTime Index — The Key to Market Data

Real market data always has timestamps. Setting the date column as the DataFrame index unlocks time-based slicing, resampling, and alignment — essential for multi-timeframe analysis.

# Build a proper datetime-indexed DataFrame
data_ts = {
    "datetime": pd.date_range("2024-01-15 09:15", periods=6, freq="5min"),
    "open":     [21450, 21480, 21510, 21490, 21530, 21555],
    "high":     [21495, 21520, 21540, 21520, 21570, 21580],
    "low":      [21430, 21465, 21490, 21475, 21520, 21540],
    "close":    [21480, 21510, 21495, 21530, 21555, 21570],
    "volume":   [12400, 9800, 11200, 8700, 13100, 10500]
}

df5 = pd.DataFrame(data_ts)
df5 = df5.set_index("datetime")   # Set datetime as index

print(df5)
print(f"\nIndex type: {type(df5.index)}")

# Access by time range
morning = df5.between_time("09:15", "09:25")
print(morning["close"])
Output
open high low close volume datetime 2024-01-15 09:15:00 21450 21495 21430 21480 12400 2024-01-15 09:20:00 21480 21520 21465 21510 9800 2024-01-15 09:25:00 21510 21540 21490 21495 11200 2024-01-15 09:30:00 21490 21520 21475 21530 8700 2024-01-15 09:35:00 21530 21570 21520 21555 13100 2024-01-15 09:40:00 21555 21580 21540 21570 10500 Index type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'> datetime 2024-01-15 09:15:00 21480 2024-01-15 09:20:00 21510 2024-01-15 09:25:00 21495

Resampling Timeframes

# Convert 5-min candles to 15-min candles (OHLCV aggregation)
df15 = df5.resample("15min").agg({
    "open":   "first",
    "high":   "max",
    "low":    "min",
    "close":  "last",
    "volume": "sum"
})

print(df15)
15-min Candles
open high low close volume datetime 2024-01-15 09:15:00 21450 21540 21430 21495 33400 2024-01-15 09:30:00 21490 21580 21475 21570 32300
💜
Pro Pattern
Resampling is your multi-timeframe engine. Fetch 1-min data from the broker API and resample to 5, 15, or 60 minutes on the fly — no need to make separate API calls for each timeframe.
Section 5

Adding Columns — Signal Engine Pattern

In a real trading system, you'll keep adding computed columns to your DataFrame — moving averages, signals, P&L. This is the core DataFrame enrichment pattern.

import pandas as pd

# Simulate 10 NIFTY daily candles
closes = [21450, 21580, 21320, 21410, 21550,
          21620, 21480, 21390, 21510, 21640]

df = pd.DataFrame({"close": closes})

# 1. Simple Moving Average (3-period)
df["sma3"] = df["close"].rolling(3).mean().round(2)

# 2. Percentage change candle-to-candle
df["pct_change"] = df["close"].pct_change() * 100

# 3. Signal: close above SMA = BUY, else SELL
df["signal"] = df.apply(
    lambda row: "BUY" if row["close"] > row["sma3"] else "SELL",
    axis=1
)

print(df.tail(5).to_string())
Output
close sma3 pct_change signal 5 21620 21526.67 0.65 BUY 6 21480 21516.67 -0.65 SELL 7 21390 21496.67 -0.42 SELL 8 21510 21460.00 0.56 BUY 9 21640 21513.33 0.61 BUY
💡
rolling(n).mean() is how every moving average in pandas is calculated. The first n-1 rows will be NaN — always drop or handle NaN before using the signal in live trading.
Section 6

Essential DataFrame Methods

These methods come up in nearly every algo trading script. Know them by heart.

MethodWhat it doesTrading use
df.head(n)First n rowsInspect loaded data
df.tail(n)Last n rowsView recent candles
df.shape(rows, cols) tupleVerify data loaded
df.dtypesColumn data typesCheck float vs object
df.isnull().sum()Count NaN per columnFind missing data
df.dropna()Remove rows with NaNClean data after indicators
df.describe()Min/max/mean/std statsQuick data sanity check
df.sort_values(col)Sort by columnSort by date or P&L
df.reset_index()Move index to columnAfter resampling
df.to_csv(path)Save to CSV fileExport trade journal
# Common pre-trade data checks
print(f"Rows loaded    : {df.shape[0]}")
print(f"Columns        : {list(df.columns)}")
print(f"NaN in close   : {df['close'].isnull().sum()}")
print(f"Date range     : {df.index[0]} → {df.index[-1]}")

# Drop NaN rows (important after adding rolling indicators)
df_clean = df.dropna()
print(f"Clean rows     : {len(df_clean)}")
Section 7

Quiz — Test Your Knowledge

Q1. You have a DataFrame df with NIFTY candles. Which code returns only the last 5 rows?
Q2. What does df["close"].rolling(5).mean() calculate?
Q3. You want to convert 5-minute NIFTY candles to 15-minute candles. Which method should you use?
Section 8

Exercises

Exercise 01
BANKNIFTY DataFrame Builder
Create a Pandas DataFrame with 7 BANKNIFTY daily candles (use realistic prices near 48000–49500). Add columns for: body size (abs of close − open), candle direction (bullish/bearish), and whether volume exceeds 200,000. Print the full DataFrame.
Exercise 02
Rolling SMA Signal
Using a NIFTY close price Series of 15 values, compute 3-period and 5-period SMAs. Add a "signal" column: "BUY" when 3-SMA is above 5-SMA, "SELL" otherwise. Drop NaN rows and print the last 8 rows.
Exercise 03
Timeframe Resampler
Create a DataFrame of 30 one-minute NIFTY candles (use pd.date_range("2024-01-15 09:15", periods=30, freq="1min") as the index with made-up OHLCV values). Resample to 5-minute candles using proper OHLCV aggregation and print the result.
Section 9

Lesson Summary

DataFrame
2D table — rows are candles, columns are OHLCV + computed signals.
Series
Single column. Most indicator functions (rolling, pct_change) operate on a Series.
DatetimeIndex
Essential for market data — enables between_time(), resample(), and date slicing.
rolling().mean()
Core of moving average calculation. First n-1 rows will be NaN — always dropna() after.
resample()
Convert 1-min → 5-min → 15-min candles instantly with proper OHLCV aggregation.
iloc[-1]
Fastest way to access the last (most recent) candle in live trading code.
Prev