Learning Hub Market Data & APIs Lesson 11
Phase 2 — Lesson 2 of 8
Hub
Phase 2 · Market Data & APIs · Lesson 11

Reading Market Data
CSV, yfinance & NSEpy

Before you can build an algo, you need data. Learn how to load historical OHLCV data from CSV files, fetch NIFTY and BANKNIFTY data using yfinance, and use NSEpy for NSE-specific historical data.

~40 min
Beginner-Friendly
3 Quiz Questions
3 Exercises
Phase 2 Progress25%
Section 1
LESSON 11 · READING MARKET DATA
Market Data Sources Overview
beginnerdata sourcing

Every algo trading system starts with data. In India, you'll work with three main free sources before connecting to a paid broker API: CSV files (from NSE website), yfinance (Yahoo Finance Python library), and NSEpy (NSE-specific historical data).

📄
Option 1
CSV Files — NSE Bhavcopy
NSE publishes daily Bhavcopy (price data) as CSV files at eoddata.com and nseindia.com. Best for offline backtesting with guaranteed NSE data.
📡
Option 2
yfinance — Yahoo Finance API
Free Python library to download historical OHLCV data for any NSE stock (append .NS) or index (^NSEI, ^NSEBANK). Up to 20+ years of daily data.
🏛️
Option 3
NSEpy — NSE Historical Data
Python library for fetching NSE equity and F&O historical data directly from NSE website. Includes futures and options historical prices.
# Install all three pip install pandas yfinance nsepy
Section 2

Loading Data from CSV

NSE's daily Bhavcopy CSV format is the most reliable offline data source. Here's how to load, clean, and use it with Pandas.

# Sample NSE Bhavcopy CSV structure:
# SYMBOL,SERIES,OPEN,HIGH,LOW,CLOSE,TOTTRDQTY,TIMESTAMP
# NIFTY 50,EQ,21450,21620,21390,21580,0,15-JAN-2024

import pandas as pd

# --- Load from CSV ---
df = pd.read_csv(
    "nifty_daily.csv",
    parse_dates=["TIMESTAMP"],    # auto-parse date column
    dayfirst=True                   # DD-MMM-YYYY format
)

# Rename columns to lowercase standard
df.rename(columns={
    "OPEN":       "open",
    "HIGH":       "high",
    "LOW":        "low",
    "CLOSE":      "close",
    "TOTTRDQTY":  "volume",
    "TIMESTAMP":  "date"
}, inplace=True)

# Set date as index and sort
df = df.set_index("date").sort_index()

# Keep only OHLCV columns
df = df[["open", "high", "low", "close", "volume"]]

print(df.head())
print(f"\nRows: {len(df)}, Range: {df.index[0].date()} → {df.index[-1].date()}")
Output
open high low close volume date 2024-01-15 21450 21620 21390 21580 142500 2024-01-16 21580 21710 21500 21340 168700 2024-01-17 21320 21490 21280 21460 129300 Rows: 252, Range: 2023-01-02 → 2024-01-15
💡
Always sort after loading. CSV files from NSE are sometimes in reverse chronological order. .sort_index() ensures your candles go oldest-to-newest, which is required for rolling indicators to work correctly.
Section 3

Fetching Data with yfinance

yfinance is the quickest way to get historical data in Python. For NSE stocks, append .NS to the ticker. For indices, use ^NSEI (NIFTY 50) or ^NSEBANK (BANKNIFTY).

import yfinance as yf
import pandas as pd

# ── NIFTY 50 Index (1 year of daily data) ──
nifty = yf.download(
    tickers  = "^NSEI",
    start    = "2024-01-01",
    end      = "2024-12-31",
    interval = "1d",        # daily candles
    progress = False
)

nifty.columns = [c.lower() for c in nifty.columns]
print(nifty.tail(5))
NIFTY 50 — Last 5 Days
open high low close volume Date 2024-12-25 23850.5 24100.2 23780.3 24010.6 98400 2024-12-26 24010.6 24210.8 23990.1 24150.3 115200 2024-12-27 24150.3 24320.5 24080.7 24200.9 108600 2024-12-30 24200.9 24280.4 23950.2 24080.5 125300 2024-12-31 24080.5 24350.1 24010.3 24198.8 142100

Common yfinance Tickers for Indian Markets

Instrumentyfinance TickerInterval Options
NIFTY 50^NSEI1m, 5m, 15m, 1h, 1d
BANKNIFTY^NSEBANK1m, 5m, 15m, 1h, 1d
RELIANCE NSERELIANCE.NS1m, 5m, 15m, 1h, 1d
TCS NSETCS.NS1m, 5m, 15m, 1h, 1d
HDFC Bank NSEHDFCBANK.NS1m, 5m, 15m, 1h, 1d
Infosys NSEINFY.NS1m, 5m, 15m, 1h, 1d
# ── Intraday 5-minute data (last 60 days max) ──
nifty_5m = yf.download(
    tickers  = "^NSEI",
    period   = "5d",          # last 5 trading days
    interval = "5m",          # 5-minute candles
    progress = False
)
print(f"5-min candles loaded: {len(nifty_5m)}")
print(nifty_5m.tail(3))

# ── Multiple tickers at once ──
watchlist = yf.download(
    tickers  = "RELIANCE.NS TCS.NS HDFCBANK.NS INFY.NS",
    start    = "2024-01-01",
    end      = "2024-06-30",
    interval = "1d",
    progress = False
)
# Access individual stock close prices
reliance_close = watchlist["Close"]["RELIANCE.NS"]
print(reliance_close.tail(3))
⚠️
yfinance limitation: Intraday data (1m, 5m, 15m) is only available for the last 60 days for 5m/15m and 7 days for 1m. For full historical intraday data, you need a broker API like Zerodha KiteConnect (covered in Lesson 14).
Section 4

Using NSEpy for NSE Data

NSEpy fetches data directly from NSE India's website. It's useful for getting stock-specific historical data and F&O contract prices.

from nsepy import get_history
from datetime import date
import pandas as pd

# ── Equity historical data ──
reliance = get_history(
    symbol    = "RELIANCE",
    start     = date(2024, 1, 1),
    end       = date(2024, 6, 30),
    series    = "EQ"
)

# Rename to standard OHLCV columns
reliance = reliance[["Open", "High", "Low", "Close", "Volume"]]
reliance.columns = ["open", "high", "low", "close", "volume"]
print(reliance.tail(5))

# ── NIFTY Futures historical data ──
nifty_fut = get_history(
    symbol      = "NIFTY",
    start       = date(2024, 1, 1),
    end         = date(2024, 1, 31),
    index       = True,
    futures     = True,
    expiry_date = date(2024, 1, 25)   # last Thursday of month
)
print(nifty_fut[["Open","High","Low","Close","Volume"]].head())
RELIANCE Equity — Last 5 Days
open high low close volume Date 2024-06-24 2890.50 2925.30 2875.10 2910.80 4823000 2024-06-25 2910.80 2945.60 2905.20 2930.40 5120000 2024-06-26 2930.40 2968.90 2922.10 2955.70 4657000 2024-06-27 2955.70 2972.30 2940.50 2948.20 4389000 2024-06-28 2948.20 2980.10 2935.60 2970.50 6234000
💜
When to use what
Use yfinance for quick prototyping and index data. Use NSEpy when you need F&O historical data or want to avoid Yahoo Finance's occasional data gaps. Use CSV for guaranteed offline backtesting without internet dependency.
Section 5

Data Cleaning — The Essential Step

Raw market data always has issues: missing rows, zero prices, duplicate timestamps, wrong dtypes. Always clean before analysis.

import pandas as pd

def clean_ohlcv(df: pd.DataFrame) -> pd.DataFrame:
    """Standard OHLCV cleaning pipeline for any data source."""

    # 1. Remove duplicate index timestamps
    df = df[~df.index.duplicated(keep="last")]

    # 2. Sort chronologically
    df = df.sort_index()

    # 3. Remove rows with zero or negative prices
    price_cols = ["open", "high", "low", "close"]
    df = df[(df[price_cols] > 0).all(axis=1)]

    # 4. OHLC integrity check (high must be highest)
    valid = (
        (df["high"] >= df["open"]) &
        (df["high"] >= df["close"]) &
        (df["low"]  <= df["open"]) &
        (df["low"]  <= df["close"])
    )
    bad_rows = (~valid).sum()
    if bad_rows > 0:
        print(f"⚠️  Removed {bad_rows} candles failing OHLC check")
    df = df[valid]

    # 5. Ensure numeric types
    for col in ["open","high","low","close","volume"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # 6. Drop any remaining NaN rows
    before = len(df)
    df = df.dropna(subset=price_cols)
    dropped = before - len(df)
    if dropped:
        print(f"⚠️  Dropped {dropped} rows with NaN prices")

    print(f"✅ Clean rows: {len(df)}")
    return df

# Use it on any data source
df_clean = clean_ohlcv(nifty)
print(df_clean.dtypes)
Output
✅ Clean rows: 248 open float64 high float64 low float64 close float64 volume float64 dtype: object

Saving Cleaned Data to CSV

# Save to CSV for offline use (no internet needed next run)
df_clean.to_csv("nifty_clean.csv")

# Reload later
df_reload = pd.read_csv("nifty_clean.csv", index_col=0, parse_dates=True)
print(f"Reloaded {len(df_reload)} rows")
💡
Cache your data locally. Always save cleaned data to CSV after fetching. In production algos, fetch once per day at market open and read from the cached file throughout the session — avoids rate limits and network errors.
Section 6

Full Data Pipeline — Fetch, Clean & Prepare

import yfinance as yf
import pandas as pd
from pathlib import Path

def fetch_nse_data(symbol: str, period: str = "1y", interval: str = "1d") -> pd.DataFrame:
    """Fetch, clean and return OHLCV data for an NSE symbol."""

    # Map symbol to yfinance ticker
    ticker_map = {
        "NIFTY":       "^NSEI",
        "BANKNIFTY":   "^NSEBANK",
        "RELIANCE":    "RELIANCE.NS",
        "TCS":         "TCS.NS",
        "HDFCBANK":    "HDFCBANK.NS",
        "INFY":        "INFY.NS",
    }
    ticker = ticker_map.get(symbol, symbol + ".NS")

    # Check cache first
    cache_path = Path(f"data/{symbol}_{interval}.csv")
    if cache_path.exists():
        print(f"📂 Loading {symbol} from cache...")
        return pd.read_csv(cache_path, index_col=0, parse_dates=True)

    # Fetch from yfinance
    print(f"📡 Fetching {symbol} ({ticker}) from yfinance...")
    raw = yf.download(ticker, period=period, interval=interval, progress=False)
    raw.columns = [c.lower() for c in raw.columns]

    # Clean
    raw = raw[["open","high","low","close","volume"]].dropna()

    # Save to cache
    cache_path.parent.mkdir(exist_ok=True)
    raw.to_csv(cache_path)
    print(f"✅ Saved to cache: {cache_path}")
    return raw

# Fetch NIFTY and BANKNIFTY daily data
nifty_df    = fetch_nse_data("NIFTY")
banknifty_df = fetch_nse_data("BANKNIFTY")

print(f"\nNIFTY    : {len(nifty_df)} days")
print(f"BANKNIFTY: {len(banknifty_df)} days")
Output
📡 Fetching NIFTY (^NSEI) from yfinance... ✅ Saved to cache: data/NIFTY_1d.csv 📂 Loading BANKNIFTY from cache... NIFTY : 252 days BANKNIFTY: 252 days
Section 7

Quiz

Q1. To fetch NIFTY 50 daily data using yfinance, which ticker string should you use?
Q2. After loading data from CSV, which operation is most important before running any rolling indicator?
Q3. yfinance's intraday 5-minute data is limited to the last how many days?
Section 8

Exercises

Exercise 01
yfinance Multi-Stock Downloader
Use yfinance to download 6 months of daily data for RELIANCE.NS, TCS.NS, and HDFCBANK.NS simultaneously. Print the shape of the result and the last 3 rows of each stock's close price.
Exercise 02
Data Cleaner Function
Write a function validate_ohlcv(df) that checks: (1) no NaN in OHLCV columns, (2) high ≥ open and high ≥ close, (3) low ≤ open and low ≤ close, (4) volume ≥ 0. Print how many rows fail each check and return a clean DataFrame.
Exercise 03
Cached Data Pipeline
Build a get_data(symbol, interval) function that: checks if data/{symbol}_{interval}.csv exists, loads from cache if present, fetches from yfinance and saves if not. Test with NIFTY 1d and BANKNIFTY 5m data.
Section 9

Lesson Summary

CSV / Bhavcopy
Best for offline backtesting. Use pd.read_csv(..., parse_dates=True) and always sort_index().
yfinance
Free historical data — append .NS for NSE stocks, ^NSEI/^NSEBANK for indices. 60-day limit on intraday.
NSEpy
NSE-native library. Best for F&O historical data and NSE-specific queries.
Data Cleaning
Always check for duplicates, zero prices, OHLC integrity, and NaN before using data in signals.
Cache Strategy
Fetch once, save to CSV, reload from cache. Avoids rate limits and network failures in production.
.NS ticker
NSE-listed stocks on yfinance require ".NS" suffix. RELIANCE → RELIANCE.NS.
Prev