📚 Python DS/ML/DL/NLP Libraries: Categories Index

📊 1. Data Analysis & Numerical Foundations

Core Numerical

| ID | Library | Rationale | Status |
|---|---|---|---|
| 1.0 | NumPy 🔥 | Foundation of scientific Python (arrays, linear algebra). | Active |
| 1.1 | SciPy 🔥 | Scientific routines (optimization, stats, integration). | Active |

Tabular Data

| ID | Library | Rationale | Status |
|---|---|---|---|
| 1.2 | Pandas 🔥 | Standard for structured/tabular data. | Active |
| 1.3 | Polars | Rust-powered, multithreaded DataFrames; often faster than pandas on large data. | Active (Rising) |

Distributed & Big Data

| ID | Library | Rationale | Status |
|---|---|---|---|
| 1.4 | Dask | Parallel/distributed NumPy/Pandas. | Active |
| 1.5 | Vaex | Out-of-core DataFrames for huge data. | Active (Niche) |
| 1.6 | Modin | Parallelized Pandas (Ray/Dask). | Active (Rising) |
| 1.7 | PySpark | Python API for Apache Spark. | Active |
| 1.8 | PyFlink | Python API for Apache Flink. | Active |

| pip module | Library | Status |
|---|---|---|
| py4j | Py4J (bridge used by PySpark) | Active |

Statistical/Utilities

| ID | Library | Rationale | Status |
|---|---|---|---|
| 1.9 | StatsModels 🔥 | Statistical models (ARIMA, regression). | Active |
| 1.10 | Pingouin | Beginner-friendly statistical tests. | Active (Niche) |
| 1.11 | SymPy | Symbolic math/algebra. | Active |

Miscellanea

| pip module | Library | Status |
|---|---|---|
| cmdstanpy | CmdStanPy (Stan) | Active |
| pystan | PyStan (Stan) | Active |
| joblib | Joblib (serialization, parallelism) | Active |
| tabulate | Tabulate (pretty tables) | Active |
| lxml | lxml (XML/HTML parsing, often in NLP pipelines) | Active |
| openpyxl | OpenPyXL (Excel .xlsx read/write) | Active |
| xlrd | xlrd (legacy .xls reader) | Legacy |
| pyarrow | Apache Arrow (PyArrow) | Active |

📈 2. Visualization & Plotting

Core Plotting

| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.0 | Matplotlib 🔥 | Base 2D plotting library. | Active |
| 2.1 | Seaborn 🔥 | Statistical viz on top of Matplotlib. | Active |

Interactive

| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.2 | Plotly 🔥 | Interactive, web-ready charts. | Active |
| 2.3 | Bokeh | Interactive browser viz and dashboards. | Active |
| 2.4 | Altair | Declarative graphics (Vega-Lite). | Active |

Dashboards

| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.5 | Dash | Plotly-based dashboards. | Active |
| 2.6 | Streamlit 🔥 | Simple dashboards for ML/data apps. | Active |

Specialized

| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.7 | PyVista | 3D mesh viz. | Active (Niche) |
| 2.8 | GraphViz | Graph visualization engine. | Active |
| 2.8.1 | PyDot | GraphViz DOT interface. | Active |
| 2.9 | WordCloud | Text frequency clouds. | Active |
| 2.10 | HoloViews | High-level API across viz libs. | Active |
| 2.11 | Datashader | Scalable viz for large data. | Active (Niche) |

Miscellanea

| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.12 | squarify | Squarify (treemaps) | Active |
| 2.13 | pixiedust | PixieDust (Jupyter viz helper) | Active |
| 2.14 | ipywidgets | ipywidgets (Jupyter widgets) | Active |

🤖 3. Machine Learning (Classical)

Core ML

| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.0 | scikit-learn 🔥 | Standard ML toolkit. | Active |
| 3.1.1 | StatsModels 🔥 | Adds statistical rigor (inference, p-values, diagnostics). | Active |

Gradient Boosting

| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.2 | XGBoost 🔥 | Kaggle-winning gradient boosting. | Active |
| 3.1.3 | LightGBM 🔥 | Fast, memory-efficient boosting. | Active |
| 3.1.4 | CatBoost | Gradient boosting with native categorical-feature handling. | Active |

Explainability

| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.5 | ELI5 | Debugging & feature importance. | Active (Stable) |
| 3.1.6 | SHAP 🔥 | Shapley-value explanations. | Active |
| 3.1.7 | LIME | Local model explanations. | Active |

AutoML & Feature Eng.

| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.8 | Featuretools | Auto feature engineering. | Active |
| 3.1.9 | PyCaret | Low-code AutoML pipelines. | Active |
| 3.1.10 | H2O.ai | Enterprise AutoML. | Active |

Feature Engineering & Extensions

| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.11 | mlxtend | MLxtend (extensions for ML) | Active |
| 3.1.12 | category_encoders | Category Encoders (scikit-learn-compatible categorical encoding) | Active |

Dimensionality Reduction

| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.13 | UMAP | Fast nonlinear dimensionality reduction (pip: umap-learn). | Active |
| 3.1.14 | openTSNE | Optimized t-SNE. | Active (Niche) |

🧬 4. Deep Learning

Core DL Frameworks

| ID | Library | Rationale | Status |
|---|---|---|---|
| 4.0 | TensorFlow 🔥 | Production-scale DL (Google). | Active |
| 4.1 | PyTorch 🔥 | Research & industry leader. | Active |
| 4.2 | JAX 🔥 | NumPy-like API + autodiff + JIT compilation (XLA). | Active (Rising) |
| 4.3 | PaddlePaddle | Baidu’s DL framework. | Active |
| 4.4 | MXNet | Apache DL lib backed by Amazon; moved to the Apache Attic in 2023. | Retired |

High-Level APIs

| ID | Library | Rationale | Status |
|---|---|---|---|
| 4.0.1 | Keras 🔥 | High-level DL API; multi-backend (TensorFlow, JAX, PyTorch) since Keras 3. | Active |
| 4.1.1 | FastAI 🔥 | Simplified PyTorch. | Active |
| 4.1.2 | PyTorch Lightning | Structured PyTorch. | Active |
| 4.2.1 | Flax | JAX’s official high-level lib. | Active |
| 4.2.2 | Haiku | DeepMind’s JAX research lib; in maintenance mode (Flax recommended for new projects). | Maintenance |

GPU-Accelerated

| ID | Library | Rationale | Status |
|---|---|---|---|
| 4.5 | cuML | GPU-accelerated, scikit-learn-style ML (RAPIDS). | Active |
| 4.* | cuda-python | NVIDIA’s Python bindings for the CUDA APIs. | Active |

Legacy

| ID | Library | Rationale | Status |
|---|---|---|---|
| 4.6 | Theano | Pioneering DL lib. | Deprecated |
| 4.7 | CNTK | Microsoft toolkit. | Legacy |
| 4.8 | Caffe | Early CV DL lib. | Legacy |
| 4.9 | Dist-Keras | Distributed Keras. | Deprecated |
| 4.10 | PyBrain | Early ML/DL. | Legacy |
| 4.11 | Fuel | Data pipelines (Theano). | Deprecated |

🧠 5. NLP & Text Processing

Classical NLP

| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.0 | NLTK | Classical toolkit. | Active (Stable) |
| 5.1 | TextBlob | Simple sentiment/text API. | Active |
| 5.1.1 | Pattern | Web mining + NLP. | Stable |

Industrial Pipelines

| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.2 | spaCy 🔥 | Industrial NLP (NER, POS). | Active |
| 5.3 | CoreNLP | Stanford’s Java-based NLP. | Active |
| 5.4 | Stanza | Stanford’s PyTorch NLP. | Active |

Transformers Ecosystem

| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.5 | Transformers 🔥 | Hugging Face pretrained transformer models (LLMs and more). | Active |
| 5.5.1 | sentence-transformers 🔥 | Semantic embeddings. | Active |
| 5.5.2 | Tokenizers | Fast tokenization (HF). | Active |
| 5.5.3 | Accelerate | Multi-GPU training utils. | Active |
| 5.5.4 | LiteLLM | Unified API across LLM providers. | Active (Rising) |

Multilingual & Topic Modeling

| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.6 | Gensim 🔥 | Topic modeling, word2vec/doc2vec. | Active |
| 5.7 | Polyglot | Multilingual NLP. | Stable |

Research-Oriented

| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.8 | AllenNLP | PyTorch research NLP; no longer developed (repository archived). | Archived |
| 5.9 | Flair | Lightweight PyTorch NLP. | Active |

Finance / Data APIs (used in NLP/TS)

| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.10 | nsepy | NSEpy (NSE India stock data) | Active |
| 5.11 | yfinance | yfinance (Yahoo Finance market data) | Active |

👁️ 6. Computer Vision

Core CV

| ID | Library | Rationale | Status |
|---|---|---|---|
| 6.0 | OpenCV 🔥 | Standard CV toolkit. | Active |

Image Utilities

| ID | Library | Rationale | Status |
|---|---|---|---|
| 6.1 | Pillow | Image processing. | Active |
| 6.2 | scikit-image | Scientific image processing. | Active |

Dataset & Evaluation

| ID | Library | Rationale | Status |
|---|---|---|---|
| 6.3 | FiftyOne 🔥 | Dataset curation & model-evaluation management. | Active |
| 6.4 | Albumentations 🔥 | Data augmentation. | Active |
| 6.5 | imgaug | Image augmentation. | Active |

DL CV Frameworks

| ID | Library | Rationale | Status |
|---|---|---|---|
| 6.6 | Detectron2 🔥 | PyTorch detection framework. | Active |
| 6.7 | MMDetection 🔥 | Modular detection framework. | Active |
| 6.8 | Kornia | Differentiable CV ops. | Active |
| 6.9 | Timm 🔥 | PyTorch image models. | Active |

🌐 7. Web & Deployment

Web Frameworks

| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.0 | Flask 🔥 | Lightweight APIs. | Active |
| 7.1 | Django 🔥 | Full-stack framework. | Active |
| 7.2 | FastAPI 🔥 | Async APIs. | Active (Rising) |
| 7.3 | Tornado | Async networking. | Active |

API & HTTP

| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.4 | Requests 🔥 | Standard HTTP lib. | Active |
| 7.5 | HTTPX | Sync & async HTTP client. | Active |

Scraping & Automation

| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.6 | Scrapy | Crawling/scraping. | Active |
| 7.7 | Selenium | Browser automation. | Active |
| 7.8 | Playwright | Modern browser automation (sync & async APIs). | Active |
| 7.9 | BeautifulSoup | HTML parsing. | Active |

Deployment & Task Queues

| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.10 | Gunicorn 🔥 | WSGI server. | Active |
| 7.11 | Uvicorn 🔥 | ASGI server. | Active |
| 7.12 | Celery | Task queue. | Active |
| 7.13 | RQ | Redis-based jobs. | Active |
| 7.14 | Daphne | ASGI server for Django. | Active |

Miscellanea

| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.15 | simplejson | simplejson (JSON utilities) | Active |
| 7.16 | mlflow | MLflow (deployment, experiment tracking) | Active |
| 7.17 | mapbox | Mapbox SDK (geospatial APIs) | Active |

⏳ 8. Time Series

Classical

| ID | Library | Rationale | Status |
|---|---|---|---|
| 8.0 | StatsModels 🔥 | ARIMA, SARIMA. | Active |
| 8.1 | pmdarima | Auto-ARIMA. | Active |

Modern Forecasting

| ID | Library | Rationale | Status |
|---|---|---|---|
| 8.2 | Prophet 🔥 | Easy forecasting. | Active |
| 8.3 | Darts 🔥 | Unified TS toolkit. | Active |
| 8.4 | GluonTS | Probabilistic forecasting (AWS); PyTorch models, MXNet backend now legacy. | Active |
| 8.5 | Kats | Meta’s TS lib. | Active |
| 8.6 | Orbit | Uber’s Bayesian TS. | Active |
| 8.7 | PyTorch Forecasting | Deep-learning forecasting on PyTorch (e.g., TFT, N-BEATS). | Active |
| 8.8 | PyCaret-TS | AutoML for TS. | Active |

Scalable Utilities

| ID | Library | Rationale | Status |
|---|---|---|---|
| 8.9 | StatsForecast 🔥 | Scalable forecasting. | Active |
| 8.10 | sktime 🔥 | Unified ML for TS. | Active |
| 8.11 | tsfresh | Feature extraction. | Active |

Miscellanea

| ID | Library | Rationale | Status |
|---|---|---|---|
| 8.12 | ruptures | Ruptures (changepoint detection) | Active |

🧪 9. Testing & Quality

Core Testing

| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.0 | PyTest 🔥 | Standard testing. | Active |
| 9.1 | unittest | Built-in (stdlib) xUnit-style framework. | Active |
| 9.2 | nose2 | Successor to the legacy nose runner. | Maintenance |
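
A minimal sketch of the built-in unittest framework (9.1), runnable with only the standard library. The function slugify is a hypothetical helper invented for illustration; it is not part of any library in this index.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical helper under test: lowercase words joined by hyphens."""
    return "-".join(title.lower().split())


class SlugifyTests(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_whitespace(self):
        # str.split() with no argument drops leading/trailing and repeated whitespace
        self.assertEqual(slugify("  Data   Science "), "data-science")


def run_tests() -> unittest.TestResult:
    """Run the suite programmatically (equivalent in spirit to `python -m unittest`)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_tests()
    print("OK" if result.wasSuccessful() else "FAILED")
```

PyTest (9.0) can also collect and run unittest.TestCase classes like this one unchanged, so moving between the two can be gradual.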

Property-Based

| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.3 | Hypothesis 🔥 | Auto-generated tests. | Active |

Coverage & Quality

| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.4 | coverage.py 🔥 | Coverage measurement. | Active |
| 9.5 | tox | Multi-env tests. | Active |
| 9.6 | pytest-cov | Coverage plugin for pytest. | Active |
| 9.7 | bandit | Security linting. | Active |
| 9.8 | flake8 🔥 | Linting & style. | Active |
| 9.9 | black 🔥 | Code formatting. | Active |
| 9.10 | mypy 🔥 | Static typing. | Active |
| 9.11 | pylint | Static analysis. | Active |

Mocking & Utilities

| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.12 | mock | Backport of the stdlib unittest.mock. | Active |
| 9.13 | responses | Mocking for requests HTTP calls. | Active |
| 9.14 | vcrpy | HTTP replay. | Active |
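
The mock package (9.12) is a backport of the standard library's unittest.mock, so the sketch below uses the stdlib module directly. fetch_price, price_in_cents and the example.com URL are hypothetical placeholders; the point is that patching urllib.request.urlopen keeps the test offline.

```python
import urllib.request
from unittest import mock


def fetch_price(symbol: str) -> float:
    """Hypothetical helper that performs a real HTTP request (unwanted in tests)."""
    url = f"https://example.com/price/{symbol}"  # placeholder URL, assumption only
    with urllib.request.urlopen(url) as resp:
        return float(resp.read().decode())


def price_in_cents(symbol: str) -> int:
    """Code under test: converts the fetched price to integer cents."""
    return round(fetch_price(symbol) * 100)


def demo() -> int:
    # Replace urllib.request.urlopen for the duration of the block; no network I/O happens.
    with mock.patch("urllib.request.urlopen") as fake_urlopen:
        # urlopen() is used as a context manager, so configure what __enter__ returns.
        fake_resp = fake_urlopen.return_value.__enter__.return_value
        fake_resp.read.return_value = b"12.34"
        cents = price_in_cents("ACME")
        fake_urlopen.assert_called_once_with("https://example.com/price/ACME")
    return cents


if __name__ == "__main__":
    print(demo())  # 1234
```

responses (9.13) and vcrpy (9.14) solve the same problem at the HTTP layer (canned responses vs. recorded cassettes) instead of patching a function.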

Miscellanea

| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.15 | nbformat | nbformat (Jupyter notebooks) | Active |
| 9.16 | pandoc | Pandoc (doc conversion) | Active |
| 9.17 | python-docx | python-docx (Word docs) | Active |
| 9.18 | tomli | tomli (TOML parsing; stdlib tomllib since Python 3.11) | Active |
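
tomli (9.18) exposes the same loads/load API as the standard library's tomllib (Python 3.11+), so a common pattern is to prefer the stdlib module and fall back to the backport. The CONFIG text below is a made-up pyproject-style snippet, used only as an illustrative assumption.

```python
try:
    import tomllib  # stdlib on Python 3.11+
except ModuleNotFoundError:  # older Pythons: the tomli backport has the same API
    import tomli as tomllib

# Hypothetical pyproject-style config, for illustration only
CONFIG = """
[project]
name = "demo"
dependencies = ["numpy", "pandas"]

[tool.pytest.ini_options]
addopts = "-q"
"""


def load_config(text: str) -> dict:
    # loads() parses a str; load() instead expects a file opened in binary mode ("rb").
    return tomllib.loads(text)


if __name__ == "__main__":
    cfg = load_config(CONFIG)
    print(cfg["project"]["dependencies"])  # ['numpy', 'pandas']
```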

🎮 10. Game Development

2D Game Dev

| ID | Library | Rationale | Status |
|---|---|---|---|
| 10.0 | PyGame 🔥 | Most popular 2D library. | Active |
| 10.1 | PyKyra | SDL-based, legacy. | Legacy |

3D & Physics

| ID | Library | Rationale | Status |
|---|---|---|---|
| 10.2 | Panda3D | 3D engine. | Active |
| 10.3 | Ursina | Simplified 3D. | Active |
| 10.4 | PyOpenGL | OpenGL bindings. | Active |
| 10.5 | Arcade 🔥 | Modern OpenGL-based 2D game library. | Active |
| 10.6 | PyBullet | Physics simulation. | Active |

Multimedia & Utilities

| ID | Library | Rationale | Status |
|---|---|---|---|
| 10.7 | Pyglet | Windowing & multimedia. | Active |
| 10.8 | Kivy | Cross-platform UI/game dev. | Active |
| 10.9 | Ren’Py | Visual novel engine. | Active |

📂 11. Data Handling & Databases

11.1 ORMs & Schema/Migrations

| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.1.0 | SQLAlchemy 🔥 | De-facto DB toolkit/ORM for Python; works with most SQL backends. | Active |
| 11.1.1 | SQLModel | Pydantic-flavored ORM on top of SQLAlchemy; modern type-safe models. | Active (Rising) |
| 11.1.2 | Alembic | Schema migrations for SQLAlchemy projects. | Active |

11.2 Analytical & Embedded Engines

| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.2.0 | DuckDB 🔥 | In-process analytical SQL engine; super handy for Parquet/CSV/Arrow. | Active (Rising) |
| 11.2.1 | sqlite3 (stdlib) 🔥 | Zero-config embedded SQL DB; perfect for small/medium apps. | Active |
| 11.2.2 | clickhouse-connect | Client for ClickHouse columnar OLAP DB. | Active |
| 11.2.3 | google-cloud-bigquery | Python client for BigQuery analytics. | Active |
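
A minimal sketch of the zero-config sqlite3 module (11.2.1): an in-memory database, parameterized inserts, and a filtered query. The libs table and its rows are hypothetical sample data chosen for illustration.

```python
import sqlite3


def demo() -> list[tuple[str, int]]:
    # ":memory:" creates a throwaway database that lives only as long as the connection.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE libs (name TEXT PRIMARY KEY, category INTEGER)")
        rows = [("NumPy", 1), ("pandas", 1), ("PyTorch", 4), ("spaCy", 5)]
        # Using the connection as a context manager commits on success, rolls back on error.
        with conn:
            # "?" placeholders bind parameters safely instead of string formatting.
            conn.executemany("INSERT INTO libs VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT name, category FROM libs WHERE category = ? ORDER BY name",
            (1,),
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())  # [('NumPy', 1), ('pandas', 1)]
```

Swapping ":memory:" for a file path such as "libs.db" persists the database to disk; the same "?" placeholder style applies.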

11.3 Database Drivers & Clients

| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.3.0 | psycopg2 🔥 | Canonical PostgreSQL driver; battle-tested. | Active |
| 11.3.1 | asyncpg | High-performance async Postgres driver. | Active |
| 11.3.2 | mysqlclient | Fast MySQL driver (C bindings). | Active |
| 11.3.3 | PyMySQL | Pure-Python MySQL driver (easy install). | Active |
| 11.3.4 | oracledb (formerly cx_Oracle) | Oracle Database driver; renamed successor of cx_Oracle. | Active |
| 11.3.5 | pyodbc | ODBC bridge to many SQL databases. | Active |
| 11.3.6 | pymongo 🔥 | Official MongoDB driver for Python. | Active |
| 11.3.7 | redis (redis-py) 🔥 | Redis client for caching, queues, pub/sub. | Active |
| 11.3.8 | elasticsearch | Elasticsearch client for search/analytics. | Active |

11.4 Columnar, Files & Spreadsheet I/O

| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.4.0 | pyarrow 🔥 | Arrow/Parquet/Feather I/O; zero-copy bridges; dataframe interop. | Active |
| 11.4.1 | fastparquet | Alternative Parquet engine (Pandas/Arrow ecosystem). | Active |
| 11.4.2 | h5py | HDF5 I/O for large array data. | Active |
| 11.4.3 | tables (PyTables) | Hierarchical datasets on HDF5 with indexing/compression. | Active |
| 11.4.4 | openpyxl 🔥 | Read/write Excel .xlsx files. | Active |
| 11.4.5 | xlsxwriter | Create Excel .xlsx (write-only, fast). | Active |
| 11.4.6 | xlrd | Legacy Excel reader (.xls only; .xlsx support removed in xlrd 2.0). | Legacy |

11.5 DataFrames & Bridges (practical handling)

| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.5.0 | pandas 🔥 | The standard DataFrame for ETL, joins, and DB I/O. | Active |
| 11.5.1 | polars | Lightning-fast DataFrame; great with Arrow/Parquet. | Active (Rising) |
| 11.5.2 | SQLAlchemy-Pandas I/O | pandas.read_sql/to_sql via SQLAlchemy engines. | Active |

🔥 Must-Learn (2025): 📂 Data Handling & Databases
- SQLAlchemy → ORM + universal DB toolkit.
- DuckDB → modern analytical SQL engine.
- sqlite3 (stdlib) → embedded relational DB.
- psycopg2 → canonical PostgreSQL driver.
- pymongo → MongoDB access.
- redis (redis-py) → caching/queues, near real-time.
- pyarrow → Parquet/Arrow interoperability.
- openpyxl → Excel support.
- pandas → still the backbone for tabular handling.
These give you: universal SQL access (ORM + drivers) → local/analytical engines → fast file/columnar I/O → production-ready data handling.

Legend: 🔥 marks a must-learn library.