📚 Python DS/ML/DL/NLP Libraries: Categories Index


📊 1. Data Analysis & Numerical Foundations

Core Numerical

ID Library Rationale Status
1.0 NumPy 🔥 Foundation of scientific Python (arrays, linear algebra). Active
1.1 SciPy 🔥 Scientific routines (optimization, stats, integration). Active

Tabular Data

ID Library Rationale Status
1.2 Pandas 🔥 Standard for structured/tabular data. Active
1.3 Polars Rust-powered DataFrames; often faster than Pandas on large workloads. Active (Rising)

Distributed & Big Data

ID Library Rationale Status
1.4 Dask Parallel/distributed NumPy/Pandas. Active
1.5 Vaex Out-of-core DataFrames for huge data. Active (Niche)
1.6 Modin Parallelized Pandas (Ray/Dask). Active (Rising)
1.7 PySpark Python API for Apache Spark. Active
1.8 PyFlink Python API for Apache Flink. Active


pip module Library Status
py4j Py4J (bridge used by PySpark) Active

Statistical/Utilities

ID Library Rationale Status
1.9 StatsModels 🔥 Statistical models (ARIMA, regression). Active
1.10 Pingouin Beginner-friendly statistical tests. Active (Niche)
1.11 SymPy Symbolic math/algebra. Active

Miscellanea

pip module Library Status
cmdstanpy CmdStanPy (Stan) Active
pystan PyStan (Stan) Active
joblib Joblib (serialization, parallelism) Active
tabulate Tabulate (pretty tables) Active
lxml lxml (XML/HTML parsing, often in NLP pipelines) Active
openpyxl OpenPyXL Active
xlrd xlrd (legacy .xls reader) Legacy
pyarrow Apache Arrow Active

📈 2. Visualization & Plotting

Core Plotting

ID Library Rationale Status
2.0 Matplotlib 🔥 Base 2D plotting library. Active
2.1 Seaborn 🔥 Statistical viz on top of Matplotlib. Active

Interactive

ID Library Rationale Status
2.2 Plotly 🔥 Interactive, web-ready charts. Active
2.3 Bokeh Browser/dashboards viz. Active
2.4 Altair Declarative graphics (Vega-Lite). Active

Dashboards

ID Library Rationale Status
2.5 Dash Plotly-based dashboards. Active
2.6 Streamlit 🔥 Simple dashboards for ML/data apps. Active

Specialized

ID Library Rationale Status
2.7 PyVista 3D mesh viz. Active (Niche)
2.8 Graphviz Graph visualization engine. Active
2.8.1 pydot Python interface to Graphviz's DOT language. Active
2.9 WordCloud Text frequency clouds. Active
2.10 HoloViews High-level API across viz libs. Active
2.11 Datashader Scalable viz for large data. Active (Niche)

Miscellanea

ID Library Rationale Status
2.12 squarify Squarify (treemaps) Active
2.13 pixiedust PixieDust (Jupyter viz helper) Active
2.14 ipywidgets ipywidgets (Jupyter widgets) Active

🤖 3. Machine Learning (Classical)

Core ML

ID Library Rationale Status
3.1.0 scikit-learn 🔥 Standard ML toolkit. Active
3.1.1 StatsModels 🔥 Statistical inference (p-values, confidence intervals, diagnostics) to complement scikit-learn. Active

Gradient Boosting

ID Library Rationale Status
3.1.2 XGBoost 🔥 Kaggle-winning gradient boosting. Active
3.1.3 LightGBM 🔥 Fast, memory-efficient boosting. Active
3.1.4 CatBoost Categorical boosting. Active

Explainability

ID Library Rationale Status
3.1.5 Eli5 Debugging & feature importance. Active (Stable)
3.1.6 SHAP 🔥 Shapley explanations. Active
3.1.7 LIME Local model explanations. Active

AutoML & Feature Eng.

ID Library Rationale Status
3.1.8 Featuretools Auto feature engineering. Active
3.1.9 PyCaret Low-code AutoML pipelines. Active
3.1.10 H2O (H2O.ai) Enterprise AutoML. Active

Feature Engineering & Extensions

ID Library Rationale Status
3.1.11 mlxtend MLxtend (extensions for ML) Active
3.1.12 category_encoders Category Encoders Active

Dimensionality Reduction

ID Library Rationale Status
3.1.13 UMAP Fast nonlinear reduction. Active
3.1.14 openTSNE Optimized t-SNE. Active (Niche)

🧬 4. Deep Learning

Core DL Frameworks

ID Library Rationale Status
4.0 TensorFlow 🔥 Production-scale DL (Google). Active
4.1 PyTorch 🔥 Research & industry leader. Active
4.2 JAX 🔥 NumPy-like API + auto-diff. Active (Rising)
4.3 PaddlePaddle Baidu’s DL framework. Active
4.4 MXNet Amazon-backed DL lib; Apache project retired to the Attic in 2023. Retired

High-Level APIs

ID Library Rationale Status
4.0.1 Keras 🔥 High-level API; Keras 3 runs on TensorFlow, JAX, or PyTorch. Active
4.1.1 FastAI 🔥 Simplified PyTorch. Active
4.1.2 PyTorch Lightning Structured PyTorch. Active
4.2.1 Flax Google's neural-network library for JAX. Active
4.2.2 Haiku DeepMind’s JAX research lib; in maintenance mode. Maintenance

GPU-Accelerated

ID Library Rationale Status
4.5 cuML GPU ML (RAPIDS). Active
4.* cuda-python CUDA Python API Active

Legacy

ID Library Rationale Status
4.6 Theano Pioneering DL lib. Deprecated
4.7 CNTK Microsoft toolkit. Legacy
4.8 Caffe Early CV DL lib. Legacy
4.9 Dist-Keras Distributed Keras. Deprecated
4.10 PyBrain Early ML/DL. Legacy
4.11 Fuel Data pipelines (Theano). Deprecated

🧠 5. NLP & Text Processing

Classical NLP

ID Library Rationale Status
5.0 NLTK Classical toolkit. Active (Stable)
5.1 TextBlob Simple sentiment/text API. Active
5.1.1 Pattern Web mining + NLP. Stable

Industrial Pipelines

ID Library Rationale Status
5.2 spaCy 🔥 Industrial NLP (NER, POS). Active
5.3 CoreNLP Stanford’s Java-based NLP. Active
5.4 Stanza Stanford’s PyTorch NLP. Active

Transformers Ecosystem

ID Library Rationale Status
5.5 Transformers 🔥 Hugging Face library of pretrained transformer models, including LLMs. Active
5.5.1 sentence-transformers 🔥 Semantic embeddings. Active
5.5.2 Tokenizers Fast tokenization (HF). Active
5.5.3 Accelerate Multi-GPU training utils. Active
5.5.4 LiteLLM Unified API for LLMs. Active (Rising)

Multilingual & Topic Modeling

ID Library Rationale Status
5.6 Gensim 🔥 Topic modeling, word2vec/doc2vec. Active
5.7 Polyglot Multilingual NLP. Stable

Research-Oriented

ID Library Rationale Status
5.8 AllenNLP PyTorch research NLP; no longer actively developed. Legacy
5.9 Flair Lightweight PyTorch NLP. Active

Finance / Data APIs (used in NLP/TS)

ID Library Rationale Status
5.10 nsepy NSEpy (stock data) Active
5.11 yfinance yfinance Active

👁️ 6. Computer Vision

Core CV

ID Library Rationale Status
6.0 OpenCV 🔥 Standard CV toolkit. Active

Image Utilities

ID Library Rationale Status
6.1 Pillow Image processing. Active
6.2 scikit-image Scientific image processing. Active

Dataset, Evaluation & Augmentation

ID Library Rationale Status
6.3 FiftyOne 🔥 Dataset/eval mgmt. Active
6.4 Albumentations 🔥 Data augmentation. Active
6.5 imgaug Image augmentation; few recent releases. Maintenance

DL CV Frameworks

ID Library Rationale Status
6.6 Detectron2 🔥 PyTorch detection framework. Active
6.7 MMDetection 🔥 Modular detection framework. Active
6.8 Kornia Differentiable CV ops. Active
6.9 timm 🔥 PyTorch image models. Active

🌐 7. Web & Deployment

Web Frameworks

ID Library Rationale Status
7.0 Flask 🔥 Lightweight APIs. Active
7.1 Django 🔥 Full-stack framework. Active
7.2 FastAPI 🔥 Async APIs. Active (Rising)
7.3 Tornado Async networking. Active

API & HTTP

ID Library Rationale Status
7.4 Requests 🔥 Standard HTTP lib. Active
7.5 HTTPX Async HTTP. Active

Scraping & Automation

ID Library Rationale Status
7.6 Scrapy Crawling/scraping. Active
7.7 Selenium Browser automation. Active
7.8 Playwright Modern async automation. Active
7.9 BeautifulSoup HTML parsing. Active

Deployment & Task Queues

ID Library Rationale Status
7.10 Gunicorn 🔥 WSGI server. Active
7.11 Uvicorn 🔥 ASGI server. Active
7.12 Celery Task queue. Active
7.13 RQ Redis-based jobs. Active
7.14 Daphne ASGI server for Django. Active

Miscellanea

ID Library Rationale Status
7.15 simplejson simplejson (JSON utilities) Active
7.16 mlflow MLflow (deployment, experiment tracking) Active
7.17 mapbox Mapbox SDK (geospatial APIs) Active

⏳ 8. Time Series

Classical

ID Library Rationale Status
8.0 StatsModels 🔥 ARIMA, SARIMA. Active
8.1 pmdarima Auto-ARIMA. Active

Modern Forecasting

ID Library Rationale Status
8.2 Prophet 🔥 Easy forecasting. Active
8.3 Darts 🔥 Unified TS toolkit. Active
8.4 GluonTS Probabilistic forecasting; originally MXNet-based, now also ships PyTorch models. Active
8.5 Kats Meta’s TS lib. Active
8.6 Orbit Uber’s Bayesian TS. Active
8.7 PyTorch Forecasting PyTorch forecasting. Active
8.8 PyCaret-TS AutoML for TS. Active

Scalable Utilities

ID Library Rationale Status
8.9 StatsForecast 🔥 Scalable forecasting. Active
8.10 sktime 🔥 Unified ML for TS. Active
8.11 tsfresh Feature extraction. Active

Miscellanea

ID Library Rationale Status
8.12 ruptures Ruptures (changepoint detection) Active

🧪 9. Testing & Quality

Core Testing

ID Library Rationale Status
9.0 pytest 🔥 Standard testing. Active
9.1 unittest Built-in. Active
9.2 nose2 Successor to the legacy nose runner; extends unittest. Maintenance
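
As a rough illustration of the built-in option in the table above, here is a minimal `unittest` sketch. The function `normalize_name` and the helper `run_suite` are hypothetical names invented for this example; only the `unittest` calls come from the standard library.

```python
import unittest


def normalize_name(name):
    """Hypothetical helper: trim whitespace and lowercase a library name."""
    return name.strip().lower()


class NormalizeNameTests(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_name("  NumPy "), "numpy")

    def test_empty_string(self):
        self.assertEqual(normalize_name(""), "")


def run_suite():
    """Load and run the test case programmatically, returning the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeNameTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_suite()
    print("passed" if result.wasSuccessful() else "failed")
```

The same checks could be written for pytest as plain functions using bare `assert` statements; pytest can also collect `unittest.TestCase` classes like the one above.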

Property-Based

ID Library Rationale Status
9.3 Hypothesis 🔥 Property-based testing; generates test inputs automatically. Active

Coverage & Quality

ID Library Rationale Status
9.4 coverage.py 🔥 Coverage measurement. Active
9.5 tox Multi-env tests. Active
9.6 pytest-cov Coverage plugin. Active
9.7 bandit Security linting. Active
9.8 flake8 🔥 Linting & style. Active
9.9 black 🔥 Code formatting. Active
9.10 mypy 🔥 Static type checking. Active
9.11 pylint Static analysis. Active

Mocking & Utilities

ID Library Rationale Status
9.12 mock Backport of the stdlib unittest.mock module. Active
9.13 responses API mocking. Active
9.14 vcrpy HTTP replay. Active
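
To show the mocking style these tools support, here is a small sketch that uses the standard library's `unittest.mock` rather than any of the third-party packages listed. The function `fetch_status`, the `fake_client` object, and the URL are hypothetical and exist only for this example.

```python
from unittest.mock import Mock


def fetch_status(client, url):
    """Hypothetical helper: return the HTTP status code reported by `client`."""
    response = client.get(url)
    return response.status_code


# Stand-in for an HTTP client; no network traffic occurs.
fake_client = Mock()
fake_client.get.return_value = Mock(status_code=200)

status = fetch_status(fake_client, "https://example.invalid/health")

# Verify the code under test called the client exactly once with the expected URL.
fake_client.get.assert_called_once_with("https://example.invalid/health")
print(status)
```

Passing the client in as an argument keeps the function easy to test: the mock replaces the real dependency without any patching.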

Miscellanea

ID Library Rationale Status
9.15 nbformat nbformat (Jupyter notebooks) Active
9.16 pandoc Pandoc (doc conversion) Active
9.17 python-docx python-docx (Word docs) Active
9.18 tomli tomli (TOML parsing; backport of stdlib tomllib, Python 3.11+) Active

🎮 10. Game Development

2D Game Dev

ID Library Rationale Status
10.0 PyGame 🔥 Most popular 2D library. Active
10.1 PyKyra SDL-based, legacy. Legacy

3D & Physics

ID Library Rationale Status
10.2 Panda3D 3D engine. Active
10.3 Ursina Simplified 3D. Active
10.4 PyOpenGL OpenGL bindings. Active
10.5 Arcade 🔥 Modern 2D library built on OpenGL. Active
10.6 PyBullet Physics simulation. Active

Multimedia & Utilities

ID Library Rationale Status
10.7 Pyglet Windowing & multimedia. Active
10.8 Kivy Cross-platform UI/game dev. Active
10.9 Ren’Py Visual novel engine. Active

📂 11. Data Handling & Databases

11.1 ORMs & Schema/Migrations

ID Library Rationale Status
11.1.0 SQLAlchemy 🔥 De-facto DB toolkit/ORM for Python; works with most SQL backends. Active
11.1.1 SQLModel Pydantic-flavored ORM on top of SQLAlchemy; modern type-safe models. Active (Rising)
11.1.2 Alembic Schema migrations for SQLAlchemy projects. Active

11.2 Analytical & Embedded Engines

ID Library Rationale Status
11.2.0 DuckDB 🔥 In-process analytical SQL engine; super handy for Parquet/CSV/Arrow. Active (Rising)
11.2.1 sqlite3 (stdlib) 🔥 Zero-config embedded SQL DB; perfect for small/medium apps. Active
11.2.2 clickhouse-connect Client for ClickHouse columnar OLAP DB. Active
11.2.3 google-cloud-bigquery Python client for BigQuery analytics. Active
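
A minimal sketch of the "zero-config" claim for `sqlite3`, using only the standard library. The table `libs`, its rows, and the helper `libs_in` are hypothetical, made up for this illustration.

```python
import sqlite3

# ":memory:" creates a throwaway in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE libs (name TEXT PRIMARY KEY, category TEXT)")

rows = [
    ("DuckDB", "analytical"),
    ("sqlite3", "embedded"),
    ("psycopg2", "driver"),
]
# Parameterized inserts ("?" placeholders) avoid building SQL by string concatenation.
conn.executemany("INSERT INTO libs VALUES (?, ?)", rows)
conn.commit()


def libs_in(category):
    """Return the library names in `category`, sorted alphabetically."""
    cur = conn.execute(
        "SELECT name FROM libs WHERE category = ? ORDER BY name", (category,)
    )
    return [name for (name,) in cur.fetchall()]


print(libs_in("embedded"))
```

Replacing `":memory:"` with a file path would persist the data to that file; the rest of the code stays the same.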

11.3 Database Drivers & Clients

ID Library Rationale Status
11.3.0 psycopg2 🔥 Canonical PostgreSQL driver; battle-tested. Active
11.3.1 asyncpg High-performance async Postgres driver. Active
11.3.2 mysqlclient Fast MySQL driver (C bindings). Active
11.3.3 PyMySQL Pure-Python MySQL driver (easy install). Active
11.3.4 oracledb (formerly cx_Oracle) Oracle Database driver. Active
11.3.5 pyodbc ODBC bridge to many SQL databases. Active
11.3.6 pymongo 🔥 Official MongoDB driver for Python. Active
11.3.7 redis (redis-py) 🔥 Redis client for caching, queues, pub/sub. Active
11.3.8 elasticsearch Elasticsearch client for search/analytics. Active

11.4 Columnar, Files & Spreadsheet I/O

ID Library Rationale Status
11.4.0 pyarrow 🔥 Arrow/Parquet/Feather I/O; zero-copy bridges; dataframe interop. Active
11.4.1 fastparquet Alternative Parquet engine (Pandas/Arrow ecosystem). Active
11.4.2 h5py HDF5 I/O for large array data. Active
11.4.3 tables (PyTables) Hierarchical datasets on HDF5 with indexing/compression. Active
11.4.4 openpyxl 🔥 Read/write Excel .xlsx files. Active
11.4.5 xlsxwriter Create Excel .xlsx (write-only, fast). Active
11.4.6 xlrd Legacy Excel reader (.xls only; .xlsx support was removed in 2.0). Legacy

11.5 DataFrames & Bridges (practical handling)

ID Library Rationale Status
11.5.0 pandas 🔥 The standard DataFrame for ETL, joins, and DB I/O. Active
11.5.1 polars Lightning-fast DataFrame; great with Arrow/Parquet. Active (Rising)
11.5.2 pandas SQL I/O (via SQLAlchemy) pandas.read_sql / DataFrame.to_sql over SQLAlchemy engines; a usage pattern, not a separate package. Active

🔥 Must-Learn (2025, 📂 Data Handling & Databases)

  • SQLAlchemy → ORM + universal DB toolkit.
  • DuckDB → modern analytical SQL engine.
  • sqlite3 (stdlib) → embedded relational DB.
  • psycopg2 → canonical PostgreSQL driver.
  • pymongo → MongoDB access.
  • redis (redis-py) → caching/queues, near real-time.
  • pyarrow → Parquet/Arrow interoperability.
  • openpyxl → Excel support.
  • pandas → still the backbone for tabular handling.

Together these cover universal SQL access (ORM plus drivers), local and analytical engines, and fast file/columnar I/O, which is enough for production-ready data handling.

