# Python DS/ML/DL/NLP Libraries: Categories Index
A carefully structured, comprehensive overview of Python's most important libraries across Data Science, Machine Learning, Deep Learning, NLP, Computer Vision, Time Series, Visualization, Testing, Web development, Game Development, Databases, and more.
Each section includes:
- Aligned tables
- 🔥 Must-Learn highlights
- Clear rationales
- Beginner-friendly categorization
## 1. Data Analysis & Numerical Foundations
### Core Numerical
| ID | Library | Rationale | Status |
|---|---|---|---|
| 1.0 | NumPy 🔥 | Foundation of scientific Python (N-dimensional arrays, linear algebra). | Active |
| 1.1 | SciPy 🔥 | Optimization, statistics, and other scientific routines. | Active |
### Tabular Data
| ID | Library | Rationale | Status |
|---|---|---|---|
| 1.2 | Pandas 🔥 | Standard for tabular/structured data. | Active |
| 1.3 | Polars | Rust-powered DataFrames; very fast. | Active (Rising) |
### Distributed & Big Data
| ID | Library | Rationale | Status |
|---|---|---|---|
| 1.4 | Dask | Parallel/distributed NumPy/Pandas. | Active |
| 1.5 | Vaex | Out-of-core DataFrames for huge data. | Active (Niche) |
| 1.6 | Modin | Parallelized Pandas via Ray/Dask. | Active (Rising) |
| 1.7 | PySpark | Python API for Apache Spark. | Active |
| 1.8 | PyFlink | Python API for Apache Flink. | Active |

| pip module | Library | Status |
|---|---|---|
| py4j | Py4J (PySpark bridge) | Active |
### Statistical / Utilities
| ID | Library | Rationale | Status |
|---|---|---|---|
| 1.9 | StatsModels 🔥 | Statistical models (ARIMA, regression) | Active |
| 1.10 | Pingouin | Simple statistical tests | Active |
| 1.11 | SymPy | Symbolic math | Active |
### Miscellaneous Utilities
| pip module | Library / Purpose | Status |
|---|---|---|
| cmdstanpy | CmdStanPy (Bayesian modeling with Stan via CmdStan) | Active |
| pystan | PyStan (Bayesian modeling with Stan) | Active |
| joblib | Serialization + parallelism | Active |
| tabulate | Table formatting | Active |
| lxml | XML/HTML parsing | Active |
| openpyxl | Excel (.xlsx) I/O | Active |
| xlrd | Legacy Excel (.xls) reader | Legacy |
| pyarrow | Apache Arrow (columnar data) | Active |
## 2. Visualization & Plotting
### Core Plotting
| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.0 | Matplotlib 🔥 | Base 2D plotting | Active |
| 2.1 | Seaborn 🔥 | Statistical visualization | Active |
### Interactive Visualization
| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.2 | Plotly 🔥 | Interactive, web-ready charts | Active |
| 2.3 | Bokeh | Browser-based dashboards | Active |
| 2.4 | Altair | Declarative (Vega-Lite) graphics | Active |
### Dashboards
| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.5 | Dash | Plotly dashboarding | Active |
| 2.6 | Streamlit 🔥 | Simple ML/data apps | Active |
### Specialized Visualization
| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.7 | PyVista | 3D mesh visualization (VTK-based) | Active (Niche) |
| 2.8 | Graphviz | Graph drawing engine | Active |
| 2.8.1 | pydot | Graphviz DOT interface | Active |
| 2.9 | WordCloud | Text frequency clouds | Active |
| 2.10 | HoloViews | High-level API across viz stacks | Active |
| 2.11 | Datashader | Large-scale visualization | Active (Niche) |

| ID | Library | Rationale | Status |
|---|---|---|---|
| 2.12 | squarify | Treemaps | Active |
| 2.13 | pixiedust | Jupyter visualization | Active |
| 2.14 | ipywidgets | Interactive widgets | Active |
## 3. Machine Learning (Classical)
### Core ML Libraries
| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.0 | scikit-learn 🔥 | Standard ML toolkit | Active |
| 3.1.1 | StatsModels 🔥 | Statistical inference (p-values, confidence intervals) | Active |
### Gradient Boosting
| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.2 | XGBoost 🔥 | Widely used in winning Kaggle solutions | Active |
| 3.1.3 | LightGBM 🔥 | Fast, memory-efficient boosting | Active |
| 3.1.4 | CatBoost | Boosting with native categorical-feature support | Active |
### Explainability
| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.5 | ELI5 | Debugging & feature importance | Active |
| 3.1.6 | SHAP 🔥 | Shapley-value explanations | Active |
| 3.1.7 | LIME | Local surrogate explanations | Active |
### AutoML & Feature Engineering
| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.8 | Featuretools | Auto feature engineering | Active |
| 3.1.9 | PyCaret | Low-code AutoML pipelines | Active |
| 3.1.10 | H2O.ai | Enterprise AutoML | Active |
### ML Extensions
| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.11 | mlxtend | Extra utilities (stacking, frequent patterns, plotting) | Active |
| 3.1.12 | category_encoders | Categorical-feature encoders (target, one-hot, etc.) | Active |
### Dimensionality Reduction
| ID | Library | Rationale | Status |
|---|---|---|---|
| 3.1.13 | UMAP | Fast nonlinear reduction | Active |
| 3.1.14 | openTSNE | Optimized t-SNE | Active (Niche) |
## 4. Deep Learning
### Core DL Frameworks
| ID | Library | Rationale | Status |
|---|---|---|---|
| 4.0 | TensorFlow 🔥 | Production-scale DL | Active |
| 4.1 | PyTorch 🔥 | Research & industry leader | Active |
| 4.2 | JAX 🔥 | NumPy + auto-diff + accelerators | Active (Rising) |
| 4.3 | PaddlePaddle | Baidu's DL framework | Active |
| 4.4 | MXNet | Apache DL library (backed by Amazon); moved to the Apache Attic in 2023 | Retired |
### High-Level APIs
| ID | Library | Rationale | Status |
|---|---|---|---|
| 4.0.1 | Keras 🔥 | High-level DL API (TensorFlow; multi-backend since Keras 3) | Active |
| 4.1.1 | fastai 🔥 | Simplified PyTorch | Active |
| 4.1.2 | PyTorch Lightning | Structured training | Active |
| 4.2.1 | Flax | JAX high-level API | Active |
| 4.2.2 | Haiku | DeepMind JAX library; Flax recommended for new projects | Maintenance |
### GPU-Accelerated ML
| ID | Library | Rationale | Status |
|---|---|---|---|
| 4.5 | cuML | GPU ML (RAPIDS) | Active |
| 4.* | cuda-python | CUDA Python API | Active |
### Legacy DL Libraries
| ID | Library | Rationale | Status |
|---|---|---|---|
| 4.6 | Theano | Pioneering DL | Deprecated |
| 4.7 | CNTK | Microsoft toolkit | Legacy |
| 4.8 | Caffe | Early DL | Legacy |
| 4.9 | Dist-Keras | Distributed Keras | Deprecated |
| 4.10 | PyBrain | Early ML/DL | Legacy |
| 4.11 | Fuel | Data pipelines | Deprecated |
## 5. NLP & Text Processing
### Classical NLP
| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.0 | NLTK | Classical toolkit | Active (Stable) |
| 5.1 | TextBlob | Simple sentiment API | Active |
| 5.1.1 | Pattern | Web mining + NLP | Stable |
### Industrial NLP Pipelines
| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.2 | spaCy 🔥 | Industrial NLP pipeline | Active |
| 5.3 | CoreNLP | Stanford NLP (Java-based) | Active |
| 5.4 | Stanza | Stanford NLP (PyTorch) | Active |
### Transformers & LLM Tooling
| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.5 | Transformers 🔥 | Pretrained transformer models (Hugging Face) | Active |
| 5.5.1 | sentence-transformers 🔥 | Semantic embeddings | Active |
| 5.5.2 | Tokenizers | Fast tokenization (HF) | Active |
| 5.5.3 | Accelerate | Multi-GPU/distributed training utilities (HF) | Active |
| 5.5.4 | LiteLLM | Unified API for many LLMs | Active (Rising) |
### Multilingual & Topic Modeling
| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.6 | Gensim 🔥 | Topic modeling & embeddings | Active |
| 5.7 | Polyglot | Multilingual NLP | Stable |
### Research NLP
| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.8 | AllenNLP | Research NLP (no longer developed) | Archived |
| 5.9 | Flair | Lightweight PyTorch NLP | Active |
### Finance APIs (Used in NLP/TS)
| ID | Library | Rationale | Status |
|---|---|---|---|
| 5.10 | nsepy | NSE (India) stock market data | Active |
| 5.11 | yfinance | Yahoo Finance market data (unofficial) | Active |
## 6. Computer Vision
### Core CV
| ID | Library | Rationale | Status |
|---|---|---|---|
| 6.0 | OpenCV 🔥 | Standard CV toolkit | Active |
### Image Utilities
| ID | Library | Rationale | Status |
|---|---|---|---|
| 6.1 | Pillow | Image processing | Active |
| 6.2 | scikit-image | Scientific image processing | Active |
### Dataset Management & Augmentation
| ID | Library | Rationale | Status |
|---|---|---|---|
| 6.3 | FiftyOne 🔥 | Dataset/eval management | Active |
| 6.4 | Albumentations 🔥 | Data augmentation | Active |
| 6.5 | imgaug | Data augmentation (unmaintained) | Inactive |
### DL Frameworks for CV
| ID | Library | Rationale | Status |
|---|---|---|---|
| 6.6 | Detectron2 🔥 | PyTorch object detection | Active |
| 6.7 | MMDetection 🔥 | Modular CV detection | Active |
| 6.8 | Kornia | Differentiable CV ops | Active |
| 6.9 | timm 🔥 | PyTorch image models | Active |
## 7. Web & Deployment
### Web Frameworks
| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.0 | Flask 🔥 | Lightweight APIs | Active |
| 7.1 | Django 🔥 | Full-stack framework | Active |
| 7.2 | FastAPI 🔥 | Async APIs | Active (Rising) |
| 7.3 | Tornado | Async networking | Active |
### HTTP & API Clients
| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.4 | Requests 🔥 | Standard HTTP client | Active |
| 7.5 | HTTPX | Sync + async HTTP client | Active |
### Scraping & Automation
| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.6 | Scrapy | Crawling/scraping | Active |
| 7.7 | Selenium | Browser automation | Active |
| 7.8 | Playwright | Browser automation (sync & async APIs) | Active |
| 7.9 | BeautifulSoup | HTML parsing | Active |
### Deployment & Queues
| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.10 | Gunicorn 🔥 | WSGI server | Active |
| 7.11 | Uvicorn 🔥 | ASGI server | Active |
| 7.12 | Celery | Task queue | Active |
| 7.13 | RQ | Redis queue | Active |
| 7.14 | Daphne | ASGI server | Active |
### Miscellaneous
| ID | Library | Rationale | Status |
|---|---|---|---|
| 7.15 | simplejson | JSON utilities | Active |
| 7.16 | mlflow | ML experiment tracking | Active |
| 7.17 | mapbox | Geospatial APIs | Active |
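
simplejson is the externally maintained package from which the stdlib `json` module was derived, and both expose `dumps`/`loads`. A minimal sketch of the common "prefer simplejson, fall back to json" import pattern; the `record` dict is hypothetical sample data:

```python
try:
    import simplejson as json  # externally maintained package, if installed
except ImportError:
    import json  # stdlib module with a compatible API for the calls below

# Hypothetical experiment record to round-trip
record = {"model": "xgboost", "auc": 0.91, "features": ["age", "income"]}

# sort_keys=True makes the serialized text deterministic
text = json.dumps(record, sort_keys=True)
restored = json.loads(text)

print(text)
print(restored == record)  # True
```

Both modules accept the `sort_keys` argument used here, so the rest of the code does not need to know which one was imported.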
## 8. Time Series
### Classical TS
| ID | Library | Rationale | Status |
|---|---|---|---|
| 8.0 | StatsModels 🔥 | ARIMA/SARIMA | Active |
| 8.1 | pmdarima | Auto-ARIMA | Active |
### Modern Forecasting
| ID | Library | Rationale | Status |
|---|---|---|---|
| 8.2 | Prophet 🔥 | Easy forecasting | Active |
| 8.3 | Darts 🔥 | Unified TS toolkit | Active |
| 8.4 | GluonTS | Probabilistic TS toolkit (PyTorch; MXNet backend legacy) | Active |
| 8.5 | Kats | Meta TS library | Active |
| 8.6 | Orbit | Uber Bayesian TS | Active |
| 8.7 | PyTorch Forecasting | Forecasting with PT | Active |
| 8.8 | PyCaret-TS | AutoML for TS | Active |
### Scalable & Utility TS
| ID | Library | Rationale | Status |
|---|---|---|---|
| 8.9 | StatsForecast 🔥 | Scalable forecasting | Active |
| 8.10 | sktime 🔥 | Unified TS ML | Active |
| 8.11 | tsfresh | Feature extraction | Active |
### TS Miscellaneous
| ID | Library | Rationale | Status |
|---|---|---|---|
| 8.12 | ruptures | Changepoint detection | Active |
## 9. Testing & Quality
### Core Testing
| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.0 | pytest 🔥 | Standard testing | Active |
| 9.1 | unittest | Built-in (stdlib) testing | Active |
| 9.2 | nose2 | Successor to the legacy nose | Maintenance |
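
The stdlib `unittest` module needs no install. A minimal sketch; the `normalize` helper is hypothetical, and the suite is run programmatically rather than via `python -m unittest` so it also works inside a script or notebook:

```python
import unittest


def normalize(values):
    """Hypothetical helper: scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]  # avoid division by zero on constant input
    return [(v - lo) / (hi - lo) for v in values]


class NormalizeTests(unittest.TestCase):
    def test_bounds(self):
        scaled = normalize([3, 7, 5])
        self.assertEqual(min(scaled), 0.0)
        self.assertEqual(max(scaled), 1.0)

    def test_constant_input(self):
        self.assertEqual(normalize([2, 2, 2]), [0.0, 0.0, 0.0])


# Load and run the tests without exiting the interpreter
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTests)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("all passed:", result.wasSuccessful())
```

pytest also collects `unittest.TestCase` subclasses, so the same test class runs under either runner.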
### Property-Based Testing
| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.3 | Hypothesis 🔥 | Generates test inputs from strategies | Active |
### Coverage & Quality
| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.4 | coverage.py 🔥 | Coverage measurement | Active |
| 9.5 | tox | Multi-env testing | Active |
| 9.6 | pytest-cov | Coverage plugin | Active |
| 9.7 | bandit | Security linting | Active |
| 9.8 | flake8 🔥 | Linting | Active |
| 9.9 | black 🔥 | Code formatting | Active |
| 9.10 | mypy 🔥 | Static type checking | Active |
| 9.11 | pylint | Static analysis | Active |
### Mocking & Utilities
| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.12 | mock | Backport of stdlib `unittest.mock` | Active |
| 9.13 | responses | Mocking `requests` HTTP calls | Active |
| 9.14 | vcrpy | Record/replay HTTP interactions | Active |
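
`unittest.mock` ships with Python 3.3+, which is why the separate `mock` package is only needed as a backport. A minimal sketch that patches a method on a hypothetical `PriceClient` class so the code under test never touches the network; `portfolio_value` is likewise a made-up example function:

```python
from unittest import mock


class PriceClient:
    """Hypothetical API client; a real version would make an HTTP request."""

    def fetch_price(self, ticker):
        raise RuntimeError("network access is not allowed in tests")


def portfolio_value(client, holdings):
    # Sum price * quantity for each ticker using the injected client
    return sum(client.fetch_price(t) * qty for t, qty in holdings.items())


# Replace the method with a MagicMock for the duration of the block only
with mock.patch.object(PriceClient, "fetch_price", return_value=10.0) as fake:
    total = portfolio_value(PriceClient(), {"AAA": 3, "BBB": 2})

print(total)            # 50.0
print(fake.call_count)  # 2
```

Because the patch is not autospecced, the mock records calls without `self` (e.g. `fake.assert_any_call("AAA")`). responses and vcrpy apply the same idea one layer lower, at the HTTP level.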

| ID | Library | Rationale | Status |
|---|---|---|---|
| 9.15 | nbformat | Jupyter formats | Active |
| 9.16 | pandoc | Doc conversion | Active |
| 9.17 | python-docx | Word files | Active |
| 9.18 | tomli | TOML parsing (stdlib `tomllib` in Python 3.11+) | Active |
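
On Python 3.11+ the stdlib `tomllib` module (based on tomli) covers reading TOML, and tomli remains the drop-in backport for older versions. A minimal sketch; the `[tool.example]` table is hypothetical config inlined so the snippet is self-contained:

```python
try:
    import tomllib  # stdlib on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport exposing the same load/loads API

# Hypothetical project config, inlined as a string
CONFIG = """
[tool.example]
name = "demo"
max_depth = 6
features = ["age", "income"]
"""

data = tomllib.loads(CONFIG)  # loads() takes str; load() takes a binary file
settings = data["tool"]["example"]
print(settings["name"], settings["max_depth"], settings["features"])
```

Both modules are read-only; writing TOML needs a separate package such as tomli-w.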
## 10. Game Development
### 2D Game Dev
| ID | Library | Rationale | Status |
|---|---|---|---|
| 10.0 | Pygame 🔥 | Popular 2D library | Active |
| 10.1 | PyKyra | SDL-based | Legacy |
### 3D & Physics
| ID | Library | Rationale | Status |
|---|---|---|---|
| 10.2 | Panda3D | 3D engine | Active |
| 10.3 | Ursina | Simplified 3D | Active |
| 10.4 | PyOpenGL | OpenGL bindings | Active |
| 10.5 | Arcade 🔥 | Modern 2D game library (OpenGL-based) | Active |
| 10.6 | PyBullet | Physics simulation | Active |

| ID | Library | Rationale | Status |
|---|---|---|---|
| 10.7 | Pyglet | Multimedia/UI toolkit | Active |
| 10.8 | Kivy | Cross-platform UI | Active |
| 10.9 | Ren'Py | Visual novel engine | Active |
## 11. Data Handling & Databases
### 11.1 ORMs & Migrations
| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.1.0 | SQLAlchemy 🔥 | Universal DB toolkit + ORM | Active |
| 11.1.1 | SQLModel | Pydantic-style ORM | Active (Rising) |
| 11.1.2 | Alembic | DB schema migrations | Active |
### 11.2 Analytical & Embedded Engines
| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.2.0 | DuckDB 🔥 | In-process analytical SQL engine | Active (Rising) |
| 11.2.1 | sqlite3 (stdlib) 🔥 | Lightweight embedded SQL DB | Active |
| 11.2.2 | clickhouse-connect | ClickHouse OLAP client | Active |
| 11.2.3 | google-cloud-bigquery | BigQuery client | Active |
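
sqlite3 ships with Python, which makes it a convenient place to practice SQL before moving to PostgreSQL or DuckDB (whose Python API follows a similar connect/execute/fetch pattern). A minimal sketch using a hypothetical `sales` table in an in-memory database:

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

rows = [("north", 120.0), ("south", 80.5), ("north", 30.0)]
# Parameterized inserts ("?" placeholders) avoid SQL injection
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
conn.commit()

query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('north', 150.0), ('south', 80.5)]
conn.close()
```

Passing `":memory:"` keeps the example disposable; a file path instead creates a persistent database file.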
### 11.3 Database Drivers & Clients
| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.3.0 | psycopg2 🔥 | PostgreSQL driver | Active |
| 11.3.1 | asyncpg | Async Postgres | Active |
| 11.3.2 | mysqlclient | MySQL (C bindings) | Active |
| 11.3.3 | PyMySQL | Pure-Python MySQL | Active |
| 11.3.4 | oracledb | Oracle DB driver | Active |
| 11.3.5 | pyodbc | ODBC bridge | Active |
| 11.3.6 | pymongo 🔥 | MongoDB driver | Active |
| 11.3.7 | redis 🔥 | Redis caching/queues | Active |
| 11.3.8 | elasticsearch | Elastic client | Active |
### 11.4 Columnar Files & Spreadsheet I/O
| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.4.0 | pyarrow 🔥 | Arrow/Parquet/Feather/ORC I/O | Active |
| 11.4.1 | fastparquet | Parquet engine | Active |
| 11.4.2 | h5py | HDF5 file I/O | Active |
| 11.4.3 | tables | PyTables over HDF5 | Active |
| 11.4.4 | openpyxl 🔥 | Excel .xlsx read/write | Active |
| 11.4.5 | xlsxwriter | Excel write-only engine | Active |
| 11.4.6 | xlrd | Legacy Excel reader | Legacy |
### 11.5 DataFrame Bridges
| ID | Library | Rationale | Status |
|---|---|---|---|
| 11.5.0 | pandas 🔥 | Core DataFrame | Active |
| 11.5.1 | polars | Fast Arrow-native DF | Active (Rising) |
| 11.5.2 | pandas + SQLAlchemy | SQL I/O via `read_sql` / `to_sql` | Active |
### 🔥 Must-Learn (2025: Data Handling & Databases)
Memorize this pathway:
- SQLAlchemy: Universal DB toolkit / ORM
- DuckDB: Analytical SQL engine
- sqlite3: Embedded SQL
- psycopg2: PostgreSQL
- pymongo: MongoDB
- redis: Caching, queues
- pyarrow: Parquet/Arrow I/O
- openpyxl: Excel
- pandas: Backbone of ETL

➡️ Covers: SQL → NoSQL → Analytical engines → Distributed I/O → Production DB access.