AWS SageMaker AI Cheat Sheet

01 Platform Setup & Config (SDK + CLI + IAM)

pip install sagemaker boto3★
Core Python SDK + AWS SDK — start here.
aws configure★
Set AWS Access Key, Secret, Region, output format.
import sagemaker; session = sagemaker.Session()★
High-level helper for training/deploying; wraps boto3.
role = sagemaker.get_execution_role()★
IAM role SageMaker assumes; needs S3, ECR, CloudWatch access. Resolves automatically inside Studio/notebooks; elsewhere, pass the role ARN explicitly.
sm = boto3.client('sagemaker')
Low-level API client — fine-grained control.
sm_rt = boto3.client('sagemaker-runtime')
Separate client needed to call a deployed endpoint.
iam:PassRole (required)
The user or role invoking create-model must have iam:PassRole on the execution role.
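
The PassRole requirement above can be written as an IAM policy statement. This is a minimal sketch: the account ID, role name, and the helper name pass_role_policy are hypothetical, and the iam:PassedToService condition is an optional tightening that the entry above does not mention.

```python
import json


def pass_role_policy(execution_role_arn: str) -> dict:
    """Build an identity policy that lets the caller pass one role to SageMaker."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": execution_role_arn,
                # Optional: only allow passing the role to the SageMaker service.
                "Condition": {
                    "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
                },
            }
        ],
    }


# Hypothetical ARN, for illustration only.
policy = pass_role_policy("arn:aws:iam::123456789012:role/MySageMakerExecutionRole")
print(json.dumps(policy, indent=2))
```

Attach the resulting document to the user or role that calls create-model, not to the execution role itself.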

02 Studio & Dev Environments (where you write code)

SageMaker Studio★
Full browser IDE: JupyterLab, Code Editor, Canvas, Pipelines, JumpStart in one place.
SageMaker Studio Classic
Previous version of Studio — still supported but being superseded.
Notebook Instances
Managed EC2 with Jupyter; you pick instance type + EBS size. Simpler but less integrated.
Studio Lab
Free lightweight environment for learning — no AWS account needed.
Canvas★
No-code AutoML UI for analysts — data prep, model build, FMs, what-if analysis.
SageMaker Unified Studio (new 2025)
Unified platform combining analytics, AI, and data access (Lakehouse, Redshift, Bedrock).

03 Data Prep & Labeling (60–70% of ML effort)

Ground Truth★
Human labeling + ML-assisted auto-labeling (Mechanical Turk or private workforce).
Data Wrangler★
Visual drag-&-drop prep: 300+ transforms, data quality reports, export to Pipeline.
Feature Store
Online store (real-time <10 ms lookup) + Offline store (S3/Athena, training). Reuse features across teams.
Processing Jobs
Run custom scripts (SKLearnProcessor, SparkProcessor) — ETL at scale before training.
SageMaker Clarify
Detect bias in your dataset before training; get statistical bias metrics per feature.
CSV: no header row (gotcha)
Built-in algorithms expect no header in CSVs; first column = label for supervised training data.
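
The CSV gotcha above can be handled with a small serializer. This is a sketch using only the standard library; the helper name to_sagemaker_csv and the column names (age, income, churned) are made up for illustration.

```python
import csv
import io


def to_sagemaker_csv(rows, label_key, feature_keys):
    """Serialize dict rows as header-less CSV with the label in the first column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # Label first, then features in a fixed order; no header row is written.
        writer.writerow([row[label_key]] + [row[k] for k in feature_keys])
    return buf.getvalue()


# Hypothetical churn data.
rows = [
    {"age": 34, "income": 52000, "churned": 0},
    {"age": 51, "income": 87000, "churned": 1},
]
csv_text = to_sagemaker_csv(rows, label_key="churned", feature_keys=["age", "income"])
print(csv_text)
```

Keeping feature_keys explicit fixes the column order, which matters because the file has no header to recover it from.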

04 Built-in Algorithms: Supervised (tabular + time-series)

XGBoost★
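Gradient-boosted trees for classification + regression. Usually CPU and memory-bound, so start with ml.m5; GPU training is available in newer container versions (1.2+) but is rarely needed for small tabular data.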
Linear Learner★
Fast linear/logistic models; trains many in parallel, returns best. Scales to very large data.
K-Nearest Neighbors (k-NN)
Index-based lookup for classification/regression. High accuracy, larger index size.
Factorization Machines
Handles sparse high-dimensional data — click prediction, recommendations.
DeepAR
Time-series forecasting via RNN; trains on many related series simultaneously.
Object2Vec
Learns dense embeddings — find similar docs, duplicate tickets, recommendations.
AutoGluon-Tabular / CatBoost / LightGBM
Modern AutoML ensembles; also available as JumpStart built-ins.

05 Built-in Algorithms: Unsupervised (no labels needed)

K-Means★
Groups data into K clusters — customer segmentation, topic discovery.
PCA (Principal Component Analysis)★
Dimensionality reduction: projects features onto a few uncorrelated principal components that retain most of the variance.
Random Cut Forest (RCF)★
Anomaly detection: assigns anomaly score to each data point. Works on streaming data.
IP Insights
Learns normal IP–entity associations; flags suspicious logins or account access.
LDA (Latent Dirichlet Allocation)
Topic modeling: discover hidden topics in a document corpus (unsupervised).
NTM (Neural Topic Model)
Neural alternative to LDA; often faster and more scalable.

06 Built-in Algorithms: Text & Vision (NLP + computer vision)

BlazingText★
Super-fast Word2Vec embeddings + text classification. GPU for skipgram; multi-CPU for batch_skipgram.
Seq2Seq
RNN+attention encoder-decoder: machine translation, summarization, speech-to-text.
Image Classification (MXNet / TF)
Label images. MXNet supports incremental training (seed from prior model). GPU required.
Object Detection (MXNet / TF)
Find + classify objects with bounding boxes. GPU (P3, G4dn, G5) recommended.
Semantic Segmentation
Pixel-level classification — identifies what's at every position in an image.
Pipe mode: check format support (gotcha)
Pipe input mode is most often paired with protobuf RecordIO. Whether CSV or other formats work in Pipe mode depends on the algorithm, so check its input-mode documentation.

07 Training Jobs (SDK + CLI + instance tips)

estimator = Estimator(image_uri, role, instance_count=1, instance_type="ml.m5.xlarge")★
Generic Estimator (from sagemaker.estimator) describes the training config: image, role, compute, hyperparams. Use it with a built-in algorithm image URI.
estimator.set_hyperparameters(num_round=100)
Pass algorithm-specific params before fit().
estimator.fit({"train": "s3://bucket/train"})★
Launches the managed training job; blocks until done.
aws sagemaker create-training-job --cli-input-json file://job.json
CLI alternative; specify AlgorithmSpecification, ResourceConfig, InputDataConfig, OutputDataConfig.
aws sagemaker list-training-jobs --status-equals InProgress
Monitor active jobs from terminal.
EnableManagedSpotTraining=True★
Use EC2 Spot instances → up to 90% cost savings vs. on-demand. Set MaxWaitTimeInSeconds (≥ MaxRuntimeInSeconds) + use checkpointing so interrupted jobs can resume.
Data modes: File · Pipe · FastFile
File=downloads all (default). Pipe=streams from S3, commonly as RecordIO (fast startup, no EBS limit). FastFile=POSIX stream, any format.
CPU: ml.m5 · ml.c5 GPU: ml.p3 · ml.g4dn · ml.g5
Tabular → C5/M5. Deep learning vision/NLP → P3/G4dn/G5. XGBoost = memory-bound, prefer M5 over C5.
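
The training entries above (job JSON, Spot training, data modes) can be tied together by building the request body for create-training-job. This is a sketch under stated assumptions: the helper name training_job_request, the job name, bucket paths, image URI placeholder, role ARN, and the chosen time limits are all hypothetical.

```python
import json


def training_job_request(job_name, image_uri, role_arn, train_s3, output_s3,
                         checkpoint_s3, max_run=3600, max_wait=7200):
    """Assemble a CreateTrainingJob request that uses managed Spot training."""
    if max_wait < max_run:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",  # or "Pipe" / "FastFile"
        },
        "RoleArn": role_arn,
        "HyperParameters": {"num_round": "100"},  # values are passed as strings
        "InputDataConfig": [{
            "ChannelName": "train",
            "ContentType": "text/csv",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "EnableManagedSpotTraining": True,
        "CheckpointConfig": {"S3Uri": checkpoint_s3},  # lets interrupted jobs resume
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run,
            "MaxWaitTimeInSeconds": max_wait,
        },
    }


# All names and paths below are placeholders.
request = training_job_request(
    job_name="xgb-demo-001",
    image_uri="<xgboost-image-uri>",
    role_arn="arn:aws:iam::123456789012:role/MySageMakerExecutionRole",
    train_s3="s3://bucket/train",
    output_s3="s3://bucket/output",
    checkpoint_s3="s3://bucket/checkpoints",
)
job_json = json.dumps(request, indent=2)
print(job_json)
```

Saved as job.json, this body can be passed to aws sagemaker create-training-job --cli-input-json file://job.json, or unpacked into boto3's create_training_job(**request).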

08 JumpStart & AutoML (pre-built + zero-code ML)

JumpStart★
ML hub: pre-trained FMs (LLaMA, Mistral, Falcon, Stable Diffusion, BERT), algorithms, solution templates. One-click fine-tune + deploy.
from sagemaker.jumpstart.model import JumpStartModel
SDK entry point; specify model_id and model_version.
Autopilot★
AutoML: auto-detects problem type (classification/regression/forecasting), tries 100s of candidates, returns ranked leaderboard.
Canvas★
Autopilot's no-code UI (now its primary home). Analysts build, evaluate, and deploy models without writing code.
AutoMLJob → leaderboard → deploy
Autopilot generates a fully editable notebook showing every preprocessing + training step.
FMEval (Clarify)
Evaluate + compare foundation models on accuracy, fairness, toxicity — use your own prompts or built-in datasets.

09 Hyperparameter Tuning (AMT: automatic model tuning)

HyperparameterTuner(estimator, objective_metric_name, hyperparameter_ranges, max_jobs=…, max_parallel_jobs=…)★
Wraps an Estimator; you define the metric to optimize and the search space.
tuner.fit({…})
Launches all training trials; SageMaker picks next params based on results.
Strategies: Bayesian · Random · Grid · Hyperband
Bayesian = default, learns from prior runs. Hyperband = early-stops poor trials fast.
WarmStartConfig(warm_start_type=WarmStartTypes.IDENTICAL_DATA_AND_ALGORITHM, parents={…})
Reuse knowledge from previous (parent) tuning jobs; saves time on sequential experiments. TRANSFER_LEARNING is the other warm-start type.
aws sagemaker create-hyper-parameter-tuning-job
CLI equivalent for automation/CI pipelines.
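
For the CLI/API route above, the search space and objective go in the HyperParameterTuningJobConfig part of the request. This is a minimal sketch: the helper name tuning_job_config, the metric validation:auc, and the eta and max_depth ranges are illustrative choices, not values from this sheet.

```python
def tuning_job_config(metric_name, max_jobs, max_parallel, strategy="Bayesian"):
    """Build the HyperParameterTuningJobConfig section of a tuning-job request."""
    if max_parallel > max_jobs:
        raise ValueError("max_parallel cannot exceed max_jobs")
    return {
        "Strategy": strategy,
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": metric_name,
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            # Range bounds are strings in the API, even for numeric parameters.
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3",
                 "ScalingType": "Logarithmic"},
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "10",
                 "ScalingType": "Auto"},
            ],
        },
    }


config = tuning_job_config("validation:auc", max_jobs=20, max_parallel=2)
print(config["Strategy"], config["ResourceLimits"])
```

A lower max_parallel gives the Bayesian strategy more completed trials to learn from before it picks the next set of parameters, at the cost of longer wall-clock time.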

10 Inference Endpoints (4 modes: pick the right one)

Real-time endpoint★
Persistent HTTPS endpoint; sub-second latency; always-on cost. Best for web apps. Supports Production Variants (A/B tests).
Serverless Inference★
Scale-to-zero; cold starts after idle periods (length depends on model and container size); billed for the compute time each request uses. Best for spiky/infrequent traffic. No idle charges.
Asynchronous Inference
Queue-based; payloads up to 1 GB; minutes-long processing. For large batches or long model runtimes.
Batch Transform
Offline, no persistent endpoint; process an entire S3 dataset at once. Cheapest for bulk inference.
aws sagemaker create-endpoint-config / create-endpoint★
Two separate CLI steps: define config (model + variant + instance), then create endpoint from config.
aws sagemaker-runtime invoke-endpoint --endpoint-name x --body … out.json★
Call a real-time endpoint from the CLI; --body carries your payload, and the final argument names the file that receives the response.
Production Variants
Multiple model versions on one endpoint with traffic weights — enables A/B testing and canary rollouts.
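
The traffic-weight idea behind Production Variants can be sketched as an endpoint-config request plus the share each variant receives (its weight divided by the sum of all weights). The helper names, config name, variant names, and model names below are hypothetical.

```python
def endpoint_config_request(config_name, variants, instance_type="ml.m5.large"):
    """Build a CreateEndpointConfig request.

    variants: list of (variant_name, model_name, weight) tuples.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
                "InitialVariantWeight": float(weight),
            }
            for variant_name, model_name, weight in variants
        ],
    }


def traffic_shares(request):
    """Return the fraction of traffic each variant gets: weight / sum(weights)."""
    weights = {
        pv["VariantName"]: pv["InitialVariantWeight"]
        for pv in request["ProductionVariants"]
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Canary-style split: most traffic stays on the current model.
request = endpoint_config_request(
    "churn-ab-config",
    [("champion", "churn-model-v1", 9), ("challenger", "churn-model-v2", 1)],
)
shares = traffic_shares(request)
print(shares)
```

Weights are relative, so 9 and 1 behave the same as 0.9 and 0.1; the request dict can be passed to boto3's create_endpoint_config(**request) once the named models exist.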

11 MLOps: Pipelines & Model Registry (CI/CD for ML)

Pipeline(steps=[ProcessingStep, TrainingStep, RegisterModel, ConditionStep…])★
DAG of ML steps — reusable, version-controlled, triggered by events or schedule.
pipeline.upsert(role_arn=role) → pipeline.start()★
Upsert = create or update the pipeline definition. Start = begin execution; returns a PipelineExecution object.
Model Registry → ModelPackageGroup → ModelPackage★
Group = named model (like a repo). Package = a versioned artifact. Approve a version to allow deployment.
aws sagemaker create-model-package --model-package-group-name x --model-approval-status Approved
CLI registration; approved versions can be deployed via endpoint config.
Model Monitor
Schedule periodic checks: data quality, model quality, bias drift, feature attribution drift. Alerts via CloudWatch on drift.
Model Cards
Machine-readable governance documentation — intended use, training data, eval results, risk ratings.
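
Approving a registered version, as described in the Model Registry entry above, can be sketched as one update call. The function takes the client as a parameter so the block runs without AWS access; the helper name approve_model_package and the ARN in the usage comment are hypothetical.

```python
def approve_model_package(sm_client, model_package_arn, note=""):
    """Mark a registered model version as Approved.

    sm_client is expected to behave like boto3.client("sagemaker").
    """
    kwargs = {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": "Approved",
    }
    if note:
        # Free-text reason recorded with the approval decision.
        kwargs["ApprovalDescription"] = note
    return sm_client.update_model_package(**kwargs)


# Usage with a real client (not executed here; the ARN is a placeholder):
#   import boto3
#   sm = boto3.client("sagemaker")
#   approve_model_package(
#       sm,
#       "arn:aws:sagemaker:us-east-1:123456789012:model-package/churn/3",
#       note="Passed offline evaluation",
#   )
```

A status change like this is a common trigger point for deployment automation, since downstream steps can filter on approved versions only.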

12 Responsible AI & Debugging (Clarify · Debugger · Experiments)

Clarify — bias detection★
Pre-training: imbalance in data. Post-training: biased predictions. Reports bias metric per feature group.
Clarify — SHAP explainability
Feature importance scores for each prediction. Runs in bulk (batch) or online (real-time endpoint).
Debugger
Real-time hooks into training: tensor values, gradients, weights. Built-in rules: LossNotDecreasing, Overfit, VanishingGradient.
Debugger ProfilerReport
Auto-generates CPU/GPU utilization report — find bottlenecks in your training loop.
Experiments
Track runs, parameters, metrics, artifacts in a searchable store. Compare trials side by side in Studio.
Managed MLflow (Dec 2025)
Serverless MLflow in SageMaker — no infra setup; scales automatically; integrates with Studio + Pipelines.

13 CLI Quick Reference (aws sagemaker …)

aws sagemaker create-training-job --cli-input-json file://j.json★
Start a training job; define AlgorithmSpecification + ResourceConfig in JSON.
aws sagemaker describe-training-job --training-job-name my-job
Get status, metrics, model artifact S3 path.
aws sagemaker list-training-jobs --status-equals InProgress
Paginated list; filter by status, creation time, name prefix.
aws sagemaker create-model --model-name x
Register a model artifact from S3 with an inference container image.
aws sagemaker create-endpoint --endpoint-name x★
Deploy endpoint from an existing endpoint-config.
aws sagemaker describe-endpoint --endpoint-name x
Check InService / Creating / Failed status.
aws sagemaker delete-endpoint --endpoint-name x★
Stop billing — always delete unused endpoints!
aws sagemaker-runtime invoke-endpoint --endpoint-name x --body …★
Call the endpoint; the response body is written to the output file given as the final positional argument.
aws sagemaker create-pipeline / start-pipeline-execution
Create + trigger a SageMaker Pipeline from JSON definition.
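
The describe-endpoint and invoke-endpoint commands above have direct boto3 equivalents. This is a sketch of the wait-then-invoke flow with the clients passed in as parameters, so it runs without AWS access; the helper names, poll interval, and text/csv content type are assumptions.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=60,
                      sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService or Failed."""
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"endpoint {endpoint_name} failed to deploy")
        sleep(poll_seconds)  # still Creating or Updating
    raise TimeoutError(f"endpoint {endpoint_name} not InService after {max_polls} polls")


def predict_csv(runtime_client, endpoint_name, csv_row):
    """Send one CSV row to a real-time endpoint and return the response text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_row,
    )
    # Body is a streaming object; read it once and decode.
    return response["Body"].read().decode("utf-8")


# Usage with real clients (not executed here):
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm_rt = boto3.client("sagemaker-runtime")
#   wait_for_endpoint(sm, "x")
#   print(predict_csv(sm_rt, "x", "34,52000"))
#   sm.delete_endpoint(EndpointName="x")  # stop billing when done
```

boto3 also ships a built-in waiter, sm.get_waiter("endpoint_in_service"), which covers the polling loop if custom handling is not needed.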

★ AWS AI/ML Services Landscape (beyond SageMaker)

Bedrock★
Managed FMs via API — Claude, Llama, Mistral, Titan, Stable Diffusion. No ML infra needed; pay-per-token.
Rekognition
Computer vision: detect labels, faces, text, unsafe content, celebrities. Fully managed, no training required.
Comprehend
NLP: sentiment, entities (people/org/location), key phrases, PII detection, topic modeling. Comprehend Medical for healthcare.
Textract
OCR++ — extracts text, forms (key-value), and tables from PDFs/images. Beyond basic OCR.
Transcribe
Speech-to-text with speaker diarization, custom vocabulary, PII redaction.
Polly
Text-to-speech with neural voices and SSML markup control.
Translate
Neural machine translation; custom terminology support.
Lex
Conversational AI — build chatbots and voice bots with the same tech as Alexa.
Forecast
Time-series forecasting as a service; no ML expertise needed. Similar to DeepAR but fully managed.
Personalize
Real-time recommendations (like Amazon.com's engine) — content, products, similar items.
Kendra
Intelligent enterprise search with semantic understanding across internal documents.
Amazon Q Developer
AI coding assistant (formerly CodeWhisperer) — inline completions, code reviews, CLI helpers.

Workflow patterns & decision guides

Choose your inference mode

SageMaker Pipelines flow

SageMaker AI feature map by lifecycle phase

Worth memorizing