Question 1

What are the key security risks when deploying ML models to production?

Accepted Answer

The key security risks in ML deployment include: model artifact tampering (attackers replacing model weights with backdoored versions), inference API abuse (unauthorized access, rate limit bypass, data exfiltration through queries), container escape vulnerabilities (breaking out of the model serving container to access host systems), supply chain attacks (compromised dependencies in the model serving stack), serialization vulnerabilities (pickle deserialization attacks in PyTorch/sklearn models), GPU memory remnants (sensitive data from previous inference requests persisting in GPU memory that was not cleared), and side-channel attacks (timing or power analysis revealing model architecture or training data).

Question 2

How do I secure model artifacts in a production pipeline?

Accepted Answer

Secure model artifacts with: cryptographic signing using GPG or Sigstore to verify model integrity before deployment, encryption at rest using AES-256 in your model registry, checksums (SHA-256) embedded in deployment manifests that are verified on load, access-controlled model registries (MLflow, Weights & Biases, or cloud-native like SageMaker Model Registry) with audit logging, immutable storage for production model versions, and separation of training and serving environments so training credentials cannot access production model stores.
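The "checksums verified on load" step can be sketched in a few lines of Python. This is a minimal illustration, not a full signing workflow: the function names are hypothetical, and it assumes the expected SHA-256 value comes from a deployment manifest that is itself trusted (for example, because it is signed).

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's digest differs from the manifest value."""
    actual = sha256_of_file(path)
    # compare_digest performs a constant-time comparison of the two hex strings.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"checksum mismatch for {path}")
```

Calling verify_artifact before the model file is opened by any loader keeps a tampered artifact from ever being deserialized. A checksum only proves the file matches the manifest, so the manifest needs its own protection, which is where GPG or Sigstore signing comes in.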

Question 3

Should ML inference endpoints use mTLS or API keys?

Accepted Answer

Use both in a defense-in-depth approach. API keys provide application-level authentication and are easy to implement and rotate. mTLS (mutual TLS) provides transport-level authentication ensuring both client and server verify each other's identity, preventing man-in-the-middle attacks. For internal services within a Kubernetes cluster, use mTLS via a service mesh (Istio, Linkerd). For external-facing endpoints, use API keys or OAuth2 tokens at the application layer plus TLS for transport encryption. Never expose inference endpoints without authentication, even in internal networks, because lateral movement from a compromised service could reach your model.
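The application-layer half of this setup might look like the following Python sketch. It assumes mTLS is terminated elsewhere (for example, by the service mesh) and shows only the API key check. The key store, client id, and function name are hypothetical; a real deployment would read key digests from a secrets manager rather than source code.

```python
import hashlib
import hmac

# Hypothetical key store: client id -> SHA-256 hex digest of that client's API key.
# Storing digests means a leaked table does not directly reveal usable keys.
API_KEY_HASHES = {
    "batch-scoring": hashlib.sha256(b"example-key-do-not-use").hexdigest(),
}


def is_authorized(client_id: str, presented_key: str) -> bool:
    """Return True only if presented_key hashes to the digest stored for client_id."""
    stored = API_KEY_HASHES.get(client_id)
    if stored is None:
        return False
    presented = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how much of the digest matched.
    return hmac.compare_digest(presented, stored)
```

Rotating a key in this scheme means replacing one digest in the store, which is part of why API keys are described above as easy to rotate.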

Question 4

How do I prevent pickle deserialization attacks in model serving?

Accepted Answer

Pickle deserialization is one of the most dangerous vulnerabilities in Python ML deployments, because loading a pickle file can execute arbitrary code. Mitigations include: use safe serialization formats (ONNX, SavedModel, SafeTensors) instead of pickle when possible, never load models from untrusted sources, scan pickle files with fickling or picklescan to detect malicious payloads before loading, restrict deserialization to an allowlist of permitted classes, run model loading in a sandboxed environment with restricted filesystem and network access, verify model checksums before loading, and use seccomp profiles to limit system calls available during deserialization. PyTorch's torch.load supports a weights_only=True parameter that restricts unpickling to tensors and other allowlisted types, which blocks the usual arbitrary code execution payloads.
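Allowlist-based deserialization can be illustrated with the standard library alone, by overriding find_class on pickle.Unpickler. The sketch below is an assumption about how one might apply the idea, not how fickling or picklescan work internally. The allowlist contents and the names AllowlistUnpickler and safe_loads are hypothetical, and a real model format would need its own, carefully reviewed list.

```python
import io
import pickle

# Illustrative allowlist of (module, name) pairs that pickles may reference.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Deserialize data, allowing builtin containers and allowlisted globals only."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

A payload that tries to reference os.system or similar callables fails with UnpicklingError before anything runs. This narrows the attack surface but is not a complete defense, so it is best combined with the sandboxing and checksum steps listed above.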

Question 5

What monitoring should I implement for production ML systems?

Accepted Answer

Production ML monitoring should cover: inference latency and throughput (detect degradation attacks or resource exhaustion), prediction distribution drift (model outputs shifting from expected patterns), input data validation failures (spike in malformed or adversarial inputs), authentication failures and rate limit hits (brute force or extraction attempts), model version mismatches (unexpected model swaps), resource utilization anomalies (GPU memory, CPU spikes indicating exploitation), error rate monitoring (sudden increases may indicate attacks), and audit logging of all prediction requests with hashed inputs, outputs, model version, and user identity for forensic analysis.
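The audit logging item could be sketched as one JSON line per prediction, as below. The field names and the audit_record function are hypothetical, and the sketch assumes the raw input is available as bytes. Only a SHA-256 digest of the input is stored, matching the "hashed inputs" requirement.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(user_id: str, model_version: str, raw_input: bytes, output) -> str:
    """Build one JSON audit line; the raw input is kept only as a SHA-256 digest."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "model_version": model_version,
        "input_sha256": hashlib.sha256(raw_input).hexdigest(),
        "output": output,
    }
    # sort_keys gives a stable field order, which simplifies diffing and parsing.
    return json.dumps(record, sort_keys=True)
```

Logging the model version on every record also makes the "model version mismatches" check possible after the fact. For low-entropy inputs a plain hash can be guessed by brute force, so a keyed hash (HMAC) may be a better fit in those cases.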

Secure ML Deployment Guide

Select Your Deployment Environment

Why ML Deployment Security Differs from Traditional Software

Container Security for Model Serving

API Gateway and Rate Limiting

Secrets Management and Model Registries

Network Policies and Segmentation

Monitoring and Incident Response

Supply Chain Security

Frequently Asked Questions

Michael Lip
