Original Research

ML Model License Guide — Understanding Open Source AI Licenses

Comprehensive comparison of ML model licenses with a permissions matrix. Covers Apache 2.0, MIT, Llama Community, Gemma, CC-BY-NC, and 10+ other license types used in open-source AI.

By Michael Lip · Updated April 2026

Methodology

License terms are sourced from the official license text of each model as published on Hugging Face, GitHub, or the provider's website. Permissions are interpreted based on the plain text of each license. This is not legal advice — consult a lawyer for production use. Model examples are current as of April 2026. Stack Overflow data was fetched via the public API on April 10, 2026.

License Permissions Matrix

License Commercial Use Fine-tuning Redistribution Patent Grant Attribution Req. Example Models
Apache 2.0 Yes Yes Yes Yes Yes Mistral 7B, Falcon 40B, Grok-1, Arctic, Qwen 2.5
MIT Yes Yes Yes No Yes Phi-3, Phi-4, OLMo
Llama 3.1 Community Conditional Yes Conditional Yes Yes Llama 3, Llama 3.1, Llama 3.3
Gemma License Conditional Yes Conditional Yes Yes Gemma 2, Gemma 2 27B
CC-BY-4.0 Yes Yes Yes No Yes OpenAssistant, Dolly 2.0 (data)
CC-BY-NC-4.0 No Yes Non-commercial No Yes Command R+ (original), some Stability models
OpenRAIL-M Conditional Yes Conditional No Yes BLOOM, Stable Diffusion 2.x
Stability Community Conditional Yes Conditional No Yes Stable Diffusion 3, SDXL Turbo
DeepSeek License Conditional Yes Conditional No Yes DeepSeek V3, DeepSeek R1
Mistral Research No Yes No No Yes Mistral Large (weights-only)
GPL-3.0 Copyleft Yes Must share alike Yes Yes GPT-J training code, some tools
Yi License Conditional Yes Conditional No Yes Yi-Large, Yi-34B
Cohere C4AI Conditional Yes Conditional No Yes Command R, Command R+ (v2)

License Conditions Detail

License Key Restriction User Limit Derivative Naming
Llama 3.1 Community700M MAU threshold for separate license700M MAUMust include "Llama" in name
Gemma LicenseProhibited harmful uses; redistribution requires same licenseNone specifiedNo requirement
OpenRAIL-MBehavioral use restrictions (no harm, surveillance, discrimination)NoneNo requirement
Stability CommunityRevenue threshold; no competing products$1M revenueNo requirement
DeepSeek LicenseNo competing model training; usage limitsVariesNo requirement
CC-BY-NC-4.0No commercial use of any kindN/ANo requirement

Frequently Asked Questions

Can I use Llama 3 commercially?

Yes, the Llama 3/3.1 Community License allows commercial use for organizations with fewer than 700 million monthly active users. Above that threshold, you need a separate license from Meta. You must include the attribution notice "Built with Llama" in your product and comply with the Acceptable Use Policy, which prohibits certain harmful use cases.

What is the most permissive ML model license?

Apache 2.0 and MIT are the most permissive licenses. Apache 2.0 grants patent rights and allows commercial use, modification, and redistribution with minimal requirements (attribution and license notice). MIT is even simpler with just attribution required. Models using Apache 2.0 include Mistral 7B, Falcon 40B, Grok-1, and Arctic.

What is the difference between open-weights and open-source for AI?

Open-weights means the model weights are downloadable and usable, but training data, training code, and evaluation code may not be available. True open-source means all components are available: weights, training data, training code, and documentation. Most "open-source" LLMs like Llama and Mistral are technically open-weights. OLMo from AI2 is a truly open-source model.

Can I fine-tune and redistribute a model with a custom license?

It depends on the specific license. Apache 2.0 and MIT allow unrestricted fine-tuning and redistribution. The Llama Community License allows fine-tuning and redistribution but requires naming derivative models with "Llama" and maintaining the license. The Gemma License allows fine-tuning but has redistribution conditions. CC-BY-NC licenses prohibit any commercial use of fine-tuned derivatives.

Which license should I choose for my own ML model?

If you want maximum adoption, use Apache 2.0 — it provides patent protection and is well-understood by legal teams. If you want simplicity, use MIT. If you want to prevent large companies from competing with your model, consider a license with a user threshold (like Llama's 700M MAU limit). If you want attribution for research, use CC-BY-4.0. Avoid creating custom licenses unless you have legal counsel.