License Conflict: llama3.2 vs CC BY-NC 4.0

by qiuqiu666 - opened Jun 28


Hi, I’d like to report a potential license conflict in tensorblock/llama3.2-typhoon2-t1-3b-research-preview-GGUF. Based on the model card, this model is distributed under the LLaMA 3.2 Community License. However, the training dataset scb10x/typhoon-t1-3b-research-preview-data is published under the CC BY-NC 4.0 license.

This combination raises potential license compatibility concerns, as LLaMA 3.2 and CC BY-NC 4.0 impose different, and potentially conflicting, restrictions on model use, redistribution, and downstream licensing.

⚠️ Key incompatibilities:

LLaMA 3.2 License:
  • Prohibits relicensing or sublicensing under other licenses (e.g., CC BY-NC)
  • Allows commercial use only below a monthly active user (MAU) threshold (700 million MAU; above that, a separate license from Meta is required), subject to Meta’s Acceptable Use Policy
  • Requires that derivative models retain the LLaMA name and be distributed under the LLaMA 3.2 License

CC BY-NC 4.0 License:
  • Strictly prohibits any commercial use of the dataset and derivative works
  • Requires attribution to the dataset authors
  • May apply NonCommercial restrictions to downstream outputs (including trained models)

This could lead to uncertainty for downstream users regarding:

 • Whether the model can be used for research or commercial applications
 • Whether attribution to the dataset is required (currently not mentioned)
 • Whether commercial usage restrictions under CC BY-NC are being fully inherited

While both licenses limit commercial use, they do so in different and incompatible ways:

  • CC BY-NC 4.0 flatly prohibits commercial use.
  • LLaMA 3.2 allows conditional commercial use under specific thresholds.
  • LLaMA 3.2 prohibits relicensing, meaning the model cannot legally inherit the CC BY-NC license, even where the dataset’s terms would require it.

This makes it legally unclear how the model may be used, and could result in a violation of one or both upstream licenses.

🔹 Suggestion:

To help clarify the licensing situation and ensure alignment with upstream terms, here are a few options to consider:

1. Clearly document in the model card or README that the model was trained on CC BY-NC 4.0–licensed data, and is thus subject to non-commercial use only.
2. Add attribution for the dataset (e.g., a "Data sources" section with license info and dataset link).
3. Clarify that even though the model is released under the LLaMA 3.2 License, users must also comply with the dataset’s non-commercial use requirement.
4. If commercial use is desired, consider retraining the model on datasets with more permissive licenses.

Hope this helps! Let me know if you have any questions or need more info.

Thanks for your attention!

qiuqiu666

Jul 10

Your reply would be much appreciated!
