---
datasets:
- HuggingFaceFW/fineweb-edu
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# DMaS-LLaMa-Lite-step-100
This repository provides access to **DMaS-LLaMa-Lite-step-100**, a 1.7-billion-parameter language model based on the LLaMa architecture. The model has been trained from scratch as part of the DMaS-LLaMa-Lite project using approximately 20 billion tokens of high-quality educational content.
## Model Overview
- **Architecture**: LLaMa-based
- **Parameters**: 1.7B (36 layers, 32 attention heads, RMSNorm)
- **Tokenizer**: GPT-2 tokenizer
- **Training Data**: FineWeb-Edu subset (educational text)
- **Training Steps**: 100
- **Optimizer**: AdamW with linear warmup and decay
- **Hardware**: Trained on 1-2 RTX A6000 GPUs with PyTorch DDP
- **Dataset Source**: [FineWeb-Edu Dataset](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
The training process emphasizes qualitative improvements in coherence, fluency, and factual grounding, achieving competitive results despite using fewer training tokens than larger-scale models.
This checkpoint represents the model's state at **100 training steps**. Validation loss and downstream performance benchmarks demonstrate notable early improvements in text fluency and alignment with prompts.
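To verify the architecture settings listed above, you can inspect the checkpoint's configuration without downloading the weights. This is a minimal sketch; the field names are the standard Transformers `LlamaConfig` attributes, and the expected values come from the overview above:

```python
from transformers import AutoConfig

# Fetch only the configuration (no weights) from the Hugging Face Hub
config = AutoConfig.from_pretrained("McGill-DMaS/DMaS-LLaMa-Lite-step-100")

# Standard LlamaConfig fields for depth and attention heads
print(config.num_hidden_layers)    # expected: 36
print(config.num_attention_heads)  # expected: 32
```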
## Training Code
The training script, including configurations and instructions, is open-sourced and available here:
📄 **[DMaS-LLaMa-Lite Training Code](https://github.com/McGill-DMaS/DMaS-LLaMa-Lite-Training-Code)**
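As a rough illustration of the optimizer setup described in the overview (AdamW with linear warmup and decay), the sketch below uses the standard `get_linear_schedule_with_warmup` utility from Transformers. The learning rate, warmup length, and total step count are placeholders, not the values from the actual training script; consult the linked repository for the real configuration.

```python
import torch
from transformers import AutoModelForCausalLM, get_linear_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("McGill-DMaS/DMaS-LLaMa-Lite-step-100")

# AdamW with linear warmup followed by linear decay; the hyperparameter
# values below are illustrative placeholders, not those of the actual run.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=2000, num_training_steps=100_000
)

# In the training loop, call optimizer.step() then scheduler.step() each iteration.
```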
## Usage
You can load the model with the Hugging Face Transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "McGill-DMaS/DMaS-LLaMa-Lite-step-100"

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode the prompt and generate a continuation of up to 50 tokens
inputs = tokenizer("The Pyramids of Giza in Egypt are some of the oldest man-made structures in the world.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
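Alternatively, the same generation can be run through the high-level `pipeline` API. This is a minimal sketch using standard Transformers calls; sampling parameters are left at their defaults:

```python
from transformers import pipeline

# The text-generation pipeline wraps the tokenizer and model in one object
generator = pipeline("text-generation", model="McGill-DMaS/DMaS-LLaMa-Lite-step-100")

result = generator(
    "The Pyramids of Giza in Egypt are some of the oldest man-made structures in the world.",
    max_length=50,
)
print(result[0]["generated_text"])
```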
## Citation
If you use this model or its training insights in your work, please cite the following [paper](https://arxiv.org/abs/2412.13335):
```bibtex
@inproceedings{li2025training,
  author={Li, Miles Q. and Fung, Benjamin C. M. and Huang, Shih-Chia},
  booktitle={2025 International Joint Conference on Neural Networks (IJCNN)},
  title={Training Dynamics of a 1.7B LLaMa Model: A Data-Efficient Approach},
  year={2025},
  pages={1-10},
  doi={10.1109/IJCNN64981.2025.11228044}
}
```
## License
This model and code are released under the **Apache License 2.0**. Please check the respective repositories for detailed terms.