---
license: gemma
language:
- en
- te
base_model:
- atsuki-yamaguchi/gemma-2-9b-te-30K-align
---

# Gemma2 9B for Telugu + ElChat

This model is built on top of [atsuki-yamaguchi/gemma-2-9b-te-30K-align](https://huggingface.co/atsuki-yamaguchi/gemma-2-9b-te-30K-align).
It applies the ElChat approach to mitigate catastrophic forgetting of the original capabilities of the source Gemma2 model.

## Model Details

* **Vocabulary**: This model adds 100 target-language (Telugu) tokens to the source vocabulary (a quick way to check this is sketched after this list).
* **Target vocabulary initialization**: The embedding weights for the target tokens were initialized using Align.
* **Training**: This model was additionally pre-trained on 30K target-language sentences sampled from CC-100. Training used the 2x2LS/MTP/512 strategies introduced in the paper.
* **Post-hoc adaptation**: This model used ElChat, a training-free, post-hoc method. See https://arxiv.org/abs/2412.11704 for details.

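The vocabulary expansion above can be inspected directly. A minimal sketch, assuming you have accepted the Gemma license and can download both tokenizers from the Hub:

```python
from transformers import AutoTokenizer

# Load the adapted tokenizer and, for comparison, the original Gemma2 tokenizer.
adapted = AutoTokenizer.from_pretrained("atsuki-yamaguchi/gemma-2-9b-te-30K-align-merge")
base = AutoTokenizer.from_pretrained("google/gemma-2-9b")

# The adapted tokenizer should report roughly 100 more tokens than the base one.
print("base vocab size:   ", len(base))
print("adapted vocab size:", len(adapted))
print("difference:        ", len(adapted) - len(base))
```
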
## Model Description

- **Language:** Telugu
- **License:** Gemma Terms of Use
- **Fine-tuned from model:** google/gemma-2-9b

## Model Sources

- **Repository:** https://github.com/gucci-j/lowres-cve
- **Paper:** https://arxiv.org/abs/2406.11477

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "atsuki-yamaguchi/gemma-2-9b-te-30K-align-merge"
)
tokenizer = AutoTokenizer.from_pretrained(
    "atsuki-yamaguchi/gemma-2-9b-te-30K-align-merge"
)
```

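Once loaded, the model can be used like any other causal LM in Transformers. The sketch below is illustrative: the Telugu prompt, the bf16 dtype, and the generation settings are assumptions, not recommendations from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "atsuki-yamaguchi/gemma-2-9b-te-30K-align-merge"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to reduce memory use
    device_map="auto",
)

# Example Telugu prompt ("Hello, how are you?"); any target-language text works.
prompt = "నమస్కారం, మీరు ఎలా ఉన్నారు?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
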
## Citation

```
@article{yamaguchi-etal-2024-effectively,
    title={How Can We Effectively Expand the Vocabulary of LLMs with 0.01GB of Target Language Text?},
    author={Atsuki Yamaguchi and Aline Villavicencio and Nikolaos Aletras},
    journal={ArXiv},
    volume={abs/2406.11477},
    year={2024},
    url={https://arxiv.org/abs/2406.11477},
}
```