TeenyTinyLlama
TeenyTinyLlama is a pair of compact language models (160M and 460M parameters) based on the Llama 2 architecture and trained on a Brazilian Portuguese corpus.
Paper • arXiv:2401.16640 • Published • 10 upvotes
TeenyTinyLlama-Chat
🦙 • 4 • Generate text responses to questions or instructions
nicholasKluge/TeenyTinyLlama-460m
Text Generation • 0.5B params • Updated • 3.55k downloads • 11 likes • Note: 460-million-parameter version of TeenyTinyLlama. See the text-generation sketch after this group of models.
nicholasKluge/TeenyTinyLlama-460m-awq
Text Generation • 0.1B params • Updated • 13 downloads • 1 like • Note: 460-million-parameter version of TeenyTinyLlama, 4-bit quantized via AWQ.
nicholasKluge/TeenyTinyLlama-460m-Chat
Text Generation • 0.5B params • Updated • 51 downloads • 3 likes • Note: 460-million-parameter version of TeenyTinyLlama fine-tuned on version 2.0 of the Instruct-Aira dataset.
nicholasKluge/TeenyTinyLlama-460m-Chat-awq
Text Generation • 0.1B params • Updated • 21 downloads • 1 like • Note: 460-million-parameter version of TeenyTinyLlama fine-tuned on version 2.0 of the Instruct-Aira dataset, 4-bit quantized via AWQ.
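The text-generation checkpoints above load through the standard Transformers pipeline. The sketch below is illustrative rather than taken from the collection: the model ID is real, but the Portuguese prompt and the sampling settings are assumptions, and the -Chat and -awq variants expect the prompt format and quantization dependencies described on their own model cards.

```python
# Minimal text-generation sketch for TeenyTinyLlama-460m; the sampling
# settings here are illustrative assumptions, not author-recommended values.
from transformers import pipeline

generator = pipeline("text-generation", model="nicholasKluge/TeenyTinyLlama-460m")

output = generator(
    "A culinária brasileira é conhecida por",  # Portuguese prompt; the model is monolingual
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
)
print(output[0]["generated_text"])
```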
nicholasKluge/TeenyTinyLlama-460m-HateBR
Text Classification • 0.4B params • Updated • 8 downloads • 2 likes • Note: 460-million-parameter version of TeenyTinyLlama fine-tuned on the HateBR dataset.
nicholasKluge/TeenyTinyLlama-460m-FaQuAD-NLI
Text Classification • 0.4B params • Updated • 13 downloads • Note: 460-million-parameter version of TeenyTinyLlama fine-tuned on the FaQuAD-NLI dataset.
nicholasKluge/TeenyTinyLlama-460m-IMDB
Text Classification • 0.4B params • Updated • 12 downloads • 1 like • Note: 460-million-parameter version of TeenyTinyLlama fine-tuned on the IMDB dataset. See the text-classification sketch after this group of models.
nicholasKluge/TeenyTinyLlama-460m-Assin2
Text Classification • 0.4B params • Updated • 10 downloads • Note: 460-million-parameter version of TeenyTinyLlama fine-tuned on the Assin2 dataset.
nicholasKluge/TeenyTinyLlama-460m-AgNews
Text Classification • 0.4B params • Updated • 15 downloads • Note: 460-million-parameter version of TeenyTinyLlama fine-tuned on the AgNews dataset.
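The classification fine-tunes above share one loading pattern; only the checkpoint name changes. In the sketch below, the choice of the IMDB checkpoint, the example review, and the printed label names are assumptions to be checked against each model card.

```python
# Sentiment-classification sketch using the IMDB fine-tune; the HateBR,
# FaQuAD-NLI, Assin2, and AgNews checkpoints are loaded the same way.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="nicholasKluge/TeenyTinyLlama-460m-IMDB",
)

# Example Portuguese review; label names come from the checkpoint's own config.
print(classifier("O filme é ótimo, recomendo a todos!"))
```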
nicholasKluge/TeenyTinyLlama-160m
Text Generation • 0.2B params • Updated • 45 downloads • 7 likes • Note: 160-million-parameter version of TeenyTinyLlama.
nicholasKluge/TeenyTinyLlama-160m-HateBR
Text Classification • 0.1B params • Updated • 28 downloads • Note: 160-million-parameter version of TeenyTinyLlama fine-tuned on the HateBR dataset.
nicholasKluge/TeenyTinyLlama-160m-FaQuAD-NLI
Text Classification • 0.1B params • Updated • 9 downloads • Note: 160-million-parameter version of TeenyTinyLlama fine-tuned on the FaQuAD-NLI dataset.
nicholasKluge/TeenyTinyLlama-160m-IMDB
Text Classification • 0.1B params • Updated • 14 downloads • 1 like • Note: 160-million-parameter version of TeenyTinyLlama fine-tuned on the IMDB dataset.
nicholasKluge/TeenyTinyLlama-160m-Assin2
Text Classification • 0.1B params • Updated • 9 downloads • Note: 160-million-parameter version of TeenyTinyLlama fine-tuned on the Assin2 dataset.
nicholasKluge/TeenyTinyLlama-160m-AgNews
Text Classification • 0.1B params • Updated • 8 downloads • Note: 160-million-parameter version of TeenyTinyLlama fine-tuned on the AgNews dataset.
nicholasKluge/Pt-Corpus
Viewer • Updated • 5.77M rows • 203 downloads • 3 likes • Note: Pt-Corpus is a concatenation of portions of several Brazilian Portuguese datasets found on the Hub, totaling approximately 4.1B tokens. This version contains no instructional content. See the dataset-loading sketch after the corpus entries.
nicholasKluge/Pt-Corpus-tokenized
Viewer • Updated • 2.02M rows • 171 downloads • Note: Tokenized version of Pt-Corpus (encoded with the TeenyTinyLlama tokenizer).
nicholasKluge/Pt-Corpus-Instruct
Viewer • Updated • 10.6M rows • 783 downloads • 3 likes • Note: Pt-Corpus Instruct is a concatenation of portions of several Brazilian Portuguese datasets found on the Hub, totaling approximately 6.2B tokens. This version of the corpus includes conversational and general instructional data.
nicholasKluge/Pt-Corpus-Instruct-tokenized
Viewer • Updated • 3.06M rows • 299 downloads • Note: Tokenized version of Pt-Corpus-Instruct (encoded with the TeenyTinyLlama tokenizer).
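A sketch of loading the pretraining corpora with the Datasets library. Streaming is used only to avoid downloading the multi-billion-token corpus in full; the "train" split name and the "text" / token-ID column names are assumptions to verify on the dataset cards.

```python
# Streaming sketch for the pretraining corpora; the "train" split and the
# column names below are assumptions, so the keys are printed for inspection.
from datasets import load_dataset

# Raw text corpus (~4.1B tokens); streaming avoids a full local download.
raw = load_dataset("nicholasKluge/Pt-Corpus", split="train", streaming=True)
sample = next(iter(raw))
print(list(sample.keys()))                # actual column names
print(str(sample.get("text", ""))[:200])  # first 200 characters, if "text" exists

# Tokenized variant, already encoded with the TeenyTinyLlama tokenizer.
tokenized = load_dataset("nicholasKluge/Pt-Corpus-tokenized", split="train", streaming=True)
print(list(next(iter(tokenized)).keys()))
```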
nicholasKluge/instruct-aira-dataset-v2
Viewer • Updated • 163k rows • 118 downloads • 5 likes • Note: A collection of single-turn conversations between an assistant and a user.
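A sketch for inspecting the instruction-tuning data. The collection does not list the dataset's splits or columns, so the code prints the structure rather than assuming it; if the dataset exposes language-specific configurations, a configuration name would need to be passed to load_dataset.

```python
# Inspection sketch for instruct-aira-dataset-v2; splits and columns are
# printed rather than assumed, since this collection does not list them.
from datasets import load_dataset

aira = load_dataset("nicholasKluge/instruct-aira-dataset-v2")
print(aira)                    # available splits and their columns

first_split = next(iter(aira.values()))
print(first_split[0])          # one single-turn prompt/response record
```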