Collections
Discover the best community collections!
Collections including paper arxiv:2401.02385
-
Llemma: An Open Language Model For Mathematics
Paper • 2310.10631 • Published • 56
Mistral 7B
Paper • 2310.06825 • Published • 55
Qwen Technical Report
Paper • 2309.16609 • Published • 37
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Paper • 2309.11568 • Published • 11
-
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 30
Magicoder: Source Code Is All You Need
Paper • 2312.02120 • Published • 82
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion
Paper • 2312.14327 • Published • 8
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
Paper • 2312.14187 • Published • 50
-
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 24
Neurons in Large Language Models: Dead, N-gram, Positional
Paper • 2309.04827 • Published • 17
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper • 2309.05516 • Published • 10
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Paper • 2309.03907 • Published • 12
-
Attention Is All You Need
Paper • 1706.03762 • Published • 99
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Paper • 2307.08691 • Published • 9
Mixtral of Experts
Paper • 2401.04088 • Published • 161
Mistral 7B
Paper • 2310.06825 • Published • 55
-
NExT-GPT: Any-to-Any Multimodal LLM
Paper • 2309.05519 • Published • 78
Large Language Model for Science: A Study on P vs. NP
Paper • 2309.05689 • Published • 21
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Paper • 2309.06126 • Published • 18
Large Language Models for Compiler Optimization
Paper • 2309.07062 • Published • 24
-
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Paper • 2310.10837 • Published • 11
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 105
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • 2310.16795 • Published • 27
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Paper • 2310.16836 • Published • 14
-
togethercomputer/StripedHyena-Hessian-7B
Text Generation • 8B • Updated • 20 • 66
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Paper • 2312.08618 • Published • 15
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987 • Published • 41
LLM360: Towards Fully Transparent Open-Source LLMs
Paper • 2312.06550 • Published • 57