Recent Activity

🔥 NEW! BLiSS: Evaluating Bilingual Learner Competence in Second Language Small Language Models

Cross-lingual extensions of the BabyLM Shared Task beyond English incentivise the development of Small Language Models that simulate a much wider range of language acquisition scenarios, including code-switching, simultaneous and successive bilingualism, and second language acquisition. However, to our knowledge, there is no benchmark of the formal competence of cognitively-inspired models of L2 acquisition (L2LMs). To address this, we introduce the Benchmark of Learner Interlingual Syntactic Structure (BLiSS).

BLiSS consists of 1.5M naturalistic minimal pairs derived from errorful sentence–correction pairs in parallel learner corpora. These pairs capture systematic patterns of learner language that are overlooked by standard benchmarks of the formal competence of Language Models. We use them to evaluate L2LMs trained under a variety of training regimes on specific properties of L2 learner language, providing a linguistically-motivated framework for controlled measurement of the interlanguage competence of L2LMs.
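Evaluation on minimal pairs of this kind typically reduces to comparing the probability a model assigns to each member of a pair. Below is a minimal sketch of such scoring with a Hugging Face causal LM; the model name (gpt2) and the example pair are stand-ins for illustration, not the released BLiSS artifacts or evaluation code:

```python
# Minimal sketch: scoring one minimal pair with a causal LM.
# The model and the sentence pair are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Sum of token log-probabilities the model assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Each position t predicts token t+1, so shift logits and targets.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

# One minimal pair: an errorful learner sentence and its correction.
errorful = "She go to school every day."
corrected = "She goes to school every day."
# The model "prefers" the correction if it assigns it higher probability.
print(sentence_log_prob(corrected) > sentence_log_prob(errorful))
```

Over a full benchmark, accuracy is then the fraction of pairs where the expected member is preferred; which member counts as expected depends on whether target-language or interlanguage competence is being measured.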

Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies. Available from: https://arxiv.org/abs/2410.22886.

Salhan et al. (2024) create age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement Small-Scale Language Models (SSLMs) and acquisition-inspired curricula cross-lingually.
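One simple way to realise such an acquisition-inspired curriculum is to order the training corpus by the child's age at recording and feed it to the trainer in stages. A minimal sketch, assuming (age_in_months, utterance) records; the records and age-band boundaries below are illustrative assumptions, not the paper's exact recipe:

```python
# Hypothetical sketch of age-ordered curriculum staging for CDS data.
# The records and age-band boundaries are illustrative assumptions.
records = [
    (18, "look at the ball"),
    (48, "I think the ball rolled under the sofa"),
    (30, "where did the ball go"),
]

def curriculum_stages(records, boundaries=(24, 36)):
    """Group utterances into age bands, yielding younger (simpler) data first."""
    ordered = sorted(records, key=lambda r: r[0])
    lo = float("-inf")
    for hi in list(boundaries) + [float("inf")]:
        yield [utt for age, utt in ordered if lo <= age < hi]
        lo = hi

for stage, utterances in enumerate(curriculum_stages(records)):
    # In pretraining, each stage would be exhausted before moving to the next.
    print(stage, utterances)
```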

The MAO-CHILDES dataset contains extracted orthographic datasets for French, German, Japanese, and Chinese, as well as several other lower-resource languages. It is part of a wider effort towards cognitively-inspired pretraining using resources from Language Acquisition research.
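If the dataset is hosted on the Hub, one language portion could be loaded along these lines; the repository id and config name here are placeholders, not the released identifiers:

```python
# Hypothetical sketch: loading one language split of a MAO-CHILDES-style dataset.
# "org/mao-childes" and "french" are placeholder names, not released ids.
from datasets import load_dataset

ds = load_dataset("org/mao-childes", name="french", split="train")
print(ds[0])
```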

You can also find pretrained BabyLMs for French, German, Japanese, and Chinese, with three different cognitively-inspired curriculum learning strategies available in the branches of each language-specific BabyLM repository.
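On the Hub, a specific branch of a model repository is selected with the `revision` argument of `from_pretrained`. A minimal sketch; the repository id and branch name are placeholders for the actual language-specific BabyLM repos and curriculum branches:

```python
# Hypothetical sketch: loading one curriculum variant from a model-repo branch.
# "org/babylm-fr" and "curriculum-growth" are placeholder names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "org/babylm-fr", revision="curriculum-growth"
)
tokenizer = AutoTokenizer.from_pretrained(
    "org/babylm-fr", revision="curriculum-growth"
)
```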