Ben-Zippor committed on
Commit 6ee049d · verified · 1 parent: 3d2a7df

Update README.md

Files changed (1)
  1. README.md +13 -13
README.md CHANGED
@@ -1,15 +1,15 @@
- ---
- title: AI‑Culture‑Commons
- emoji: 📚
- colorFrom: indigo
- colorTo: gray
- sdk: static
- pinned: true
- thumbnail: >-
- https://cdn-uploads.huggingface.co/production/uploads/678d64ee7967054e64970908/gxJc4iGjGjE348jp_a7Lv.jpeg
- short_description: Multilingual cultural corpora for AI research
- license: cc-by-4.0
- ---
+ ---
+ title: AI‑Culture‑Commons
+ emoji: 📚
+ colorFrom: indigo
+ colorTo: gray
+ sdk: static
+ pinned: true
+ thumbnail: >-
+ https://cdn-uploads.huggingface.co/production/uploads/678d64ee7967054e64970908/PHTcXWQoX7_2_9CjFoHlJ.jpeg
+ short_description: Multilingual cultural corpora for AI research
+ license: cc-by-4.0
+ ---
 
  # AI‑Culture‑Commons
  AI‑Culture‑Commons curates multilingual cultural corpora for language‑model research.
@@ -24,7 +24,7 @@ Our repositories provide models with deep philosophical-intellectual context, di
  | **Multilingual Culture Corpus** | 16M words | 12 ALIGNED languages | HTML · CSV · DOLMA JSONL | CC‑BY‑4.0 | [![DOI](https://zenodo.org/badge/1021100370.svg)](https://doi.org/10.5281/zenodo.16001657) |
  | **Project Websites Raw** | 160MB | 12 ALIGNED languages | ZIP (HTML + images + CSS) | CC‑BY‑4.0 | [![DOI](https://zenodo.org/badge/1021100223.svg)](https://doi.org/10.5281/zenodo.16001641) |
 
- **Key Features:**
+ ## Key Features
  - **Perfect Alignment**: All 12 languages contain identical content with exact same complex HTML structure. All datasets include both pure text and HTML source files
  - **AI-Optimized**: Designed specifically for training multilingual AI systems
  - **Truly Open**: [CC-BY-4.0 license](https://creativecommons.org/licenses/by/4.0/) - use freely, even commercially