mkluczek (Marcin Kluczek)

updated a dataset 17 days ago

Major-TOM/Core-S2RGB-SigLIP

Viewer • Updated 16 days ago • 20.3M • 533 • 9

liked a Space 19 days ago

Joy Caption Beta One

🖼

819

Generate captions for images with various styles and options

liked a dataset 4 months ago

wangyi111/Copernicus-Embed-025deg

Preview • Updated Jul 31 • 247 • 2

liked a model 4 months ago

apple/FastVLM-0.5B

Text Generation • 0.8B • Updated Sep 3 • 5.96k • 361

upvoted an article 6 months ago

Article

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

Jul 9 • 739

liked a dataset 6 months ago

embed2scale/SSL4EO-S12-downstream

Updated May 23 • 2.15k • 3

upvoted an article 6 months ago

Article

Training and Finetuning Sparse Embedding Models with Sentence Transformers v5

Jul 1 • 132

liked a Space 7 months ago

TerraMind Blue-Sky Challenge

🌍

29

Submit geospatial AI ideas for a bi-monthly award

upvoted 3 articles 7 months ago

Article

Vision Language Models (Better, faster, stronger)

+3

May 12 • 572

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

+5

May 21 • 244

Article

🪆 Introduction to Matryoshka Embedding Models

+1

Feb 23, 2024 • 184

liked a dataset 7 months ago

Major-TOM/Core-S2L1C-DeCUR

Viewer • Updated Apr 28 • 27.1M • 220 • 5

posted an update 8 months ago

Post

372

Expansion of Global and Dense Open Embeddings Dataset of Earth 🌍

We have updated our previous release of the Major TOM embeddings dataset with three models: MMEarth, DeCUR-S2, and DeCUR-S1. The dataset is developed in collaboration with CloudFerro S.A., asterisk labs, and Φ-lab, European Space Agency (ESA). Together with @mikonvergence and Jędrzej S. Bojanowski, we are extending the open-access collection of Copernicus embeddings built at global scale, providing dense coverage across the entire acquisition area of the Sentinel-1 and Sentinel-2 sensors.

Total embedding resources after the update:
- 51 TB of AI-embeddings generated from processed Sentinel data,
- over 40 billion embedding vectors,
- processing of 147 TB of raw satellite data,
- analysis covering more than 15 million Sentinel-1 and Sentinel-2 scenes and more than 16 trillion pixels.
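To put the totals above in proportion, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and treats the quoted lower bounds ("over 40 billion", "more than 15 million", "more than 16 trillion") as exact values, so the results are order-of-magnitude estimates only and not official figures for the release.

```python
# Back-of-the-envelope ratios from the totals quoted in the post.
# Assumptions (not from the source): decimal terabytes, and the quoted
# lower bounds taken as exact counts.
TB = 10**12

embedding_bytes = 51 * TB      # "51 TB of AI-embeddings"
raw_bytes = 147 * TB           # "147 TB of raw satellite data"
num_vectors = 40 * 10**9       # "over 40 billion embedding vectors"
num_scenes = 15 * 10**6        # "more than 15 million ... scenes"
num_pixels = 16 * 10**12       # "more than 16 trillion pixels"

# Average storage per embedding vector, in bytes.
bytes_per_vector = embedding_bytes / num_vectors

# How much smaller the embeddings are than the raw data they were derived from.
size_reduction = raw_bytes / embedding_bytes

# Average number of input pixels summarized by one embedding vector.
pixels_per_vector = num_pixels / num_vectors

# Average number of pixels analyzed per scene.
pixels_per_scene = num_pixels / num_scenes

if __name__ == "__main__":
    print(f"bytes per vector:   {bytes_per_vector:.0f}")
    print(f"size reduction:     {size_reduction:.2f}x")
    print(f"pixels per vector:  {pixels_per_vector:.0f}")
    print(f"pixels per scene:   {pixels_per_scene:.3g}")
```

Because the vector count is a lower bound, the per-vector storage figure is best read as an upper bound on the average.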

This project delivers open and free vectorized expansions of Major TOM datasets available on CREODIAS and Hugging Face, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.

Datasets:
Major-TOM/Core-S2L2A-MMEarth
Major-TOM/Core-S2L1C-DeCUR
Major-TOM/Core-S1RTC-DeCUR

#EarthObservation #AI #CloudFerro #asterisklabs #ESA

published a dataset 8 months ago

Major-TOM/Core-S2L1C-DeCUR

Viewer • Updated Apr 28 • 27.1M • 220 • 5

updated 2 datasets 8 months ago

Major-TOM/Core-S2L1C-DeCUR

Viewer • Updated Apr 28 • 27.1M • 220 • 5

Major-TOM/Core-S2L2A-MMEarth

Updated Apr 26 • 2.02k • 3

liked a model 8 months ago

mespinosami/COP-GEN-Beta

Updated Apr 19 • 3

liked a Space 8 months ago

COP GEN Beta

🌍

8

Official demo for the COP-GEN-Beta model

reacted to mikonvergence's post with 🚀👍 8 months ago

Post

843

🔵 𝐂𝐎𝐏-𝐆𝐄𝐍-𝐁𝐞𝐭𝐚: 𝐔𝐧𝐢𝐟𝐢𝐞𝐝 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐌𝐨𝐝𝐞𝐥𝐥𝐢𝐧𝐠 𝐨𝐟 𝐂𝐎𝐏𝐞𝐫𝐧𝐢𝐜𝐮𝐬 𝐈𝐦𝐚𝐠𝐞𝐫𝐲 𝐓𝐡𝐮𝐦𝐛𝐧𝐚𝐢𝐥𝐬

Today we release a prototype of COP-GEN - a universal generative model for Copernicus data. 𝐂𝐎𝐏-𝐆𝐄𝐍-𝐁𝐞𝐭𝐚 is a model trained globally on the thumbnails of the Major TOM Core datasets, including Sentinel-2 L1C, Sentinel-2 L2A, Sentinel-1 RTC, and COP-DEM GLO-30.

⚖️ 𝐌𝐨𝐝𝐞𝐥 mespinosami/COP-GEN-Beta

📱 𝐃𝐞𝐦𝐨 mikonvergence/COP-GEN-Beta

How is it universal? COP-GEN learns a joint generative process of all modalities, which means that it can reconstruct data from any subset of present observations. 𝐖𝐢𝐭𝐡𝐨𝐮𝐭 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐩𝐞𝐜𝐢𝐟𝐢𝐜𝐚𝐥𝐥𝐲 to perform any of these tasks it can be used to approximate:

✅ Sentinel-1 to Sentinel-2 translation

✅ Elevation estimation from Sentinel-2 or Sentinel-1

✅ Atmospheric Correction (L1C to L2A pipeline)

✅ Atmospheric Generation (L2A to L1C)

✅ ...and any other task involving translation between the supported modalities

On its own, the model can be used as a prior for estimating the data likelihood distribution for Copernicus data. COP-GEN-Beta learns joint, conditional, and marginal distributions within a single unified backbone, which allows flexible sampling of any modality given any condition.
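The relationship between joint, conditional, and marginal distributions can be written out for two modalities. This is standard probability notation, not a description of the model's internals; the symbols $x_a$ and $x_b$ are illustrative placeholders for any two supported modalities (for example Sentinel-1 RTC and Sentinel-2 L2A).

```latex
% Marginal of one modality, obtained by integrating the joint over the other:
p(x_b) = \int p(x_a, x_b)\, \mathrm{d}x_a

% Conditional of one modality given the other (e.g. translation from b to a):
p(x_a \mid x_b) = \frac{p(x_a, x_b)}{p(x_b)}
```

Under this reading, a task such as Sentinel-1 to Sentinel-2 translation corresponds to sampling from a conditional, while unconditional generation corresponds to sampling from the joint.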

Why is it Beta? Because thumbnails are a low-cost representation of the data that scales well and we managed to develop this prototype quite fast. We are currently developing the more costly COP-GEN model that supports the original data. For now, we wanted to showcase the prototype and make it available to the community for a test!

🌐 𝐖𝐞𝐛𝐬𝐢𝐭𝐞 https://miquel-espinosa.github.io/cop-gen

💻 𝐂𝐨𝐝𝐞 https://github.com/miquel-espinosa/COP-GEN-Beta

📄 𝐏𝐚𝐩𝐞𝐫 https://arxiv.org/pdf/2504.08548

1 reply
