otpensource-vision / README.md

Update README.md

8eaec31 verified 10 months ago

4.95 kB

	---
	language:
	- ko
	- en
	library_name: transformers
	base_model: Bllossom/llama-3.2-Korean-Bllossom-AICA-5B
	tags:
	- vision-language
	- korean
	- image-to-text
	- multilingual
	- fashion
	- e-commerce
	- text-classification
	- text-generation-inference
	- transformers
	- unsloth
	- mllama
	datasets:
	- hateslopacademy/otpensource_data
	inference: true
	license: cc-by-4.0
	model_name: otpensource-vision
	size_categories: 1K<n<10K
	task_categories:
	- image-to-text
	- text-classification
	task_ids:
	- image-captioning
	- sentiment-analysis
	---

	# otpensource-vision

	## 모델 설명

	otpensource-vision은 Bllossom/llama-3.2-Korean-Bllossom-AICA-5B를 기반으로 학습된 Vision-Language 모델입니다. 해당 모델은 한국어와 영어로 작성된 텍스트와 이미지를 결합하여 다양한 태스크를 수행할 수 있도록 설계되었습니다.

	### 주요 특징
	- Bllossom 기반 학습: llama-3.2-Korean-Bllossom-AICA-5B를 기반으로 학습된 모델로, 언어 모델과 시각-언어 모델의 장점을 모두 제공합니다.
	- Vision-Language 태스크 지원: 이미지를 입력받아 텍스트 정보를 생성하거나, 텍스트 입력만으로 자연어 처리 태스크를 수행할 수 있습니다.
	- 패션 데이터를 활용한 학습: 한국어 패션 데이터셋(otpensource_data)을 활용하여 옷의 카테고리, 색상, 계절, 특징 등 관련 정보를 추출하도록 학습되었습니다.
	- 상업적 활용 가능: 라이선스는 CC-BY-4.0으로 상업적 이용이 가능합니다.

	---

	## 모델 세부사항

	### 학습 데이터
	모델 학습에 사용된 데이터셋:
	- [otpensource_dataset](https://huggingface.co/datasets/hateslopacademy/otpensource_dataset):
	- 약 9000개의 패션 데이터로 구성
	- 옷의 카테고리, 색상, 계절, 특징, 이미지 URL 등을 포함하며, Vision-Language 학습에 최적화

	### 학습 방식
	- 기반 모델: Bllossom/llama-3.2-Korean-Bllossom-AICA-5B
	- GPU 요구사항: A100 40GB 이상 권장
	- 최적화: Vision-Language 태스크와 한국어 텍스트 태스크를 통합적으로 학습

	---

	## 주요 사용 사례

	### Vision-Language 태스크
	1. 이미지 분석
	- 입력된 이미지에서 옷의 카테고리, 색상, 계절, 특징을 추출하여 JSON 형식으로 반환.
	- 예시:
	```json
	{
	"category": "트렌치코트",
	"gender": "여",
	"season": "SS",
	"color": "네이비",
	"material": "",
	"feature": "트렌치코트"
	}
	```

	2. 언어모델 태스크
	- 텍스트만 입력했을 때 자연어 처리를 수행하며, 질문 응답, 텍스트 요약, 감정 분석 등 다양한 태스크 수행 가능.

	---

	## 학습 및 성능

	### LogicKor 벤치마크 성능 (Bllossom 기반 모델 성능)
	\| Category \| Single Turn \| Multi Turn \|
	\|----------------\|-------------\|------------\|
	\| Reasoning \| 6.57 \| 5.29 \|
	\| Math \| 6.43 \| 6.29 \|
	\| Writing \| 9.14 \| 8.71 \|
	\| Coding \| 8.00 \| 9.14 \|
	\| Understanding \| 8.14 \| 9.29 \|
	\| Grammar \| 6.71 \| 4.86 \|

	### 학습 구성
	- 모델 크기: 5B 파라미터
	- 학습 데이터 크기: 약 9000개의 시각-언어 데이터
	- 평가 결과: 패션 관련 태스크에서 높은 정확도와 효율성 제공

	---

	## 코드 예시

	### Vision-Language 태스크

	```python
	from transformers import MllamaForConditionalGeneration, MllamaProcessor
	import torch
	from PIL import Image
	import requests

	model = MllamaForConditionalGeneration.from_pretrained(
	'otpensource-vision',
	torch_dtype=torch.bfloat16,
	device_map='auto'
	)
	processor = MllamaProcessor.from_pretrained('otpensource-vision')

	url = "https://image.msscdn.net/thumbnails/images/prd_img/20240710/4242307/detail_4242307_17205916382801_big.jpg?w=1200"
	image = Image.open(requests.get(url, stream=True).raw)

	messages = [
	{'role': 'user', 'content': [
	{'type': 'image', 'image': image},
	{'type': 'text', 'text': '이 옷의 정보를 JSON으로 알려줘.'}
	]}
	]

	input_text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	inputs = processor(
	image=image,
	text=input_text,
	add_special_tokens=False,
	return_tensors="pt",
	).to(model.device)

	output = model.generate(**inputs, max_new_tokens=256, temperature=0.1)
	print(processor.decode(output[0]))
	```

	# Uploaded finetuned model

	- Developed by: hateslopacademy
	- License: apache-2.0
	- Finetuned from model : Bllossom/llama-3.2-Korean-Bllossom-AICA-5B

	This mllama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)