Commit f5442ab
Parent(s): 603198b
Update: finalize the README for the completed model (최종 완료 모델에 대한 README 확정)

README.md CHANGED
@@ -20,25 +20,25 @@ model-index:
       name: text2text-generation # Optional. Example: Speech Recognition
     metrics:
     - type: bleu # Required. Example: wer. Use metric id from https://hf.co/metrics
-      value: 0.
+      value: 0.9313276940897475 # Required. Example: 20.90
       name: eval_bleu # Optional. Example: Test WER
-      verified:
+      verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
    - type: rouge1 # Required. Example: wer. Use metric id from https://hf.co/metrics
-      value: 0.
+      value: 0.9607081256861959 # Required. Example: 20.90
       name: eval_rouge1 # Optional. Example: Test WER
-      verified:
+      verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
    - type: rouge2 # Required. Example: wer. Use metric id from https://hf.co/metrics
-      value: 0.
+      value: 0.9394649136169404 # Required. Example: 20.90
       name: eval_rouge2 # Optional. Example: Test WER
-      verified:
+      verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
    - type: rougeL # Required. Example: wer. Use metric id from https://hf.co/metrics
-      value: 0.
+      value: 0.9605735834651536 # Required. Example: 20.90
       name: eval_rougeL # Optional. Example: Test WER
-      verified:
+      verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
    - type: rougeLsum # Required. Example: wer. Use metric id from https://hf.co/metrics
-      value: 0.
+      value: 0.9605993760190767 # Required. Example: 20.90
       name: eval_rougeLsum # Optional. Example: Test WER
-      verified:
+      verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
 ---
 
 # ko-barTNumText(TNT Model🧨): Try Number To Korean Reading(숫자를 한글로 바꾸는 모델)

@@ -78,33 +78,12 @@ aihub에서 데이터를 받으실 분은 한국인일 것이므로, 한글로
 
 
 ## Uses
-This Model is inferenced token BACKWARD. so, you have to `flip` before `tokenizer.decode()` <br />
-해당 모델은 inference시 역순으로 예측합니다. (밥을 6시에 먹었어 -> 어 먹었 시에 여섯 을 밥) <br />
-때문에 `tokenizer.decode`를 수행하기 전에, `flip`으로 역순으로 치환해주세요.
-
 Want see more detail follow this URL [KoGPT_num_converter](https://github.com/ddobokki/KoGPT_num_converter) <br /> and see `bart_inference.py` and `bart_train.py`
-
-class BartText2TextGenerationPipeline(Text2TextGenerationPipeline):
-    def postprocess(self, model_outputs, return_type=ReturnType.TEXT, clean_up_tokenization_spaces=False):
-        records = []
-        reversed_model_outputs = torch.flip(model_outputs["output_ids"][0], dims=[-1])
-        for output_ids in reversed_model_outputs:
-            if return_type == ReturnType.TENSORS:
-                record = {f"{self.return_name}_token_ids": output_ids}
-            elif return_type == ReturnType.TEXT:
-                record = {
-                    f"{self.return_name}_text": self.tokenizer.decode(
-                        output_ids,
-                        skip_special_tokens=True,
-                        clean_up_tokenization_spaces=clean_up_tokenization_spaces,
-                    )
-                }
-            records.append(record)
-        return records
-```
+
 ## Evaluation
 Just using `evaluate-metric/bleu` and `evaluate-metric/rouge` in huggingface `evaluate` library <br />
-[Training wanDB URL](https://wandb.ai/bart_tadev/BartForConditionalGeneration/runs/
+[Training wanDB URL](https://wandb.ai/bart_tadev/BartForConditionalGeneration/runs/326xgytt?workspace=user-bart_tadev)
+
 ## How to Get Started With the Model
 ```python
 from transformers.pipelines import Text2TextGenerationPipeline

@@ -112,8 +91,7 @@ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 texts = ["그러게 누가 6시까지 술을 마시래?"]
 tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-barTNumText")
 model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-barTNumText")
-
-seq2seqlm_pipeline = BartText2TextGenerationPipeline(model=model, tokenizer=tokenizer)
+seq2seqlm_pipeline = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
 kwargs = {
     "min_length": 0,
     "max_length": 1206,