HiTZ's Capitalization & Punctuation model for Basque
You need to install torch, transformers and sentencepiece in your environment.
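Before loading the model, it can help to confirm that the three packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` tuple are hypothetical and not part of the model's code.

```python
import importlib.util

# Packages listed as requirements above.
REQUIRED = ("torch", "transformers", "sentencepiece")

def missing_packages(names=REQUIRED):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages()
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported missing, install it with your usual package manager before running the inference example.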
Inference example:
from transformers import pipeline
model_path = "./eu_norm-eu" # Path to the folder where the model's files are
device = 0 # 0-->GPU, -1-->CPU
segment_list = [
    "kaixo egun on guztioi",
    "faktoria e i te beko irratian entzuten da",
    "gutxi gora behera ehuneko berrogeita bikoa",
    "lau zortzi hamabost hamasei hogeita hiru berrogeita bi",
    "nire jaio urtea mila bederatziehun eta laurogeita hamasei da",
    "informazio gehiago hitz puntu e hatxe u puntu eus web horrian",
]
translator = pipeline(task="translation", model=model_path, tokenizer=model_path, device=device)
result_list = translator(segment_list)
cp_segment_list = [result["translation_text"] for result in result_list]
for text, cp_text in zip(segment_list, cp_segment_list):
    print(f"Normalized: {text}\n With C&P: {cp_text}\n")
Expected output:
Normalized: kaixo egun on guztioi
With C&P: Kaixo, egun on guztioi.
Normalized: faktoria e i te beko irratian entzuten da
With C&P: Faktoria EiTBko irratian entzuten da.
Normalized: gutxi gora behera ehuneko berrogeita bikoa
With C&P: Gutxi gora behera %42koa.
Normalized: lau zortzi hamabost hamasei hogeita hiru berrogeita bi
With C&P: Lau, zortzi, hamabost, hamasei, hogeita hiru, berrogeita bi.
Normalized: nire jaio urtea mila bederatziehun eta laurogeita hamasei da
With C&P: Nire jaio urtea 1996 da.
Normalized: informazio gehiago hitz puntu e hatxe u puntu eus web horrian
With C&P: Informazio gehiago hitz.ehu.eus web horrian.
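The inference example hard-codes `device = 0`, which assumes a GPU is present. As a hedged alternative, the value could be chosen at runtime. The sketch below follows the convention from the example's comment (0 for GPU, -1 for CPU); `pick_device` is a hypothetical helper, not something shipped with the model.

```python
def pick_device():
    """Return 0 (first GPU) when CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without torch the pipeline cannot run anyway; fall back to the CPU value.
        return -1
    return 0 if torch.cuda.is_available() else -1

device = pick_device()
print(f"Using device: {device}")
```

The resulting `device` can be passed to `pipeline(...)` in place of the fixed value.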