Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,50 @@
---
datasets:
- llm-book/JGLUE
language:
- ja
library_name: Transformers PHP
license: apache-2.0
tags:
- onnx
---

An ONNX-weights version of [masato12/bert-base-japanese-v3-jsts-with-tokenizer](https://huggingface.co/masato12/bert-base-japanese-v3-jsts-with-tokenizer), provided for compatibility with Transformers PHP.

# bert-base-japanese-v3-jsts

This is the semantic similarity model introduced in Chapter 5 of "[大規模言語モデル入門](https://www.amazon.co.jp/dp/4297136333)" (Introduction to Large Language Models).
It was built by fine-tuning [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) on the JSTS dataset of [JGLUE](https://huggingface.co/datasets/llm-book/JGLUE).

## Related links

* [GitHub repository](https://github.com/ghmagazine/llm-book)
* [Colab notebook (training)](https://colab.research.google.com/github/ghmagazine/llm-book/blob/main/chapter5/5-4-sts-finetuning.ipynb)
* [Colab notebook (inference)](https://colab.research.google.com/github/ghmagazine/llm-book/blob/main/chapter5/5-4-sts-analysis.ipynb)
* [Dataset](https://huggingface.co/datasets/llm-book/JGLUE)
* [大規模言語モデル入門 (Amazon.co.jp)](https://www.amazon.co.jp/dp/4297136333/)
* [大規模言語モデル入門 (gihyo.jp)](https://gihyo.jp/book/2023/978-4-297-13633-8)

## Usage

```python
from transformers import pipeline

text_sim_pipeline = pipeline(
    model="llm-book/bert-base-japanese-v3-jsts",
    function_to_apply="none",
)
# "There are people holding surfboards by the riverbank"
text = "川べりでサーフボードを持った人たちがいます"
# "Surfers are standing by the riverbank"
sim_text = "サーファーたちが川べりに立っています"
# Compute the similarity between text and sim_text
result = text_sim_pipeline({"text": text, "text_pair": sim_text})
print(result["score"])
# 3.5703558921813965
```

## License

[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)

---

Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).
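
The conversion and repo layout described in the note can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to build this repo: the helper names `export_to_onnx` and `move_weights_to_onnx_subfolder` are hypothetical, the export step assumes `optimum[onnxruntime]` and `transformers` are installed, and the only layout rule taken from the note is that the `.onnx` weights live in a subfolder named `onnx`.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def export_to_onnx(model_id: str, export_dir: Path) -> None:
    """Export a sequence-classification checkpoint to ONNX with Optimum (hypothetical helper)."""
    # Imported lazily so the layout helper below also works without Optimum installed.
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer

    # export=True converts the PyTorch checkpoint to ONNX while loading it.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    model.save_pretrained(export_dir)
    # Save the tokenizer files next to the exported model.
    AutoTokenizer.from_pretrained(model_id).save_pretrained(export_dir)


def move_weights_to_onnx_subfolder(export_dir: Path) -> list[Path]:
    """Move top-level *.onnx files into export_dir/onnx and return their new paths."""
    onnx_dir = export_dir / "onnx"
    onnx_dir.mkdir(exist_ok=True)
    moved = []
    for weight_file in sorted(export_dir.glob("*.onnx")):
        target = onnx_dir / weight_file.name
        shutil.move(str(weight_file), str(target))
        moved.append(target)
    return moved
```

Calling `export_to_onnx(model_id, export_dir)` followed by `move_weights_to_onnx_subfolder(export_dir)` would leave the config and tokenizer files at the top level and the weights under `onnx/`. Whether Transformers PHP needs any further files or a quantized variant is not stated here, so check that against its own documentation.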