---

language: en
license: apache-2.0
model_name: gpt2-lm-head-10.onnx
tags:
- validated
- text
- machine_comprehension
- gpt-2
---

<!--- SPDX-License-Identifier: Apache-2.0 -->

# GPT-2

## Use-cases
Transformer-based language model for text generation.

## Description
[GPT-2](https://openai.com/blog/better-language-models/) is a large transformer-based language model with a simple objective: predict the next word, given all of the previous words within some text.

## Model

|Model        |Download  | Download (with sample test data)|ONNX version|Opset version|Accuracy |
|-------------|:--------------|:--------------|:--------------|:--------------|:--------------|
|GPT-2       |[522.81 MB](model/gpt2-10.onnx) | [438.3 MB](model/gpt2-10.tar.gz)| 1.6 | 10 |mAP of [0.024](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit#gid=0)|
|GPT-2-LM-HEAD |[664.87 MB](model/gpt2-lm-head-10.onnx) | [607 MB](model/gpt2-lm-head-10.tar.gz)| 1.6 | 10 |mAP of [0.024](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit#gid=0)|


### Source
PyTorch GPT-2 ==> ONNX GPT-2
PyTorch GPT-2 + script changes ==> ONNX GPT-2-LM-HEAD
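
As a rough illustration of that conversion step, an export along the following lines can be used. This is only a sketch, not the contents of `GPT2-export.py`; the output file name, the input name, and the dummy-input shape are assumptions, and it presumes a transformers version whose `forward` returns plain tuples.

```python
# Illustrative export sketch (names, shapes and file name are assumptions).
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

# Arbitrary dummy input used only for tracing: batch of 1, sequence length 8.
dummy_input = torch.randint(0, 50257, (1, 8), dtype=torch.long)

torch.onnx.export(
    model,
    (dummy_input,),
    "gpt2-lm-head-10.onnx",
    input_names=["input_ids"],  # assumed input name
    dynamic_axes={"input_ids": {0: "batch_size", 1: "sequence_length"}},
    opset_version=10,
)
```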


## Inference
The script for ONNX model conversion and ONNX Runtime inference is [here](dependencies/GPT2-export.py).
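
Independently of that script, the downloaded `.onnx` file can be opened with ONNX Runtime to check which input and output names and shapes the graph actually expects (a minimal sketch; the file path is wherever the model was downloaded):

```python
import onnxruntime as ort

# Adjust the path to the downloaded model file.
sess = ort.InferenceSession("gpt2-lm-head-10.onnx")

# List the graph's inputs and outputs with their shapes and element types.
for i in sess.get_inputs():
    print("input: ", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)
```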

### Input to model
The input is a text string, e.g. "Here is some text to encode : Hello World", tokenized with byte-pair encoding (BPE).

**input_ids**: Indices of the input tokens in the vocabulary. It's a long (int64) tensor of dynamic shape (batch_size, sequence_length).







### Preprocessing steps

Use ```tokenizer.encode()``` to encode the input text:

```python
import torch
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

text = "Here is some text to encode : Hello World"

# Batch of one sequence: shape (1, sequence_length), dtype int64.
tokens_tensor = torch.tensor([tokenizer.encode(text)])
```



### Output of model

For the GPT-2 model:



**last_hidden_state**: Sequence of hidden states at the last layer of the model. It's a float tensor of size (batch_size, sequence_length, hidden_size).

**past**: Pre-computed hidden states. It's a list of tensors (keys and values in the attention blocks) of size (batch_size, num_heads, sequence_length, sequence_length), one for each layer.

The output of this model is the tuple (last_hidden_state, past).



For the GPT-2-LM-HEAD model:



**prediction_scores**: Prediction scores of the language modeling head (scores for each vocabulary token before the softmax). It's a float tensor of size (batch_size, sequence_length, vocab_size).

**past**: Pre-computed hidden states. It's a list of tensors (keys and values in the attention blocks) of size (batch_size, num_heads, sequence_length, sequence_length), one for each layer.

The output of this model is the tuple (prediction_scores, past).



Note that `output_hidden_states=False` and `output_attentions=False` in the model's `PretrainedConfig`.
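
As a concrete illustration of this output layout with ONNX Runtime, the sketch below feeds a tokenized string and unpacks the results. It assumes the prediction scores (or last hidden states) come back as the first output and the per-layer `past` tensors as the rest; verify against `sess.get_outputs()` for the graph you downloaded.

```python
import numpy as np
import onnxruntime as ort
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
input_ids = np.array([tokenizer.encode("Here is some text to encode : Hello World")], dtype=np.int64)

sess = ort.InferenceSession("gpt2-lm-head-10.onnx")
outputs = sess.run(None, {sess.get_inputs()[0].name: input_ids})

prediction_scores = outputs[0]  # assumed: (batch_size, sequence_length, vocab_size)
past = outputs[1:]              # assumed: one tensor per layer
print(prediction_scores.shape, len(past))
```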



### Postprocessing steps

For the GPT-2 model:



```python
# `model` is assumed to be a GPT-2 model loaded beforehand (e.g. GPT2Model.from_pretrained('gpt2')),
# and `input_ids` is the tokens_tensor produced in the preprocessing step above.
outputs = model(input_ids)

# The first element of the returned tuple is last_hidden_state.
last_hidden_states = outputs[0]
```



For the GPT-2-LM-HEAD model, to generate the next 10 tokens with greedy decoding:

```python
import numpy as np
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

batch_size = 1
length = 10
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# The PyTorch LM-head model is used here; the exported ONNX model can be driven
# the same way with ONNX Runtime (see the sketch below).
model = GPT2LMHeadModel.from_pretrained('gpt2').to(device)
model.eval()

text = "Here is some text to encode : Hello World!"
tokens = np.array(tokenizer.encode(text))
context = torch.tensor(tokens, device=device, dtype=torch.long).unsqueeze(0).repeat(batch_size, 1)
output = context

with torch.no_grad():
    for _ in range(length):
        # Feed the whole sequence generated so far and keep the scores for the last position.
        logits = model(output)[0][:, -1, :]
        probs = F.softmax(logits, dim=-1)
        # Greedy decoding: pick the most likely next token.
        _, next_token = torch.topk(probs, k=1, dim=-1)
        output = torch.cat((output, next_token), dim=1)

# Drop the prompt and decode only the generated continuation.
output = output[:, len(tokens):].tolist()
for i in range(batch_size):
    text = tokenizer.decode(output[i])
    print(text)
```
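
The same greedy loop can be driven by the exported ONNX model through ONNX Runtime. The following is a minimal sketch under the same assumptions as above (input name resolved from the graph, prediction scores as the first output); it feeds the full sequence at every step rather than reusing the `past` outputs.

```python
import numpy as np
import onnxruntime as ort
from transformers import GPT2Tokenizer

sess = ort.InferenceSession("gpt2-lm-head-10.onnx")
input_name = sess.get_inputs()[0].name  # query the graph instead of hard-coding the name

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
prompt_ids = tokenizer.encode("Here is some text to encode : Hello World!")
output_ids = np.array([prompt_ids], dtype=np.int64)

for _ in range(10):
    logits = sess.run(None, {input_name: output_ids})[0]  # assumed first output: prediction_scores
    next_id = int(np.argmax(logits[0, -1, :]))            # greedy pick at the last position
    output_ids = np.concatenate([output_ids, [[next_id]]], axis=1)

# Decode only the generated continuation.
print(tokenizer.decode(output_ids[0, len(prompt_ids):].tolist()))
```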

<hr>



## Dataset (Train and validation)

The original model from OpenAI is pretrained on a dataset of [8 million web pages](https://openai.com/blog/better-language-models).

The pretrained model is referenced in the [huggingface/transformers](https://github.com/huggingface/transformers/blob/master/transformers/modeling_gpt2.py) repository as a causal (unidirectional) transformer pre-trained with a language-modeling objective on a very large corpus of ~40 GB of text data.

https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin



<hr>



## Validation accuracy

Metric and benchmarking details are provided by HuggingFace in this [post](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2).

<hr>





## Publication/Attribution

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language Models are Unsupervised Multitask Learners. 2019.



## References

This model is converted directly from [huggingface/transformers](https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_gpt2.py).

<hr>



## Contributors

Negin Raoof

Joddiy Zhang

<hr>



## License

Apache 2.0 License

<hr>