OwenElliott committed · Commit 6a6a2dd · verified · 1 Parent(s): 6353c49

Update README.md

Files changed (1):
  1. README.md +65 -3

README.md CHANGED
---
license: mit
---

# Marqo Chimera Arctic bge M

This is a chimera model that concatenates embeddings from [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5) and [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It produces 1536-dimensional embeddings (768 + 768) and has 218M parameters in total (109M + 109M).
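
For intuition, a chimera embedding is simply the two base models' vectors joined end to end. The sketch below illustrates that idea by loading the two base encoders directly; it is a minimal illustration rather than the packaged chimera code, and it assumes CLS-token pooling and per-encoder L2 normalization before concatenation.

```python
import torch
from torch.nn.functional import normalize
from transformers import AutoModel, AutoTokenizer

# The two base encoders whose embeddings the chimera model concatenates.
arctic_tok = AutoTokenizer.from_pretrained("Snowflake/snowflake-arctic-embed-m-v1.5")
arctic = AutoModel.from_pretrained("Snowflake/snowflake-arctic-embed-m-v1.5").eval()
bge_tok = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")
bge = AutoModel.from_pretrained("BAAI/bge-base-en-v1.5").eval()

texts = ["The Data Cloud!"]

with torch.inference_mode():
    arctic_batch = arctic_tok(texts, padding=True, truncation=True, return_tensors="pt", max_length=512)
    bge_batch = bge_tok(texts, padding=True, truncation=True, return_tensors="pt", max_length=512)
    # Assumption: CLS-token pooling and per-encoder normalization; the released
    # chimera checkpoint may pool or normalize slightly differently.
    arctic_vec = normalize(arctic(**arctic_batch).last_hidden_state[:, 0])  # (1, 768)
    bge_vec = normalize(bge(**bge_batch).last_hidden_state[:, 0])           # (1, 768)

chimera_style = torch.cat([arctic_vec, bge_vec], dim=-1)
print(chimera_style.shape)  # torch.Size([1, 1536]) -> 768 + 768
```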
## Usage

```python
import torch
from torch.nn.functional import normalize
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer.
tokenizer = AutoTokenizer.from_pretrained("Marqo/marqo-chimera-arctic-bge-m")
model = AutoModel.from_pretrained("Marqo/marqo-chimera-arctic-bge-m", trust_remote_code=True)
model.eval()

# Model constants.
query_prefix = 'Represent this sentence for searching relevant passages: '

# Your queries and docs.
queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

# Add query prefix and tokenize queries and docs.
queries_with_prefix = [f"{query_prefix}{q}" for q in queries]
query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=512)
document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=512)

# Use the model to generate text embeddings.
with torch.inference_mode():
    query_embeddings = model(**query_tokens)
    document_embeddings = model(**document_tokens)

# Remember to normalize embeddings.
query_embeddings = normalize(query_embeddings)
document_embeddings = normalize(document_embeddings)

# Scores via dot product.
scores = query_embeddings @ document_embeddings.T

# Pretty-print the results.
for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    print(f'Query: "{query}"')
    for document, score in doc_score_pairs:
        print(f'Score: {score:.4f} | Document: "{document}"')
    print()

#### OUTPUT ####
# Query: "what is snowflake?"
# Score: 0.3025 | Document: "The Data Cloud!"
# Score: 0.2297 | Document: "Mexico City of Course!"

# Query: "Where can I get the best tacos?"
# Score: 0.4512 | Document: "Mexico City of Course!"
# Score: 0.2336 | Document: "The Data Cloud!"
```