Inference Provider

VERIFIED
256,855 monthly requests

AI & ML interests

making models accessible

Recent Activity

SmerkyG updated a model 22 days ago: featherless-ai/QRWKV-72B
KaraKaraWitch new activity about 1 month ago: featherless-ai/try-this-model:Hihi9
SmerkyG updated a model about 2 months ago: featherless-ai/QRWKV-QwQ-32B


Hihi9

#15 opened about 1 month ago by pznhi
KaraKaraWitch posted an update 4 months ago
What if LLMs used thinking emojis to develop their state?

:blob_think: Normal Thinking
:thinkies: Casual Thinking
:Thonk: Serious Thinking
:think_bold: Critical Thinking
:thinkspin: Research Thinking
:thinkgod: Deep Research Thinking

The last two are GIFs, but the upload doesn't render them :)

(Credits: SwayStar123 on EAI suggested making it a range selector; the original base idea was mine.)
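
A minimal sketch of how that range-selector suggestion could look in practice; the six-level scale, the 0-to-1 effort value, and the select_thinking_emoji helper are all illustrative assumptions, not an existing feature.

# Hypothetical mapping from a continuous "thinking effort" value to the emoji levels above.
THINKING_LEVELS = [
    ":blob_think:",   # normal thinking
    ":thinkies:",     # casual thinking
    ":Thonk:",        # serious thinking
    ":think_bold:",   # critical thinking
    ":thinkspin:",    # research thinking
    ":thinkgod:",     # deep research thinking
]

def select_thinking_emoji(effort: float) -> str:
    """Map an effort value in [0.0, 1.0] onto one of the six emoji levels."""
    effort = min(max(effort, 0.0), 1.0)
    return THINKING_LEVELS[round(effort * (len(THINKING_LEVELS) - 1))]

print(select_thinking_emoji(0.1))  # :blob_think:
print(select_thinking_emoji(1.0))  # :thinkgod:
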
1 reply

QRWKV in, Qwerky out

#4 opened 4 months ago by wxgeorge
DarinVerheijke updated a Space 5 months ago
wxgeorge updated a Space 5 months ago

Add link to paper

#3 opened 5 months ago by nielsr

Add pipeline tag

#1 opened 5 months ago by nielsr
KaraKaraWitch posted an update 5 months ago
"What's wrong with using huggingface transformers?"

Here's a quick example. Am I supposed to go in with full knowledge of the inner workings of an LLM?
import pathlib
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("<ModernBERT>")
# Triton is **required**, but nowhere in the documentation does it say that Triton is needed.
# Installing Triton on Windows isn't straightforward. Thankfully someone has already built wheels for it:
#  - https://github.com/woct0rdho/triton-windows/releases

model = AutoModelForSequenceClassification.from_pretrained(
    "<ModernBERT>",  # reference_compile=False
)
# By default the model is loaded on the CPU, which is slow. Move it to a CUDA device.
# This will actually error out if you use "gpu" instead of "cuda".
model = model.to("cuda")


with torch.no_grad():
    # Not setting `return_tensors="pt"` causes
    #   File "C:\Program Files\Python310\lib\site-packages\transformers\modeling_utils.py", line 5311, in warn_if_padding_and_no_attention_mask
    #     if self.config.pad_token_id in input_ids[:, [-1, 0]]:
    #   TypeError: list indices must be integers or slices, not tuple
    # or...
    #  File "C:\Program Files\Python310\lib\site-packages\transformers\models\modernbert\modeling_modernbert.py", line 836, in forward
    #    batch_size, seq_len = input_ids.shape[:2]
    #  AttributeError: 'list' object has no attribute 'shape'
    block = tokenizer(
        pathlib.Path("test-fic.txt").read_text("utf-8"), return_tensors="pt"
    )
    block = block.to("cuda")
    # **block is needed to fix "AttributeError: 'NoneType' object has no attribute 'unsqueeze'" on attention_mask.unsqueeze(-1)
    logits = model(**block).logits

# .numpy() can't convert a CUDA tensor, so move the logits back to the CPU first.
logits = logits.to("cpu")
# print(logits)
# Softmax over the last dim gives per-class probabilities for the first (only) sequence.
predicted_class_ids = torch.softmax(logits, -1)[0].numpy()
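
For contrast, a minimal sketch of the same classification through the high-level pipeline API, which does the tokenization, tensor conversion, device placement, and softmax itself. "<ModernBERT>" stays a placeholder as above, and the truncation and top_k call arguments are assumptions to check against your installed transformers version.

import pathlib
from transformers import pipeline

# device=0 puts the model on the first CUDA device; omit it to stay on the CPU.
classifier = pipeline("text-classification", model="<ModernBERT>", device=0)

text = pathlib.Path("test-fic.txt").read_text("utf-8")
# truncation=True clips input past the model's context window;
# top_k=None returns a score for every class instead of only the best one.
print(classifier(text, truncation=True, top_k=None))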

3 replies
KaraKaraWitch posted an update 6 months ago
> New Model
> Looks at Model Card
> "Open-Weights"
1 reply