Emin Temiz (etemiz) · PRO
91 followers · 23 following
https://pickabrain.ai
AI & ML interests
Alignment
Recent Activity
Liked a model about 20 hours ago: huihui-ai/Huihui-GLM-4.6-abliterated-GGUF
Reacted with 🚀 to georgewritescode's post 2 days ago:
Announcing Artificial Analysis Long Context Reasoning (AA-LCR), a new benchmark that evaluates long-context performance by testing reasoning across multiple long documents (~100k tokens).

The focus of AA-LCR is to replicate real knowledge work and reasoning tasks, testing capabilities critical to modern AI applications spanning document analysis, codebase understanding, and complex multi-step workflows.

AA-LCR consists of 100 hard text-based questions that require reasoning across multiple real-world documents representing ~100k input tokens. Questions are designed so that answers cannot be looked up directly but must be reasoned from multiple information sources, with human testing verifying that each question requires genuine inference rather than retrieval.

Key takeaways:
➤ Today's leading models achieve ~70% accuracy: the top three places go to OpenAI o3 (69%), xAI Grok 4 (68%) and Qwen3 235B 2507 Thinking (67%)
➤ 👀 We also already have gpt-oss results! 120B performs close to o4-mini (high), in line with OpenAI's claims about model performance. We will follow up shortly with an Intelligence Index for these models.
➤ 100 hard text-based questions spanning 7 categories of documents (Company Reports, Industry Reports, Government Consultations, Academia, Legal, Marketing Materials and Survey Reports)
➤ ~100k tokens of input per question, requiring models to support a minimum 128K context window to score on this benchmark
➤ ~3M total unique input tokens spanning ~230 documents to run the benchmark (output tokens typically vary by model)

We're adding AA-LCR to the Artificial Analysis Intelligence Index and taking the version number to v2.2. Artificial Analysis Intelligence Index v2.2 now includes: MMLU-Pro, GPQA Diamond, AIME 2025, IFBench, LiveCodeBench, SciCode and AA-LCR.

Link to dataset: https://huggingface.co/datasets/ArtificialAnalysis/AA-LCR
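For anyone who wants to poke at the benchmark data, here is a minimal sketch of loading it from the Hub with the `datasets` library; the split name and record layout are assumptions, so inspect the loaded features before relying on them.

```python
# Minimal sketch: pulling the AA-LCR dataset from the Hugging Face Hub.
# Assumption: a default config with a "train" split exists; check ds.features
# for the actual question/document column names before use.
from datasets import load_dataset

ds = load_dataset("ArtificialAnalysis/AA-LCR", split="train")

print(ds)      # row count and column names
print(ds[0])   # inspect one record's fields
```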
Replied to their own post 3 days ago:
I realized that when I ask for longer answers to my questions, the models sometimes produce the completely opposite answer. What could be the reason? I do mostly CPT. Should I convert my dataset to SFT and include longer reasoning as well, so the answers stay consistent?

Example: Is the yolk of an egg more beneficial or the white? Answer in 100 words.
Answer: Yolk is more beneficial because ..........

Example: Is the yolk of an egg more beneficial or the white? Answer in 500 words.
Answer: White is more beneficial because ..........

Edit: These happen at temp = 0.0.
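If the data were converted to SFT, one option is to pair the same question, under different explicit length instructions, with a single consistent conclusion, so length alone never flips the answer. A minimal sketch follows; the chat-style JSONL layout, field names, and file name are illustrative assumptions, not a description of the author's actual pipeline.

```python
# Minimal sketch: building SFT records that keep the conclusion identical
# across requested answer lengths, so length alone does not flip the answer.
# The "messages" chat format and the answer text are illustrative assumptions.
import json

question = "Is the yolk of an egg more beneficial or the white?"
conclusion = "Yolk is more beneficial because ..."  # same stance at every length

records = [
    {
        "messages": [
            {"role": "user", "content": f"{question} Answer in {n} words."},
            {"role": "assistant", "content": conclusion},
        ]
    }
    for n in (100, 500)
]

with open("sft_consistency.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```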
Organizations
None yet
etemiz's activity
Articles
Curation is All You Need (Aug 1, 5 months ago) • 2
Fine Tuning Gemma 3 For Human Alignment (May 17, 8 months ago) • 4
Benchmarking Human Alignment of Grok 3 (Apr 15, 9 months ago) • 2
AHA Leaderboard (Mar 30, 9 months ago) • 4
Building a Beneficial AI (Mar 16, 10 months ago) • 6
Ways to Align AI with Human Values (Feb 26, 10 months ago)
The AHA Indicator (Feb 1, 11 months ago) • 3
DeepSeek R1 Human Alignment Tests (Jan 25, 11 months ago) • 1
Symbiotic Intelligence (Nov 19, 2024, about 1 year ago) • 3