Top Open Source AI This Week
The most engaging open source AI content from this week, curated by AI News.
1. LLM rankings are not a ladder: experimental results from a transitive benchmark graph [D]
I built a small website called LLM Win: https://llm-win.com It turns LLM benchmark results into a directed graph: if model A beats model B on benchmark X, add an edge A -> B. Then it searches...
Reddit - Machine Learning · 2 days ago
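The construction described in this post can be sketched in a few lines of Python. This is a hypothetical illustration, not the site's actual code: each benchmark win becomes a directed edge, and any cycle in the resulting graph shows that the "beats" relation is not transitive, so the models cannot be arranged on a single ladder.

```python
# Hypothetical sketch of the win-graph idea: wins become edges,
# and a cycle means no total ranking of the models exists.
from collections import defaultdict


def build_win_graph(results):
    """results: iterable of (winner, loser) pairs from benchmark runs."""
    graph = defaultdict(set)
    for winner, loser in results:
        graph[winner].add(loser)
    return graph


def find_cycle(graph):
    """Return one cycle as a list of models, or None if the graph is acyclic."""
    visited = set()

    def dfs(node, path, on_path):
        if node in on_path:
            return path[path.index(node):]  # back edge closes a cycle
        if node in visited:
            return None                     # already fully explored
        visited.add(node)
        on_path.add(node)
        path.append(node)
        for nxt in graph.get(node, ()):
            cycle = dfs(nxt, path, on_path)
            if cycle:
                return cycle
        on_path.remove(node)
        path.pop()
        return None

    for start in list(graph):
        cycle = dfs(start, [], set())
        if cycle:
            return cycle
    return None


# Made-up results: A beats B, B beats C, C beats A (rock-paper-scissors).
wins = [("A", "B"), ("B", "C"), ("C", "A")]
cycle = find_cycle(build_win_graph(wins))  # -> ['A', 'B', 'C']
```

A returned cycle is the interesting case: it is a concrete set of benchmark results that no linear leaderboard can represent consistently.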
2. [2605.07731] Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs
arXiv - AI · about 9 hours ago
3. [P] QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2)
I fine-tuned Qwen2.5-1.5B for multi-class CEFR English proficiency classification using QLoRA (4-bit NF4). The goal was to classify English text into one of the 6 CEFR levels (A1 → C2), which can b...
Reddit - Machine Learning · 7 days ago
4. Locally running Mistral on an i7 from 2017 so I don't waste water or RAM
submitted by /u/Heavy-Factor-1919
Reddit - Artificial Intelligence · 1 day ago
5. [2605.02069] Pair2Score: Pairwise-to-Absolute Transfer for LLM-Based Essay Scoring
arXiv - AI · 6 days ago
6. [2507.01955] How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
arXiv - AI · 7 days ago
7. [2505.18244] Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation
arXiv - AI · 4 days ago
8. [Hiring] Relations Manager for AI (Remote)
Hiring: AI industry-savvy outreach / ecosystem operator (contract or freelance). I run a small AI company building proprietary domain-specific models, and I need someone who understands the AI indus...
Reddit - ML Jobs · 4 days ago
9. [2605.01148] Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
arXiv - AI · 6 days ago
10. [2605.00914] The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate
arXiv - AI · 6 days ago
11. [2501.19201] Efficient Reasoning with Hidden Thinking
arXiv - AI · 6 days ago
12. Meta sued by major book publishers over copyright infringement | The Verge
Five major publishers, including Macmillan, McGraw-Hill, and Cengage, are suing Meta over claims that the company copied their works to train its Llama AI models.
The Verge - AI · 6 days ago
13. MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required
A blog post by Lablab.ai AMD Developer Hackathon on Hugging Face
Hugging Face Blog · 3 days ago
14. [2605.04177] Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa
arXiv - Machine Learning · 4 days ago
15. MachinaCheck: Building a Multi-Agent CNC Manufacturability System on AMD MI300X
A blog post by Lablab.ai AMD Developer Hackathon on Hugging Face
Hugging Face Blog · about 21 hours ago
16. [2512.22671] Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2
arXiv - AI · 4 days ago
17. Made a tool that builds its own training data and improves each cycle by learning from what it got wrong
The basic idea is pretty simple. You give it a few seed prompts. It generates instruction-response pairs, an LLM scores each one, the good ones go into your training set and the bad ones become the...
Reddit - Artificial Intelligence · 6 days ago
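The loop this post describes can be sketched generically. The snippet is truncated, so this sketch assumes rejected pairs are recycled as seeds for the next cycle; `generate_pair` and `score_pair` are stand-ins for the LLM calls a real setup would make:

```python
# Hypothetical sketch of the self-improving data loop: generate
# instruction/response pairs from seeds, score each pair, keep the
# good ones for training, and feed the bad ones back into the loop.


def generate_pair(seed):
    """Stand-in for an LLM call turning a seed prompt into a pair."""
    return {"instruction": seed, "response": f"answer to: {seed}"}


def score_pair(pair):
    """Stand-in for an LLM judge returning a quality score in [0, 1]."""
    return 0.9 if len(pair["response"]) > 15 else 0.2


def refine(seeds, cycles=3, threshold=0.5):
    training_set = []
    for _ in range(cycles):
        next_seeds = []
        for seed in seeds:
            pair = generate_pair(seed)
            if score_pair(pair) >= threshold:
                training_set.append(pair)  # good: goes into the dataset
            else:
                next_seeds.append(seed)    # bad: retried next cycle
        if not next_seeds:
            break
        seeds = next_seeds
    return training_set


dataset = refine(["Explain QLoRA", "Sum 2 and 2"])
```

With real LLM calls in place of the stubs, the cycle count and score threshold become the main knobs: a higher threshold yields a smaller but cleaner training set.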
18. [2605.02914] When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models
arXiv - Machine Learning · 5 days ago
19. EMO: Pretraining mixture of experts for emergent modularity
A blog post by Ai2 on Hugging Face
Hugging Face Blog · 3 days ago
20. [2605.03226] Self-Mined Hardness for Safety Fine-Tuning
arXiv - Machine Learning · 5 days ago