Top Open Source AI This Week

The most engaging open source AI content from this week, curated by AI News.


1

LLM rankings are not a ladder: experimental results from a transitive benchmark graph [D]

I built a small website called LLM Win: https://llm-win.com. It turns LLM benchmark results into a directed graph: if model A beats model B on benchmark X, add an edge A -> B. Then it search...

Reddit - Machine Learning · 2 days ago
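
The edge rule quoted in the post can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the score layout, the function names `build_win_graph` and `find_win_chain`, and the sample models and benchmarks are all hypothetical. The excerpt is cut off before it says what the search step does, so the breadth-first chain search here is an assumption added only to show why such a graph need not form a ladder.

```python
from collections import defaultdict, deque


def build_win_graph(scores):
    """Build winner -> loser edges from {benchmark: {model: score}}.

    Returns {winner: {loser: [benchmarks where winner scored higher]}}.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for benchmark, results in scores.items():
        for a, score_a in results.items():
            for b, score_b in results.items():
                if score_a > score_b:
                    graph[a][b].append(benchmark)  # edge A -> B
    return graph


def find_win_chain(graph, start, goal):
    """Assumed search step: shortest chain of wins from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical scores: model_a leads on bench_x, model_c leads on bench_y.
scores = {
    "bench_x": {"model_a": 0.9, "model_b": 0.7, "model_c": 0.5},
    "bench_y": {"model_a": 0.4, "model_b": 0.6, "model_c": 0.8},
}
graph = build_win_graph(scores)
chain_down = find_win_chain(graph, "model_a", "model_c")
chain_up = find_win_chain(graph, "model_c", "model_a")
```

With these made-up scores a chain of wins exists in both directions between `model_a` and `model_c`, which is the kind of cycle that a single ranked list cannot represent.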
2

[2605.07731] Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs

Abstract page for arXiv paper 2605.07731: Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs

arXiv - AI · about 9 hours ago
3

QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2) [P]

I fine-tuned Qwen2.5-1.5B for multi-class CEFR English proficiency classification using QLoRA (4-bit NF4). The goal was to classify English text into one of the 6 CEFR levels (A1 → C2), which can b...

Reddit - Machine Learning · 7 days ago
4

Locally running Mistral on an i7 from 2017 so I don't waste water or ram

Submitted by /u/Heavy-Factor-1919

Reddit - Artificial Intelligence · 1 day ago
5

[2605.02069] Pair2Score: Pairwise-to-Absolute Transfer for LLM-Based Essay Scoring

Abstract page for arXiv paper 2605.02069: Pair2Score: Pairwise-to-Absolute Transfer for LLM-Based Essay Scoring

arXiv - AI · 6 days ago
6

[2507.01955] How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Abstract page for arXiv paper 2507.01955: How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

arXiv - AI · 7 days ago
7

[2505.18244] Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation

Abstract page for arXiv paper 2505.18244: Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation

arXiv - AI · 4 days ago
8

[Hiring] Relations Manager for AI (Remote)

Hiring: AI industry-savvy outreach / ecosystem operator (contract or freelance) I run a small AI company building proprietary domain-specific models, and I need someone who understands the AI indus...

Reddit - ML Jobs · 4 days ago
9

[2605.01148] Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

Abstract page for arXiv paper 2605.01148: Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

arXiv - AI · 6 days ago
10

[2605.00914] The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate

Abstract page for arXiv paper 2605.00914: The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate

arXiv - AI · 6 days ago
11

[2501.19201] Efficient Reasoning with Hidden Thinking

Abstract page for arXiv paper 2501.19201: Efficient Reasoning with Hidden Thinking

arXiv - AI · 6 days ago
12

Meta sued by major book publishers over copyright infringement | The Verge

Five major publishers, including Macmillan, McGraw-Hill, Cengage, and others, are suing Meta over claims that the company copied their works to train its Llama AI models.

The Verge - AI · 6 days ago
13

MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required

A Blog post by Lablab.ai AMD Developer Hackathon on Hugging Face

Hugging Face Blog · 3 days ago
14

[2605.04177] Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa

Abstract page for arXiv paper 2605.04177: Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa

arXiv - Machine Learning · 4 days ago
15

MachinaCheck: Building a Multi-Agent CNC Manufacturability System on AMD MI300X

A Blog post by Lablab.ai AMD Developer Hackathon on Hugging Face

Hugging Face Blog · about 21 hours ago
16

[2512.22671] Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

Abstract page for arXiv paper 2512.22671: Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

arXiv - AI · 4 days ago
17

Made a tool that builds its own training data and improves each cycle by learning from what it got wrong

The basic idea is pretty simple. You give it a few seed prompts. It generates instruction-response pairs, an LLM scores each one, the good ones go into your training set and the bad ones become the...

Reddit - Artificial Intelligence · 6 days ago
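
One cycle of the generate, score, and filter loop described in the post could look roughly like the sketch below. It is an assumption-laden illustration and not the tool's code: `run_cycle`, the `threshold` value, and the stub `generate` and `score` functions are hypothetical stand-ins for the LLM calls the post mentions. The excerpt is cut off before it says what happens to the low-scoring pairs, so the sketch only returns them separately.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (instruction, response)


def run_cycle(
    seed_prompts: List[str],
    generate: Callable[[str], Pair],
    score: Callable[[Pair], float],
    threshold: float = 0.7,  # hypothetical cut-off, not from the post
) -> Tuple[List[Pair], List[Pair]]:
    """Generate one pair per seed prompt and split the pairs by score."""
    accepted: List[Pair] = []
    rejected: List[Pair] = []
    for prompt in seed_prompts:
        pair = generate(prompt)
        if score(pair) >= threshold:
            accepted.append(pair)  # goes into the training set
        else:
            rejected.append(pair)  # kept aside for the next cycle
    return accepted, rejected


# Stub functions standing in for real LLM generation and LLM scoring.
def stub_generate(prompt: str) -> Pair:
    return (prompt, prompt.upper())


def stub_score(pair: Pair) -> float:
    return 1.0 if len(pair[1]) > 5 else 0.0


accepted, rejected = run_cycle(["sort a list", "hi"], stub_generate, stub_score)
```

Passing the generator and scorer in as callables keeps the loop independent of any particular model API, so the stubs can be swapped for real calls without touching the filtering logic.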
18

[2605.02914] When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models

Abstract page for arXiv paper 2605.02914: When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models

arXiv - Machine Learning · 5 days ago
19

EMO: Pretraining mixture of experts for emergent modularity

A Blog post by Ai2 on Hugging Face

Hugging Face Blog · 3 days ago
20

[2605.03226] Self-Mined Hardness for Safety Fine-Tuning

Abstract page for arXiv paper 2605.03226: Self-Mined Hardness for Safety Fine-Tuning

arXiv - Machine Learning · 5 days ago
