Top Open Source AI This Week

The most engaging open source AI content from this week, curated by AI News.

  1.

    LLM rankings are not a ladder: experimental results from a transitive benchmark graph [D]

    I built a small website called LLM Win: https://llm-win.com. It turns LLM benchmark results into a directed graph: if model A beats model B on benchmark X, add an edge A -> B. Then it search...

    Reddit - Machine Learning · 2 days ago
  2.

    [2605.07731] Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs

    Abstract page for arXiv paper 2605.07731: Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs

    arXiv - AI · about 9 hours ago
  3.

    [P] QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2)

    I fine-tuned Qwen2.5-1.5B for multi-class CEFR English proficiency classification using QLoRA (4-bit NF4). The goal was to classify English text into one of the 6 CEFR levels (A1 → C2), which can b...

    Reddit - Machine Learning · 7 days ago
  4.

    Locally running Mistral on an i7 from 2017 so I don't waste water or ram

    Submitted by /u/Heavy-Factor-1919

    Reddit - Artificial Intelligence · 1 day ago
  5.

    [2605.02069] Pair2Score: Pairwise-to-Absolute Transfer for LLM-Based Essay Scoring

    Abstract page for arXiv paper 2605.02069: Pair2Score: Pairwise-to-Absolute Transfer for LLM-Based Essay Scoring

    arXiv - AI · 6 days ago
  6.

    [2507.01955] How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

    Abstract page for arXiv paper 2507.01955: How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

    arXiv - AI · 7 days ago
  7.

    [2505.18244] Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation

    Abstract page for arXiv paper 2505.18244: Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation

    arXiv - AI · 4 days ago
  8.

    [Hiring] Relations Manager for AI (Remote)

    Hiring: AI industry-savvy outreach / ecosystem operator (contract or freelance). I run a small AI company building proprietary domain-specific models, and I need someone who understands the AI indus...

    Reddit - ML Jobs · 4 days ago
  9.

    [2605.01148] Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

    Abstract page for arXiv paper 2605.01148: Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

    arXiv - AI · 6 days ago
  10.

    [2605.00914] The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate

    Abstract page for arXiv paper 2605.00914: The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate

    arXiv - AI · 6 days ago
  11.

    [2501.19201] Efficient Reasoning with Hidden Thinking

    Abstract page for arXiv paper 2501.19201: Efficient Reasoning with Hidden Thinking

    arXiv - AI · 6 days ago
  12.

    Meta sued by major book publishers over copyright infringement | The Verge

    Five major publishers, including Macmillan, McGraw-Hill, and Cengage, are suing Meta over claims that the company copied their works to train its Llama AI models.

    The Verge - AI · 6 days ago
  13.

    MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required

    A blog post by Lablab.ai AMD Developer Hackathon on Hugging Face

    Hugging Face Blog · 3 days ago
  14.

    [2605.04177] Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa

    Abstract page for arXiv paper 2605.04177: Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa

    arXiv - Machine Learning · 4 days ago
  15.

    MachinaCheck: Building a Multi-Agent CNC Manufacturability System on AMD MI300X

    A blog post by Lablab.ai AMD Developer Hackathon on Hugging Face

    Hugging Face Blog · about 21 hours ago
  16.

    [2512.22671] Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

    Abstract page for arXiv paper 2512.22671: Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

    arXiv - AI · 4 days ago
  17.

    Made a tool that builds its own training data and improves each cycle by learning from what it got wrong

    The basic idea is pretty simple. You give it a few seed prompts. It generates instruction-response pairs, an LLM scores each one, the good ones go into your training set and the bad ones become the...

    Reddit - Artificial Intelligence · 6 days ago
  18.

    [2605.02914] When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models

    Abstract page for arXiv paper 2605.02914: When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models

    arXiv - Machine Learning · 5 days ago
  19.

    EMO: Pretraining mixture of experts for emergent modularity

    A blog post by Ai2 on Hugging Face

    Hugging Face Blog · 3 days ago
  20.

    [2605.03226] Self-Mined Hardness for Safety Fine-Tuning

    Abstract page for arXiv paper 2605.03226: Self-Mined Hardness for Safety Fine-Tuning

    arXiv - Machine Learning · 5 days ago
