[2603.25112] Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Abstract page for arXiv paper 2603.25112: Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Open weights models, datasets, and frameworks
Abstract page for arXiv paper 2603.24772: Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Val...
Abstract page for arXiv paper 2603.25325: How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
Abstract page for arXiv paper 2505.02881: Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Abstract page for arXiv paper 2512.12411: Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs
Abstract page for arXiv paper 2603.02041: EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post...
Abstract page for arXiv paper 2603.01973: CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production
Abstract page for arXiv paper 2603.00917: Prompt Sensitivity and Answer Consistency of Small Open-Source Large Language Models on Clinica...
Abstract page for arXiv paper 2508.03716: FeynTune: Large Language Models for High-Energy Theory
Abstract page for arXiv paper 2509.16622: Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
Abstract page for arXiv paper 2505.00624: FineScope: SAE-guided Data Selection Enables Domain Specific LLM Pruning and Finetuning
Elon Musk criticizes OpenAI during a deposition, asserting that their technology has not led to severe consequences, highlighting his vie...
The article discusses a newly developed open-source tool called Automated, designed to simplify workflow automation by allowing users to ...
AudioMuse-AI-DCLAP is a distilled version of the LAION CLAP model for music, allowing users to search songs by text through a shared embe...
The discussion revolves around Qwen3.5's MoE architecture, debating whether its low active parameter count signifies a significant breakt...
This article presents a minimal implementation of discrete text diffusion in Python, inspired by Karpathy's MicroGPT, showcasing the core...
This article discusses an experiment using steelman prompting to evaluate bias in six major LLMs, focusing on their interpretations of 1 ...
Elon Musk criticizes OpenAI's safety record in a deposition for his lawsuit against the company, claiming his AI venture, xAI, prioritize...
The article discusses recent fixes for the AMD XDNA Ryzen AI driver in Linux 7.0-rc2, highlighting improvements and updates that enhance ...
Tessera introduces an innovative protocol for AI-to-AI knowledge transfer, enabling models to share learned knowledge without direct arch...
AgentHub proposes a registry for AI agents that enhances discoverability, verifiability, and reproducibility, addressing gaps in current ...
SPD Learn is a new Python library designed for geometric deep learning, specifically for neural decoding using symmetric positive definit...
TorchLean is a framework that formalizes neural networks within the Lean 4 theorem prover, enabling precise semantics for execution and v...