[2603.25112] Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Abstract page for arXiv paper 2603.25112: Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Open weights models, datasets, and frameworks
Abstract page for arXiv paper 2603.24772: Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Val...
Abstract page for arXiv paper 2603.25325: How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
Sovereign Mohawk is a Go-based runtime for federated learning that addresses scaling and trust issues, achieving empirical validation for...
Molmo2 introduces a new family of open-weight vision-language models that excel in video understanding and grounding, featuring innovativ...
MIP Candy is a modular framework built on PyTorch for medical image processing, offering a flexible pipeline for data handling, training,...
POMDPPlanners is an open-source Python package designed for the empirical evaluation of POMDP planning algorithms, integrating advanced f...
Buzzdetect is an open-source AI tool that uses machine learning and microphones to monitor pollinator activity in real-time, providing a ...
OpenAI won dismissal of xAI's trade secrets lawsuit, with the court ruling that xAI failed to demonstrate any misconduct by OpenAI ...
Spanish startup Multiverse Computing has launched a free compressed version of its HyperNova 60B AI model, claiming it outperforms Mistra...
The article discusses mlx-onnx, a tool that converts MLX models into ONNX format for execution in web browsers using WebGPU, targeting de...
The article explores the potential for a peer-to-peer (P2P) distributed AI model, emphasizing a decentralized approach that relies on ver...
A Reddit user is seeking programming buddies to collaborate with on coding projects, inviting all types of programmers to join the initia...
New Relic has launched an AI agent platform and enhanced OpenTelemetry tools to improve data observability for enterprises, allowing bett...
This article introduces a minimalist implementation of Recursive Language Models (RLMs), providing a tutorial and open-source code reposi...
Whisper-Accent is a project aimed at enhancing Whisper's performance in recognizing accented English speech, providing tools for research...
The discussion highlights concerns over the prevalence of academic papers in machine learning that lack accompanying code, questioning th...
The paper introduces Emotion-LLaMAv2 and MMEVerse, a new framework and benchmark aimed at enhancing multimodal emotion understanding thro...
Interpreto is an open-source library designed for interpreting HuggingFace transformers, offering both attribution methods and concept-ba...
The paper presents Selective Chain-of-Thought (Selective CoT), a method to enhance medical question answering efficiency using large lang...
This paper introduces the 'Curse of Depth' in Large Language Models (LLMs), revealing that many deep layers are ineffective due to Pre-La...
The paper presents SafePickle, a machine-learning-based scanner designed to detect malicious Pickle-based ML models, achieving a high F1-...
Hexagon-MLIR presents an open-source compilation stack designed for Qualcomm's NPUs, enhancing AI workload performance by optimizing Trit...