[2603.25112] Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Abstract page for arXiv paper 2603.25112: Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Open weights models, datasets, and frameworks
Abstract page for arXiv paper 2603.24772: Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Val...
Abstract page for arXiv paper 2603.25325: How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
This article evaluates the ability of small and reasoning large language models (LLMs) to assess journal article quality, revealing that ...
The paper presents BEP, a novel Binary Error Propagation algorithm for training Binary Neural Networks (BNNs) that enables efficient back...
This article presents a novel method for detecting backdoors in LoRA adapters by analyzing their weight matrices, achieving high accuracy...
OpenAI has hired Peter Steinberger, creator of the popular OpenClaw AI assistant, who will continue to support the open-source project wh...
The article discusses the implementation of a self-hosted Claude swarm on cloud infrastructure, focusing on its resilience during system ...
OpenAI's hiring of Peter Steinberger, creator of OpenClaw, signals a shift in the AI landscape towards developing robust AI agents capabl...
NVIDIA has launched the Nemotron-Nano-9B-v2-Japanese, a lightweight language model designed to enhance Japanese language understanding an...
Mistral AI has acquired Koyeb, a startup focused on simplifying AI app deployment, marking its first acquisition to enhance its cloud inf...
S-EB-GNN-Q is an open-source JAX framework designed for semantic-aware resource allocation in 6G networks, focusing on energy minimizatio...
Cohere has launched Tiny Aya, a family of open multilingual models that support over 70 languages and can run on everyday devices, enhanc...
ModSSC is an open-source Python framework designed for semi-supervised classification, enhancing reproducibility and experimentation acro...
This paper presents SAFE, a framework for automated proof generation in Rust code, addressing the challenge of insufficient human-written...
Orcheo is an open-source platform designed to streamline conversational search by offering a modular architecture, production-ready infra...
This article presents a multi-agent framework for medical AI that enhances clinical query processing by leveraging fine-tuned language mo...
The paper presents a novel approach to font classification using DINOv2, achieving high accuracy with minimal parameter tuning and introd...
The article presents DCTracks, a new open dataset designed for machine learning-based track reconstruction in drift chambers, featuring s...
This article discusses a global audit of large language models (LLMs), focusing on geographic and socioeconomic biases in AI governance, h...
The paper presents EmbeWebAgent, a framework for embedding web agents into existing user interfaces, enhancing their robustness and actio...
This paper explores the PyCM library for evaluating multi-class classifiers, emphasizing the importance of diverse evaluation metrics in ...
OpenAI has hired the creator of OpenClaw, an innovative open-source AI assistant that performs various tasks, marking a significant devel...