[2603.25112] Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Abstract page for arXiv paper 2603.25112: Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Abstract page for arXiv paper 2603.24772: Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Val...
Abstract page for arXiv paper 2603.25325: How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
The article discusses how synthetic personas can help overcome data scarcity in AI development in Japan, showcasing NTT DATA's innovative...
The article discusses the development of two open neuromorphic processors, Catalyst N1 and N2, which achieve feature parity with Intel's ...
The SoftDTW-CUDA for PyTorch package offers a fast and memory-efficient implementation of Soft Dynamic Time Warping, optimized for GPU us...
The article discusses the dual impact of AI coding tools on open-source software, highlighting both the ease of feature development and t...
EVMbench is an open-source benchmark developed by OpenAI and Paradigm to evaluate AI agents' capabilities in handling smart contract secu...
The article discusses various hyperparameter optimization libraries in machine learning, including hyperopt, Optuna, sklearn.GridSearchCV...
A Reddit user shares their first implementation of a Transformer architecture using PyTorch, detailing the structure and parameters used,...
This article investigates the integration and management of pre-trained models (PTMs) in open-source software projects, introducing the c...
This article presents a novel approach combining Chain-of-Thought (CoT) and Retrieval Augmented Generation (RAG) to improve rare disease ...
The article presents VERA-MH, an open-source evaluation tool designed to assess the safety of AI in mental health contexts, focusing on s...
The paper presents MARLEM, a novel multi-agent reinforcement learning framework designed for studying implicit cooperation in decentraliz...
This article investigates the mental state reasoning of language models (LMs) using 41 open-weight models, revealing insights into their ...
This article explores the geometric limitations of steering personality traits in large language models (LLMs), revealing that traits are...
Arthur Mensch sees a major transition under way, with traditional SaaS services being replaced by proprietary AI apps.
Utterance is an open-source SDK designed to improve voice app interactions by addressing issues with pauses and interruptions, inviting c...
Gradio's new gr.HTML feature allows users to create interactive web apps using a single Python file, enabling seamless integration of fro...
Brian Heseung Kim introduces an open-source framework designed to help researchers utilize LLM coding assistants for efficient data analy...
IBM and UC Berkeley explore the failures of enterprise agents in IT automation, utilizing IT-Bench and MAST to diagnose issues and improv...
The article presents a speculative theory suggesting that OpenAI engineered the viral success of OpenClaw to promote its own products, ra...
Indian AI lab Sarvam launches new large language models, including 30B and 105B parameter models, aiming to challenge foreign AI systems ...