[2603.25112] Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Abstract page for arXiv paper 2603.25112: Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Abstract page for arXiv paper 2603.24772: Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Val...
Abstract page for arXiv paper 2603.25325: How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
SplitLight is an open-source toolkit designed to enhance the evaluation of recommender systems by providing measurable and comparable dat...
The article presents Pyramid MoA, a probabilistic framework designed to optimize inference costs in large language models (LLMs) while ma...
The paper presents Nazrin, a graph neural network-based theorem proving agent that utilizes atomic tactics to enhance machine-assisted th...
The article presents the Federated Learning Playground, an interactive platform designed to teach core concepts of Federated Learning thr...
The paper presents CodeCompass, a solution to the Navigation Paradox in code intelligence, highlighting the distinction between navigatio...
The paper discusses OpenClaw, Moltbook, and ClawdLab, highlighting their role in creating a dataset for AI interactions and proposing Cla...
This paper evaluates the reasoning capabilities of Large Language Models (LLMs) through General Game Playing tasks, revealing performance...
The paper examines three-digit addition in Meta-Llama-3-8B, focusing on how arithmetic results are determined post-routing, emphasizing t...
This article provides a comprehensive guide on deploying Open Source Vision Language Models (VLMs) on NVIDIA Jetson devices, detailing th...
The article discusses a remarkable achievement in AI inference speed, showcasing a chatbot that processes 17k tokens per second using a l...
OpenLanguageModel (OLM) is an open-source PyTorch library designed for training language models, emphasizing simplicity and modularity fo...
The article discusses the need for evaluating conference paper submissions based on code quality, emphasizing its importance for job read...
Guide Labs introduces Steerling-8B, an open-sourced interpretable LLM designed to enhance understanding of AI model outputs by tracing to...
The article introduces 3LM, a benchmark designed to evaluate Arabic LLMs in STEM and coding, addressing gaps in existing assessments focu...
The article discusses the development of torch-continuum, a library that optimizes PyTorch performance by auto-detecting GPU settings, ai...
AstroMLab 4 introduces a 70B-parameter AI model specialized for astronomy, achieving benchmark-topping performance in Q&A tasks, surpassi...
AWED-FiNER introduces an innovative tool for Fine-grained Named Entity Recognition (FgNER) across 36 languages, enhancing NLP capabilitie...
This article examines the impact of AI libraries on open source software (OSS) projects, analyzing their adoption in Python and Java to u...
The paper introduces VeriSoftBench, a benchmark for formal verification in Lean, highlighting its limitations and performance insights fr...
JAX-Privacy is a new library aimed at simplifying the implementation of differentially private machine learning, offering both customizat...