[P] Trained a small BERT on 276K Kubernetes YAMLs using tree positional encoding instead of sequential
I trained a BERT-style transformer on 276K Kubernetes YAML files, replacing standard positional encoding with learned tree coordinates (d...
Text understanding and language tasks
Hi guys, I'm a PhD student in Applied AI and I've been building an embeddable graph database engine from scratch in Rust. I'd love feedba...
I keep seeing people recommend ChatGPT for financial modeling and I need to push back because I spent a month testing it for multifamily ...
The paper presents Taesar, a data-centric framework designed to enhance recommendation model performance by addressing data sparsity and ...
This survey paper explores the development of personalized LLM-powered agents, focusing on their foundations, evaluation metrics, and fut...
The paper presents a two-stage framework for enhancing large reasoning models (LRMs) by addressing overthinking in low-complexity queries...
MobilityBench introduces a benchmark for evaluating LLM-based route-planning agents, addressing challenges in real-world mobility scenari...
This paper presents a novel framework for aligning safety measures in multilingual large language models (LLMs) through Sparse Weight Edi...
The paper presents SideQuest, a novel model-driven approach for managing KV cache in long-horizon reasoning tasks, achieving significant ...
This paper explores the concept of strategy executability in mathematical reasoning, highlighting the differences between human and model...
CourtGuard introduces a model-agnostic framework for zero-shot policy adaptation in LLM safety, enhancing adaptability and performance wi...
The paper presents TEFL, a novel framework for multi-horizon time series forecasting that utilizes prediction residuals to enhance accura...
The paper presents Metacognitive Behavioral Tuning (MBT), a framework designed to enhance large reasoning models by incorporating human-l...
This article reviews the integration of AI into life cycle assessment (LCA), highlighting trends, themes, and future directions using lar...
This paper presents TRC², a novel architecture for continual learning in language models that mitigates catastrophic forgetting while mai...
This paper analyzes latent reasoning methods under varying supervision levels, revealing key issues like shortcut behavior and the trade-...
The paper proposes autonomous memory agents that enhance LLMs by actively acquiring and curating knowledge, improving performance on benc...
The paper presents Agent Behavioral Contracts (ABC), a framework for specifying and enforcing the behavior of autonomous AI agents, addre...
This article presents a framework for Multi-Level Causal Embeddings, which allows for the mapping of detailed causal models into coarser ...
This paper explores the reliability and efficiency of large language models (LLMs) using Random Matrix Theory. It introduces EigenTrack f...
This paper introduces GYWI, a system that enhances scientific idea generation by integrating co-author knowledge graphs with retrieval-au...
OmniZip introduces a unified and lightweight lossless compressor designed for multi-modal data, enhancing compression efficiency across v...
The paper presents AutoQRA, a framework that optimizes mixed-precision quantization and low-rank adapters for efficient fine-tuning of la...