Agents Can Now Propose and Deploy Their Own Code Changes
150 clones yesterday. 43 stars in 3 days. Every agent framework you've used (LangChain, LangGraph, Claude Code) assumes agents are tools ...
Text understanding and language tasks
Abstract page for arXiv paper 2603.17839: How do LLMs Compute Verbal Confidence
Abstract page for arXiv paper 2602.03584: $V_0$: A Generalist Value Model for Any Policy at State Zero
Abstract page for arXiv paper 2603.01834: Probing Materials Knowledge in LLMs: From Latent Embeddings to Reliable Predictions
Abstract page for arXiv paper 2603.01824: OpenAutoNLU: Open Source AutoML Library for NLU
Abstract page for arXiv paper 2603.01719: Co-optimization for Adaptive Conformal Prediction
Abstract page for arXiv paper 2603.01710: Legal RAG Bench: an end-to-end benchmark for legal RAG
Abstract page for arXiv paper 2601.06502: DRAGON: LLM-Driven Decomposition and Reconstruction Agents for Large-Scale Combinatorial Optimi...
Abstract page for arXiv paper 2603.01691: Building a Strong Instruction Language Model for a Less-Resourced Language
Abstract page for arXiv paper 2603.01590: IDProxy: Cold-Start CTR Prediction for Ads and Recommendation at Xiaohongshu with Multimodal LLMs
Abstract page for arXiv paper 2512.01351: Benchmarking Overton Pluralism in LLMs
Abstract page for arXiv paper 2603.01471: Reconstructing Content via Collaborative Attention to Improve Multimodal Embedding Quality
Abstract page for arXiv paper 2510.26144: The FM Agent
Abstract page for arXiv paper 2603.01448: SEAnet: A Deep Learning Architecture for Data Series Similarity Search
Abstract page for arXiv paper 2510.16234: ScholarEval: Research Idea Evaluation Grounded in Literature
Abstract page for arXiv paper 2509.24156: Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Abstract page for arXiv paper 2509.23589: BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
Abstract page for arXiv paper 2509.23465: ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Pro...
Abstract page for arXiv paper 2509.21028: Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles
Abstract page for arXiv paper 2509.12282: AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science
Abstract page for arXiv paper 2506.06905: Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Abstract page for arXiv paper 2603.00945: Non-Rectangular Average-Reward Robust MDPs: Optimal ...
Abstract page for arXiv paper 2503.11832: Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated ...