[2602.07238] Is there "Secret Sauce" in Large Language Model Development?
Abstract page for arXiv paper 2602.07238: Is there "Secret Sauce" in Large Language Model Development?
GPT, Claude, Gemini, and other LLMs
arXiv 2602.01203: Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse
arXiv 2601.01322: LinMU: Multimodal Understanding Made Linear
arXiv 2505.15504: Exploiting Low-Dimensional Manifold of Features for Few-Shot Whole Slide Image Classification
arXiv 2505.13109: FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
arXiv 2505.12186: Self-Destructive Language Model
arXiv 2502.01481: Intrinsic Entropy of Context Length Scaling in LLMs
arXiv 2505.02881: Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
arXiv 2505.02872: Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
arXiv 2504.02010: When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reason...
arXiv 2503.12988: ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM
arXiv 2503.21735: GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics
arXiv 2503.06749: Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
arXiv 2503.06238: Token-Efficient Item Representation via Images for LLM Recommender Systems
arXiv 2404.08480: Using ChatGPT for Data Science Analyses
arXiv 2503.03862: Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Mode...
arXiv 2503.02879: Wikipedia in the Era of LLMs: Evolution and Risks
arXiv 2502.12179: Sparse Shift Autoencoders for Identifying Concepts from Large Language Model Activations
arXiv 2502.04326: WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
arXiv 2412.19496: Multi-PA: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models
arXiv 2411.03292: Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive ...
arXiv 2410.13648: SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
arXiv 2410.05254: GLEE: A Unified Framework and Benchmark for Language-based Economic Environments