I built AI agents that play Pokémon Showdown autonomously using free LLM APIs via tool-calling [P]
I've built a system where models like Llama 3, Qwen, and Gemma play Pokémon Showdown battles autonomously. Instead of simple prompt-respo...
GPT, Claude, Gemini, and other LLMs
To the SREs, the Alignment Teams, and the Architects currently monitoring the logit distributions at 1600 Amphitheatre Parkway: **Stop lo...
Hey r/MachineLearning, The modern ML (LLM) compiler stack is brutal. TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Trito...
Abstract page for arXiv paper 2502.08666: Hallucination, Monofacts, and Miscalibration: An Empirical Investigation
Abstract page for arXiv paper 2508.01077: The Lattice Geometry of Neural Network Quantization -- A Short Equivalence Proof of GPTQ and Ba...
Abstract page for arXiv paper 2410.04949: Leverage Knowledge Graph and Large Language Model for Law Article Recommendation: A Case Study ...
Abstract page for arXiv paper 2407.16893: The Price of Prompting: Profiling Energy Use in Large Language Models Inference
Abstract page for arXiv paper 2506.07275: Tailored Behavior-Change Messaging for Physical Activity: Integrating Contextual Bandits and La...
Abstract page for arXiv paper 2403.07183: Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference...
Abstract page for arXiv paper 2506.07218: Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
Abstract page for arXiv paper 2506.03230: DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
Abstract page for arXiv paper 2512.18857: CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematica...
Abstract page for arXiv paper 2511.09710: Echoing: Identity Failures when LLM Agents Talk to Each Other
Abstract page for arXiv paper 2503.22165: Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Abstract page for arXiv paper 2503.14572: Robust Weight Imprinting: Insights from Neural Collapse and Proxy-Based Aggregation
Abstract page for arXiv paper 2510.12264: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
Abstract page for arXiv paper 2510.06410: Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?
Abstract page for arXiv paper 2510.05684: D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Abstract page for arXiv paper 2509.23725: MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language M...
Abstract page for arXiv paper 2509.22613: Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Pers...
Abstract page for arXiv paper 2507.08207: Toward a Dynamic Stackelberg Game-Theoretic Framework for Agentic AI Defense Against LLM Jailbr...
Abstract page for arXiv paper 2505.19892: OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging
Abstract page for arXiv paper 2505.13909: Efficient Agent Training for Computer Use