Trained a Qwen2.5-0.5B-Instruct bf16 model on a Reddit post summarization task with GRPO [P]
So, a few days back I shared a post where I trained a tiny Qwen2.5-0.5B-Instruct model on smoltldr (reddit post summarization dataset of ...
ML algorithms, training, and inference
Meta is working to build an AI version of its CEO Mark Zuckerberg, which he will use to interact with employees, according to a report fr...
We build our inner voices from the voices we're in dialogue with. Vygotsky established this nearly a century ago. For people in sustained...
Abstract page for arXiv paper 2603.24961: Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math
Abstract page for arXiv paper 2603.24943: FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context...
Abstract page for arXiv paper 2603.24933: Decoding Market Emotions in Cryptocurrency Tweets via Predictive Statement Classification with ...
Abstract page for arXiv paper 2603.24929: LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics
Abstract page for arXiv paper 2603.24904: On the Foundations of Trustworthy Artificial Intelligence
Abstract page for arXiv paper 2603.24866: How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical G...
Abstract page for arXiv paper 2603.24853: Resisting Humanization: Ethical Front-End Design Choices in AI for Sensitive Contexts
Abstract page for arXiv paper 2603.24787: ReLope: KL-Regularized LoRA Probes for Multimodal LLM Routing
Abstract page for arXiv paper 2603.24768: Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineeri...
Abstract page for arXiv paper 2603.24747: Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach
Abstract page for arXiv paper 2603.24742: Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour
Abstract page for arXiv paper 2603.24676: When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs
Abstract page for arXiv paper 2603.24621: ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence
I built CodexLib (https://codexlib.io) — a curated repository of 100+ deep knowledge bases in compressed, AI-optimized format. The idea: ...
Most people just type into ChatGPT like it's Google. Claude with a structured system prompt using XML tags behaves like a completely diff...
Hi everyone, I'm a master's student working on anatomy-aware unsupervised anomaly detection in chest X-rays. My thesis uses ADAM v2 (Auto...
Expert Data Science Consultant | 20-Year Track Record As a seasoned data scientist and fractional leader, I excel at tackling complex pro...
Wrote up the process of pushing Qwen 3.5 27B (dense, FP8) to 1.1M total tok/s on 96 B200 GPUs with vLLM v0.18.0. DP=8 nearly 4x'd through...
Energy-based model: This article compares EBMs to multilayer perceptrons and addresses a lingering question: whether or not EBMs ...
Quick insight from building retrieval infrastructure for AI agents: Most agents stuff 50,000 tokens of context into every prompt. They re...