We gave 45 psychological questionnaires to 50 LLMs. What we found was not “personality.”
What is the “personality” of an LLM? What actually differentiates models psychometrically? Since LLMs entered public use, researchers hav...
GPT, Claude, Gemini, and other LLMs
What is the “personality” of an LLM? What actually differentiates models psychometrically? Since LLMs entered public use, researchers hav...
Chrome users were caught off guard by a 4-GB Google AI model baked into Chrome, sparking privacy concerns. The good news: You can easily ...
The company is expanding its efforts to protect ChatGPT users in cases where conversations may turn to self-harm.
Abstract page for arXiv paper 2510.03605: Understanding the Role of Training Data in Test-Time Scaling
Abstract page for arXiv paper 2603.01327: SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resol...
Abstract page for arXiv paper 2603.01326: Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning
Abstract page for arXiv paper 2509.23465: ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Pro...
Abstract page for arXiv paper 2509.23415: From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database ...
Abstract page for arXiv paper 2509.21993: Bilinear representation mitigates reversal curse and enables consistent model editing
Abstract page for arXiv paper 2603.01236: AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in...
Abstract page for arXiv paper 2509.21028: Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles
Abstract page for arXiv paper 2603.01214: Reasoning Boosts Opinion Alignment in LLMs
Abstract page for arXiv paper 2509.12282: AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science
Abstract page for arXiv paper 2603.01213: Can AI Agents Agree?
Abstract page for arXiv paper 2509.03906: Toward Clinically Explainable AI for Medical Diagnosis: A Foundation Model with Human-Compatibl...
Abstract page for arXiv paper 2509.01938: EigenBench: A Comparative Behavioral Measure of Value Alignment
Abstract page for arXiv paper 2508.20729: Re4: Scientific Computing Agent with Rewriting, Resolution, Review and Revision
Abstract page for arXiv paper 2508.15030: Collab-REC: An LLM-based Agentic Framework for Balancing Recommendations in Tourism
Abstract page for arXiv paper 2507.16145: SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validati...
Abstract page for arXiv paper 2506.24119: SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforce...
Abstract page for arXiv paper 2603.01089: CARD: Towards Conditional Design of Multi-agent Topological Structures
Abstract page for arXiv paper 2506.00530: CityLens: Evaluating Large Vision-Language Models for Urban Socioeconomic Sensing
Abstract page for arXiv paper 2505.12565: mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules
Get the latest news, tools, and insights delivered to your inbox.
Daily or weekly digest • Unsubscribe anytime