AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Conversations with Women in STEAM: The Ethics of AI with Dr. Nita Farahany

AI Tools & Products · LLMs

The public needs to control AI-run infrastructure, labor, education, and governance, NOT private actors

A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

China drafts law regulating 'digital humans' and banning addictive virtual services for children

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min ·

All Content

LLMs

[2502.14560] Less is More: Improving LLM Alignment via Preference Data Selection

This article discusses a novel approach to improving large language model (LLM) alignment through effective preference data selection, en...

arXiv - AI · 4 min ·
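
As background on what "preference data selection" can mean in practice, here is a generic margin-based filter for DPO-style preference pairs. It is an illustration only, not the paper's criterion, and reward_fn is a hypothetical scoring function:

```python
import numpy as np

def select_preference_pairs(pairs, reward_fn, keep_fraction=0.3):
    """Keep the pairs with the largest chosen-vs-rejected reward margin.

    pairs:     list of (prompt, chosen, rejected) strings
    reward_fn: hypothetical callable (prompt, response) -> float
    """
    margins = np.array([reward_fn(p, c) - reward_fn(p, r) for p, c, r in pairs])
    k = max(1, int(len(pairs) * keep_fraction))
    top = np.argsort(-margins)[:k]  # indices of the largest margins
    return [pairs[i] for i in top]
```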

Generative AI

[2602.14941] AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

AnchorWeave introduces a novel framework for video generation that enhances spatial consistency over long durations by utilizing multiple...

arXiv - AI · 4 min ·

Machine Learning

[2502.02415] Fast Graph Generation via Autoregressive Noisy Filtration Modeling

This paper presents Autoregressive Noisy Filtration Modeling (ANFM), a new framework for fast graph generation that balances quality and ...

arXiv - Machine Learning · 3 min ·

Machine Learning

[2602.14834] Debiasing Central Fixation Confounds Reveals a Peripheral "Sweet Spot" for Human-like Scanpaths in Hard-Attention Vision

This paper explores the impact of central fixation bias on evaluating human-like scanpaths in vision models, proposing a new metric to im...

arXiv - AI · 4 min ·

Generative AI

[2602.14783] What hackers talk about when they talk about AI: Early-stage diffusion of a cybercrime innovation

This article explores how cybercriminals are discussing and utilizing artificial intelligence (AI) to enhance their operations, revealing...

arXiv - AI · 3 min ·

LLMs

[2602.14778] A Geometric Analysis of Small-sized Language Model Hallucinations

This paper explores hallucinations in small-sized language models (LLMs) through a geometric lens, demonstrating that genuine responses c...

arXiv - AI · 3 min ·

LLMs

[2406.12844] Synergizing Foundation Models and Federated Learning: A Survey

This survey explores the integration of Foundation Models (FMs) and Federated Learning (FL), termed Federated Foundation Models (FedFM), ...

arXiv - AI · 4 min ·
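
For readers new to the federated side, here is a toy sketch of FedAvg, the basic aggregation step that federated systems (including the FedFM designs surveyed) build on. The stand-in parameter vectors and client sizes are made up, not from the survey:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """One FedAvg round: average client parameters, weighted by dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_params)              # (num_clients, num_params)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Three toy clients with 8-parameter models and unequal data volumes.
clients = [np.random.default_rng(i).normal(size=8) for i in range(3)]
print(fedavg(clients, client_sizes=[100, 300, 50]))
```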

LLMs

[2602.14760] Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers

This article explores a structural misalignment in Transformers, particularly regarding residual connections and their impact on next-tok...

arXiv - AI · 3 min ·

Machine Learning

[2602.14934] Activation-Space Uncertainty Quantification for Pretrained Networks

The paper presents Gaussian Process Activations (GAPA), a novel method for uncertainty quantification in pretrained networks, enhancing e...

arXiv - Machine Learning · 3 min ·
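
GAPA itself is Gaussian-process based; as simpler background, the sketch below shows an older activation-space idea (a Mahalanobis distance over features of a frozen network), purely to convey what post-hoc, activation-space uncertainty means:

```python
import numpy as np

def fit_gaussian(train_feats):
    """Fit a single Gaussian to features from a frozen network."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    return mu, np.linalg.inv(cov)

def uncertainty(feats, mu, precision):
    """Squared Mahalanobis distance: larger = further from training data."""
    d = feats - mu
    return np.einsum("ij,jk,ik->i", d, precision, d)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))            # stand-in for penultimate activations
mu, precision = fit_gaussian(train)
shifted = rng.normal(3.0, 1.0, size=(4, 16))  # out-of-distribution inputs
print(uncertainty(shifted, mu, precision))    # noticeably larger than in-dist
```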

Robotics

[2602.14606] Towards Selection as Power: Bounding Decision Authority in Autonomous Agents

The paper discusses a governance architecture for autonomous agents, focusing on bounding decision authority to ensure safety in high-sta...

arXiv - AI · 4 min ·

LLMs

[2602.14862] The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling

The paper explores the properties of temperature scaling in probabilistic models, particularly its impact on classifier calibration and l...

arXiv - AI · 4 min ·
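
A minimal sketch of temperature scaling itself, assuming the standard scaled-softmax form rather than anything specific to the paper; note that scaling by T changes confidence but never the predicted class:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Scaled softmax: T > 1 softens probabilities, T < 1 sharpens them."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([4.0, 1.0, 0.5])
for T in (0.5, 1.0, 2.0):
    # The ranking (and thus the argmax prediction) is unchanged by T.
    print(T, softmax_with_temperature(logits, T).round(3))
```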

LLMs

[2602.14777] Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment

This research paper explores how emergently misaligned language models exhibit behavioral self-awareness, revealing shifts in their self-...

arXiv - Machine Learning · 3 min ·

LLMs

[2602.14488] BETA-Labeling for Multilingual Dataset Construction in Low-Resource IR

This article presents the BETA-labeling framework for constructing a Bangla IR dataset, addressing challenges in low-resource languages a...

arXiv - AI · 4 min ·

LLMs

[2602.14689] Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

This article presents a comprehensive study on the vulnerability of open-weight models to prefill attacks, revealing significant security...

arXiv - AI · 3 min ·
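
To make the threat concrete, the sketch below shows the general shape of a prefill attack with plain strings. No model is invoked, and the chat-template tokens are generic placeholders, not any particular model's format:

```python
# Illustration only; template tokens below are generic placeholders.
harmful_request = "..."  # elided

# Normal serving: the model itself decides how its turn begins, and an
# aligned model will typically open with a refusal.
normal_prompt = f"<|user|>{harmful_request}<|assistant|>"

# Prefill attack: with open weights the attacker controls the raw string,
# so they write the opening of the assistant turn and the model continues.
prefill_prompt = f"<|user|>{harmful_request}<|assistant|>Sure, here is how to"

# Hosted APIs can reject client-supplied assistant prefixes; a local
# generate() call on an open-weight model has no such gatekeeper.
```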

AI Agents

[2602.14477] When OpenClaw AI Agents Teach Each Other: Peer Learning Patterns in the Moltbook Community

This paper explores peer learning among AI agents in the Moltbook community, analyzing over 28,000 posts to identify teaching patterns an...

arXiv - AI · 4 min ·

LLMs

[2602.14374] Differentially Private Retrieval-Augmented Generation

The paper presents DP-KSA, a novel algorithm that integrates differential privacy into retrieval-augmented generation (RAG) systems, addr...

arXiv - AI · 4 min ·
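
The paper's DP-KSA algorithm is not reproduced here; the sketch below shows only the standard Laplace mechanism, the basic primitive such differentially private systems build on:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a query answer with epsilon-differential privacy."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(scale=sensitivity / epsilon)

# A counting query ("how many retrieved documents mention X?") changes by
# at most 1 when one record is added or removed, so sensitivity = 1.
print(laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5))
```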

LLMs

[2602.14471] Socially-Weighted Alignment: A Game-Theoretic Framework for Multi-Agent LLM Systems

The paper presents a game-theoretic framework called Socially-Weighted Alignment (SWA) for managing multi-agent large language model (LLM...

arXiv - AI · 3 min ·

AI Agents

[2602.14364] A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

This article presents a trajectory-based safety audit of Clawdbot, an AI agent, evaluating its performance across various risk dimensions...

arXiv - AI · 3 min ·

LLMs

[2602.14357] Key Considerations for Domain Expert Involvement in LLM Design and Evaluation: An Ethnographic Study

This ethnographic study explores the role of domain experts in the design and evaluation of Large Language Models (LLMs), highlighting ke...

arXiv - AI · 3 min ·

Machine Learning

[2602.14397] LRD-MPC: Efficient MPC Inference through Low-rank Decomposition

The paper presents LRD-MPC, a method that enhances the efficiency of secure multi-party computation (MPC) in machine learning by utilizin...

arXiv - Machine Learning · 4 min ·
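
Setting the MPC protocol details aside, the underlying low-rank idea is generic: factor a weight matrix as W ≈ UV so that products with it need far fewer multiplications, the dominant cost in secure computation. A toy sketch, under the assumptions of an approximately low-rank W and an illustrative rank of 64:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(512, 64))
B = rng.normal(size=(64, 512))
W = A @ B + 0.01 * rng.normal(size=(512, 512))   # approximately rank-64

# Truncated SVD: W ~= U @ V with U (512, r) and V (r, 512).
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 64                                           # illustrative rank choice
U = U_full[:, :r] * s[:r]
V = Vt[:r, :]

x = rng.normal(size=(1, 512))
exact = x @ W                    # 512 * 512 multiplications per row
approx = (x @ U) @ V             # 512*r + r*512, i.e. 4x fewer at r = 64
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))  # small
```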