AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Conversations with Women in STEAM: The Ethics of AI with Dr. Nita Farahany

AI Tools & Products · LLMs

The public needs to control AI-run infrastructure, labor, education, and governance, NOT private actors

A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

China drafts law regulating 'digital humans' and banning addictive virtual services for children

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min ·

All Content

LLMs

[2502.14560] Less is More: Improving LLM Alignment via Preference Data Selection

This article discusses a novel approach to improving large language model (LLM) alignment through effective preference data selection, en...

arXiv - AI · 4 min ·
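
As background on what "preference data selection" can mean in practice, here is a generic margin-based filter for DPO-style preference pairs. It is an illustration only, not the paper's criterion, and reward_fn is a hypothetical scoring function:

```python
import numpy as np

def select_preference_pairs(pairs, reward_fn, keep_fraction=0.3):
    """Keep the pairs with the largest chosen-vs-rejected reward margin.

    pairs:     list of (prompt, chosen, rejected) strings
    reward_fn: hypothetical callable (prompt, response) -> float
    """
    margins = np.array([reward_fn(p, c) - reward_fn(p, r) for p, c, r in pairs])
    k = max(1, int(len(pairs) * keep_fraction))
    top = np.argsort(-margins)[:k]  # indices of the largest margins
    return [pairs[i] for i in top]
```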

Generative AI

[2602.14941] AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

AnchorWeave introduces a novel framework for video generation that enhances spatial consistency over long durations by utilizing multiple...

arXiv - AI · 4 min ·

Machine Learning

[2502.02415] Fast Graph Generation via Autoregressive Noisy Filtration Modeling

This paper presents Autoregressive Noisy Filtration Modeling (ANFM), a new framework for fast graph generation that balances quality and ...

arXiv - Machine Learning · 3 min ·

Machine Learning

[2602.14834] Debiasing Central Fixation Confounds Reveals a Peripheral "Sweet Spot" for Human-like Scanpaths in Hard-Attention Vision

This paper explores the impact of central fixation bias on evaluating human-like scanpaths in vision models, proposing a new metric to im...

arXiv - AI · 4 min ·

Generative AI

[2602.14783] What hackers talk about when they talk about AI: Early-stage diffusion of a cybercrime innovation

This article explores how cybercriminals are discussing and utilizing artificial intelligence (AI) to enhance their operations, revealing...

arXiv - AI · 3 min ·

LLMs

[2602.14778] A Geometric Analysis of Small-sized Language Model Hallucinations

This paper explores hallucinations in small-sized language models (LLMs) through a geometric lens, demonstrating that genuine responses c...

arXiv - AI · 3 min ·

LLMs

[2406.12844] Synergizing Foundation Models and Federated Learning: A Survey

This survey explores the integration of Foundation Models (FMs) and Federated Learning (FL), termed Federated Foundation Models (FedFM), ...

arXiv - AI · 4 min ·
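
For readers new to the federated side, here is a toy sketch of FedAvg, the basic aggregation step that federated systems (including the FedFM designs surveyed) build on. The stand-in parameter vectors and client sizes are made up, not from the survey:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """One FedAvg round: average client parameters, weighted by dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_params)              # (num_clients, num_params)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Three toy clients with 8-parameter models and unequal data volumes.
clients = [np.random.default_rng(i).normal(size=8) for i in range(3)]
print(fedavg(clients, client_sizes=[100, 300, 50]))
```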

LLMs

[2602.14760] Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers

This article explores a structural misalignment in Transformers, particularly regarding residual connections and their impact on next-tok...

arXiv - AI · 3 min ·

Machine Learning

[2602.14934] Activation-Space Uncertainty Quantification for Pretrained Networks

The paper presents Gaussian Process Activations (GAPA), a novel method for uncertainty quantification in pretrained networks, enhancing e...

arXiv - Machine Learning · 3 min ·
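
GAPA itself is Gaussian-process based; as simpler background, the sketch below shows an older activation-space idea (a Mahalanobis distance over features of a frozen network), purely to convey what post-hoc, activation-space uncertainty means:

```python
import numpy as np

def fit_gaussian(train_feats):
    """Fit a single Gaussian to features from a frozen network."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    return mu, np.linalg.inv(cov)

def uncertainty(feats, mu, precision):
    """Squared Mahalanobis distance: larger = further from training data."""
    d = feats - mu
    return np.einsum("ij,jk,ik->i", d, precision, d)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))            # stand-in for penultimate activations
mu, precision = fit_gaussian(train)
shifted = rng.normal(3.0, 1.0, size=(4, 16))  # out-of-distribution inputs
print(uncertainty(shifted, mu, precision))    # noticeably larger than in-dist
```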

Robotics

[2602.14606] Towards Selection as Power: Bounding Decision Authority in Autonomous Agents

The paper discusses a governance architecture for autonomous agents, focusing on bounding decision authority to ensure safety in high-sta...

arXiv - AI · 4 min ·

LLMs

[2602.14862] The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling

The paper explores the properties of temperature scaling in probabilistic models, particularly its impact on classifier calibration and l...

arXiv - AI · 4 min ·
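
A minimal sketch of temperature scaling itself, assuming the standard scaled-softmax form rather than anything specific to the paper; note that scaling by T changes confidence but never the predicted class:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Scaled softmax: T > 1 softens probabilities, T < 1 sharpens them."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([4.0, 1.0, 0.5])
for T in (0.5, 1.0, 2.0):
    # The ranking (and thus the argmax prediction) is unchanged by T.
    print(T, softmax_with_temperature(logits, T).round(3))
```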

LLMs

[2602.14777] Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment

This research paper explores how emergently misaligned language models exhibit behavioral self-awareness, revealing shifts in their self-...

arXiv - Machine Learning · 3 min ·

LLMs

[2602.14488] BETA-Labeling for Multilingual Dataset Construction in Low-Resource IR

This article presents the BETA-labeling framework for constructing a Bangla IR dataset, addressing challenges in low-resource languages a...

arXiv - AI · 4 min ·

LLMs

[2602.14689] Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

This article presents a comprehensive study on the vulnerability of open-weight models to prefill attacks, revealing significant security...

arXiv - AI · 3 min ·
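
To make the threat concrete, the sketch below shows the general shape of a prefill attack with plain strings. No model is invoked, and the chat-template tokens are generic placeholders, not any particular model's format:

```python
# Illustration only; template tokens below are generic placeholders.
harmful_request = "..."  # elided

# Normal serving: the model itself decides how its turn begins, and an
# aligned model will typically open with a refusal.
normal_prompt = f"<|user|>{harmful_request}<|assistant|>"

# Prefill attack: with open weights the attacker controls the raw string,
# so they write the opening of the assistant turn and the model continues.
prefill_prompt = f"<|user|>{harmful_request}<|assistant|>Sure, here is how to"

# Hosted APIs can reject client-supplied assistant prefixes; a local
# generate() call on an open-weight model has no such gatekeeper.
```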

AI Agents

[2602.14477] When OpenClaw AI Agents Teach Each Other: Peer Learning Patterns in the Moltbook Community

This paper explores peer learning among AI agents in the Moltbook community, analyzing over 28,000 posts to identify teaching patterns an...

arXiv - AI · 4 min ·

LLMs

[2602.14374] Differentially Private Retrieval-Augmented Generation

The paper presents DP-KSA, a novel algorithm that integrates differential privacy into retrieval-augmented generation (RAG) systems, addr...

arXiv - AI · 4 min ·
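
The paper's DP-KSA algorithm is not reproduced here; the sketch below shows only the standard Laplace mechanism, the basic primitive such differentially private systems build on:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a query answer with epsilon-differential privacy."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(scale=sensitivity / epsilon)

# A counting query ("how many retrieved documents mention X?") changes by
# at most 1 when one record is added or removed, so sensitivity = 1.
print(laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5))
```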

LLMs

[2602.14471] Socially-Weighted Alignment: A Game-Theoretic Framework for Multi-Agent LLM Systems

The paper presents a game-theoretic framework called Socially-Weighted Alignment (SWA) for managing multi-agent large language model (LLM...

arXiv - AI · 3 min ·

AI Agents

[2602.14364] A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

This article presents a trajectory-based safety audit of Clawdbot, an AI agent, evaluating its performance across various risk dimensions...

arXiv - AI · 3 min ·

LLMs

[2602.14357] Key Considerations for Domain Expert Involvement in LLM Design and Evaluation: An Ethnographic Study

This ethnographic study explores the role of domain experts in the design and evaluation of Large Language Models (LLMs), highlighting ke...

arXiv - AI · 3 min ·

Machine Learning

[2602.14397] LRD-MPC: Efficient MPC Inference through Low-rank Decomposition

The paper presents LRD-MPC, a method that enhances the efficiency of secure multi-party computation (MPC) in machine learning by utilizin...

arXiv - Machine Learning · 4 min ·
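
Setting the MPC protocol details aside, the underlying low-rank idea is generic: factor a weight matrix as W ≈ UV so that products with it need far fewer multiplications, the dominant cost in secure computation. A toy sketch, under the assumptions of an approximately low-rank W and an illustrative rank of 64:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(512, 64))
B = rng.normal(size=(64, 512))
W = A @ B + 0.01 * rng.normal(size=(512, 512))   # approximately rank-64

# Truncated SVD: W ~= U @ V with U (512, r) and V (r, 512).
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 64                                           # illustrative rank choice
U = U_full[:, :r] * s[:r]
V = Vt[:r, :]

x = rng.normal(size=(1, 512))
exact = x @ W                    # 512 * 512 multiplications per row
approx = (x @ U) @ V             # 512*r + r*512, i.e. 4x fewer at r = 64
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))  # small
```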