AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Washington needs AI guardrails — now | Opinion
AI Safety

We need legislation that draws clear lines on what AI systems may and may not do on behalf of the United States government.

AI Tools & Products · 3 min ·
[2601.12910] SciCoQA: Quality Assurance for Scientific Paper-Code Alignment
AI Safety

Abstract page for arXiv paper 2601.12910: SciCoQA: Quality Assurance for Scientific Paper-Code Alignment

arXiv - AI · 3 min ·
[2509.21385] Debugging Concept Bottleneck Models through Removal and Retraining
Machine Learning

Abstract page for arXiv paper 2509.21385: Debugging Concept Bottleneck Models through Removal and Retraining

arXiv - Machine Learning · 4 min ·

All Content

[2603.23780] Lightweight Fairness for LLM-Based Recommendations via Kernelized Projection and Gated Adapters
LLMs

Abstract page for arXiv paper 2603.23780: Lightweight Fairness for LLM-Based Recommendations via Kernelized Projection and Gated Adapters

arXiv - Machine Learning · 3 min ·
Bernie Sanders and AOC propose a ban on data center construction | TechCrunch
AI Safety

Senator Bernie Sanders and Rep. Alexandria Ocasio-Cortez introduced companion legislation to halt construction on new data centers until ...

TechCrunch - AI · 4 min ·
New Bernie Sanders AI Safety Bill Would Halt Data Center Construction | WIRED
AI Safety

The US senator said on Tuesday that a moratorium would give lawmakers time to "ensure that AI is safe." Alexandria Ocasio-Cortez will int...

Wired - AI · 9 min ·
[2602.07023] Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching through Stock-Market Simulation
LLMs

Abstract page for arXiv paper 2602.07023: Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching throug...

arXiv - AI · 4 min ·
[2512.02487] Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
LLMs

Abstract page for arXiv paper 2512.02487: Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Und...

arXiv - AI · 4 min ·
[2511.12449] MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding
LLMs

Abstract page for arXiv paper 2511.12449: MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Un...

arXiv - AI · 4 min ·
[2508.10149] Prediction-Powered Inference with Inverse Probability Weighting
Machine Learning

Abstract page for arXiv paper 2508.10149: Prediction-Powered Inference with Inverse Probability Weighting

arXiv - Machine Learning · 3 min ·
[2309.07250] All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks
Machine Learning

Abstract page for arXiv paper 2309.07250: All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks

arXiv - Machine Learning · 4 min ·
[2502.01969] Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration
LLMs

Abstract page for arXiv paper 2502.01969: Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration

arXiv - AI · 4 min ·
[2603.15033] Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion
Machine Learning

Abstract page for arXiv paper 2603.15033: Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion

arXiv - Machine Learning · 4 min ·
[2603.07990] MJ1: Multimodal Judgment via Grounded Verification
Machine Learning

Abstract page for arXiv paper 2603.07990: MJ1: Multimodal Judgment via Grounded Verification

arXiv - Machine Learning · 3 min ·
[2506.05520] Toward Data Systems That Are Business Semantic Centric and AI Agents Assisted
AI Safety

Abstract page for arXiv paper 2506.05520: Toward Data Systems That Are Business Semantic Centric and AI Agents Assisted

arXiv - AI · 4 min ·
[2512.04165] Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
Machine Learning

Abstract page for arXiv paper 2512.04165: Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity

arXiv - Machine Learning · 4 min ·
[2603.23463] InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting
Machine Learning

Abstract page for arXiv paper 2603.23463: InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting

arXiv - AI · 3 min ·
[2603.23419] Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback
Robotics

Abstract page for arXiv paper 2603.23419: Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback

arXiv - AI · 4 min ·
[2507.00026] RedTopic: Toward Topic-Diverse Red Teaming of Large Language Models
LLMs

Abstract page for arXiv paper 2507.00026: RedTopic: Toward Topic-Diverse Red Teaming of Large Language Models

arXiv - AI · 4 min ·
[2603.23184] ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment
LLMs

Abstract page for arXiv paper 2603.23184: ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment

arXiv - AI · 4 min ·
[2603.23219] Decoding AI Authorship: Can LLMs Truly Mimic Human Style Across Literature and Politics?
LLMs

Abstract page for arXiv paper 2603.23219: Decoding AI Authorship: Can LLMs Truly Mimic Human Style Across Literature and Politics?

arXiv - Machine Learning · 4 min ·
[2603.22920] The EU AI Act and the Rights-based Approach to Technological Governance
AI Safety

Abstract page for arXiv paper 2603.22920: The EU AI Act and the Rights-based Approach to Technological Governance

arXiv - AI · 3 min ·
[2603.22912] From the AI Act to a European AI Agency: Completing the Union's Regulatory Architecture
AI Safety

Abstract page for arXiv paper 2603.22912: From the AI Act to a European AI Agency: Completing the Union's Regulatory Architecture

arXiv - AI · 3 min ·