AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

The state of AI safety in four fake graphs

Reddit - Artificial Intelligence · 1 min

[2603.14267] DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization
Machine Learning

arXiv - AI · 4 min

[2601.22440] AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations
LLMs

arXiv - AI · 4 min

All Content

[2503.15477] What Makes a Reward Model a Good Teacher? An Optimization Perspective
Machine Learning

arXiv - Machine Learning · 4 min

[2601.22664] Real-Time Aligned Reward Model beyond Semantics
LLMs

arXiv - AI · 4 min

[2510.10285] Reallocating Attention Across Layers to Reduce Multimodal Hallucination
Machine Learning

arXiv - AI · 3 min

[2509.24159] RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment
LLMs

arXiv - AI · 4 min

[2507.19364] Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges
LLMs

arXiv - AI · 4 min

[2602.23518] Uncovering Physical Drivers of Dark Matter Halo Structures with Auxiliary-Variable-Guided Generative Models
Machine Learning

arXiv - Machine Learning · 4 min

[2602.24245] Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text
Machine Learning

arXiv - Machine Learning · 3 min

[2602.24014] Interpretable Debiasing of Vision-Language Models for Social Fairness
LLMs

arXiv - AI · 3 min

[2602.23971] Ask don't tell: Reducing sycophancy in large language models
LLMs

arXiv - AI · 4 min

[2602.23887] Uncovering sustainable personal care ingredient combinations using scientific modelling
Machine Learning

arXiv - AI · 3 min

[2602.23947] Hierarchical Concept-based Interpretable Models
Machine Learning

arXiv - Machine Learning · 3 min

[2602.23652] 3D Modality-Aware Pre-training for Vision-Language Model in MRI Multi-organ Abnormality Detection
LLMs

arXiv - AI · 3 min

[2602.23638] FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA
LLMs

arXiv - Machine Learning · 4 min

[2602.23636] FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation
LLMs

arXiv - Machine Learning · 4 min

[2602.23588] Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning
LLMs

arXiv - Machine Learning · 4 min

[2602.23580] BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation
LLMs

arXiv - AI · 4 min

[2602.23507] Sample Size Calculations for Developing Clinical Prediction Models: Overview and pmsims R package
Machine Learning

arXiv - Machine Learning · 4 min

[2602.23447] SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
Machine Learning

arXiv - Machine Learning · 4 min

[2602.23378] Now You See Me: Designing Responsible AI Dashboards for Early-Stage Health Innovation
AI Safety

arXiv - AI · 4 min

[2602.23605] SleepLM: Natural-Language Intelligence for Human Sleep
LLMs

arXiv - AI · 3 min