AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

The state of AI safety in four fake graphs

Reddit - Artificial Intelligence · 1 min

[2603.14267] DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization
Machine Learning

arXiv - AI · 4 min

[2601.22440] AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations
LLMs

arXiv - AI · 4 min

All Content

[2503.15477] What Makes a Reward Model a Good Teacher? An Optimization Perspective
Machine Learning

arXiv - Machine Learning · 4 min

[2601.22664] Real-Time Aligned Reward Model beyond Semantics
LLMs

arXiv - AI · 4 min

[2510.10285] Reallocating Attention Across Layers to Reduce Multimodal Hallucination
Machine Learning

arXiv - AI · 3 min

[2509.24159] RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment
LLMs

arXiv - AI · 4 min

[2507.19364] Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges
LLMs

arXiv - AI · 4 min

[2602.23518] Uncovering Physical Drivers of Dark Matter Halo Structures with Auxiliary-Variable-Guided Generative Models
Machine Learning

arXiv - Machine Learning · 4 min

[2602.24245] Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text
Machine Learning

arXiv - Machine Learning · 3 min

[2602.24014] Interpretable Debiasing of Vision-Language Models for Social Fairness
LLMs

arXiv - AI · 3 min

[2602.23971] Ask don't tell: Reducing sycophancy in large language models
LLMs

arXiv - AI · 4 min

[2602.23887] Uncovering sustainable personal care ingredient combinations using scientific modelling
Machine Learning

arXiv - AI · 3 min

[2602.23947] Hierarchical Concept-based Interpretable Models
Machine Learning

arXiv - Machine Learning · 3 min

[2602.23652] 3D Modality-Aware Pre-training for Vision-Language Model in MRI Multi-organ Abnormality Detection
LLMs

arXiv - AI · 3 min

[2602.23638] FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA
LLMs

arXiv - Machine Learning · 4 min

[2602.23636] FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation
LLMs

arXiv - Machine Learning · 4 min

[2602.23588] Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning
LLMs

arXiv - Machine Learning · 4 min

[2602.23580] BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation
LLMs

arXiv - AI · 4 min

[2602.23507] Sample Size Calculations for Developing Clinical Prediction Models: Overview and pmsims R package
Machine Learning

arXiv - Machine Learning · 4 min

[2602.23447] SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
Machine Learning

arXiv - Machine Learning · 4 min

[2602.23378] Now You See Me: Designing Responsible AI Dashboards for Early-Stage Health Innovation
AI Safety

arXiv - AI · 4 min

[2602.23605] SleepLM: Natural-Language Intelligence for Human Sleep
LLMs

arXiv - AI · 3 min