AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Secure governance accelerates financial AI revenue growth
AI Safety

Financial institutions are learning to deploy compliant AI solutions for greater revenue growth and market advantage.

AI News - General · 13 min ·
When Agentic AI Browsers Outrun Governance
AI Safety

Agentic AI browsers introduce new enterprise risk. Learn how AI governance helps leaders assess exposure, oversight gaps, and safe adopti...

AI Tools & Products · 14 min ·
LLMs

Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment

Greetings all - I've posted mostly in r/claudecode and r/aigamedev a couple of times previously. Working with CC for personal projects re...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2506.12108] A Lightweight IDS for Early APT Detection Using a Novel Feature Selection Method
Machine Learning

This article presents a novel feature selection method for a lightweight intrusion detection system (IDS) aimed at early detection of Adv...

arXiv - AI · 4 min ·
[2510.01031] Secure and reversible face anonymization with diffusion models
Machine Learning

This paper presents a novel framework for secure and reversible face anonymization using diffusion models, addressing challenges in image...

arXiv - Machine Learning · 4 min ·
[2509.25369] Generative Value Conflicts Reveal LLM Priorities
LLMs

This paper introduces ConflictScope, a tool for evaluating how large language models (LLMs) prioritize conflicting values, revealing insi...

arXiv - Machine Learning · 4 min ·
[2504.12522] Evaluating the Diversity and Quality of LLM Generated Content
LLMs

This article evaluates the diversity and quality of content generated by large language models (LLMs), highlighting the trade-offs betwee...

arXiv - AI · 4 min ·
[2411.08254] Toward Automated Validation of Language Model Synthesized Test Cases using Semantic Entropy
LLMs

The paper presents VALTEST, a framework for validating test cases generated by large language models (LLMs) using semantic entropy, impro...

arXiv - AI · 4 min ·
[2502.05435] Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning
Machine Learning

This paper presents the Unbiased Sliced Wasserstein RBF kernel, a novel approach for enhancing audio captioning systems by addressing exp...

arXiv - Machine Learning · 4 min ·
[2502.04758] Differential Privacy of Quantum and Quantum-Inspired Classical Recommendation Algorithms
Machine Learning

The paper explores the differential privacy of quantum and quantum-inspired classical recommendation algorithms, demonstrating their inhe...

arXiv - Machine Learning · 3 min ·
[2601.11620] A Mind Cannot Be Smeared Across Time
AI Safety

The paper explores the relationship between consciousness and computational processes in machines, arguing that the timing of computation...

arXiv - AI · 4 min ·
[2510.00922] On Discovering Algorithms for Adversarial Imitation Learning
AI Agents

This paper presents Discovered Adversarial Imitation Learning (DAIL), a novel approach to improving stability in Adversarial Imitation Le...

arXiv - AI · 4 min ·
[2509.17956] "I think this is fair": Uncovering the Complexities of Stakeholder Decision-Making in AI Fairness Assessment
AI Safety

This article explores how non-expert stakeholders assess fairness in AI decision-making, revealing complexities that extend beyond tradit...

arXiv - AI · 4 min ·
[2602.08470] Learning Credal Ensembles via Distributionally Robust Optimization
Machine Learning

This paper presents CreDRO, a novel approach to learning credal ensembles using distributionally robust optimization, enhancing model rob...

arXiv - Machine Learning · 4 min ·
[2602.05535] Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
LLMs

This paper presents Evidential Uncertainty Quantification (EUQ) to detect misbehaviors in large vision-language models (LVLMs), addressin...

arXiv - Machine Learning · 4 min ·
[2601.18231] Rethinking Cross-Modal Fine-Tuning: Optimizing the Interaction between Feature Alignment and Target Fitting
Machine Learning

This paper presents a framework for optimizing cross-modal fine-tuning by addressing the interaction between feature alignment and target...

arXiv - Machine Learning · 4 min ·
[2512.05865] Sparse Attention Post-Training for Mechanistic Interpretability
Machine Learning

The paper presents a novel post-training method that enhances transformer attention sparsity while maintaining performance, revealing ins...

arXiv - Machine Learning · 4 min ·
[2509.26238] Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
LLMs

This paper presents Truncated Polynomial Classifiers (TPCs) for dynamic safety monitoring in large language models, enhancing efficiency ...

arXiv - Machine Learning · 4 min ·
[2602.23259] Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving
Machine Learning

This paper presents the Risk-aware World Model Predictive Control (RaWMPC) framework aimed at enhancing generalization in end-to-end auto...

arXiv - AI · 4 min ·
[2602.23235] Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
AI Safety

The paper presents GUIPruner, a framework for enhancing the efficiency of high-resolution GUI agents by addressing spatiotemporal redunda...

arXiv - AI · 4 min ·
[2509.15429] Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data
AI Safety

This paper presents a Random Matrix Theory-guided approach to sparse PCA for single-cell RNA-seq data, enhancing dimensionality reduction...

arXiv - Machine Learning · 4 min ·
[2602.23203] ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Generative AI

ColoDiff introduces a novel framework for generating colonoscopy videos that ensures dynamic consistency and content awareness, addressin...

arXiv - AI · 4 min ·
[2507.03772] Skewed Score: A statistical framework to assess autograders
LLMs

The paper presents a statistical framework for assessing autograders used in evaluating LLM outputs, addressing reliability and bias issu...

arXiv - Machine Learning · 4 min ·