AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Secure governance accelerates financial AI revenue growth
AI Safety

Financial institutions are learning to deploy compliant AI solutions for greater revenue growth and market advantage.

AI News - General · 13 min ·
When Agentic AI Browsers Outrun Governance
AI Safety

Agentic AI browsers introduce new enterprise risk. Learn how AI governance helps leaders assess exposure, oversight gaps, and safe adopti...

AI Tools & Products · 14 min ·
LLMs

Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment

Greetings all - I've posted mostly in r/claudecode and r/aigamedev a couple of times previously. Working with CC for personal projects re...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2506.12108] A Lightweight IDS for Early APT Detection Using a Novel Feature Selection Method
Machine Learning

This article presents a novel feature selection method for a lightweight intrusion detection system (IDS) aimed at early detection of Adv...

arXiv - AI · 4 min ·
[2510.01031] Secure and reversible face anonymization with diffusion models
Machine Learning

This paper presents a novel framework for secure and reversible face anonymization using diffusion models, addressing challenges in image...

arXiv - Machine Learning · 4 min ·
[2509.25369] Generative Value Conflicts Reveal LLM Priorities
LLMs

This paper introduces ConflictScope, a tool for evaluating how large language models (LLMs) prioritize conflicting values, revealing insi...

arXiv - Machine Learning · 4 min ·
[2504.12522] Evaluating the Diversity and Quality of LLM Generated Content
LLMs

This article evaluates the diversity and quality of content generated by large language models (LLMs), highlighting the trade-offs betwee...

arXiv - AI · 4 min ·
[2411.08254] Toward Automated Validation of Language Model Synthesized Test Cases using Semantic Entropy
LLMs

The paper presents VALTEST, a framework for validating test cases generated by large language models (LLMs) using semantic entropy, impro...

arXiv - AI · 4 min ·
[2502.05435] Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning
Machine Learning

This paper presents the Unbiased Sliced Wasserstein RBF kernel, a novel approach for enhancing audio captioning systems by addressing exp...

arXiv - Machine Learning · 4 min ·
[2502.04758] Differential Privacy of Quantum and Quantum-Inspired Classical Recommendation Algorithms
Machine Learning

The paper explores the differential privacy of quantum and quantum-inspired classical recommendation algorithms, demonstrating their inhe...

arXiv - Machine Learning · 3 min ·
[2601.11620] A Mind Cannot Be Smeared Across Time
AI Safety

The paper explores the relationship between consciousness and computational processes in machines, arguing that the timing of computation...

arXiv - AI · 4 min ·
[2510.00922] On Discovering Algorithms for Adversarial Imitation Learning
AI Agents

This paper presents Discovered Adversarial Imitation Learning (DAIL), a novel approach to improving stability in Adversarial Imitation Le...

arXiv - AI · 4 min ·
[2509.17956] "I think this is fair": Uncovering the Complexities of Stakeholder Decision-Making in AI Fairness Assessment
AI Safety

This article explores how non-expert stakeholders assess fairness in AI decision-making, revealing complexities that extend beyond tradit...

arXiv - AI · 4 min ·
[2602.08470] Learning Credal Ensembles via Distributionally Robust Optimization
Machine Learning

This paper presents CreDRO, a novel approach to learning credal ensembles using distributionally robust optimization, enhancing model rob...

arXiv - Machine Learning · 4 min ·
[2602.05535] Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
LLMs

This paper presents Evidential Uncertainty Quantification (EUQ) to detect misbehaviors in large vision-language models (LVLMs), addressin...

arXiv - Machine Learning · 4 min ·
[2601.18231] Rethinking Cross-Modal Fine-Tuning: Optimizing the Interaction between Feature Alignment and Target Fitting
Machine Learning

This paper presents a framework for optimizing cross-modal fine-tuning by addressing the interaction between feature alignment and target...

arXiv - Machine Learning · 4 min ·
[2512.05865] Sparse Attention Post-Training for Mechanistic Interpretability
Machine Learning

The paper presents a novel post-training method that enhances transformer attention sparsity while maintaining performance, revealing ins...

arXiv - Machine Learning · 4 min ·
[2509.26238] Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
LLMs

This paper presents Truncated Polynomial Classifiers (TPCs) for dynamic safety monitoring in large language models, enhancing efficiency ...

arXiv - Machine Learning · 4 min ·
[2602.23259] Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving
Machine Learning

This paper presents the Risk-aware World Model Predictive Control (RaWMPC) framework aimed at enhancing generalization in end-to-end auto...

arXiv - AI · 4 min ·
[2602.23235] Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
AI Safety

The paper presents GUIPruner, a framework for enhancing the efficiency of high-resolution GUI agents by addressing spatiotemporal redunda...

arXiv - AI · 4 min ·
[2509.15429] Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data
AI Safety

This paper presents a Random Matrix Theory-guided approach to sparse PCA for single-cell RNA-seq data, enhancing dimensionality reduction...

arXiv - Machine Learning · 4 min ·
[2602.23203] ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Generative AI

ColoDiff introduces a novel framework for generating colonoscopy videos that ensures dynamic consistency and content awareness, addressin...

arXiv - AI · 4 min ·
[2507.03772] Skewed Score: A statistical framework to assess autograders
LLMs

The paper presents a statistical framework for assessing autograders used in evaluating LLM outputs, addressing reliability and bias issu...

arXiv - Machine Learning · 4 min ·