[2604.01613] Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error
Computer Science > Machine Learning
arXiv:2604.01613 (cs)
[Submitted on 2 Apr 2026]

Title: Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error
Authors: Taisuke Kobayashi

Abstract: In reinforcement learning (RL), temporal difference (TD) errors are widely used to optimize value and policy functions. However, because the TD error is computed by bootstrapping, it tends to be noisy and can destabilize learning. Heuristics such as target networks and ensemble models have been introduced to improve the accuracy of TD errors. While these are essential components of current deep RL algorithms, they cause side effects such as increased computational cost and reduced learning efficiency. This paper therefore revisits the TD learning algorithm from the perspective of control as inference, deriving a novel algorithm capable of learning robustly against noisy TD errors. First, the distribution model of optimality, a binary random variable, is represented by a sigmoid function. Combined with forward and reverse Kullback-Leibler divergences, this new model yields a robust learning rule: when the sigmoid function saturates because of a large TD error, which is probably due to noise, the gradient vanishes, implicitly excluding that error from learning. Furthermore, the two divergences exhibit distinct...
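The saturation mechanism described in the abstract can be illustrated with a toy sketch (this is not the paper's exact derivation; the inverse temperature `beta` and the weight function are illustrative assumptions). If the optimality probability is modeled as `p = sigmoid(beta * td_error)`, then any update weighted by the sigmoid's derivative, `beta * p * (1 - p)`, shrinks toward zero as the sigmoid saturates, so very large (likely noisy) TD errors contribute almost nothing to the gradient:

```python
import numpy as np

def sigmoid(x):
    """Numerically plain logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def saturating_weight(td_error, beta=1.0):
    """Toy gradient weight: the derivative of sigmoid(beta * td_error).

    Equals beta * p * (1 - p), which peaks at td_error = 0 and vanishes
    for large |td_error| -- the saturation effect the abstract describes,
    implicitly excluding outlier TD errors from the update.
    """
    p = sigmoid(beta * td_error)
    return beta * p * (1.0 - p)

moderate = saturating_weight(0.5)   # sizeable weight near zero TD error
extreme = saturating_weight(50.0)   # weight collapses to ~0 when saturated
```

Here a moderate TD error keeps a non-negligible weight while an extreme one is effectively ignored, mimicking robustness to noisy TD errors without explicit outlier rejection.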