[2602.15640] Latency-aware Human-in-the-Loop Reinforcement Learning for Semantic Communications
Summary
The paper presents a framework for latency-aware human-in-the-loop reinforcement learning in semantic communications, addressing the trade-off between semantic fidelity and stringent latency requirements in immersive and safety-critical services.
Why It Matters
As communication systems evolve, ensuring timely and accurate data transmission becomes crucial, especially in safety-critical applications. This research introduces a novel approach that integrates human feedback into reinforcement learning, enhancing the efficiency of semantic communication systems while meeting strict latency requirements.
Key Takeaways
- Introduces a time-constrained human-in-the-loop reinforcement learning framework.
- Balances semantic fidelity with strict latency requirements for immersive services.
- Utilizes a constrained Markov decision process to optimize human feedback integration.
- Demonstrates improved performance over baseline schedulers in simulations.
- Provides a practical blueprint for latency-aware semantic adaptation in communication networks.
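The constrained-MDP takeaway above can be made concrete with a primal-dual sketch. This is an illustrative toy, not the paper's implementation: a latency cost enters the reward as a Lagrangian penalty, and the multiplier is raised by dual ascent whenever the observed cost exceeds its budget. The names (`LATENCY_BUDGET`, `ETA`, the toy episode costs) are hypothetical.

```python
# Toy primal-dual treatment of a latency-constrained MDP (illustrative only).
LATENCY_BUDGET = 10.0   # per-episode latency cost allowed (hypothetical units)
ETA = 0.05              # dual-ascent step size (hypothetical)

def shaped_reward(semantic_reward: float, latency_cost: float, lam: float) -> float:
    """Reward the primal (policy) learner maximizes:
    semantic utility minus the Lagrangian latency penalty."""
    return semantic_reward - lam * latency_cost

def dual_update(lam: float, episode_latency_cost: float) -> float:
    """Projected dual ascent: raise lambda when the latency budget is
    violated, relax it (toward 0) when there is slack."""
    return max(0.0, lam + ETA * (episode_latency_cost - LATENCY_BUDGET))

lam = 0.0
for episode_cost in [14.0, 13.0, 9.0, 8.0]:  # toy per-episode latency costs
    lam = dual_update(lam, episode_cost)
```

As the toy costs fall back under budget, the multiplier decays again, so the penalty only bites while the constraint is being violated.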
Electrical Engineering and Systems Science > Signal Processing
arXiv:2602.15640 (eess) [Submitted on 17 Feb 2026]
Title: Latency-aware Human-in-the-Loop Reinforcement Learning for Semantic Communications
Authors: Peizheng Li, Xinyi Lin, Adnan Aijaz
Abstract: Semantic communication promises task-aligned transmission but must reconcile semantic fidelity with stringent latency guarantees in immersive and safety-critical services. This paper introduces a time-constrained human-in-the-loop reinforcement learning (TC-HITL-RL) framework that embeds human feedback, semantic utility, and latency control within a semantic-aware Open radio access network (RAN) architecture. We formulate semantic adaptation driven by human feedback as a constrained Markov decision process (CMDP) whose state captures semantic quality, human preferences, queue slack, and channel dynamics, and solve it via a primal-dual proximal policy optimization algorithm with action shielding and latency-aware reward shaping. The resulting policy preserves PPO-level semantic rewards while tightening the variability of both air-interface and near-real-time RAN intelligent controller processing budgets. Simulations over point-to-multipoint links with heterogeneous deadlines show that TC-HITL-RL consistently meets per-user timing constraints, out...
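The action shielding mentioned in the abstract can be sketched as a feasibility mask applied before action selection. This is a hedged illustration under assumed interfaces, not the paper's algorithm: actions whose predicted latency would exceed the user's remaining queue slack are masked out, so a deadline-violating action can never be chosen. All names here (`shield_actions`, `pick_action`, the toy scores and latencies) are hypothetical.

```python
# Illustrative action-shielding sketch: mask actions that would miss the deadline.
def shield_actions(predicted_latency: list[float], queue_slack: float) -> list[bool]:
    """Feasibility mask: True where an action's predicted latency
    fits within the remaining slack before the deadline."""
    return [lat <= queue_slack for lat in predicted_latency]

def pick_action(policy_scores: list[float],
                predicted_latency: list[float],
                queue_slack: float) -> int:
    """Greedy pick among feasible actions; if the shield rules out
    everything, fall back to the fastest action as a safe default."""
    mask = shield_actions(predicted_latency, queue_slack)
    feasible = [i for i, ok in enumerate(mask) if ok]
    if not feasible:
        return min(range(len(predicted_latency)), key=predicted_latency.__getitem__)
    return max(feasible, key=policy_scores.__getitem__)
```

For example, with scores `[0.9, 0.5, 0.1]`, predicted latencies `[5.0, 2.0, 1.0]`, and slack `3.0`, the highest-scoring action is infeasible, so the shield steers the choice to the best action that still meets the deadline.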