[2509.12456] Reinforcement Learning-Based Market Making as a Stochastic Control on Non-Stationary Limit Order Book Dynamics
Summary
This paper explores the use of reinforcement learning for market making under non-stationary limit order book dynamics, presenting a practical implementation and performance evaluation of a market-making agent.
Why It Matters
With the increasing complexity of financial markets, traditional market-making strategies may fall short. This research provides insights into how reinforcement learning can adapt to dynamic market conditions, enhancing decision-making for market makers and potentially improving market stability.
Key Takeaways
- Reinforcement learning can optimize market-making strategies in volatile environments.
- The Proximal-Policy Optimization (PPO) algorithm is effectively implemented for market-making agents.
- The study highlights the importance of modeling non-stationary market dynamics for better agent performance.
- A simulator-based environment is proposed as a training tool for reinforcement learning agents.
- Results indicate that the reinforcement learning agent can outperform traditional methods under changing market conditions.
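To make the setup above concrete, here is a minimal, hypothetical sketch of the kind of simulator-based environment an RL market-making agent could be trained in. All names and parameters (`ToyMarketMakingEnv`, `inv_penalty`, the random-walk mid-price, the linear fill model) are illustrative assumptions, not the paper's actual simulator:

```python
import random

class ToyMarketMakingEnv:
    """Hypothetical, highly simplified market-making environment.

    State: (inventory, mid_price). Action: symmetric half-spread in ticks.
    Reward: spread capture on fills minus a quadratic inventory penalty.
    Illustrative sketch only -- not the paper's limit order book model.
    """

    def __init__(self, seed=0, inv_penalty=0.01, tick=0.01):
        self.rng = random.Random(seed)
        self.inv_penalty = inv_penalty
        self.tick = tick
        self.reset()

    def reset(self):
        self.mid = 100.0
        self.inventory = 0
        return (self.inventory, self.mid)

    def step(self, half_spread_ticks):
        # Mid-price follows a random walk (a crude stand-in for the
        # non-stationary return drifts the paper models explicitly).
        self.mid += self.rng.gauss(0.0, self.tick)
        half_spread = half_spread_ticks * self.tick
        # Fill probability decays as quotes move away from the mid.
        fill_prob = max(0.0, 1.0 - 0.2 * half_spread_ticks)
        reward = 0.0
        if self.rng.random() < fill_prob:  # buy fill at the bid
            self.inventory += 1
            reward += half_spread
        if self.rng.random() < fill_prob:  # sell fill at the ask
            self.inventory -= 1
            reward += half_spread
        # Penalize inventory risk, as is standard in market-making rewards.
        reward -= self.inv_penalty * self.inventory ** 2
        return (self.inventory, self.mid), reward

env = ToyMarketMakingEnv(seed=42)
state = env.reset()
total_reward = 0.0
for _ in range(100):
    state, r = env.step(half_spread_ticks=2)
    total_reward += r
print(state, round(total_reward, 4))
```

An actual agent (e.g. PPO, as in the paper) would learn a policy mapping the state to the quoted half-spread; the inventory penalty is one common way to encode the risk-aversion a market maker needs.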
Quantitative Finance > Trading and Market Microstructure
arXiv:2509.12456 (q-fin)
[Submitted on 15 Sep 2025 (v1), last revised 14 Feb 2026 (this version, v2)]
Title: Reinforcement Learning-Based Market Making as a Stochastic Control on Non-Stationary Limit Order Book Dynamics
Authors: Rafael Zimmer, Oswaldo Luiz do Valle Costa
Abstract: Reinforcement Learning has emerged as a promising framework for developing adaptive and data-driven strategies, enabling market makers to optimize decision-making policies based on interactions with the limit order book environment. This paper explores the integration of a reinforcement learning agent in a market-making context, where the underlying market dynamics have been explicitly modeled to capture observed stylized facts of real markets, including clustered order arrival times, non-stationary spreads and return drifts, stochastic order quantities and price volatility. These mechanisms aim to enhance stability of the resulting control agent, and serve to incorporate domain-specific knowledge into the agent policy learning process. Our contributions include a practical implementation of a market making agent based on the Proximal-Policy Optimization (PPO) algorithm, alongside a comparative evaluation of the agent's performance under ...
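The "clustered order arrival times" mentioned in the abstract are commonly modeled with a self-exciting (Hawkes) point process, where each arrival temporarily raises the intensity of future arrivals. The sketch below simulates such clustered arrivals via Ogata's thinning algorithm; the parameter values and the exponential kernel are illustrative assumptions, not the paper's calibration:

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate clustered event times via Ogata's thinning algorithm.

    Intensity: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)),
    summing over past events t_i < t. Requires alpha < beta so the
    branching ratio alpha/beta stays below 1 (subcritical clustering).
    Parameters here are illustrative, not the paper's calibration.
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while t < horizon:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound for the thinning step.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in times)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - s)) for s in times)
        if rng.random() <= lam_t / lam_bar:  # accept with prob lambda(t)/lam_bar
            times.append(t)
    return times

arrivals = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.5, horizon=50.0, seed=7)
print(len(arrivals), [round(t, 2) for t in arrivals[:5]])
```

Compared with a plain Poisson process of rate mu, the excitation term produces the bursts of activity observed in real order flow, which is exactly the stylized fact a realistic training simulator needs to reproduce.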