[2312.11797] Data-Driven Merton's Strategies via Policy Randomization

arXiv - Machine Learning 4 min read Article

Summary

This paper explores Merton's expected utility maximization problem in incomplete markets, introducing a data-driven approach using policy randomization to optimize portfolio management strategies.

Why It Matters

The study is significant as it bridges reinforcement learning and portfolio management, providing a novel method to tackle complex financial problems without requiring detailed model assumptions. This could enhance decision-making in finance, especially in volatile markets.

Key Takeaways

  • Introduces a data-driven method for Merton's utility maximization problem.
  • Utilizes policy randomization to optimize portfolio strategies in incomplete markets.
  • Designs online and offline actor-critic RL algorithms, reported to outperform traditional model-estimation methods.
  • Establishes a policy improvement theorem applicable to financial decision-making.
  • Provides empirical evidence of the proposed method's superior performance in stochastic environments.

Quantitative Finance > Portfolio Management · arXiv:2312.11797 (q-fin)

Submitted on 19 Dec 2023 (v1), last revised 14 Feb 2026 (this version, v3)

Title: Data-Driven Merton's Strategies via Policy Randomization
Authors: Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou

Abstract: We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. The agent under consideration is a price taker who has access only to the stock and factor value processes and the instantaneous volatility. We propose an auxiliary problem in which the agent can invoke policy randomization according to a specific class of Gaussian distributions, and prove that the mean of its optimal Gaussian policy solves the original Merton problem. With randomized policies, we are in the realm of continuous-time reinforcement learning (RL) recently developed in Wang et al. (2020) and Jia and Zhou (2022a, 2022b, 2023), enabling us to solve the auxiliary problem in a data-driven way without having to estimate the model primitives. Specifically, we establish a policy improvement theorem based on which we design both online and offline actor-critic RL algorithms for learning Merton's strategies. A key insight from this study is that RL in general a...
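The policy-randomization idea can be illustrated with a minimal toy sketch (an assumption-laden illustration, not the paper's continuous-time actor-critic algorithm): in a one-period, single-stock market with log utility, an agent draws its portfolio weight from a Gaussian policy and updates the policy mean with a simple score-function (REINFORCE-style) gradient. All parameter values, the one-period reward proxy, and the gradient update below are illustrative assumptions; under them, the learned mean approaches the classical Merton fraction (mu − r)/sigma².

```python
import numpy as np

# Illustrative sketch only (NOT the paper's algorithm): a one-period market
# with one stock. The agent samples its portfolio weight from a Gaussian
# policy N(theta, s^2) and updates theta via a score-function gradient of a
# log-utility reward proxy. All parameters below are assumed for illustration.

rng = np.random.default_rng(0)

mu, sigma, r = 0.10, 0.20, 0.02      # assumed drift, volatility, risk-free rate
merton_weight = (mu - r) / sigma**2  # classical closed-form optimum for log utility

theta = 0.0   # mean of the Gaussian policy (the quantity being learned)
s = 0.5       # fixed exploration std; the paper uses a specific Gaussian class
lr = 1.0      # learning rate for the mean update
batch = 2048  # Monte Carlo samples per update

for _ in range(300):
    eps = rng.standard_normal(batch)
    a = theta + s * eps                # randomized portfolio weights
    z = rng.standard_normal(batch)     # market shock
    # One-period log-wealth proxy under weight a (Ito correction included)
    reward = a * (mu - r) - 0.5 * a**2 * sigma**2 + a * sigma * z
    # Score-function gradient w.r.t. the Gaussian mean: reward * (a - theta) / s^2
    theta += lr * np.mean(reward * (a - theta) / s**2)

print(f"learned policy mean {theta:.2f} vs Merton weight {merton_weight:.2f}")
```

Under these assumptions the learned mean hovers near the Merton weight of 2.0, echoing the paper's result that the mean of the optimal Gaussian policy solves the original (non-randomized) problem.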

