Reinforcement Learning for financial portfolio optimization

An in-depth look at Reinforcement Learning for financial portfolio optimization

By Infodoor Engineering Team • May 28, 2026

The use of Reinforcement Learning (RL) in financial portfolio management has grown significantly, moving from theoretical academic papers to robust, production-ready frameworks. In this domain, portfolio management is typically modeled as a Markov Decision Process (MDP) in which the actions are continuous vectors of asset allocation weights across a defined universe of assets (Liu et al., 2021).
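To make the MDP framing concrete, one step can be written as below. This is a standard formulation in our own notation, not taken from Liu et al. (2021): $N$ is the number of assets, $r_{t+1}$ is the vector of simple asset returns over the holding period, $V_t$ is the portfolio value, and transaction costs are ignored for simplicity. The log-return reward is one common choice among several.

```latex
\begin{aligned}
a_t &= w_t \in \Delta^{N} = \Big\{ w \in \mathbb{R}^{N} : w_i \ge 0,\ \sum_{i=1}^{N} w_i = 1 \Big\}, \\
V_{t+1} &= V_t \, \big( 1 + w_t^{\top} r_{t+1} \big), \\
R_{t+1} &= \ln \frac{V_{t+1}}{V_t} = \ln \big( 1 + w_t^{\top} r_{t+1} \big).
\end{aligned}
```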

Several powerful open-source RL libraries and platforms are specifically designed or frequently adapted for financial portfolio optimization:

1. Domain-Specific Financial RL Frameworks

These libraries are explicitly built for quantitative finance and come pre-packaged with financial data pipelines, specialized trading environments, and standard backtesting metrics.

FinRL (by AI4Finance)

FinRL is one of the most widely adopted open-source frameworks designed specifically for automated trading and portfolio management using Deep Reinforcement Learning (DRL) (Liu et al., 2021).

Key Features: It features a highly modular three-layer architecture (Data, Agent, and Environment) (Liu et al., 2021). It models market frictions such as transaction costs, slippage, and liquidity constraints (Liu et al., 2021).
Portfolio Capabilities: It natively supports portfolio allocation environments where agents output asset weight vectors to maximize cumulative returns or risk-adjusted metrics like the Sharpe or Sortino ratio.
Algorithmic Support: It integrates state-of-the-art model-free DRL algorithms, including Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), and Soft Actor-Critic (SAC), via underlying engines like Stable-Baselines3 and ElegantRL (Liu et al., 2021).
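For reference, here is a minimal, framework-independent sketch of the two risk-adjusted metrics mentioned above, computed from a list of per-period returns. It is not FinRL's implementation; the function names, the 252 trading-day annualization factor, and the definition of downside deviation (averaged over all periods) are our assumptions.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    risk_free is a per-period rate; the sample standard deviation is used.
    """
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return 0.0
    return mean(excess) / sd * math.sqrt(periods_per_year)


def sortino_ratio(returns, target=0.0, periods_per_year=252):
    """Annualized Sortino ratio: mean excess return over downside deviation."""
    excess = [r - target for r in returns]
    # Downside deviation: root mean square of shortfalls below the target,
    # averaged over all periods (periods above target count as zero).
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0:
        return math.inf if mean(excess) > 0 else 0.0
    return mean(excess) / downside * math.sqrt(periods_per_year)


# Usage on a short series of illustrative daily returns.
daily = [0.01, -0.02, 0.015, 0.005]
print(sharpe_ratio(daily), sortino_ratio(daily))
```

The Sortino ratio penalizes only downside volatility, which is why it is often preferred as a reward or evaluation target for strategies with skewed return distributions.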

TradeMaster & FinWorld (by NTU Singapore)

Developed by researchers at Nanyang Technological University, these two platforms aim to provide unified ecosystems for quantitative trading research.

TradeMaster: An all-in-one open-source platform that maps mainstream quantitative tasks—including portfolio management—into high-fidelity RL environments (Sun et al., 2023). It includes 13 real-world financial datasets and benchmark implementations of specialized algorithms like AlphaMix+ (a risk-sensitive mixture-of-experts model) (Sun et al., 2023).
FinWorld: A newer, comprehensive open-source platform designed for end-to-end financial AI research (Zhang et al., 2025). It combines traditional RL frameworks with multimodal Large Language Models (LLMs), allowing developers to build sentiment-aware portfolio agents (Zhang et al., 2025).

2. General-Purpose RL Toolkits (With Financial Extensions)

Many developers prefer to build custom portfolio management environments with Gymnasium (the maintained successor to OpenAI Gym) and pair them with production-tested, general-purpose RL libraries.

Stable-Baselines3 (SB3)

While not a finance library, SB3 is a widely used reference for reliable, PyTorch-based implementations of standard RL algorithms.

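Portfolio Relevance: Portfolio optimization requires continuous action spaces (long-only weights between 0 and 1 that sum to 1), so SB3's continuous-control algorithms such as PPO, SAC, and TD3 are heavily used. SB3 policies act in a box-shaped action space and do not enforce the sum-to-one constraint themselves, so the environment typically normalizes the raw action (for example, with a softmax) into valid weights.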
Implementation: Quantitative developers typically write a custom gymnasium.Env that handles historical price tensors, tracks portfolio value, and applies a rebalancing cost penalty to the reward function.
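A minimal sketch of such an environment follows. To stay runnable without dependencies it does not import `gymnasium`; it only mirrors the `reset()` / `step()` return conventions of `gymnasium.Env` (observation and info from `reset`; observation, reward, terminated, truncated, and info from `step`). The class name `PortfolioEnvSketch`, the softmax mapping from raw actions to weights, the proportional cost on turnover, the log-return reward, and the toy prices are all illustrative assumptions. In a real project you would subclass `gymnasium.Env` and declare `observation_space` and `action_space` (e.g., `gymnasium.spaces.Box`) so SB3 can train on it.

```python
import math


class PortfolioEnvSketch:
    """Hypothetical long-only portfolio environment.

    Mirrors the gymnasium.Env reset()/step() return conventions but does not
    import gymnasium, so it runs with the standard library alone.
    """

    def __init__(self, prices, window=3, cost_rate=0.001, initial_value=1.0):
        # prices: one list of positive prices per time step, one entry per asset.
        self.prices = prices
        self.n_assets = len(prices[0])
        self.window = window          # lookback length, in returns
        self.cost_rate = cost_rate    # proportional cost per unit of turnover (assumed)
        self.initial_value = initial_value

    def _returns(self, k):
        # Simple returns of every asset from step k - 1 to step k.
        return [cur / prev - 1.0 for prev, cur in zip(self.prices[k - 1], self.prices[k])]

    def _observation(self):
        # Flattened lookback window of returns, followed by the current weights.
        obs = []
        for k in range(self.t - self.window + 1, self.t + 1):
            obs.extend(self._returns(k))
        return obs + list(self.weights)

    def reset(self, seed=None, options=None):
        self.t = self.window  # first step with a full lookback window
        self.value = self.initial_value
        self.weights = [1.0 / self.n_assets] * self.n_assets
        return self._observation(), {}

    @staticmethod
    def _softmax(x):
        m = max(x)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in x]
        total = sum(exps)
        return [e / total for e in exps]

    def step(self, action):
        # Map the unconstrained action onto the simplex: w_i > 0, sum(w) = 1.
        target = self._softmax(action)
        # Rebalancing cost: proportional to turnover from the drifted weights.
        turnover = sum(abs(new - old) for new, old in zip(target, self.weights))
        cost = self.cost_rate * turnover
        # Hold the target weights over the next period.
        asset_returns = self._returns(self.t + 1)
        gross = sum(w * r for w, r in zip(target, asset_returns))
        new_value = self.value * (1.0 - cost) * (1.0 + gross)
        reward = math.log(new_value / self.value)  # log return net of costs
        # Weights drift with prices until the next rebalance.
        grown = [w * (1.0 + r) for w, r in zip(target, asset_returns)]
        total = sum(grown)
        self.weights = [g / total for g in grown]
        self.value = new_value
        self.t += 1
        terminated = self.t + 1 >= len(self.prices)  # no next period left
        return self._observation(), reward, terminated, False, {"value": self.value}


# Usage: roll out a constant equal-weight policy on toy prices (illustrative data).
prices = [[100.0, 50.0], [101.0, 49.5], [102.0, 50.5], [101.5, 51.0], [103.0, 51.5]]
env = PortfolioEnvSketch(prices, window=2, cost_rate=0.001)
obs, info = env.reset()
terminated = False
while not terminated:
    obs, reward, terminated, truncated, info = env.step([0.0, 0.0])
print(round(info["value"], 6))
```

Because the cost is charged on turnover measured against the drifted weights, an agent that keeps changing its target allocation pays for it directly in the reward.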

Ray/RLlib

An industry-grade, highly scalable RL framework built on top of Ray.

Portfolio Relevance: If you are dealing with a large universe of assets (e.g., hundreds of stocks simultaneously) or running heavy parallel simulations across distributed GPU clusters, RLlib is well suited to the job (Liu et al., 2021). It also supports multi-agent setups and large-scale training.

3. Key Components When Configuring These Frameworks

When setting up any of these open-source tools for portfolio construction, the mathematical environment is generally structured as follows:

+-----------------------------------------------------------------------+
|                              ENVIRONMENT                              |
|                                                                       |
|  [State Space] ---------> [RL Agent] ----------> [Action Space]       |
|  - Historical Prices      (e.g., PPO, SAC)       - Portfolio Weights  |
|  - Technical Indicators                          - Vector sum = 1     |
|  - Previous Weights                                                   |
|                                                                       |
|  [Reward Function] <--------------------------------------------------+
|  - Sharpe Ratio / Net Returns minus Transaction Friction              |
+-----------------------------------------------------------------------+

State Space ($S$): Typically represented as a matrix of historical asset prices or returns, often augmented with technical indicators (e.g., MACD, RSI) and the agent’s previous portfolio weight allocation to calculate rebalancing costs (Santos et al., 2023).
Action Space ($A$): A continuous vector $a_t = w_t = (w_{t,1}, \dots, w_{t,N})$, where $w_{t,i}$ is the target weight assigned to asset $i$. For a long-only, non-leveraged portfolio the weights are constrained by $w_{t,i} \ge 0$ and $\sum_{i=1}^{N} w_{t,i} = 1$.
Reward Function ($R$): This is where reward shaping matters most. Rather than optimizing purely for raw returns, robust frameworks shape rewards around risk-adjusted measures (such as the Differential Sharpe Ratio) while penalizing transaction costs to discourage over-trading (Yan et al., 2024).
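The Differential Sharpe Ratio, due to Moody and Saffell, turns the Sharpe ratio into a per-step reward. With $x_t$ the portfolio's net return at step $t$, exponential moving averages $A_t = A_{t-1} + \eta\,\Delta A_t$ and $B_t = B_{t-1} + \eta\,\Delta B_t$ of $x_t$ and $x_t^2$, $\Delta A_t = x_t - A_{t-1}$, and $\Delta B_t = x_t^2 - B_{t-1}$, the reward is $D_t = (B_{t-1}\,\Delta A_t - \tfrac{1}{2} A_{t-1}\,\Delta B_t) / (B_{t-1} - A_{t-1}^2)^{3/2}$. The sketch below implements that recursion; the class name, the value of $\eta$, the zero reward while the variance estimate is not yet positive, and the choice to feed it net-of-cost returns are our assumptions, not details from Yan et al. (2024).

```python
class DifferentialSharpe:
    """Online Differential Sharpe Ratio (after Moody and Saffell) as a step reward."""

    def __init__(self, eta=0.01, min_variance=1e-12):
        self.eta = eta                    # adaptation rate of the moving averages
        self.min_variance = min_variance  # guard against division by ~0 (assumed)
        self.A = 0.0  # exponential moving average of returns
        self.B = 0.0  # exponential moving average of squared returns

    def reward(self, x):
        """Return D_t for the portfolio's net return x, then update A and B."""
        delta_a = x - self.A
        delta_b = x * x - self.B
        variance = self.B - self.A ** 2
        d = 0.0  # undefined until the estimates imply a positive variance
        if variance > self.min_variance:
            d = (self.B * delta_a - 0.5 * self.A * delta_b) / variance ** 1.5
        self.A += self.eta * delta_a
        self.B += self.eta * delta_b
        return d


# Usage: feed returns net of transaction costs so that over-trading lowers the reward.
net_returns = [0.004, -0.002, 0.003, 0.001, -0.001]  # illustrative values
dsr = DifferentialSharpe(eta=0.05)
rewards = [dsr.reward(x) for x in net_returns]
```

Because $D_t$ measures the marginal effect of the latest return on a running Sharpe estimate, the agent receives a dense, risk-aware signal at every step instead of a single Sharpe ratio at the end of an episode.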

In practice, the choice comes down to building a custom portfolio environment from scratch with standard Gymnasium/PyTorch tools, or leveraging a pre-built pipeline like FinRL to benchmark standard algorithms quickly.

References

Liu, X.-Y., Yang, H., Gao, J., & Wang, C. D. (2021). FinRL: Deep reinforcement learning framework to automate trading in quantitative finance. In ACM International Conference on AI in Finance (ICAIF '21). https://doi.org/10.1145/3490354.3494366

Santos, G. C., Garruti, D., Barboza, F., de Souza, K. G., Domingos, J. C., & Veiga, A. (2023). Management of investment portfolios employing reinforcement learning. PeerJ Computer Science, 9, e1695. https://doi.org/10.7717/peerj-cs.1695

Sun, S., Qin, M., Wang, X., & An, B. (2023). PRUDEX-Compass: Towards systematic evaluation of reinforcement learning in financial markets. arXiv. https://doi.org/10.48550/arxiv.2302.00586

Yan, R., Jin, J., & Han, K. (2024). Reinforcement learning for deep portfolio optimization. Electronic Research Archive, 32(9), 5176–5200. https://doi.org/10.3934/era.2024239

Zhang, W., Zhao, Y., Zong, C., Wang, X., & An, B. (2025). FinWorld: An all-in-one open-source platform for end-to-end financial AI research and deployment. arXiv. https://doi.org/10.48550/arxiv.2508.02292