Abstract:
This thesis investigates the development and application of a Deep Q-Network (DQN)
with a Long Short-Term Memory (LSTM) feature extractor for algorithmic trading. The proposed model aims to capture temporal dependencies in financial time series
and improve decision-making in stock trading. We use the LSTM to extract informative
features from the time series and the DQN to learn effective trading strategies
via reinforcement learning. The architecture combines the DQN’s capacity to learn optimal
policies with the LSTM’s strength in handling sequential data, allowing the model to
make better-informed trading decisions.
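As a concrete illustration, the following is a minimal PyTorch sketch of such a hybrid architecture; the layer sizes, the four-feature input window, and the three-action space (buy, hold, sell) are illustrative assumptions, not the exact configuration used in the thesis.

```python
import torch
import torch.nn as nn

class LSTMDQN(nn.Module):
    """Sketch of an LSTM feature extractor feeding a DQN head.
    Layer sizes and the action set are illustrative assumptions."""
    def __init__(self, n_features: int = 4, hidden: int = 64, n_actions: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.norm = nn.LayerNorm(hidden)   # layer normalization on the LSTM output
        self.head = nn.Sequential(         # maps extracted features to Q-values
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features); use the last hidden state as the feature vector
        out, _ = self.lstm(x)
        return self.head(self.norm(out[:, -1, :]))
```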
The methodology incorporates experience replay and employs two neural networks, an
online network and a target network for Q-value estimation, to stabilize training. Hyperparameters are tuned with Optuna, and the model is trained with the Adam
optimizer, using Kaiming Normal weight initialization and layer normalization in
the LSTM. We examine two reward functions, targeting not only raw performance but also
the agent’s risk aversion.
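A minimal sketch of this training setup, reusing the LSTMDQN module above; the buffer size, learning rate, discount factor, and batch size are assumed defaults, not values taken from the thesis.

```python
import random
from collections import deque
import torch
import torch.nn as nn

def init_kaiming(module: nn.Module) -> None:
    # Kaiming Normal initialization for the linear layers (how the thesis
    # initializes the LSTM weights themselves is not specified here)
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight)
        nn.init.zeros_(module.bias)

online, target = LSTMDQN(), LSTMDQN()
online.apply(init_kaiming)
target.load_state_dict(online.state_dict())  # target starts as a copy; in training
                                             # it is re-synced every N steps
optimizer = torch.optim.Adam(online.parameters(), lr=1e-4)  # lr is an assumption

replay = deque(maxlen=100_000)  # experience replay buffer of (s, a, r, s2, done)
gamma = 0.99                    # discount factor (assumed value)

def train_step(batch_size: int = 64) -> None:
    # transitions are stored as tensors (a: long, done: float)
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) from the online net
    with torch.no_grad():                               # bootstrap from the target net
        q_next = target(s2).max(dim=1).values
        y = r + gamma * q_next * (1.0 - done)
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```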
The approach is evaluated across different asset classes, including the S&P 500, gold,
and individual stocks such as Disney and Intel, using the Sharpe Ratio, Sortino Ratio,
and Maximum Drawdown as performance metrics.
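For reference, a minimal sketch of these standard metrics computed on a series of per-period returns; the 252-trading-day annualization and the risk-free-rate handling are common conventions, not taken from the thesis.

```python
import numpy as np

def sharpe(returns: np.ndarray, rf: float = 0.0, periods: int = 252) -> float:
    # Annualized Sharpe ratio: mean excess return over its standard deviation
    excess = returns - rf / periods
    return float(np.sqrt(periods) * excess.mean() / excess.std(ddof=1))

def sortino(returns: np.ndarray, rf: float = 0.0, periods: int = 252) -> float:
    # Like the Sharpe ratio, but penalizes only downside deviation
    excess = returns - rf / periods
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return float(np.sqrt(periods) * excess.mean() / downside)

def max_drawdown(returns: np.ndarray) -> float:
    # Largest peak-to-trough decline of the cumulative equity curve
    equity = np.cumprod(1.0 + returns)
    peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peak))
```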
The model showed promising results, generating profits, though not consistently. This thesis extends prior research on a hybrid architecture that integrates
advanced reinforcement learning with time series feature extraction, offering novel insights into the capabilities of deep learning models for financial trading.