Want to learn the steps of developing an automated financial trading strategy using deep reinforcement learning? We wrote this article to guide you through our experience at Manxmachina, an artificial intelligence agency based on Wall Street, New York.

How much of financial trading is algorithmic?

It is difficult to assess, but different sources put it somewhere between 50% and 70%. If we also count the use of machines for insight generation and data analysis in support of trading (centaurian trading), then it might be nearly all trading. It is almost impossible for an amateur without access to advanced machine learning to get an edge over the pro traders.

What is Reinforcement Learning (RL)?

It is a way to teach machines to learn by rewarding them when they do something right and “punishing” them when the outcome of their actions is not the one we need.

What is Deep Reinforcement Learning (DRL)?

The machine needs a way to understand the environment. If we use deep learning for this part then the system is called deep RL.

What is the advantage of DRL for financial trading?

You can have one system that analyzes the data and does the trading at the same time, with or without human supervision. Such a system is end-to-end differentiable, as we call it, which means you can optimize data analysis and trading at the same time.

You cannot do that with any other machine learning method. If you build something predictive that works by analyzing data, then you still have to build a strategy to trade with it. Not elegant. The fewer moving parts you have, the less likely it is that something will break.

What are the pieces of a DRL algorithmic trader at a high level?

Conquering the markets with a DRL trader is one of the most exciting and difficult endeavors for any trader. What do you need to do it?


You are free to choose whatever securities you wish as long as you have data for them. With machine learning, the more data the merrier, but in theory you can even build your portfolio from recently launched cryptocurrencies. We have tried all markets. It all depends on your risk appetite. Crypto and Forex are high risk. The rest are less risky. Be our guest and choose the market you like.

PRO TIP: Mind the slippage

The number one rule for algorithmic trading and backtesting, above all else, is MIND THE SLIPPAGE. You need to understand the effect of your trades on slippage. This might be the difference between a successful algo and a messy real-life deployment that looked awesome in backtesting. You need to make sure you account for slippage in your backtesting, either by deducting an average slippage from the potential profit or, ideally, by using level II price action from the order book you are backtesting on. If you do not account for it, your algo will look nice on paper but will be a very bad deployment. In algo trading slang we call this effect the “scalper’s nightmare” or “algo slippage puking”, among many other fun names that are not really fun when you trade live and lose money.
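The simpler of the two approaches, deducting an average slippage from the potential profit, can be sketched as follows. This is a minimal illustration rather than our production backtester: the function names, the basis-point convention and the numbers are assumptions made for the example.

```python
def fill_price(quote_price: float, side: str, slippage_bps: float) -> float:
    """Return the assumed fill price after adverse slippage.

    slippage_bps is a hypothetical average slippage in basis points
    (1 bps = 0.01%). Buys fill higher than quoted, sells fill lower.
    """
    adjustment = quote_price * slippage_bps / 10_000
    if side == "BUY":
        return quote_price + adjustment
    if side == "SELL":
        return quote_price - adjustment
    raise ValueError(f"unknown side: {side!r}")


def round_trip_pnl(entry: float, exit_: float, quantity: float,
                   slippage_bps: float = 0.0) -> float:
    """P&L of a long round trip (buy at entry, sell at exit) with slippage."""
    buy = fill_price(entry, "BUY", slippage_bps)
    sell = fill_price(exit_, "SELL", slippage_bps)
    return (sell - buy) * quantity


# Illustrative numbers only: a small scalp that looks profitable on paper
# turns into a loss once an assumed 10 bps of slippage is deducted per fill.
gross = round_trip_pnl(100.00, 100.10, quantity=1_000)
net = round_trip_pnl(100.00, 100.10, quantity=1_000, slippage_bps=10)
print(f"gross P&L: {gross:.2f}, net P&L: {net:.2f}")
```

A flat average is the crudest possible model. Replaying fills against recorded level II order book data is more faithful, at the cost of needing that data.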

Tips on how to minimize the effect of the slippage in your backtesting

Make sure you choose liquid markets. The more liquidity, the higher the probability your order will be filled at the price your algo needs. In cryptocurrency, the order books are also open. Use them when you can. You can also infer slippage from other indicators, e.g. volume or volatility, and trade only when there is a low probability of slippage, or split your order among different brokers, especially when the order is a big one.


This is the part that is uber crucial. Spend lots of time on due diligence for your data. As with every algorithm, garbage in, garbage out applies here too.

PRO TIP: Create a committee of algorithms

Create many algos that behave differently in different time frames. Then use the ones that have behaved well in the last few months to trade live. Or, if you are using a committee of algos, give a higher weight in the decision whether to trade or not to those that have behaved better in the recent past. Keep on backtesting as new data come in, and if the behavior of the market and/or the performance of the algo changes, switch algos immediately.
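One possible way to weight a committee by recent performance is sketched below. The weighting rule (cumulative recent return, with losing algos given zero weight), the vote threshold and the algo names are all assumptions chosen for illustration, not a description of a specific production setup.

```python
VOTE = {"BUY": 1.0, "HOLD": 0.0, "SELL": -1.0}


def committee_weights(recent_returns: dict) -> dict:
    """Weight each algo by its cumulative recent return; losers get zero."""
    scores = {name: max(sum(rets), 0.0) for name, rets in recent_returns.items()}
    total = sum(scores.values())
    if total == 0:
        # No algo made money recently, so nobody gets a vote.
        return {name: 0.0 for name in scores}
    return {name: score / total for name, score in scores.items()}


def committee_decision(signals: dict, weights: dict, threshold: float = 0.5) -> str:
    """Combine per-algo signals into one decision using the weights."""
    score = sum(weights[name] * VOTE[signal] for name, signal in signals.items())
    if score >= threshold:
        return "BUY"
    if score <= -threshold:
        return "SELL"
    return "HOLD"


# Hypothetical per-period returns of three algos over the recent past.
recent = {
    "trend": [0.02, 0.01],
    "mean_reversion": [-0.01, -0.02],
    "breakout": [0.01, 0.0],
}
weights = committee_weights(recent)
decision = committee_decision(
    {"trend": "BUY", "mean_reversion": "SELL", "breakout": "HOLD"}, weights
)
print(weights, decision)
```

Recomputing the weights every time new data arrive gives the "switch algo" behavior for free: an algo that stops performing simply loses its vote.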


Every time you get your hands on financial data, consider it a messy, entangled piece of property until you have rigorously cleaned it. Cleaning takes time, but it is well worth the effort. These mundane jobs need a significant amount of time, and you need to be very detail oriented. What do I do with the missing values? Simple is better. Use mean imputation for small amounts of missing data, and remove the data if too much of it is missing for a certain non-holiday period.
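The missing-value rule above can be sketched in a few lines. The 5% default threshold and the function name are assumptions for the example; what counts as "too much" missing data depends on your series.

```python
def clean_window(values: list, max_missing_fraction: float = 0.05):
    """Mean-impute a window with few gaps; return None if it should be dropped.

    A value counts as missing when it is None or NaN (NaN != NaN).
    """
    missing = [v is None or v != v for v in values]
    observed = [v for v, is_missing in zip(values, missing) if not is_missing]
    if not observed or sum(missing) / len(values) > max_missing_fraction:
        return None  # too many gaps: remove this period instead of guessing
    mean = sum(observed) / len(observed)
    return [mean if is_missing else v for v, is_missing in zip(values, missing)]


# Illustrative windows, using a looser 25% threshold for the tiny example.
mostly_complete = clean_window([10.0, None, 12.0, 14.0], max_missing_fraction=0.25)
too_sparse = clean_window([None, None, 12.0, 14.0], max_missing_fraction=0.25)
print(mostly_complete, too_sparse)
```

Note that a mean taken over the whole window uses values that come after the gap. For data that feeds training or backtesting, you may prefer a mean over past values only, so no future information leaks in.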


Can I blend numerical data with classification data, e.g. numerical values with sentiment over news?

You can, but it works much better if you use separate neural networks for these tasks and then feed the representations these networks produce to the RL trader. The “blend them all, throw them at the neural nets and let them figure it out” approach is a no-no, and we have never seen it work for anyone.

You should think of a neural network as the brain of a baby. It needs to be guided, taught and given nice examples, without blending lots of stuff together and messing up the inputs. Split similar stuff across different neural networks and you will immediately see results. Instead of one neural network to conquer all, think of the whole process and break it into steps. Then build a separate neural network for each step. You will achieve superior results; always.


This part of our AI trader system is responsible for taking the inputs from the DRNN and making a decision. This is the command centre of our AI algorithmic trader. What kind of decisions should it be able to make? We will choose three possible decisions: BUY, SELL and HOLD (or HODL, a famous term in the crypto world).

There are also many ways to train your RL system. We need one that can take continuous values and not only snapshots of reality. We need to learn actions based on the analysis of a continuous feed of sensory data, like a trader sitting in front of a terminal. In the literature and in our experience, actor-based reinforcement learning methods work better. Do not worry about what that means. We are just saving you time if you choose to go deeper here.


Ok. All the pieces are in place. All we have to do now is start the training. We need to present the RL agent with a window of data and let it experiment with different actionable scenarios while calculating the P&L (and, at later stages, the Sharpe Ratio) in every scenario. We need to punish it if it does badly and reward it if it does well. This will happen over many iterations, and over time the AI trader will learn to trade in a style that maximizes the reward.
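To make the reward concrete, here is a minimal sketch of scoring one window of actions by P&L and by Sharpe Ratio. It assumes a simplified position model in which BUY means being long one unit for the next step, SELL means being short one unit, and HOLD means being flat. That mapping, the function names and the prices are illustrative assumptions.

```python
import math

POSITION = {"BUY": 1, "HOLD": 0, "SELL": -1}


def step_rewards(prices: list, actions: list) -> list:
    """Per-step P&L: the position chosen at step t times the next price change."""
    if len(actions) != len(prices) - 1:
        raise ValueError("need exactly one action per price change")
    return [POSITION[a] * (prices[t + 1] - prices[t]) for t, a in enumerate(actions)]


def sharpe_ratio(rewards: list) -> float:
    """Mean over sample standard deviation of per-step rewards.

    Not annualized, and the risk-free rate is taken as zero.
    """
    n = len(rewards)
    if n < 2:
        return 0.0
    mean = sum(rewards) / n
    variance = sum((r - mean) ** 2 for r in rewards) / (n - 1)
    return mean / math.sqrt(variance) if variance > 0 else 0.0


# One illustrative window: the agent is rewarded for the P&L its actions earn.
prices = [100.0, 101.0, 103.0, 102.0, 104.0]
actions = ["BUY", "BUY", "SELL", "HOLD"]
rewards = step_rewards(prices, actions)
total_pnl = sum(rewards)
sharpe = sharpe_ratio(rewards)
print(rewards, total_pnl, round(sharpe, 4))
```

Training on raw P&L first and moving to a risk-adjusted measure later matches the order described above: the Sharpe Ratio penalizes a style that earns the same P&L with wilder swings.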


Backtesting an AI trader is very important and needs to happen on out-of-sample data, as we call it, which means a part of the data that the AI trader has never seen, either in training or during testing. Only if an algorithm performs well on this kind of data should we even consider testing it in a live trading environment. Keep a good amount of data for backtesting, data that your algo has not been trained or tested on, and make sure it covers all the representative market situations (volatile, steady, bearish, bullish etc). Also make sure that you have represented the market correctly, e.g. stock splits, spin-offs and companies that failed, so you do not have survivorship bias.
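Holding out unseen data for a time series means splitting in chronological order, without shuffling, so the backtest period lies strictly after everything the algo trained on. A minimal sketch follows; the 70/15/15 proportions and the function name are assumptions for illustration.

```python
def chronological_split(rows: list, train_frac: float = 0.70, dev_frac: float = 0.15):
    """Split time-ordered rows into train, dev and out-of-sample sets.

    Rows are never shuffled, so every out-of-sample row is later in time
    than every train and dev row.
    """
    if not (0 < train_frac < 1 and 0 < dev_frac < 1 and train_frac + dev_frac < 1):
        raise ValueError("fractions must be positive and leave room for a test set")
    n = len(rows)
    train_end = round(n * train_frac)
    dev_end = train_end + round(n * dev_frac)
    return rows[:train_end], rows[train_end:dev_end], rows[dev_end:]


# Stand-in for 100 time-ordered bars; real rows would carry prices and features.
bars = list(range(100))
train, dev, out_of_sample = chronological_split(bars)
print(len(train), len(dev), len(out_of_sample))
```

A split like this does not by itself guarantee that the held-out period contains volatile, steady, bearish and bullish stretches. That still has to be checked by looking at the data.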


You have spent lots of time and done lots of experiments. You managed to achieve a good Sharpe Ratio for your AI trader after lots of trial and error.

How would you know if your AI trader is ready to go live? What are the signs of a well-trained AI trader? Consider the steps below a checklist.

If there is any item you cannot check, do not even think of deploying.

Never be hasty. It is going to take you a while to create an amazing algo trader. And you never know if it works until you test it with real money. In real life, it takes about $200K to develop and test several strategies until you get one to work out. But then the return is 20-30x the dev amount for as long as your edge will last.

Succeeding in this will be one of the most beautiful moments in your life.

So spend time to take care of all the necessary details and check the boxes below to avoid mistakes that might cost you dearly.


  • I have normalized my data before training my AI trader.
  • I have normalized the dev and test dataset with the mean and standard deviation of my training data only.
  • I have a sufficient amount of data to train my AI trader with. Sufficient is 100K to more than 1M rows of historical time series. With anything below that, I am very suspicious of a performance that looks good in backtesting.
  • My AI trader seems to perform well for a long period in the past and in different market conditions.
  • My AI trader also seems to perform well on another security that I haven’t trained it on, at the same time series timestep (this is a huge go go go signal).
  • My algorithm does not get destroyed in the test set while it does well in training and dev sets. If it gets destroyed in the test set it means that it has overfitted and I need to either bring more data and/or look for less deep neural network architectures or increase my dropout percentage.
  • If my algorithm performs very badly on the train set and cannot be trained, I consider using more inputs for my neural network, increasing its depth, or playing with the hyperparameters of the neural network and the RL agent. Have I initialised the weights for the deep part before starting the training?
  • My algorithm performs well in all train and test sets and I am ready to try live but I need to keep on training with smaller learning rates for my deep neural network. There are some more dollars of performance to squeeze out of it before I let it out in the jungle.
  • I am not deploying immediately in a live environment with super high margin and uber volatile conditions. Instead, I start with a demo account and trade in demo for a while to see what happens. Alternatively, I can trade live but use micro trades to gain some confidence and see where I need to intervene.
  • I have an easy way to switch it off, a “Panic Button” that is the Kill Switch and can do it even while I am away from my trading terminal.
  • I have been developing it with a friend so my biases and his biases are somehow balanced.
  • I double-checked all the above and I am sure that I am ready to go LIVE! Fingers and toes are crossed, and all my lucky rabbit’s feet and lucky horseshoes are around my trading station.
  • Finally, I am not deploying live during a full moon or during holidays.
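The normalization items at the top of the checklist can be sketched as follows: the mean and standard deviation are computed from the training data only and then reused unchanged on the dev and test data. The function names and the numbers are assumptions for illustration.

```python
import statistics


def fit_scaler(train_values: list):
    """Compute mean and standard deviation from the training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    # Guard against a constant series, which would cause division by zero.
    return mean, (std if std > 0 else 1.0)


def transform(values: list, mean: float, std: float) -> list:
    """Apply z-score normalization with previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Illustrative values: later data drift higher than the training period.
train_prices = [10.0, 12.0, 14.0]
test_prices = [16.0, 18.0]

mean, std = fit_scaler(train_prices)
train_z = transform(train_prices, mean, std)
test_z = transform(test_prices, mean, std)  # reuses the training statistics
print(train_z, test_z)
```

Fitting the statistics on the full dataset would leak information about the dev and test periods into training, which is one way a backtest ends up looking better than live trading.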

Alexandros Louizos, MD

Alexandros Louizos, MD is a vascular surgeon who left his career in surgery in 2013 to join the artificial intelligence revolution. After working for 2 years as a data scientist, he decided to leave his corporate career to start his own company in 2015. He is a 2x entrepreneur of AI-related companies: Galaxy.AI (VC funded with $2.9M) and his latest venture, which is bootstrapped. He has designed and executed artificial intelligence systems currently in production in Fortune 500 companies. What gives him happiness is helping other dreamers to learn data science.
