Do backtests really work?

August 25, 2018 / August 27, 2018
by Diamond Zemaitis

People often doubt the validity of backtests. I have heard this many times from people I have discussed the topic with. And the doubt has merit because, quite often, backtests don’t replay the past in a way that correctly simulates live market conditions. That does not have to be the case, though. If you know what you are doing and understand the basics of writing a proper backtest, you can achieve a remarkably close simulation of actual market performance.

First and foremost: make sure your strategy logic executes on bar open only and never mid-bar.

Why?

Simply put, the majority of backtesting software replays past data bar by bar, not tick by tick. Because of this you lose the whole process of bar development, so you can never know how your indicator values changed intra-bar. Let’s say you trade SMA (simple moving average) crossovers. If the bar starts dropping rapidly during its lifecycle, your SMA values might cross over, which would generate a signal to open a short trade. But then the bar might rally back up in its last minutes, so that your SMA values no longer cross each other. This would eliminate the signal that appeared earlier. If you run your strategy on every tick (or close to that) in live market conditions, you would have gotten the signal and placed a trade. But if you run a backtest and lose the development process of this bar, you won’t see this trade in the backtest results. This is hands down the largest issue that most people come across when they start writing backtests. It distorts the reported performance drastically, and such a backtest cannot be trusted.
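To make the vanishing signal concrete, here is a minimal sketch in Python. All prices, the SMA lengths and the helper names (`sma`, `short_signal`) are made up for illustration, and the "signal" is simplified to "fast SMA is below slow SMA" rather than a full crossover detector.

```python
def sma(values, n):
    """Simple moving average of the last n values."""
    return sum(values[-n:]) / n

# Hypothetical closes of already completed bars (made-up numbers).
closes = [100.0, 100.0, 100.0, 100.0, 100.0, 101.0]

# Hypothetical ticks inside the bar that is still forming:
# it sells off sharply, then rallies back before the close.
ticks = [101.0, 99.0, 96.0, 94.0, 98.0, 101.5]

FAST, SLOW = 2, 5  # illustrative SMA lengths

def short_signal(history):
    """True when the fast SMA sits below the slow SMA."""
    return sma(history, FAST) < sma(history, SLOW)

# Live, tick-by-tick view: re-evaluate with the forming bar's latest price.
intrabar_signals = [short_signal(closes + [price]) for price in ticks]

# Bar-by-bar backtest view: only the final close of the bar is ever seen.
close_signal = short_signal(closes + [ticks[-1]])

print(intrabar_signals)  # the signal appears mid-bar, then disappears
print(close_signal)      # the bar-by-bar backtest never sees it
```

A strategy evaluated on every tick would have traded during the sell-off, while a bar-by-bar backtest of the same logic records no trade at all for this bar.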

If, however, you are using a backtester that builds bars tick by tick, you might improve the overall picture and be able to use intra-bar execution, but you must still be very cautious and must fully understand the implications of your automated trading logic. Another thing to note: if you are using tick-by-tick bar simulation, you will most likely not be able to backtest long periods, because ticks are very data-heavy. Imagine that your daily bar is a bucket of sand. Ticks are the grains of sand in this bucket. So you might not be able to get that many ticks of history, simply because storing and processing them is very inefficient. This might leave you with too few results to properly evaluate your backtest (if you love getting your hands dirty, read this article about how to calculate the sample size needed to properly evaluate the results).

Why bar open and not bar close?

You should avoid placing pre-market and post-market orders (unless, of course, your strategy specifically requires them). If you execute your strategy on bar close, you will be forced to place your orders post-market (you can avoid this by placing your trades, let’s say, 5 minutes before bar close, but this is not good practice). This will distort your backtest results, because the backtester won’t be able to properly simulate how your order would have been filled post-market. Executing on open allows for better simulation and also forces you to handle gaps properly.

A very important thing to note here: when executing everything on bar open, you MUST have your strategy check the values of the previous bars and never the current bar. Let’s say your strategy is looking for breakouts. In this case you should check whether the previous bar broke out, not the current one. Why?

Because you are trying to simulate live market conditions. In a live market, when a bar opens you don’t yet have its close, while in a simulated backtest the close is available as soon as the bar opens. Your backtest should reflect that. It is easy to forget, so checking only previous bars also reduces the room for human error.
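A minimal sketch of this rule in Python, under stated assumptions: the `Bar` type, the `breakout_entries` function, the 3-bar lookback and all prices are hypothetical, and the fill is assumed to happen exactly at the next bar’s open with no slippage.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float

def breakout_entries(bars, lookback=3):
    """Return (bar_index, fill_price) for each long entry.

    At the open of bar i the decision uses only bars up to i-1:
    the previous bar must have closed above the highest high of the
    `lookback` bars before it. The fill is assumed at bar i's open,
    so any gap between bars becomes part of the entry price.
    """
    entries = []
    for i in range(lookback + 1, len(bars)):
        prev = bars[i - 1]
        prior_high = max(b.high for b in bars[i - 1 - lookback:i - 1])
        if prev.close > prior_high:
            # bars[i].close is never read here: it would not exist yet live.
            entries.append((i, bars[i].open))
    return entries

bars = [
    Bar(10.0, 10.5, 9.8, 10.2),
    Bar(10.2, 10.6, 10.0, 10.4),
    Bar(10.4, 10.7, 10.1, 10.3),
    Bar(10.3, 11.2, 10.3, 11.0),  # closes above the prior 3-bar high of 10.7
    Bar(11.4, 11.6, 11.1, 11.5),  # gaps up: entry fills at 11.4, not 11.0
]

print(breakout_entries(bars))
```

The breakout is confirmed by the fourth bar, but the trade is only entered on the fifth bar’s open, so the gap from 11.0 to 11.4 shows up in the backtest just as it would in live trading.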

The next major thing is proper statistical evaluation. This boils down to two main points:

1 – large enough and diverse enough sample data population:

Markets change, sometimes rapidly, sometimes gradually. There are various economic cycles, seasons and events that you need to cover in order to get the correct picture. Take, for example, the seasons of the year. Everyone knows the saying “sell in May and go away”. Well, this has a valid background to it. Trading usually slows down during summer, as people tend to spend more time on vacation and just enjoying life. The pace picks back up in September. Now, if you are testing an intraday strategy, say on 1h bars, you must take the seasons into consideration. Your strategy might work one way when trading is active and completely differently during summer when things slow down. So you need to select your look-back period carefully.

One very important thing to consider when backtesting larger timeframes is the economic cycle: is the economy in recession, is it stagnating, is it growing? When backtesting, you should aim to include as many different economic cycles as possible in your look-back period, again simply because your strategy might yield different results in different economic situations. You need to know that, evaluate that and be prepared for that.

Another thing to make sure of is that you get enough trades to properly paint a picture. Again – read the article linked above, if you want to get mathy and calculate properly how many results you need to evaluate the performance with a certain margin of error.
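The referenced article is not reproduced here, so the following is only a sketch of one common textbook approach, not necessarily the method it describes. It estimates how many trades are needed to pin down the average result per trade within a chosen margin of error, assuming trades are independent and the normal approximation applies. The function name `trades_needed` and the input numbers are hypothetical.

```python
import math
from statistics import NormalDist

def trades_needed(stdev_per_trade, margin_of_error, confidence=0.95):
    """Sample size n = (z * sigma / E)^2, rounded up.

    stdev_per_trade: estimated standard deviation of a single trade's result
    margin_of_error: acceptable +/- error on the average result per trade
    confidence:      two-sided confidence level, e.g. 0.95
    """
    # z-score for the two-sided confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * stdev_per_trade / margin_of_error) ** 2)

# Made-up inputs: trade results vary by about 2.0 (in % or currency units),
# and we want the average pinned down to within +/- 0.5 of the same unit.
n = trades_needed(stdev_per_trade=2.0, margin_of_error=0.5)
print(n)
```

Note the quadratic relationship: halving the margin of error roughly quadruples the number of trades required, which is why short look-back periods with few trades say so little.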

2 – proper trading performance evaluation tools

Using proper tools to evaluate your backtesting performance is crucial. I highly recommend spending a lot of time wrapping your head around the values that you use to evaluate this performance. Truly understand the implications of each formula, its shortcomings and its aims. Every value is different, so think about what it is saying about your backtest. For example, some values are very popular but don’t really mean anything on their own. Take the Sharpe ratio, one of the most famous ways to evaluate a portfolio manager’s (or, in the modern day, a backtest’s) performance. I have heard people say that a Sharpe between x and y is good, less than z is bad, and so on. The thing to understand is that the Sharpe ratio was created TO COMPARE the performance of two or more portfolio managers. This means that by itself the value does not say much. But when you take two backtests, one with a Sharpe of x and another with a Sharpe of y, you can compare them and evaluate which is better.
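As a rough illustration of using the Sharpe ratio for comparison, here is a minimal Python sketch. The return series are made up, the ratio is left un-annualized, and the risk-free rate is assumed to be zero. The helper `sharpe` is hypothetical and not taken from any particular library.

```python
from statistics import mean, stdev

def sharpe(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return / stdev of excess return."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)

# Two hypothetical backtests with the same average return per period (0.8%).
backtest_a = [0.01, 0.02, -0.01, 0.015, 0.005]   # steady results
backtest_b = [0.05, -0.04, 0.06, -0.03, 0.0]     # same mean, wild swings

sharpe_a = sharpe(backtest_a)
sharpe_b = sharpe(backtest_b)

# Neither number means much alone; side by side they rank the two backtests.
better = "A" if sharpe_a > sharpe_b else "B"
print(f"A: {sharpe_a:.2f}  B: {sharpe_b:.2f}  better risk-adjusted: {better}")
```

Both series earn the same on average, so the comparison is decided entirely by how much variability each one took on to get there.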

So read up on your units, people!

And of course, as a proponent of robust statistics: use as many robust statistical units as you can, as they do not depend on the look-back period as much as their non-robust counterparts. If you don’t know what the heck robust statistics are, read this first. Initially I aimed to release 3 – 5 additional blog posts going into detail on various robust units, but lately I have decided that the topic of robust statistics requires its own book, as it is completely unexplored when it comes to trading. So as you read this, I am actually writing an e-book on Robust Statistics in trading. If you want a copy when this e-book is out, put your email below and I’ll tell you how to get it.
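The post does not say which robust units the author has in mind, so as an assumption the sketch below uses two standard ones, the median and the median absolute deviation (MAD), and compares them with the mean on made-up trade results. It shows how a single outlier trade moves the non-robust value far more than the robust one.

```python
from statistics import mean, median

def mad(values):
    """Median absolute deviation: a robust measure of spread."""
    m = median(values)
    return median([abs(v - m) for v in values])

# Hypothetical per-trade P&L, then the same sample plus one outlier trade.
pnl = [1.0, -0.5, 0.8, -0.7, 0.6, -0.4, 0.9]
pnl_with_outlier = pnl + [25.0]

mean_shift = mean(pnl_with_outlier) - mean(pnl)
median_shift = median(pnl_with_outlier) - median(pnl)

print(f"mean moved by   {mean_shift:.2f}")
print(f"median moved by {median_shift:.2f}")
print(f"MAD before {mad(pnl):.2f}, after {mad(pnl_with_outlier):.2f}")
```

Whether the outlier happens to fall inside the look-back window changes the mean dramatically, while the median barely moves.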

To sum up

Writing a backtest requires you to really understand what you are doing and what you are attempting to achieve. And what you are attempting to achieve is to simulate live market conditions, not to create a perfect P&L curve.

Keep in mind that a backtest will never simulate live markets 100%, but by avoiding statistical outliers, using solid logic and generally understanding what you are doing, you can get a close simulation of live markets. The closer a simulation you aim for, the simpler your backtest must be: with every additional check you put in, the accuracy goes down. So again, understand what you are doing and aim for the best simulation through solid logic, not curve-fitting. Don’t get hung up on perfect results. It is very likely that your strategies will yield terrible results initially. When that happens, scrap the logic and start over; don’t keep adding more “checks” and “ifs” if the core logic of the strategy is terrible.

Best,