With this blog posting, I’d like to talk a bit about unit testing. Unit testing, for those of you who haven’t heard of it, is roughly speaking the idea that your code base should be:
- split into atomic, functional chunks (“units”) rather than a huge, monolithic script
- constantly tested to ensure it behaves as designed
What I like about unit testing is that it forces us to pause and think about what we’re doing and how it fits into the greater environment before we begin to hack away at our code. It also makes us write modular pieces of code, which are much easier to expand, maintain, and reuse. All of these are worthwhile benefits in the long run, and the short-term costs are negligible: we just have to write a little bit of code up front for the testing framework. Let’s explore this workflow with an actual (albeit simplified) example, the source code for which can be downloaded from http://www.mathworks.com/matlabcentral/fileexchange/43627-download-daily-data-from-google-and-yahoo-finance.
Example: Equity identifier conversions
One hiccup in this workflow is that the two data sources, Google Finance and Yahoo! Finance, don’t use the same identifiers for their equities. Consider Michelin trading on the Euronext Paris exchange: on Google Finance, you would search for it by “EPA:ML”, while on Yahoo! Finance, you would use “ML.PA”. (More precisely, we use webpage-scraping functions to download the historical price information, and these identifiers appear in the URLs.) What we need, then, is a pair of functions that can convert Google-formatted tickers to the Yahoo! format and vice versa.
Wait a minute!
Before diving in and starting to program these functions, let’s pause and plan the test suite for one of them, say “convertGoogleToYahooTickers.m”. First, what sorts of inputs should this function expect? Well, it ought to be at least one Google-formatted ticker symbol. If it’s just a single ticker, then a character string is fine, at which point the function should return a Yahoo-formatted character string. In MATLAB’s function-based unit testing framework, this requirement becomes a single test: call the function with “EPA:ML” and verify that the result equals “ML.PA”.
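A minimal sketch of such a test, assuming a hypothetical test file named testConvertGoogleToYahooTickers.m and using the Michelin ticker from above as test data, might look like this. It is expected to fail until convertGoogleToYahooTickers actually exists, which is rather the point of writing it first.

```matlab
% Save as testConvertGoogleToYahooTickers.m (hypothetical file name).
function tests = testConvertGoogleToYahooTickers
    % Collect every local test function in this file into a test array.
    tests = functiontests(localfunctions);
end

function testSingleTicker(testCase)
    % A single Google-formatted character string goes in ...
    actual = convertGoogleToYahooTickers('EPA:ML');
    % ... and a single Yahoo!-formatted character string should come out.
    verifyEqual(testCase, actual, 'ML.PA');
end
```

With the file on the MATLAB path, `results = runtests('testConvertGoogleToYahooTickers')` runs the test and reports whether it passed.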
Next, we could consider what to do if the input is empty. We could generate an error (and test for it with the verifyError command), but instead I prefer to let it run without error and return an empty string, leaving it to another part of the code to catch that as a problem or deal with it as appropriate. The corresponding test simply verifies that an empty string in gives an empty string out.
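As a sketch under the same assumptions (a hypothetical test file name, function-based tests), the empty-input check could be written as follows. In practice it would sit in the same file as the other tests for this function.

```matlab
% Save as testConvertGoogleToYahooTickers.m (hypothetical file name).
function tests = testConvertGoogleToYahooTickers
    % Collect every local test function in this file into a test array.
    tests = functiontests(localfunctions);
end

function testEmptyInput(testCase)
    % Empty in, empty out: the function should not raise an error here.
    actual = convertGoogleToYahooTickers('');
    verifyEqual(testCase, actual, '');
end
```

Note that verifyEqual also compares class and size, so this test pins down that the result is an empty character array rather than, say, an empty numeric array.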
Testing the code
Finishing the job
While creating this test suite, I asked myself several other questions: whether this function should check that the ticker symbols are valid (it shouldn’t; the set of valid symbols is constantly changing and is best left to an actual data downloading command) and just how fastidious it needs to be about incorrect syntax (not very: as a utility function, this is more likely to be called by a helper function than directly by the end user, so we needn’t worry too much about, say, it being called with a numerical input).
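To make those decisions concrete, here is a minimal sketch of a function that would satisfy the single-ticker and empty-input requirements. It is not the implementation from the File Exchange submission: the lookup table is hypothetical and contains only the EPA/PA pair mentioned above, it handles a single character string only, and passing unrecognised input through unchanged is an assumption.

```matlab
function yahooTicker = convertGoogleToYahooTickers(googleTicker)
% Sketch only: converts one Google-style ticker such as 'EPA:ML'
% into the Yahoo! style 'ML.PA'. No validity checking is attempted.

    % Empty in, empty out: no error, the caller decides what to do.
    if isempty(googleTicker)
        yahooTicker = '';
        return
    end

    % Hypothetical lookup table from Google exchange prefix to Yahoo!
    % suffix; only this one pair is taken from the article.
    exchangeSuffix = containers.Map({'EPA'}, {'PA'});

    % Split 'EXCHANGE:SYMBOL' at the colon.
    parts = strsplit(googleTicker, ':');

    if numel(parts) == 2 && isKey(exchangeSuffix, parts{1})
        % Known exchange: reorder into 'SYMBOL.SUFFIX'.
        yahooTicker = [parts{2} '.' exchangeSuffix(parts{1})];
    else
        % Assumption: anything unrecognised is passed through unchanged.
        yahooTicker = googleTicker;
    end
end
```

Because the function deliberately does not validate symbols, a made-up ticker on a known exchange is converted just like a real one; catching that is left to the download step.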
Depending on your needs, your test suites may be much richer and more comprehensive than this one. The important thing, though, is that unit testing requires you to ask these questions before you start writing your code. This planning step is worth the initial investment simply because it eliminates many more problems down the road.
If you want to dive deeper into the code, you’ll notice that there’s a unit test for every function in the suite. Some of the tests confirm that errors are generated when they should or that the results match only to within a certain tolerance (which is in general preferable to checking for equality down to the last bit).
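Those two kinds of checks can be sketched as follows. The file name and the requirePositive helper are hypothetical stand-ins, not functions from the actual code base; only verifyEqual (with a tolerance) and verifyError are the real framework calls.

```matlab
% Save as testToleranceAndErrors.m (hypothetical file name).
function tests = testToleranceAndErrors
    % Collect every local test function in this file into a test array.
    tests = functiontests(localfunctions);
end

function testMatchesWithinTolerance(testCase)
    % 0.1 + 0.2 is not bit-for-bit equal to 0.3 in floating point,
    % so compare within a relative tolerance instead of exactly.
    verifyEqual(testCase, 0.1 + 0.2, 0.3, 'RelTol', 1e-12);
end

function testErrorIsRaised(testCase)
    % verifyError passes only if calling the function handle throws
    % an error with the given identifier.
    verifyError(testCase, @() requirePositive(-1), ...
        'demo:requirePositive:notPositive');
end

function requirePositive(x)
    % Hypothetical helper standing in for a function that rejects bad input.
    % Its name neither starts nor ends with "test", so it is not collected
    % as a test by functiontests.
    if x <= 0
        error('demo:requirePositive:notPositive', 'Input must be positive.');
    end
end
```

Checking the error identifier rather than the message text keeps the test stable if the wording of the message changes later.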
I’ll be honest and admit that the code base existed before the unit tests did—but I’m glad that I did write unit tests all the same! I only discovered the bug described above because of the tests, I realized that the function “calculateAdjustedClose” was far more useful as the stand-alone function it now is instead of as the subroutine of “getGoogleDailyData” it originally was, and I discovered another subtle bug in “calculateAdjustedClose”. These unit tests were really worth it, even for this existing code base.