Saturday 31 March 2012

Back-Testing: Advice and Caveats for the Beginner

In this article, I look at the dos and don'ts of back-testing. Now that we have our 'scraped' price data, we are in a position to develop trading strategies to apply to it. As previously mentioned, I tend to test new strategies initially in Metastock and then take the formulae/indicators therein and build them into my back-testing application as new methods.

There are a number of basic caveats to back-testing that you need to be aware of, and some of which I have come a real cropper on:
  • First, and potentially most dangerous, is allowing your system to 'see' future data - by this I mean you must not let your tests access any data that lies in the future relative to the date being tested. This can be really subtle, not especially obvious, and very difficult to debug. The best way to get around the problem is to be really disciplined in your coding and to isolate data based on age. I do this by creating 'subsets' of data with Linq, bounded by the 'current' date and the earliest date in the back-testing model. As the 'subset' is a Linq table object containing all the price data I need (High, Low, Open, Close and Volume), all I subsequently need to do is separate out the bits I need, such as a date array and a closing-price array - see the sketch just after this list.
  • Second, and a particular danger to those of us working with 'free' data, is survivorship bias. This is where stocks have dropped out of the indices over the years for performance reasons, so your initial dataset is already biased in favour of the 'survivors'. There is no easy way round this unless you want to fork out for a 'clean' dataset - I believe that if your testing is thorough enough and your sample sizes are large enough, then this will not necessarily be a problem.
  • Third, and most important, avoid 'curve fitting'. By this I mean that if you add countless parameters to your model, you should not be surprised if you get excellent returns in back-testing. The art of model development is definitely 'less is more' - you should aim to reduce your parameters to an absolute minimum; that way your model will perform across the widest range of market types and conditions. The sign of a good model is how few, and how simple, its parameters are. You should aim to continually test and remove parameters until you see no observable change in your results. This is a hard point to make, but crucial, especially for us amateurs. I suggest you read Ernie Chan's opinions on simplifying trading models, either in his blog or in his book.
  • Fourth, compounding. I made this mistake for a while: my back-testing model would use the returns of previous trades to fund future ones. This looks great and does help you see the effects of compounding, but it does not help in testing or verifying the success or otherwise of your model. You need to strip out such effects from your initial model testing so that you are testing only the veracity of your parameters rather than the vagaries of market timing.
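As an illustration of the point about isolating data by age, here is a minimal sketch of the kind of Linq 'subset' I mean - the Price class and _db.Prices table match the snippets used elsewhere in this blog, while epic, earliestDate and currentDate are illustrative variable names rather than code lifted straight from my model:

 // Build a subset that can never 'see' beyond the simulated 'current' date.
 IQueryable<Price> subset = from p in _db.Prices
                            where p.Epic == epic
                            where p.Date >= earliestDate   // start of the back-test window
                            where p.Date <= currentDate    // never later than 'today' in the simulation
                            orderby p.Date ascending
                            select p;

 // Separate out the bits the indicators need.
 DateTime[] dates = subset.Select(p => p.Date).ToArray();
 decimal[] closes = subset.Select(p => p.Close).ToArray();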
Finally, make sure to look out for the obvious. If your model starts to perform really well in back-testing, assume you have got something wrong. Take your model apart and test it bit by bit. In the early days of my model building I realised I was feeding historical data into the model in reverse! Compartmentalise your development and test each piece in isolation - strip out all of your parameters and then add them back in one by one to see their impact, or otherwise, on your model.

Oh, and make sure you back up and carefully document each stage of the model building. I use source control to help me do this - it is invaluable for going back and looking at what you have or haven't done in the past.

Tuesday 27 March 2012

Position Sizing Strategies - Kelly's F

When you have a trading model you are confident of and you want to maximise your gains, I suggest you look at the Kelly Criterion, or Kelly's F. Originally devised as a method to optimise transmission over noisy telegraph wires, Kelly's F has been applied to gambling as a way of maximising returns. In turn, it has also been applied to forex and stock market trading. A good article on the subject can be found here.

Unlike using ATR as a measure of volatility (and thus a method of formulating your position size), Kelly's F is based upon the premise that previous gambles and their outcomes are an indicator of future success. This is where the formula has its critics, as one is assuming that the number of positive outcomes and the return per trade are an indicator of your system's likely future performance. Strangely, I have found Kelly's F to be a more successful position-sizing strategy than ATR, but it takes a brave heart! In the model below, I calculate Kelly's F for each share based upon its historical performance against my model. I also 'dilute' the KellyF recommendation by 50% - this is very common amongst traders and is known as a 'Half Kelly'. The code I use to calculate KellyF is shown below:

 public decimal GetKellyF(string epic)
     {
       // Need at least a couple of winning trades before the statistics mean anything.
       if (_transactions.WinningTradeCount(epic) > 1)
       {
         // No losing trades yet - the ratio would be undefined, so cap the fraction at 1.
         if (_transactions.LosingTradeCount(epic) == 0)
         {
           return 1;
         }
         decimal? avgWin = _transactions.WinningTradeValue(epic);
         decimal? avgLose = _transactions.LosingTradeValue(epic) * -1;
         decimal? avgWinningTradeValue = avgWin / _transactions.WinningTradeCount(epic);
         decimal? avgLosingTradeValue = avgLose / _transactions.LosingTradeCount(epic);
         int tranCount = _transactions.WinningTradeCount(epic)
                                  + _transactions.LosingTradeCount(epic);
         decimal winPct = (decimal)_transactions.WinningTradeCount(epic) / tranCount;
         decimal? wlRatio = avgWinningTradeValue / avgLosingTradeValue;
         decimal? kelly = ((wlRatio + 1) * winPct - 1) / wlRatio;
         if (kelly != null)
           return (decimal)kelly;
         return 1;
       }
       // Not enough trade history for this share yet.
       return 0;
     }

Points to note here - many of the calculations shown, such as 'WinningTradeValue' and 'WinningTradeCount', are performed within my transactions class - this is a big class and too large to print here; if anyone wants it I can send it to them. The key line here is:

 decimal? kelly = ((wlRatio + 1) * winPct - 1) / wlRatio;  

This will calculate KellyF for each share so long as a history exists. I then use the results of the KellyF calculation to work out the size of the shareholding using this line of code:

  noStocksHeld = (int)((Balance * KellyF * 0.5m) / BuyPrice);  

(Note the '0.5m' to limit the final size of the shareholding.) KellyF can get pretty aggressive in the size of the holdings it demands, so you have to tone it down. I have only really seen KellyF used in options and forex trading, so this is my customisation to make it work in stocks and shares. I have had a lot of success with it over the years.
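To make the arithmetic concrete, here is a small worked example of the formula and the 'Half Kelly' sizing - the figures (a 55% win rate, a 1.5 win/loss ratio, a £10,000 balance and a £5.00 buy price) are made up purely for illustration:

 decimal winPct = 0.55m;                      // 55% of past trades were winners
 decimal wlRatio = 1.5m;                      // average win is 1.5x the average loss
 decimal kelly = ((wlRatio + 1) * winPct - 1) / wlRatio;        // = (2.5 * 0.55 - 1) / 1.5 = 0.25

 decimal balance = 10000m;                    // available capital
 decimal buyPrice = 5.00m;                    // price per share
 int noStocksHeld = (int)((balance * kelly * 0.5m) / buyPrice); // Half Kelly => 250 shares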

Sunday 25 March 2012

Position Sizing Strategies - Using ATR

Strangely, one of the biggest factors in maximising your returns is accurately calculating the size of the position you want to take after your system has found something to buy. From personal experience, position sizing has made a massive difference to my returns as well as limiting me to an acceptable level of risk.

I commonly use two types of position sizing. If I have just started investing in a system and am still lacking confidence in its long-term viability (back-testing notwithstanding), I will use one based upon Average True Range. If I have a system I feel confident in and am less risk-averse, I will use a system based upon Kelly's F. I will expand upon ATR now and will cover Kelly's F in more detail in a future article.

ATR Position Sizing

Average True Range was developed by the famous technical analyst and trader, Welles Wilder Jr. It is an indicator of price volatility, and its use to us is in scaling the amount we invest according to the recent volatility of the share price. This method of position sizing has been widely promoted by the Turtle Traders - for a summary of how they use it, read the following PDF. In order to use ATR we need to calculate it for the last x days - I use 20 days, which seems to be quite a common value. I loop through the current price range I have in my back-testing model with the following code:

            var catr = new CalcAverageTrueRange();
            var atr = new List<decimal>();
            decimal prvClose = 0;   // there is no previous close on the first bar

            foreach (Price p in thisEpicsPrices)
            {
                atr.Add(catr.CalcATR(ATRDays, p.High, p.Low, prvClose));
                prvClose = p.Close;
            }

The CalcATR function is a 'hand-rolled' version adapted to C# from a Metastock formula:

public decimal CalcATR(int days, decimal high, decimal low, decimal previousClose)
        {
            // True Range: the greatest of (high - low), |high - previous close| and |low - previous close|.
            decimal highMinusLow = high - low;
            decimal highMinusPrvClose = Math.Abs(high - previousClose);
            decimal lowMinusPrvClose = Math.Abs(low - previousClose);
            decimal tr = Math.Max(lowMinusPrvClose, Math.Max(highMinusLow, highMinusPrvClose));

            // Still within the first 'days' bars: accumulate the true ranges.
            if (_atrCount < days)
            {
                if (previousClose > 0)
                    _trTotal += tr;
            }

            // Exactly 'days' bars seen: seed the ATR with a simple average.
            if (_atrCount == days)
            {
                _lastatr = _trTotal / days;
                _atrCount++;
                return Math.Round(_lastatr, 2);
            }

            // Thereafter: Wilder's smoothing of the previous ATR with today's true range.
            if (_atrCount > days)
            {
                _lastatr = (((_lastatr * (days - 1)) + tr) / days);
                _atrCount++;
                return Math.Round(_lastatr, 2);
            }

            _atrCount++;
            return 0;
        }

Hopefully the parameters are self-explanatory. In the list that is returned, the latest date's ATR is the last value (assuming you are using this in a 'live' system). I have to do one small conversion before I use it, and that is to convert it to 'n' (the same as the Turtle Traders use):

  decimal n = atr[i] / 100; //Convert today's atr to 'n'

I then calculate the number of shares to purchase with the following line of code:

noStocksHeld = (int)(positionSizeValue / (2 * n));

I calculate positionSizeValue as my total asset value (all the shares I currently own multiplied by their current price, plus how much I have in cash), multiplied by a risk factor - I use 1%. Put simply, this formula limits the amount you invest in a share based on its current volatility - the more volatile the share, the less it will let you invest. The value 'n' can also be used to generate stop-losses, however I use different techniques for that.
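Putting the pieces together, the sizing calculation looks something like the sketch below - the holdings collection, cashBalance and riskFactor names are illustrative rather than lifted from my model:

  // Total asset value: current holdings at today's prices, plus cash.
  decimal totalAssets = holdings.Sum(h => h.Quantity * h.CurrentPrice) + cashBalance;

  decimal riskFactor = 0.01m;                          // risk 1% of total assets per position
  decimal positionSizeValue = totalAssets * riskFactor;

  decimal n = atr[atr.Count - 1] / 100;                // today's ATR converted to 'n'
  int noStocksHeld = (int)(positionSizeValue / (2 * n));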

Saturday 24 March 2012

Adding to your toolbox

Once you start playing with your data, you will quickly realise that there are many useful technical and financial functions you will need in order to create your own trading strategy. Developing a successful strategy will see you needing to develop two related systems - one for back-testing and one for actual trading - and for both of these you will need a set of tools to help you calculate all the indicators/formulae you will want to use.

You could write the formulae from scratch, although personally I have found many to be tediously complex and indecipherable, especially when trying to re-interpret them from other applications such as Metastock. A good DLL to add to your library is TA-Lib, which comes with a comprehensive list of functions that will set you on your way. Best of all, it is open source, which is fairly unusual for anything finance-related.

Just a few tips on using these functions in your C# programs - I assume you are retrieving your data from an SQL database - first, you need to retrieve your data as a strongly-typed list object:

 IOrderedQueryable<Price> prices = (from p in _db.Prices
                                                   where p.Date <= date
                                                   where p.Date >= _earliestDate
                                                   where p.Epic == epic
                                                   orderby p.Date ascending
                                                   select p);

Next, as most functions in TA-Lib want double arrays (double[]) for input prices, you need to convert your data as follows:

double[] priceCloseSubset = (from p in prices select Convert.ToDouble(p.Close)).ToArray();
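As a quick illustration, a call into TA-Lib then looks something like the sketch below - this assumes the TicTacTec.TA.Library flavour of the wrapper and its Core.Sma method, so check the documentation of whichever build you download:

 // A minimal sketch: a 20-day simple moving average of the closes via TA-Lib.
 int begIdx, nbElement;
 var sma = new double[priceCloseSubset.Length];
 Core.RetCode ret = Core.Sma(0, priceCloseSubset.Length - 1, priceCloseSubset,
                             20, out begIdx, out nbElement, sma);
 if (ret == Core.RetCode.Success)
 {
     // sma[0..nbElement-1] holds the averages; begIdx tells you which input bar sma[0] refers to.
     Console.WriteLine("Latest 20-day SMA: {0:F2}", sma[nbElement - 1]);
 }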

You should now be able to utilise these functions in your own designs.

Friday 23 March 2012

Bulk Inserts into SQL

Following on from the article on downloading prices from Google, today we shall focus on how to quickly and efficiently insert them into an SQL database. On an average day I will download some 750k prices - if I were to insert them record-by-record using a Linq statement, it would take in excess of half an hour. To get round this problem, use the following code to perform a bulk insert into the database:

 public static void CopyData(DataTable sourceTable, string dbTable)
        {
            using (var cn = new SqlConnection(Database.ConnectionString))
            {
                cn.Open();

                // Re-use the open connection rather than letting SqlBulkCopy open its own.
                using (var s = new SqlBulkCopy(cn))
                {
                    s.DestinationTableName = dbTable;
                    s.NotifyAfter = 10000;               // report progress every 10,000 rows
                    s.SqlRowsCopied += SqlRowsCopied;
                    s.WriteToServer(sourceTable);
                    s.Close();
                }
            }
        }

        private static void SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
        {
            Console.WriteLine("-- Copied {0} rows.", e.RowsCopied);
        }

The DataTable 'sourceTable' is formatted to exactly match the SQL table (column names, data types etc). The SqlRowsCopied method just keeps you updated as to where it is in the insert process.
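For completeness, here is a rough sketch of what 'formatted to exactly match the SQL table' means in practice - the column names and types are only an example based on the price data used elsewhere in this blog, so adjust them to your own schema:

 // The DataTable's columns must line up, by name and type, with the destination SQL table.
 var sourceTable = new DataTable("Prices");
 sourceTable.Columns.Add("EPIC", typeof(string));
 sourceTable.Columns.Add("Date", typeof(DateTime));
 sourceTable.Columns.Add("Open", typeof(decimal));
 sourceTable.Columns.Add("High", typeof(decimal));
 sourceTable.Columns.Add("Low", typeof(decimal));
 sourceTable.Columns.Add("Close", typeof(decimal));
 sourceTable.Columns.Add("Volume", typeof(decimal));

 CopyData(sourceTable, "Prices");   // the destination table name here is just an example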

With this method you can easily insert all 750k records within a matter of seconds.

Monday 19 March 2012

Latest Holdings

As part of this blog, I plan to list current stock picks generated by my latest model, which I have nicknamed the 'ShortWave'. This model is rather basic and is built upon MACD, RSI(4) and ADX. I shall just report its current notional holdings and see how things develop. I have no interest in these shares and, as part of my policy of 'having no opinion', I often do not even know what the companies do or even what sector they are in.


Epic   BuyPrice   BuyDate
ARM    5.765      29/11/11
BGC    3.65       08/03/12
DRX    5.505      09/12/11
HIK    7.485      02/03/12
IGG    4.759      15/02/12
IMG    6.195      28/02/12
KENZ   4.664      13/03/12
SNR    1.86       15/02/12


MetaStock Conversion Utility

One invaluable tool for model development I haven't mentioned is Metastock. I have used this for many years to 'fine tune' the development of strategies and indicators. I tend to develop and test 'indicators' in Metastock and then convert this logic into C# code.

For the amateur trader, the biggest issue in using Metastock is how to convert data for it. I originally used end-of-day services such as those provided by Paritech; however, I found this somewhat restrictive as I wanted to control what data I converted, e.g. FTSE100 or FTSE250 only. Unfortunately, Metastock does not appear keen to open up the workings of its data format and as such I have never found a pure open-source method of converting data for it. At the heart of my conversion code I use a DLL called 'Metalib' (http://trading-tools.com) - my version of the DLL is very old now (2007) but still does the job - it is proprietary so you will need to buy it.

Points to note in the ConverterEngine method are:
  •  The EPIC codes and Prices are pulled in from SQL using a Linq query.
  • All existing Metastock data files are deleted to start with.
  • There are two key calls in the method: AppendSecurity, which creates the security file, and AppendDataRec, which adds price data to it (OpenSecurityBySymbol re-opens a security that already exists).
  • The Sort method just sorts the files by name - the filenames are meaningless gibberish so the Sort method arranges them for when you open them in Metastock.
I use this conversion method with version 11 of Metastock with no problems. It runs nightly after the price scrape has completed.

  
private readonly MLReader _metaReader;
private readonly MLWriter _metaWriter;
private readonly MLRegistration _mReg;
private MetastockDataContext _db;
private IQueryable _hemsList; 

public void ConverterEngine()  
     {  
       _db = new MetastockDataContext();  
       _db.Connection.ConnectionString = Database.ConnectionString;  
       _hemsList = from epic in _db.StatsLists  
             where epic.FTSE_Index.Replace("\"", "") == "FTSE100"  
             || epic.FTSE_Index.Replace("\"", "") == "FTSE250"  
             orderby epic.EPIC  
             select epic;  
       Console.WriteLine("[][][][][][][][]Doing METASTOCK conversion[][][][][][][][]");  
       _mReg.SetRegistrationInfo(PUT_YOUR_REG_HERE, KEY_HERE);  
       if (!Directory.Exists("C:/Metastock/Securities"))  
       {  
         Directory.CreateDirectory("C:/Metastock/Securities");  
       }  
       foreach (string sFile in Directory.GetFiles("C:/Metastock/Securities"))  
       {  
         File.Delete(sFile);  
       }  
       foreach (StatsList epicItem in _hemsList)  
       {  
         string epic = epicItem.EPIC.Replace("\"", "");  
         IQueryable<Price> prices = from p in _db.Prices  
                       where p.Epic == epic  
                       orderby p.Date ascending  
                       select p;  
         if (prices.Any())  
         {  
           Console.WriteLine("Processing " + epicItem.Share_Name.Replace("\"", ""));  
           foreach (Price price in prices)  
           {  
             DateTime priceDate = price.Date;  
             string yahooDate = priceDate.Year + priceDate.Month.ToString("00") +  
                       priceDate.Day.ToString("00");  
             _metaWriter.OpenDirectory("C:/Metastock/Securities");  
             if (_metaWriter.bSymbolExists[price.Epic])  
             {  
               _metaWriter.OpenSecurityBySymbol(price.Epic);  
               _metaWriter.AppendDataRec(Convert.ToInt32(yahooDate), 0,  
                            Convert.ToSingle(price.Open),  
                            Convert.ToSingle(price.High),  
                            Convert.ToSingle(price.Low),  
                            Convert.ToSingle(price.Close),  
                            Convert.ToSingle(price.Volume),  
                            0f);  
             }  
             else  
             {  
               _metaWriter.AppendSecurity(price.Epic, epicItem.Share_Name.Replace("\"", ""),  
                             PERIODICITY.Daily);  
               _metaWriter.AppendDataRec(Convert.ToInt32(yahooDate), 0,  
                            Convert.ToSingle(price.Open),  
                            Convert.ToSingle(price.High),  
                            Convert.ToSingle(price.Low),  
                            Convert.ToSingle(price.Close),  
                            Convert.ToSingle(price.Volume),  
                            0f);  
               _metaWriter.CloseSecurity();  
             }  
           }  
         }  
       }  
       _metaWriter.CloseDirectory();  
       Sort();  
     }  
     // Private Methods (1)   
     private void Sort()  
     {  
       _metaWriter.OpenDirectory(@"C:\MetaStock\Securities");  
       _metaWriter.Sort();  
       _metaWriter.CloseDirectory();  
     }  

Stock Market Analysis: An Amateur's Perspective

For the many years I have been investing, I have tried to analyse stocks from a number of angles in an attempt to achieve some understanding of what is going on.

I originally started from the perspective of Technical Analysis. I bought the books, studied the charts, read the blogs and eventually came to the conclusion that I might as well be studying tea leaves. My initial foray into investing was driven by whatever 'gut feeling' the charts gave me and, I discovered, I was particularly unsuccessful at it. I realised that my best defence was to ignore my own 'opinions' and to focus on the facts. For a long time I followed the 'Mechanical Investing' forum on the 'Motley Fool' website, which seemed to offer a glimmer of hope in its mantra that investment decisions should be based upon a calculated response to inputs rather than personal feeling.

Ultimately, I got into designing 'filters' for fundamental data, especially using such sites as Hemscott (now part of Morningstar) and ShareLockHolmes. I still like fundamental data for choosing stocks; however, I believe that larger and quicker gains can only be achieved through price-based means (rightly or wrongly). This is how I ended up being interested in algorithmic trading, which seems to perfectly balance my interest in software with the mental challenge of developing strategies.

As has been discussed previously in this blog, I have not yet entered the intra-day realm of trading, mainly due to a lack of knowledge and time, although no doubt this will come.

Software and Tools

I am a self-taught coder. As I have mentioned before, I do not work in the software or finance industries and therefore my code probably lacks the polish and modernity of others'.

I like to write most of my code in C#. The reason for this is not just the ease of the language but also the tools that go with it. I have used Java on and off over the years, but have never really liked the tools available (or perhaps understood them). With C# and the whole Microsoft thing, there is just an ease of use and integration that makes life a bit easier for the amateur.

Language choices aside, I write all the software within Visual Studio 10 with the help of a fantastic utility, ReSharper by JetBrains, to check what I am laying down. With the advent of Linq, I am no longer forced to write stored procedures or other 'off-line' code that IntelliSense cannot check. I think Linq is the best thing Microsoft has come up with in a long time!

Another useful utility that comes to mind is Regionerate. This tool will reorganise the layout of your code to make it more readable. The tool is a bit old now - the author doesn't seem to support it any longer - however, the existing version still seems to work with VS10.

I use MS SQL Server to store data. With some tables being upwards of a million rows, this is the best and easiest way to handle the data. I am currently using the *free* version (Express, I think they call it). You will also need to download the 'Management Studio' utility to help you build tables etc. MS SQL is an invaluable tool, especially when developing your strategies, and it is worth learning some of the basics of the SQL language to help you query the data on file.

Sunday 18 March 2012

On Simplicity

When I first started designing algorithms to trade stocks, I was fascinated with all things related to artificial intelligence and truly believed that this was the way forward in developing an 'all seeing' formula for success.

My early attempts at algo-trading were notable for the complexity of the models I built and, in due course, the realisation that I was merely adjusting variables and parameters in order to 'fit' my model more closely to a back-tested curve. It was only after reading Ernie Chan's great book Quantitative Trading that I realised it is more important to work on simplifying one's model than trying to make it all-singing and all-dancing. Strangely, the hardest thing about model development is simplifying it. You know you are getting close to a good model when adding one more variable or parameter moves it away from the optimum. I have found, through 'gut feel' rather than empirical evidence, that any more than 4 or 5 variables, and the same number of parameters, makes a model less than optimal.

A simple model will work in more varied market environments, with more resilience and for a longer time. When you read about the complexity of the Wall Street quant models, always remember that simplicity is the friend of the amateur.

The Daily 'Scrape'

In the last post we talked about how Google 'hides' its price data behind its own custom ids and how we can retrieve and match them to real EPICs for our own uses. In this post we will cover the actual exercise of scraping price data and the pitfalls to watch out for. A word of warning before we progress: the code listed below is not as well documented as I would like, however please ask if you have questions.

I download pricing data every day for all companies in the FTSE100 and FTSE250. In fact, I download six years' worth of data for each EPIC every day (around 750,000 records). You may ask why delete and re-download everything every day? The reason is that I believe there are errors in the daily data that are sometimes corrected by Google, and I wouldn't catch these 'corrections' if I didn't download every day. I may be very wrong in this opinion, but it makes me feel happier.

The actual 'work' of fetching the Google price data is covered by the following code:

   protected static DataTable DownLoadPricesFromGoogle(DateTime latestDate, 
                                      DateTime earliestDate, string googleCid)  
     {  
     Begin:  
       // Built as a concatenation so that no stray whitespace ends up in the URL.
       string urlTemplate =
         "http://finance.google.co.uk/finance/historical?cid=[symbol]" +
         "&startdate=[earliestMonth]%2F[earliestDay]%2F[earliestYear]" +
         "&enddate=[latestMonth]%2F[latestDay]%2F[latestYear]&output=csv";
       string latestMonth = latestDate.Month.ToString("00");  
       string latestDay = latestDate.Day.ToString("00");  
       string latestYear = latestDate.Year.ToString("00").Substring(2, 2);  
       string earliestMonth = earliestDate.Month.ToString("00");  
       string earliestDay = earliestDate.Day.ToString("00");  
       string earliestYear = earliestDate.Year.ToString("00").Substring(2, 2);  
       urlTemplate = urlTemplate.Replace("[symbol]", googleCid);  
       urlTemplate = urlTemplate.Replace("[latestMonth]", latestMonth);  
       urlTemplate = urlTemplate.Replace("[latestDay]", latestDay);  
       urlTemplate = urlTemplate.Replace("[latestYear]", latestYear);  
       urlTemplate = urlTemplate.Replace("[earliestMonth]", earliestMonth);  
       urlTemplate = urlTemplate.Replace("[earliestDay]", earliestDay);  
       urlTemplate = urlTemplate.Replace("[earliestYear]", earliestYear);  
       string history;  
       var wc = new WebClient();  
       try  
       {  
         history = wc.DownloadString(urlTemplate);  
         Thread.Sleep(1000);        // be polite between requests  
       }  
       catch (WebException ex)  
       {  
         // A protocol error (e.g. a 404) means the CID is bad - give up on this stock.
         if (ex.Status == WebExceptionStatus.ProtocolError)  
         {  
           return null;  
         }  
         // Anything else (timeouts etc.) - wait and retry from the top.
         Thread.Sleep(3000);  
         goto Begin;  
       }  
       finally  
       {  
         wc.Dispose();  
       }  
       // Parse expects a TextReader, so wrap the downloaded CSV string.
       return Parse(new StringReader(history), true);  
     }  

The 'Begin:' label at the top of the code is a horrible kludge put in to retry after any failure in fetching the page - I have as yet not found a better way. The rest of the code is fairly self-explanatory.
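For what it's worth, a bounded retry loop would be one way to avoid the goto - the sketch below is just an outline of the idea rather than tested code from my scraper:

 // Retry the download a fixed number of times instead of looping with goto.
 string history = null;
 for (int attempt = 0; attempt < 3 && history == null; attempt++)
 {
     try
     {
         using (var wc = new WebClient())
             history = wc.DownloadString(urlTemplate);
     }
     catch (WebException ex)
     {
         if (ex.Status == WebExceptionStatus.ProtocolError)
             return null;            // bad CID - give up on this stock
         Thread.Sleep(3000);         // transient problem - wait and try again
     }
 }
 if (history == null)
     return null;                    // still failing after three attempts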

The 'Parse' method converts the document stream into a DataTable - to be honest, I should get around to rewriting this, as there is really no need for a DataTable here and a much simpler approach such as a simple data transport object would suffice:

  public static DataTable Parse(TextReader stream, bool headers)  
     {  
       var table = new DataTable();  
       table.TableNewRow += Table_NewRow;  
       var csv = new CsvStream(stream);  
       string[] row = csv.GetNextRow();  
       if (row == null)  
         return null;  
       if (headers)  
       {  
         foreach (string header in row)  
         {  
           if (!string.IsNullOrEmpty(header) && !table.Columns.Contains(header))  
             if (header.Contains("Date"))  
             {  
               table.Columns.Add("Date", typeof(string));  
             }  
             else  
             {  
               table.Columns.Add(header, typeof(string));  
             }  
           else  
             table.Columns.Add(GetNextColumnHeader(table), typeof(string));  
         }  
         row = csv.GetNextRow();  
       }  
       while (row != null)  
       {  
         while (row.Length > table.Columns.Count)  
           table.Columns.Add(GetNextColumnHeader(table), typeof(string));  
         if (String.IsNullOrEmpty(row[1]))  
         {  
           Debug.WriteLine("`");  
         }  
         table.Rows.Add(row);  
         row = csv.GetNextRow();  
       }  
       return table;  
     }  
     private static void Table_NewRow(object sender, DataTableNewRowEventArgs e)  
     {  
       Console.WriteLine("Table_NewRow Event: RowState={0}",  
                e.Row.RowState);  
     }  
     private static string GetNextColumnHeader(DataTable table)  
     {  
       int c = 1;  
       while (true)  
       {  
         string h = "Column" + c++;  
         if (!table.Columns.Contains(h))  
           return h;  
       }  
     }  

The 'RefreshPriceData' method controls the program flow of the 'scrape' - key points to note are:

  • The 'Persistence' class is purely a utility class to clean up the backend database prior to receiving a new set of data. To be honest, with the advent of Linq-to-Sql, I could probably get rid of 'Persistence' and keep everything inline.

  • The code checks for the presence of 'GoogleEpicCodes.xml' in the path - as previously mentioned, this file is the 'key' for matching Google ids to real EPICs. If the file doesn't exist, it is created in the 'FetchGoogleCIDs' method.
  • I use a store of known EPICs in my database to filter what prices I am going after - these EPICs are checked against the data from 'GoogleEpicCodes.xml' to ensure validity and to filter for only FTSE100 and FTSE250 companies - the LINQ code to fetch the EPICs from the SQL database is:
    IQueryable<string> epics = (from e in _db.StatsLists  
                         where e.FTSE_Index == "FTSE100" || e.FTSE_Index == "FTSE250"  
                         orderby e.EPIC  
                         select e.EPIC);

  •  An interesting piece of code surrounds the line:
     if (!_dt.Rows.Contains(keysFind))
    The reason for this is that I have found occasionally that Google will throw the same price/date pair in twice - I don't know if this is designed to confound scrapers, but this small check before adding the record will save you much grief later.

 public void RefreshPriceData()  
     {  
       _db = new DataClassesDataContext();  
       _db.Connection.ConnectionString = Database.ConnectionString;  
       //Delete all prices first  
       Persistence.ExecuteDelete();  
       string path = Environment.CurrentDirectory;  
       path = string.Format("{0}\\", path);  
       if (!File.Exists(path + "GoogleEpicCodes.xml"))  
       {  
         FetchGoogleCIDs();  
       }  
       DataTable gId = GetGoogleCids(path + "GoogleEpicCodes.xml");  
       IQueryable<string> epics = (from e in _db.StatsLists  
                     where e.FTSE_Index == "FTSE100" || e.FTSE_Index == "FTSE250"  
                     orderby e.EPIC  
                     select e.EPIC);  
       DataTable _dt = Persistence.CreateDataTable();  
       // The EPIC/Date primary key is what makes the Rows.Contains duplicate check work.  
       var keys = new DataColumn[2];  
       keys[0] = _dt.Columns["EPIC"];  
       keys[1] = _dt.Columns["Date"];  
       _dt.PrimaryKey = keys;  
       int count = 0;  
       foreach (DataRow row in gId.Rows)  
       {  
         if (epics.Contains(row[0].ToString()))  
         {  
           Console.WriteLine("Starting on " + row[0]);  
           DataTable tempdt = DownLoadPricesFromGoogle(row[1].ToString());  
           if (tempdt == null)  
           {  
             continue;  
           }  
           var keysFind = new object[2];  
           foreach (DataRow dataRow in tempdt.Rows)  
           {  
             DataRow newRow = _dt.NewRow();  
             newRow["EPIC"] = row[0];  
             newRow["Date"] = 
                    DateTime.ParseExact(dataRow["Date"].ToString(), "d-MMM-yy", null);  
             newRow["Open"] = Convert.ToDecimal(dataRow["Open"].ToString());  
             newRow["Volume"] = Convert.ToDecimal(dataRow["Volume"].ToString());  
             decimal high = Convert.ToDecimal(dataRow["High"].ToString());  
             decimal low = Convert.ToDecimal(dataRow["Low"].ToString());  
             decimal close = Convert.ToDecimal(dataRow["Close"].ToString());  
             newRow["High"] = high;  
             newRow["Low"] = low;  
             newRow["Close"] = close;  
             keysFind[0] = row[0];  
             keysFind[1] = 
                      DateTime.ParseExact(dataRow["Date"].ToString(), "d-MMM-yy", null);  
             // Google occasionally repeats a price/date pair - skip exact duplicates.  
             if (!_dt.Rows.Contains(keysFind))  
             {  
               _dt.Rows.Add(newRow);  
             }  
           }  
           // Bulk-copy to SQL in batches of 30 stocks to keep memory usage down.  
           if (count % 30 == 0 && count != 0)  
           {  
             Persistence.CopyData(_dt);  
             _dt.Rows.Clear();  
           }  
           count++;  
         }  
       }  
       // Flush whatever is left after the final partial batch.  
       Persistence.CopyData(_dt);  
       _dt.Rows.Clear();  
     }  

Implementation of this code will allow you to easily 'scrape' the data you need. I run this every night prior to testing, each run netting some three-quarters of a million records within the space of about 15 minutes.

Next time we will discuss how to easily insert so many records into an SQL database.

Getting Your Data: The Prerequisites

Scraping Data

(A word of warning before we start. The legalities or otherwise of data 'scraping' must be highlighted. Most providers do not like you doing this - as I am an 'amateur' I feel I have a little leeway in this stricture, however you have been warned!)

As a budding algo trader, your initial problem is getting data to test. There are sources of commercially available end-of-day data, however these tend to be expensive and seldom in a form I like. I originally used to 'scrape' data from Yahoo; however, I found the data was erratic and often 'changed'. Additionally, the Yahoo data appeared to be patchy and I was never truly sure it was any good. Eventually I moved over to using Google's own end-of-day data. I have found this to be very good and consistent, as well as having suitable history.

There is a difficulty with Google data that is not apparent with Yahoo's - Google does not make it easy to automate the downloading of stock prices, as each URL is addressed using Google's own code for the stock, e.g. Vodafone's EPIC (exchange symbol) is 'VOD' whilst Google uses the code '834331'.

The code snippet below will fetch the Google id for each EPIC you supply - in this snippet, the EPICs are fetched into a <string> list using the function:

EpicList.GetEpicListFromYahoo();

This could just as easily be a Linq-to-Sql call to a database. You can quite easily get the list of EPICs from Yahoo to populate your table beforehand.

The FetchGoogleCIDs method eventually saves the resultant data to an XML file named 'GoogleEpicCodes.xml' - this was just done for ease of use.

 public static void FetchGoogleCIDs()  
     {  
       IList<string> epiclist = EpicList.GetEpicListFromYahoo();  
       var dt = new DataTable("Epics");  
       dt.Columns.Add("EPIC", Type.GetType("System.String"));  
       dt.Columns.Add("GoogleId", Type.GetType("System.String"));  
       dt.AcceptChanges();  
        foreach (string s in epiclist)  
        {  
          // Create a fresh WebClient per request; the finally block below disposes it each time.  
          var wc = new WebClient();  
          Console.WriteLine("Starting on " + s);  
         string thisEpic = s.Replace(".L", "");  
         string urlTemplate = 
                 "http://finance.google.co.uk/finance/historical?q=LON:" + thisEpic;  
         string history = "";  
         try  
         {  
           history = wc.DownloadString(urlTemplate);  
         }  
         catch (WebException)  
         {  
           throw new WebException("Problem fetching GoogleId for " + thisEpic);  
         }  
         finally  
         {  
           wc.Dispose();  
         }  
         
         string cid = RegexCID(history);  
         if (!string.IsNullOrEmpty(cid))  
         {  
           DataRow newRow = dt.NewRow();  
           newRow["Epic"] = s.Replace(".L", "");  
           newRow["GoogleId"] = cid;  
           dt.Rows.Add(newRow);  
         }  
         else  
         {  
           Debug.WriteLine("");  
         }  
       }  
       string path = Environment.CurrentDirectory;  
       path = string.Format("{0}\\", path);  
       dt.WriteXml(path + "GoogleEpicCodes.xml", false);  
     }  

One method missing here is the method call:

 string cid = RegexCID(history);  

This uses a regex match to extract the relevant Google id for the stock, as listed below:

 private static string RegexCID(string html)  
     {  
       string regex = @"<input.type=hidden.name=cid.value=.(?<cid>.*).>";  
       RegexOptions options = ((RegexOptions.IgnorePatternWhitespace 
                   | RegexOptions.Multiline)  
                   | RegexOptions.IgnoreCase);  
       var reg = new Regex(regex, options);  
       MatchCollection matches = reg.Matches(html);  
       if (matches.Count == 0)  
       {  
         return string.Empty;  
       }  
       GroupCollection groups = matches[0].Groups;  
       string match = groups["cid"].Value;  
       return match;  
     }  

By iterating through your list of EPICs and applying the above methods, you will quickly derive a list of the associated Google ids required for the main price scraping exercise.

Be warned that Yahoo's list of EPICs is horribly mangled, as Yahoo insists on attaching '.L' after each one - in the process, stocks such as 'BT.A' (British Telecom) become 'BT-A.L'. This can be a source of major problems later on. I actually get my list of EPICs from sharelockholmes.com who, luckily, do not mess them around.
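If you do have to live with the Yahoo-style symbols, a small clean-up helper along the lines below is all that is needed - this is only a sketch based on the 'BT-A.L' example above:

 // Convert a Yahoo-style symbol (e.g. "BT-A.L") back to a plain EPIC (e.g. "BT.A").
 private static string NormaliseYahooEpic(string yahooSymbol)
 {
     string epic = yahooSymbol.EndsWith(".L")
         ? yahooSymbol.Substring(0, yahooSymbol.Length - 2)   // strip the ".L" suffix
         : yahooSymbol;
     return epic.Replace('-', '.');                           // "BT-A" -> "BT.A"
 }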

My next post will concentrate more on the scraping engine and the utility classes surrounding it.

New to Algo Trading

This blog represents a way of giving back to all those countless kind and generous souls who bothered to write down what they were doing and wished to share it with others. Much of what you will read in this blog has come from others and has only been poorly regurgitated by me. I have learnt a vast array of information, both in terms of programming and quantitative finance, all because people are kind enough to share - this is my way of giving back and saying thanks.

Firstly, I have to say that I am a novice when it comes to investing. I am not a professional investor trading millions of dollars per day, nor am I an academic seeking to increase the knowledge-base devoted to investing - I am merely an average bloke with an interest in investing and, probably more importantly, an interest in programming.

I suspect it was my interest in programming that first drove me towards algorithmic investing. I do not come from an IT background - strangely I work in food sales and have never been employed in the IT world, although I have nearly always had a passion for programming. The combined love of programming and the desire to see how the stock market ticks has brought me to this point.

What am I seeking - my criteria
As many amateur investors know, especially on my side of the 'pond', it can be quite limiting and/or expensive to trade on a regular basis in the stock market. I am as yet too unsophisticated to trade futures or derivatives and have therefore sought to limit myself to equities. Additionally, due to the cost of data and the time required for intraday trading, I have limited myself to purely end-of-day trading with data scraped from various free sources. In turn, I have also sought to limit the cost of buying and selling shares by using cheap, execution-only brokers.

The development of an algo-based system for the amateur investor needs to encompass the following:
  • A cheap/free source of relatively clean end-of-day data
  • The means to easily collect this data
  • A cheap and easy means to place orders
  • A system that works - always a biggy!
  • Preferably, make everything as automated as possible.
In order to achieve all this you are going to need to be relatively 'hands-on' on the programming front. I know many algo traders use Matlab and the like to develop their strategies; however, I feel more at home using the likes of C# or Java. Additionally, these skills are useful as you are likely to need to build the supporting framework (price downloading, data conversion etc.) - you could buy these utilities, but they have always been out of my price range.

Through this blog I aim to show fellow amateur algo traders the tools I have used to build the 'framework' as well as explore some of the indicators I have found useful. In the next instalment I will outline some of the techniques I have developed for algo trading which I think may prove useful to others.
