Tuesday, November 17, 2009

Geocoding: Trials and Tribulations

For the past few months I have had a small project on the back burner to try and make geocoding easier.

The motivation can be traced back to the data collection I had to do for the data I needed for my research on minimising the amount of travelling that football supporters have to do over the Christmas holiday period (if you are interested in this see this link).

When I collected the original data I used greenflag.com. I could have used theaa.com or rac.co.uk. The problem with the latter two is that you had to type in the to/from postcode into a web form. At least with greenflag.com the postcodes were part of the URL so it was a simple case of generating the correct URLs (which I did with Excel), uploading to a web site and then clicking on each link.

But this was a long winded process in that I had to a) click on every to/from link (about 900) and then scroll the screen to collect the actual mileage. That takes a long time and as I have done this for seven seasons I thought it was about time to try and automate the process.

The obvious candidate was to use Google maps. When I started the sports scheduling research Google maps did not provide the facilities that I required. Things have moved on since then and Google maps now has an API which makes this type of automation a possibility.

So, over the last few months I have been investigating how to use Google maps, which does not come without its problems. For example, it is not that accurate when using UK postcodes, you have to learn the API, ideally you need to geocode UK postcodes to longitude and latitude etc.

I think that I have now resolved most of these problems and am just doing the last stages of test. So, more soon.

Sunday, October 25, 2009

Vehicle Routing Datasets

For reasons which will become apparent in the fullness of time I have recently been trying to track down all the vehicle routing datasets that are out there; specifically the Capacitated Vehicle Routing Problem (CVRP).

I know that the problem was introduced in 1959 (with small instances being made available in that paper1) and I think I have tracked down the majority of datasets since then, but if anybody out there has a list (or a reference to a list) it would be helpful.


1 Dantzig, G.B., Ramser, J.H., 1959. The truck dispatching problem. Management Science 6, 80–91

Saturday, October 3, 2009

Conjuring Trick: Revealed

Thank you for all the feedback to my last blog, both personally and through facebook.

The answer is that information is communicated via the four remaining cards. It is not obvious how it is done but take a look at the article by Colm Mulcahy, which explains all.

If you are interested in these sort of puzzles I fond this one at Gurmeet Singh's blog. It is an excellent blog and contains many other puzzles like this.

Conjuring Trick

I recently came across another blog (I'll point you to that blog at a later date), which has a number of puzzles on it. One, in particular caught my eye (probably because of a long time fascination with conjuring).

The trick works as follows.

A volunteer chooses five cards from a normal pack of cards. You take the five cards, look at them and return one of the cards to the volunteer. The other four cards are handed to another volunteer. You leave the room and a colleague enters. The second volunteer reads out the cards you gave them and your colleague names the card that the first volunteer is holding.

How is this done (post your ideas)?

If you want to know, click the "interesting" box at the bottom of this blog and, if there is enough interest, I'll point you to the original blog and the answer (unless somebody already knows, of course).

Friday, September 18, 2009

Claude Shannon, Edward Thorp, Roulette and Blackjack

I guess many people have heard of Claude Shannon (information theory/entropy).

Perhaps not as many people have also heard of Edward Thorpe? I have known of his work for many years as he was chiefly responsible for making the gambling industry change the rules of blackjack1,2. Not only did he develop something called basic strategy (the best strategy to minimise the house edge) but he also developed card counting (keeping track of certain cards to maximise the chances of winning). Due to Thorp's work (and also earlier work by Baldwin et al.3) casinos started using more than one deck and shuffling before the end of the shoe (the implement used to hold the cards) so as to minimise the effect of card counters (card counting is not actually illegal, but casinos don't like it and can ask you to leave).

So, I had known of the work of both Shannon and Thorp but I never realised that they had worked together on roulette. I found this gem in a book I am currently reading4. They worked together on a device to predict what segment of the wheel the ball would land in. I'm not sure of the outcome of their work yet, as I am still reading the book. But, the point is, I had never associated these two scientists as working together; which I found interesting.


References
1: Thorp E.O. (1961) A Favorable Strategy for Twenty-One. Proceedings of the National Academy of Sciences, 47:110-112

2: Thorp E.O. (1966) Beat the Dealer: A Winning Strategy for the Game of Twenty-One. New York: Random House (revised edition of 1962 book)

3: Baldwin R.R., Cantey W.E., Maisel H. and McDermott J.P. (1956) The Optimum Strategy in Blackjack. Journal of the American Statistical Association, 51:429-439

4: Poundstone W. (2005) Fortune's Formula: The Untold Story of the Scientific Betting System that Beat the Casinos and Well Street, Hill and Wang

Saturday, September 12, 2009

P vs NP

I have just read an excellent article in the latest issue of Communications of the ACM.

Fortnow L. (2009) The Status of the P versus NP Problem Communications of the ACM, 52(9):78-86 (doi:10.1145/1562164.1562186)

The article not only describes what the P=NP problem is (and it is one of the best non-technical descriptions I have seen) but also provides some of the research directions that might lead to a proof that P=NP (or, more likely (in my view - and many others), that P≠NP).

If you are any sort of computer scientist, then I would urge you to take a look at this article.

You can stay up to date with this area on the author's blog (http://blog.computationalcomplexity.org)

Friday, September 11, 2009

The 2009 IEEE Symposium on Computational Intelligence and Games: Report

I have just spent the last week at the 2009 IEEE Symposium on Computation Intelligence and Games in Milan (see last post).

It has been an excellent week both from a scientific point of view and from a networking point of view (I have even added a few new facebook friends as a result of the symposium).

There really is some fantastic work going on around the world. The plenaries and tutorials showed just some of this. The work being done by people such as Ken Stanley, Michael Mateas, Yngvi Björnsson, David Stern (Microsoft Research), Stefano Lecchi (Milestone) and James Vaccaro is very impressive and the products they produce have the commercial sector (some of which work in this area anyway) really taking notice.

Of course, there is other excellent work being done, by a great many other people, and to list them all would mean listing most of the presentations given at the conference. So, although I have highlighted just a few presentations, it does not detract from all the other work being done in this area. Indeed, the newly established IEEE Transactions journal in this area is a testament to how bouyant the area is.

Another highlight at CIG'09 was the competitions. This year they really seem to have come of age. In previous years (at both CIG and other conference) the competitions were very popular, and well received, but it just has a slightly different feel this time around.

At this year's conference there were four competitions:
I will try and blog about each one in the future.

We hope that the presentations from the symposium will be available to view soon. They were all recorded but there are some copyright issues to resolve.

The symposium will be run again in 2010. The venue has just about been decided but it needs to be rubber stamped so I cannot say on a public forum where it will be.

Sunday, September 6, 2009

The 2009 IEEE Symposium on Computational Intelligence and Games

I have just arrived in Milan for the 2009 CIG (Computational Intelligence and Games) conference. This was a conference that Simon Lucas and I started in 2005. Simon is now editor-in-chief of the IEEE Transactions on Computational Intelligence and Artificial Intelligence in Games. The 2005 conference, I believe, was partly responsible for paving the way to enabling this journal to be established. I am fortunate enough to serve as an Associate Editor for the journal.

As I said above, the first conference (actually it's a symposium, as the correct title is The IEEE Symposium on Computational Intelligence and Games) was held in 2005 in Essex (UK). The original plan was to hold the symposium every two years, but it was so successful that we decided to hold it again in 2006. This time, Sushil Louis and I chaired it. It was held at the University of Reno in Nevada.

In 2007 it was held in Hawaii, as part of the then newly formed Computational Intelligence Symposium Series of conferences (CISS). Simon, again chaired this symposium, along with Sung-Bae Cho and Alan Blair. Actually, at the 2007 CISS, I chaired the associated scheduling conference (Computational Intelligence in Scheduling (CISched).

In 2008, CIG was held in Perth, Australia, chaired by my good friends Luigi Barone and Phil Hingston.

The sympsium has now moved to Milan (chaired by Pier Luca Lanzi). It has certainly done the rounds (Essex, Reno, Hawaii, Perth and Milan) and, having been to all of them, I know from first hand experience that it is going from strength to strength.

Looking at this years program it promises to be a very interesting week. If you have an interest in computational intelligence or games, take a look at http://www.ieee-cig.org/.

Friday, August 28, 2009

Random Number Generation

We all know that computer generated random numbers are not really random at all, but just pseudo-random. And I know that is a lot of discussion about how best to generate random numbers.

To be honest, I have not taken much notice of this in all the programming that I have done but today I found out that it is important.

Without going into the (boring) details, I have been generating a random number that is either 1 or 2 - but I generate quite a few very quickly.

For anybody that is interested the code is something like

r = rand() % n +1 [where n is the range you are interested in]

The problem is, I was returning the same number every time. Only very occasionally did I get the odd difference.

If I increased the range (say generating numbers between 1 and 20), this made no difference (I always got the same number).

However, if I put in a delay between each call to rand(), this "solved" the problem, but this is not a good solution.

Doing some reading around the subject, I think it is to do with the low/high order bits when generating the random numbers.

But that does not help me generate a decent distribution when making very frequent calls to rand().

I seem to recall that there is a C++ class called RNG (Random Number Generator) available and I am sure that Numerical Recipes in C will have something to say on the issue, but I need to look into these.

Wednesday, August 26, 2009

Football Prediction: A decision to be made

Today I have been working on my research that is investigating if it is possible to predict the outcome of football matches. The measure I will eventually use, to see if the predictions can be considered successful, will include if it can make money at the bookmakers, if it it more successful than other tipsters etc.

One of the functions I have in my system is to be able to generate the league table for a given date. That is, taking into account the fixtures played to date, generate the league table for any point in the season.

I believe that my function is working correctly and today I was carrying out some tests to see if the league tables I generated were:
  1. Correct at the end of the season. That is, taking into account every match played, is my input data correct and does my algorithm process that data correctly.
  2. Does my algorithm, given a date, generate the correct table for that point in the season.
I initially thought that point 2 would be very time consuming to check but I found a very useful web site. http://www.statto.com is not only a very useful web site (for all sorts of reasons) but one of the facilities it offers is to generate a league table for a particular point in the season.

When doing my checks (and there are still a lot more to do), I have found some problems where my generated tables are not correct. This is almost certainly down to my inputting the results incorrectly, so I need to check all those.

However, my checks also highlighted another problem. Actually, I knew this was something that I needed to address but I had not really thought it through.

The problem arises when teams having points taken away. I knew that this happened and I had yet to include it in my system so I was expecting the tables not to match exactly.

However, I had assumed that the points were deducted at the start of the season but this does not seem to be the case. It appears that the points can be deducted at any time in the season.
This is not too much of an issue. It just makes the programming more complex than I had hoped.

The real issue is what do I do when a team has had points deducted?

Let me give you an example. A team has won 3 games and drawn 2. That means that they have received 11 points (you get 3 points for a win and 1 point or a draw). But, if they have had 10 points deducted then they will only have 1 point. This obviously affects their league position. If I am using the league position as one of the contributory factors in my predictive model, is this fair - or should I ignore the points deduction for the purposes of prediction?
On the other hand, their league position, with the points deduction, may affect the way they play, and could be a factor in the prediction.

I'm not quite sure what I am going to do yet.