Research Reflections: Sports Scheduling

Showing posts with label Sports Scheduling. Show all posts

Tuesday, August 24, 2010

PATAT 2010: Multi-objective Sports (Football) Scheduling

At the recent PATAT (8th International Conference on the Practice and Theory of Automated Timetabling) conference I was fortunate enough to be invited to give a plenary presentation.

My talk focussed on sports scheduling. Indeed, the title was "Scheduling Football (Soccer) Fixtures: Progress Made to Date and Future Challenges". I focussed on the conflicting objectives when trying to minimise travel distances, whilst also trying to reduce pair clashes (which can be considered as local derbies for the sake of this discussion).

This is a classic case of a multi-objective problem where minimising one objective causes the other to increase and vice versa. It is not (usually) possible to minimise both objectives, instead you are looking for a trade off. These are plotted on a pareto front where a user would then decide which trade off solution is the best.

In the plenary talk, I showed that it was possible to reduce both the distance and the pair clashes such that (sometimes) the distance did not significantly increase. This is a potentially useful result as it means that supporters do not have to travel any further (statistically) and the policing costs are reduced as they do not have to police so many local derbies.

I should say that this result is only work in progress at the moment in that it has not been verified by the football authorites or the police, but I would hope that it would be of interest to them.

In the same talk, I also discussed how I collected the data for this work (which essentially is the various distances between football clubs). This involved using Google maps and Multimap APIs. I'll talk about this in the next blog. I'll also provide a link to the paper (as I don't have it to hand at the moment). But, if you are interested the reference is:

Kendall G., McCollum B., Cruz F. and McMullan P. (2010) Scheduling English Football Fixtures: Consideration of Two Conflicting Objectives. In proceedings of the 8th International Conference on the Practice and Theory of Automated Timetabling (PATAT 2010), 11-13 August 2010, Queen’s University, Belfast, UK, pp 1-15

Tuesday, November 17, 2009

Geocoding: Trials and Tribulations

For the past few months I have had a small project on the back burner to try and make geocoding easier.

The motivation can be traced back to the data collection I had to do for the data I needed for my research on minimising the amount of travelling that football supporters have to do over the Christmas holiday period (if you are interested in this see this link).

When I collected the original data I used greenflag.com. I could have used theaa.com or rac.co.uk. The problem with the latter two is that you had to type in the to/from postcode into a web form. At least with greenflag.com the postcodes were part of the URL so it was a simple case of generating the correct URLs (which I did with Excel), uploading to a web site and then clicking on each link.

But this was a long winded process in that I had to a) click on every to/from link (about 900) and then scroll the screen to collect the actual mileage. That takes a long time and as I have done this for seven seasons I thought it was about time to try and automate the process.

The obvious candidate was to use Google maps. When I started the sports scheduling research Google maps did not provide the facilities that I required. Things have moved on since then and Google maps now has an API which makes this type of automation a possibility.

So, over the last few months I have been investigating how to use Google maps, which does not come without its problems. For example, it is not that accurate when using UK postcodes, you have to learn the API, ideally you need to geocode UK postcodes to longitude and latitude etc.

I think that I have now resolved most of these problems and am just doing the last stages of test. So, more soon.

Monday, August 24, 2009

Football Fixture Scheduling: Are all clashes equal?

In a paper published in JORS:

Kendall G. (2008) Scheduling English Football Fixtures Over Holiday Periods. Journal of the Operational Research Society, 59(6), pages 743-755 (doi:10.1057/palgrave.jors.2602382)

I investigated if it was possible to produce superior fixtures for the Christmas/New Year Period with respect to minimising the distance that is traveled by supporters on two particular days.

One of the issues that the underlying model had to capture was that certain teams could not play at home on the same day. For example:

Manchester United and Manchester City
Liverpool, Everton and Tranmere
Chelsea and Fulham
Colchester, Ipswich and Norwich
etc.

Teams that are paired in this way are called pairs (even though there might be more than two teams involved). A problem arises as it is not possible to eliminate all the pair clashes. That is, some paired teams have to play at home on the same day.

In the JORS paper, all the pair clashes were treated equally (e.g. Manchester United and Manchester City playing at home on the same day is not considered any more, or any less, important than Liverpool and Tranmere playing at home on the same day).

I am not sure if any given pair clash should be considered more (or less) important than any other, but I suspect so.

One of the things I plan to do is analyse the past few seasons fixtures and gauge if certain pair clashes are allowed more than others. Then I will use this evidence to weight the clashes during the process of searching for a good set of fixtures.

Wednesday, August 19, 2009

Football Fixture Scheduling: Another Project

Over the past nine months or so I have been working with a local company who provides an online service to schedule various types of sporting events. They contacted me after seeing my contact details on wikipedia (which was a nice way to be contacted).
The company need to produce double round robin tournaments but they have some factors which are not present in the major English leagues. For example

Teams might share pitches so those two teams cannot play at home on the same day.
There are a number of different leagues and the teams in different leagues might share pitches (see above) but the leagues play in different timeslots. That is, pitches are available in certain timeslots and teams can play in certain timeslots and these are not consistent across divisions.
We want to provide a good spread of games such that repeat fixtures (team i vs team j and team j vs team i) are as far apart as possible.

There followed a series of meetings (at 7:30am for reasons best known to ourselves), which discussed the problem so that we both fully understand the task at hand.

At the moment, I am trying to develop a constructive heuristic and then carry out some additional optimisation with simulated annealing.

It has proven to be a very interesting problem and there is still a lot that I want to do, but I have just about got a constructive heuristic that works (in that it produces a feasible solution).

My focus over the coming week or so (as I have some time to work on the problem) is the following.

Investigate if I can produce a more efficient constructive heuristic. At the moment, it takes many iterations to get a feasible solution and I'd like to do it much more quickly.
Run the local search optimisation (in this case simulated annealing) which will take the feasible solution from the constructive heusristic and try to improve on it. In addition, it will also attempt to spread out the repeat fixtures.
Once I have something that seems worthwhile I want to document the algorithm. This is for the purposes of a potential paper (the holy grail in our world!) and also to discuss the approach with the company.

So, the next couple of weeks should be very interesting (well, at least for me!).

Sunday, August 9, 2009

MISTA Conference: Almost There

The MISTA conference is almost upon us.

It was an early start this morning (3am) in order to get to Dublin on the 06:35 flight out of East Midlands Airport. We were actually in the hotel by 09:00 and, thankfully, they had rooms ready so it was not too bad.

We spent the day getting things ready, as far as we could. The real work will start tomorrow and it looks like being a long day. I think we'll open the registration desk at 07:30 and we'll return from the Guinness Storehouse at around midnight.

In between that, we have a Plenary Talk by Moshe Dror ("‘Strong’-‘Weak’ Precedence in Scheduling: Extended Order Implications"), followed by 36 papers, split into nine sessions (the full program can be downloaded from here).

For me (and this is a personal viewpoint; not talking as the conference chair) the highlight is the Sports Scheduling session as this is a particular interest of mine, as you'll see from my previous blog postings. The papers in this session are:

Mathematical Modeling for Maximising Gate Receipt Problem, Abdul-Hamid N.H., Kendall G. and Sagir M.
A Heuristic for Minimizing Weighted Carry-Over Effects in Round Robin Tournaments, Guedes A.C.B. and Ribeiro C.C.
Soccer Schedules in Europe: An Overview, Goossens D.R. and Spieksma F.C.R.
Round-Robin Sports Scheduling from a Graph Colouring Perspective: A Case Study in Rugby Union Scheduling, Lewis R. and Thompson J.

... but there are many other excellent papers also being presented throughout the day and your preferences will depend largely on your research interests.

Monday, July 20, 2009

Predicting the Results of Football Matches

I have recently become interested in trying to predict the results of football matches.

The interest grew from wondering what else I could use the data for, that I had collected for generating football fixtures (see JORS paper). The data included the travel distances between all the teams in each division. I also maintained the fixtures that were actually played over the Christmas/New Year period so that I could compare the fixtures I generated against those that were actually played. Needless to say, it took a long time to collect all this data.

As I had gone to all the time and trouble of collecting the data I want to maximise its usage, so I began to look for other uses that I could put it to, and prediction seemed an obvious challenge.

With this in mind, for most of last season I collected additional data that I think might be important in predicting football matches. For example, I have been updating all the scores as each fixture was played. I have also been keeping a record of the odds that bookmakers were offering. Just collecting this data was a large data collection exercise in itself and I certainly did not get the odds on every fixture, but I have enough to be going on with.

I'm not totally sure what I am going to do with this data yet but I know if I don't collect some of it as and when it is available, it becomes almost impossible (or at least a lot more difficult) to collect.

I have started programming some support functions. For example, given a date and a set of results I can generate the relevant league table for that point in the season.

My ultimate goal is to develop a prediction model and test out how good it is on the 2008-2009 season and, if I find a good model I will try and predict the fixtures for the 2009-2010 season before the matches are actually played.

The new season starts quite soon (kick off is 8th August 2009, but there is one match scheduled on the 7th August 2009).

If I am going to have ever chance of testing the prediction model over the coming season, I need to get programming!

Thursday, June 18, 2009

Scheduling Football Fixtures

The fixtures for the English football season have just been published (Wednesday 17th June 2009), ready for the kick off in August. Paul Fletcher's (another blogger) excellent article describes some of the complexities that have to be considered when generating the fixtures.

My research has focused on scheduling the fixtures over the Christmas/New Year period. On Boxing Day and New Years Day every team has to play. If a team plays at home on Boxing Day, they must play away on New Years Day (and vice versa), but they are not allowed to play a reverse fixture (for example, if Chelsea play Liverpool on Boxing Day, Liverpool cannot play Chelsea on New Years Day). There are also other factors which have to be considered (such as the number of fixtures that can take place in London etc.)

One of the most important aspects for the fixtures on Boxing Day and New Years Day fixture is to reduce the amount of traveling for the supporters. If you look at these fixtures for (at least) the past seven seasons you will see that these two days have the lowest travel distances when compared to other dates when all the teams play.

This was the focus of a paper I published in Journal of the Operational Research Society (you can access the paper here, or from the journal itself). The paper shows that it is possible to produce a set of fixtures which involves less travel than the fixtures that were actually used.

There are a couple of caveats.

I have tried to make sure that I have captured all the aspects of the real world problem, but I cannot be totally sure. For example, I consider all pair clashes (see the paper if you are interested in what a pair clash is) as being equal. Perhaps, some should be avoided more than others?
There are some seasons (typically World Cup years) where, over the Christmas and New Year period, every team has to play four matches and they must be sequenced Home, Away, Home, Away (or Away, Home, Away, Home). This makes the scheduling problem a lot more difficult, especially when you still attempting to minimise the travel distances. I still need to address this aspect of the problem.

The paper referred to above only considers four seasons. I have other work in progress which not only uses seven seasons of data but also uses more sophisticated search techniques. Not only am I able to significantly speed up the search (from about 30 hours to a matter of minutes) but I also managed to improve on the results reported in the JORS paper. I'll report more if/when the paper is published.

In the meantime, my attention is turning towards the 2009-2010 season. One of the biggest tasks each year is to collect the data representing distances between each football club. For the past seven years I have used http://www.greenflag.com. It is a time consuming task as I have to manually collect the data for each pair of teams. This year I am looking at exploiting google maps. They have an API (Application Programming Interface) which I hope will enable me to automate this data collection task. I will report on the success (or failure) of this approach soon.

Research Reflections