PredictGuru

Saturday, October 23, 2010

Page Rank kind of ranking for test cricket teams

The official ICC ranking for Test teams is quite complicated (http://en.wikipedia.org/wiki/ICC_Test_Championship#Test_championship_calculations). It seems a bit confusing, with all the arbitrary point calculations (40 points, 90 points, etc.). And although it tries to give more points to a weaker team that beats a stronger team, the effect is not uniform. The scoring scheme can be made more elegant with a simple PageRank-style algorithm.

The idea is this: if India beats Australia 2-1 in a series, India's score gets a contribution of (score-of-australia)*2/3 and Australia's score gets (score-of-india)*1/3. Suppose that over a given time period India beat Australia 2-0, beat South Africa 3-1, and lost to Bangladesh 1-0. India's score would be:
india_score = k(india_score + 2/2*aus_score + 3/4*sa_score + 0*bang_score)
Similarly for Bangladesh:
bang_score = k(bang_score + 1*india_score + other series)

where k is a constant.

This leads to a linear system of equations that can be solved by eigenvalue decomposition. Each eigenvector is a solution, which in this case is a possible set of team scores. We take the eigenvector that corresponds to the largest eigenvalue.
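A minimal sketch of the scheme above, written in Python rather than the Clojure used for the actual results. It swaps the full eigendecomposition for power iteration, which repeatedly applies the update rule and renormalises until the scores settle on the dominant eigenvector. The function names and the three-series data set (taken from the example in the text) are illustrative assumptions, not the real input.

```python
def series_weights(series, teams):
    """Build w where w[a][b] is the share of matches team a won against team b."""
    w = {a: {b: 0.0 for b in teams} for a in teams}
    for team_a, team_b, wins_a, wins_b in series:
        total = wins_a + wins_b  # assumption: only decided matches are counted
        if total == 0:
            continue
        w[team_a][team_b] += wins_a / total
        w[team_b][team_a] += wins_b / total
    return w


def rank(series, iterations=200):
    """Power iteration on score = k * (score + sum of weighted opponent scores)."""
    teams = sorted({team for s in series for team in s[:2]})
    w = series_weights(series, teams)
    scores = {t: 1.0 / len(teams) for t in teams}
    for _ in range(iterations):
        new = {a: scores[a] + sum(w[a][b] * scores[b] for b in teams) for a in teams}
        norm = sum(new.values())  # dividing by the sum plays the role of the constant k
        scores = {t: v / norm for t, v in new.items()}
    return scores


# Example series from the text: (winner-side team, other team, wins, losses).
SERIES = [
    ("India", "Australia", 2, 0),
    ("India", "South Africa", 3, 1),
    ("Bangladesh", "India", 1, 0),
]

scores = rank(SERIES)
for team, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{team:13s} {score:.3f}")
```

With only these three series, Bangladesh comes out on top because its single result is a win over the team that beat everyone else, which shows how sensitive the scheme is when a team has played very few series.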

The data for all the Test series played until now is available from Cricinfo at http://stats.cricinfo.com/ci/content/records/335431.html. I used a slightly modified version of the above scoring scheme: the weight for a series was (number of wins)/(total matches + 1), so that a comprehensive 3-0 win (3/(3+1) = 0.75) scores more than a 1-0 win (1/(1+1) = 0.5). The scores for the last 3 years, that is 2007 and beyond, are:

"India" 0.5994906420666286
"Australia" 0.48114889162515917
"South Africa" 0.46906326930865216
"England" 0.27645475105538375
"Sri Lanka" 0.29340243047144315
"West Indies" 0.11112461214397185
"Pakistan" 0.09815656282323282
"New Zealand" 0.05997455303303149
"Bangladesh" 0.03152476327152151

Scored this way, Australia still rank higher than they do in the ICC ranking, where they are currently number 5. I also tried plotting the scores for the last 130 years of Test cricket. Here are the results.

As expected, India is rising now whereas Australia is falling, and the West Indies dominated for a decade after 1975. The surprising thing is that England started both World Wars winning against Australia, and immediately after each war Australia was on top.

Here is the Clojure code for doing all this. jblas is needed for the eigendecomposition and JFreeChart for plotting.

posted by Karthik K # 4:26 AM 1 Comments

Monday, April 26, 2010

Data says no Warming of Bangalore in the last 37 years.

It is a constant complaint you hear from people these days: "Bangalore has grown warmer over the years," or "this year is the warmest in years." And the blame inevitably falls on global warming. But how much of this is real? What is the magnitude of the temperature increase? Was this year warmer than the one before? I downloaded data from 1973 to 2009 to see what happened, and I have posted it here. The data can be downloaded from . The data looks fairly accurate, but I have not tested it against any other sources.

The graph on the left shows the temperature data for the last 37 years, averaged over 3 months (the x axis is the number of days since 1 March 1973). You can see the temperature rise and fall over each year. It may not be entirely clear, but the graph shows that there is no large (> 1 degree) increase in the temperature of Bangalore. Also, contrary to what the media tend to assume, Bangalore does not seem to have a large variance in temperature from year to year.

We can make this clearer by taking a longer-term average of the temperature of Bangalore.

When averaged over 5000 days, the average temperature appears to have increased only slightly, from 23.57 C to 23.8 C, which is hardly noticeable by humans.
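The two averaging windows can be sketched as a trailing moving average. The example below is only an illustration: the daily series is synthetic (a flat 23.7 C baseline plus a seasonal sine wave, both made-up values), not the downloaded Bangalore data, and the function name is an assumption.

```python
import math


def moving_average(values, window):
    """Trailing mean over `window` consecutive samples, using a running sum."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one day
        averages.append(running / window)
    return averages


# Synthetic stand-in for the real data: 37 years of daily temperatures
# with a yearly cycle and no long-term trend (assumed values).
DAYS = 37 * 365
daily = [23.7 + 4.0 * math.sin(2 * math.pi * d / 365) for d in range(DAYS)]

seasonal = moving_average(daily, 90)     # roughly the 3-month average
long_term = moving_average(daily, 5000)  # the long-term average

print(f"90-day average swings by {max(seasonal) - min(seasonal):.2f} C")
print(f"5000-day average swings by {max(long_term) - min(long_term):.2f} C")
```

The short window keeps the yearly rise and fall visible, while the 5000-day window flattens it almost completely, so any remaining drift in real data would reflect a longer-term change.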

In effect, the overall increase in Bangalore's temperature is very small and may not be the proof of global warming that we keep hearing about.

posted by Karthik K # 9:22 AM 1 Comments

Saturday, March 13, 2010

Mobile meter for auto rickshaws

It would be nice to have an app for location-aware mobile phones that you can use while traveling in autos or taxis in Bengalooru or elsewhere. The app should track the distance and waiting time and calculate the exact amount for the trip. With mobile phones connected to the net, the per-kilometer fares can be downloaded from a common server that maintains the rates for lots of places. Maybe there is an app already out there, but I am too lazy to check.
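A rough sketch of how such a meter could compute the fare from a list of GPS fixes. Everything here is an assumption for illustration: the rate table (which the app would download from the server), the field names, and the rule that a leg slower than 2 km/h counts as waiting. Distances between fixes use the haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_fare(fixes, rates, waiting_speed_kmh=2.0):
    """fixes: list of (timestamp_seconds, lat, lon) in time order.

    Returns (distance_km, waiting_minutes, fare).
    """
    distance_km = 0.0
    waiting_min = 0.0
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        leg_km = haversine_km(lat0, lon0, lat1, lon1)
        hours = (t1 - t0) / 3600.0
        if hours > 0 and leg_km / hours < waiting_speed_kmh:
            waiting_min += hours * 60  # treat slow legs as waiting time
        else:
            distance_km += leg_km
    extra_km = max(0.0, distance_km - rates["base_km"])
    fare = (rates["base_fare"]
            + extra_km * rates["per_km"]
            + waiting_min * rates["per_waiting_min"])
    return distance_km, waiting_min, fare


# Hypothetical rate table; real rates would come from the common server.
RATES = {"base_fare": 20.0, "base_km": 2.0, "per_km": 10.0, "per_waiting_min": 0.5}

# Five minutes standing still at one spot.
parked = [(0, 12.97, 77.59), (300, 12.97, 77.59)]
print(trip_fare(parked, RATES))
```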

posted by Karthik K # 9:27 PM 0 Comments

Wednesday, September 10, 2008

Flying bridges for Bangalore traffic?

Can bridges suspended by helium / hydrogen / hot air filled balloons provide an easy alternative to flyovers in Bangalore?

Some of the advantages:
1. No need for any space on the ground to build pillars. Pillars are needed only where the vehicles climb onto the bridge.

2. Bridges can be moved based on the traffic: just move all the balloons (airships) to a new position.

Challenges:

1. Amount of helium needed: at about 1000 litres of helium to lift 1 kg, and with a limit of 8 tonnes of weight per balloon, each balloon needs to be filled with 8 million litres of helium. That would make each balloon a cylinder about 100 m long and 10 m in diameter.
2. Handling monsoon winds.
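The helium estimate can be checked with a few lines of arithmetic. The figure of 1000 litres per kilogram and the 8 tonne limit come from the text; the variable and function names are only for illustration.

```python
import math

LITRES_PER_KG = 1000   # helium needed to lift 1 kg (figure from the text)
PAYLOAD_KG = 8000      # 8 tonnes per balloon (figure from the text)

helium_litres = LITRES_PER_KG * PAYLOAD_KG
helium_m3 = helium_litres / 1000  # 1 cubic metre = 1000 litres


def cylinder_volume_m3(length_m, diameter_m):
    """Volume of a cylinder from its length and diameter."""
    return math.pi * (diameter_m / 2) ** 2 * length_m


print(f"helium needed: {helium_litres} litres = {helium_m3:.0f} m^3")
print(f"100 m x 10 m cylinder: {cylinder_volume_m3(100, 10):.0f} m^3")
```

A 100 m by 10 m cylinder holds a little under 8000 cubic metres, so the quoted balloon size is roughly consistent with the 8 million litre requirement.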

posted by Karthik K # 8:45 AM 1 Comments

Saturday, November 25, 2006

Product review tracking from motiflabs

A prototype of the Product Buzz tool is online.

posted by Karthik K # 7:34 AM 0 Comments

Monday, February 20, 2006

Automatic news ranking

Ranking is one of the most important problems in machine learning these days. All the search engines, including Google, use ranking to sort web pages by relevance to the entered query. It has been shown that using machine learning for ranking provides satisfactory results. (I have no idea whether Google also uses machine learning to compute PageRank.) The idea is simple: a user manually ranks a set of documents (movies, web pages, anything). Using this as a reference, we train a function which, given a new document, identifies its correct rank within a set of documents (based on movie name, actors, director, or the contents if it is a web page).
Ranking can also be used in many other areas. Here is a list:

Email Ranking: Let's assume you are a person with lots of contacts, or you are very famous and get hundreds of legitimate emails every day. Some mails require your immediate attention and some are not so important. Machine learning can be used here to order the mails for you based on your past behaviour, so all the important mails will be at the top and the not-so-important mails at the bottom. If you do not have a spam filter, all spam will naturally fall to the bottom with this approach. An implementation would be a plugin for Thunderbird which puts any new email in its rightful place.
News Ranking: Consider news sites like Slashdot and Digg. The problem with Slashdot is that an editor has to read through thousands of postings, find good stories, edit them and post them, and this takes a few hours. The problem is similar in the case of Digg: it takes a few hours for a story to get enough diggs to push it to the front page. A story may be stale by the time it makes it to the front page. Now consider an algorithm that learns from the previous stories that have made it to the front page and automatically decides if a new story is front-page material. If the algorithm is good, we get a near-real-time appearance on the front page. A simple implementation would be to get the RSS feeds of the stories from Digg, rank them, and post a story on your site if its rank is good enough.
Others: blog ranking on Blogspot, photo ranking on Flickr. If you think of something else, leave a comment.
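As one possible reading of the news-ranking idea, here is a tiny sketch that learns from past story titles and scores a new one. It uses a bag-of-words naive Bayes model with add-one smoothing, which is just one simple choice among many learning methods. The class name, the training titles and their labels are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class FrontPageModel:
    """Naive Bayes over title words: front page (True) vs. not (False)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, title, made_front_page):
        self.word_counts[made_front_page].update(tokenize(title))
        self.doc_counts[made_front_page] += 1

    def score(self, title):
        """Log-odds that the title is front-page material (higher is better)."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        totals = {label: sum(self.word_counts[label].values()) for label in (True, False)}
        log_odds = math.log((self.doc_counts[True] + 1) / (self.doc_counts[False] + 1))
        for word in tokenize(title):
            # Add-one smoothing so unseen words do not zero out the estimate.
            p_front = (self.word_counts[True][word] + 1) / (totals[True] + len(vocab) + 1)
            p_other = (self.word_counts[False][word] + 1) / (totals[False] + len(vocab) + 1)
            log_odds += math.log(p_front / p_other)
        return log_odds


# Made-up training data standing in for past stories and their outcome.
model = FrontPageModel()
model.train("linux kernel release adds new scheduler", True)
model.train("open source compiler gets faster", True)
model.train("my cat photos from the weekend", False)
model.train("buy cheap watches online", False)

for title in ["new linux compiler release", "cheap cat watches"]:
    print(f"{model.score(title):+.2f}  {title}")
```

A story from the RSS feed would be posted only if its score clears some threshold, which would have to be tuned on real data.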

posted by Karthik K # 6:47 AM 1 Comments

Thursday, February 16, 2006

Of Flying Cars

There have been many attempts to build small cars that can fly from short runways. The latest is an attempt by some MIT guys, which may or may not be successful. The problem may be our limited imagination: no human alive can conjure up a design in their head and predict whether the machine is going to fly. To get around the problem, I suggest we use a variant of a genetic algorithm together with a CFD analysis tool to obtain a design. Here is an algorithm.

1. Convert an airplane's outer body shape into a vector. This can be done by breaking the airplane into millions of small volumes, or however you want. Ideally, any vector imaginable should define a structure.
2. Create a base set (population) of vectors from available plane designs, and probably birds.
3. Write code to convert the vectors back into a shape that can be simulated in a CFD (Computational Fluid Dynamics) solver like CFX or OpenFOAM.
4. Define a survivability criterion for each design. This can be based on criteria like the lift provided by the design, volume, surface area and stability. These data can be obtained by analysing the output of a CFD solver.
5. Now, in a loop, until a good design is found:

   1. Select two designs from the population. Decide on a crossover point, break the vectors of the two designs at that point, and rejoin each half with the other half from the other design (crossover).
   2. Perform mutations with some probability.
   3. Determine whether both new vectors are valid designs (no planes with holes in the wings, or whatever else you can think of that is obviously not going to work).
   4. Run CFD analysis on the designs and determine their survivability.
   5. If a design is good (survivability > cutoff), add it to the population and repeat the loop. Otherwise discard the design and repeat the loop.
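The loop above can be sketched in a few lines if the CFD step is replaced by a cheap stand-in. In this illustration a design is a short list of numbers, `survivability` is a made-up score (closeness to an arbitrary target vector) instead of a real CFD run, and `is_valid` is a trivial range check. All names and constants are assumptions.

```python
import random

TARGET = [0.2, 0.8, 0.5, 0.3, 0.9, 0.1]  # arbitrary; stands in for "flies well"


def survivability(design):
    """Stand-in for the CFD analysis: higher is better, 0 is a perfect match."""
    return -sum((x - t) ** 2 for x, t in zip(design, TARGET))


def is_valid(design):
    """Stand-in validity check: every component must stay within [0, 1]."""
    return all(0.0 <= x <= 1.0 for x in design)


def crossover(a, b, rng):
    """Break both vectors at one point and swap the tails."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(design, rng, rate=0.1, scale=0.2):
    """Perturb each component with probability `rate`."""
    return [x + rng.gauss(0, scale) if rng.random() < rate else x for x in design]


def evolve(population, cutoff, good_enough, max_iterations=2000, seed=0):
    rng = random.Random(seed)
    population = [list(d) for d in population]
    best = max(population, key=survivability)
    for _ in range(max_iterations):
        if survivability(best) >= good_enough:
            break  # a good design has been found
        parent_a, parent_b = rng.sample(population, 2)
        for child in crossover(parent_a, parent_b, rng):
            child = mutate(child, rng)
            if not is_valid(child):
                continue  # discard obviously broken designs
            score = survivability(child)
            if score > cutoff:
                population.append(child)  # design survives
                if score > survivability(best):
                    best = child
    return best, population


init_rng = random.Random(1)
initial = [[init_rng.random() for _ in TARGET] for _ in range(10)]
best, population = evolve(initial, cutoff=-0.5, good_enough=-0.01)
print(f"population grew from {len(initial)} to {len(population)} designs")
print(f"best survivability: {survivability(best):.4f}")
```

With a real solver, each call to `survivability` would be a multi-day CFD run, which is why the number of evaluations matters far more than anything else in the loop.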

Now, a CFD analysis of a simple car takes a few days to complete, so performing CFD on the millions of possible designs will take ages. We should therefore look to ideas from machine learning to minimize repeated effort. Large-scale distributed processing, like SETI, can probably also be considered. Also, coding any of the above steps will be a project in itself. Maybe the idea itself is not practical, but it's fun to think about.

posted by Karthik K # 8:47 AM 1 Comments