PredictGuru

Monday, February 20, 2006

 

Automatic news ranking

Ranking is one of the most important problems in machine learning these days. All the major search engines, including Google, use ranking to sort web pages by their relevance to the entered query, and machine learning has been shown to give satisfactory results here. (No idea whether Google also uses machine learning to compute PageRank.) The idea is simple: a user manually ranks a set of documents (movies, web pages, anything). Using this as a reference, we train a function which, given a new document, identifies its correct rank within a set of documents, based on features such as the movie name, actors and director, or the contents if it is a web page.
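To make the idea concrete, here is a minimal sketch of a pairwise learning-to-rank approach in Python. The feature vectors and preference pairs are made up for illustration; a real system would extract them from actual documents and user judgements.

  # A minimal pairwise ranking sketch. Each document is assumed to be a small
  # numeric feature vector; training data is a list of preference pairs saying
  # "document a should rank above document b".
  import numpy as np

  def train_ranker(pairs, n_features, epochs=50, lr=0.1):
      """Perceptron-style update: nudge w whenever a preferred document
      does not score higher than the one it should beat."""
      w = np.zeros(n_features)
      for _ in range(epochs):
          for preferred, other in pairs:
              if np.dot(w, preferred) <= np.dot(w, other):
                  w += lr * (np.asarray(preferred) - np.asarray(other))
      return w

  def rank(w, documents):
      """Sort documents by their learned score, best first."""
      return sorted(documents, key=lambda d: np.dot(w, d), reverse=True)

  # Toy example with hand-made three-feature vectors.
  pairs = [([3.0, 1.0, 0.0], [1.0, 0.0, 2.0]),
           ([2.0, 2.0, 1.0], [0.0, 1.0, 3.0])]
  w = train_ranker(pairs, n_features=3)
  print(rank(w, [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0], [2.0, 2.0, 1.0]]))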
Ranking can also be used in many other areas. Here is a list:
  1. Email Ranking: Let's assume you are a person with lots of contacts, or you are very famous and get hundreds of legitimate emails every day. Some mails require your immediate attention and some are not so important. Machine learning can be used here to order the mails for you based on your past behaviour, so all the important mails end up at the top and the not-so-important ones at the bottom. If you do not have a spam filter, all spam will naturally fall to the bottom with this approach. An implementation would be a plugin for Thunderbird which puts any new email in its rightful place.
  2. News Ranking: Consider news sites like Slashdot and Digg. The problem with Slashdot is that an editor has to read through thousands of submissions, find the good stories, edit them and post them, and this takes a few hours. The problem is similar with Digg: it takes a few hours for a story to get enough diggs to push it to the front page, and a story may be stale by the time it makes it there. Now consider an algorithm that learns from the previous stories that have made the front page and automatically decides whether a new story is front-page material. If the algorithm is good, we get near real-time appearance on the front page. A simple implementation would be to get the RSS feed of upcoming stories from Digg, rank each story, and post it on your site if the rank is good enough (see the sketch after this list).
  3. Blog ranking on Blogspot, photo ranking on Flickr... if you think of something, leave a comment.
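For the news-ranking item, a rough sketch of the Digg/RSS idea might look like this. The feed URL, the cutoff and the word weights are placeholders; in practice the weights would be learned from stories that did and did not reach the front page, and feedparser is just one convenient way to read the feed.

  # Hypothetical front-page filter for an RSS feed of upcoming stories.
  import feedparser  # third-party RSS/Atom parser

  FEED_URL = "http://example.com/upcoming.rss"  # placeholder feed URL
  CUTOFF = 2.0                                  # placeholder score threshold

  # Toy "learned" weights: word -> contribution to front-page-ness.
  learned_weights = {"linux": 1.5, "google": 1.2, "apple": 1.0, "boring": -2.0}

  def score(entry):
      """Sum word weights over the story title and summary."""
      text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
      return sum(w for word, w in learned_weights.items() if word in text)

  def front_page_candidates(url):
      feed = feedparser.parse(url)
      return [e.get("title", "") for e in feed.entries if score(e) > CUTOFF]

  print(front_page_candidates(FEED_URL))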

Thursday, February 16, 2006

 

Of Flying Cars

There have been many attempts to build small cars that can fly from short runways. The latest is an attempt by a group at MIT, which may or may not be successful. The problem may be our limited imagination: no human alive can conjure up a design in his or her head and predict whether the machine is going to fly. To get around the problem I suggest we use a variant of the genetic algorithm together with a CFD analysis tool to obtain a design. Here is an algorithm (a rough code sketch follows the steps).
  1. Convert an airplane's outer body shape into a vector. This can be done by breaking the airplane into millions of small volumes, or however you want. Ideally, any imaginable vector should define a structure.
  2. Create a base set (population) of vectors from available plane designs and probably birds.
  3. Write code to convert the vectors back into a shape that can be simulated in a CFD (Computational Fluid Dynamics) solver like CFX or OpenFOAM.
  4. Define a survivability criterion for each design. This can be based on criteria like the lift provided by the design, volume, surface area and stability. These data can be obtained by analysing the output of a CFD solver.
  5. Now, in a loop until a good design is found:
    1. Select two designs from the population. Decide a crossover point, break the vectors of the two designs at that point, and rejoin each half with the corresponding half from the other design (crossover).
    2. Perform mutations with some probability.
    3. Determine whether both new vectors are valid designs. (A plane with holes in its wings... whatever you can think of that is obviously not going to work.)
    4. Run CFD analysis on the designs and determine the survivability.
    5. If a design is good (survivability > cutoff), add it to the population and repeat the loop; else discard the design and repeat the loop.
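Here is a bare-bones sketch of that loop in Python. The vector encoding is just a list of numbers, and evaluate_design() is a dummy standing in for a real CFD solve; the vector length, cutoff and validity check are all assumptions.

  # Toy genetic-algorithm loop for evolving design vectors.
  import random

  VECTOR_LEN = 32  # length of the design encoding (assumption)
  CUTOFF = 0.7     # survivability threshold (assumption)

  def random_design():
      return [random.random() for _ in range(VECTOR_LEN)]

  def is_valid(design):
      """Cheap sanity check before paying for a CFD run (placeholder)."""
      return all(0.0 <= x <= 1.0 for x in design)

  def evaluate_design(design):
      """Stand-in for a CFD solve that would fold lift, volume, surface area
      and stability into one survivability number; here it is a dummy score."""
      return sum(design) / len(design)

  def crossover(a, b):
      point = random.randint(1, VECTOR_LEN - 1)
      return a[:point] + b[point:], b[:point] + a[point:]

  def mutate(design, rate=0.05):
      return [random.random() if random.random() < rate else x for x in design]

  population = [random_design() for _ in range(20)]  # step 2: base population

  for generation in range(100):             # step 5: the main loop
      a, b = random.sample(population, 2)   # 5.1 pick two parents
      for child in crossover(a, b):         # 5.1 crossover
          child = mutate(child)             # 5.2 mutation
          if is_valid(child):               # 5.3 reject obvious junk
              fitness = evaluate_design(child)  # 5.4 "CFD" evaluation
              if fitness > CUTOFF:          # 5.5 keep good designs
                  population.append(child)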
Now, a CFD analysis of even a simple car takes a few days to complete, so running CFD on the millions of possible designs would take ages. We should borrow ideas from machine learning to minimize repeated effort (one such idea is sketched below), and large-scale distributed processing along the lines of SETI@home could probably be used as well. The coding for any of the above steps will be a project in itself. Maybe the idea itself is not practical, but it's fun to think about.
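One machine-learning shortcut, sketched here, is to remember the CFD results already computed and use a simple nearest-neighbour prediction to skip designs that look hopeless before paying for another solve. The distance metric and the skip threshold are assumptions.

  # Reuse past (design vector, survivability) results to avoid needless CFD runs.
  import math

  past_results = []  # list of (design_vector, survivability) from real CFD runs

  def distance(a, b):
      return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

  def predicted_survivability(design, k=3):
      """Average the survivability of the k most similar designs seen so far."""
      if len(past_results) < k:
          return None  # not enough history, run CFD anyway
      nearest = sorted(past_results, key=lambda r: distance(design, r[0]))[:k]
      return sum(score for _, score in nearest) / k

  def worth_a_cfd_run(design, skip_below=0.4):
      guess = predicted_survivability(design)
      return guess is None or guess >= skip_below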



