Wednesday, September 10, 2008

Flying bridges for Bangalore traffic?



Can bridges suspended from helium, hydrogen, or hot-air balloons provide an easy alternative to flyovers in Bangalore?

Some of the advantages:
1. No ground space is needed for pillars, except where vehicles climb onto the bridge.

2. Bridges can be moved based on the traffic: just move all the balloons (airships) to a new position.

Challenges:

1. Amount of helium needed: at about 1000 liters of helium to lift 1 kg, and with a limit of 8 tonnes per balloon, each balloon needs about 8 million liters of helium. That would make each balloon a cylinder about 100 m long and 10 m in diameter.
2. Handling monsoon winds.
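
A quick check of the helium arithmetic, as a rough sketch only. It uses the rule of thumb from the post (about 1000 liters of helium per kg lifted) and ignores the weight of the balloon envelope itself; the function names are made up for illustration.

```python
import math

LITERS_PER_KG = 1000   # rule of thumb from the post: ~1000 L of helium lifts 1 kg
LOAD_KG = 8_000        # 8 tonnes per balloon, as assumed in the post


def helium_liters(load_kg, liters_per_kg=LITERS_PER_KG):
    """Helium volume (liters) needed to lift the given load."""
    return load_kg * liters_per_kg


def cylinder_volume_m3(length_m, diameter_m):
    """Volume of a cylinder in cubic meters."""
    return math.pi * (diameter_m / 2) ** 2 * length_m


needed_m3 = helium_liters(LOAD_KG) / 1000      # 1 cubic meter = 1000 liters
balloon_m3 = cylinder_volume_m3(100, 10)       # 100 m long, 10 m diameter

print(f"helium needed : {needed_m3:.0f} m^3")
print(f"balloon volume: {balloon_m3:.0f} m^3")
```

The two numbers come out close to each other, so a 100 m by 10 m cylinder is roughly the right size for an 8 tonne load under this rule of thumb.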

Saturday, November 25, 2006

Product review tracking from motiflabs

A prototype of the Product Buzz tool is online.

Monday, February 20, 2006

Automatic news ranking

Ranking is one of the most important problems in machine learning these days. All search engines, including Google, use ranking to sort web pages by relevance to the entered query. It has been shown that machine learning gives satisfactory results for ranking. (No idea whether Google also uses machine learning to get the page rank.) The idea is simple: a user manually ranks a set of documents (movies, web pages, anything). Using this as a reference, we train a function which, given a new document, identifies its correct rank within a set of documents, based on features such as the movie name, actors, and director, or the contents if it is a web page.
Ranking can also be used in many other areas. Here is a list:
  1. Email ranking: Let's assume you are a person with lots of contacts, or you are very famous and get hundreds of legitimate emails every day. Some mails require your immediate attention and some are not so important. Machine learning can be used here to order the mails for you based on your past behaviour, so all the important mails will be at the top and the not so important ones at the bottom. If you do not have a spam filter, all spam will naturally fall to the bottom with this approach. An implementation would be a plugin for Thunderbird which puts any new email in its rightful place.
  2. News ranking: Consider news sites like Slashdot and Digg. The problem with Slashdot is that an editor has to read through thousands of postings, find good stories, edit them, and post them, which takes a few hours. The problem is similar with Digg: it takes a few hours for a story to get enough diggs to push it to the front page. A story may be stale by the time it makes it to the front page. Now consider an algorithm that learns from the previous stories that have made it to the front page and automatically decides whether a new story is front-page material. If the algorithm is good, stories appear on the front page in near real time. A simple implementation would be to get the RSS feed of stories from Digg, rank each story, and post it on your site if its rank is good enough.
  3. Blog ranking on Blogspot, photo ranking on Flickr... If you think of something else, leave a comment.
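
The news-ranking idea can be sketched in a few lines. This is only a toy, assuming a perceptron over the words in a story title as a stand-in for a real ranking learner; the story titles and labels are invented for illustration, and fetching the RSS feed is left out.

```python
from collections import defaultdict


def tokenize(text):
    return text.lower().split()


def score(weights, title):
    """Sum of learned word weights; higher means more front-page-like."""
    return sum(weights.get(word, 0.0) for word in tokenize(title))


def train(stories, epochs=10):
    """stories: list of (title, made_front_page) pairs. Returns word weights."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for title, label in stories:
            predicted = score(weights, title) > 0
            if predicted != label:
                # Perceptron update: push the words toward the correct side.
                step = 1.0 if label else -1.0
                for word in tokenize(title):
                    weights[word] += step
    return weights


def rank(weights, titles):
    """Order new stories from most to least front-page-like."""
    return sorted(titles, key=lambda t: score(weights, t), reverse=True)


# Invented training data: past stories and whether they made the front page.
past = [
    ("new linux kernel released", True),
    ("open source browser gets faster", True),
    ("new cat photos", False),
    ("buy cheap watches", False),
]
weights = train(past)
ranked = rank(weights, ["cat photos from my trip",
                        "linux browser released",
                        "cheap watches"])
for title in ranked:
    print(f"{score(weights, title):+.1f}  {title}")
```

The same scoring function would serve for email ranking, with "opened quickly" or "replied to" as the label in place of "made the front page".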

Thursday, February 16, 2006

Of Flying Cars

There have been so many attempts to build small cars that can fly from short runways. The latest is an attempt by some MIT guys, which may or may not be successful. The problem may be our limited imagination: no human alive can conjure up a design in his/her head and predict whether the machine is going to fly. To get around the problem, I suggest we use a variant of a genetic algorithm together with a CFD analysis tool to obtain a design. Here is an algorithm.
  1. Convert an airplane's outer body shape into a vector. This can be done by breaking the airplane into millions of small volumes, or however you want. Ideally, any vector imaginable should define a structure.
  2. Create a base set (population) of vectors from available plane designs and probably birds.
  3. Write code to convert the vectors back into a shape that can be simulated in a CFD (Computational Fluid Dynamics) solver like CFX or OpenFOAM.
  4. Define a survivability criterion for each design. This can be based on measures like the lift provided by the design, volume, surface area, and stability. These data can be obtained by analysing the output of a CFD solver.
  5. Now in a loop, until a good design is found:
    1. Select two designs from the population. Decide on a crossover point, break the vectors of the two designs at that point, and rejoin each half to the other half from the other design (crossover).
    2. Perform mutations with some probability.
    3. Determine whether both new vectors are valid designs. (Reject a plane with holes in its wings, or whatever else you can think of that is obviously not going to work.)
    4. Run CFD analysis on the designs and determine the survivability.
    5. If the designs are good (survivability > cutoff), add them to the population and repeat the loop. Else discard them and repeat the loop.
Now a CFD analysis of a simple car takes a few days to complete, so performing CFD on the millions of possible designs will take ages. We should think of ideas from machine learning to minimize repeated effort. Large-scale distributed processing like SETI@home can probably be considered too. Also, coding any of the above steps will be a project in itself. Maybe the idea itself is not practical, but it's fun to think about.
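
The loop in step 5 can be sketched with a toy stand-in for the expensive parts. Everything here is an assumption made for illustration: designs are short lists of numbers in [0, 1], `survivability` is a cheap made-up function in place of a CFD run, the cutoff value is arbitrary, and the loop runs for a fixed number of steps instead of "until a good design is found". Each child is also checked on its own, which is a small simplification of the steps above.

```python
import random

VECTOR_LEN = 16   # a real design vector would be far longer


def survivability(design):
    """Toy stand-in for a CFD run: best when every value is 0.5."""
    return -sum((x - 0.5) ** 2 for x in design)


def is_valid(design):
    """Toy validity check: every component must stay inside [0, 1]."""
    return all(0.0 <= x <= 1.0 for x in design)


def crossover(a, b, rng):
    """Break both vectors at one point and swap the tails."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(design, rng, rate=0.1, scale=0.1):
    """Perturb each component with a small probability."""
    return [x + rng.gauss(0, scale) if rng.random() < rate else x
            for x in design]


def evolve(population, cutoff, steps, seed=0):
    rng = random.Random(seed)
    population = [list(d) for d in population]
    for _ in range(steps):
        parent_a, parent_b = rng.sample(population, 2)
        for child in crossover(parent_a, parent_b, rng):
            child = mutate(child, rng)
            # Keep a child only if it is valid and survives the cutoff.
            if is_valid(child) and survivability(child) > cutoff:
                population.append(child)
    return population


init_rng = random.Random(1)
initial = [[init_rng.random() for _ in range(VECTOR_LEN)] for _ in range(10)]
final = evolve(initial, cutoff=-1.0, steps=200)
best = max(final, key=survivability)
print(len(final), "designs, best survivability", round(survivability(best), 3))
```

In a real version, `survivability` is where nearly all the time would go, which is why caching results or learning a cheap approximation of the solver matters so much.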


Sunday, January 15, 2006

Natural Language Processing

Natural Language Processing has turned out to be a very difficult challenge. One of the reasons is the way language has evolved. An advantage the human brain has is the availability of thousands of classifiers (read: neurons) making decisions. We probably understand a sentence written in natural language only when the outputs of these agree. Some of the neurons are possibly checking whether a sentence makes sense in its currently understood form (context information). However, these facilities are currently unavailable to a computer.
Maybe we can use the procedures of statistical machine learning here. Here are the details of a system I wish to suggest.
  1. As everyone understands, the English language (possibly all languages apart from Sanskrit) has some disadvantages: a word can mean different things in different contexts (for example, "table" in "table of contents" versus a dining table); multiple words can represent the same meaning (synonyms); a set of words can represent a single concept ("rear view mirror"); and so on. One way to simplify the mess is to use an intermediate language. Some of the properties of this intermediate language should be:
    1. One word, one meaning.
    2. One meaning, one word. (All synonyms are represented by one word.)
    3. Every set of words that means something is condensed into one word. A word in the intermediate language is a concept and not necessarily an English word, but it makes sense to keep it as close to English as possible to reduce the effort. (After all, we will be using English most of the time.)
  2. Such an intermediate language can now be converted into any form of representation. One such form is an "association graph of concepts". Here every noun (or noun with adjectives) forms a node, and verbs form the edges. For example, "Ram is a good cat" translates to a node for "Ram", a node for "good cat", and a node for "cat". There is an "is" edge from "Ram" to "good cat" and another "is" edge from "good cat" to "cat" (I guess this one can be added by default). A base graph can be built in advance by reading, say, millions of documents, to capture the base knowledge.
  3. Once such a graph is built, it can be used for word disambiguation, as it would have a strong set of links between related concepts.
  4. The graph can form a common base for all human languages, with an "intermediate to particular language" translator built on top of it for each language. Machine translation can be made to work this way.
  5. When the computer starts reading a new document, the concepts of the document will be close to a particular part of the graph. This can possibly be used for text summarization and sentiment analysis.
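
The "association graph of concepts" from point 2 might look something like this. It is a minimal sketch under stated assumptions: facts arrive already split into (subject, verb, object), concepts are plain lowercase strings, and the head noun of a phrase like "good cat" is assumed to be its last word. The class and method names are made up.

```python
class ConceptGraph:
    def __init__(self):
        # concept -> set of (verb, concept) edges leaving that concept
        self.edges = {}

    def add_fact(self, subject, verb, obj):
        self.edges.setdefault(subject, set()).add((verb, obj))
        self.edges.setdefault(obj, set())
        # Default edge: a noun with adjectives "is" its head noun,
        # e.g. "good cat" is "cat". Assumes the head noun comes last.
        head = obj.split()[-1]
        if head != obj:
            self.edges[obj].add(("is", head))
            self.edges.setdefault(head, set())

    def is_a(self, concept, target):
        """Follow "is" edges to see whether concept leads to target."""
        seen, stack = set(), [concept]
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(o for v, o in self.edges.get(node, ()) if v == "is")
        return False


graph = ConceptGraph()
graph.add_fact("ram", "is", "good cat")
print(sorted(graph.edges))
print(graph.is_a("ram", "cat"))
```

Disambiguation and summarization would then be questions about which part of the graph a document's concepts land in, which this sketch does not attempt.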
Anyway, this is still at the idea stage. If we ever make any progress on such a system, I will post it here.

Friday, September 23, 2005

Statistical Machine Learning

We spent the whole of last month discussing statistical bounds for function classes. The equations kept getting more horrible, which makes me wonder if we are trying to fix the wrong problem. As of now SVMs can do much better generalization than the human brain. (After all, we humans tend to have superstitions, and we can look at a superstition as a failure to generalize.) So where else can we improve?
What would be nice to have is a system that tries to explain the features logically. Maybe that is where inductive programming comes in, but I don't have much knowledge in that area.

Sunday, August 28, 2005

CAPTCHA

Anyone who has tried to create an account on any of the portals these days will have seen CAPTCHAs. These are the wiggly words that Google and Yahoo use to keep out the bots. More generally, a CAPTCHA is anything that does that trick, not just a wiggly piece of text. http://www.captcha.net/
It would be interesting to try and break these.

Restart

One of the ideas currently doing the rounds is research blogs. http://hunch.net/
is a good one for machine learning. They are also planning to make it a place where people put up links to papers they like and think are worthy, compared to the millions of others that get published just for the record.
