PredictGuru

Thursday, May 19, 2005

Clustered File System

The success of desktop search tools will probably signal an end to the way data is organized on disk. After installing Google Desktop Search, I found that accessing my files through desktop search was far easier than through Windows Explorer. The Explorer-like interface is probably on its way out. I guess we will see file systems in the future that automatically store data in a more logical way based on its contents, freeing users from managing their data in separate folders.
If I were to implement such a file system (in Linux), I would provide a single folder where the user can copy all their files. Whenever a new file is copied, it is indexed and clustered with related files. A user request for a listing of files (ls) containing some words (say, simulated annealing) would be written as "ls simulated annealing", and the fs should list all files with the search words in them. It could also list files containing related words (of course with a lower priority). It would also be nice if the command "acroread simulated annealing" opened all PDF files containing the words simulated annealing.
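
The indexing part can be sketched with a small inverted index. This is only a toy under my own assumptions: the class name ContentIndex, the in-memory dictionary and the sample files are all made up for illustration, nothing here hooks into a real file system, and it matches files containing every query word rather than the exact phrase.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ContentIndex:
    """Toy in-memory index mapping each word to the files containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_file(self, name, text):
        # Would be called whenever a file is copied into the single folder.
        for word in tokenize(text):
            self.postings[word].add(name)

    def ls(self, query):
        # Return the files that contain every word of the query.
        words = tokenize(query)
        if not words:
            return []
        matches = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(matches)


# Hypothetical files, standing in for whatever the user copied in.
index = ContentIndex()
index.add_file("notes.txt", "Simulated annealing is a search heuristic.")
index.add_file("thesis.pdf", "We compare simulated annealing with hill climbing.")
index.add_file("todo.txt", "Buy milk.")
print(index.ls("simulated annealing"))
```

Listing related files with a lower priority would need some similarity measure on top of this (clustering the documents, for instance), which the sketch leaves out.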

posted by Karthik K # 5:50 AM 0 Comments

Monday, May 09, 2005

p8

Some students from MIT recently generated a paper using a context-free grammar, which led to some controversy. However, the idea of machines generating a document based on some rules is pretty old. One interesting idea is http://www.eblong.com/zarf/markov/ where a Markov chain is learnt from the text of a big collection and new text is generated probabilistically. Some more details (and some more fun) can be found here: http://www.cs.princeton.edu/courses/archive/spr05/cos126/assignments/markov.html
It would be interesting to train the Markov chains over sentences too (this requires that each sentence be given a state based on the set of words in the sentence). We could also try filtering the generated sentences through a grammar and look at the result. The first part can be implemented in, I guess, half an hour. It will probably take a bit more thinking to get the second and third parts working. Guess I can steal the implementation of the CFG from somewhere.
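
A word-level chain of the kind the two links describe might look roughly like this. It is a minimal sketch under my own assumptions: the function names, the order of 2 and the tiny corpus are made up, and the sentence-level states and the grammar filter are not attempted here.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Random-walk the chain; repeated successors are proportionally more likely."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        successors = chain.get(state)
        if not successors:  # dead end: this state was only seen at the end of the text
            break
        output.append(rng.choice(successors))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Toy corpus; a real run would use the text of a big collection.
corpus = "the cat sat on the mat and the cat ate the fish on the mat"
chain = build_chain(corpus, order=2)
print(generate(chain, length=12, seed=1))
```

Storing successors in a plain list, duplicates included, is the lazy way to get the probabilities right: picking uniformly from the list is the same as picking in proportion to the counts.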

posted by Karthik K # 9:41 PM 0 Comments

Friday, May 06, 2005

p7

I was asked to think about some applications of handwriting recognition on a Simputer. Here are some of the areas where I, as a student, would love to use it.

1. It would be nice if I could take down all my notes on a Simputer and transfer them to my computer every day. I would also prefer to read my notes on a Simputer rather than on a computer or in my own handwriting (it is always better to let silicon handle the deciphering of my handwriting than to worry about deciphering it myself at exam time). Cut and paste features would also be valuable when writing notes.

2. Once we have recognition features, it needs to be studied whether we can somehow use them to compress the handwritten data, i.e. compression based on strokes.

3. If the Simputer is targeted at, say, a shopkeeper, the things he would be interested in would be very different from mine, I guess. He would probably want a way to track the customers who have bought things from him. Customers often pay the shopkeeper monthly instead of at the time of purchase. Tracking the purchases and calculating the exact amount at payment time would be much easier with a Simputer: the shopkeeper can just note down the items and the cost (as if on a piece of paper), and the Simputer will find the right customer in the database and debit the amount against that customer.

4. Signature verification in banks. In most banks, money is still withdrawn through self-drawn cheques, and Simputers can be used there. A Simputer connected to the internet can also be used for secure transfer of money with the help of signature capture and transmission.

5. Simputers can also be used for quick sharing of designs over the internet. One can be used like the back of an envelope between people on the phone in different parts of the world.

posted by Karthik K # 10:39 PM 0 Comments

Tuesday, May 03, 2005

p6

Determining the right foods for the right person.
Every person has different food needs, and getting the balance right is difficult. Can pattern recognition be used to determine what is right, considering that person's BP, sugar, calories required, carbohydrates and so on? The trick would be not just to provide a chart giving the amount of each permissible food, but to take in the details of what the person has eaten so far and suggest a suitable recipe for the next meal. Such software is already available, but it is not effective and comes with a built-in set of food items. A better system would crawl the net to pick up new recipes, calculate the amount of each nutrient in the food and suggest a bunch of them appropriately.
The difficulty in implementing this is the enormous need for domain knowledge.

posted by Karthik K # 8:05 AM 0 Comments

Monday, May 02, 2005

p5

15 billion dollars is up for grabs. That is the amount that banks and other financial institutions are supposed to spend this year on systems that can track dirty money, fraud and payments to terrorist organizations. These systems are mostly built on the concept of anomaly detection. Anomaly detection was probably first introduced for network intrusion detection, to pick out newer attacks. Traditional systems were rule based, but they can be easily defeated by any hacker. So the solution was to model what is normal (i.e. normal usage) probabilistically, and whenever a new entry appears in the logs that is not similar to any existing entry, classify it as an anomaly for human experts to review. The similarity measures and the modelling techniques for such things are already available in the pattern recognition world.
Now this idea is also applied to banking transactions. As and when an unusual transaction is seen (such as Rs 110 crore suddenly transferred to the account of a person who doesn't even have a PAN number, or a person with a salary of 2 lakh pa suddenly getting a cheque for 30 lakhs), an anomaly can be signalled and the tax (possibly corrupt) officials can be allowed to handle it in a suitable way.
Guess it would be fun to implement it and monitor it, especially if you have contacts with the above-mentioned tax officials.
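
The "model what is normal, flag what is not" idea can be sketched in a few lines. Everything specific here is my own assumption rather than how any bank does it: a single Gaussian over log-amounts per account, a threshold of 3 standard deviations, and a made-up transaction history.

```python
import math
import statistics


def fit_normal_profile(amounts):
    """Model an account's usual behaviour as a Gaussian over log-amounts."""
    logs = [math.log(a) for a in amounts]
    # stdev needs at least two values, and the history must not be constant,
    # otherwise the division in is_anomaly fails.
    return statistics.mean(logs), statistics.stdev(logs)


def is_anomaly(amount, profile, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from normal."""
    mean, std = profile
    z = abs(math.log(amount) - mean) / std
    return z > threshold


# Hypothetical history: monthly credits for someone earning about 2 lakh pa.
history = [16000, 16500, 17000, 16800, 16200, 17500, 16900, 16400]
profile = fit_normal_profile(history)

print(is_anomaly(17200, profile))    # an ordinary month
print(is_anomaly(3000000, profile))  # a sudden cheque for 30 lakhs
```

A real system would look at far more than the amount (counterparty, timing, frequency), but the shape is the same: fit a model on past behaviour and hand whatever looks improbable to a human.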

posted by Karthik K # 8:28 PM 1 Comments

p4

NadiShasthra is a form of astrology where you are supposed to give either the hand impression or the thumb impression (I am not sure which; some even take the kundali). The astrologer matches it against a huge database of palm leaves, comes up with a matching palm leaf and reads it out, and somehow most of the things he says about your past are correct (although most of the things he says about your future don't happen. Let's assume they do; it is more fun that way). OK, so how does he manage to generate such information about you in such a short time?
Before that, where did those palm leaves come from? One explanation is that they were assignment submissions from disciples of a particular guru who was an expert in these fields. Apparently the students were asked to predict the future births / incarnations of some of the people of their time. (And it looks like they did a pretty good job.)
So how does pattern matching come in here:
One simple idea (courtesy Bhargav) is to get the database from the pundit and put it online. People submit their hand prints in JPG format, and we use pattern matching techniques to find the appropriate palm leaf and spell out their future. (All for a fee, of course.)

The other, more interesting idea is that if there is a real correlation between the human hand and human destiny, it would be more logical to use a computer and mathematics to study it rather than just the human brain. All one would have to do to prove or disprove astrology is get the hand prints of a million people, cluster them based on their hand prints and see whether criteria like the area of work, ... (basically whatever an astrologer tries to predict) fall in the same clusters more often than random. A simple chi-square test would be enough.
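
The chi-square part is easy to sketch once the clustering is done. The counts below are invented purely for illustration (two hand-print clusters, three areas of work), and the function name is my own; the statistic is the standard Pearson chi-square for a contingency table.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if cluster and occupation were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Made-up counts: rows are hand-print clusters, columns are areas of work.
table = [
    [30, 35, 35],
    [32, 33, 35],
]
stat, dof = chi_square(table)

# 5.991 is the 5% critical value of the chi-square distribution with 2 degrees of freedom.
CRITICAL_5_PERCENT_DOF_2 = 5.991
print(stat, dof, stat > CRITICAL_5_PERCENT_DOF_2)
```

If the statistic stays below the critical value, as it does for these invented counts, the clusters tell us nothing about the area of work beyond chance. With a million real hand prints the table would simply have more rows and columns, and the critical value would change with the degrees of freedom.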

Also, if the hand print does not throw up anything, we can look at the thumb print, stars, planets, whatever.
But I guess doing so would also lead to massive job losses in India. So in the larger interest of the country, I guess I will stay away from this area.

posted by Karthik K # 8:32 AM 0 Comments