Interesting Problems in Computer Vision

Mining Photo Archives
John Resig used TinEye’s MatchEngine to do computer vision on a photo archive, and he has detailed his current work online. The challenge in this project will be to use John’s data to create various vision prototypes that can be compared with MatchEngine.
Data Set(s): Photo Archive Link.

Mitosis Detection in Breast Cancer Histological Images
This contest at ICPR 2012 outlines a good set of data with histological images for breast cancer. The challenge is to use computer vision to detect mitosis.
Data Set(s): Histological images link.

ImageNet Challenges
ImageNet is a really large set of images; the widely used challenge subset has about 1000 categories. A number of computer vision challenges are possible on this general-purpose dataset.
Data Set(s): Image Net link.
Related Data Set: CIFAR is like ImageNet, but it is much smaller. There is a 10-category version (CIFAR-10) and a 100-category version (CIFAR-100).

Cancer Detection with Computer Vision
There are tons of medical images in the Cancer Imaging Archive, a collection of datasets. A number of challenge problems are possible on this archive.
Data Set(s): Cancer Imaging Archive link.

Optimal Frequency Calculation

Different algorithms and data structures have been used to solve rate-limiting. In this post I want to focus on the use of probabilistic methods, and the count-min sketch in particular.


Probabilistic data structures are used to save space or time. In exchange, they approximate the result instead of computing it exactly.

There is extensive research on solving the membership problem using data structures from hash tables to balanced trees and B-tree indices, and these form the backbone of systems from OSs, compilers, databases and beyond. Many of these data structures have been in widespread use for forty or more years.

Count-min consumes a stream of events and produces an approximate frequency for each event. It can be queried for the frequency of a certain event, and it will return an estimate that never falls below the true frequency and, with a certain probability, exceeds it by no more than a bounded error.
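To make the description above concrete, here is a minimal sketch of a count-min structure in Python. The class name `CountMinSketch`, the `epsilon`/`delta` parameters and the use of salted BLAKE2b hashes are my own assumptions for illustration, not taken from any particular library. The sizing rule (width = ceil(e / epsilon), depth = ceil(ln(1 / delta))) is the usual textbook one.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter: `depth` rows of `width` counters each."""

    def __init__(self, epsilon=0.01, delta=0.01):
        # Textbook sizing: the overestimate is at most epsilon * total
        # with probability at least 1 - delta.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.tables = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _hash(self, x):
        # Yield one column index per row, using the row number as a salt
        # (an illustrative choice of hash family).
        data = repr(x).encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.width

    def add(self, x, count=1):
        # Increment one counter in every row.
        self.total += count
        for table, i in zip(self.tables, self._hash(x)):
            table[i] += count

    def query(self, x):
        # The smallest counter is the one least inflated by collisions.
        return min(table[i] for table, i in zip(self.tables, self._hash(x)))


sketch = CountMinSketch()
for event in ["login", "login", "search", "login"]:
    sketch.add(event)
print(sketch.query("login"))  # at least 3, the true count
```

Collisions can only add to a counter, so taking the minimum across rows gives an estimate that is never below the true count. Memory use depends only on `epsilon` and `delta`, not on how many distinct events appear in the stream.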


Application areas: compressed sensing, networking, databases, NLP, security (cryptography, finding primes), computational geometry (finding vertices), and machine learning.


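A simple Python implementation from GitHub. The query method looks like this (the snippet as I have it is cut off after `self.tables,`, so the `self._hash(x)` argument is an assumption):

def query(self, x):
    """
    Return an estimate of the number of times `x` has occurred.
    The returned value never underestimates the real value.
    """
    # Assumption: self._hash(x) yields one column index per table.
    return min(table[i] for table, i in zip(self.tables, self._hash(x)))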
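To tie this back to rate-limiting, here is a rough sketch of a fixed-window limiter that keeps its per-client counts in a count-min table. Everything in it (the class name `SketchRateLimiter`, the `limit`, `window_seconds`, `width`, `depth` and `clock` parameters, and the fixed-window policy) is an assumption for illustration, not a description of any production system.

```python
import hashlib
import time


class SketchRateLimiter:
    """Fixed-window limiter whose per-key counts live in a count-min table."""

    def __init__(self, limit, window_seconds, width=2048, depth=4,
                 clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.width = width
        self.depth = depth
        self.clock = clock  # injectable so the limiter can be tested
        self.window_start = clock()
        self.tables = [[0] * width for _ in range(depth)]

    def _hash(self, key):
        # One column index per row, salted by the row number.
        data = key.encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.width

    def allow(self, key):
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new window starts: forget all counts at once.
            self.tables = [[0] * self.width for _ in range(self.depth)]
            self.window_start = now
        cells = list(self._hash(key))
        estimate = min(table[i] for table, i in zip(self.tables, cells))
        if estimate >= self.limit:
            return False
        for table, i in zip(self.tables, cells):
            table[i] += 1
        return True


limiter = SketchRateLimiter(limit=3, window_seconds=60)
print([limiter.allow("client-42") for _ in range(4)])
```

Because the estimate never falls below the true count, this limiter can reject a client a little early when keys collide, but it never lets a client exceed `limit` within a window. Memory stays fixed at `width * depth` counters however many clients show up.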


  • Count-Min C Implementation –
  • Count-Min Go Implementation –
  • Algorithms to live by –
  • C-Implementation –∼muthu/ massdal-code-index.html

What is Deep Machine Learning?

Deep machine learning, often referred to as deep learning by the media, is the mimicking of the human brain’s neocortex by an AI engine. This sub-branch of machine learning is very nascent, largely deriving from the neural network research of the 1980s and some representational breakthroughs in 2006. Deep learning offers some solutions to problems such as reading handwriting or finding objects in images using machines.

Several software toolkits such as OpenCV, MLlib, TensorFlow and Theano provide a set of neural network representations and algorithms for deep machine learning. Middleware like Keras makes it easier to port models between toolkits.

Several industry problems are currently being tackled with deep learning. The top problem areas are image search and machine vision (automotive/aviation).

At this stage, we are very interested in image search and in applying automated learning to find data schemas for health care, finance and other domains, and we are exploring use cases to begin testing.



  • Brian Hur


Introducing a Karkhana project –

What is WiseVoter?

WiseVoter is a free demopedia where people can collaboratively keep track of politicians. Most of the data associated with a politician’s profile can be edited via a form (look for the “edit with form” link). The data comprises elements like Networth, Criminal Record, Education etc. This enables us to write a lot of analytical tools which give you various facts about your politicians.

How is WiseVoter different?
Currently WiseVoter has the most comprehensive data on politicians contesting the Indian Lok Sabha elections of 2009. The data is spidered from the Lok Sabha and Rajya Sabha websites and various internet resources. Other websites don’t have data this comprehensive, and we aim to keep adding new data each day. However, what sets us apart is that we allow this data to be edited by anyone, Wikipedia style. We aim to be around and serve this database for more than one election, unlike most other websites.

WiseVoter is of the people, by the people and for the people in every true sense!

Why Now?
This release of WiseVoter targets the Indian Lok Sabha 2009 elections. A politician’s profile can be marked with the appropriate candidature field, so this information could be used for State or City elections as well!

How can I help?
Please help us –

  • Add more politicians to the demopedia! It’s simple: just click “add politician” in the left column and fill in the forms.
  • Keep the information on current politicians up to date. Just click on the “edit with form” link on top of each profile and correct any discrepancies you see!
  • Spread the word: use our badge on your blog, website or Facebook.
  • Give us feedback and discuss the information on the website using the bottom bar!!


Recently my team did the following presentation for our business development methods class:

The project was to find ways to propose a new business for a local company, Treemo. We collaborated with the COO, Jeff Yee. Jeff was really a pleasure to work with; he replied to our emails almost instantaneously.

Anyhoo, the presentation as you see here proposes a company to help folks focus on and improve the visibility of their amateur sports! The presentation and the related financial analysis and marketing research were very well received. We got comments like “best presentation”, “can I go check this out? Is this real?” and “I would love to have something like this, especially since I’m a soccer mom”.

We also got very constructive feedback, like focusing more on a niche: maybe the niche of high school students wanting to get university scholarships via sports, or maybe a family-oriented hub niche!!

The entire exercise was very fruitful for me, Mark & Cam. But we don’t have customers or funding yet… anyone out there???

The Art of the Start

The Art of the Start, the classic Guy Kawasaki work, outlines the five basic entrepreneurial principles: Make Meaning, Make Mantra, Get Going, Define Your Business Model, and Weave a MAT (Milestones, Assumptions and Tasks). That’s a very expressive list for getting going, but above all I love his statement that entrepreneurship is all about doing!

I have made that my litmus test for picking partners: if you want to do, let’s get going!! :) What else should one look for in a partner? Of course, a few things that come to my mind are complementary skill sets, tenacity….

To be or not to be..

Interesting points – why this guy left Microsoft to join Redfin, why this guy left to join Ontela and why this guy loves startups and what reason this guy has for you to leave the Empire and try something different.

I would summarize a little and add that it’s not just about finding a new culture but also a new way of life! In the above links I liked the idea of creating a forcing function to move on!

What’s your take on the big empire?

Pillars for Success of a BDM

A seasoned BizDev resource tells me that the following skills are the pillars for success of a business development manager:

  • Negotiation Skills with proper business conduct
  • Business Modeling with ROI asset evaluation
  • Organizational Maturity Skills
  • Deal Strategy Skills

What’s your opinion?

Market Research!

I was wondering if there are more insights and suggestions wrt tools for Market Research.

I recall from the class the various tools one can use:

  • Industry associations
  • Competitor websites
  • Public financial reports of competitors

In particular, I wanted ballpark data on the quantities, customer segments and overhead costs that wine distributors reckon with. I couldn’t get enough information from the industry associations or competitors’ websites and was wondering what else I could try. I need data to estimate the bulk and $$ required to sell wine to a distributor (typically the wine industry’s middlemen).
