Saturday, February 04, 2006

Detecting the letter 'e'

Last week, I was reading over Dlagnekov's code in an attempt to understand how he used Adaboost. Dlagnekov's code is very clean and easy to understand. However, I had a difficult time discerning certain details from the code alone, such as how his Bayes classifiers work. I was also unclear on how one is able to use a training set of images in conjunction with a group of weak classifiers to generate ROC curves. It is one thing to read theory on this subject, but I needed to see a concrete example.

I wrote up a trivial example of generating a training set and applying classifiers on paper, but I wasn't getting a nice-looking graph with two distinct humps. Instead, I was getting an almost flat curve for the positive training set and two impulses for the negative training set. I talked with my professor, and he said that the type of classifiers Dlagnekov uses are statistical classifiers, meaning that in order to get good curves, one must have a large number of positive and negative training examples. He suggested that I implement an Adaboost method for detecting the letter 'e' (lowercase) on a white sheet of paper before attempting to detect soda cans. He provided me with a link to a research paper that was first printed out with a dot-matrix printer, then scanned in and converted to PDF format. This means that the letters in the paper vary quite a bit, making the detection process nontrivial. He suggested that I use this paper to generate positive and negative training examples as well as to test my implementation. Here is a link to the paper I used:

http://www-cse.ucsd.edu/classes/sp02/cse252/foerstner/foerstner.pdf

Creating the Positive Training Set

I wanted each image to be 32x32 pixels, so I zoomed in with Adobe Reader to a 400% scale and took a screenshot. I then pasted the screenshot into Windows Paintbrush and manually cut out every lowercase 'e' in the image. I repeated this process until I had 100 different lowercase 'e's in a single bitmap image. This image was then run through a Matlab script I wrote in which, when I click the center of each letter 'e', the 32x32 pixel region surrounding the selected location is automatically exported as a new bitmap file. I also saved the image with every lowercase 'e' removed so I could later use this e-less image to create negative training examples. Since the research paper was printed on a dot-matrix printer and then scanned in, the quality of each letter 'e' varies slightly -- some e's are almost perfect while others look more like o's. These images make up my positive training set, meaning they are examples of what the letter 'e' should look like. A composite image of all of these 32x32 letter-'e' patches, assembled with another Matlab script for visualization, appears below.
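For concreteness, here is a minimal sketch of the click-and-crop step performed by the extraction script. The file names and the hard-coded count of 100 patches are placeholders, not the actual values from my script:

    % Minimal sketch of the click-and-crop extraction (file names are placeholders)
    img = imread('lowercase_e_sheet.bmp');        % bitmap containing the cut-out e's
    for k = 1:100
        imshow(img);
        [x, y] = ginput(1);                       % click the center of one letter 'e'
        r = round(y); c = round(x);
        patch = img(r-15:r+16, c-15:c+16, :);     % 32x32 region around the click
        imwrite(patch, sprintf('e_pos_%03d.bmp', k));
    end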



Creating the Negative Training Set

Using the exact same method I used to extract the letter 'e', I manually extracted 20 instances each of the letters 'c', 'o', and 'a', since these letters appear to have features in common with the letter 'e', making them difficult to distinguish in some cases. I then took the images without the letter 'e' in them and ran a Matlab script I wrote to automatically extract 400 random 32x32 regions from the e-less images. Together these form my negative training set, meaning they are examples of what the letter 'e' should not look like.
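The random-crop part of that script is simple enough to sketch here; the file names are again placeholders:

    % Minimal sketch of the random negative-patch extraction (file names are placeholders)
    img = imread('paper_without_e.bmp');           % e-less page image
    [h, w] = size(img(:,:,1));
    for k = 1:400
        r = floor(rand * (h - 31)) + 1;            % top-left corner of a random 32x32 window
        c = floor(rand * (w - 31)) + 1;
        patch = img(r:r+31, c:c+31, :);
        imwrite(patch, sprintf('neg_%03d.bmp', k));
    end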



Creating the Filters for the Weak Classifiers

Before I can create a group of Bayes classifiers to run through Adaboost, each classifier needs to have a filter. Each filter does not have to be very good (in fact, if it is too good it might affect the results of Adaboost negatively), but it should yield a response that is classified correctly more than 50% of the time. A classifier that is correct exactly 50% of the time is just randomly guessing whether the image it is testing is a letter 'e'. One question that may arise is "how do you know how to generate a filter that will work well with Adaboost?" The answer is basically "you don't have to", because Adaboost will automatically determine which filters are the best by measuring the accuracy of the weak classifier corresponding to each filter. So, in order to allow Adaboost to select effective filters, the trick is to choose quantity over quality. Each filter should concentrate on a small area of the image and should be fast to compute.

Initially, I generated about 7800 different filters in which the sum over certain areas of the image is subtracted from the sum over the other areas. This is done by setting some pixels of the filter to -1 and others to 1; applying a 32x32 pixel filter to a 32x32 pixel image (multiplying pixel by pixel and summing, which is what the convolution amounts to at a single offset) yields exactly this difference of sums. Dlagnekov did the same thing in his paper, but he used an optimization technique called integral images so that he would not have to compute the sum of every pixel region every time -- the sums over rectangular areas of each image are precomputed and stored.

The 7800 filters that I generated didn't seem to work very well. Some of them consisted of a 2x2 pixel square moved over all possible image locations, but these filters didn't yield distinctive responses between the positive and negative images. So, I looked at two research papers on generating filters using Haar-like features. This basically means that they took the difference between regions like I did, but they used larger rectangles that were evenly divided into between 2 and 4 regions before their sums were subtracted. I decided to create a new Matlab script to generate a new set of filters that more closely resemble Haar-like features:



To generate this set of filters, I opened up an image of the letter 'e' in the image editor GIMP and tried to identify regions of the letter 'e' that might yield a good classification. I figured that the following colored regions might be areas with good features:



The region to the right measures around 16x8 pixels, so I generated filters in which all pixels were zero (black) except for a single 16x8 region where half of the pixels had value 1 and the other half had value -1.
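A minimal sketch of how such a filter set can be generated is below. The 2-pixel step between rectangle positions and the left/right split of each rectangle are my assumptions for illustration; the actual script may place and split the rectangles differently:

    % Minimal sketch of generating 16x8 two-region filters inside a 32x32 window
    filters = {};
    for r = 1:2:(32 - 16 + 1)                  % slide the 16x8 rectangle over the image
        for c = 1:2:(32 - 8 + 1)
            f = zeros(32);                     % everything outside the rectangle stays zero
            f(r:r+15, c:c+3)   =  1;           % left half of the rectangle adds
            f(r:r+15, c+4:c+7) = -1;           % right half subtracts
            filters{end+1} = f;
        end
    end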

Creating a Bayes Weak Classifier

In order to turn a particular filter into a Bayes classifier, there are 3 main steps that should be taken:

1) Convolve the filter with all negative and positive training images and generate a separate histogram of responses for each set. Set the range of each histogram to run from the minimum to the maximum of the combined positive and negative responses. L1-normalize each histogram by dividing the height of each bin by the sum of all of the bins, so that each histogram sums to 1 (a sketch of this step follows the list).

2) Find the prior probability of seeing the letter 'e' and the prior probability of seeing a letter other than 'e'. This means "if I pick a letter at random from a random piece of paper, what are the odds that the letter I pick is the letter 'e'?" I asked my professor about this, and he searched Google to come up with a probability of about 12% for the letter 'e'. The probability of finding a letter other than 'e' is then 100% - 12% = 88%.

3) Find the mean and standard deviation of each set of responses, before they are binned into the histograms.
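Here is a minimal sketch of step 1, assuming pos_resp and neg_resp are row vectors that already hold the filter responses for the positive and negative images (the choice of 32 bins is mine, not a requirement):

    % Step 1 sketch: L1-normalized response histograms over a shared range
    edges = linspace(min([pos_resp neg_resp]), max([pos_resp neg_resp]), 33);
    pos_hist = histc(pos_resp, edges);
    neg_hist = histc(neg_resp, edges);
    pos_hist = pos_hist / sum(pos_hist);    % each histogram now sums to 1
    neg_hist = neg_hist / sum(neg_hist);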

When all 3 of these steps are done, the resulting values are plugged into an equation that essentially fits a Gaussian curve, scaled by the prior, to each histogram, resulting in two humps. The equation to use is:

p * (1 / (sqrt(2*pi)*s)) * e^(-((x-m)^2) / (2*s^2))

where s and m are the standard deviation and mean from step 3, p is the prior probability from step 2, and x is the value along the x-axis of the histogram (i.e. the filter response values used to set the histogram bins).

When the two curves are generated, a cutoff point is set: anything on one side of the cutoff is considered to be an 'e' and anything on the other side is considered not to be an 'e'. If a classifier is effective, then when the response of a new image is computed with the classifier's filter, that response should fall on the correct side of the cutoff point. If it falls on the side belonging to the positive responses, the image in question is considered to be the letter 'e' and "true" is returned; otherwise, "false" is returned.
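Putting the pieces together, a minimal sketch of one weak classifier looks something like this. Here pos_imgs and neg_imgs are assumed to be cell arrays of 32x32 double images and f is one of the filters generated above; comparing the two scaled curves at a response value amounts to the same thing as the cutoff-point test just described:

    % Sketch of a single Bayes weak classifier built from one filter f
    pos_resp = cellfun(@(im) sum(sum(im .* f)), pos_imgs);   % responses on positive images
    neg_resp = cellfun(@(im) sum(sum(im .* f)), neg_imgs);   % responses on negative images

    p_e   = 0.12;                                            % prior for the letter 'e' (step 2)
    p_not = 1 - p_e;

    m_pos = mean(pos_resp); s_pos = std(pos_resp);           % step 3 statistics
    m_neg = mean(neg_resp); s_neg = std(neg_resp);

    % scaled Gaussian from the equation above, evaluated at a response x
    g = @(x, m, s, p) p * (1 ./ (sqrt(2*pi)*s)) .* exp(-((x - m).^2) ./ (2*s^2));

    % a new 32x32 image im is called an 'e' when the positive curve is higher at its response
    is_e = @(im) g(sum(sum(im .* f)), m_pos, s_pos, p_e) > ...
                 g(sum(sum(im .* f)), m_neg, s_neg, p_not);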

Generating Responses

When I graphed the responses of a particular filter whose 16x8 pixel region was located on the right side, I got two curves lying almost completely inside each other. This is the opposite of what I expected. I thought that since the right side of the letter 'e' looks different from the letters 'c', 'o', and 'a', a filter with a 16x8 pixel window there would generate well-separated responses. I wanted to make sure that at least some of my filters were good before I tried to run Adaboost on them; otherwise, I would not know whether Adaboost was working well or not. I talked with my professor, and he suggested that I generate a Bayes classifier for each filter, calculate the error of each classifier, and determine which classifiers have the lowest error. The filters that perform best will yield the classifiers with the lowest calculated error, so this gives me a reliable way to find the filters with the best classification rates, and it provides a way to gauge the effectiveness of Adaboost as well as the quality of my generated filters.
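A minimal sketch of that ranking step is below. It assumes a hypothetical helper build_classifier that wraps the previous sketch and returns an is_e handle for a given filter; filters, pos_imgs, and neg_imgs are as before:

    % Rank every filter by the training-set error of its Bayes classifier
    err = zeros(1, numel(filters));
    for k = 1:numel(filters)
        is_e = build_classifier(filters{k}, pos_imgs, neg_imgs);   % hypothetical wrapper around the previous sketch
        wrong_pos = sum(~cellfun(is_e, pos_imgs));                 % positives the classifier misses
        wrong_neg = sum( cellfun(is_e, neg_imgs));                 % negatives it wrongly calls 'e'
        err(k) = (wrong_pos + wrong_neg) / (numel(pos_imgs) + numel(neg_imgs));
    end
    [sorted_err, order] = sort(err);                               % lowest-error filters come first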



Papers Referred To

/*frequency of letter e*/
http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

/*Haar-like features*/
http://citeseer.ist.psu.edu/viola01rapid.html
http://citeseer.ist.psu.edu/mohan01examplebased.html

/*dot-matrix scanned paper for training set*/
http://www-cse.ucsd.edu/classes/sp02/cse252/foerstner/foerstner.pdf
