tag:blogger.com,1999:blog-191928802024-03-29T00:05:47.140-07:00Detecting Soda Cans Against a Cluttered Backgroundtraining a robot to detect soda cans against a cluttered background. This project will hopefully make steps towards mobile generic object recognition.Unknownnoreply@blogger.comBlogger30125tag:blogger.com,1999:blog-19192880.post-1149837756515054122006-06-08T22:45:00.000-07:002006-06-09T00:28:38.280-07:00Keypoint Matching With Real Frames<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/attempt39-pg_384.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/attempt39-pg_384.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/attempt1_kp.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/attempt1_kp.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/attempt1-pg_0.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/attempt1-pg_0.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/attempt39-pg_404.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/attempt39-pg_404.jpg" alt="" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />I tried applying the keypoint-matching portion of the SIFT paper to an input image but I did 
not achieve very good results. I matched two input images to my database of several hundred different soda-can images and typically achieved results as shown in the first picture. This seems to indicate that when a cluttered background is present, matching individual keypoints does not produce reliable results. The second image shows all of the keypoints found in the input image, around 400 in total. It appears that keypoints are being generated for each soda can in the input image, but these are not being determined to be the closest-matching keypoints in the scene.<br /><br />In order to further determine what the problem is, I tried matching a database image with a frame containing exactly the same image of the soda can in the database. Though this is definitely not a way to test a computer vision algorithm for classification ability, it gave some indication about why the poor matching results were occurring. Even though the same image was used, not all of the points were matched. This indicates that analyzing the constellation of points is very important when dealing with a cluttered background. Otherwise, the background could have so much variety to it that keypoints arising in it would match better than keypoints in the correct object. Lowe describes in his SIFT paper a process of using clusters of results from a generalized Hough Transform. Although Lowe achieved good matching results using just his keypoint matching algorithm, he used high-resolution images where the keypoints within the objects in question were numerous and apparently more distinguishable from the background, as seen with a book-matching example at this link: <a href="http://www.cs.ubc.ca/%7Elowe/keypoints/">http://www.cs.ubc.ca/~lowe/keypoints/</a><br /><br />The next step might be to implement the generalized Hough Transform that Lowe mentions, followed by pose estimation. One concern is that there may not be enough keypoints that can be matched to a database of images. 
In an earlier blog posting, I showed matching results from comparing high-resolution images to low-resolution images that were scaled down and oriented differently. These results showed that sometimes there were no keypoint matches between different scales of soda cans. If I have to add many different soda-can scales as well as orientations to my database of images, then the matching algorithm might become prohibitively inefficient and the accuracy level might drop even lower.<br /><br />I might try using PCA-SIFT to extract keypoints. This algorithm replaces the "smoothed histogram" that Lowe uses in his keypoint creation stages with PCA (principal component analysis). The PCA-SIFT keypoints are supposedly better at handling noisy images but are more prone to localization error. I was also thinking about using the features described in the Video Google paper, but there is no indication as to how efficient the computation of the "visual words" is. The Video Google paper apparently assumes that the "visual words" are already computed in a video before objects are classified.<br /><br />Perhaps a long-term solution to object detection using mobile robots should be to simply detect the presence of an object regardless of what type of object it is. Objects of a certain size, color, dimensions, etc. could then be inspected more closely, where some SIFT variation could be applied. Perhaps eventually an entire scene could be reconstructed as a 3D model, and while this process is taking place, some easy-to-recognize objects could be classified before the entire 3D reconstruction completes.<br /><br />I think that for now, I will implement the generalized Hough Transform and the other stages mentioned in Lowe's SIFT paper and see under what conditions soda cans can be recognized well. I will then be able to better determine if enough keypoints are being generated. 
I will also be able to determine where Adaboost fits in the detection/recognition process.<br />Unknownnoreply@blogger.com25tag:blogger.com,1999:blog-19192880.post-1149696591563466582006-06-07T08:54:00.000-07:002010-10-01T14:40:36.174-07:00Removing The Background of Training ImagesI recently wrote a program to white-out the background of my soda can training images. Since I wanted to reuse the training set from my experiments with Adaboost, I had to implement two different stages to accomplish this.<br /><br />The first stage involved writing a Perl script to reformat the polygon files that were created by the "Javimap" program I used. This script read all of the polygons in all of the files and merged them into a single text file. Thanks to Perl's built-in regular expression matching capabilities, this process worked quite well.<br /><br />The second stage involved writing a C program to cut out each soda can subimage from the training images and white-out its background. I chose C because David Lowe's provided code is written in C and contains some functions for manipulating PGM files (which he chose for his SIFT implementation).<br /><br />In order to white-out the background of the soda-can images, I checked each pixel to see if it was within the polygon that I specified when first creating the training images. 
To accomplish this task, I implemented an algorithm for ray tracing that was provided on this website:<br /><br /><a href="http://neptune.cs.byu.edu/egbert/455/CourseNotes/Lecture13-RayTracingIntersections.ppt">http://neptune.cs.byu.edu/egbert/455/CourseNotes/Lecture13-RayTracingIntersections.ppt</a><br /><br />My entire algorithm went as follows:<br /><br />1) Read each polygon and load its corresponding image from the Perl-generated text file of polygon coordinates.<br /><br />2) Cycle through each set of polygon coordinates to find the largest and smallest pair of (x,y). Use these values to specify a bounding box for each soda can.<br /><br />3) Using the bounding box from (2), crop out the image within this bounding box (should be a single soda can).<br /><br />4) Translate all polygon coordinates by subtracting the smaller bounding-box coordinate from each (so they are with respect to (0,0) rather than the smaller bounding-box coordinate, since a new image was created from the bounding-box region).<br /><br />5) For each pixel in the subimage, check if it is within the bounding polygon. If not, then set that pixel value to 1.0, which is white. To determine if a pixel is within the polygon, the polygon is translated to a new coordinate system where the pixel in question is at the origin. The edges of the polygon are then checked to see if they cross the positive x-axis. If the number of crossings is odd, then the pixel is inside the polygon. If the number of crossings is even, then the pixel is outside the polygon. More details on this algorithm can be found at the aforementioned website.<br /><br />Now that I have created an algorithm to automatically extract training images with clutter-free backgrounds, I can redo the training set very easily if needed without having to manually white-out the background with Gimp.<br /><br />The next step will be to proceed with matching input test images against my new database of soda-can images. 
I will then try applying additional concepts from the SIFT paper to filter out false positives, etc.<br />Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-19192880.post-1148716705701193212006-05-27T00:41:00.000-07:002019-06-03T15:03:07.907-07:00Learning PerlIt took me quite a while to extract the number of keypoints detected and insert them into a spreadsheet. To make future tasks easier, I decided to learn Perl scripting. I read over some tutorials and books. From what I've learned about Perl, I created a script to remove keypoints and their corresponding images from the database if the number of keypoints detected in them is less than 3. This is accomplished by parsing the output text file from detecting keypoints in the database directory. Now that I can make Perl scripts, I should be able to accomplish many tasks faster by automating the more tedious work.<br />
<br />
<span style="font-style: italic; font-weight: bold;">Here is a sample of the output file produced when creating the database keypoints:</span><br />
<br />
<span style="font-size: 85%;">extracting keypoints: ./database/s06.pgm -> ./database/s06.key<br />Finding keypoints...<br />22 keypoints found.<br />extracting keypoints: ./database/ss01.pgm -> ./database/ss01.key<br />Finding keypoints...<br />2 keypoints found.<br />extracting keypoints: ./database/ss02.pgm -> ./database/ss02.key<br />Finding keypoints...<br />3 keypoints found.<br />extracting keypoints: ./database/ss03.pgm -> ./database/ss03.key<br />Finding keypoints...<br />0 keypoints found.</span><br />
<br />
<span style="font-style: italic; font-weight: bold;">Here is the Perl script I wrote. It is called by passing in the name of the output text file (a sample of which is above). The # signs represent comments:<br /></span><span style="color: rgb(51 , 51 , 255);">#!/usr/bin/perl<br />my $rmkp = ">./rmkp.txt";<br />my $line;<br />my $prevLine;<br /># $ARGV[0]: output text filename from keypoint detection.<br />open( FILE, $ARGV[0] ) or<br /> die "Cannot open input file: $ARGV[0] : $!\n";<br />open( RMKP, $rmkp ) or<br /> die "Cannot open log file: $rmkp : $!\n";<br />print RMKP "Files Deleted From Database:\n\n";<br />while( $line = &lt;FILE&gt; ) {<br /> # ([^>]*) match zero or more characters but not '>'<br /> # if line matches regexp<br /> if( $line =~ m/extracting keypoints:([^>]*)>/ ){<br /> # remove words up to and including '->'<br /> $line =~ s/extracting keypoints:([^>]*)>//g;<br /> # remove file extension<br /> $line =~ s/\.key//g;<br /> # remove whitespace<br /> $line =~ s/\s//g;<br /> # store current file path (without extension)<br /> $prevLine = $line;<br /> }<br /> # if number of keypoints found, store number<br /> if( $line =~ m/keypoints found./ ){<br /> $line =~ s/keypoints found.//g;<br /> $line =~ s/\n//g;<br /> # if fewer than 3 keypoints, log and delete the files<br /> if( $line &lt; 3 ){<br /> print RMKP "$line $prevLine.key $prevLine.pgm\n";<br /> unlink "$prevLine.key";<br /> unlink "$prevLine.pgm";<br /> }<br /> }<br />}<br />close FILE;<br />close RMKP;</span><span style="color: rgb(51 , 51 , 255);"><span style="color: rgb(51 , 51 , 255);"><br /><br /><span style="color: rgb(255 , 255 , 255); font-style: italic; font-weight: bold;">Here is the log file, rmkp.txt, generated by the Perl script. 
This shows the number of keypoints for the associated keypoint and image files that were deleted:</span></span><br /><span style="color: rgb(255 , 255 , 255);">Files Deleted From Database:</span><br /><br /><span style="color: rgb(255 , 255 , 255);"> 2 ./database/ss01.key ./database/ss01.pgm</span><br /><span style="color: rgb(255 , 255 , 255);">0 ./database/ss03.key ./database/ss03.pgm</span></span>Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-19192880.post-1148526639627902052006-05-24T19:44:00.000-07:002006-05-24T20:50:54.650-07:00Analysis of Keypoint Matching<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/table2.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/table2.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/table1.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/table1.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/kpgraph.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/kpgraph.jpg" alt="" border="0" /></a>Image 1: table of keypoint matches (rows on the left matched with columns on the top)<br />Image 2: table of keypoints where entries with fewer than 2 matches, or entries where an image was matched to itself, were set to zero.<br />Image 3: a graph of the table in Image 2.<br /><br />Today, I graphed the matching of keypoints. The x-axis of the graph above represents the keypoints from the input images. The y-axis represents the keypoints from the database. 
The colors indicate how many successful keypoint matches occurred. I removed all results that had fewer than 2 matching keypoints or results where the exact same keypoints were matched between identical images. One strange result is that certain images match very well when "match image1 image2" is run, while the same pair yields no matches at all when "match image2 image1" is run. For instance, s05.pgm has 64 matches when matched with 04.pgm, while no matches occur when 04.pgm is matched with s05.pgm. Because of s05.pgm's angle and small resolution of around 120x120 pixels, very few stable keypoints are found. In contrast, 04.pgm has a high resolution with around 400 keypoints found. It could be that s05.pgm's keypoints are very similar to each other because of the low pixel resolution of the image. When matched against 04.pgm, perhaps enough of 04.pgm's keypoints are close to the keypoints in s05.pgm to indicate a successful classification.<br /><br />Though this table does not provide a complete set of test cases, it can be inferred that high-resolution images match better with other high-resolution images than low-to-high, low-to-low, or high-to-low pairings do. I believe that the high number of matches between the high-resolution images and s05.pgm is an anomaly and does not indicate that this particular image can reliably be compared to others. <br /><br />If too many keypoints are stored in the database images, the computation time required to match new input images against them might be too great for real-time performance. Since Lowe was able to achieve good results using 600x480 pixel input images with smaller-resolution training images, I will try doing the same. Now that I am more familiar with SIFT, I will re-read Lowe's SIFT paper. I will try to jump the gun and use the methods provided in Lowe's code to detect soda cans using footage from Robart III as the next step. 
This way, I will have a better idea about what problems arise. I will then look over Lowe's suggestions for narrowing down the range of useful keypoint matches using a Hough Transform, etc.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-19192880.post-1148351712736741982006-05-22T19:03:00.000-07:002006-05-22T20:28:50.123-07:00Matching Keypoints<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/testcans.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/testcans.jpg" alt="" border="0" /></a>Today, I matched soda-can keypoints to each other. Conveniently, C code for doing this is provided with David Lowe's keypoint generation code. Using the above set of images, I found keypoints for each image and matched them with the keypoints from all of the other images. The images with a prefix of "s" are the images without an "s" that have been scaled down to 3% of their original size and then scaled up by 600%. The resulting images with prefix "s" have a size of around 150x240 pixels. The other images have a size of about 1100x2100 pixels. This was done as a test to see how much of an effect the resolution of the images has on extracting keypoints, which angles have the most matchable keypoints, and whether similar-resolution soda-can images match better than dissimilar-resolution ones. I will post the results from this small experiment, along with a table and a graph, in my next blog entry.<br /><br />This is how keypoint matching works between two images, referred to as A and B:<br /><br />1. Keypoints are found in A.<br />2. Keypoints are found in B.<br />3. The closest matching keypoints are found between the images. Each keypoint is a 128-dimension vector. To find the distance between two keypoints, the Euclidean distance is found between the feature vectors belonging to the keypoints. 
Each keypoint in A is compared with each keypoint in B by Euclidean distance to find the closest matching keypoint. A match is determined as follows:<br /><br />Let a' be a point in A. Let b' and b'' be the first and second closest matching points in B to the point a'. Let D(x,y) represent the Euclidean distance between its arguments x and y.<br /><br />If D(a', b') < D(a', b'')*0.6, then the closest point is chosen.<br />Else, no points are chosen (no matching keypoint).<br /><br />According to Lowe, finding merely the closest matching keypoint does not yield a reliable match. Lowe claims that a more reliable match is determined by comparing against the second-closest matching keypoint. He does this by using the distance to the second-closest matching keypoint as a threshold.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-19192880.post-1147311560255806662006-05-10T18:29:00.000-07:002006-05-10T19:28:33.793-07:00SIFT Keypoints: preliminary resultsOne question has arisen with regard to keypoints: will soda cans in low-resolution video footage yield enough keypoints to perform SIFT effectively?<br /><br />Running the Keypoint Detector<br /><br />I downloaded the keypoint detection code from David Lowe's website:<br /><a href="http://www.cs.ubc.ca/%7Elowe/keypoints/">http://www.cs.ubc.ca/~lowe/keypoints/</a><br /><br />David Lowe is the creator of SIFT. The code is in binary format and was extremely easy to run in Linux. There are apparently no required libraries or dependencies to run this code. I took some photos with a digital camera and cropped out the regions containing soda cans from the background. From what I've seen so far, it appears that SIFT detects keypoints that represent characteristics within an object. I believe this means that the outline of a soda can cannot be taken into account for object detection. This may prove to be a problem if objects to be detected do not have many detailed features. 
To test the validity of this theory, I would like to try spray-painting a soda can a solid color and see if any SIFT keypoints are extracted. Here are some images of detected keypoints:<br /><br />SCALED DOWN TO 1.5%:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/s1.5_05.pgm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/s1.5_05.pgm.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/s1.5_01.pgm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/s1.5_01.pgm.jpg" alt="" border="0" /></a><br />SCALED DOWN TO 1.5% THEN SCALED UP TO 600%:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/s1.5_600_05.pgm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/s1.5_600_05.pgm.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/s1.5_600_01.pgm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/s1.5_600_01.pgm.jpg" alt="" border="0" /></a><br />SCALED DOWN TO 3.0%:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/s3.0_01.pgm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/s3.0_01.pgm.jpg" alt="" border="0" /></a><br />SCALED DOWN TO 3.0% THEN SCALED UP TO 600%:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/s3.0_600_01.pgm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/s3.0_600_01.pgm.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/s3.0_600_05.pgm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/s3.0_600_05.pgm.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/s3.0_600_06.pgm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/s3.0_600_06.pgm.jpg" alt="" border="0" /></a><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/s3.0_600_02.pgm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/s3.0_600_02.pgm.jpg" alt="" border="0" /></a><br />These cans all started out as high-resolution images. These images were then scaled down to 1.5% or 3.0% of their resolution in an attempt to simulate soda cans at a typical scale encountered in live video footage. According to Lowe's README file that came with the keypoint detection code, more keypoints can be detected if an image is first scaled to a larger size. In fact, the number of detected keypoints typically doubled when a low-resolution image was scaled up. 
The Pepsi cans with a prominent logo yielded the most keypoints. The soda can in the top-most image yielded no keypoints at all. There are many cases where the specular highlights appeared to yield keypoints. These cases would be taken care of at a later stage, where keypoints from different training images taken under different lighting conditions would be analyzed to determine which keypoints are invariant to scale and lighting. The more I think about it, the more appealing a hybrid solution to object detection seems --one that would incorporate data from multiple sources to arrive at a conclusion. For instance, perhaps some sort of texture recognition or color detection method could be used in conjunction with SIFT to detect objects with sparse and dense features. I will continue to investigate how SIFT can be applied to soda can detection and find out what image resolutions I have to work with. If SIFT proves to work well for objects up close, I might use some other methods to detect objects of interest and then proceed to zoom in on those objects before applying SIFT.<br /><br />The next step will be to test out keypoint matching between these detected keypoints.<br /><br />Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-19192880.post-1147304015536609822006-05-10T16:11:00.000-07:002006-05-10T16:33:35.543-07:00New Milestones: Evaluating SIFTI have a new set of 
Milestones to accomplish for this quarter. I'm very busy this quarter, so I will try to accomplish as many of them as I can.<br /><br /><span style="font-weight: bold;">Understand SIFT thoroughly - meet by 5/10/06</span><br /><br />The purpose of this milestone is to understand SIFT well enough to be able to set up an appropriate training set and to plan out specific implementation details. This will include locating existing SIFT-related source code and determining what software should be used (i.e., shell scripting, ImageMagick, Matlab, OpenCV, C++, etc.). Last quarter, I implemented everything in Matlab first before attempting to implement it in C++. This quarter, I will be giving C/C++ preference and using Matlab to aid in my understanding of SIFT or to generate charts, graphs, etc. I've found shell scripting to be incredibly helpful in all aspects of programming --not just for this research.<br /><br /><span style="font-weight: bold;">Create the Training Set - meet by 5/17/06</span><br /><br />This will most likely involve modifying my existing training set. This training set consists of extracted images of soda cans acquired at a single scale, against a static background, under constant lighting conditions, and in two separate orientations. Although the use of this training set will most likely not result in scale-invariant keypoints, it will be useful in testing other aspects of the SIFT algorithm directly against my Adaboost implementation. Invariance to background clutter and lighting, performance while in motion, and efficiency can be tested using this training set. After a working SIFT implementation is created, a training set that includes soda cans extracted under a wider variety of conditions can be swapped in for this training set. I figure that I will go through all of the steps required to apply SIFT using only a few images. Once this is done, I will have a better idea about how to create a good training set for SIFT. 
Last quarter, I had to redo my training set several times, so I want to avoid having to do that this quarter.<br /><br /><span style="font-weight: bold;">Detect Keypoints - meet by 5/24/06</span><br /><br />This will involve using a difference-of-Gaussian function to search the scale-space of the soda can training images to identify scale-invariant interest points. The most stable keypoints will then be selected and assigned orientations based on their local gradient directions. These gradients will then be transformed into a representation that is resistant to shape distortion and illumination differences. This will be accomplished using an existing binary implementation, such as the version provided by David Lowe on his website. I will probably make a primitive algorithm for matching keypoints initially and then improve upon it.<br /><br /><span style="font-weight: bold;">Match Keypoints of New Input Images (no background clutter) - meet by 5/31/06</span><br /><br />A database of keypoints from the training images will first be set up. A new input image will be matched with this database by finding the Euclidean distances between the feature vectors of the input image and the feature vectors in the database. The new input images will include soda can images with backgrounds that have been manually removed so that no keypoints will be detected outside of the soda can regions. If background clutter is left in, then keypoints extracted from these regions may yield so many false matches that false negatives will result even when a soda can is present in a particular region. 
Without background clutter, I can test keypoint matching without having to cluster the keypoints or apply any additional transformations to them.<br /><br /><span style="font-weight: bold;">Identify Keypoint Clusters, Apply Further Filtering of Keypoint Matches (background clutter added) - meet by 6/7/06</span><br /><br />This will first filter correct keypoint matches from the false matches by identifying clusters of three or more keypoints that agree on the orientation, scale, and location of the detected object. This is described in Lowe’s SIFT paper with a Hough Transform offered as a solution to identifying these clusters. Next, a least-squares estimate will be applied for an affine approximation to the object pose. Further analysis can be applied, as described by Lowe, to determine the probability that a particular set of features is the object being searched for.<br /><br /><span style="font-weight: bold;">Run Detection Method Live - meet by 6/14/06</span><br /><br />Once a SIFT implementation is successfully implemented, it will be tested using live video footage. Initially, I might try using the same setup that I used before with the same static background. The results from this test will be compared with my previous results based on the efficiency and accuracy of detection. Various conditions will then be changed, such as the ambient lighting. Video footage will then be tested while the camera is moving horizontally with respect to the soda cans. The complexity of the tests will be gradually increased and the results analyzed unless/until the SIFT implementation begins to perform poorly. If time permits, a new set of training images will be acquired using soda cans positioned in a variety of locations and orientations and this SIFT implementation will be retested. 
Depending on the availability of SIFT source code online, some of these milestones can be combined and their theoretical completion dates moved back or forward.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-19192880.post-1145379488525939022006-04-18T09:44:00.003-07:002006-04-18T10:05:02.820-07:00Testing Soda Can Detection on Robart III<a href="http://members.cox.net/fbirchmore/goodrun.mpg">http://members.cox.net/fbirchmore/goodrun.mpg</a><br /><a href="http://members.cox.net/fbirchmore/goodrun2.mpg">http://members.cox.net/fbirchmore/goodrun2.mpg</a><br /><a href="http://members.cox.net/fbirchmore/goodrun3.mpg">http://members.cox.net/fbirchmore/goodrun3.mpg</a><br /><br /><br />We ran my soda can detection algorithm on live video from Robart III. Apparently, the classifier I have trained is very susceptible to variations in lighting and background clutter that were not present in the training data. Despite these deficiencies, this particular classifier does quite well at eliminating almost all of the background clutter in the center of the image (only a central portion of the window is scanned, as mentioned in a previous post). The remaining false positives could be passed to another detection method for further processing or used to signal to Robart III that he should zoom in on these areas for further inspection. From the results I have observed so far, I think weak classifiers that are better at capturing features unique to soda cans would outperform the Haar-like features currently used. I think that a SIFT-based detection method will work much better, partly because it uses keypoints that are more invariant to changes in lighting. Investigating a SIFT-based detection method will probably be my next step. Although I still need to read more about SIFT to understand it fully, I am going to investigate the possibility of incorporating Adaboost into a SIFT detection method.
More information on this will appear in later blog postings.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1145306538569231802006-04-17T13:41:00.000-07:002006-04-17T13:42:18.583-07:00Test VideoHere is the video test footage that is mentioned in the previous post:<br /><br /><a href="http://members.cox.net/fbirchmore/attempt39_out.mpg">http://members.cox.net/fbirchmore/attempt39_out.mpg</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1145289292933859942006-04-17T08:06:00.000-07:002006-04-17T13:46:42.196-07:00Success (under constrained conditions)<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/000010.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/000010.jpg" alt="" border="0" /></a>When I had finished extracting soda-can images for my training set, I had 410 vertical and 420 horizontal soda can images. I then randomly offset each of these images by up to 1/8 of their width and 1/4 of their height to generate a total of 4444 positive training examples of vertically-oriented soda cans and 4554 positive training examples of horizontally-oriented soda cans --as Dlagnekov did in his thesis.<br /><br />I then trained two classifiers using 23200 negative training examples for each. Instead of using the entire labeled image, I constrained my negative training examples to a 161x184 pixel subwindow in the bottom-center of the image. Furthermore, I used negative training examples from video footage that contained no soda-cans to begin with --rather than extracting negative examples from labeled images that exclude areas marked as containing soda cans. 
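The random-offset augmentation described above can be sketched as follows. This is my guess at the mechanics rather than Dlagnekov's actual code: images are NumPy arrays, a wrap-around shift (np.roll) stands in for whatever padding or re-cropping the real pipeline used, and the 1/8-width and 1/4-height limits match the figures in the post.

```python
import random
import numpy as np

def random_offsets(image, count=10, seed=None):
    """Generate jittered copies of one positive training example.

    Each copy is shifted by up to 1/8 of the image width and 1/4 of its
    height in either direction; np.roll wraps pixels around the border.
    Returns the original image plus `count` shifted copies.
    """
    rng = random.Random(seed)
    h, w = image.shape[:2]
    max_dx, max_dy = w // 8, h // 4
    out = [image]
    for _ in range(count):
        dx = rng.randint(-max_dx, max_dx)
        dy = rng.randint(-max_dy, max_dy)
        # shift rows by dy and columns by dx
        out.append(np.roll(np.roll(image, dy, axis=0), dx, axis=1))
    return out
```

Roughly ten offsets per extracted can would give the order-of-magnitude growth seen above (410 vertical images becoming 4444 positive examples).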
<span style=";font-family:";font-size:10;color:black;" ></span>The first classifier used only the vertically-oriented soda-can images with a detection-window-size of 17x27 pixels and the second classifier used only the horizontally-oriented soda-can images with a detection-window-size of 30x14 pixels. The I then modified the existing license plate detection code to read in a list of classifiers from a file and run them sequentially on the source video.<br /><br />Using this new framework, I ran my detection algorithm on a sample video that I had set aside before training. This sample video footage was taken using the same background that was used to build the training set. Every soda can in the video was detected very accurately with no false-positives. The worst detection in this frame is probably the upper-left where the detection window is off-center. This is probably due to the fact that the detected locations are clustered, which might be causing surrounding detection areas to average to an off-center detection area. The next step is to test this algorithm on ROBART III using live video footage.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1144374209377620382006-04-06T18:07:00.000-07:002006-04-06T19:34:09.556-07:00Building a New Training SetNext week, a portion of my work will be submitted into our Code's SPIE conference paper. Although I have decided to explore a SIFT-based method for soda can detection, my more immediate goal is to make the most of Dlagnekov's LPR detection algorithm when applied to soda can detection. The Robotics Code at SPAWAR would like to place my soda can detection algorithm directly on Robart III to test the algorithm's efficiency and accuracy. In order to make my algorithm "Robot Friendly", I am going to refine its focus to detect soda cans at a single scale against a static background. 
This way, my chances of success in reducing the false positive rate and increasing the true positive rate within the next week are much greater than they are for detecting soda cans at multiple scales. Furthermore, I will limit the number of orientations to only two --upright and horizontal. This will require me to train two separate strong classifiers. By keeping my detection method simple, I will be able to place my classifier directly on Robart III without having to recompile anything as well. If this is successful and there is enough time, I will add additional functionality to my algorithm and possibly get Robart III to shoot at the soda cans he sees (using all six barrels of course).<br /><span style="font-size:130%;"><br />Setting Up The Soda Cans</span><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/3000001.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/3000001.jpg" alt="" border="0" /></a>As illustrated in the photo above, I set up Diet Pepsi Cans on and around a wooden box. This wooden box has been featured in previous SPIE conference papers submitted by the Robotics Code at SPAWAR because it is safe for Robart III to shoot at it. Without modifying the position of the wooden box or the background, I placed the soda cans at different locations and Greg recorded each setup for 5-10 seconds using Robart III's head camera. I then rotated each soda can by about 60 degrees and Greg recorded this new setup as a separate MPEG file. This was repeated 44 times to create 44 different MPEG videos. 
When setting up soda cans, I tried to make sure that there were about the same number of horizontally- and vertically-oriented soda cans with which to build my training set.<br /><span style="font-size:130%;"><br />Extracting JPEG Images from MPEG Videos</span><br /><br />I used ffmpeg to extract 1-2 JPEG files from each of the 44 MPEG videos. I used the following ffmpeg command for each MPEG video:<br /><br />ffmpeg -ss 00:00:03 -t 00:00:01 -i ./&lt;video&gt;.mpg -r 1 ./3%06d.jpg<br /><br />where &lt;video&gt; represents the name of the MPEG video being extracted and 3%06d.jpg assigns a sequential number to the name of each extracted image file. The other parameters and their meanings are listed below:<br /><br />-ss : the start time of the source file at which recording will begin. I set it to 3 seconds so that the extracted frames would not be washed out by Robart III's automatic camera aperture adjustment.<br /><br />-t : the amount of time to extract video frames for. I set it to 1 second.<br />-r : the framerate of the extracted video (or images). Since I extracted JPEG images rather than converting to another video format, between 1-2 images are extracted.<br /><br />Additional documentation on FFMPEG is available at the following website:<br /><br />http://ffmpeg.sourceforge.net/ffmpeg-doc.html<br /><br /><span style="font-weight: bold;">Making a Shell Script</span><br /><br />To extract each video frame, I wrote a shell script that iterates through all MPEG files in the current directory. Video acquired from Robart III is upside-down by default, so each extracted JPEG image needed to be rotated by 180 degrees. Here are links to the websites I referred to about shell scripting:<br /><br /><a href="http://www.freeos.com/guides/lsst/ch03sec06.html">http://www.freeos.com/guides/lsst/ch03sec06.html</a><br /><a href="http://bash.cyberciti.biz/academic/no_files_in_dir.sh.php">http://bash.cyberciti.biz/academic/no_files_in_dir.sh.php</a><br /><br /><span style="font-weight: bold;">Using ImageMagick</span><br /><br />There was a time not too long ago (a month or so ago) when the task of rotating all of the images in a directory seemed difficult. Now, I've learned that you can use a shell script to call ImageMagick in order to batch-process images. I first installed ImageMagick using the Synaptic Package Manager in Ubuntu (my Linux distro). I then simply called mogrify's "-rotate" option in my shell script to rotate each extracted JPEG image in the directory. In addition to modifying images in the standard way, ImageMagick is also capable of creating composite images and extracting specified regions of an image. This will definitely be useful for creating composite images of my training set. Before I discovered ImageMagick and shell scripting, I was using a Matlab program that took me hours to write in order to create composite images (of course, I knew a lot less about Matlab then). Here is a link to the ImageMagick documentation:<br /><br /><a href="http://www.imagemagick.org/script/command-line-tools.php#mogrify">http://www.imagemagick.org/script/command-line-tools.php#mogrify</a><br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">The Shell Script</span></span><br /><br />Here is the code for the shell script that I wrote:<br /><br />###########################################<br /># extract JPEG images from each MPEG file in the current directory<br />for f in *.mpg<br /> do<br /> ffmpeg -ss 00:00:03 -t 00:00:01 -i "$f" -r 1 "./jpegs/${f%.mpg}%06d.jpg"<br /> done<br /># rotate each image 180 degrees<br />for f in ./jpegs/*.jpg<br /> do<br /> mogrify -rotate 180 "$f"<br /> done<br />###########################################<br /><br />This script seems almost too simple to even bother posting on this blog. However, I would have saved days of coding time had I realized how easy it is to use shell scripts to process images. Hopefully, this will be of benefit to someone who is reading this blog and perhaps has little exposure to computer vision.<br />Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-19192880.post-1144369074044701622006-04-06T17:13:00.000-07:002006-04-06T17:17:54.066-07:00Changing my mind...again...I've decided to use the Java program.
Apparently, in order to upload images to LabelMe, the images must be placed online and a link must be emailed to<br /><a href="http://www.csail.mit.edu/%7Ebrussell">Bryan Russell</a> and <a href="http://web.mit.edu/torralba/www/">Antonio Torralba</a> at MIT. Since I am under time constraints to meet a paper deadline next week, I am just going to use the Java program. Later, when I try a SIFT-based implementation, I may use LabelMe.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1144338700024191922006-04-06T08:44:00.000-07:002006-04-06T08:51:40.050-07:00Changing my Mind...Decided to use LabelMe<p class="MsoPlainText">I've decided to use MIT's LabelMe annotation tool to rebuild my training set. At first I was hesitant to use LabelMe because I found out that their Matlab toolbox doesn't actually let you annotate images --that must be done online. I only have one copy of Matlab which is installed on a Windows machine. At this point, I would prefer to keep everything within Linux. However, I see a definite advantage to using LabelMe in that multiple people can label a set of images simultaneously since LabelMe is an online annotation tool. This means that I don't necessarily need to sit down all by myself and label images --I can solicit help. Here is the LabelMe website:<br /></p><p class="MsoPlainText">http://people.csail.mit.edu/brussell/research/LabelMe/intro.html</p>
I kept receiving the error message below:<br /><br />javac ./Javimapper.java<br />java ./Javimapper<br />Exception in thread "main" java.lang.NoClassDefFoundError: ./Javimapper<br /><br />I eventually fixed the problem by referring to the following website:<br /><br />http://www.cs.princeton.edu/introcs/11hello/linux.html<br /><br />It turns out that java expects the name of a class found on the classpath, not a file path. After reading the webpage above, I simply typed in the following and I was able to compile and run the program just fine:<br /><pre><b>java -classpath ./ Javimapper</b></pre>
It will be interesting to see how much the detection rates are affected by using unmodified 320x240 footage. The 320x240 footage probably contains a greater amount of noise. However, the 320x240 footage will also train faster with each image taking up less memory --which will probably allow me to use a greater number of negative training examples and allow more bootstrapping operations with false positives.<br /><br />Since I will probably soon be applying some SIFT-based approach to soda can detection, it would be beneficial if I use bounding-polygons instead of bounding-rectangles. I think this helps SIFT to avoid mistakenly extracting features from the background and allows the spread of detected points to be exploited. With this in mind, I finally decided to create a Java GUI program that will allow the user to specify bounding polygons that will be saved into different files. I will then write some code to determine the extreme points (i.e. max x, min y; max x, max y; etc...) of the polygons and set these to be the corners of rectangular bounding boxes. I will then use these bounding boxes to retrain some strong classifiers. The polygon data can then be saved for later use with SIFT or some other algorithm that would benefit from tight-fitting polygons.<br /><br />Java Program to Modify<br /><br />I found this Java program online by searching for "image map java" on Google. The potential candidates for image annotation programs seemed overly complex and generally only used bounding boxes. I realized that image maps that one might find on a website would use the same sort of image annotation as "image annotation toolbox" programs. I wanted to avoid using Matlab since I am less familiar with its object-oriented features than Java and I can only use Matlab on my laptop.<br /><br />Here is the program I will modify:<br /><br />http://www4.vc-net.ne.jp/~klivo/mapo/javimap.htm<br /><br />It seems to work extremely well --especially for a 7-year-old program. 
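The extreme-point computation described above (collapsing each bounding polygon to a rectangular bounding box while keeping the polygon for later SIFT use) is only a few lines. The function name here is my own, not something from the Java program:

```python
def polygon_to_bbox(points):
    """Collapse a bounding polygon to its axis-aligned bounding rectangle.

    `points` is a list of (x, y) vertices. The returned rectangle
    (min_x, min_y, max_x, max_y) is built from the extreme coordinates
    of the polygon and can serve directly as a training bounding box.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))
```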
I will most likely modify it by allowing it to load all images from an entire directory and making it export bounding box data in addition to polygon data.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1144293056096539852006-04-05T20:07:00.000-07:002006-04-05T20:10:56.110-07:00Getting Java to WorkI took the following steps to get Java to work on my USB Hard Drive:<br /><br />1) download JDK binary from Sun's website<br />2) type the following:<br /><br />sudo apt-get install fakeroot java-package java-common<br />sudo dpkg -i JDK1.5.deb<br />sudo update-alternatives --config java<br /><br />I then picked the new Java version from the menu that appeared from the update-alternatives command. I found this information at the following website:<br /><br />http://www.ubuntuforums.org/archive/index.php/t-76735.htmlUnknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1144274779358868242006-04-05T13:42:00.000-07:002006-04-05T15:06:19.403-07:00Fixing FFMPEGI was having trouble compiling and running the lpr_vhook program that Stephan and Kevin created. It was unable to find the highgui portion of the OpenCV library. Greg finally got it working by linking highgui.a directly into the program instead of using the shared library files. He also linked some other files that were missing by adding them to the Makefile. Apparently, my problem was that I downloaded a newer version of ffmpeg that had new library requirements so these had to be added. 
The ffmpeg website had removed their .tar files, so I had to use their CVS repository to get the "bleeding edge" version of ffmpeg.<br /><br />I learned a few new terminal commands in Linux:<br /><br />ldd : list the shared library dependencies of a binary<br />!ff : runs the most recent command that begins with the letters "ff"<br />history : list all commands that were typed into the terminal and enumerate them<br />!# : run the command from "history" that has the number #<br /><br />Even though it took me a week, booting Linux from a USB drive has paid off immensely in that it eliminates the need to reconfigure a separate Linux environment on each computer.<br /><br />Now, I can finally redo my training set using 320x240 resolution and excluding horizontally-oriented soda cans. I may exclude partially occluded soda cans as well.
By referring to this website, I was able to install these drivers quite easily:<br /><br />http://www.ubuntuforums.org/showthread.php?t=75074&page=4<br /><br />I typed the following commands into a terminal window:<br /><br /><pre class="alt2" style="border: 1px inset ; margin: 0px; padding: 6px; width: 640px; height: 50px; overflow: auto;">sudo apt-get install nvidia-glx linux-restricted-modules-$(uname -r)<br />sudo nvidia-glx-config enable </pre><br />It worked, so I have X11 now.
My installation experience was slightly different from the original steps mentioned in the forum --I have indicated these differences with bold, italic, light-blue text.<br /><br />***********************************************************************************************<br />Here is what I do to successfully load UBUNTU v5.10 on my EXTERNAL USB DRIVE ...<br /><br /><span style="color: rgb(51, 102, 255);"><span style="font-style: italic;"><span style="font-weight: bold;">( STEP 0 ) Plug the external USB drive into the computer. Unplug all of the internal IDE hard drives.</span></span></span><br /><br />( STEP 1 ) Instead of using "expert" mode to install, I just hit enter to start the install process (using the install CD ... NOT the live CD).<br /><br />INSTALL NOTE: The installation process will ask you for some information when it starts up ...<br /><br />first it will ask for your language,<br />then your (language) location,<br />then your keyboard type,<br />then it will detect your cdrom(s),<br />then it will attempt to detect your network configuration,<br />then it will ask you to give your PC a name (hostname),<br />then it detects hardware and starts up the partition phase of the install.<br /><br />INSTALL NOTE: When you get to STEP 2 (the partition phase), there is a simple way to determine how the install program assigns device names to your hard drive(s). Before starting any partitioning activity in STEP 2, choose the GO BACK option in the lower left corner of this screen and then choose EXECUTE A SHELL farther down in the Ubuntu installer menu that appears. Choose CONTINUE at the next prompt. In the shell window that appears, type in fdisk -l (that is FDISK -L in all lowercase). This will give you a list of all the storage devices the installer program sees. Make a note of the device names (/dev/hd? or /dev/sd?) and then type in exit to return to the partition phase (STEP 2) of the install.
(Kudos to "RyanGT" for suggesting this in post #257.)<br /><br /><span style="color: rgb(51, 204, 255);"><span style="color: rgb(0, 204, 204);"><span style="color: rgb(51, 102, 255);">( Step 2 ) (ignore the Step 2 below) If you have a 250GB hard drive like I do, you probably don't want Ubuntu to create 3 giant partitions that span the entire drive. In fact, you will probably want to set up separate partitions manually. I set up my partitions as follows:<br /><br />Filesystem Size Used Avail Use% Mounted on Fs<br /></span></span></span><span style="color: rgb(51, 204, 255);"><span style="color: rgb(0, 204, 204);"><span style="color: rgb(51, 102, 255);">/dev/sda1 122M 19M 97M 17% /boot ext3</span></span></span><br /><span style="color: rgb(51, 204, 255);"><span style="color: rgb(0, 204, 204);"><span style="color: rgb(51, 102, 255);">/dev/sda2 449M 255M 171M 60% / ext3<br />/dev/sda3 1.5G swap swap<br /></span></span></span><span style="color: rgb(51, 204, 255);"><span style="color: rgb(0, 204, 204);"><span style="color: rgb(51, 102, 255);">/dev/sda5 95M 33M 62M 35% /tmp reiserfs<br /></span></span></span><span style="color: rgb(51, 204, 255);"><span style="color: rgb(0, 204, 204);"><span style="color: rgb(51, 102, 255);">/dev/sda6 1.9G 492M 1.4G 26% /var reiserfs<br /></span></span></span><span style="color: rgb(51, 204, 255);"><span style="color: rgb(0, 204, 204);"><span style="color: rgb(51, 102, 255);">/dev/sda7 1.9G 33M 1.9G 2% /usr/local reiserfs<br /></span></span></span><span style="color: rgb(51, 204, 255);"><span style="color: rgb(0, 204, 204);"><span style="color: rgb(51, 102, 255);">/dev/sda8 5.6G 1.6G 4.1G 29% /usr reiserfs</span></span></span><br /><span style="color: rgb(51, 204, 255);"><span style="color: rgb(0, 204, 204);"><span style="color: rgb(51, 102, 255);">/dev/sda9 28G 653M 28G 3% /home reiserfs<br /><br />(note: I displayed this information by later typing "df -h" in a terminal window. 
sda4 is not present because it represents the extended partition<br />of the drive where sda5 - sda9 are within sda4)<br /><br />I made sure that the /boot partition was small so that it does not exceed the first 1024 cylinders at the beginning of the hard drive --which I guess would make this partition not bootable.<br /></span></span></span><br /><span style="color: rgb(51, 204, 255);"><span style="color: rgb(51, 102, 255);">skip this -- </span></span>( STEP 2 ) During the partitioning phase, I let the install program format my external USB drive. (I believe UBUNTU calls this a guided partitioning ... which sets up an ext2 or ext3 partition and a swap partition for you.)<br /><br />INSTALL NOTE: Look for the line during the partitioning phase that might say ... erase entire disk SCSI (0,0,0) (sda). Be careful here as some people have had problems when choosing any partitioning options that include the text "and use LVM" (Kudos to "brucetan" for sharing this in post #268.)<br /><br />Feel like learning about LVM anyway? Check this link out. (Kudos to "SD-Plissken" for sharing this info.)<br /><br />BE VERY CAREFUL on these screens to choose the correct SDA drive and NOT an HDA drive or you may unintentionally format another drive in your system. There is no undo button for this!<br /><br />Once again ... BE 100% SURE OF THE DRIVE YOU WANT TO FORMAT!<br /><br />( STEP 3 ) When the install gets to loading the GRUB bootloader ... DO NOT LET IT LOAD TO ANY OTHER DRIVE BUT THE EXTERNAL USB drive we are working with here. <span style="color: rgb(51, 102, 255);">-- This doesn't matter since the other drives should be disconnected.</span><br /><br /><span style="color: rgb(51, 102, 255);">Say YES when asked if you want to install Grub to the MBR (since the USB drive is the only one connected --ignore where is says "say NO" below)</span><br />The install program will ask to load GRUB to the master boot record (MBR) of your internal hard drive (HDA). 
Say NO to this, and on the next screen, type in the correct path to the SDA (external USB) drive where we want to install the GRUB bootloader.<br />(Mine was /dev/sda [NOT sda1] ... but yours may be different depending on the number and/or types of drives in your system)<br /><br />COMMENT: I loaded GRUB to sda (the bootsector of the external drive) and NOT to sda1 (which would be the first partition on the external drive).<br /><br />INSTALL NOTE: at this point, the install program loads some stuff and ejects the CD ... wanting you to do a reboot.<br /><br />( STEP 4 ) BE 100% SURE to leave the CD in the drive (and close the drive door) before rebooting. When the PC reboots, type in rescue (to load UBUNTU in rescue mode) ( Review REQUIREMENTS at the beginning of this guide! )<br /><br />Why do we startup in rescue mode you might ask? It's because we have to edit a few files to get USB support loaded before UBUNTU actually gets going. And, we also need to change a setting in the GRUB menu file to make it work correctly.<br /><br />( STEP 5 ) When the system comes back up it will ask for a partition to mount. Pick the correct mount point for your external drive from the list.<br />(Mine was mount /dev/discs/disc1/part1 ... but yours may be different depending on the number and/or types of drives in your system)<br /><br />COMMENT: /dev/discs mount points start with disc0 (with 0 meaning the first drive in a system). 
So, my mount point of /dev/discs/disc1/part1 was really the second disk [disc1] (the sda drive we are working with) and the first partition [part1] on that disk.<br /><br />( STEP 6 ) When it comes up to a terminal window (with RESCUE MODE in the upper left corner) and just sits there, hold down Ctrl-Alt-F2 to open another terminal window for us to do our edits in.<br /><br />( STEP 7 ) Type in these lines before we start editing out files ...<br /><br />mount -tproc proc /target/proc <enter><br />chroot /target <enter><br />su <enter><br /><br />INSTALL NOTE: I used vim to edit the files. (There is a great vim cheatsheet in pdf format available from "TimL".) It is weird to use at first until you learn what a few keys do in it. The INSERT key allows you to actually enter text where you place the cursor ... The ESC key takes you out of INSERT mode ... And hitting : x <enter> saves the file and exits out of vim. (IMPORTANT TIP ... that's a colon and a lowercase x with NO SPACE between the colon and the x)<br /><br />( STEP 8 ) Run vim to edit the modules file to make sure USB support is added/loaded during UBUNTU startup ...<br /><br />vim /etc/mkinitramfs/modules <enter><br /><br />Right below the last line of text, enter these lines ...<br /><br />ehci-hcd<br />usb-storage<br />scsi_mod<br />sd_mod<br /><br />Be sure to save the file changes (using : x)<span style="color: rgb(51, 102, 255);"></span><br /><br />COMMENT: Some PCs may need a more comprehensive list added here. Please take a look at post #181 if you are having startup issues after these edits. 
(Kudos to "brotorious" for his efforts on this.)<br /><br />( STEP 9 ) Run vim to edit the initramfs.conf file to make sure enough time elapses for USB support to load before UBUNTU gets running ...<br /><br />vim /etc/mkinitramfs/initramfs.conf<br /><br />At the very top of this file, add this line which tells UBUNTU to pause for 12 seconds before starting up ...<br /><br />WAIT=12 (in all caps here, not sure if necessary though)<br /><br />Be sure to save the file changes (using : x)<br /><br />INSTALL NOTE: Editing these two files loads the necessary commands to get USB support going so UBUNTU will recognize the external USB drive. But we still need to recompile (or recreate) the initrd.img that UBUNTU uses at startup ... so that these edits actually work.<br /><br />( STEP 10 ) Recompile (recreate) the initrd.img file to include USB support from these edited files ...<br /><br />mkinitramfs -o /boot/initrd.img-2.6.12-9-386 /lib/modules/2.6.12-9-386<br /><br />INSTALL NOTE: When you are done with the previous step #9 (and before performing this step), you can run the "ls /lib/modules" command (without the quote marks) from a terminal window to see what kernel version number you should use for this install step. They may be using a kernel version higher than 2.6.12-9-386 that was available when I originally created this help guide. (Kudos to "Saxegaard" for bringing this to my attention in post #235.)<br /><br /><span style="color: rgb(51, 102, 255);">Note: this entire step #11 might not be necessary if you disconnected the other hard drives from the computer. 
You may need to edit the "device.map" file though and make sure it contains the following line:<br /> (hd0) /dev/sda<br />If you DO end up having to edit menu.lst, make sure you type<br />"grub-install /dev/sda" (without the quotes of course) to update grub.<br /><br /></span>( STEP 11 ) Edit the GRUB bootloader menu file to correct a small error that looks at the wrong drive to boot from ...<br /><br />vim /boot/grub/menu.lst<br /><br />Navigate down this file until you get to a section where there is a menu list (not commented out ... no #s) that has Ubuntu mentioned three times (and possibly an area mentioning Windows XP down below it, if you have XP installed on an internal drive of yours).<br /><br />There is a line in each of these three Ubuntu menu choices that has root listed on it and probably has (hd1,0) to the right of it. We need to change this to (hd0,0) in all three of these menu choices. Why? Because according to GRUB, the external USB drive will be our first drive (hd0,0) and not our second drive (hd1,0), because we loaded GRUB on its boot sector.<br /><br />INSTALL NOTE: You will also want to change a "# groot" line in a section of your menu.lst file that may look something like this ...<br />## default grub root device<br />## e.g. groot=(hd0,0)<br /># groot=(hd0,3) ... change this line to # groot=(hd0,0)<br /><br />Be sure to save the file changes (using :x)<br /><br />( STEP 12 ) When you are done with these edits, it is time to reboot (don't press enter anywhere until the screen actually says to press enter). Hold down Ctrl-Alt-F1 to get back to the RESCUE MODE terminal window and type exit<enter> to reboot the system.<br /><br />BE 100% SURE TO GET THE CD OUT OF THE DRIVE BEFORE UBUNTU RESTARTS<br /><br />( STEP 13 ) After rebooting, UBUNTU continues to run its install process and comes to the desktop. Use the username and password you setup earlier in the install process to get into UBUNTU<br /><br /><br />That's the process I've successfully used to install UBUNTU v5.10 on an external USB drive. I hope this helps anyone who's been wrestling with this process. 
Let me know if it works out for you.<br /><br />DAVE<br /><br /><span style="color: rgb(51, 102, 255);">Reconnect all of your hard drives and boot up the computer. (Make sure you tell your BIOS to boot from the USB drive.)<br />If GRUB never loads and instead gives you an error message, then you may need to install GRUB to your main hard drive. Before doing this, try plugging your USB drive into a different computer and see if it boots up. If you need to install GRUB to your IDE hard drive, then follow the steps in this article to trick the Ubuntu installer into skipping all steps and installing only the GRUB bootloader:<br /><br />http://ubuntuforums.org/showthread.php?t=76652<br /><br />You can then copy the new kernel and initrd images you created in previous steps to the IDE hard drive and add them to the boot loader. To figure this out, I simply looked at the other entries listed in the bootloader and applied their format to the new kernel and initrd images.<br /></span><br /><br />OTHER PROBLEMS AND POTENTIAL FIXES ...<br /><br />* Booting Ubuntu from Windows XP without loading GRUB to the MBR ...<br />http://www.ubuntuforums.org/showthread.php?t=56723<br />(Kudos to "celticmonkey" for documenting this.)<br /><br />* Problems booting XP from an internal drive ...<br />( message #2 ) + ( message #203 )<br /><br />* Problems starting UBUNTU after a kernel update using Ubuntu's automatic update app ...<br />( message #112 ) + ( message #161 )<br /><br />* Using Windows XP to format drives larger than 32GB as FAT32 ...<br />( message #188 )<br /><br />* Re-installing GRUB in a relatively painless manner ...<br />( message #205 ) (Kudos to "RyanGT" for hunting this down.)<br /><br />* Creating a bootable GRUB CD (sketchy details but seems to make sense) ...<br />( message #207 ) + http://www.gnu.org/software/grub/man...D_002dROM.html<br /><br />REFERENCE MATERIALS ...<br /><br />* GRUB Manual (section 14.3 shows what error codes mean)<br 
/>http://www.gnu.org/software/grub/manual/grub.html<br />__________________<br />( If you don't care who gets the credit ... great things can be accomplished. )<br />Setup Guide: "Installing Ubuntu on an external USB Drive"<br />Good Starter Book: "Linux Desktop Pocket Guide"<br /><br /><br /></enter></enter></enter></enter></enter></enter></enter>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1143146015988501042006-03-23T12:30:00.000-08:002006-03-23T12:33:35.986-08:00Soda Can Detection So FarHere is an example of my second scale strong classifier running on some video footage. It seems to do well towards the middle, where there is little background clutter and the soda cans are about the size of the detection rectangle. I will have to do something about those false positives though.<br /><br />NOTE: this video requires a DivX decoder even though it is in WMV format.<br /><br /><a href="http://members.cox.net/fbirchmore/testcan-2_1.5.wmv">http://members.cox.net/fbirchmore/testcan-2_1.5.wmv</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1143085919224950462006-03-22T19:30:00.000-08:002006-03-22T19:51:59.256-08:00Final PowerPoint PresentationHere is a link to a PDF of my final PowerPoint presentation:<br /><br />http://members.cox.net/fbirchmore/cse%20190a/soda_cans.pdfUnknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1142139554188497692006-03-11T18:02:00.000-08:002006-03-11T22:26:02.120-08:00Detecting Soda Cans Using C++I have successfully migrated from my Matlab implementation to Louka's C++ code for recognizing license plates. I had been having all sorts of problems getting his code to compile on my laptop. Fortunately, the code is running on Greg's computer, so I finally resorted to using ssh to log into his computer in order to run Louka's code. 
<br /><br />The C++ implementation works by reading in a text file where each line specifies the name of an image followed by the bounding box coordinates of a license plate in that image. Here is an example:<br /><br />images/image1.jpg 20,20,80,80<br /><br />This means that in the image "image1.jpg" in the directory "images", there is a license plate with the upper-left corner at (20,20) and the lower-right corner at (80,80). Interestingly, negative training examples (ie: "not a license plate") are acquired by randomly selecting areas that do not overlap with the bounding boxes of the license plates. Unlike my method for training my Matlab implementation, background images can be obtained from the areas immediately surrounding license plate locations. In my Matlab implementation for detecting soda cans, cropped images with the soda cans removed have to be fed in to extract negative training examples -- meaning that the areas immediately surrounding soda cans may yield a large number of false positive classifications (ie: detecting background clutter as a soda can). Another very important advantage of Louka's implementation is that areas containing soda cans can be of ALL DIFFERENT SIZES, since a large detection window can overlap with the background. In addition, the quality of a detection can be determined by offsetting the bounding boxes for some of the training images and adding these offset images as additional positive training examples -- causing "good" detections to be recognized by the presence of many different rectangles in the areas surrounding a detected object (as Louka did in his thesis). <br /><br /><span style="font-weight:bold;">Modifying Louka's C++ Code to Accommodate Soda Cans</span><br /><br />Instead of rebuilding my entire training set of soda cans using a different method, I decided to start by modifying Louka's code to accommodate my training set. 
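The annotation format and the random negative-sampling idea described above can be sketched in a few lines of Python (a hedged illustration, not Louka's actual C++ code; all function names here are made up):

```python
import random

def parse_annotation(line):
    """Parse one line like 'images/image1.jpg 20,20,80,80' into
    (filename, (x1, y1, x2, y2))."""
    name, coords = line.split()
    x1, y1, x2, y2 = (int(c) for c in coords.split(","))
    return name, (x1, y1, x2, y2)

def overlaps(a, b):
    """True if two (x1, y1, x2, y2) boxes intersect."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def sample_negative(img_w, img_h, positives, box_w, box_h, rng=random):
    """Randomly pick a box_w x box_h window that overlaps no positive box.
    (Assumes such a window exists; otherwise this loops forever.)"""
    while True:
        x = rng.randrange(img_w - box_w)
        y = rng.randrange(img_h - box_h)
        cand = (x, y, x + box_w, y + box_h)
        if not any(overlaps(cand, p) for p in positives):
            return cand
```

For example, `parse_annotation("images/image1.jpg 20,20,80,80")` yields `("images/image1.jpg", (20, 20, 80, 80))`, and `sample_negative` captures the point that negatives can come from anywhere in the image, including right next to a labeled object.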
After modification, his code treated each positive input image as containing only a single object to be detected. Negative examples were extracted by randomly extracting portions of a separately specified set of images containing only background (as in my Matlab implementation). I created each list of image filenames and bounding boxes with a Matlab script that reads all of the images in a directory, obtains their dimensions, and outputs them to a text file.<br /><br /><span style="font-weight:bold;">Training the Soda Can Detector using Louka's C++ code</span><br /><br />I initially provided 127 images of soda cans and randomly extracted 1000 background images. For this test, all soda cans and background images had the same pixel dimensions of 22 x 36. I trained a strong classifier using about 2000 filters generated by Louka for 14 rounds. I was astonished to see that it finished training in only a few minutes!! I tried increasing the number of negative examples to 1000, 10000, and 17000. Even with these large numbers of negative examples, training finished in less than an hour for each! With my Matlab implementation, training on 127 soda can images and 1000 background images took 24 hours. I believe the enormous difference in training time can be partially attributed to the fact that Matlab runs through Java and that "for" loops are slow. Matlab is designed to perform matrix calculations efficiently, but I mostly used "for" loops and cell arrays. In my Matlab implementation, I also used around 7000 filters while Louka used about 2000. Louka also uses a Cascade of classifiers to optimize both his training and his detection time -- though only one layer was used to train the aforementioned set of images in less than an hour. <br /><br /><span style="font-weight:bold;">Cascade of Classifiers</span><br /><br />The Cascade optimization technique involves training N layers where each layer represents a group of weak classifiers selected by Adaboost. 
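In code, evaluating such a cascade on one detection window might look like the following minimal sketch (a toy illustration with hypothetical interfaces, not Louka's implementation -- each layer is a weighted vote of weak classifiers against a threshold, as in Adaboost):

```python
def cascade_classify(window, layers):
    """Run a detection window through a cascade of strong classifiers.

    `layers` is a list of (weak_classifiers, threshold) pairs, where each
    weak classifier is an (hypothesis_fn, weight) pair.  A window counts as
    a detection only if every layer accepts it; most background windows are
    rejected cheaply by the early layers and never reach the later ones."""
    for weak_classifiers, threshold in layers:
        score = sum(weight * h(window) for h, weight in weak_classifiers)
        if score < threshold:
            return False   # rejected early -- no later layer runs
    return True            # survived every layer
```

With toy scalar "windows" and lambda hypotheses, a two-layer cascade rejects most inputs at the first cheap layer, which is exactly the speedup the text describes; in the real detector the hypotheses would threshold Haar-like filter responses on an image region.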
The first layer is trained using all positive and negative training examples (ie: soda cans and non-soda cans). Each layer thereafter is trained with only the false positives (ie: background images classified as soda cans) used as the negative examples. This allows the early stages of the classifier to reject a majority of the windows immediately, with only images of soda cans going through all layers of the cascade. In a typical input image, only a very small portion of the image will contain soda cans. Although a window classified as a soda can must travel through all stages of the cascade to be classified correctly (which takes longer than a single strong classifier for that particular window), most areas of the image will be rejected immediately. If, for example, an input image contained a single soda can against a very sparse background, then the area containing the soda can would be run through all layers of the Cascade while the other areas of the image might only be run through a single layer. This results in a much faster training time and detection time at the expense of a higher number of false positives at the later stages of the Cascade. To reduce the number of false positives, the later stages of the Cascade require a larger number of weak classifiers. Here is a diagram from Louka's paper illustrating how a Cascade works:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/cascade.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/634/1895/320/cascade.jpg" border="0" alt="" /></a><br /><br /><span style="font-weight:bold;">Detection Results</span><br /><br />After I trained a single-stage classifier using 127 positive and about 17000 negative images with a single scale of soda can (22x36 pixels), the resulting strong classifier classified all 10 test images correctly. 
However, when I ran the same classifier on the input image shown in my previous blog post, my detection rate was much lower. I decided that if I want to train a multiple-layer cascade and use multiple scales, I would have to revert to Louka's original C++ implementation and rebuild my training set to fit this structure. <br /><span style="font-weight:bold;"><br />Rebuilding the Training Set</span><br /><br />I decided that this time I would not try to build an image annotation toolset from scratch as I did with the letter 'e'. I explored the internet and found that the popular online annotation tool "LabelMe" has a Matlab toolbox:<br /><br /><a href="http://people.csail.mit.edu/torralba/LabelMeToolbox/">http://people.csail.mit.edu/torralba/LabelMeToolbox/</a><br /><br /><br />I also discovered that they have an object recognition algorithm written in Matlab using "GentleBoost", which is a variation of Adaboost.<br /><br /><a href="http://people.csail.mit.edu/torralba/iccv2005/boosting/boosting.html">http://people.csail.mit.edu/torralba/iccv2005/boosting/boosting.html</a><br /><br />However, I decided not to use this toolbox because its output format is XML and I wanted a plain text file (I found out later that this would have been alright). I searched the internet some more and came across an annotation toolset:<br /><br /><a href="http://www.pascal-network.org/challenges/VOC/voc2005/">http://www.pascal-network.org/challenges/VOC/voc2005/</a><br /><br />This toolset cycles through all images in a directory and lets the user draw bounding boxes around the objects in the scene. The objects are then manually assigned a label. 
I created four different labels for four different types of Pepsi cans, described as follows:<br /><br />1 = upright, unoccluded Pepsi can<br />2 = upright, partially occluded Pepsi can<br />3 = horizontal, unoccluded Pepsi can<br />4 = horizontal, partially occluded Pepsi can<br /><br />I then went through all of the same images from which I extracted my original training set and drew a bounding box around each Pepsi can in the image. Next, I wrote a Matlab script to read in all of these extracted objects and export their names and bounding boxes to a single text file in a format recognizable by Louka's C++ code. I then converted the bounding box portions into a CSV file and loaded it into OpenOffice Calc so I could organize these coordinates into different scales, etc. I decided to find the largest bounding box and use its size as the window for my strong classifier. Unlike my first training set, which took about 7 hours to create, this training set took only a few hours to create and is much more useful. I labeled a total of 698 soda cans at a variety of different scales. Here is an example of one of the annotated images:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/trainexample.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/634/1895/320/trainexample.jpg" border="0" alt="" /></a><br /><br /><span style="font-style:italic;">Notice that another of SPAWAR's robots, called the <a href="http://www.spawar.navy.mil/robots/research/TechXfer/TechXfer.html">ATRV</a>, has been used to clutter the background of the soda cans. 
This robot has a variety of detail with which to test the performance of my soda can detection algorithm, as well as enforce Robart III's ability to avoid targeting a fellow robot.</span><br /><br />I then trained a strong classifier using all 698 soda can images and about 6000 negative examples.<br />I will find out on Monday whether this classifier trained successfully.<br /><br /><span style="font-weight:bold;">The Next Step</span><br /><br />The next step will be to once again organize the soda can images into different scales, only this time the images will not be cropped and only the detection window will be scaled (the bounding boxes will not be modified). I will also add additional layers to the cascade for each image size. Although Louka mentioned in his thesis that license plate detection rates were not as high when input images were scaled in a pyramid, I may try this approach for soda cans just to see what happens.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-19192880.post-1141671702129343592006-03-06T10:42:00.000-08:002006-03-06T12:16:47.990-08:00Testing Soda Can RecognitionThis week, I modified the letter 'e' algorithm so that soda cans can be detected. I updated my Adaboost implementation to store filters and detect images using HSV channels (hue, saturation, value) instead of simply grayscale or RGB. With HSV, the value channel can be filtered to achieve the same results as filtering a grayscale image. If I choose to later add color filtering, I can simply take the other two channels into account. Here is a link to the Wikipedia entry for HSV:<br /><a href="http://en.wikipedia.org/wiki/HSV_color_space">http://en.wikipedia.org/wiki/HSV_color_space</a><br /><br /><span style="font-weight: bold;">New Filter Types</span><br /><br />In addition to adding HSV filtering capabilities, I also added additional types of filters. These filter types are the same ones used by Dlagnekov in his LPR/MMR paper. 
The new filters use Haar-like features as before, only they operate on derivatives and variance instead of simply pixel values. The types of filters added include: X-derivative, Y-derivative, X-derivative-variance, and Y-derivative-variance. Each filter type is assigned a number which is stored with each filter (ie: X-derivative = 3). If a filter is known to operate on the X-derivative of an image and this particular filter consists of "1"s in the top half and "0"s in the bottom half, then the sum of the pixels in the bottom half of the X-derivative of the image will effectively be subtracted from the sum in the top half. To get the X-derivative of an image, a Sobel kernel is convolved with the input image at each pixel location. This is accomplished by calling the Matlab function 'conv2' with the input image and a Sobel kernel created with 'fspecial' as arguments. Here are the links to the Matlab help pages for these two functions:<br /><a href="http://www.mathworks.com/access/helpdesk/help/toolbox/images/fspecial.html">http://www.mathworks.com/access/helpdesk/help/toolbox/images/fspecial.html</a><br /><a href="http://www.mathworks.com/access/helpdesk/help/techdoc/ref/conv2.html">http://www.mathworks.com/access/helpdesk/help/techdoc/ref/conv2.html</a><br /><br /><span style="font-weight: bold;">Filters Generated<br /><br /></span>For my initial soda can classifier, I chose to use only one scale of training images -- size category 2 (22 x 36 pixels). To detect soda cans at this scale, I generated a new set of filters that I thought would work well for detecting features of these soda cans. I generated three different categories of filters of sizes 22 x 36, 22 x 8, and 16 x 16. My intent was that the 22 x 8 filters would extract features from the top and sides of the soda can and the 16 x 16 filters would extract features from the corners and the logo. 
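As a small illustration of these derivative-based features, here is a pure-Python sketch (not the Matlab code described above): it performs a 2-D 'valid' convolution with a Sobel x-derivative kernel, then evaluates a top-minus-bottom Haar-like filter on the result. All function names are hypothetical; the kernel is the transpose of the one Matlab's fspecial('sobel') returns.

```python
SOBEL_X = [[1, 0, -1],
           [2, 0, -2],
           [1, 0, -1]]   # x-derivative kernel (transpose of fspecial('sobel'))

def conv2_valid(img, kernel):
    """2-D convolution, 'valid' region only, like conv2(img, k, 'valid')."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            # conv2 flips the kernel before the sliding dot product
            s = sum(kernel[kh - 1 - u][kw - 1 - v] * img[i + u][j + v]
                    for u in range(kh) for v in range(kw))
            row.append(s)
        out.append(row)
    return out

def haar_top_minus_bottom(img):
    """Haar-like feature: sum of the top half minus sum of the bottom half."""
    half = len(img) // 2
    top = sum(sum(r) for r in img[:half])
    bottom = sum(sum(r) for r in img[half:])
    return top - bottom

def x_derivative_feature(img):
    """Feature response of a top/bottom filter on the x-derivative image."""
    return haar_top_minus_bottom(conv2_valid(img, SOBEL_X))
```

On a constant image the x-derivative is zero everywhere, so the feature is 0; a vertical intensity edge produces a nonzero response, which is the kind of structure these filters are meant to pick up on can boundaries.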
For each of these filter sizes, I generated 7 different types of feature computation that add/subtract values from the pixel intensities of the following:<br /><br />1. original image<br />2. x-derivative image<br />3. x-derivative-squared image<br />4. x-derivative variance<br />5. y-derivative image<br />6. y-derivative-squared image<br />7. y-derivative variance<br /><br />The squared derivatives were computed so that the variance could easily be obtained, but I decided to generate separate groups of filters for them just to see what happens.<br /><br /><span style="font-weight: bold;">Test Results</span><br /><br />Using the aforementioned configuration, I trained Adaboost for 20 rounds using 127 images of 22 x 36 soda cans (positive training images) and 480 images of 22 x 36 random background (negative training images). Training took an unreasonably long time -- about 8 hours. This is probably because I am calculating the derivative of each image as I obtain its filter responses. This can be fixed by precomputing the derivatives of the training images, which should result in a dramatic increase in both training and testing efficiency.<br /><br />After Adaboost finished training, the detection rates for the test images looked pretty good, with a classification rate of 95.2224% and the remaining 4.7776% due to false negatives (ie: it is a soda can but was not classified as one). These results were obtained from testing 10 positive and 20 negative images that were removed from the training set prior to running Adaboost. 
However, after running the strong classifier on a full image that was scaled to 70% (so the soda cans are around 22 x 36 pixels), the results were not nearly as good:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/634/1895/1600/test1.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/634/1895/320/test1.jpg" alt="" border="0" /></a><br /><br />I think that the false positives appearing around the tires may be attributed to not using enough negative training examples (background examples). I tried re-training the same strong classifier using 1000 negative examples. This took about 24 hours and yielded results that were not much better. Dlagnekov reports that he used about 10,000 negative training examples. Since I am not willing to wait a week to train a new strong classifier, the next step will be to implement the Integral Images acceleration structure (as Dlagnekov did) and precompute the image derivatives. <br /><br /><span style="font-weight:bold;">Integral Images (in progress)</span><br /><br />I have now finished implementing Integral Images and have precomputed the image derivatives. Once my implementation is working, I will re-train my classifier and hopefully achieve better and faster results.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1141025541184713742006-02-26T23:31:00.000-08:002006-02-26T23:32:21.196-08:00Creating Soda Can Training SetHere is the link:<br /><a href="http://members.cox.net/fbirchmore/cse%20190a/blog%202-26-06/2-26-06.html"><br />http://members.cox.net/fbirchmore/cse%20190a/blog%202-26-06/2-26-06.html</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-19192880.post-1140504063431323382006-02-20T22:37:00.000-08:002006-02-20T22:41:03.443-08:00Detecting the Letter 'e': success!To see my progress, click the provided link. 
I've grown tired of trying to fit everything into these little narrow blog spaces. After fumbling around with images, I activated my ISP web space.<br /><br /><a href="http://members.cox.net/fbirchmore/cse%20190a/blog%202-20-06/2-20-06a.html">http://members.cox.net/fbirchmore/cse%20190a/blog%202-20-06/2-20-06a.html</a>Unknownnoreply@blogger.com0