Detecting Soda Cans Against a Cluttered Background: May 2006

Saturday, May 27, 2006

Learning Perl

It took me quite a while to extract the number of keypoints detected for each image and enter them into a spreadsheet. To make future tasks easier, I decided to learn Perl scripting, so I read over some tutorials and books. Using what I've learned, I wrote a script that removes keypoint files and their corresponding images from the database if fewer than 3 keypoints were detected in them. It does this by parsing the output text file produced when detecting keypoints in the database directory. Now that I can write Perl scripts, I should be able to accomplish many tasks faster by automating the more tedious work.

Here is a sample of the output produced when creating the database keypoints:

extracting keypoints: ./database/s06.pgm -> ./database/s06.key
Finding keypoints...
22 keypoints found.
extracting keypoints: ./database/ss01.pgm -> ./database/ss01.key
Finding keypoints...
2 keypoints found.
extracting keypoints: ./database/ss02.pgm -> ./database/ss02.key
Finding keypoints...
3 keypoints found.
extracting keypoints: ./database/ss03.pgm -> ./database/ss03.key
Finding keypoints...
0 keypoints found.

Here is the Perl script I wrote. It is called by passing in the name of the output text file (a sample of which is above). The # signs mark comments:
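#!/usr/bin/perl
my $rmkp = ">./rmkp.txt";
my $line;
my $prevLine;
# $ARGV[0]: output text filename from keypoint detection.
open( FILE, "$ARGV[0]" ) or
    die "Cannot open input file: $ARGV[0] : $!\n";
open( RMKP, $rmkp ) or
    die "Cannot open log file: $rmkp : $!\n";
print RMKP "Files Deleted From Database:\n\n";
while ( $line = <FILE> ) {
    # ([^>]*) match zero or more characters but not '>'
    # if line matches regexp
    if ( $line =~ m/extracting keypoints:([^>]*)>/ ) {
        # remove the words and the .pgm path in front of '>'
        $line =~ s/extracting keypoints:([^>]*)>//g;
        # remove file extension
        $line =~ s/\.key//g;
        # remove whitespace
        $line =~ s/\s//g;
        # store the base path, e.g. ./database/s06
        $prevLine = $line;
    }
    # if number of keypoints found, store number
    if ( $line =~ m/keypoints found\./ ) {
        $line =~ s/keypoints found\.//g;
        $line =~ s/\s//g;
        # if less than 3 keypoints
        if ( $line < 3 ) {
            # The listing is cut off at this point; the lines from here on
            # are reconstructed from the description above and the log below.
            unlink( "${prevLine}.key", "${prevLine}.pgm" );
            print RMKP "$line ${prevLine}.key ${prevLine}.pgm\n";
        }
    }
}
close(FILE);
close(RMKP);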

Here is the log file, rmkp.txt, generated by the Perl script. This shows the number of keypoints for the associated keypoint and image files that were deleted:
Files Deleted From Database:

2 ./database/ss01.key ./database/ss01.pgm
0 ./database/ss03.key ./database/ss03.pgm
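
The same output file can also be turned into spreadsheet rows directly, which is the job that took so long by hand. Here is a minimal sketch of that idea, written in Python rather than Perl. The names parse_keypoint_log and to_csv and the CSV column headings are made up for illustration, and the sketch assumes the output always follows the three-line pattern shown in the sample above.

```python
import csv
import io
import re

# Sample copied from the detector output shown earlier in this post.
SAMPLE = """extracting keypoints: ./database/s06.pgm -> ./database/s06.key
Finding keypoints...
22 keypoints found.
extracting keypoints: ./database/ss01.pgm -> ./database/ss01.key
Finding keypoints...
2 keypoints found.
extracting keypoints: ./database/ss02.pgm -> ./database/ss02.key
Finding keypoints...
3 keypoints found.
extracting keypoints: ./database/ss03.pgm -> ./database/ss03.key
Finding keypoints...
0 keypoints found.
"""


def parse_keypoint_log(text):
    """Return a list of (pgm_path, key_path, keypoint_count) tuples."""
    rows = []
    current = None  # paths from the most recent "extracting keypoints" line
    for line in text.splitlines():
        m = re.match(r"extracting keypoints:\s*(\S+)\s*->\s*(\S+)", line)
        if m:
            current = (m.group(1), m.group(2))
            continue
        m = re.match(r"(\d+) keypoints found\.", line)
        if m and current is not None:
            rows.append((current[0], current[1], int(m.group(1))))
            current = None
    return rows


def to_csv(rows):
    """Format the rows as CSV text that a spreadsheet can import."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["image", "keyfile", "keypoints"])  # hypothetical headings
    writer.writerows(rows)
    return out.getvalue()


rows = parse_keypoint_log(SAMPLE)
too_few = [row for row in rows if row[2] < 3]  # same threshold as the Perl script
print(to_csv(rows))
```

The too_few list holds the same two entries that the Perl script logged as deleted, so the sketch could also serve as a dry run before anything is removed.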

Wednesday, May 24, 2006

Analysis of Keypoint Matching

Image 1: table of keypoint matches (rows on the left matched with columns on the top)
Image 2: the same table, with entries set to zero where there were fewer than 2 matches or where an image was matched to itself.
Image 3: a graph of the table in Image 2.

Today, I graphed the keypoint matches. The x-axis of the graph above represents the keypoints from the input images, and the y-axis represents the keypoints from the database. The colors indicate how many successful keypoint matches occurred. I removed all results that had fewer than 2 matching keypoints, as well as results where an image was matched against itself. One strange result is that matching is not symmetric: certain images match very well when, for instance, "match image1 image2" is run, yet the same images yield no matches at all when "match image2 image1" is run. For instance, s05.pgm has 64 matches when matched with 04.pgm, while no matches occur when 04.pgm is matched with s05.pgm. Because of s05.pgm's angle and small resolution of around 120x120 pixels, very few stable keypoints are found in it. In contrast, 04.pgm has a high resolution, with around 400 keypoints found. It could be that s05.pgm's keypoints are very similar to each other because of the low pixel resolution of the image. When they are matched against the keypoints from 04.pgm, perhaps 04.pgm has so many keypoints that enough of them happen to lie close to the keypoints in s05.pgm to indicate a successful classification.
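
The filtering and the one-directional matches can be written down as a small sketch in Python. This is only an illustration: the table is keyed by (first image, second image) in the order passed to "match", the two s05.pgm/04.pgm entries come from the results above, and the two self-match counts are made-up placeholders. The function names are my own.

```python
def filter_matches(table, min_matches=2):
    """Zero out self-matches and entries with fewer than min_matches matches.

    table maps (first_image, second_image) -> number of keypoint matches.
    """
    filtered = {}
    for (first, second), count in table.items():
        if first == second or count < min_matches:
            filtered[(first, second)] = 0
        else:
            filtered[(first, second)] = count
    return filtered


def asymmetric_pairs(table):
    """List pairs that match in one direction but not at all in the other."""
    pairs = []
    for (a, b), count in table.items():
        if a != b and count > 0 and table.get((b, a), 0) == 0:
            pairs.append((a, b))
    return sorted(pairs)


table = {
    ("s05.pgm", "04.pgm"): 64,    # reported above
    ("04.pgm", "s05.pgm"): 0,     # reported above
    ("04.pgm", "04.pgm"): 400,    # placeholder self-match count
    ("s05.pgm", "s05.pgm"): 5,    # placeholder self-match count
}

filtered = filter_matches(table)
suspicious = asymmetric_pairs(filtered)
print(suspicious)
```

Listing the one-directional pairs separately makes it easier to decide whether a high count such as the 64 matches is trustworthy or should be treated as an outlier.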

Though this table does not provide a complete set of test cases, it suggests that high-resolution images match better with other high-resolution images than low to high, low to low, or high to low. I believe that the high number of matches between the high-resolution images and s05.pgm is an anomaly and does not indicate that this particular image can reliably be compared to others.

If too many keypoints are stored in the database images, the computation time required to match new input images to these might be too great for real-time performance. Since Lowe was able to achieve good results using 600x480 pixel input images with smaller resolution training images, I will try doing the same. Now that I am more familiar with SIFT, I will re-read Lowe's SIFT paper. I will try to jump the gun and use the methods provided in Lowe's code to detect soda cans using footage from Robart III as the next step. This way, I will have a better idea about what problems arise. I will then look over Lowe's suggestions for narrowing down the range of useful keypoint matches using a Hough Transform etc.
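
To get a feel for the computation-time concern, here is a rough back-of-the-envelope sketch in Python. It assumes exhaustive matching, where every keypoint of the input image is compared against every keypoint in the database, and it only counts operations rather than measuring time. The figure of about 400 keypoints comes from 04.pgm; the database of ten such images is hypothetical.

```python
def brute_force_cost(query_keypoints, database_keypoint_counts, dims=128):
    """Count the work needed to match one input image exhaustively.

    Returns (distance_computations, per_dimension_operations), where each
    distance computation touches all `dims` entries of the descriptor.
    """
    comparisons = query_keypoints * sum(database_keypoint_counts)
    return comparisons, comparisons * dims


# One input image with ~400 keypoints against ten database images of ~400 each.
comparisons, operations = brute_force_cost(400, [400] * 10)
print(comparisons, operations)
```

The cost grows with the product of the two keypoint counts, which is why smaller training images, and therefore fewer database keypoints, look attractive.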

Monday, May 22, 2006

Matching Keypoints

Today, I matched soda-can keypoints to each other. Conveniently, C code for doing this is provided with David Lowe's keypoint generation code. Using the above set of images, I found keypoints for each image and matched them with the keypoints from all of the other images. The images with a prefix of "s" are copies of the images without an "s" that have been scaled down to 3% of their original size and then scaled up to 600%. The resulting "s" images are around 150x240 pixels. The other images are about 1100x2100 pixels. This was done as a test to see how much of an effect the resolution of the images has on extracting keypoints, which angles have the most matchable keypoints, and whether soda-can images of similar resolution match better than images of dissimilar resolution. I will post the results from this small experiment, along with a table and a graph, in my next blog entry.

This is how keypoint matching works between two images, referred to as A and B:

1. Keypoints are found in A.
2. Keypoints are found in B.
3. The closest matching keypoints are found between the images. Each keypoint is a 128-dimensional vector. The distance between two keypoints is the Euclidean distance between their feature vectors. Each keypoint in A is compared with each keypoint in B by Euclidean distance to find the closest matching keypoint. A match is determined as follows:

Let a' be a point in A. Let b' and b'' be the first and second closest matching points in B to the point a'. Let D(x,y) represent the Euclidean distance between its arguments x and y.

If D(a', b') < 0.6 * D(a', b'') then the closest point is chosen.
Else, no points are chosen (no matching keypoint).

According to Lowe, simply taking the closest matching keypoint does not yield a reliable match. He gets a more reliable result by also looking at the second-closest keypoint: its distance, scaled by 0.6, serves as the threshold that the closest keypoint's distance must beat. A keypoint whose two nearest neighbors are almost equally far away is ambiguous, so it is discarded.
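
The matching rule above can be sketched in a few lines of Python. This is not Lowe's C code, just a minimal illustration of the rule as I described it. The function names are my own, the toy descriptors are 2-dimensional instead of 128-dimensional so they are easy to check by hand, and returning no match when B has fewer than two keypoints is an assumption on my part.

```python
import math


def euclidean(u, v):
    """Euclidean distance D(u, v) between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def match_keypoint(a, keys_b, ratio=0.6):
    """Return the index in keys_b that matches descriptor a, or None."""
    if len(keys_b) < 2:
        return None  # assumption: no second-closest point, so no threshold
    dists = sorted((euclidean(a, b), i) for i, b in enumerate(keys_b))
    (d1, best), (d2, _) = dists[0], dists[1]
    # Accept only if the closest point is clearly closer than the runner-up.
    return best if d1 < ratio * d2 else None


def match_images(keys_a, keys_b, ratio=0.6):
    """Return (index_in_a, index_in_b) pairs for all accepted matches."""
    matches = []
    for i, a in enumerate(keys_a):
        j = match_keypoint(a, keys_b, ratio)
        if j is not None:
            matches.append((i, j))
    return matches


# Toy descriptors (made up for illustration).
A = [(1.0, 0.0), (10.0, 0.5)]
B = [(0.0, 0.0), (10.0, 0.0), (10.0, 1.0)]

a_to_b = match_images(A, B)
b_to_a = match_images(B, A)
print(len(a_to_b), len(b_to_a))
```

With these toy values, the second point of A sits exactly between two points of B and is rejected as ambiguous, while every point of B finds a clear winner in A. The number of matches therefore depends on which image is passed first.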

Wednesday, May 10, 2006

SIFT Keypoints: preliminary results

One question has arisen with regard to keypoints: will soda cans in low-resolution video footage yield enough keypoints to perform SIFT effectively?

Running the Keypoint Detector

I downloaded the keypoint detection code from David Lowe's website:
http://www.cs.ubc.ca/~lowe/keypoints/

David Lowe is the creator of SIFT. The code is distributed as a compiled binary and was extremely easy to run on Linux. There are apparently no required libraries or dependencies to run this code. I took some photos with a digital camera and cropped out the regions containing soda cans from the background. From what I've seen so far, it appears that SIFT detects keypoints that represent characteristics within an object. I believe this means that the outline of a soda can cannot be taken into account for object detection. This may prove to be a problem if the objects to be detected do not have many detailed features. To test this theory, I would like to try spray-painting a soda can a solid color and see if any SIFT keypoints are extracted. Here are some images of detected keypoints:

SCALED DOWN TO 1.5%:

SCALED DOWN TO 1.5% THEN SCALED UP TO 600%:

SCALED DOWN TO 3.0%:

SCALED DOWN TO 3.0% THEN SCALED UP TO 600%:

These cans all started out as high-resolution images. The images were then scaled down to 1.5% or 3.0% of their resolution in an attempt to simulate soda cans at a typical scale encountered in live video footage. According to the README file that came with Lowe's keypoint detection code, more keypoints can be detected if an image is first scaled to a larger size. In fact, the number of detected keypoints typically doubled when a low-resolution image was scaled up. The Pepsi cans with a prominent logo yielded the most keypoints. The soda can in the top-most image yielded no keypoints at all. There are many cases where the specular highlights appeared to yield keypoints. These cases would be taken care of at a later stage, where keypoints from different training images taken under different lighting conditions would be analyzed to determine which keypoints are invariant to scale and lighting. The more I think about it, the more appealing a hybrid solution to object detection seems, one that would incorporate data from multiple sources to arrive at a conclusion. For instance, perhaps some sort of texture recognition or color detection method could be used in conjunction with SIFT to detect objects with both sparse and dense features. I will continue to investigate how SIFT can be applied to soda can detection and find out what image resolutions I have to work with. If SIFT proves to work well for objects up close, I might use some other method to detect objects of interest and then zoom in on those objects before applying SIFT.

The next step will be to test out keypoint matching between these detected keypoints.
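
To work with the detected keypoints outside of Lowe's tools, the .key files first have to be read. Here is a minimal sketch in Python. It assumes the text layout as I understand it from the README: two integers giving the number of keypoints and the descriptor length (128), then for each keypoint four floating-point numbers (row, column, scale, orientation) followed by the descriptor values as integers. Treat that layout as an assumption to check against real files. The function name and the tiny sample with a descriptor length of 4 are made up for illustration.

```python
def read_key_text(text):
    """Parse the contents of a .key file into a list of keypoint dicts."""
    tokens = text.split()
    count, length = int(tokens[0]), int(tokens[1])
    keypoints = []
    pos = 2
    for _ in range(count):
        # Location part: row, column, scale, orientation.
        row, col, scale, orientation = (float(t) for t in tokens[pos:pos + 4])
        pos += 4
        # Descriptor part: `length` integers.
        descriptor = [int(t) for t in tokens[pos:pos + length]]
        pos += length
        if len(descriptor) != length:
            raise ValueError("file ended in the middle of a keypoint")
        keypoints.append({
            "row": row,
            "col": col,
            "scale": scale,
            "orientation": orientation,
            "descriptor": descriptor,
        })
    return keypoints


# Made-up sample: 2 keypoints with descriptors of length 4 instead of 128.
SAMPLE_KEY = """2 4
10.5 20.25 1.5 -0.3
 1 2 3 4
30.0 40.0 2.0 3.0
 5 6 7 8
"""

keypoints = read_key_text(SAMPLE_KEY)
print(len(keypoints))
```

Because the descriptor length is taken from the header, the same function should handle real 128-value descriptors without changes.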

New Milestones: Evaluating SIFT

I have a new set of Milestones to accomplish for this quarter. I'm very busy this quarter so I will try to accomplish as many of them as I can.

Understand SIFT thoroughly - meet by 5/10/06

The purpose of this milestone is to understand SIFT well enough to be able to set up an appropriate training set and to plan out specific implementation details. This will include locating existing SIFT-related source code and determining what software should be used (e.g. shell scripting, ImageMagick, Matlab, OpenCV, C++ etc.). Last quarter, I implemented everything in Matlab first before attempting to implement it in C++. This quarter, I will give preference to C/C++ and use Matlab to aid in my understanding of SIFT or to generate charts, graphs etc. I've found shell scripting to be incredibly helpful in all aspects of programming, not just for this research.

Create the Training Set - meet by 5/17/06

This will most likely involve modifying my existing training set. This training set consists of extracted images of soda cans acquired at a single scale, against a static background, under constant lighting conditions, and in two separate orientations. Although the use of this training set will most likely not result in scale-invariant keypoints, it will be useful in testing other aspects of this SIFT algorithm directly against my Adaboost implementation. Invariance to background clutter, lighting, performance while in motion, and efficiency can be tested using this training set. After a working SIFT implementation is created, a training set that includes soda cans extracted under a wider variety of conditions can later be swapped in for this training set. I figure that I will go through all of the steps required to apply SIFT using only a few images. Once this is done, I will have a better idea of how to create a good training set for SIFT. Last quarter, I had to redo my training set several times, so I want to avoid having to do that this quarter.

Detect Keypoints - meet by 5/24/06

This will involve using a difference-of-Gaussian function to search the scale-space of the soda can training images to identify scale-invariant interest points. The most stable keypoints will then be selected and assigned orientations based on their local gradient directions. The local gradients will then be transformed into a representation that is resistant to shape distortion and illumination differences. This will be accomplished using an existing binary implementation, such as the version provided by David Lowe on his website. I will probably make a primitive algorithm for matching keypoints initially and then improve upon it.
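
To make the difference-of-Gaussian idea concrete, here is a heavily simplified sketch in Python. It works on a 1-D signal instead of an image and only shows the core subtraction of two blurred copies at neighboring scales; the real detector works in 2-D over many scales and then looks for extrema across position and scale. The function names, the 3-sigma kernel radius, the clamped borders, and the scale step k = sqrt(2) are choices made for this illustration.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering about 3 sigma on each side."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve the signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def difference_of_gaussian(signal, sigma, k=math.sqrt(2)):
    """Blur at scales k*sigma and sigma, then subtract the two results."""
    wide = blur(signal, k * sigma)
    narrow = blur(signal, sigma)
    return [w - nrw for w, nrw in zip(wide, narrow)]


flat = [5.0] * 21                      # featureless signal
spike = [0.0] * 10 + [1.0] + [0.0] * 10  # one small feature in the middle

dog_flat = difference_of_gaussian(flat, 1.0)
dog_spike = difference_of_gaussian(spike, 1.0)
strongest = dog_spike.index(min(dog_spike))
print(strongest)
```

The flat signal gives a response of essentially zero everywhere, while the spike produces its strongest response at the spike's position. This fits the earlier observation that regions without detail yield no keypoints.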

Match Keypoints of New Input Images (no background clutter) - meet by 5/31/06

A database of keypoints from the training images will first be set up. A new input image will be matched with this database by finding the Euclidean distance between the feature vectors of the input image and the feature vectors in the database. The new input images will include soda can images with backgrounds that have been manually removed so that no keypoints will be detected outside of the soda can regions. If background clutter is left in, then keypoints extracted from these regions may yield so many negative classifications that false negative classifications will result even when a soda can is present in a particular region. Without background clutter, I can test keypoint matching without having to cluster the keypoints or apply any additional transformations to them.

Identify Keypoint Clusters, Apply Further Filtering of Keypoint Matches (background clutter added) - meet by 6/7/06

This step will first separate the correct keypoint matches from the false matches by identifying clusters of three or more keypoints that agree on the orientation, scale, and location of the detected object. This is described in Lowe’s SIFT paper, with a Hough Transform offered as a solution for identifying these clusters. Next, a least-squares estimate will be applied for an affine approximation to the object pose. Further analysis can be applied, as described by Lowe, to determine the probability that a particular set of features is the object being searched for.
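
The clustering step can be sketched as a simple voting scheme in Python. This is a simplified illustration, not Lowe's implementation: each match casts a single vote into one coarse bin of (x, y, scale, rotation), the location bins have a fixed size of 32 pixels, and the matches are synthetic. Lowe's paper sizes the location bins relative to the projected training image and spreads each vote over neighboring bins, which is left out here. Keypoints are (x, y, scale, orientation) tuples, and all function names are my own.

```python
import math
from collections import defaultdict


def pose_bin(model_kp, image_kp, loc_bin=32.0, ori_bin=math.radians(30)):
    """Coarse bin of the object pose implied by one keypoint match."""
    mx, my, ms, mo = model_kp
    ix, iy, i_s, io = image_kp
    rotation = (io - mo) % (2 * math.pi)
    scale = i_s / ms
    # Where the model's origin would land in the image under this pose.
    ox = ix - scale * (mx * math.cos(rotation) - my * math.sin(rotation))
    oy = iy - scale * (mx * math.sin(rotation) + my * math.cos(rotation))
    n_ori = int(round(2 * math.pi / ori_bin))
    return (math.floor(ox / loc_bin),
            math.floor(oy / loc_bin),
            math.floor(math.log2(scale)),  # scale bins a factor of 2 wide
            math.floor(rotation / ori_bin) % n_ori)


def find_clusters(matches, min_votes=3):
    """Group matches by pose bin and keep bins with at least min_votes."""
    votes = defaultdict(list)
    for model_kp, image_kp in matches:
        votes[pose_bin(model_kp, image_kp)].append((model_kp, image_kp))
    return {b: ms for b, ms in votes.items() if len(ms) >= min_votes}


def project(model_kp, rotation, scale, tx, ty):
    """Synthetic helper: place a model keypoint into the image."""
    mx, my, ms, mo = model_kp
    ix = tx + scale * (mx * math.cos(rotation) - my * math.sin(rotation))
    iy = ty + scale * (mx * math.sin(rotation) + my * math.cos(rotation))
    return (ix, iy, ms * scale, mo + rotation)


# Three matches that agree on one pose, plus one unrelated false match.
model = [(10.0, 0.0, 2.0, 0.0), (0.0, 10.0, 2.0, 1.0), (5.0, 5.0, 2.0, 2.0)]
matches = [(kp, project(kp, 0.1, 1.5, 100.0, 50.0)) for kp in model]
matches.append(((3.0, 4.0, 2.0, 0.5), (200.0, 300.0, 8.0, 2.5)))

clusters = find_clusters(matches)
print(len(clusters))
```

Only the bin holding the three consistent matches survives, and the false match is dropped because nothing else votes for its pose. The surviving matches are what would then go into the least-squares affine fit.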

Run Detection Method Live - meet by 6/14/06

Once a SIFT implementation is working, it will be tested using live video footage. Initially, I might try using the same setup that I used before, with the same static background. The results from this test will be compared with my previous results based on the efficiency and accuracy of detection. Various conditions will then be changed, such as the ambient lighting. Video footage will then be tested while the camera is moving horizontally with respect to the soda cans. The complexity of the tests will be gradually increased and the results analyzed unless/until the SIFT implementation begins to perform poorly. If time permits, a new set of training images will be acquired using soda cans positioned in a variety of locations and orientations, and this SIFT implementation will be retested. Depending on the availability of SIFT source code online, some of these milestones can be combined and their theoretical completion dates moved back or forward.