
Tutorial: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

  Posted by Pi Robot


    Photo 1: Our omni-directional vision test setup. The black object pointing up at the spherical mirror is a wireless webcam which beams back an omnidirectional image to a desktop PC for processing. The mirror itself is a Christmas ornament. The three images on the right represent from top to bottom: a raw omnidirectional image of the hallway; the unwrapped version of the image using RoboRealm; the color histogram of the image using EmguCV.

    Have you ever wondered how you know where you are? For example, I am sitting at my computer in our dining room inside our apartment which is located in Palo Alto, California, USA, Earth, Milky Way, The Universe. But how do I know this?

    It is easy to see that location is a hierarchical concept beginning at a small scale and working up to larger sizes and distances. At the smaller end (e.g. dining room) we probably rely mostly on visual cues and short term memory, while at the bigger end (e.g. Earth), we depend on conceptual or semantic knowledge. The small end is actually the hardest. Am I in the dining room or the living room? Is this my apartment or your house? How do I tell these locations apart? And how do I get from one to the other?


    What Would a Robot Do?

    In robotics, this general problem is called localization and navigation and it involves both knowing where you are and how to get somewhere else. Studies with people and animals have revealed at least two clues on how we visually distinguish one location from another. The first involves landmarks—distinct features of the surroundings that serve as sign posts. For example, I may have a picture on my living room wall that distinguishes it from the dining room. And if you do not have the same picture anywhere in your house, it can help tell my place apart from yours. Of course, we use landmarks all the time when giving someone directions: "Turn right at the fire station, then left at the bike shop, etc." However, unless you already have a fairly sophisticated object recognition system in place, localization and navigation by landmarks is probably not the best strategy to start with.

    The second approach involves image statistics. The most popular image statistic is the histogram. A histogram counts up how many times a given feature takes on different values. For example, the color histogram of an image counts the number of pixels in each color category: if we divide up the color space into different hues or colors, then each time a pixel in an image matches one of the colors, we increment that bin's counter by one. When we do this across the entire image, we get a kind of frequency chart representing the number of pixels in each of the color categories. This frequency chart is what we mean by a histogram and it is often characteristic of a given scene or room. For example, if your kitchen has lots of white in it while your living room has lots of brown and green, then the color histograms of pictures taken of these two locations will allow us to distinguish one from the other. As it turns out, understanding the nature of image statistics is also key to developing higher level object recognition algorithms, which in turn can help with landmark identification. So starting with image statistics rather than object recognition is a good idea for this reason too.

    Below is a simple color image and its corresponding histogram. The histogram's horizontal axis runs from red at the left to blue on the right. The vertical axis represents the number of pixels having that particular hue or color. Note the three peaks corresponding to the orange, green and blue patches. Note also that the orange peak is the highest because there are more orange pixels in the image than green or blue. (If you're wondering why I chose orange instead of red, it was to better show off that first peak in the histogram, which would otherwise have been obscured by the vertical axis on the left had it been a red peak.)
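    If you want to reproduce this kind of plot yourself, a minimal sketch in Python using OpenCV is shown below. (The work described in this article actually used EmguCV, the .NET wrapper around the same OpenCV library; the file name and the choice of 64 bins here are placeholders, not values taken from the original setup.)

```python
import cv2

# Load an image and convert it from BGR (OpenCV's default ordering) to HSV,
# so that the first channel holds the hue of every pixel.
image = cv2.imread("sample.jpg")               # placeholder file name
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Count how many pixels fall into each of 64 hue bins.
# OpenCV stores hue in the range 0..179 for 8-bit images.
hue_hist = cv2.calcHist([hsv], [0], None, [64], [0, 180]).flatten()

print(hue_hist)   # 64 counts, one per hue bin
```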





    Getting Around the House

    So let's start with a very simple task. How do we give a robot the ability to know which room it is in while roaming around the house? The easiest approach would be to create a visually unique landmark for each room. For example, we could place brightly colored pieces of paper at an appropriate height on the walls of each room so that all the robot would need to do is look around until he sees one of the landmarks, then read off the color to know what room he is in. Note that we wouldn't be able to use different shapes for this purpose. The reason is that a shape will look very different depending on the angle from which it is being viewed. So color is a better choice. The downside of this approach is that you have to mark up your house for it to work. And any time the robot wanted to confirm which room he is in, he'd have to look around to find the closest colored piece of paper. Furthermore, this technique wouldn't work very well outdoors, and someday the robot might want to get out and see the world for himself. So how could we accomplish the same thing without using artificial landmarks?

    For this we turn to image statistics. The simplest statistic is the color histogram as described above. We are going to take advantage of our robot's omni-directional vision system to capture the images. Such images work very well with histograms since any given image covers an entire panorama of the room. This means that it doesn't matter what orientation the robot might have when a picture is taken since the number of different colored pixels in the image is independent of the rotation of the image. Of course, the images and their histograms will vary depending on the position in the room from which they are taken. And this will mean that the histograms for a given room will not be exactly the same from one location to another. However, we are hoping that these differences will be smaller than the differences between histograms computed in different rooms.

    To test this approach, we take the following steps:

    • Take a number of omni-directional pictures of each room, say 5-10 images per room. We call this the training run or learning phase.
    • Compute the color histograms of each picture and store them in a database.
    • Compute or train a classifier that can assign histograms to room names.
    • Take the robot back to the various rooms, take some new pictures, and ask the robot to tell us what room he thinks he is in based on the current picture. We call this the testing phase.

    Let's now take each of these steps in turn.

    Building the Image Database

    Below are a few omni-directional images taken from six different rooms. Alongside each image is one of its corresponding color histograms. I say "one" of its color histograms because a given image can be analyzed along a number of different color channels, with one histogram per channel. In the images below, the histogram corresponding to hue is shown. Hue is roughly associated with what we think of as an object's basic color, with red colors toward the left in the histogram and blue colors toward the right. As you can see in the first histogram below, the balcony image contains a preponderance of red pixels. The images were produced by aiming a webcam into a polished silver Christmas ornament. This results in a circular panoramic image as shown in the photo above. The image is then "unwrapped" using the Polar module in the RoboRealm vision software package to produce the rectangular panoramic images shown below. In these images, the left edge and the right edge correspond to the same point in the world, directly behind the robot.
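    The unwrapping itself was done with RoboRealm's Polar module, but the geometry is simple enough to sketch in a few lines of Python/NumPy. In the sketch below, the center coordinates and the inner and outer radii of the usable ring are assumptions you would measure from your own mirror image; they are not values from the original setup.

```python
import numpy as np
import cv2

def unwrap(omni, cx, cy, r_inner, r_outer, out_w=360, out_h=100):
    """Unwrap the circular mirror image into a rectangular panorama.

    (cx, cy) is the center of the mirror in the source image and
    r_inner/r_outer bound the usable ring of the reflection.
    """
    # For each output pixel, compute which source pixel it maps back to.
    angles = np.linspace(0, 2 * np.pi, out_w, endpoint=False)
    radii = np.linspace(r_inner, r_outer, out_h)
    map_x = (cx + np.outer(radii, np.cos(angles))).astype(np.float32)
    map_y = (cy + np.outer(radii, np.sin(angles))).astype(np.float32)
    return cv2.remap(omni, map_x, map_y, cv2.INTER_LINEAR)

# Example with made-up numbers for a 320x240 capture:
# panorama = unwrap(cv2.imread("omni.jpg"), cx=160, cy=120, r_inner=20, r_outer=110)
```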

    Balcony



    Dining Room


    Living Room



    Kitchen



    Hallway



    Foyer


    As you can see, the histograms for different rooms have different shapes reflecting the different distributions of color in each room. However, we can also see that some rooms will be hard to tell apart using this method alone. For example, the histograms for the dining room and living room appear very similar. Fortunately, the histograms computed on some of the other channels are better at telling these two rooms apart. For example, here are the two saturation histograms for the same images above for the dining room and living room.



    The saturation of a color is a measure of how vivid or washed out the color appears in the image. As we can see from the two saturation histograms above, the dining room on the left has a noticeably different saturation profile than the living room on the right. So while the robot might confuse these two rooms using hue histograms alone, it can make a better distinction if it also uses the saturation histograms. This result highlights a general principle when working with both robots and brains: never put all your eggs in one basket. The more kinds of information you can use to perceptually distinguish one object or scene from another, the less likely you are to make mistakes.

    The histograms shown above were calculated from the panoramic images using the EmguCV software package. EmguCV is a .Net wrapper around the OpenCV vision package (written in C++).

    Putting It All Together

    As the robot moves from one room to another during the learning phase, it stores six different histograms for each omni-directional image. The six histograms correspond to the color channels Red, Green, Blue, Hue, Saturation and Lightness. This might sound like a lot of data to have to store in memory for each image but in fact it is tiny compared to storing the raw images themselves. Each histogram can be represented by as few as 64 bins, each with one number per bin (the count for that bin). So each image requires only 6x64 = 384 numbers. By comparison, to store all the pixel values for a 320x240 color image would require 320x240x3 = 230,400 numbers, or roughly 600 times what it takes to store all six histograms. A 640x480 image would require 921,600 numbers, or 2,400 times more data than storing just the histograms.
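    As a rough sketch of how small this representation is, the six 64-bin histograms for a single image can be packed into one 384-element vector. The helper below is an illustration in Python/OpenCV (the article's actual code used EmguCV); the channel ordering and bin count are assumptions.

```python
import cv2
import numpy as np

def image_signature(bgr, bins=64):
    """Pack the six 64-bin histograms of one image into a single 384-element vector."""
    hls = cv2.cvtColor(bgr, cv2.COLOR_BGR2HLS)   # channels: hue, lightness, saturation
    channels = [
        (bgr, 2, [0, 256]),   # red  (OpenCV stores pixels as B, G, R)
        (bgr, 1, [0, 256]),   # green
        (bgr, 0, [0, 256]),   # blue
        (hls, 0, [0, 180]),   # hue
        (hls, 2, [0, 256]),   # saturation
        (hls, 1, [0, 256]),   # lightness
    ]
    hists = [cv2.calcHist([img], [ch], None, [bins], rng).flatten()
             for img, ch, rng in channels]
    return np.concatenate(hists)   # 384 numbers vs. 230,400 raw pixel values at 320x240
```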

    It is fun to speculate how this strategy for storing image statistics rather than the images themselves might relate to human and animal perception and memory. For example, imagine in your mind's eye what it's like to be in a forest. Do you see a photographically detailed image of each tree? Or is the image something more fuzzy like a certain mixture of green and brown, a pattern of vertical and horizontal shapes (trunks and branches), a patchwork of hard-to-describe texture (leaves, needles, bushes) and so on. We'll have more to say along these lines in a later article that includes shape and texture statistics in addition to color.

    Getting back to our color histograms and the robot's meanderings around the house, let's have the robot take 5-10 pictures per room (depending on the size of the room) and store the histograms of each image in a database along with the name of the room in which the picture was taken. We'll start with six rooms: dining room, living room, kitchen, hallway, foyer and balcony. The balcony is not really a room but it allows us to see how the robot performs in an outdoor setting as well as indoors.

    After storing the histograms of all the reference images, we'll take the robot for a second tour of the various rooms (the testing phase), place him in random positions in those rooms, and ask him to tell us the name of the room he's in. How does he decide?

    Classified Information

    All creatures great and small must make continual discriminations between objects, scenes and situations. How else could we know that we are in the kitchen and not in the living room? Or that this slice of green bread might make me sick. In addition to receiving raw sensory data, the brain must decide which patterns of data belong in the same category. In the field of robotics, this process is called pattern classification. Furthermore, sensory information is rarely classified in its raw form: instead, the original data is preprocessed to extract certain features that characterize the information more succinctly. Good features also tend to be more stable than the raw data under a variety of conditions, such as changes in lighting or viewing angle. Typical features used in visual image processing include edges, line segments, colored blobs, basic shapes, texture patches and so on.

    Color histograms can also be thought of as a kind of feature of a visual image. (Or, at a finer level, the bin counts in each histogram can be considered to be the features.) Not only do the histograms condense the information so that it is easier to store in memory, but histograms also vary less dramatically under different conditions, such as different viewing positions, than do the raw pixel values. For example, if we rotate an omni-directional image, the pixels shift position, but the count of pixels of each color remains the same, so the histogram is left unchanged. In other words, color histograms are insensitive to the spatial arrangement of the image pixels. The downside of this invariance is that many different images can result in the same or similar histogram. We saw this earlier where the hue histograms of images of the dining room and living room were very similar. The hue histograms of these two rooms are similar because the histograms do not encode the spatial distribution of the colors, just their overall counts. But this also matches our perceptual experience that the two rooms really do look similar in terms of overall color.
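    This rotation invariance is easy to verify numerically: rotating the robot in place amounts to a horizontal circular shift of the unwrapped panorama, which leaves every histogram unchanged. A small sketch, assuming a panorama already loaded and the image_signature() helper from the earlier snippet:

```python
import numpy as np

# 'panorama' is any unwrapped image; rotating the robot in place amounts to a
# horizontal circular shift of that panorama.
rotated = np.roll(panorama, shift=100, axis=1)

# The number of pixels of each color is unchanged by the shift, so the
# 384-element signature (and each individual histogram) stays the same.
assert np.allclose(image_signature(panorama), image_signature(rotated))
```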

    Once we have a set of features that can summarize a given image, we need to build a classifier that can categorize these features into different groups or classes. In the case of our color histograms, a classifier will take a given picture, compute the six histograms we have been using, then use those histograms to tell us what room the picture was taken in. You can imagine doing this yourself, though it might not be easy. First you'd have to look at the six different histograms from a number of pictures taken in each of the rooms. Then you'd have to find similarities between the histograms from the same room as well as key differences between histograms computed for different rooms. With enough practice, you might be able to tell which room you were considering just by looking at six of its histograms.

    So how do we build a classifier that can match histograms with room names? Fortunately, many people have worked for decades on this problem and there are many solutions depending on the type of data you are dealing with and your goals. The general problem can be framed as mapping a set of input values into a set of output values. In our case, the set of input values are the bin counts in our six histograms, while the desired output values can be thought of as 1's and 0's where we assign a 1 to the correct room and a 0 to all the incorrect rooms. Mathematically, these input values and output values can be represented as vectors and the mappings between them can be represented as matrices or other operators. In our case, the input vectors have 64 elements per histogram (histogram bin counts) while the output vectors have 6 elements, one for each room. The challenge is to find a mapping from the histogram values as inputs to the correct room values as outputs.
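    In code, the training set then boils down to an input matrix of histogram vectors and a matrix of 1-of-6 target vectors, one pair of rows per training picture. A hedged sketch (the room list matches the article, but the variable names and the image_signature() helper are from the earlier illustrative snippet):

```python
import numpy as np

ROOMS = ["balcony", "dining room", "foyer", "hallway", "kitchen", "living room"]

def one_hot(room):
    """Desired output vector: 1 for the correct room, 0 for all the others."""
    target = np.zeros(len(ROOMS))
    target[ROOMS.index(room)] = 1.0
    return target

# One 384-element input vector and one 6-element target vector per training image:
# X = np.stack([image_signature(img) for img in training_images])
# Y = np.stack([one_hot(room) for room in training_rooms])
```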

    In the next few sections, we will illustrate three different types of classifiers based on three different mapping strategies from inputs to outputs: prototypes, nearest neighbors and artificial neural networks.

    Classification by Prototypes

    Perhaps the easiest classifier to build and understand is the prototype classifier. The idea is quite simple: during the training phase, take all the histograms for a given room and average them. In other words, if we have 10 pictures of the living room, take the 10 different hue histograms and average them together to form one representative hue histogram for the whole collection. Do the same thing for the other 5 histogram channels (saturation, lightness, red, green, blue) resulting in six average histograms. Averaging input vectors is often referred to as forming prototypes so we will call this method the prototype classifier.
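    A minimal sketch of this averaging step, assuming the per-image 384-element signatures and their room names are already stored (as in the earlier snippets):

```python
import numpy as np
from collections import defaultdict

def build_prototypes(signatures, labels):
    """Average all the histogram vectors that were recorded in the same room.

    signatures: list of 384-element vectors (6 channels x 64 bins each)
    labels:     list of room names, one per vector
    Returns a dict mapping each room name to its prototype vector.
    """
    grouped = defaultdict(list)
    for sig, room in zip(signatures, labels):
        grouped[room].append(sig)
    return {room: np.mean(vecs, axis=0) for room, vecs in grouped.items()}
```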

    The idea behind averaging is that the common features across the histograms for the same room will be enhanced while the parts that do not overlap will be diluted. In theory, this should result in a histogram that better matches other images taken from the same room. So how well does it work?

    Before describing the results, we'll give a brief description of the methodology. The steps in the experiment go like this:

    • Take an initial set of 5-10 pictures in each of the six rooms. (In the results below, there were 38 pictures taken during this phase.)
    • Compute the six different histograms for each of these pictures.
    • Compute the average or prototype histograms for each room, one for each color channel.
    • Take 3-5 new test pictures in each of the rooms. (In the results below, there were 22 test pictures.)
    • Compute the six different histograms for each of these new pictures.
    • Compare the histograms to the stored prototype histograms for each room. Each histogram channel (hue, saturation, etc.) gets a vote as to which prototype (and thus which room) best matches the test histogram. Whichever prototype gets the most votes gets to label the image by its room name. A confidence level is also computed and is defined as the number of votes for the winning room divided by the total number of votes possible (six in this case).

    For those of you wondering how we compare two histograms, the method used here is called the Jeffreys Divergence, which essentially treats the two histograms as probability distributions. (See the Appendix for details.) One could also use Euclidean distance, but the results are not as good. The reason is that adjacent bin values in a color histogram are not independent, as the Euclidean distance metric assumes. For example, we'd expect any real-world green object to always have more than one shade of green in its histogram, and these different shades of green lie adjacent to one another in the vector representing this histogram. Consequently, we'd expect these adjacent feature values to be correlated, which is something the Euclidean distance does not capture.
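    Here is an illustrative Python sketch of the Jeffreys divergence and the per-channel voting described above. Two caveats: the histograms are normalized to unit sum here for simplicity (the Appendix normalizes by Euclidean length), and the confidence values in the tables below are not simple multiples of 1/6, so the author's actual vote was evidently more graded than this plain count; take this as a sketch of the idea, not a reproduction of the exact numbers.

```python
import numpy as np

def jeffreys_divergence(h1, h2, eps=1e-10):
    """Symmetric divergence between two histograms, treated as distributions."""
    p, q = h1 / (h1.sum() + eps), h2 / (h2.sum() + eps)
    m = (p + q) / 2.0
    return np.sum(p * np.log((p + eps) / (m + eps)) +
                  q * np.log((q + eps) / (m + eps)))

def classify_by_prototype(signature, prototypes, bins=64, channels=6):
    """Each of the six channels votes for the room whose prototype it is closest to."""
    votes = {room: 0 for room in prototypes}
    for c in range(channels):
        sl = slice(c * bins, (c + 1) * bins)
        best = min(prototypes,
                   key=lambda room: jeffreys_divergence(signature[sl],
                                                        prototypes[room][sl]))
        votes[best] += 1
    winner = max(votes, key=votes.get)
    return winner, votes[winner] / channels   # room name and a 0..1 confidence
```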

    So how well does the prototype classifier do? The table below shows the classification results for the 22 test pictures. The first room name in each row is the correct classification of the test picture while the second room name after the arrow is what our classifier thinks it is. The number at the end of each row is the confidence of the decision: a value of 1.0 would be 100% confident (6 out of 6 votes) while 0.5 would be 50% confident (3 out of 6 votes), and so on.

    balcony ==> balcony: 0.97
    balcony ==> balcony: 0.98
    balcony ==> balcony: 0.59
    balcony ==> balcony: 0.64
    dining room ==> dining room: 0.8
    dining room ==> dining room: 0.97
    dining room ==> dining room: 0.95
    dining room ==> dining room: 0.32
    foyer ==> foyer: 0.97
    foyer ==> foyer: 0.8
    hallway ==> hallway: 0.63
    hallway ==> hallway: 0.96
    hallway ==> hallway: 0.64
    hallway ==> hallway: 0.65
    kitchen ==> kitchen: 0.79
    kitchen ==> kitchen: 0.97
    kitchen ==> kitchen: 0.8
    living room ==> living room: 0.94
    living room ==> living room: 0.97
    living room ==> living room: 0.49
    living room ==> living room: 0.81
    living room ==> dining room: 0.48

    TOTALS: 21 Correct; 1 Error = 95% correct

    As you can see, this method made only one error (the last row highlighted in red) where it confused the living room with the dining room for that picture. There were also two other questionable classifications (highlighted in orange) where the room name is correct but the confidence is low, meaning that other rooms also got a significant portion of the votes.

    Classification by Nearest Neighbor

    The next classifier uses a technique known as the nearest neighbor algorithm. This time, instead of averaging the histograms for each room, we store all the histograms individually in our database. When we are presented with a test image, we compute its six histograms and compare them to all the stored histograms to find out which earlier picture best matches the current one. In other words, rather than comparing the test histograms to prototype histograms, we compare them directly to the individual stored histograms. This requires a much larger number of comparisons but it has the advantage that once we choose the best match, we also know which picture it best matches. For example, if we take 5 pictures in different locations in the dining room during the training phase, then take a test picture also in the dining room, the nearest neighbor algorithm can tell us which of the 5 stored pictures is the best match. In theory this means that we might also know roughly where in the dining room we are. (We'll explore such a possibility in a later article.)
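    A corresponding nearest-neighbor sketch, again using the jeffreys_divergence() function from the earlier snippet; here every stored signature is compared, not just one prototype per room, and the same caveats about the exact voting scheme apply.

```python
import numpy as np

def classify_nearest_neighbor(signature, stored_signatures, stored_labels,
                              bins=64, channels=6):
    """Each channel votes for the room of its single closest stored histogram.

    Uses jeffreys_divergence() from the earlier sketch.
    """
    votes = {}
    best_picture = {}
    for c in range(channels):
        sl = slice(c * bins, (c + 1) * bins)
        distances = [jeffreys_divergence(signature[sl], s[sl])
                     for s in stored_signatures]
        nearest = int(np.argmin(distances))
        room = stored_labels[nearest]
        votes[room] = votes.get(room, 0) + 1
        best_picture[room] = nearest          # remember which stored picture matched
    winner = max(votes, key=votes.get)
    return winner, votes[winner] / channels, best_picture[winner]
```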

    Here now are the results of using the nearest neighbor classifier on our 22 test pictures.

    balcony ==> balcony: 0.98
    balcony ==> balcony: 0.99
    balcony ==> balcony: 0.64
    balcony ==> balcony: 0.98
    dining room ==> living room: 0.48
    dining room ==> dining room: 0.98
    dining room ==> dining room: 0.81
    dining room ==> dining room: 0.8
    foyer ==> living room: 0.64
    foyer ==> foyer: 0.63
    hallway ==> hallway: 0.96
    hallway ==> hallway: 0.81
    hallway ==> hallway: 0.64
    hallway ==> hallway: 0.49
    kitchen ==> kitchen: 0.64
    kitchen ==> kitchen: 0.65
    kitchen ==> kitchen: 0.64
    living room ==> living room: 0.64
    living room ==> living room: 0.97
    living room ==> living room: 0.65
    living room ==> living room: 0.96
    living room ==> living room: 0.97

    TOTALS: 20 Correct; 2 Errors = 91% correct

    The nearest neighbor classifier made one more error than the prototype classifier but still correctly classified 20 out of 22 pictures. There was also a third classification (highlighted in orange above) that was a little low in confidence.


    Classification using an Artificial Neural Network

    The third type of classifier to be tested is called an artificial neural network or ANN. Neurons in the cortex tend to be arranged in layers, with one layer connected to another layer by many thousands of synapses. Artificial neural networks have been developed into a powerful method for mapping a set of input values into a set of output values. The input values are likened to the activity levels of neurons in one layer and the output values are taken to be the activity levels of another layer. By modifying the connections between the two layers, we can build a network that takes a set of values over the input neurons and produces a desired set of values over the output neurons. Since most of the magic in such networks takes place in the connections between layers, this kind of classifier is also known as a connectionist network.

    The kind of artificial neural network we will work with here uses a process called supervised learning. The process works in a series of "training" sessions. First we randomly set the connections between the input neurons and the output neurons. Then, for each histogram corresponding to the set of training pictures, we set the activation levels of the input neurons to the bin values of the histogram. Passing these values through the network's connections results in a pattern of activity across the six output neurons (one for each room). Since we know what room the histogram corresponds to, we know which output neuron should have a value of 1 while the others should have a value of 0. Early on in the training, this won't be the case—the output neurons will have values in between 0 and 1. To teach the network to do better in the future, we alter the connections in such a way that the output activities move closer and closer to the patterns of 1's and 0's that we know are correct. The particular method used to tweak the connections is called the learning algorithm for the network and a number of such algorithms have been developed for different types of networks and problems to be solved.

    The simplest kind of connectionist network is called a perceptron. A perceptron has a single layer of connections mapping the input nodes to the output nodes. The learning algorithm most often used with perceptrons is called the delta rule, because it tweaks the connections during training in a way that is proportional to the difference between the desired output values and the actual output values. (See the Appendix for details.) Since these changes in the connections must be made a little bit at a time, the whole training set of input vectors must be run through the learning process many times—usually hundreds or even thousands of cycles before the output values adequately match the correct values. Nonetheless, with today's computers, even thousands of training cycles can be done in seconds if not less.
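    The networks in this article were built with AForge.NET (see below); purely as an illustration of the delta rule itself, here is a hedged NumPy sketch of a single-layer perceptron of the kind described above. The learning rate, random seed, and the assumption that the inputs are normalized histogram vectors are mine, not values from the original experiments.

```python
import numpy as np

def train_perceptron(X, Y, cycles=2500, rate=0.1, seed=0):
    """Single-layer perceptron trained with the delta rule.

    X: (n_images, 384) normalized histogram inputs
    Y: (n_images, 6) one-hot room targets
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], Y.shape[1]))   # random initial weights
    b = np.zeros(Y.shape[1])
    for _ in range(cycles):
        for x, y in zip(X, Y):
            out = 1.0 / (1.0 + np.exp(-(x @ W + b)))    # sigmoid output activities
            grad = (y - out) * out * (1.0 - out)        # delta rule error term
            W += rate * np.outer(x, grad)               # nudge connections toward target
            b += rate * grad
    return W, b

def classify_perceptron(x, W, b, rooms):
    out = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    return rooms[int(np.argmax(out))]
```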

    The figure below illustrates the kind of perceptron used in the current setup.



    The perceptron used in these experiments is part of the Aforge.Net software package. Aforge.Net includes many image processing routines as well as a set of machine learning algorithms including artificial neural networks. EmguCV also has routines for machine learning but I found Aforge.Net to be a little easier to work with for this application.

    So now let's try our room recognition test using a perceptron classifier. First we train the network using the first 38 pictures. Then we test the network by feeding it each of the 22 test pictures and looking at its output neurons to determine the room name. The results are as follows:

    balcony ==> balcony: 0.83
    balcony ==> balcony: 1
    balcony ==> balcony: 0.5
    balcony ==> balcony: 0.83
    dining room ==> dining room: 0.5
    dining room ==> dining room: 0.5
    dining room ==> dining room: 0.5
    dining room ==> living room: 0.83
    foyer ==> foyer: 0.83
    foyer ==> foyer: 0.67
    hallway ==> hallway: 0.83
    hallway ==> hallway: 0.83
    hallway ==> hallway: 0.83
    hallway ==> hallway: 0.67
    kitchen ==> kitchen: 0.83
    kitchen ==> kitchen: 0.67
    kitchen ==> kitchen: 0.83
    living room ==> living room: 1
    living room ==> living room: 1
    living room ==> living room: 0.83
    living room ==> living room: 0.83
    living room ==> living room: 0.83

    TOTALS: 21 Correct; 1 Error = 95% correct

    The perceptron classifier performs as well as the prototype classifier, making only one mistake. There are, however, a few more cases where the confidence level is borderline (a number of them at 0.5). Let's see if we can do better by training the network for a little longer before testing. For the results above, the network was trained for 2500 cycles. Let's bump that up to 5000 cycles and try again. Here are the new results:

    balcony ==> balcony: 0.83
    balcony ==> balcony: 1
    balcony ==> balcony: 0.5
    balcony ==> balcony: 0.83
    dining room ==> dining room: 0.67
    dining room ==> dining room: 0.5
    dining room ==> dining room: 0.5
    dining room ==> dining room: 0.67
    foyer ==> foyer: 0.83
    foyer ==> foyer: 0.83
    hallway ==> hallway: 0.83
    hallway ==> hallway: 0.83
    hallway ==> hallway: 0.83
    hallway ==> hallway: 0.67
    kitchen ==> kitchen: 0.67
    kitchen ==> kitchen: 0.67
    kitchen ==> kitchen: 0.83
    living room ==> living room: 1
    living room ==> living room: 1
    living room ==> living room: 0.83
    living room ==> living room: 0.83
    living room ==> living room: 0.83

    TOTALS: 22 Correct; 0 Errors = 100% correct

    This time the classifier gets a perfect score while the confidence levels remain roughly the same.

    Summing Up

    In this article we have seen how a fairly simple robot equipped with a homemade omni-directional vision system can tell what room it is in. If we imagine how we might go about such a task ourselves, we might think in terms of high level object recognition such as, “Oh, there is the stove so I must be in the kitchen.” Since object recognition is actually one of the harder things to get right when developing computer vision systems, we adopted a simpler strategy. In this approach, we treated the image as a whole and extracted the color histograms of the image along a number of different color dimensions such as hue, saturation and lightness. Since the images are omni-directional, such histograms are fairly resilient to rotations and small translations of the robot and therefore make good candidates for characterizing a particular room. When trying to figure out what room it is in, our robot might say something like, “There is a lot of white in this image so I must be in the kitchen.” We then tested three different classification algorithms for learning and discriminating between histograms: prototypes, nearest neighbor comparisons, and an artificial neural network called a perceptron. Although the neural network classifier was able to score 100% on the room recognition test, the prototype classifier came close at 95% correct and is much simpler to implement. In a future article, we will take a look at additional feature histograms to further refine place recognition. For example, one can count up all the edges in the image that are oriented at a certain angle. As we have already seen, one can go a long way toward recognizing a place in the world without having to actually recognize or name particular objects.

    Next Steps

    While having your robot name the room it is in might entertain your guests at your next party, it is not terribly useful on its own. What we'd really like is to have the robot navigate from one room to another either on command or for its own reasons. For example, suppose we are in the living room and we want our robot to retrieve something from the bedroom. Then the robot must not only know that it is currently in the living room, but also how to get to the bedroom and back again. This is the "navigation" part of "localization and navigation". I have already made some significant progress along these lines (borrowing heavily from the work of others) and it will be the subject of a future article. But the main strategy is as follows.

    First we let the robot explore the entire house on its own, storing new histograms as it needs them. This will be a supervised process wherein the robot will move in a random exploratory direction until the current image cannot be confidently classified in terms of its already stored images. Then it will stop and ask us, "where am I?" and we will name the room as the robot stores the new histogram. At the same time, the algorithm will track the direction and distance traveled between new images by simply recording the latest wheel encoder differences. These numbers do not have to be terribly accurate, but today's encoders give us pretty good data anyway.

    The result of all this exploration is something called a graph where the nodes in the graph are stored images (and their histograms) and the links or edges between the nodes represent the encoder data to get from one node to another. Using standard graph theoretic algorithms, we can then map out a path to get from one node to another (e.g. some place in the living room to some place in the bedroom). Since we can store as many nodes as we like (storage requirements per node are not that great and disk space is cheap), there will always be a path to get us from A to B even in the presence of random obstacles. Stay tuned!
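    To make the graph idea concrete, here is a minimal hedged sketch: nodes stand in for stored capture points (in the real system each node would also hold its histograms), edges hold the odometry between them, and a standard shortest-path search produces the route. The node names and distances below are invented for illustration.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over a dict: node -> list of (neighbor, distance) edges."""
    queue, visited = [(0.0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, dist in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + dist, neighbor, path + [neighbor]))
    return None

# Invented example: nodes are stored capture points, distances come from odometry (meters).
house = {
    "living room 1": [("hallway 1", 2.5)],
    "hallway 1":     [("living room 1", 2.5), ("bedroom 1", 3.0)],
    "bedroom 1":     [("hallway 1", 3.0)],
}
print(shortest_path(house, "living room 1", "bedroom 1"))
```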


    References

    This experiment was mainly inspired by the following article:

    Ulrich, I., and Nourbakhsh, I., “Appearance-Based Place Recognition for Topological Localization”, IEEE International Conference on Robotics and Automation, San Francisco, CA, April 2000, pp. 1023-1029. Best Vision Paper Award.

    Software References

    Aforge.Net - Image Processing and Machine Learning

    EmguCV - A .NET version of OpenCV that also includes algorithms for machine learning.

    RoboRealm - Vision for Machines

    Appendix

    • Delta Rule for artificial neural networks: See the good explanation at http://en.wikipedia.org/wiki/Delta_rule

    • Jeffreys Divergence for comparing histograms: More often than not, when we want to compute the distance between two n-dimensional vectors x and y, we use the Euclidean metric:

      d(x, y) = sqrt( (x_1 - y_1)^2 + (x_2 - y_2)^2 + ... + (x_n - y_n)^2 )

      In words this says that the distance is the square root of the sum of the squares of the pair-wise differences between the individual vector components. As mentioned in the main article, this distance measure is not necessarily the best when comparing two color histograms. The reason is that adjacent bin values in a histogram tend to be correlated since they represent similar shades of the same color. For this reason, a different kind of distance metric is used called the Jeffreys Divergence or JD.

      First we normalize our histograms to have unit length by dividing each bin count by the Euclidean norm of the whole vector. This allows us to treat the new bin values, which are now all between 0 and 1, as a kind of probability distribution for finding various colors in the image. As it turns out, a good way to measure the distance between two probability distributions p1 and p2 is to use the Jeffreys Divergence formula as follows:

      JD(p1, p2) = sum over all bins i of [ p1_i * ln( 2*p1_i / (p1_i + p2_i) ) + p2_i * ln( 2*p2_i / (p1_i + p2_i) ) ]

      where ln is the natural logarithm. This is the formula used in the main article for computing the distance between two histograms. For a complete explanation, please see http://en.wikipedia.org/wiki/Kullbac...ler_divergence.





Replies to Tutorial: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms
  1. Join Date
    Jul 2008
    Location
    Salt Lake City, Utah
    Posts
    32

    Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    Awesome tutorial, well thought out and well researched.
        

  2. Join Date
    Apr 2009
    Location
    Stanford, CA USA
    Posts
    590

    Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    Hey thanks! Nice to know someone is reading these things!!

    --patrick
        

  3. Join Date
    Sep 2006
    Location
    Carol Stream, Illinois
    Posts
    1,695

    Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    Excellent tutorial, Patrick, and great job at breaking complicated topics down.

    "In the long history of humankind (and animal kind, too) those who learned to collaborate and improvise most effectively have prevailed"
    - Charles Darwin
        

  5. Join Date
    Jul 2008
    Location
    Salt Lake City, Utah
    Posts
    32

    Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    Patrick, I am very intrigued by your work in this tutorial. Have you done any tests with varying lighting levels?
        

  6. Join Date
    Apr 2009
    Location
    Stanford, CA USA
    Posts
    590

    Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    Quote Originally Posted by droidcommander
    Patrick, I am very intrigued by your work in this tutorial. Have you done any tests with varying lighting levels?
    While I haven't tested different lighting levels systematically, I can say that anecdotally the algorithm seems quite robust in that regard. Three factors contribute to this: the webcam (Linksys) has its own auto exposure feature which adjusts quite well for different overall lighting; RoboRealm has a Color Balance module that I use that works quite well even if you turn off the webcam's auto exposure feature; and finally, I normalize the histograms before classifying them.

    One thing that can throw off the algorithm is light coming through large windows at different times of the day, but mostly when comparing day versus night. For example, the living room used in the tutorial has a large sliding glass door along its length. During the day there are lots of outside color pixels across that whole wall--e.g. greens, browns, blues, etc. But at night, the whole "wall" is black. So under these conditions, one might want to have two sets of reference histograms for that particular room--daytime and nighttime. And since the storage requirements for extra histograms are rather small, this shouldn't be too much of a problem. Furthermore, one of the nice things about being a robot is you can tell what time of day it is without even looking at a watch.

    --patrick
        

  7. Join Date
    Sep 2008
    Location
    Southampton, UK
    Posts
    32

    Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    That vision system is amazing; getting 360 degrees of data must give it so much more capability. Excellent tutorial also, very detailed and clear. I think I will try this myself at some point. You seem to get some very good pictures, are you using any special camera at all? Also, have you thought of or tried optical flow for obstacle avoidance with your system? I wonder this as your images cover everything from the horizontal all the way down to the floor/top of your robot!!
        

  8. Join Date
    Apr 2009
    Location
    Stanford, CA USA
    Posts
    590

    Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    Yeah, I was quite surprised at the quality of the images given that the reflecting ball is just a 50-cent Christmas ornament from Target, although you can see some distortions due to small warps in the ball. I have looked into getting a better mirror such as the ones they make for panoramic photography; for example, http://store.eyesee360.com/. However these things are typically more than $500 so I can break 1000 Christmas balls before I reach that cost. Also, I am taking the pictures at 320x240 resolution to keep video frame rates high, but you can get even better looking images at 640x480.

    As for the camera, I am using a Wireless G webcam from Linksys as seen here:

    http://www.amazon.com/Linksys-WVC54G.../dp/B0010OXEDU

    The camera is nothing special but I needed it to be able to mount horizontally in an easy way and this Linksys has a nice swivel mount and platform that does the job perfectly.

    As for using omni-directional video for obstacle avoidance, yes, I have played with that quite a bit and hope to post another tutorial in the not-too-distant future with the results. At the moment I am not using optical flow, although that is a great idea. Instead I am simply using edge detection which tends to draw a nice perimeter around objects on the floor, chair legs, and so on.

    Let us know if you build your own system--it would be great to compare results!

    --patrick
        

  9. Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    Aside from the fact that the histogram / saturation detection is a brilliant way to detect rooms... the Christmas ball / omnivision thing is sheer genius. I love cheap, elegant, wildly effective solutions like that.
        

  10. Join Date
    Apr 2009
    Location
    Stanford, CA USA
    Posts
    590

    Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    Thanks Shimniok! Yeah, I was surprised at how far one can get with a 50 cent ornament, although it is sometimes hard to find one with good reflectivity and roundness. I've been consumed lately getting a pair of arms to work but I hope to get back to omnivision as soon as I figure out a way to mount the mirror on the new robot body as shown below:



    --patrick

    http://www.pirobot.org
        

  11. Re: Where Am I? Place Recognition Using Omni-directional Images and Color Histograms

    Excellent tutorial, and very good work on your web page too!
    Keep up the amazing work, Patrick!!


        
