An Introduction to Mapping And Localization
Where Am I?
Frequently, robots find themselves asking this question. Knowing your location, and being able to navigate to other locations, is extremely important for autonomous robots. The act of finding one's location against a map is known as localization. So how can a robot localize itself? The most common form of intelligent localization and navigation is to use a map, combined with sensor readings and some form of closed-loop motion feedback. But a robot often needs to do more than just localize itself: we'd like our robots to be able to build their own maps, since map building by hand is tedious, boring, and error-prone. The field of robotics that studies localization and mapping together is typically called SLAM (simultaneous localization and mapping).
Mapping and localization are almost always a precursor to navigation. We can break the problem down into several cases, from easiest to hardest:
- Known map, known localization - the easiest case. If you know the map and your current location, you can easily navigate to a new location. This will almost never be the problem you are handed directly, but the cases below reduce to it.
- Known map, unknown localization - you know what the world looks like, but you don't know where you are. Once you find out where you are, you really are just doing case #1 above.
- Unknown map, unknown localization - the hardest of all. This is the full problem of simultaneous localization and mapping (SLAM). After a decent amount of mapping time, you should have a pretty good map, putting you into case #2 above.
At any given time, the robot has quite a bit of data to keep track of. This data is typically called its state. The most important part of the state is the robot's pose. If we operate in a 2D Cartesian map, a pose is simply the robot's X and Y coordinates plus its heading. The robot will typically also keep track of its velocity and the map.
Generally, navigation, localization, and mapping are done in an iterative way -- drawing much of their underlying structure from the field of recursive state estimation, and in particular Markov chains. A robot will typically do the following, while maintaining its knowledge of the current state in memory:
- Move some distance, which typically introduces some error in our known position. This is frequently termed a control action.
- Take some sensor measurements, which should help us reduce the error in our known position.
- Localize itself against a map, using its knowledge of how far it might have moved and its sensor readings.
- Update the map using our sensor readings, if we are map building as well.
- Set a new course, if necessary, and repeat.
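The loop above can be sketched in code. In the minimal Python sketch below, every helper is a stub (an assumption for illustration), standing in for real odometry, sensing, and filtering code, and the one-obstacle "world" is made up:

```python
# Minimal sketch of the sense-move-localize loop. All helpers are stubs.

def move(pose, control):
    """Apply a control action; real odometry would introduce error here."""
    x, y, theta = pose
    dx, dy = control
    return (x + dx, y + dy, theta)

def sense(pose, world):
    """Return simulated range readings to each obstacle in the stub world."""
    return [abs(ox - pose[0]) for (ox, oy) in world]

def localize(pose, readings):
    """Stub: trust dead reckoning; a real filter would correct the pose."""
    return pose

def update_map(grid, pose):
    """Stub map update: mark the cell we believe we occupy."""
    grid.add((round(pose[0]), round(pose[1])))
    return grid

world = [(5.0, 0.0)]                          # one obstacle
pose, grid = (0.0, 0.0, 0.0), set()
for control in [(1.0, 0.0)] * 3:              # three control actions
    pose = move(pose, control)                # 1. move (introduces error)
    readings = sense(pose, world)             # 2. take sensor measurements
    pose = localize(pose, readings)           # 3. localize against the map
    grid = update_map(grid, pose)             # 4. update the map if building one
print(pose)                                   # (3.0, 0.0, 0.0)
```

A real system would replace each stub with hardware drivers and the probabilistic machinery discussed in the rest of this tutorial.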
Solving this problem is a hefty task, so we should start with a roadmap of where we are going. This tutorial addresses three primary topics:
- Mapping: types of maps, how to build a map.
- Localization: how to find our location inside of a map.
- Navigation: how to move from one location on the map to another. We will only briefly discuss some of the available algorithms.
Along the way, we will also have to answer some other questions:
- What sensors are required for mapping and localization -- namely odometry and ranging sensors -- and what their sensor models look like.
- How probabilistic methods help, since neither our odometry nor our ranging sensors are 100% accurate. Throughout, we will note where uncertainty, and thus probability, enters our system.
Probability is concerned with the uncertainty of events occurring. If X is some event, then P(X) is the probability of that event occurring; a probability is always between 0 and 1. Probabilities are characterized by distributions. In school you typically dealt with the normal distribution by way of the "curve" on test grades. This distribution is also called the Gaussian distribution. In the context of mapping, it can be summarized as: the most likely location has the highest probability, and the probability of being anywhere else drops off quickly as you move away from that location.
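As a concrete example, the Gaussian density can be evaluated directly. In this small Python snippet (the mean and standard deviation are arbitrary choices), notice how quickly the density falls off away from the most likely value:

```python
import math

def gaussian(x, mean, sigma):
    """Probability density of a normal distribution N(mean, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Density peaks at the mean and drops off rapidly:
print(gaussian(0.0, 0.0, 1.0))   # ~0.3989
print(gaussian(2.0, 0.0, 1.0))   # ~0.0540
```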
Within the study of probability there is a sub-field concerned with Bayesian methods. These methods are frequently used in robotics because they can easily be applied to problems involving multiple sensor readings, especially when the sensors are less than perfect. For instance, suppose we have a sensor that detects whether a door is open or closed, and further suppose this sensor is correct only 75% of the time. We want to know whether the door is open. If we take no sensor readings, we have a 50% belief that the door is open (since it is either open or closed). If we take one sensor reading, and it says open, we now believe more strongly that the door is open... but our sensor isn't perfect; maybe the door is still closed. Bayes' rule allows us to take several readings, applying each to our previous belief, to decide how likely the door is to be open.
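Here is the door example worked through in Python, assuming the 75%-accurate sensor and a 50/50 starting belief:

```python
def bayes_update(prior_open, sensor_says_open, accuracy=0.75):
    """Return P(open | reading), given prior P(open) and one sensor reading."""
    if sensor_says_open:
        like_open, like_closed = accuracy, 1 - accuracy
    else:
        like_open, like_closed = 1 - accuracy, accuracy
    # Bayes' rule: posterior = likelihood * prior / evidence
    evidence = like_open * prior_open + like_closed * (1 - prior_open)
    return like_open * prior_open / evidence

belief = 0.5                         # no readings yet: 50% open
belief = bayes_update(belief, True)  # one "open" reading  -> 0.75
belief = bayes_update(belief, True)  # two "open" readings -> 0.90
print(belief)
```

Each reading sharpens our belief, but even after two agreeing readings there is still a 10% chance the door is closed.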
When you wake up in the morning, you can probably guess that you are in your bed, in your home -- unless, of course, you are in a hotel room on vacation, or someone kidnapped you in the middle of the night. This knowledge of where you are at the start of your day is important. If you were not aware of your beginning location, you wouldn't know how to go somewhere else. You also know that it is more difficult to go to a new city than to walk down the block for a gallon of milk from the corner shop: one requires a detailed map, the other might need only a short list of instructions. Regardless of the problem's difficulty, you need to know where you start, and this is your "prior knowledge". Our robot will typically have some prior knowledge as well. Even if it has no prior knowledge when it starts its task, all data accumulated up to a given point becomes prior knowledge for later work.
Probabilistic approaches to robot navigation are always concerned with prior knowledge. Because they work in an iterative or recursive fashion, they have to have some prior to start with. We must always be careful that our priors are accurate, and at a minimum that they do not corrupt our system. Above, we suggested that the door was 50% likely to be open, since there were two options: open or closed. But what if the door belongs to an office whose tenant is a hermit who keeps it closed 90% of the time? Then we have a different prior. Using Bayesian theory with a prior of only 10% likelihood of being open, we may find it hard to ever conclude that the door is open, since our prior knowledge is so heavily biased towards closed.
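To see how a biased prior plays out, we can run the hermit's door through Bayes' rule in odds form: each "open" reading from the 75%-accurate sensor multiplies the odds of the door being open by 0.75/0.25 = 3, so starting from P(open) = 0.1 it takes several agreeing readings just to break even:

```python
# Odds form of Bayes' rule: posterior odds = likelihood ratio * prior odds.
prior_odds = 0.1 / 0.9           # the hermit's door: P(open) = 0.1

results = []
for n in range(1, 4):            # after 1, 2, 3 "open" readings
    odds = prior_odds * 3 ** n   # each reading multiplies the odds by 3
    results.append(odds / (1 + odds))
    print(n, round(results[-1], 3))
# 1 reading  -> 0.25
# 2 readings -> 0.50  (only now do we break even)
# 3 readings -> 0.75
```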
Maps and Mapping
There are two main styles of maps out there: metric and topological. Metric maps are built around the geometry of the area being mapped, whereas topological maps are concerned with the connections between key areas of interest. The style of map used will depend on the end goal for the system. We will only discuss 2D maps throughout this tutorial, but the ideas can somewhat easily be scaled to 3D maps.
Figure 1 - Occupancy Grid Map. The red block is where the robot is centered, the darkness of each box is an indication of how likely it is to be filled.
The most common metric map is the occupancy grid. Basically, this is like a piece of graph paper: each cell represents a portion of the floor space and is marked as either empty or full of debris (or, more commonly when using probability, how certain we are that the area is full). Our sensor readings can easily be used to build an occupancy grid, hence their predominance in SLAM. Memory usage is the major drawback of an occupancy grid: a large amount of open space wastes a lot of memory to represent, and mapping a huge area can make the grid very large. There are several approaches out there for dynamic cell sizes, which can trade extra computation for memory savings when dealing with large open spaces.
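A bare-bones occupancy grid might look like the following sketch; the cell size and map dimensions are arbitrary choices for illustration:

```python
# A minimal occupancy grid: each cell holds the probability that it is
# occupied, initialized to the "unknown" value of 0.5.

CELL_SIZE = 0.05             # 5 cm per cell (arbitrary)
WIDTH, HEIGHT = 100, 100     # a 5 m x 5 m area

grid = [[0.5 for _ in range(WIDTH)] for _ in range(HEIGHT)]

def world_to_cell(x, y):
    """Convert world coordinates (meters) to grid indices."""
    return int(x / CELL_SIZE), int(y / CELL_SIZE)

print(world_to_cell(2.5, 1.0))   # (50, 20)
print(grid[20][50])              # 0.5 -- unknown
```

Even this small 5 m square area needs 10,000 cells, which illustrates the memory drawback mentioned above.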
You can think of topological maps as something like a road map. In a road map, cities are connected by highways; in our robot's map, it is more likely that rooms are connected by hallways. In topology/graph-theory terms, the rooms or cities are known as nodes or vertices, and the hallways or highways are edges. These nodes are typically called features in SLAM. Feature extraction and topological mapping are frequently used in concert with machine vision approaches. Feature extraction from sensor readings is an extremely difficult subject, and one that is still very much an open research area. As such, we will stick to occupancy grid mapping for the remainder of this tutorial.
Figure 2 - A Topological Map Of The Trinity Fire Fighting Arena. Red nodes are rooms, blue nodes are hallway intersections. Lines represent paths. This map has been hand tuned to limit the directionality of some paths, for better performance in the competition.
Map Building and Sensor Models
We will now consider the step above about "take some sensor measurements" and use it to update our map, which introduces our first piece of probability. We can never be entirely certain whether a cell in our 2D occupancy grid is occupied or empty, so we typically give each cell a probability of being full. We have to initialize all cells to some unknown value; a common choice is 0.5, denoting a 50% belief that the cell is occupied. If you are navigating in mostly open space you may lower this value, and in cluttered space raise it. For now, we will assume we know our position absolutely. Recall that our position is called a "pose", and in our 2D map it consists of X and Y coordinates plus a rotation angle.
If we know our pose absolutely, it is simply a matter of taking a reading from each sensor and using that reading, plus a model of the sensor's characteristics, to update the occupancy probability of each affected cell. The simplest form of this is to just increment or decrement the probability that a cell is occupied based on each sensor measurement, which is typically referred to as a histogram approach.
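A toy histogram update might look like this; the step sizes are arbitrary tuning constants, and we assume something else has already computed which cells the beam passed through:

```python
# Histogram-style occupancy update: cells the beam passed through look
# empty, the cell where the beam stopped looks full. Step sizes are
# arbitrary tuning constants for illustration.

HIT_STEP, MISS_STEP = 0.1, 0.05

def histogram_update(grid, cells_along_beam, hit_cell):
    """Decrement cells the beam crossed, increment the cell it hit."""
    for (r, c) in cells_along_beam:
        grid[r][c] = max(0.0, grid[r][c] - MISS_STEP)   # likely empty
    r, c = hit_cell
    grid[r][c] = min(1.0, grid[r][c] + HIT_STEP)        # likely full
    return grid

grid = [[0.5] * 5 for _ in range(5)]
# A beam fired along row 2 that hits an obstacle at column 3:
grid = histogram_update(grid, [(2, 0), (2, 1), (2, 2)], (2, 3))
print(grid[2])   # close to [0.45, 0.45, 0.45, 0.6, 0.5]
```

Repeated readings push each cell's value toward 0 (empty) or 1 (full), which is exactly the incremental behavior described above.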
When most people start thinking about localization and navigation, they immediately want to throw GPS or a compass on their bot. While GPS or a compass will work for some things, they are by no means a way to do all of your navigation, and GPS can only be used reliably outdoors. The sensors most useful for building a map are encoders and rangers. IR or sonar rangers will be much more useful for indoor navigation, since we are mostly concerned with obstacles.
Most sensors are not entirely accurate. For instance, a SHARP IR ranger gives an output that is probably best described by a normal distribution: most of the readings will be very close to the actual distance, but we will rarely see a perfect value. There is also the issue that these sensors occasionally give off spurious values when the actual ranging breaks down due to the environment. This requires a more complex sensor model:
Figure 3 - Sensor models for an IR ranger. The image on the left is a naive view of what the sensor returns, which may be usable for simple histogram models. The graph on the right is an approximation of how accurate these sensors are: for an object at X=40, we see a Gaussian model centered on the true distance [a normal distribution with mean 40 and SD=5], plus some likelihood of error at the endpoints of the sensor's range. Mixture models like this are frequently used to interpret sensor data. Note: the above graph is NOT normalized.
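A mixture model of this kind can be sketched as a weighted sum of a Gaussian "hit" term and a uniform "spurious reading" term. The weights, range, and standard deviation below are assumed values for illustration, not measured SHARP characteristics:

```python
import math

MAX_RANGE = 80.0          # sensor range (assumed, same units as the figure)
SIGMA = 5.0               # spread of the Gaussian around the true distance
W_HIT, W_RAND = 0.9, 0.1  # mixture weights (assumed)

def likelihood(reading, true_distance):
    """P(reading | object at true_distance), as an unnormalized mixture."""
    hit = (math.exp(-0.5 * ((reading - true_distance) / SIGMA) ** 2)
           / (SIGMA * math.sqrt(2 * math.pi)))
    rand = 1.0 / MAX_RANGE     # spurious value, uniform over the range
    return W_HIT * hit + W_RAND * rand

# A reading near the true distance is far more likely than a spurious one:
print(likelihood(40.0, 40.0) > likelihood(75.0, 40.0))   # True
```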
A sonar sensor is slightly more complex still, since it has a wide cone of detection. We can use a sensor model like that shown below to characterize how likely a box is filled, given a particular reading. The actual application of sensor readings into our map varies widely, due to the various ways of localizing the robot.
Figure 4 - Sensor model for sonar ranging sensor. Because the beam spreads out, even our histogram models are more complex.
We've now reached the conundrum that makes SLAM so difficult: before we can apply our sensor measurements to our map, we need to know our location -- but if our map is incomplete or barely started, how can we localize ourselves?
Given a known map, an estimate of our location, and sensor readings, we should be able to localize ourselves with some probability of being correct. Clearly, the more sensor data we have, the more reliable our localization will be. The typical approaches entail either Kalman filters or particle filters. I'll discuss particle filters here, since I believe they are more accessible to the general public.
The idea of a particle filter is actually quite simple. Each "particle" is a guess at the robot's pose. Each time we localize, we compute how likely each particle is to be the true pose. Typically, the particles with the lowest probability of correctness are replaced by new guesses. Even if our particles start out spread throughout the map, over many iterations they will typically converge very closely on our real position. We may use either the best particle, or the average particle position, as our estimate of the actual pose. A real system might have hundreds or thousands of particles, so computation requirements are quite high.
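The following toy 1-D particle filter illustrates the idea; the true position, noise levels, and weighting function are all made-up assumptions:

```python
import math
import random
random.seed(0)

def weight(particle, reading, sigma=1.0):
    """How well does this particle's position explain the range reading?"""
    return math.exp(-0.5 * ((particle - reading) / sigma) ** 2)

true_position = 7.0
particles = [random.uniform(0.0, 10.0) for _ in range(200)]  # spread everywhere

for _ in range(10):
    reading = true_position + random.gauss(0.0, 0.5)   # noisy sensor
    weights = [weight(p, reading) for p in particles]
    # Resample: likely particles get duplicated, unlikely ones die out.
    particles = random.choices(particles, weights=weights, k=len(particles))
    # Jitter so duplicated particles spread back out a little.
    particles = [p + random.gauss(0.0, 0.1) for p in particles]

estimate = sum(particles) / len(particles)   # average particle as our pose
print(round(estimate, 1))                    # close to 7.0
```

After a handful of iterations the cloud of guesses collapses around the true position, mirroring the convergence described above.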
Figure 5 - Particle filter example. As time moves on and we accumulate more data, along with the history of what we've passed, we narrow in on our position. Note how at T=5 we've narrowed down to a few candidate spots.
There are further issues with localization: encoders are not perfect. As our robot moves, it introduces further uncertainty about its location. If we know approximately how inaccurate our encoders are, we can use that knowledge to model the error of our control actions [typically as a bivariate Gaussian or similar distribution].
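A simple odometry error model adds zero-mean Gaussian noise to each control action; the noise magnitudes below are assumptions you would replace with measured encoder error:

```python
import math
import random
random.seed(1)

TRANS_SIGMA = 0.02   # translation noise per move, in meters (assumed)
ROT_SIGMA = 0.01     # heading noise per move, in radians (assumed)

def noisy_move(pose, forward, turn):
    """Dead-reckon a move, adding Gaussian error to distance and heading."""
    x, y, theta = pose
    theta += turn + random.gauss(0.0, ROT_SIGMA)
    d = forward + random.gauss(0.0, TRANS_SIGMA)
    return (x + d * math.cos(theta), y + d * math.sin(theta), theta)

pose = (0.0, 0.0, 0.0)
for _ in range(10):
    pose = noisy_move(pose, 1.0, 0.0)   # "drive straight" ten times
print(pose)   # drifts away from the ideal (10, 0, 0)
```

The small per-step errors compound: after ten "straight" moves the dead-reckoned pose has drifted, which is exactly why we must re-localize with sensors.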
Given a metric map and a pose estimate, simple methods such as A* or wavefront planning can be used to compute where to go. Typically we will still use our sensors for obstacle avoidance, in case something wasn't captured in our map (such as your dog Fluffy, or a small child). You will also typically replan the route quite frequently (such as every time you localize).
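Wavefront planning can be sketched as a breadth-first flood fill outward from the goal, followed by descending the resulting distance values from the start; the grid below is a made-up example (1 = obstacle, 0 = free):

```python
from collections import deque

def wavefront(grid, goal):
    """Distance-from-goal for every reachable free cell, via BFS flood fill."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def path(dist, start):
    """From start, repeatedly step to the neighbor closest to the goal."""
    route = [start]
    while dist[route[-1]] > 0:
        r, c = route[-1]
        route.append(min(((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)),
                         key=lambda n: dist.get(n, float("inf"))))
    return route

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
dist = wavefront(grid, goal=(2, 0))
route = path(dist, start=(0, 0))
print(route)   # steps around the wall of obstacles to reach (2, 0)
```

Replanning after every localization step is then just a matter of re-running the flood fill from the (possibly updated) goal cell.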
This tutorial briefly introduced the main concepts of Mapping, Localization, and Navigation. While we mainly discussed metric 2D approaches, there is research out there using dozens of different map styles. 3D mapping is also a field of research. Additionally, there are a number of groups working to use stereo vision for 3D occupancy grid mapping, as well as single cameras for feature detection in topological maps. The sky really is the limit, and this is still very much an open research question.
Further Reading
Robotic Mapping: A Survey - a paper of particular interest by Sebastian Thrun.
Introduction to AI Robotics - by Robin Murphy, has very good (and very readable) coverage of mapping with occupancy grids, as well as lighter-weight methods like HIMM (histogramic in-motion mapping).
Probabilistic Robotics - by Thrun et al., is probably the definitive book on probabilistic navigation, but it's not for the faint of heart. Sebastian Thrun also has numerous other papers on his website.