Theory behind stereoscopic camera vision(?)



jrowe47
02-06-2008, 12:03 AM
I've been looking at how robotic stereo vision works, and I think I've got the hang of it.

The eyes are a fixed, known distance apart. The incoming images are analyzed for similarities, and then one image is superimposed on the other. The offset between a matched point in one image and the same point in the other gives you the angles of a triangle whose base is the known distance between the eyes, so you can triangulate the distance to the object in focus. The art lies in recognizing the similarities, from what I understand.
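For a rectified pair (both cameras looking straight ahead), that triangulation collapses to a one-line formula. A minimal Python sketch, with purely illustrative numbers for the baseline, focal length, and pixel offset:

```python
# Depth from disparity for a rectified, parallel stereo pair:
# Z = f * B / d, where f is the focal length in pixels, B is the
# camera separation, and d is the horizontal shift (disparity) of
# the same feature between the two images.

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    return focal_px * baseline_m / disparity_px

# Example: cameras 6 cm apart, ~700 px focal length, 20 px disparity
print(depth_from_disparity(700, 0.06, 20))  # ~2.1 m
```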

I think that a neural net would certainly allow you to optimize border recognition, and that certain levels of camera contrast would allow greater precision. The idea, I think, is to map the immediate environment in 3D from the robot's perspective. If it has already encountered an object, it should already have a model of it in its database, and could tag a certain area of the screen as "object_Beer_Bottle047". Instead of actively scanning the bottle itself, it removes that area from active analysis (skips over the particular overlaps corresponding with the known object) until either the bottle is removed from the vicinity or is changed enough to warrant a new situation (such as falling off a table and being broken).
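The "skip known objects" part could be as simple as masking out regions that have already been identified, so the stereo pass only touches new or changed areas. A rough sketch; the object name and bounding box here are hypothetical:

```python
# Mask out screen regions already tagged as known objects so the
# stereo matcher only analyzes what's left.
import numpy as np

tracked = {"object_Beer_Bottle047": (120, 80, 60, 140)}  # x, y, w, h

def active_mask(shape, objects):
    """True where the matcher should still look."""
    mask = np.ones(shape, dtype=bool)
    for x, y, w, h in objects.values():
        mask[y:y + h, x:x + w] = False  # frozen: a known object lives here
    return mask
```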

So, a recap... two images are fed to the analyzing program. One is held static, and the other is shifted until a similarity can be aligned. The program places both side by side, does a first-pass 'find similarities' routine, and then notes the position of the similarities on the fixed image. The mobile picture is superimposed onto the static image pixel by pixel, each pixel of shift representing a level of 3D resolution. When similarities are aligned, the positions are noted, providing an x,y,z coordinate relative to the bot's eyes. When similarities share certain parameters (light conditions, color, position, shared shadow, etc.), the coordinates noted during the stereo analysis are combined to form a 3D model of the perceived object.
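That shift-and-compare pass is essentially what off-the-shelf block matchers do. A minimal sketch using OpenCV's built-in one (the file names are placeholders, and this assumes the cameras are calibrated and the images rectified):

```python
# Block-matching stereo: compare small windows of the static image
# against horizontally shifted windows of the other image; the best
# match at each pixel gives that pixel's disparity.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right) / 16.0  # fixed-point -> pixels

# Each disparity then maps to depth via z = f * B / d, giving the
# x,y,z coordinates relative to the bot's eyes.
```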

Using more cameras would increase recognition of borders and increase depth resolution, I think, by providing more angles of analysis. 4 cameras would provide 6 pairings, 5 would give 10 (see the quick check below). I'm sure at some point the power, processing, and cost requirements make it pretty unworkable to include lots of 'eyes,' but on the other hand, with a bunch of cheap cameras, you could get really good resolution for realtime stereoscopic vision. Also, having more than 2 opens up the possibility of multiple focus points, since a bot has no inherent 'train of thought' limitation. It can handle multicameral thought as easily as the programmer can create a thread.
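The pair count is just n choose 2, i.e. n*(n-1)/2. A one-liner to check:

```python
# n cameras give n*(n-1)/2 distinct stereo pairs.
from itertools import combinations
for n in (2, 3, 4, 5):
    print(n, len(list(combinations(range(n), 2))))  # 1, 3, 6, 10
```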

Anyway, 3D stereo vision is limited mainly by processor speed and camera resolution, and high resolution can be approximated by using multiple low-res cameras, which has the added benefit of increased z resolution. I know there have to be really, really cheap cameras out there, because they are ubiquitous in cell phones, and I can buy a webcam for $10 at TigerDirect.

So machine vision, in theory, shouldn't be too great a hurdle. Obviously the 1" resolution at up to 6 kilometers isn't gonna be within reasonable bot expectations (unless the army supply truck tips on the highway near my house and I happen to be driving by. eh heh.) I think a .5 cm resolution at up to 40 feet should be very reasonable, and the closer something gets, the better the detail. Also, the cameras don't have to be the same resolution, they just have to have a known position and the images have to be the same size.
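For what it's worth, there's a standard back-of-the-envelope formula for depth resolution: dZ ≈ Z² · Δd / (f · B), so precision falls off with the square of distance and improves with baseline, focal length, and sub-pixel matching. A quick check with made-up webcam numbers:

```python
# Rough depth-resolution estimate for rectified stereo:
# dZ ~ Z^2 * dd / (f * B). All numbers below are illustrative.
def depth_resolution(z_m, focal_px, baseline_m, disp_step_px=1.0):
    return z_m ** 2 * disp_step_px / (focal_px * baseline_m)

# e.g. cheap 640x480 webcams (f ~ 700 px) spaced 10 cm apart, at 3 m:
print(depth_resolution(3.0, 700, 0.10))  # ~0.13 m per pixel of disparity
```

Hitting finer resolution at longer ranges means widening the baseline, using longer focal lengths, or matching to a fraction of a pixel.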

wind27382
02-06-2008, 08:07 AM
It would be nice to see someone actually put a demo together.

Alex
02-06-2008, 10:21 AM
Mottors (http://forums.trossenrobotics.com/member.php?u=1533) has done a ton of work with stereoscopic vision and has documented quite a bit of it on his blog (http://streebgreebling.blogspot.com/).

I just ran a quick search on his blog.

One for "vision"
http://streebgreebling.blogspot.com/search?q=vision

One for "stereoscopic"
http://streebgreebling.blogspot.com/search?q=stereoscopic

I know it's not a straight-up demo, but he does an incredible job of explaining a lot of different things in all his posts about it :)

robot maker
07-19-2008, 07:30 PM
Here is a site I found on stereoscopic vision where somebody did a lot of work on it:
Rodney the Robot http://sluggish.uni.cc/rodney/vision.htm

robot maker
07-19-2008, 07:42 PM
Also, linuxguy, there is a Linux version with code, and a full Windows version in C++ 6.0.

robot maker
07-19-2008, 08:12 PM
Also from the same website as Rodney the Robot: Sentience, 3D and stereo vision, and a lot of info.

http://code.google.com/p/sentience/




Matt
07-19-2008, 11:52 PM
All those links are Bob Motters' :) PM him to have him chime in here. He's done great stuff in this area.

robot maker
07-20-2008, 02:00 PM
Yes, those are his sites. Love the work he has done.



ScuD
07-20-2008, 02:17 PM
If you use the cameras simply for "range detection", wouldn't it be simpler to have the cameras move to have a variable angle between them?

Do a comparison between the two images, reduce/increase the angle until the images are more than 90% alike, then take that angle and calculate distance?

There would still be differences given the perspective, dissimilarities between the cameras, lighting, etc., so you'd never get a 100% match, but wouldn't it still be a viable way of getting some sort of ranging?

I say this because the human eyes do move 'towards each other' to get things into focus, and I feel this is the way we judge range. After a few feet the difference in angle between our eyes is negligible, thus making it harder to judge distance, but why would we need that when our arms can't reach it?
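If the geometry helps, the toe-in angle maps to range with one tangent. A sketch (not from any of the linked projects; the numbers are illustrative):

```python
# Verging cameras: each camera is turned inward by `toe_in_rad`
# radians from parallel. With baseline B, the optical axes cross at
# Z = (B / 2) / tan(toe_in), which is the range once the images align.
import math

def range_from_vergence(baseline_m, toe_in_rad):
    return (baseline_m / 2.0) / math.tan(toe_in_rad)

# e.g. cameras 10 cm apart, each turned inward by 2 degrees:
print(range_from_vergence(0.10, math.radians(2.0)))  # ~1.43 m
```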

In my view it's a lot easier to do a boolean comparison of each pixel value and then divide the number of passed comparisons by the total pixel count than to actually process the entire image and then find that piece of information in the next image once processed.
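That comparison could look something like this (a tolerance stands in for strict equality, since as noted you'd never get a 100% match):

```python
# Fraction of pixels whose values differ by at most `tolerance`.
import numpy as np

def similarity(img_a, img_b, tolerance=10):
    diff = np.abs(img_a.astype(np.int16) - img_b.astype(np.int16))
    return float(np.mean(diff <= tolerance))

# Sweep the camera angle and keep the pose where similarity > 0.9.
```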

However, that still leaves out anything close to object recognition, but that's waaaaay too advanced for me :happy:

Adrenalynn
07-20-2008, 02:28 PM
My eyes don't move closer to each other; they only change their angle relative to the center point. The angle of incidence.

I can also tell the difference between 200 and 250 feet. I can see that an object 200 feet away is closer than an object 250 feet away. Not with as much precision, but that's due to the spacing between the eyes and parallax.

ScuD
07-20-2008, 02:32 PM
I'm a native Dutch speaker, cut me some slack :p

metaform3d
07-20-2008, 03:48 PM
Stereoscopy can ideally process the entire visual field at once, giving you a map of the approximate distance at each point. There are complexities like depth of field and occlusion, but in general the noisy real-world environment actually helps make the processing easier. It works better when there are lots of textures to perform local cross-correlations over.
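By local cross-correlation I mean something like normalized cross-correlation (NCC) over small patches. A tiny sketch:

```python
# Normalized cross-correlation between two equal-sized image patches:
# 1.0 means a perfect match, values near 0 mean no correlation.
import numpy as np

def ncc(patch_a, patch_b):
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0
```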

The same thing can be done with a moving camera, although it has challenges with moving objects. I have birds, and when they are getting ready to fly someplace they bob their heads up and down. The two head positions give them different views and let them range their destination.

JonHylands
07-20-2008, 03:57 PM
If you use the cameras simply for "range detection", wouldn't it be simpler to have the cameras move to have a variable angle between them?

Do a comparison between the two images, reduce/increase the angle until the images are more than 90% alike, then take that angle and calculate distance?

There would still be differences given the perspective, dissimilarities between the cameras, lighting, etc., so you'd never get a 100% match, but wouldn't it still be a viable way of getting some sort of ranging?

That's pretty much what I described on my MicroRaptor vision page (http://www.bioloid.info/tiki/tiki-index.php?page=MicroRaptor+Vision), about a year and a half ago. I think this is one of the important keys to doing object recognition (being able to isolate the object in your field of view)...

ScuD
07-20-2008, 04:39 PM
Good point, hadn't thought of it in that way (isolating objects). Have you had any further progress on that part?

It's things like these that make me wish I had more high-level programming abilities.
So much to learn, so little time..

JonHylands
07-20-2008, 04:42 PM
No, I really haven't even started on the vision part. It's a huge thing, and I want to have a viable platform working first.