Using Pocketsphinx with ROS



Pi Robot
04-29-2011, 06:21 PM
I have a few questions about pocketsphinx that so far I can't find simple answers to:

1. When designing a corpus, does it improve recognition if you put whole phrases on a line instead of individual words? For example, we might want to recognize the four phrases "move forward", "move backward", "move left", "move right" as well as the same four phrases with the word "go" substituted for "move". So should the corpus be:

move forward
move backward
move left
move right
go forward
go backward
go left
go right

or can I simply use:

move
go
forward
backward
left
right

2. I used to create a lot of grammars when working with SAPI 5 under Windows. This greatly improved recognition accuracy and allowed for recursive definitions of terms that could act like variables, like "go X feet forward" where X would be a list of number terms. Can something similar be done with pocketsphinx?

3. How does recognition performance relate to the number of words/phrases in the corpus? If I want some of the phrases to pertain to navigation and others to, say, arm movements, is it better to segment these out into different files and load/unload them as needed? Or can they all just live in one big corpus?

Thanks!
patrick

lnxfergy
04-29-2011, 06:38 PM
Patrick,

I split this off into its own thread, since I can see us getting quite off track, and I'm sure this will be more useful here as a standalone thread.

1. It would appear that we get better results by including phrases -- the reason is that your phrases get compiled down into a probabilistic language model -- thus, "move forward" will be more likely than "forward forward" or "move move" if you put in whole phrases.

2. I believe you can specify a grammar, but I haven't ever used that form. If you do go this route, let me know if we need any API changes to the ROS drivers (I think you would need to specify a different set of files than the lm/dict).

3. Obviously, the larger the vocabulary, the lower the overall accuracy. That said, we have played with much larger vocabularies than are found in the demos with good results. I would think you should have no trouble having both vocabs in a single file.
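
To illustrate answer 1, here is a minimal Python sketch (not the lmtool's actual algorithm, which also applies smoothing and backoff) that counts adjacent word pairs in the two candidate corpora from the first post. The function name bigram_counts is made up for this example. With whole phrases, the pair "move forward" is observed; with one word per line, no word pair is ever observed, so the counts give the model nothing that favors "move forward" over "forward forward".

```python
from collections import Counter


def bigram_counts(corpus_lines):
    """Count adjacent word pairs per corpus line, with start/end markers."""
    counts = Counter()
    for line in corpus_lines:
        words = ["<s>"] + line.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


# The two corpora proposed in the first post.
phrase_corpus = [
    "move forward", "move backward", "move left", "move right",
    "go forward", "go backward", "go left", "go right",
]
word_corpus = ["move", "go", "forward", "backward", "left", "right"]

phrase_bigrams = bigram_counts(phrase_corpus)
word_bigrams = bigram_counts(word_corpus)

print(phrase_bigrams[("move", "forward")])     # 1: seen in the phrase corpus
print(word_bigrams[("move", "forward")])       # 0: never seen with single words
print(phrase_bigrams[("forward", "forward")])  # 0: not a phrase in the corpus
```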

-Fergs

Pi Robot
04-29-2011, 08:18 PM
Good idea to start a new thread--and thanks for the answers. If I figure out the grammar stuff, I'll post back to the thread.

--patrick

Pi Robot
04-30-2011, 09:47 PM
Hey Fergs,

So this is what I have found out about grammars (JSGF) and pocketsphinx. As far as I can tell, the basic process is that you convert your grammar file into a list of words as described below, then generate lm and dic files using the online lmtool as usual. The example everyone uses is for ordering a pizza. Here is a simple grammar file:


#JSGF V1.0;

grammar pizza;

public <startPizza> = i want to order a <size> pizza with <topping>;

<size> = small | medium | large;

<topping> = pepperoni | mushrooms | anchovies;

Say we have this in a file called pizza.jsgf. The next step is to convert this into fsg format with a tool that comes with the sphinxbase-utils Debian package:


$ sphinx_jsgf2fsg < pizza.jsgf > pizza.fsg

The fsg file is then converted to a word list with a perl script called fsg2wlist.pl that I found in this obscure location (http://www.speech.cs.cmu.edu/15-492/assignments/hw1/hw1_data.zip) and looks like this:


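#!/usr/bin/perl -w
use strict;

# Print the word label of every TRANSITION line that carries one
# (null transitions have no word field and are skipped).
while (<>) {
    chomp;
    if (/^TRANSITION \S+ \S+ \S+ (\S+)$/) {
        print "$1\n";
    }
}

So we run: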


$ fsg2wlist.pl < pizza.fsg > pizza.words

Finally, the pizza.words file is used with the online lmtool in the same way as a regular corpus file. The resulting lm and dic files are downloaded and the rest is the same as what your pocketsphinx package now does.
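
For anyone who would rather keep everything in Python alongside the ROS nodes, the word-list step can be sketched as a direct port of the Perl script. This is only a sketch: it reuses the same regular expression, the function name fsg_to_wordlist is made up, and the sample fsg text is a hand-written illustration rather than real sphinx_jsgf2fsg output.

```python
import re

# Same pattern as fsg2wlist.pl: TRANSITION <from> <to> <prob> <word>
TRANSITION_RE = re.compile(r"^TRANSITION \S+ \S+ \S+ (\S+)$")


def fsg_to_wordlist(fsg_lines):
    """Return the word label of every TRANSITION line that has one."""
    words = []
    for line in fsg_lines:
        match = TRANSITION_RE.match(line.rstrip("\n"))
        if match:
            words.append(match.group(1))
    return words


# Hand-written sample for illustration only (assumed layout).
sample_fsg = """FSG_BEGIN <pizza.startPizza>
NUM_STATES 3
START_STATE 0
FINAL_STATE 2
TRANSITION 0 1 1.000000 small
TRANSITION 1 2 1.000000
TRANSITION 1 2 0.333333 pepperoni
FSG_END
"""

for word in fsg_to_wordlist(sample_fsg.splitlines()):
    print(word)
```

Like the Perl version, this prints a word once per transition, so repeated words are not removed.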

The latest version of pocketsphinx_continuous can be run with a "-jsgf filename" parameter together with a "-dict filename" parameter while omitting the "-lm filename" parameter, like this:


pocketsphinx_continuous -jsgf pizza.jsgf -hmm /usr/share/pocketsphinx/model/hmm/wsj1 -dict pizza.dic

but when I add a "jsgf" property in your recognizer.py script, I get the error:

TypeError: object of type `GstPocketSphinx' does not have property `jsgf'

So I guess GstPocketSphinx has different parameters.

Not sure if you can make use of any of this?

--patrick

lnxfergy
04-30-2011, 09:52 PM
Interesting, so you can actually use just lm/dic files if you have run through all the steps to convert the grammar into a language model?

You mention "the latest version"... that may be the issue. For ease, we're using the pocketsphinx version that can be installed through debs... which I presume is a revision or so old (I want to say that the installed version is 0.5.0, and the latest in SVN is 0.6.0, but I'm not entirely sure).

-Fergs

Pi Robot
04-30-2011, 10:13 PM
> Interesting, so you can actually use just lm/dic files if you have run through all the steps to convert the grammar into a language model?


Yeah, I just tried the following grammar file:


#JSGF V1.0;

grammar navigation;

public <navCommand> = <command> <direction>;

<command> = move | go | turn;

<direction> = forward | backward | left | right;

<stop> = stop | halt | abort;

<faster> = faster | speed up;

<slower> = slower | slow down;

then went through the conversion steps and fired up recognizer.py with the resulting lm and dic files and it worked perfectly.



> You mention "the latest version"... that may be the issue. For ease, we're using the pocketsphinx version that can be installed through debs... which I presume is certainly a revision or so old (I want to say that the installed version is 0.5.0, and the latest in SVN is 0.6.0, but I'm not entirely sure).


I'm using only debs on Ubuntu 10.04. The versions I have are:

gstreamer0.10-pocketsphinx: 0.5.1
pocketsphinx-utils: 0.5.1 (contains pocketsphinx_continuous command line utility)

--p

Pi Robot
04-30-2011, 10:54 PM
First a correction to the sample navigation grammar I posted above. It should be:


#JSGF V1.0;

grammar navigation;

public <navCommand> = <command> <direction> | <stop> | <faster> | <slower>;

<command> = move | go | turn;

<direction> = forward | backward | left | right;

<stop> = stop | halt | abort;

<faster> = faster | speed up;

<slower> = slower | slow down;

(I did not have the <stop> | <faster> | <slower> rules on the main <navCommand> line.)
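
Since whole phrases in the corpus reportedly give a better language model than isolated words, one option is to hand the lmtool every sentence this grammar accepts instead of a bare word list. A minimal sketch, with the rules hard-coded as Python lists rather than parsed from the .jsgf file (names such as expand_nav_grammar are hypothetical):

```python
from itertools import product

# Rules copied by hand from the navigation grammar.
COMMANDS = ["move", "go", "turn"]
DIRECTIONS = ["forward", "backward", "left", "right"]
STOP = ["stop", "halt", "abort"]
FASTER = ["faster", "speed up"]
SLOWER = ["slower", "slow down"]


def expand_nav_grammar():
    """List every phrase accepted by the public <navCommand> rule."""
    phrases = [" ".join(pair) for pair in product(COMMANDS, DIRECTIONS)]
    phrases += STOP + FASTER + SLOWER
    return phrases


if __name__ == "__main__":
    # One phrase per line, ready to save as a corpus file.
    for phrase in expand_nav_grammar():
        print(phrase)
```

This only works for small, non-recursive grammars; a rule like "go X feet forward" with many number terms would expand into a much longer list.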

Second, it is clear that although the grammar generates the correct lm and dic files following the procedure described in my earlier post, recognizer.py does not actually enforce the grammar--as we would expect since it is not currently making use of it. In contrast, the pocketsphinx_continuous command line utility *does* enforce the grammar structure when given the "-jsgf filename" parameter. For example, saying "move" does not generate a recognized response whereas "move forward" does, which is what we would like. In the current form of recognizer.py, just saying "move" on its own generates a response. (But of course, your if-then statements in voice_cmd_vel.py take care of this.)

Finally, I don't yet know how to get back the rule names that are recognized, rather than the actual words. For example, if I say "halt", I want the rule name <stop> to be returned (though there is no harm in knowing that "halt" was actually said.)
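
Until the recognizer can report rule names, one workaround is to look up the recognized text on the ROS side. A minimal sketch, assuming the recognizer hands back the plain recognized string; the rule table is copied by hand from the grammar above, and match_rule is a hypothetical helper, not part of pocketsphinx. It also rejects text that falls outside the grammar, such as "move" on its own.

```python
# Single-rule alternatives, keyed by rule name.
RULES = {
    "stop": ["stop", "halt", "abort"],
    "faster": ["faster", "speed up"],
    "slower": ["slower", "slow down"],
}
COMMANDS = {"move", "go", "turn"}
DIRECTIONS = {"forward", "backward", "left", "right"}


def match_rule(utterance):
    """Return (rule_name, normalized_text), or None if outside the grammar."""
    # Normalize case and whitespace; the lmtool dictionary is UPPERCASE.
    text = " ".join(utterance.lower().split())
    for rule, alternatives in RULES.items():
        if text in alternatives:
            return rule, text
    words = text.split()
    if len(words) == 2 and words[0] in COMMANDS and words[1] in DIRECTIONS:
        return "navCommand", text
    return None


print(match_rule("HALT"))          # ('stop', 'halt')
print(match_rule("move forward"))  # ('navCommand', 'move forward')
print(match_rule("move"))          # None
```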

--patrick

lnxfergy
04-30-2011, 11:21 PM
Patrick, can you try "fsg" as the parameter name in the recognizer.py script? I see no jsgf when running gst-inspect, but I do see an fsg option.

-Fergs

Pi Robot
05-01-2011, 08:39 AM
Hey Fergs,

I tried the fsg parameter name in recognizer.py and pointed it to my .fsg file as created from the .jsgf file. After most of the recognizer.py startup messages display on the terminal, the script dies at this point:


INFO: fsg_model.c(534): FSG: 28 states, 16 unique words, 16 transitions (23 null)
INFO: fsg_model.c(177): Computing transitive closure for null transitions
INFO: fsg_model.c(228): 14 null transitions added
[recognizer-1] process has died [pid 16645, exit code -11].
log files: /home/patrick/.ros/log/a6b39888-7325-11e0-baf9-8c736e77238f/recognizer-1*.log
all processes on machine have died, roslaunch will exit
shutting down processing monitor...
... shutting down processing monitor complete
done

Not sure how to get more diagnostic info before it dies. (The recognizer-1*.log files pointed to above don't have any useful info in them.)

--patrick

Pi Robot
05-01-2011, 09:27 AM
A couple of other items to add to the wish list:


- return the confidence value along with the result (so we can reject low confidence results)
- return the rule or tag name from the fsg grammar instead of or in addition to the recognized word(s)

The command line tool pocketsphinx_continuous does not seem to output either of these so I don't know if this is possible. But I wonder what is the point of a grammar file without being able to know which rule or tag got fired?

--patrick

Pi Robot
05-04-2011, 08:45 AM
> I tried the fsg parameter name in recognizer.py and pointed it to my .fsg file as created from the .jsgf file. After most of the recognizer.py startup messages display on the terminal, the script dies at this point:
>
> INFO: fsg_model.c(534): FSG: 28 states, 16 unique words, 16 transitions (23 null)
> etc...

OK, we are one step closer to a solution for using grammars: I installed the latest sphinxbase-0.7 and pocketsphinx-0.7 from source (http://cmusphinx.sourceforge.net/wiki/download/) and got the same error. Running outside of ROS using pocketsphinx_continuous, I found there is a segmentation fault when loading the fsg grammar:


$ pocketsphinx_continuous -lm nav_test.lm -dict nav_test.dic -fsg nav_test.fsg
...
INFO: fsg_search.c(145): FSG(beam: -1080, pbeam: -1080, wbeam: -634; wip: -26, pip: 0)
INFO: fsg_model.c(678): FSG: 47 states, 17 unique words, 25 transitions (42 null)
INFO: fsg_model.c(213): Computing transitive closure for null transitions
INFO: fsg_model.c(264): 47 null transitions added
ERROR: "fsg_search.c", line 332: The word 'turn' is missing in the dictionary

BTW, the word 'turn' is clearly in the dictionary. The contents of nav_test.dic are:


ABORT AH B AO R T
BACKWARD B AE K W ER D
COME K AH M
DOWN D AW N
FASTER F AE S T ER
FORWARD F AO R W ER D
GO G OW
HALT HH AO L T
LEFT L EH F T
MOVE M UW V
RIGHT R AY T
SLOW S L OW
SLOWER S L OW ER
SPEED S P IY D
STOP S T AA P
TURN T ER N
UP AH P

So then I tried the fsg-model parameter instead of fsg:

$ pocketsphinx_continuous -lm nav_test.lm -dict nav_test.dic -fsg-model nav_test.fsg

and it works like a charm! Alas, when I tried this in recognizer.py:


try:
    fsg_ = rospy.get_param('~fsg')
except:
    rospy.logerr('Please specify a grammar file')
    return
asr.set_property('lm',lm_)
asr.set_property('dict',dict_)
asr.set_property('fsg-model',fsg_)
and launch with:


<launch>
  <node name="recognizer" pkg="pi_speech" type="recognizer.py">
    <param name="lm" value="$(find pi_speech)/params/test_fsg/nav_test.lm"/>
    <param name="dict" value="$(find pi_speech)/params/test_fsg/nav_test.dic"/>
    <param name="fsg" value="$(find pi_speech)/params/test_fsg/nav_test.fsg"/>
  </node>
</launch>

I get the error:



asr.set_property('fsg-model',fsg_)
TypeError: could not convert argument to correct param type
Traceback (most recent call last):
Note that gst-inspect pocketsphinx shows that fsg-model is a valid parameter.

Any thoughts?

--patrick

lnxfergy
05-04-2011, 08:49 AM
I imagine you have to remove lm to make fsg work -- otherwise the language model would be used?

Next: fsg should be the filename; gst-inspect says fsg-model is an object:


fsg : Finite state grammar file
flags: readable, writable
String. Default: null Current: null
fsg-model : Finite state grammar object (fsg_model_t *)
flags: writable
Pointer. Write only


-Fergs

Pi Robot
05-04-2011, 09:25 AM
> I imagine you have to remove lm to make fsg work -- otherwise the language model would be used?

The pocketsphinx_continuous command line test seems to work better with the -lm parameter than without it. Also, I figured out why the -fsg parameter was causing the error "The word 'turn' is missing in the dictionary". It seems that the online lmtool puts all the dictionary words in file.dic and file.lm in UPPERCASE, while fsg_search.c seems to be looking for lowercase! When I convert all words in my .dic and .lm files to lowercase, I can go back to the -fsg parameter instead of -fsg-model on the pocketsphinx_continuous command line and everything works fine.
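
The lowercase conversion for the dictionary can be sketched in a few lines of Python. This is a sketch under an assumption: only the word column (the first field of each line) is lowercased, and the phone symbols are left untouched so they still match the acoustic model. The function names are made up for illustration, and the .lm file would need a similar pass over its word entries.

```python
import re


def lowercase_dic_word(line):
    """Lowercase the first whitespace-delimited field; keep the phones as-is."""
    return re.sub(r"^\S+", lambda m: m.group(0).lower(), line)


def lowercase_dic(lines):
    """Apply the conversion to every line of a dictionary file."""
    return [lowercase_dic_word(line) for line in lines]


# Two entries taken from nav_test.dic.
sample = ["TURN T ER N", "UP AH P"]
for entry in lowercase_dic(sample):
    print(entry)
```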


> Next: fsg should be the filename, fsg_model says it is an object:
>
> fsg : Finite state grammar file
> flags: readable, writable
> String. Default: null Current: null
> fsg-model : Finite state grammar object (fsg_model_t *)
> flags: writable
> Pointer. Write only


Yeah, now that I figured out the case-sensitive bug, the -fsg parameter works fine with the pocketsphinx_continuous command line test. However, going back to fsg instead of fsg-model in recognizer.py produces a segmentation fault on or right after line 264 of fsg_model.c:


INFO: fsg_model.c(678): FSG: 47 states, 17 unique words, 25 transitions (42 null)
INFO: fsg_model.c(213): Computing transitive closure for null transitions
INFO: fsg_model.c(264): 47 null transitions added
Segmentation fault

Line 264 reads:


E_INFO("%d null transitions added\n", n);

--patrick

Pi Robot
05-05-2011, 08:05 PM
Hey Fergs,

I wonder if you would consider adding a rosdep.yaml file to your pocketsphinx package. The reason I ask is that your gstreamer_pocketsphinx dependency did not get picked up when I installed the package on an Ubuntu 10.04LTS machine where the package is actually called:

gstreamer0.10-pocketsphinx

I realize you can't handle every OS but at least for Ubuntu it might be nice to have it customized. On the other hand, perhaps they have already changed the name in later versions of Ubuntu, in which case, never mind!

--patrick

lnxfergy
05-05-2011, 08:50 PM
rosdep.yaml files go under the stack (in this case, rharmony) -- and it already has that set up, I believe. We're using 10.04 on all our machines here and I was able to install using rosdep.

-Fergs

Pi Robot
05-06-2011, 08:51 AM
Thanks Fergs--figures I checked out only the pocketsphinx package on that one machine and therefore did not have the stack-level rosdep.yaml...

--patrick

lnxfergy
05-06-2011, 09:13 AM
> Thanks Fergs--figures I checked out only the pocketsphinx package on that one machine and therefore did not have the stack-level rosdep.yaml...
>
> --patrick

No problem -- if you look back through our SVN commit history you will see that it took us a couple hours in the lab to figure out we couldn't have a rosdep.yaml in a package before we moved it up to the stack.... so, it's not exactly a well known fact.

-Fergs