Tutorial
Perlbox Voice Application Framework
Using the Perlbox Voice libraries,
Perlbox Voice for Tk provides a transparent interface to several open source speech systems.
The goal of this project is to provide an easy to use, easy to configure application that connects
spoken human words to computer commands as well as connecting computer responses to spoken human words.
Perlbox-Voice allows the user to easily configure vocabularies that consist of "commands" and "responses".
The idea is that when the human says "command" (such as "web browser"), the computer will execute "response" (such as "mozilla").
Sphinx-2 Listening Agent
The Sphinx-2 listening agent was created by the
Sphinx Group at Carnegie Mellon University
and released under an open source license in order to stimulate the creation of speech-using tools and applications, and to advance the state of the art both in speech recognition and in related areas such as dialog systems and speech synthesis.
The purpose of releasing the source code publicly was to encourage
development of speech tools in common computing environments.
Festival Talking Agent
The Festival speech synthesis system is a general multi-lingual speech synthesis system. Festival was created by
the Centre for Speech Technology Research at the University of Edinburgh
and offers a full text to speech system with various APIs, as well as an environment for development and research of speech synthesis techniques.
How To Run Perlbox-Voice for Tk
Figure 0: The Splash Screen.
Step 1: Start the System
When Perlbox-Voice begins, the listening agent is not running. To start the listener, follow the two steps below.
-Click the icon labeled "Control" (see Figure 1)
-Click the button labeled "Start Listener"
To stop the listening agent, click the button labeled "Stop Listener".
Figure 1: The "Control" Pane.
The Listener takes a bit of time to wake. You will be informed when the Listener is up and ready to listen. At this point you are ready
to begin issuing commands for your computer to execute.
If you want the speaker to say something to you, enter some text into the text box and click the button labeled
"Speak this Text".
Step 2: The Computer's Vocabulary.
One of the major features of Perlbox Voice Application Framework is the simplification of the creation of new language models (vocabulary)
for the Sphinx 2 listener. In this section we will discuss the method by which you can easily create new vocabularies for use in Perlbox Voice for Tk.

Figure 2: The "Vocabulary" Pane.
The term vocabulary refers to the list of words and phrases that Perlbox-Voice understands, as well as the actions that it is to take upon hearing
them. For the purpose of this document, the term speech will refer to what you say and the term command will refer to
the action that the computer should take upon hearing that speech.
Speech
A speech can be any combination of the 127,000 words listed in the Perlbox-Voice modification of the CMU Pronouncing Dictionary.
This dictionary contains most nouns, verbs, adjectives and adverbs in the English language, as well as many common proper nouns and common words in non-English languages.
Command
A command is any instruction that your computer understands how to execute. Typically, this would be an application. The Perlbox-Voice application framework provides additional
functionality in this area by providing pseudo commands that will not be passed on to the operating system, but will be performed through the Perlbox-Voice framework. These pseudo commands
are listed below.
Meta Commands
Perlbox provides meta-commands for additional functionality over the standard commands. These commands will be processed by Perlbox instead of being passed
directly to the operating system. Currently, the 'say' meta-command is defined. For example, suppose you want your computer to respond when you say "Good morning".
Under the column "When you say", put "Good morning" (without the quotes) and put "say Good morning to you!" (without the quotes)
into the "Computer does" column. This special keyword 'say' in the "Computer does" column tells Perlbox to not pass this command to the shell and instead to speak it.
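The rule described above can be pictured as a small dispatch step. The following is only an illustrative sketch in Python, not the Perl code that Perlbox actually uses; the function name classify_action and the "say"/"shell" labels are made up for this example.

```python
def classify_action(action):
    """Split a "Computer does" entry into (kind, payload).

    Hypothetical helper: entries whose first word is the keyword 'say'
    are spoken, everything else is treated as a shell command.
    """
    words = action.strip().split(None, 1)
    if words and words[0] == "say":
        # Everything after the keyword is the text to speak.
        text = words[1] if len(words) > 1 else ""
        return ("say", text)
    return ("shell", action.strip())


print(classify_action("say Good morning to you!"))
print(classify_action("mozilla"))
```

Note that only the whole word 'say' at the start of the entry counts as the keyword in this sketch, so a command such as "sayonara" would still be passed to the shell.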
Perlbox releases after 0.8.0 have enhanced meta-commands through the use of the 'backtick' or 'backquote' operator (it looks like `, shares a key with ~ and is NOT a single quote).
This allows you to put the result of a shell command into the output speech.
You say | Computer does | Results
list home | say Listing is `ls ~` | The speaker says "Listing is ..." then the output of `ls ~`
fortune | say Todays fortune is `fortune` | The speaker says "Todays fortune is ..." then the output of the fortune command.
date | say Today is `date +"%A %B %e %Y"` | The speaker says something like "Today is Sunday January 23 2005"
Essentially, when Perlbox detects a command enclosed within backquotes, that command is sent to the shell. The output from the command is then
put back into the say string in place of the backquoted command.
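The substitution described above can be sketched in a few lines. This is a minimal illustration in Python under stated assumptions, not Perlbox's own Perl implementation; the names run_shell and expand_backquotes are hypothetical, and the shell runner is passed in as a parameter so the substitution logic can be tried without executing anything.

```python
import re
import subprocess


def run_shell(command):
    """Run command through the shell and return its output,
    with surrounding whitespace removed."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


def expand_backquotes(text, run=run_shell):
    """Replace each `command` span in text with the output of run(command).

    Text outside backquotes is left untouched.
    """
    return re.sub(r"`([^`]*)`", lambda m: run(m.group(1)), text)
```

Calling expand_backquotes on the text after the 'say' keyword, for example "Today is `date`", would produce the string that is handed to the speaker. Because the backquoted text goes to the shell unchanged, only put commands you trust into the vocabulary.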
Reloading the Current Vocabulary
If you wish to reset the vocabulary (perhaps you deleted some fields that you did not intend to) you can reload the current vocabulary
by clicking the button labeled "Reset Fields". This will reset the value only as far back as the last time you entered this pane or selected
"Apply Changes".
Adding an Entry
To add an entry, type the speech that you will say into the text box labeled "When You Say". Then type the command that you want the computer to execute into the text box labeled "Computer Does".
If you are satisfied, click the button labeled "Add Entry". The new entry should appear in the table above.
Deleting an Entry
If you want to remove an entry from the table, highlight this entry by clicking on it, then click the button labeled "Delete Entry".
Creating Your New Vocabulary
Make sure that the table is correct. No attempt will be made by Perlbox-Voice to verify the correctness of the commands in your new vocabulary.
When you are satisfied, simply click the button labeled "Apply Changes". A new vocabulary will be created and made ready for use.
Step 3: The Configuration Tab.
The configuration tab currently offers a small number of options. We will discuss each of these options below.

Figure 3: The "Configuration" Pane.
Sound Response
Sound response refers to the Festival speech synthesizer. For the purpose of this document, we will refer to this agent as "the Talker".
The Talker will inform you of various pieces of information, pertaining primarily to the state and actions of the Listener and Perlbox-Voice in general.
With the slider at the top of this pane, you can set the verbosity level: the higher it is, the more the Talker will chatter to you.
After you have had a look around the application, 5 is probably a good setting.
Be careful, however, if you are running other sound applications and your sound card does not support duplexing modes: the Talker will
simply wait until the sound card is free and then say everything that it had to say in one long stream.
Browser for Help Documents
Perlbox Voice for Tk uses HTML format for help documentation, such as this document. This field gives you the ability to change the default browser
used to open help documentation. Common values for this field include: mozilla, netscape, galeon, firebird, epiphany, konqueror and opera.
Desktop Plugin
Perlbox Voice provides for desktop plugins, which give you direct access to control your favorite desktop through voice commands.
By default, no desktop plugin is enabled, and only one desktop plugin can be enabled at a time. To enable a plugin, click on the option
menu under the text "Desktop Plugins", select the plugin you want to load, and click the button labeled "Apply Changes".
A new language model will be created and the listener will be restarted. For more information on how to use your favorite desktop plugin, click on the link below.
Available Desktop Plugins:
KDE Desktop
Magic Words
This is a new feature, suggested by users. It allows the listener to operate in two modes: continuous mode or guarded mode. When "Use Magic Word" is turned off,
the listener will try to interpret everything it hears as a command. This can be a problem in noisy environments. When "Use Magic Word" is turned on, the
listener will listen first for the magic word. When the magic word is heard, it will run in continuous mode for about 5 seconds and then return to guarded mode.
The magic word can be one word, a phrase, or a complete sentence.
For instance, if your magic word is set to "Listen to me", the Perlbox-Voice speech recognizer will wait for the magic word "Listen to me" before it tries to
interpret any voice commands. When the voice recognizer hears the words "Listen to me", it will try to interpret everything it hears for about the next five seconds.
Consider the following examples for the default vocabulary:
1. "Listen to me" (1 to 5 second pause) "web"
2. "Listen to me web"
3. "Listen to me, I want the web site done by tomorrow"
4. "Listen to me" (over 5 second pause) "web"
The first two examples will open mozilla. The third and fourth will not.
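The four examples above can be reproduced with a small timing gate. This is only a sketch in Python of how guarded mode might behave, not the actual Perlbox code; the class MagicWordGate, the helper respond, the exact 5.0 second window and the whole-phrase vocabulary lookup are all assumptions made for illustration.

```python
class MagicWordGate:
    """Guarded-mode filter: pass speech on only shortly after the magic word."""

    def __init__(self, magic_word, window=5.0):
        self.magic_word = magic_word.lower()
        self.window = window      # seconds of continuous mode (assumed value)
        self.open_until = None    # time at which the gate closes again

    def hear(self, utterance, now):
        """Return the text to interpret as a command, or None if it is ignored."""
        text = utterance.lower().replace(",", "").strip()
        if text == self.magic_word or text.startswith(self.magic_word + " "):
            # Magic word heard: open the gate and keep whatever follows it.
            self.open_until = now + self.window
            rest = text[len(self.magic_word):].strip()
            return rest or None
        if self.open_until is not None and now <= self.open_until:
            return text
        return None


def respond(gate, vocabulary, utterance, now):
    """Look up the whole heard phrase in the vocabulary; None means no action."""
    heard = gate.hear(utterance, now)
    return vocabulary.get(heard) if heard else None
```

With a vocabulary of {"web": "mozilla"}, example 3 yields no action in this sketch because the phrase that follows the magic word is not itself a vocabulary entry, and example 4 yields no action because the gate has already closed.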

Figure 4: The "Help" Pane.
Use this pane to obtain help and information about Perlbox and Perlbox Voice for Tk.
This concludes the Perlbox Voice for Tk tutorial. We hope that you have found this document useful. Please feel free to contact us with any questions
at me@perlbox.org. Updated January 2005, Shane C. Mason.