Microsoft Speech SDK

SAPI 5.1

Chapter 8

CoffeeS6

Introduction

CoffeeS6, the last of the tutorial chapters, introduces no new programming code as such. Rather, it is a variation on an existing theme. The past several Coffee tutorials demonstrated working with context-free grammars, also called grammar rules. In short, these were predetermined lists of words that had to be matched exactly. Even dynamic grammars, though more flexible, still had to match exact words once the word list was determined. For all the promise of speech recognition, the models presented so far have not let you dictate or use free-form speech. With CoffeeS6, you can use unrestrained speech in your applications. The sample demonstrates this with a simple case: renaming the coffee shop to anything you want.

The following topics will be discussed:

Embedded Dictation
Grammar Modifiers
Code Modifications

Also see Grammar Format Tags: Special Characters for more about grammar modifiers.

Embedded Dictation

As mentioned in the introduction, an application would be limited if you could not speak freely to it; that is, dictate. Remember, the Coffee samples showcase a command and control model. As the user, you order the application to get coffee, hire and fire employees, and make them talk to you. There is little room to get chatty with the other customers or employees. The dictation model is covered in greater breadth and depth by the SDK sample applications Dictation Pad and Simple Dictation. Even so, CoffeeS6 would be remiss to omit dictation entirely. For that reason, the coffee shop manager now allows the user to change the name of the shop.

For instance, go to the office and give the order “manage store name.” A new screen displays and, if you follow the instructions, you can say “Rename the coffee shop to” followed by any name you want. CoffeeS6 echoes the new name back as “Welcome to the X coffee shop,” where X is the new name. The name also displays in all subsequent windows.

This is called embedded dictation because it combines the two forms: dictation is embedded in a command and control rule. CoffeeS6 is still in command and control mode and uses the same grammar rules as before. One advantage is that Coffee users can easily expand their drink offerings with only a minor change to the grammar file. Embedded dictation is not a dynamic grammar: no rule is created right before use and seeded with the words you want. In fact, a dynamic grammar would not handle the renaming case appropriately, because you, as the programmer, have no idea what the user may choose for a name. The only real limitation is whether the new name is in the engine’s dictionary. Even this can be worked around.

Keep in mind that embedded dictation does not create new rules. CoffeeS6 has explicit support for renaming the store because a rule provides for that case. Instead, embedded dictation lessens restrictions on existing grammars, whether they are static or dynamic. You need not even change the code. Your application might want to handle the word elements differently, but that change is not really SAPI-related.

Grammar Modifiers

Programmatically, there is no new code to support modification. CoffeeS6 simply modifies existing grammar rules. Technically, these modifiers are not XML tags in the way that <P> and <L> are, for example. They appear inside the rule, either attached to other text or as the element itself. Each is explained below.

Wildcard: …

The ellipsis is a wildcard symbol indicating that any word or words may be accepted in this position. Informally it is also known as a garbage collector because it is used to accept words that the application may not explicitly care about. In this role, four things happen. First, the user may say anything, including a series of words. As expected, the engine attempts to recognize the words, although not much is done with the words afterward. Coffee needs to know that a word was actually spoken and that the user was not just coughing or sneezing. Second, the rule will still be matched and activated. If the rule uses several parts (such as VID_EspressoDrinks), the wildcard words successfully match the rule requirements. Third, the parsed phrase returns only one element for the ellipsis regardless of the number of words actually spoken. For the most part, Coffee’s only interest is that legitimate words were spoken but not what the words actually were. And fourth, the element in the returned phrase itself will be the ellipsis rather than any useful word. Again, since Coffee is interested only that something was spoken, it makes sense not to return the actual word. If you are interested in the word, then the wildcard marker is not the right one for the rule; see Dictation below. In short, it really is a wildcard because any word may be used and still activate the rule.

The Coffee.xml (for CoffeeS6) snippet uses it in the following role in VID_EspressoDrinks:

<L> 
	<P>May I have</P>
	<P>Can I have</P>
	<P>Can I get</P>
	<P>Please get me</P>
	<P>Get me</P>
	<P>I'd like</P>
	<P>I would like</P>
	<P>...</P>
</L>

In previous Coffee examples, the wording of the request was limited to one of the first seven phrases. That is, the user had to begin by saying “please get me” or “can I get” before naming a drink. Because of the wildcard, the CoffeeS6 user may say almost anything and still get the drink. “Gimme a mocha” will work (mimicking real life to boot). Even “gee-I-dunno-I-suppose-I’d-like a mocha” will work, provided the customer slurs the words together enough. Remember, the rules are phrase-based, and a sufficiently long pause between words prevents a rule from activating; the indecisive customer will never get a drink by saying “may (pause) I (pause) have (pause) a (pause) mocha.”

So why even have the other phrases if the ellipsis is present? There are some subtleties to that answer. First, astute programmers may notice that they do not care what is spoken here. In the code, nothing is ever actually dependent on the fact that the customer said “please” or not. Yet, a rule consisting of only “coffee” may fire inappropriately. For example, a customer simply saying, “coffee is good,” might fire a too-simple rule. In this case, some sort of introductory clause is needed. Second, the additional words speed up the recognition process. The engine is much more likely to recognize “please” or “may” because it is described exactly in the rule. It also increases the confidence rating for the rule overall. Though both of the following phrases would activate the drink rule, “please get me a mocha,” returns a much higher confidence rating than would “aardvark a mocha.”
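To make the wildcard behavior described in this section concrete, here is a toy model in portable C++. It is not SAPI code and not how the engine works internally. The function name MatchWithLeadingWildcard and the use of plain word lists are assumptions made for illustration, as is the choice of “a mocha” as the fixed part of the rule. The sketch only mimics the observable result: any leading words are accepted, the rule still matches, and the application sees a single "..." element instead of the words themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (hypothetical, not part of SAPI).
// 'spoken' is what the user said, one word per entry.
// 'required' holds the fixed words that must end the phrase.
// Returns the elements an application would see: one "..." element followed
// by the required words, or an empty vector if the phrase does not match.
std::vector<std::string> MatchWithLeadingWildcard( const std::vector<std::string>& spoken,
                                                   const std::vector<std::string>& required )
{
    // Assumption of this model: the wildcard must absorb at least one word.
    if ( spoken.size() <= required.size() )
        return std::vector<std::string>();

    // The fixed words must appear, in order, at the end of the phrase.
    const std::size_t offset = spoken.size() - required.size();
    for ( std::size_t i = 0; i < required.size(); ++i )
    {
        if ( spoken[offset + i] != required[i] )
            return std::vector<std::string>();
    }

    // However many words the wildcard absorbed, only one element is returned.
    std::vector<std::string> elements;
    elements.push_back( "..." );
    elements.insert( elements.end(), required.begin(), required.end() );
    assert( elements.size() == required.size() + 1 );
    return elements;
}
```

In this model, “gimme a mocha” and “gee I dunno a mocha” both produce the same three elements, while “coffee is good” produces none because the fixed words are missing.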

Dictation: *

The asterisk is a dictation indicator. Like the wildcard ellipsis, any word (or possibly words; see the next entry) will validate the rule. In the same manner, the engine will attempt to recognize the word. The difference is that the actual word is returned in the phrase element. This is the key to renaming the coffee store.

The new rule VID_Rename is defined:

<RULE ID="VID_Rename" TOPLEVEL="ACTIVE">
	<P PROPNAME="Named SAPI Coffee Shop to"> Rename the coffee shop to *+ </P>
</RULE>

Ignoring the plus sign for the moment, VID_Rename is activated upon successfully matching “rename the coffee shop to,” followed by any word in the engine’s dictionary. The parsed phrase returns an element containing the actual word. With the actual word available, CoffeeS6 can use it to display the new name as it would with any stored variable.

Multiple dictation: +

The dictation entry in VID_Rename has a plus sign after the asterisk. This indicates that multiple words may be accepted in the rule. This way you can dictate longer phrases. Over-zealous customers may now rename the coffee store to virtually any name they want. By saying, “rename the coffee shop to Billy Bob Joe’s and Sally Jean Ann’s Coffee Emporium on the Highway,” they have successfully changed the name.

Confidence Increase: +
Confidence Decrease: -

One of these two signs placed in front of a word respectively increases or decreases the confidence required for a successful recognition. Increasing the required confidence means that the speech recognition engine has to be much more certain that the word it hears really is the expected word. For example, if the user is responding to an important question such as “Reformat hard disk?” you want to make an extra effort to ensure that what is recognized as “yes” really is “yes.” To do so, write the word in the rule as “+yes”.

Likewise, the minus sign decreases the required confidence for the word. That is, you can de-emphasize some words. Although the word is required for the rule, it is not important to verify that the user actually said it. It is a case of “close enough is good enough.”

In CoffeeS6 this is seen in the VID_ThingsToManage rule:

<RULE ID="VID_ThingsToManage" >
	<L PROPID="VID_ThingsToManage">
	<P VAL="VID_Employees">employees</P>
	<P VAL="VID_ShopName">-shop +name</P>
	<P VAL="VID_ShopName">-store +name</P>
	</L>
</RULE>

The last two phrase elements allow the store name to be changed. However, Coffee de-emphasizes the words “shop” and “store.” It is not important that users speak these words precisely; they just need to be reasonably close. The word “name,” however, has to be recognized clearly.
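As a way to visualize what the + and - prefixes express, the following portable C++ sketch splits phrase text such as “-shop +name” into words, each tagged with a required confidence level. It is a toy parser written for illustration, not the SAPI grammar compiler. The names RequiredConfidence, GrammarWord, and ParsePhraseText are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Illustrative levels only (hypothetical type, not a SAPI definition).
enum class RequiredConfidence { Low = -1, Normal = 0, High = 1 };

// One word of a phrase together with the confidence the rule asks for.
struct GrammarWord
{
    std::string text;
    RequiredConfidence required;
};

// Splits phrase text on whitespace. A leading '+' raises the required
// confidence for that word, a leading '-' lowers it, and the sign is
// removed from the stored text. Other tokens are kept unchanged.
std::vector<GrammarWord> ParsePhraseText( const std::string& phrase )
{
    std::vector<GrammarWord> words;
    std::istringstream in( phrase );
    std::string token;

    while ( in >> token )
    {
        RequiredConfidence level = RequiredConfidence::Normal;

        // Only treat the sign as a modifier when a word follows it.
        if ( token.size() > 1 && token[0] == '+' )
        {
            level = RequiredConfidence::High;
            token.erase( 0, 1 );
        }
        else if ( token.size() > 1 && token[0] == '-' )
        {
            level = RequiredConfidence::Low;
            token.erase( 0, 1 );
        }

        assert( !token.empty() );
        words.push_back( { token, level } );
    }
    return words;
}
```

Applied to “-shop +name”, the sketch yields “shop” with a lowered requirement and “name” with a raised one, which matches the intent of the VID_ThingsToManage rule described above.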

Code Modifications

In terms of code support for embedded dictation, there are very few changes. A case to handle the new rule is added, of course. The real work of CoffeeS6 is in the VID_Rename case of ExecuteCommand(). Notice that no extra effort is required to implement the dictation itself.

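// The first five elements are the fixed words "rename the coffee shop to";
// everything after them is the dictated shop name.
if ( 5 <= pElements->Rule.ulCountOfElements )
{
	if ( SUCCEEDED( pPhrase->GetText( 5, pElements->Rule.ulCountOfElements - 5, FALSE, &wszCoMemNameText, NULL )))
	{
		// Room for the property name, a space, the dictated name, and the terminating null
		size_t cch = wcslen( pElements->pProperties->pszName ) + wcslen( wszCoMemNameText ) + 2;
		wszCoMemValueText = (WCHAR *)CoTaskMemAlloc( cch * sizeof(WCHAR) );

		if ( wszCoMemValueText )
		{
			wcscpy( wszCoMemValueText, pElements->pProperties->pszName );
			wcscat( wszCoMemValueText, L" " );
			wcscat( wszCoMemValueText, wszCoMemNameText );

			// Copy new shop name to global shop name
			_tcsncpy( g_szShopName, W2T(wszCoMemNameText), NORMAL_LOADSTRING - 1 );

			// wszCoMemValueText is not freed here because the posted message still refers to it
			PostMessage( hWnd, WM_RENAMEWINDOW, 0, (LPARAM) wszCoMemValueText );
		}

		// Free the text returned by GetText() whether or not the allocation succeeded
		CoTaskMemFree( wszCoMemNameText );
	}
}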

This code retrieves the dictated text from the phrase elements. In practice, it is usually better not to assume that the sixth element is always the one you want but, in this case, CoffeeS6 does. Because the rename rule allows multiple words, CoffeeS6 starts at the sixth element (the first five elements have to be “rename the coffee shop to”) and strings together the rest of the words to form the new name.
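One way to avoid assuming a fixed element position is to check the fixed words of the rule first and treat whatever follows them as the dictated name. The following is a minimal, portable C++ sketch of that idea. It assumes the text of each phrase element has already been copied into a vector of strings, which is not how the CoffeeS6 sample stores them. The helper name ExtractDictatedName is hypothetical, and the comparison is exact, so the sketch also assumes the engine returns the fixed words with the same spelling and case as the rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of SAPI or the CoffeeS6 sample.
// 'elements' stands in for the text of each recognized phrase element.
// 'prefix' holds the fixed words of the rule, for example
// "rename", "the", "coffee", "shop", "to".
// Returns the words after the prefix joined by single spaces, or an empty
// string if the phrase does not start with the prefix or nothing follows it.
std::wstring ExtractDictatedName( const std::vector<std::wstring>& elements,
                                  const std::vector<std::wstring>& prefix )
{
    // Precondition of this sketch: the rule has at least one fixed word.
    assert( !prefix.empty() );

    // There must be at least one dictated word after the fixed words.
    if ( elements.size() <= prefix.size() )
        return L"";

    // Verify the fixed words instead of trusting their position.
    for ( std::size_t i = 0; i < prefix.size(); ++i )
    {
        if ( elements[i] != prefix[i] )
            return L"";
    }

    // String together the remaining words to form the new name.
    std::wstring name;
    for ( std::size_t i = prefix.size(); i < elements.size(); ++i )
    {
        if ( !name.empty() )
            name += L' ';
        name += elements[i];
    }
    return name;
}
```

Because the prefix is passed in rather than counted by hand, changing the wording of the rename rule would only require changing the prefix list, not the index arithmetic.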


