Microsoft Speech SDK

SAPI 5.1

Chapter 6

CoffeeS4

Introduction

With the completion of the first four Coffee samples, you covered the basics. You were shown how to place orders by speaking and you were able to hear the request spoken back. In both cases, SAPI used its defaults. Specifically, the barista talked back to you with the voice set using Speech properties in Control Panel. In truth, SAPI usually offers much more than a single voice. Other voices may be available, as well as different languages, engines, or other resources. Many of these can be changed either dynamically (always through Speech properties and sometimes by the application itself) or programmatically (through the application’s code).

CoffeeS4 introduces the fundamentals of resource management. As in CoffeeS3, you can enter the manager’s office. In the next two tutorials, you are going to manage employees. CoffeeS4 displays the voices available. Like any good coffeehouse help, you will not be able to manage them too much, but you will be able to hear one of them speak in the currently selected voice. CoffeeS5 lets you change the voice. In doing this simple task, you prepare for bigger things.

The following topics are discussed:

  • Resources

  • Managing Resources

Resources

To provide the robust range of options, such as many different voices, recognizers, languages, and user interface dialog boxes, SAPI needs to store this information for later use. This stored information is collectively referred to as resources. Resource management, therefore, is the ability to query SAPI for the availability of resources, gather information about a resource, and initiate, instantiate, or remove resources as needed. CoffeeS4 finds and displays voices stored as resources, and determines which one is active. Using the same techniques, you can filter the voices for language, gender, age, or some other criterion and display only those.

SAPI relies on two closely associated terms and concepts: objects and tokens. COM and object-oriented programmers will recognize objects. They are the functional implementation of a class. That is, an implemented object will have memory allocated to it and have its members and methods initialized. Once implemented or instantiated, the object may be used by the application.

However, SAPI only instantiates objects when they are needed. It is a waste of the computer’s memory to have unneeded objects allocated. Therefore, SAPI stores the information needed to create those objects including the voice’s name, language, and GUID. This stored information is referred to as a token. A token is the textual representation of the parameters needed to fully implement that resource. Stated another way, tokens are the parameters for the object. SAPI needs only to read a token at the right time and instantiate an object based on that information.

In addition, there can be many tokens and SAPI organizes them into related groups called categories. Currently SAPI maintains about eight general categories. Of particular interest to CoffeeS4 is SPCAT_VOICES. This category contains all the voice-related tokens.

Each type of token is different and is documented separately. That is, a voice token will have different values and key entries from a recognizer token because each has different functions. Tokens of the same type (such as voice tokens) will contain the same required entries (such as language, gender, age) although vendors may provide additional and non-standard information for their resource.

The following is a sample representation of a voice token. Not all entries are listed.

Token              ValueName  Sample Value                            Comments
MSMary                                                                Required entry. This is the token name.
                   (Default)  MS Mary                                 Required entry. This is the language independent name.
                   409        MS Mary                                 There may be multiple entries; one for each language the voice supports. At least one entry is required. The numeric value is the standard Windows language code and is in hexadecimal with no leading notation such as “0x.”
                   809        MS Mari
                   CLSID      {65DBDDEF-0725-11D3-B50C-00C04F797396}  Required entry. This is the class ID (CLSID) for the object.
MSMary/Attributes                                                     Required entry. Additional information is available through the attributes although not all attributes are required.
                   Language   409;809                                 Required. This is the language the voice supports.
                   Age        Adult                                   Required. This is the age for the voice.
                   Gender     Female                                  Required. This is the gender for the voice.
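To make the shape of a token more concrete, the sample entries listed above can be modeled as plain key/value data. The sketch below is only an illustration in portable C++: TokenModel, makeSampleToken, and displayName are hypothetical names, not SAPI types or functions, and the fallback from a language-specific name to the (Default) name is an assumption about how a display name might be chosen.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a voice token: plain key/value data, no SAPI types.
struct TokenModel {
    std::string id;                                  // token name, for example "MSMary"
    std::map<std::string, std::string> values;       // (Default), 409, 809, CLSID
    std::map<std::string, std::string> attributes;   // Language, Age, Gender
};

// Build the sample token using the values from the sample listing.
TokenModel makeSampleToken() {
    TokenModel t;
    t.id = "MSMary";
    t.values["(Default)"] = "MS Mary";
    t.values["409"] = "MS Mary";
    t.values["809"] = "MS Mari";
    t.values["CLSID"] = "{65DBDDEF-0725-11D3-B50C-00C04F797396}";
    t.attributes["Language"] = "409;809";
    t.attributes["Age"] = "Adult";
    t.attributes["Gender"] = "Female";
    return t;
}

// Return the name stored for a language code (hexadecimal, no "0x" prefix).
// Assumption: fall back to the language-independent (Default) name when the
// token has no entry for that language.
std::string displayName(const TokenModel& t, const std::string& langHex) {
    auto it = t.values.find(langHex);
    if (it != t.values.end()) {
        return it->second;
    }
    auto def = t.values.find("(Default)");
    return def != t.values.end() ? def->second : std::string();
}
```

In this model, asking for language 809 yields the 809 entry, while a language the token does not list falls back to the (Default) entry.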

 

Methods

Resources alone would be only marginally useful if you could not work with them. Often you will need to know what resources are available. CoffeeS4, for instance, searches for all the voices it can use. Once found, you may need to use that resource either to instantiate an object or to know more about the resource. There is a rich set of interfaces and helper functions available to do just that. Many of these interfaces are found in the Resource Manager set in Application Level Interfaces of the reference API document. In the same manner, the helper functions provide convenient ways to perform a task without having to know all the underlying details at this point.

For instance, the heart of the matter for CoffeeS4 is getting a list of all the available voices. This task can be done by a single helper function, SpEnumTokens. At the simplest, the call would look like this:

//Pointer to token enumerator
CComPtr<IEnumSpObjectTokens> cpEnum;

// Get a token enumerator for tts voices available
HRESULT hr = SpEnumTokens(SPCAT_VOICES, NULL, NULL, &cpEnum);
// check hr result

SPCAT_VOICES restricts the search to the voice category. You can filter the resources using the middle two parameters (both of which are set to NULL here), which specify required and optional attributes. In this case, you want all voice resources, so no criteria were set. Just as easily, you could have searched for female voices or narrowed it down even more to just the adult, English-speaking female voices.
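The idea of filtering by attributes can be sketched without SAPI at all. The portable C++ model below assumes a filter string of semicolon-separated Name=Value pairs, such as "Gender=Female;Age=Adult;Language=409", and checks it against a token's attributes. The names splitSemicolons, Attributes, and matchesRequired are hypothetical, and the matching rules are a simplified assumption, not SAPI's actual implementation.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split a string on ';' and return its non-empty pieces.
std::vector<std::string> splitSemicolons(const std::string& s) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ';')) {
        if (!item.empty()) {
            parts.push_back(item);
        }
    }
    return parts;
}

// Hypothetical attribute store: attribute name mapped to its value string.
using Attributes = std::map<std::string, std::string>;

// Return true when every "Name=Value" pair in `required` is satisfied.
// An attribute may hold several values ("409;809"); any one of them may match.
// An empty filter matches everything, much like passing NULL for no criteria.
bool matchesRequired(const Attributes& attrs, const std::string& required) {
    for (const std::string& clause : splitSemicolons(required)) {
        const std::size_t eq = clause.find('=');
        if (eq == std::string::npos) {
            return false;  // malformed clause
        }
        const std::string name = clause.substr(0, eq);
        const std::string want = clause.substr(eq + 1);
        auto it = attrs.find(name);
        if (it == attrs.end()) {
            return false;  // the token lacks this attribute entirely
        }
        bool found = false;
        for (const std::string& have : splitSemicolons(it->second)) {
            if (have == want) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false;
        }
    }
    return true;
}
```

With the sample token's attributes, a filter for adult, English-speaking (409) female voices matches, while a filter for Gender=Male does not.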

cpEnum is an interface pointer to IEnumSpObjectTokens. SpEnumTokens does all the initialization work and returns a complete list for you. In this case, the list is a complete set of tokens in SPCAT_VOICES. You can think of IEnumSpObjectTokens as a linked list with built-in support functions. Using IEnumSpObjectTokens, you can find the next item in the list, skip several items, make a copy of the list, or go back to the beginning of it, among other things. Although some of the methods will be described here, see the reference API section for additional methods and details.

You are on your way to finding and listing all voices. CoffeeS4 uses the following algorithm in the code:

  • Finds the voices.
  • Searches the list one by one and retrieves the name of each voice.
  • Stores the display names of the voices. CoffeeS4 needs this indexed array to refresh the screen during updates.
  • Makes one extra step and displays the current voice in red.
  • Like the display name, CoffeeS4 stores the token name in an indexed array for later use. In both cases, storing the names is not a requirement but a convenience. For screen updates, CoffeeS4 could also poll the resources again, but that seems like a waste of time. Both storing steps involve looping through the list, extracting the appropriate name, and assigning it to the array.

    All the work is done in the CoffeeS4 ManageEmployeesPaneProc() procedure and specifically the WM_INITPANE case. This initialization is logically placed here because the information must be present at the time the window is rendered.

    In one sense, the hard part is done for you. Finding the available voices is accomplished in the one-line SpEnumTokens that passes back a list of all the voices and even provides the means to navigate that list. It also provides a method to determine how many items were found using ::GetCount.

    static ULONG ulNumTokens;
    hr = cpEnum->GetCount( &ulNumTokens );
    

    Knowing the total number of items, CoffeeS4 now allocates the two indexed arrays.

    static CSpDynamicString* ppcDesciptionString;
    ppcDesciptionString = new CSpDynamicString [ulNumTokens];
    // Check that the allocation succeeded
    
    static WCHAR** ppszTokenIds;
    ppszTokenIds = new WCHAR* [ulNumTokens];
    // Check that the allocation succeeded
    
    ZeroMemory( ppszTokenIds, ulNumTokens*sizeof( WCHAR* ) );
    

    CSpDynamicString is a helper class for handling strings. It is a string class similar to other object-oriented string classes. The subsequent allocation and release of each of its elements is automatic. You do not have to remember to do it manually. On the other hand, ppszTokenIds is a simpler array of pointers for storing the ID string of each token. CoffeeS4 manually allocates it because it is needed throughout the application. As a result, it must also be manually freed when no longer required. This is done in ManageEmployeesPaneCleanup(). The ZeroMemory() call ensures that every pointer in the array starts out as zero (NULL). No valid token ID pointer will be zero.
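The manual ownership described above follows a common pattern: zero the pointer array, fill entries as they arrive, then free each non-null entry before freeing the array itself. The sketch below models that pattern in portable C++. TokenIdArray and its functions are hypothetical, and std::malloc and std::free stand in for the allocator that would own the strings in the real application; this is not the code from ManageEmployeesPaneCleanup().

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <cwchar>

// Hypothetical holder for a manually managed array of string pointers.
struct TokenIdArray {
    wchar_t** ids;
    unsigned long count;
};

// Allocate the pointer array and clear it, the same role ZeroMemory plays.
TokenIdArray allocTokenIds(unsigned long count) {
    TokenIdArray a;
    a.ids = new wchar_t*[count];
    a.count = count;
    std::memset(a.ids, 0, count * sizeof(wchar_t*));
    return a;
}

// Store a heap copy of `text` at `index`; returns false when out of range
// or when the copy cannot be allocated.
bool setTokenId(TokenIdArray& a, unsigned long index, const wchar_t* text) {
    if (index >= a.count) {
        return false;
    }
    const std::size_t bytes = (std::wcslen(text) + 1) * sizeof(wchar_t);
    wchar_t* copy = static_cast<wchar_t*>(std::malloc(bytes));
    if (copy == nullptr) {
        return false;
    }
    std::memcpy(copy, text, bytes);
    std::free(a.ids[index]);  // freeing a null pointer is a no-op
    a.ids[index] = copy;
    return true;
}

// Free every stored string, then the array; returns how many strings were freed.
unsigned long freeTokenIds(TokenIdArray& a) {
    unsigned long freed = 0;
    for (unsigned long i = 0; i < a.count; ++i) {
        if (a.ids[i] != nullptr) {  // zeroed entries were never filled
            std::free(a.ids[i]);
            ++freed;
        }
    }
    delete[] a.ids;
    a.ids = nullptr;
    a.count = 0;
    return freed;
}
```

Because the array starts out zeroed, the cleanup loop can tell filled entries from unused ones even if enumeration stopped early.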

    The next step of looping through the array is equally easy. CoffeeS4 navigates the list item by item to find the voice’s name. As mentioned earlier, IEnumSpObjectTokens has such a method named ::Next.

    ISpObjectToken *pToken = NULL;
    while (cpEnum->Next(1, &pToken, NULL) == S_OK)
    {
    	//Code here
    }
    

    The list represented by cpEnum is traversed one item at a time as the first parameter indicates. The information is passed back in pToken, which is an interface to ISpObjectToken. You might correctly guess that you will be looking at this interface in a moment. The last parameter receives the number of items actually read. CoffeeS4 passes NULL here because it requests only one item at a time. Since it is possible to request more than one item at a time, it is also possible that fewer items are left than were requested. If that were the case, the parameter would receive the number of items that could be read. If the method cannot read any more items, it returns a value other than S_OK. In this case, CoffeeS4 stops looping through the “while” statement.
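The behavior of a Next-style call, requesting several items and learning how many were actually delivered, can be modeled in a few lines. The following portable C++ sketch is a simplified stand-in, not the IEnumSpObjectTokens interface: EnumeratorModel, Result, kOk, and kFalse are hypothetical names, and the voice names used with it are placeholder data.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Result codes local to this sketch; kOk means the full request was satisfied.
enum Result { kOk, kFalse };

// Hypothetical model of an enumerator over a fixed list of names.
class EnumeratorModel {
public:
    explicit EnumeratorModel(std::vector<std::string> items)
        : items_(std::move(items)), pos_(0) {}

    // Append up to `count` items to `out`. If `fetched` is not null, it
    // receives the number appended. Returns kOk only when all `count`
    // items were available.
    Result next(std::size_t count, std::vector<std::string>& out, std::size_t* fetched) {
        std::size_t n = 0;
        while (n < count && pos_ < items_.size()) {
            out.push_back(items_[pos_++]);
            ++n;
        }
        if (fetched != nullptr) {
            *fetched = n;
        }
        return n == count ? kOk : kFalse;
    }

    // Go back to the beginning of the list.
    void reset() { pos_ = 0; }

    // Total number of items, regardless of the current position.
    std::size_t count() const { return items_.size(); }

private:
    std::vector<std::string> items_;
    std::size_t pos_;
};
```

With three items in the list, a request for two succeeds in full, a second request for two delivers only one and reports a short read, and any further request delivers nothing. A loop of the form while (e.next(1, out, nullptr) == kOk) therefore ends once the list is exhausted.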

    As CoffeeS4 steps through the list one at a time, it retrieves the names of the resources. There are two names for the particular token: the token name (also called the token ID) and the display name. The two could be the same but not necessarily. Also, the display name can vary by language. It is the display name CoffeeS4 shows in the management window. Again, a helper function is available to simplify this task. SpGetDescription retrieves the display name and assumes the current language. In the case of the sample token, that name would be the 409 value under MSMary, “MS Mary.”

    At the same time, the token ID is also retrieved. This token ID is needed since CoffeeS4 also determines which voice is currently in use and it will need it shortly. No helper function is provided, as this is a straightforward call to get the token ID. In the sample token, this would be “MSMary.” In both cases, the information is stored in indexed arrays for later use.

    while (cpEnum->Next(1, &pToken, NULL) == S_OK)
    {
    	// Get a string which describes the token, in our case, the voice name 
    	hr = SpGetDescription( pToken, &ppcDesciptionString[ulIndex] );
    
    	// Get the token id, for a low overhead way to retrieve the token later  
    	// without holding on to the object itself
    	hr = pToken->GetId( &ppszTokenIds[ulIndex] );
    	ulIndex++;   
    	
    	// Release the token itself  
    	pToken->Release();  
    	pToken = NULL;
    }
    

    When no longer needed, the token must be explicitly released. CoffeeS4 resets the pointer to NULL and is ready for the next loop. With all the information stored, the last task is to determine which is the currently active voice. This, too, is a simple task. CoffeeS4 loops through the token ID array (ppszTokenIds) and compares each entry with the ID of the voice currently in use. If there is a match, it breaks out of the loop and stores the index position in ulCurToken.

    // Get the token representing the current voice
    HRESULT hr = g_cpVoice->GetVoice( &pToken );
    
    if ( SUCCEEDED( hr ) )
    {    
    	// Get the current token ID, and compare it against others to figure out
    	// which description string is the one currently selected.
    	hr = pToken->GetId( &pszCurTokenId );
    	if ( SUCCEEDED( hr ) )
    	{
    		ulIndex = 0;
    		while ( ulIndex < ulNumTokens && 0 != _wcsicmp( pszCurTokenId, ppszTokenIds[ulIndex] ) )
    		{
    			ulIndex++;
    		}
    
    		// We found it, so set the current index to that of the current token        
    		if ( ulIndex < ulNumTokens )           
    		{
    			ulCurToken = ulIndex;
    		}
            
    		CoTaskMemFree( pszCurTokenId);    
    	}
    	pToken->Release();
    }
    

    The key to this is the ::GetVoice call. This passes back the token for the current voice. CoffeeS4 compares that token's ID against each stored token ID. The index of the match, stored in ulCurToken, is used in ManageEmployeesPanePaint() to highlight the active voice in a different color.
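The search for the active voice is a linear scan with a case-insensitive comparison. The sketch below restates it in portable C++ with standard containers. equalsIgnoreCase is a hypothetical stand-in for the _wcsicmp test used in the sample, findCurrentIndex is a hypothetical name, and the ID strings used with it are placeholders rather than real token IDs.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// Case-insensitive equality for wide strings.
bool equalsIgnoreCase(const std::wstring& a, const std::wstring& b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::towlower(a[i]) != std::towlower(b[i])) {
            return false;
        }
    }
    return true;
}

// Return the index of `current` within `ids`, or ids.size() when no entry
// matches. The caller checks the result against ids.size(), just as the
// sample checks ulIndex against ulNumTokens.
std::size_t findCurrentIndex(const std::vector<std::wstring>& ids,
                             const std::wstring& current) {
    std::size_t i = 0;
    while (i < ids.size() && !equalsIgnoreCase(current, ids[i])) {
        ++i;
    }
    return i;
}
```

Returning the list size for "not found" mirrors the sample's own check, so the stored index is only updated when a match actually exists.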

    The rest of the code for CoffeeS4 is relatively simple and is basically a modification of previously discussed techniques. Two new rules have been added: Please Manage the Employees and Hear Them Speak. The former displays the list of voices once you are in the office and the latter speaks in the current voice. Arrogant perhaps, but the employee states confidently “I will be the best employee you've ever had. Let me work.” In order to hear those words every employer loves, a new case must be added to ProcessRecoEvent(), that of VID_Manage.

    The rest of the new code essentially handles events to the screen. OfficePaneProc(), ManageEmployeesPaneProc() and ManageEmployeesPaneCleanup() do the rest of the work.


