3 Overview: Tokens, Categories and the Registry
4.3 Instantiating an Object from a Token
5 Tokens and Categories For Engine Developers
5.1 Making Resources Available Through SAPI
5.2 Associating Files with Tokens
5.3 Inspecting Underlying Keys of a Token
5.4 Creating New Keys in the Registry
This document is intended to help developers of speech-enabled applications discover and use resources (Voices and Recognizers) on a computer that has SAPI installed. A speech-enabled application is one that attempts to either recognize or synthesize speech. Developers of speech recognition (SR) and speech synthesis (Text to Speech, or TTS) engines make their resources available to applications.
This spec answers the following questions:
· What are Tokens and Categories in SAPI?
· Where is information about tokens stored in the Registry?
· How does an application find tokens and initialize resources (i.e., Voices or Recognizers) from them?
· What are the SAPI-defined attributes that engines should document in the registry?
· How are files associated with tokens?
Note:
The Speech SDK documentation section on Object Tokens, which provides a complete description of the ISpObjectToken and ISpObjectTokenCategory and their methods, complements this document.
A token is an object representing a resource that is available on a computer, such as a voice, a recognizer, or an audio input device. A token gives an application an easy way to inspect the various attributes of a resource without having to instantiate it. The Vendor of a Recognizer and the Gender of a Voice are examples of attributes of resources. In many cases, applications should use SAPI-provided helper functions for common scenarios. For example, an application can use the SpCreateBestObject helper function to rapidly create the object, given a certain type of resource. The application can also query for tokens meeting certain criteria without using the helper function. To do this, the application calls the EnumTokens method on the ISpObjectTokenCategory interface to get an enumerator, and can then inspect the tokens in the enumerator further if it chooses to. Finally, the application selects one of the tokens in the enumerator to instantiate a resource. Once the resource (such as an SR engine) is instantiated, if it implements the ISpObjectWithToken interface, it is handed a pointer to the token that was used to create it. This way, the resource holds a handle to more information about itself.
Conceptually, a token contains the following information:
· The language-independent name. This is the name that should be displayed wherever the name of the token is displayed. It is marked as (Default) in the registry. The implementer of the token may also choose to provide a set of language-dependent names in several languages.
· The CLSID used to instantiate the object from the token.
· A set of Attributes, which are the only set of queriable values in a token. This means SAPI provides a mechanism to query for tokens whose attributes match certain values. Details on how to query for tokens that match a set of attributes are in Sections 4.1 and 4.2.
A token may also contain the following:
· If a token has user interfaces (UIs) to display, such as the properties of a Recognizer or a wizard to customize a Voice, then the token will also contain the CLSID for the COM object used to instantiate each type of UI.
· A set of Files entries, from which SAPI returns the paths to all the files associated with the token.
SAPI stores information about tokens in the registry. A token is represented in the registry by a key and the key’s underlying keys and values. For example, when an application queries SAPI for tokens of all the female voices on the computer, SAPI looks in the HKEY_LOCAL_MACHINE\Software\Microsoft\Speech\Voices area. This area corresponds to a category; categories are discussed in Section 3.2. SAPI searches for tokens that match the criteria (in this case, a voice with the Gender attribute set to Female) and uses one of the matching tokens to initialize the voice. The application may also supply a different fully qualified registry path, so that SAPI searches for a token in a location that is non-standard from SAPI's point of view. In addition to the keys SAPI recommends, the entry for the token may contain any other information that the implementer of the token chooses to store there. In the registry, a token looks like this:
Table 1 Parts of a Token in the Registry
RegKey | ValueName | Sample Value | Comments
SampleTokenKey | | | Required - This is the RegKey for the Token.
 | (Default) | Joe | Required - Language Independent Name.
 | 409 | Joe | Name in Hex LangID 409, which is English. There may be several of these rows, one for each LangID in which the Token has a name. Note that there is no leading 0x before the LangID.
 | 809 | Joe |
 | CLSID | {8021D50E-D93C-4075-8504-FC4E124D64E9} | Required - Sample CLSID for the object that is instantiated from the token.
SampleTokenKey/Attributes | | | Attributes for the token are under this key.
 | Language | 409;809 | There may be several of these rows, one for each attribute that is queriable. See Section 4 for an explanation of each of the attributes.
 | Vendor | VoiceVendor |
In the registry, this looks like:
Figure 1 A Token Key in the Registry
The Attributes key contains all the queriable values for the token. Section 4.2 discusses in detail how an application queries a token.
Figure 2 Attributes of a Token
If the token is capable of displaying UI, then each UI has its own key under the token. Figure 3 shows the token for a Recognizer that supports four types of UI (AddWord, EngineProperties, MicTraining and UserTraining), as well as the CLSID underlying each UI type.
Figure 3 A Token that supports UI has a token for each UI type
SAPI provides a comprehensive set of helper functions for the common scenarios using tokens. Section 4.1 provides a number of examples. SAPI also provides a way for engines and applications to implement tokens in their own proprietary manner. See Section 3.4 on token enumerators, for further discussion. Sections 4 and 5 explore common scenarios using these interfaces from application and engine coding perspectives.
An ObjectTokenCategory (hereafter referred to as a category) is the highest level of grouping of registry entries in SAPI. A category is a class of tokens (or of resources, since each token represents an actual resource on the computer). Intuitively, a category is a type of SAPI resource. It is represented in the registry by a key containing one or more token keys under it. It is created and manipulated using helper functions such as SpCreateDefaultObjectFromCategoryId, or methods on the ISpObjectTokenCategory interface. Please refer to the SAPI documentation for details on either of these. Examples of categories are Recognizers and Voices. Figure 4 shows the default SAPI categories, with the category Voices selected.
Figure 4 The Category Voices
SAPI organizes tokens in the Registry under seven categories.
By default, the tokens for the following six SAPI categories are located under HKEY_LOCAL_MACHINE\Software\Microsoft\Speech (HKLMS). This is where all system-specific SAPI keys and values should be stored, as recommended by Windows Application guidelines. Examples include settings and files for the Voices and the Recognizers (also known as Speech Recognition engines) installed on a computer, as shown in Figure 1.
1. Voices
2. Recognizers
3. AppLexicons
4. AudioInput
5. AudioOutput
6. PhoneConverter
The tokens for the seventh category, RecoProfiles, are located under HKEY_CURRENT_USER\Software\Microsoft\Speech (HKCUS). HKCUS also contains all other user-specific keys and values in the registry, such as user defaults for Voices and Recognizers, as well as the location of the user lexicon file.
Categories contain the following items:
· A single key called Tokens, and the keys for the tokens that belong to that category under it. For example, the Voices category has a key for the voice called Manuel. All the keys and values for Manuel are located under HKLMS/Voices/Tokens/Manuel.
· Keys for token enumerators. A token enumerator is a special type of token that generates other tokens for the same category. It gives vendors a way to supply tokens that are generated in a non-standard way, such as by reading data from a stored file. Engine vendors that follow the SAPI guidelines for registering resources (Sections 4 and 5) can safely ignore token enumerators and regard them as generators for another set of tokens. Section 3.4 explains token enumerators in more detail.
A CategoryID uniquely identifies a category in the registry. For SAPI-defined categories, CategoryIDs take the form HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\{CategoryName}; for example, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Recognizers\ for the Recognizers category. All SAPI CategoryIDs should be referenced using the constants defined in the sapi.idl file:
1. SPCAT_AUDIOOUT
2. SPCAT_AUDIOIN
3. SPCAT_VOICES
4. SPCAT_RECOGNIZERS
5. SPCAT_APPLEXICONS
6. SPCAT_PHONECONVERTERS
7. SPCAT_RECOPROFILES
Similarly, TokenIDs uniquely identify tokens in the registry. For tokens located in SAPI defined categories, they take the form of:
· CATID\Tokens\TokenKeyName - a static token from the registry. For example, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Recognizers\MSASREnglish
· CATID\TokenEnums\TokenEnumKeyName - a static token from the registry that represents a token enumerator. This token instantiates a token enumerator used to enumerate dynamic tokens. SAPI uses this for its own implementation of audio input and output to list just the channels available on the computer at runtime. Token enumerators can also read tokens from other areas of the registry, or from remote computers. For example, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioOutput\DSoundAudioIn
· CATID\TokenEnums\TokenEnumKeyName\ - a dynamic token representing the default token that the specified token enumerator generates. For example, SPDSOUND_AUDIO_IN_TOKEN_ID creates the default DSound audio-in object: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioOutput\DSoundAudioIn\
· CATID\TokenEnums\TokenEnumKeyName\EnumExtra… - a specific dynamic token from the specified token enumerator. For example: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioOutput\DSoundAudioIn\Direct Sound Crystal WDM Audio, which generates the Direct Sound Crystal WDM audio object.
In addition to the category defaults mentioned in Section 3.2, the categories Voices, Recognizers, AudioInput, AudioOutput and RecoProfiles also have user defaults and settings. As shown in Figure 5, these are located in the HKCUS area, under their respective category keys. Section 6 explains each category of tokens. That section also lists the user-specific entries in HKCUS and the system-wide entries in HKLMS.
Figure 5 The User category for Recognizers
Note: This section is relevant only for Engine or Application developers who need to store tokens in a separate part of the registry or even on the file system, and dynamically enumerate them.
SAPI provides a way for third parties to store their registry settings without following any of the SAPI-recommended guidelines. SAPI can find these tokens as long as the parties have implemented token enumerators. Token enumerators are COM objects that enumerate the necessary entries for the tokens under it. All token enumerators are stored under CategoryName/TokenEnums. Each token enumerator listed under a category needs to have the CLSID of the COM object that implements it under the token enumerator.
The token enumerator
· Must implement the methods Next, Skip, Reset, Clone, Item, and GetCount on the IEnumSpObjectTokens interface.
· May choose to implement the methods SetObjectToken and GetObjectToken on the ISpObjectWithToken interface. As mentioned at the end of Section 3.1, these give a resource a handle to the token that was used to instantiate it.
These tokens can be located in a separate part of the registry or somewhere else (possibly on the file system). It is the responsibility of the token enumerator to respond correctly to the above methods, so that an application cannot tell the difference between tokens coming from the token enumerator and tokens coming from the SAPI-specific part of the registry.
SAPI itself uses token enumerators only for the AudioInput and AudioOutput categories. Refer to Sections 6.4 and 6.5 for more details. Note that the token enumerator for the MMSYS audio object creates its tokens from keys that are under it.
The following is an example of a TokenID for a token located under a token enumerator: CategoryName/TokenEnums/TE1/XXX, where (i) TE1 is a sample token enumerator and (ii) XXX is a reference to one of the tokens generated by TE1. On a call to the helper function SpCreateNewToken given the TokenID above, the IEnumSpObjectTokens returned by the token enumerator TE1 to SAPI includes all of its tokens. SAPI then goes through each token (those returned by token enumerators and those under the Tokens key) to find the one whose token name matches XXX.
Table 2 Parts of the AudioInput token enumerator
RegKey | ValueName | Sample Value | Comments
HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\ | | | This is the category.
 | DefaultDefaultTokenID | HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\MMAudioIn\ | This is the TokenID for the default token for this category. If a DefaultTokenID is present, it supersedes this default token for the category. Details are in Section 4.2.
HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\TokenEnums | | |
HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\TokenEnums\MMSys | | | This is the MMSys token enumerator.
 | CLSID | {GUID} | This is the CLSID for the COM object that implements the MMSys token enumerator.
Figure 6 AudioInput token enumerator in the registry
Figure 6 illustrates how the AudioInput token enumerator looks in the registry.
A SAPI 5 application needs to find tokens and instantiate objects that meet certain criteria from the resources available on a computer. Helper functions distributed in the sphelper.h file are the recommended way for applications to interact with tokens and categories whenever possible. Table 3 provides a list of helper functions and the scenarios they address. The helper functions have been broken up into Common Helper Functions and Engine Developer Helper Functions based on likelihood of use. If the specific helper function is not found in either section, refer to the SAPI documentation for the comprehensive listing.
Table 3 Common Helper Functions
Helper Function | Action | Example Helper Function Call
SpGetDefaultTokenFromCategoryId | Creates the default token from a CategoryID. The last argument, omitted in the example, tells SAPI to create the token if it does not currently exist. | CComPtr<ISpObjectToken> m_cpEngineToken; hr = SpGetDefaultTokenFromCategoryId(SPCAT_RECOGNIZERS, &m_cpEngineToken);
SpFindBestToken | Finds the most appropriate token given a set of required and optional criteria. For details on attribute matching, see Section 4.2. | CComPtr<ISpObjectToken> cpTokenEng; hr = SpFindBestToken(SPCAT_RECOGNIZERS, L"Language=409", L"VendorPreferred", &cpTokenEng);
SpEnumTokens | Returns a token enumerator containing all tokens meeting a set of required and optional attributes. Tokens in the enumerator are sorted in the order specified in Section 4.2. | CComPtr<IEnumSpObjectTokens> cpIEnum; hr = SpEnumTokens(SPCAT_VOICES, L"Gender=Female;Language=409", L"Vendor=VoiceVendor1;Age=Child", &cpIEnum);
SpCreateDefaultObjectFromCategoryId | Creates the default object in a category, such as AudioInput or Recognizers. | CComPtr<ISpVoice> cpVoice; hr = SpCreateDefaultObjectFromCategoryId(SPCAT_VOICES, &cpVoice);
SpCreateBestObject | Instantiates a resource that best matches a set of required and optional criteria. For details on attribute matching, see Section 4.2. | CComPtr<ISpVoice> cpVoice; hr = SpCreateBestObject(SPCAT_VOICES, L"Vendor=VoiceVendor1;Age=Child", L"Gender=Female", &cpVoice);
SpCreateObjectFromToken | Creates an object from a token. | CComPtr<ISpVoice> cpVoice; CComPtr<ISpObjectToken> cpVoiceToken; /* find a token, as in the SpFindBestToken row */ hr = SpFindBestToken(SPCAT_VOICES, L"Language=409", L"VendorPreferred", &cpVoiceToken); /* now create the object */ if (SUCCEEDED(hr)) { hr = SpCreateObjectFromToken(cpVoiceToken, &cpVoice); }
Table 4 Engine Developer Helper Functions
Helper Function | Action | Example Helper Function Call
SpCreateNewToken | Creates a new object token in the registry with a CategoryID, but without specifying a key name. This creates a token with a GUID as its registry key. | CComPtr<ISpObjectToken> cpUserToken; hr = SpCreateNewToken(SPCAT_RECOPROFILES, L"", &cpUserToken);
SpGetTokenFromId | Creates a token from a TokenID of an enumerator, or a new token if the token does not already exist. The last argument of FALSE tells SAPI not to create the token if it does not already exist. | CComPtr<ISpObjectToken> cpAudioInTok; hr = SpGetTokenFromId(SPCAT_AUDIOIN, &cpAudioInTok, FALSE);
SpCreateObjectFromSubToken | Creates an object from a subtoken of a token. In this case, the engine token m_cpEngineToken has the Lts key under it, which in turn has a CLSID value under it. This CLSID is used to instantiate the object. | CComPtr<ISpObjectToken> m_cpEngineToken; HRESULT hr = SpGetDefaultTokenFromCategoryId(SPCAT_RECOGNIZERS, &m_cpEngineToken); ISpLexicon *m_pLtsLex = NULL; if (SUCCEEDED(hr)) { hr = SpCreateObjectFromSubToken(m_cpEngineToken, L"Lts", &m_pLtsLex); }
SpGetSubTokenFromToken | Creates a subtoken under a token. This is useful, for example, when an Engine vendor would like to create a subtoken for custom data under its Recognizer token. | CComPtr<ISpObjectToken> cpSubSubToken; hr = SpGetSubTokenFromToken(m_cpEngineToken, L"EngineProperties", &cpSubSubToken, TRUE);
The principal tasks related to tokens and categories that an application needs to accomplish are:
· Enumerating tokens
· Inspecting and instantiating tokens
The two primary ways to enumerate tokens are the helper function SpEnumTokens and the method ISpObjectTokenCategory::EnumTokens. Both allow the caller to specify a category and a set of required and optional attributes. The call then returns a token enumerator containing all the tokens matching those criteria. The method is defined as:
HRESULT EnumTokens(
    [in] const WCHAR *pszCatName,
    [in, string] const WCHAR *pReqAttrs,
    [in, string] const WCHAR *pOptAttrs,
    [out] IEnumSpObjectTokens **ppEnum);
When identifying matching tokens in a category, an application needs to specify a fully qualified category identifier (FQCID). An FQCID is the full registry path to a category, such as HKEY_CURRENT_USER\Software\Microsoft\Speech\Voices. It is recommended that these categories be referenced using the constants defined in the sapi.idl file, rather than the full string, to minimize typos in commonly used registry paths. SAPI maps the constant to the correct hive in the registry and returns matching tokens from the category. For instance, the SAPI-defined AudioOutput constant (from the sapi.idl file) is:
//--- Categories for speech resource management
const WCHAR SPCAT_AUDIOOUT[] = L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech\\AudioOutput";
Similarly, there are constants for the AudioInput, Voices, Recognizers, AppLexicons, PhoneConverters, and RecoProfiles categories.
An application may also specify a non-standard registry location by simply providing its FQCID, such as HKEY_CURRENT_USER\Software\TTSVendor1\Speech\Voices.
In both SpEnumTokens and ISpObjectTokenCategory::EnumTokens the following clauses are permitted in the ReqAttrs and OptAttrs strings, separated by semicolons.
Condition | Example | Explanation
Exists | Telephony;Dictation | The value names Telephony and Dictation exist in the list of attributes for this token.
One of | Language=409 | At least one of the values of the value name Language is 409. There may be other values as well, such as 809 or 512.
Not Equals | Age!=Child;Age!=Teen | Values of Age that are neither "Child" nor "Teen".
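To make the three clause types concrete, the following standalone sketch evaluates a semicolon-separated clause string against one token's attributes. It is an illustrative model only, not SAPI's implementation: AttributeMap, SplitSemicolons, HasValue, MatchesClause and MatchesAll are hypothetical names, comparisons are assumed to be case-sensitive, and a "Not Equals" clause is assumed to succeed when the attribute is absent.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One token's Attributes key, modelled as value name -> raw value,
// where the raw value may itself be a semicolon list such as "409;809".
using AttributeMap = std::map<std::wstring, std::wstring>;

// Splits "a;b;c" into its non-empty parts.
std::vector<std::wstring> SplitSemicolons(const std::wstring& text) {
    std::vector<std::wstring> parts;
    std::wstringstream stream(text);
    std::wstring part;
    while (std::getline(stream, part, L';')) {
        if (!part.empty()) {
            parts.push_back(part);
        }
    }
    return parts;
}

// True if the attribute exists and one of its values equals `value`.
bool HasValue(const AttributeMap& attrs, const std::wstring& name,
              const std::wstring& value) {
    const auto it = attrs.find(name);
    if (it == attrs.end()) {
        return false;
    }
    for (const std::wstring& candidate : SplitSemicolons(it->second)) {
        if (candidate == value) {
            return true;
        }
    }
    return false;
}

// Evaluates a single clause: "Name" (exists), "Name=Value" (one of),
// or "Name!=Value" (not equals).
bool MatchesClause(const AttributeMap& attrs, const std::wstring& clause) {
    const std::wstring::size_type notEqual = clause.find(L"!=");
    if (notEqual != std::wstring::npos) {
        return !HasValue(attrs, clause.substr(0, notEqual), clause.substr(notEqual + 2));
    }
    const std::wstring::size_type equal = clause.find(L'=');
    if (equal != std::wstring::npos) {
        return HasValue(attrs, clause.substr(0, equal), clause.substr(equal + 1));
    }
    return attrs.count(clause) != 0;
}

// True only if every clause in the semicolon-separated string matches.
bool MatchesAll(const AttributeMap& attrs, const std::wstring& clauses) {
    for (const std::wstring& clause : SplitSemicolons(clauses)) {
        if (!MatchesClause(attrs, clause)) {
            return false;
        }
    }
    return true;
}
```

With this model, a token whose Language value is 409;809 satisfies Language=809 but not Language=411, which mirrors the "One of" row.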
The tokens are sorted “best matches” first using the following intuitive rules:
1. Only tokens matching the required attributes are returned.
2. Tokens that also match the optional attributes are placed before those that match only the required attributes.
3. If there are no required or optional attributes (i.e., both are set to NULL), the first token is the default token for that category. If there is a valid DefaultTokenID in HKLMS/Category, that is returned as the default TokenID. If not, and there is a DefaultTokenID in HKCUS/Category, that is returned. If neither exists, SAPI searches for a DefaultDefaultTokenID in HKLMS/CategoryName, and that is returned.
4. Matching Rules: If a token matches an optional attribute, it gets a score of 1, otherwise, 0 for that attribute. The optional attributes mentioned earlier in the query string are more significant. These scores are concatenated as shown in Table 7. The tokens are then placed in descending order. This is illustrated in Tables 6 and 7.
5. Tokens having the same score are returned in random order in the enumerator.
A call to EnumTokens could look like:
CComPtr<IEnumSpObjectTokens> cpEnum;
CComPtr<ISpObjectTokenCategory> cpVoiceCat;
HRESULT hr = cpVoiceCat.CoCreateInstance(CLSID_SpObjectTokenCategory);
const WCHAR ReqAttrs[] = L"LanguagesSupported=409";
const WCHAR OptAttrs[] = L"Vendor=VoiceVendor1;Age=Child;Gender=Female";
if (SUCCEEDED(hr))
{
    // SPCAT_VOICES is defined in sapi.idl
    hr = cpVoiceCat->EnumTokens(SPCAT_VOICES, ReqAttrs, OptAttrs, &cpEnum);
}
If the following voices are installed on a computer as shown in Table 6:
Table 6 Voices installed on a computer
Voice | Vendor | Age | LanguagesSupported | Gender
Michelle | VoiceVendor1 | Child | 409; 411 | Female
Mary | VoiceVendor1 | Adult | 409 | Female
Jane | VoiceVendor2 | Child | 409 | Female
Frank | VoiceVendor2 | Adult | 411 | Male
Anna | VoiceVendor2 | Adult | 411 | Female
Then the order of the Voices returned in cpEnum will be as shown in Table 7:
Table 7 Scoring of tokens matching optional criteria
Optional Criteria -> | Vendor | Age | Gender | Net Score
Michelle | 1 | 1 | 1 | 111
Mary | 1 | 0 | 1 | 101
Jane | 0 | 1 | 1 | 011
The final order is:
1. Michelle (meets all required criteria, scores 111 on optional criteria)
2. Mary (meets all required criteria, scores 101 on optional criteria)
3. Jane (meets all required criteria, scores 011 on optional criteria)
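The ranking above can be reproduced with a small standalone model of the scoring rules. This is a sketch under stated assumptions, not SAPI's implementation: Voice, Criterion, Score, RankVoices and Table6Voices are hypothetical names, the only required attribute handled is a single language, and ties keep their input order here whereas the rules above say tied tokens come back in random order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A voice as listed in Table 6.
struct Voice {
    std::string name;
    std::string vendor;
    std::string age;
    std::string gender;
    std::vector<std::string> languages;
};

// An optional criterion: attribute name and the wanted value.
using Criterion = std::pair<std::string, std::string>;

std::string AttributeOf(const Voice& voice, const std::string& attribute) {
    if (attribute == "Vendor") return voice.vendor;
    if (attribute == "Age") return voice.age;
    if (attribute == "Gender") return voice.gender;
    return "";
}

// Concatenates one bit per optional criterion; the earliest criterion in the
// query is the most significant bit, as in the Net Score column of Table 7.
unsigned Score(const Voice& voice, const std::vector<Criterion>& optional) {
    assert(optional.size() < 32);  // the bits must fit in an unsigned value
    unsigned score = 0;
    for (const Criterion& criterion : optional) {
        const bool match = AttributeOf(voice, criterion.first) == criterion.second;
        score = (score << 1) | (match ? 1u : 0u);
    }
    return score;
}

// Keeps voices supporting the required language, best optional match first.
std::vector<std::string> RankVoices(std::vector<Voice> voices,
                                    const std::string& requiredLanguage,
                                    const std::vector<Criterion>& optional) {
    // Rule 1: only voices matching the required attribute are returned.
    voices.erase(std::remove_if(voices.begin(), voices.end(),
                     [&](const Voice& v) {
                         return std::find(v.languages.begin(), v.languages.end(),
                                          requiredLanguage) == v.languages.end();
                     }),
                 voices.end());
    // Rules 2 and 4: descending score order.
    std::stable_sort(voices.begin(), voices.end(),
                     [&](const Voice& a, const Voice& b) {
                         return Score(a, optional) > Score(b, optional);
                     });
    std::vector<std::string> names;
    for (const Voice& v : voices) {
        names.push_back(v.name);
    }
    return names;
}

// The five voices from Table 6.
std::vector<Voice> Table6Voices() {
    return {
        {"Michelle", "VoiceVendor1", "Child", "Female", {"409", "411"}},
        {"Mary", "VoiceVendor1", "Adult", "Female", {"409"}},
        {"Jane", "VoiceVendor2", "Child", "Female", {"409"}},
        {"Frank", "VoiceVendor2", "Adult", "Male", {"411"}},
        {"Anna", "VoiceVendor2", "Adult", "Female", {"411"}},
    };
}
```

Calling RankVoices(Table6Voices(), "409", ...) with the optional criteria Vendor=VoiceVendor1, Age=Child and Gender=Female, in that order, drops Frank and Anna at the required-attribute step and orders the remaining voices by their concatenated scores.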
If the call to EnumTokens is changed to:
HRESULT hr = cpVoiceCat->EnumTokens(SPCAT_VOICES, NULL, NULL, &cpEnum);
and the user's default token in HKCUS\Voices\DefaultTokenID is set to HKEY_LOCAL_MACHINE\Software\Microsoft\Speech\Voices\Tokens\Jane,
then the enumerator cpEnum will contain all the tokens, with Jane being the first token.
What does SAPI do when ISpObjectTokenCategory::EnumTokens is called?
Consider a fictitious category that has both tokens and token enumerators under it. When an application calls the SAPI ISpObjectTokenCategory::EnumTokens, the following things happen:
1. SAPI creates an enumerator of type IEnumSpObjectTokens that can enumerate all the matching tokens from the keys under the category's Tokens key (HKLMS/Voices/Tokens for the Voices category).
2. Token enumerator step (skip this step if not using token enumerators).
a. SAPI searches for a CategoryName/TokenEnums key. If found, it instantiates a token enumerator from each of the tokens under this key, one by one.
b. Each token enumerator returns an IEnumSpObjectTokens containing the matching tokens under it, which is merged with the IEnumSpObjectTokens created in step 1.
3. SAPI applies the required attributes so that the IEnumSpObjectTokens enumerator contains only those tokens that match these attributes, then sorts them according to how well they match the optional attributes (the exact rules are given earlier in Section 4.2).
4. The application steps through the tokens until it finds an appropriate one, checking further attributes and strings of each token with the ISpObjectToken methods GetData, GetStringValue, and GetDWORD (inherited from ISpDataKey).
5. The application identifies the token it is interested in and calls ISpObjectToken::CreateInstance. The newly created object is then queried (QI) to see whether it supports the ISpObjectWithToken interface. If it does, SAPI calls ISpObjectWithToken::SetObjectToken to give the newly instantiated object a pointer to the token it was instantiated from.
Continuing with this example, the application now has a pointer to the enumerator IEnumSpObjectTokens. An application may choose to step through the enumerator with the methods Next, Skip or Reset to find an ISpObjectToken that best meets its needs. Assume that the application is searching for a voice that sounds clear over a telephone. Also assume that such voices typically have a ValueName called SupportsTelephony, which is set to 1. There is no such protocol in SAPI; this is for illustration only. Because this is not a value under Attributes, it cannot be picked up by the standard query mechanism of required attributes. The variable pCurVoiceToken holds the selected token for that category. In the example below, the tokens in cpEnum are stepped through until a voice is found that also supports Telephony.
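ISpObjectToken *pCurVoiceToken = NULL;   // selected token; the caller must Release() it when done
ISpObjectToken *pToken = NULL;
DWORD dwFeature = 0;
while (cpEnum->Next(1, &pToken, NULL) == S_OK)
{
    // At this point, all we know is that pToken is a pointer to a Voice token.
    // Note, ISpObjectToken inherits from ISpDataKey, so GetDWORD can be called on it.
    hr = pToken->GetDWORD(L"SupportsTelephony", &dwFeature);
    if (SUCCEEDED(hr) && dwFeature != 0)
    {
        // This is the token for the Voice we want; keep its reference.
        pCurVoiceToken = pToken;
        break;
    }
    pToken->Release();   // not a match; release it before fetching the next token
    pToken = NULL;
}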
At this point, the selected Voice token is stored in pCurVoiceToken. Now create the voice object from this token, so that Speak and other methods on it may be called. To create a voice object, an SpVoice object exposing ISpVoice must be created.
EXTERN_C const CLSID CLSID_SpVoice;
CComPtr<ISpVoice> cpVoice;
BOOL fSupported = FALSE;
// The application may want to check whether the token has any associated UI that it needs to display.
hr = pCurVoiceToken->IsUISupported(SPDUI_EngineProperties, NULL, 0, NULL, &fSupported);
// The application calls the UI, or maybe enables a button in its own UI so the user can call the UI.
// Next, CoCreate an instance of SpVoice called cpVoice.
hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
if (SUCCEEDED(hr))
{
    // Set cpVoice to our selected voice token.
    hr = cpVoice->SetVoice(pCurVoiceToken);
}
At this point the cpVoice object (of type ISpVoice) has been instantiated and is ready to speak, with a call such as:
hr = cpVoice->Speak(L"This audio file was created using SAPI five text to speech.", 0, NULL);
In addition to enumerating and instantiating tokens, an engine vendor also needs to be able to:
· Create new tokens
· Associate files with tokens
There are several straightforward steps for an SR or TTS engine to be discoverable by SAPI:
1. Make an appropriate entry under the correct CategoryID/Tokens in the registry (details in Section 6).
2. Make an entry under CategoryID/TokenEnums if the vendor prefers dynamic tokens (i.e., the engine registry information is already stored in some other registry location or file). The enumerator should implement the interfaces outlined in Section 3.4.
3. Look at the standard attributes for a category in SAPI and identify the characteristics of the engines so that applications can query the engine for these properties.
4. Hand the SR engine a pointer to the recognition profile token once the RecoInstance has been created.
One of the key issues for an engine vendor is associating files with tokens in the registry, such as the language model files for a Recognizer or a RecoProfile token. The files under a token's Files key can be queried using the ISpObjectToken::GetStorageFileName method. SAPI searches for the file in a number of known locations. Because of the possibility of roaming, SAPI does not store fully qualified file paths in the registry (such as C:/Documents and Settings/JoeUser/Local Settings/Application Data), but stores paths such as %1c%\Microsoft\Speech\Files\MSASR\SP_81738BE4B81F42F0BFC4BB98B72EB81A.spz instead. SAPI calls SHGetFolderPath to obtain the user’s non-roaming directory on the individual computer. The calling application can specify (i) the specific name of the file, if any, and (ii) the subdirectory to put the file in. Refer to the GetStorageFileName documentation for the exact interfaces. The engine may append any additional vendor-identifying directory names to indicate engine-specific data. Deleting the tokens with which the files are associated, by calling ISpObjectToken::RemoveStorageFileName, also removes the files from the file system.
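The stored-path convention can be illustrated with a standalone sketch that expands a leading %…% marker into a per-machine base directory. Several things here are assumptions rather than documented SAPI behaviour: the marker is read as a hexadecimal folder identifier (0x1c is the value of CSIDL_LOCAL_APPDATA), and ExpandStoredPath and FolderTable are hypothetical names. A real implementation would obtain the base directory from the shell rather than from a lookup table.

```cpp
#include <cwchar>
#include <map>
#include <string>

// Hypothetical lookup from folder identifier to a base directory on this computer.
using FolderTable = std::map<unsigned long, std::wstring>;

// Expands a leading "%<hex id>%" marker using the table. The input is returned
// unchanged if it has no marker, the marker is not hexadecimal, or the id is unknown.
std::wstring ExpandStoredPath(const std::wstring& stored, const FolderTable& folders) {
    if (stored.size() < 3 || stored[0] != L'%') {
        return stored;
    }
    const std::wstring::size_type close = stored.find(L'%', 1);
    if (close == std::wstring::npos || close == 1) {
        return stored;
    }
    const std::wstring hex = stored.substr(1, close - 1);
    wchar_t* end = nullptr;
    const unsigned long id = std::wcstoul(hex.c_str(), &end, 16);
    if (*end != L'\0') {
        return stored;  // the marker contained non-hexadecimal characters
    }
    const auto it = folders.find(id);
    if (it == folders.end()) {
        return stored;
    }
    return it->second + stored.substr(close + 1);
}
```

Because only the marker is stored, the same registry value can resolve to a different absolute path on each computer the user roams to, which is the behaviour the paragraph above describes.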
Caveat: If roaming is enabled, the user’s RecoProfiles in the HKCUS hive of the registry will roam (because the entire HKCUS hive roams); the associated files, situated in a non-roaming directory, will not. This causes two unexpected effects:
1. When the Recognizer is initiated on the second computer, the RecoProfile files are likely to be missing. The Recognizer needs to be able to handle this and copy the necessary new-profile files. Known issue: Upon roaming, the Microsoft SR Engine currently creates a new set of files, but these have entirely different names from the names on the original computer. As a result, when the registry is roamed back to the original computer, the original profile files become orphaned.
2. Subsequently, upon deleting the RecoProfiles from one computer, all the associated files and registry entries on that computer will be deleted. The rest will become orphans, that is, files without pointers to them.
Besides using helper functions, an engine can inspect the keys under a token directly, by taking a Recognizer token and opening the Attributes key under it as a data key. All the ISpDataKey methods are then available to inspect the values under the Attributes key. The sample below goes from the Recognizer token to the Attributes key under it, and finally to the “Desktop” and “Telephony” strings under that.
hr = SpGenericSetObjectToken(pToken, m_cpEngineObjectToken);
if (FAILED(hr))
{
    return hr;
}
// Read attribute information
CComPtr<ISpDataKey> cpAttribKey;
hr = pToken->OpenKey(L"Attributes", &cpAttribKey);
if (SUCCEEDED(hr))
{
    WCHAR *psz = NULL;
    hr = cpAttribKey->GetStringValue(L"Desktop", &psz);
    ::CoTaskMemFree(psz);
    if (SUCCEEDED(hr))
    {
        // This instance of the engine is for doing desktop recognition
    }
    else if (hr == SPERR_NOT_FOUND)
    {
        psz = NULL;
        hr = cpAttribKey->GetStringValue(L"Telephony", &psz);
        ::CoTaskMemFree(psz);
        if (SUCCEEDED(hr))
        {
            // This instance of the engine is for doing telephony recognition
        }
    }
}
Below is another snippet of code, in which the Microsoft Sample Engine creates a new entry under a recognition profile. If the Recognition Profile entry does not exist for the engine (pszCLSID points to the Engine GUID), it needs to be created, along with the Gender and Age values under it.
// Read attribute information from the Engine key; pProfile is the RecoProfile token
// obtained by calling GetRecoProfile on the Recognition Instance.
hr = pProfile->OpenKey(pszCLSID, &dataKey);
if (hr == SPERR_NOT_FOUND)
{
    // This user profile has not been seen before, so create a new registry key to hold info for it
    hr = pProfile->CreateKey(pszCLSID, &dataKey);
    // Now set some default values
    if (SUCCEEDED(hr))
    {
        hr = dataKey->SetStringValue(L"GENDER", L"UNKNOWN");
    }
    if (SUCCEEDED(hr))
    {
        hr = dataKey->SetStringValue(L"AGE", L"UNKNOWN");
    }
    // Now create some temporary file storage for trained models.
    // This will create a valuename called SampleEngTrainingFile with a value such as
    // C:\Documents and Settings\username\application data\microsoft speech\files\MSASR\LM7454901D23334AAF87707147726EC235.dat
    if (SUCCEEDED(hr))
    {
        hr = pProfile->GetStorageFileName(CLSID_SampleSREngine, L"SampleEngTrainingFile", L"MSASR\\LM%d.dat", CSIDL_FLAG_CREATE, &pszPath);
    }
    // and request a UI for user training or properties - SPDUI_RecoProfileProperties
    hr = AddEventString(SPEI_REQUEST_UI, 0, SPDUI_UserTraining);
}
This section documents in some detail the registry settings of each category of tokens in both the HKCUS and the HKLMS hives. Each token entry needs to have the required keys and values for a token as outlined in Table 1. So that applications can find the most suitable token on the computer in the Recognizers, Voices, and Phone Converters categories, SAPI defines a standard set of attributes that applications can query for. It is important for engine vendors to implement these keys exactly as specified, because engines and voices must be discoverable by applications through SAPI.
It is important to note that in addition to the specified keys and values, a vendor may create any other keys and values in the registry that its resource needs. SAPI will ignore these values and not disturb them in any way, unless SAPI is uninstalled from the computer.
The Voices category enumerates every voice installed on the computer by all TTS engines. The voice tokens should be located under the key:
HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Voices\Tokens
The requirements for a voice token are listed below, with some sample values.
· Each voice token should meet the requirements for a standard token.
· Voices should document the SAPI-specific attributes that describe them so applications can search for them. Table 8 contains a full listing of Voices attributes and their locations. All Voice attributes are required. Section 3.1 and Section 4.2 have more information about attributes and querying them.
· Voices may have their own Vendor-specific UI implemented by the TTS Engine rendering the voice. If such UI is present, then the UI needs a separate token in the location described in Table 8. The minimum requirement is that the token contain the CLSID of the COM object implementing the UI. Click Properties on the Text-to-Speech tab of the Speech Control Panel to access the Vendor-specific UI. The Properties button will be unavailable if the EngineProperties token is not supported for the current default Voice.
Table 8 provides a detailed listing of the registry entries that constitute a sample voice token called VoiceToken1 under HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Voices\Tokens
Table 8 Voice Registry and Attributes
RegKey | ValueName | Comments
VoiceToken1 | | Required - This is the RegKey for the Token.
 | (Default) | Required - Language-independent name.
 | 409 | Name in Hex_LangID 409, which is English. There may be several of these rows, one for each LangID in which the token has a name. Note that there is no leading 0x before the LangID.
 | 809 |
 | CLSID | Required - Sample CLSID for object which instantiates the voice.
VoiceToken1\Attributes | | Attributes for the Token are under this key.
 | Age | Required - Value should be “Child,” “Teen,” “Adult,” or “Senior” depending on the age of the TTS Voice. Senior indicates an elderly voice. Vendors may choose to classify some voices as both “Senior” and “Adult”.
 | Vendor | Required - TTS engine Vendor name.
 | Language | Required - The LCID in hex of the language this engine speaks.
 | Gender | Required - Value should be “Male” for a male voice, “Female” for a female voice.
 | VendorPreferred | Required - If this is the default voice for the vendor named in Vendor.
 | Name | Required - String representing the language-independent name.
VoiceToken1\UI | | Required, if the Voice has UI - UI tokens for the voice token will be stored under this key.
VoiceToken1\UI\EngineProperties | | The only SAPI-specific UI token is EngineProperties. Called when the user clicks Properties on the Text-to-Speech tab.
 | CLSID | Required - Sample CLSID for object which instantiates engine-specific UI from Speech properties in Control Panel.
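As an illustration of the constraints in Table 8, the following is a minimal, portable sketch of a check a vendor might run over a voice's attribute values before writing them to the registry. It does not call SAPI. The names AttributeMap, SplitOnSemicolon and CheckVoiceAttributes are hypothetical, and the assumption that a voice classified as both “Senior” and “Adult” stores the two values separated by a semicolon is borrowed from the recognizer attributes rather than stated for voices. VendorPreferred is not checked because its presence depends on whether the voice is the vendor's default.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory copy of the values under VoiceToken1\Attributes.
// Note: registry value names are case-insensitive; this map is not.
using AttributeMap = std::map<std::string, std::string>;

// Splits "Senior;Adult" into {"Senior", "Adult"} (assumed separator).
std::vector<std::string> SplitOnSemicolon(const std::string& value)
{
    std::vector<std::string> parts;
    std::stringstream stream(value);
    std::string part;
    while (std::getline(stream, part, ';'))
    {
        if (!part.empty())
            parts.push_back(part);
    }
    return parts;
}

// Returns a list of problems; an empty list means the checks passed.
std::vector<std::string> CheckVoiceAttributes(const AttributeMap& attrs)
{
    std::vector<std::string> problems;

    // Attributes that Table 8 lists as always required.
    const std::vector<std::string> expected = {"Age", "Vendor", "Language", "Gender", "Name"};
    for (const std::string& name : expected)
    {
        auto it = attrs.find(name);
        if (it == attrs.end() || it->second.empty())
            problems.push_back("missing attribute: " + name);
    }

    // Age must use the values named in Table 8.
    const std::set<std::string> ages = {"Child", "Teen", "Adult", "Senior"};
    auto age = attrs.find("Age");
    if (age != attrs.end())
    {
        for (const std::string& part : SplitOnSemicolon(age->second))
        {
            if (ages.count(part) == 0)
                problems.push_back("unexpected Age value: " + part);
        }
    }

    // Gender must be "Male" or "Female".
    auto gender = attrs.find("Gender");
    if (gender != attrs.end() && !gender->second.empty() &&
        gender->second != "Male" && gender->second != "Female")
    {
        problems.push_back("unexpected Gender value: " + gender->second);
    }
    return problems;
}
```

A setup program could call CheckVoiceAttributes on the values it is about to write and refuse to register the token if the returned list is not empty.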
Note: Please refer to the registry entries of the Microsoft recognizer and the Sample Engine, which ship in the SAPI 5 SDK, as examples of how these entries are created.
There is also a Voices category in the HKCUS hive that stores the following:
· The default TTS rate selected by the user using Speech properties in Control Panel.
· The default voice selected by the user.
Table 9 provides a listing of the user registry entries that constitute a voice token in
HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\Voices\
Table 9 Voices - User Registry Settings
 | ValueName | Value | Comments
VoiceToken1 | DefaultTokenID | HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\MSMary | This points to the default voice selected by the user in Speech properties in Control Panel. It is empty at installation time and gets populated when the user selects a voice.
 | TTSRate | 5 | Scale is 0 to +10, with 0 being slowest. This applies to all voices on the system for the currently logged on user.
Note: The TTS engine does not need to store any of these values; SAPI takes care of that.
Vendors may choose to store any additional keys and values in the same areas of the registry. Following is additional information relating to voice tokens:
· User specific entries for the voice (such as volume, pitch, rate, and any other information) should be stored in keys and values under HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\Voices\Tokens\VoiceToken1
This creates a structure in the HKCUS hive parallel to the one in the HKLMS hive.
· Entries applying to all the voices using an engine should be stored under HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\Voices\Tokens\EngineGUID1
· Non-user entries (pertaining to all users on the computer) for a voice should be stored in keys and values under the category HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Voices\Tokens
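The parallel layout of the two hives means the per-user key for a voice can be derived from its machine-wide token ID by swapping the root. The sketch below shows that string manipulation only; it does not touch the registry, and the function name ToPerUserTokenKey is hypothetical. It compares the root case-sensitively, which is stricter than the registry itself.

```cpp
#include <cassert>
#include <string>

// Maps a machine-wide token ID to the parallel per-user key.
// Returns an empty string when the input does not start with the HKLM root.
std::wstring ToPerUserTokenKey(const std::wstring& machineTokenId)
{
    const std::wstring machineRoot = L"HKEY_LOCAL_MACHINE\\";
    const std::wstring userRoot = L"HKEY_CURRENT_USER\\";

    // compare() clamps the length, so short inputs are handled safely.
    if (machineTokenId.compare(0, machineRoot.size(), machineRoot) != 0)
        return std::wstring();

    return userRoot + machineTokenId.substr(machineRoot.size());
}
```

For the sample token in Table 8, passing HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Voices\Tokens\VoiceToken1 yields the HKEY_CURRENT_USER key named in the first bullet above.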
SAPI enumerates all the SR Engines installed on the computer from the tokens and token enumerators under
HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Recognizers
Below are the guidelines for registering recognizer tokens:
· Each recognizer token should meet the requirements for a standard token (Table 1).
· Each speech recognition engine installed on the computer should have a recognizer token. If vendors use a single recognizer for recognition in multiple languages (with different acoustic models), or a discrete and a continuous recognizer, they may choose to store the relevant data files and other initialization information under separate tokens, but use the same value for the CLSID. For example, a vendor may use the same recognition engine to recognize both Japanese and English. In this case, there are two tokens, both containing the CLSID of the same recognizer, but associated with different language and acoustic model files stored with the token.
· Recognizers should document the SAPI-specific attributes shown in Table 10 so that applications can search for them. Required attributes are also indicated in Table 10.
· Most speech applications written with SAPI will be tested for a specific engine, or a few specific engines, if the application has a clear need for multiple engines. Typically applications will query for and use this engine by default. Use attributes when the application cannot find its preferred engine (or doesn’t have one), and needs to locate the most suitable engine installed on the computer for its needs.
· Recognizer tokens may have an Alternate CLSID if they implement alternates.
· Recognizer tokens may have a RecoExtension CLSID for objects that extend SAPI's recognition context.
· The Recognizer may also have a number of engine-specific UIs that it exposes to SAPI. There should be a separate key under {Recognizer TokenID}/UI/ for each such UI supported. The keys are listed and documented in Table 10 below.
Table 10 provides a detailed listing of the registry entries that constitute a sample recognizer token called RecognizerToken1 under HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Recognizers\Tokens
Table 10 Sample Entries of a Recognizer token
RegKey | ValueName | Comments
RecognizerToken1 | | Required - This is the RegKey for the Token.
 | (Default) | Required - Language-independent name.
 | 409 | Name in Hex_LangID 409, which is English. There may be several of these rows, one for each LangID in which the token has a name. Note that there is no leading 0x before the LangID.
 | AlternatesCLSID | CLSID for the object that implements alternates.
 | RecoExtension | CLSID for the object that extends the recognition context provided by SAPI.
RecognizerToken1\Attributes | |
 | Vendor | Required - Company name.
 | Language | Required - The LCIDs in hex of the language(s) this engine recognizes. Typically there is only one per recognizer token. Recognizers recognizing multiple languages can expose that in the manner shown. In addition, a value of “409;9” indicates that the Recognizer recognizes generic English (American, British, Australian, etc.).
 | SpeakingStyle | Required - Value should be “Discrete” if the engine requires pauses between words for recognition. Recognizers without this requirement should have “Discrete;Continuous” for this value.
 | Dictation | Required - This value must be present if the engine supports dictation.
 | CommandAndControl | Required - This value must be present if the engine supports Command and Control.
 | Desktop | The engine supports desktop audio.
 | Telephony | The engine is configured to recognize audio coming in from a telephony channel.
 | VendorPreferred | This token is the vendor’s default token.
 | Alternates | Value is “CC” if the engine supports Command and Control alternates. Value is “Dictation” if it supports dictation alternates. If both types of alternates are supported, then the value should be “CC;Dictation”.
 | Hypotheses | The engine returns hypotheses before final recognition.
 | WordSequences | Value is “Trailing” if the engine supports word sequence elements in CFGs at the end of a rule. Value should be “Anywhere;Trailing” if word sequence elements are supported anywhere in a rule. The presence of this attribute indicates the SR engine supports the Text-Buffer functionality. Applications can test for the presence of this attribute and, if it is available, ISpRecoGrammar::SetWordSequenceData and ISpRecoGrammar::SetTextSelection may be used.
 | DictationInCFG | Value is “Trailing” if the engine supports dictation elements in CFGs, as defined by SAPI, at the end of a rule. Value should be “Anywhere;Trailing” if dictation elements are supported anywhere in a rule. A dictation element is an element in a CFG that loads an SLM and returns the recognition result to the application. Refer to SAPI documentation for additional information.
 | WildcardInCFG | Value should be “Trailing” if the engine supports garbage elements at the end of a rule in a CFG, as defined by SAPI. Value should be “Anywhere;Trailing” if garbage elements are supported anywhere in a rule. Garbage is an element in a rule that the engine does not recognize, while still recognizing the rest of the phrase from the CFG. Refer to SAPI documentation for additional information on Garbage.
RecognizerToken1\UI | | Required - If the Recognizer has UI, this key stores UI tokens for the recognizer token.
RecognizerToken1\UI\EngineProperties | | Required - If the engine supports a separate UI for Engine properties. A dialog box appears when the user clicks the Properties button next to the list of Engines on the Speech Recognition tab. If the engine does not support a UI for Engine properties, the Properties button will be unavailable.
 | CLSID | Required - Sample CLSID for object which instantiates engine-specific UI from Speech properties in Control Panel.
RecognizerToken1\UI\AddRemoveWord | | Required - If the engine provides a separate UI for adding and removing words. This brings up the dialog box for the user to add a word to the user lexicon and language model, typically from within an application.
 | CLSID | Required - Sample CLSID for object which instantiates an engine-specific UI from Speech properties in Control Panel.
RecognizerToken1\UI\MicTraining | | Required - If the engine provides a separate UI for the Microphone wizard. The microphone wizard sets the microphone gain. It appears from within Speech properties in Control Panel or any application that calls it. SAPI checks to see if RecognizerToken1\UI\MicTraining is populated. Click Configure Microphone on the Speech Recognition tab to invoke the CLSID below, which invokes the microphone training wizard. If the MicTraining token is not supported on the current default engine, the button is unavailable in the Speech Control Panel.
 | CLSID | Required - Sample CLSID for object which instantiates engine-specific UI from Speech properties in Control Panel.
RecognizerToken1\UI\UserTraining | | Required - If the engine provides an engine-specific training wizard. The training wizard trains the current default Recognition profile. The training wizard appears from within Speech properties in Control Panel or any application that calls it. SAPI checks to see if RecognizerToken1\UI\UserTraining is populated. Click Train Profile on the Speech Recognition tab to invoke the CLSID below, which invokes the training wizard. If the UserTraining token is not supported under the current default engine, the button is unavailable in Speech properties in Control Panel.
 | CLSID | Required - Sample CLSID for object which instantiates an engine-specific UI from Speech properties in Control Panel.
RecognizerToken1\UI\RecoProfileProperties | | Required - If an engine-specific profile wizard exists. It is used to set any additional properties on the Recognition Profile, such as sensitivity, background adaptation, etc. Click Properties on the Speech Recognition tab next to the list of recognition profiles.
 | CLSID | Required - Sample CLSID for object which instantiates an engine-specific UI from Speech properties in Control Panel.
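Several attributes in Table 10 hold more than one value separated by semicolons (for example “Discrete;Continuous” or “409;9”). The following portable sketch shows one way an application might split such a value, test for a single entry, and turn the hex language IDs into numbers. It does not call SAPI and is not a description of how SAPI itself matches attributes. The function names are hypothetical, and the comparison is case-sensitive by assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits a semicolon-separated attribute value into its entries.
std::vector<std::string> SplitAttributeValue(const std::string& value)
{
    std::vector<std::string> parts;
    std::stringstream stream(value);
    std::string part;
    while (std::getline(stream, part, ';'))
    {
        if (!part.empty())
            parts.push_back(part);
    }
    return parts;
}

// True if 'wanted' is one of the entries, e.g. "Continuous" in
// "Discrete;Continuous".
bool AttributeContains(const std::string& value, const std::string& wanted)
{
    for (const std::string& part : SplitAttributeValue(value))
    {
        if (part == wanted)
            return true;
    }
    return false;
}

// Parses hex IDs written without a leading 0x, such as "409;9".
// Entries that are not valid hex numbers are skipped.
std::vector<unsigned long> ParseLanguageIds(const std::string& value)
{
    std::vector<unsigned long> ids;
    for (const std::string& part : SplitAttributeValue(value))
    {
        try
        {
            std::size_t used = 0;
            const unsigned long id = std::stoul(part, &used, 16);
            if (used == part.size())
                ids.push_back(id);
        }
        catch (const std::exception&)
        {
            // Not a number (or out of range): ignore this entry.
        }
    }
    return ids;
}
```

With the Language value “409;9” from Table 10, ParseLanguageIds returns the two IDs 0x409 and 0x9, and AttributeContains("Discrete;Continuous", "Continuous") reports that the recognizer does not require pauses between words.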
There is also a Recognizers category in the HKCUS hive that stores the selected default Recognizer. This is done exactly as for Voices, as shown in Table 9. The CategoryID is:
HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\Recognizers\
A RecoProfile (RP) is a user-specific, engine-specific data file in which an SR Engine stores the user-specific acoustic and language data. The RP can be thought of as a bag of information that only the engine knows about. The RP also stores attributes in a few keys under the RP’s key in the registry (profiles currently appear on the Speech Recognition tab of Speech properties in Control Panel).
There are two key reasons for a user to have multiple acoustic profiles:
1. In a shared login case (for example, with a Win98 or Millennium home computer where the users typically press Cancel in the login dialog box to enter the computer), multiple profiles allow two or more users to keep their language and acoustic data separate. In this case, the user will need to manually change the profile to the correct one in Speech properties in Control Panel before starting recognition (or an application may provide its own UI to do this).
2. On a laptop, to offer the user the choice of having different acoustic profiles for different acoustic settings, such as home and office.
A typical RP token is located in the user hive in the following location in the registry:
HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\RecoProfiles\Tokens\{ProfileGUID 1}
Initially, SAPI creates only one RecoProfile, called Default User, identified by a GUID. When the Recognizer is used for the first time, it should create a key named after the Recognizer's GUID under this profile token. For instance, if the default recognizer has the GUID XXX, the key HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\RecoProfiles\Tokens\{ProfileGUID 1}\XXX is created. The Recognizer stores all its files and settings for the profile under this key. These settings may include paths to the acoustic and language model files for the profile that are modified during speaker enrollment and subsequently during recognition. The key may also contain additional data about the profile that may improve the recognizer's accuracy, such as age, gender, microphone gain setting and so on.
Under the RecoProfile token, there is a key for the GUID of each engine that has a profile. Suppose a user keeps the same profile but switches the default engine (say, to YYY) in Speech properties in Control Panel. The new engine, on instantiation (or termination of the session), should create the key HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\RecoProfiles\Tokens\{ProfileGUID 1}\YYY FullPathToFil A
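The open-or-create behavior described above can be modeled without the registry. The sketch below is a portable, in-memory stand-in for a RecoProfile token: it keeps one set of values per engine GUID and creates that set, with the GENDER and AGE defaults used by the Sample Engine snippet earlier, the first time an engine asks for it. The type and function names are hypothetical and the model is an assumption for illustration; a real engine would use OpenKey and CreateKey on the profile token.

```cpp
#include <cassert>
#include <map>
#include <string>

// Values stored under one engine's key, e.g. "AGE" -> "UNKNOWN".
using KeyValues = std::map<std::string, std::string>;

// Hypothetical model of one RecoProfile token: engine GUID -> values.
using RecoProfileModel = std::map<std::string, KeyValues>;

// Returns the values for 'engineGuid', creating the key with default
// values if this engine has not used the profile before.
// '*created' (optional) reports whether a new key was made.
KeyValues& OpenOrCreateEngineKey(RecoProfileModel& profile,
                                 const std::string& engineGuid,
                                 bool* created = nullptr)
{
    auto it = profile.find(engineGuid);
    const bool isNew = (it == profile.end());
    if (isNew)
    {
        KeyValues defaults;
        defaults["GENDER"] = "UNKNOWN";
        defaults["AGE"] = "UNKNOWN";
        it = profile.emplace(engineGuid, defaults).first;
    }
    if (created != nullptr)
        *created = isNew;
    return it->second;
}
```

Calling the function first with XXX and later with YYY leaves two keys under the same profile, so switching the default engine does not disturb the data the first engine stored.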
The HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput category contains token enumerators that enumerate all the AudioInput devices present on the computer. There is a token enumerator for each class of AudioInput device. By default, SAPI 5 will have only a single token enumerator for the MMSys technology. This token enumerator will create an audio token for each AudioInput device (microphone) on the computer and return it when an application or engine calls SpEnumTokens or IEnumSpObjectTokens.
The AudioInput category does not have standard attributes, and if multiple technologies are installed, an application needs to inspect each token to find the most suitable one.
Any additional AudioInput token enumerators must meet the requirements for a token enumerator laid out in Table 2. An example of the AudioInput category is at:
HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\
RegKey | ValueName | Comments
TokenEnums\MMSys | | This is the category.
 | DefaultTokenID | This Default can point to a token enumerator or token.
AudioInput1 | | This is the key for the audio input device.
AudioInput1\Attributes | | Attributes for the Token are under this key.
 | Technology | This is the technology, for example, "MMSys"
 | Vendor | This is the vendor name.
The HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioOutput category contains token enumerators that enumerate all the audio output devices present on the computer. As in the AudioInput category, there is a token enumerator for each audio output technology. By default, there will be a single token enumerator for MMSys. Under this, there will be entries for each audio output device installed on the computer.
RegKey | ValueName | Comments
TokenEnums\MMSys | | This is the category.
 | DefaultTokenID | This Default can point to a token enumerator or token.
AudioOutput1 | | This is the key for the audio output device.
AudioOutput1\Attributes | | Attributes for the Token are under this key.
 | NoSerializeAccess | Optional: Override serialization of multiple voices.
 | Technology | This is the technology, for example, "MMSys"
 | Vendor | This is the vendor name.
The AppLexicons category stores all the application lexicons SAPI knows about. As in other categories, the lexicons are located under HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Applexicons\Tokens. When called, the SpLexicon interface enumerates all the applexicons. Applexicons have no attributes, and therefore, there is no way to load only specific Applexicons. These keys will be created by applications to make their own lexicons available through SAPI.
The ISpPhoneConverter interface enables the application to convert from the SAPI character phoneset to the ID phoneset. Phone Converter keys should go under HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\PhoneConverters.
SAPI has a single phoneconverter for each language. An engine can query for the phoneconverter whose language attribute matches the application’s language of interest.
SAPI stores the user lexicon keys under the HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\UserLexicon key. UserLexicon is not a category by itself. Neither applications nor engines interact with the UserLexicon token. It is mentioned here only for the sake of completeness of the registry documentation.
Table 1 Parts of a Token in the Registry
Table 2 Parts of the AudioInput token enumerator
Table 3 Common Helper Functions
Table 4 Engine Developer Helper Functions
Table 6 Voices installed on a computer
Table 7 Scoring of tokens matching optional criteria
Table 8 Voice Registry and Attributes
Table 9 Voices - User Registry Settings
Table 10 Sample Entries of a Recognizer token