You are reading help file online using chmlib.com


Microsoft Speech SDK

SAPI 5.1

Text Grammar Format Overview

The Extensible Markup Language (XML) format inside a GRAMMAR XML element (block) is an "expert-only-readable" declaration of a grammar that a speech application uses to accomplish the following:

A GRAMMAR XML element (block) appears in an XML source code file. The XML source is compiled into a binary grammar format, which is the format SAPI uses at application run time.


The following section covers:

Extensible Markup Language

The textual grammar format is an application of XML. Every XML element consists of a start tag (<SOME_TAG>) and an end tag (</SOME_TAG>) with a case-insensitive tag name, and contents between these tags. If the element is empty, the start tag and the end tag can be combined into a single tag, for example <SOME_TAG/>. For more information on the use of XML grammars, please see the Grammar XML Schema section. Additionally, more information about XML and the XML specification is available at: http://www.w3.org/TR/REC-xml.

For example, all grammars contain the opening tag <GRAMMAR> as follows:

<GRAMMAR>
... grammar content
</GRAMMAR>
Note that the contents of the grammar are contained between an opening tag and a trailing, closing tag.
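The equivalence between an empty start/end pair and a single empty-element tag can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree purely as a generic XML parser; it is not the SAPI grammar compiler, and the helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same empty element, written as a start/end pair and as a single tag.
paired = ET.fromstring("<SOME_TAG></SOME_TAG>")
collapsed = ET.fromstring("<SOME_TAG/>")

def describe(element):
    """Return (tag name, text content, number of subelements)."""
    return (element.tag, element.text or "", len(element))

# Both forms describe an element with no text and no subelements.
print(describe(paired))
print(describe(collapsed))
```

Both calls report the same tag name, empty text, and zero subelements, which is why the two spellings are interchangeable for empty elements.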


Attributes

Attributes of an XML element appear inside the start tag. Each attribute is in the form of a name followed by an equal sign followed by a string which must be surrounded by either single or double quotation marks. An attribute of a given name may only appear once in a start tag.

In summary, the literal string cannot contain <. It also cannot contain ' if the string is surrounded by single quotation marks, or " if the string is surrounded by double quotation marks. Furthermore, ampersand (&) characters may appear only as part of an entity reference such as &amp; or &gt;. When a literal string is parsed, the resulting replacement text resolves every entity reference, such as &gt;, into its corresponding text, such as >. In this specification, only the resulting replacement text needs to be defined for attribute value strings. More information about XML and the XML specification is available at: http://www.w3.org/TR/REC-xml.

For example, the grammar author can specify the language identifier (LANGID) of the grammar as follows.

<GRAMMAR LANGID="409">
... grammar content
</GRAMMAR>
The grammar element (<GRAMMAR>) has an attribute called LANGID, which must be a numeric value. The grammar author specifies the language attribute by placing it inside the angle brackets of the opening tag and enclosing the attribute value (e.g. 409) in quotation marks.
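The attribute rules above can be tried out with a generic XML parser. This is a minimal sketch using Python's standard xml.etree.ElementTree, not the SAPI compiler. The NOTE attribute and the helper parses are hypothetical, introduced only for illustration. Reading 409 as hexadecimal (0x409, the Windows language identifier for U.S. English) is an assumption here, since the text above only states that the value is numeric.

```python
import xml.etree.ElementTree as ET

# NOTE is a hypothetical attribute used to show entity references and quoting.
source = """<GRAMMAR LANGID="409" NOTE='a &gt; b &amp; "c"'/>"""
grammar = ET.fromstring(source)

# Attribute values come back as replacement text: entity references are resolved.
langid_text = grammar.get("LANGID")
note_text = grammar.get("NOTE")

# Assumption: the LANGID digits are interpreted as a hexadecimal number.
langid_value = int(langid_text, 16)

def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# An attribute name may appear only once in a start tag.
duplicate_ok = parses('<GRAMMAR LANGID="409" LANGID="411"/>')
# A literal < is not allowed inside an attribute value.
raw_less_than_ok = parses('<GRAMMAR NOTE="a < b"/>')

print(langid_text, langid_value, note_text, duplicate_ok, raw_less_than_ok)
```

A single-quoted value may contain double quotation marks, as the NOTE value shows, while the duplicate attribute and the unescaped < are both rejected as not well-formed.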


Contents

The contents of an element consist of text or subelements. Formal definitions of valid contents in this specification are provided as regular and "multi-set" expressions. The pseudo-element name "Text" indicates untagged text. With these definitions, the XML specification defines the exact file syntax details.

For example, the grammar author can place either text or sub-elements inside a phrase tag as follows.

<PHRASE>
   hello
</PHRASE>

<PHRASE>
   <OPT>world</OPT>
</PHRASE>

The grammar author should review the SAPI 5 Grammar XML Schema to determine the type of content supported by each tag (e.g. text and sub-elements, only text, only sub-elements, etc.).
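To make the distinction between untagged text and sub-elements concrete, the two PHRASE examples above can be inspected with a generic XML parser. This is a sketch using Python's standard xml.etree.ElementTree; the helper contents is hypothetical, and it reports only what the XML contains, not what the SAPI schema permits.

```python
import xml.etree.ElementTree as ET

text_only = ET.fromstring("<PHRASE>\n   hello\n</PHRASE>")
with_child = ET.fromstring("<PHRASE>\n   <OPT>world</OPT>\n</PHRASE>")

def contents(element):
    """Return (untagged text, list of subelement tag names) for one element."""
    # Untagged text can sit before the first child and after each child.
    pieces = [element.text or ""] + [child.tail or "" for child in element]
    # Collapse layout whitespace so only the words remain.
    text = " ".join(" ".join(pieces).split())
    return text, [child.tag for child in element]

print(contents(text_only))
print(contents(with_child))
```

The first PHRASE yields only text and no sub-elements; the second yields no untagged text and a single OPT sub-element.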


Comments

The SAPI 5 XML parser treats HTML comment tags as unknown XML tag elements. The engine should provide support for comments and other unknown XML elements.

It is recommended that grammar authors place comments in their XML files (e.g. mygrammar.xml), similar to commenting source code, since the XML parser will safely parse the comments without affecting the grammar itself. Similarly, there is no increase in the size of the binary form of the grammar (e.g. mygrammar.cfg), since the SAPI 5 grammar compiler strips out the comments.

An example of a comment in an XML grammar is as follows.

   <!-- the 'travel' rule is the main voice command for our app, so it is active by default -->
   <RULE ID="RID_Travel" TOPLEVEL="ACTIVE">
      <PHRASE>travel from</PHRASE>

      <!-- include location grammar component, so we can change the location list at runtime -->
      <RULEREF REFID="RID_Location" PROPID="PID_FromDestination"/>
      <PHRASE>to</PHRASE>

      <!-- include location grammar component, so we can change the location list at runtime -->
      <RULEREF REFID="RID_Location" PROPID="PID_ToDestination"/>
   </RULE>

Note that the comment blocks always begin with <!-- and end with -->.
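The claim that comments do not affect the grammar can be illustrated with a generic XML parser that, like the compiler described above, discards comments. This sketch uses Python's standard xml.etree.ElementTree, whose default parser drops comments; it is not the SAPI grammar compiler, and the helper outline is hypothetical.

```python
import xml.etree.ElementTree as ET

commented = """
<RULE ID="RID_Travel" TOPLEVEL="ACTIVE">
   <!-- main voice command, active by default -->
   <PHRASE>travel from</PHRASE>
   <!-- location list can change at run time -->
   <RULEREF REFID="RID_Location" PROPID="PID_FromDestination"/>
</RULE>
""".strip()

plain = """
<RULE ID="RID_Travel" TOPLEVEL="ACTIVE">
   <PHRASE>travel from</PHRASE>
   <RULEREF REFID="RID_Location" PROPID="PID_FromDestination"/>
</RULE>
""".strip()

def outline(element):
    """Return a nested (tag, sorted attributes, children) summary of an element."""
    return (
        element.tag,
        sorted(element.attrib.items()),
        [outline(child) for child in element],
    )

# The default parser discards comments, so both sources give the same outline.
same_structure = outline(ET.fromstring(commented)) == outline(ET.fromstring(plain))
print(same_structure)
```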


How SAPI utilizes XML information

SAPI uses XML content in the following two ways.

  1. The SAPI context-free grammar compiler compiles the XML grammar into a binary grammar format. The compiled binary grammar is loaded into the SAPI run-time environment from a file, memory, or object (.DLL) resource.
  2. The speech recognition (SR) engine queries the run-time environment for available grammar information.


Frequently used definitions

Text: Untagged text declaring a sequence of words that the recognition engine will recognize. Tentatively, this text is only the not-necessarily-phonetic representation of words, used for reading words whose pronunciation is unknown to the user (for example, for Japanese, kana, not kanji); this form will be called the spelling form. In further definitions in this section, Text will be referenced as though it were a pseudo-element.


Non-empty concatenated recognition contents

The contents of a number of XML elements in this specification, such as the P element, contain a sequence of grammar constructs that are concatenated together (one grammar construct after another). These grammar elements must be recognized in order for the contents defined to be recognized.

The contents must be one of the following (and not both):

Text and any number of L, P, O, or RULEREF elements in any order with at least one L, P, or RULEREF.

For more information on the use of XML grammars, please see the Grammar XML Schema section.
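The content rule listed above can be expressed as a small check. This is a minimal sketch in Python, assuming the element names are spelled exactly as in the rule (L, P, O, RULEREF); the function name is_valid_concatenation is hypothetical, and the check covers only the alternative shown in this section, so contents that fail it may still be valid under another alternative.

```python
import xml.etree.ElementTree as ET

# Subelement names taken from the rule above.
ALLOWED = {"L", "P", "O", "RULEREF"}
REQUIRED = {"L", "P", "RULEREF"}

def is_valid_concatenation(element):
    """Check that subelements are only L, P, O, or RULEREF, in any order,
    with at least one L, P, or RULEREF. Untagged text is always permitted."""
    tags = [child.tag for child in element]
    only_allowed = all(tag in ALLOWED for tag in tags)
    has_required = any(tag in REQUIRED for tag in tags)
    return only_allowed and has_required

good = ET.fromstring("<P>turn <O>the</O> <P>lights</P> on</P>")
only_optional = ET.fromstring("<P><O>please</O></P>")
unknown_child = ET.fromstring("<P>go to <RULE/></P>")

print(is_valid_concatenation(good))
print(is_valid_concatenation(only_optional))
print(is_valid_concatenation(unknown_child))
```

The second example fails because an O element alone does not satisfy the "at least one L, P, or RULEREF" condition; the third fails because RULE is not among the listed subelements.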



