
Microsoft Speech SDK

SAPI 5.1

Text Normalization

You can perform simple text normalization for voice training by editing the text buffer provided to the engine. Text normalization is the process of changing the input buffer so that the engine can use its preferred word units. The engine's word units determine how words are expected to be pronounced and how they appear in the voice training wizard.

The text provided to the engine is called an article. An article consists of multiple phrases, each followed by a new-line character. The voice training wizard displays one phrase at a time.

<article> ::= { <phrase> "\n" }
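The article layout can be sketched in Python, assuming the buffer is handled as an ordinary string. build_article() and split_article() are hypothetical helpers, not part of SAPI. Because every phrase is followed by "\n", a well-formed article ends with a new-line, and splitting it on "\n" leaves an empty trailing piece that has to be dropped.

```python
def build_article(phrases):
    """Join phrases into an article: every phrase is followed by a new-line."""
    for phrase in phrases:
        if "\n" in phrase:
            raise ValueError("a phrase must not contain a new-line character")
    return "".join(phrase + "\n" for phrase in phrases)


def split_article(article):
    """Recover the phrases, one per line, as the training wizard shows them."""
    lines = article.split("\n")
    # The final "\n" leaves an empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

For example, build_article(["Hello world .", "Good morning ."]) returns "Hello world .\nGood morning .\n", and split_article() turns that string back into the original two phrases.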

A phrase is a sequence of word units separated by white space characters. In this context, white space characters are all characters for which the C run-time function iswspace() returns a nonzero value.

<phrase> ::= { <word> | <literal_symbols> | <numeric_expression> }
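In Python, str.split() with no argument closely approximates this rule: it splits on runs of white space and ignores leading and trailing white space. It tests characters with str.isspace(), which follows Unicode rather than the C runtime's current locale, so its set of white space characters may differ slightly from what iswspace() reports. split_phrase() is a hypothetical helper.

```python
def split_phrase(phrase):
    """Split one phrase into word units on runs of white space."""
    # A new-line ends a phrase, so it must not appear inside one.
    if "\n" in phrase:
        raise ValueError("a phrase must not contain a new-line character")
    # Approximates iswspace(): str.split() uses str.isspace() on each character.
    return phrase.split()
```

Multiple spaces and tabs collapse into a single separator, so split_phrase("It costs  $ 5\t.") yields the five units It, costs, $, 5 and ".".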

Word units


The following symbols are recognized as word units. Separate them from adjacent text with white space; when the phrase is presented to the user, they "snuggle" up to the neighboring words as appropriate.

<literal_symbols> ::=
"!\exclamation-point" | "\"\end-quote" | "\"\quote" | "#\pound-sign" | "$\dollar" | "%\percent" | "&\ampersand" | "'\end-quote" | "'\quote" | "(\paren" | ")\close-paren" | "*\asterisk" | "+\plus" | ",\comma" | "--\double-dash" | "-\hyphen" | "...\ellipsis" | ".\dot" | ".\period" | "/\slash" | ":\colon" | ";\semicolon" | "<\less-than" | "=\equals" | ">\greater-than" | "?\question-mark" | "@\at-sign" | "[\bracket" | "\\back-slash" | "]\close-bracket" | "^\circumflex" | "_\underscore" | "`\back-quote" | "{\left-brace" | "|\vertical-bar" | "}\right-brace" | "~\tilde"
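A minimal Python sketch of the symbol table and of one possible display rule. It assumes each list entry is the displayed symbol, a backslash, and a spoken name. The ATTACH_LEFT and ATTACH_RIGHT sets and the display() helper are hypothetical and only approximate the "snuggle" behavior, whose exact rules are not given here. Quote marks stay unattached because the buffer alone does not say whether a quote opens or closes.

```python
# Entries copied from the <literal_symbols> list, read (as an assumption)
# as "symbol\spoken-name".
_LITERAL_SPEC = r"""
!\exclamation-point "\end-quote "\quote #\pound-sign $\dollar %\percent
&\ampersand '\end-quote '\quote (\paren )\close-paren *\asterisk +\plus
,\comma --\double-dash -\hyphen ...\ellipsis .\dot .\period /\slash
:\colon ;\semicolon <\less-than =\equals >\greater-than ?\question-mark
@\at-sign [\bracket \\back-slash ]\close-bracket ^\circumflex
_\underscore `\back-quote {\left-brace |\vertical-bar }\right-brace ~\tilde
"""

# Map each symbol to its spoken name(s); ".", '"' and "'" have two each.
LITERAL_SYMBOLS = {}
for _entry in _LITERAL_SPEC.split():
    # Split at the LAST backslash so that "\\back-slash" yields the symbol "\".
    _symbol, _name = _entry.rsplit("\\", 1)
    LITERAL_SYMBOLS.setdefault(_symbol, []).append(_name)

# Hypothetical display rules: symbols that hug the previous / next token.
ATTACH_LEFT = {",", ".", "...", "!", "?", ";", ":", "%", ")", "]", "}"}
ATTACH_RIGHT = {"$", "#", "(", "[", "{"}


def is_literal_symbol(token):
    return token in LITERAL_SYMBOLS


def display(tokens):
    """Join word units for display, snuggling punctuation to neighboring words."""
    out = ""
    glue_next = False
    for token in tokens:
        # Insert a space unless this token hugs the left or the previous hugs right.
        if out and not glue_next and token not in ATTACH_LEFT:
            out += " "
        out += token
        glue_next = token in ATTACH_RIGHT
    return out
```

With these rules, display("He paid $ 5 ( roughly ) , then left .".split()) returns "He paid $5 (roughly), then left.", while the buffer itself keeps every symbol separated by white space.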


Numbers can take the following forms:

<digit> ::= "0"-"9"

<non_zero_digit> ::= "1"-"9"

<numeric_expression> ::= <integer_expression> | <integer_expression> <cardinal_suffix> | <floating_expression>

<integer_expression> ::= ["-"] <non_zero_digit>[<digit>[<digit>]] { [","] <digit><digit><digit> }

<floating_expression> ::= <integer_expression> "." <digit> [{ <digit> }]

<cardinal_suffix> ::= "st" | "nd" | "rd" | "th"
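The numeric grammar maps directly onto a regular expression. A minimal Python sketch, assuming tokens have already been split on white space; is_numeric_expression() is a hypothetical helper. As written, the grammar accepts neither 0 nor numbers with a leading zero such as 0.5, and it does not check that a suffix agrees with its number (2th is accepted); the sketch reproduces those properties rather than correcting them.

```python
import re

# <integer_expression>: optional minus, 1-3 leading digits (first non-zero),
# then groups of exactly three digits, each optionally preceded by a comma.
_INTEGER = r"-?[1-9][0-9]{0,2}(?:,?[0-9]{3})*"

# <numeric_expression>: an integer, optionally followed by EITHER a suffix
# ("st", "nd", "rd", "th") OR a decimal part ("." and one or more digits).
NUMERIC_EXPRESSION = re.compile(_INTEGER + r"(?:st|nd|rd|th|\.[0-9]+)?")


def is_numeric_expression(token):
    return NUMERIC_EXPRESSION.fullmatch(token) is not None
```

Because each comma is optional, both 1,000,000 and 1000000 match; 1,00 does not, since a comma must be followed by exactly three digits.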


The rest of the buffer is treated as a collection of words:

<alpha_char> ::= "a"-"z" | "A"-"Z"

<word_char> ::= <alpha_char> | "-" | "_" | "0"-"9"

<word> ::= <word0> | <word1> | <word2> | <word3>

<word0> ::= <alpha_char> [{<word_char>}]

<word1> ::= <alpha_char> [{<word_char>}] ( "s'" | "in'" )

<word2> ::= <alpha_char> [{<word_char>}] "." ( <word0> | <word2> )

<word3> ::= <abbreviation_string> "."

<abbreviation_string> ::=
"al" | "apr" | "assn" | "assoc" | "atty" | "aug" | "bef" | "bldg" | "ch" | "chg" | "co" | "com" | "cont" | "corp" | "dec" | "def" | "det" | "dev" | "div" | "doc" | "etc" | "ext" | "feb" | "gov" | "in" | "ins" | "int" | "intl" | "jan" | "jr" | "jul" | "jun" | "mar" | "messrs" | "mos" | "mph" | "mr" | "mrs" | "ms" | "mt" | "no" | "nov" | "oct" | "oz" | "par" | "pct" | "pfc" | "pp" | "pres" | "prov" | "pt" | "qtr" | "ref" | "reg" | "rep" | "rev" | "sdn" | "sec" | "sep" | "sq" | "sr" | "tech" | "vol" | "wm"
