You can perform simple text normalization for voice training using the text buffer provided to the engine. Text normalization is the process of rewriting the input buffer so that the engine uses its preferred word units. The engine's word units affect how words are expected to be pronounced and how they appear in the voice training wizard.
The text provided to the engine is called an article. An article is composed of multiple phrases, each separated by a new-line character. The voice training wizard displays one phrase at a time.
<article> ::= { <phrase> "\n" }
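As an illustration of the article rule above, the following minimal Python sketch splits a buffer into phrases on the new-line character. The function name split_article is hypothetical and not part of the engine interface, and dropping blank lines is an assumption made here.

```python
def split_article(article: str) -> list:
    """Split an article buffer into phrases, one per new-line-terminated line."""
    # Each phrase ends with "\n". Blank lines are dropped here, which is an
    # assumption of this sketch rather than documented engine behavior.
    return [line for line in article.split("\n") if line.strip()]


article = "Hello there\nThis is the second phrase\n"
phrases = split_article(article)
for phrase in phrases:
    print(phrase)
```

The wizard would then show each element of phrases one at a time.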
A phrase is a sequence of word units, separated by white space characters. In this context, white space characters are all characters for which the C run time function
<phrase> ::= { <word> | <literal_symbols> | <numeric_expression> }
The following symbols are recognized as units. They should be separated from adjacent text with white space; they will "snuggle" to the words appropriately when presented to the user.
<literal_symbols> ::=
"!\exclamation-point" | "\"\end-quote" | "\"\quote" | "#\pound-sign" | "$\dollar" | "%\percent" | "&\ampersand" | "'\end-quote" | "'\quote" | "(\paren" | ")\close-paren" | "*\asterisk" | "+\plus" | ",\comma" | "--\double-dash" | "-\hyphen" | "...\ellipsis" | ".\dot" | ".\period" | "/\slash" | ":\colon" | ";\semicolon" | "<\less-than" | "=\equals" | ">\greater-than" | "?\question-mark" | "@\at-sign" | "[\bracket" | "\\back-slash" | "]\close-bracket" | "^\circumflex" | "_\underscore" | "`\back-quote" | "{\left-brace" | "|\vertical-bar" | "}\right-brace" | "~\tilde"
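One way to prepare a buffer is to rewrite ordinary punctuation as the symbol units listed above, each set off by white space. The Python sketch below does this for a small subset of the table. The mapping SYMBOL_UNITS and the function mark_symbols are hypothetical helpers, and the sketch ignores ambiguous cases such as quote versus end-quote or dot versus period.

```python
# Hypothetical subset of the symbol table above; the unit names come from the grammar.
SYMBOL_UNITS = {
    ",": ",\\comma",
    "!": "!\\exclamation-point",
    "?": "?\\question-mark",
    ";": ";\\semicolon",
    ":": ":\\colon",
}


def mark_symbols(phrase: str) -> str:
    """Replace known punctuation with its symbol unit, separated by white space."""
    pieces = []
    for ch in phrase:
        if ch in SYMBOL_UNITS:
            # Pad with spaces so the unit stands apart from adjacent words.
            pieces.append(" " + SYMBOL_UNITS[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse the runs of spaces introduced by the padding.
    return " ".join("".join(pieces).split())


print(mark_symbols("Well, is it done?"))
```

Because every character in the table is replaced unconditionally, a token such as 10:30 would also be split; a fuller normalizer would need to check the context first.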
Numbers can take the following forms:
<digit> ::= "0"-"9"
<non_zero_digit> ::= "1"-"9"
<numeric_expression> ::= <integer_expression> | <integer_expression> <cardinal_suffix> | <floating_expression>
<integer_expression> ::= ["-"] <non_zero_digit>[<digit>[<digit>]] { [","] <digit><digit><digit> }
<floating_expression> ::= <integer_expression> "." <digit> [{ <digit> }]
<cardinal_suffix> ::= "st" | "nd" | "rd" | "th"
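The numeric rules above can be approximated with a regular expression, which is handy for checking training text before handing it to the engine. This is a sketch in Python; the names INTEGER, NUMERIC and is_numeric_expression are introduced here for illustration and do not belong to the engine.

```python
import re

# <integer_expression>: optional "-", one non-zero digit, up to two more digits,
# then any number of three-digit groups with an optional "," before each.
INTEGER = r"-?[1-9][0-9]{0,2}(?:,?[0-9]{3})*"

# <numeric_expression>: an integer, optionally followed by a suffix
# ("st", "nd", "rd", "th") or by "." and one or more digits.
NUMERIC = re.compile("(?:" + INTEGER + r")(?:st|nd|rd|th|\.[0-9]+)?")


def is_numeric_expression(token: str) -> bool:
    """Return True if the whole token matches the numeric grammar as written."""
    return NUMERIC.fullmatch(token) is not None


for token in ["1,234", "1234", "21st", "3.14", "-42", "0", "12,34"]:
    print(token, is_numeric_expression(token))
```

Following the grammar literally, a token that starts with "0" is rejected because <integer_expression> begins with <non_zero_digit>.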
The remainder of the buffer will be treated as a collection of words:
<alpha_char> ::= "a"-"z"| "A"-"Z"
<word_char> ::= <alpha_char> | "-" | "_" | "0"-"9"
<word> ::= <word0> | <word1> | <word2> | <word3>
<word0> ::= <alpha_char> [{<word_char>}]
<word1> ::= <alpha_char> [{<word_char>}] "s'"|"in'"
<word2> ::= <alpha_char> [{<word_char>}] "." <word2>
<word3> ::= <abbreviation_string> "."
<abbreviation_string> ::=
"al" | "apr" | "assn" | "assoc" | "atty" | "aug" | "bef" | "bldg" | "ch" | "chg" | "co" | "com" | "cont" | "corp" | "dec" | "def" | "det" | "dev" | "div" | "doc" | "etc" | "ext" | "feb" | "gov" | "in" | "ins" | "int" | "intl" | "jan" | "jr" | "jul" | "jun" | "mar" | "messrs" | "mos" | "mph" | "mr" | "mrs" | "ms" | "mt" | "no" | "nov" | "oct" | "oz" | "par" | "pct" | "pfc" | "pp" | "pres" | "prov" | "pt" | "qtr" | "ref" | "reg" | "rep" | "rev" | "sdn" | "sec" | "sep" | "sq" | "sr" | "tech" | "vol" | "wm"
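The word rules above can be sketched as a small classifier in Python. Several points are assumptions of this sketch and are not stated in the grammar: the abbreviation set is only a subset of the list, abbreviations are matched without regard to case, "s'" and "in'" are both read as suffixes in the <word1> rule, and the dotted <word2> form is left out. The name classify_word is hypothetical.

```python
import re
from typing import Optional

# Subset of <abbreviation_string>, for brevity.
ABBREVIATIONS = {"mr", "mrs", "ms", "jr", "sr", "etc", "corp", "jan", "feb"}

# <word0>: an alphabetic character followed by any number of word characters.
WORD0 = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")
# <word1>: the same, ending in "s'" or "in'" (assumed reading of the rule).
WORD1 = re.compile(r"[A-Za-z][A-Za-z0-9_-]*(?:s'|in')")


def classify_word(token: str) -> Optional[str]:
    """Return the name of the matching word rule, or None if no rule matches."""
    # <word3>: a known abbreviation followed by "."; case-insensitive by assumption.
    if token.endswith(".") and token[:-1].lower() in ABBREVIATIONS:
        return "word3"
    if WORD1.fullmatch(token):
        return "word1"
    if WORD0.fullmatch(token):
        return "word0"
    return None


for token in ["Mr.", "dogs'", "runnin'", "well-known", "3d"]:
    print(token, classify_word(token))
```

A token that matches none of the rules, such as one starting with a digit, would be a candidate for the numeric or symbol rules instead.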