(Page 6 of 7 in this chapter)


2.5 Filtering and Preprocessing Text Input

NaturalText text-to-speech converters speak any text you pass to them. However, NaturalText provides two sets of functions for processing text input that is improperly punctuated, imperfectly formatted, or that includes non-standard abbreviations or acronyms. These functions filter and interpret text input so that converters speak it correctly. The NaturalText API provides functions for filtering text with User Exception Dictionaries (Section 2.5.1) and for preprocessing e-mail text (Section 2.5.2).

2.5.1 Filtering Text with User Exception Dictionaries

User Exception Dictionaries (UEDs) replace difficult-to-pronounce words and text strings with specific pronunciations. UED files are text files that establish specific substitution strings for targeted words. Calls to ttsLoadUED read the contents of UED files into host memory. Applications begin UED filtering by linking a loaded UED to a CT Access context.

Note: Once a UED is loaded and linked, UED filtering takes place automatically. Filtering runs on the host machine rather than in the DSP firmware.

UED files are simple text files consisting of two columns of text separated by white space. The first column identifies words to be replaced. The second column contains appropriate replacement strings. Replacement strings can consist of special phonetic symbols that the converter pronounces in a specific way, or of simplified spellings that closely approximate the desired pronunciation.

The following limits apply to UED files:

Once a UED file has been loaded and linked, all input text for a specific text-to-speech context is filtered according to the entries in the file. When the filter encounters a word specified in the first column of the UED file, it replaces that word with the corresponding replacement string from the second column. The example below shows two entries from a typical UED file:

    Hrbec      Herbeck
    Tsongas    \033[1ISoN1$GuS\033[0I

The first entry provides a simpler spelling for the specified word. The second entry uses phonemes to establish an exact pronunciation for the specified word. Phonemes are character sequences used to indicate phonetic spellings. See the NaturalText Text-to-Speech Reference Manual for more information about using phonemes in input text.
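To make the substitution behavior concrete, the following C sketch mimics what a UED filter does to input text. It is not NaturalText code: the names ued_entry, ued_lookup, and ued_filter are hypothetical, and the matching rule (exact, case-sensitive matches on whitespace-delimited words) is an assumption made for illustration. The actual matching rules are defined by the NaturalText library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one UED line: target word, replacement. */
struct ued_entry {
    const char *word;
    const char *replacement;
};

/* Return the replacement for the word tok[0..len), or NULL if none matches. */
static const char *ued_lookup(const struct ued_entry *ued, size_t count,
                              const char *tok, size_t len)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (strlen(ued[i].word) == len && strncmp(ued[i].word, tok, len) == 0)
            return ued[i].replacement;
    }
    return NULL;
}

/* Copy in to out, replacing whitespace-delimited words found in the
   dictionary. Returns 0 on success, -1 if out is too small.
   Assumption: a word with attached punctuation ("Hrbec,") does not match. */
int ued_filter(const struct ued_entry *ued, size_t count,
               const char *in, char *out, size_t out_size)
{
    static const char ws[] = " \t\r\n";
    size_t used = 0;

    assert(in != NULL && out != NULL && out_size > 0);

    while (*in != '\0') {
        const char *src = in;            /* text to copy for this run     */
        size_t in_len = strspn(in, ws);  /* length of a whitespace run    */
        size_t src_len;

        if (in_len == 0) {               /* not whitespace: it is a word  */
            const char *rep;
            in_len = strcspn(in, ws);
            rep = ued_lookup(ued, count, in, in_len);
            if (rep != NULL)
                src = rep;
            src_len = (rep != NULL) ? strlen(rep) : in_len;
        } else {
            src_len = in_len;            /* whitespace is copied verbatim */
        }

        if (used + src_len + 1 > out_size)
            return -1;                   /* no room for run + terminator  */
        memcpy(out + used, src, src_len);
        used += src_len;
        in += in_len;
    }
    out[used] = '\0';
    return 0;
}
```

With a dictionary holding the first entry shown above, passing the text "Kent Hrbec bats next" through ued_filter yields "Kent Herbeck bats next", which is the form the converter would then speak.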

Follow the steps below to initiate and stop UED filtering for a particular context of text-to-speech:

  1. Load a UED file into host memory by calling ttsLoadUED.

    Note: ttsLoadUED returns a UED handle. This handle is global to the process and can be linked to any CT Access context.

  2. Associate the UED with a CT Access context by calling ttsLinkUED. A dictionary can be linked to multiple CT Access contexts, but each context can be linked to only one dictionary.

  3. Call ttsSpeak. Once the CT Access context and UED are linked, any text sent to the context is automatically filtered by the UED.

  4. Stop UED processing for a specific text-to-speech context by calling ttsUnlinkUED. ttsUnlinkUED unlinks a particular UED from a specific CT Access context. Any other contexts linked to the same UED remain unaffected.

  5. Remove a UED from memory by calling ttsUnloadUED. A UED cannot be unloaded while it is still linked to any CT Access context.
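The linking rules in the steps above (one dictionary per context, many contexts per dictionary, no unloading while linked) can be pictured with a small bookkeeping model. The C sketch below is only a model of those rules, not the NaturalText implementation: model_ued, model_ctx, model_link, model_unlink, and model_unload are hypothetical stand-ins, and the choice to reject a second link on an already linked context is an assumption. The real ttsLoadUED, ttsLinkUED, ttsUnlinkUED, and ttsUnloadUED signatures and return codes are documented in the NaturalText reference manual.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a loaded UED and a CT Access context. */
struct model_ued { int loaded; int link_count; };
struct model_ctx { struct model_ued *ued; };

/* Link a context to a loaded dictionary.
   Fails (-1) if the dictionary is not loaded or the context already has one. */
int model_link(struct model_ctx *ctx, struct model_ued *ued)
{
    assert(ctx != NULL && ued != NULL);
    if (!ued->loaded || ctx->ued != NULL)
        return -1;
    ctx->ued = ued;
    ued->link_count++;
    return 0;
}

/* Unlink a context; other contexts linked to the same dictionary
   are unaffected. Fails (-1) if the context has no dictionary. */
int model_unlink(struct model_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->ued == NULL)
        return -1;
    ctx->ued->link_count--;
    ctx->ued = NULL;
    return 0;
}

/* Unload a dictionary. Fails (-1) while any context is still linked. */
int model_unload(struct model_ued *ued)
{
    assert(ued != NULL);
    if (!ued->loaded || ued->link_count > 0)
        return -1;
    ued->loaded = 0;
    return 0;
}
```

In this model, an application that shares one dictionary between two contexts must unlink both before the unload succeeds, which mirrors the ordering of steps 4 and 5.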

2.5.2 Using the E-mail Preprocessor

The E-mail Preprocessor filters and interprets English electronic mail message input before NaturalText speaks it. E-mail preprocessing accomplishes the following:

The NaturalText API includes two functions for preprocessing English text on the host system. The functions ttsPreprocess and ttsPreprocessFile preprocess English e-mail text so it can be passed to ttsSpeak.

ttsPreprocess operates on text buffers, while ttsPreprocessFile operates on files. You can retain or discard message headers by specifying either TTS_METHOD_EMAIL_KEEP_HEADER or TTS_METHOD_EMAIL_STRIP_HEADER as the processing method.

The example code below illustrates a way of using ttsPreprocess to preprocess text from buffers:

    char      *buffer, *out_buffer;        /* Allocated by the caller          */
    unsigned  outputbuf_size, result_size;
    CTAHD     ctahd;
    DWORD     ret_code;

    ret_code = ttsPreprocess(
        buffer,                            /* Pointer to input text buffer     */
        TTS_METHOD_EMAIL_STRIP_HEADER,     /* Processing method                */
        &result_size,                      /* Actual length of the output      */
                                           /* text after processing            */
        out_buffer,                        /* Pointer to output text buffer    */
        outputbuf_size );                  /* Size of output buffer            */

    /* Speak the processed text only if preprocessing succeeded. */
    if( ret_code == SUCCESS )
        ttsSpeak( ctahd, out_buffer );

Note: ttsPreprocess and ttsPreprocessFile run on the host machine and only apply to English text input.
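As a rough illustration of what discarding a message header involves, the C sketch below skips everything up to the first blank line of a message, which is where Internet mail conventionally separates the header from the body. This is not how ttsPreprocess is implemented: skip_mail_header is a hypothetical helper, and returning the message unchanged when no blank line is found is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the body of a mail message,
   i.e. the text after the first blank line. Both LF and CRLF line
   endings are recognized. If no blank line exists, the whole message
   is treated as body (an assumption made for this sketch). */
const char *skip_mail_header(const char *msg)
{
    const char *crlf;
    const char *lf;

    assert(msg != NULL);

    crlf = strstr(msg, "\r\n\r\n");   /* blank line with CRLF endings */
    lf   = strstr(msg, "\n\n");       /* blank line with LF endings   */

    /* Use whichever separator occurs first in the message. */
    if (crlf != NULL && (lf == NULL || crlf < lf))
        return crlf + 4;
    if (lf != NULL)
        return lf + 2;
    return msg;
}
```

A header-stripping step of this kind keeps the converter from reading routing and address lines aloud, so only the message text reaches the speech stage.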





tech_support@nmss.com
Copyright © 1997, Natural MicroSystems, Inc. All rights reserved.