To modify normal speech sounds, use the control sequences explained in this section. These control sequences perform the following functions:
Generate non-speech sounds. Non-speech sounds included here are:
Signal
Silence
Tone
Modify the following speech attributes:
Attenuation
Baseline Pitch
Emphasis or Stress
Fast Scanning
Prosody
Speech Rate
Voice
High Frequency Energy Boost
Select either of the following data input characters:
Phoneme Characters
Text Characters
5.2 Speech Attribute Selection
Speech attributes are modified by using the control sequences shown below.
Attenuation Select
Baseline Pitch Select
Emphatic Stress Character
Deemphasis Stress Character
Fast Scan
Prosody Select
Speech Rate Select
Indexed Speaking Rate Select
Voice Select
High Frequency Energy Boost Select
5.3 Attenuation Select
Sets the audio output attenuation for speech produced after this sequence.
Format:
<ESC> [ <p1> a
Parameters:
<p1> Is the attenuation factor in decibels (dB). Use a decimal number between 0 and 15. A value greater than 15 causes the attenuation factor to be set at 15. A value of zero indicates the maximum output volume level.
Each step above zero causes an attenuation of 1 dB from the maximum output level. The attenuation value also affects the volume of tones produced with the Generate Tone sequence.
Zero is the default value. The selected attenuation remains in effect until changed by another Attenuation Select control sequence or until a Reset sequence occurs.
a Attenuation Select function ID.
Examples:
In the following example, the attenuation is temporarily set to 5 to speak softly, and then reset to 0 for maximum volume.
Input: She <SP> muttered <ESC>[5a "Get <SP> out <SP> of <SP> here!" <ESC>[0a
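The sequence above can be assembled programmatically. The following Python sketch is illustrative only: the helper name attenuation_select is hypothetical, and it assumes that the blanks shown in the Format line are for readability, so that the bytes actually sent are ESC, "[", the decimal value, and "a".

```python
ESC = "\x1b"  # the <ESC> character

def attenuation_select(db):
    """Build an Attenuation Select sequence (hypothetical helper).

    Values above 15 are clamped to 15 here, mirroring what the text
    says the TTSC itself does with out-of-range values.
    """
    if db < 0:
        raise ValueError("attenuation must be a non-negative number of dB")
    return f"{ESC}[{min(int(db), 15)}a"

# The muttering example: soften by 5 dB, then return to full volume.
message = ("She muttered" + attenuation_select(5)
           + '"Get out of here!"' + attenuation_select(0))
```

Because the attenuation persists until it is changed or reset, the closing attenuation_select(0) is what restores maximum volume for any text that follows.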
5.4 Baseline Pitch Select
Sets the baseline pitch value (fundamental frequency) for speech produced from text received after this sequence.
Normally, the baseline pitch is the value from which all variations in pitch are calculated within a sentence. However, if the "Monotone Pitch" switch is on, pitch remains at the constant value given by the baseline pitch, and there are no variations. See "Monotone Pitch" in Section 4.
If clause prosody is being used, a baseline pitch change does not take effect until the beginning of the next clause which is received after this sequence.
The selected baseline pitch remains in effect until changed by another Baseline Pitch Select control sequence or until a Reset sequence occurs.
Before the processing of any text on each SPEAK verb, the baseline pitch is set to the default or to that specified with a PITCH clause. If you change the baseline pitch, it will be changed at the next SPEAK when the control sequence is encountered in the text.
Format:
<ESC> [ <p1> p
Parameters:
<p1> Is the value of the baseline pitch in Hz. Use a decimal number between 50 and 200.
A value of zero produces a whispering voice.
A value greater than 0 but less than 50 causes the baseline pitch to be set at 50, and a value greater than 200 causes the baseline pitch to be set at 200.
Values greater than 255 are rejected and leave the speech unaltered. The default is 85.
The maximum resolution of baseline pitch is about ten Hz, so there may be no detectable difference between a setting of 103 and 107, for example.
p Baseline Pitch Select function ID.
Examples:
In the following example, the baseline pitch is temporarily raised to 200 Hz to give an adult the high-pitched voice of a child. It is then reset to 80 Hz.
Input: The <SP> little <SP> girl <SP> said: <ESC>[200p Where <SP> is <SP> my <SP> kitty? <ESC>[80p
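The parameter rules for <p1> can be summarized in a few lines of Python. This is a sketch of the documented behavior, not TTSC code; the function name effective_pitch is hypothetical, and <p1> is assumed to be a non-negative integer.

```python
def effective_pitch(p1):
    """Return the baseline pitch in Hz implied by the rules above.

    Returns 0 for the whispering voice and None when the value is
    rejected. Assumes p1 is a non-negative integer.
    """
    if p1 > 255:
        return None          # rejected; speech is not altered
    if p1 == 0:
        return 0             # whispering voice
    return max(50, min(p1, 200))  # clamp into the 50-200 Hz range
```

For instance, effective_pitch(30) gives 50 and effective_pitch(230) gives 200, while effective_pitch(300) gives None.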
5.5 Emphatic Stress Character
The tilde text character (~) is used as the first character of a word to indicate either emphasis or stress.
The following restrictions apply to the tilde:
It is not recognized by rules which are sensitive to the first letter of a word.
It is NOT used to stress abbreviations and special character sequences such as numbers, monetary amounts, etc.
The tilde only works in text mode.
Format:
~ <text>
Examples:
In the following sentence, the word "John" receives extra stress.
Input: It <SP> was <SP> ~John <SP> who <SP> did <SP> it.
The Emphatic Stress Marker and Protracted Duration Marker provide alternative means for indicating emphasis. See Section 6 for details.
5.6 Deemphasis Stress Character
The grave accent text character (`) is used as the first character of a word to indicate deemphasis in a sentence.
The following restrictions apply to the grave accent:
It is not recognized by rules which are sensitive to the first letter of a word.
It is NOT used to deemphasize abbreviations and special character sequences such as numbers, monetary amounts, etc.
The grave only works in text mode.
Format:
` <text>
Examples:
In the following sentence, the word "series" receives less stress.
Input: The <SP> giants <SP> won <SP> the <SP> world <SP> `series <SP> last <SP> year.
5.7 Fast Scan
Sets a "fast scan" mode of operation for speech produced.
When you select a non-zero scan skip number N (the <p1> value):
The system does not speak less essential words. These words, called function words, are typically prepositions, the definite and indefinite articles, and one-syllable words that do not give essential information to the text.
Of the remaining words, the TTSC speaks every Nth one, where N is the value you specify for <p1>. These words, called content words, typically include nouns, most verbs, adjectives, and adverbs.
The Fast Scan setting remains in effect until changed by another Fast Scan control sequence.
Format:
<ESC> [ <p1> f
Parameters:
<p1> Is the scan skip number. Use a decimal number between 0 and 7. A value greater than 7 causes the scan skip number to be set to 7. A value of zero causes normal operation without fast scanning.
The default is 0. The selected number remains in effect until changed by another Fast Scan or Reset sequence.
You may select the Fast Scan speech rate with the Speech Rate Select control sequence. However, to keep the speech output as clear as possible, the TTSC automatically lowers the maximum allowed speech rate when the scan skip number is greater than 0.
If you specify a rate faster than the allowed maximum in conjunction with the Fast Scan sequence, the TTSC automatically lowers the speech rate.
When the Fast Scan sequence is set to zero, the system resets the speech rate to the value that was in effect before the Fast Scan sequence.
Note: If text is coming in at a sufficiently slow baud rate and the fast scan mode is in effect, it is possible to introduce pauses, which would not otherwise occur, between words.
f Fast Scan function ID.
Examples:
The following message is spoken with no fast scanning:
Input: Studies have not been done extensively, because its use seems to offer considerable speaker and situation dependent options. Certain researchers have suggested that phonological sequences across lexical boundaries is the principal factor which determines whether the presence of a glottal stop is obligatory.
Speech with a scan number of 1:
Studies not done extensively, use seems offer considerable speaker situation dependent options. Certain researchers suggested phonological sequences lexical boundaries principal factor determines presence glottal stop obligatory.
Speech with a scan number of 3:
Done seems speaker options. Suggested lexical factor glottal.
Speech with a scan number of 6:
Seems options. lexical glottal.
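The word selection in these examples can be reproduced with a short Python sketch. It is only a model of the behavior shown above: the FUNCTION_WORDS set is an assumption inferred from the words missing in the scan-number-1 output (the TTSC's real function-word rules are not given here), the count is assumed to restart at each sentence, and the name fast_scan is hypothetical.

```python
import re

# Assumed function words, inferred only from the example above.
FUNCTION_WORDS = {
    "have", "been", "because", "its", "to", "and", "that", "across",
    "is", "the", "which", "whether", "of", "a",
}

EXAMPLE = (
    "Studies have not been done extensively, because its use seems to "
    "offer considerable speaker and situation dependent options. Certain "
    "researchers have suggested that phonological sequences across lexical "
    "boundaries is the principal factor which determines whether the "
    "presence of a glottal stop is obligatory."
)

def fast_scan(text, skip):
    """Return, per sentence, the lowercased words kept for scan skip number `skip`."""
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z']+", sentence.lower())
        if skip == 0:
            kept.append(words)  # normal operation, nothing dropped
            continue
        content = [w for w in words if w not in FUNCTION_WORDS]
        kept.append(content[skip - 1::skip])  # every Nth content word
    return kept
```

With the example text, fast_scan(EXAMPLE, 3) keeps "done seems speaker options" from the first sentence and "suggested lexical factor glottal" from the second, matching the speech shown for a scan number of 3.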
5.8 Prosody Select
Specifies either word prosody or clause prosody for speech produced from text received after this sequence. The selected prosody mode remains in effect until it is changed by another Prosody Select control sequence or until a Reset sequence occurs.
Format:
<ESC> [ <p1> P
Parameters:
<p1> Specifies the prosody value. The choices are:
0 - Word Prosody
1 - Clause Prosody
Any other value causes a syntax error and the sequence is discarded. The default is clause prosody.
Word Prosody: If you choose word prosody, each word is pronounced with full stress, without regard to the surrounding environment. This produces easily understood but less natural-sounding speech.
Speech production begins immediately after the TTSC receives a complete word.
Clause Prosody: With clause prosody, an entire phrase is analyzed before speech begins, and words are stressed in relation to their environment. The speech produced resembles connected human speech in stress and intonation.
In clause prosody, speech production does not begin until the TTSC has received a complete sentence.
For long sentences, pauses are inserted automatically to improve prosody. Use commas liberally to explicitly control prosody.
P Prosody Select function ID.
Examples:
In the following example, Word Prosody is selected.
Input: <ESC>[0P This <SP> is ...
Speech: This, is, ...
5.9 Speech Rate Select
Sets the speech rate for speech produced from input characters received after this sequence.
Before the processing of any text on each SPEAK verb, the speaking rate is set to the default or to that specified with a RATE clause. If you change the speaking rate, it will be changed at the next SPEAK when the control sequence is encountered in the text.
Format:
<ESC> [ <p1> r
Parameters:
<p1> Is the value of the speech rate in words per minute. Use a decimal number between 50 and 250. A value less than 50 causes the speech rate to be set at 50, and a value greater than 250 causes the speech rate to be set at 250.
The default is 150. The maximum resolution of the speech rate is about ten words per minute. You may not detect the difference between 135 and 137, for example.
The selected speech rate remains in effect until changed by another Speech Rate Select control sequence, an Indexed Speaking Rate Select control sequence, or a Reset sequence.
r Speech Rate Select function ID.
Examples:
In the following example, the speech rate is temporarily lowered to allow more time for a listener to write down a spoken address.
Input: His<SP>address<SP>is<ESC>[100r 457 <SP>7th.<ESC>[r
5.10 Indexed Speaking Rate Select
Changes the speaking rate according to an arbitrary scale instead of a standard rate in words per minute.
If both the Speech Rate Select and Indexed Speaking Rate control sequences are used, the most recently invoked one controls the speaking rate.
The relationship between the two sequences is as follows:
r = 50 + 8 * v
v = (r-50)/8 truncated to an integer
Before the processing of any text on each SPEAK verb, the speaking rate is set to the default or to that specified with a RATE clause. If you change the speaking rate, it will be changed at the next SPEAK when the control sequence is encountered in the text.
Format:
<ESC> [ <p1> v
Parameters:
<p1> is the arbitrary speech rate. Use a decimal number between 0 and 25, where 0 specifies the slowest rate and 25 the fastest.
The default is 13.
v Indexed Speaking Rate Select function ID.
Examples:
In the following example, the speech rate is approximately the same as in the example shown for Speech Rate Select.
Input: His<SP>address<SP>is<ESC>[6v<SP> 457<SP>7th.<ESC>[v
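The relationship between the two rate scales, r = 50 + 8 * v and v = (r-50)/8 truncated, can be checked with a small Python sketch. The function names are hypothetical, and rate_to_index assumes r is already within the documented 50 to 250 range, where integer floor division is the same as truncation.

```python
def index_to_rate(v):
    """Words per minute corresponding to an indexed rate v (0-25)."""
    return 50 + 8 * v

def rate_to_index(r):
    """Indexed rate corresponding to r words per minute (50-250).

    For r >= 50 the floor division below equals truncation to an integer.
    """
    return (r - 50) // 8
```

Under these formulas rate_to_index(100) is 6 and index_to_rate(6) is 98, which is why an index of 6 in the example gives nearly the same result as a rate of 100 words per minute.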
5.11 Voice Select
Changes the voice, either to one that you prefer or to provide contrast.
Before the processing of any text on each SPEAK verb, the voice is set to the default or to that specified with a VOICE clause. If you change the voice, it will be changed at the next SPEAK when the control sequence is encountered in the text.
Format:
<ESC> [ <p1> V
Parameters:
<p1> is the voice type. The choices are:
0 = Default Voice.
Default Speech rate = 150 words per minute
Default Baseline pitch = 85 Hz
1 = Older, larger male with lower pitch.
Default Speech rate = 155 words per minute
Default Baseline pitch = 75 Hz
2 = Younger, smaller, fast-talking male with high pitch.
Default Speech rate = 170 words per minute
Default Baseline pitch = 110 Hz
V Voice Select function ID.
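The three voices and their defaults can be kept in a small lookup table when building sequences from a program. The Python sketch below is illustrative: the names VOICE_DEFAULTS and voice_select are hypothetical, the blanks in the Format line are assumed to be for readability only, and refusing values other than 0, 1, or 2 is a choice made here, since the TTSC's handling of other values is not described.

```python
ESC = "\x1b"  # the <ESC> character

# Defaults per voice, copied from the parameter list above.
VOICE_DEFAULTS = {
    0: {"rate_wpm": 150, "pitch_hz": 85},   # default voice
    1: {"rate_wpm": 155, "pitch_hz": 75},   # older, larger male
    2: {"rate_wpm": 170, "pitch_hz": 110},  # younger, smaller male
}

def voice_select(voice):
    """Build a Voice Select sequence for voice 0, 1, or 2 (hypothetical helper)."""
    if voice not in VOICE_DEFAULTS:
        raise ValueError("voice must be 0, 1, or 2")
    return f"{ESC}[{voice}V"  # note the uppercase V function ID
```

The uppercase V matters: a lowercase v is the function ID for Indexed Speaking Rate Select.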
5.12 High Frequency Energy Boost Select
Controls high-frequency-energy boost.
The high-frequency-energy boost used over long-distance phone lines may cause undesirable effects when the speech output is heard through headphones or some PBX equipment. The TTSC uses activate/deactivate switch 1 to control this energy boost. It is normally activated because typical use is over the telephone. It should be deactivated when you use headphones or a loudspeaker.
Format:
<ESC> [1A = Activates the boost. (Default)
<ESC> [1D = Deactivates the boost
5.13 Input Select
Sets the interpretation of data input characters.
If you use phonemes, be sure to set INPUT SELECT back to text mode at the end of your SPEAK. Ordinary text sent to the TTSC with phoneme mode ON produces speech that sounds like an alien language.
Format:
<ESC> [ <p1> I
Parameters:
<p1> specifies text or phoneme characters. The values are:
0 Text characters
1 Phoneme characters
If any other value is selected, the sequence is discarded.
The default input selection is text characters. The selected input remains in effect until changed by another Input Select control sequence or until a Reset sequence occurs.
I Input Selection function ID.
Example:
In the following example, the word "Xerox" is input as phoneme characters.
Input: He <SP> works <SP> at <ESC>[1I &ZE1RoKS <ESC>[0I.
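One way to make sure text mode is always restored is to wrap every phoneme string in a helper that emits both sequences. The Python sketch below is an illustration under assumptions: the name as_phonemes is hypothetical, and the blanks around the sequences in the example are taken to be for readability, with only <SP> standing for a real space.

```python
ESC = "\x1b"  # the <ESC> character

def as_phonemes(phonemes):
    """Wrap a phoneme string so input returns to text mode afterwards."""
    return f"{ESC}[1I{phonemes}{ESC}[0I"

# The Xerox example, with the switch back to text mode guaranteed.
message = "He works at" + as_phonemes("&ZE1RoKS") + "."
```

Building the string this way means ordinary text that follows the phonemes is never sent with phoneme mode still on.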