Text characters are used to specify the message that is to be spoken. The complete set of text characters includes all "printable" ASCII characters which have hexadecimal codes between 20 and 7E. Text character input is the default condition and is selected at the completion of phoneme character processing using the control sequence:
<ESC>[0I
Several text processing switches govern the way the TTSC processes text characters. These switches can be turned on and off using the Switch On and Switch Off control sequences. The switches are described in detail in Section 4. Unless stated otherwise, the following descriptions assume that the switches are in their default positions.
Text character strings described in this section are as follows:
Alphabetic
Acronyms
Abbreviations
Numbers
Monetary amounts
Sentence termination
Colon
Unreadable input
Emphatic and deemphasis stress
Words without vowels
Time of day
Word separators
Apostrophe
Internal sentence punctuation
Hyphen
Miscellaneous punctuation
3.3 Lowercase Alphabetic Text Characters
The text characters "a" through "z" are the most used characters. They are used for specifying words that are composed entirely of lowercase letters.
Example: A word composed entirely of lower case alpha characters.
Input: thoroughly
3.4 Upper Case Alphabetic Text Characters
The text characters A through Z are used to capitalize words, as at the start of a sentence or a proper name, and for acronyms.
Example: A word containing an initial upper case letter.
Input: Sam
Example: An acronym.
Input: TTSC
3.5 Emphatic And Deemphasis Stress Characters
The tilde is used as the first character of a word to indicate primary emphasis or stress. The grave is similarly used to indicate deemphasis in a sentence. The tilde and grave are ignored by rules which are sensitive to the first letter of a word.
CAUTION! The tilde and grave CANNOT be used to stress abbreviations and special character sequences such as numbers and monetary amounts. They cannot be used in phoneme mode.
Example: A sentence in which the word "John" is to receive extra stress.
Input: It <SP> was <SP> ~John <SP> who <SP> did <SP> it.
Example: A sentence in which the word "series" receives less stress than normally.
Input: The <SP> giants <SP> won <SP> the <SP> world <SP> `series <SP> last <SP> year.
3.6 Acronyms
An acronym is an uppercase word containing at least two letters. The TTSC pronounces each individual letter of an acronym. If the last letter of an acronym is immediately followed by an apostrophe and the letter "s" (or "S"), the acronym is pronounced as a plural.
Example: It is SPI's speech.
Input: It's <SP> SPI's <SP> speech.
Speech: Its ess pee eyes speech.
The Upper Case Acronym Pronunciation Switch (7) affects the way all uppercase words are pronounced. When this switch is on (default), uppercase words are pronounced as acronyms in the manner described above; that is, they are spelled. When it is off, uppercase words are pronounced as regular words rather than as acronyms.
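The acronym rule and the effect of Switch 7 can be illustrated with a short Python sketch. The function name `classify_word` and the `switch7_on` parameter are hypothetical and are not part of the TTSC interface; the code only mirrors the rule as stated in this section.

```python
def classify_word(word: str, switch7_on: bool = True) -> str:
    """Return 'acronym', 'plural acronym', or 'word' for one input word.

    Hypothetical helper: mirrors the rule in this section, not TTSC firmware.
    """
    stem = word
    plural = False
    # An apostrophe plus "s" or "S" right after the last letter marks a plural.
    if len(word) > 2 and word[-2] == "'" and word[-1] in "sS":
        stem = word[:-2]
        plural = True
    # An acronym is an all-uppercase word with at least two letters.
    is_upper_word = len(stem) >= 2 and stem.isalpha() and stem.isupper()
    if is_upper_word and switch7_on:
        return "plural acronym" if plural else "acronym"
    # With Switch 7 off, uppercase words are read as regular words.
    return "word"
```

For example, `classify_word("SPI's")` reports a plural acronym, while `classify_word("TTSC", switch7_on=False)` reports a regular word.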
3.7 Abbreviations
The abbreviations in the following list are allowed as input. An abbreviation must contain at least one lowercase letter or it is recognized as an acronym (described above). The TTSC recognizes the abbreviations below and pronounces them as shown. In addition, two-letter state abbreviations and one- and two-letter direction abbreviations are pronounced under switch control as described in Section 4.
3.7.1 Days
Tue. Tuesday
Tues. Tuesday
Wed. Wednesday
Thu. Thursday
Thur. Thursday
Thurs. Thursday
Fri. Friday
Sat. Saturday
3.7.2 Months
Feb. February
Mar. March
Apr. April
Jun. June
Jul. July
Sept. September
Sep. September
Oct. October
Nov. November
Dec. December
3.7.3 Names and Addresses
Co. company
Dept. department
Hq. headquarters
Inc. incorporated
Ltd. limited
Univ. university
3.7.4 Thoroughfares
av. avenue
ave. avenue
blvd. boulevard
cir. circle
crk. creek
ct. court
ctr. center
cyn. canyon
dr. doctor or drive
expy. expressway
ft. fort or foot
fwy. freeway
hwy. highway
jct. junction
ln. lane
mt. mount
mtn. mountain
pk. park
pkwy. parkway
pky. parkway
pl. place
plz. plaza
rd. road
sq. square
st. saint or street
ter. terrace
terr. terrace
tpke. turnpike
tr. trail
wy. way
3.7.5 Titles
gen. general
gov. governor
jr. junior
mgr. manager
ms. miz
mr. mister
mrs. misses
rev. reverend
secy. secretary
sr. senior
st. saint or street
3.7.6 Weights and Measures
mg. milligrams
ft. fort or foot
gal. gallon
gm. grams
hr. hours
hrs. hours
hz. hertz
KHz. kilohertz
MHz. megahertz
GHz. gigahertz
in. inches
kg. kilograms
km. kilometers
lb. pounds
lbs. pounds
mi. mile
min. minutes
mins. minutes
ml. milliliters
mm. millimeters
oz. ounce
ozs. ounces
pF picofarad
sec. seconds
secs. seconds
msec. millisecond
ns. nanoseconds
3.7.7 Miscellaneous
cf. compare
cu. cubic
ea. each
e.g. for example,
e.g for example,
eg. for example,
etc. et cetera
ext. extension
fig. figure
gov't government
i.e. that is,
ie that is,
misc. miscellaneous
no. number
ppd. post paid
pm. pee em
tts text to speech
vol. volume
vs. versus
Examples:
Input: Mr. <SP> Jones <SP> lives <SP> on <SP> Moody <SP> Rd. <SP><SP> <MS>
Speech: Mister Jones lives on Moody Road.
Input: It <SP> was <SP> on <SP> Fri., <SP> Jan. <SP> first.
Speech: It was on Friday, January first.
Input: He's <SP> from <SP> Boston, <SP> Mass. <SP><SP> <MS>
Speech: Hes from Boston, mass.
3.7.8 Abbreviations at End of a Sentence
The period at the end of most abbreviations acts as a sentence terminator if, and only if, the next non-space character is an upper case letter. The period never acts as a sentence terminator when it is used in the following abbreviations, which are normally followed by a person's name:
Brig. brigadier
Bros. brothers
Col. colonel
Dr. doctor
Gen. general
Gov. governor
Lieut. lieutenant
Maj. major
Ms. miz
Mr. mister
Mrs. misses
Mt. mount
Prof. professor
Rev. reverend
Sgt. sergeant
Sr. senior
St. saint
If an abbreviation is the last word in a sentence, follow it with two spaces, another sentence, or a "Commence Speak" escape sequence to force the sentence to be spoken.
Examples:
Input: Mr. <SP> Jones <SP> ate <SP> two <SP> kg. <SP> So <SP> what?
Speech: Mister Jones ate two kilograms. So what?
Input: Mr. <SP> Jones <SP> ate <SP> two <SP> kg. <SP> today.
Speech: Mister Jones ate two kilograms today.
Input: On <SP> tues. <SP> I <SP> left.
Speech: On tuesday. I left.
Input: The <SP> signal <SP> was <SP> 10 ms. <SP> late.
Speech: The signal was ten miz late.
3.7.9 Word Abbreviations
Most abbreviations are treated as abbreviations whether they are followed by a period or not. However, some character strings can be abbreviations or words. These strings are:
co. company
fig. figure
gal. gallon
in. inches
Jan. January
Mar. March
no. number
Sat. Saturday
Sun. Sunday
Wed. Wednesday
The rules for word abbreviations are listed below:
1. If the string is not terminated by a period, it is treated as a word.
2. If the string is terminated by a period and the string could be an abbreviation for a month or day of the week, and the first character in the string is capitalized, the string will be treated as an abbreviation. If the first character is not capitalized, the string is treated as a word, and the period retained.
3. The period is retained as a sentence terminator in the following cases:
· The string is not an abbreviation for a month or day of the week and is at the end of a sentence.
· The next non-space character is a control character.
· The next non-space character is another period.
If the next non-space character is a lowercase letter or a digit (the only remaining cases), the string is treated as an abbreviation and the period is deleted.
Examples:
Input: Mr. <SP> Jones <SP> looked <SP> at <SP> fig. <SP> three.
Speech: Mister Jones looked at figure three.
Input: He <SP> ate <SP> a <SP> fig. <SP> She <SP> ate <SP> one <SP> too.
Speech: He ate a fig. She ate one too.
Input: Mr <SP> Jones <SP> looked <SP> at <SP> fig <SP> three.
Speech: Mister Jones looked at fig three.
Input: He's <SP> coming <SP> next <SP> Wed. <SP> So <SP> is <SP> she.
Speech: Hes coming next Wednesday. So is she.
Input: Give <SP> me <SP> one <SP> gal. <SP> of <SP> gas.
Speech: Give me one gallon of gas.
Input: I <SP> know <SP> a <SP> nice <SP> gal.
Speech: I know a nice gal.
Input: Jan <SP> was <SP> wed <SP> on <SP> sun. <SP> last <SP> week.
Speech: Jan was wed on sun. Last week.
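The word-abbreviation rules can be sketched in Python as shown below. This is a simplified reading of the rules, not the TTSC algorithm: the names `treat_string`, `MONTH_DAY`, and `OTHER` are hypothetical, and the sketch assumes that "end of a sentence" means the next non-space character is neither a lowercase letter nor a digit.

```python
# Strings from the list above that can be either a word or an abbreviation.
MONTH_DAY = {"jan", "mar", "sat", "sun", "wed"}
OTHER = {"co", "fig", "gal", "in", "no"}


def treat_string(token: str, next_char):
    """Classify an ambiguous string as 'word' or 'abbreviation'.

    token     -- the string, including a trailing period if one is present
    next_char -- the next non-space character, or None at end of input
    """
    stem = token.rstrip(".")
    if stem.lower() not in MONTH_DAY | OTHER:
        raise ValueError("not one of the ambiguous strings")
    # Rule 1: without a period the string is always a word.
    if not token.endswith("."):
        return "word"
    # Rule 2: month and day strings depend only on capitalization.
    if stem.lower() in MONTH_DAY:
        return "abbreviation" if stem[0].isupper() else "word"
    # Remaining case: a lowercase letter or digit follows, so the period
    # is deleted and the string is an abbreviation.
    if next_char is not None and (next_char.islower() or next_char.isdigit()):
        return "abbreviation"
    # Rule 3: otherwise the period is kept and the string is a word.
    return "word"
```

Applied to the examples above, `treat_string("fig.", "t")` yields an abbreviation ("figure three"), while `treat_string("fig.", "S")` and `treat_string("sun.", "l")` both yield a word.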
3.8 Special Abbreviations
The abbreviations "dr.," "st.," and "ft." are specially handled.
A double period is used to force the abbreviation to be recognized as the end of sentence as shown in the last example below.
These three abbreviations may appear without any period at all. In this case the same rules are used to determine what the abbreviation stands for.
3.8.1 Doctor and Drive
If the next non-space character following "dr." is an upper case letter, "dr." is pronounced "doctor," otherwise it is pronounced as "drive." The period in "dr." is never recognized as the end of the sentence.
3.8.2 Street and Saint
If the next non-space character following "st." is an upper case letter, "st." is pronounced "saint," otherwise it is pronounced as "street." The period in "st." is never recognized as the end of the sentence.
3.8.3 Fort and Foot
If the next non-space character following "ft." is an upper case letter, "ft." is pronounced "fort," otherwise it is pronounced as "foot." The period in "ft." is never recognized as the end of the sentence.
Examples:
Input: I <SP> live <SP> on <SP> Oak <SP> Dr. <SP> near <SP> John.
Speech: I live on Oak Drive near John.
Input: Dr. <SP> Bernstein <SP> cured <SP> him.
Speech: Doctor Bernstein cured him.
Input: Dr <SP> Bernstein <SP> cured <SP> him.
Speech: Doctor Bernstein cured him.
Input: I <SP> live <SP> on <SP> Oak <SP> Dr. <SP> So <SP> does <SP> she.
Speech: I live on Oak Doctor So does she.
Input: I <SP> live <SP> on <SP> Oak <SP> Dr <SP> <SP> So <SP> does <SP> she.
Speech: I live on Oak Doctor So does she.
Input: I <SP> live <SP> on <SP> Oak <SP> Dr.. <SP> So <SP> does <SP> she.
Speech: I live on Oak Drive. So does she.
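The selection rules in Sections 3.8.1 through 3.8.3, together with the double-period rule, can be sketched in Python as follows. The names `SPECIAL` and `expand_special` are hypothetical. The treatment of the double period (choosing the second reading and keeping one period) is inferred from the last example above and is an assumption.

```python
# First entry: reading when an upper case letter follows; second: otherwise.
SPECIAL = {
    "dr": ("doctor", "drive"),
    "st": ("saint", "street"),
    "ft": ("fort", "foot"),
}


def expand_special(abbrev: str, next_char):
    """Expand "dr", "st", or "ft", with or without a trailing period.

    next_char is the next non-space character, or None at end of input.
    """
    before_upper, otherwise = SPECIAL[abbrev.rstrip(".").lower()]
    # A double period forces an end of sentence (assumed behavior).
    if abbrev.endswith(".."):
        return otherwise + "."
    if next_char is not None and next_char.isupper():
        return before_upper
    return otherwise
```

Note that `expand_special("Dr.", "S")` returns "doctor", which reproduces the unwanted "Oak Doctor So does she" reading, while `expand_special("Dr..", "S")` returns "drive." as in the last example.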
3.9 Words Without Vowels
Words that do not contain a vowel (a, e, i, o, u, y) and are not abbreviations are spelled out letter by letter.
Example:
Input: It <SP> was <SP> grbld.
Speech: It was jee are bee ell dee.
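A minimal Python sketch of the vowel test is shown below. The function name `spell_or_keep` and the `abbreviations` parameter are hypothetical. The sketch returns the individual letters rather than their spoken names, because this section does not give a full table of letter names.

```python
VOWELS = set("aeiouy")


def spell_or_keep(word: str, abbreviations=frozenset()):
    """Return [word] if it is read as a word, else its letters one by one."""
    has_vowel = any(ch in VOWELS for ch in word.lower())
    # Known abbreviations are expanded elsewhere, so they are not spelled.
    if has_vowel or word.lower() in abbreviations:
        return [word]
    return list(word)
```

For the example above, `spell_or_keep("grbld")` returns the five letters g, r, b, l, d.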
3.10 Numbers
Numbers may appear in the input text, with optional commas to block off groups of three digits. Numbers using commas as three-digit group separators are pronounced using the terms trillion, billion, million, thousand, and hundred, as appropriate.
Examples:
Input: 7
Speech: seven
Input: 0
Speech: zero
Input: 15
Speech: fifteen
Input: 017
Speech: zero seventeen
Input: 98
Speech: ninety eight
Input: +10
Speech: plus ten
Input: 325
Speech: three twenty five
Input: 1234
Speech: twelve thirty four
Input: 1,990
Speech: one thousand nine hundred ninety
Input: 1,017
Speech: one thousand seventeen
Input: 981,234,567,890,123
Speech: nine hundred eighty one trillion, two hundred thirty ...
3.10.1 Numbers without Commas
Numbers that contain five or more digits and do not contain commas are pronounced digit by digit.
Examples:
Input: 19902
Speech: one nine nine zero two
Input: 34567890123
Speech: three four five six seven eight nine zero one two three
3.10.2 Large Numbers
A comma-blocked number with 16 or more digits is pronounced as a list of numbers rather than one large number.
Example:
Input: 1,981,234,567,890,123
Speech: one, nine eighty one, two thirty four, ...
3.10.3 Numbers with Inconsistent Commas
Commas that don't block off three-digit groups are assumed to be separating a list of numbers.
Example:
Input: 76,34
Speech: seventy six, thirty four
Input: 305,67,890
Speech: three oh five, sixty seven, eight ninety
Input: 305,67,<SP>890
Speech: three oh five, sixty seven, eight ninety
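The comma and digit-count rules in Sections 3.10 through 3.10.3 can be summarized in a short Python sketch. The function `classify_number` and its four result labels are hypothetical names chosen for illustration; the sketch handles unsigned digit strings only and ignores decimal points, signs, and embedded spaces.

```python
import re


def classify_number(token: str) -> str:
    """Return how a digit string would be read under the rules above."""
    if not re.fullmatch(r"[0-9][0-9,]*", token):
        raise ValueError("expected digits with optional commas")
    digits = token.replace(",", "")
    if "," not in token:
        # Five or more digits without commas are read digit by digit.
        return "digit by digit" if len(digits) >= 5 else "short number"
    # Proper blocking: one to three leading digits, then groups of three.
    blocked = re.fullmatch(r"[0-9]{1,3}(,[0-9]{3})+", token) is not None
    # Comma-blocked numbers with 16 or more digits are read as a list.
    if blocked and len(digits) <= 15:
        return "full number"
    return "list of numbers"
```

For example, `classify_number("1,990")` gives "full number", `classify_number("19902")` gives "digit by digit", and `classify_number("76,34")` gives "list of numbers".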
3.10.4 Numbers with Decimal Points
The decimal point following a number is pronounced "point" if the next character is a digit. If not, the period acts as a sentence terminator. Digits following a decimal point are pronounced one by one.
Examples:
Input: .7
Speech: point seven
Input: .76034
Speech: point seven six zero three four
Input: 0.7
Speech: zero point seven
Input: He <SP> saw <SP> 10. <SP> I <SP> did <SP> too.
Speech: He saw ten. I did too.
3.10.5 Numbers with Ordinalizers
Numbers with trailing ordinalizers "st," "nd," "rd," or "th" are pronounced accordingly. If the ordinalizer is inconsistent with the number preceding it (such as "3st"), it is spoken as normal text. Apostrophes (3'rd) are not permitted.
Examples:
Input: 32nd
Speech: thirty second
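The consistency check for ordinalizers can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the usual English rule for choosing "st", "nd", "rd", or "th" (including "th" for numbers ending in 11, 12, and 13), which this section does not spell out.

```python
import re


def expected_ordinalizer(n: int) -> str:
    """Return the usual English ordinal suffix for n."""
    if 10 <= n % 100 <= 20:
        return "th"  # 11th, 12th, 13th, and the rest of the teens
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def is_consistent_ordinal(token: str) -> bool:
    """True if token is digits followed by the matching ordinalizer."""
    match = re.fullmatch(r"([0-9]+)(st|nd|rd|th)", token)
    if match is None:
        return False  # includes forms with an apostrophe, such as 3'rd
    return expected_ordinalizer(int(match.group(1))) == match.group(2)
```

Under these assumptions "32nd" is consistent, while "3st" and "3'rd" are not and would be spoken as normal text.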
3.11 Time Of Day
Numbers separated by a colon are used to designate time. The following rules apply:
· If the next non-space characters following a colon are "00", they are pronounced as "oclock."
· If two digits other than "00" follow a colon, the digits are spoken, but the colon is not.
Examples:
Input: 12:35
Speech: twelve thirty five
Input: 3:00 <SP> today
Speech: three oclock today
Input: 3:05 <SP> a.m.
Speech: three oh five ay em
Input: 3:05am
Speech: three oh five am
Input: 3:00 <SP> a.m.
Speech: three oclock ay em
Input: 23:01 <SP> pm
Speech: twenty three oh one pee em
Input: 1:25:20 <SP> a.m.
Speech: one twenty five twenty ay em
Input: 1:2:204:25
Speech: one two two oh four twenty five
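The two time-of-day rules can be sketched in Python as shown below. The names `say_number` and `say_time` are hypothetical. The sketch covers only the regular case of a one- or two-digit first field followed by two-digit fields, so irregular input such as the last example above is rejected rather than imitated, and trailing "a.m." or "pm" text is not handled.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def say_number(n: int) -> str:
    """Spell a number from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def say_time(token: str) -> str:
    """Spell a time such as 3:05 the way the examples above read it."""
    if not re.fullmatch(r"[0-9]{1,2}(:[0-9]{2})+", token):
        raise ValueError("unsupported time format")
    fields = token.split(":")
    words = [say_number(int(fields[0]))]
    for field in fields[1:]:
        if field == "00":
            words.append("oclock")  # spelled as in the Speech lines above
        elif field[0] == "0":
            words.append("oh " + ONES[int(field[1])])
        else:
            words.append(say_number(int(field)))
    return " ".join(words)
```

With this sketch, `say_time("3:00")` returns "three oclock" and `say_time("23:01")` returns "twenty three oh one", matching the examples above.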
3.12 Monetary Amounts
Monetary amounts must appear in the input text with a leading dollar sign, followed by a number. Ordinalizers are not allowed. The number may be followed by one of the words "thousand," "million," "billion," or "trillion." In the following examples, the Digit Pronunciation Switch is off and the Full Number Pronunciation Switch is on.
Examples:
Input: $12
Speech: twelve dollars
Input: $135
Speech: one hundred thirty five dollars
Input: $.10
Speech: ten cents
Input: $1.10
Speech: one dollar and ten cents
Input: $0.10
Speech: zero dollars and ten cents
Input: $1 <SP> million
Speech: one million dollars
Input: $1.256
Speech: one point two five six dollars
Input: $1.2
Speech: one point two dollars
Input: $1.256 <SP> million
Speech: one point two five six million dollars
Input: $ <SP>
Speech: dollar
Input: $ <SP> 1
Speech: dollar one
These pronunciations change to reflect "adjective" status when preceded by "a" or "an" or a number.
Examples:
Input: A <SP> $1.10 <SP> check <SP> is <SP> too <SP> much.
Speech: A one dollar and ten cent check is too much.
Input: He <SP> gave <SP> me <SP> 15 <SP> $10 <SP> dollar <SP> bills.
Speech: He gave me fifteen ten dollar bills.
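The forms a monetary amount can take may be easier to see in code. The Python sketch below is inferred from the examples in this section only; the function `parse_money` and its result labels are hypothetical. It does not handle commas, a trailing "million" or similar word, or the adjective forms.

```python
import re


def parse_money(token: str):
    """Split a "$" amount into the form suggested by the examples above.

    Returns None when the token is not a dollar sign directly followed
    by a number (for example a lone "$").
    """
    match = re.fullmatch(r"\$([0-9]*)(?:\.([0-9]+))?", token)
    if match is None:
        return None
    whole, fraction = match.group(1), match.group(2)
    if whole == "" and fraction is None:
        return None  # "$" alone is spoken as the word "dollar"
    if fraction is None:
        return ("dollars", int(whole))
    if len(fraction) == 2:
        # Exactly two decimal digits are read as cents.
        if whole == "":
            return ("cents", int(fraction))
        return ("dollars and cents", int(whole), int(fraction))
    # Any other number of decimal digits is read with "point".
    return ("decimal dollars", whole + "." + fraction)
```

For example, `parse_money("$1.10")` yields dollars and cents, while `parse_money("$1.256")` yields the "point" form.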
3.13 Separators
Words are usually separated from each other by one or more <SP> space characters. They are also separated by the following:
Control sequences
Characters that are not a through z or A through Z.
Apostrophes (')
Multiple consecutive <SP> characters are permissible, but do not lengthen the pause between words.
Words are also separated when the maximum word length is reached. The maximum word length is 29 characters. If a word separator has not been encountered by the 29th character, the word is terminated automatically at the 29th letter, and the remainder of the word is treated as a new word.
Example: A sentence with <SP> separators and a very long word.
Input: The <SP> coal <SP> miner <SP> had <SP> pneumonoultramicroscopicsilicovolcanoconiosos.
Speech: The coal miner had pneumonoultramicroscopicsilicovolcanoconiosos.
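The effect of the maximum word length can be sketched in Python as follows. The names `MAX_WORD_LENGTH` and `split_long_word` are hypothetical; only the limit of 29 characters comes from this section.

```python
MAX_WORD_LENGTH = 29  # limit stated in this section


def split_long_word(word: str, limit: int = MAX_WORD_LENGTH):
    """Break a word into pieces of at most `limit` characters."""
    return [word[i:i + limit] for i in range(0, len(word), limit)]
```

Applied to the long word in the example above, the sketch produces two pieces: the first 29 letters, and the remaining letters as a new word.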
Several control characters act like the <SP> character. These are the carriage return <CR>, line feed <LF>, and <TAB>. All are pronounced as the word "space" if the Space Pronunciation Switch is on.
The occurrence of a <SP> or a control sequence within certain text character sequences can affect the way the sequence is spoken. For example, a <SP> or control sequence between a "$" and a number inhibits these characters from being recognized as a monetary amount.
3.14 Apostrophe Character
The apostrophe text character is used within a word to indicate a contraction or possessive. It has no effect on the spoken output.
Examples:
Input: 'bout
Input: don't
Input: Valdez'
Input: John's
3.15 Sentence Terminators
The period, question mark, and exclamation point text characters are used as sentence terminators. These characters cause a short pause in the speech at the end of the sentence and cause the appropriate pitch intonation to occur.
The rules for terminal punctuation are listed below:
A period that is immediately followed by an alphabetic character is ignored.
If a single letter is followed by a period and fewer than two <SP> characters, the period is ignored.
This is useful for forms such as "Man from U.N.C.L.E." or "J.R. Roberts."
Real end-of-sentence periods should be followed by two <SP> characters to ensure that they are interpreted as periods.
Example:
Input: She <SP> saw <SP> it. <SP><SP> Who <SP> are <SP> you? <SP><SP> Give <SP> it <SP> to <SP> me! <SP><SP>
Speech: She saw it. Who are you? Give it to me!
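The two period rules listed above can be sketched in Python as shown below. The function `period_ends_sentence` is a hypothetical name. The sketch applies only these two rules to plain text in which each <SP> is written as an ordinary space; it ignores the abbreviation and decimal-point rules described elsewhere in this section.

```python
def period_ends_sentence(text: str, i: int) -> bool:
    """Decide whether the period at text[i] terminates a sentence."""
    if text[i] != ".":
        raise ValueError("text[i] must be a period")
    rest = text[i + 1:]
    # A period immediately followed by an alphabetic character is ignored.
    if rest[:1].isalpha():
        return False
    # Count the spaces that directly follow the period.
    spaces = len(rest) - len(rest.lstrip(" "))
    # Is the period preceded by exactly one letter?
    single_letter = (i >= 1 and text[i - 1].isalpha()
                     and (i == 1 or not text[i - 2].isalpha()))
    # A single letter, a period, and fewer than two spaces: ignored.
    if single_letter and spaces < 2:
        return False
    return True
```

In "J.R. Roberts" neither period ends a sentence, whereas the period in "She saw it." followed by two spaces does.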
3.16 Sentence Internal Punctuation
The text characters listed below are used within a sentence. They cause a short pause and a "comma-type" pitch intonation in the speech.
Character    ASCII Name
,            Comma
;            Semicolon
(            Opening Parenthesis
[            Opening Bracket
{            Opening Brace
The text characters listed below are also used within a sentence. They cause a short pause and a "comma-type" pitch intonation in the speech only if the next non-space character is alphabetic or a digit.
Character    ASCII Name
"            Quotation Mark
)            Closing Parenthesis
]            Closing Bracket
}            Closing Brace
Example:
Input: Smith <SP> (the <SP> plaintiff) <SP> lost <SP> the <SP> case.
Speech: Smith, the plaintiff, lost the case.
3.17 Colon Character
The colon causes a short pause and a "comma-type" pitch intonation in the speech.
Example:
Input: Try <SP> this: <SP> A <SP> chilled, <SP> sliced <SP> pear.
Speech: Try this, a chilled, sliced pear.
Numbers separated by a colon are used to designate time.
Example:
Input: 12:35
Speech: twelve thirty five
3.18 Hyphen Character
The hyphen is usually treated as a word separator or <SP>. When the Minus Sign Pronunciation Switch (Switch 5) is on, every hyphen (-) character followed by a digit ("0" through "9") is pronounced as the word "minus."
See the description of the Minus Sign Pronunciation Switch, Section 4, for details.
Example:
Input: A <SP> text-to-speech <SP> converter.
Speech: A text to speech converter.
3.19 Miscellaneous Punctuation Characters
The punctuation characters listed in Table 3-1 are pronounced as indicated when encountered in the input text.
Example:
Input: John <SP> is <SP> faster <SP> than <SP> Pat <SP> & <SP> Joe.
Speech: John is faster than Pat and Joe.
Input: This <SP> is <SP> 4/13/82.
Speech: This is four slash thirteen slash eighty two.
When the Punctuation Pronunciation Switch (Switch 2) is on, punctuation characters are spoken as indicated in Section 4.
3.20 Unreadable Input
Although the input text is usually readable, it does not have to be. Text that is totally unreadable can be sent to the TTSC, and it will make an attempt to say it. As an example, consider the following input text:
Input: jA$Kajkkk(%&<SP>]b<SP><SP>./???.,aR-72KO+/#&!*
To analyze the input example, use the following steps:
1. Insert <SP> characters in the places where the TTSC assumes word boundaries, and compress multiple <SP> characters into single ones:
jA <SP> $ <SP> Kajkkk <SP> (%& <SP>
] <SP> b <SP> ./???., <SP> aR <SP>
72 <SP> KO <SP>+/#&!*
2. Next, change miscellaneous fixed-pronunciation punctuation into its corresponding words, and adjust upper/lower case for easier reading:
ja <SP> dollar <SP> kajkkk <SP> (
<SP> percent <SP> and <SP> ] <SP>
b <SP> . <SP> slash <SP> ???., <SP>
ar <SP> 72 <SP> KO <SP> plus <SP>
slash <SP> number <SP> and <SP> !
<SP> asterisk
3. Change numbers to full pronunciation form.
4. Change sentence internal punctuation to commas.
5. Change single letters and acronyms to letter-pronunciation form.
6. Change extraneous sentence terminators to periods, and delete unnecessary <SP> characters:
ja <SP> dollar <SP> kajkkk <SP>,
percent <SP> and, bee . <SP>
slash <SP> ...., ar <SP> seventy
<SP> two <SP> kay <SP> oh <SP>
plus slash number <SP> and! asterisk
The text is now in an almost-readable form. The TTSC will now attempt to pronounce the non-words "ja," "kajkkk," and "ar."