


3 Text Character Processing



3.2 Introduction

Text characters are used to specify the message that is to be spoken. The complete set of text characters includes all "printable" ASCII characters which have hexadecimal codes between 20 and 7E. Text character input is the default condition and is selected at the completion of phoneme character processing using the control sequence:

<ESC>[0I

Several text processing switches govern the way the TTSC processes text characters. These switches can be turned on and off using the Switch On and Switch Off control sequences. The switches are described in detail in Section 4. Unless stated otherwise, the following descriptions assume that the switches are in their default positions.

Text character strings described in this section are as follows:

Alphabetic

Acronyms

Abbreviations

Numbers

Monetary amounts

Sentence termination

Colon

Unreadable input

Emphatic and deemphasis stress

Words without vowels

Time of day

Word separators

Apostrophe

Internal sentence punctuation

Hyphen

Miscellaneous punctuation


3.3 Lowercase Alphabetic Text Characters

The text characters "a" through "z" are the most used characters. They are used for specifying words that are composed entirely of lowercase letters.

Example: A word composed entirely of lower case alpha characters.

Input: thoroughly


3.4 Upper Case Alphabetic Text Characters

The text characters A through Z are used to capitalize words, as at the start of a sentence or a proper name, and for acronyms.

Example: A word containing an initial upper case letter.

Input: Sam

Example: An acronym.

Input: TTSC


3.5 Emphatic And Deemphasis Stress Characters

The tilde (~) is used as the first character of a word to indicate primary emphasis or stress. The grave accent (`) is similarly used to indicate deemphasis in a sentence. The tilde and grave accent are ignored by rules that are sensitive to the first letter of a word.


CAUTION!

The tilde and grave CANNOT be used to stress abbreviations and special character sequences such as numbers and monetary amounts. They cannot be used in phoneme mode.

Example: A sentence in which the word "John" is to receive extra stress.

Input: It <SP> was <SP> ~ John <SP> who <SP> did <SP> it.

Example: A sentence in which the word "series" receives less stress than normal.

Input: The <SP> giants <SP> won <SP> the <SP> world <SP> `series <SP> last <SP> year.


3.6 Acronyms

An acronym is an uppercase word containing at least two letters. The TTSC pronounces each individual letter of an acronym. If the last letter of an acronym is immediately followed by an apostrophe and the letter "s" (or "S"), the acronym is pronounced as a plural.

Example: It is SPI’s speech.

Input: It's <SP> SPI's <SP> speech.

Speech: It’s ess pee eyes speech.

The Upper Case Acronym Pronunciation Switch (7) affects the way all uppercase words are pronounced. When this switch is on (default), uppercase words are pronounced as acronyms in the manner described above; that is, they are spelled. When it is off, uppercase words are pronounced as regular words rather than as acronyms.
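The acronym test described in this section can be sketched in a few lines of Python. This is only an illustration of the stated rule (an all-uppercase word of at least two letters, optionally followed by 's or 'S); the function name `is_acronym` is hypothetical and is not part of the TTSC.

```python
def is_acronym(word: str) -> bool:
    """Return True if `word` matches the acronym rule of Section 3.6.

    Hypothetical helper: an acronym is an all-uppercase word of at least
    two letters, optionally followed by an apostrophe and "s" or "S".
    """
    # Strip the plural marker ('s or 'S) before testing the letters.
    if len(word) > 2 and word[-2] == "'" and word[-1] in "sS":
        word = word[:-2]
    return (
        len(word) >= 2
        and word.isascii()
        and word.isalpha()
        and word.isupper()
    )
```

With Switch 7 on, a word for which this test succeeds would be spelled letter by letter; with the switch off, it would be pronounced as a regular word.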


3.7 Abbreviations

The abbreviations in the following list are allowed as input. An abbreviation must contain at least one lowercase letter or it is recognized as an acronym (described above). The TTSC recognizes the abbreviations below and pronounces them as shown. In addition, two-letter state abbreviations and one- and two-letter direction abbreviations are pronounced under switch control as described in Section 4.

3.7.1 Days

Mon. Monday

Tue. Tuesday

Tues. Tuesday

Wed. Wednesday

Thu. Thursday

Thur. Thursday

Thurs. Thursday

Fri. Friday

Sat. Saturday

Sun. Sunday

3.7.2 Months

Jan. January Aug. August

Feb. February Sept. September

Mar. March Sep. September

Apr. April Oct. October

Jun. June Nov. November

Jul. July Dec. December

3.7.3 Names and Addresses

Bldg. building

Co. company

Dept. department

Hq. headquarters

Inc. incorporated

Ltd. limited

Univ. university

3.7.4 Thoroughfares

aly. alley

av. avenue

ave. avenue

blvd. boulevard

cir. circle

crk. creek

ct. court

ctr. center

cyn. canyon

dr. doctor or drive

expy. expressway

ft. fort or foot

fwy. freeway

hwy. highway

jct. junction

ln. lane

mt. mount

mtn. mountain

pk. park

pkwy. parkway

pky. parkway

pl. place

plz. plaza

rd. road

sq. square

st. saint or street

ter. terrace

terr. terrace

tpke. turnpike

tr. trail

wy. way

xing. crossing

3.7.5 Titles

dr. doctor or drive

gen. general

gov. governor

jr. junior

mgr. manager

ms. miz

mr. mister

mrs. misses

rev. reverend

secy. secretary

sr. senior

st. saint or street

3.7.6 Weights and Measures

cm. centimeters

mg. milligrams

ft. fort or foot

gal. gallon

gm. grams

hr. hours

hrs. hours

hz. hertz

KHz. kilohertz

MHz. megahertz

GHz. gigahertz

in. inches

kg. kilograms

km. kilometers

lb. pounds

lbs. pounds

mi. mile

min. minutes

mins. minutes

ml. milliliters

mm. millimeters

oz. ounce

ozs. ounces

pF picofarad

sec. seconds

secs. seconds

msec. millisecond

ns. nanoseconds

3.7.7 Miscellaneous

bc. bee see

cf. compare

cu. cubic

ea. each

e.g. for example,

e.g for example,

eg. for example,

etc. et cetera

ext. extension

fig. figure

gov't government

i.e. that is,

ie that is,

misc. miscellaneous

no. number

ppd. post paid

pm. pee em

tts text to speech

vol. volume

vs. versus

Examples:

Input: Mr. <SP> Jones <SP> lives <SP> on <SP> Moody <SP> Rd. <SP><SP> <MS>

Speech: Mister Jones lives on Moody Road.

Input: It <SP> was <SP> on <SP> Fri., <SP> Jan. <SP> first.

Speech: It was on Friday, January first.

Input: He's <SP> from <SP> Boston, <SP> Mass. <SP><SP> <MS>

Speech: He’s from Boston, mass.

3.7.8 Abbreviations at End of a Sentence

The period at the end of most abbreviations acts as a sentence terminator if, and only if, the next non-space character is an upper case letter. The period never acts as a sentence terminator when it is used in abbreviations followed by a person’s name.

Brig. brigadier Prof. professor

Bros. brothers Rev. reverend

Col. colonel Sgt. sergeant

Dr. doctor St. saint

Gen. general

Gov. governor

Lieut. lieutenant

Maj. major

Ms. miz

Mr. mister

Mrs. misses

Mt. mount

Rev. reverend

Sr. senior

St. saint

If an abbreviation is the last word in a sentence, follow it with two spaces, another sentence, or a "Commence Speak" escape sequence to force the sentence to be spoken.

Examples:

Input: Mr. Jones <SP> ate <SP> two <SP> kg. <SP> So <SP> what?

Speech: Mister Jones ate two kilograms. So what?

Input: Mr. Jones <SP> ate <SP> two <SP> kg. <SP> today.

Speech: Mister Jones ate two kilograms today.

Input: On <SP> tues. <SP> I <SP> left.

Speech: On tuesday. I left.

Input: The <SP> signal <SP> was <SP> 10 ms. <SP> late.

Speech: The signal was ten miz late.
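The end-of-sentence rule for abbreviations can be sketched as follows. This is a minimal illustration, not the TTSC's implementation: the function name is hypothetical, the title list is copied from the table above, and comparing abbreviations without regard to case is an assumption suggested by the "ms." example. The separate rules for word abbreviations (Section 3.7.9) are not covered.

```python
# Abbreviations that precede a person's name; their period never ends a sentence.
NAME_TITLES = {
    "brig", "bros", "col", "dr", "gen", "gov", "lieut", "maj", "ms",
    "mr", "mrs", "mt", "prof", "rev", "sgt", "sr", "st",
}


def abbreviation_period_ends_sentence(abbrev: str, following: str) -> bool:
    """Decide whether the period of `abbrev` terminates the sentence.

    `following` is the input text that comes right after the period.
    """
    stem = abbrev.rstrip(".").lower()  # case-insensitive match (assumption)
    if stem in NAME_TITLES:
        return False
    # Look at the next non-space character, if there is one.
    nxt = following.lstrip(" ")[:1]
    return nxt.isascii() and nxt.isupper()
```

For example, `abbreviation_period_ends_sentence("kg.", " So what?")` is true, while the same abbreviation followed by " today." is not treated as the end of a sentence.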

3.7.9 Word Abbreviations

Most abbreviations are treated as abbreviations whether they are followed by a period or not. However, some character strings can be abbreviations or words. These strings are:

co. company Mar. March

fig. figure no. number

gal. gallon Sat. Saturday

in. inches Sun. Sunday

Jan. January Wed. Wednesday

The rules for word abbreviations are listed below:

1. If the string is not terminated by a period, it is treated as a word.

2. If the string is terminated by a period and the string could be an abbreviation for a month or day of the week, and the first character in the string is capitalized, the string will be treated as an abbreviation. If the first character is not capitalized, the string is treated as a word, and the period retained.

3. The period is retained in the following cases:

• The string is not an abbreviation for a month or day of the week and is at the end of a sentence.

• The next non-space character is a control character.

• The next non-space character is another period.

4. If the next non-space character is a lower case letter or a digit (the only cases left), the string is treated as an abbreviation, and the period is deleted.

Examples:

Input: Mr. <SP> Jones <SP> looked <SP> at <SP> fig. <SP> three.

Speech: Mister Jones looked at figure three.

Input: He <SP> ate <SP> a <SP> fig. <SP> She <SP> ate <SP> one <SP> too.

Speech: He ate a fig. She ate one too.

Input: Mr <SP> Jones <SP> looked <SP> at <SP> fig <SP> three.

Speech: Mister Jones looked at fig three.

Input: He's <SP> coming <SP> next <SP> Wed. <SP> So <SP> is <SP> she.

Speech: He’s coming next Wednesday. So is she.

Input: Give <SP> me <SP> one <SP> gal. <SP> of <SP> gas.

Speech: Give me one gallon of gas.

Input: I <SP> know <SP> a <SP> nice <SP> gal.

Speech: I know a nice gal.

Input: Jan <SP> was <SP> wed <SP> on <SP> sun. <SP> last <SP> week.

Speech: Jan was wed on sun. Last week.


3.8 Special Abbreviations

The abbreviations "dr.," "st.," and "ft." are specially handled.

A double period is used to force the abbreviation to be recognized as the end of sentence as shown in the last example below.

These three abbreviations may appear without any period at all. In this case the same rules are used to determine what the abbreviation stands for.

3.8.1 Doctor and Drive

If the next non-space character following "dr." is an upper case letter, "dr." is pronounced "doctor," otherwise it is pronounced as "drive." The period in "dr." is never recognized as the end of the sentence.

3.8.2 Street and Saint

If the next non-space character following "st." is an upper case letter, "st." is pronounced "saint," otherwise it is pronounced as "street." The period in "st." is never recognized as the end of the sentence.

3.8.3 Fort and Foot

If the next non-space character following "ft." is an upper case letter, "ft." is pronounced "fort," otherwise it is pronounced as "foot." The period in "ft." is never recognized as the end of the sentence.

Examples:

Input: I <SP> live <SP> on <SP> Oak <SP> Dr. <SP> near <SP> John.

Speech: I live on Oak Drive near John.

Input: Dr. <SP> Bernstein <SP> cured <SP> him.

Speech: Doctor Bernstein cured him.

Input: Dr <SP> Bernstein <SP> cured <SP> him.

Speech: Doctor Bernstein cured him.

Input: I <SP> live <SP> on <SP> Oak <SP> Dr. <SP> So <SP> does <SP> she.

Speech: I live on Oak Doctor So does she.

Input: I <SP> live <SP> on <SP> Oak <SP> Dr <SP> <SP> So <SP> does <SP> she.

Speech: I live on Oak Doctor So does she.

Input: I <SP> live <SP> on <SP> Oak <SP> Dr.. <SP> So <SP> does <SP> she.

Speech: I live on Oak Drive. So does she.
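The choice between the two readings of "dr.", "st.", and "ft." depends only on the next non-space character, which makes it easy to sketch. The helper below is a hypothetical illustration; `expand_special` and its arguments are not TTSC names. The `following` argument is the text after the abbreviation and its optional single period, so a double period shows up as a leading "." in `following`.

```python
# (reading before an upper case letter, reading otherwise)
SPECIAL = {
    "dr": ("doctor", "drive"),
    "st": ("saint", "street"),
    "ft": ("fort", "foot"),
}


def expand_special(abbrev: str, following: str) -> str:
    """Pick the pronunciation of "dr", "st" or "ft" (period optional)."""
    before_upper, otherwise = SPECIAL[abbrev.rstrip(".").lower()]
    # Only the next non-space character matters.
    nxt = following.lstrip(" ")[:1]
    if nxt.isascii() and nxt.isupper():
        return before_upper
    return otherwise
```

Because a second period is not an upper case letter, `expand_special("Dr.", ". So does she.")` yields "drive", matching the last example above.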


3.9 Words Without Vowels

Words that do not contain a vowel (a, e, i, o, u, y) and are not abbreviations are spelled out letter by letter.

Example:

Input: It <SP> was <SP> grbld.

Speech: It was jee are bee ell dee.
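A minimal sketch of the vowel test follows. The function name and the `abbreviations` parameter are hypothetical; the caller is assumed to pass in whichever abbreviations from Section 3.7 are recognized.

```python
VOWELS = set("aeiouy")


def is_spelled_out(word: str, abbreviations: frozenset = frozenset()) -> bool:
    """Return True if `word` would be spelled letter by letter.

    A word qualifies when it contains no vowel (a, e, i, o, u, y) and is
    not one of the recognized abbreviations supplied by the caller.
    """
    lower = word.lower()
    if lower in abbreviations:
        return False
    return lower.isalpha() and not (set(lower) & VOWELS)
```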


3.10 Numbers

Numbers may appear in the input text, with optional commas to block off groups of three digits. Numbers using commas as three-digit group separators are pronounced using the terms trillion, billion, million, thousand, and hundred, as appropriate.

Examples:

Input: 7

Speech: seven

Input: 0

Speech: zero

Input: 15

Speech: fifteen

Input: 017

Speech: zero seventeen

Input: 98

Speech: ninety eight

Input: +10

Speech: plus ten

Input: 325

Speech: three twenty five

Input: 1234

Speech: twelve thirty four

Input: 1,990

Speech: one thousand nine hundred ninety

Input: 1,017

Speech: one thousand seventeen

Input: 981,234,567,890,123

Speech: nine hundred eighty one trillion, two hundred thirty ...

3.10.1 Numbers without Commas

Numbers that contain five or more digits and do not contain commas are pronounced digit by digit.

Examples:

Input: 19902

Speech: one nine nine zero two

Input: 34567890123

Speech: three four five six seven eight nine zero one two three

3.10.2 Large Numbers

A comma-blocked number with 16 or more digits is pronounced as a list of numbers rather than one large number.

Example:

Input: 1,981,234,567,890,123

Speech: one, nine eighty one, two thirty four, ...

3.10.3 Numbers with Inconsistent Commas

Commas that don’t block off three-digit groups are assumed to be separating a list of numbers.

Example:

Input: 76,34

Speech: seventy six, thirty four

Input: 305,67,890

Speech: three oh five, sixty seven, eight ninety

Input: 305,67,<SP>890

Speech: three oh five, sixty seven, eight ninety
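The comma rules in Sections 3.10 through 3.10.3 amount to a small classification step, sketched below. The function name and the four labels are hypothetical, chosen only for this illustration; the sketch does not handle signs, decimal points, or embedded <SP> characters.

```python
import re

# One to three leading digits, then one or more comma-separated 3-digit groups.
GROUPED = re.compile(r"[0-9]{1,3}(,[0-9]{3})+")


def classify_number(token: str) -> str:
    """Classify a run of digits and commas by how it would be pronounced."""
    if not re.fullmatch(r"[0-9][0-9,]*", token):
        raise ValueError(f"not a plain number: {token!r}")
    digits = token.replace(",", "")
    if "," not in token:
        # Five or more digits without commas are read digit by digit.
        return "digit_by_digit" if len(digits) >= 5 else "short"
    if GROUPED.fullmatch(token) and len(digits) < 16:
        # Properly blocked: read with thousand, million, billion, trillion.
        return "full_number"
    # Inconsistent commas, or 16 or more digits: read as a list of numbers.
    return "number_list"
```

Here "short" stands for comma-free numbers of up to four digits, which are read as in the first examples of Section 3.10 (for instance, 325 as "three twenty five").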

3.10.4 Numbers with Decimal Points

The decimal point following a number is pronounced "point" if the next character is a digit. If not, the period acts as a sentence terminator. Digits following a decimal point are pronounced one by one.

Examples:

Input: .7

Speech: point seven

Input: .76034

Speech: point seven six zero three four

Input: 0.7

Speech: zero point seven

Input: He <SP> saw <SP> 10. <SP> I <SP> did <SP> too.

Speech: He saw ten. I did too.

3.10.5 Numbers with Ordinalizers

Numbers with trailing ordinalizers "st," "nd," "rd," or "th" are pronounced accordingly. If the ordinalizer is inconsistent with the number preceding it (such as "3st"), it is spoken as normal text. Apostrophes (3'rd) are not permitted.

Examples:

Input: 32nd

Speech: thirty second
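One way to check whether an ordinalizer is consistent with its number is sketched below. The manual does not say how the TTSC performs this check; the sketch assumes the ordinary English convention (11th, 12th, and 13th take "th"), and both function names are hypothetical.

```python
import re


def expected_ordinalizer(n: int) -> str:
    """Return the English ordinal suffix for n (assumed convention)."""
    if 11 <= n % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def is_consistent_ordinal(token: str) -> bool:
    """True if token is digits followed directly by the matching suffix."""
    m = re.fullmatch(r"([0-9]+)(st|nd|rd|th)", token)
    return bool(m) and m.group(2) == expected_ordinalizer(int(m.group(1)))
```

A token such as "3st" or "3'rd" fails the check and would be spoken as normal text.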


3.11 Time Of Day

Numbers separated by a colon are used to designate time. The following rules apply:

· If the next non-space characters following a colon are "00", they are pronounced as "o’clock."

· If two digits other than "00" follow a colon, the digits are spoken, but the colon is not.

Examples:

Input: 12:35

Speech: twelve thirty five

Input: 3:00 <SP> today

Speech: three o’clock today

Input: 3:05 <SP> a.m.

Speech: three oh five ay em

Input: 3:05am

Speech: three oh five am

Input: 3:00 <SP> a.m.

Speech: three o’clock ay em

Input: 23:01 <SP> pm

Speech: twenty three oh one pee em

Input: 1:25:20 <SP> a.m.

Speech: one twenty five twenty ay em

Input: 1:2:204:25

Speech: one two two oh four twenty five
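The two colon rules can be sketched for the common case where every field has one or two digits. The function names are hypothetical, and the "oh" reading of a leading zero is taken from the "3:05" example rather than from a stated rule. Fields of three or more digits (as in the last example) and a.m./p.m. suffixes are outside this sketch.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digit_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def speak_time(token: str) -> str:
    """Spell out a colon-separated time such as "3:05" or "1:25:20"."""
    fields = token.split(":")
    if len(fields) < 2 or not all(
        f.isascii() and f.isdigit() and len(f) <= 2 for f in fields
    ):
        raise ValueError(f"unsupported time: {token!r}")
    words = [two_digit_words(int(fields[0]))]
    for field in fields[1:]:
        if field == "00":
            words.append("o'clock")          # "00" after a colon
        elif len(field) == 2 and field[0] == "0":
            words.append("oh " + ONES[int(field[1])])
        else:
            words.append(two_digit_words(int(field)))
    return " ".join(words)                    # the colon itself is not spoken
```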


3.12 Monetary Amounts

Monetary amounts must appear in the input text with a leading dollar sign, followed by a number. Ordinalizers are not allowed. The number may be followed by one of the words "thousand," "million," "billion," or "trillion." In the following examples, the Digit Pronunciation Switch is off and the Full Number Pronunciation Switch is on.

Examples:

Input: $12

Speech: twelve dollars

Input: $135

Speech: one hundred thirty five dollars

Input: $.10

Speech: ten cents

Input: $1.10

Speech: one dollar and ten cents

Input: $0.10

Speech: zero dollars and ten cents

Input: $1 <SP> million

Speech: one million dollars

Input: $1.256

Speech: one point two five six dollars

Input: $1.2

Speech: one point two dollars

Input: $1.256 <SP> million

Speech: one point two five six million dollars

Input: $ <SP>

Speech: dollar

Input: $ <SP> 1

Speech: dollar one

These pronunciations change to reflect "adjective" status when preceded by "a" or "an" or a number.

Examples:

Input: A <SP> $1.10 <SP> check <SP> is <SP> too <SP> much.

Speech: A one dollar and ten cent check is too much.

Input: He <SP> gave <SP> me <SP> 15 <SP> $10 <SP> dollar <SP> bills.

Speech: He gave me fifteen ten dollar bills.
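The examples above suggest four spoken forms for a monetary amount, depending on whether a whole part is present and how many digits follow the decimal point. The sketch below classifies a token accordingly. It is an extrapolation from the examples, not a documented rule: the function name and labels are hypothetical, and treating every fraction that is not exactly two digits as a "point" reading is an assumption based on the $1.2 and $1.256 examples. Comma-blocked amounts and trailing words such as "million" are not handled.

```python
import re

# "$", optional whole part, optional fraction; no spaces allowed.
MONEY = re.compile(r"\$([0-9]*)(?:\.([0-9]+))?")


def classify_money(token: str) -> str:
    """Classify a "$..." token by the form its pronunciation takes."""
    m = MONEY.fullmatch(token)
    if not m or (not m.group(1) and not m.group(2)):
        # A lone "$", or a "$" separated from the number, is not an amount.
        return "not_monetary"
    whole, fraction = m.group(1), m.group(2)
    if fraction is None:
        return "dollars"                      # $12
    if len(fraction) == 2:
        # $1.10 and $0.10 name both parts; $.10 names cents only.
        return "dollars_and_cents" if whole else "cents"
    return "decimal_dollars"                  # $1.2, $1.256 (assumption)
```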


3.13 Separators

Words are usually separated from each other by one or more space (<SP>) characters. They are also separated by the following:

Control sequences

Characters that are not a through z or A through Z.

Apostrophes (')

Multiple consecutive <SP> characters are permissible, but do not lengthen the pause between words.

Words are also separated when the maximum word length is reached. The maximum word length is 29 characters. If a word separator has not been encountered by the 29th character, the word is terminated automatically at the 29th character, and the remainder of the word is treated as a new word.

Example: A sentence with <SP> separators and a very long word.

Input: The <SP> coal <SP> miner <SP> had <SP> pneumonoultramicroscopicsilicovolcanoconiosos.

Speech: The coal miner had pneumonoultramicroscopicsilicovolcanoconiosos.
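The maximum-word-length rule can be illustrated with a short sketch. The function name is hypothetical; the sketch assumes the remainder is itself split again if it still exceeds 29 characters, which follows from treating it as a new word.

```python
MAX_WORD_LENGTH = 29  # maximum word length stated in this section


def split_long_word(word: str, limit: int = MAX_WORD_LENGTH) -> list[str]:
    """Split a word into pieces of at most `limit` characters."""
    return [word[i:i + limit] for i in range(0, len(word), limit)]
```

Applied to the 45-letter word in the example above, this yields a 29-character piece followed by a 16-character piece.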

Several control characters act like the <SP> character. These are the carriage return <CR>, line feed <LF>, and <TAB>. All are pronounced as the word "space" if the Space Pronunciation Switch is on.

The occurrence of a <SP> or a control sequence within certain text character sequences can affect the way the sequence is spoken. For example, a <SP> or control sequence between a "$" and a number inhibits these characters from being recognized as a monetary amount.


3.14 Apostrophe Character

The apostrophe text character is used within a word to indicate a contraction or possessive. It has no effect on the spoken output.

Examples:

Input: 'bout

Input: don't

Input: Valdez'

Input: John's


3.15 Sentence Terminators

The period, question mark, and exclamation point text characters are used as sentence terminators. These characters cause a short pause in the speech at the end of the sentence and cause the appropriate pitch intonation to occur.

The rules for terminal punctuation are listed below:

• A period that is immediately followed by an alphabetic character is ignored.

• If a single letter is followed by a period and fewer than two <SP> characters, the period is ignored. This is useful for forms such as "Man from U.N.C.L.E." or "J.R. Roberts."

• Real end-of-sentence periods should be followed by two <SP> characters to ensure that they are interpreted as periods.

Example:

Input: She <SP> saw <SP> it. <SP><SP> Who <SP> are <SP> you? <SP><SP> Give <SP> it <SP> to <SP> me! <SP><SP>

Speech: She saw it. Who are you? Give it to me!
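The two rules for ignored periods can be sketched as a single test on the characters around the period. This is a hypothetical helper written for illustration; it covers only the rules listed above and not the abbreviation handling of Section 3.7.

```python
def period_is_ignored(text: str, i: int) -> bool:
    """Return True if the period at text[i] is not a sentence terminator."""
    if text[i] != ".":
        raise ValueError("text[i] must be a period")
    after = text[i + 1:]
    # Rule 1: a period immediately followed by a letter is ignored.
    if after[:1].isalpha():
        return True
    # Rule 2: a single letter, then a period, then fewer than two spaces.
    single_letter = (
        i >= 1
        and text[i - 1].isalpha()
        and (i == 1 or not text[i - 2].isalpha())
    )
    if single_letter:
        spaces = len(after) - len(after.lstrip(" "))
        return spaces < 2
    return False
```

In "J.R. Roberts" both periods are ignored, while the period in "She saw it." followed by two spaces is kept as a sentence terminator.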


3.16 Sentence Internal Punctuation

The text characters listed below are used within a sentence. They cause a short pause and a "comma-type" pitch intonation in the speech.

Character    ASCII Name

,    Comma

;    Semicolon

(    Opening Parenthesis

[    Opening Bracket

{    Opening Brace

The text characters listed below are also used within a sentence. They cause a short pause and a "comma-type" pitch intonation in the speech only if the next non-space character is alphabetic or a digit.

Character    ASCII Name

"    Quotation Mark

)    Closing Parenthesis

]    Closing Bracket

}    Closing Brace

Example:

Input: Smith <SP> (the <SP> plaintiff) <SP> lost <SP> the <SP> case.

Speech: Smith, the plaintiff, lost the case.
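The difference between the two groups of characters can be sketched as follows. The names are hypothetical, and the sketch only reports whether a pause would occur, not how it is rendered in speech.

```python
ALWAYS_PAUSE = set(",;([{")        # first table: always cause a pause
CONDITIONAL_PAUSE = set('")]}')    # second table: depends on what follows


def causes_pause(text: str, i: int) -> bool:
    """Return True if the character at text[i] causes a comma-type pause."""
    ch = text[i]
    if ch in ALWAYS_PAUSE:
        return True
    if ch in CONDITIONAL_PAUSE:
        # Pause only if the next non-space character is a letter or digit.
        nxt = text[i + 1:].lstrip(" ")[:1]
        return nxt.isascii() and nxt.isalnum()
    return False
```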


3.17 Colon Character

The colon causes a short pause and a "comma-type" pitch intonation in the speech.

Example:

Input: Try <SP> this: <SP> A <SP> chilled, <SP> sliced <SP> pear.

Speech: Try this, a chilled, sliced pear.

Numbers separated by a colon are used to designate time.

Example:

Input: 12:35

Speech: twelve thirty five


3.18 Hyphen Character

The hyphen is usually treated as a word separator or <SP>. When the Minus Sign Pronunciation Switch (Switch 5) is on, every hyphen (-) character followed by a digit ("0" through "9") is pronounced as the word "minus."

See the description of the Minus Sign Pronunciation Switch, Section 4, for details.

Input: A <SP> text-to-speech <SP> converter.

Speech: A text to speech converter.
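The effect of the Minus Sign Pronunciation Switch on a single hyphen can be sketched as below. The function name and return labels are hypothetical; the switch state is passed in explicitly because its default is described in Section 4 rather than here.

```python
def hyphen_role(text: str, i: int, minus_switch_on: bool) -> str:
    """Return "minus" or "separator" for the hyphen at text[i]."""
    if text[i] != "-":
        raise ValueError("text[i] must be a hyphen")
    nxt = text[i + 1:i + 2]  # the character immediately after the hyphen
    if minus_switch_on and nxt.isascii() and nxt.isdigit():
        return "minus"
    return "separator"
```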


3.19 Miscellaneous Punctuation Characters

The punctuation characters listed in Table 3-1 are pronounced as indicated when encountered in the input text.

Example:

Input: John <SP> is <SP> faster <SP> than <SP>Pat & Joe.

Speech: John is faster than Pat and Joe.

Input: This <SP> is <SP> 4/13/82.

Speech: This is four slash thirteen slash eighty two.

When the Punctuation Pronunciation Switch (Switch 2) is on, punctuation characters are spoken as indicated in Section 4.


3.20 Unreadable Input

Although the input text is usually readable, it does not have to be. Text that is totally unreadable can be sent to the TTSC, and it will make an attempt to say it. As an example, consider the following input text:

Input: jA$Kajkkk(%&<SP>]b<SP><SP>./???.,aR-72KO+/#&!*

To analyze the input example, use the following steps:

1. Insert <SP> characters in the places where the TTSC assumes word boundaries, and compress multiple <SP> characters into single ones:

jA <SP> $ <SP> Kajkkk <SP> (%& <SP>
] <SP> b <SP> ./???., <SP> aR <SP>
72 <SP> KO <SP>+/#&!*

2. Next, change miscellaneous fixed-pronunciation punctuation into its corresponding words, and adjust upper/lower case for easier reading:

ja <SP> dollar <SP> kajkkk <SP> (
<SP> percent <SP>and <SP> ] <SP>
b <SP> . <SP> slash <SP> ???., <SP>
ar <SP> 72 <SP> KO <SP> plus <SP>
slash <SP> number <SP> and <SP> !
<SP> asterisk

3. Change numbers to full pronunciation form.

4. Change sentence internal punctuation to commas.

5. Change single letters and acronyms to letter-pronunciation form.

6. Change extraneous sentence terminators to periods, and delete unnecessary <SP> characters:

ja <SP> dollar <SP> kajkkk <SP>,
percent <SP> and, bee . <SP>
slash <SP> ...., ar <SP> seventy
<SP> two <SP> kay <SP> oh <SP>
plus slash number <SP> and! asterisk

The text is now in an almost-readable form. The TTSC will now attempt to pronounce the non-words "ja," "kajkkk," and "ar."
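The punctuation-to-word substitution used in the analysis above can be sketched as a simple lookup. The mapping below contains only the pronunciations that appear in the examples of Sections 3.19 and 3.20; it is not the complete table, and the function name is hypothetical. A "$" that directly precedes a number is a monetary amount (Section 3.12) and would need to be handled before this step.

```python
# Pronunciations taken from the examples in this section; not a complete table.
PUNCTUATION_WORDS = {
    "$": "dollar",
    "%": "percent",
    "&": "and",
    "/": "slash",
    "#": "number",
    "+": "plus",
    "*": "asterisk",
}


def expand_punctuation(text: str) -> str:
    """Replace fixed-pronunciation punctuation with its spoken word."""
    pieces = []
    for ch in text:
        if ch in PUNCTUATION_WORDS:
            pieces.append(f" {PUNCTUATION_WORDS[ch]} ")
        else:
            pieces.append(ch)
    # Collapse the runs of spaces introduced around the inserted words.
    return " ".join("".join(pieces).split())
```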
