ConText: Questions & Answers
For more information contact 415.506.4514 or infotext@us.oracle.com
_________________________________________________________________
Version 1.1
July 1994
Table of Contents:
I. INTRODUCTION
1.1 What is Oracle ConText?
1.2 Why use Oracle ConText?
1.3 Is Oracle ConText a document management system, proofreader, or
text retrieval application?
1.4 How do I use Oracle ConText?
1.5 What are some applications of the Oracle ConText technology?
1.6 How does Oracle ConText differ from other language processing
technology?
1.7 Does Oracle ConText have any competitors?
1.8 What types of writing can Oracle ConText process?
II. LANGUAGE ANALYSIS
2.1 How does Oracle ConText analyze language?
2.2 How does Oracle ConText know what is important to me, the reader?
2.3 Where does Oracle ConText get the vocabulary necessary for
processing language?
2.4 Will Oracle ConText recognize the vocabulary used in my industry?
2.5 Can I add words to the Oracle ConText lexicon?
2.6 What happens when Oracle ConText cannot recognize a word or
phrase?
2.7 Does Oracle ConText recognize British English?
III. LINGUISTIC FEATURES
3.1 What information about text does Oracle ConText generate?
3.2 What type of theme information does Oracle ConText produce?
3.3 Can I affect the theme information generated by Oracle ConText?
3.4 What grammatical analysis does Oracle ConText provide?
3.5 What statistical analysis does Oracle ConText provide?
3.6 What indexing information does Oracle ConText generate?
3.7 Does Oracle ConText provide parts of speech or parse trees?
IV. PROCESSING TEXT
4.1 Does Oracle ConText rewrite documents?
4.2 What type of text input does Oracle ConText require?
4.3 Can Oracle ConText analyze text containing grammatical errors?
4.4 How does text get "into" and "out of" Oracle ConText?
4.5 How is the output from Oracle ConText presented?
4.6 What is the size of the output generated by Oracle ConText?
4.7 Can Oracle ConText output be stored?
4.8 Does a document have to be reprocessed each time I want a
different type of output?
V. INTEGRATING WITH ORACLE CONTEXT
5.1 What components are required to integrate Oracle ConText?
5.2 Are development resources required to integrate Oracle ConText
with an application?
5.3 Do I need to know the C programming language to create components
for Oracle ConText?
5.4 Is Oracle ConText built on the Oracle7 Server?
5.5 Does Oracle ConText support a client/server architecture?
5.6 Does Oracle ConText provide any example components?
VI. DEVELOPMENT PLANS
6.1 Is Oracle ConText being integrated with Oracle products?
6.2 What are the plans for the Oracle ConText API?
6.3 Will Oracle ConText be integrated with any non-Oracle products?
6.4 What type of National Language Support (NLS) does Oracle ConText
provide?
VII. GENERAL QUESTIONS
7.1 How long has Oracle ConText been in development?
7.2 On which platforms is Oracle ConText available?
7.3 What are the system requirements for Oracle ConText?
7.4 How fast does Oracle ConText perform?
7.5 Can Oracle ConText analyze documents of any length?
7.6 What is included with Oracle ConText?
7.7 How much does Oracle ConText cost?
I. INTRODUCTION
1.1 What is Oracle ConText?
Oracle ConText is a natural language processing technology that identifies
themes and content in English text. Because it "understands" the text it
processes, it can extract all the vital information contained in a text block
as well as determine the meaning of the text.
1.2 Why use Oracle ConText?
Like most business professionals today, you probably don't have enough time
to read all the documentation, reports, and trade journals that provide the
information you need to do your job well.
Oracle ConText offers the solution to this information challenge by identifying
themes and content in text to create powerful new management and navigation
methods for electronically-stored text.
For example, Oracle ConText can "read" your documents, and systematically and
intelligently condense them into concise document summaries and outlines.
ConText creates summaries and outlines using theme and meaning rather than
simple word frequency or other static methods.
Oracle ConText can also accelerate the process of looking for information
across multiple documents. For example, Oracle ConText's MasterIndex system
creates index entries by extracting not only key words, but also every piece
of information in the text of a document, as well as the relationships between
the information. These index entries can then be gathered from multiple
documents and consolidated into global indexes.
1.3 Is Oracle ConText a document management system, proofreader, or
text retrieval application?
No. Oracle ConText is an application-independent component technology that
neither stores nor manages online text, nor corrects grammar and spelling in
text. The power of Oracle ConText's language processing would be underutilized
on simple text retrieval tasks such as pattern matching or keyword searches.
Rather, Oracle ConText determines the meaning and themes contained in the text
of your documents.
1.4 How do I use Oracle ConText?
Oracle ConText's collection of advanced linguistic functions can be integrated,
through a C language application programmer's interface (API), with any system
dealing with text. Using these functions, you can add an intelligent layer
of language-processing to the tasks performed by text retrieval, document
management, proofreading, and other document automation tools.
Please note that Oracle ConText does not provide components for integrating
the ConText functions with another system; you must supply the necessary
components. And because experienced linguists and C programmers are needed to
build these components, you must be an approved partner before beginning
development with Oracle ConText.
However, you can use Oracle TextServer3, Oracle's first fully-integrated text
management solution, to access many of the ConText functions without any
knowledge of linguistics or C programming. Formerly SQL*TextRetrieval, Oracle
TextServer3 provides powerful tools from Oracle's Cooperative Development
Environment (CDE) and Cooperative Server Technology (CST) for building
client-server applications that can store text in a database and retrieve the
information using predefined synonym families, sound-alike words, and
structured fields.
Oracle ConText provides Oracle TextServer3 with a number of ready-to-use,
enhanced text retrieval and viewing capabilities, including:
- filtering of words that Oracle TextServer3 tracks in a document so that the
most thematically prominent words and phrases are used for more precise and
quicker retrieval of documents.
- document querying by theme, rather than simply by key words.
- speed-reading and summarizing of the documents in a database (thematic
output is stored in the database along with each document).
For more information about TextServer3, available in the Fall of 1994, please
consult Oracle TextServer3 documentation.
1.5 What are some applications of the Oracle ConText technology?
Some practical applications of Oracle ConText include:
- Automatic summaries for online documents and mail messages. Fixed-width
subject fields and fixed-length file naming restrictions often prevent the
subject line of a message or the file name of a document from accurately
reflecting the content of the text. Oracle ConText can quickly analyze the
text of a document or message to produce more meaningful and intelligent
access methods for the text.
It can also summarize the contents of a document or message for quicker reading
or review. Summarized information can be critical for accessing documents or
messages over costly modem or wireless connections.
- Automatic evaluation and forwarding of electronic mail messages. ConText
can read a message, determine the main themes, and pass that information to
an electronic mail system for automatic forwarding to the appropriate
recipients.
- Automatic hypertext linking in online information. Oracle ConText can
determine which sections and words in a document are thematically related and
identify the exact position of these words. Then, an online document design
application can automatically add the hypertext links.
- Intelligent information extraction. Because it understands the text it
processes, Oracle ConText can enable information-gathering agencies, such as
online news services or government agencies, to build advanced applications
for tracking and extracting specific information and trends.
1.6 How does Oracle ConText differ from other language processing
technology?
Systems that attempt to process text typically rely more on word recognition
and repetition than any true understanding of the text. Oracle ConText
represents a new paradigm in language analysis.
Oracle ConText focuses on grammatical content and theme to determine the actual
meaning of the text it processes. It recognizes that the position and role
of a word, more than the repeated occurrence of the word, influence how the
word contributes to the meaning of the surrounding text. In effect, it
determines meaning in text by answering such questions as:
- "What grammatical elements are present in the text?"
- "What grammatical and thematic relationships exist between the individual
elements?"
- "Does an element contribute to the main idea of the sentence, or does it
provide supporting detail for the main ideas?"
- "Within the context of the surrounding text, how do the elements contribute
to the development of theme?"
1.7 Does Oracle ConText have any competitors?
For the most part, no. There is no other technology commercially available
today that matches Oracle ConText's ability to process text and understand
the themes and concepts contained in the text. And integrated with document
summarization, indexing, viewing, retrieval, or navigation tools, Oracle
ConText can create "language-intelligent" applications with abilities beyond
most standard applications.
For example, a standard text retrieval system usually relies on a "brute force"
statistical approach, tracking or "indexing" every word in the text, then
counting the occurrences of each word or phrase to determine the "key words"
for the text. You can then specify these key words when searching for and
retrieving text.
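The "brute force" approach described above can be illustrated with a minimal
sketch. It is written in Python for brevity and is not taken from any Oracle
product; the function name key_words and the sample sentences are assumptions
made only for this illustration.

```python
import re
from collections import Counter


def key_words(text, top_n=3):
    """Return the top_n most frequent words in text, ignoring case."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word, _ in Counter(words).most_common(top_n)]


# Pure counting finds the repeated content word here...
print(key_words("The font menu lists each font. Pick a font, then print.", 1))
# ...but it ranks function words first as soon as they repeat more often.
print(key_words("The cat and the dog and the bird", 2))
```

The second call shows the weakness of relying on repetition alone: the most
frequent words in a sentence are often the ones that carry the least meaning.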
Oracle TextServer3 provides a powerful text management system for quickly and
easily accessing text stored in a database. It utilizes the same methodology
as a standard text retrieval system; however, it does not rely solely on word
repetition for querying and retrieving text. Oracle ConText, with its content-
and theme-based language analysis, provides TextServer3 with enhanced retrieval
features, such as query-by-theme and text reduction for intelligent text
tracking, as well as advanced text viewing and summarizing capabilities.
1.8 What types of writing can Oracle ConText process?
Oracle ConText is capable of analyzing hundreds of writing styles and types,
ranging from highly structured, complex writing to more informal, simple
writing. It is extremely well suited for business, instructional, and
technical communication.
Some examples of the types of documents that Oracle ConText can analyze
include:
- newspaper articles
- legal documents
- patents and patent applications
- technical and scientific journals
- multiple-topic documents, such as encyclopedias and newspapers
- electronic-mail messages
Oracle ConText is not as well-suited for processing transcriptions of
unstructured spoken word, such as colloquial dialogue or casual conversation.
This type of written communication often contains incomplete or rambling
sentences that do not provide a clear, linear development of theme.
In addition, ConText does not work well with non-natural languages such as
computer programming languages. However, a technical manual containing examples
of a computer programming language can be successfully analyzed if the examples
are first removed.
II. LANGUAGE ANALYSIS
2.1 How does Oracle ConText analyze language?
Oracle ConText uses a linguistic routine that simulates the complex human
process that takes place when you read text. Because this process is so
complex, Oracle ConText does not rely on a single linguistic approach to
arrive at its understanding of the text. Instead, it uses what can best be
described as a "working" approach, combining principles and rules from a
variety of diverse linguistic theories to produce the best overall results.
Beginning with the smallest grammatical unit, individual words or word phrases,
ConText identifies the grammatical function of each word in a sentence, taking
into account the word's placement in the sentence and its relationships, or
bindings, to the surrounding words. It then determines the thematic function,
if any, of the word in the sentence. These grammatical and thematic assessments
provide the basis for ConText's analysis.
As it encounters successively larger text blocks (sentences, paragraphs, or
the whole document), ConText systematically expands its analysis to add the
new information to its knowledge base. Using this method, ConText can identify
informational content as it is introduced and can track the development of
themes across sentences and paragraphs.
2.2 How does Oracle ConText know what is important to me, the reader?
When analyzing themes in text, it is often misleading to try to determine
"importance" as it relates to the reader, because importance relies on
knowledge of the reader's intent.
Oracle ConText does not presume to know what is important to the individual
reader. Instead, it weighs the thematic prominence of a piece of text as it
relates to the understanding of the text as a whole. A piece of text is
important only within the boundaries of the surrounding text and only when it
provides insight into the meaning of the text.
In addition, programmable settings in the API allow you to customize Oracle
ConText. Through these settings, you can ensure that the specific information
that you are interested in extracting from your text is always assigned the
proper thematic prominence.
2.3 Where does Oracle ConText get the vocabulary necessary for
processing language?
Oracle ConText gets its "knowledge" of the English language from the Oracle
ConText lexicon -- an extensive, dictionary-like collection of more than
600,000 words and phrases, with up to 1,000 units of linguistic knowledge,
called bindings, for each word.
2.4 Will Oracle ConText recognize the vocabulary used in my industry?
For the most part, yes. The ConText lexicon includes many of the terms and
phrases used in more than 1,000 industries and fields of study. While the
coverage in a particular area may not be extensive, the lexicon provides broad
coverage of such diverse subjects as pharmaceutical manufacturing, aviation,
finance, ornithology, and hair care.
The lexicon also provides extensive coverage of geographical areas, government
agencies, company names (with types of business), and product names (with types
of product). And the lexicon is continually updated and enhanced to reflect
the latest trends and developments in every subject or area.
2.5 Can I add words to the Oracle ConText lexicon?
Currently, no. However, future releases of Oracle ConText will include a tool
for creating user dictionaries that can be used in conjunction with the
embedded lexicon.
In a user dictionary, you will be able to define the specific words and phrases
that you want ConText to recognize. You will also be able to use a user
dictionary to customize the behavior, thematic value, index properties, and
conceptual family assigned to existing words and phrases in the lexicon.
2.6 What happens when Oracle ConText cannot recognize a word or phrase?
Oracle ConText does not delete or ignore a word or word phrase that is not
included in the system lexicon. Instead, Oracle ConText assigns greater
thematic prominence to the word as a safeguard against the word being
mishandled. As a result of the word's increased thematic prominence, the word
may appear as one of the themes that Oracle ConText extracts from the
surrounding text.
For many applications, the function of the word is more important than its
precise meaning. Most domain-specific words are either simple nouns or regular
verbs whose function is easily recognized by ConText.
2.7 Does Oracle ConText recognize British English?
Yes. Most grammar and spelling variations between British English and American
English are not substantial enough to affect Oracle ConText's parsing. However,
Oracle ConText has been designed to properly recognize and account for those
variations that might have an effect. For example, the lexicon currently
recognizes most British spelling versions, such as labour and honour, and
processes them identically to the American spellings.
Furthermore, two of the primary grammatical references used in the development
and testing of ConText, A Grammar of Contemporary English and The Oxford
English Dictionary, were written by British authors.
III. LINGUISTIC FEATURES
3.1 What information about text does Oracle ConText generate?
Oracle ConText produces four main types of output:
- theme information
- grammatical analysis
- statistical analysis
- indexing/content information
A detailed description of each type of output, along with possible uses, is
provided in the following questions.
3.2 What type of theme information does Oracle ConText produce?
Oracle ConText extracts two types of thematic information from the text it
processes:
- Theme Grading. The 16 theme gradings identify the function and importance
of each word within the context of the containing sentence. You can use this
information to reduce sentences to their main thematic elements for creating
document outlines, summaries, and specialized views of the original text.
If you combine Oracle ConText with a full-text retrieval system, such as Oracle
TextServer3, the system can make use of theme grading information to improve
the precision of its searches and the accuracy of its relevance ranking.
- Theme Profiles. A theme profile identifies the strongest themes contained
in each sentence in a paragraph, each paragraph in a document, or in the
document as a whole. Oracle ConText also generalizes or abstracts the themes
that it identifies to create concept categories.
For example, Oracle ConText abstracts the word font to the concept printing
and assigns a value to the theme printing. Oracle ConText increases the value
assigned to the printing theme if the document contains other words, such as
typeface, that belong to the printing concept category.
The result of this process is a list of 16 theme/concept words which an
application can use to classify or rank themes according to programmable
criteria. You can use these classifications to automate document routing,
build document synopses, and intelligently search on and retrieve documents,
as well as in many other document automation applications.
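The concept abstraction described above (font and typeface both raising the
value of the printing theme) can be pictured with a small sketch. This is an
illustration only, written in Python for brevity: the CONCEPTS table, the
one-point-per-word scoring, and the function name theme_profile are all
assumptions, and ConText's actual lexicon and weighting are not shown here.
Only the limit of 16 themes comes from the description above.

```python
from collections import defaultdict

# Hypothetical word-to-concept table standing in for the ConText lexicon.
CONCEPTS = {
    "font": "printing",
    "typeface": "printing",
    "invoice": "finance",
    "ledger": "finance",
}


def theme_profile(words, max_themes=16):
    """Score each concept by how many of its words appear, strongest first."""
    scores = defaultdict(int)
    for word in words:
        concept = CONCEPTS.get(word.lower())
        if concept is not None:
            scores[concept] += 1  # assumed weighting: one point per word
    # Sort by descending score, then alphabetically so ties are stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_themes]


print(theme_profile("The font and typeface differ on the invoice".split()))
```

An application could route or classify a document by looking only at the
first few entries of the returned list.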
3.3 Can I affect the theme information generated by Oracle ConText?
Yes. Using over 40 programmable settings provided with the ConText API, you
can create custom theme profiles for text processed through Oracle ConText.
The settings allow you to specify that words with certain grammatical or
thematic characteristics (including theme grading) should be thematically
highlighted or suppressed, as well as specify the degree of thematic
prominence assigned to these words.
And in future releases, you will be able to create user dictionaries for
modifying the attributes of an individual word or adding your own terms to
ConText's knowledge base.
3.4 What grammatical analysis does Oracle ConText provide?
Oracle ConText returns a comprehensive assessment of the grammatical content,
writing style, and general readability level of the sentences, paragraphs,
and documents it processes. In addition, it can identify grammatical errors
in sentences, providing up to 30 error messages (from a dictionary of over
300 messages) per sentence.
You can use this information to build a full grammar checker capable of
evaluating the content and meaning of sentences and identifying poorly-written
or potentially ambiguous text, as well as identifying grammatical errors. Most
standard grammar checkers are limited by a rigid set of grammatical rules that
focus on local groups of words rather than full sentences, which often results
in the grammar checker missing the "point" of the text.
You can also use this grammatical output to rank documents according to their
level of readability. For example, after you use theme profiles to identify
a set of documents with the same or similar thematic content, you can compare
ConText's grammatical assessment for each document to select the most clearly
written and easily understood document.
3.5 What statistical analysis does Oracle ConText provide?
Oracle ConText generates up to 16 different theme statistics for each sentence,
paragraph, or document it processes. These statistics provide a numeric
measurement of the overall thematic/grammatical content and structure of a
text block.
For example, one statistic determines the amount of "filler" in text by
calculating the ratio of theme words to non-theme, or function, words in the
text. Other statistics measure such characteristics as theme concept,
strength, and ambiguity. Yet another statistic measures the percentage of
sentences in a text block that have grammatical errors or ambiguities, thus
providing a quantitative assessment of the grammatical composition of the
text.
Theme statistics can be used to identify specific problems when dealing with
text that is unedited, grammatically or stylistically poor, or ambiguous, or
that contains other such problems. They can also be used to rank documents
according to their theme characteristics or grammatical composition.
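The "filler" statistic mentioned above is a ratio of theme words to function
words. The following sketch, in Python for brevity, shows that arithmetic
under stated assumptions: the short FUNCTION_WORDS list and the name
theme_ratio are invented for the illustration, and ConText decides which
words are thematic through its own analysis rather than a fixed word list.

```python
# Hypothetical function-word list; ConText uses linguistic analysis instead.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "and", "or", "is", "are", "that", "it",
}


def theme_ratio(text):
    """Return (theme word count) / (function word count) for the text."""
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    words = [w for w in words if w]
    theme = sum(1 for w in words if w not in FUNCTION_WORDS)
    function = len(words) - theme
    # With no function words at all, the ratio is unbounded.
    return theme / function if function else float("inf")


# 3 theme words (font, document, large) over 4 function words.
print(theme_ratio("The font of the document is large."))  # 0.75
```

A lower ratio suggests more filler; a host program could use the value to
rank documents with similar themes.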
3.6 What indexing information does Oracle ConText generate?
Oracle ConText's MasterIndex identifies every important piece of information
in a document, including concepts, definitions, actions and actors, and
keywords, and extracts the information for structured storage or presentation.
In effect, the information produced by MasterIndex represents a normalized,
structured listing of the contents of a text block.
Oracle ConText's indexing capabilities should not be confused with the indexing
functions found in a standard text retrieval system. The index generated by
a standard text retrieval system is usually a simple listing of every word
in the text, whereas the indexing information generated by MasterIndex lists
all the thematically relevant and information-bearing words in the text and
describes the relationships between the words.
MasterIndex output can be used to:
- automatically create a back-of-book style of index for a single document or
global indexes for multiple documents.
- populate databases with structured content information.
- enable intelligent information-extraction agents to track specific
information and trends.
In upcoming releases, Oracle Book, Oracle's online multimedia viewing tool,
will use MasterIndex to automatically create hyperlinked, back-of-book indexes
for Oracle Book documents.
3.7 Does Oracle ConText provide parts of speech or parse trees?
To some degree, yes. MasterIndex provides a "thematic parse" of the information
in sentences, including the Actor, Action, Object, etc. This is similar to
a full parse, but certain adjectives, adverbs, or other weak sentence elements
that do not materially add to a sentence's theme are not included. However,
ConText's advanced analysis of semantic relationships gives more information
than a simple part of speech model.
IV. PROCESSING TEXT
4.1 Does Oracle ConText rewrite documents?
No. Oracle ConText does not alter any of the text it processes. Instead, it
produces its output as an array of theme, grammar, statistic, and index
information that is separate from the original text. You can apply this output
to the original text, either directly or through a user interface, to present
a different version or view of the text, but the original text remains
unchanged.
Oracle ConText may include words in its output, in the form of nominals and
concepts, which do not appear in the original text. A nominal is the noun form
for a word. If the word is a noun, the nominal is simply the pluralized form
of the word. For example, swim nominalizes to swimming, while swimmer
nominalizes to swimmers.
Concept words provide a higher-level categorization or "generalization" for
the words with which they are associated. For example, ConText abstracts the
word font to the concept printing. If ConText determines that the text
containing the word font significantly develops the topic of printing, it may
return printing as one of the themes for the text.
4.2 What type of text input does Oracle ConText require?
Oracle ConText requires English text in ASCII format. Documents in other
formats must be filtered into ASCII before being processed through Oracle
ConText.
Of course, such a filter could be built into the components used to create
a system that integrates with Oracle ConText. For instance, Oracle TextServer3
automatically handles all the filtering requirements for Oracle ConText.
Also, because Oracle ConText analyzes text in blocks, each word (or word
phrase), sentence, and paragraph must be clearly identified. Each word or word
phrase must be set off from other words by spaces, each sentence must start
with a capitalized character or number and end with a valid punctuation mark,
and all paragraph boundaries must be clearly marked (typically by one or more
hard returns).
Finally, the text should consist of complete sentences and paragraphs,
presented as a single text flow. The text may require some filtering to
provide a smooth text flow and to remove non-text objects such as graphics,
tables, text formatting and SGML tags, captions, footnotes, and electronic
mail addresses.
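A pre-processing filter could check the input rules above before any text is
passed on. The sketch below is one possible shape for such a check, written in
Python for brevity. The function names are hypothetical, the set of "valid"
end punctuation (period, exclamation mark, question mark) is an assumption,
and the sentence splitter is deliberately naive; it also assumes that text
inside a paragraph is not hard-wrapped.

```python
import re

END_MARKS = ".!?"  # assumption: what counts as "valid" end punctuation


def split_paragraphs(text):
    """Treat one or more hard returns as a paragraph boundary."""
    return [p.strip() for p in re.split(r"\n+", text) if p.strip()]


def split_sentences(paragraph):
    """Naive split: break after an end mark that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def is_well_formed(sentence):
    """Check the start and end rules described above, plus ASCII-only text."""
    return (
        bool(sentence)
        and sentence.isascii()
        and (sentence[0].isupper() or sentence[0].isdigit())
        and sentence[-1] in END_MARKS
    )


text = "ConText reads text. It finds themes.\n\n3 themes were found!"
for paragraph in split_paragraphs(text):
    for sentence in split_sentences(paragraph):
        print(is_well_formed(sentence), sentence)
```

Sentences that fail the check could be repaired or dropped by the filter
before the remaining text is processed.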
4.3 Can Oracle ConText analyze text containing grammatical errors?
Yes. Not all text is structured in complete, grammatically correct sentences.
Oracle ConText compensates for grammatical errors by changing its
clause-oriented analysis and reduction style to a word- or phrase-oriented
mode. Since analysis begins with the single word or word phrase (the smallest
grammatical unit processed by Oracle ConText), local judgements are often
unaffected by errors elsewhere in a sentence.
In addition, Oracle ConText recognizes over 10,000 of the most common
misspellings of words. When it encounters one of these misspelled words, it
assigns the linguistic bindings for the correct spelling to ensure that the
misspelled word is analyzed correctly for usage and function. It also returns
a grammatical error message showing the correct spelling.
However, to ensure high quality output, application developers building an
Oracle ConText system may want to combine a proofreading tool, such as Oracle
CoAuthor, with the ConText components to correct spelling and usage errors
before the text is processed.
4.4 How does text get "into" and "out of" Oracle ConText?
Oracle ConText is a component technology which does not include any modules
for managing the input or output of text; it simply processes text input and
generates results. You create the host program that provides the engine for
passing text to Oracle ConText and gathering the results.
A host program must call the ConText API, provide values for the required
settings, and pass text, one paragraph at a time, to Oracle ConText. You may
also use the program to provide an interface for specifying the source of the
text (usually a file) and instructions for processing the text.
After ConText completes its analysis, the host program must gather and
structure the results, then direct the structured output to an application or
other output device, such as a file or monitor. The type and extent of output
that the host program gathers, as well as the format (e.g. binary or ASCII)
that the host program uses to present the output, should be dictated by the
application or other device that receives the output.
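The control flow of a host program, as described above, might be sketched as
follows. A real host program would be written in C against the ConText API;
this Python version only illustrates the loop. The function analyze_paragraph
is a hypothetical stand-in that returns empty or trivial values, and the four
dictionary keys simply mirror the four output types named in this document.
None of these names come from the actual API.

```python
def analyze_paragraph(paragraph):
    """Hypothetical stand-in for a ConText API call (the real API is C)."""
    return {
        "themes": [],
        "grammar": [],
        "statistics": {"word_count": len(paragraph.split())},
        "index": [],
    }


def run_host(text, wanted=("statistics",), analyze=analyze_paragraph):
    """Pass text one paragraph at a time; keep only the wanted output types."""
    results = []
    for line in text.split("\n"):
        paragraph = line.strip()
        if not paragraph:
            continue  # skip empty lines between paragraphs
        full_output = analyze(paragraph)
        # Gather and structure: retain what the application needs, drop the rest.
        results.append({key: full_output[key] for key in wanted})
    return results


for kept in run_host("One two three.\n\nFour five."):
    print(kept)
```

Passing the analysis function as a parameter keeps the loop independent of
the engine, so the same host logic can be exercised with a stub.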
4.5 How is the output from Oracle ConText presented?
Each time a text block is processed, Oracle ConText returns the full range
of theme, grammar, statistic, and index information extracted from the text.
Oracle ConText does not manipulate this output in any fashion; it simply
returns the output through an array of C language structures stored in memory.
The host program that passes the text to ConText determines the method of
presentation for the output information. You can build a host program that
presents the information as markup for use in an application. Or, you could
architect the host program to interpret the output information and produce a
view, such as a summary, name list, or index, of the content of the original
text.
For example, the theme grading information for a document identifies the theme
gradings assigned to each word. The host program could use this
application-independent markup information in a document viewing application,
such as a speed-reader, to highlight words in a document according to their
assigned theme grading.
Or, the host program could use the theme grading information to present a
reading summary of the document. The summary, containing only those words that
were assigned specific theme gradings, could then be stored in an ASCII flat
file.
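The reading summary described above amounts to filtering words by their
assigned theme grading. A minimal sketch in Python follows. The grading
numbers attached to each word are invented for this example, since this
document does not define what the 16 gradings mean, and the function name
reading_summary is hypothetical.

```python
def reading_summary(graded_words, keep):
    """Join, in original order, the words whose grading is in the keep set."""
    return " ".join(word for word, grading in graded_words if grading in keep)


# Hypothetical (word, theme grading) pairs; the numbers are illustrative only.
graded = [
    ("Oracle", 1), ("ConText", 1), ("quickly", 9), ("identifies", 2),
    ("the", 14), ("main", 5), ("themes", 1),
]

print(reading_summary(graded, keep={1, 2}))
```

A speed-reader view could use the same pairs to highlight the kept words in
place rather than dropping the others.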
4.6 What is the size of the output generated by Oracle ConText?
Because Oracle ConText performs a full parse each time it processes a text
block, the size of the output can be enormous. It is usually unnecessary,
however, to retain the full array of information that Oracle ConText produces.
The type and extent of output saved from the results of ConText's analysis
should be dictated by the needs of the application. In effect, the
application, or the host program that provides the output for the application,
keeps only the information it needs and discards the rest.
For example, a simple application that uses theme profiles to sort and route
documents would require the host program to retain only the 16 words or phrases
that make up the document's theme profile. An application that provides a
back-of-book index for a multiple-topic document, such as an encyclopedia,
might require all of the indexing output from ConText, but not any of the
grammatical or statistical output.
4.7 Can Oracle ConText output be stored?
Yes. Once the host program extracts and structures the required information
from ConText's vast output, the information can be stored in a variety of
media, including files, structured fields, and database tables. For example:
- summaries and abstracts can be stored as file attachments to the original
documents.
- the themes of a document can be stored in a structured field outside the
document to serve as keywords for queries.
- MasterIndex information, which includes definitions, transactions, and
concepts contained in a document, can be stored in database tables or other
structured schemes.
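As an illustration of storing themes in a database table so that they can
serve as query keywords, the sketch below uses Python's built-in sqlite3
module with an in-memory database. SQLite is only a stand-in chosen to keep
the example self-contained; the table name doc_themes, its columns, and the
function names are assumptions and do not describe any Oracle schema.

```python
import sqlite3


def store_themes(conn, doc_id, themes):
    """Store a document's themes, strongest first, one row per theme."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_themes "
        "(doc_id TEXT, theme_rank INTEGER, theme TEXT)"
    )
    conn.executemany(
        "INSERT INTO doc_themes VALUES (?, ?, ?)",
        [(doc_id, rank, theme) for rank, theme in enumerate(themes, start=1)],
    )
    conn.commit()


def find_documents(conn, theme):
    """Return the ids of documents that list the given theme."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM doc_themes WHERE theme = ? ORDER BY doc_id",
        (theme,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
store_themes(conn, "memo-1", ["printing", "finance"])
store_themes(conn, "memo-2", ["aviation"])
print(find_documents(conn, "printing"))
```

Because the themes are stored outside the document, a query by theme does
not require the original text to be processed again.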
4.8 Does a document have to be reprocessed each time I want a
different type of output?
No, provided the host program is architected to retain the necessary
information for the application that uses the output. Each time text is
processed, Oracle ConText produces its full array of output and the host
program that passes the text to Oracle ConText controls the type of
information and level of detail returned in the output.
Because Oracle ConText performs a full parse each time it is run, you should
process a document as few times as possible and use the host program to store
the level of output required for the applications you build. The more detailed
and varied the output is that you store with each ConText parse, the fewer
times the document needs to be processed.
V. INTEGRATING WITH ORACLE CONTEXT
5.1 What components are required to integrate Oracle ConText?
A typical Oracle ConText implementation makes use of the following components.
The components can be combined to create a stand-alone system or they can be
integrated with other applications to create a complete text management system.
- Oracle ConText. A stand-alone set of functions released as an object library
with a C language application program interface (API). Included in this component
are the lexicon and parsing rules that ConText uses to process text. In order
to process text, a host program, which calls the ConText functions through
the API, must be built.
- Input. ASCII text that you want to process through Oracle ConText. The text
is usually contained in a flat file.
- Host program. A C program that calls the ConText API, provides values for
the required settings, and passes text, one paragraph at a time, to the API.
It also gathers and formats the output generated by ConText.
- Output. The theme, grammatical, statistical, and indexing information generated
by ConText and presented through an array of C structures. The host program
gathers the output, formats it, and directs it to an output device, such as
a user interface or flat file. The output can be formatted as binary code for
interpretation by a user interface or as ASCII text for reading purposes.
- User interface. An application that accesses the ConText output along with
the original input text, and provides an interface for viewing and/or manipulating
the input text.
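The host program's main loop, which reads a flat file and passes text one paragraph at a time, might look roughly like the sketch below. It is written in Python only to show the control flow; an actual host program is a C program calling the ConText API. Treating blank lines as paragraph separators is an assumption, as are the function names, and process_paragraph is a placeholder for the call into ConText.

```python
# Hypothetical sketch of a host program's input loop.


def iter_paragraphs(lines):
    """Yield paragraphs from an iterable of text lines, split on blank lines."""
    current = []
    for line in lines:
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            yield " ".join(current)
            current = []
    if current:
        yield " ".join(current)


def run_host(lines, process_paragraph):
    """Pass each paragraph to the processing callable and gather the results."""
    return [process_paragraph(paragraph) for paragraph in iter_paragraphs(lines)]


sample = ["First paragraph,\n", "second line.\n", "\n", "Second paragraph.\n"]
for result in run_host(sample, str.upper):
    print(result)
```

With a real flat file, the same loop could be fed an open file object instead of the sample list.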
5.2 Are development resources required to integrate Oracle ConText with an
application?
Yes. Oracle ConText consists only of the API and the underlying functions.
You must build the components required for creating a stand-alone Oracle ConText
text application or for integrating Oracle ConText with other applications.
Because of the considerable effort and expertise that are required to build
these components, Oracle must approve all potential Oracle ConText users as
development partners. To qualify as an approved partner, you must be able to
devote the time and resources required to plan and build the necessary integration
components. Oracle Consulting Services, with its experienced Oracle ConText
consultants, is available to help plan and build any system that integrates
with Oracle ConText.
In addition, Oracle TextServer3 provides access to a number of Oracle ConText's
advanced linguistic functions, such as text summarization/reduction and theme
extraction, without the need for linguistic resources or approved partner status.
As TextServer3 indexes documents and stores them in the database, it automatically
processes the text from the documents through ConText and stores the output
in the database. You can then access this information through easy-to-use tools
such as Oracle Forms and industry-standard SQL.
5.3 Do I need to know the C programming language to create components for
Oracle ConText?
Currently, yes. The host program must be written in C, then compiled for the
ConText API. However, PL/SQL covers will be added to the API in future releases.
5.4 Is Oracle ConText built on the Oracle7 Server?
No. Currently, Oracle ConText does not require the Oracle7 Server. However,
future releases of Oracle ConText will make use of the Oracle7 Server. At that
time, application developers will be able to access Oracle ConText output through
stored procedures and a number of other methods.
In addition, Oracle TextServer3 with Oracle ConText provides full integration
with the Oracle7 Server.
5.5 Does Oracle ConText support a client/server architecture?
Yes. In a typical configuration, the ConText functions, API, and host program
would be located on the server, while the application(s) that interpret the
ConText output would reside on a client machine. The client and server could
be connected directly, via a remote procedure call (RPC), or indirectly, via
a database or other techniques.
5.6 Does Oracle ConText provide any example components?
Yes. Oracle ConText 1.1 includes a number of working programs and applications
for evaluation and demonstration purposes, including:
- SpeedRead output processor. This sample host program processes text through
ConText, gathers theme grading, profile, and statistics information for the
text, and stores the results in a binary output file used by the SpeedRead
text viewer.
- SpeedRead text viewer. This sample application interprets the output file
from the SpeedRead output processor to provide 5 customizable levels of reduction
for speed-reading and summarization of input text. It also displays the theme
profile and statistics generated for the text.
The viewer is available as part of Oracle ConText for SunOS and also as a
stand-alone client application for Microsoft Windows.
- ASCII output processor. This sample host program processes text through ConText
and returns the output, as ASCII text, to a standard output device, such as
a monitor screen or flat file.
The program provides access to the full range of thematic, grammatical, indexing,
and statistical output generated by Oracle ConText. Program parameters let
you control the type of output and level of detail returned by the program.
- Document Digest builder. This sample host program processes text through ConText
and gathers theme profile information for the entire text block and each paragraph
in the text. It then creates a digest of the text by selecting the paragraphs
that best represent the overall themes in the text and returning these paragraphs
as ASCII output.
Program parameters let you control the number of paragraphs selected and the
method by which the theme profiles are matched to determine the most representative
paragraphs.
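The selection step of a digest builder can be sketched as follows. This is not the source code of the sample program: the matching method shown (counting how many themes a paragraph shares with the whole text) is an assumed stand-in for whichever methods the real program offers, and all names are hypothetical. Python is used here only to show the logic.

```python
# Hypothetical sketch of digest selection by theme overlap.


def build_digest(paragraphs, paragraph_themes, document_themes, count=2):
    """Return the `count` paragraphs whose themes best match the document's."""
    doc = {theme.lower() for theme in document_themes}
    scored = []
    for index, themes in enumerate(paragraph_themes):
        overlap = len(doc & {theme.lower() for theme in themes})
        scored.append((overlap, index))
    # Highest overlap first; ties go to the earlier paragraph.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:count]
    # Present the chosen paragraphs in their original order.
    return [paragraphs[index] for index in sorted(index for _, index in best)]


paragraphs = ["Intro text.", "Details on loans and rates.", "Office picnic."]
paragraph_themes = [["banking"], ["loans", "interest rates"], ["picnic"]]
document_themes = ["banking", "loans", "interest rates"]
print(build_digest(paragraphs, paragraph_themes, document_themes, count=2))
```

The count argument plays the role of the "number of paragraphs selected" parameter; swapping the overlap score for another measure would correspond to changing the matching method.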
These relatively simple, stand-alone components are intended mostly for demonstrating
Oracle ConText's language processing abilities; however, they can also serve
as models for building the more complex components required for a full-text
management system. In fact, the source code for the sample host programs is
provided with Oracle ConText 1.1 to help illustrate the structure of typical
ConText components.
VI. DEVELOPMENT PLANS
6.1 Is Oracle ConText being integrated with Oracle products?
Yes. Oracle ConText is being integrated with a number of Oracle products, some
of which will be available as soon as the Fall of 1994:
- Oracle TextServer3. Oracle TextServer3 with ConText provides a robust,
easy-to-use development platform for creating Oracle ConText-enabled text
management systems.
Oracle ConText provides text filtering to reduce the number of words tracked,
or "indexed", by TextServer3 during the initial storage of a document. Text
filtering allows for documents to be retrieved more quickly and with greater
precision. Also, ConText's thematic analysis allows documents to be queried
by theme.
Once a document is retrieved, Oracle ConText provides summarization and highlighting
of the text in the document for speed-reading and quick review. In addition,
indexing will be available in future releases to create new browsing/navigation
methods within and between retrieved documents.
- Oracle Book. Oracle ConText provides hyperlinked, back-of-book indexes, complete
with See and See also entries, for documents in Oracle Book, Oracle's online
multimedia viewing tool. It also automatically generates a synopsis for an
Oracle Book document and allows the user to navigate from any point in the synopsis
to the corresponding point in the document.
In addition, Oracle ConText provides users of Oracle Book Designer, the tool
for creating Oracle Book documents, with the ability to specify words that
should always/never be included in the Oracle ConText index.
- Oracle Office. With Oracle ConText added to Oracle Office, Oracle's office
scheduling and electronic-mail system, mail messages can be summarized to varying
levels of detail, and sorted, queried, and routed using the themes that ConText
identifies for the message.
Oracle Office with ConText can also minimize the costs incurred with modems
or other expensive connections by summarizing messages before you read them.
6.2 What are the plans for the Oracle ConText API?
While Oracle TextServer3 provides an easy-to-use, integrated solution for processing
and managing text, Oracle will continue to provide direct access to ConText's
advanced linguistic functions through the C language API.
Future releases of Oracle ConText will provide a suite of tools to facilitate
application development and integration, including:
- Oracle Cooperative Development Environment (CDE). Application developers will
be able to use a wide range of CDE tools, including Oracle Forms, SQL*Plus,
PL/SQL, and Oracle Glue, to develop applications for accessing ConText functions
and manipulating the output.
- Microsoft Visual Basic for Windows. Application developers will be able to
use many of Microsoft Visual Basic's powerful development tools, including Windows
API functions and VBX controls, to build Windows applications that integrate
with Oracle ConText.
6.3 Will Oracle ConText be integrated with any non-Oracle products?
Eventually, yes. Oracle is currently studying options for integration with
third-party systems; however, no integration has been planned yet. Oracle
TextServer3, with its access to the Oracle7 database and ready-to-use text input,
filtering, output storage, and retrieval functions, is available now for
application developers and resellers who wish to integrate Oracle ConText with
their text applications.
Also, if you are an approved Oracle ConText development partner, the ConText
functions and API are available for integrating Oracle ConText with any system
that deals with text. And the enhancements being developed for future releases
of Oracle ConText will make integration easier and more flexible.
However, if you plan to integrate Oracle ConText with other systems, using
either Oracle TextServer3 or the ConText API, you may wish to enlist Oracle
Consulting Services to help design the integration and build the necessary
components.
6.4 What type of National Language Support (NLS) does Oracle ConText provide?
Oracle ConText currently supports only English text. However, Oracle is developing
plans for ConText-based modules capable of analyzing the grammatical structure
and syntax of a number of non-English languages, including several major European
and Asian languages. While these "language-intelligent" modules will not have
the full range of Oracle ConText's advanced linguistic functions or the complete
coverage of the system lexicon, they will provide new language-processing
capabilities for the Oracle7 Server and other Oracle products that support
multiple languages.
VII. GENERAL QUESTIONS
7.1 How long has Oracle ConText been in development?
Originally developed as a tool for processing online documentation, ConText
represents over 140 person-years of development, spanning a 20-year period.
Employing many of the original architects and developers, Oracle has been actively
developing Oracle ConText for the past 2 years.
7.2 On which platforms is Oracle ConText available?
Oracle ConText is available in controlled release for the Sun UNIX platform.
The release includes a sample text viewer, SpeedRead, for personal computers
(PC) running Microsoft Windows.
In addition, Oracle ConText is currently being ported to the Sequent UNIX platform;
the port will be available by Fall 1994.
Plans for future ports include most major UNIX and PC platforms, including
OS/2.
7.3 What are the system requirements for Oracle ConText?
Implementing Oracle ConText requires the following:
- 6 megabytes of memory (suggested minimum)
- 35 megabytes of disk space
Oracle ConText does not have any Oracle product dependencies.
Please note that while it can be implemented stand-alone, Oracle ConText should
be implemented as a component of Oracle TextServer3. Consult Oracle TextServer3
documentation for configuration requirements and details.
7.4 How fast does Oracle ConText perform?
On a Sun SPARCstation 10, Oracle ConText can process approximately 4 kilobytes
of text per second, which translates loosely to about 450 words per second.
To increase throughput, you can run a host program for Oracle ConText on multiple
processors connected to multiple text streams.
Note that requesting only a fraction of the output does not improve
performance, since Oracle ConText performs a full parse each time it processes
a text block. Oracle ConText extracts all the thematic and grammatical information
from the text, and it is the host program that determines the amount of output
to retain.
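For planning purposes, the throughput figure quoted above can be turned into a rough processing-time estimate. The sketch below is back-of-the-envelope arithmetic in Python, not a benchmark. It assumes that 1 kilobyte means 1,024 bytes, that time grows in direct proportion to document size, and that running several host programs in parallel divides the time evenly, which real systems will only approximate.

```python
# Hypothetical estimate based on the quoted rate of about 4 kilobytes per
# second on a Sun SPARCstation 10. All other values are assumptions.

BYTES_PER_SECOND = 4 * 1024


def estimated_seconds(document_bytes, streams=1):
    """Estimate processing time, assuming ideal scaling across text streams."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    return document_bytes / (BYTES_PER_SECOND * streams)


one_megabyte = 1024 * 1024
print(estimated_seconds(one_megabyte))             # prints 256.0
print(estimated_seconds(one_megabyte, streams=4))  # prints 64.0
```

Under these assumptions, one megabyte of text takes a little over four minutes on a single processor, and the amount of output requested does not change the estimate.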
7.5 Can Oracle ConText analyze documents of any length?
Yes. A longer document will take longer to process than a shorter one, but
the length of a document has no effect on the ability of Oracle ConText to
analyze the text in the document.
In addition, because Oracle ConText performs a full parse each time it processes
a text block, the processing time required for a document does not increase
exponentially as the document increases in length. In general, processing time
scales in direct proportion to the length of the document.
7.6 What is included with Oracle ConText?
When you purchase Oracle ConText, you receive the following components:
- API and underlying functionality for Oracle ConText
- sample programs that work with the API:
+ SpeedRead output processor
+ ASCII output processor
+ document digest builder
- source code for sample programs
- SpeedRead sample text viewer
7.7 How much does Oracle ConText cost?
Please contact your account manager for Oracle ConText pricing information.
_________________________________________________________________
Copyright 1995 Oracle Corporation, 500 Oracle Parkway, Redwood Shores,
California 94065. All rights reserved.