A Properly Normalized Relational Database Specifically Designed To Store, Retrieve, Manipulate and Associate the Content and Structures of Written Languages

By

Michael J. Bramante

March, 2009

All rights reserved

Table of Contents

1 Introduction

2 NLDB and Debunking the Myths about Relational Databases and Lexical Data

2.1 Myth: Relational Databases Cannot Capture Lexical Information

2.2 Myth: The View of Lexical Data Should Be Tied To the Structure of Data

2.3 Myth: The More Complex the Data, the More Tables Are Required

2.4 Myth: Relational Databases Cannot Represent Hierarchical Data

3 The Database Normalization Imperative

3.1 The Main Benefits of a Properly Normalized Relational Database

4 The Problem with Machine-Readable Dictionaries and Lexical Text-Files

5 NLDB Data Categories and Database Design

5.1 Data Driven Design

5.2 Data Categories

5.3 The [Attribute] Entities

5.4 The [Structure] Entities

6 Conclusion

6.1 NLDB Main Features and Abilities

6.2 NLDB Potential Uses

7 Appendices

7.1 Appendix A: NLDB Physical Data Model

7.2 Appendix B: NLDB Data Dictionary

7.3 Appendix C: NLDB SQL Creation Script

8 Endnotes

 


1 Introduction

Natural Language Database (NLDB) is a properly normalized relational database that is specifically designed to store, retrieve, manipulate and, more importantly, associate the content, rules and structures of one or more natural languages in a granular and generalized manner, allowing its data to be queried with standard Structured Query Language (SQL) for any conceivable purpose.  Natural language content, structures and rules are represented in and reflected by the content and structure of NLDB tables and their relationships with one another. I can demonstrate that NLDB can be the fundamental data source and foundation upon which viable, practical Natural Language Processing and Computational Linguistics applications can be built.

NLDB is a unified, application-independent, general-purpose, generalized, data-driven natural language knowledge base. NLDB data can be imported and integrated from any machine-readable dictionary, thesaurus, lexical data file or annotated corpus, or from any grammatically correct text via grammar-parsing routines.  NLDB can store and retrieve any natural language structures and data including, but not limited to, grammatical, morphological, orthographical, syntactic, semantic, lexical and phonological data. NLDB is currently capable of storing the contents of the annotated SUSANNE Corpus, WordNet 3.0 and most of the content of the Longman Dictionary of Contemporary English, including phonograms and syllabification.

One of NLDB’s most powerful and fundamental abilities is to store any given language structure independently from, yet associated with, its words.  For example, take the following three sentences: 1) “The cow jumped over the moon”, 2) “A cat crawled under the car” and 3) “The boy hopped across the street”.  These three sentences share one distinct sentence structure that needs to be stored only once, in one place.  The words of each sentence are stored separately, yet each sentence can be associated with this one shared structure.  Distinct structures can be hard-mapped to equivalent structures in any other language, or in the same language.  Distinct question structures can be mapped to distinct and equivalent answer structures.  With the right processing, this ability to map structures can be used for accurate machine translation and natural language processing.  The following quotes seem to express the need for, and usefulness of, a system such as NLDB.

 

“…what the research community ultimately needs is very large databases of language, analyzed in very great detail.”[1]

“Stroustrup (1997) states that ‘constructing a useful…model of an application area is one of the highest forms of analysis. Thus, providing a tool, language, framework, etc., that makes the result of such work available to thousands is a way for programmers and designers to escape the trap of becoming craftsmen of one-of-a-kind artifacts.’” [2]

“To enable computers to process human language, we need databases (corpora) of language samples annotated to show their structural features, as a source of information and statistics to guide the development of language-processing algorithms. This in turn requires some set of categories to be explicitly defined, so that researchers exchanging language data can be confident that they are using the annotations in the same way.  Computational linguistics needs something like the Linnaean taxonomy created for botany in the 18th century, which for the first time enabled naturalists everywhere to exchange information about plants secure in the knowledge that when they used the same names they were talking about the same things.”[3]
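The idea of storing one shared structure for the three example sentences can be illustrated with a minimal sketch. It uses SQLite through Python's standard sqlite3 module as a stand-in RDBMS; the table names, the toy lexicon and the text `pattern` column are hypothetical simplifications, not NLDB's actual schema (a real [Structure] would be an ordered sequence of [WordTemplate] rows rather than a single text key).

```python
import sqlite3

# Hypothetical toy lexicon mapping each word to a word-template label.
# A real system would derive these labels from attribute data.
LEXICON = {
    "the": "Det", "a": "Det",
    "cow": "Noun", "cat": "Noun", "boy": "Noun",
    "moon": "Noun", "car": "Noun", "street": "Noun",
    "jumped": "Verb", "crawled": "Verb", "hopped": "Verb",
    "over": "Prep", "under": "Prep", "across": "Prep",
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structure (
    structure_id INTEGER PRIMARY KEY,
    pattern      TEXT NOT NULL UNIQUE      -- simplified: template labels joined by spaces
);
CREATE TABLE sentence (
    sentence_id  INTEGER PRIMARY KEY,
    structure_id INTEGER NOT NULL REFERENCES structure(structure_id),
    text         TEXT NOT NULL
);
""")

def store_sentence(text):
    """Store a sentence, reusing an existing structure row when its pattern matches."""
    pattern = " ".join(LEXICON[w.lower()] for w in text.split())
    conn.execute("INSERT OR IGNORE INTO structure (pattern) VALUES (?)", (pattern,))
    (structure_id,) = conn.execute(
        "SELECT structure_id FROM structure WHERE pattern = ?", (pattern,)).fetchone()
    conn.execute("INSERT INTO sentence (structure_id, text) VALUES (?, ?)",
                 (structure_id, text))
    return structure_id

ids = [store_sentence(s) for s in (
    "The cow jumped over the moon",
    "A cat crawled under the car",
    "The boy hopped across the street",
)]
structure_count = conn.execute("SELECT COUNT(*) FROM structure").fetchone()[0]
print(ids, structure_count)  # [1, 1, 1] 1: three sentences, one stored structure
```

The UNIQUE constraint is what guarantees the structure is stored once; the sentences only hold a reference to it.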

Understanding the critical importance of properly normalizing databases is necessary to understand and appreciate the significance of NLDB’s potential contribution to and impact on the creation of viable new natural language applications and accurate machine translation.  Critical differences separate and distinguish the NLDB normalized relational database from currently available machine-readable dictionaries, thesauri, lexical data files and annotated corpora. 

NLDB documentation consists of a data model and a data dictionary. NLDB is composed of twelve core tables.  The data model is an Entity Relationship Diagram (ERD) identifying all of NLDB’s database tables, their structure and their relationships with one another.  The data dictionary documents each table’s statement of purpose, its structure, what its columns represent, practical examples of its purpose and example data.

Some of NLDB’s general categories of data include [WordTemplate], [Structure], [Word] and [Attribute] each of which has a very specific meaning.

A [WordTemplate] is simply a structure that identifies the abstracted representation of a word.  It represents the place in other structures where an actual word would be. A word-template is defined by a grouping of any number of attributes that uniquely define the type of abstracted word that is allowable at a specific point in a structure’s sequence.
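One way to read "a grouping of attributes that defines which words are allowable" is as a subset test: a word fits a template slot when it carries every attribute of that template. The sketch below expresses that test in SQL (a NOT EXISTS relational-division query) against SQLite; all table names, attribute names and words are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (attribute_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE word (word_id INTEGER PRIMARY KEY, spelling TEXT NOT NULL);
CREATE TABLE word_attribute (
    word_id      INTEGER REFERENCES word(word_id),
    attribute_id INTEGER REFERENCES attribute(attribute_id),
    PRIMARY KEY (word_id, attribute_id));
CREATE TABLE word_template (template_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE template_attribute (
    template_id  INTEGER REFERENCES word_template(template_id),
    attribute_id INTEGER REFERENCES attribute(attribute_id),
    PRIMARY KEY (template_id, attribute_id));
""")

def attr_id(name):
    """Return the id of an attribute, creating it once if needed."""
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (name,))
    return conn.execute("SELECT attribute_id FROM attribute WHERE name = ?",
                        (name,)).fetchone()[0]

def add_word(spelling, *attrs):
    wid = conn.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,)).lastrowid
    conn.executemany("INSERT INTO word_attribute VALUES (?, ?)",
                     [(wid, attr_id(a)) for a in attrs])

def add_template(label, *attrs):
    tid = conn.execute("INSERT INTO word_template (label) VALUES (?)", (label,)).lastrowid
    conn.executemany("INSERT INTO template_attribute VALUES (?, ?)",
                     [(tid, attr_id(a)) for a in attrs])
    return tid

add_word("cow", "Noun", "Countable", "Singular", "Life Form")
add_word("moon", "Noun", "Countable", "Singular", "Non-Life Form")
add_word("water", "Noun", "Uncountable")
add_word("jumped", "Verb", "Past")

subject_slot = add_template("subject noun", "Noun", "Countable", "Singular", "Life Form")
object_slot = add_template("object noun", "Noun", "Countable", "Singular")

def words_fitting(template_id):
    """Spellings of words that carry every attribute required by the template."""
    rows = conn.execute("""
        SELECT w.spelling FROM word w
        WHERE NOT EXISTS (
            SELECT 1 FROM template_attribute ta
            WHERE ta.template_id = ?
              AND ta.attribute_id NOT IN (
                  SELECT wa.attribute_id FROM word_attribute wa
                  WHERE wa.word_id = w.word_id))
        ORDER BY w.spelling""", (template_id,))
    return [r[0] for r in rows]

print(words_fitting(subject_slot))  # ['cow']
print(words_fitting(object_slot))   # ['cow', 'moon']
```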

A [Structure] is defined by a [StructureType], a [Language] and an ordered sequence of one or more [WordTemplates].  It is a unique, distinct grammatical structure that is divorced from, yet associated with, actual words, and it is stored only once, in one place.  It can represent a word; an abstract, grammatically well-formed phrase, sentence or paragraph; or a chapter, book, conversation or any other defined structure.

A [Word] is defined by a distinct spelling, a word sense, a phoneme, a syllabification, a language and any number of other attributes.  Any distinct spelling of any word can be stored once, in one place.  It does not matter whether a distinct spelling has fifty different senses or meanings in each of fifty different languages, because associated attributes or attribute hierarchies can identify anything about that distinct spelling in each language.
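The claim that a distinct spelling is stored once, however many senses and languages share it, can be sketched as a spelling table with a UNIQUE constraint that word rows reference. This is a minimal illustration in SQLite with hypothetical names; in particular, `language` is a plain text column here for brevity, whereas NLDB treats language as an attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE spelling (
    spelling_id INTEGER PRIMARY KEY,
    text        TEXT NOT NULL UNIQUE          -- each distinct spelling exactly once
);
CREATE TABLE word (
    word_id     INTEGER PRIMARY KEY,
    spelling_id INTEGER NOT NULL REFERENCES spelling(spelling_id),
    language    TEXT NOT NULL,                -- simplification: an attribute in NLDB
    sense       TEXT NOT NULL
);
""")

def add_word(text, language, sense):
    """Add one word sense, reusing the spelling row if it already exists."""
    conn.execute("INSERT OR IGNORE INTO spelling (text) VALUES (?)", (text,))
    (sid,) = conn.execute("SELECT spelling_id FROM spelling WHERE text = ?",
                          (text,)).fetchone()
    conn.execute("INSERT INTO word (spelling_id, language, sense) VALUES (?, ?, ?)",
                 (sid, language, sense))

add_word("play", "English", "a theatrical performance of a drama")
add_word("play", "English", "a preset plan of action")
add_word("chat", "English", "an informal conversation")
add_word("chat", "French", "a cat")

spelling_rows = conn.execute("SELECT COUNT(*) FROM spelling").fetchone()[0]
senses_of_chat = conn.execute("""
    SELECT w.language, w.sense FROM word w
    JOIN spelling s ON s.spelling_id = w.spelling_id
    WHERE s.text = 'chat' ORDER BY w.language""").fetchall()
print(spelling_rows)   # 2
print(senses_of_chat)  # [('English', 'an informal conversation'), ('French', 'a cat')]
```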

[Attribute] values can be hierarchically organized in any conceivable way and in any number of hierarchical levels. Each distinct attribute needs to be stored only once, in one place.  Attributes can define structures, words, word-templates and other attributes.  In a hierarchical grouping, the parent attributes, child attributes, relationship types, sequential ordering of attributes within a group and nesting level can all be identified.
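The properties listed above (parent, child, relationship type and order within a group) fit naturally in a link table whose rows each pair two attributes. The following SQLite sketch is an assumption about how that could look, not NLDB's published design; note that the single `Number` attribute is stored once yet grouped under both `Noun` and `Verb`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
-- Hypothetical link table: one row per parent/child pairing.
CREATE TABLE attribute_relation (
    parent_id     INTEGER NOT NULL REFERENCES attribute(attribute_id),
    child_id      INTEGER NOT NULL REFERENCES attribute(attribute_id),
    relation_type TEXT    NOT NULL,
    seq           INTEGER NOT NULL,        -- order of the child within its group
    PRIMARY KEY (parent_id, child_id)
);
""")

def attribute_id_for(name):
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (name,))
    return conn.execute("SELECT attribute_id FROM attribute WHERE name = ?",
                        (name,)).fetchone()[0]

def link(parent, child, relation_type, seq):
    conn.execute("INSERT INTO attribute_relation VALUES (?, ?, ?, ?)",
                 (attribute_id_for(parent), attribute_id_for(child), relation_type, seq))

link("Part of Speech", "Noun", "member", 1)
link("Part of Speech", "Verb", "member", 2)
link("Noun", "Number", "feature", 1)
link("Verb", "Number", "feature", 1)
link("Number", "Singular", "value", 1)
link("Number", "Plural", "value", 2)

def children(name):
    """Children of an attribute, in their stored sequence."""
    return [r[0] for r in conn.execute("""
        SELECT c.name FROM attribute_relation r
        JOIN attribute p ON p.attribute_id = r.parent_id
        JOIN attribute c ON c.attribute_id = r.child_id
        WHERE p.name = ? ORDER BY r.seq""", (name,))]

def parents(name):
    return [r[0] for r in conn.execute("""
        SELECT p.name FROM attribute_relation r
        JOIN attribute p ON p.attribute_id = r.parent_id
        JOIN attribute c ON c.attribute_id = r.child_id
        WHERE c.name = ? ORDER BY p.name""", (name,))]

print(children("Number"))  # ['Singular', 'Plural']
print(parents("Number"))   # ['Noun', 'Verb']
```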

 

2 NLDB and Debunking the Myths about Relational Databases and Lexical Data

There is a pervasive misunderstanding about the ability of relational models to represent lexical and language-related information.  It stems from a general lack of knowledge about relational modeling, and from the belief that the perceived failure to represent lexical data accurately in relational form is a limitation of relational database technology itself, when in fact it is a failure to apply that technology correctly to lexical and language structures.  I hope to eliminate these misunderstandings by explaining the NLDB design, my design decisions and how NLDB works.

2.1 Myth: Relational Databases Cannot Capture Lexical Information

I have designed and built numerous large, enterprise-scale commercial databases with upwards of 120 tables, for systems that track inventory, sales and the like.  NLDB has 12 tables.  Any data can be accurately modeled relationally, regardless of whether that data is related to business, science or linguistics. NLDB is a relational model that captures the structural properties of lexical data by allowing the nesting of attributes.

“Lexical data, as is obvious in any dictionary entry, is much more complex than the kind of data (suppliers and parts, employees' records, etc.) that has provided the impetus for most database research. Therefore, classical data models (e.g., relational) do not apply very well to lexical data, although several attempts have been made.” [4]

“Relational data models, including normalized models which allow the nesting of attributes, cannot capture the structural properties of lexical information.” [5]

2.2 Myth: The View of Lexical Data Should Be Tied To the Structure of Data

The quote below suggests that the relational representation of lexical data is problematic because normalized data exists in different tables, ‘thus fragmenting the view of the data’.  This is a common misunderstanding that stems from confusing the way data is presented to the user with the physical structure of the data in the database.

Many people mistakenly believe that data should be physically stored in a database in whatever form is easiest for a user to view.  This is called an un-normalized structure.  Structuring data this way makes it difficult to store, retrieve and manipulate.  Data should be stored first and foremost in the form that most easily facilitates its storage, retrieval and manipulation.  Once data is properly structured, it can be presented to the user in any conceivable way.

“Fragmentation of data: The most obvious problem arises from the fact that the number of values for each attribute in dictionary entries varies enormously. For example, entries may include several different pronunciations, parts of speech, orthographic variants, definitions, etc., while some other fields, such as examples, synonyms, cross-references, domain information, geographical information, etc., may be completely absent. To avoid massive duplication of data, the information must be split across several tables, thus fragmenting the view of the data.” [6]
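The separation between storage and presentation can be made concrete with an SQL view: the data stays split across normalized tables, yet the view presents one dictionary-style row per entry, with absent fields simply NULL. This SQLite sketch uses hypothetical tables and sample data; correlated subqueries are used because joining both child tables at once would multiply rows. SQLite does not guarantee the order in which `group_concat` joins values, so the test compares sorted parts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE headword (headword_id INTEGER PRIMARY KEY, spelling TEXT NOT NULL UNIQUE);
CREATE TABLE pronunciation (headword_id INTEGER REFERENCES headword(headword_id),
                            respelling  TEXT NOT NULL);
CREATE TABLE sense (headword_id INTEGER REFERENCES headword(headword_id),
                    pos TEXT NOT NULL, gloss TEXT NOT NULL);

INSERT INTO headword VALUES (1, 'record'), (2, 'moon');
INSERT INTO pronunciation VALUES (1, 'REK-erd'), (1, 'ri-KORD');
INSERT INTO sense VALUES (1, 'noun', 'a written account'),
                         (1, 'verb', 'to set down in writing'),
                         (2, 'noun', 'the natural satellite of the earth');

-- Presentation layer: one row per entry, however many values each field has.
CREATE VIEW dictionary_entry AS
SELECT h.spelling,
       (SELECT group_concat(p.respelling, ' / ')
          FROM pronunciation p WHERE p.headword_id = h.headword_id) AS pronunciations,
       (SELECT group_concat(s.pos || ': ' || s.gloss, '; ')
          FROM sense s WHERE s.headword_id = h.headword_id) AS senses
FROM headword h;
""")

entries = {row[0]: row[1:] for row in conn.execute("SELECT * FROM dictionary_entry")}
for spelling, (prons, senses) in sorted(entries.items()):
    print(spelling, "|", prons, "|", senses)
```

Nothing in the stored tables had to change to produce this entry-shaped view; a different view could present the same rows in a completely different shape.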

2.3 Myth: The More Complex the Data, the More Tables Are Required

This is a gratuitous and unsupportable claim.  The apparent complexity of the data has nothing to do with the number of tables in an appropriately normalized relational database.  The number of tables in a database is determined by how data is modeled and normalized regardless of its apparent complexity.

 “To avoid massive duplication of data, the information must be split across several tables, thus fragmenting the view of the data. The more complex the data, the more tables are required, and the more fragmented the view.” [7]

2.4 Myth: Relational Databases Cannot Represent Hierarchical Data

This is an unsupportable claim.  There are a number of commonly used ways to capture hierarchically structured data in relational databases, including recursive relationships and self-identifying entities.  Any data, hierarchical or not, can be modeled relationally.

“…the relational model cannot capture the obvious hierarchy in most dictionary entries.” [8]

“Michael Stonebraker recently argued that the traditional database concept of ‘one size fits all’ is no longer applicable in the database market. Nowhere is this more true than with scientific data. Scientific data is different from business data, for which current [relational] database technology has been developed. Much of scientific data is tree structured because it models an inherently hierarchical process or object.” [9]
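A recursive relationship, where each row may point to a parent row in the same table, is one of the common techniques mentioned above. Standard SQL can traverse it with a recursive common table expression (`WITH RECURSIVE`), which SQLite supports. The table and data below are a hypothetical slice of a part-of-speech hierarchy, used only to illustrate the technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    parent_id    INTEGER REFERENCES attribute(attribute_id)   -- recursive relationship
);
INSERT INTO attribute VALUES
    (1, 'Part of Speech', NULL),
    (2, 'Noun', 1), (3, 'Proper', 2), (4, 'Common', 2),
    (5, 'Countable', 4), (6, 'Uncountable', 4),
    (7, 'Verb', 1);
""")

# Walk upward from one attribute to the root.
ANCESTORS = """
WITH RECURSIVE chain(attribute_id, name, parent_id, depth) AS (
    SELECT attribute_id, name, parent_id, 0 FROM attribute WHERE name = ?
    UNION ALL
    SELECT a.attribute_id, a.name, a.parent_id, c.depth + 1
    FROM attribute a JOIN chain c ON a.attribute_id = c.parent_id
)
SELECT name FROM chain ORDER BY depth DESC
"""

# Walk downward from one attribute to every descendant at any depth.
DESCENDANTS = """
WITH RECURSIVE sub(attribute_id, name, depth) AS (
    SELECT attribute_id, name, 0 FROM attribute WHERE name = ?
    UNION ALL
    SELECT a.attribute_id, a.name, s.depth + 1
    FROM attribute a JOIN sub s ON a.parent_id = s.attribute_id
)
SELECT name FROM sub WHERE depth > 0 ORDER BY name
"""

path = [r[0] for r in conn.execute(ANCESTORS, ("Countable",))]
below_noun = [r[0] for r in conn.execute(DESCENDANTS, ("Noun",))]
print(" > ".join(path))  # Part of Speech > Noun > Common > Countable
print(below_noun)        # ['Common', 'Countable', 'Proper', 'Uncountable']
```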

 

3 The Database Normalization Imperative

Few people, even in the field of computer science, understand or appreciate the critical importance of data modeling and properly normalizing databases.  Understanding this critical importance is necessary to understand and appreciate the significance of NLDB’s potential contribution to and impact on the creation of viable new natural language applications, accurate machine translation and many other practical software applications.  A vital dependency exists between proper database design and the functionality required to support practical language applications.

3.1 The Main Benefits of a Properly Normalized Relational Database

One main benefit of a normalized relational database comes from the ability to store a distinct piece of data in one specific, distinct and unique place once and only once.  This simplifies and speeds data storage and retrieval, eliminates duplicate and redundant data, eases the data maintenance burden, reduces or eliminates inconsistent data and allows for the efficient use of storage space.
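The maintenance benefit of storing a value once can be shown by correcting a misspelled language name in an un-normalized table and in a normalized one. This is a hypothetical SQLite illustration: in the flat table every repeated copy has to be updated (and any copy missed becomes inconsistent data), whereas the normalized design changes a single row and every word sees the correction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Un-normalized: the language name is repeated on every word row.
CREATE TABLE flat_word (spelling TEXT, language_name TEXT);
INSERT INTO flat_word VALUES ('chat', 'Franch'), ('chien', 'Franch'), ('maison', 'Franch');

-- Normalized: the language name is stored once and referenced by key.
CREATE TABLE language (language_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE word (spelling TEXT, language_id INTEGER REFERENCES language(language_id));
INSERT INTO language VALUES (1, 'Franch');
INSERT INTO word VALUES ('chat', 1), ('chien', 1), ('maison', 1);
""")

# Correct the misspelled language name in both designs.
flat_rows_changed = conn.execute(
    "UPDATE flat_word SET language_name = 'French' WHERE language_name = 'Franch'").rowcount
normalized_rows_changed = conn.execute(
    "UPDATE language SET name = 'French' WHERE name = 'Franch'").rowcount

names_seen = {r[0] for r in conn.execute(
    "SELECT l.name FROM word w JOIN language l ON l.language_id = w.language_id")}
print(flat_rows_changed, normalized_rows_changed, names_seen)  # 3 1 {'French'}
```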

Another main benefit comes from the ability to retrieve data easily in virtually any conceivable way via SQL.  SQL is a standardized language for manipulating data in relational databases through a Relational Database Management System (RDBMS).  SQL has been adopted by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) as the standard data access language.

 

4 The Problem with Machine-Readable Dictionaries and Lexical Text-Files

There are many different machine-readable dictionaries, thesauri, lexical data files and annotated corpora currently available.  They contain varying kinds of data, including grammatical, morphological, syntactic, semantic and lexical data.  They generally consist of some data text files and some custom programming to provide data access.  I have reverse-engineered several of these to expose their underlying data structures and inner workings.  The ways in which their data is structured and accessed limit their utility to fairly simple searches.  Table 1 identifies some of the more popular of these data sources.

 

Acronym    Description
BNC        British National Corpus, World Edition, on CD-ROM
CELEX2     The Linguistic Data Consortium’s Lexical Database
OED        Oxford English Dictionary
WordNet    Princeton University’s Lexical Database for the English Language
MOBY       Lexical Data Files, University of Sheffield
DCPSE      Diachronic Corpus of Present-Day Spoken English
ICE        The International Corpus of English
COMLEX     The Linguistic Data Consortium’s English Dictionary
SUSANNE    The SUSANNE Corpus and Analytic Scheme

Table 1

Critical and significant differences separate and distinguish NLDB from these text-file systems.  Most notable of these differences are: 1) the content of the data, 2) the way the data is structured and 3) the methods of data access.   A common misconception is that all electronic representations of rows and columns of data are of equal utility.  This could not be farther from the truth.  Not all databases are created equal.  A poorly designed database severely limits the ability to extract and represent meaningful data.  A properly designed database allows data to be efficiently extracted and represented in any conceivable way. 

Some refer to these text-file systems as databases, or even as relational databases. Although some of these systems can be considered databases in a rudimentary sense of the word, none of those I have worked with is normalized in any way or relies on an RDBMS.  They generally violate first normal form and all subsequent normal forms.  Consequently, the meaningful separation of disparate categories of data, and their relationships to one another, are lost or obscured.  This necessitates that each text-file system come with an associated computer program custom-built to operate only on its own text files. These custom-built programs are limited to a predefined set of operations on their respective text files.

Properly normalized databases are designed so that the rules that relate their various tables to one another are contained within the structure of the database itself.   The text-file systems I have seen obscure many or all of these rules by embedding them in the code of their associated programs.  This obfuscation, combined with their un-normalized file structures, renders SQL useless or problematic as a means of extracting and representing meaningful data.  It also limits the data’s utility to the set of functions defined by the associated programs.
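A typical first-normal-form violation in a lexical text file is a field holding a repeating group, such as several parts of speech packed into one column. The sketch below assumes a hypothetical tab-separated line format (not the format of any specific product in Table 1) and splits each repeating group into atomic rows, after which a cross-cutting question becomes an ordinary SQL query instead of a feature of a custom program.

```python
import sqlite3

# Hypothetical text-file lines: a headword, a tab, then a comma-separated
# repeating group of parts of speech (a violation of first normal form).
RAW_LINES = [
    "play\tnoun,verb",
    "moon\tnoun,verb",
    "quickly\tadverb",
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE word_pos (
    spelling TEXT NOT NULL,
    pos      TEXT NOT NULL,
    PRIMARY KEY (spelling, pos))""")

for line in RAW_LINES:
    spelling, pos_field = line.split("\t")
    # One atomic value per row: split the repeating group apart.
    conn.executemany("INSERT INTO word_pos VALUES (?, ?)",
                     [(spelling, pos.strip()) for pos in pos_field.split(",")])

# A question the flat file could only answer with custom code is now plain SQL.
noun_and_verb = [r[0] for r in conn.execute("""
    SELECT spelling FROM word_pos WHERE pos IN ('noun', 'verb')
    GROUP BY spelling HAVING COUNT(DISTINCT pos) = 2 ORDER BY spelling""")]
print(noun_and_verb)  # ['moon', 'play']
```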
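Keeping a relating rule inside the database structure, rather than in application code, can be illustrated with a foreign-key constraint: the RDBMS itself refuses a word that points to a language that does not exist, whichever program issues the insert. In this hypothetical SQLite sketch, note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE language (language_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE word (
    word_id     INTEGER PRIMARY KEY,
    spelling    TEXT NOT NULL,
    language_id INTEGER NOT NULL REFERENCES language(language_id)
);
INSERT INTO language VALUES (1, 'English');
""")

def try_insert(spelling, language_id):
    """Return True if the database accepts the row, False if a schema rule rejects it."""
    try:
        conn.execute("INSERT INTO word (spelling, language_id) VALUES (?, ?)",
                     (spelling, language_id))
        return True
    except sqlite3.IntegrityError:
        return False

ok_result = try_insert("moon", 1)    # language 1 exists
bad_result = try_insert("lune", 99)  # no language 99: rejected by the schema itself
print(ok_result, bad_result)         # True False
```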

5 NLDB Data Categories and Database Design

NLDB is designed very specifically to manipulate, store and retrieve any natural language content, structures and rules.  In this section I explain the main categories of data that NLDB stores and the relationships that these categories have with one another.  The content, structures and rules of natural languages are both represented in and reflected by the content and structure of NLDB database tables and their relationships with one another.  NLDB currently stores all WordNet 3.0 synsets in four different tables, none of which has more than six columns.  This WordNet data can not only be represented as it is in WordNet, but can also be queried via SQL in any conceivable way.

5.1 Data Driven Design

A good database design should be data-driven.  Being data-driven means that database tables and relationships should not change with the introduction of new kinds of data.  The population of NLDB with new kinds of data has no bearing on the physical structure of the database.  It is unnecessary to add or change NLDB tables as a result of adding any new data or new categories of data.

A benefit of this generalized, flexible approach to database design is that it allows NLDB to be populated with many disparate classes of attributes, including Part of Speech, Language, Word Sense, Semantic Relations, Phonetic Pronunciation, etc.  As with any database, the quality of its results is determined by the quality of its data: the ‘Garbage In, Garbage Out’ (GIGO) principle applies to NLDB as it does to anything else.


5.2 Data Categories

Table 2 illustrates some data categories that NLDB can store, along with a brief description of each category’s purpose and, where appropriate, a small sample of the category’s data. These categories are not physical database tables. There is no one-to-one correspondence between these categories and NLDB physical tables; they are only conceptual representations of some data categories.

 

Category: Word
Purpose: Identifies a unique and distinct spelling of a word in a language, along with its specific sense and a main part of speech.  A word can also be identified by a virtually infinite number of different attributes and classes of attributes.

Category: Language
Purpose: Identifies the unique and distinct set of languages.
Example: 'French', 'German', 'English'…

Category: Spelling
Purpose: Identifies the unique and distinct set of symbols or spellings of words in one or more languages.
Example: 'Aborigine', 'аудитория', 'Чайковский', '关系数据库'…

Category: StructureType
Purpose: Identifies the unique and distinct set of structure type categories and structure types that a language structure can have.  This category can also represent infinitely nestable structure type categories and structure types.
Example: 'Word', 'Phrase', 'Declarative Sentence', 'Paragraph', 'Article', 'Book', 'Conversation'…

Category: StructureRelation
Purpose: Identifies the unique and distinct set of relationships that language structures can have with one another.  Any structure can have any number of relationships, of any type, with any other structure.
Example: 'Equivalent', 'Question', 'Answer', 'Member of'…

Category: Phoneme
Purpose: Identifies the unique and distinct set of sequenced phonograms that combine to form the pronunciation(s) of a word, represented in phonetic symbols.
Example: (t-mt, -mä-), (kn-found, kn-)…

Category: Template
Purpose: Identifies the unique and distinct set of "skeletal" language structures.  A [Template] is an abstract, well-formed grammatical sentence, phrase or other structure that is divorced from, yet associated with, any actual words, and that contains all the salient attributes of the words and the structure. It can be used to identify the actual words that are allowable at any particular position in the sequence of that [Template].

Category: WordRelation
Purpose: Identifies the unique and distinct set of types of relationships that one word can have with another word.
Example: 'Homonym', 'Hypernym', 'Hyperonym', 'Hyponym', 'Synonym', 'Antonym'…

Category: WordSense
Purpose: Identifies the unique and distinct set of senses that a word can have in any language.
Example: '(a theatrical performance of a drama) "the play lasted two hours"'…

Table 2

5.3 The [Attribute] Entities

Any number of NLDB attributes can specify: a structure; a structure’s relationship with another structure or word; a word; a word’s relationship with another word or structure; another attribute; or an attribute’s relationship with another attribute, word or structure.  These attribute entities represent unique and distinct attributes and attribute hierarchies.  The hierarchical nature of these entities allows for any number of new and disparate attributes, attribute categories or new hierarchies of attribute categories.

Many disparate kinds of attributes can be stored in the [Attribute] table.  One alternative to storing them all in one [Attribute] table would be to segment the database structure into disparate, discrete, predefined attribute tables such as [WordRelation], [Spelling], [PartOfSpeech], [Tense], [WordSense], [Language] and [StructureRelation].  The problem with this approach is that the database structure is no longer data-driven: if a user needs to add a new category of attributes that does not belong in any of these predefined tables, the user is forced to alter the database structure by adding a new table and changing the structure of other tables to accommodate it.   Likewise, if the user needs to create more than one new attribute category and organize them into a specific hierarchy, the user is forced to alter the database structure even more, adding a new table for each category and changing the structures of other tables to represent the hierarchy.  This aspect of relational design explains why many attempts at relational lexical knowledge representation fail.

It is critically important to be able to add any number of new attributes, any number of new attribute categories and any number and organization of new hierarchies of attribute categories without having to change the structure of the database.  One [Attribute] table must therefore be able to represent all of them.  For this reason, the [Attribute] tables are used to store many disparate, unique and distinct categories of data including, but not limited to, [PartOfSpeech], [Language], [Spelling], [Phonogram], [StructureRelation], [StructureType], [WordSense], [SemanticRelation] and [PhoneticPronunciation].
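The data-driven argument above can be checked mechanically: adding a brand-new attribute category, with its own nested sub-hierarchy, should require only INSERT statements and no CREATE TABLE or ALTER TABLE. This SQLite sketch uses a hypothetical single-table attribute hierarchy and counts the tables in the schema before and after; it assumes attribute names are unique so that lookup by name is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    parent_id    INTEGER REFERENCES attribute(attribute_id)
);
INSERT INTO attribute (attribute_id, name, parent_id) VALUES
    (1, 'Attribute', NULL), (2, 'Language', 1), (3, 'English', 2);
""")

def table_count():
    return conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]

def add_attribute(name, parent_name):
    (parent_id,) = conn.execute(
        "SELECT attribute_id FROM attribute WHERE name = ?", (parent_name,)).fetchone()
    conn.execute("INSERT INTO attribute (name, parent_id) VALUES (?, ?)",
                 (name, parent_id))

before = table_count()
# A new category with a nested hierarchy, added purely as data.
add_attribute("Verb Feature", "Attribute")
add_attribute("Tense", "Verb Feature")
add_attribute("Past", "Tense")
add_attribute("Aspect", "Verb Feature")
add_attribute("Perfect", "Aspect")
after = table_count()

print(before, after)  # 1 1: the schema did not change
```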

This Entity Relationship Diagram (ERD) illustrates the deceptively simple structure of the [Attribute] entities.

Table 3 illustrates how data in the attribute tables can be used to represent any attributes hierarchically.

 

Attribute
    Language
        French
        German…
    Spelling
        Aborigine
        аудитория…
    StructureType
        Paragraph
        Declarative Sentence…
    WordRelation
        Homonym
        Meronym…
    WordSense
        Play: a theatrical performance of…
        Play: a preset plan of action...
    Part of Speech
        Noun
            Proper
            Common
                Countable
                Uncountable
            Concrete
            Abstract

Table 3
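An indented listing like Table 3 can be produced directly from attribute rows with a recursive query. In this sketch (SQLite, hypothetical schema) each row stores its parent and its position among siblings, and a zero-padded path string is built during recursion so that sorting by it yields depth-first order; the padding assumes fewer than 10,000 siblings per parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    parent_id    INTEGER REFERENCES attribute(attribute_id),
    seq          INTEGER NOT NULL          -- position among siblings
);
INSERT INTO attribute VALUES
    (1, 'Attribute', NULL, 1),
    (2, 'Language', 1, 1), (3, 'French', 2, 1), (4, 'German', 2, 2),
    (5, 'Part of Speech', 1, 2), (6, 'Noun', 5, 1),
    (7, 'Proper', 6, 1), (8, 'Common', 6, 2),
    (9, 'Countable', 8, 1), (10, 'Uncountable', 8, 2),
    (11, 'Concrete', 6, 3), (12, 'Abstract', 6, 4);
""")

TREE = """
WITH RECURSIVE tree(attribute_id, name, depth, sort_key) AS (
    SELECT attribute_id, name, 0, printf('%04d', seq)
    FROM attribute WHERE parent_id IS NULL
    UNION ALL
    SELECT a.attribute_id, a.name, t.depth + 1,
           t.sort_key || '.' || printf('%04d', a.seq)
    FROM attribute a JOIN tree t ON a.parent_id = t.attribute_id
)
SELECT depth, name FROM tree ORDER BY sort_key
"""

# Indent four spaces per level, as in Table 3.
lines = ["    " * depth + name for depth, name in conn.execute(TREE)]
print("\n".join(lines))
```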

5.4 The [Structure] Entities

NLDB’s [Structure] entities constitute a language taxonomy specifically designed to store the surface and underlying structures of Natural Languages.  After designing NLDB’s [Structure] tables, I came across the SUSANNE Analytic Scheme created by Dr. Geoffrey Sampson of the University of Sussex. The SUSANNE Analytic Scheme is a comprehensive language-engineering-oriented taxonomy and annotation scheme for the (logical and surface) grammar of English.[10]  The annotated SUSANNE Corpus is a 130,000-word cross-section of written American English annotated in accordance with the scheme.  The name ‘SUSANNE’ stands for ‘Surface and underlying structural analysis of natural English’.[11]  

“SUSANNE scheme is so far as I am aware the first serious attempt anywhere to produce a comprehensive, fully explicit annotation scheme for English grammatical structure. It has won praise internationally” [12]

I discovered that NLDB required no modification to import and integrate the structures and contents of both the SUSANNE Analytic Scheme and the annotated SUSANNE Corpus.  Although NLDB can accommodate this data, it was not originally designed to do so.  NLDB’s [Structure] tables are generalized so that they can store SUSANNE Scheme tags as attributes as well as any other attributes, attribute categories and hierarchies of attribute categories which need to be added in the course of dealing with additional phenomena of natural languages. 

As you can see, the three sentences below share an identical sentence structure.  Table 4 graphically illustrates the abstract, well-formed grammatical sentence structure, or [Template], divorced from any actual words.  This ‘skeletal’ structure can identify all the salient attributes of the words and the structure, and it can be used to identify any actual words that are allowable at any particular position in the template’s sequence.  By placing different allowable words in a well-formed [Template], a virtually infinite number of sentences could be created automatically from this single [Template].

 

Actual Sentences

Position      1      2      3         4         5      6
Sentence 1    The    cow    jumped    over      the    moon
Sentence 2    A      cat    crawled   under     the    car
Sentence 3    The    boy    hopped    across    the    street

Structure attributes: Structure Type: Sentence; Illocution: Assertion; Voice: Active

Phrases: Noun Phrase (1-2); Verb Phrase (3-6); Prepositional Phrase (4-6); Noun Phrase (5-6)

Functions: Subject (1-2); Simple Subject (2); Predicate (3-6); Simple Predicate (3); Object (5-6)

Word templates by position:
1  Determiner: Article
2  Noun; Noun: Concrete: Object: Life Form; Noun: Common: Countable; Noun: Singular
3  Verb; Verb: Action: Motion; Verb: Number: Singular; Verb: Regular; Verb: Transitive; Verb: Indicative: Past: Simple: Active
4  Preposition: Self: Motion: Area (see Preposition Project)
5  Determiner
6  Noun; Noun: Concrete: Object: Non-Life Form; Noun: Common: Countable; Noun: Singular

SUSANNE Scheme tags: [O[S[Ns:s.Ns:s][Vd.Vd][P:p.[Ns.Ns]P:p]

Position      1            2         3         4        5       6
Bracketing    [O[S[Ns:s.   .Ns:s]    [Vd.Vd]   [P:p.    [Ns.    .Ns]P:p]
Word tag      AT           NN1c      VVDv      II       AT      NN1c

Table 4
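The remark that many sentences could be generated automatically from one [Template] amounts to taking the Cartesian product of the allowable words at each position. The word lists below are hypothetical, drawn only from the three sentences in Table 4; in a full system each list would come from a query for words whose attributes match that position's word template.

```python
from itertools import product

# Hypothetical allowable words for each position of the six-slot template in Table 4.
SLOTS = [
    ["The", "A"],                       # 1 Determiner: Article
    ["cow", "cat", "boy"],              # 2 Noun: Concrete: Object: Life Form
    ["jumped", "crawled", "hopped"],    # 3 Verb: Action: Motion, past
    ["over", "under", "across"],        # 4 Preposition: Self: Motion: Area
    ["the"],                            # 5 Determiner
    ["moon", "car", "street"],          # 6 Noun: Concrete: Object: Non-Life Form
]

def generate(slots):
    """Yield every sentence formed by filling each position with one allowable word."""
    for words in product(*slots):
        yield " ".join(words)

sentences = list(generate(SLOTS))
print(len(sentences))  # 162 (2 * 3 * 3 * 3 * 1 * 3)
print(sentences[0])    # The cow jumped over the moon
```

Every generated sentence is grammatical only to the extent that the per-position word lists honour the template's attributes; the sketch does not check semantic plausibility.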

 

6 Conclusion

NLDB is a properly normalized relational database that is specifically designed to store, retrieve, manipulate and, more importantly, associate the content, rules and structures of one or more natural languages in a granular and generalized manner, allowing its data to be queried with SQL for any conceivable purpose.  Natural language content, structures and rules are represented in and reflected by the content and structure of NLDB tables and their relationships with one another. I can demonstrate that NLDB can be the fundamental data source and foundation upon which viable, practical Natural Language Processing and Computational Linguistics applications can be built.

 

6.1 NLDB Main Features and Abilities

·         Capture the structural properties of lexical data in a relational database

·         Define and store any distinct natural language structures divorced from, yet associated with, actual words

·         Map language structures in one language to equivalent structures in any other language

·         Map question structures to answer structures

·         Define different types of relationships between structures

·         Define any attributes and organize them in zero or more hierarchical levels

·         Associate any attribute or attribute hierarchy with any structure or any association between structures.

·         Query using standard SQL.
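Mapping a structure to an equivalent structure in another language, or a question structure to an answer structure, can be sketched as a typed relation table between structure rows, using relation names from Table 2. The schema and the text patterns below are hypothetical simplifications in SQLite; relations are stored in one direction only, so a symmetric relation such as 'Equivalent' would need either two rows or a query that checks both directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structure (
    structure_id INTEGER PRIMARY KEY,
    language     TEXT NOT NULL,
    pattern      TEXT NOT NULL     -- simplified stand-in for an ordered template sequence
);
CREATE TABLE structure_relation (
    from_id  INTEGER NOT NULL REFERENCES structure(structure_id),
    to_id    INTEGER NOT NULL REFERENCES structure(structure_id),
    relation TEXT NOT NULL,
    PRIMARY KEY (from_id, to_id, relation)
);
INSERT INTO structure VALUES
    (1, 'English', 'Det Noun Verb.past Prep Det Noun'),
    (2, 'French',  'Det Noun Aux Verb.participle Prep Det Noun'),
    (3, 'English', 'Aux.did Det Noun Verb.base Prep Det Noun ?');
INSERT INTO structure_relation VALUES
    (1, 2, 'Equivalent'),
    (3, 1, 'Answer');
""")

def related(structure_id, relation):
    """(language, pattern) of structures linked from structure_id by the given relation."""
    return conn.execute("""
        SELECT s.language, s.pattern
        FROM structure_relation r JOIN structure s ON s.structure_id = r.to_id
        WHERE r.from_id = ? AND r.relation = ?""", (structure_id, relation)).fetchall()

print(related(1, "Equivalent"))  # [('French', 'Det Noun Aux Verb.participle Prep Det Noun')]
print(related(3, "Answer"))      # [('English', 'Det Noun Verb.past Prep Det Noun')]
```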

6.2 NLDB Potential Uses

·         Accurate Machine Translation: The ability to hard-map structures, together with their word senses, to equivalent structures in other languages.

·         Natural Language Understanding: The ability to hard-map question structures (divorced from, yet associated with, their words) to answer structures.

·         Querying of any lexical information in any conceivable way.

·         Unambiguous Synonym Replacement: Re-write a poem by replacing each of its words that has synonyms with a synonym, while maintaining the rhyme, cadence and grammatical structure of the original poem.

·         Re-write a book by replacing each of its words that has synonyms with a synonym, while maintaining the meaning and grammatical structure of the original text.

·         Text Generation from Pre-Defined Language Structures: Compose a new, grammatically or colloquially correct novel as a composite of identified salient factors that are mostly or completely consistent across a genre (for example, all Harlequin romance novels), such as sentence structures, main events, word usage and similar-yet-different nouns such as place names or people’s names.

·         Accurate Grammar Checking and Correction in multiple languages.

7 Appendices

7.1 Appendix A: NLDB Physical Data Model

7.2 Appendix B: NLDB Data Dictionary

The NLDB Data Dictionary has been omitted to keep this paper within the twelve-page limit.  It is available upon request by email at CompLing@Comcast.Net or at this [link].

7.3 Appendix C: NLDB SQL Creation Script

The NLDB SQL Creation Script has been omitted to keep this paper within the twelve-page limit.  It is available upon request by email at CompLing@Comcast.Net or at this [link].

8 Endnotes



[1] Sampson, Geoffrey (2006). “The SUSANNE Analytic Scheme: The Need for Grammatical Taxonomy.”

[2] Hayashi, Larry S.; Hatton, John (SIL International) (2001). “Combining UML, XML and Relational Database Technologies: The Best of All Worlds for Robust Linguistic Databases.” Pages 115-124 in Proceedings of the IRCS Workshop on Linguistic Databases (11-13 December 2001, University of Pennsylvania, Philadelphia, USA). [February 11, 2002]

[3] Sampson, Geoffrey (1995). English for the Computer.

[4] Zheng, Yifeng (2006). “Research Statement.”

[5] Ide, Nancy; Le Maitre, Jacques; Véronis, Jean (1999). “Outline of a Model for Lexical Databases.”

[6] Ide, Nancy; Le Maitre, Jacques; Véronis, Jean (1999). “Outline of a Model for Lexical Databases.”

[7] Ide, Nancy; Le Maitre, Jacques; Véronis, Jean (1999). “Outline of a Model for Lexical Databases.”

[8] Neff, Mary S.; Byrd, Roy J.; Rizk, Omneya A. (1988). “Creating and Querying Lexical Data Bases.” ANLP 1988.

[9] Zheng, Yifeng (2007). “Efficient Scientific Data Management over Trees.”

[10] Sampson, Geoffrey (2006). “The SUSANNE Analytic Scheme: The Need for Language Taxonomy.”

[11] Sampson, Geoffrey (2006). “The SUSANNE Analytic Scheme: The Need for Language Taxonomy.”

[12] Sampson, Geoffrey (1995). English for the Computer.