Archive for January, 2010

“Scrambling and phrase structure in synchronic and diachronic perspective” (a new PhD dissertation)

January 17, 2010

A new PhD dissertation by Joel C. Wallenberg at the University of Pennsylvania.

Title: “Antisymmetry and the Conservation of c-command: Scrambling and Phrase Structure in Synchronic and Diachronic Perspective.”

Linguistic Web Initiative

January 10, 2010

Language Faculty

The Language Faculty is the unique natural human ability to process complex syntactic structures. Together with the “conceptual-intentional” and “sensorimotor” subsystems, it underlies the human abilities of understanding and speech.

The Language Faculty is studied by theoretical linguists.

Theoretical Linguistics

After many years of intensive research in the field of theoretical linguistics, there has been significant progress in the deciphering of the general properties of human language. The recent expansion of research beyond English and other familiar European languages has enabled the refinement and verification of the central discoveries of theoretical linguistics.

Software startups compete over how many decades of fruitful research in theoretical linguistics they choose to ignore. A typical startup skips four or five. One of the leading linguistic companies is way ahead: it uses a linguistic parser that implements the state-of-the-art linguistic framework of the late eighties…

Meaning Description Language

Relying on outdated theoretical frameworks is not the only problem with today’s linguistic technologies. Another problem is that the output they produce is a complex, model-dependent parse tree. This gives application developers no significant advantage over working with plain text directly.

The progress achieved by theoretical linguistics in the study of the human Language Faculty has opened up the possibility of expressing at least part of the meaning of natural language phrases in a way that is precise and scientifically sound.

An early version of this idea was awarded the 2007 Horizon Award by Computerworld.

Linguistic Web

Standardization of the Meaning Description Language will enable a clear division of labor between providers of linguistic analysis technology on the one side and application developers on the other. Linguistic intelligence companies will provide the service of translating natural language into the standard formal meaning description language, and language-oriented applications will put these results to practical use. This, in turn, makes the Linguistic Web possible.
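
To make this division of labor concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names MeaningStatement, MeaningProvider, ToyProvider and who_did are hypothetical, and the predicate-plus-roles format is a stand-in, not the actual Meaning Description Language. The point is only that the application code depends on the shared format and never on the parser behind it.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class MeaningStatement:
    """Hypothetical unit of a standard meaning description."""
    predicate: str
    roles: Dict[str, str]  # role name -> filler, e.g. {"agent": "Microsoft"}


class MeaningProvider(Protocol):
    """What any linguistic analysis provider would have to offer (assumed)."""
    def describe(self, text: str) -> List[MeaningStatement]: ...


class ToyProvider:
    """Stand-in for a real provider; only handles 'Subject verb object.'"""
    def describe(self, text: str) -> List[MeaningStatement]:
        subject, verb, obj = text.rstrip(".").split(" ", 2)
        return [MeaningStatement(verb, {"agent": subject, "theme": obj})]


def who_did(provider: MeaningProvider, text: str, predicate: str) -> List[str]:
    """Application-side code: it sees only the standard format."""
    return [
        s.roles["agent"]
        for s in provider.describe(text)
        if s.predicate == predicate and "agent" in s.roles
    ]


agents = who_did(ToyProvider(), "Microsoft acquired Powerset.", "acquired")
```

Swapping ToyProvider for a provider backed by a real parser would leave who_did unchanged, which is the kind of separation a standard format is meant to buy.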

The Scientific Infrastructure for the Linguistic Web

January 5, 2010

There are two major pre-requisites for the emergence of the Linguistic Web:

1) A solid Linguistic Parser

2) An extensive Lexical Semantic Database

Not only do these two elements need to exist – they must also be generally available to developers worldwide. We will now examine the current status of each of these two crucial components of the Linguistic Web.

Linguistic Parser

Just about everybody has heard the big buzz generated by Microsoft’s acquisition of Powerset. It is just one example of the scale of effort needed to develop realistic, industrial-strength linguistic software.

Scientific linguistic technology requires a very long period of development and significant financial investment. Powerset’s collection of natural language technologies incorporates over 25 years of intense scientific research, which originated at PARC (Palo Alto Research Center).

After all this effort invested in research and development, it is not clear what Microsoft’s policy will be with regard to making its sophisticated linguistic platform generally available.

And yet, there are other players on the block with advanced linguistic parsers, who just may go ahead and make available the scientific technology that the Linguistic Web needs for the massive production of language-oriented applications.

Lexical Semantic Database

To meet the requirements of the Linguistic Web, any Semantic Ontology must be constructed in terms of the natural semantic concepts used by the Language Faculty (FL), the basis for the inborn human ability to process language.

What is needed is something like the “Semantic Map” developed by Cognition Technologies. It took more than 20 years to build, and is probably the largest scientific linguistic database for English in the industry.

Is there a comparable Lexical Semantic Database available for everyone? Not at the moment. Perhaps what is needed is a Wikipedia-type collective effort in order to build a global Lexical Semantic Database. Obviously, this effort must be in sync with the accumulated insights of the last 60 years of intense research in theoretical linguistics. 

Conclusions

Any way you look at it, an infrastructure which contains both a Linguistic Parser and a Lexical Semantic Database will be needed in order to jumpstart the Linguistic Web.

Imagine the economic impact of all these various natural language solutions, in all the major languages, being developed worldwide, all using the same underlying linguistic platform. Of course, a new standard format for the representation of Natural Language Objects will also be necessary, but this is a subject for another posting.