WHAT WE MEAN BY COMPUTATIONAL LINGUISTICS

Computational linguistics can be regarded as a synonym for the automatic processing of natural language, since its main task is precisely the construction of computer programs that process words and texts in natural language.

The processing of natural language should be considered here in a very broad sense that will be discussed later.

Actually, this course is slightly “more linguistic than computational,” for the following reasons:

· We are mainly interested in the formal description of language relevant to automatic language processing, rather than in purely algorithmic issues. The algorithms, the corresponding programs, and the programming technologies can vary, while the basic linguistic principles and methods of their description are much more stable.

· In addition to some purely computational issues, we touch upon issues that are related to computer science only indirectly. A broader set of notions and models from general linguistics and mathematical linguistics is described below.

For the purposes of this course, it is also useful to draw a line between the issues in text processing that we consider linguistic (and thus will discuss below) and those that we do not. In our opinion, for a computer system, or a part of it, to be considered linguistic, it should use some data or procedures that are:

· language-dependent, i.e., change from one natural language to another,

· large, i.e., require a significant amount of work for compilation.

Thus, not every program dealing with natural language texts is related to linguistics. Although simple text editors such as Windows Notepad do process texts in natural language, we do not consider them linguistic software, since they are not sufficiently language-dependent: they can be used equally well for Spanish, English, or Russian texts, after some adjustments to the alphabet.

Let us give another example: some word processors can hyphenate words using information about the vowels and consonants of a specific alphabet and about syllable formation in a specific language. Thus, they are language-dependent. However, they do not rely on sufficiently large linguistic resources. Therefore, simple hyphenation programs only border on software that can be considered linguistic proper. Spell checkers that use a large word list and complicated morphological tables, by contrast, are linguistic programs in the full sense.
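
The contrast between the two criteria can be made concrete with a small sketch in Python. It is only an illustration under stated assumptions, not a description of any real word processor: the five-vowel alphabet, the single break rule (a break before a lone consonant between two vowels), the three-word lexicon, and the function names hyphenation_points, hyphenate, and is_known are all hypothetical. The hyphenator is language-dependent but needs only a handful of rules, whereas the spell checker is useful only once its lexicon has been compiled at great cost.

```python
# Illustrative only: a toy contrast between a small rule-based hyphenator
# and a word-list-based spell checker. All names and data are hypothetical.

VOWELS = set("aeiou")  # assumption: a simplified five-vowel alphabet


def hyphenation_points(word):
    """Return the indices where a break is allowed under one toy rule:
    break before a single consonant standing between two vowels (V-CV)."""
    w = word.lower()
    points = []
    for i in range(1, len(w) - 1):
        if w[i] not in VOWELS and w[i - 1] in VOWELS and w[i + 1] in VOWELS:
            points.append(i)
    return points


def hyphenate(word):
    """Insert hyphens at the break points found by the toy rule."""
    pieces = []
    start = 0
    for i in hyphenation_points(word):
        pieces.append(word[start:i])
        start = i
    pieces.append(word[start:])
    return "-".join(pieces)


# A realistic spell checker would need a large lexicon and morphological
# tables; this three-word set is only a stand-in for that resource.
TOY_LEXICON = {"casa", "mesa", "pelota"}


def is_known(word, lexicon=TOY_LEXICON):
    """Check a word against the lexicon (the large, language-dependent part)."""
    return word.lower() in lexicon


if __name__ == "__main__":
    print(hyphenate("pelota"))  # small rule set, no dictionary needed
    print(is_known("mesa"), is_known("meza"))  # depends entirely on the lexicon
```

The rule set of the hyphenator fits in a few lines and can be rewritten quickly for another language, while the quality of is_known depends entirely on how much work went into compiling the lexicon. This is the sense in which only the latter satisfies both criteria stated above.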