LINGUISTICS AND ITS STRUCTURE

Linguistics is the science of natural languages. More precisely, it covers a whole set of related sciences (see Figure I.1).

General linguistics is the nucleus [18, 36]. It studies the general structure of various natural languages and discovers the universal laws of their functioning. Many concepts from general linguistics prove necessary for any researcher who deals with natural languages. General linguistics is a fundamental science developed by many researchers over the last two centuries, and it is largely based on the methods and results of grammarians of older times, beginning with classical antiquity.

As far as general linguistics is concerned, its most important parts are the following:

· Phonology deals with the sounds composing speech, with all their similarities and differences that make it possible to form and distinguish words.

· Morphology deals with the inner structure of individual words and the laws concerning the formation of new words from pieces called morphs.

· Syntax considers the structures of sentences and the ways individual words are connected within them.

· Semantics and pragmatics are closely related. Semantics deals with the meaning of individual words and entire texts, and pragmatics studies the motivations of people to produce specific sentences or texts in a specific situation.

FIGURE I.1. Structure of linguistic science.

There are many other, more specialized, components of linguistics as a whole (see Figure I.1).

Historical, or comparative, linguistics studies the history of languages by comparing them with one another, i.e., by investigating the history of their similarities and differences. The second name reflects the fact that comparison is the main method in this branch of linguistics. Comparative linguistics is even older than general linguistics, dating back to the eighteenth century.

Many useful notions of general linguistics were adopted directly from comparative linguistics.

Historical linguistics discovered, for example, that all Romance languages (Spanish, Italian, French, Portuguese, Romanian, and several others) are descendants of Latin. All languages of the Germanic family (German, Dutch, English, Swedish, and several others) have their origins in a common language that was spoken when the Germanic tribes did not yet have any written history. A similar history was discovered for another large European family of languages, namely the Slavonic languages (Russian, Polish, Czech, Croatian, and Bulgarian, among others).

Comparative study reveals many common words and constructions within each of the mentioned families—Romance, Germanic, and Slavonic—taken separately.

At the same time, comparative study has noticed a number of similar words across these families. This finding has led to the conclusion that the mentioned families form a broader community of languages, called the Indo-European languages. Several thousand years ago, the ancestors of the people now speaking Romance, Germanic, and Slavonic languages in Europe probably formed a common tribe or related tribes.

Historical studies also make it possible to explain why English has so many words in common with the Romance family, or why Romanian has so many Slavonic words (these are referred to as loan words).

Comparative linguistics allows us to predict the elements of one language based on our knowledge of another related language. For example, it is easy to guess the unknown word in the following table of analogy:

 

Spanish          English
constitución     constitution
revolución       revolution
investigación    ?
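
The table of analogy above can be read as a suffix correspondence. The following is only a minimal sketch under that assumption: the rule list `SUFFIX_RULES` and the function `guess_english` are hypothetical names introduced for illustration, and a single rule inferred from two word pairs is of course far from a real model of the relations between cognate languages.

```python
from typing import List, Optional, Tuple

# Assumed rule, inferred from the two complete rows of the table:
# Spanish "-ción" corresponds to English "-tion".
SUFFIX_RULES: List[Tuple[str, str]] = [("ción", "tion")]


def guess_english(spanish_word: str) -> Optional[str]:
    """Guess an English counterpart by suffix analogy; None if no rule applies."""
    for es_suffix, en_suffix in SUFFIX_RULES:
        if spanish_word.endswith(es_suffix):
            return spanish_word[: -len(es_suffix)] + en_suffix
    return None


# The rule reproduces the known pairs from the table.
known_pairs = {"constitución": "constitution", "revolución": "revolution"}
for spanish, english in known_pairs.items():
    assert guess_english(spanish) == english

# Apply the same rule to the word whose counterpart is left open in the table.
print(guess_english("investigación"))
```

A word that matches no rule yields `None` rather than a guess, which keeps the sketch from inventing forms that the analogy does not support.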

 

Based on more complicated phonological laws, it is even possible to predict the pronunciation of the French word for the Spanish agua (namely [o], written eau), though at first glance these two words are quite different (in fact, both were derived from the Latin word aqua).

Computational linguistics can appeal to diachrony, i.e., to the history of a language, but usually only to motivate purely synchronic models. History sometimes gives good suggestions for describing the current state of a language, helping the researcher to understand its structure.

Contrastive linguistics, or linguistic typology, classifies languages according to the similarity of their features, regardless of their origin. The following are examples of classifications of languages that are not connected with their origin.

Some languages use articles (like a and the in English) as an auxiliary part of speech to express definite or indefinite use of nouns. (A part of speech is defined as a large group of words having some identical morphological and syntactic properties.) Romance and Germanic languages use articles, as does Bulgarian within the Slavonic family. Meanwhile, many other languages do not have articles (nearly all of the Slavonic family and Lithuanian, among others). The presence of articles influences some other features of a language.

Some languages have so-called grammatical cases for several parts of speech (nearly all Slavonic languages, German, etc.), whereas many others do not have them (Romance languages, English from the Germanic family, Bulgarian from the Slavonic family, and so on).

Latin had a nominative (direct) case and five oblique cases: genitive, dative, accusative, ablative, and vocative. Russian also has six cases, and some of them are rather similar in function to those of Latin. Inflected parts of speech, i.e., nouns, adjectives, participles, and pronouns, have different word endings for each case.

In English, there is only one oblique case, and it is applicable only to some personal pronouns: me, us, him, her, them.

In Spanish, two oblique cases can be observed for personal pronouns, i.e., dative and accusative: le, les, me, te, nos, las, etc. Grammatical cases provide an additional means of exhibiting syntactic dependencies between words in a sentence. Thus, inflectional languages have common syntactic features.

In a large class of languages, the main type of sentence contains a syntactic subject (usually the agent of an action), a syntactic predicate (usually denoting the action itself), and a syntactic object (usually the target or patient of the action). The subject is in a standard form (i.e., in the direct, or nominative, case), whereas the object is usually in an oblique case or is part of a prepositional group. This is referred to as a non-ergative construction.

Meanwhile, many languages belonging to various other families, not cognate to each other, are classified as ergative languages. In a sentence of an ergative language, the agent of the action is in a special oblique case (called ergative), whereas the object is in a standard form. As a rough approximation, a construction similar to an ergative one can be found in the Spanish sentence Me simpatizan los vecinos, where the real agent (feeler) yo ‘I’ is used in the oblique case me, whereas the object of feeling, vecinos, stays in the standard form. All ergative languages are considered typologically similar to each other, though they might not have a single word in common. The similarity of syntactic structures unites them in a common typological group.
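
The contrast between the two constructions can be summarized in a small lookup. This is only an illustrative sketch: the function name `mark_cases` and the dictionary layout are assumptions made here, and the case labels simply repeat the wording of the two preceding paragraphs.

```python
from typing import Dict


def mark_cases(construction: str) -> Dict[str, str]:
    """Return the case marking of agent and object for a construction type.

    The labels repeat the description in the text; they are not a full
    typological model.
    """
    if construction == "non-ergative":
        # Agent (subject) in the standard, nominative form; object oblique.
        return {"agent": "nominative", "object": "oblique"}
    if construction == "ergative":
        # Agent in the special oblique (ergative) case; object in standard form.
        return {"agent": "ergative", "object": "standard"}
    raise ValueError(f"unknown construction: {construction!r}")


for construction in ("non-ergative", "ergative"):
    print(construction, mark_cases(construction))
```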

Sociolinguistics describes variations of a language along the social scale. It is well known that various social strata often use different sublanguages within the same common language, whereas the same person uses different sublanguages in different situations. It suffices to compare the words and word combinations you use in your own formal documents and in conversations with your friends.

Dialectology compares and describes various dialects, or sublanguages, of a common language, which are used in different areas of the territory where the same language is officially used. It can be said that dialectology describes variations of a language along the space axis (while diachrony goes along the time axis). For example, in different Spanish-speaking countries, many words, word combinations, or even grammatical forms are used differently, not to mention significant differences in pronunciation. Gabriel García Márquez, the world-famous Colombian writer, when describing his activity as a professor at the International Workshop of cinematographers in Cuba, said that it was rather difficult to use only the words common to the entire Spanish-speaking world, so as to be equally understandable to all his pupils from various countries of Latin America. A study of Mexican Spanish, among other variants of Spanish, is a good example of a task in the area of dialectology.

Lexicography studies the lexicon, or the set of all words, of a specific language, with their meanings, grammatical features, pronunciation, etc., as well as the methods of compilation of various dictionaries based on this knowledge. The results of lexicography are very important for many tasks in computational linguistics, since any text consists of words. Any automatic processing of a text starts with retrieving the information on each word from a computer dictionary compiled beforehand.
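
The dictionary look-up step mentioned above can be sketched as follows. The toy `LEXICON`, its feature names, and the function `look_up` are all hypothetical and introduced only for illustration; a real computer dictionary would hold far richer information (meanings, pronunciation, and so on) for a very large number of words.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical miniature dictionary compiled beforehand.
LEXICON: Dict[str, Dict[str, str]] = {
    "the": {"part_of_speech": "article"},
    "text": {"part_of_speech": "noun"},
    "consists": {"part_of_speech": "verb"},
    "of": {"part_of_speech": "preposition"},
    "words": {"part_of_speech": "noun"},
}

Entry = Optional[Dict[str, str]]


def look_up(text: str) -> List[Tuple[str, Entry]]:
    """Retrieve the dictionary entry for each word; None marks an unknown word."""
    return [(word, LEXICON.get(word)) for word in text.lower().split()]


for word, entry in look_up("The text consists of unusual words"):
    print(word, entry)
```

Unknown words are reported as `None` instead of being silently dropped, since deciding how to handle them is a separate design question for any real system.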

Psycholinguistics studies the language behavior of human beings by means of a series of experiments of a psychological type. Among the areas of its special interest, psycholinguistics studies the teaching of language to children, the links between language ability in general and the art of speech, and other human psychological features connected with natural language and expressed through it. In many theories of natural language processing, data from psycholinguistics are used to justify the introduction of the suggested methods, algorithms, or structures by claiming that humans process language “just in this way.”

Mathematical linguistics. There are two different views on mathematical linguistics. In the narrower view, the term mathematical linguistics is used for the theory of formal grammars of a specific type referred to as generative grammars. This is one of the first purely mathematical theories devoted to natural language. Alternatively, in the broader view, mathematical linguistics is the intersection between linguistics and mathematics, i.e., the part of mathematics that takes linguistic phenomena and the relationships between them as the objects of its possible applications and interpretations.

Since the theory of generative grammars is nowadays not unique among linguistic applications of mathematics, we will follow the second, broader view on mathematical linguistics.
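
To give a rough idea of what a generative grammar in the narrower sense does, the sketch below enumerates every sentence derivable from a very small set of rewriting rules. The rules, the vocabulary, and the names `GRAMMAR` and `generate` are assumptions invented for this illustration; they are not taken from any particular linguistic theory.

```python
from itertools import product
from typing import Dict, List

# Hypothetical rewriting rules: each symbol maps to its possible expansions.
# Symbols that do not appear as keys are treated as words (terminals).
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["linguist"], ["text"]],
    "V": [["reads"], ["writes"]],
}


def generate(symbol: str) -> List[List[str]]:
    """Enumerate all word sequences derivable from `symbol`.

    Terminates only because this toy grammar has no recursive rules.
    """
    if symbol not in GRAMMAR:
        return [[symbol]]
    results: List[List[str]] = []
    for expansion in GRAMMAR[symbol]:
        parts = [generate(item) for item in expansion]
        for combination in product(*parts):
            results.append([word for part in combination for word in part])
    return results


sentences = [" ".join(words) for words in generate("S")]
for sentence in sentences:
    print(sentence)
```

The grammar generates sentences that are well formed but not necessarily meaningful, which illustrates that such a formalism describes structure only.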

One of the branches of mathematical linguistics is quantitative linguistics. It studies language by determining the frequencies of various words, word combinations, and constructions in texts. Currently, quantitative linguistics mainly means statistical linguistics. It provides methods for making decisions in text processing on the basis of previously gathered statistics.
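
Counting the frequencies of words and word combinations, as described above, can be sketched in a few lines. The tokenization rule (lowercasing and keeping runs of letters and digits) and the function names are simplifying assumptions made here; real statistical linguistics works with much larger text collections and more careful tokenization.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens (a deliberately crude rule)."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(text: str) -> Counter:
    """Count how often each word occurs."""
    return Counter(tokenize(text))


def pair_frequencies(text: str) -> Counter:
    """Count how often each pair of adjacent words occurs."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


sample = "The cat saw the dog. The dog saw the cat."
print(word_frequencies(sample).most_common(3))
print(pair_frequencies(sample).most_common(3))
```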

One type of such decisions is the resolution of ambiguity in text fragments to be analyzed. Another application of statistical methods is the deciphering of texts in forgotten languages or unknown writing systems. For example, the deciphering of Mayan glyphs was accomplished in the 1950s by Yuri Knorozov [39], taking into account the statistics of different glyphs (see Figure I.2).
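
Resolution of ambiguity on the basis of previously gathered statistics can be illustrated with a deliberately simple rule: choose the reading that was observed most often. The tiny hand-labeled sample, the labels, and the function names below are hypothetical; this is one possible baseline, not the method of any particular system.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Optional, Tuple

# Hypothetical previously gathered observations: (word, reading) pairs.
LABELED_SAMPLE = [
    ("saw", "verb"), ("saw", "verb"), ("saw", "noun"),
    ("book", "noun"), ("book", "noun"), ("book", "verb"),
]

Statistics = DefaultDict[str, Counter]


def gather_statistics(sample: Iterable[Tuple[str, str]]) -> Statistics:
    """Count how often each word was observed with each reading."""
    statistics: Statistics = defaultdict(Counter)
    for word, reading in sample:
        statistics[word][reading] += 1
    return statistics


def resolve(word: str, statistics: Statistics) -> Optional[str]:
    """Pick the most frequently observed reading; None for unseen words."""
    if word not in statistics:
        return None
    return statistics[word].most_common(1)[0][0]


statistics = gather_statistics(LABELED_SAMPLE)
for word in ("saw", "book", "glyph"):
    print(word, resolve(word, statistics))
```

This baseline ignores context entirely; practical systems usually also take the neighboring words into account.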

Applied linguistics develops the methods of using the ideas and notions of general linguistics in broad human practice. Until the middle of the twentieth century, applications of linguistics were limited to developing and improving grammars and dictionaries in a printed form oriented to their broader use by non-specialists, as well as to the rational methods of teaching natural languages, their orthography and stylistics. This was the only purely practical product of linguistics.

FIGURE I.2. The ancient Mayan writing system was deciphered with statistical methods.


In the latter half of the twentieth century, a new branch of applied linguistics arose, namely computational, or engineering, linguistics. This is the main topic of this book, and it is discussed in some detail in the next section.
