TEXT, WHAT IS IT?

The empirical reality for theoretical linguistics comprises, in the first place, the sounds of speech. Samples of speech, i.e., separate words, utterances, discourses, etc., are given to the researchers directly and, for living languages, are available in an unlimited supply.

Speech is a continuous flow of acoustic signals, just like music or noise. However, linguistics is mainly oriented to the processing of natural language in a discrete form.

The discrete form of speech presupposes dividing the flow of acoustic signals into sequentially arranged entities belonging to a finite set of partial signals. The finite set of all possible partial signals for a given language is similar to an ordinary alphabet, and is actually called a phonetic alphabet.

To represent the sounds of speech on paper, scientists invented phonetic transcription, which uses special phonetic symbols. It is used in dictionaries, to explain the pronunciation of foreign words, and in theoretical linguistics.

A different form of speech representation, much more important for modern computational linguistics, arose spontaneously in human practice: the written form of speech, or the writing system.

People use three main writing systems: alphabetic, syllabic, and hieroglyphic. The majority of humankind uses alphabetic writing, which tries to establish a correspondence between letters and sounds of speech.

Two major countries, China and Japan,[10] use hieroglyphic writing. Several countries use syllabic writing, among them Korea. Hieroglyphs represent the meaning of words or their parts. At least, they were originally intended to represent meaning directly, though in some cases the direct relationship between a hieroglyph and the meaning of the word was lost long ago.

Letters are to some degree similar to sounds in their functions. Originally, letters were intended to represent sounds directly, so that a text written in letters is a kind of representation of the corresponding sounds of speech. Nevertheless, in many languages the simple relationship between letters and sounds was also lost. In Spanish, however, this relationship is much more straightforward than, let us say, in English or French.

Syllabic signs are similar to letters, but each of them represents a whole syllable, i.e., a group of one or several consonants and a vowel. Thus, such a writing system contains a greater number of signs and is sometimes less flexible in representing new words, especially foreign ones. Indeed, foreign languages can contain specific combinations of sounds that cannot be represented by the given set of syllables. Syllabic signs usually have more sophisticated shapes than letters, resembling hieroglyphs to some degree.
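The limitation described above can be illustrated with a minimal sketch. The syllabary below is a hypothetical toy set (a consonant plus a vowel, or a bare vowel), not a model of any real writing system, and the function name to_syllables is likewise only an illustrative assumption. The sketch tries to spell a word using only the available signs and reports failure when the word contains a combination of sounds the signs cannot cover.

```python
from typing import List, Optional

# Hypothetical toy syllabary (an assumption for illustration only):
# every sign is either consonant + vowel or a bare vowel.
CONSONANTS = "kstnmr"
VOWELS = "aeiou"
SYLLABARY = {c + v for c in CONSONANTS for v in VOWELS} | set(VOWELS)


def to_syllables(word: str) -> Optional[List[str]]:
    """Split word into signs of SYLLABARY, or return None if impossible."""
    if not word:
        return []
    # Try the two-letter sign (consonant + vowel) first, then a bare vowel.
    for size in (2, 1):
        head, tail = word[:size], word[size:]
        if len(head) == size and head in SYLLABARY:
            rest = to_syllables(tail)
            if rest is not None:
                return [head] + rest
    return None


print(to_syllables("karate"))  # ['ka', 'ra', 'te']
print(to_syllables("strike"))  # None: the cluster "str" has no sign
```

An alphabetic system would spell both words without difficulty, since it only needs one sign per sound; the toy syllabary fails on the second word because no sign covers a consonant without a following vowel.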

In more developed writing systems of a similar type, the signs (called in this case glyphs) can represent either single sounds or larger parts of words such as syllables, groups of syllables, or entire words. An example of such a writing system is Mayan writing (see Figure I.2). In spite of their unusual appearance, Mayan glyphs are closer to syllabic signs than to hieroglyphs, and they usually represent the sounds of speech rather than the meaning of words. The reader can become familiar with Mayan glyphs through the Internet site [52].

Currently, most of the practical tasks of computational linguistics are connected with written texts stored on computer media. Among written texts, those written in alphabetic symbols are more usual for computational linguistics than the phonetic transcription of speech.[11] Hence, in this book the methods of language processing will usually be applied to the written form of natural language.

For this reason, the Texts mentioned in the definition of language should be thought of as common texts in their usual written form. Written texts are chains of letters, usually subdivided into separate words by spaces[12] and punctuation marks. Combinations of words can constitute sentences, paragraphs, and discourses. For computational linguistics, all of them are examples of Texts.[13]
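As a minimal sketch of this view of a written text, the following function treats a text as a chain of characters and cuts it into words wherever a space or a punctuation mark occurs. The function name split_into_words is an illustrative assumption; the pattern relies only on Python's standard re module, where [^\W\d_] matches a single letter of any alphabet.

```python
import re
from typing import List


def split_into_words(text: str) -> List[str]:
    """Return the maximal runs of letters found in text.

    Spaces, punctuation marks, digits and underscores all act as
    separators, so they never appear in the result.
    """
    return re.findall(r"[^\W\d_]+", text)


print(split_into_words("Hola, ¿cómo estás?"))  # ['Hola', 'cómo', 'estás']
```

This is deliberately naive: it also splits at apostrophes and hyphens, so a form such as "don't" comes out as two pieces, and it does not help for writing systems that do not separate words by spaces.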

Words are not the most elementary units of language. Fragments of text that are smaller than words and, at the same time, have their own meanings are called morphs. We will define morphs more precisely later. For now, it is sufficient to understand that a morph can contain an arbitrary number of letters (or now and then no letters at all!), and can cover a whole word or some part of it. Therefore, Meanings can correspond to some specially defined parts of words, whole words, phrases, sentences, paragraphs, and discourses.

It is helpful to compare the linear structure of text with the flow of musical sounds. The mouth, as the organ of speech, has rather limited abilities. It can utter only one sound at a time, and the flow of these sounds can be additionally modulated only in a very restricted manner, e.g., by stress or intonation. In contrast, a set of musical instruments can produce several sounds simultaneously, forming harmonies or several melodies running in parallel. This parallelism can be regarded as non-linear structuring. Humans have had to be satisfied with the instrument of speech given to them by nature. This is why, when speaking, we use a linear and rather slow method of acoustic coding of the information we want to communicate to somebody else.

The main features of a Text can be summarized as follows:

· Meaning. Not every sequence of letters can be considered a text. A text is intended to encode some information relevant for human beings. The existing connection between texts and meanings is the reason for processing natural language texts.

· Linear structure. While the information contained in a text can have a very complicated structure, with many relationships between its elements, the text itself always has a one-dimensional, linear nature: it is given letter by letter. Of course, the fact that lines are arranged on a rectangular book page does not matter: the page is equivalent to one very long line, wrapped to fit. Therefore, a text represents non-linear information transformed into a linear form. What is more, usual texts cannot represent even the restricted non-linear elements of spoken language, namely, intonation and logical stress. Punctuation marks give only a feeble approximation of these non-linear elements.

· Nested structure and coherence. A text consists of elementary pieces having their own, usually rather elementary, meaning. They are organized into larger structures, such as words, which in turn have their own meaning. This meaning is determined by the meanings of their components, though not always in a straightforward way. These structures are organized into even larger structures such as sentences. Sentences, paragraphs, etc., constitute what is called discourse, the main property of which is its connectivity, or coherence: it tells some consistent story about objects, persons, or relations, common to all its parts. Such organization provides linguistics with the means to develop methods of intelligent text processing.
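The second and third features can be sketched together in a few lines. The function names unwrap and nest are illustrative assumptions, and the splitting rules (a blank line ends a paragraph; a full stop, exclamation mark or question mark followed by whitespace ends a sentence) are a deliberate simplification rather than a real segmentation method. The first function shows that a page of wrapped lines is equivalent to one long line; the second recovers a nested structure of paragraphs, sentences and words from that single linear string.

```python
import re
from typing import List


def unwrap(page_lines: List[str]) -> str:
    """Join the wrapped lines of one paragraph into a single linear string."""
    return " ".join(line.strip() for line in page_lines if line.strip())


def nest(text: str) -> List[List[List[str]]]:
    """Turn a linear text into paragraphs -> sentences -> words."""
    result = []
    # A blank line is assumed to separate paragraphs.
    for paragraph in re.split(r"\n\s*\n", text):
        if not paragraph.strip():
            continue
        # Naive rule: whitespace after '.', '!' or '?' separates sentences.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        # Each sentence is reduced to its runs of letters (its words).
        result.append([re.findall(r"[^\W\d_]+", s) for s in sentences if s])
    return result


print(unwrap(["Texts are", "linear."]))  # Texts are linear.
print(nest("Texts are linear. Meanings are not.\n\nA second paragraph follows."))
```

The nested lists carry no information that was not already present in the linear string; they only make explicit the grouping that a reader reconstructs from spaces, punctuation and paragraph breaks. Abbreviations such as "e.g." would be split incorrectly by this simple sentence rule.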

Thus, we could say that linguistics studies human ways of linear encoding[14] of non-linear information.
