Syntax-Based Collocation Extraction: 44 (Text, Speech and Language Technology)

On the other hand, collocations that carry little additional meaning beyond the combination of their words still show a close semantic restriction between the components.

Two kinds of special cases deserve attention here. The first is idioms, which are considered fully fixed, non-compositional collocations: fixed combinations of words with specific meanings.

Idioms exist widely in Chinese and many other languages. The second kind of special case is terms. When we extract collocations from a specific domain, many of them are terms. In this research, the extracted term phrases are regarded as collocations. Brundage believes a collocation must be non-substitutable and non-modifiable [Brundage ].

That is, no word of a collocation can be substituted, even by a synonym, to construct a new collocation with the same meaning. Furthermore, collocations cannot be freely modified by adding modifiers or through grammatical transformations, because collocations are basically based on conventional usage. We think such a restriction is too strict, and many interesting word combinations would be lost.

In this research, we require instead that a collocation be limitedly substitutable and limitedly modifiable. That is, when the components of a word combination are substituted with their corresponding synonyms, if none or only very few of the newly generated word combinations tend to be strong combinations, and the others are meaningless or ill-formed, the word combination is regarded as a collocation. According to Brundage's requirement, these are not collocations.
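The limited-substitutability test described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the synonym table, the `is_strong` predicate, and the `max_ratio` threshold are hypothetical placeholders and are not taken from the original work.

```python
from itertools import product


def substitution_ratio(pair, synonyms, is_strong):
    """Fraction of synonym-substituted variants of `pair` that are still strong.

    `synonyms` maps a word to a list of its synonyms (hypothetical resource).
    `is_strong` is any predicate that says whether a word pair is a strong
    combination, e.g. a thresholded association score (assumption).
    """
    w1, w2 = pair
    variants = [
        (a, b)
        for a, b in product([w1] + synonyms.get(w1, []), [w2] + synonyms.get(w2, []))
        if (a, b) != pair
    ]
    if not variants:
        return 0.0
    return sum(1 for v in variants if is_strong(v)) / len(variants)


def is_limited_substitutable(pair, synonyms, is_strong, max_ratio=0.2):
    """True if `pair` is strong but few of its synonym variants are.

    `max_ratio` is an illustrative threshold, not a value from the source.
    """
    return is_strong(pair) and substitution_ratio(pair, synonyms, is_strong) <= max_ratio


# Toy data, invented for illustration only.
SYNONYMS = {"strong": ["powerful", "mighty"], "tea": ["brew"],
            "big": ["large", "huge"], "house": ["home"]}
STRONG = {("strong", "tea"), ("big", "house"), ("large", "house"),
          ("huge", "house"), ("big", "home"), ("large", "home")}


def toy_is_strong(p):
    return p in STRONG
```

With this toy data, `("strong", "tea")` passes the test because none of its variants is strong, while `("big", "house")` fails because most of its variants remain strong, which suggests a free combination.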

That means these word combinations are not exceptions, but occur frequently in similar contexts. This is easy to understand, since collocation is based on conventional usage. Only frequently used word combinations are regarded as collocations. Within a specific domain, many collocations tend to be term phrases.

Furthermore, some word combinations can be regarded as collocations in one specific domain, while in other domains they tend to be free combinations with high co-occurrence. Even though these two words appear in context, they normally do not form a collocation.

Classifications of Chinese Collocation

Collocations vary. Some are very rigid, whereas others are flexible. According to internal restriction, substitutability, and modifiability, we classify Chinese collocations into 4 types. Such a classification strategy is readily operational computationally.

A fully fixed collocation is the strictest type, which fulfills two conditions. Type 0 collocations include some idioms, proverbs, and sayings.

None of the components can be substituted, the word order cannot be changed, and no words can be inserted. The components within a Type 1 collocation have full internal restriction; that is, the appearance of one word implies the co-occurrence of the other. No component of such a collocation can be substituted by its synonyms. Unlike a Type 1 collocation, a Type 2 collocation allows very limited substitution of its components.

Type 3 collocations maintain weaker internal restrictions. More substitution of components is allowed, but a limit is still required. Once the components of a word combination can be substituted by their closest synonyms, the combination tends to be a grammatical collocation rather than a true collocation. The linguistic discussion of collocation is very important to this research. Based on the definition and classification of collocation, we can distinguish true collocations from pseudo ones. Thus, the correct answers can be identified in the output of an automatic collocation extraction system.

Meanwhile, the characteristics and classification conditions of collocation described above motivate and direct the research on eliminating pseudo collocations from automatic extraction results.

Research on the automatic extraction of collocations

Research on automatic collocation extraction began with the work of Choueka [9]; in [10] he conducted an experiment on an 11-million-word corpus of the New York Times.

Church conducted an experiment on the Associated Press corpus of 44 million words in [11]. Smadja carried out the most comprehensive and in-depth work in this field [6]. He developed a lexicographic tool, Xtract, which was applied to a 10-million-word corpus of stock market news reports.

The techniques in these studies were mainly based on lexical statistics. Collocations were selected based on the following statistical criteria:

  • Frequency [,12]
  • Mutual information [7,11,13]
  • Mean and variance of the distribution of the collocation [6,14]
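The three criteria above can be illustrated with a minimal sketch. It assumes the standard pointwise mutual information formula, log2(P(x,y) / (P(x) P(y))), computed over adjacent word pairs, and it measures the mean and variance of the signed distance between two words inside a fixed window. The function names and the window size are illustrative choices; the cited systems may define these statistics differently.

```python
import math
from collections import Counter


def bigram_counts_and_pmi(tokens):
    """Return (bigram frequency table, PMI scores) for adjacent word pairs."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bigrams = n - 1
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = unigrams[x] / n
        p_y = unigrams[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return bigrams, scores


def offset_mean_variance(tokens, w1, w2, window=5):
    """Mean and variance of the signed distance from w1 to w2 within `window`.

    A low variance suggests the two words occur at a fixed relative position.
    Returns None if the pair never co-occurs inside the window.
    """
    offsets = []
    for i, token in enumerate(tokens):
        if token != w1:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] == w2:
                offsets.append(j - i)
    if not offsets:
        return None
    mean = sum(offsets) / len(offsets)
    variance = sum((o - mean) ** 2 for o in offsets) / len(offsets)
    return mean, variance
```

For example, on the toy token list `"strong tea is strong tea".split()`, the pair `("strong", "tea")` has frequency 2, and with `window=1` its offsets are always +1, so the variance is zero.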

These techniques use word statistics to reflect the strength of the association.

However, some problems remain to be solved. Firstly, the accuracy of automatic extraction is still unsatisfactory. Similar work was done for Chinese. As reported by Sun et al. His work showed that syntactic filtering can greatly improve the performance of automatic collocation extraction.
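One simple form of syntactic filtering keeps only candidate pairs whose part-of-speech tags match an allowed pattern. The sketch below assumes the text is already POS-tagged; the tag names and the set of allowed patterns are hypothetical and are not those used in the cited work.

```python
# Allowed (tag1, tag2) patterns for adjacent words: an illustrative assumption.
ALLOWED_PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN"), ("VERB", "NOUN")}


def filter_by_pos(tagged_tokens, allowed=ALLOWED_PATTERNS):
    """Return adjacent word pairs whose POS tag pair is in `allowed`.

    `tagged_tokens` is a list of (word, tag) tuples.
    """
    kept = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (t1, t2) in allowed:
            kept.append((w1, w2))
    return kept
```

Candidates that survive this filter can then be ranked by a statistical score, so that frequent but syntactically implausible pairs such as determiner plus adjective are discarded before scoring.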

Extract the collocation patterns by automatically analyzing the cases in the Treebank. The accuracy of retrieval can be improved if the similarity between a user query and a document is determined according to common collocations or phrases instead of common words.

Secondly, techniques based on word statistics from a large corpus cannot be applied to identify collocations in a small corpus or a single document. Thirdly, although different approaches have been taken, the scale of the work has been very limited. There have been few reports of large collocation databases extracted automatically or semi-automatically from running text. Finally, almost all past work was done for English. There is some work on this topic for Chinese, but at a very limited scale [13, 14, 16].

The aim of shallow parsing is to reliably recognize relatively simple syntactic elements rather than to produce complete parses. Previous research in this field can be classified into two categories. Abney [] used cascaded finite-state machines for shallow parsing. A finite-state cascade consists of strata, each of which is defined by a set of regular expression patterns for recognizing phrases.
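The idea of a cascade of strata can be sketched with ordinary regular expressions over a sequence of POS tags. This is a toy illustration, not Abney's system: the tag names, the two strata, and the `run_cascade` function are assumptions made for the example. Each stratum rewrites the symbol sequence, and the next stratum operates on the output of the previous one.

```python
import re

# Each stratum is a list of (phrase label, regex over space-terminated symbols).
# The patterns and tag names are illustrative assumptions.
STRATA = [
    [("NP", r"(?:DET )?(?:ADJ )*(?:NOUN )+")],  # stratum 1: noun phrases
    [("PP", r"PREP NP ")],                      # stratum 2: prepositional phrases
]


def run_cascade(tags, strata=STRATA):
    """Apply each stratum in order, collapsing matched spans into phrase labels."""
    # Encode the sequence as a string in which every symbol ends with a space.
    seq = "".join(tag + " " for tag in tags)
    for stratum in strata:
        for label, pattern in stratum:
            # \b anchors the match at the start of a symbol, so "NOUN" cannot
            # match inside a longer symbol such as "PRONOUN".
            seq = re.sub(r"\b" + pattern, label + " ", seq)
    return seq.split()
```

For instance, the tag sequence `DET ADJ NOUN PREP DET NOUN` is first reduced to `NP PREP NP` by the first stratum and then to `NP PP` by the second. The sketch discards the words themselves and keeps only the phrase labels, which is enough to show how later strata build on earlier ones.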

Voutilainen [] used a constraint-grammar-based approach to assign every word an appropriate functional tag indicating its syntactic function in context.
