Linguistics seminar: Natural language processing: dependency parsing and distributional semantics


21 Dec., 13:00-16:00, room B1.661 (salle D. Corbin), STL, Université de Lille, Domaine du Pont de Bois

Laurence Romain (Univ. de Lille, UMR STL)

An exploration of constructional meaning via vector-space models

This paper presents a method to analyse constructional meaning in argument structure constructions. Following the principles of distributional semantics, I measure semantic similarity between the various elements found in the same slot of a construction in order to identify constructional meaning.

In construction grammar(s), the meaning of argument structure constructions is often closely associated with the (meaning of the) verb they are used with (Goldberg 1995; Stefanowitsch and Gries 2003, inter alia). However, for the causative alternation, the verb is not sufficient to determine constructional meaning and more specifically differences in constructional meaning between the alternants. That is, although many verbs of change of state alternate between the two constructions, they do not necessarily occur with the same themes (the entity undergoing the event denoted by the verb) in the two constructions.

Based on a dataset of 11,554 instances of the intransitive non-causative construction and the transitive causative construction with 29 different verbs, extracted from the Corpus of Contemporary American English (COCA), I will show how much information is shared by the two constructions. To do so, I will examine the various themes found with each construction. If we only considered themes individually, it would be difficult to assess exactly how similar they are (Lemmens, forthc.). Therefore, I will use vector-space models to group these themes semantically. This grouping will help identify verb senses and sub-senses and thus measure how (dis)similar the two constructions are.
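As a rough illustration of the grouping step, the Python sketch below compares theme vectors by cosine similarity and groups themes whose similarity exceeds a threshold. Everything in it is an assumption made for illustration: the theme words, the three-dimensional toy vectors, the threshold of 0.8 and the greedy grouping procedure are all hypothetical, since the abstract does not specify which vector-space model or clustering method is used.

```python
import math

# Hypothetical toy vectors for four themes; real distributional vectors
# would be derived from corpus co-occurrence counts and have many more dimensions.
THEME_VECTORS = {
    "window": [0.9, 0.1, 0.0],
    "glass": [0.8, 0.2, 0.1],
    "promise": [0.1, 0.9, 0.2],
    "silence": [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def group_themes(vectors, threshold=0.8):
    """Greedy grouping (an assumed, simplistic procedure): each theme joins
    the first group whose first member is at least `threshold` similar to it,
    otherwise it starts a new group."""
    groups = []
    for word, vec in vectors.items():
        for group in groups:
            seed = group[0]
            if cosine(vec, vectors[seed]) >= threshold:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


groups = group_themes(THEME_VECTORS)
print(groups)
```

With these toy vectors, concrete and abstract themes fall into separate groups. In a setting like the one described above, one could then compare how the instances of each construction are distributed across such groups.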

References

Davies, Mark (2008-). The Corpus of Contemporary American English (COCA): 520 million words, 1990-present. Available online at: corpus.byu.edu/coca.

Goldberg, Adele E. (1995). Constructions. A construction grammar approach to argument structure. Chicago: University of Chicago Press.

Lemmens, Maarten (forthc.). Usage-based perspectives on lexical and constructional semantics. Shanghai, China: Shanghai Foreign Language Education Press.

Stefanowitsch, Anatol and Stefan T. Gries (2003). “Collostructions: investigating the interaction between words and constructions”. In: International journal of corpus linguistics 8.2, pp. 209–243.

Mathieu Dehouck (Inria Nord—Europe)

Morpho-Syntax Matters: Assessing the role of morphology in dependency parsing

Morphology is both a blessing and a curse when it comes to the automated treatment of language. By encoding linguistic information (syntax, semantics) that might be relevant to the task in question, morphological information can be of great help for automated systems. Unfortunately, this information comes at the cost of higher data sparsity and more out-of-vocabulary forms and, more generally, at the cost of more parameters to be estimated by the system. This can prove detrimental, especially when the information encoded by morphology is not relevant to the task in question. However, in the field of Natural Language Processing, most work on morphologically rich languages treats them as a rather homogeneous group and as such applies the same techniques (such as tools from distributional semantics) to all of them. We hypothesise that this is an oversimplification and that different morphologically rich languages will benefit from different treatments of their morphology. Through experiments on dependency parsing of various languages, and by introducing a new measure of morpho-syntactic complexity, we will show that not only can one distinguish morpho-syntactic languages from morpho-semantic ones, but that this distinction can actually improve parsing performance.
