Semi-automatic techniques for extending the FrameNet lexical database to new languages

dc.contributor.advisor Delmonte, Rodolfo it_IT
dc.contributor.advisor Pianta, Emanuele it_IT
dc.contributor.author Tonelli, Sara <1976> it_IT
dc.date.accessioned 2010-06-22T05:59:50Z it_IT
dc.date.accessioned 2012-07-30T16:03:44Z
dc.date.available 2010-06-22T05:59:50Z it_IT
dc.date.available 2012-07-30T16:03:44Z
dc.date.issued 2010-03-29 it_IT
dc.identifier.uri http://hdl.handle.net/10579/1025 it_IT
dc.description.abstract This work presents an analysis aimed at the semi-automatic development of FrameNet-style resources for new languages, with particular attention to Italian. The proposed approach consists in preserving, wherever possible, the theoretical architecture of the English FrameNet, and in automatically enriching the language-specific part of the resource, in particular by acquiring Italian lexical units and example sentences. The first part of the analysis is devoted to presenting frame semantics and the ongoing projects for developing new FrameNets. It also gives a brief overview of the areas of natural language processing to which frame information could make a significant contribution. The second part of the thesis focuses more on applied aspects and presents three strategies for the semi-automatic annotation of frame information in Italian texts. Although this work mainly concerns Italian, the proposed model can easily be extended to other languages, since the experiments use freely available multilingual resources such as the Europarl corpus (in 11 languages), MultiWordNet (5 languages) and Wikipedia (264 languages). it_IT
dc.description.abstract The topic of this work is the semi-automatic development of FrameNet-like resources for new languages, with a focus on Italian. Our approach aims to exploit the theoretical backbone of English FrameNet as much as possible, and to find ways to automatically populate the language-dependent part of the database with Italian lexical units and example sentences. The first part of this thesis is devoted to the analysis of FrameNet's theoretical background and to a discussion of ongoing projects for the development of new FrameNets. We also introduce the main natural language processing tasks that can benefit from the integration of frame information. The second part of the thesis is more task-oriented and presents three strategies for the semi-automatic annotation of Italian data with frame information. We start from the fundamental assumption that frames as defined in the English FrameNet can be re-used for the semantic analysis of Italian, but we also account for some exceptions to this claim, which arise from different types of cross-linguistic divergences. Although we focus on Italian, the presented framework can be easily applied to any new language, because our experiments were carried out using publicly available multilingual resources such as the Europarl corpus (available in 11 languages), MultiWordNet (5 languages) and Wikipedia (264 languages). it_IT
dc.format.medium Printed thesis it_IT
dc.language.iso en it_IT
dc.publisher Università Ca' Foscari Venezia it_IT
dc.rights © Sara Tonelli, 2010 it_IT
dc.subject Frame semantics it_IT
dc.subject Linguistic resources - Automatic creation it_IT
dc.title Semi-automatic techniques for extending the FrameNet lexical database to new languages it_IT
dc.type Doctoral Thesis it_IT
dc.degree.name Scienze del linguaggio it_IT
dc.degree.level Research doctorate (PhD) it_IT
dc.degree.grantor Facoltà di Lingue e letterature straniere it_IT
dc.description.academicyear 2008/2009 it_IT
dc.description.cycle 22 it_IT
dc.degree.coordinator Cinque, Guglielmo it_IT
dc.location.shelfmark D000955 it_IT
dc.location Venezia, Archivio Università Ca' Foscari, Tesi Dottorato it_IT
dc.rights.accessrights openAccess it_IT
dc.thesis.matricno 955294 it_IT
dc.format.pagenumber 187p. it_IT
dc.subject.miur L-LIN/01 GLOTTOLOGIA E LINGUISTICA it_IT
dc.description.note Work carried out at the Fondazione Bruno Kessler, Trento, in the Human Language Technologies research group it_IT
dc.description.tableofcontent 1 Introduction 1.1 Innovative aspects 1.2 Structure of the thesis 2 FrameNet and Frame Semantics 2.1 Theoretical background to FrameNet 2.1.1 Frame semantics 2.1.2 Construction Grammar 2.2 The FrameNet project 2.2.1 Project versions 2.2.2 FrameNet Structure 2.2.3 Annotation workflow 2.2.4 FrameNet Statistics 2.3 FrameNet projects for new languages 2.3.1 Manual annotation 2.3.2 Semi-automatic annotation 2.4 Summary 3 Is FrameNet useful? 3.1 Introduction 3.2 FrameNet as a general framework for Semantic Analysis 3.3 FrameNet and Question Answering 3.4 FrameNet and Textual Entailment 3.5 FrameNet and Machine Translation 3.6 Summary 4 Frame information transfer from English to Italian 4.1 Introduction 4.2 Related work 4.3 General transfer framework 4.4 The Transfer Algorithms 4.4.1 Algorithm 1 4.4.2 Formalization of Algorithm 1 4.4.3 Algorithm 2 4.4.4 Formalization of Algorithm 2 4.4.5 Algorithm comparison 4.5 The gold standards 4.5.1 EUROPARL 4.5.2 MULTIBERKELEY 4.5.3 Gold standard comparison 4.5.4 Gold standard development 4.6 Evaluation framework 4.6.1 Evaluation 1 4.6.2 Evaluation 2 4.6.3 Evaluation 3: a proposal 4.7 Summary 5 Using WordNet to populate Italian frames 5.1 Introduction 5.2 WordNet and MultiWordNet 5.3 FrameNet and WordNet 5.4 Previous mapping approaches 5.5 Problem formulation 5.6 Dataset description 5.7 Feature description 5.8 Experimental setup and evaluation 5.9 MapNet and its applications 5.9.1 Automatic FrameNet extension 5.9.2 Frame annotation of MultiSemCor 5.10 Summary 6 Wikipedia as frame example repository 6.1 Introduction 6.2 Wikipedia 6.3 Motivation of the sentence extraction task 6.4 The Mapping Algorithm 6.5 The mapping experiment 6.5.1 Experimental Setup 6.5.2 WSD statistics and analysis 6.6 English FrameNet expansion 6.6.1 English data extraction 6.6.2 Output statistics 6.6.3 Evaluation of the English example sentences 6.7 Multilingual FrameNet expansion 6.7.1 Italian data extraction 6.7.2 Evaluation of the Italian sentences 6.8 Summary 7 Conclusions and perspectives for future research 7.1 Summary 7.2 The Final Resource 7.3 Future work A Frame semantics and dialogs A.1 The LUNA project A.2 Frame annotation of the LUNA corpus A.3 Newly introduced frames A.4 Statistics about the annotated corpus A.5 DA-frame Relationship A.6 Summary B Italian LUs and frames in the gold standards B.1 Europarl B.2 MultiBerkeley it_IT
dc.identifier.bibliographiccitation Tonelli, Sara. "Semi-automatic techniques for extending the FrameNet lexical database to new languages", Università Ca' Foscari Venezia, tesi di dottorato, 22. ciclo, 2010 it_IT
dc.degree.discipline Computational linguistics it_IT

