Semi-automatic techniques for extending the FrameNet lexical database to new languages

dc.contributor.advisor	Delmonte, Rodolfo	it_IT
dc.contributor.advisor	Pianta, Emanuele	it_IT
dc.contributor.author	Tonelli, Sara <1976>	it_IT
dc.date.accessioned	2010-06-22T05:59:50Z	it_IT
dc.date.accessioned	2012-07-30T16:03:44Z
dc.date.available	2010-06-22T05:59:50Z	it_IT
dc.date.available	2012-07-30T16:03:44Z
dc.date.issued	2010-03-29	it_IT
dc.identifier.uri	http://hdl.handle.net/10579/1025	it_IT
dc.description.abstract	This work presents an analysis aimed at the semi-automatic development of resources following the FrameNet model for new languages, with particular attention to Italian. The proposed approach consists in preserving, wherever possible, the theoretical architecture of English FrameNet and in automatically enriching the language-specific part of the resource, in particular by acquiring lexical units and example sentences in Italian. The first part of the analysis is devoted to presenting the semantic theory of frames and the ongoing projects for the development of new FrameNets. It also provides a brief overview of the areas of natural language processing to which frame information could make a significant contribution. The second part of the thesis focuses more on applied aspects and presents three strategies for the semi-automatic annotation of frame information in Italian texts. Although this work mainly concerns Italian, the proposed model can easily be extended to other languages, since the experiments use freely available multilingual resources such as the Europarl corpus (in 11 languages), MultiWordNet (5 languages) and Wikipedia (264 languages).	it_IT
dc.description.abstract	The topic of this work is the semi-automatic development of FrameNet-like resources for new languages, with a focus on Italian. Our approach aims to exploit the theoretical backbone of English FrameNet as much as possible and to automatically populate the language-dependent part of the database with Italian lexical units and example sentences. The first part of this thesis is devoted to the theoretical background of FrameNet and to a discussion of ongoing projects for the development of new FrameNets. We also introduce the main natural language processing tasks that can benefit from the integration of frame information. The second part of the thesis is more task-oriented and presents three strategies for the semi-automatic annotation of Italian data with frame information. We start from the fundamental assumption that frames as defined in the English FrameNet can be reused for the semantic analysis of Italian, but we also account for some exceptions to this assumption, which arise from different types of cross-linguistic divergences. Although we focus on Italian, the presented framework can easily be applied to new languages, since our experiments were carried out using publicly available multilingual resources such as the Europarl corpus (available in 11 languages), MultiWordNet (5 languages) and Wikipedia (264 languages).	it_IT
dc.format.medium	Printed thesis	it_IT
dc.language.iso	en	it_IT
dc.publisher	Università Ca' Foscari Venezia	it_IT
dc.rights	© Sara Tonelli, 2010	it_IT
dc.subject	Frame semantics	it_IT
dc.subject	Language resources - Automatic creation	it_IT
dc.title	Semi-automatic techniques for extending the FrameNet lexical database to new languages	it_IT
dc.type	Doctoral Thesis	it_IT
dc.degree.name	Scienze del linguaggio	it_IT
dc.degree.level	Research doctorate (PhD)	it_IT
dc.degree.grantor	Facoltà di Lingue e letterature straniere	it_IT
dc.description.academicyear	2008/2009	it_IT
dc.description.cycle	22	it_IT
dc.degree.coordinator	Cinque, Guglielmo	it_IT
dc.location.shelfmark	D000955	it_IT
dc.location	Venezia, Archivio Università Ca' Foscari, Tesi Dottorato	it_IT
dc.rights.accessrights	openAccess	it_IT
dc.thesis.matricno	955294	it_IT
dc.format.pagenumber	187p.	it_IT
dc.subject.miur	L-LIN/01 GLOTTOLOGIA E LINGUISTICA	it_IT
dc.description.note	Work carried out at the Fondazione Bruno Kessler, Trento, in the Human Language Technologies research group	it_IT
dc.description.tableofcontent	1 Introduction 1.1 Innovative aspects 1.2 Structure of the thesis 2 FrameNet and Frame Semantics 2.1 Theoretical background to FrameNet 2.1.1 Frame semantics 2.1.2 Construction Grammar 2.2 The FrameNet project 2.2.1 Project versions 2.2.2 FrameNet Structure 2.2.3 Annotation workflow 2.2.4 FrameNet Statistics 2.3 FrameNet projects for new languages 2.3.1 Manual annotation 2.3.2 Semi-automatic annotation 2.4 Summary 3 Is FrameNet useful? 3.1 Introduction 3.2 FrameNet as a general framework for Semantic Analysis 3.3 FrameNet and Question Answering 3.4 FrameNet and Textual Entailment 3.5 FrameNet and Machine Translation 3.6 Summary 4 Frame information transfer from English to Italian 4.1 Introduction 4.2 Related work 4.3 General transfer framework 4.4 The Transfer Algorithms 4.4.1 Algorithm 1 4.4.2 Formalization of Algorithm 1 4.4.3 Algorithm 2 4.4.4 Formalization of Algorithm 2 4.4.5 Algorithm comparison 4.5 The gold standards 4.5.1 EUROPARL 4.5.2 MULTIBERKELEY 4.5.3 Gold standard comparison 4.5.4 Gold standard development 4.6 Evaluation framework 4.6.1 Evaluation 1 4.6.2 Evaluation 2 4.6.3 Evaluation 3: a proposal 4.7 Summary 5 Using WordNet to populate Italian frames 5.1 Introduction 5.2 WordNet and MultiWordNet 5.3 FrameNet and WordNet 5.4 Previous mapping approaches 5.5 Problem formulation 5.6 Dataset description 5.7 Feature description 5.8 Experimental setup and evaluation 5.9 MapNet and its applications 5.9.1 Automatic FrameNet extension 5.9.2 Frame annotation of MultiSemCor 5.10 Summary 6 Wikipedia as frame example repository 6.1 Introduction 6.2 Wikipedia 6.3 Motivation of the sentence extraction task 6.4 The Mapping Algorithm 6.5 The mapping experiment 6.5.1 Experimental Setup 6.5.2 WSD statistics and analysis 6.6 English FrameNet expansion 6.6.1 English data extraction 6.6.2 Output statistics 6.6.3 Evaluation of the English example sentences 6.7 Multilingual FrameNet expansion 6.7.1 Italian data extraction 6.7.2 Evaluation of the Italian sentences 6.8 Summary 7 Conclusions and perspectives for future research 7.1 Summary 7.2 The Final Resource 7.3 Future work A Frame semantics and dialogs A.1 The LUNA project A.2 Frame annotation of the LUNA corpus A.3 Newly introduced frames A.4 Statistics about the annotated corpus A.5 DA-frame Relationship A.6 Summary B Italian LUs and frames in the gold standards B.1 Europarl B.2 MultiBerkeley	it_IT
dc.identifier.bibliographiccitation	Tonelli, Sara. "Semi-automatic techniques for extending the FrameNet lexical database to new languages", Università Ca' Foscari Venezia, doctoral thesis, 22nd cycle, 2010	it_IT
dc.degree.discipline	Computational linguistics	it_IT