dc.contributor.advisor |
Delmonte, Rodolfo |
it_IT |
dc.contributor.advisor |
Pianta, Emanuele |
it_IT |
dc.contributor.author |
Tonelli, Sara <1976> |
it_IT |
dc.date.accessioned |
2010-06-22T05:59:50Z |
it_IT |
dc.date.accessioned |
2012-07-30T16:03:44Z |
|
dc.date.available |
2010-06-22T05:59:50Z |
it_IT |
dc.date.available |
2012-07-30T16:03:44Z |
|
dc.date.issued |
2010-03-29 |
it_IT |
dc.identifier.uri |
http://hdl.handle.net/10579/1025 |
it_IT |
dc.description.abstract |
This work presents an analysis aimed at the semi-automatic development of FrameNet-style resources for new languages, with particular attention to Italian. The proposed approach consists in preserving, wherever possible, the theoretical architecture of the English FrameNet and in automatically enriching the language-specific part of the resource, in particular by acquiring Italian lexical units and example sentences.
The first part of the analysis presents the theory of frame semantics and the ongoing projects for developing new FrameNets. It also gives a brief overview of the areas of natural language processing to which frame information could make a significant contribution.
The second part of the thesis focuses more on applications and presents three strategies for the semi-automatic annotation of frame information in Italian texts.
Although this work mainly concerns Italian, the proposed model can easily be extended to other languages, since the experiments use freely available multilingual resources such as the Europarl corpus (in 11 languages), MultiWordNet (5 languages) and Wikipedia (264 languages). |
it_IT |
dc.description.abstract |
The topic of this work is the semi-automatic development of FrameNet-like resources for new languages, with a focus on Italian. Our approach aims to exploit the theoretical backbone of English FrameNet as much as possible and to automatically populate the language-dependent part of the database with Italian lexical units and example sentences.
The first part of this thesis analyses the theoretical background of FrameNet and discusses ongoing projects for the development of new FrameNets. We also introduce the main natural language processing tasks that can benefit from the integration of frame information.
The second part of the thesis is more task-oriented and presents three strategies for the semi-automatic annotation of Italian data with frame information. We start from the fundamental assumption that frames as defined in the English FrameNet can be reused for the semantic analysis of Italian, but we also account for some exceptions to this claim, which arise from different types of cross-linguistic divergences.
Although we focus on Italian, the presented framework can be easily applied to any new language, not least because our experiments were carried out using publicly available multilingual resources such as the Europarl corpus (available in 11 languages), MultiWordNet (5 languages) and Wikipedia (264 languages). |
it_IT |
dc.format.medium |
Printed thesis |
it_IT |
dc.language.iso |
en |
it_IT |
dc.publisher |
Università Ca' Foscari Venezia |
it_IT |
dc.rights |
© Sara Tonelli, 2010 |
it_IT |
dc.subject |
Frame semantics |
it_IT |
dc.subject |
Linguistic resources - Automatic creation |
it_IT |
dc.title |
Semi-automatic techniques for extending the FrameNet lexical database to new languages |
it_IT |
dc.type |
Doctoral Thesis |
it_IT |
dc.degree.name |
Scienze del linguaggio |
it_IT |
dc.degree.level |
Research doctorate (PhD) |
it_IT |
dc.degree.grantor |
Facoltà di Lingue e letterature straniere |
it_IT |
dc.description.academicyear |
2008/2009 |
it_IT |
dc.description.cycle |
22 |
it_IT |
dc.degree.coordinator |
Cinque, Guglielmo |
it_IT |
dc.location.shelfmark |
D000955 |
it_IT |
dc.location |
Venezia, Archivio Università Ca' Foscari, Tesi Dottorato |
it_IT |
dc.rights.accessrights |
openAccess |
it_IT |
dc.thesis.matricno |
955294 |
it_IT |
dc.format.pagenumber |
187p. |
it_IT |
dc.subject.miur |
L-LIN/01 GLOTTOLOGIA E LINGUISTICA |
it_IT |
dc.description.note |
Work carried out at the Fondazione Bruno Kessler, Trento, in the Human Language Technologies research group |
it_IT |
dc.description.tableofcontent |
1 Introduction
1.1 Innovative aspects
1.2 Structure of the thesis
2 FrameNet and Frame Semantics
2.1 Theoretical background to FrameNet
2.1.1 Frame semantics
2.1.2 Construction Grammar
2.2 The FrameNet project
2.2.1 Project versions
2.2.2 FrameNet Structure
2.2.3 Annotation workflow
2.2.4 FrameNet Statistics
2.3 FrameNet projects for new languages
2.3.1 Manual annotation
2.3.2 Semi-automatic annotation
2.4 Summary
3 Is FrameNet useful?
3.1 Introduction
3.2 FrameNet as a general framework for Semantic Analysis
3.3 FrameNet and Question Answering
3.4 FrameNet and Textual Entailment
3.5 FrameNet and Machine Translation
3.6 Summary
4 Frame information transfer from English to Italian
4.1 Introduction
4.2 Related work
4.3 General transfer framework
4.4 The Transfer Algorithms
4.4.1 Algorithm 1
4.4.2 Formalization of Algorithm 1
4.4.3 Algorithm 2
4.4.4 Formalization of Algorithm 2
4.4.5 Algorithm comparison
4.5 The gold standards
4.5.1 EUROPARL
4.5.2 MULTIBERKELEY
4.5.3 Gold standard comparison
4.5.4 Gold standard development
4.6 Evaluation framework
4.6.1 Evaluation 1
4.6.2 Evaluation 2
4.6.3 Evaluation 3: a proposal
4.7 Summary
5 Using WordNet to populate Italian frames
5.1 Introduction
5.2 WordNet and MultiWordNet
5.3 FrameNet and WordNet
5.4 Previous mapping approaches
5.5 Problem formulation
5.6 Dataset description
5.7 Feature description
5.8 Experimental setup and evaluation
5.9 MapNet and its applications
5.9.1 Automatic FrameNet extension
5.9.2 Frame annotation of MultiSemCor
5.10 Summary
6 Wikipedia as frame example repository
6.1 Introduction
6.2 Wikipedia
6.3 Motivation of the sentence extraction task
6.4 The Mapping Algorithm
6.5 The mapping experiment
6.5.1 Experimental Setup
6.5.2 WSD statistics and analysis
6.6 English FrameNet expansion
6.6.1 English data extraction
6.6.2 Output statistics
6.6.3 Evaluation of the English example sentences
6.7 Multilingual FrameNet expansion
6.7.1 Italian data extraction
6.7.2 Evaluation of the Italian sentences
6.8 Summary
7 Conclusions and perspectives for future research
7.1 Summary
7.2 The Final Resource
7.3 Future work
A Frame semantics and dialogs
A.1 The LUNA project
A.2 Frame annotation of the LUNA corpus
A.3 Newly introduced frames
A.4 Statistics about the annotated corpus
A.5 DA-frame Relationship
A.6 Summary
B Italian LUs and frames in the gold standards
B.1 Europarl
B.2 MultiBerkeley |
it_IT |
dc.identifier.bibliographiccitation |
Tonelli, Sara. "Semi-automatic techniques for extending the FrameNet lexical database to new languages", Università Ca' Foscari Venezia, tesi di dottorato, 22. ciclo, 2010 |
it_IT |
dc.degree.discipline |
Computational linguistics |
it_IT |