String analysis for software verification

dc.contributor.advisor	Cortesi, Agostino	it_IT
dc.contributor.author	Olliaro, Martina <1991>	it_IT
dc.date.accessioned	2021-02-10	it_IT
dc.date.accessioned	2021-06-22T06:37:43Z
dc.date.available	2021-06-22T06:37:43Z
dc.date.issued	2021-03-24	it_IT
dc.identifier.uri	http://hdl.handle.net/10579/18470
dc.description.abstract	This thesis aims to investigate string manipulation with security implications in different programming languages and to improve the state of the art by applying abstract interpretation theory to string analysis. Erroneous string manipulation is a challenging problem in software verification: it is one of the major causes of program vulnerabilities that malicious users can exploit, with severe consequences for the affected systems. By string analysis we mean statically computing the set of string values that may be assigned to a variable. As with other analysis problems, this is undecidable, so a certain degree of approximation is necessary in order to find evidence of bugs and vulnerabilities in string-manipulating code. We take advantage of Abstract Interpretation, a powerful mathematical theory that enables us to define approximations and prove their soundness. The five main contributions of this thesis are the following. (1) We introduce a new sophisticated abstract domain for C strings. The way the domain (called M-String) is conceived allows it to be tailored for specific verification tasks (e.g., detection of buffer overflows). We describe the concrete and the abstract semantics of basic string operations and formally prove their soundness. Furthermore, we provide an executable implementation of the abstract operations. Using a tool that automatically lifts existing programs into the M-String domain, along with an explicit-state model checker, we experimentally evaluate the accuracy of the proposed domain on real-case test programs. (2) We combine abstract domains resulting from the reduced product of string shape abstraction and string content abstraction, in order to improve the ability to detect inconsistent states that lead to program errors without a major impact on efficiency.
In particular, the combinations pair string abstract domains introduced in the literature with the segmentation domain, which we instantiate for string analysis. (3) In Abstract Interpretation, completeness ensures that the analysis does not lose information with respect to the property of interest. We provide a systematic and constructive approach for generating the completion of string domains for dynamic languages, and we apply it to the refinement of existing string abstractions. Indeed, for dynamic languages, the lack of completeness in string analysis is a key security issue, as poorly managed string manipulation code may easily lead to significant security flaws. We also provide an effective procedure to measure the precision improvement obtained when lifting the analysis to complete domains. (4) Almost all existing string abstract domains track information about single variables in a program (e.g., whether a string contains a certain character) without inspecting their relationships with other values, which causes the loss of relevant knowledge about their possible values. We therefore introduce a generic framework for formalizing relational string abstract domains based on ordering relationships, and we instantiate this framework for several domains built upon different well-known string orders (e.g., substring relationships). We implemented the domain based on substring ordering, and we provide an experimental evaluation of its effectiveness on some case studies. (5) We manipulate string values in the context of relational database watermarking. We propose a semantic-driven watermarking approach for relational textual databases, which marks multi-word textual attributes by exploiting the synonym substitution technique for text watermarking together with notions from semantic similarity analysis, and which deals with the semantic perturbations provoked by watermark embedding. We show the effectiveness of our approach through an experimental evaluation.
We also prove the resilience of our approach with respect to the random synonym substitution attack.	it_IT
dc.language.iso	en	it_IT
dc.publisher	Università Ca' Foscari Venezia	it_IT
dc.rights	© Martina Olliaro, 2021	it_IT
dc.title	String analysis for software verification	it_IT
dc.title.alternative		it_IT
dc.type	Doctoral Thesis	it_IT
dc.degree.name	Computer Science (Informatica)	it_IT
dc.degree.level	Doctoral degree (Dottorato di ricerca)	it_IT
dc.degree.grantor	Dipartimento di Scienze Ambientali, Informatica e Statistica	it_IT
dc.description.academicyear	Dottorato_appello_150321_33 con proroga	it_IT
dc.description.cycle	33	it_IT
dc.degree.coordinator	Cortesi, Agostino	it_IT
dc.location.shelfmark	D002118	it_IT
dc.location	Venezia, Archivio Università Ca' Foscari, Tesi Dottorato	it_IT
dc.rights.accessrights	openAccess	it_IT
dc.thesis.matricno	834397	it_IT
dc.format.pagenumber	[18], 217 p.	it_IT
dc.subject.miur	INF/01 INFORMATICA	it_IT
dc.description.note	Joint supervision (cotutelle) with Masarykova Univerzita	it_IT
dc.degree.discipline		it_IT
dc.contributor.co-advisor	Matyas, Vashek	it_IT
dc.provenance.upload	Martina Olliaro (834397@stud.unive.it), 2021-02-10	it_IT
dc.provenance.plagiarycheck	Agostino Cortesi (cortesi@unive.it), 2021-03-15	it_IT