Abstract:
Originally conceived as a simple system for exchanging public documents, the Web has over time become one of the main services of the Internet, and it is still evolving into an increasingly sophisticated platform. As the complexity of this platform grows, more and more attention is required to ensure that web applications meet their security and privacy requirements. The advent of HTML5 brought many changes to the client-side environment, one of which is the introduction of Web Storage, a feature that allows web applications to store data in the user's browser.
In this thesis, we perform what is, to our knowledge, the first empirical analysis of the use of web storage in the wild. We leverage dynamic taint tracking at the level of JavaScript to collect explicit flows of information involving web storage in the Tranco Top 5k sites. We then perform an automated classification of the detected information flows to shed light on the key characteristics of web storage usage. Our analysis shows that web storage is routinely accessed by third parties, including known web trackers, who are particularly eager to have both read and write access to persistent web storage information. This motivates the need for further research on the security and privacy implications of web storage content.