Abstract:
In recent years, an enormous quantity of data produced by multiple sources has emerged. Handling this data has given rise to the so-called "big data problem", which can be addressed only with new computing paradigms and platforms. Many vendors compete in this field, but to date the de facto standard platform for big data is the open-source framework Apache Hadoop. Inspired by Google's private cluster platform, independent developers created Hadoop and, following the architecture published by Google's engineering team, developed a complete set of components for big data processing. One of its core components is the Hadoop Distributed File System (HDFS). In this thesis work, we analyze its performance and identify parameters that can be tuned to improve its behavior in a real deployment.