Abstract:
Audio data compression and decompression are usually implemented via hand-crafted software codecs, which often exploit spectral properties of the signal. In this thesis we propose to tackle this problem with a data-driven approach, treating the time-frequency representation of an audio signal as an intensity map to be reconstructed. The main idea is to mask some of the input values and then apply sparse convolutions to reconstruct the missing signal, in the manner of depth completion. Our method is divided into two main parts. First, we explore the feasibility of audio signal compression with sparse convolutions while varying the amount of missing information, and we study how different levels of sparsity affect the quality of the final reconstruction, so that the most suitable level can be chosen according to the context. Second, we aim to create an ad hoc binary mask that minimizes the loss of information during the decompression step. We cast mask generation as an optimization problem and address it with two different approaches: by solving a minimization problem and via genetic algorithms.
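To make the masking and reconstruction idea concrete, the following is a minimal sketch of one sparsity-aware convolution step applied to a masked time-frequency map. It is an illustration under stated assumptions, not the model used in this thesis: the function name `sparse_conv2d`, the fixed box kernel, the random mask, and the synthetic stand-in spectrogram are all hypothetical, and in a trained network the kernel weights would be learned rather than fixed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sparse_conv2d(values, mask, kernel, eps=1e-8):
    """One normalized convolution step that uses only observed entries.

    values : 2D array, e.g. a magnitude spectrogram (frequency x time)
    mask   : 2D binary array, 1 where a value was kept, 0 where it was dropped
    kernel : 2D array with odd side lengths and non-negative weights
             (assumption of this sketch; learned weights in practice)
    Returns the filtered map and the propagated validity mask.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Zero out dropped entries, then zero-pad so the output keeps the input shape.
    v = np.pad(values * mask, ((ph, ph), (pw, pw)))
    m = np.pad(mask.astype(float), ((ph, ph), (pw, pw)))
    # Windows of shape (H, W, kh, kw), one per output position.
    v_win = sliding_window_view(v, (kh, kw))
    m_win = sliding_window_view(m, (kh, kw))
    # Weighted sum of observed values, normalized by the weight of observed entries.
    num = np.einsum("ijkl,kl->ij", v_win, kernel)
    den = np.einsum("ijkl,kl->ij", m_win, kernel)
    # An output position is valid if its window contains at least one observed value.
    new_mask = (m_win.sum(axis=(2, 3)) > 0).astype(float)
    return (num / (den + eps)) * new_mask, new_mask


# Hypothetical usage: keep roughly half of the time-frequency bins at random.
rng = np.random.default_rng(0)
spectrogram = np.abs(rng.normal(size=(16, 16)))  # stand-in for a real spectrogram
mask = (rng.random(spectrogram.shape) < 0.5).astype(float)
filled, filled_mask = sparse_conv2d(spectrogram, mask, np.ones((3, 3)))
```

Applying the step repeatedly, or stacking such layers with learned kernels, progressively fills the masked bins; the choice of which bins to keep is what the mask optimization in the second part of the method addresses.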