Abstract:
An Electronic Health Record is a digital collection of a patient’s clinical history. While researchers have traditionally focused on the structured data contained in a patient’s Electronic Health Record, such as laboratory test results or physical measurements, recent studies leverage the unstructured textual data these records also contain. In this thesis, sentiment analysis techniques are applied to nursing notes in order to classify whether a patient will die within 80 days of their first hospital admission, based on the emotional tone and the positive and negative words detected in their admission note. To this end, topic modeling and a dictionary-based classification approach are applied, and a Random Forest, a Multinomial Inverse Regression, and a logistic regression model are fitted. The main benefit of using several techniques is that they capture different nuances of the notes written by health practitioners. The dataset analyzed is the MIMIC-III ICU (“Medical Information Mart for Intensive Care”) database, which describes patients admitted to the Beth Israel Deaconess Medical Center between 2001 and 2012.