Abstract:
Accurate human trajectory forecasting is crucial for applications such as autonomous vehicles, social robots, and augmented reality systems. However, predicting pedestrian motion is challenging due to the complexities of human behavior, including social interactions, scene context, and the multimodal nature of pedestrian trajectories. This thesis addresses human trajectory forecasting in crowded scenes using deep learning techniques. The goal is to predict socially and physically plausible future paths for multiple interacting agents in a scene, conditioned on their past trajectories and the scene context. Furthermore, we investigate the effectiveness of a contrastive learning approach for enhancing the model's spatial reasoning and avoiding collisions with environmental constraints. Our approach is evaluated both qualitatively and quantitatively on established publicly available bird's-eye view datasets (e.g., ETH/UCY), as well as on an internal first-person view dataset. Since our ultimate goal is to integrate the trajectory forecasting model into a robotic system, we also describe how to adapt models trained on bird's-eye view data to first-person view settings.