Abstract:
This thesis applies semi-supervised learning to the tracking of multiple people in video surveillance scenarios, formulating the problem as graph transduction within a game-theoretic framework.
Graph transduction is a semi-supervised learning technique that classifies the nodes of a graph built from labeled and unlabeled data points (the labeled nodes start with zero-entropy label distributions and the unlabeled ones with maximum-entropy distributions); here the data points are the people detected in each frame.
A video is a sequence of frames, each of which may contain people. A people detector locates them (detection itself is outside the scope of this thesis), and each detected image patch is treated as a graph node.
Pairwise similarities are then computed between the nodes. Initially, the targets to be tracked are labeled; the provided labels then propagate consistently to the unlabeled nodes, so that each target is tracked in each frame of the video.
The framework is based on game-theoretic notions. The transduction, or information propagation, is formulated as a non-cooperative multi-player game whose equilibria correspond to consistent labelings of the data, that is, consistent assignments of targets to the patches in the frames that make up the video. Multiple targets can be tracked simultaneously.
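To make the label propagation concrete, the following is a minimal sketch and not the implementation used in this thesis. It assumes that an equilibrium is approached with discrete-time replicator dynamics, a common choice for games of this kind that the abstract itself does not specify. The function name `transduction_game`, the similarity matrix `W`, and the toy data are all hypothetical.

```python
import numpy as np

def transduction_game(W, labels, n_classes, n_iter=100):
    """Propagate labels over a similarity graph (illustrative sketch).

    W         : (n, n) non-negative symmetric similarity matrix, zero diagonal.
    labels    : length-n integer array, class index for labeled nodes, -1 otherwise.
    n_classes : number of targets (pure strategies of each player).
    """
    n = W.shape[0]
    # Unlabeled nodes start from the uniform (maximum-entropy) distribution.
    X = np.full((n, n_classes), 1.0 / n_classes)
    # Labeled nodes start from a one-hot (zero-entropy) distribution.
    labeled = labels >= 0
    X[labeled] = np.eye(n_classes)[labels[labeled]]

    for _ in range(n_iter):
        # Expected payoff of each label for each node, given the other nodes.
        payoff = W @ X
        X_new = X * payoff
        norm = X_new.sum(axis=1, keepdims=True)
        # Renormalize each row; keep the old row if a node received no payoff.
        X = np.where(norm > 0, X_new / np.where(norm > 0, norm, 1.0), X)
    return X.argmax(axis=1), X

# Toy example (assumed data): two groups of three patches, one labeled patch each.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)
labels = np.array([0, -1, -1, 1, -1, -1])
pred, X = transduction_game(W, labels, n_classes=2)
```

In this sketch each patch is a player and each target identity is a pure strategy; a one-hot row stays one-hot under the update, so labeled patches keep their labels while the unlabeled ones drift toward the label favored by their most similar neighbors.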
The approach can be seen as treating tracking as a semi-supervised learning problem: given a few target samples, we search for occurrences of the targets in the video stream. People's appearances are modeled with covariance matrices of color and gradient information, which lie on a Riemannian manifold. Experiments on video datasets show promising results.
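The covariance-based appearance model can be illustrated with a short sketch under stated assumptions. The exact feature set and metric are not given in this abstract, so the per-pixel features (three color channels plus absolute horizontal and vertical gradients) and the affine-invariant Riemannian distance are assumptions for illustration, as are the names `covariance_descriptor` and `riemannian_distance` and the random test patches.

```python
import numpy as np

def covariance_descriptor(patch, eps=1e-6):
    """Covariance of per-pixel features for an (H, W, 3) float color patch.

    Assumed features: the three color channels and the absolute gradients of
    the gray-level image along x and y. Returns a 5x5 symmetric
    positive-definite matrix.
    """
    gray = patch.mean(axis=2)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(gray)
    feats = np.stack(
        [patch[..., 0], patch[..., 1], patch[..., 2], np.abs(gx), np.abs(gy)],
        axis=-1,
    ).reshape(-1, 5)
    C = np.cov(feats, rowvar=False)
    # Small ridge so the matrix stays strictly positive definite.
    return C + eps * np.eye(5)

def riemannian_distance(C1, C2):
    """Affine-invariant distance: sqrt(sum_i ln^2(lambda_i)), where lambda_i
    are the generalized eigenvalues of the pair (C2, C1)."""
    L = np.linalg.cholesky(C1)
    L_inv = np.linalg.inv(L)
    # Whitening C2 by C1 gives a symmetric matrix with the same eigenvalues
    # as inv(C1) @ C2.
    M = L_inv @ C2 @ L_inv.T
    lam = np.linalg.eigvalsh(M)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

# Toy usage with random patches standing in for detected people (assumed data).
rng = np.random.default_rng(0)
patch_a = rng.random((32, 16, 3))
patch_b = rng.random((32, 16, 3))
C_a = covariance_descriptor(patch_a)
C_b = covariance_descriptor(patch_b)
d_ab = riemannian_distance(C_a, C_b)
d_ba = riemannian_distance(C_b, C_a)
d_aa = riemannian_distance(C_a, C_a)
```

A distance of this kind respects the curved geometry of symmetric positive-definite matrices, which a plain Euclidean difference between covariance matrices would ignore. It could then be converted into the pairwise similarities used on the graph, although the conversion used in the thesis is not stated here.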