SSPNet: Learning spatiotemporal saliency prediction networks for visual tracking

Abstract
We present SSPNet, a novel method for learning the spatiotemporal saliency of a target for visual tracking. State-of-the-art trackers typically track a target by predicting its state, i.e., the coordinates of a bounding box encompassing the target, from candidates sampled around the previous target state. However, they have two limitations: 1) vulnerability to distractors present in a frame and 2) a strong bias of the target state estimation toward the initial frame. The proposed method addresses these problems by predicting spatiotemporal features of the target, which we call the current and future target saliencies. Given a frame, the current target saliency represents the spatial aspect of the target, whereas the future target saliency depicts how the target will appear in the next frame based on temporal features. This technique improves tracking accuracy in two ways. First, we can exploit the similarity between the current and future target saliencies to detect distractors. Second, SSPNet provides better prior knowledge about the current target state than the previous frame alone, which mitigates both the bias toward the initial frame and the occlusion problem. We show that SSPNet outperforms state-of-the-art trackers, particularly on challenging sequences.
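The distractor-detection idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the network that produces the saliency maps is assumed to exist elsewhere, and the function names (`saliency_similarity`, `filter_distractors`), the cosine-similarity scoring rule, and the threshold value are all hypothetical choices made for this example.

```python
import numpy as np

def saliency_similarity(current, future, eps=1e-8):
    """Cosine similarity between two flattened saliency maps (hypothetical
    scoring rule; the paper only states that similarity is exploited)."""
    c = current.ravel().astype(float)
    f = future.ravel().astype(float)
    return float(np.dot(c, f) / (np.linalg.norm(c) * np.linalg.norm(f) + eps))

def filter_distractors(candidates, current_sal, future_sal, threshold=0.5):
    """Keep candidate boxes (x, y, w, h) whose patch of the current target
    saliency agrees with the saliency predicted for the next frame;
    candidates with low agreement are treated as distractors."""
    kept = []
    for (x, y, w, h) in candidates:
        sim = saliency_similarity(current_sal[y:y + h, x:x + w],
                                  future_sal[y:y + h, x:x + w])
        if sim >= threshold:
            kept.append((x, y, w, h))
    return kept

# Toy example: a 10x10 frame where the true target occupies [2:6, 2:6]
# in both the current and the predicted future saliency maps.
current_sal = np.zeros((10, 10))
current_sal[2:6, 2:6] = 1.0
future_sal = current_sal.copy()

candidates = [(2, 2, 4, 4), (6, 6, 4, 4)]  # true target vs. off-target box
print(filter_distractors(candidates, current_sal, future_sal))
# → [(2, 2, 4, 4)]
```

The off-target candidate covers a region where both maps are empty, so its similarity score stays below the threshold and it is discarded.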
| Original language | English |
|---|---|
| Pages (from-to) | 399-416 |
| Number of pages | 18 |
| Journal | Information Sciences |
| Volume | 575 |
| DOIs | |
| State | Published - October 2021 |
Keywords
- 3D convolutional neural networks
- Recurrent neural networks
- Spatiotemporal feature
- Visual tracking
Quacquarelli Symonds (QS) Subject Topics
- Computer Science & Information Systems
- Data Science