J. Berclaz, F. Fleuret, E. Turetken, and P. Fua, Multiple object tracking using k-shortest paths optimization, The IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 2011.

K. Bernardin and R. Stiefelhagen, Evaluating multiple object tracking performance: The CLEAR MOT metrics, EURASIP Journal on Image and Video Processing, 2008.

Z. Chen, SVM based people counting method in the corridor scene using single-layer laser scanner, The IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2016.

S. Chopra, R. Hadsell, and Y. LeCun, Learning a similarity metric discriminatively, with application to face verification, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.

M. Everingham, S. M. Eslami, L. Van Gool, C. K. Williams, J. Winn et al., The PASCAL visual object classes challenge: A retrospective, Int. Journal of Computer Vision, 2015.

A. Geiger, P. Lenz, and R. Urtasun, Are we ready for autonomous driving? The KITTI vision benchmark suite, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2012.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014.

R. Hadsell, S. Chopra, and Y. LeCun, Dimensionality reduction by learning an invariant mapping, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2006.

J. He and A. Arora, A regression-based radar-mote system for people counting, Int. Conf. on Pervasive Computing and Communications (PerCom), 2014.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016.

A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang et al., MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv, 2017.

J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara et al., Speed/accuracy trade-offs for modern convolutional object detectors, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.

H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, 1955.

V. Letshwiti and T. Lamprecht, Appropriate technology for automatic passenger counting on public transport vehicles in South Africa, Southern African Transport Conf. (SATC), 2004.

W. Li, R. Zhao, T. Xiao, and X. Wang, DeepReID: Deep filter pairing neural network for person re-identification, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014.

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., SSD: Single shot multibox detector, European Conf. on Computer Vision (ECCV), 2016.

A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler, MOT16: A benchmark for multi-object tracking, arXiv, 2016.

D. Moujahid, O. E. Harrouss, and H. Tairi, Visual object tracking via the local soft cosine similarity, Pattern Recognition Letters, 2018.

M. Rauter, Reliable human detection and tracking in topview depth images, The IEEE Conf. on Computer Vision and Pattern Recognition Workshops, 2013.

J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.

J. Redmon and A. Farhadi, YOLOv3: An incremental improvement, arXiv, 2018.

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems (NIPS), 2015.

F. Schroff, D. Kalenichenko, and J. Philbin, FaceNet: A unified embedding for face recognition and clustering, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015.

Y. Sun, X. Wang, and X. Tang, Deeply learned face representations are sparse, selective, and robust, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015.

Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, DeepFace: Closing the gap to human-level performance in face verification, The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014.

C. Vondrick, D. Patterson, and D. Ramanan, Efficiently scaling up crowdsourced video annotation, Int. Journal of Computer Vision, 2012.

D. Wang, H. Lu, and C. Bo, Visual tracking via weighted local cosine similarity, The IEEE Trans. on Cybernetics, 2015.

N. Wojke and A. Bewley, Deep cosine metric learning for person re-identification, The IEEE Winter Conf. on Applications of Computer Vision (WACV), 2018.

N. Wojke, A. Bewley, and D. Paulus, Simple online and realtime tracking with a deep association metric, The IEEE Int. Conf. on Image Processing (ICIP), 2017.

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, European Conf. on Computer Vision (ECCV), 2014.

J. Zhang, L. L. Presti, and S. Sclaroff, Online multi-person tracking by tracker hierarchy, The IEEE Conf. on Advanced Video and Signal-based Surveillance (AVSS), 2012.