AVVision-WACV'21

Keynote Speakers

Andreas Geiger, University of Tübingen

Andreas Geiger is a professor at the University of Tübingen and group leader at the Max Planck Institute for Intelligent Systems. Prior to this, he was a visiting professor at ETH Zürich and a research scientist at MPI-IS. He studied at KIT, EPFL, and MIT and received his Ph.D. degree from KIT in 2013. His research interests are at the intersection of 3D reconstruction, motion estimation, scene understanding, and sensory-motor control. He maintains the KITTI vision benchmark.

Keynote Talk: Towards Robust End-to-End Driving
Abstract: I will present several recent results of my group on learning robust driving policies that have advanced the state of the art in the CARLA self-driving simulation environment. To generalize across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. Motivated by this observation, I will first present a framework for learning situational driving policies that effectively captures reasoning under varying types of scenarios and leads to a 98% success rate on the CARLA self-driving benchmark as well as state-of-the-art performance on a novel generalization benchmark. Next, I will discuss the problem of covariate shift in imitation learning. I will demonstrate that existing data aggregation techniques for addressing this problem have poor generalization performance, and present a novel approach with empirically better generalization performance. Finally, I will talk about the importance of intermediate representations and attention for learning robust self-driving models.

Ioannis Pitas, Aristotle University of Thessaloniki

Ioannis Pitas (IEEE Fellow, IEEE Distinguished Lecturer, EURASIP Fellow) received the Diploma and Ph.D. degrees in Electrical Engineering, both from the Aristotle University of Thessaloniki (AUTH), Greece. Since 1994, he has been a Professor at the Department of Informatics of AUTH and Director of the Artificial Intelligence and Information Analysis (AIIA) lab. He has served as a Visiting Professor at several universities. His current interests are in the areas of computer vision, machine learning, autonomous systems, image/video processing, and human-centered computing. He has published over 1000 papers, contributed to 47 books in his areas of interest, and edited or (co-)authored another 11 books. He has also been a member of the program committee of many scientific conferences and workshops. In the past, he served as Associate Editor or Co-Editor of 9 international journals and as General or Technical Chair of 4 international conferences. He has participated in 71 R&D projects, primarily funded by the European Union, and is or was principal investigator in 42 of them. Prof. Pitas led the large European H2020 R&D project MULTIDRONE. He is the AUTH principal investigator in the H2020 R&D projects Aerial Core and AI4Media. He is chair of the Autonomous Systems Initiative. He heads the EC-funded AI doctoral school of the Horizon 2020 R&D project AI4Media (one of four in Europe). His work has received 32000+ citations, and his h-index is 85+ (Google Scholar).

Keynote Talk: Semantic 3D World Modeling
Abstract: Geometry estimation and semantic segmentation are two active machine learning research topics. Given single-view or stereo images, the geometry of the depicted scene or object can be accurately estimated in the form of depth maps. Similar neural network (NN) architectures can be successfully used to predict semantic masks on an image. In several scenarios, both tasks are required at once, leading to the need for semantic world mapping techniques. With the rise of modern autonomous systems, simultaneous inference of both tasks by the same neural network is essential: it offers considerable resource savings and can enhance performance, since the two tasks can mutually benefit from each other. A major application area is 3D road scene modeling and semantic segmentation, e.g., into ‘road’ and pavement regions, which are essential for autonomous car driving.

Nemanja Djuric, Uber ATG

Nemanja Djuric is a Staff Autonomy Engineer and Tech Lead Manager at Uber ATG, where he has worked for the past five years on motion prediction, object detection, and other technologies supporting self-driving vehicles. Prior to ATG, he worked as a research scientist at Yahoo Labs, which he joined after obtaining his Ph.D. in Computer Science at Temple University under the mentorship of Prof. Slobodan Vucetic. Previously, he received his B.Sc. and M.Sc. degrees in Electrical Engineering in 2007 and 2009, respectively, from the University of Novi Sad, Serbia.

Keynote Talk: Object Detection and Motion Prediction for Safe Self-Driving using Raster-based Methods
Abstract: Object detection and motion prediction are critical components of self-driving technology, tasked with understanding the current state of the world and estimating how it will evolve in the near future. In the talk we focus on these important problems, and discuss raster-based methods developed at Uber ATG that have shown state-of-the-art performance. Such approaches encode the raw sensor data as top-down and/or front-view images of a surrounding area, providing near-complete contextual information necessary for accurate detection of traffic actors and their behavioral prediction. We present a number of recently proposed methods, ranging from models focusing solely on motion prediction to joint models that perform detection and prediction in an end-to-end fashion. We also discuss how to develop methods that obey map and other physical constraints of the traffic surroundings, resulting in more realistic predictions and improved modeling of uncertain environments in which the self-driving vehicles operate.

Walterio Mayol-Cuevas, University of Bristol & Amazon

Walterio Mayol-Cuevas received the B.Sc. degree from the National University of Mexico and the Ph.D. degree from the University of Oxford. He is a member of the Department of Computer Science, University of Bristol. His research with students and collaborators proposed some of the earliest versions of visual simultaneous localization and mapping (SLAM) and its applications to robotics and augmented reality. He was the General Co-Chair of BMVC 2013 and the General Chair of IEEE ISMAR 2016.

Keynote Talk: Pixel Processor Arrays for Bridging Perception and Action in Agile Robots
Abstract: This talk will discuss recent advances in the development of visual architectures and their algorithms towards a step change in the direction of agile robotics. The visual architectures commonly used in robotic systems today were designed for neither vision nor action; they were designed for video recording or graphics processing. This hinders systems that require low lag and low energy consumption and, importantly, forces visual algorithms to process the world in ways that prevent effective coding for actions. In this talk I will describe work towards new architectures, specifically pixel processor arrays such as the SCAMP, that allow massive parallelism and focal-plane processing with reduced energy consumption. Examples include how these new architectures allow visual computation to be performed onboard agile vehicles for tasks that involve visual odometry, recognition, and deep network inference.