Abstract: I will present several recent results from my group on learning robust driving policies that have advanced the state of the art in the CARLA self-driving simulation environment. To generalize across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. Motivated by this observation, I will first present a framework for learning situational driving policies that effectively captures reasoning under varying types of scenarios and achieves a 98% success rate on the CARLA self-driving benchmark, as well as state-of-the-art performance on a novel generalization benchmark. Next, I will discuss the problem of covariate shift in imitation learning. I will demonstrate that existing data aggregation techniques for addressing this problem generalize poorly, and present a novel approach with empirically better generalization performance. Finally, I will talk about the importance of intermediate representations and attention for learning robust self-driving models.
Abstract: Geometry estimation and semantic segmentation are two active machine learning research topics. Given single-view or stereo images, the depicted scene/object geometry, in the form of depth maps, can be accurately estimated. Similar neural network (NN) architectures can be successfully utilized for predicting semantic masks on an image. In several scenarios, both tasks are required at once, leading to the need for semantic world mapping techniques. For modern autonomous systems, simultaneous inference of both tasks by the same neural network is essential: it offers considerable resource savings and can enhance performance, as the tasks can mutually benefit from each other. A prominent application area is 3D road scene modeling and semantic segmentation, e.g., into 'road' and 'pavement' regions, which are essential for autonomous car driving.
Andreas Papachristodoulou (KIOS CoE, University of Cyprus); Christos Kyrkou (KIOS CoE, University of Cyprus); Theocharis Theocharides (University of Cyprus)
Idoia Ruiz (Computer Vision Center); Lorenzo Porzi (Facebook); Samuel Rota Bulò (Facebook); Peter Kontschieder (Facebook); Joan Serrat (Computer Vision Center)
Divya Kothandaraman (University of Maryland College Park); Athira Nambiar (Indian Institute of Technology Madras); Anurag Mittal (Indian Institute of Technology Madras)
Florence Carton (CEA); David Filliat (ENSTA Paris); Jaonary Rabarisoa (CEA); Quoc-Cuong Pham (CEA)
Abstract: Object detection and motion prediction are critical components of self-driving technology, tasked with understanding the current state of the world and estimating how it will evolve in the near future. In this talk we focus on these important problems and discuss raster-based methods developed at Uber ATG that have shown state-of-the-art performance. Such approaches encode the raw sensor data as top-down and/or front-view images of the surrounding area, providing near-complete contextual information necessary for accurate detection of traffic actors and prediction of their behavior. We present a number of recently proposed methods, ranging from models focusing solely on motion prediction to joint models that perform detection and prediction in an end-to-end fashion. We also discuss how to develop methods that obey the map and other physical constraints of the traffic surroundings, resulting in more realistic predictions and improved modeling of the uncertain environments in which self-driving vehicles operate.
Abstract: This talk will discuss recent advances in the development of visual architectures and their algorithms, towards a step change in the direction of agile robotics. The visual architectures currently used in robotic systems were designed neither for vision nor for action; they were designed for video recording or graphical processing. This hinders systems that require low lag and low energy consumption and, importantly, forces visual algorithms to process the world in ways that prevent effective coding for actions. In this talk I will describe work towards new architectures, specifically pixel processor arrays such as the SCAMP, that allow massive parallelism and focal-plane processing with reduced energy consumption. Examples include how these new architectures make it possible to perform visual computation onboard agile vehicles for tasks involving visual odometry, recognition, and deep network inference.
Chidanand Kumar K S (GWM); Samir Al-Stouhi (Haval)
Quazi Marufur Rahman (Queensland University of Technology); Niko Suenderhauf (Queensland University of Technology); Feras Dayoub (Queensland University of Technology)
Bingyu Shen (University of Notre Dame); Boyang Li (University of Notre Dame); Walter Scheirer (University of Notre Dame)
Weihuang Xu (University of Florida); Nasim Souly (Volkswagen Group Innovation Center California); Pratik Prabhanjan Brahma (Volkswagen Group Innovation Center California)
Shiyu Chi (Tianjin University of Technology); Mian Zhou (Tianjin University of Technology)