Canberra, Australia | December 1, 2015

Deep Learning and its Applications
in Vision and Robotics

Tutorial and Workshop at the
Australasian Joint Conference on Artificial Intelligence (AI 2015) and the
Australasian Conference on Robotics and Automation (ACRA 2015)



Applications of convolutional neural networks and deep learning methods have recently been proliferating at a remarkable rate across various engineering disciplines. The significant advances that deep learning methods have brought about in large-scale image classification have generated a surge of excitement about applying these techniques to other problems in computer vision and, more broadly, to other disciplines of computer science such as robotics. However, building deep learning algorithms for highly non-linear real-world problems, such as those encountered in computer vision and robotics, is non-trivial and requires substantial expertise.
The goal of this workshop is to bring together researchers from Australia and Asia working in the field of deep learning to discuss recent advances and ongoing developments, and to build collaborations by exchanging new ideas for future applications. Submissions will form the basis for spotlight talks and poster discussions.
Thank you everyone for a great workshop! A big thank you to all the presenters and speakers, the more than 60 attendees, and all our students and volunteers! Thanks! If you want to stay in touch about events related to deep learning (especially in robotics and vision), please SIGN UP for the Deep Learning Interest Group!




Workshop Programme


QT Hotel (AusAI conference), 1 London Circuit, Canberra, Australia | December 1, 2015
This is the schedule for our full-day workshop, starting at 9:15 am.

09:15-09:30 Welcome [Slides] by Juxi, Sareh and Anoop, Australian Centre for Robotic Vision (ACRV)
09:30-10:30 "Learning Deep Hierarchies of Data Representations" [Tutorial I] [Slides] by Dr. Lizhen Qu, NICTA
Abstract: It is well known that the performance of machine learning algorithms relies heavily on data representations. Deep learning algorithms attempt to learn multiple levels of representation with increasing abstraction. Such automatically learned data representations have been found useful in many fields, including computer vision, robotics, and natural language processing. In this tutorial, I will walk you through the key ideas of deep learning and cover the basics of deep neural networks. I will also briefly introduce some widely used deep learning models, such as Deep Belief Networks and auto-encoders, together with their applications in computer vision and robotics.
10:30-11:00 Coffee Break
11:00-11:30 "A Practical Introduction to Deep Learning with Caffe" [Tutorial II] [Slides] by Peter Anderson, ACRV, ANU
Abstract: A tour of Caffe with practical tips on issues such as setting up your data, choosing architectures and hyperparameters, modifying Caffe, and selecting the best GPU hardware.
11:30-12:00 "Compacting ConvNets for End-to-end Learning" [Slides] by Dr. José J. Álvarez, NICTA/Data61, ANU
Abstract: Convolutional neural networks have achieved considerable success in many computer vision tasks, such as image classification, object detection and recognition, and semantic segmentation. These networks are computationally demanding and not always feasible for embedded platforms, where power and computational resources are limited. Recent work has shown significant redundancy in the parameters of these networks. This over-parametrization seems necessary to overcome the challenges of highly non-convex optimization problems. In this talk I review recent techniques for speeding up current networks and reducing their parameter redundancy.
12:00-12:30 Best Paper Finalist Spotlights
Cesar Cadena: Depth Estimation with Multi-modal Auto-Encoders
Daniel Weimar: Context-aware Deep Convolutional Neural Networks for Industrial Inspection [Slides]
12:30-13:30 Lunch (catered)
13:30-14:15 "(Deep) Learning for Robot Navigation and Perception" [Slides] by Professor Wolfram Burgard, University of Freiburg
Abstract: Autonomous robots are faced with a series of learning problems to optimize their behavior. In this presentation I will describe recent approaches developed in my group, based on deep learning architectures, for object recognition and body part segmentation from RGB(-D) images and for terrain classification from sound. In addition, I will present an approach using sparse coding to compactly represent three-dimensional environments. For all approaches I will describe extensive experiments quantifying how the corresponding algorithm extends the state of the art.
14:15-15:00 "Deep Structured Learning" [Slides] by Professor Chunhua Shen, University of Adelaide
Abstract: Structured output learning concerns the problem of predicting multiple interdependent variables, with conditional random fields (CRFs) as a typical example. It shows great promise in tasks like semantic image segmentation. Recently, there has been mounting evidence that features from deep convolutional neural networks (CNNs) set new records for various vision applications. Here I show how we can combine CRFs with deep CNNs to predict complex labels while considering the dependencies between the output variables. The first application is to learn depth from single monocular images. Compared with depth estimation using multiple images, such as stereo depth perception, depth from monocular images is much more challenging. We propose a deep structured learning scheme, termed Deep Convolutional Neural Fields, which learns the unary and pairwise potentials of a continuous CRF in a unified deep CNN framework. For the second application, we propose a new, efficient deep structured model learning scheme, in which we show how deep CNNs can be used to estimate the messages in message-passing inference for structured prediction with CRFs. With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation. This makes learning significantly more efficient, since otherwise, when performing structured learning for a CRF with CNN potentials, it is necessary to undertake expensive inference for every stochastic gradient iteration. We also demonstrate that it yields results that are competitive with the state of the art in semantic segmentation on the PASCAL VOC 2012 dataset.
15:00-16:00 Coffee and Poster Session (and audience voting for the best paper)
We will have the following poster presenters: Edison Guo, Fahimeh Rezazadegan, Fangyi Zhang, Frederic Maire, James Sergeant, Peter Anderson, Rodrigo Santa Cruz, Sean McMahon, Daniel Weimar
16:15 nVidia Best Paper Award Announcement / Concluding remarks
We want to congratulate Cesar Cadena, who won the best paper award, voted by the attendees and the organisers!
Cesar, enjoy your Titan X!
[Photo: nVidia Best Paper Award winner Cesar Cadena, with the organisers and his new Titan X card]



We invite people interested in the topics of deep learning, computer vision and robotics to attend the workshop on Dec 1. Registration is required (so that we know how many people will attend and can plan the catering). To register, please follow the link to the AI conference webpage. If you register for the full AI conference, the workshop and tutorial are included. There is also the option to register for the workshop only, for AUD 200. If you are attending ACRA and also want to attend this workshop and tutorial, there is a special rate to be paid during the ACRA online registration (please also inform the organizers at

Call for Papers

We invite contributions spanning the areas of deep learning, computer vision and robotics. We are particularly interested in the application of deep learning techniques to problems arising in "real world" settings of vision and control. Extended abstract submissions should be in IEEE Conference format and no more than 4 pages long.


The following is a non-exhaustive list of topics of interest:

  • Novel deep learning architectures, models, and learning algorithms
  • Multimodal deep learning methods spanning, for example, vision, speech, language, and control
  • Unsupervised deep learning
  • Deep learning for decision making and control
  • Deep reinforcement learning
  • Applications of deep models to problems in vision and robotics,
    such as:
    • Activity recognition
    • Large scale image classification in real world/robotic settings
    • Robot control

Paper Submission Deadline

October 10, 2015 23:59 (AST) (extended)
Submit here via EasyChair

Submission Acceptance Notification: October 20, 2015
There will be a Best Paper Award in the form of a Titan X card sponsored by nVidia.


Juxi Leitner

Queensland University of Technology (QUT)


Anoop Cherian

Australian National University (ANU)


Sareh Shirazi

Queensland University of Technology (QUT)

Contact Us