Jochen Lang

General and Specific Objectives of the Course

The recent surge in machine learning and in particular deep learning using neural network has revolutionized many fields including speech processing, data mining and medicine. Arguably one of the greatest impacts of this revolution is in computer vision. Since the success of AlexNet at the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) where the deep neural network solution outperformed, by a significant margin, arguably much more advanced classical computer vision systems, deep neural networks can now be found everywhere in visual processing. This revolution has created enormous economic interest (Facebook, Google, …). In electrical and computer engineers are expected to apply these neural networks techniques from machine learning in various computer vision application including in IoT and robotics. This topics course is to address computer vision in a principled manner. While it necessarily includes machine learning background, it specifically looks at neural networks and their applications to standard problems in computer vision. It will also contrast the deep-learning based approaches to classical computer vision approaches and how classical approaches inform the design of these deep-learning based solutions.

Calendar Description

Introduction to learning-based computer vision; statistical learning background; image processing and filtering primer; convolutional neural networks (CNNs), network layers, computer vision data sets and competitions; computer vision problems, in particular, image classification, detection and recognition, semantic segmentation, image generation, multi-view problems and tracking.

Course Prerequisites: None

Teaching Methods and Student Expectations

The course material will be covered in synchronous and asynchronous on-line lectures including program demonstrations. Additional resources in form of textbooks and on-line references are listed below. The course will be using group work and interactive student feedback using Virtual Campus (Brightspace) and Microsoft Teams. Students are encouraged to apply their knowledge through three programming assignments in Jupyter notebooks using Scikit-Learn, Keras and Tensorflow. Participation in the course requires approriate access to resources. The active participation of students is encouraged through discussions, the group video presentation and the individual student project presentations.

Recommended Textbooks and Additional Resources

General statistical learning

Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer 2nd ed., 2009.
(Available on-line: http://web.stanford.edu/~hastie/ElemStatLearn/

Computer vision

Richard Szeliski, Computer Vision: Algorithms and Applications, Springer, 2nd ed., 2022.
(Pre-print available on-line: http://szeliski.org/Book)
Reinhard Klette, Concise Computer Vision: An Introduction into Theory and Algorithms, Springer, 2014.
(Electronic version available for download from library http://biblio.uottawa.ca/en)

Deep Learning

Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press 2016.
(Available on-line:http://www.deeplearningbook.org)
Aurélion Gérion, Hands-on machine learning with Scikit-Learn, Keras & TensorFlow, 3rd ed., O’Reilly Media Inc., 2022.
(Jupyter notebooks can be found at https://github.com/ageron/handson-ml3 ).
Fei-Fei Li et al., Stanford CS231n course notes
(Available from http://cs231n.github.io)
Michel Nielsen, Neural Networks and Deep Learning
(Free online book from http://neuralnetworksanddeeplearning.com/)

Course Topics and Readings

Course notes will be made available through Virtual Campus.

Introduction and course overview, hands-on machine learning
ImageNet competition, commercial applications, brief historical overview, machine learning landscape, data handling, visualizing data, organizing the data: training, testing and validation
Statistical learning background
Linear regression review, linear least squares, regularization, logistic regression
Neural networks basics, non-linear optimization
Multi-layer perceptron, feed forward networks, activation functions, loss function, and training by back propagation. Gradient descent and stochastic gradient descent
Image processing and filtering
Correlation, convolution and linear filters
Convolutional neural networks (CNNs) “Classic layers”
Convolutional, pooling and fully-connected layers, visualizing CNNs
Training neural networks
Initialization, transfer learning, data augmentation, regularization, dropout, mini-batch normalization, data sets and competitions
Image classification, object detection and (semantic) segmentation
ImageNet competitions, metrics, regions with CNNs (R-CNN), fully-convolutional networks (FCNNs), U-Net, one-stage detectors, You Only Look Once (YOLO) detector, SSD: Single-Shot MultiBox Detector, instance segmentation
Object detection
Benchmarks and metrics, hourglass networks, cascade design, attention layers and multi-task networks. Applications as face detector, people detector and pedestrian detector.
Transformer architectures for computer vision
Vanilla transformer: attention, positional encoding and normalization. Image transformer, DETR, ViT, Swin transformer.
Multiview problems
Problem description for stereo and optical flow, geometric constraints and brief overview of classical methods, metrics.
Learning-based stereo and optical flow
Network designs for pixelwise classification and regression. Supervised and unsupervised training. Loss functions, occlusion handling.
Tracking and video object segmentation
Tracking by detection, discriminative and generative models, part-based trackers, discriminative correlation filters, siamese networks, short-term and long-term tracking, multi-object tracking, on-line tracking and real-time tracking, video object segmentation.
Image generation and translation (if time permits)
Image generation, image-to-image translation and style transfer. Variational autoencoders (VAE) and generative adversarial networks (GAN).

3 Programming assignments (using Tensorflow, Jupyter notebooks) Linear and logistic regression Image recognition Transfer learning for small data-sets	20 marks
Lab session at DEBI	10 marks
Project including oral presentation Marked in oral progress meetings (see Virtual Campus for more detail). Project must be done in groups 5. A video presentation is required.	40 marks
Final Exam Closed book.	30 marks

Learning-based Computer Vision

Professor

Jochen Lang

Contact

Teaching Assistants

Xiao Hu

Contact

Zishen Chen

Contact

Niloofar Hooshyaripour

Contact

General and Specific Objectives of the Course

Calendar Description

Course Prerequisites: None

Teaching Methods and Student Expectations

Recommended Textbooks and Additional Resources

Course Topics and Readings

Student Evaluation

Marking Scheme

Reminder: Academic Regulations

Academic Fraud and Plagiarism