Date 
Topic 
Instructor 
Scriber 
02/01/2018, Thu 
Lecture 01: Overview [ Lecture01a.pdf ]

Y.Y. 

02/06/2018, Tue 
Lecture 02: Invariance Wavelet Scattering Transform [ Lecture02.pdf ]
[Reference]:
 Stephane Mallat, Understanding Deep Convolutional Networks, Philosophical Transactions A, 2016.
 Stephane Mallat, Group Invariant Scattering, Communications on Pure and Applied Mathematics, Vol. LXV, 1331–1398 (2012)
 Joan Bruna and Stephane Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012
 Stephane Mallat's short course on Mathematical Mysteries of Deep Neural Networks: [ Part I video ], [ Part II video ], [ slides ]

LIU, Haixia HKUST & HIT 

02/08/2018, Thu 
Lecture 03: Transfer Learning: a tutorial in a Python notebook.

Yifei Huang Y.Y. 

02/13/2018, Tue 
Lecture 04: Sparsity in Convolutional Neural Networks [ Lecture04_SunQY.pdf ]
[Reference]:
 Jeremias Sulam, Vardan Papyan, Yaniv Romano, and Michael Elad, Multi-Layer Convolutional Sparse Modeling: Pursuit and Dictionary Learning, arXiv:1708.08705.
 Xiaoxia Sun, Nasser M. Nasrabadi, and Trac D. Tran, Supervised Deep Sparse Coding Networks, arXiv:1701.08349, GitHub source codes.
 Vardan Papyan, Jeremias Sulam, and Michael Elad, Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding, arXiv:1707.06066, IEEE Transactions on Signal Processing.
 Vardan Papyan, Jeremias Sulam, and Michael Elad, Working Locally Thinking Globally - Part II: Stability and Algorithms for Convolutional Sparse Coding, arXiv:1607.02009.

SUN, Qingyun Stanford U. 

02/15/2018, Thu 
Lecture will be rescheduled to another date, to be announced later

Y.Y. 

02/20/2018, Tue 
Lecture 05: Overview II: Generalization Ability and Optimization [ Lecture01b.pdf ]

Y.Y. 

02/22/2018, Thu 
Lecture 06: Poggio's Quest: When can Deep Networks avoid the Curse of Dimensionality and other theoretical puzzles? [ Lecture06.pdf ]
[Reference]:
 Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao,
Why and When Can Deep-but Not Shallow-networks Avoid the Curse of Dimensionality: A Review,
 Hrushikesh Mhaskar, Qianli Liao, Tomaso Poggio, Learning Functions: When is Deep Better Than Shallow, 2016.
 Liao and Poggio. Theory of Deep Learning II: Landscape of the Empirical Risk in Deep Learning. [ arXiv:1703.09833 ]
 Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio. Theory of Deep Learning IIb: Optimization Properties of SGD. [ arXiv:1801.02254 ]
 Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals,
Understanding deep learning requires rethinking generalization.
ICLR 2017.
[Chiyuan Zhang's codes]
 Yuan Yao, Lorenzo Rosasco and Andrea Caponnetto,
On Early Stopping in Gradient Descent Learning, Constructive Approximation, 2007, 26 (2): 289–315.

Y.Y. 

02/27/2018, Tue 
Lecture 07: Research Paradigms in the AI Age [ Lecture07a_SunQY.pdf ] [ Lecture07b_SunQY.pdf ]

SUN, Qingyun Stanford U. 

03/01/2018, Thu 
Lecture 08: Harmonic Analysis of Deep Convolutional Networks A [ Lecture08a.pdf ]

Y.Y. 

03/06/2018, Tue 
Lecture 09: Harmonic Analysis of Deep Convolutional Networks B [ Lecture08b.pdf ]
[Reference]:
 Joan Bruna and Stephane Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.
 Thomas Wiatowski and Helmut Bolcskei, A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction, 2016.
 Edouard Oyallon, Eugene Belilovsky, and Sergey Zagoruyko, Scaling the Scattering Transform: Deep Hybrid Networks, International Conference on Computer Vision (ICCV), 2017. [ GitHub ]

Y.Y. 

03/08/2018, Thu 
Lecture 10: An Introduction to Optimization Methods in Deep Learning. [ slides ]

Y.Y. Jason WU Peng XU Nayeon LEE 

03/13/2018, Tue 
Lecture 11: Transfer Learning and Content-Style Features [ slides ]

Y.Y. Min FAN et al. 

03/15/2018, Thu 
Lecture 12: Student Seminar on Project 1

Y.Y. Yuan CHEN et al. 

03/20/2018, Tue 
Lecture 13: Introduction to Optimization and Regularization methods in Deep Learning [ slides ]
[Reference]
 Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition, arXiv:1512.03385 [ Github ]
 An Overview of ResNet and its Variants, by Vincent Fung, [ link ]


03/22/2018, Thu 
Lecture 14: Introduction to Dynamic Neural Networks: RNN and LSTM [ slides ]


03/27/2018, Tue 
Lecture 15: Topology of Empirical Risk Landscapes for Overparametric Multilinear and 2-layer Rectified Networks [ slides ]
[Reference]
 Kenji Kawaguchi, Deep Learning without Poor Local Minima, NIPS 2016. [ arXiv:1605.07110 ]
 Liao and Poggio. Theory of Deep Learning II: Landscape of the Empirical Risk in Deep Learning. [ arXiv:1703.09833 ]
 Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio. Theory of Deep Learning IIb: Optimization Properties of SGD. [ arXiv:1801.02254 ]
 Freeman, Bruna. Topology and Geometry of Half-Rectified Network Optimization, ICLR 2017. [ arXiv:1611.01540 ]
 Luca Venturi, Afonso Bandeira, and Joan Bruna. Neural Networks with Finite Intrinsic Dimension Have no Spurious Valleys. [ arXiv:1802.06384 ]


03/29/2018, Thu 
Lecture 16: Project 2: Midterm. Due: April 12, 2018, 11:59pm.

Y.Y. 

04/10/2018, Tue 
Lecture 17: Implicit regularization in Gradient Descent method: Regression. [ pdf ].
[Reference]
 Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. [ arXiv:1710.10345 ]
 Poggio, T, Liao, Q, Miranda, B, Rosasco, L, Boix, X, Hidary, J, Mhaskar, H. Theory of Deep Learning III: explaining the nonoverfitting puzzle. [ MIT CBMM Memo v3, 1/30/2018 ].
 Yuan Yao, Lorenzo Rosasco and Andrea Caponnetto,
On Early Stopping in Gradient Descent Learning, Constructive Approximation, 2007, 26 (2): 289–315.

Y.Y. 

04/12/2018, Thu 
Lecture 18: Rethinking Deep Learning [ slides ]

Prof. Dahua LIN CUHK 

04/17/2018, Tue 
Lecture 19: Implicit regularization in Gradient Descent method: Classification and Max-Margin Classifiers. [ pdf ].
[Reference]
 Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. [ arXiv:1710.10345 ]
 Peter L. Bartlett, Dylan J. Foster, Matus Telgarsky. Spectrally-normalized margin bounds for neural networks.
[ arXiv:1706.08498 ].
 Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro.
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks. ICLR 2018. [ arXiv:1707.09564 ]
 Tong Zhang and Bin Yu. Boosting with Early Stopping: Convergence and Consistency. Annals of Statistics, 2005, 33(4): 1538–1579.
[ arXiv:0508276 ].

Y.Y. 

04/19/2018, Thu 
Lecture 20: Generative Models and GANs. [ pdf ].

Y.Y. 

04/24/2018, Tue 
Lecture 21: From Image Super-Resolution to Face Hallucination. [ slides (75M) ]
[Seminar]
 Guest Speaker: Prof. Chen Change (Cavan) Loy, Department of Information Engineering, The Chinese University of Hong Kong
 Abstract: Single-image super-resolution is a classical problem in computer vision. It aims at recovering a high-resolution image from a single low-resolution image. This is an underdetermined inverse problem whose solution is not unique. In this seminar, I will share our efforts in solving the problem with deep convolutional networks in a data-driven manner. I will then discuss our work on hallucinating faces of unconstrained poses and with very low resolution. In particular, I will show how face hallucination and dense correspondence field estimation can be optimized in a unified deep network. Finally, I will present a new method for recovering natural and realistic texture in low-resolution images by prior-driven deep feature modulation.
 Biography: Chen Change Loy received his PhD (2010) in Computer Science from the Queen Mary University of London (Vision Group). From Dec. 2010 to Mar. 2013, he was a postdoctoral researcher at Queen Mary University of London and Vision Semantics Limited. He is now a Research Assistant Professor at the Chinese University of Hong Kong. He is also a visiting scholar at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China.
His research interests include computer vision and pattern recognition, with a focus on face analysis, deep learning, and visual surveillance. He has published more than 90 papers, including over 50 publications in main journals (SPM, TPAMI, IJCV) and top conferences (ICCV, CVPR, ECCV, NIPS). His journal paper on image super-resolution was selected as the 'Most Popular Article' by IEEE Transactions on Pattern Analysis and Machine Intelligence from March 2016 to August 2016. It remains one of the top 10 articles to date. He was selected as an outstanding reviewer of ACCV 2014, BMVC 2017, and CVPR 2017.
He serves as an Associate Editor of IET Computer Vision Journal and a Guest Editor of the International Journal of Computer Vision and Computer Vision and Image Understanding. He will serve as an Area Chair of ECCV 2018 and BMVC 2018. He is a senior member of IEEE.
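The abstract's remark that super-resolution is an underdetermined inverse problem can be illustrated with a toy one-dimensional sketch. The block-averaging `downsample` function and the two sample signals below are assumptions made purely for illustration; they are not taken from the seminar or from the speaker's methods.

```python
def downsample(signal, factor=2):
    """Toy low-resolution observation model (an assumption for illustration):
    average each non-overlapping block of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]

# Two different hypothetical "high-resolution" signals.
high_res_a = [1.0, 3.0, 5.0, 7.0]
high_res_b = [2.0, 2.0, 6.0, 6.0]

# Both map to the same low-resolution observation, so the observation
# alone cannot determine which high-resolution signal produced it.
low_res_a = downsample(high_res_a)
low_res_b = downsample(high_res_b)
print(low_res_a, low_res_b)
```

Because distinct inputs yield identical observations, any recovery method has to bring in extra information, which is the role the abstract assigns to data-driven networks and priors.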

Prof. Chen Change (Cavan) Loy CUHK 

04/26/2018, Thu 
Lecture 22: Mathematical Analysis of Deep Convolutional Neural Networks.
[Seminar]
 Guest Speaker: Prof. Ding-Xuan Zhou, Department of Mathematics, City University of Hong Kong
 Abstract: Deep learning has been widely applied and brought breakthroughs in speech recognition, computer vision, and many other domains.
The involved deep neural network architectures and computational issues have been well studied in machine learning.
But a theoretical foundation is still lacking for understanding the approximation or generalization ability of deep learning
methods such as deep convolutional neural networks. This talk describes a mathematical theory of deep convolutional neural
networks (CNNs). In particular, we discuss the universality of a deep CNN, meaning that it can be used to approximate any
continuous function to an arbitrary accuracy when the depth of the neural network is large enough. Our quantitative estimate,
given tightly in terms of the number of free parameters to be computed, verifies the efficiency of deep CNNs in dealing with
large dimensional data. Some related distributed learning algorithms will also be discussed.
[Reference]
 Ding-Xuan ZHOU. Deep Distributed Convolutional Neural Networks: Universality. [ preprint ]

Prof. Ding-Xuan ZHOU CityUHK 

05/03/2018, Thu 
Lecture 23: An Introduction to Reinforcement Learning [ slides ]
[Reference]
 Fei-Fei Li, et al. cs231n.github.io
 Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, Recurrent Models of Visual Attention, NIPS 2014. [ arXiv:1406.6247 ] [ Kevin Zakka's Pytorch Implementation ]
 De Farias and Van Roy, The linear programming approach to approximate dynamic programming, Operations Research 51 (6), 850–865, 2003. [ pdf ]
 Mengdi Wang (2017), Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time. [ link ]
 Mengdi Wang (2017), Primal-Dual $\pi$ Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems. [ arXiv:1710.06100 ]
 Yuandong Tian et al.: ELF OpenGo, an Extensive, Lightweight, and Flexible platform for game research, which has been used to build the Go-playing bot ELF OpenGo and achieved a 14-0 record versus four global top-30 players in April 2018.

Y.Y. 

05/08/2018, Tue 
Lecture 24: Final Project [ project3.pdf ]

Gijs Bruining Y.Y. 
