
MATH 6380p. Advanced Topics in Deep Learning
Fall 2018


Course Information

Synopsis

This course is a continuation of Math 6380o, Spring 2018, inspired by Stanford Stats 385, Theories of Deep Learning, taught by Prof. Dave Donoho, Dr. Hatef Monajemi, and Dr. Vardan Papyan, as well as the IAS-HKUST workshop on Mathematics of Deep Learning during Jan 8-12, 2018. The aim of this course is to provide graduate students who are interested in deep learning with an overview of the mathematical and theoretical studies of neural networks that are currently available, in addition to some preliminary tutorials, to foster deeper understanding in future research.
Prerequisite: There is no formal prerequisite, though mathematical maturity in approximation theory, harmonic analysis, optimization, and statistics will be helpful. Do-it-yourself (DIY) and critical thinking (CT) are the most important things in this course. Enrolled students should have some programming experience with modern neural network frameworks, such as PyTorch, Tensorflow, MXNet, Theano, and Keras. Otherwise, it is recommended to take some courses on statistical learning (Math 4432 or 5470) and deep learning, such as Stanford CS231n with assignments, or a similar course, COMP4901J by Prof. CK TANG at HKUST.

Reference

Theories of Deep Learning, Stanford STATS385 by Dave Donoho, Hatef Monajemi, and Vardan Papyan

On the Mathematical Theory of Deep Learning, by Gitta Kutyniok

Tutorials: preparation for beginners

Python-Numpy Tutorials by Justin Johnson

scikit-learn Tutorials: An Introduction to Machine Learning in Python

Jupyter Notebook Tutorials

PyTorch Tutorials

Deep Learning: Do-it-yourself with PyTorch, A course at ENS

Tensorflow Tutorials

MXNet Tutorials

Theano Tutorials

Manning: Deep Learning with Python, by Francois Chollet [GitHub source in Python 3.6 and Keras 2.0.8]

MIT: Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Instructors:

Yuan Yao

Time and Place:

MonWed 4:30-5:50pm, Academic Bldg Rm 2463, Lift 25-26, HKUST
Venue changed to Rm 4582 (Lift 27-28) from Sep 10, 2018.

Homework and Projects:

No exams, but extensive discussions and projects will be expected.

Teaching Assistant:


Mr. Yifei Huang. Email: deeplearning.math (add "AT gmail DOT com" afterwards)

Schedule

Date Topic Instructor Scribe
09/03/2018, Mon Lecture 01: Overview I [ slides ]
Y.Y.
09/05/2018, Wed Lecture 02: Overview II [ slides ]
Y.Y.
09/10/2018, Mon Lecture 03: Overview III [ slides ]
Y.Y.
09/12/2018, Wed Lecture 04: Overview IV [ slides ] and Project 1 [ project1.pdf ]
GU, Hanlin
Y.Y.
09/19/2018, Wed Lecture 05: Harmonic Analysis of Convolutional Networks: Wavelet Scattering Net [ slides ]
Y.Y.
09/24/2018, Mon Lecture 06: Harmonic Analysis of Convolutional Networks: Extension of Scattering Nets [ slides ]
Y.Y.
09/26/2018, Wed Lecture 07: Convolutional Neural Network with Structured Filters [ slides ]
    [Abstract]:
  • In this lecture I'll introduce a recent work by Prof. Xiuyuan CHENG et al. at Duke University (a hedged code sketch of the filter decomposition follows this entry).
  • Filters in a Convolutional Neural Network (CNN) contain model parameters learned from enormous amounts of data. The properties of convolutional filters in a trained network directly affect the quality of the data representation being produced. In this talk, we introduce a framework for decomposing convolutional filters over a truncated expansion under pre-fixed bases, where the expansion coefficients are learned from data. Such a structure not only reduces the number of trainable parameters and computation load but also explicitly imposes filter regularity by bases truncation. Apart from maintaining prediction accuracy across image classification datasets, the decomposed-filter CNN also produces a stable representation with respect to input variations, which is proved under generic assumptions on the bases expansion. Joint work with Qiang Qiu, Robert Calderbank, and Guillermo Sapiro.
Y.Y.
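A minimal PyTorch sketch of the filter decomposition described in the abstract above: each convolutional filter is written as a truncated expansion over fixed bases with coefficients learned from data. The random bases, layer name, and hyperparameters here are illustrative assumptions (the cited work uses structured, pre-fixed bases), not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a conv layer whose filters are
# linear combinations of K fixed basis filters with learned coefficients.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=5, num_bases=6):
        super().__init__()
        # Fixed (non-trainable) bases; random filters stand in purely for
        # illustration of the truncated-expansion idea.
        bases = torch.randn(num_bases, kernel_size, kernel_size)
        self.register_buffer("bases", bases)
        # Trainable expansion coefficients: one per (out, in, basis) triple.
        self.coeff = nn.Parameter(torch.randn(out_ch, in_ch, num_bases) * 0.1)

    def forward(self, x):
        # Synthesize filters: weight[o, i] = sum_k coeff[o, i, k] * bases[k]
        weight = torch.einsum("oik,khw->oihw", self.coeff, self.bases)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)

# Usage: a drop-in replacement for nn.Conv2d with fewer trainable parameters.
layer = DecomposedConv2d(3, 16, kernel_size=5, num_bases=6)
out = layer(torch.randn(1, 3, 32, 32))   # -> torch.Size([1, 16, 32, 32])
```

With num_bases well below kernel_size**2, the trainable parameter count drops from out_ch * in_ch * k * k to out_ch * in_ch * num_bases, which is the saving and the basis-truncation regularity that the abstract refers to.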
10/03/2018, Wed Lecture 08: Student Seminars on Project 1
    [Team]: DENG Yizhe, HUANG Yifei, SUN Jiaze, TAN Haiyi
  • Title: Real or fake? A Comparison Between Scattering Network & Resnet-18 [ slides ].
    [Team]: YIN, Kejing (Jake) and QIAN, Dong
  • Title: Feature Extraction and Transfer Learning [ slides ].
10/08/2018, Mon Lecture 09: Student Seminars on Project 1
    [Team]: Bhutta, Zheng, Lan (Group 6)
  • Title: Raphael painting analysis: Random cropping leading to high variance [ slides ].
10/10/2018, Wed Lecture 10: Sparsity in Convolutional Neural Networks [ slides ]
    [Reference]:
  • Jeremias Sulam, Vardan Papyan, Yaniv Romano, and Michael Elad. Multi-Layer Convolutional Sparse Modeling: Pursuit and Dictionary Learning, IEEE Transactions on Signal Processing, vol. 66, no. 15, pp. 4090-4104, 2018. arXiv:1708.08705.
  • Vardan Papyan, Yaniv Romano, and Michael Elad. Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding, IEEE Transactions on Signal Processing, vol. 65, no. 21, pp. 5687-5701, 2017. arXiv:1707.06066.
  • Vardan Papyan, Yaniv Romano, and Michael Elad. Convolutional Neural Networks Analyzed via Convolutional Sparse Coding, Journal of Machine Learning Research, 18:1-52, 2017. arXiv:1607.08194.
Y.Y.
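To make the convolutional sparse coding model in the references above concrete, here is a hedged sketch that runs plain ISTA (iterative soft-thresholding) to find a sparse feature map gamma with x ≈ D * gamma for a fixed convolutional dictionary D. The dictionary, step-size estimate, and function names are illustrative assumptions; the cited papers analyze layered and thresholding pursuit variants rather than this loop.

```python
# Hedged sketch of convolutional sparse coding via ISTA: find a sparse map
# gamma with x ≈ conv(gamma, D) for a fixed (here random) dictionary D.
import torch
import torch.nn.functional as F

def conv_sparse_code(x, D, lam=0.1, iters=200):
    # x: (1, 1, H, W) signal; D: (1, m, k, k), i.e. m filters of size k x k.
    pad = D.shape[-1] // 2
    gamma = torch.zeros(1, D.shape[1], x.shape[2], x.shape[3])

    # Estimate the Lipschitz constant of the data-fit gradient by power iteration.
    v = torch.randn_like(gamma)
    for _ in range(20):
        v = F.conv_transpose2d(F.conv2d(v, D, padding=pad), D, padding=pad)
        lip, v = v.norm(), v / v.norm()
    step = 1.0 / lip

    for _ in range(iters):
        resid = F.conv2d(gamma, D, padding=pad) - x          # D*gamma - x
        grad = F.conv_transpose2d(resid, D, padding=pad)     # D^T (D*gamma - x)
        gamma = gamma - step * grad
        # Proximal step: soft-thresholding promotes sparsity of the code.
        gamma = torch.sign(gamma) * torch.clamp(gamma.abs() - step * lam, min=0.0)
    return gamma

D = torch.randn(1, 8, 5, 5)                  # illustrative dictionary
x = torch.randn(1, 1, 32, 32)
gamma = conv_sparse_code(x, D)
print((gamma != 0).float().mean())           # fraction of nonzero code entries
```

Roughly speaking, the cited Papyan-Romano-Elad analysis reads the forward pass of a CNN as a layered thresholding pursuit of exactly this kind of model.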
10/15/2018, Mon Lecture 11: Seminar: Exponentially Weighted Imitation Learning for Batched Historical Data. [ slides ]
    [Speaker]: WANG, Qing, Tencent AI Lab.
    [Abstract]:
  • We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or “environment oracle” as in most reinforcement learning settings. To solve this problem, we propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action space. The method does not rely on the knowledge of the behavior policy, thus can be used to learn from data generated by an unknown policy. Under mild conditions, our algorithm, though surprisingly simple, has a policy improvement bound and outperforms most competing methods empirically. Thorough numerical results are also provided to demonstrate the efficacy of the proposed methodology. This is a joint work with Jiechao Xiong, Lei Han, Peng Sun, Han Liu, and Tong Zhang.
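A hedged sketch of the core idea in the abstract above: fit a policy to batched trajectories by weighted behavioral cloning, where each logged action's log-likelihood is weighted by an exponentiated advantage estimate. All shapes, the clipping, the advantage estimator, and the hyperparameter beta are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch: exponentially advantage-weighted behavioral cloning on a
# fixed batch of historical (state, action, advantage) tuples.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))  # logits over 4 discrete actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def awil_loss(states, actions, advantages, beta=1.0, max_weight=20.0):
    # Weight each logged action by exp(beta * advantage); clip for stability.
    weights = torch.clamp(torch.exp(beta * advantages), max=max_weight)
    log_probs = torch.log_softmax(policy(states), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(weights * chosen).mean()     # weighted negative log-likelihood

# One gradient step on a toy batch (replace with real logged data).
states = torch.randn(256, 8)
actions = torch.randint(0, 4, (256,))
advantages = torch.randn(256)             # e.g. from a fitted value function
loss = awil_loss(states, actions, advantages)
opt.zero_grad(); loss.backward(); opt.step()
```

Setting beta = 0 recovers plain behavioral cloning; larger beta pushes the policy toward actions the advantage estimate favors, which is the reweighting intuition in the abstract.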
    [Team]: Huangshi Tian, Beijing Fang, Yunfei Yang (Group 3)
  • Title: An In-Depth Look at Feature Transformation Ability of CNN [ slides ].
10/22/2018, Mon Lecture 12: Implicit Regularization in Gradient Descent [ slides ]
    [Reference]:
  • Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals, Understanding deep learning requires rethinking generalization. ICLR 2017. [ arXiv:1611.03530 ] [Chiyuan Zhang's codes]
  • Peter L. Bartlett, Dylan J. Foster, Matus Telgarsky. Spectrally-normalized margin bounds for neural networks. [ arXiv:1706.08498 ].
  • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. [ arXiv:1710.10345 ]
  • Poggio, T, Liao, Q, Miranda, B, Rosasco, L, Boix, X, Hidary, J, Mhaskar, H. Theory of Deep Learning III: explaining the non-overfitting puzzle. [ MIT CBMM Memo-73, 1/30/2018 ].
  • Liao, Q., Miranda, B., Hidary, J., and Poggio, T. Classical generalization bounds are surprisingly tight for Deep Networks. MIT CBMM Memo-91. [arXiv:1807.09659]
  • Zhu, Weizhi, Yifei Huang, and Yuan YAO. On Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics. [arXiv:1810.03389]
  • Yuan Yao, Lorenzo Rosasco and Andrea Caponnetto, On Early Stopping in Gradient Descent Learning, Constructive Approximation, 2007, 26 (2): 289-315.
  • Tong Zhang and Bin Yu. Boosting with Early Stopping: Convergence and Consistency. Annals of Statistics, 2005, 33(4): 1538-1579. [ arXiv:0508276 ].
Y.Y.
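The Soudry et al. reference above shows that, on linearly separable data, gradient descent on the unregularized logistic loss drives the weight norm to infinity while the direction w/||w|| converges to the hard-margin SVM solution. A hedged NumPy sketch, on a made-up two-cluster dataset, just to make the phenomenon concrete:

```python
# Hedged sketch of implicit bias: gradient descent on the unregularized
# logistic loss over separable data; ||w|| keeps growing (no finite
# minimizer) while the normalized direction w / ||w|| stabilizes.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 0.5, (50, 2)),    # class +1 cluster
               rng.normal(-2, 0.5, (50, 2))])   # class -1 cluster
y = np.hstack([np.ones(50), -np.ones(50)])

w = np.zeros(2)
lr = 0.1
for t in range(1, 20001):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-margin))
    grad = -(X * (y * (1.0 / (1.0 + np.exp(margins))))[:, None]).mean(axis=0)
    w -= lr * grad
    if t % 5000 == 0:
        print(t, np.linalg.norm(w), w / np.linalg.norm(w))
```

Early stopping (the Yao-Rosasco-Caponnetto and Zhang-Yu references above) can be read in the same light: the optimization trajectory itself, not an explicit penalty, selects the solution.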
10/24/2018, Wed Lecture 13: Seminar
    [Speaker]: Baoyuan WU, Tencent AI Lab
    [Abstract]: In this talk, I will introduce three topics if time permits.
  • Topic 1: Tencent ML-Images: large-scale visual representation learning. [ slides (.pptx) ] The success of deep learning strongly depends on large-scale high-quality training data. Tencent ML-Images is an important open-source project: it publishes a large-scale multi-label image database (including 18M images and 11K categories), checkpoints with excellent visual representation capability (80.73% top-1 accuracy on the validation set of ImageNet), as well as the complete codes. In this talk, I will introduce the construction of ML-Images and its main characteristics, the training of deep neural networks using a large-scale image database, transfer learning to single-label image classification on ImageNet, and feature extraction and image classification using the trained checkpoint. This project tries to give you a clear picture of the complete process of visual representation learning based on deep neural networks. Project address: https://github.com/Tencent/tencent-ml-images
  • Topic 2: Lp-Box ADMM: a versatile framework for integer programming. [ slides (.pptx) ] In this talk, we revisit the integer programming (IP) problem, which plays a fundamental role in many computer vision and machine learning applications. We propose a novel and versatile framework called Lp-box ADMM, which is based on two main ideas. (1) The discrete constraint is equivalently replaced by the intersection of a box and the Lp-sphere. (2) We infuse this equivalence into the ADMM (Alternating Direction Method of Multipliers) framework to handle these continuous constraints separately and to harness its attractive properties. The proposed algorithm is theoretically guaranteed to converge to an epsilon-stationary point. We demonstrate the applicability of Lp-box ADMM on four important applications: MRF energy minimization, graph matching, clustering and model compression of convolutional neural networks. Results show that it outperforms generic IP solvers both in runtime and objective. It also achieves very competitive performance when compared to state-of-the-art methods that are specifically designed for these applications. [ preprint ] (A hedged code sketch of the box-sphere equivalence behind this framework follows this entry.)
  • Topic 3: Multimedia AI: a brief introduction to research and applications at Tencent AI Lab. Tencent AI Lab was established in Shenzhen in 2016 as a company-level strategic initiative and focuses on advancing fundamental and applied AI research. The research fields include computer vision, speech recognition, natural language processing and machine learning. The technologies of AI Lab have been applied in more than 100 Tencent products, including WeChat, QQ and the news app Tian Tian Kuai Bao. In this talk, I will give a brief introduction to the research on multimedia AI, including AI + image, video, audio and text, ranging from modeling, analysis, and understanding to generation. https://ai.tencent.com/ailab/
    [Bio]: Baoyuan Wu is currently a Senior Research Scientist at Tencent AI Lab. He was a postdoc in the IVUL lab at KAUST, working with Prof. Bernard Ghanem, from August 2014 to November 2016. He received his PhD degree from the National Laboratory of Pattern Recognition, Chinese Academy of Sciences (CASIA) in 2014, supervised by Prof. Baogang Hu. His research interests are machine learning and computer vision, including probabilistic graphical models, structured output learning, multi-label learning and integer programming. His work has been published in TPAMI, IJCV, CVPR, ICCV, ECCV and AAAI, among others.
Y.Y.
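A hedged sketch of the geometric fact behind the Lp-box ADMM framework in Topic 2 above, shown for p = 2: the binary set {0,1}^n is exactly the intersection of the box [0,1]^n with the sphere of radius sqrt(n)/2 centered at (1/2, ..., 1/2). The code below only illustrates the two projection operators implied by that equivalence; it is not the cited solver, and the test vectors are made up.

```python
# Hedged sketch of the key ingredient of Lp-box ADMM (for p = 2):
# x in {0,1}^n  <=>  x in [0,1]^n  and  ||x - 0.5||_2^2 = n/4.
# The ADMM splitting then only needs projections onto these two simple sets.
import numpy as np

def project_box(x):
    # Euclidean projection onto the box [0,1]^n: clamp each coordinate.
    return np.clip(x, 0.0, 1.0)

def project_sphere(x):
    # Euclidean projection onto the sphere centered at 0.5 with radius sqrt(n)/2.
    n = x.size
    center = 0.5 * np.ones(n)
    radius = np.sqrt(n) / 2.0
    d = x - center
    norm = np.linalg.norm(d)
    if norm == 0.0:                      # center itself: pick any sphere point
        d, norm = np.ones(n), np.sqrt(n)
    return center + radius * d / norm

x = np.array([0.9, 0.2, 0.7, 0.1])       # a relaxed (continuous) iterate
print(project_box(x), project_sphere(x))
b = np.array([1.0, 0.0, 1.0, 0.0])        # binary vectors lie in both sets
print(np.allclose(project_box(b), b), np.allclose(project_sphere(b), b))
```

Roughly, the ADMM iterations alternate an unconstrained update of the original variable with these two projections applied to auxiliary copies, which is what makes the relaxation tractable.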
10/29/2018, Mon Lecture 14: Variational Inference and Deep Learning. [ slides ]
Prof. Can YANG
10/31/2018, Wed Lecture 15: Phase Transitions of Margin Dynamics [ slides ] and Project 2 [ Assignment ]
    [Reference]:
  • ZHU, Weizhi, Yifei HUANG, and Yuan YAO. On Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics. [ arXiv:1810.03389 ]
ZHU, Weizhi
11/05/2018, Mon Lecture 16: Generative Models and Variational Autoencoders. [ slides ]
Y.Y.
11/07/2018, Wed Lecture 17: Generative Adversarial Networks. [ pdf ].
    [Reference]
  • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative Adversarial Networks. [ arXiv:1406.2661 ]
  • Martin Arjovsky, Soumith Chintala, Léon Bottou. Wasserstein GAN. [ arXiv:1701.07875 ]
  • Rie Johnson, Tong Zhang, Composite Functional Gradient Learning of Generative Adversarial Models. [ arXiv:1801.06309 ]
Y.Y.
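A hedged, minimal PyTorch sketch of the original GAN training loop from the Goodfellow et al. reference above: alternate a discriminator step (classify real vs. generated samples) with a generator step using the non-saturating loss. The toy 1-D data, architectures, and hyperparameters are placeholders.

```python
# Hedged sketch of GAN training (non-saturating generator loss) on toy 1-D data.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # z -> fake sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0        # "real" data: N(3, 0.25)
    z = torch.randn(64, 8)
    fake = G(z)

    # Discriminator step: push real samples toward label 1, fakes toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step (non-saturating): make the discriminator call fakes real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The Wasserstein GAN reference above keeps the same alternating structure but replaces the binary cross-entropy losses with a critic whose Lipschitz constant is constrained (by weight clipping in the original paper).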
11/12/2018, Mon Lecture 18: A Walk Through Non-Convex Optimization Methods: [ A: Online PCA ] [ B: SPIDER ]
    [ Speaker ] Dr. Junchi Li, Tencent AI Lab and Princeton University
    [ Abstract ] In this talk, I will discuss briefly the theoretical advances of non-convex optimization methods stemming from machine learning practice. I will begin with (perhaps the simplest) PCA model and show that scalable algorithms can achieve a rate that matches the minimax information lower bound. Then, I will discuss scalable algorithms that escape from saddle points, the importance of noise therein, and how to achieve an $\mathcal{O}(\varepsilon^{-3})$ convergence rate for finding an $(\varepsilon, \mathcal{O}(\varepsilon^{0.5}))$-approximate second-order stationary point. If time permits, I will further introduce a very recent "Lifted Neural Networks" method that is non-gradient-based and serves as a powerful alternative for training feed-forward deep neural networks. (A hedged code sketch of the Oja-style online PCA update analyzed in the first reference below follows this entry.)
    [ Bio ] Dr. Junchi Li obtained his B.S. in Mathematics and Applied Mathematics at Peking University in 2009, and his Ph.D. in Mathematics at Duke University in 2014. He has since held several research positions, including the role of visiting postdoctoral research associate at the Department of Operations Research and Financial Engineering, Princeton University. His research interests include statistical machine learning and optimization, scalable online algorithms for big data analytics, and stochastic dynamics on graphs and social networks. He has published original research articles in both top optimization journals and top machine learning conferences, including an oral presentation paper (1.23%) at NIPS 2017 and a spotlight paper (4.08%) at NIPS 2018.
    [ Reference ]
  • Junchi Li, Mengdi Wang, Han Liu, and Tong Zhang. Near-Optimal Stochastic Approximation for Online Principal Component Estimation. Mathematical Programming 2018. [ arXiv:1603.05305 ]
  • Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, and Praneeth Netrapalli. Faster Eigenvector Computation via Shift-and-Invert Preconditioning. ICML 2016
  • Rong Ge, Furong Huang, Chi Jin, and Yuan Yang. Escaping from Saddle Points. COLT 2015
  • Jason Lee, Max Simchowitz, Michael Jordan, and Ben Recht. Gradient Descent Only Converges to Minimizers. COLT 2016
  • Zeyuan Allen-Zhu, and Yuanzhi Li. NEON2. NIPS 2018
  • Cong Fang, Junchi Li, Zhouchen Lin, and Tong Zhang. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator. NIPS 2018. [ arXiv:1807.01695 ]
  • Jia Li, Cong Fang, and Zhouchen Lin. Lifted Proximal Operator Machines. AAAI 2018
  • Armin Askari, Geoffrey Negiar, Rajiv Sambharya, Laurent El Ghaoui. Lifted Neural Networks. arXiv:1805.01532
  • Fangda Gu, Armin Askari, Laurent El Ghaoui. Fenchel Lifted Networks: A Lagrange Relaxation of Neural Network Training. arXiv:1811.08039
Y.Y.
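The first reference above (Li, Wang, Liu, and Zhang) analyzes online PCA by stochastic approximation; as a concrete, hedged illustration of that setting, here is the classical Oja-style streaming update for the top principal component. The synthetic data and step-size schedule are illustrative assumptions, not the paper's algorithm or rates.

```python
# Hedged sketch of Oja-style streaming PCA: estimate the top eigenvector of
# an unknown covariance from one sample at a time.
import numpy as np

rng = np.random.default_rng(1)
d = 20
# Ground-truth covariance with one dominant direction (used only for checking).
u_true = rng.normal(size=d); u_true /= np.linalg.norm(u_true)
cov = 5.0 * np.outer(u_true, u_true) + np.eye(d)
L = np.linalg.cholesky(cov)

w = rng.normal(size=d); w /= np.linalg.norm(w)
for t in range(1, 50001):
    x = L @ rng.normal(size=d)            # one streaming sample ~ N(0, cov)
    eta = 1.0 / (100.0 + t)               # decaying step size
    w = w + eta * x * (x @ w)             # Oja update: w += eta * x x^T w
    w /= np.linalg.norm(w)                # project back to the unit sphere

print(abs(w @ u_true))                    # close to 1 if the direction is recovered
```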
11/14/2018, Wed Lecture 19: Robust Estimation and Generative Adversarial Networks. [ part A ] [ part B ].
    [Reference]
  • GAO, Chao, Jiyu LIU, Yuan YAO, and Weizhi ZHU. Robust Estimation and Generative Adversarial Nets. [ arXiv:1810.02030 ]
Y.Y.
11/19/2018, Mon Lecture 20: Seminars
Y.Y.
11/21/2018, Wed Lecture 21: Machine (Deep) Learning Problems in Cryo-EM. [ slides ].
    [Reference]
  • Yin Xian, Hanlin Gu, Wei Wang, Xuhui Huang, Yuan Yao, Yang Wang, Jian-Feng Cai. Data-Driven Tight Frame for Cryo-EM Image Denoising and Conformational Classification. The 6th IEEE Global Conference on Signal and Information Processing, Anaheim, California, Nov 26-29, 2018. [ arXiv:1810.08829 ] .
  • Min Su, Hantian Zhang, Kevin Schawinski, Ce Zhang, Michael A. Cianfrocco. Generative adversarial networks as a tool to recover structural information from cryo-electron microscopy data. [ pdf ]
Hanlin GU
11/26/2018, Mon Lecture 22: An Introduction to Adversarials in Deep Learning. [ slides ]
Zhichao HUANG
11/28/2018, Wed Lecture 23: Final Project. [ project3.pdf ].
    [Reference]
  • Introduction to Reinforcement Learning. [ slides ]
  • Recurrent Attention Models. [ slides ]
Y.Y.

by YAO, Yuan.