
MATH 6380p. Advanced Topics in Deep Learning
Fall 2018


Course Information

Synopsis

This course is a continuation of Math 6380o, Spring 2018, inspired by Stanford Stats 385, Theories of Deep Learning, taught by Prof. Dave Donoho, Dr. Hatef Monajemi, and Dr. Vardan Papyan, as well as the IAS-HKUST workshop on Mathematics of Deep Learning during Jan 8-12, 2018. The aim of this course is to provide graduate students who are interested in deep learning with an overview of the mathematical and theoretical studies of neural networks that are currently available, in addition to some preliminary tutorials, to foster deeper understanding in future research.
Prerequisite: There is no formal prerequisite, though mathematical maturity in approximation theory, harmonic analysis, optimization, and statistics will be helpful. Do-it-yourself (DIY) and critical thinking (CT) are the most important things in this course. Enrolled students should have some programming experience with modern neural network frameworks, such as PyTorch, Tensorflow, MXNet, Theano, and Keras. Otherwise, it is recommended to take some courses on statistical learning (Math 4432 or 5470) and deep learning, such as Stanford CS231n with assignments, or a similar course, COMP4901J by Prof. CK TANG at HKUST.

Reference

Theories of Deep Learning, Stanford STATS385 by Dave Donoho, Hatef Monajemi, and Vardan Papyan

On the Mathematical Theory of Deep Learning, by Gitta Kutyniok

Tutorials: preparation for beginners

Python-Numpy Tutorials by Justin Johnson

scikit-learn Tutorials: An Introduction to Machine Learning in Python

Jupyter Notebook Tutorials

PyTorch Tutorials

Deep Learning: Do-it-yourself with PyTorch, A course at ENS

Tensorflow Tutorials

MXNet Tutorials

Theano Tutorials

Manning: Deep Learning with Python, by Francois Chollet [GitHub source in Python 3.6 and Keras 2.0.8]

MIT: Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Instructors:

Yuan Yao

Time and Place:

MonWed 4:30-5:50pm, Academic Bldg Rm 2463, Lift 25-26, HKUST
Venue changed to Rm 4582 (Lift 27-28) from Sep 10, 2018.

Homework and Projects:

No exams, but extensive discussions and projects will be expected.

Teaching Assistant:


Mr. Yifei Huang. Email: deeplearning.math (add "AT gmail DOT com" afterwards)

Schedule

Date Topic Instructor Scribe
09/03/2018, Mon Lecture 01: Overview I [ slides ]
Y.Y.
09/05/2018, Wed Lecture 02: Overview II [ slides ]
Y.Y.
09/10/2018, Mon Lecture 03: Overview III [ slides ]
Y.Y.
09/12/2018, Wed Lecture 04: Overview IV [ slides ] and Project 1 [ project1.pdf ]
GU, Hanlin
Y.Y.
09/19/2018, Wed Lecture 05: Harmonic Analysis of Convolutional Networks: Wavelet Scattering Net [ slides ]
Y.Y.
09/24/2018, Mon Lecture 06: Harmonic Analysis of Convolutional Networks: Extension of Scattering Nets [ slides ]
Y.Y.
09/26/2018, Wed Lecture 07: Convolutional Neural Network with Structured Filters [ slides ]
    [Abstract]:
  • In this lecture I'll introduce a recent work by Prof. Xiuyuan CHENG et al. at Duke University (a hedged code sketch of the filter decomposition follows this entry).
  • Filters in a Convolutional Neural Network (CNN) contain model parameters learned from enormous amounts of data. The properties of convolutional filters in a trained network directly affect the quality of the data representation being produced. In this talk, we introduce a framework for decomposing convolutional filters over a truncated expansion under pre-fixed bases, where the expansion coefficients are learned from data. Such a structure not only reduces the number of trainable parameters and computation load but also explicitly imposes filter regularity by bases truncation. Apart from maintaining prediction accuracy across image classification datasets, the decomposed-filter CNN also produces a stable representation with respect to input variations, which is proved under generic assumptions on the bases expansion. Joint work with Qiang Qiu, Robert Calderbank, and Guillermo Sapiro.
Y.Y.
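A minimal PyTorch sketch of the filter decomposition described in the abstract above: each convolutional filter is written as a truncated expansion over fixed bases with coefficients learned from data. The random bases, layer name, and hyperparameters here are illustrative assumptions (the cited work uses structured, pre-fixed bases), not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a conv layer whose filters are
# linear combinations of K fixed basis filters with learned coefficients.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=5, num_bases=6):
        super().__init__()
        # Fixed (non-trainable) bases; random filters stand in purely for
        # illustration of the truncated-expansion idea.
        bases = torch.randn(num_bases, kernel_size, kernel_size)
        self.register_buffer("bases", bases)
        # Trainable expansion coefficients: one per (out, in, basis) triple.
        self.coeff = nn.Parameter(torch.randn(out_ch, in_ch, num_bases) * 0.1)

    def forward(self, x):
        # Synthesize filters: weight[o, i] = sum_k coeff[o, i, k] * bases[k]
        weight = torch.einsum("oik,khw->oihw", self.coeff, self.bases)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)

# Usage: a drop-in replacement for nn.Conv2d with fewer trainable parameters.
layer = DecomposedConv2d(3, 16, kernel_size=5, num_bases=6)
out = layer(torch.randn(1, 3, 32, 32))   # -> torch.Size([1, 16, 32, 32])
```

With num_bases well below kernel_size**2, the trainable parameter count drops from out_ch * in_ch * k * k to out_ch * in_ch * num_bases, which is the saving and the basis-truncation regularity that the abstract refers to.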
10/03/2018, Wed Lecture 08: Student Seminars on Project 1
    [Team]: DENG Yizhe, HUANG Yifei, SUN Jiaze, TAN Haiyi
  • Title: Real or fake? A Comparison Between Scattering Network & Resnet-18 [ slides ].
    [Team]: YIN, Kejing (Jake) and QIAN, Dong
  • Title: Feature Extraction and Transfer Learning [ slides ].
10/08/2018, Mon Lecture 09: Student Seminars on Project 1
    [Team]: Bhutta, Zheng, Lan (Group 6)
  • Title: Raphael painting analysis: Random cropping leading to high variance [ slides ].
10/10/2018, Wed Lecture 10: Sparsity in Convolutional Neural Networks [ slides ]
    [Reference]:
  • Jeremias Sulam, Vardan Papyan, Yaniv Romano, and Michael Elad. Multi-Layer Convolutional Sparse Modeling: Pursuit and Dictionary Learning, IEEE Transactions on Signal Processing, vol. 66, no. 15, pp. 4090-4104, 2018. arXiv:1708.08705.
  • Vardan Papyan, Yaniv Romano, and Michael Elad. Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding, IEEE Transactions on Signal Processing, vol. 65, no. 21, pp. 5687-5701, 2017. arXiv:1707.06066.
  • Vardan Papyan, Yaniv Romano, and Michael Elad. Convolutional Neural Networks Analyzed via Convolutional Sparse Coding, Journal of Machine Learning Research, 18:1-52, 2017. arXiv:1607.08194.
Y.Y.
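To make the convolutional sparse coding model in the references above concrete, here is a hedged sketch that runs plain ISTA (iterative soft-thresholding) to find a sparse feature map gamma with x ≈ D * gamma for a fixed convolutional dictionary D. The dictionary, step-size estimate, and function names are illustrative assumptions; the cited papers analyze layered and thresholding pursuit variants rather than this loop.

```python
# Hedged sketch of convolutional sparse coding via ISTA: find a sparse map
# gamma with x ≈ conv(gamma, D) for a fixed (here random) dictionary D.
import torch
import torch.nn.functional as F

def conv_sparse_code(x, D, lam=0.1, iters=200):
    # x: (1, 1, H, W) signal; D: (1, m, k, k), i.e. m filters of size k x k.
    pad = D.shape[-1] // 2
    gamma = torch.zeros(1, D.shape[1], x.shape[2], x.shape[3])

    # Estimate the Lipschitz constant of the data-fit gradient by power iteration.
    v = torch.randn_like(gamma)
    for _ in range(20):
        v = F.conv_transpose2d(F.conv2d(v, D, padding=pad), D, padding=pad)
        lip, v = v.norm(), v / v.norm()
    step = 1.0 / lip

    for _ in range(iters):
        resid = F.conv2d(gamma, D, padding=pad) - x          # D*gamma - x
        grad = F.conv_transpose2d(resid, D, padding=pad)     # D^T (D*gamma - x)
        gamma = gamma - step * grad
        # Proximal step: soft-thresholding promotes sparsity of the code.
        gamma = torch.sign(gamma) * torch.clamp(gamma.abs() - step * lam, min=0.0)
    return gamma

D = torch.randn(1, 8, 5, 5)                  # illustrative dictionary
x = torch.randn(1, 1, 32, 32)
gamma = conv_sparse_code(x, D)
print((gamma != 0).float().mean())           # fraction of nonzero code entries
```

Roughly speaking, the cited Papyan-Romano-Elad analysis reads the forward pass of a CNN as a layered thresholding pursuit of exactly this kind of model.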
10/15/2018, Mon Lecture 11: Seminar: Exponentially Weighted Imitation Learning for Batched Historical Data. [ slides ]
    [Speaker]: WANG, Qing, Tencent AI Lab.
    [Abstract]:
  • We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or “environment oracle” as in most reinforcement learning settings. To solve this problem, we propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action space. The method does not rely on the knowledge of the behavior policy, thus can be used to learn from data generated by an unknown policy. Under mild conditions, our algorithm, though surprisingly simple, has a policy improvement bound and outperforms most competing methods empirically. Thorough numerical results are also provided to demonstrate the efficacy of the proposed methodology. This is a joint work with Jiechao Xiong, Lei Han, Peng Sun, Han Liu, and Tong Zhang.
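A hedged sketch of the core idea in the abstract above: fit a policy to batched trajectories by weighted behavioral cloning, where each logged action's log-likelihood is weighted by an exponentiated advantage estimate. All shapes, the clipping, the advantage estimator, and the hyperparameter beta are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch: exponentially advantage-weighted behavioral cloning on a
# fixed batch of historical (state, action, advantage) tuples.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))  # logits over 4 discrete actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def awil_loss(states, actions, advantages, beta=1.0, max_weight=20.0):
    # Weight each logged action by exp(beta * advantage); clip for stability.
    weights = torch.clamp(torch.exp(beta * advantages), max=max_weight)
    log_probs = torch.log_softmax(policy(states), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(weights * chosen).mean()     # weighted negative log-likelihood

# One gradient step on a toy batch (replace with real logged data).
states = torch.randn(256, 8)
actions = torch.randint(0, 4, (256,))
advantages = torch.randn(256)             # e.g. from a fitted value function
loss = awil_loss(states, actions, advantages)
opt.zero_grad(); loss.backward(); opt.step()
```

Setting beta = 0 recovers plain behavioral cloning; larger beta pushes the policy toward actions the advantage estimate favors, which is the reweighting intuition in the abstract.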
    [Team]: Huangshi Tian, Beijing Fang, Yunfei Yang (Group 3)
  • Title: An In-Depth Look at Feature Transformation Ability of CNN [ slides ].
10/22/2018, Mon Lecture 12: Implicit Regularization in Gradient Descent [ slides ]
    [Reference]:
  • Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals, Understanding deep learning requires rethinking generalization. ICLR 2017. [ arXiv:1611.03530 ] [Chiyuan Zhang's codes]
  • Peter L. Bartlett, Dylan J. Foster, Matus Telgarsky. Spectrally-normalized margin bounds for neural networks. [ arXiv:1706.08498 ].
  • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. [ arXiv:1710.10345 ]
  • Poggio, T, Liao, Q, Miranda, B, Rosasco, L, Boix, X, Hidary, J, Mhaskar, H. Theory of Deep Learning III: explaining the non-overfitting puzzle. [ MIT CBMM Memo-73, 1/30/2018 ].
  • Liao, Q., Miranda, B., Hidary, J., and Poggio, T. Classical generalization bounds are surprisingly tight for Deep Networks. MIT CBMM Memo-91. [arXiv:1807.09659]
  • Zhu, Weizhi, Yifei Huang, and Yuan YAO. On Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics. [arXiv:1810.03389]
  • Yuan Yao, Lorenzo Rosasco and Andrea Caponnetto, On Early Stopping in Gradient Descent Learning, Constructive Approximation, 2007, 26 (2): 289-315.
  • Tong Zhang and Bin Yu. Boosting with Early Stopping: Convergence and Consistency. Annals of Statistics, 2005, 33(4): 1538-1579. [ arXiv:0508276 ].
Y.Y.
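The Soudry et al. reference above shows that, on linearly separable data, gradient descent on the unregularized logistic loss drives the weight norm to infinity while the direction w/||w|| converges to the hard-margin SVM solution. A hedged NumPy sketch, on a made-up two-cluster dataset, just to make the phenomenon concrete:

```python
# Hedged sketch of implicit bias: gradient descent on the unregularized
# logistic loss over separable data; ||w|| keeps growing (no finite
# minimizer) while the normalized direction w / ||w|| stabilizes.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 0.5, (50, 2)),    # class +1 cluster
               rng.normal(-2, 0.5, (50, 2))])   # class -1 cluster
y = np.hstack([np.ones(50), -np.ones(50)])

w = np.zeros(2)
lr = 0.1
for t in range(1, 20001):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-margin))
    grad = -(X * (y * (1.0 / (1.0 + np.exp(margins))))[:, None]).mean(axis=0)
    w -= lr * grad
    if t % 5000 == 0:
        print(t, np.linalg.norm(w), w / np.linalg.norm(w))
```

Early stopping (the Yao-Rosasco-Caponnetto and Zhang-Yu references above) can be read in the same light: the optimization trajectory itself, not an explicit penalty, selects the solution.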
10/24/2018, Wed Lecture 13: Seminar
    [Speaker]: Baoyuan WU, Tencent AI Lab
    [Abstract]: In this talk, I will introduce three topics if time permits.
  • Topic 1: Tencent ML-Images: large-scale visual representation learning. [ slides (.pptx) ] The success of deep learning strongly depends on large-scale high-quality training data. Tencent ML-Images is an important open-source project: it publishes a large-scale multi-label image database (including 18M images and 11K categories), checkpoints with excellent visual representation capability (80.73% top-1 accuracy on the validation set of ImageNet), as well as the complete codes. In this talk, I will introduce the construction of ML-Images and its main characteristics, the training of deep neural networks using a large-scale image database, transfer learning to single-label image classification on ImageNet, and feature extraction and image classification using the trained checkpoint. This project tries to give you a clear picture of the complete process of visual representation learning based on deep neural networks. Project address: https://github.com/Tencent/tencent-ml-images
  • Topic 2: Lp-Box ADMM: a versatile framework for integer programming. [ slides (.pptx) ] In this talk, we revisit the integer programming (IP) problem, which plays a fundamental role in many computer vision and machine learning applications. We propose a novel and versatile framework called Lp-box ADMM, which is based on two main ideas. (1) The discrete constraint is equivalently replaced by the intersection of a box and the Lp-sphere. (2) We infuse this equivalence into the ADMM (Alternating Direction Method of Multipliers) framework to handle these continuous constraints separately and to harness its attractive properties. The proposed algorithm is theoretically guaranteed to converge to an epsilon-stationary point. We demonstrate the applicability of Lp-box ADMM on four important applications: MRF energy minimization, graph matching, clustering and model compression of convolutional neural networks. Results show that it outperforms generic IP solvers both in runtime and objective. It also achieves very competitive performance when compared to state-of-the-art methods that are specifically designed for these applications. [ preprint ] (A hedged code sketch of the box-sphere equivalence behind this framework follows this entry.)
  • Topic 3: Multimedia AI: a brief introduction to research and applications at Tencent AI Lab. Tencent AI Lab was established in Shenzhen in 2016 as a company-level strategic initiative and focuses on advancing fundamental and applied AI research. The research fields include computer vision, speech recognition, natural language processing and machine learning. The technologies of AI Lab have been applied in more than 100 Tencent products, including WeChat, QQ and the news app Tian Tian Kuai Bao. In this talk, I will give a brief introduction to the research on multimedia AI, including AI + image, video, audio and text, ranging from modeling, analysis, and understanding to generation. https://ai.tencent.com/ailab/
    [Bio]: Baoyuan Wu is currently a Senior Research Scientist at Tencent AI Lab. He was a postdoc in the IVUL lab at KAUST, working with Prof. Bernard Ghanem, from August 2014 to November 2016. He received his PhD degree from the National Laboratory of Pattern Recognition, Chinese Academy of Sciences (CASIA) in 2014, supervised by Prof. Baogang Hu. His research interests are machine learning and computer vision, including probabilistic graphical models, structured output learning, multi-label learning and integer programming. His work has been published in TPAMI, IJCV, CVPR, ICCV, ECCV and AAAI, among others.
Y.Y.
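A hedged sketch of the geometric fact behind the Lp-box ADMM framework in Topic 2 above, shown for p = 2: the binary set {0,1}^n is exactly the intersection of the box [0,1]^n with the sphere of radius sqrt(n)/2 centered at (1/2, ..., 1/2). The code below only illustrates the two projection operators implied by that equivalence; it is not the cited solver, and the test vectors are made up.

```python
# Hedged sketch of the key ingredient of Lp-box ADMM (for p = 2):
# x in {0,1}^n  <=>  x in [0,1]^n  and  ||x - 0.5||_2^2 = n/4.
# The ADMM splitting then only needs projections onto these two simple sets.
import numpy as np

def project_box(x):
    # Euclidean projection onto the box [0,1]^n: clamp each coordinate.
    return np.clip(x, 0.0, 1.0)

def project_sphere(x):
    # Euclidean projection onto the sphere centered at 0.5 with radius sqrt(n)/2.
    n = x.size
    center = 0.5 * np.ones(n)
    radius = np.sqrt(n) / 2.0
    d = x - center
    norm = np.linalg.norm(d)
    if norm == 0.0:                      # center itself: pick any sphere point
        d, norm = np.ones(n), np.sqrt(n)
    return center + radius * d / norm

x = np.array([0.9, 0.2, 0.7, 0.1])       # a relaxed (continuous) iterate
print(project_box(x), project_sphere(x))
b = np.array([1.0, 0.0, 1.0, 0.0])        # binary vectors lie in both sets
print(np.allclose(project_box(b), b), np.allclose(project_sphere(b), b))
```

Roughly, the ADMM iterations alternate an unconstrained update of the original variable with these two projections applied to auxiliary copies, which is what makes the relaxation tractable.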
10/29/2018, Mon Lecture 14: Variational Inference and Deep Learning. [ slides ]
Prof. Can YANG
10/31/2018, Wed Lecture 15: Phase Transitions of Margin Dynamics [ slides ] and Project 2 [ Assignment ]
    [Reference]:
  • ZHU, Weizhi, Yifei HUANG, and Yuan YAO. On Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics. [ arXiv:1810.03389 ]
ZHU, Weizhi
11/05/2018, Mon Lecture 16: Generative Models and Variational Autoencoders. [ slides ]
Y.Y.
11/07/2018, Wed Lecture 17: Generative Adversarial Networks. [ pdf ].
    [Reference]
  • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative Adversarial Networks. [ arXiv:1406.2661 ]
  • Martin Arjovsky, Soumith Chintala, Léon Bottou. Wasserstein GAN. [ arXiv:1701.07875 ]
  • Rie Johnson, Tong Zhang, Composite Functional Gradient Learning of Generative Adversarial Models. [ arXiv:1801.06309 ]
Y.Y.
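A hedged, minimal PyTorch sketch of the original GAN training loop from the Goodfellow et al. reference above: alternate a discriminator step (classify real vs. generated samples) with a generator step using the non-saturating loss. The toy 1-D data, architectures, and hyperparameters are placeholders.

```python
# Hedged sketch of GAN training (non-saturating generator loss) on toy 1-D data.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # z -> fake sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0        # "real" data: N(3, 0.25)
    z = torch.randn(64, 8)
    fake = G(z)

    # Discriminator step: push real samples toward label 1, fakes toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step (non-saturating): make the discriminator call fakes real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The Wasserstein GAN reference above keeps the same alternating structure but replaces the binary cross-entropy losses with a critic whose Lipschitz constant is constrained (by weight clipping in the original paper).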
11/12/2018, Mon Lecture 18: A Walk Through Non-Convex Optimization Methods: [ A: Online PCA ] [ B: SPIDER ]
    [ Speaker ] Dr. Junchi Li, Tencent AI Lab and Princeton University
    [ Abstract ] In this talk, I will discuss briefly the theoretical advances of non-convex optimization methods stemming from machine learning practice. I will begin with (perhaps the simplest) PCA model and show that scalable algorithms can achieve a rate that matches the minimax information lower bound. Then, I will discuss scalable algorithms that escape from saddle points, the importance of noise therein, and how to achieve an $\mathcal{O}(\varepsilon^{-3})$ convergence rate for finding an $(\varepsilon, \mathcal{O}(\varepsilon^{0.5}))$-approximate second-order stationary point. If time permits, I will further introduce a very recent "Lifted Neural Networks" method that is non-gradient-based and serves as a powerful alternative for training feed-forward deep neural networks. (A hedged code sketch of the Oja-style online PCA update analyzed in the first reference below follows this entry.)
    [ Bio ] Dr. Junchi Li obtained his B.S. in Mathematics and Applied Mathematics at Peking University in 2009, and his Ph.D. in Mathematics at Duke University in 2014. He has since held several research positions, including the role of visiting postdoctoral research associate at the Department of Operations Research and Financial Engineering, Princeton University. His research interests include statistical machine learning and optimization, scalable online algorithms for big data analytics, and stochastic dynamics on graphs and social networks. He has published original research articles in both top optimization journals and top machine learning conferences, including an oral presentation paper (1.23%) at NIPS 2017 and a spotlight paper (4.08%) at NIPS 2018.
    [ Reference ]
  • Junchi Li, Mengdi Wang, Han Liu, and Tong Zhang. Near-Optimal Stochastic Approximation for Online Principal Component Estimation. Mathematical Programming 2018. [ arXiv:1603.05305 ]
  • Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, and Praneeth Netrapalli. Faster Eigenvector Computation via Shift-and-Invert Preconditioning. ICML 2016
  • Rong Ge, Furong Huang, Chi Jin, and Yuan Yang. Escaping from Saddle Points. COLT 2015
  • Jason Lee, Max Simchowitz, Michael Jordan, and Ben Recht. Gradient Descent Only Converges to Minimizers. COLT 2016
  • Zeyuan Allen-Zhu, and Yuanzhi Li. NEON2. NIPS 2018
  • Cong Fang, Junchi Li, Zhouchen Lin, and Tong Zhang. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator. NIPS 2018. [ arXiv:1807.01695 ]
  • Jia Li, Cong Fang, and Zhouchen Lin. Lifted Proximal Operator Machines. AAAI 2018
  • Armin Askari, Geoffrey Negiar, Rajiv Sambharya, Laurent El Ghaoui. Lifted Neural Networks. arXiv:1805.01532
  • Fangda Gu, Armin Askari, Laurent El Ghaoui. Fenchel Lifted Networks: A Lagrange Relaxation of Neural Network Training. arXiv:1811.08039
Y.Y.
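The first reference above (Li, Wang, Liu, and Zhang) analyzes online PCA by stochastic approximation; as a concrete, hedged illustration of that setting, here is the classical Oja-style streaming update for the top principal component. The synthetic data and step-size schedule are illustrative assumptions, not the paper's algorithm or rates.

```python
# Hedged sketch of Oja-style streaming PCA: estimate the top eigenvector of
# an unknown covariance from one sample at a time.
import numpy as np

rng = np.random.default_rng(1)
d = 20
# Ground-truth covariance with one dominant direction (used only for checking).
u_true = rng.normal(size=d); u_true /= np.linalg.norm(u_true)
cov = 5.0 * np.outer(u_true, u_true) + np.eye(d)
L = np.linalg.cholesky(cov)

w = rng.normal(size=d); w /= np.linalg.norm(w)
for t in range(1, 50001):
    x = L @ rng.normal(size=d)            # one streaming sample ~ N(0, cov)
    eta = 1.0 / (100.0 + t)               # decaying step size
    w = w + eta * x * (x @ w)             # Oja update: w += eta * x x^T w
    w /= np.linalg.norm(w)                # project back to the unit sphere

print(abs(w @ u_true))                    # close to 1 if the direction is recovered
```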
11/14/2018, Wed Lecture 19: Robust Estimation and Generative Adversarial Networks. [ part A ] [ part B ].
    [Reference]
  • GAO, Chao, Jiyu LIU, Yuan YAO, and Weizhi ZHU. Robust Estimation and Generative Adversarial Nets. [ arXiv:1810.02030 ]
Y.Y.
11/19/2018, Mon Lecture 20: Seminars
Y.Y.
11/21/2018, Wed Lecture 21: Machine (Deep) Learning Problems in Cryo-EM. [ slides ].
    [Reference]
  • Yin Xian, Hanlin Gu, Wei Wang, Xuhui Huang, Yuan Yao, Yang Wang, Jian-Feng Cai. Data-Driven Tight Frame for Cryo-EM Image Denoising and Conformational Classification. The 6th IEEE Global Conference on Signal and Information Processing, Anaheim, California, Nov 26-29, 2018. [ arXiv:1810.08829 ] .
  • Min Su, Hantian Zhang, Kevin Schawinski, Ce Zhang, Michael A. Cianfrocco. Generative adversarial networks as a tool to recover structural information from cryo-electron microscopy data. [ pdf ]
Hanlin GU
11/26/2018, Mon Lecture 22: An Introduction to Adversarials in Deep Learning. [ slides ]
Zhichao HUANG
11/28/2018, Wed Lecture 23: Final Project. [ project3.pdf ].
    [Reference]
  • Introduction to Reinforcement Learning. [ slides ]
  • Recurrent Attention Models. [ slides ]
Y.Y.

by YAO, Yuan.