Date | Topic | Instructor | Scriber |
09/05/2019, Thursday |
Lecture 01: Overview I [ slides ]
[Reference]:
- Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix Wichmann, Wieland Brendel,
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness, ICLR 2019
[ video ]
- Aleksander Madry (MIT),
A New Perspective on Adversarial Perturbation, Simons Institute for the Theory of Computing, 2019.
[Adversarial Examples Are Not Bugs, They Are Features]
|
Y.Y. |
|
09/12/2019, Thursday |
Lecture 02: Symmetry and Network Architectures: Wavelet Scattering Net, Frame Scattering, DCFnet, and Permutation Invariant/Equivariant Nets [ slides ] and Project 1.
[Reference]:
- Stephane Mallat's short course on Mathematical Mysteries of Deep Neural Networks: [ Part I video ], [ Part II video ],
[ slides ]
- Stephane Mallat, Group Invariant Scattering, Communications on Pure and Applied Mathematics, Vol. LXV, 1331–1398 (2012)
- Joan Bruna and Stephane Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012
- Thomas Wiatowski and Helmut Bolcskei, A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction, 2016.
- Qiang Qiu, Xiuyuan Cheng, Robert Calderbank, Guillermo Sapiro, DCFNet: Deep Neural Network with Decomposed Convolutional Filters, ICML 2018. arXiv:1802.04145.
- Taco S. Cohen, Max Welling, Group Equivariant Convolutional Networks, ICML 2016. arXiv:1602.07576.
- Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola. Deep Sets, NIPS 2017. arXiv:1703.06114.
- Akiyoshi Sannai, Yuuki Takai, Matthieu Cordonnier. Universal approximations of permutation invariant/equivariant functions by deep neural networks, 2019. arXiv:1903.01939.
- Haggai Maron, Heli Ben-Hamu, Nadav Shamir, Yaron Lipman. Invariant and Equivariant Graph Networks. ICLR 2019. arXiv:1812.09902.
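As a concrete illustration of the permutation invariant architectures in the references above, the following minimal numpy sketch (a toy example, not code from any of the cited papers; all weights and dimensions are arbitrary) implements the Deep Sets form f(X) = rho( sum_i phi(x_i) ), which is invariant to any reordering of the set elements:

    import numpy as np

    rng = np.random.default_rng(0)
    d, h, out = 3, 16, 1                        # input, hidden, and output dimensions (arbitrary)
    W1, b1 = rng.normal(size=(h, d)), np.zeros(h)
    W2, b2 = rng.normal(size=(out, h)), np.zeros(out)

    def phi(x):                                 # per-element embedding (one random ReLU layer)
        return np.maximum(W1 @ x + b1, 0.0)

    def rho(s):                                 # readout applied to the pooled representation
        return W2 @ s + b2

    def deep_set(X):                            # f(X) = rho( sum_i phi(x_i) )
        return rho(sum(phi(x) for x in X))

    X = rng.normal(size=(5, d))                 # a "set" of 5 elements, stored as rows
    assert np.allclose(deep_set(X), deep_set(X[::-1]))   # permuting the set leaves f unchanged

Replacing the sum by a mean or max pooling preserves the invariance; the universality results cited above concern which invariant (or equivariant) functions such forms can approximate.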
|
Y.Y. |
|
09/18/2019, Wednesday |
Seminar: Asymptotic Behavior of the Robust Wasserstein Profile Inference (RWPI) Function: Selecting \delta for Distributionally Robust Optimization (DRO) Problems.
[ slides ]
[Speaker]: XIE, Jin, Stanford University.
[Abstract]:
Recently, [1] showed that several machine learning algorithms, such as the Lasso, support vector machines, regularized
logistic regression, and many others, can be represented exactly as distributionally robust optimization (DRO)
problems, where the uncertainty set is a neighborhood centered at the empirical distribution. A key element in the
study of this uncertainty set is the Robust Wasserstein Profile (RWP) function. In [1], the authors study the
asymptotic behavior of the RWP function under the true parameter for L^p costs. We consider costs of more general
forms, namely Bregman distances and the more general symmetric form d(x-y), and analyze the asymptotic behavior of
the RWP function in these cases. For statistical applications, we then study the RWP function with plug-in estimators.
This is joint work with Yue Hui, Jose Blanchet and Peter Glynn.
- [1] Blanchet, J., Kang, Y., & Murthy, K. Robust Wasserstein Profile Inference and Applications to Machine Learning, arXiv:1610.05627, 2016.
[ tutorial slides ]
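For orientation, a schematic form of the objects in this abstract (notation assumed here, not taken verbatim from [1]): the DRO problem and the RWP function can be written as

    \min_{\theta} \; \sup_{P : D_c(P, P_n) \le \delta} E_P[ \ell(X; \theta) ],
    \qquad
    R_n(\theta) = \inf \{ D_c(P, P_n) : E_P[ h(X, \theta) ] = 0 \},

where P_n is the empirical distribution, D_c is an optimal-transport (Wasserstein-type) discrepancy with cost c, and h is the estimating equation (e.g. h = \nabla_\theta \ell). The radius \delta is then calibrated from the asymptotic distribution of the suitably rescaled RWP function at the true parameter, which is the asymptotic analysis the talk extends to Bregman and symmetric d(x-y) costs.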
|
|
|
09/19/2019, Thursday |
Lecture 03: Robust Statistics and Generative Adversarial Networks [ slides ]
[Reference]
- GAO, Chao, Jiyu LIU, Yuan YAO, and Weizhi ZHU.
Robust Estimation and Generative Adversarial Nets.
[ arXiv:1810.02030 ] [ GitHub ] [ GAO, Chao's Simons Talk ]
- GAO, Chao, Yuan YAO, and Weizhi ZHU.
Generative Adversarial Nets for Robust Scatter Estimation: A Proper Scoring Rule Perspective.
[ arXiv:1903.01944 ]
|
Y.Y. |
|
09/26/2019, Thursday |
Lecture 04: Convolutional Neural Network on Graphs [ slides ]
[Seminar]: Multi-Scale and Multi-Representation Learning on Graphs and Manifolds [ slides ]
[Speaker]: Prof. ZHAO, Zhizhen, Department of ECE, UIUC
[Abstract]:
- The analysis of geometric (graph- and manifold-structured) data has recently gained prominence in the machine learning community. In the first part of the talk, I will introduce the Lanczos network (LanczosNet), which uses the Lanczos algorithm to construct low-rank approximations of the graph Laplacian for graph convolution. Relying on the tridiagonal decomposition of the Lanczos algorithm, we efficiently exploit multi-scale information via fast approximate computation of matrix powers, and design learnable spectral filters. Being fully differentiable, LanczosNet facilitates both graph kernel learning and the learning of node embeddings. I will show applications of LanczosNet to citation networks and the QM8 quantum chemistry dataset.
For the second part of the talk, I will introduce a novel multi-representation learning paradigm for manifolds naturally equipped with a group action. Utilizing a representation-theoretic mechanism, multiple associated vector bundles can be constructed over the orbit space, providing multiple views for learning the geometry of the underlying manifold. The consistency across these associated vector bundles forms a common basis for unsupervised manifold learning, through the redundancy inherent in the algebraic relations across irreducible representations of the transformation group. I will demonstrate the efficacy of the proposed algorithmic paradigm through dramatically improved robust nearest-neighbor search in cryo-electron microscopy image analysis.
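To make the low-rank, multi-scale filtering step of the first part concrete, here is a minimal numpy sketch (an illustration only; it omits LanczosNet's learnable spectral filters and all training, and the graph below is synthetic):

    import numpy as np

    def lanczos(A, k, seed=0):
        # k-step Lanczos tridiagonalization of a symmetric matrix A, giving A ~ Q T Q^T
        n = A.shape[0]
        rng = np.random.default_rng(seed)
        q = rng.normal(size=n); q /= np.linalg.norm(q)
        Q = np.zeros((n, k)); Q[:, 0] = q
        alpha, beta = np.zeros(k), np.zeros(k)            # beta[0] is unused
        for j in range(k):
            w = A @ Q[:, j]
            if j > 0:
                w -= beta[j] * Q[:, j - 1]
            alpha[j] = Q[:, j] @ w
            w -= alpha[j] * Q[:, j]
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)      # full reorthogonalization for stability
            if j + 1 < k:
                beta[j + 1] = np.linalg.norm(w)
                Q[:, j + 1] = w / (beta[j + 1] + 1e-12)   # crude guard against breakdown
        T = np.diag(alpha) + np.diag(beta[1:], 1) + np.diag(beta[1:], -1)
        return Q, T

    def multiscale_features(Lap, x, scales=(1, 2, 4, 8), k=20):
        # Approximate Lap^s @ x for several scales s from a single low-rank factorization
        Q, T = lanczos(Lap, k)
        z = Q.T @ x
        return np.stack([Q @ np.linalg.matrix_power(T, s) @ z for s in scales], axis=-1)

    # Toy usage on a random undirected graph
    rng = np.random.default_rng(1)
    n = 50
    A = np.triu((rng.random((n, n)) < 0.1).astype(float), 1); A = A + A.T
    Lap = np.diag(A.sum(1)) - A                           # combinatorial graph Laplacian
    feats = multiscale_features(Lap, rng.normal(size=n))  # shape (n, 4): one column per scale

In LanczosNet these multi-scale features are combined by learnable spectral filters and stacked into layers; the sketch only shows why the tridiagonal factorization makes the matrix powers cheap.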
[Reference]:
- Xavier Bresson, Convolutional Neural Networks on Graphs, IPAM, UCLA, 2017. [ video ][ slides ]
|
Y.Y. |
|
10/10/2019, Thursday |
Lecture 05: An Introduction to Optimization and Regularization Methods in Deep Learning [ slides ]
[ Gallery of Project 1 ]:
- Description of Project 1
- Peer Review requirement: Peer Review and Report Assignment
- Rebuttal Guideline: Rebuttal
- Doodle Vote for Top 3 Reports: vote link
- Group 1: XIAO Jiashun, LIU Yiyuan, WANG Ya, and YU Tingyu. [ report ] [ review ]
- Group 2: Abhinav PANDEY. [ report ][ review ]
- Group 3: LEI Chenyang, Yazhou XING, Yue WU, and XIE Jiaxin. [ report ] [ review ]
- Group 4: Oscar Bergqvist, Martin Studer, Cyril de Lavergne. [ report ] [ review ] [ rebuttal ]
- Group 5: Lanqing XUE, Feng HAN, Jianyue WANG, Zhiliang TIAN. [ report ] [ review ] [ rebuttal ]
- Group 6: CHEN Zhixian, QIAN Yueqi, and ZHANG Shunkang. [ report ] [ review ]
- Group 7: Zhenghui CHEN and Lei KANG. [ report ][ review ] [ rebuttal ]
- Group 8: Boyu JIANG. [ report ] [ review ]
- Group 9: LI Donghao, WU Jiamin, ZENG Wenqi and CAO Yang. [ report ][ review ] [ rebuttal ]
- Group 10: Shichao LI, Ziyu WANG and Zhenzhen HUANG. [ report ][ review ] [ rebuttal ]
- Group 11: NG Yui Hong. [ report ][ review ]
- Group 12: Luyu Cen, Jingyang Li, Zhongyuan Lyu and Shifan Zhao. [ report ] [ review ]
- Group 13: Mutian He, Qing Yang, Yuxin Tong, Ruoyang Hou. [ report ] [ review ] [ rebuttal ]
- Group 14: WANG, Qicheng. [ report ] [ review ]
|
Y.Y. |
|
10/17/2019, Thursday |
Lecture 06: The Landscape of Empirical Risk of Neural Networks [ slides ]
[Reference]:
- C. Daniel Freeman and Joan Bruna. Topology and Geometry of Half-Rectified Network Optimization, ICLR 2017.
[ arXiv:1611.01540 ]
[ Stanford talk video ][ slides ]
- Luca Venturi, Afonso Bandeira, and Joan Bruna. Neural Networks with Finite Intrinsic Dimension Have no Spurious Valleys. [ arXiv:1802.06384 ]
- Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora and Rong Ge. Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets.
[ arXiv:1906.06247 ][ Simons talk video ][ Bilibili video ][ slides ]
|
Y.Y. |
|
10/24/2019, Thursday |
Lecture 07: Overparameterization and Optimization [ slides ]
[Speaker]: Prof. Jason Lee, Princeton University
[Abstract]: We survey recent developments in the optimization and learning of deep neural networks. The three focus topics are:
- 1) geometric results for the optimization of neural networks;
- 2) overparametrized neural networks in the kernel regime (Neural Tangent Kernel), with its implications and limitations (a schematic statement is sketched after this list);
- 3) potential strategies to prove that SGD improves on kernel predictors.
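For reference, a schematic statement of the kernel regime in topic 2 (notation assumed): for a network f(x; \theta), the Neural Tangent Kernel at parameters \theta is

    \Theta_\theta(x, x') = \langle \nabla_\theta f(x; \theta), \nabla_\theta f(x'; \theta) \rangle,

and under gradient flow on the squared loss \frac{1}{2}\sum_i (f(x_i; \theta_t) - y_i)^2 the training-set predictions evolve as

    \frac{d}{dt} f_t(X) = - \Theta_{\theta_t}(X, X) \, (f_t(X) - y).

In the infinite-width limit with appropriate scaling, \Theta_{\theta_t} stays close to its value at initialization, so training behaves like kernel regression with a fixed kernel; the limitations in topic 2 concern what this linearized regime cannot explain.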
|
Y.Y. |
|
10/31/2019, Thursday |
Lecture 08: Generalization in Deep Learning [ slides ]
[Abstract]: We review tools useful for the analysis of the generalization performance of deep neural networks on classification and regression problems. We review uniform convergence properties, which show how this performance depends on notions of complexity, such as Rademacher averages, covering numbers, and combinatorial dimensions, and how these quantities can be bounded for neural networks. We also review the analysis of the performance of nonparametric estimation methods such as nearest-neighbor rules and kernel smoothing. Deep networks raise some novel challenges, since they have been observed to perform well even with a perfect fit to the training data. We review some recent efforts to understand the performance of interpolating prediction rules, and highlight the questions raised for deep learning.
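A schematic example of the uniform convergence bounds reviewed here (a standard textbook statement, stated for a loss class G with values in [0,1]): with probability at least 1-\delta over an i.i.d. sample of size n, for all g \in G,

    E[g(Z)] \le \frac{1}{n}\sum_{i=1}^n g(Z_i) + 2\,\mathrm{Rad}_n(G) + \sqrt{\frac{\log(1/\delta)}{2n}},

where Rad_n(G) denotes the (expected) Rademacher complexity of G. The neural-network-specific work then lies in bounding Rad_n for the relevant function classes, e.g. via covering numbers or norm-based arguments.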
|
Y.Y. |
|
11/07/2019, Thursday |
Lecture 09: Generalization in Deep Learning (continued) [ slides ]
[Abstract]: We review tools useful for the analysis of the generalization performance of deep neural networks on classification and regression problems. We review uniform convergence properties, which show how this performance depends on notions of complexity, such as Rademacher averages, covering numbers, and combinatorial dimensions, and how these quantities can be bounded for neural networks. We also review the analysis of the performance of nonparametric estimation methods such as nearest-neighbor rules and kernel smoothing. Deep networks raise some novel challenges, since they have been observed to perform well even with a perfect fit to the training data. We review some recent efforts to understand the performance of interpolating prediction rules, and highlight the questions raised for deep learning.
[Reference]
- Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals,
Understanding deep learning requires rethinking generalization.
ICLR 2017.
[Chiyuan Zhang's codes]
- Peter L. Bartlett, Dylan J. Foster, Matus Telgarsky. Spectrally-normalized margin bounds for neural networks.
[ arXiv:1706.08498 ]. NIPS 2017.
- Neyshabur, B., Bhojanapalli, S., McAllester, D., and Srebro, N. A PAC-Bayesian Approach to Spectrally-Normalized
Margin Bounds for Neural Networks. [ arXiv:1707.09564 ]. International Conference on Learning Representations (ICLR), 2018.
- Noah Golowich, Alexander (Sasha) Rakhlin, Ohad Shamir. Size-Independent Sample Complexity of Neural Networks.
[ arXiv:1712.06541 ]. COLT 2018.
- Weizhi Zhu, Yifei Huang, Yuan Yao. On Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics.
[ arXiv:1810.03389 ].
(This paper shows when Rademacher-complexity-based generalization bounds can be informative for early stopping,
and when such bounds fail with extremely over-parameterized models.)
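As an example of the norm-based bounds in the references above, the Bartlett–Foster–Telgarsky result can be summarized schematically (up to logarithmic factors, with notation compressed): for an L-layer network f_A with weight matrices A_1, ..., A_L and 1-Lipschitz activations, with probability at least 1-\delta,

    P[\arg\max_j f_A(x)_j \ne y] \;\lesssim\; \widehat{R}_\gamma(f_A) + \frac{\|X\|_F \, R_A}{\gamma n} + \sqrt{\frac{\log(1/\delta)}{n}},
    \qquad
    R_A = \Big(\prod_{i=1}^{L} \|A_i\|_\sigma\Big)\Big(\sum_{i=1}^{L} \frac{\|A_i^\top - M_i^\top\|_{2,1}^{2/3}}{\|A_i\|_\sigma^{2/3}}\Big)^{3/2},

where \widehat{R}_\gamma is the empirical margin error at margin \gamma, \|X\|_F^2 = \sum_i \|x_i\|_2^2, \|\cdot\|_\sigma is the spectral norm, and the M_i are fixed reference matrices. The PAC-Bayesian and size-independent bounds above replace R_A with related norm-based quantities.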
|
Y.Y. |
|
11/21/2019, Thursday |
Lecture 10: Implicit Regularization
[Abstract]: We review the implicit regularization of gradient descent type algorithms in machine learning.
[Reference]
- Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. [ arXiv:1710.10345 ]. ICLR 2018.
- Matus Telgarsky. Margins, Shrinkage, and Boosting. [ arXiv:1303.4172 ]. ICML 2013.
- Vaishnavh Nagarajan, J. Zico Kolter. Uniform convergence may be unable to explain generalization in deep learning. [ arXiv:1902.04742 ]. NeurIPS 2019. [ GitHub ].
(It argues that uniform-convergence-based generalization bounds, including those above, might fail to explain generalization in deep learning.)
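To make "implicit regularization" concrete, a schematic statement of the Soudry et al. result above: for linearly separable data {(x_i, y_i)} and the logistic (or exponential) loss, gradient descent with a small constant step size drives \|w_t\| \to \infty while the direction converges to the hard-margin SVM solution,

    \frac{w_t}{\|w_t\|_2} \;\longrightarrow\; \frac{\hat{w}}{\|\hat{w}\|_2},
    \qquad
    \hat{w} = \arg\min_{w} \|w\|_2 \ \text{ s.t. } \ y_i\, w^\top x_i \ge 1 \ \ \forall i,

with the convergence being only logarithmically fast in t. Telgarsky's boosting paper gives an analogous margin-maximization statement for coordinate descent with shrinkage.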
|
Y.Y. |
|
11/28/2019, Thursday |
Lecture 11: Seminars
[Title]: From Classical Statistics to Modern Machine Learning
[ slide ]
[Abstract]:
"A model with zero training error is overfit to the training data and will typically generalize poorly," goes statistical textbook wisdom. Yet, in modern practice, over-parametrized deep networks with a near-perfect fit on training data still show excellent test performance. As I will discuss in the talk, this apparent contradiction is key to understanding the practice of modern machine learning.
While classical methods rely on a trade-off balancing the complexity of predictors with training error, modern models are best described by interpolation, where a predictor is chosen among functions that fit the training data exactly, according to a certain (implicit or explicit) inductive bias. Furthermore, classical and modern models can be unified within a single "double descent" risk curve, which extends the classical U-shaped bias-variance curve beyond the point of interpolation. This understanding of model performance delineates the limits of the usual "what you see is what you get" generalization bounds in machine learning and points to new analyses required to understand computational, statistical, and mathematical properties of modern models.
I will proceed to discuss some important implications of interpolation for optimization, both in terms of "easy" optimization (due to the scarcity of non-global minima) and of fast convergence of small mini-batch SGD with a fixed step size.
[Reference]
- Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal. Reconciling modern machine learning practice and the bias-variance trade-off.
PNAS, 2019, 116 (32). [ arXiv:1812.11118 ]
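The double descent curve discussed in this talk can be reproduced in a few lines of numpy. The sketch below (a self-contained toy, not code from the referenced paper; all sizes and the noise level are arbitrary) fits least squares on the first p features, which via the pseudoinverse becomes the minimum-norm interpolant once p exceeds the sample size, and typically shows the test error peaking near the interpolation threshold p = n before descending again:

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, D = 40, 2000, 120              # sample sizes and ambient dimension (arbitrary)
    beta = rng.normal(size=D) / np.sqrt(D)          # true signal spread over all D features

    def sample(n):
        X = rng.normal(size=(n, D))
        return X, X @ beta + 0.5 * rng.normal(size=n)   # noisy linear responses

    Xtr, ytr = sample(n_train)
    Xte, yte = sample(n_test)

    for p in [5, 10, 20, 35, 40, 45, 60, 90, 120]:  # "model size" = number of features used
        # pinv gives ordinary least squares for p <= n_train and
        # the minimum-norm interpolant for p > n_train
        theta = np.linalg.pinv(Xtr[:, :p]) @ ytr
        print(p, np.mean((Xte[:, :p] @ theta - yte) ** 2))

The classical U-shape lives on the p <= n_train side; the second descent, driven by the implicit bias toward minimum norm, is the interpolating regime described in the abstract.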
[Title]:
Benign Overfitting in Linear Prediction
[Abstract]:
Classical theory that guides the design of nonparametric prediction methods like deep neural networks involves a tradeoff between the fit to the training data and the complexity of the prediction rule. Deep learning seems to operate outside the regime where these results are informative, since deep networks can perform well even with a perfect fit to noisy training data. We investigate this phenomenon of 'benign overfitting' in the simplest setting, that of linear prediction. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of effective rank of the data covariance. It shows that overparameterization is essential: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. We discuss implications for deep networks and for robustness to adversarial examples.
Joint work with Phil Long, Gábor Lugosi, and Alex Tsigler.
[Reference]
- Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler. Benign Overfitting in Linear Regression. arXiv:1906.11300
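The two notions of effective rank referred to in the abstract can be written down schematically (notation assumed): for a covariance \Sigma with eigenvalues \lambda_1 \ge \lambda_2 \ge \cdots,

    r_k(\Sigma) = \frac{\sum_{i>k} \lambda_i}{\lambda_{k+1}},
    \qquad
    R_k(\Sigma) = \frac{\big(\sum_{i>k} \lambda_i\big)^2}{\sum_{i>k} \lambda_i^2},

and the interpolating rule being analyzed is the minimum-norm least squares estimator \hat\theta = X^\top (X X^\top)^{-1} y. Roughly, the characterization says overfitting is benign when, beyond a small number of dominant directions (relative to n), the tail of the spectrum is long and flat enough that these effective ranks are large compared with the sample size, which is the sense in which "unimportant directions must significantly exceed the sample size".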
|
Y.Y. |
|
10/28/2019, Thursday |
Lecture 12: Final Project [ PDF ]
[ Gallery of Final Project ]:
- Description of Final Project
- Group 1: XIAO Jiashun, LIU Yiyuan, WANG Ya, and YU Tingyu. Reproducible Study of Training and Generalization Performance. [ report ] [ video ]
- Group 2: Abhinav PANDEY. Anomaly Detection in Semiconductors. [ report ] [ video ]
- Group 3: LEI Chenyang, Yazhou XING, Yue WU, and XIE Jiaxin. Colorizing Black-White Movies Fastly and Automatically. [ report ] [ video ]
- Group 4: Oscar Bergqvist, Martin Studer, Cyril de Lavergne. China Equity Index Prediction Contest. [ report ] [ video ]
- Group 5: HAN, Feng, Lanqing XUE, Zhiliang Tian, and Jianyue WANG. Contextual Information Based Market Prediction using Dynamic Graph Neural Networks. [ report ] [ video ] [ Kaggle link]
- Group 6: CHEN Zhixian, QIAN Yueqi, and ZHANG Shunkang. Semi-conductor Image Classification. [ report ] [ video ]
- Group 7: Zhenghui CHEN and Lei KANG. On Raphael Painting Authentication. [ report ][ video ]
- Group 8: Boyu JIANG. Final project report on Nexperia Image Classification. [ report ]
- Group 9: LI Donghao, WU Jiamin, ZENG Wenqi and CAO Yang. On teacher-student network learning. [ report ][ video ]
- Group 10: LI Shichao, Ziyu WANG and Zhenzhen HUANG. Semiconductor Classification by Making Decisions with Deep Features. [ report ][ video ]
- Group 11: NG Yui Hong. Reproducible Study of Training and Generalization Performance. [ report ][ video ]
- Group 12: Luyu Cen, Jingyang Li, Zhongyuan Lyu and Shifan Zhao. Nexperia Kaggle in-class contest. [ report ] [ video ] [ source ]
- Group 13: Mutian He, Qing Yang, Yuxin Tong, Ruoyang Hou. Defects Recognition on Nexperia's Semi-Conductors. [ report ] [ video ]
- Group 14: WANG, Qicheng. Great Challenges of Reproducible Training of CNNs. [ report ]
|
Y.Y. |
|