
Resolving the Mystery of Deep Learning by Statistical Physics

Project scheme:
General Research Fund
Project year:
2019/2020
Principal investigator:
Dr. 楊志豪
(Department of Science and Environmental Studies)

In the proposed project, we will apply tools in statistical physics to derive a fundamental understanding of DNNs, which can be applied to boost their performance and most importantly reduce any risks when DNNs are used in vital applications.

The rapid development of deep learning in the past decade has led to many remarkable applications, ranging from speech recognition, which has achieved human-level performance, to Go-playing algorithms, which have beaten top professional players. Surprisingly, despite the growing number of applications that use deep learning, we have only a limited understanding of why it performs so well. In particular, many deep learning applications use deep neural networks (DNNs) to infer non-trivial input-output relations from labeled datasets. The internal representations in DNNs, the mechanism by which they arrive at good decisions, and how over-parametrized DNNs avoid over-fitting and achieve high generalizability are not fully understood. This incomplete understanding may pose serious, even fatal, risks, as deep learning is now commonly used in vital applications such as medical image analysis and self-driving cars. In the proposed project, we will apply tools from statistical physics to derive a fundamental understanding of DNNs, which can be applied to boost their performance and, most importantly, reduce risks when DNNs are used in vital applications. We remark that statistical physics tools have already been applied to analyze shallow neural networks and obtain macroscopic properties that are inaccessible to tools from other areas, but the understanding of DNNs via statistical physics is far from complete.
Here, we aim to (1) develop an improved fundamental understanding of DNNs in terms of their loss landscape, training dynamics, and, most importantly, their remarkable generalization performance; (2) establish theoretical frameworks to analyze DNNs by drawing an analogy with spin glasses and disordered systems; these frameworks will play a crucial role in future theoretical studies and in the understanding of deep learning; (3) understand and leverage the dilemma of exploration and exploitation in training DNNs, and introduce new protocols to be used in combination with state-of-the-art algorithms for better training and generalization of DNNs. These tasks are challenging, but success will be highly rewarding: it will provide a fundamental understanding that helps resolve the mystery underlying DNNs and will lead to important insights into a wide range of existing and future deep learning applications.