UQ MATH7502 - Mathematics for Data Science 2 (2019)

Welcome to MATH7502. This course is part of the Master of Data Science program at the University of Queensland. The course is coordinated by Yoni Nazarathy (y.nazarathy@uq.edu.au). The tutors are Samuel Hambleton (samuelahambleton@gmail.com) and Chris Raymond (christopher.raymond@uqconnect.edu.au).

Communication dealing with technical matters is best carried out via the dedicated Slack workspace, where you can communicate with fellow students and teaching staff (use this invite link to sign up). Formal messages and grades are broadcast via Blackboard.

The prerequisite for the course is knowledge comparable to that of MATH7501. This includes basic discrete mathematics, calculus and elementary manipulation of vectors and matrices. Feel free to use the MATH7501 course reader to brush up as needed.

The current course, MATH7502, is a linear algebra foundations course focusing on data science applications, with some emphasis on numerical computation via software. It also contains a few basic elements of multivariable calculus and continuous optimization. By the end of the course, students will possess mathematical foundations enabling them to understand and carry out activities such as these:

Understanding how basic clustering algorithms work.
Solving linear systems of equations with applications.
Solving smooth non-linear systems of equations by iteration.
Formulating optimality conditions for unconstrained and constrained optimization of smooth multi-variable functions.
Using and understanding least squares approximations and generalizations.
Modelling the evolution of linear systems over time and understanding the role of eigenvalues in such evolution.
Understanding linear transformations of multivariate normal distributions.
Understanding the operation of Principal Component Analysis (PCA).
Understanding the use of singular value decomposition (SVD) for lossy data compression.
Understanding the mathematics of the gradient descent, Gauss-Newton and Levenberg-Marquardt non-linear optimization methods, including applications to neural networks.
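As a small taste of the least squares and gradient descent items above, here is a minimal sketch in Python (one of the permitted alternatives to Julia). It fits a line y ≈ b0 + b1·x to a few data points, first by solving the normal equations and then by gradient descent, and checks that both give the same answer. The data, the function names and the step-size rule are illustrative assumptions, not course material.

```python
# A minimal sketch (not course material): fit a line y ≈ b0 + b1*x by least
# squares in two ways and check that they agree. Pure Python, no packages.
# The data points below are made up for illustration.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]


def normal_equations_fit(xs, ys):
    """Solve the 2x2 normal equations (A^T A) b = A^T y, where A has rows [1, x_i]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # A^T A = [[n, sx], [sx, sxx]] and A^T y = [sy, sxy]; invert the 2x2 system.
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


def gradient_descent_fit(xs, ys, steps=2000):
    """Minimize f(b) = ||A b - y||^2 with plain gradient descent from b = (0, 0)."""
    n = len(xs)
    # Step size 1 / (2 * trace(A^T A)). Since trace(A^T A) is at least the largest
    # eigenvalue of A^T A, this step is at most 1/L, where L = 2 * lambda_max is
    # the Lipschitz constant of the gradient, so the iteration converges.
    lr = 1.0 / (2.0 * (n + sum(x * x for x in xs)))
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        # Gradient of f is 2 A^T (A b - y).
        g0 = 2.0 * sum(residuals)
        g1 = 2.0 * sum(r * x for r, x in zip(residuals, xs))
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


if __name__ == "__main__":
    print(normal_equations_fit(xs, ys))   # approximately (1.06, 1.97)
    print(gradient_descent_fit(xs, ys))   # approximately the same values
```

In Julia, the recommended language for the course, the least squares solution for a tall matrix `A` can be computed directly with the backslash operator, `A \ y`.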

For more motivation, see also 20 methods of the data scientist and the mathematics behind them.

Q: Should I take this course?
A: If you have already studied enough linear algebra and multivariable calculus, and you can independently see how to apply them to machine learning and data science, then perhaps there is no need. Otherwise, you should probably join. Similarly, if you think you understand least squares, principal component analysis, gradient descent and clustering algorithms well, then perhaps there isn't a need. However, if you want to improve your mathematical understanding of such tools, please join.

Q: What if I haven't done MATH7501?
A: Depending on your background, you may be able to make up (or review) the needed parts of MATH7501 independently. Consult with the teaching staff. You can also attend the "First Year Maths Support Tutes", which run every afternoon between 2pm and 4pm in Room 205 of Building #41.

Q: Is the format of the course similar to the previous offerings of MATH7501 or MATH7502?
A: No. The format is quite different both in comparison to MATH7501 and last year's MATH7502.

The course is mostly (but not solely) taught in "flipped mode". Students are assigned sections from two textbooks and are required to read them prior to the lectures. Then, in the lectures, highlights from the reading are discussed and problems and examples are solved. One exception is the first week, which focuses on an introduction to the course as well as the Julia language. Note that tutorials do take place in the first week, in the form of a lecture.

The three required resources are:

[VMLS] The book: Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares (2018) by Stephen Boyd and Lieven Vandenberghe. You can use the free online version or you can order the book. Here is the Julia Language Companion for the book.
[LALFD] The book: Linear Algebra and Learning from Data (2018) by Gilbert Strang. Up to 10% of the book will be supplied via the library; students need to obtain further sections of the book independently. Here it is in the university bookstore. Two copies are available for 24-hour loan at the UQ Library. The UQ Library has also scanned a few sections of the book here (VI.4, I.1, I.2, I.3, I.4).
The Julia programming language. This is the recommended software for the course; however, you can use alternatives if you insist (R, Python, Mathematica, Matlab, ...). For the basics, see the Julia linear algebra docs. There are several ways to run Julia. An easy option is JuliaBox; however, installing Julia and Jupyter locally on your computer is recommended. See, for example, this explainer video.

There are also additional useful resources:

[ILA] The book: Introduction to Linear Algebra, Fifth Edition (2016) by Gilbert Strang. Here it is in the university bookstore. Selected sections are available in the UQ Library: Sections 3.1, 5.1 and Chapter 7.

[3B1B] The video series: Essence of linear algebra by 3Blue1Brown (Grant Sanderson) as well as other selected videos.

[SWJ] The draft book: Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence (2019) by Hayden Klok and Yoni Nazarathy. Code examples from the book are available in this GitHub repo.

The course assessment includes the following:

Three homework assignments. These assignments are to be submitted individually with each student submitting a unique assignment (copying assignments will not be tolerated). Nevertheless, students are encouraged to collaborate and discuss the homework assignments in an open and constructive manner. Sharing ideas, helping each other and jointly working towards a goal is great. More details below.
A project report presented via a Jupyter notebook with an accompanying YouTube video.
These are group assignments covering material beyond the core material taught in the course. More details below.
Individual review of peer project reports.
This is an individual review of project reports (of other groups). More details below.
A final exam.
This is a (UQ central) final exam for the course. More details below.

Due dates for assessment items are listed on UQ's official course profile for MATH7502.

Outline of Material and Reading

Below is a detailed reading list. The semester has 13 weeks. The first week is an introductory lecture. Lectures during weeks 2 to 11 then require students to read (and watch videos) prior to each lecture, as per the schedule below. Minor refinements of the reading schedule will be communicated via Blackboard. Week 13 is for wrap-up and exam review.

Lectures are recorded via UQ's Blackboard system; however, these recordings do not capture the whiteboard. For whiteboard images, see the course's GitHub page, which also contains the Jupyter notebooks from class.

Unit 1: Introduction

MATH7502-Introduction-Lecture.ipynb

MATH7502-Introduction-Lecture.pdf

Unit 2: Vectors (week 2)

Unit 3: Using Vectors (week 3)

Unit 4: Matrices (weeks 4 and 5)

Unit 5: Matrices and Vector Spaces (week 6)

~~I.3 The Four Fundamental Subspaces, I.4 Elimination and A = LU, I.5 Orthogonal Matrices and Subspaces.~~

Unit 6: Spectral Analysis (weeks 7 and 8)

~~I.9 Principal Components and Best Low Rank Matrix, V.4 Covariance Matrices and Joint Probabilities.~~

~~See also (extra): Chapter 6 from [ILA].~~

Unit 7: Least Squares #1 (weeks 9 and 10)

~~II.2 Least Squares: Four Ways.~~

Video resources: LeastSquaresForDataScience.ipynb, LeastSquaresForDataScience.pdf, bestValue.gif.

Unit 8: Least Squares #2 (weeks 11 and 12)

~~15.5 Complexity (regularized data fitting).~~

~~III.4 Split Algorithms for ℓ² + ℓ¹, V.5 Multivariate Gaussians and Weighted Least Squares.~~

Lecture notes (Phil Isaac)

Homework assignments

HW1 on Units 1, 2 and 3.

Solution (PDF)

Solution (ipynb)

HW2 on Units 4 and 5.

Solution (PDF)

Solution (ipynb)

HW3 on Units 6 and 7.

Solution (PDF)

Solution (ipynb)

Project Reports

Projects are to be carried out in groups of three to five people. Each group needs to choose one project topic from the topics below. Each topic has associated reading from [VMLS], [LALFD] and, in certain cases, [SWJ]. The group then needs to study the material and present its key ideas, principles and methods. The presentation is via a Julia Jupyter notebook with an accompanying YouTube video. Here are detailed instructions. Due 18/10/2019.

After projects are submitted, individual peer reviews of projects will be carried out (you review the projects of other groups). This review (summarized as a written document) is also part of the course assessment. The review questionnaire is here.