This course is designed to build an understanding of programming, software architecture, and general software development in the context of mathematical, statistical, and machine learning applications. Students learn the basic elements of software development as they apply to mathematics and data science. The course teaches students to implement algorithms from numerical mathematics, simulation, and machine learning, all using solid software development practices that lead to organized, maintainable, and extensible software.
The course is aimed at students in their fourth semester (or above) of a mathematics degree. It assumes knowledge of calculus, linear algebra, elementary statistics, basic probability, and elementary concepts of discrete mathematics. It also assumes some prior experience in scientific, computational, or statistical scripting, using languages such as R, MATLAB, Python, or similar. No detailed knowledge of such programming is assumed; however, it is assumed that students have previously seen code, variables, loops, conditionals, and similar constructs.
The focus of the course is the programming of systems that involve mathematics and mathematical modelling. This includes the interface between mathematical principles and a programming language, as well as tooling and the experience of using solid basic practices that lead to organized and efficient code. It is not a course about a specific programming language. Nevertheless, one language needs to be used and studied. The language for this course is Julia. Julia is a modern compiled programming language that is in many ways as easy to work with as scripting languages, yet it also allows you to write very efficient code. In general, the Julia language and the surrounding ecosystem focus on scientific computing and mathematics, and hence Julia is a good fit for this course. For reference, here are some courses at other universities that use Julia: Computational Thinking at MIT, Top Ten Algorithms from the 20th Century at Cornell, Parallel Computing and Scientific Machine Learning at MIT, and Introduction to Matrix Methods at Stanford, among others. Note that these courses generally have goals that differ from those of the current course, yet their innovative use of Julia is of interest to us.
The course content is broken into 7 study units and 6 items of assessment, details of which are described below.
The course is delivered via weekly lectures, which include two theoretical hours per week and one demonstration hour per week. The theoretical lectures also involve some live demonstrations but focus more on theory, whereas the demonstration hour is mostly made up of live examples. The theoretical lectures ideally take place on Mondays, 16:00-18:00, and the demonstration hour ideally takes place on Tuesdays, 17:00-18:00. There are also 3 one-hour guest perspective lectures (scheduled during either the theoretical hours or the demonstration hour). These perspective lectures aim to give students further insight into how mathematicians and statisticians work with software in industry.
In addition to the lectures, there are also practicals aimed at smaller groups and operating in both flexible delivery and external delivery modes. The practicals aim to help students prepare the assessment items and give them a chance to work live with help at hand. Practicals are scheduled only in some of the weeks, whereas tutor staff help students via Piazza throughout.
The course has four pillars of study:
1. The use of a variety of features of a programming language (Julia in this case).
2. Technical tools such as: Unix, git, IDEs, Jupyter.
3. Mathematical and statistical algorithmic concepts, their theoretical analysis, and implementation.
4. Solid software development practices, with a view towards employability.
The theoretical lectures, demonstration lectures, perspective lectures, practicals, and online tutor support all aim to help the students build up these pillars of study, with the 6 assessment items serving as goals.
Ideally, after completing this course, a student will be able to continue self-study of software and programming concepts, having received a 'jump-start' via this course. The student would ideally be able to work independently on projects for more advanced third and fourth year courses, and/or produce efficient code as part of Honours or higher degree research. Importantly, the student would have the tools for contributing to open-source projects and startup teams, and would be employable in analytics- and software-focused jobs in industry.
Clearly, a one-semester course cannot transform a mathematician into a software engineer. However, it is hoped that through the course content, students will be able to further themselves on such a path if needed.
Study Units
The course is composed of the following 7 study units. Observe that each of the units feeds most of the pillars of study. The early units build up basic Julia, computer science, and tooling knowledge (mostly pillars 1 and 2), whereas the later units focus on deeper mathematical stories, mostly feeding pillars 3 and 4. Specifically with respect to pillar 3 (mathematical and statistical algorithmic concepts), there are four main concepts: numerical mathematics and ODEs (Unit 3), computer algebra systems (Unit 5), Monte Carlo and discrete event simulation (Unit 6), and machine learning (Unit 7). Clearly, some of these concepts are taught, often at greater depth, in other specialized courses. However, in this course the focus is on software implementation.
Here are the details of the content of each unit:
Unit 1 - Basics: Variables, arithmetic, logical statements, conditionals, iteration, generic functions, scope, arrays and similar structures, strings, input and output, the Julia language, Jupyter notebooks, REPL (command line), and markdown.
Unit 2 - On algorithms and more: Sorting algorithms and their analysis. Quantifying performance via empirical measurement. Quantifying performance via mathematical analysis. Compilation steps. Memory organization. Representation of variables and quantities in memory. Additional tools: Unix command line, Git and GitHub-like systems, IDEs (Integrated Development Environments).
Unit 3 - On data files and basic numerics: Standard file formats (e.g. CSV, JSON), reading and writing to files, web input, basic plotting with Julia, basic descriptive statistics and statistical plotting with Julia, representation of floating point numbers, numerical inaccuracy issues (e.g. numerical derivatives), solutions of ODEs using standard methods, basic matrix operations and performance considerations, usage of third party packages and the Julia package manager. Usage of language features for dealing with special structures (e.g. sparsity). Further profiling and debugging tools and the basic usage of a debugger.
Unit 4 - More language features for software architecture: Julia types and multiple dispatch, defining structures and designing types, mutability, dictionaries, hash functions and hash tables. The heap data structure as an example with analytic performance analysis and implementation.
Unit 5 - Computer algebra systems and symbolic computation: Background from elementary number theory, p-adic lifting, Chinese remainder techniques, rational reconstruction, polynomial arithmetic, interpolation, GCD and Euclid's algorithm, factoring mod p, Zippel's algorithm, further symbolic computation applications.
Unit 6 - Monte Carlo and discrete event simulation: Pseudorandom number generators, from uniform distributions to any distribution, basic Monte Carlo based statistical analysis, discrete event simulation modelling, discrete event simulation engines. Modular software design, with modules and namespace control. Additional language features including meta-programming and further understanding of the compilation process. More on type inference and performance implications.
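As a small illustration of the numerical inaccuracy issue listed under Unit 3, here is a minimal Julia sketch of a forward-difference derivative. The helper name forward_diff, the test function, and the step sizes are assumptions chosen for illustration; they are not taken from the course material.

```julia
# Forward-difference approximation of f'(x) with step size h.
# `forward_diff` is a hypothetical helper name used only for this sketch.
forward_diff(f, x, h) = (f(x + h) - f(x)) / h

# The exact derivative of sin at x = 1 is cos(1), which gives a reference value.
exact = cos(1.0)

# A moderate step size versus a step size close to machine precision.
err_moderate = abs(forward_diff(sin, 1.0, 1e-8) - exact)
err_tiny = abs(forward_diff(sin, 1.0, 1e-15) - exact)

println("error with h = 1e-8:  ", err_moderate)
println("error with h = 1e-15: ", err_tiny)
```

In 64-bit floating point arithmetic, shrinking h further does not keep improving the estimate: once h approaches machine precision, rounding error in the subtraction f(x + h) - f(x) dominates, so the very small step gives a worse answer than the moderate one.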
These are the 6 assessment items. The first 5 are due during the semester and are worked on progressively throughout the course. The last item is due during the final exam period. The course does not have a final exam. HW1 and Project 2 are to be worked on in pairs (or groups of 3 in special cases). The other items are individual.
HW1 (15%): Jupyter and REPL, basic Julia functionality and small programs – analysis of sorting (HW in pairs).
HW2 (10%): Using an IDE, Unix, and GitHub, file input/output, and numerical mathematics including the solution of ODEs.
Quiz (15%): Covering basics of Julia, representation of numbers, analysis of sorting performance.
There are 7 practicals (A-G) in total. Here is a description of each practical.
Practical A - Basic tools: Using Jupyter. Markdown. Basic LaTeX formulas. Basic HTML in Jupyter. Basic Julia code running in Jupyter. Basic Julia REPL.
Practical B - Julia essentials: Variables, logical statements, conditional statements, loops, generic functions, scope, arrays, input/output, and a few more Julia essentials.
Each assessment item is to be handed in with an experience voice recording. In the voice recording, the student(s) state how they felt working on the assessment, what they found easy, difficult, dull, or interesting, and importantly state (if true) that the work is their own.
HW2 and the three projects (1-3) are to be handed in via GitHub (or GitLab). In these cases, the students are to create repositories for the submission. HW1 is handed in using a more structured, pre-specified format.
The quiz is in a simple pen-and-paper format (scans/photos are handed in).
For HW2 and the three projects (1-3), in addition to the GitHub (or GitLab) repository, a single PDF file including printouts of all student source code should also be handed in. This allows tutors to annotate the code easily when giving feedback.
Software Installation
It is recommended that you have the following on your machine:
A Unix-style shell with Julia in your path. Note that such a shell is available by default for Linux and Mac users. For Windows users, we recommend that you install Git Bash.
This video describes installation of Julia and IJulia:
With the software installed, please bring your laptop to practicals (and ideally to the lectures). In exceptional circumstances, where you plan to attend a face-to-face practical and cannot bring your laptop, you may install Julia and the associated software on the Windows desktop machines available in the practical classroom. This is a workable solution but is not ideal. The installation may take several minutes, and there is no guarantee that it will remain on the machine over time. Hence, whenever possible, bring your laptop.
Additional Resources
Here are additional resources that may be of use for the course (or for introductory Julia programming in general). None of these are mandatory, as there are plenty of examples in the lecture units and practicals. The main recommendation is that you try running every bit of code from the lectures and practicals, investigate it, and use the Julia help to explore further. Nevertheless, you may find some of these additional resources helpful as well:
The Julia Express: Provides a dense summary of language features. It may be a bit too dense for those who haven't programmed much before. Still, it is useful.
MATLAB–Python–Julia cheatsheet: This is useful for those coming with a bit of MATLAB or Python experience. It shows how things are done in Julia in comparison to the other two languages.
Think Julia: How to Think Like a Computer Scientist: This is an excellent introductory book for programming. If you haven't done any programming previously, it is a good read. For more experienced programmers it can be a bit too elementary, but it is still useful.