UQ MATH2504
Programming of Simulation, Analysis, and Learning Systems
(Semester 2 2022)

This is an OLDER SEMESTER.
Go to current semester

Main MATH2504 Page

Project #1 - Semester 2, 2022

Univariate Polynomials with Integer Coefficients

(last edit: September 13, 2022)

Due: End of Day, Friday, 23, September, 2022. Late submissions are not accepted.

Note: The teaching staff will only answer questions (via Piazza, consultation hour, or practicals) regarding this assignment up to the late evening of Wednesday 21/9.

Weights and marking criteria: Total number of points: 100. There are 10 points for handing in according to the hand-in instructions, including a voice recording, and neat output. There are 10 additional points for setting up the GitHub repo properly. The remaining 80 points are for the project questions.

Submission format: This assignment should be submitted via a GitHub Repo and a PDF file via Blackboard.

Specific instructions for the GitHub repo are below. It is important that the GitHub repo be made private and the course user name uqMATH2504 be invited as a collaborator. It is also important to name the repo exactly as <<FIRST NAME>>-<<LAST NAME>>-2504-2022-PROJECT1. So for example if your name is "Colin Kaepernick", the repo name should be Colin-Kaepernick-2504-2022-PROJECT1.

The PDF file should be a nice formatted file that has:

Your name, student number, and assignment title (Project 1 - 2022) on the top.
A (clickable) link to your GitHub repo.
An illustration of the code and output for example_script.jl: Show the lines of the script and display the corresponding output.
An illustration of the output of test/runtests.jl.
The main body of the PDF file is then an indication of each location of the code project where you made changes and improvements, in view of the tasks below. That is, present the new snippets of code that you created and indicate what they do via brief descriptions. It is also a place to answer any questions that require output, or description.

The recommended way to make the PDF file is via a Jupyter notebook where you copy in some code and output into the notebook, and in certain cases use include() to run Julia code if appropriate. You also comment on questions in this PDF (e.g. when asked to answer things not via code). If desired you could keep this jupyter notebook (.ipynb file) in the repo. However, this Jupyter notebook will not be checked (only the PDF file), and it isn't required to be a "runnable" notebook. In any case, please avoid pixilated screenshots of code, so for example if you choose to format your PDF file in MS-word (instead of Jupyter), make sure the code is clean, formatted, and readable.

Marking responses will be made by the teaching staff by annotating selected parts of your PDF file via blackboard. The teaching staff will also inspect your GitHub repo. A very readable and clean PDF file is important. Note that in printing Jupyter to PDF (or exporting to PDF) there may sometimes be excessive white space. Do not worry about this extra white space; it is not a problem.

Individual work: This is an individual work assignment. Plagiarism will not be accepted. Nevertheless, feel free to consult with friends or classmates via Ed and other means about how to go about certain tasks.

Marking Criteria:

10 points are allocated for following instructions.
10 points are allocated for setting up the GitHub hand-in according to the instructions.
80 points are for the polynomial project itself. Points are awarded for completing each of the tasks with full marks given to clean readable code that completes the task.
There is a 5 point bonus question.
There are also 10 bonus points for a summary of Maithili Mehta's perspective seminar.
In general, points will be deducted for sloppy coding style. Make sure to have your code properly indented, to use sensible and consistent variable names, and to write code that is in general clean and consistent.

Setting up your GitHub repo for hand-in (10pts):

Ideally use the same account you used for BigHW.
Ideally we would like you to fork the GitHub repo https://github.com/yoninazarathy/2504_2022_project1. The new forked repo under your GitHub account should then be made private. However, GitHub does not allow to make forked repositories private. So instead duplicate the repository as described here. You will first create your new repository and then follow the 4 steps under "Mirroring a repository".
Invite the course GitHub user, uqMATH2504 as a collaborator. You may do so early on while working on the project, and you must do this no later than the project due date. You must verify that uqMATH2504 is a collaborator on your project and let the course staff know if that is not the case.
Do not make any changes (commits) to the repo after the project due date.
Create a local clone of the repo. It is recommended that use use git command line via the shell to work on making changes/additions to the assignment and submitting the changes. However you are free to use any other mechanism (VS-Code, GitHub desktop, etc). Making intermediate commits is certainly recommended.
If for some reason you are not able or willing ot use GitHub, an alternative is GitLab. This is not recommended as it adds additional work to the teaching staff. However if you wish to use GitLab instead of GitHub contact the teaching staff for permission.

Your GitHub repo should be formatted like the original repo that you forked with additional files and modifications based on the tasks below.

Note: make sure that there aren't any excessive files in your submission repo. An exception is perhaps a Jupyter .ipynb file if you choose to use it for creating the PDF. Use a .gitignore file to ensure git does not commit additional files and output files to your repo.

Project tasks (80pts)

Build your project on top of the https://github.com/yoninazarathy/2504_2022_project1 by making modifications, additions, and improvements. Your first goal is to understand every bit of code of the basic repository. After that, you will make improvements, additions, and cleanups.

Please follow the suggested type names below (PolynomialZeroPacked, PolynomialSparseBI etc..). These are suggested so that tasks have backwards compatibility which mean you do not throw away old code, but rather add modifications. Note that in an actual project, you would sometimes replace older types with newer ones, and/or use parametric types - a concept which we do not use in this project.

Task 1 (5 pts) - Getting it to run, improving pretty printing, and creating your own example script.

Study the Repo

Once you create your own repository make sure it works by running test/runtests.jl and example_script.jl. At this point simply study the functionality of the repository by looking at the function signatures (not implementations) to understand what the repo can do.

Create Example Scripts

Create your own example script called example_script_2.jl and include polynomials for illustrating the key functionality of the operations available via the software. Namely, your script should

run and demonstrate how the code in the repository works,
contain some intermediate output (e.g. using println()) that describes what is going on, and in particular
have example polynomials over residue classes (primes).

Basically, example_script_2.jl is an improvement over the minimally supplied example_script.jl.

Pretty Printing

Improve the "pretty printing" of the polynomials by incorporating the following rules:

Terms of zero degree (constants) should exclude x^0, e.g. 2 not 2x^0.
Terms with unit degree should exclude the degree, e.g. 3x not 3x^1.
Excluding the constant term 1, terms with unit coefficient should exclude the coefficient, e.g. 3x^2 + x not 3x^2 + 1x.
Use subtraction instead of adding negative terms, e.g. 3x^2 - 2x not 3x^2 + -2x.
Display the terms in descending degree order. $a_n \cdot x^n + a_n \cdot x^{n-1} + \cdots + a_1 x + a_0$.
Do not print terms with zero coefficients, e.g. x^2 + 1 not x^2 + 0 x^1 + 1.

Task 2 (15pts) - Refactoring the data structure used

The current implementation for polynomials is a vector where the term of degree $k$ is stored at position $k+1$. This is optimal for dense polynomials (those with mostly nonzero coefficients) but our factoring algorithms rely heavily on manipulating sparse polynomials like $x^n -1$. Therefore,

Create a new type called PolynomialSparse that stores only nonzero terms in an array sorted by descending degree. PolynomialSparse should have identical functionality to Polynomial in terms of the functions and methods the later provides.
Create tests for PolynomialSparse and associated functions and ensure your implementation passes them.
Rename the supplied Polynomial type to be called PolynomialDense and rename all the methods that use it. Make sure the original supplied tests and scripts work for the new renamed type PolynomialDense.
Modify your solution to Task 1 so that example_script.jl is for PolynomialDense and example_script_2.jl is for PolynomialSparse.
Provide a short discussion contrasting the pros and cons of the dense and sparse representations. In particular, determine which operations are efficient (both in terms of memory and time) with the original PolynomialDense type and what operations are efficient with your new PolynomialSparse.

Note: You may to create an abstract base type, say Polynomial, for PolynomialSparse and PolynomialDense. However, this is not required.

Task 3 (10pts) - Refactoring the coefficient type from `Int` to `BigInt`

From now on we focus on the sparse representation.

Exact computation requires polynomials with (truly) integer coefficients and thereby we must use BigInt rather than (the finitely many integers comprising) Int.

Refactor PolynomialSparse to have BigInt coefficients instead of Int. Do this by introducing a new type PolynomialSparseBI which is short hand for "Polynomial Sparse Big Integer".
Duplicate the functionality (functions and methods) of PolynomialSparse for PolynomialSparseBI. Note that PolynomialSparseBI should have a sparse implementation just like PolynomialSparse.
Create tests for PolynomialSparseBI and associated functions and ensure your implementation passes them.
Find examples where PolynomialSparse (the one with Int coefficients) overflows and yields the wrong result whereas your new PolynomialSparseBI does not. The test code for PolynomialSparseBI should have such cases.
Empirically compare the run-time performance of polynomial multiplication for PolynomialSparse and PolynomialSparseBI. Do this for cases where PolynomialSparse does not overflow (i.e. returns the correct result). In general, your results should show that PolynomialSparse is faster, yet may overflow.

Note: Ideally we would use parametric types in Julia but for simplicity we simply introduce the new type PolynomialSparseBI. You may (if you wish) use parametric types for PolynomialSparse, PolynomialDense, and PolynomialSparseBI but it is not required.

Task 4 (15pts) - Creating a polynomial mod p datatype and further refactoring

Notice the PolynomialDense, PolynomialSparse, and PolynomialSparseBI data-types have integer coefficients whereas operations such as divide, gcd, and factor require the coefficient arithmetic to be done modulo $p$. Consequently, the aforementioned functions must do this reduction themselves, and worse yet can only do so once a prime $p$ is provided. It is for this reason that divide, gcd, and factor return a function that returns a polynomial once a $p$ is provided. It would be much cleaner to have a polynomial datatype where the "mod p" behavior is "baked in" and automated.

Create a type PolynomialModP which has a polynomial (of type PolynomialSparse) and a prime number (Int) as field names. PolynomialModP should support the same arithmetic operations as PolynomialSparse but with coefficient arithmetic over $\mathbb{Z}_p$ rather than Int. PolynomialModP should be easier to work with when carrying out division, gcd, and factorization. So for example f1 ÷ f2 where f1 and f2 are of type PolynomialModP, would directly yield a new PolynomialModP which is the result mod $p$. Similarly for the other operations.
Create tests for PolynomialModP and associated functions and ensure your implementation passes them.

Your new datatype should rely on the existing functionality of PolynomialSparse but restrict the coefficients to $0$, $1$, \ldots, $p-1$.

Once you complete this task, certain operations of PolynomialSparseBI like p1 ÷ p2, are now better done via PolynomialModP. That is, while you may use f1 ÷ f2 for f1 and f2 of types PolynomialSparseBI, it is better to only ÷ when f1 and f2 are of type PolynomialModP because you get back an instance of PolynomialModP rather than a function.

Implement ^ for PolynomialModP and update the pow_mod function to use it.
Update the show method for PolynomialModP to include an indication it is "mod p".

Task 5 (20pts) - Improving multiplication using the Chinese Remainder Theorem

The current multiplication implementation is not efficient for PolynomialSparseBI because big integer arithmetic is expensive. In this task we reconcile this problem by using CRT to compute products of PolynomialSparseBI via cheaper products of PolynomialSparse.

Implement multiplication for PolynomialModP (if you haven't done so yet in Task 4).
Implement multiplication using the Chinese Remainder Theorem (CRT) for PolynomialSparseBI as presented in the lecture. Here the product of two PolynomialSparseBIs will be computed by doing sufficiently many multiplications of their PolynomialModP counterparts.
Create tests for PolynomialSparseBI CRT multiplication and ensure your implementation passes them.
Benchmark PolynomialSparseBI CRT multiplication and contrast your results against the old multiplication (it is recommended to keep the old form of multiplication in another function or method). Find examples that demonstrate the CRT based approach is faster.

Task 6 (15pts) - Factorization mod p and improving power

At this point you should have efficient multiplication and power for PolynomialSparseBI as well as PolynomialModP and its associated functions and methods.

Power

As discussed in class, if you have a polynomial f and raise it to the power 2^n then instead of doing about 2^n multiplications you can do about n using repeated squaring.

Re-factor pow_mod and ^ into something more efficient by using repeated squaring. Your implementation should work for both PolynomialSparseBI and PolynomialModP.
Create tests for pow_mod and ^ for both implementations and ensure your implementation passes them.

Factoring

Adapt the supplied polynomial factorization code to incorporate the new methods and types you have built up to this point. The current supplied code works for limited cases but may get stuck on certain examples.
Create tests for factorization and ensure your implementation passes them. It is a good idea to pass your factorization algorithm a product of polynomials so your factorization is predictable. Note that your factorization algorithm may fail on small primes due to bad luck. It is therefore best to choose large primes.
Benchmark your factorization algorithm by factoring a database of problems (e.g. generated by you via rand). Do not use more than 120 seconds of local compute time.
Using a compute budget of approximately 120 seconds "stress test" your factorization implementation by factoring something substantial.

Bonus Task (5 bonus points)

Use factorization of PolynomialModP and CRT to write a factorization algorithm for PolynomialSparseBI.

Bonus Task (10 bonus points): Perspective seminar (Dr. Maithili Mehta).

This question presents you with an additional 10 bonus points. Watch the 2021 recording of Dr. Mehta's talk. We cannot have a live presentation this year due to scheduling changes involving the special Sep 22 holiday.

Now post an (anonymous) question to the speaker on this Google form. Try to post this no later than Sunday September 18.

Write your question here - and if a response by Dr. Mehta is supplied before you submit the project, write the response here.

Further write a 1-2 paragraph summary of Maithili Mehta's perspective seminar. The writeup should answer these questions without treating the questions directly. That is, don't write "the answer to Q1 is..." etc. Instead incorporate the answers in a flowing writeup report (like a blog post or letter). You may use first person and refer to the speaker as "Dr. Mehta", or "Maithili", or similar. Present the writeup as part of your PDF file submission (as the first item in the submission).

The writeup should address some of the following (in paragraph form):

What was Maithili's career path, and her relationship with software and mathematics?
Where does she currently work? What does she do in her job? How do you perceive her personal experience from her job?
Discuss a few key tools that she spoke about during the talk, such as programming languages, mathematics, machine learning, statistical tools, platforms, and the like. Was this the first time that you heard about any of these tools?
Have you picked up any tips from Maithili's talk, which you can perhaps use in your career down the road?
Feel free to state anything else that you found interesting from the talk.

UQ MATH2504Programming of Simulation, Analysis, and Learning Systems(Semester 2 2022)

This is an OLDER SEMESTER. Go to current semester