(last edit: July 24, 2023)
Due: August 25, 2023, 16:00. This assignment is in pairs.
Here is an explainer video for this assignment.
Note: Since this assignment is in pairs make sure to submit with the names of both partners (or all three in exceptional circumstances) as well as student numbers. Submission will be via the Blackboard account of only one of the teammates but will count equally for both (or all three). All cases that involve a group of a single person or three people in the team, require explicit permission from the course coordinator.
The recommended way to work in pairs is to have a lead for each question and a reviewer for each question. The leader will be responsible that a solution is created and the reviewer will be consulted and provide feedback. In previous years students that were not exposed to all questions ended up doing more poorly in the course quiz in comparison to students that reviewed (or led) all questions. There is certainly no need to state who is the lead and who is the reviewer for the questions. Ultimately the pair is jointly responsible for the assignment submission.
Submission format: The submission format is via a GitHub repository as described in Question 10. The submission should include the four submissions forms I, II, III, and IV.
The submission should also include an audio recording (30 sec minimum, 3 minutes maximum, 1 minute recommended). The voice recording should have voices of both (all) members of the group. The recording should verify that the work is your own. You can also state how you felt in the assignment. Say what was easy, what was hard, what you found useful, and what less.
To submit, go to Blackboard and submit as instructed there by providing a link to the GitHub repo, naming the partners, and uploading the voice recording (do not upload the voice recording to GitHub - upload it to Blackboard). PDFs of the Jupyter notebooks and voice recordings are for Blackboard. Jupyter notebooks are for GitHub.
Marking Criteria:
4 points are allocated for following instructions. This includes proper submission of the voice recording, the .ipynb
file(s), having names and student numbers on the hand-in, and most importantly neatly formatting all the answers of the jupyter notebook.
There are 16 questions in total, with 6 points allocated for each question. Some questions are easier and some harder. Some require more work and some require less work.
There is an additional bonus question, question 17, dealing with Dr. Megan Dawson's perspective seminar. It yields an additional 10 bonus points.
As this is the first assignment, points will not be deducted for sloppy coding style. However, in future assignments clean and consistent coding style is required.
This question deals with basic formatting tools.
1a: Consider the following recursive integration by parts formula for integer $n \ge 1$:
\[ \int_0^\infty x^n ~ e^{-x} ~ dx = -\underbrace{x^{n} e^{-x}\Big|^\infty_0}_0 ~~+~~ n \int_0^\infty x^{n-1} ~ e^{-x} dx. \]
Use LaTeX to represent this formula for $n=2$ by repeating the integration by parts twice.
1b: Use Markdown to write an ordered list of 5 programming languages, where you mention for each language where it is used. For example Javascript is often used for front end web development (but not limited to that), etc.
1c: Use HTML to embed an image with an hyperlink from the image. Use HTML to resize the image to a size you like. Make sure the aspect ratio is maintained.
The following function uses the quadratic formula to compute the real roots of the quadratic equation $ax^2 + bx + c = 0$. The return value is an array with 0, 1, or 2 roots.
function real_roots_of_quadratic(a::Number, b::Number, c::Number) #Start by initializing an empty array roots::Array{Float64} = [] #Compute the discriminant Δ = b^2 - 4a*c #\Delta + [TAB] #Based on the sign of the discriminant return 0, 1, or 2 roots. if Δ < 0 roots = [] elseif Δ == 0 roots = [-b/(2a)] else roots = [-b + √Δ, -b - √Δ] / (2a) #\sqrt + [TAB] end return roots end #Attempting on -x²+5x-6=0 real_roots_of_quadratic(-1,5,-6)
2-element Vector{Float64}: 2.0 3.0
2a: Experiment with multiple syntax variations by making a modification to the code snippet above and in each case observing the outcome.
The function accepts three arguments, a
, b
, and c
. They are each of type Number
as specified via ::Number
. Would the code still work without the ::Number
type specification? (yes/no). Try to explain why.
Observe that the expression for the discriminant uses 4a*c
as short hand for 4*a*c
. Would 4ac
work? (yes/no). Try to explain why.
A common typo in programming is to use =
in place of ==
. The former is an assignment operator and the later is a logical operator checking for equality. Insert this typo by replacing Δ == 0
with Δ = 0
. Do you get a syntax error or a logical program error?
The expression [-b + √Δ, -b - √Δ] / (2a)
can also be written as (-b .+ [√Δ,-√Δ])/(2a)
. Check this, does it work? (yes/no). Now do it with using +
in the new expression instead of .+
. Does it still work? (yes/no) Try to explain why.
The return
statement in the line return roots
is optional yet makes for more verbose code. Try removing return
. Does the function still work? (yes/no).
There is a certain set of inputs for which this function will not work. Try setting the input argument a
as 0
or 0.0
. What is the result?
To handle such cases where the quadratic equation is actually "not quadratic" add error checking code at the top of the function. One way to do this is to return some error message for example add at the top of the function:
if a == 0
return "Error - this is not a quadratic equation"
end
Check that this works.
Instead of adding these three lines you may also shorten them as,
a == 0 && return "Error - this is not a quadratic equation"
Check that this works and try to figure out (and explain) what this does.
Finally instead of returning a string as an error, we may use a more advanced mechanism. So use this instead:
a == 0 && throw(DomainError(a, "argument must be nonzero"))
You will discuss such exception handling in Unit 4, yet at this point try to do your best to explain what this does.
2b: Now that we have a function that find the real roots of such quadratic equations, let us try it on a few examples:
examples = [[1,-5,6], [1,2,3],[1,7,0]] for example in examples roots = real_roots_of_quadratic(example[1], example[2], example[3]) print("The equation $(example[1])x² + $(example[2])x + $(example[3]) = 0 ") #\^2 + [TAB] #*** if length(roots) == 0 #*** println("has no real roots.") #*** elseif length(roots) == 1 #*** println("has the single (real) root $(roots[1])") #*** else #*** println("has the real roots $(roots[1]) and $(roots[2])") #*** end #*** end
The equation 1x² + -5x + 6 = 0 has the real roots 3.0 and 2.0 The equation 1x² + 2x + 3 = 0 has no real roots. The equation 1x² + 7x + 0 = 0 has the real roots 0.0 and -7.0
Add two more examples of your own to the array (of arrays) examples
and see how it works.
The lines marked with #***
could have been set in a function of their own and that function can be called (invoked) in place of those lines. Name that function print_quadratic_roots()
and have it accept two arguments. The first argument is an array coefficients
which is interpreted as the coefficients ($a$, $b$, and $c$). The second argument is the array roots
. Define that function and run the code with your new function in place of the lines marked with #***
. Since your new function does not have a return value you may write return nothing
in the end.
Finally beautify the output of your function such that when a coefficient is $+1$ or $-1$ it is not printed but rather only the sign is printed (if needed). For example the polynomial $-1x^2-1x+1$ is better printed as $-x^2-x+1$. In doing so, use the ternary operator ? :
. In one way or another. This may have code with less lines (less explicit big if
- else
statements).
2c: It is common and useful to create testing code that checks that operational code works. For the function real_roots_of_quadratic()
we may create another function as follows:
using Random """ This function generates `num_tests` random triples of coefficients and checks that the function `real_roots_of_quadratic()` does its job. The return value is `true` if the test passed, otherwise it is `false`. """ function test_real_roots_of_quadratic(;num_tests = 10000, seed=42) Random.seed!(seed) test_passed = true for _ in 1:num_tests a, b, c = 2000rand(3) .- 1000 #uniform values in the range [-1000, 1000] roots = real_roots_of_quadratic(a,b,c) for x in roots err = a*x^2 + b*x + c test_passed = (test_passed && isapprox(err, 0.0, atol = 1e-8)) end end return test_passed end test_real_roots_of_quadratic() ? println("Test passed") : println("Test failed")
Test passed
Work on understanding the expression 2000rand(3) .- 1000
. How would you change this expression if you wanted the coefficients to be in the range [-5,5]? Make that change and see if the test still passes. (yes/no).
Instead of having the numbers 2000
and 1000
"hardcoded" in the function it is better practice to pass them as optional arguments, similarly to the arguments num_tests
and seed
. Do this by creating arguments coeff_min
and coeff_max
that are by default at -1000
and 1000
respectively. This means that the values in the expression 2000rand(3) .- 1000
need to be represented in terms of these arguments. Do this.
What happens in this test when there are no real roots? How many iterations does the for loop execute in this case?
In general a test passing is only a necessary condition for validity of a program rather than a sufficient condition. Can you suggest some sort of bug in the real_roots_of_quadratic()
function that can exist which would slip through the test? Try this bug (modify real_roots_of_quadratic()
and show that the test still passes).
A different way to test would be to use "random roots" instead of random coefficients. Such a test would first randomly decide if there are 0 real roots, 1 real root, or 2 real roots. It would then randomly draw the roots. It would then find the coefficients that match these roots. It would then test that real_roots_of_quadratic()
has these roots. Try to implement such a test function. For the case of 0 (real roots) you may "hardcode" a few specific examples. Alternatively (optional) try to use the complex numbers. For example rand()+rand()*im
is a complex number with random real and imaginary values, each in [0,1]
2d: In general, we may have had a solver that finds the (potentially) complex roots of a quadratic. Since the coefficients, $a$, $b$, and $c$ are real, these are complex conjugates. So we want to use the quadratic formula directly (potentially taking square roots of negative numbers).
Trying:
√(-4) #\sqrt + [TAB] is like the sqrt() function
ERROR: DomainError with -4.0: sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
We got an error. Instead we need to highlight that -4
is a complex number:
Complex(-4) #Create a complex number -4 + 0i
-4 + 0im
Now the sqrt()
function, which we shorten via the √ symbol works:
√Complex(-4) #This is \sqrt + [TAB], just like sqrt()
0.0 + 2.0im
So we can now create:
function roots_of_quadratic(a::Number, b::Number, c::Number) roots::Array{ComplexF64} = [] Δ = b^2 - 4a*c if Δ < 0 roots = [-b + √Complex(Δ), -b - √Complex(Δ)] / (2a) elseif Δ == 0 roots = -b/(2a) else roots = [-b + √Δ, -b - √Δ] / (2a) end return roots end roots_of_quadratic(1,2,3)
2-element Vector{ComplexF64}: -1.0 + 1.4142135623730951im -1.0 - 1.4142135623730951im
The above implementation is fine, however it has code duplication for both cases of $\Delta \neq 0$. That is the case Δ < 0
and the case Δ > 0
can be merged. Modify the function so that it does not have an elseif
statement. In fact, try to write the function without any if
statement but rather only using the trinary operator ? :
.
Faulhaber's formula presents an expression for
\[ \sum_{i=1}^n i^p, \]
where $p$ is some non-negative integer. This problem deals with the case of $p=5$ and to solve this problem you don't need to investigate Faulhaber's formula.
Say you believe that the expression resulting from the formula for $p=5$ is:
\[ \sum_{i=1}^{n} i^5 = \frac{1}{6} n^6 + \frac{1}{2} n^5 + \frac{5}{12} n^4 - \frac{5}{60} n^2, \]
but just to be sure you want to write a program that checks this for $20$ random values (repetitions allowed) of $n$ in the range of $1,\ldots,10^3$.
3a: Write short program that performs this check by comparing the formula calculation and a calculation using a loop.
3b: Now try to use a variant of the program to do the check for $n=10^4$. Does it work? If not try to explain why?
3c: Now use Julia's big()
function to do the calculation with BigInt
types and use //
to represent the fraction coefficients as rational types. Can you get an exact match now?
Consider this code which creates two functions for creating random strings representing binary numbers and hexadecimal numbers.
random_binary_string(n = 12) = *([rand(['0','1']) for _ in 1:n]...) random_hex_string(n=3) = *([rand(union('0':'9','A':'F')) for _ in 1:n]...);
For example,
using Random Random.seed!(0) @show random_binary_string() @show random_hex_string();
random_binary_string() = "001010010011" random_hex_string() = "456"
4a: Explain how these functions work.
Assume now that you are given inputs:
using Random Random.seed!(0) r_bin_strs = [random_binary_string() for _ in 1:10^3] r_hex_strs = [random_hex_string() for _ in 1:10^3];
4b: Use standard Julia functions to parse these arrays of strings into arrays of numerical values and report the sum of the values (present the result in decimal). So your output should be two decimal numbers, one which is the sum of all of the binary numbers in the array, and the other which is the sum of all of the hex numbers. Prior to checking your output, suggest an estimate (expected value) for the sum - explain why.
4c: Now create your own functions (from first principles parsing strings) to convert a binary digit string and a hex string into values. Repeat 4b with your own functions and see you get the same result. Present your function implementations and the output.
An abundant number is a positive integer for which the sum of its proper divisors (excluding the number itself) is greater than the number itself.
For example, consider the number 12. The proper divisors of 12 are 1, 2, 3, 4, and 6. The sum of these proper divisors is 1 + 2 + 3 + 4 + 6 = 16, which is greater than 12. Therefore, 12 is an abundant number.
5a: Use the divisors
functions from the Primes.jl
package to write a program that prints all the abundant numbers that are less than or equal to $200$.
5b: Now write a program that does the same thing without using the divisors
function from Primes.jl
(i.e. write your own divisors function or something similar).
Goldbach's conjecture states that every even number greater than two can be represented as a sum of two primes. For example $16 = 11+5$ and $40 = 37+3$ or $23+17$. In this question you will try to disprove the conjecture by searching for an even number that is not a sum of two primes (don't expect to succeed in disproving).
First consider the Sieve of Eratosthenes implemented as follows:
""" Returns the all the primes up to n. """ function sieve_of_Eratosthenes(n) primebits = ones(Bool,n) #Will contain true if the index is prime (initially all assumed prime) primebits[1] = false #The number 1 is not prime p = 2 #Smallest prime while p ≤ n i = 2p while i ≤ n # \le +[TAB] primebits[i] = false i += p end p += 1 while p ≤ n && !primebits[p] p += 1 end end (1:n)[primebits] end n = 100 println("First $n primes: ",sieve_of_Eratosthenes(n))
First 100 primes: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
This sieve yields a way to get a list of the first $n$ primes. However it is used inefficiently in the code below to create a plot of the number of Goldbach pairs.
using Plots function check_Goldbach(n) @assert iseven(n) num_pairs = 0 for p in sieve_of_Eratosthenes(n) if in(n-p,sieve_of_Eratosthenes(n)) num_pairs += 1 end end return num_pairs end n = 10^3 even_range = 4:2:n checks = check_Goldbach.(even_range) if 0 ∈ checks println("Found a counter example for Goldbach") end scatter(rand(10)) scatter(even_range,checks,legend=false,xlabel="n",ylabel="Number of Goldbach pairs")
Modify this code or create code of your own so that you can plot a figure similar to the above for the first $10^6$ Goldbach pairs. Do not use any in-built (or package) functions for prime numbers (e.g. Primes.jl. In the figure, instead of plotting all the points associated with Goldbach pairs, only plot a random subset of the points so that the figure won't have more than about $20,000$ points.
Assume you have monotonic increasing sequences such those given by these Julia functions (assuming n>=0
):
fn1(n) = 1 - 1/√(20n)
fn2(n) = 1 - exp(-(n-1)/50)
fn3(n) = max(min(1, n/600 + 0.2),0)
fn4(n) = 1/(1+exp(-(n - 100)/200))
7a: Create a single informative scatter plot that presents all four of these functions with discrete values of n
on the range 0:600
. Make sure that the plot is legible and the legend is clear. Adjust plotting parameters to make the plot as neat and suitable as you can.
7b: Create now a function called find_n_basic
which takes as a first argument a function such as fn1
, fn2
etc... f
(a pure mathematical function from the non-negative integers to the range $[0,1]$) and as a second argument a floating point number in the range [0,1]
, call it target
. Your function should then find the smallest n
such that f(n) >= target
. In your implementation do this by increasing n
from 0
, until f(n) >= target
. Test your find_n
function on fn1
, fn2
, fn3
, and fn4
with various values of target
. Show testing of this function for several random values of target
(e.g. 5 values for each test function).
7c: Your implementation of find_n_basic
probably started with n=0
and incremented n
until f(n) >= target
. Now go for a quicker implementation (call it find_n_geometric
). Increment n
from 0
to 1
and then by a multiple of 2
each time until exceeding (or equaling) target. Then decrease until finding the minimal n
. Compare your answer here for the same 5 values that you tested in 7b
(for each of the test functions).
7d: Now have a third implementation which operates like fine_n_geometric
yet after exceeding (or equaling the target), you cary out a binary search on the identified interval. So for example if for n=32
you have that f(n) < target
and further for n=64
you have that f(n) >= target
, then your new function (call it find_n_fast
), will carry out a binary search between n=32
and n=64
to find n
. Again compare your answer her for the same 5 values you tested above, for each of the test functions.
Consider this code which sorts a list of names:
function array_table(array, heading) println(heading,":") for (i,a) in enumerate(array) println(i,"\t",a) end end names = [ "Amy Chan", "Maithili Mehta", "Anna Foeglein", "Andy Ferris", "Thomas Graham", "Elaine Schenk", "Jesse Woods", "Tina Moghaddam", "Paul Bellette", "Paul Vrbik", "Tom Cranitch", "Yoni Nazarathy", "Sam Hambleton", "Isaac Beh", "Limao Chang", "Nazeef Hamid", "Cooper Janke", "Brandon Lowe", "Lief Lundmark", "Megan Dawson", "Alistair Falconer", "Emma Comino", "Ivana Carrizo-Molina"] sorted_names = sort(names) array_table(sorted_names, "Sorted by first name") println() sorted_names = sort(names,by=(x)->reverse(split(x," "))) array_table(sorted_names,"Sorted by last name")
Sorted by first name: 1 Alistair Falconer 2 Amy Chan 3 Andy Ferris 4 Anna Foeglein 5 Brandon Lowe 6 Cooper Janke 7 Elaine Schenk 8 Emma Comino 9 Isaac Beh 10 Ivana Carrizo-Molina 11 Jesse Woods 12 Lief Lundmark 13 Limao Chang 14 Maithili Mehta 15 Megan Dawson 16 Nazeef Hamid 17 Paul Bellette 18 Paul Vrbik 19 Sam Hambleton 20 Thomas Graham 21 Tina Moghaddam 22 Tom Cranitch 23 Yoni Nazarathy Sorted by last name: 1 Isaac Beh 2 Paul Bellette 3 Ivana Carrizo-Molina 4 Amy Chan 5 Limao Chang 6 Emma Comino 7 Tom Cranitch 8 Megan Dawson 9 Alistair Falconer 10 Andy Ferris 11 Anna Foeglein 12 Thomas Graham 13 Sam Hambleton 14 Nazeef Hamid 15 Cooper Janke 16 Brandon Lowe 17 Lief Lundmark 18 Maithili Mehta 19 Tina Moghaddam 20 Yoni Nazarathy 21 Elaine Schenk 22 Paul Vrbik 23 Jesse Woods
8a: Read the help entry for sort()
. Now modify the code to sort the list based on the shortest last name (shortest first, longest last).
8b: Implement your own sort function, say my_sort_bubble()
, using a Bubble sort algorithm. Your function should also accept a by=
argument and work in the same way. Show the output on the same list, sorting by last name first.
8c: Now implement a version of your function my_sort_quick()
that uses the quick sort algorithm. Test it on the list of names above, using last name first.
8d: Now generate lists of a million integers via data = rand(Int,10^6)
. Compare running times of the system's sort()
, my_sort_bubble()
, and my_sort_quick()
on this list. Use the @btime
macro. Then repeat for $10^7$ and $10^8$ integers. Report your results in an organized manner. Note that for my_sort_bubble()
you can stop at at $10^4$ or $10^5$ because longer inputs will take too much time. However, try to use the large inputs for my_sort_quick()
and the system's sort()
.
Here is code that creates two random matrices and displays their matrix product:
using LinearAlgebra Random.seed!(0) A = round.(10*rand(2,3)) B = round.(10*rand(3,4)) A*B
2×4 Matrix{Float64}: 146.0 99.0 80.0 173.0 21.0 16.0 11.0 26.0
This in built multiplication is designed to be very numerically efficient. However matrix multiplication can be computed in different ways and to explore these ways you will write your own matrix multiplication functions.
For example:
function my_mult_inner_products(A,B) nA,mA = size(A) nB,mB = size(B) @assert mA == nB n, m, p = nA, mA, mB C = Array{Float64}(undef,n,p) for i in 1:n for j in 1:p C[i,j] = A[i,:]' * B[:,j] #compute inner product of i'th row of A and j'th column of B end end return C end my_mult_inner_products(A,B)
2×4 Matrix{Float64}: 146.0 99.0 80.0 173.0 21.0 16.0 11.0 26.0
However, matrix products can be represented in other ways, and each of these other ways will have their own implementation. You will now create your own three functions that execute these implementations, and in each case test they get the same result for the test matrices A
and B
above.
9a: my_mult_by_cols()
: The matrix product can be represented as the linear combination of the columns. That is, the $j$th column of $C$ is a linear combination of the columns of $A$ with weights given by the $j$th column of $B$. Implement this making use of Julia's vector arithmetic e.g.:
vec1 = [1,2,3]; vec2 = [2,0,4]; 3*vec1 + 4*vec2
3-element Vector{Int64}: 11 6 25
9b: my_mult_by_rows()
: Similarly to the above, you can take linear combinations of the rows of $B$. This representation is the "dual" of 7a. Carry this out as well.
9c: my_mult_sum_outer_products()
: A third representation is to consider outer products and observe that,
\[ A B = \sum_{j=1}^m A_{:,\,j} B_{j,\,:}^\top ~~, \]
where $A_{:,\,j}$ is the $j$th column of $A$ and $B_{j,\,:}$ is the $j$th row of $B$ (represented as a column vector and hence its transpose is a row vector).
To see such outer products (rank one matrices) in Julia, consider:
vec1 = [1,2,3]; vec2 = [2,0,4]; vec1*vec2'
3×3 Matrix{Int64}: 2 0 4 4 0 8 6 0 12
So now rank one matrices may be added up.
9d: After you have implemented the functions for 9a, 9b, and 9c, it may appear that in each of these cases there is only a single loop while the function provided, my_mult_inner_products()
has a nested loop. Does this mean that in terms of complexity of the number of the operations, 9a, 9b, and 9c are necessarily much more efficient? Explain your result.
Create your GitHub account if you don't have one.
Create a new repo for this assignment. Name the repo exactly as <<FIRST NAME1>>-<<LAST NAME1>>__<<FIRST NAME2>>-<<LAST NAME2>>2504-2023-BigHW
. So for example if your names are "Jacinda Ardern" and "Anthony Albanese" the repo name should be Jacinda-Ardern__Anthony-Albanese-2504-2022-BigHW
.
Make sure the repo is private.
Invite the course GitHub user, uqMATH2504
as a collaborator. You may do so early on while working on the assignment, and must do this no later than the assignment due date.
Do not make any changes (commits) to the repo after the assignment due date.
Create a local clone of the repo. It is recommended that use use git
command line via the shell to work on making changes/additions to the assignment and submitting the changes. However you are free to use any other mechanism (VS-Code, GitHub desktop, etc).
If you haven't install git on mac or Linux, or use GitBash if you use Windows (both team mates should do this).
Setup authentication either via personal access token or via SSH (both team mates should do this).
If for some reason you are not able or willing to use GitHub, an alternative is GitLab. This is not recommended as it adds additional work to the teaching staff. However, if you wish to use GitLab instead of GitHub contact the teaching staff for permission.
Your GitHub repo should be formatted as exactly as follows:
Have a README.md
file which states your name, the assignment title, and has a (clickable) link to the assignment instructions on the course website (to this document).
Have a LICENSE.md
file. Choose a license as you wish (for example the MIT license). However keep in mind that you must keep the submission private until the end of the semester.
Have a .gitignore
file.
Have a notebooks
directory and in it have the 4 submission forms (notebooks) I–IV.
Note: make sure that there aren't any files in your submission repo except for the submission forms I–IV (.ipynb format). Use may use the .gitignore
file to ensure git
does not commit additional files and output files to your repo
. Voice recordings and PDF files are to be uploaded to blackboard, not GitHub.
In this question you will manipulate a few files via Unix command line. If you are using Windows, it is best you do it via GitBash. If you are using Linux or Mac, just use the shell. For the question create a folder hw2_q1
and make sure that hw2_q1
is your working directory in the terminal/shell/command-line via cd
. You will then use the commands ls
, pwd
, cd
, cp
, mv
, rm
(be careful with rm
), cat
, and echo
. You'll also use output redirection to file (>
).
Your submission should be comprised of the text from the shell including the commands you used and the outputs. Just get it by copying the shell text and pasting in a file.
11a: Create the folder hw2_q1
and make that your working directory in the shell. Illustrate this via pwd
.
11b: Display your name to the terminal using the echo
command.
11c: Repeat and use output redirection (>
) to send the output of the echo command to a file my_name.txt
.
11d: Display the contents of the file using the cat
command.
11e: Inspect the length of the file using ls
. Make sure to use a flag for ls
so as to see the number of bytes in the file.
11f: Create a duplicate of the file to another file called my_name2.txt
using the cp
command.
11g: Create a folder named people
using the mkdir
command.
11h: Move my_name2.txt
into people
, where the file name will be <<YOUR-NAME>>.txt
with <<YOUR-NAME>>
being your first and last name without white space using _
. For this use the mv
command.
11i: Using the same thing you did in 1c, create three additional files of peoples names in the folder people
.
11j: Change directory into people
. Now Use mv
to move one of the people files back into hw2_q1
. Use ..
(double dot).
11k: Move another file back into hw2_q1
, but now use an absolute path to describe the destination folder.
11l: Copy one of the files in people into a file name starting with .
(single dot as prefix). For example if the file name was marie_curie.txt
, copy it into .marie_curie.txt
.
11m: Show all the files in the directory (also showing .marie_curie.txt
), where you also see the size of the files and other information. Use the correct flag(s) for ls
to show hidden files.
11n: Delete all the files in the directory using a single command. Please be very careful when deleting files!
11o: Move back to hw2_q1
and delete the directory people
. Use either rm
with a correct flag or rmdir
.
Inspect the Jupyter notebook file for practical B, practical_B_julia_essentials.ipynb
. The source URL is available by clicking "Raw" on the GitHub page. It is in JSON format.
Write a program that reads in the JSON file (directly from the web) of the Jupyter notebook and prints a summary of the notebook including:
Total number of cells.
Number of code cells.
Number of markdown cells.
A breakdown of the number of code cells to cells that have output and those that don't.
Any other summary information you find relevant (you choose what summary to print - must be at least one item).
The program should then create two CSV files, both with header rows and named markdown_summary.csv
and code_summary.csv
. Both of these files should contain a data row for each respective markdown or code cell in the notebook. The first column in each of the files should be named cell_number
and should indicate the cell number in the Jupyter notebook (this is a unique count starting at 1 within the cells). The second and third columns are:
character_count
- determines how many characters total are in the cell (include characters such as \n
and count unique characters as a single character. E.g. √
is a single character.
line_count
- The number of lines the cell uses.
The remaining columns count frequency of elements within the markdown or code of the cell. These columns will differ for markdown_summary.csv
and code_summary.csv
, and are described now.
For the markdown summary file create columns: #
, ##
, ###
, ####
, each of which count the number of times the respective element appears in the cell. For example if a cell has #
and ##
in it, each once, then the respective entries in the line for this cell should be 1, 1, 0, 0
.
For the code summary CSV file do so similarly, for the keywords: return
, for
, if
, and using
(i.e. 4 columns, counting the frequency of these keywords in each of the code cells).
Thus in summary, your program reads in a JSON file representing a Jupyter notebook, displays basic summary information, and then stores additional information to two CSV files which in principle could be further analyzed at a later time.
Your relevant submission form for the assignment should show the first few lines of these CSV files.
The lecture notes contain code that illustrates the absolute relative error (as a function of $h$) for numerical derivatives using the forward difference scheme,
\[ f'(x) \approx \frac{f(x+h) - f(x)}{h}. \]
An alternative scheme is the central difference scheme using,
\[ f'(x) \approx \frac{f(x+h/2) - f(x-h/2)}{h}. \]
13a: Consider the functions, $f_1(x) = cos(x^2)$ at $x=1/2$, $f_2(x) = e^{x^2}$ at $x=1$, and $f_3(x) = atan(x)/(1+e^{-4x^2})$ at $x=2$. Evaluate the performance of each of the schemes (forward and central) for each of these functions at the given points. In each case plot the error as a function of $h$, and find the optimal $h$. Display your results in a neat manner using one or two plots and a minimal summary.
13b: Consider now a vector valued function of the form, $f: {\mathbb R}^K \to {\mathbb R}^K$, defined as,
\[ S(z) = \frac{1}{\sum_{i=1}^{K} e^{z_i}} \begin{bmatrix} e^{z_1} & \cdots & e^{z_{K}} \end{bmatrix}^\top. \]
This function is called the softmax function. Its Jacobian is a $K \times K$ matrix with elements of the form,
\[ J_{ij} = \frac{\partial}{\partial z_j} \frac{e^{z_i}}{\sum_{i=1}^{K} e^{z_i}}%\\ = \begin{cases} S(z)_{i}(1-S(z)_{i}), & i = j, \\[10pt] -S(z)_{i}S(z)_{j}, & i \neq j. \end{cases} \]
Assume you do not know the Jacobian expression and wish to compute it for a vector $z$ of dimension $K$. How many softmax evaluations are needed with the forward difference scheme? How many softmax function evaluations are needed with the central difference scheme? Explain your answer.
13c: Consider now $z = [1^{1/3}, 2^{1/3}, \ldots, K^{1/3}]^\top$ for $K=10$, $K=20$, and $K=100$. Plot the error of the Jacobian for both the forward difference scheme and the central difference scheme where as error you consider the euclidean distance between the approximated Jacobian and the explicit Jacobian (the square root of the sum up the squares of the differences between entries of the numerical derivative and the explicit one). What is the optimal $h$ in each case?
LinearAlgebra
A discrete time, time-homogenous, finite state markov chain can be defined as a random sequence $X_0,X_1,X_2,\ldots$ where $X_i \in \{1,2,\ldots,L\}$ with $L$ being the number of states, and where the probability of $X_{n+1} = j$ given that $X_n = i$ is determined by the element $P_{ij}$ of the (stochastic) matrix $P$. This is an $L \times L$ matrix with non-negative entries and, $\sum_{j=1}^L P_{ij} = 1$ for all $i \in \{1,\ldots,L\}$. Hence each row $i$ of the matrix is the probability distribution of the next value in the sequence in case where the current value is $i$. An initial probability of $X_0$ is also given via some vector $p^0$ where $p^0_i = {\mathbb P}(X_0 = i)$.
Markov chains are very useful models for a variety of applications and at UQ are briefly introduced in STAT2003 and studied in detail in STAT3004. Here we will simply use them as a motivating example for practicing using linear algebra and no further background from these courses is needed.
We focus on irreducible, aperiodic, finite state Markov chains. Irreducible means that for every pair $(i,j)$ there exists an $n$ such that $i$-th $j$-th entry of the matrix power $P^n$ is non-negative. Aperiodic is guaranteed when the diagonal entries of $P$ are strictly positive. Irreducibility means that every state $\{1,\ldots,L\}$ is reachable from every other state and being aperiodic implies there aren't times $n$ where only a subset of the states can be visited. For such Markov chains, one important vector quantity is the stationary probability (row) $L$-vector. Here it is denoted $\pi$ with $\sum_{i=1}^L \pi_i =1$ and $\pi_i>0$ for every $i$. This vector $\pi$ counts the long term statistical occupancy of states, and can be characterized/computed in several ways:
It satisfies $\pi P = \pi$ (note that this is an undetermined system of linear equations because $(I-P)$ is singular; still the additional requirement that $\sum_{i=1}^L \pi_i =1$ ensures uniqueness of $\pi$).
For any $i$,
\[ \lim_{n \to \infty} [P^n]_{ij} = \pi_j. \]
The theory of non-negative matrices implies that $P$ has a a real eigenvalue equal to $1$. This is the eigenvalue with maximal magnitude among all eigenvalues of $P$. The vector $\pi$ is proportional to the eigenvector associated with this maximal eigenvalue ($\sum_{i=1}^L \pi_i =1$).
Given a sequence of random variables from this Markov chain $\{X_0,X_1,X_2,\ldots\}$ we estimate $\pi_i$ for large $N$ via,
\[ \pi_i \approx \frac{1}{N} \sum_{n=1}^N {\mathbf 1} \{X_n = i\}, \]
where ${\mathbf 1}\{\cdot\}$ is the indicator function equalling $1$ if the argument occurs, and $0$ otherwise. This is true for any initial probability vector $p^{0}$.
In this problem you will compute/estimate $\pi$ using methods 1-4 for a specific class of problems.
The code below generates a very special class of Markov chains where there is an explicit expression for the stationary probability vector $\pi$. The function structured_P
with input argument L
creates and $L \times L$ matrix $P$ and the function structured_π
creates an $L$-vector (column vector) $\pi$ matching this $P$:
using LinearAlgebra function structured_P(L::Int; p::Float64 = 0.45, r::Float64 = 0.01)::Matrix{Float64} q = 1 - p - r P = diagm(fill(r,L)) + diagm(-1=>fill(q,L-1)) + diagm(1 => fill(p,L-1)) P[1,1] = 1-p P[L,L] = 1-q return P end structured_π(L::Int; p::Float64 = 0.45, r::Float64 = 0.01)::Vector{Float64} = begin q = 1 - p - r [(p/q)^i for i in 1:L] * (q-p) / p / (1-(p/q)^L) #Explicit expression (birth death) end;
Here is for example the matrix $P$ for $L=5$:
P = structured_P(5)
5×5 Matrix{Float64}: 0.55 0.45 0.0 0.0 0.0 0.54 0.01 0.45 0.0 0.0 0.0 0.54 0.01 0.45 0.0 0.0 0.0 0.54 0.01 0.45 0.0 0.0 0.0 0.54 0.46
And here is the vector $\pi$ for $L=5$:
π = structured_π(5) @show sum(π) @show sum(π) ≈ 1.0 #\approx + [TAB] π
sum(π) = 0.9999999999999996 sum(π) ≈ 1.0 = true 5-element Vector{Float64}: 0.2786497527413459 0.2322081272844549 0.1935067727370457 0.16125564394753808 0.13437970328961507
We can see that it is a stationary distribution:
@show norm(π'*P - π') π'*P ≈ π'
norm(π' * P - π') = 2.7755575615628914e-17 true
You will now consider $L=2,3,4,5,10,20,30,40,50,100,200,300,400,500,1000$ and in each case use methods 1-4 above to compute/estimate $\pi$. Create 4 functions, one for each method and in each case use the function to compute/estimate $\pi$ and compare your estimated vector with the output of structured_π
. Note that methods 1-3 will use functions from LinearAlgebra while method 4 requires random variable generation. You may make use of sample
from StatsBase.jl for this, or any other means. You may choose for how many steps to run the Markov Chain ($N$). Similarly, since the second method requires you take take matrix powers $P^n$ for high $n$, you may also choose which $n$ to use.
Once your code works for the four methods, create some graphic or tabular output of your choice (one or two plots at most) which evaluates performance of each of the methods in terms of accuracy (and optional: running time). To measure accuracy use the norm
function to compute the euclidean norm between your calculated $\pi$ and that arising from the explicit solution structured_π
.
In this problem and the next we deal with ordinary differential equations (ODEs), similar to those associated with planetary motion. Specifically we focus on a very simple two body problem reduced to two one body problems. Here you will implement the Euler Method and the classic Runge–Kutta method for solving ODEs. You will use these methods on the simple problem and analyze their performance in terms of step size.
In general, take a function $f: {\mathbb R}^d \to {\mathbb R}^d$ which we called the right hand side (RHS). The ODE seeks a function $u(\cdot)$ where $u(t) \in {\mathbb R}^d$, with,
\[ \frac{d u}{dt} = f(u), \qquad \text{and} \qquad u(0) = u_0. \]
Here $u_0 \in {\mathbb R}^d$ is a given initial condition. Note also that in this formulation here, $f(\cdot)$ only depends on the state $u$ and not on time, $t$. A more general formulation allows dependence on time.
A solution $u(\cdot)$ is a function which for any $t\ge 0$ specifies the state $u(t)$ and satisfies the ODE above.
For a given ODE problem, namely a RHS $f(\cdot)$ and an initial condition $u_0$, an ODE numerical solver is an algorithm that tries to find $u(t)$ over time $t$ in some range $[0,t_{\max}]$. Euler's method works by setting $u(0) = u_0$ and following the key recursion,
\[ u(t+h) = u(t) + h f\big(u(t)\big), \]
starting with $t=0$ and incrementing $t$ by $h$ until $t\ge t_{\max}$.
The classic Runge–Kutta method, called also RK4, works via a more intricate scheme where,
\[ u(t+h) = u(t) + \frac{h}{6}(k_1 + 2k_2 + 2k_3 +k_4), \]
where,
\[ \begin{align*} k_{1} &=f\left(u(t)\right) \\ k_{2} &=f\left(u(t) + h \, \frac{k_{1}}{2}\right), \\ k_{3} &=f\left(u(t) + h \, \frac{k_{2}}{2}\right) \\ k_{4} &=f\left(u(t) + h\, k_{3}\right) . \end{align*} \]
For both algorithms the step size $h>0$ is a key parameter.
Our example ODE deals with a mass orbiting a much larger mass on a plane (a reasonable approx to the earth around the sun). This is a two body problem reduced to two one body problems. A trivial one for the barycentre and a second order differential equation for the orbit around the centre of mass. We now develop this ODE from simple physical principles (note that the physics of the problem is not crucial for solving this assignment problem).
Newtons Law states that,
\[ m \, a = F(r), \]
where $m$ is the mass of earth, $a$ is the acceleration vector, and $F(r)$ is the gravitational force vector acting on earth as a function of $r$, the distance between the earth and the sun (which is the distance between earth and the barycentre). This force, driven by a $1/r$ potential, is given via,
\[ F(r) = -\frac{m\, M\, G}{~~r^2} r' \]
where $r'$ is a unit vector pointing at the barycentre, $M$ is the mass of the sun, and $G$ is the gravitational constant. Hence the acceleration vector of earth is
\[ a = -\frac{M\, G}{~~~r^2}r'. \]
We now describe the location of earth via $x(t)$ and $y(t)$ in Cartesian coordinates, using the standard unit vectors $\hat{i}$ and $\hat{j}$. With this description, the acceleration vector is,
\[ a = \frac{d^2x}{dt^2} \hat{i} + \frac{d^2y}{dt^2} \hat{j}. \]
Now since,
\[ r = \sqrt{x^2 + y^2} \qquad \text{and} \qquad r' = \frac{x}{|r|} \hat{i} + \frac{y}{|r|} \hat{j}, \]
we get a system of second order ordinary differential equations:
\[ \left\{ \begin{align*} \frac{d^2x}{dt^2} &= -\frac{M\, G\, x}{~~r^{3}},\\ \frac{d^2y}{dt^2} &= -\frac{M\, G\, y}{~~r^{3}}. \end{align*} \right. \]
This system has an explicit ellipsoid solution.
We also expect the total energy $E$ to be conserved where $E = T(t) + V(t)$ with the kinetic energy satisfying $T(t) \propto \|v(t)\|^2$ and the potential energy satisfying and $V(t) \propto -1/r(t)$. Here "$\propto$" indicates "proportional to". Here $v(t)$ is the velocity vector comprised of $v_x(t)$ and $v_y(t)$ with,
\[ v_x = \frac{dx}{dt}, \qquad \text{and} \qquad v_y = \frac{dy}{dt}. \]
With such an explicit solution and the invariant of conservation of energy, we can now evaluate the performance of the various numerical schemes and their parameters. To do so, we also convert the system into the ODE form presented above (converting it from a second order ODE to a first order one). This is done by representing the state vector $u(t)$ with $d=4$ as $v_x(t)$, $v_y(t)$, $x(t)$, $y(t)$. Thus,
\[ \left\{ \begin{align*} \frac{dv_x}{dt} &= -\frac{M\,G\,x}{~~r^{3}},\\ \frac{dv_y}{dt} &= -\frac{M\,G\,y}{~~r^{3}},\\ \frac{dx}{dt} &= v_x, \\ \frac{dy}{dt} &= v_y. \end{align*} \right. \]
The following code specifies the function df_dt_one_body
which is the RHS for this ODE. The other four one line functions are convenience accessors to the state variables.
#These four convenience functions extract the state variable from the state vector #It is assumed the layout of the vector u is u = [v_x, v_y, x, y] state_v_x(u::Vector{Float64}) = u[1] state_v_y(u::Vector{Float64}) = u[2] state_x(u::Vector{Float64}) = u[3] state_y(u::Vector{Float64}) = u[4] """ Computes the RHS for the one body problem. """ function df_dt_one_body(u::Vector{Float64}, t::Float64)::Vector{Float64} M, G = 1, 1 #We take these constants as normalized. Naturally they would need to be set for physical values. r = sqrt(state_x(u)^2 + state_y(u)^2) return [-M*G*state_x(u)/r^3, -M*G*state_y(u)/r^3, state_v_x(u), state_v_y(u)] end;
Once you use it to obtain a solution on a time vector t
and a state (vector of vectors) u
, a function such as plot_solution
below can be used to plot the solution, also allowing you to use a title
for the plot and a label
for the series:
using Plots, Measures function plot_solution( t::AbstractArray{T}, u::Vector{Vector{Float64}}; title::String = "", label::Union{String, Bool} = false) where T x, y, v_x, v_y = state_x.(u), state_y.(u), state_v_x.(u), state_v_y.(u) #"Energy" r = @. sqrt(x^2 + y^2) E = @. 0.5*(v_x^2 + v_y^2) - 1.0/r p1 = plot( x, y, label = label, xlabel= "X", ylabel = "Y", title = title*" (position)", aspectratio=1,legend=:topleft,ylim=(-7,7)) scatter!([0], [0], ms=15, msw=0, c=:orange, shape =:star, label="Sun") scatter!([x[1]], [y[1]], ms=5, msw=0, c=:blue, shape =:circle, label="Earth initial position") p4 = plot( t, E, xlabel = "Time", ylabel = "Energy", label = label, title = title*" (energy)") plot(p1, p4, margin = 10mm,size=(800,400)) end;
Here for example we read binary data from a file (from the web), where we previously created example_trajectory.dat
by running the Euler method on the time-horizon $[0,200]$ with a step size of $h=0.01$, an initial location of $(1.5,0)$, and initial vertical velocity of $1$. We stored the binary data of the trajectory of type Vector{Vector{Float64}}
in a file using the serialization mechanism and the code below reads the binary data back into u
so it can plot it.
The plot illustrates that that trajectory deviates from the desired ellipsoid solution and that the energy is not kept constant as needed.
using Serialization, HTTP h = 0.01 t = 0:h:200 u_0 = [0., 1, 1.5, 0] #Note: the data was written to file via serialize("data/example_trajectory.dat", u) and then committed to GitHub ode_web = HTTP.request("GET","https://github.com/yoninazarathy/"* "ProgrammingCourse-with-Julia-SimulationAnalysisAndLearningSystems/"* "raw/e71c6d085e0fcf79e3737c3a1db0fc56904e3f10/data/example_trajectory.dat") u = deserialize(IOBuffer(ode_web.body)) @show typeof(u) plot_solution(t,u,title = "Euler's method", label="h=$h")
typeof(u) = Vector{Vector{Float64}}
15a: Implement the Euler's method yourself and create similar plots with $h=0.001$ and $h=0.0001$ for the same time range and initial conditions. Plot plots similar to the plot above for the three values of $h$ ($0.01$, $0.001$, and $0.0001$). The trajectory plots should appear separately, but the energy plots should appear on the same plot.
15b: Repeat by implementing the RK4 method for $h=0.01$, $h=0.1$, and $h=0.5$.
15c: Comment on the efficiency of the methods and their ability to predict the trajectory, also as a function of $h$ and comparatively.
15d: Create a single animation which jointly illustrates the trajectory of the worst trajectory and the best trajectory from the 6 trajectories above. It is recommended that the animation display the running time, and trace the path of the earth in both trajectories, ideally using different colors and any other means you find for best visibility. The animation can be placed in your GitHub submission and a link to it provided with your submission.
Note: Your solution to this question does not need to deal with binary data or use serialization as we have. Here it is simply a mechanism for obtaining data for our example plot (which you can reproduce).
The equations of motion described above for a planetary system have the special property of being a Hamiltonian System. Hamiltonian Systems are of great interest to physicists and mathematicians due to their significance in physical modelling and their special properties. A (time invariant) Hamiltonian System is one where the dynamics of the system are completely determined by a smooth scalar function $H(q,p)$, for $q$ and $p \in {\mathbb R}^{\tilde{d}}$, where the dynamics are described by the system of first order differential equations for $i=1,\ldots,{\tilde d}$:
\[ \left\{ \begin{align*} \frac{dq_{i}}{dt} &= \frac{\partial H}{\partial p_{i}},\\ \frac{dp_{i}}{dt} &= -\frac{\partial H}{\partial q_{i}}. \end{align*} \right. \]
For the planetary motion example, $\tilde{d} = 2$, and the variables $q$ and $p$ can be interpreted as the position ($x \hat{i} + y \hat{j}$) and momentum ($v_{x} \hat{i} + v_{y} \hat{j}$) vectors respectively. The Hamiltonian can be interpreted as the total energy of the system and has the special separable form in terms of kinetic ($T$) and potential ($V$) energy,
\[ \begin{align*} H(q,p) = T(p) + V(q) = H(x, y, v_{x},v_{y}) = \frac{1}{2}(v_{x}^2 + v_{y}^2) - \frac{1}{\sqrt{x^2 + y^2}}. \end{align*} \]
Note that length and time scales have been resized to remove the mass and gravitational constants. It is beyond the scope of this course but a special property of Hamiltonian systems that can be exploited for numerical methods is Sympletic Geometry. The numerical methods for ODE solutions that use this property are correspondingly known as Symplectic Integrators.
A particularly simple and effective symplectic integrator that works for variable time steps ($h$) is a leapfrog integrator in kick-drift-kick form. This takes a second order ODE for $u(t) \in {\mathbb R}^{\tilde{d}}$ of the form
\[ \frac{d^2 u}{dt^2} = f(u), \qquad \frac{du}{dt} = v, \qquad \text{and} \qquad u(0) = u_0, \qquad \frac{du}{dt}(0) = v_0. \]
The leapfrog integrator uses an intermediate velocity update, $v_{\frac{1}{2}}$, computed as
\[ v_{\frac{1}{2}} = v(t) + \frac{h}{2} \, f\big(u(t)\big). \]
This value is then used to update the system via the recursion:
\[ \begin{align*} u(t + h) &= u(t) + h \, v_{\frac{1}{2}},\\ v(t + h) &= v_{\frac{1}{2}} + \frac{h}{2}\, f\big(u(t + h)\big).\\ \end{align*} \]
16a: Implement the Second Order Leapfrog method above for yourself for the second order form of the dynamics of planetary motion. Create similar plots of the Euler and RK4 methods above with $h=0.01$, $h=0.1$, and $h=0.5$.
16b: Compare the computational effort of the Leapfrog method with Euler and RK4 methods by comparing how many floating point operations per time step each method uses in your implementation. (This is not a coding question, but a theoretical question).
16c: Choose a range to time horizons to simulate over, comment on the trend of the accuracy of each method for long simulation runs. Summarize this analysis in a single compelling plot.
16d: The Julia package DifferentialEquations.jl has support for various state of the art techniques, including symplectic integrators. Choose a state of the art standard ODE solver and a symplectic integrator and use it to find a solution for this ODE system. Compare results with your own implementation of leapfrog and comment on the results.
This question presents you with an additional 10 bonus points.
Write a 1-4 paragraph summary of Megan's perspective seminar. The writeup should answer these questions without treating the questions directly. That is, don't write "the answer to Q1 is..." etc. Instead incorporate the answers in a flowing writeup report (like a blog post or letter). You may use first person and refer to the speaker as "Dr. Dawson", or "Megan" or similar. Present the writeup as part of your PDF file submission (as the first item in the submission).
The writeup should address the following questions (in paragraph form):
What was Mega's career path, and her relationship with engineering, software, mathematics, and statistics?
Where does she currently work? What does she do in her job? How do you perceive her personal experience from her job?
Discuss a few key tools that she spoke about during the talk, such as programming languages, mathematics, machine learning, statistical tools, platforms, and the like. Was this the first time that you heard about any of these tools?
Have you picked up any tips from Megan's talk? Of these, which can perhaps be of use in your career down the road?
Feel free to state anything else that you found interesting from the talk.