https://cuvids.io/video/302/watch/

https://introcs.cs.princeton.edu/java/code/
Here are the standard input and output libraries that we use throughout the textbook.
Below is a table of the Java programs in the textbook. Click on the program name to access the Java code; click on the reference number for a brief description; read the textbook for a full discussion. You can download all of the programs as introcs.jar and the data as introcs-data.zip; you can also download all of the programs and data as the IntelliJ project IntroCS.zip.
1 | ELEMENTS OF PROGRAMMING | |
---|---|---|
1.1.1 | HelloWorld.java | Hello, World |
1.1.2 | UseArgument.java | using a command-line argument |
1.2.1 | Ruler.java | string concatenation example |
1.2.2 | IntOps.java | integer multiplication and division |
1.2.3 | Quadratic.java | quadratic formula |
1.2.4 | LeapYear.java | leap year |
1.2.5 | RandomInt.java | casting to get a random integer |
1.3.1 | Flip.java | flippling a fair coin |
1.3.2 | TenHellos.java | your first while loop |
1.3.3 | PowersOfTwo.java | computing powers of 2 |
1.3.4 | DivisorPattern.java | your first nested loops |
1.3.5 | HarmonicNumber.java | harmonic numbers |
1.3.6 | Sqrt.java | Newton's method |
1.3.7 | Binary.java | converting to binary |
1.3.8 | Gambler.java | gambler's ruin simulation |
1.3.9 | Factors.java | factoring integers |
1.4.1 | Sample.java | sampling without replacement |
1.4.2 | CouponCollector.java | coupon collector simulation |
1.4.3 | PrimeSieve.java | sieve of Eratosthenes |
1.4.4 | SelfAvoidingWalk.java | self-avoiding random walks |
1.5.1 | RandomSeq.java | generating a random sequence |
1.5.2 | TwentyQuestions.java | interactive user input |
1.5.3 | Average.java | averaging a stream of numbers |
1.5.4 | RangeFilter.java | a simple filter |
1.5.5 | PlotFilter.java | standard input-to-drawing filter |
1.5.6 | BouncingBall.java | bouncing ball |
1.5.7 | PlayThatTune.java | digital signal processing |
1.6.1 | Transition.java | computing the transition matrix |
1.6.2 | RandomSurfer.java | simulating a random surfer |
1.6.3 | Markov.java | mixing a Markov chain |
2 | FUNCTIONS | |
2.1.1 | Harmonic.java | harmonic numbers (revisited) |
2.1.2 | Gaussian.java | Gaussian functions |
2.1.3 | Coupon.java | coupon collector (revisited) |
2.1.4 | PlayThatTuneDeluxe.java | play that tune (revisited) |
2.2.1 | StdRandom.java | random number library |
2.2.2 | StdArrayIO.java | array I/O library |
2.2.3 | IFS.java | iterated function systems |
2.2.4 | StdStats.java | data analysis library |
2.2.5 | StdStats.java | data analysis library |
2.2.6 | Bernoulli.java | Bernoulli trials |
2.3.1 | Euclid.java | Euclid's algorithm |
2.3.2 | TowersOfHanoi.java | towers of Hanoi |
2.3.3 | Beckett.java | Gray code |
2.3.4 | Htree.java | recursive graphics |
2.3.5 | Brownian.java | Brownian bridge |
2.3.6 | LongestCommonSubsequence.java | longest common subsequence |
2.4.1 | Percolation.java | percolation scaffolding |
2.4.2 | VerticalPercolation.java | vertical percolation |
2.4.3 | PercolationVisualizer.java | percolation visualization client |
2.4.4 | PercolationProbability.java | percolation probability estimate |
2.4.5 | Percolation.java | percolation detection |
2.4.6 | PercolationPlot.java | adaptive plot client |
3 | OBJECT ORIENTED PROGRAMMING | |
3.1.1 | PotentialGene.java | identifying a potential gene |
3.1.2 | AlbersSquares.java | Albers squares |
3.1.3 | Luminance.java | luminance library |
3.1.4 | Grayscale.java | converting color to grayscale |
3.1.5 | Scale.java | image scaling |
3.1.6 | Fade.java | fade effect |
3.1.7 | Cat.java | concatenating files |
3.1.8 | StockQuote.java | screen scraping for stock quotes |
3.1.9 | Split.java | splitting a file |
3.2.1 | Charge.java | charged-particle data type |
3.2.2 | Stopwatch.java | stopwatch data type |
3.2.3 | Histogram.java | histogram data type |
3.2.4 | Turtle.java | turtle graphics data type |
3.2.5 | Spiral.java | spira mirabilis |
3.2.6 | Complex.java | complex number data type |
3.2.7 | Mandelbrot.java | Mandelbrot set |
3.2.8 | StockAccount.java | stock account data type |
3.3.1 | Complex.java | complex number data type (revisited) |
3.3.2 | Counter.java | counter data type |
3.3.3 | Vector.java | spatial vector data type |
3.3.4 | Sketch.java | document sketch data type |
3.3.5 | CompareDocuments.java | similarity detection |
3.4.1 | Body.java | gravitational body data type |
3.4.2 | Universe.java | n-body simulation |
4 | DATA STRUCTURES | |
4.1.1 | ThreeSum.java | 3-sum problem |
4.1.2 | DoublingTest.java | validating a doubling hypothesis |
4.2.1 | Questions.java | binary search (20 questions) |
4.2.2 | Gaussian.java | bisection search |
4.2.3 | BinarySearch.java | binary search (in a sorted array) |
4.2.4 | Insertion.java | insertion sort |
4.2.5 | InsertionTest.java | doubling test for insertion sort |
4.2.6 | Merge.java | mergesort |
4.2.7 | FrequencyCount.java | frequency counts |
4.3.1 | ArrayStackOfStrings.java | stack of strings (array) |
4.3.2 | LinkedStackOfStrings.java | stack of strings (linked list) |
4.3.3 | ResizingArrayStackOfStrings.java | stack of strings (resizing array) |
4.3.4 | Stack.java | generic stack |
4.3.5 | Evaluate.java | expression evaluation |
4.3.6 | Queue.java | generic queue |
4.3.7 | MM1Queue.java | M/M/1 queue simulation |
4.3.8 | LoadBalance.java | load balancing simulation |
4.4.1 | Lookup.java | dictionary lookup |
4.4.2 | Index.java | indexing |
4.4.3 | HashST.java | hash table |
4.4.4 | BST.java | binary search tree |
4.4.5 | DeDup.java | dedup filter |
– | ST.java | symbol table data type |
– | SET.java | set data type |
4.5.1 | Graph.java | graph data type |
4.5.2 | IndexGraph.java | using a graph to invert an index |
4.5.3 | PathFinder.java | shortest-paths client |
4.5.4 | PathFinder.java | shortest-paths implementation |
4.5.5 | SmallWorld.java | small-world test |
4.5.6 | Performer.java | performer–performer graph |
Here is a list of solutions to selected coding exercises.
1 | ELEMENTS OF PROGRAMMING | |
---|---|---|
1.1.1 | TenHelloWorlds.java | ten Hello, Worlds |
1.1.5 | UseThree.java | three command-line arguments |
1.2.20 | SumOfTwoDice.java | sum of two dice |
1.2.23 | SpringSeason.java | is month and day in Spring? |
1.2.25 | WindChill.java | compute wind chill factor |
1.2.26 | CartesianToPolar.java | Cartesian to polar coordinates |
1.2.29 | DayOfWeek.java | compute day of week from date |
1.2.30 | Stats5.java | average, min, max of 5 random numbers |
1.2.34 | ThreeSort.java | sort three integers |
1.2.35 | Dragon.java | dragon curve of order 5 |
1.3.8 | FivePerLine.java | print integers five per line |
1.3.11 | FunctionGrowth.java | table of functions |
1.3.12 | DigitReverser.java | reverse digits |
1.3.13 | Fibonacci.java | Fibonacci numbers |
1.3.15 | SeriesSum.java | convergent sum |
1.3.31 | Ramanujan.java | taxicab numbers |
1.3.32 | ISBN.java | ISBN checksum |
1.3.38 | Sin.java | sine function via Taylor series |
1.3.41 | MonteHall.java | Monte Hall problem |
1.4.2 | HugeArray.java | creating a huge array |
1.4.10 | Deal.java | deal poker hands |
1.4.13 | Transpose.java | tranpose a square matrix |
1.4.25 | InversePermutation.java | compute inverse permutation |
1.4.26 | Hadamard.java | compute Hadamard matrix |
1.4.30 | Minesweeper.java | create Minesweeper board |
1.4.33 | RandomWalkers.java | N random walkers |
1.4.35 | Birthdays.java | birthday problem |
1.4.37 | BinomialCoefficients.java | binomial coefficients |
1.5.1 | MaxMin.java | max and min from standard input |
1.5.3 | Stats.java | mean and stddev from standard input |
1.5.5 | LongestRun.java | longest consecutive run from stdin |
1.5.11 | WordCount.java | word count from standard input |
1.5.15 | Closest.java | closest point |
1.5.18 | Checkerboard.java | draw a checkerboard |
1.5.21 | Rose.java | draw a rose |
1.5.22 | Banner.java | animate a text banner |
1.5.31 | Spirograph.java | draw spirograph |
1.5.32 | Clock.java | animate a clock |
1.5.33 | Oscilloscope.java | simulate an oscilloscope |
2 | FUNCTIONS | |
2.1.4 | ArrayEquals.java | are two integer arrays equal? |
2.1.30 | BlackScholes.java | Black-Scholes option valuation |
2.1.32 | Horner.java | Horner's method to evaluate a polynomial |
2.1.33 | Benford.java | Benford's law |
2.1.38 | Calendar.java | create a calendar |
2.2.1 | Gaussian.java | overloaded gaussian distribution functions |
2.2.2 | Hyperbolic.java | hyperbolic trig functions |
2.2.4 | StdRandom.java | shuffle an array of doubles |
2.2.6 | StdArrayIO.java | array IO methods |
2.2.11 | Matrix.java | matrix operations |
2.2.12 | MarkovSquaring.java | page rank via matrix squaring |
2.2.14 | StdRandom.java | exponential random variable |
2.3.14 | AnimatedHtree.java | animated H-tree |
2.3.15 | IntegerToBinary.java | integer to binary conversion |
2.3.17 | Permutations.java | all permutations |
2.3.18 | PermutationsK.java | all permutations of size k |
2.3.19 | Combinations.java | all combinations |
2.3.20 | CombinationsK.java | all combinations of size k |
2.3.22 | RecursiveSquares.java | recursive squares |
2.3.24 | GrayCode.java | Gray code |
2.3.26 | AnimatedHanoi.java | animated Towers of Hanoi |
2.3.29 | Collatz.java | Collatz function |
2.3.30 | BrownianIsland.java | Brownian island |
2.3.31 | PlasmaCloud.java | plama cloud |
2.3.32 | McCarthy.java | McCarthy's 91 function |
2.3.33 | Tree.java | fractal tree |
2.4.15 | PercolationDirectedNonrecursive.java | directed percolation |
Copyright © 2000–2019 Robert Sedgewick and Kevin Wayne. All rights reserved.
Below is a table of the booksite modules that we use throughout the textbook and booksite and beyond.
SECTION | MODULE | DESCRIPTION |
---|---|---|
1.5 | stdio.py | functions to read/write numbers and text from/to stdin and stdout |
1.5 | stddraw.py | functions to draw geometric shapes |
1.5 | stdaudio.py | functions to create, play, and manipulate sound |
2.2 | stdrandom.py | functions related to random numbers |
2.2 | stdarray.py | functions to create, read, and write 1D and 2D arrays |
2.2 | stdstats.py | functions to compute and plot statistics |
3.1 | color.py | data type for colors |
3.1 | picture.py | data type to process digital images |
3.1 | instream.py | data type to read numbers and text from files and URLs |
3.1 | outstream.py | data type to write numbers and text to files |
If you followed the instructions provided in this booksite (for Windows, Mac OS X, or Linux), then the booksite modules are installed on your computer. If you want to see the source code for the booksite modules, then click on the links in the above table, or download and unzip stdlib-python.zip.
Below is a table of the Python programs and data sets used in the textbook. Click on the program name to access the Python code; click on the data set name to access the data set; read the textbook for a full discussion. You can download all of the programs as introcs-python.zip and the data as introcs-data.zip.
1 | ELEMENTS OF PROGRAMMING | DATA | |
---|---|---|---|
1.1.1 | helloworld.py | Hello, World | – |
1.1.2 | useargument.py | using a command-line argument | – |
1.2.1 | ruler.py | string concatenation example | – |
1.2.2 | intops.py | integer operators | – |
1.2.3 | floatops.py | float operators | – |
1.2.4 | quadratic.py | quadratic formula | – |
1.2.5 | leapyear.py | leap year | – |
1.3.1 | flip.py | flipping a fair coin | – |
1.3.2 | tenhellos.py | your first loop | – |
1.3.3 | powersoftwo.py | computing powers of two | – |
1.3.4 | divisorpattern.py | your first nested loops | – |
1.3.5 | harmonic.py | harmonic numbers | – |
1.3.6 | sqrt.py | Newton's method | – |
1.3.7 | binary.py | converting to binary | – |
1.3.8 | gambler.py | gambler's ruin simulation | – |
1.3.9 | factors.py | factoring integers | – |
1.4.1 | sample.py | sampling without replacement | – |
1.4.2 | couponcollector.py | coupon collector simulation | – |
1.4.3 | primesieve.py | sieve of Eratosthenes | – |
1.4.4 | selfavoid.py | self-avoiding random walks | – |
1.5.1 | randomseq.py | generating a random sequence | – |
1.5.2 | twentyquestions.py | interactive user input | – |
1.5.3 | average.py | averaging a stream of numbers | – |
1.5.4 | rangefilter.py | a simple filter | – |
1.5.5 | plotfilter.py | standard input to draw filter | usa.txt |
1.5.6 | bouncingball.py | bouncing ball | – |
1.5.7 | playthattune.py | digital signal processing | elise.txt ascale.txt stairwaytoheaven.txt entertainer.txt firstcut.txt freebird.txt looney.txt |
1.6.1 | transition.py | computing the transition matrix | small.txt medium.txt |
1.6.2 | randomsurfer.py | simulating a random surfer | – |
1.6.3 | markov.py | mixing a Markov chain | – |
2 | FUNCTIONS | DATA | |
2.1.1 | harmonicf.py | harmonic numbers (revisited) | – |
2.1.2 | gauss.py | Gaussian functions | – |
2.1.3 | coupon.py | coupon collector (revisited) | – |
2.1.4 | playthattunedeluxe.py | play that tune (revisited) | elise.txt ascale.txt stairwaytoheaven.txt entertainer.txt firstcut.txt freebird.txt looney.txt |
2.2.1 | gaussian.py | Gaussian functions module | – |
2.2.2 | gaussiantable.py | sample Gaussian client | – |
2.2.3 | sierpinski.py | Sierpinski triangle | – |
2.2.4 | ifs.py | iterated function systems | sierpinski.txt barnsley.txt coral.txt culcita.txt cyclosorus.txt dragon.txt fishbone.txt floor.txt koch.txt spiral.txt swirl.txt tree.txt zigzag.txt |
2.2.5 | bernoulli.py | Bernoulli trials | – |
2.3.1 | euclid.py | Euclid's algorithm | – |
2.3.2 | towersofhanoi.py | towers of Hanoi | – |
2.3.3 | beckett.py | Gray code | – |
2.3.4 | htree.py | recursive graphics | – |
2.3.5 | brownian.py | Brownian bridge | – |
2.4.1 | percolationv.py | vertical percolation detection | test5.txt test8.txt |
2.4.2 | percolationio.py | percolation support functions | – |
2.4.3 | visualizev.py | vertical percolation visualization client | – |
2.4.4 | estimatev.py | vertical percolation probability estimate | – |
2.4.5 | percolation.py | percolation detection | test5.txt test8.txt |
2.4.6 | visualize.py | percolation visualization client | – |
2.4.7 | estimate.py | percolation probability estimate | – |
3 | OBJECT ORIENTED PROGRAMMING | DATA | |
3.1.1 | potentialgene.py | potential gene identification | – |
3.1.2 | chargeclient.py | charged particle client | – |
3.1.3 | alberssquares.py | Albers squares | – |
3.1.4 | luminance.py | luminance library | – |
3.1.5 | grayscale.py | converting color to grayscale | mandrill.jpg mandrill.png darwin.jpg darwin.png |
3.1.6 | scale.py | image scaling | mandrill.jpg mandrill.png darwin.jpg darwin.png |
3.1.7 | fade.py | fade effect | mandrill.jpg mandrill.png darwin.jpg darwin.png |
3.1.8 | potential.py | visualizing electric potential | charges.txt |
3.1.9 | cat.py | concatenating files | in1.txt in2.txt |
3.1.10 | stockquote.py | screen scraping for stock quotes | – |
3.1.11 | split.py | splitting a file | djia.csv |
3.2.1 | charge.py | charged-particle data type | – |
3.2.2 | stopwatch.py | stopwatch data type | – |
3.2.3 | histogram.py | histogram data type | – |
3.2.4 | turtle.py | turtle graphics data type | – |
3.2.5 | koch.py | Koch curve | – |
3.2.6 | spiral.py | spira mirabilis | – |
3.2.7 | drunk.py | drunken turtle | – |
3.2.8 | drunks.py | drunken turtles | – |
3.2.9 | complex.py | complex number data type | – |
3.2.10 | mandelbrot.py | Mandelbrot set | – |
3.2.11 | stockaccount.py | stock account data type | turing.txt |
3.3.1 | complexpolar.py | complex numbers (revisited) | – |
3.3.2 | counter.py | counter data type | – |
3.3.3 | vector.py | spatial vector data type | – |
3.3.4 | sketch.py | sketch data type | genome20.txt |
3.3.5 | comparedocuments.py | similarity detection | documents.txt constitution.txt tomsawyer.txt huckfinn.txt prejudice.txt djia.csv amazon.html actg.txt |
3.4.1 | body.py | gravitational body data type | – |
3.4.2 | universe.py | n-body simulation | 2body.txt 3body.txt 4body.txt 2bodytiny.txt |
4 | DATA STRUCTURES | DATA | |
4.1.1 | threesum.py | 3-sum problem | 8ints.txt 1kints.txt 2kints.txt 4kints.txt 8kints.txt 16kints.txt 32kints.txt 64kints.txt 128kints.txt |
4.1.2 | doublingtest.py | validating a doubling hypothesis | – |
4.1.3 | timeops.py | timing operators and functions | – |
4.1.4 | bigarray.py | discovering memory capacity | – |
4.2.1 | questions.py | binary search (20 questions) | – |
4.2.2 | bisection.py | binary search (inverting a function) | – |
4.2.3 | binarysearch.py | binary search (sorted array) | emails.txt white.txt |
4.2.4 | insertion.py | insertion sort | tiny.txt tomsawyer.txt |
4.2.5 | timesort.py | doubling test for sorting functions | – |
4.2.6 | merge.py | mergesort | tiny.txt tomsawyer.txt |
4.2.7 | frequencycount.py | frequency counts | leipzig100k.txt leipzig200k.txt leipzig1m.txt |
4.3.1 | arraystack.py | stack (resizing array implementation) | tobe.txt |
4.3.2 | linkedstack.py | stack (linked list implementation) | tobe.txt |
4.3.3 | evaluate.py | expression evaluation | expression1.txt expression2.txt |
4.3.4 | linkedqueue.py | queue (linked list implementation) | tobe.txt |
4.3.5 | mm1queue.py | M/M/1 queue simulation | – |
4.3.6 | loadbalance.py | load balancing simulation | – |
4.4.1 | lookup.py | dictionary lookup | amino.csv djia.csv elements.csv ip.csv ip-by-country.csv morse.csv phone-na.csv |
4.4.2 | index.py | indexing | mobydick.txt tale.txt |
4.4.3 | hashst.py | hash symbol table data type | – |
4.4.4 | bst.py | BST symbol table data type | – |
4.5.1 | graph.py | graph data type | tinygraph.txt |
4.5.2 | invert.py | using a graph to invert an index | tinygraph.txt movies.txt |
4.5.3 | separation.py | shortest-paths client | routes.txt movies.txt |
4.5.4 | pathfinder.py | shortest-paths client | – |
4.5.5 | smallworld.py | small-world test | tinygraph.txt |
4.5.6 | performer.py | performer-performer graph | tinymovies.txt moviesg.txt |
https://introcs.cs.princeton.edu/java/assignments/sequence.html
COS 126
Global Sequence Alignment |
Programming Assignment |
Write a program to compute the optimal sequence alignment of two DNA strings. This program will introduce you to the emerging field of computational biology in which computers are used to do research on biological systems. Further, you will be introduced to a powerful algorithmic design paradigm known as dynamic programming.
Biology review. A genetic sequence is a string formed from a four-letter alphabet {Adenine (A), Thymine (T), Guanine (G), Cytosine (C)} of biological macromolecules referred to together as the DNA bases. A gene is a genetic sequence that contains the information needed to construct a protein. All of your genes taken together are referred to as the human genome, a blueprint for the parts needed to construct the proteins that form your cells. Each new cell produced by your body receives a copy of the genome. This copying process, as well as natural wear and tear, introduces a small number of changes into the sequences of many genes. Among the most common changes are the substitution of one base for another and the deletion of a substring of bases; such changes are generally referred to as point mutations. As a result of these point mutations, the same gene sequenced from closely related organisms will have slight differences.
The problem. Through your research you have found the following sequence of a gene in a previously unstudied organism.
A A C A G T T A C C
What is the function of the protein that this gene encodes? You could begin a series of uninformed experiments in the lab to determine what role this gene plays. However, there is a good chance that it is a variant of a known gene in a previously studied organism. Since biologists and computer scientists have laboriously determined (and published) the genetic sequence of many organisms (including humans), you would like to leverage this information to your advantage. We'll compare the above genetic sequence with one which has already been sequenced and whose function is well understood.
T A A G G T C A
If the two genetic sequences are similar enough, we might expect them to have similar functions. We would like a way to quantify "similar enough."
Edit-distance. In this assignment we will measure the similarity of two genetic sequences by their edit distance, a concept first introduced in the context of coding theory, but which is now widely used in spell checking, speech recognition, plagiarism detection, file revisioning, and computational linguistics. We align the two sequences, but we are permitted to insert gaps in either sequence (e.g., to make them have the same length). We pay a penalty for each gap that we insert and also for each pair of characters that mismatch in the final alignment. Intuitively, these penalties model the relative likeliness of point mutations arising from deletion/insertion and substitution. We produce a numerical score according to the following simple rule, which is widely used in biological applications:
Penalty Cost Per gap 2 Per mismatch 1 Per match 0
As an example, two possible alignments of AACAGTTACC and TAAGGTCA are:
Sequence 1 A A C A G T T A C C Sequence 2 T A A G G T C A - - Penalty 1 0 1 1 0 0 1 0 2 2
Sequence 1 A A C A G T T A C C Sequence 2 T A - A G G T - C A Penalty 1 0 2 0 0 1 0 2 0 1
The first alignment has a score of 8, while the second one has a score of 7. The edit-distance is the score of the best possible alignment between the two genetic sequences over all possible alignments. In this example, the second alignment is in fact optimal, so the edit-distance between the two strings is 7. Computing the edit-distance is a nontrivial computational problem because we must find the best alignment among exponentially many possibilities. For example, if both strings are 100 characters long, then there are more than 10^75 possible alignments.
A recursive solution. We will calculate the edit-distance between the two original strings x and y by solving many edit-distance problems on the suffixes of the two strings. We use the notation x[i] to refer to character i of the string. We also use the notation x[i..M] to refer to the suffix of x consisting of the characters x[i], x[i+1], ..., x[M-1]. Finally, we use the notation opt[i][j] to denote the edit distance of x[i..M] and y[j..N]. For example, consider the two strings x = "AACAGTTACC" and y = "TAAGGTCA" of length M = 10 and N = 8, respectively. Then, x[2] is 'C', x[2..M] is "CAGTTACC", and y[8..N] is the empty string. The edit distance of x and y is opt[0][0].
Now we describe a recursive scheme for computing the edit distance of x[i..M] and y[j..N]. Consider the first pair of characters in an optimal alignment of x[i..M] with y[j..N]. There are three possibilities:
The key observation is that all of the resulting subproblems are sequence alignment problems on suffixes of the original inputs. To summarize, we can compute opt[i][j] by taking the minimum of three quantities:
opt[i][j] = min { opt[i+1][j+1] + 0/1, opt[i+1][j] + 2, opt[i][j+1] + 2 }
This equation works assuming i < M and j < N. Aligning an empty string with another string of length k requires inserting k gaps, for a total cost of 2k. Thus, in general we should set opt[M][j] = 2(N-j) and opt[i][N] = 2(M-i). For our example, the final matrix is:
| 0 1 2 3 4 5 6 7 8 x\y | T A A G G T C A - ----------------------------------- 0 A | 7 8 10 12 13 15 16 18 20 1 A | 6 6 8 10 11 13 14 16 18 2 C | 6 5 6 8 9 11 12 14 16 3 A | 7 5 4 6 7 9 11 12 14 4 G | 9 7 5 4 5 7 9 10 12 5 T | 8 8 6 4 4 5 7 8 10 6 T | 9 8 7 5 3 3 5 6 8 7 A | 11 9 7 6 4 2 3 4 6 8 C | 13 11 9 7 5 3 1 3 4 9 C | 14 12 10 8 6 4 2 1 2 10 - | 16 14 12 10 8 6 4 2 0
By examining opt[0][0], we conclude that the edit distance of x and y is 7.
A dynamic programming approach. A direct implementation of the above recursive scheme will work, but it is spectacularly inefficient. If both input strings have N characters, then the number of recursive calls will exceed 2^N. To overcome this performance bug, we use dynamic programming. (Read the first section of Section 9.6 for an introduction to this technique.) Dynamic programming is a powerful algorithmic paradigm, first introduced by Bellman in the context of operations research, and then applied to the alignment of biological sequences by Needleman and Wunsch. Dynamic programming now plays the leading role in many computational problems, including control theory, financial engineering, and bioinformatics, including BLAST (the sequence alignment program almost universally used by molecular biologist in their experimental work). The key idea of dynamic programming is to break up a large computational problem into smaller subproblems, store the answers to those smaller subproblems, and, eventually, use the stored answers to solve the original problem. This avoids recomputing the same quantity over and over again. Instead of using recursion, use a nested loop that calculates opt[i][j] in the right order so that opt[i+1][j+1], opt[i+1][j], and opt[i][j+1] are all computed before we try to compute opt[i][j].
Recovering the alignment itself. The above procedure describes how to compute the edit distance between two strings. We now outline how to recover the optimal alignment itself. The key idea is to retrace the steps of the dynamic programming algorithm backwards, re-discovering the path of choices (highlighted in red in the table above) from opt[0][0] to opt[M][N]. To determine the choice that led to opt[i][j], we consider the three possibilities:
Depending on which of the three cases apply, we move diagonally, down, or right towards opt[M][N], printing out x[i] aligned with y[j] (case 1), x[i] aligned with a gap (case 2), or y[j] aligned with a gap (case 3). In the example above, we know that the first T aligns with the first A because opt[0][0] = opt[1][1] + 1, but opt[0][0] ≠ opt[1][0] + 2 and opt[0][0] ≠ opt[0][1] + 2. The optimal alignment is:
A A C A G T T A C C T A - A G G T - C A 1 0 2 0 0 1 0 2 0 1
Analysis. Test your program using the example provided above, as well as the genomic data sets referred to in the checklist. Estimate the running time and memory usage of your program as a function of the lengths of the two input strings M and N. For simplicity, assume M = N in your analysis.
Submission. Submit the files: EditDistance.java and readme.txt.
This assignment was created by Thomas Clarke, Robert Sedgewick, Scott Vafai and Kevin Wayne.
Copyright © 2002.This assignment allows optional partnering. If you choose to do this, you must follow the pair programming guidelines. Please click the link and review them before you begin. Your partner can be from another precept (but ISC students may only partner with other ISC students). Please note that writing code with a partner without following the pair programming instructions is a violation of the course collaboration policy. All writing of code, comments, analysis and uploading to the dropbox should be done together from start to finish. If you come to office hours alone, you can get advice, but you may not change any code until both partners are together.
What are the main goals of this assignment? You will (i) solve a fundamental problem in computational biology, (ii) learn about the analysis of algorithms, and (iii) learn about a powerful programming paradigm known as dynamic programming.
How do I read in the two input strings from the file? Use StdIn.readString() and redirection as usual.
How do I access the length of a string s? The ith character? Use s.length() and s.charAt(i), respectively. As with arrays, indices start at 0. We'll learn about this notation for manipulating (String) objects in Section 3.1. For this assignment, this is all you'll need to know about objects.
Can I assume that the input characters will always be A, C, G or T? NO! Your program should work equally well for any letter, upper case or lower case.
What's a StringIndexOutOfBoundsException? It's just like an ArrayOutOfBoundsException. It results from invoking s.charAt(i) with an illegal value of i.
How could I get a NullPointerException? Did you forget to allocate memory for opt[][]?
How do I declare and initialize a two dimensional array in Java? Review the end of Section 1.4 in Intro to Programming.
It seems strange to define x[i..M] to be the substring consisting of x[i] through x[M-1] instead of x[i] through x[M]. Is this a typo? It's a little strange, but no, it's not a typo. It's consistent with Java's indexing notation where the left endpoint is inclusive and the right endpoint is exclusive.
Which alignment should I print out if there are two or more optimal ones? Output any one you like.
Should gaps be handled by penalty()? The solution we think is clearest does not call penalty() on gaps, since a gap is not a character. Of course, the alignment output should use the '-'symbol to denote a gap, and the inputs we give you will never contain '-' (only alphanumeric characters are used), so if you find it convenient for penalty() to recognize '-' as a gap, you are permitted to do so.
Where can I learn more about dynamic programming and backtracking? The LCS (longest common subsequence) problem described in booksite section 9.6 is an example of a dynamic programming problem on strings with backtracking. However, it is different from the current problem in many ways, so do not simply mimic the code without understanding what it does. The websheet exercises for this week include several dynamic programming exercises starting from a basic level, including KnapsackBacktrack which is an example of backtracking.
What does OutOfMemoryError mean? When Java runs, it requests a certain amount of memory from the operating system. The exact amount depends on the version of Java and your computer, but can vary from 64MB to 1024MB (1GB). After Java has started, the total size of all variables in use cannot be larger than what it originally requested. Trying to do so causes anOutOfMemoryError.
For this assignment, the largest test cases use huge arrays, and Java needs to ask for enough memory from the operating system. To explicitly ask for for more (or less) memory, use the -Xmxflag. For example, to request 500 megabytes (500 MB) of memory for a run, use
java-introcs -Xmx500m EditDistance < input.txt
Here 500m means 500 MB. You should adjust this number depending on the amount of memory your computer has and the size of the arrays you will need for the data set you are running. The amount 500MB should get you through ecoli10000.txt. To run ecoli20000.txt you will need to request more memory. (How much? The readme asks you to estimate this.)
What does "Could not reserve enough space for object heap" mean? This occurs if you use -Xmx with a value that is larger than the amount of available physical memory. Additionally, due to address space limitations, some 32-bit versions of Windows also will give this error if you try to request more than approximately 1.5GB, no matter how much physical memory is installed.
How do I determine how much physical memory is installed on my computer? On Mac, select About this Mac from the Apple menubar. On Windows, press Windows-R (or Run on the Start menu), enter msinfo32
and look for total physical memory.
How can I measure how long my program takes on each file? To measure the running time of your program, there are a few techniques.
java-introcs -Xmx500m EditDistance < input.txt > output.txt
and use a stopwatch. We redirect the output to a file to prevent printing text from becoming a bottleneck.
java-introcs -Xprof -Xmx500m EditDistance < input.txt > output.txt
The timing information will appear at the end of the file output.txt, and you want the "flat profile" for main. The line will look like
Flat profile of 4.5 secs (15 total ticks): main
but these numbers are made up and yours will be different. We don't care about the ticks.Piping can be useful here. You can skip the output file in the previous step by piping your output to another program that will look for "Flat profile". and it should print out the time (and throw away all the other program output). On a Mac, run
java-introcs -Xprof -Xmx500m EditDistance < input.txt | grep "main"
On Windows, use find instead of grep:
java-introcs -Xprof -Xmx500m EditDistance < input.txt | find "main"
This find/grep command searches through all of whatever text it is fed and only prints out the lines containing the text main. Type man grep (in Terminal) or find /? (in Command Prompt) for more information.
How do I use a cluster machine in Friend 016/017? See this page for general instructions. Please also read the first bullet point for the question immediately below.
My timing data do not fit a polynomial hypothesis. What could I be doing wrong?
Testing. To help you check the part of your program that generates the alignment, there are many test files in the sequence directory.
ecoli2500.txt 118 ecoli5000.txt 160 fli8.txt 6 fli9.txt 4 fli10.txt 2 ftsa1272.txt 758 gene57.txt 8 stx1230.txt 521 stx19.txt 10 stx26.txt 17 stx27.txt 19
% java-introcs EditDistance < endgaps7.txt % java-introcs EditDistance < fli10.txt Edit distance = 4 Edit distance = 2 a - 2 T T 0 t t 0 G G 0 a a 0 G G 0 t t 0 C T 1 t t 0 G G 0 a a 0 G G 0 t t 0 A T 1 - a 2 A A 0 C C 0 T T 0
These are purely suggestions for how you might make progress. You do not have to follow these steps.
// return the penalty for aligning character a with character b public static int penalty(char a, char b) // return the min of 3 integers public static int min(int a, int b, int c)
You will call these from your main method to compute penalties and to determine which of the three cases yields the minimum edit distance.
System.out.printf("%3d", opt[i][j]);
with nested for loops. Remember to remove this debugging print section before submitting the final version of your program.
Test this part thoroughly. For example, one corner case to test is to make sure that you print out the ENTIRE alignment, even when one sequence finishes before the other. (Use lastXgaps9.txt and lastYgaps9.txt to test.)
private static int MISMATCH = 1;
(and two other similar ones for gaps and matches). This makes your code easy to use if a different set of penalties is desired.
opt
matrix?) With more work, you can also compute the optimal alignment in linear space (and quadratic time). This is known as Hirschberg's algorithm (1975).The idea of calling one function from another immediately suggests the possibility of a function calling itself. The function-call mechanism in Java supports this possibility, which is known as recursion.
The "Hello, World" for recursion is the factorial function, which is defined for positive integers n by the equation
n!=n×(n−1)×(n−2)×…×2×1n!=n×(n−1)×(n−2)×…×2×1
The quantity n! is easy to compute with a for loop, but an even easier method in Factorial.java is to use the following recursive function:
public static long factorial(int n) { if (n == 1) return 1; return n * factorial(n-1); }
We can trace this computation in precisely the same way that we trace any sequence of function calls.
factorial(5) factorial(4) factorial(3) factorial(2) factorial(1) return 1 return 2*1 = 2 return 3*2 = 6 return 4*6 = 24 return 5*24 = 120
Our factorial() implementation exhibits the two main components that are required for every recursive function.
Recursive programming is directly related to mathematical induction, a technique for proving facts about natural numbers. Proving that a statement involving an integer n is true for infinitely many values of n by mathematical induction involves the following two steps:
Such a proof suffices to show that the statement is true for infinitely many values of n: we can start at the base case, and use our proof to establish that the statement is true for each larger value of n, one by one.
The greatest common divisor (gcd) of two positive integers is the largest integer that divides evenly into both of them. For example, the gcd(102, 68) = 34.
We can efficiently compute the gcd using the following property, which holds for positive integers p and q:
If p > q, the gcd of p and q is the same as the gcd of q and p % q.
The static method gcd() in Euclid.java is a compact recursive function whose reduction step is based on this property.
gcd(1440, 408) gcd(408, 216) gcd(216, 192) gcd(192, 24) gcd(24, 0) return 24 return 24 return 24 return 24 return 24
In the towers of Hanoi problem, we have three poles and n discs that fit onto the poles. The discs differ in size and are initially stacked on one of the poles, in order from largest (disc n) at the bottom to smallest (disc 1) at the top. The task is to move all ndiscs to another pole, while obeying the following rules:
Recursion provides just the plan that we need: First we move the top n−1 discs to an empty pole, then we move the largest disc to the other empty pole, then complete the job by moving the n−1 discs onto the largest disc. TowersOfHanoi.java is a direct implementation of this strategy.
Let T(n) be the number of move directives issued by TowersOfHanoi.java to move n discs from one peg to another. Then, T(n) must satisfy the following equation:
T(n)=2T(n−1) for n>1, with T(1)=1T(n)=2T(n−1) for n>1, with T(1)=1
Such an equation is known in discrete mathematics as a recurrence relation. We can often use them to derive a closed-form expression for the quantity of interest. For example, T(1) = 1, T(2) = 3, T(3) = 7, and T(4) = 15. In general, T(n) = 2n − 1. Assuming the monks move discs at the rate of one per second, it would take them more 5.8 billion centuries to solve the 64-disc problem.
An n-bit Gray code is a list of the 2n different n-bit binary numbers such that each entry in the list differs in precisely one bit from its predecessor. The n bit binary reflected Gray code is defined recursively as follows:
The 0-bit code is defined to be null, so the 1-bit code is 0 followed by 1.
![]() |
![]() |
Beckett.java uses an n-bit Gray code to print stage directions for an n-character play in such a way that characters enter and exit one at a time so that each subset of characters on the stage appears exactly once.
Simple recursive drawing schemes can lead to pictures that are remarkably intricate. For example, an H-tree of order n is defined as follows: The base case is null for n = 0. The reduction step is to draw, within the unit square three lines in the shape of the letter H four H-trees of order n − 1, one connected to each tip of the H with the additional provisos that the H-trees of order n − 1 are centered in the four quadrants of the square, halved in size.
![]() |
![]() |
![]() |
![]() |
![]() |
Htree.java takes a command-line argument n, and plots to standard drawing an H-tree of order n. An H-tree is a simple example of a fractal: a geometric shape that can be divided into parts, each of which is (approximately) a reduced size copy of the original.
Brownian.java produces a function graph that approximates a simple example of fractional Brownian motion known as Brownian bridge. You can think of this graph as a random walk that connects the two points (x0, y0) and (x1, y1), controlled by a few parameters. The implementation is based on the midpoint displacement method, which is a recursive plan for drawing the plot within the x-interval [x0, x1]. The base case (when the size of the interval is smaller than a given tolerance) is to draw a straight line connecting the two endpoints. The reduction case is to divide the interval into two halves, proceeding as follows:
The shape of the curve is controlled by two parameters: the volatility (initial value of the variance) controls the distance the graph strays from the straight line connecting the points, and the Hurst exponent controls the smoothness of the curve.
With recursion, you can write compact and elegant programs that fail spectacularly at runtime.
public static double harmonic(int n) { return harmonic(n-1) + 1.0/n; }
If you call this function, it will repeatedly call itself and never return.
public static double harmonic(int n) { if (n == 1) return 1.0; return harmonic(n) + 1.0/n; }
public static double harmonic(int n) { if (n == 0) return 0.0; return harmonic(n-1) + 1.0/n; }
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, ...
is defined by the formula
Fn=Fn−1+Fn−2 for n≥2, with F0=0 and F1=1Fn=Fn−1+Fn−2 for n≥2, with F0=0 and F1=1
A novice programmer might implement this recursive function to compute numbers in the Fibonacci sequence, as in Fibonacci.java:
// Warning: spectacularly inefficient. public static long fibonacci(int n) { if (n == 0) return 0; if (n == 1) return 1; return fibonacci(n-1) + fibonacci(n-2); }
However, this program is spectacularly inefficient! To see why it is futile to do so, consider what the function does to compute fibonacci(8) = 21. It first computes fibonacci(7) = 13 and fibonacci(6) = 8. To compute fibonacci(7), it recursively computes fibonacci(6) = 8 again and fibonacci(5) = 5. Things rapidly get worse. The number of times this program computes fibonacci(1)when computing fibonacci(n) is precisely Fn.
A general approach to implementing recursive programs, The basic idea of dynamic programming is to recursively divide a complex problem into a number of simpler subproblems; store the answer to each of these subproblems; and, ultimately, use the stored answers to solve the original problem. By solving each subproblem only once (instead of over and over), this technique avoids a potential exponential blow-up in the running time.
public static long fibonacci(int n) { long[] f = new long[n+1]; f[0] = 0; f[1] = 1; for (int i = 2; i <= n; i++) f[i] = f[i-1] + f[i-2]; return f[n]; }
- - G G C - - A - C C A C G A C G G C G G A T - - A C G
In general, if we let opt[i][j] denote the length of the LCS of the suffixes s[i..m) and t[j..n), then the following recurrence holds:
opt[i][j] = 0 if i = m or j = n = opt[i+1][j+1] + 1 if s[i] = t[j] = max(opt[i][j+1], opt[i+1][j]) otherwise
The final challenge is to recover the longest common subsequence itself, not just its length. The key idea is to retrace the steps of the dynamic programming algorithm backward, rediscovering the path of choices (highlighted in gray in the diagram) from opt[0][0] to opt[m][n]. To determine the choice that led to opt[i][j], we consider the three possibilities:
public static boolean gcdlike(int p, int q) { if (q == 0) return (p == 1); return gcdlike(q, p % q); }
public static int mystery(int a, int b) { if (b == 0) return 0; if (b % 2 == 0) return mystery(a+a, b/2); return mystery(a+a, b/2) + a; }
What are the values of mystery(2, 25) and mystery(3, 11)? Given positive integers a and b, describe what value mystery(a, b) computes. Answer the same question, but replace + with * and replace return 0 with return 1.
Solution: 50 and 33. It computes a*b. If you replace + with *, it computes a^b.
Next, rearrange the order of the recursive calls (and the base case), view the resulting animation, and explain each outcome.
bca cba cab acb bac abc
ab ac ad ba bc bd ca cb cd da db dc
a ab abc ac b bc c
Note that your program needs to print the empty string (subset of size 0).
abc abd abe acd ace ade bcd bce bde cde
Alternate solution using arrays instead of strings: Comb2.java.
RecursiveSquares.java gives a solution to the first pattern.
public static void collatz(int n) { StdOut.print(n + " "); if (n == 1) return; if (n % 2 == 0) collatz(n / 2); else collatz(3*n + 1); }
For example, a call to collatz(7) prints the sequence
7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
as a consequence of 17 recursive calls. Write a program that takes a command-line argument n and returns the value of i < n for which the number of recursive calls for collatz(i) is maximized. Hint: use memoization. The unsolved problem is that no one knows whether the function terminates for all integers (mathematical induction is no help because one of the recursive calls is for a larger value of the argument).
public static int mcCarthy(int n) { if (n > 100) return n - 10; else return mcCarthy(mcCarthy(n+11)); }
Determine the value of mcCarthy(50) without using a computer. Give the number of recursive calls used by mcCarthy()to compute this result. Prove that the base case is reached for all positive integers n or find a value of n for which this function goes into a recursive loop.
f(N) = 1 if N = 0 = 1 + 1 / f(N-1) if N > 0
Redo, but do not use recursion.
public static int mystery(int a, int b) { if (0 == b) return 0; else return a + mystery(a, b-1); }
Will the function in the previous exercise terminate for every pair of integers a and b between between 0 and 100? Give a high level description of what mystery(a, b) returns, given integers a and b between 0 and 100.
Answer: mystery(1, 7) = 1 + mystery(1, 6) = 1 + (1 + mystery(1, 5)) = ... 7 + mystery(1, 0) = 7.
Answer: Yes, the base case is b = 0. Successive recursive calls reduce b by 1, driving it toward the base case. The function mystery(a, b) returns a * b. Mathematically inclined students can prove this fact via induction using the identity ab = a + a(b-1).
public static void mystery(int a, int b) { if (a != b) { int m = (a + b) / 2; mystery(a, m); StdOut.println(m); mystery(m, b); } }
Answer: infinite loop.
public static void mystery(int a, int b) { if (a != b) { int m = (a + b) / 2; mystery(a, m - 1); StdOut.println(m); mystery(m + 1, b); } }
Answer: stack overflow.
public static int mystery(int a, int b) { if (a == b) StdOut.println(a); else { int m1 = (a + b ) / 2; int m2 = (a + b + 1) / 2; mystery(a, m1); mystery(m2, b); } }
public static int f(int n) { if (n == 0) return 0; if (n == 1) return 1; if (n == 2) return 1; return 2*f(n-2) + f(n-3);
F(n) = 1 if n = 1 or n = 2 = F((n+1)/2)2 + F((n-1)/2)2 if n is odd = F(n/2 + 1)2 - F(n/2 - 1)2 if n is even
What is the biggest Fibonacci number you can compute in under a minute using this definition? Compare this toFibonacci.java.
F(0) = 0 F(1) = 1 F(2n-1) = F(n-1)^2 + F(n)^2 F(2n) = (2F(n-1) + F(n)) * F(n)
public static void mystery(int n) { if (n == 0 || n == 1) return; mystery(n-2); StdOut.println(n); mystery(n-1); }
What does mystery(6) print out? Hint: first figure out what mystery(2), mystery(3), and so forth print out.
if (n == 0) return;
public static int square(int n) { if (n == 0) return 0; return square(n-1) + 2*n - 1; } public static int cube(int n) { if (n == 0) return 0; return cube(n-1) + 3*(square(n)) - 3*n + 1; }
What is the value of square(5)? cube(5)? cube(123)?
public static int f(int n) { public static int g(int n) { if (n == 0) return 0; if (n == 0) return 0; return f(n-1) + g(n-1); return g(n-1) + f(n); } }
% java Sequence 5 23 23 = ((5 * 2 + 1) * 2 + 1) % java Sequence 11 113 113 = ((((11 + 1) + 1) + 1) * 2 * 2 * 2 + 1)
* * * Q * * * * * * * * Q * * * * Q * * * * * * * Q * * * * * * * * * * * * Q * * * * Q * * * * * * Q * * * * * * * * * * * Q * * * * * * Q * * * * Q * * * * * * * * * * * * Q * * * * * * * Q * * * * Q * * * * * * * * Q * * Q * * * * * * * Q * * * * * * *
Hint: to determine whether setting q[n] = i conflicts with q[0] through q[n-1]
public static int f(int x) { if (x > 1000) return x - 4; else return f(f(x+5)); }
public static String expr(double p) { double r = Math.random(); if (r <= 1*p) return "(" + expr(p) + " + " + expr(p) + ")"; if (r <= 2*p) return "(" + expr(p) + " * " + expr(p) + ")"; return "" + (int) (100 * Math.random()); }
0 = {} = {} 1 = {0} = {{}} 2 = {0, 1} = {{}, {{}}} 3 = {0, 1, 2} = {{}, {{}}, {{}, {{}}}}
Write a program Ordinal.java with a recursive function vonNeumann() that takes a nonnegative integer N and returns a string representation of the von Neumann integer N. This is a method for defining ordinals in set theory.
% java Subsequence abcd 3 abc abd acd bcd
s = "ab" t = "CD" abCD CabD aCbD CaDb aCDb CDab
% java Partition 4 % java Partition 6 4 6 3 1 5 1 2 2 4 2 2 1 1 4 1 1 1 1 1 1 3 3 3 2 1 3 1 1 1 2 2 2 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1
% java JohnsonTrotter 3 012 (2 1) 021 (1 0) 201 (2 1) 210 (0 1) 120 (1 2) 102 (0 1)
% java PermutationsLex 3 012 021 102 120 201 210
Here's an 800-by-800 example. Here's a reference, including a simple 1D version. Note: some visual artifacts are noticeable parallel to the x and y axes. Doesn't have all of the statistical properties of 2D fractional Brownian motion.
// DO NOT RUN THIS CODE FOR LARGE INPUTS public static long binomial(int n, int k) { if (k == 0) return 1; if (n == 0) return 0; return binomial(n-1, k) + binomial(n-1, k-1); }
Copyright © 2000–2019 Robert Sedgewick and Kevin Wayne. All rights reserved.