Category Archives: cos 126
InformIT COS 126 Videos
COS 126 Java Programming
https://introcs.cs.princeton.edu/java/code/
Java Programs in the Textbook
Standard libraries.
Here are the standard input and output libraries that we use throughout the textbook.
Programs in the textbook.
Below is a table of the Java programs in the textbook. Click on the program name to access the Java code; click on the reference number for a brief description; read the textbook for a full discussion. You can download all of the programs as introcs.jar and the data as introcs-data.zip; you can also download all of the programs and data as the IntelliJ project IntroCS.zip.
1 | ELEMENTS OF PROGRAMMING | |
---|---|---|
1.1.1 | HelloWorld.java | Hello, World |
1.1.2 | UseArgument.java | using a command-line argument |
1.2.1 | Ruler.java | string concatenation example |
1.2.2 | IntOps.java | integer multiplication and division |
1.2.3 | Quadratic.java | quadratic formula |
1.2.4 | LeapYear.java | leap year |
1.2.5 | RandomInt.java | casting to get a random integer |
1.3.1 | Flip.java | flippling a fair coin |
1.3.2 | TenHellos.java | your first while loop |
1.3.3 | PowersOfTwo.java | computing powers of 2 |
1.3.4 | DivisorPattern.java | your first nested loops |
1.3.5 | HarmonicNumber.java | harmonic numbers |
1.3.6 | Sqrt.java | Newton's method |
1.3.7 | Binary.java | converting to binary |
1.3.8 | Gambler.java | gambler's ruin simulation |
1.3.9 | Factors.java | factoring integers |
1.4.1 | Sample.java | sampling without replacement |
1.4.2 | CouponCollector.java | coupon collector simulation |
1.4.3 | PrimeSieve.java | sieve of Eratosthenes |
1.4.4 | SelfAvoidingWalk.java | self-avoiding random walks |
1.5.1 | RandomSeq.java | generating a random sequence |
1.5.2 | TwentyQuestions.java | interactive user input |
1.5.3 | Average.java | averaging a stream of numbers |
1.5.4 | RangeFilter.java | a simple filter |
1.5.5 | PlotFilter.java | standard input-to-drawing filter |
1.5.6 | BouncingBall.java | bouncing ball |
1.5.7 | PlayThatTune.java | digital signal processing |
1.6.1 | Transition.java | computing the transition matrix |
1.6.2 | RandomSurfer.java | simulating a random surfer |
1.6.3 | Markov.java | mixing a Markov chain |
2 | FUNCTIONS | |
2.1.1 | Harmonic.java | harmonic numbers (revisited) |
2.1.2 | Gaussian.java | Gaussian functions |
2.1.3 | Coupon.java | coupon collector (revisited) |
2.1.4 | PlayThatTuneDeluxe.java | play that tune (revisited) |
2.2.1 | StdRandom.java | random number library |
2.2.2 | StdArrayIO.java | array I/O library |
2.2.3 | IFS.java | iterated function systems |
2.2.4 | StdStats.java | data analysis library |
2.2.5 | StdStats.java | data analysis library |
2.2.6 | Bernoulli.java | Bernoulli trials |
2.3.1 | Euclid.java | Euclid's algorithm |
2.3.2 | TowersOfHanoi.java | towers of Hanoi |
2.3.3 | Beckett.java | Gray code |
2.3.4 | Htree.java | recursive graphics |
2.3.5 | Brownian.java | Brownian bridge |
2.3.6 | LongestCommonSubsequence.java | longest common subsequence |
2.4.1 | Percolation.java | percolation scaffolding |
2.4.2 | VerticalPercolation.java | vertical percolation |
2.4.3 | PercolationVisualizer.java | percolation visualization client |
2.4.4 | PercolationProbability.java | percolation probability estimate |
2.4.5 | Percolation.java | percolation detection |
2.4.6 | PercolationPlot.java | adaptive plot client |
3 | OBJECT ORIENTED PROGRAMMING | |
3.1.1 | PotentialGene.java | identifying a potential gene |
3.1.2 | AlbersSquares.java | Albers squares |
3.1.3 | Luminance.java | luminance library |
3.1.4 | Grayscale.java | converting color to grayscale |
3.1.5 | Scale.java | image scaling |
3.1.6 | Fade.java | fade effect |
3.1.7 | Cat.java | concatenating files |
3.1.8 | StockQuote.java | screen scraping for stock quotes |
3.1.9 | Split.java | splitting a file |
3.2.1 | Charge.java | charged-particle data type |
3.2.2 | Stopwatch.java | stopwatch data type |
3.2.3 | Histogram.java | histogram data type |
3.2.4 | Turtle.java | turtle graphics data type |
3.2.5 | Spiral.java | spira mirabilis |
3.2.6 | Complex.java | complex number data type |
3.2.7 | Mandelbrot.java | Mandelbrot set |
3.2.8 | StockAccount.java | stock account data type |
3.3.1 | Complex.java | complex number data type (revisited) |
3.3.2 | Counter.java | counter data type |
3.3.3 | Vector.java | spatial vector data type |
3.3.4 | Sketch.java | document sketch data type |
3.3.5 | CompareDocuments.java | similarity detection |
3.4.1 | Body.java | gravitational body data type |
3.4.2 | Universe.java | n-body simulation |
4 | DATA STRUCTURES | |
4.1.1 | ThreeSum.java | 3-sum problem |
4.1.2 | DoublingTest.java | validating a doubling hypothesis |
4.2.1 | Questions.java | binary search (20 questions) |
4.2.2 | Gaussian.java | bisection search |
4.2.3 | BinarySearch.java | binary search (in a sorted array) |
4.2.4 | Insertion.java | insertion sort |
4.2.5 | InsertionTest.java | doubling test for insertion sort |
4.2.6 | Merge.java | mergesort |
4.2.7 | FrequencyCount.java | frequency counts |
4.3.1 | ArrayStackOfStrings.java | stack of strings (array) |
4.3.2 | LinkedStackOfStrings.java | stack of strings (linked list) |
4.3.3 | ResizingArrayStackOfStrings.java | stack of strings (resizing array) |
4.3.4 | Stack.java | generic stack |
4.3.5 | Evaluate.java | expression evaluation |
4.3.6 | Queue.java | generic queue |
4.3.7 | MM1Queue.java | M/M/1 queue simulation |
4.3.8 | LoadBalance.java | load balancing simulation |
4.4.1 | Lookup.java | dictionary lookup |
4.4.2 | Index.java | indexing |
4.4.3 | HashST.java | hash table |
4.4.4 | BST.java | binary search tree |
4.4.5 | DeDup.java | dedup filter |
– | ST.java | symbol table data type |
– | SET.java | set data type |
4.5.1 | Graph.java | graph data type |
4.5.2 | IndexGraph.java | using a graph to invert an index |
4.5.3 | PathFinder.java | shortest-paths client |
4.5.4 | PathFinder.java | shortest-paths implementation |
4.5.5 | SmallWorld.java | small-world test |
4.5.6 | Performer.java | performer–performer graph |
Exercise solutions.
Here is a list of solutions to selected coding exercises.
1 | ELEMENTS OF PROGRAMMING | |
---|---|---|
1.1.1 | TenHelloWorlds.java | ten Hello, Worlds |
1.1.5 | UseThree.java | three command-line arguments |
1.2.20 | SumOfTwoDice.java | sum of two dice |
1.2.23 | SpringSeason.java | is month and day in Spring? |
1.2.25 | WindChill.java | compute wind chill factor |
1.2.26 | CartesianToPolar.java | Cartesian to polar coordinates |
1.2.29 | DayOfWeek.java | compute day of week from date |
1.2.30 | Stats5.java | average, min, max of 5 random numbers |
1.2.34 | ThreeSort.java | sort three integers |
1.2.35 | Dragon.java | dragon curve of order 5 |
1.3.8 | FivePerLine.java | print integers five per line |
1.3.11 | FunctionGrowth.java | table of functions |
1.3.12 | DigitReverser.java | reverse digits |
1.3.13 | Fibonacci.java | Fibonacci numbers |
1.3.15 | SeriesSum.java | convergent sum |
1.3.31 | Ramanujan.java | taxicab numbers |
1.3.32 | ISBN.java | ISBN checksum |
1.3.38 | Sin.java | sine function via Taylor series |
1.3.41 | MonteHall.java | Monte Hall problem |
1.4.2 | HugeArray.java | creating a huge array |
1.4.10 | Deal.java | deal poker hands |
1.4.13 | Transpose.java | tranpose a square matrix |
1.4.25 | InversePermutation.java | compute inverse permutation |
1.4.26 | Hadamard.java | compute Hadamard matrix |
1.4.30 | Minesweeper.java | create Minesweeper board |
1.4.33 | RandomWalkers.java | N random walkers |
1.4.35 | Birthdays.java | birthday problem |
1.4.37 | BinomialCoefficients.java | binomial coefficients |
1.5.1 | MaxMin.java | max and min from standard input |
1.5.3 | Stats.java | mean and stddev from standard input |
1.5.5 | LongestRun.java | longest consecutive run from stdin |
1.5.11 | WordCount.java | word count from standard input |
1.5.15 | Closest.java | closest point |
1.5.18 | Checkerboard.java | draw a checkerboard |
1.5.21 | Rose.java | draw a rose |
1.5.22 | Banner.java | animate a text banner |
1.5.31 | Spirograph.java | draw spirograph |
1.5.32 | Clock.java | animate a clock |
1.5.33 | Oscilloscope.java | simulate an oscilloscope |
2 | FUNCTIONS | |
2.1.4 | ArrayEquals.java | are two integer arrays equal? |
2.1.30 | BlackScholes.java | Black-Scholes option valuation |
2.1.32 | Horner.java | Horner's method to evaluate a polynomial |
2.1.33 | Benford.java | Benford's law |
2.1.38 | Calendar.java | create a calendar |
2.2.1 | Gaussian.java | overloaded gaussian distribution functions |
2.2.2 | Hyperbolic.java | hyperbolic trig functions |
2.2.4 | StdRandom.java | shuffle an array of doubles |
2.2.6 | StdArrayIO.java | array IO methods |
2.2.11 | Matrix.java | matrix operations |
2.2.12 | MarkovSquaring.java | page rank via matrix squaring |
2.2.14 | StdRandom.java | exponential random variable |
2.3.14 | AnimatedHtree.java | animated H-tree |
2.3.15 | IntegerToBinary.java | integer to binary conversion |
2.3.17 | Permutations.java | all permutations |
2.3.18 | PermutationsK.java | all permutations of size k |
2.3.19 | Combinations.java | all combinations |
2.3.20 | CombinationsK.java | all combinations of size k |
2.3.22 | RecursiveSquares.java | recursive squares |
2.3.24 | GrayCode.java | Gray code |
2.3.26 | AnimatedHanoi.java | animated Towers of Hanoi |
2.3.29 | Collatz.java | Collatz function |
2.3.30 | BrownianIsland.java | Brownian island |
2.3.31 | PlasmaCloud.java | plama cloud |
2.3.32 | McCarthy.java | McCarthy's 91 function |
2.3.33 | Tree.java | fractal tree |
2.4.15 | PercolationDirectedNonrecursive.java | directed percolation |
Copyright © 2000–2019 Robert Sedgewick and Kevin Wayne. All rights reserved.
COS 126 Python Language
Python Programs in the Textbook
Booksite Modules
Below is a table of the booksite modules that we use throughout the textbook and booksite and beyond.
SECTION | MODULE | DESCRIPTION |
---|---|---|
1.5 | stdio.py | functions to read/write numbers and text from/to stdin and stdout |
1.5 | stddraw.py | functions to draw geometric shapes |
1.5 | stdaudio.py | functions to create, play, and manipulate sound |
2.2 | stdrandom.py | functions related to random numbers |
2.2 | stdarray.py | functions to create, read, and write 1D and 2D arrays |
2.2 | stdstats.py | functions to compute and plot statistics |
3.1 | color.py | data type for colors |
3.1 | picture.py | data type to process digital images |
3.1 | instream.py | data type to read numbers and text from files and URLs |
3.1 | outstream.py | data type to write numbers and text to files |
If you followed the instructions provided in this booksite (for Windows, Mac OS X, or Linux), then the booksite modules are installed on your computer. If you want to see the source code for the booksite modules, then click on the links in the above table, or download and unzip stdlib-python.zip.
Programs and Data Sets in the Textbook
Below is a table of the Python programs and data sets used in the textbook. Click on the program name to access the Python code; click on the data set name to access the data set; read the textbook for a full discussion. You can download all of the programs as introcs-python.zip and the data as introcs-data.zip.
1 | ELEMENTS OF PROGRAMMING | DATA | |
---|---|---|---|
1.1.1 | helloworld.py | Hello, World | – |
1.1.2 | useargument.py | using a command-line argument | – |
1.2.1 | ruler.py | string concatenation example | – |
1.2.2 | intops.py | integer operators | – |
1.2.3 | floatops.py | float operators | – |
1.2.4 | quadratic.py | quadratic formula | – |
1.2.5 | leapyear.py | leap year | – |
1.3.1 | flip.py | flipping a fair coin | – |
1.3.2 | tenhellos.py | your first loop | – |
1.3.3 | powersoftwo.py | computing powers of two | – |
1.3.4 | divisorpattern.py | your first nested loops | – |
1.3.5 | harmonic.py | harmonic numbers | – |
1.3.6 | sqrt.py | Newton's method | – |
1.3.7 | binary.py | converting to binary | – |
1.3.8 | gambler.py | gambler's ruin simulation | – |
1.3.9 | factors.py | factoring integers | – |
1.4.1 | sample.py | sampling without replacement | – |
1.4.2 | couponcollector.py | coupon collector simulation | – |
1.4.3 | primesieve.py | sieve of Eratosthenes | – |
1.4.4 | selfavoid.py | self-avoiding random walks | – |
1.5.1 | randomseq.py | generating a random sequence | – |
1.5.2 | twentyquestions.py | interactive user input | – |
1.5.3 | average.py | averaging a stream of numbers | – |
1.5.4 | rangefilter.py | a simple filter | – |
1.5.5 | plotfilter.py | standard input to draw filter | usa.txt |
1.5.6 | bouncingball.py | bouncing ball | – |
1.5.7 | playthattune.py | digital signal processing | elise.txt ascale.txt stairwaytoheaven.txt entertainer.txt firstcut.txt freebird.txt looney.txt |
1.6.1 | transition.py | computing the transition matrix | small.txt medium.txt |
1.6.2 | randomsurfer.py | simulating a random surfer | – |
1.6.3 | markov.py | mixing a Markov chain | – |
2 | FUNCTIONS | DATA | |
2.1.1 | harmonicf.py | harmonic numbers (revisited) | – |
2.1.2 | gauss.py | Gaussian functions | – |
2.1.3 | coupon.py | coupon collector (revisited) | – |
2.1.4 | playthattunedeluxe.py | play that tune (revisited) | elise.txt ascale.txt stairwaytoheaven.txt entertainer.txt firstcut.txt freebird.txt looney.txt |
2.2.1 | gaussian.py | Gaussian functions module | – |
2.2.2 | gaussiantable.py | sample Gaussian client | – |
2.2.3 | sierpinski.py | Sierpinski triangle | – |
2.2.4 | ifs.py | iterated function systems | sierpinski.txt barnsley.txt coral.txt culcita.txt cyclosorus.txt dragon.txt fishbone.txt floor.txt koch.txt spiral.txt swirl.txt tree.txt zigzag.txt |
2.2.5 | bernoulli.py | Bernoulli trials | – |
2.3.1 | euclid.py | Euclid's algorithm | – |
2.3.2 | towersofhanoi.py | towers of Hanoi | – |
2.3.3 | beckett.py | Gray code | – |
2.3.4 | htree.py | recursive graphics | – |
2.3.5 | brownian.py | Brownian bridge | – |
2.4.1 | percolationv.py | vertical percolation detection | test5.txt test8.txt |
2.4.2 | percolationio.py | percolation support functions | – |
2.4.3 | visualizev.py | vertical percolation visualization client | – |
2.4.4 | estimatev.py | vertical percolation probability estimate | – |
2.4.5 | percolation.py | percolation detection | test5.txt test8.txt |
2.4.6 | visualize.py | percolation visualization client | – |
2.4.7 | estimate.py | percolation probability estimate | – |
3 | OBJECT ORIENTED PROGRAMMING | DATA | |
3.1.1 | potentialgene.py | potential gene identification | – |
3.1.2 | chargeclient.py | charged particle client | – |
3.1.3 | alberssquares.py | Albers squares | – |
3.1.4 | luminance.py | luminance library | – |
3.1.5 | grayscale.py | converting color to grayscale | mandrill.jpg mandrill.png darwin.jpg darwin.png |
3.1.6 | scale.py | image scaling | mandrill.jpg mandrill.png darwin.jpg darwin.png |
3.1.7 | fade.py | fade effect | mandrill.jpg mandrill.png darwin.jpg darwin.png |
3.1.8 | potential.py | visualizing electric potential | charges.txt |
3.1.9 | cat.py | concatenating files | in1.txt in2.txt |
3.1.10 | stockquote.py | screen scraping for stock quotes | – |
3.1.11 | split.py | splitting a file | djia.csv |
3.2.1 | charge.py | charged-particle data type | – |
3.2.2 | stopwatch.py | stopwatch data type | – |
3.2.3 | histogram.py | histogram data type | – |
3.2.4 | turtle.py | turtle graphics data type | – |
3.2.5 | koch.py | Koch curve | – |
3.2.6 | spiral.py | spira mirabilis | – |
3.2.7 | drunk.py | drunken turtle | – |
3.2.8 | drunks.py | drunken turtles | – |
3.2.9 | complex.py | complex number data type | – |
3.2.10 | mandelbrot.py | Mandelbrot set | – |
3.2.11 | stockaccount.py | stock account data type | turing.txt |
3.3.1 | complexpolar.py | complex numbers (revisited) | – |
3.3.2 | counter.py | counter data type | – |
3.3.3 | vector.py | spatial vector data type | – |
3.3.4 | sketch.py | sketch data type | genome20.txt |
3.3.5 | comparedocuments.py | similarity detection | documents.txt constitution.txt tomsawyer.txt huckfinn.txt prejudice.txt djia.csv amazon.html actg.txt |
3.4.1 | body.py | gravitational body data type | – |
3.4.2 | universe.py | n-body simulation | 2body.txt 3body.txt 4body.txt 2bodytiny.txt |
4 | DATA STRUCTURES | DATA | |
4.1.1 | threesum.py | 3-sum problem | 8ints.txt 1kints.txt 2kints.txt 4kints.txt 8kints.txt 16kints.txt 32kints.txt 64kints.txt 128kints.txt |
4.1.2 | doublingtest.py | validating a doubling hypothesis | – |
4.1.3 | timeops.py | timing operators and functions | – |
4.1.4 | bigarray.py | discovering memory capacity | – |
4.2.1 | questions.py | binary search (20 questions) | – |
4.2.2 | bisection.py | binary search (inverting a function) | – |
4.2.3 | binarysearch.py | binary search (sorted array) | emails.txt white.txt |
4.2.4 | insertion.py | insertion sort | tiny.txt tomsawyer.txt |
4.2.5 | timesort.py | doubling test for sorting functions | – |
4.2.6 | merge.py | mergesort | tiny.txt tomsawyer.txt |
4.2.7 | frequencycount.py | frequency counts | leipzig100k.txt leipzig200k.txt leipzig1m.txt |
4.3.1 | arraystack.py | stack (resizing array implementation) | tobe.txt |
4.3.2 | linkedstack.py | stack (linked list implementation) | tobe.txt |
4.3.3 | evaluate.py | expression evaluation | expression1.txt expression2.txt |
4.3.4 | linkedqueue.py | queue (linked list implementation) | tobe.txt |
4.3.5 | mm1queue.py | M/M/1 queue simulation | – |
4.3.6 | loadbalance.py | load balancing simulation | – |
4.4.1 | lookup.py | dictionary lookup | amino.csv djia.csv elements.csv ip.csv ip-by-country.csv morse.csv phone-na.csv |
4.4.2 | index.py | indexing | mobydick.txt tale.txt |
4.4.3 | hashst.py | hash symbol table data type | – |
4.4.4 | bst.py | BST symbol table data type | – |
4.5.1 | graph.py | graph data type | tinygraph.txt |
4.5.2 | invert.py | using a graph to invert an index | tinygraph.txt movies.txt |
4.5.3 | separation.py | shortest-paths client | routes.txt movies.txt |
4.5.4 | pathfinder.py | shortest-paths client | – |
4.5.5 | smallworld.py | small-world test | tinygraph.txt |
4.5.6 | performer.py | performer-performer graph | tinymovies.txt moviesg.txt |
COS 126 Global Sequence Alignment
https://introcs.cs.princeton.edu/java/assignments/sequence.html
COS 126
Global Sequence Alignment |
Programming Assignment |
Write a program to compute the optimal sequence alignment of two DNA strings. This program will introduce you to the emerging field of computational biology in which computers are used to do research on biological systems. Further, you will be introduced to a powerful algorithmic design paradigm known as dynamic programming.
Biology review. A genetic sequence is a string formed from a four-letter alphabet {Adenine (A), Thymine (T), Guanine (G), Cytosine (C)} of biological macromolecules referred to together as the DNA bases. A gene is a genetic sequence that contains the information needed to construct a protein. All of your genes taken together are referred to as the human genome, a blueprint for the parts needed to construct the proteins that form your cells. Each new cell produced by your body receives a copy of the genome. This copying process, as well as natural wear and tear, introduces a small number of changes into the sequences of many genes. Among the most common changes are the substitution of one base for another and the deletion of a substring of bases; such changes are generally referred to as point mutations. As a result of these point mutations, the same gene sequenced from closely related organisms will have slight differences.
The problem. Through your research you have found the following sequence of a gene in a previously unstudied organism.
A A C A G T T A C C
What is the function of the protein that this gene encodes? You could begin a series of uninformed experiments in the lab to determine what role this gene plays. However, there is a good chance that it is a variant of a known gene in a previously studied organism. Since biologists and computer scientists have laboriously determined (and published) the genetic sequence of many organisms (including humans), you would like to leverage this information to your advantage. We'll compare the above genetic sequence with one which has already been sequenced and whose function is well understood.
T A A G G T C A
If the two genetic sequences are similar enough, we might expect them to have similar functions. We would like a way to quantify "similar enough."
Edit-distance. In this assignment we will measure the similarity of two genetic sequences by their edit distance, a concept first introduced in the context of coding theory, but which is now widely used in spell checking, speech recognition, plagiarism detection, file revisioning, and computational linguistics. We align the two sequences, but we are permitted to insert gaps in either sequence (e.g., to make them have the same length). We pay a penalty for each gap that we insert and also for each pair of characters that mismatch in the final alignment. Intuitively, these penalties model the relative likeliness of point mutations arising from deletion/insertion and substitution. We produce a numerical score according to the following simple rule, which is widely used in biological applications:
Penalty Cost Per gap 2 Per mismatch 1 Per match 0
As an example, two possible alignments of AACAGTTACC and TAAGGTCA are:
Sequence 1 A A C A G T T A C C Sequence 2 T A A G G T C A - - Penalty 1 0 1 1 0 0 1 0 2 2
Sequence 1 A A C A G T T A C C Sequence 2 T A - A G G T - C A Penalty 1 0 2 0 0 1 0 2 0 1
The first alignment has a score of 8, while the second one has a score of 7. The edit-distance is the score of the best possible alignment between the two genetic sequences over all possible alignments. In this example, the second alignment is in fact optimal, so the edit-distance between the two strings is 7. Computing the edit-distance is a nontrivial computational problem because we must find the best alignment among exponentially many possibilities. For example, if both strings are 100 characters long, then there are more than 10^75 possible alignments.
A recursive solution. We will calculate the edit-distance between the two original strings x and y by solving many edit-distance problems on the suffixes of the two strings. We use the notation x[i] to refer to character i of the string. We also use the notation x[i..M] to refer to the suffix of x consisting of the characters x[i], x[i+1], ..., x[M-1]. Finally, we use the notation opt[i][j] to denote the edit distance of x[i..M] and y[j..N]. For example, consider the two strings x = "AACAGTTACC" and y = "TAAGGTCA" of length M = 10 and N = 8, respectively. Then, x[2] is 'C', x[2..M] is "CAGTTACC", and y[8..N] is the empty string. The edit distance of x and y is opt[0][0].
Now we describe a recursive scheme for computing the edit distance of x[i..M] and y[j..N]. Consider the first pair of characters in an optimal alignment of x[i..M] with y[j..N]. There are three possibilities:
- The optimal alignment matches x[i] up with y[j]. In this case, we pay a penalty of either 0 or 1, depending on whether x[i] equals y[j], plus we still need to align x[i+1..M] with y[j+1..N]. What is the best way to do this? This subproblem is exactly the same as the original sequence alignment problem, except that the two inputs are each suffixes of the original inputs. Using our notation, this quantity is opt[i+1][j+1].
- The optimal alignment matches the x[i] up with a gap. In this case, we pay a penalty of 2 for a gap and still need to align x[i+1..M] with y[j..N]. This subproblem is identical to the original sequence alignment problem, except that the first input is a proper suffix of the original input.
- The optimal alignment matches the y[j] up with a gap. In this case, we pay a penalty of 2 for a gap and still need to align x[i..M] with y[j+1..N]. This subproblem is identical to the original sequence alignment problem, except that the second input is a proper suffix of the original input.
The key observation is that all of the resulting subproblems are sequence alignment problems on suffixes of the original inputs. To summarize, we can compute opt[i][j] by taking the minimum of three quantities:
opt[i][j] = min { opt[i+1][j+1] + 0/1, opt[i+1][j] + 2, opt[i][j+1] + 2 }
This equation works assuming i < M and j < N. Aligning an empty string with another string of length k requires inserting k gaps, for a total cost of 2k. Thus, in general we should set opt[M][j] = 2(N-j) and opt[i][N] = 2(M-i). For our example, the final matrix is:
| 0 1 2 3 4 5 6 7 8 x\y | T A A G G T C A - ----------------------------------- 0 A | 7 8 10 12 13 15 16 18 20 1 A | 6 6 8 10 11 13 14 16 18 2 C | 6 5 6 8 9 11 12 14 16 3 A | 7 5 4 6 7 9 11 12 14 4 G | 9 7 5 4 5 7 9 10 12 5 T | 8 8 6 4 4 5 7 8 10 6 T | 9 8 7 5 3 3 5 6 8 7 A | 11 9 7 6 4 2 3 4 6 8 C | 13 11 9 7 5 3 1 3 4 9 C | 14 12 10 8 6 4 2 1 2 10 - | 16 14 12 10 8 6 4 2 0
By examining opt[0][0], we conclude that the edit distance of x and y is 7.
A dynamic programming approach. A direct implementation of the above recursive scheme will work, but it is spectacularly inefficient. If both input strings have N characters, then the number of recursive calls will exceed 2^N. To overcome this performance bug, we use dynamic programming. (Read the first section of Section 9.6 for an introduction to this technique.) Dynamic programming is a powerful algorithmic paradigm, first introduced by Bellman in the context of operations research, and then applied to the alignment of biological sequences by Needleman and Wunsch. Dynamic programming now plays the leading role in many computational problems, including control theory, financial engineering, and bioinformatics, including BLAST (the sequence alignment program almost universally used by molecular biologist in their experimental work). The key idea of dynamic programming is to break up a large computational problem into smaller subproblems, store the answers to those smaller subproblems, and, eventually, use the stored answers to solve the original problem. This avoids recomputing the same quantity over and over again. Instead of using recursion, use a nested loop that calculates opt[i][j] in the right order so that opt[i+1][j+1], opt[i+1][j], and opt[i][j+1] are all computed before we try to compute opt[i][j].
Recovering the alignment itself. The above procedure describes how to compute the edit distance between two strings. We now outline how to recover the optimal alignment itself. The key idea is to retrace the steps of the dynamic programming algorithm backwards, re-discovering the path of choices (highlighted in red in the table above) from opt[0][0] to opt[M][N]. To determine the choice that led to opt[i][j], we consider the three possibilities:
- The optimal alignment matches x[i] up with y[j]. In this case, we must have opt[i][j] = opt[i+1][j+1] if x[i] equals y[j], or opt[i][j] = opt[i+1][j+1] + 1 otherwise.
- The optimal alignment matches x[i] up with a gap. In this case, we must have opt[i][j] = opt[i+1][j] + 2.
- The optimal alignment matches y[j] up with a gap. In this case, we must have opt[i][j] = opt[i][j+1] + 2.
Depending on which of the three cases apply, we move diagonally, down, or right towards opt[M][N], printing out x[i] aligned with y[j] (case 1), x[i] aligned with a gap (case 2), or y[j] aligned with a gap (case 3). In the example above, we know that the first T aligns with the first A because opt[0][0] = opt[1][1] + 1, but opt[0][0] ≠ opt[1][0] + 2 and opt[0][0] ≠ opt[0][1] + 2. The optimal alignment is:
A A C A G T T A C C T A - A G G T - C A 1 0 2 0 0 1 0 2 0 1
Analysis. Test your program using the example provided above, as well as the genomic data sets referred to in the checklist. Estimate the running time and memory usage of your program as a function of the lengths of the two input strings M and N. For simplicity, assume M = N in your analysis.
Submission. Submit the files: EditDistance.java and readme.txt.
This assignment was created by Thomas Clarke, Robert Sedgewick, Scott Vafai and Kevin Wayne.
Copyright © 2002.Princeton University: Programming Assignment Checklist: DNA Sequence Alignment
This assignment allows optional partnering. If you choose to do this, you must follow the pair programming guidelines. Please click the link and review them before you begin. Your partner can be from another precept (but ISC students may only partner with other ISC students). Please note that writing code with a partner without following the pair programming instructions is a violation of the course collaboration policy. All writing of code, comments, analysis and uploading to the dropbox should be done together from start to finish. If you come to office hours alone, you can get advice, but you may not change any code until both partners are together.
What are the main goals of this assignment? You will (i) solve a fundamental problem in computational biology, (ii) learn about the analysis of algorithms, and (iii) learn about a powerful programming paradigm known as dynamic programming.
How do I read in the two input strings from the file? Use StdIn.readString() and redirection as usual.
How do I access the length of a string s? The ith character? Use s.length() and s.charAt(i), respectively. As with arrays, indices start at 0. We'll learn about this notation for manipulating (String) objects in Section 3.1. For this assignment, this is all you'll need to know about objects.
Can I assume that the input characters will always be A, C, G or T? NO! Your program should work equally well for any letter, upper case or lower case.
What's a StringIndexOutOfBoundsException? It's just like an ArrayOutOfBoundsException. It results from invoking s.charAt(i) with an illegal value of i.
How could I get a NullPointerException? Did you forget to allocate memory for opt[][]?
How do I declare and initialize a two dimensional array in Java? Review the end of Section 1.4 in Intro to Programming.
It seems strange to define x[i..M] to be the substring consisting of x[i] through x[M-1] instead of x[i] through x[M]. Is this a typo? It's a little strange, but no, it's not a typo. It's consistent with Java's indexing notation where the left endpoint is inclusive and the right endpoint is exclusive.
Which alignment should I print out if there are two or more optimal ones? Output any one you like.
Should gaps be handled by penalty()? The solution we think is clearest does not call penalty() on gaps, since a gap is not a character. Of course, the alignment output should use the '-'symbol to denote a gap, and the inputs we give you will never contain '-' (only alphanumeric characters are used), so if you find it convenient for penalty() to recognize '-' as a gap, you are permitted to do so.
Where can I learn more about dynamic programming and backtracking? The LCS (longest common subsequence) problem described in booksite section 9.6 is an example of a dynamic programming problem on strings with backtracking. However, it is different from the current problem in many ways, so do not simply mimic the code without understanding what it does. The websheet exercises for this week include several dynamic programming exercises starting from a basic level, including KnapsackBacktrack which is an example of backtracking.
What does OutOfMemoryError mean? When Java runs, it requests a certain amount of memory from the operating system. The exact amount depends on the version of Java and your computer, but can vary from 64MB to 1024MB (1GB). After Java has started, the total size of all variables in use cannot be larger than what it originally requested. Trying to do so causes anOutOfMemoryError.
For this assignment, the largest test cases use huge arrays, and Java needs to ask for enough memory from the operating system. To explicitly ask for for more (or less) memory, use the -Xmxflag. For example, to request 500 megabytes (500 MB) of memory for a run, use
java-introcs -Xmx500m EditDistance < input.txt
Here 500m means 500 MB. You should adjust this number depending on the amount of memory your computer has and the size of the arrays you will need for the data set you are running. The amount 500MB should get you through ecoli10000.txt. To run ecoli20000.txt you will need to request more memory. (How much? The readme asks you to estimate this.)
What does "Could not reserve enough space for object heap" mean? This occurs if you use -Xmx with a value that is larger than the amount of available physical memory. Additionally, due to address space limitations, some 32-bit versions of Windows also will give this error if you try to request more than approximately 1.5GB, no matter how much physical memory is installed.
How do I determine how much physical memory is installed on my computer? On Mac, select About this Mac from the Apple menubar. On Windows, press Windows-R (or Run on the Start menu), enter msinfo32
and look for total physical memory.
How can I measure how long my program takes on each file? To measure the running time of your program, there are a few techniques.
- The simplest is to use
java-introcs -Xmx500m EditDistance < input.txt > output.txt
and use a stopwatch. We redirect the output to a file to prevent printing text from becoming a bottleneck.
- A second technique, which we think probably best suits the needs of this application, is to use the -Xprof runtime switch, which asks Java to print out timing data about the run. To use this with output redirection, type
java-introcs -Xprof -Xmx500m EditDistance < input.txt > output.txt
The timing information will appear at the end of the file output.txt, and you want the "flat profile" for main. The line will look like
Flat profile of 4.5 secs (15 total ticks): main
but these numbers are made up and yours will be different. We don't care about the ticks.Piping can be useful here. You can skip the output file in the previous step by piping your output to another program that will look for "Flat profile". and it should print out the time (and throw away all the other program output). On a Mac, run
java-introcs -Xprof -Xmx500m EditDistance < input.txt | grep "main"
On Windows, use find instead of grep:
java-introcs -Xprof -Xmx500m EditDistance < input.txt | find "main"
This find/grep command searches through all of whatever text it is fed and only prints out the lines containing the text main. Type man grep (in Terminal) or find /? (in Command Prompt) for more information.
- As a third technique, you can use Stopwatch objects, see Repeat.java for an example. We'll explain more about objects later in the course.
- Finally, you may elect to directly use System.currentTimeMillis() as shown in lecture.
How do I use a cluster machine in Friend 016/017? See this page for general instructions. Please also read the first bullet point for the question immediately below.
My timing data do not fit a polynomial hypothesis. What could I be doing wrong?
- If you are running your program and accessing the data files from the Windows H: drive (especially if via a wireless network) or ~ on a cluster Mac (Friend 016/017), the bottleneck for medium-sized test cases might be the network latency instead of the dynamic programming algorithm! Do one of the following:
- Use a Stopwatch or System.currentTimeMillis() to specifically isolate the time taken after the input is read (after all calls to StdIn) and before any output is printed. Remember to remove the time printing statements before submitting the final version of your code.
- Or, copy all files to a folder on a local hard drive. We don't recommend this on a cluster (Friend 016/017) machine.
- Or, report your problems and the data you obtained, and use the readme's sample data for analysis instead.
- When you run out of physical memory, your operating system may start using your hard drive as another form of storage. Accessing information from the hard drive is substantially slower than main memory, and you may be observing this effect. Avoid running extraneous complicated programs (media players, file sharing clients, word processors, web browsers, etc) while doing the timing tests if this seems to be a problem.
- Make sure you are using output redirection or piping (as in the examples above) to prevent printing text from becoming a bottleneck.
- Very small test cases are hard to use since the Java virtual machine takes a nontrivial amount of time to start, and since the processor "cache" may make small test cases run an order of magnitude faster than expected. If in doubt, use the test cases that take between 0.1 and 10.0 seconds.
Testing. To help you check the part of your program that generates the alignment, there are many test files in the sequence directory.
- Many of the small files are designed so that it is easy for you to determine what the correct answer should be by hand. Test your program on these cases to see that it gets these easy cases right.
- Here are the optimal edit distances of several of the supplied files.
ecoli2500.txt 118 ecoli5000.txt 160 fli8.txt 6 fli9.txt 4 fli10.txt 2 ftsa1272.txt 758 gene57.txt 8 stx1230.txt 521 stx19.txt 10 stx26.txt 17 stx27.txt 19
- The test case worked through as an example in the assignment description, which is the same as the example10.txt file, has a unique optimal alignment. (Some test inputs like "xx y" have more than one optimal alignment.) So your code should give the exact same output on example10.txt as in the assignment page.
- Here are two more test cases with unique optimal alignments:
% java-introcs EditDistance < endgaps7.txt % java-introcs EditDistance < fli10.txt Edit distance = 4 Edit distance = 2 a - 2 T T 0 t t 0 G G 0 a a 0 G G 0 t t 0 C T 1 t t 0 G G 0 a a 0 G G 0 t t 0 A T 1 - a 2 A A 0 C C 0 T T 0
- In addition, we require that you generate one small input file of your own to be used for testing special cases. Create a new input file with some interesting property. Then test your code using your file and make sure your program behaves as expected. For example, when we tested your RGBtoCMYK program in an earlier assignment, our special case was when R=0, G=0, B=0. In NBody, one of our special cases was a system with only 1 body. Include a description of your special test case in your readme.txt file.
These are purely suggestions for how you might make progress. You do not have to follow these steps.
- Download sequence.zip to your computer and unzip it, as described on the assignment page. It contains test files and the readme templates.
- Write the following two short helper methods.
// return the penalty for aligning character a with character b public static int penalty(char a, char b) // return the min of 3 integers public static int min(int a, int b, int c)
You will call these from your main method to compute penalties and to determine which of the three cases yields the minimum edit distance.
- Write the main() method in EditDistance.java to read in the two strings from standard input, using the method StdIn.readString(). For debugging, print them to standard output.
- Declare and initialize the (M+1)-by-(N+1) array opt[][]. Include the base cases. Print out the 2D array to check your work.To print the matrix out in nicely formatted columns, use
System.out.printf("%3d", opt[i][j]);
with nested for loops. Remember to remove this debugging print section before submitting the final version of your program.
- Now, it's time to implement the crucial dynamic programming part. First read the dynamic programming portion of Section 9.6 and make sure you understand how dynamic programming works. Think carefully about the order in which you will perform the computation since this is critical. Hint: first fill in the base case of opt[i][j], e.g., when i = M or j = N. Now, fill in the remaining values using a nested loop. Test that it works by printing out the contents of opt.
- Now, figure out how to recover the optimal alignment by backtracing.This is an iterative process. At each step we look to see which path choice we should make. Using the example from the assignment we start at i = 0, j=0 where x[i] = 'A' and y[i] = 'T'. The choices are to print "A -" and move down with a gap cost of 2, "- T" and move right with a gap cost of 2, or "A T" and move diagonally with a mismatch cost of 1. We know to pick "A T" because 7 - 6 = 1. This is the only choice which matches the matrix. (It is possible to have more than one choice which matches the matrix. In that case, either choice will lead to the same optimal edit distance.)
Test this part thoroughly. For example, one corner case to test is to make sure that you print out the ENTIRE alignment, even when one sequence finishes before the other. (Use lastXgaps9.txt and lastYgaps9.txt to test.)
- Measure the time that your program takes on the sample runs indicated in the readme. For help on performing timing tests, see above.
- Use the doubling method to estimate the running time as a polynomial function of the input size.
- Does each static method have a comment indicating what it does?
- Are there any hardwired constants? Do they appear in multiple methods? The cleanest solution to this is to use the static class constant
private static int MISMATCH = 1;
(and two other similar ones for gaps and matches). This makes your code easy to use if a different set of penalties is desired.
- The idea of dynamic programming was first advanced by Bellman (1957). Levenshtein (1966) formalized the notion of edit distance. Needleman-Wunsch (1970) were the first to apply edit distance and dynamic programming for aligning biological sequences, and our algorithm is essentially the one proposed in their seminal paper. The widely-used Smith-Waterman(1981) algorithm is quite similar, but solves a slightly different problem (local sequence alignment instead of global sequence alignment).
- The same technology is employed in spell checkers and to identify plagiarism in many courses (including this one).
- The genetic data are taken from GenBank. The National Center for Biotechnology Information also contains many examples of such database and alignment software.
- With a little work, you can compute the optimal cost in quadratic time but using only linear space (do we need the whole
opt
matrix?) With more work, you can also compute the optimal alignment in linear space (and quadratic time). This is known as Hirschberg's algorithm (1975).
PU: COS 126 - Recursion
The idea of calling one function from another immediately suggests the possibility of a function calling itself. The function-call mechanism in Java supports this possibility, which is known as recursion.
Your first recursive program.
The "Hello, World" for recursion is the factorial function, which is defined for positive integers n by the equation
n!=n×(n−1)×(n−2)×…×2×1n!=n×(n−1)×(n−2)×…×2×1
The quantity n! is easy to compute with a for loop, but an even easier method in Factorial.java is to use the following recursive function:
public static long factorial(int n) { if (n == 1) return 1; return n * factorial(n-1); }
We can trace this computation in precisely the same way that we trace any sequence of function calls.
factorial(5) factorial(4) factorial(3) factorial(2) factorial(1) return 1 return 2*1 = 2 return 3*2 = 6 return 4*6 = 24 return 5*24 = 120
Our factorial() implementation exhibits the two main components that are required for every recursive function.
- The base case returns a value without making any subsequent recursive calls. It does this for one or more special input values for which the function can be evaluated without recursion. For factorial(), the base case is n = 1.
- The reduction step is the central part of a recursive function. It relates the value of the function at one (or more) input values to the value of the function at one (or more) other input values. Furthermore, the sequence of input values values must converge to the base case. For factorial(), the value of n decreases by 1 for each call, so the sequence of input values converges to the base case.
Mathematical induction.
Recursive programming is directly related to mathematical induction, a technique for proving facts about natural numbers. Proving that a statement involving an integer n is true for infinitely many values of n by mathematical induction involves the following two steps:
- The base case: prove the statement true for some specific value or values of n (usually 0 or 1).
- The induction step: assume that the statement to be true for all positive integers less than n, then use that fact to prove it true for n.
Such a proof suffices to show that the statement is true for infinitely many values of n: we can start at the base case, and use our proof to establish that the statement is true for each larger value of n, one by one.
Euclid's algorithm.
The greatest common divisor (gcd) of two positive integers is the largest integer that divides evenly into both of them. For example, the gcd(102, 68) = 34.
We can efficiently compute the gcd using the following property, which holds for positive integers p and q:
If p > q, the gcd of p and q is the same as the gcd of q and p % q.
The static method gcd() in Euclid.java is a compact recursive function whose reduction step is based on this property.
gcd(1440, 408) gcd(408, 216) gcd(216, 192) gcd(192, 24) gcd(24, 0) return 24 return 24 return 24 return 24 return 24
Towers of Hanoi.
In the towers of Hanoi problem, we have three poles and n discs that fit onto the poles. The discs differ in size and are initially stacked on one of the poles, in order from largest (disc n) at the bottom to smallest (disc 1) at the top. The task is to move all ndiscs to another pole, while obeying the following rules:
- Move only one disc at a time.
- Never place a larger disc on a smaller one.
Recursion provides just the plan that we need: First we move the top n−1 discs to an empty pole, then we move the largest disc to the other empty pole, then complete the job by moving the n−1 discs onto the largest disc. TowersOfHanoi.java is a direct implementation of this strategy.
Exponential time.
Let T(n) be the number of move directives issued by TowersOfHanoi.java to move n discs from one peg to another. Then, T(n) must satisfy the following equation:
T(n)=2T(n−1) for n>1, with T(1)=1T(n)=2T(n−1) for n>1, with T(1)=1
Such an equation is known in discrete mathematics as a recurrence relation. We can often use them to derive a closed-form expression for the quantity of interest. For example, T(1) = 1, T(2) = 3, T(3) = 7, and T(4) = 15. In general, T(n) = 2n − 1. Assuming the monks move discs at the rate of one per second, it would take them more 5.8 billion centuries to solve the 64-disc problem.
Gray code.
An n-bit Gray code is a list of the 2n different n-bit binary numbers such that each entry in the list differs in precisely one bit from its predecessor. The n bit binary reflected Gray code is defined recursively as follows:
- the n−1 bit code, with 0 prepended to each word, followed by
- the n−1 bit code in reverse order, with 1 prepended to each word.
The 0-bit code is defined to be null, so the 1-bit code is 0 followed by 1.
Beckett.java uses an n-bit Gray code to print stage directions for an n-character play in such a way that characters enter and exit one at a time so that each subset of characters on the stage appears exactly once.
Recursive graphics.
Simple recursive drawing schemes can lead to pictures that are remarkably intricate. For example, an H-tree of order n is defined as follows: The base case is null for n = 0. The reduction step is to draw, within the unit square three lines in the shape of the letter H four H-trees of order n − 1, one connected to each tip of the H with the additional provisos that the H-trees of order n − 1 are centered in the four quadrants of the square, halved in size.
Htree.java takes a command-line argument n, and plots to standard drawing an H-tree of order n. An H-tree is a simple example of a fractal: a geometric shape that can be divided into parts, each of which is (approximately) a reduced size copy of the original.
Brownian bridge.
Brownian.java produces a function graph that approximates a simple example of fractional Brownian motion known as Brownian bridge. You can think of this graph as a random walk that connects the two points (x0, y0) and (x1, y1), controlled by a few parameters. The implementation is based on the midpoint displacement method, which is a recursive plan for drawing the plot within the x-interval [x0, x1]. The base case (when the size of the interval is smaller than a given tolerance) is to draw a straight line connecting the two endpoints. The reduction case is to divide the interval into two halves, proceeding as follows:
- Compute the midpoint (xm, ym) of the interval.
- Add to the y-coordinate ym of the midpoint a random value δ, drawn from the Gaussian distribution with mean 0 and a given variance.
- Recur on the subintervals, dividing the variance by a given scaling factor s.
The shape of the curve is controlled by two parameters: the volatility (initial value of the variance) controls the distance the graph strays from the straight line connecting the points, and the Hurst exponent controls the smoothness of the curve.
Pitfalls of recursion.
With recursion, you can write compact and elegant programs that fail spectacularly at runtime.
- Missing base case. The recursive function in NoBaseCase.java is supposed to compute harmonic numbers, but is missing a base case:
public static double harmonic(int n) { return harmonic(n-1) + 1.0/n; }
If you call this function, it will repeatedly call itself and never return.
- No guarantee of convergence. Another common problem is to include within a recursive function a recursive call to solve a subproblem that is not smaller than the original problem. For example, the recursive function in NoConvergence.javagoes into an infinite recursive loop for any value of its argument (except 1).
public static double harmonic(int n) { if (n == 1) return 1.0; return harmonic(n) + 1.0/n; }
- Excessive memory requirements. If a function calls itself recursively an excessive number of times before returning, the memory required by Java to keep track of the recursive calls may be prohibitive. The recursive function inExcessiveMemory.java correctly computes the nth harmonic number. However, calling it with a huge value of n will lead to a StackOverflowError.
public static double harmonic(int n) { if (n == 0) return 0.0; return harmonic(n-1) + 1.0/n; }
- Excessive recomputation. The temptation to write a simple recursive program to solve a problem must always be tempered by the understanding that a simple program might require exponential time (unnecessarily), due to excessive recomputation. For example, the Fibonacci sequence
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, ...
is defined by the formula
Fn=Fn−1+Fn−2 for n≥2, with F0=0 and F1=1Fn=Fn−1+Fn−2 for n≥2, with F0=0 and F1=1A novice programmer might implement this recursive function to compute numbers in the Fibonacci sequence, as in Fibonacci.java:
// Warning: spectacularly inefficient. public static long fibonacci(int n) { if (n == 0) return 0; if (n == 1) return 1; return fibonacci(n-1) + fibonacci(n-2); }
However, this program is spectacularly inefficient! To see why it is futile to do so, consider what the function does to compute fibonacci(8) = 21. It first computes fibonacci(7) = 13 and fibonacci(6) = 8. To compute fibonacci(7), it recursively computes fibonacci(6) = 8 again and fibonacci(5) = 5. Things rapidly get worse. The number of times this program computes fibonacci(1)when computing fibonacci(n) is precisely Fn.
Dynamic programming.
A general approach to implementing recursive programs, The basic idea of dynamic programming is to recursively divide a complex problem into a number of simpler subproblems; store the answer to each of these subproblems; and, ultimately, use the stored answers to solve the original problem. By solving each subproblem only once (instead of over and over), this technique avoids a potential exponential blow-up in the running time.
- Top-down dynamic programming. In top-down dynamic programming, we store or cache the result of each subproblem that we solve, so that the next time we need to solve the same subproblem, we can use the cached values instead of solving the subproblem from scratch. TopDownFibonacci.java illustrates top-down dynamic programming for computing Fibonacci numbers.
- Bottom-up dynamic programming. In bottom-up dynamic programming, we compute solutions to all of the subproblems, starting with the “simplest” subproblems and gradually building up solutions to more and more complicated subproblems.BottomUpFibonacci.java illustrates bottom-up dynamic programming for computing Fibonacci numbers.
public static long fibonacci(int n) { long[] f = new long[n+1]; f[0] = 0; f[1] = 1; for (int i = 2; i <= n; i++) f[i] = f[i-1] + f[i-2]; return f[n]; }
- Longest common subsequence problem. Given two strings x and y, we wish to compute their (LCS). If we delete some characters from x and some characters from y, and the resulting two strings are equal, we call the resulting string a common subsequence. The LCS problem is to find a common subsequence of two strings that is as long as possible. For example, the LCS of GGCACCACG and ACGGCGGATACG is GGCAACG, a string of length 7.
- - G G C - - A - C C A C G A C G G C G G A T - - A C G
- Longest common subsequence recurrence. Now we describe a recursive formulation that enables us to find the LCS of two given strings s and t. Let m and n be the lengths of s and t, respectively. We use the notation s[i..m) to denote the suffix of s starting at index i, and t[j..n) to denote the suffix of t starting at index j.
- If s and t begin with the same character, then the LCS of s and t contains that first character. Thus, our problem to reduces to finding the LCS of the suffixes s[1..m) and t[1..n).
- If s and t begin with different characters, both characters cannot be part of a common subsequence, so can safely discard one or the other. In either case, the problem reduces to finding the LCS of two strings—either s[0..m) and t[1..n) or s[1..m) and t[0..n).
In general, if we let opt[i][j] denote the length of the LCS of the suffixes s[i..m) and t[j..n), then the following recurrence holds:
opt[i][j] = 0 if i = m or j = n = opt[i+1][j+1] + 1 if s[i] = t[j] = max(opt[i][j+1], opt[i+1][j]) otherwise
- Dynamic programming solution. LongestCommonSubsequence.java begins with a bottom-up dynamic programming approach to solving this recurrence.
The final challenge is to recover the longest common subsequence itself, not just its length. The key idea is to retrace the steps of the dynamic programming algorithm backward, rediscovering the path of choices (highlighted in gray in the diagram) from opt[0][0] to opt[m][n]. To determine the choice that led to opt[i][j], we consider the three possibilities:
- The character s[i] matches t[j]. In this case, we must have opt[i][j] = opt[i+1][j+1] + 1, and the next character in the LCS is s[i]. We continue tracing back from opt[i+1][j+1].
- The LCS does not contain s[i]. In this case, opt[i][j] = opt[i+1][j] and we continue tracing back from opt[i+1][j].
- The LCS does not contain t[j]. In this case, opt[i][j] = opt[i][j+1] and we continue tracing back from opt[i][j+1].
Exercises
- Given four positive integers a, b, c, and d, explain what value is computed by gcd(gcd(a, b), gcd(c, d)).Solution: the greatest common divisor of a, b, c, and d.
- Explain in terms of integers and divisors the effect of the following Euclid-like function.
public static boolean gcdlike(int p, int q) { if (q == 0) return (p == 1); return gcdlike(q, p % q); }
- Consider the following recursive function.
public static int mystery(int a, int b) { if (b == 0) return 0; if (b % 2 == 0) return mystery(a+a, b/2); return mystery(a+a, b/2) + a; }
What are the values of mystery(2, 25) and mystery(3, 11)? Given positive integers a and b, describe what value mystery(a, b) computes. Answer the same question, but replace + with * and replace return 0 with return 1.
Solution: 50 and 33. It computes a*b. If you replace + with *, it computes a^b.
- Write a program AnimatedHtree.java that animates the drawing of the H-tree.
Next, rearrange the order of the recursive calls (and the base case), view the resulting animation, and explain each outcome.
Creative Exercises
- Binary representation. Write a program IntegerToBinary.java that takes a positive integer n (in decimal) as a command-line argument and prints its binary representation. Recall, in Binary.java, we used the method of subtracting out powers of 2. Now, use the following simpler method: repeatedly divide 2 into n and read the remainders backwards. First, write a while loop to carry out this computation and print the bits in the wrong order. Then, use recursion to print the bits in the correct order.
- Permutations. Write a program Permutations.java that take an integer command-line argument n and prints all n! permutations of the n letters starting at a (assume that n is no greater than 26). A permutation of n elements is one of the n! possible orderings of the elements. As an example, when n = 3 you should get the following output (but do not worry about the order in which you enumerate them):
bca cba cab acb bac abc
- Permutations of size k. Write a program PermutationsK.java that two command-line arguments n and k, and prints out all P(n,k)=n!(n−k)!P(n,k)=n!(n−k)! permutations that contain exactly k of the n elements. Below is the desired output when k = 2 and n= 4 (again, do not worry about the order):
ab ac ad ba bc bd ca cb cd da db dc
- Combinations. Write a program Combinations.java that takes an integer command-line argument n and prints all 2ncombinations of any size. A combination is a subset of the n elements, independent of order. As an example, when n = 3, you should get the following output:
a ab abc ac b bc c
Note that your program needs to print the empty string (subset of size 0).
- Combinations of size k. Write a program CombinationsK.java that takes two command-line arguments n and k, and prints all C(n,k)=n!k!(n−k)!C(n,k)=n!k!(n−k)! combinations of size k. For example, when n = 5 and k = 3, you should get the following output:
abc abd abe acd ace ade bcd bce bde cde
Alternate solution using arrays instead of strings: Comb2.java.
- Recursive squares. Write a program to produce each of the following recursive patterns. The ratio of the sizes of the squares is 2.2:1. To draw a shaded square, draw a filled gray square, then an unfilled black square.
RecursiveSquares.java gives a solution to the first pattern.
- Gray code. Modify Beckett.java to print the Gray code (not just the sequence of bit positions that change).Solution: GrayCode.java uses Java's string data type; GrayCodeArray.java uses a boolean array.
- Animated towers of Hanoi animation. Write a program AnimatedHanoi.java that uses StdDraw to animate a solution to the towers of Hanoi problem, moving the discs at a rate of approximately 1 per second.
- Collatz function. Consider the following recursive function in Collatz.java, which is related to a famous unsolved problem in number theory, known as the Collatz problem or the 3n + 1 problem.
public static void collatz(int n) { StdOut.print(n + " "); if (n == 1) return; if (n % 2 == 0) collatz(n / 2); else collatz(3*n + 1); }
For example, a call to collatz(7) prints the sequence
7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
as a consequence of 17 recursive calls. Write a program that takes a command-line argument n and returns the value of i < n for which the number of recursive calls for collatz(i) is maximized. Hint: use memoization. The unsolved problem is that no one knows whether the function terminates for all integers (mathematical induction is no help because one of the recursive calls is for a larger value of the argument).
- Brownian island. B. Mandelbrot asked the famous question How long is the coast of Britain? Modify Brownian.java to get a program BrownianIsland.java that plots Brownian islands, whose coastlines resemble that of Great Britain. The modifications are simple: first, change curve() to add a random Gaussian to the x-coordinate as well as to the y-coordinate; second, change main() to draw a curve from the point at the center of the canvas back to itself. Experiment with various values of the arguments to get your program to produce islands with a realistic look.
- Plasma clouds. Write a recursive program PlasmaCloud.java to draw plasma clouds, using the method suggested in the text.
- A strange function. Consider McCarthy's 91 function:
public static int mcCarthy(int n) { if (n > 100) return n - 10; else return mcCarthy(mcCarthy(n+11)); }
Determine the value of mcCarthy(50) without using a computer. Give the number of recursive calls used by mcCarthy()to compute this result. Prove that the base case is reached for all positive integers n or find a value of n for which this function goes into a recursive loop.
- Recursive tree. Write a program Tree.java that takes a command-line argument n and produces the following recursive patterns for n equal to 1, 2, 3, 4, and 8.
Web Exercises
- Does Euclid.java still work if the inputs can be negative? If not, fix it. Hint: Recall that % can return a negative integer if the first input is negative. When calling the function, take the absolute value of both inputs.
- Write a recursive program GoldenRatio.java that takes an integer input N and computes an approximation to the golden ratio using the following recursive formula:
f(N) = 1 if N = 0 = 1 + 1 / f(N-1) if N > 0
Redo, but do not use recursion.
- Discover a connection between the golden ratio and Fibonacci numbers. Hint: consider the ratio of successive Fibonacci numbers: 2/1, 3/2, 8/5, 13/8, 21/13, 34/21, 55/34, 89/55, 144/89, ...
- Consider the following recursive function. What is mystery(1, 7)?
public static int mystery(int a, int b) { if (0 == b) return 0; else return a + mystery(a, b-1); }
Will the function in the previous exercise terminate for every pair of integers a and b between between 0 and 100? Give a high level description of what mystery(a, b) returns, given integers a and b between 0 and 100.
Answer: mystery(1, 7) = 1 + mystery(1, 6) = 1 + (1 + mystery(1, 5)) = ... 7 + mystery(1, 0) = 7.
Answer: Yes, the base case is b = 0. Successive recursive calls reduce b by 1, driving it toward the base case. The function mystery(a, b) returns a * b. Mathematically inclined students can prove this fact via induction using the identity ab = a + a(b-1).
- Consider the following function. What does mystery(0, 8) do?
public static void mystery(int a, int b) { if (a != b) { int m = (a + b) / 2; mystery(a, m); StdOut.println(m); mystery(m, b); } }
Answer: infinite loop.
- Consider the following function. What does mystery(0, 8) do?
public static void mystery(int a, int b) { if (a != b) { int m = (a + b) / 2; mystery(a, m - 1); StdOut.println(m); mystery(m + 1, b); } }
Answer: stack overflow.
- Repeat the previous exercise, but replace if (a != b) with if (a <= b).
- What does mystery(0, 8) do?
public static int mystery(int a, int b) { if (a == b) StdOut.println(a); else { int m1 = (a + b ) / 2; int m2 = (a + b + 1) / 2; mystery(a, m1); mystery(m2, b); } }
- What does the following function compute?
public static int f(int n) { if (n == 0) return 0; if (n == 1) return 1; if (n == 2) return 1; return 2*f(n-2) + f(n-3);
- Write a program Fibonacci2.java that takes a command-line argument N and prints out the first N Fibonacci numbers using the following alternate definition:
F(n) = 1 if n = 1 or n = 2 = F((n+1)/2)2 + F((n-1)/2)2 if n is odd = F(n/2 + 1)2 - F(n/2 - 1)2 if n is even
What is the biggest Fibonacci number you can compute in under a minute using this definition? Compare this toFibonacci.java.
- Write a program that takes a command-line argument N and prints out the first N Fibonacci numbers using the following method proposed by Dijkstra:
F(0) = 0 F(1) = 1 F(2n-1) = F(n-1)^2 + F(n)^2 F(2n) = (2F(n-1) + F(n)) * F(n)
- Prove by mathematical induction that the alternate definitions of the Fibonacci function given in the previous two exercises are equivalent to the original definition.
- Write a program Pell.java that takes a command-line argument N and prints out the first N Pell numbers: p0 = 0, p1 = 1, and for n >= 2, pn = 2 pn-1 + pn-2. Print out the ratio of successive terms and compare to 1 + sqrt(2).
- Consider the following function from program Recursion.java:
public static void mystery(int n) { if (n == 0 || n == 1) return; mystery(n-2); StdOut.println(n); mystery(n-1); }
What does mystery(6) print out? Hint: first figure out what mystery(2), mystery(3), and so forth print out.
- What would happen in the previous exercise if the base case was replaced with the following statement?
if (n == 0) return;
- Consider the following recursive functions.
public static int square(int n) { if (n == 0) return 0; return square(n-1) + 2*n - 1; } public static int cube(int n) { if (n == 0) return 0; return cube(n-1) + 3*(square(n)) - 3*n + 1; }
What is the value of square(5)? cube(5)? cube(123)?
- Consider the following pair of mutually recursive functions. What does g(g(2)) evaluate to?
public static int f(int n) { public static int g(int n) { if (n == 0) return 0; if (n == 0) return 0; return f(n-1) + g(n-1); return g(n-1) + f(n); } }
- Write program to verify that (for small values of n) the sum of the cubes of the first n Fibonacci numbers F(0)^3 + F(1)^3 + ... + F(n)^3 equals (F(3n+4) + (-1)^n * 6 * f(n-1)) / 10, where F(0) = 1, F(1) = 1, F(2) = 2, and so forth.
- Transformations by increment and unfolding. Given two integers a ≤ b, write a program Sequence.java that transforms a into b by a minimum sequence of increment (add 1) and unfolding (multiply by 2) operations. For example,
% java Sequence 5 23 23 = ((5 * 2 + 1) * 2 + 1) % java Sequence 11 113 113 = ((((11 + 1) + 1) + 1) * 2 * 2 * 2 + 1)
- Hadamard matrix. Write a recursive program Hadamard.java that takes a command-line argument n and plots an N-by-N Hadamard pattern where N = 2n. Do not use an array. A 1-by-1 Hadamard pattern is a single black square. In general a 2N-by-2N Hadamard pattern is obtained by aligning 4 copies of the N-by-N pattern in the form of a 2-by-2 grid, and then inverting the colors of all the squares in the lower right N-by-N copy. The N-by-N Hadamard H(N) matrix is a boolean matrix with the remarkable property that any two rows differ in exactly N/2 bits. This property makes it useful for designing error-correcting codes. Here are the first few Hadamard matrices.
- 8 queens problem. In this exercise, you will solve the classic 8-queens problem: place 8 queens on an 8-by-8 chess board so that no two queens are in the same row, column, or diagonal. There are 8! = 40,320 ways in which no two queens are placed in the same row or column. Any permutation p[] of the integers 0 to 7 gives such a placement: put queen i in row i, column p[i]. Your program Queens.java should take an integer command-line argument n and enumerate all solutions to the n-queens problem by drawing the location of the queens in ASCII like the two solutions below.
* * * Q * * * * * * * * Q * * * * Q * * * * * * * Q * * * * * * * * * * * * Q * * * * Q * * * * * * Q * * * * * * * * * * * Q * * * * * * Q * * * * Q * * * * * * * * * * * * Q * * * * * * * Q * * * * Q * * * * * * * * Q * * Q * * * * * * * Q * * * * * * *
Hint: to determine whether setting q[n] = i conflicts with q[0] through q[n-1]
- if q[i] equals q[n]: two queens are placed in the same column
- if q[i] - q[n] equals n - i: two queens are on same major diagonal
- if q[n] - q[i] equals n - i: two queens are on same minor diagonal
- Another 8 queens solver. Program Queens2.java solves the 8 queens problem by implicitly enumeration all n! permutations (instead of the n^n placements). It is based on program Permutations.java.
- Euclid's algorithm and π. The probability that two numbers chosen from a large random set of numbers have no common factors (other than 1) is 6 / π2. Use this idea to estimate π. Robert Matthews use the same idea to estimate π by taken the set of numbers to be a function of the positions of stars in the sky.
- Towers of Hanoi variant II. (Knuth-Graham and Pathashnik) Solve the original Towers of Hanoi problem, but with the extra restriction that you are not allowed to directly transfer a disk from A to C. How many moves does it take to solve a problem with n disks? Hint: move n-1 smallest disks from A to C recursively (without any direct A to C moves), move disk n from A to B, move n-1 smallest disks from C to A (without any direct A to C moves), move disk N from B to C, and move n-1 smallest disks from A to C recursively (without any direct A to C moves).
- Towers of Hanoi variant III. Repeat the previous question but disallow both A to C and C to A moves. That is, each move must involve pole B.
- Towers of Hanoi with 4 pegs. Suppose that you have a fourth peg. What is the least number of moves needed to transfer a stack of 8 disks from the leftmost peg to the rightmost peg? Answer. Finding the shortest such solution in general has remained an open problem for over a hundred years and is known as Reve's puzzle.
- Another tricky recursive function. Consider the following recursive function. What is f(0)?
public static int f(int x) { if (x > 1000) return x - 4; else return f(f(x+5)); }
- Checking if n is a Fibonacci number. Write a function to check if n is a Fibonacci number. Hint: a positive integer is a Fibonacci number if and only if either (5*n*n + 4) or (5*n*n - 4) is a perfect square.
- Random infix expression generator. Run RandomExpression.java with different command-line argument p between 0 and 1. What do you observe?
public static String expr(double p) { double r = Math.random(); if (r <= 1*p) return "(" + expr(p) + " + " + expr(p) + ")"; if (r <= 2*p) return "(" + expr(p) + " * " + expr(p) + ")"; return "" + (int) (100 * Math.random()); }
- A tricky recurrence. Define F(n) so that F(0) = 0 and F(n) = n - F(F(n-1)). What is F(100000000)?Solution: The answer is related to the Fibonacci sequence and the Zeckendorf representation of a number.
- von Neumann ordinal. The von Neumann integer i is defined as follows: for i = 0, it is the empty set; for i > 0, it is the set containing the von Neumann integers 0 to i-1.
0 = {} = {} 1 = {0} = {{}} 2 = {0, 1} = {{}, {{}}} 3 = {0, 1, 2} = {{}, {{}}, {{}, {{}}}}
Write a program Ordinal.java with a recursive function vonNeumann() that takes a nonnegative integer N and returns a string representation of the von Neumann integer N. This is a method for defining ordinals in set theory.
- Subsequences of a string. Write a program Subsequence.java that takes a string command-line argument s and an integer command-line argument k and prints out all subsequences of s of length k.
% java Subsequence abcd 3 abc abd acd bcd
- Interleaving two strings. Given two strings s and t of distinct characters, print out all (M+N)! / (M! N!) interleavings, where M and N are the number of characters in the two strings. For example, if
s = "ab" t = "CD" abCD CabD aCbD CaDb aCDb CDab
- Binary GCD. Write a program BinaryGCD.java that finds the greatest common divisor of two positive integers using the binary gcd algorithm: gcd(p, q) =
- p if q = 0
- q if p = 0
- 2 * gcd(p/2, q/2) if p and q are even
- gcd(p/2, q) if p is even and q is odd
- gcd(p, q/2) if p is odd and q is even
- gcd((p-q)/2, q) if p and q are odd and p >= q
- gcd(p, (q-p)/2) if p and q are odd and p < q
- Integer partitions. Write a program Partition.java that takes a positive integer N as a command-line argument and prints out all partitions of N. A partition of N is a way to write N as a sum of positive integers. Two sums are considered the same if they only differ in the order of their constituent summands. Partitions arise in symmetric polynomials and group representation theory in mathematics and physics.
% java Partition 4 % java Partition 6 4 6 3 1 5 1 2 2 4 2 2 1 1 4 1 1 1 1 1 1 3 3 3 2 1 3 1 1 1 2 2 2 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1
- Johnson-Trotter permutations. Write a program JohnsonTrotter.java that takes an integer command-line argument n and prints all n! permutations of the integer 0 through n-1 in such a way that consecutive permutations differ in only one adjacent transposition (similar to way Gray code iterates over combinations in such a way that consecutive combinations differ in only one bit).
% java JohnsonTrotter 3 012 (2 1) 021 (1 0) 201 (2 1) 210 (0 1) 120 (1 2) 102 (0 1)
- Permutations in lexicographic order. Write a program PermutationsLex.java that take a command-line argument N and prints out all N! permutations of the integer 0 through N-1 in lexicographic order.
% java PermutationsLex 3 012 021 102 120 201 210
- Derangements. A derangement is a permutation p[] of the integers from 0 to N-1 such that p[i] doesn't equal i for any i. For example there are 9 derangements when N = 4: 1032, 1230, 1302, 2031, 2301, 2310, 3012, 3201, 3210. Write a program to count the number of derangements of size N using the following recurrence: d[N] = (N-1) (d[N-1] + d[N-2]), where d[1] = 0, d[2] = 1. The first few terms are 0, 1, 2, 9, 44, 265, 1854, 14833, 133496, and 1334961.
- Tribonacci numbers. The tribonacci numbers are similar to the Fibonacci numbers, except that each term is the sum of the three previous terms in the sequence. The first few terms are 0, 0, 1, 1, 2, 4, 7, 13, 24, 44, 81. Write a program to compute tribonacci numbers. What is the ratio successive terms? Answer. Root of x^3 - x^2 - x - 1, which is approximately 1.83929.
- Sum of first n Fibonacci numbers. Prove by induction that the sum of the first n Fibonacci numbers F(1) + F(2) + ... + F(N) is F(N+2) - 1.
- Combinational Gray code. Print out all combination of k of n items in such a way that consecutive combinations differ in exactly one element, e.g., if k = 3 and n = 5, 123, 134, 234, 124, 145, 245, 345, 135, 235, 125. Hint: use the Gray code, but only print out those integers with exactly k 1's in their binary representation.
- Maze generation. Create a maze using divide-and-conquer: Begin with a rectangular region with no walls. Choose a random gridpoint in the rectangle and construct two perpendicular walls, dividing the square into 4 subregions. Choose 3 of the four regions at random and open a one cell hole at a random point in each of the 3. Recur until each subregion has width or height 1.
- Plasma clouds. Program PlasmaCloud.java takes a command-line argument N and produces a random N-by-N plasma fractal using the midpoint displacement method.
Here's an 800-by-800 example. Here's a reference, including a simple 1D version. Note: some visual artifacts are noticeable parallel to the x and y axes. Doesn't have all of the statistical properties of 2D fractional Brownian motion.
- Fern fractal. Write a recursive program to draw a fern or tree, as in this fern fractal demo.
- Integer set partition. Use memoization to develop a program that solves the set partition problem for positive integer values. You may use an array whose size is the sum of the input values.
- Voting power. John F. Banzhaf III proposed a ranking system for each coalition in a block voting system. Suppose party i control w[i] votes. A strict majority of the votes is needed to accept or reject a proposal. The voting power of party i is the number of minority coalitions it can join and turn it into a winning majority coalition. Write a program VotingPower.java that takes in a list of coalition weights as command-line argument and prints out the voting power of each coalition. Hint: use Schedule.java as a starting point.
- Scheduling on two parallel machines. Program Schedule.java takes a command-line argument N, reads in N real number of standard input, and partitions them into two groups so that their difference is minimized.
- Conway's sequence. Consider the following recursive function. f(n) = f(f(n-1)) + f(n-f(n-1)) for n > 2 and f(1) = f(2) = 1. Compute f(3). Write a Java program to compute the first 50 values of f(n) in the sequence. Use dynamic programming. Conway's sequence has many interesting properties and connects with Pascal's triangle, the Gaussian distribution, Fibonacci numbers, and Catalan numbers.
- Running time recurrences. Use dynamic programming to compute a table of values T(N), where T(N) is the solution to the following divide-and-conquer recurrence. T(1) = 0, T(N) = N + T(N/2) + T(N - N/2) if N > 1.
- Gas station optimization. You are driving from Princeton to San Francisco in a car that gets 25 miles per gallon and has a gas tank capacity of 15 gallons. Along the way, there are N gas stations where you can stop for gas. Gas station i is d[i] miles into the trip and sells gas for p[i] dollars per gallon. If you stop at station i for gas, you must completely fill up your tank. Assume that you start with a full tank and that the d[i] are integers. Use dynamic programming to find a minimum cost sequence of stops.
- Unix diff. The Unix diff program compares two files line-by-line and prints out places where they differ. Write a program Diff.java that reads in two files specified at the command line one line at a time, computes the LCS on the sequence of constituent lines of each file, and prints out any lines corresponding to non-matches in the LCS.
- Longest common subsequence of 3 strings. Given 3 strings, find the longest common subsequence using dynamic programming. What is the running time and memory usage of your algorithm?
- Making change. Given A hundred dollar bills, B fifty dollar bills, C twenty dollar bills, D ten dollar bills, E five dollar bills, F one dollar bills, G half-dollars, H quarters, I dimes, J nickels, and K pennies, determine whether it is possible to make change for N cents. Hint: knapsack problem. (Greedy also works.)
- Making change. Suppose that you are a cashier in a strange country where the currency denominations are: 1, 3, 8, 16, 22, 57, 103, and 526 cents (or more generally d0, d1, ..., dN-1. Describe a dynamic programming algorithm to make change for c cents using the fewest number of coins. Hint: the greedy algorithm won't work since the best way to change 114 cents is 57 + 57 instead of 103 + 8 + 3.
- Longest increasing sequence. Given an array of N 64-bit integers, find the longest subsequence that is strictly increasing.Hint. Compute the longest common subsequence between the original array and a sorted version of the array where duplicate copies of an integer are removed.
- Longest common increasing sequence. Computational biology. Given two sequences of N 64-bit integers, find the longest increasing subsequence that is common to both sequences.
- Activity selection with profits. Job i has start time s_i, finish time f_i and profit p_i. Find best subset of jobs to schedule.
- Diff. Write a program that reads in two files and prints out their diff. Treat each line as a symbol and compute an LCS. Print out those lines in each file that aren't in the LCS.
- Knapsack problem. Knapsack.java.
- Text justification. Write a program that takes a command line argument N, reads text from standard input, and prints out the text, formatted nicely with at most N characters per line. Use dynamic programming.
- Viterbi algorithm. Given a directed graph where each edge is labeled with a symbol from a finite alphabet. Is there a path from one distinguished vertex x that matches the characters in the string s? Dynamic programming. A(i, v) = 0 or 1 if there is a path from x to v that consumes the first i characters of s. A(i, v) = max (A(i-1, u) : (u, v) in E labeled with s[i]).
- Viterbi algorithm. Speech recognition, handwriting analysis, computational biology, hidden Markov models. Suppose each edge leaving v has a probability p(v, w) of being traversed. Probability of a path is the product of the probability on that path. What is most probable path? Dynamic programming.
- Smith–Waterman algorithm. Local sequence alignment.
- Binomial coefficients (brute-force). The binomial coefficient C(n, k) is the number of ways of choosing a subset of k elements from a set of n elements. It arises in probability and statistics. One formula for computing binomial coefficients is C(n, k) = n! / (k! (n-k)!). This formula is not so amenable to direct computation because the intermediate results may overflow, even if the final answer does not. For example C(100, 15) = 253338471349988640 fits in a 64-bit long, but the binary representation of 100! is 525 bits long.Pascal's identity expresses C(n, k) in terms of smaller binomial coefficients:
SlowBinomial.java fails spectacularly for medium n or k, not because of overflow, but rather because the same subproblems are solved repeatedly. // DO NOT RUN THIS CODE FOR LARGE INPUTS public static long binomial(int n, int k) { if (k == 0) return 1; if (n == 0) return 0; return binomial(n-1, k) + binomial(n-1, k-1); }
- Binomial coefficients (dynamic programming). Write a program Binomial.java that takes two command-line arguments n and k and uses bottom-up dynamic programming to compute C(n, k).
Copyright © 2000–2019 Robert Sedgewick and Kevin Wayne. All rights reserved.