Data Structures and Algorithms Using Python and C++, by David M. Reed and John Zelle (Franklin, Beedle and Associates).

Data Structures and Algorithms Using Python and C++


Data Structures and Algorithms Using Python and C++, by David M. Reed and John Zelle (Franklin, Beedle and Associates), is intended for use in a traditional college-level data structures course (commonly known as CS2). The book assumes that students have already learned the basic syntax of Python.

In fact, some software engineering methods employ fully formal mathematical notations for specifying all system components.

The use of these so-called formal methods adds precision to the development process by allowing properties of programs to be stated and proved mathematically.

In the best case, one might actually be able to prove the correctness of a program, that is, that the code of a program faithfully implements its specification. Using such methods requires substantial mathematical prowess and has not been widely adopted in industry. For now, we'll stick with somewhat less formal specifications but use well-known mathematical and programming notations where they seem appropriate and helpful.

Another important consideration is where to place specifications in code. In Python, a developer has two options for placing comments into code: ordinary comments (introduced with #) and docstrings. Docstrings are carried along with the objects to which they are attached and are inspectable at run-time. Docstrings are also used by the internal Python help system and by the PyDoc documentation utility. This makes docstrings a particularly good medium for specifications, since API documentation can then be created automatically using PyDoc.

As a rule of thumb, docstrings should contain information that is of use to client programmers, while internal comments should be used for information that is intended only for the implementers. The basic idea of design by contract requires that if a function's precondition is met when it is called, then the postcondition must be true at the end of the function.

If the precondition is not met, then all bets are off. This raises an interesting question: what should the function do when the precondition is not met? From the standpoint of the specification, it does not matter what the function does in this case; it is "off the hook," so to speak.

If you are the implementer, you might be tempted to simply ignore any precondition violations. Sometimes, this means executing the function body will cause the program to immediately crash; other times the code might run, but produce nonsensical results.

Neither of these outcomes seems particularly good. A better approach is to adopt defensive programming practices. An unmet precondition indicates a mistake in the program. Rather than silently ignoring such a situation, you can detect the mistake and deal with it. But how exactly should the function do this?

One idea might be to have it print an error message; the sqrt function might include some code to that effect. The problem with printing an error message like this is that the calling program has no way of knowing that something has gone wrong. The output might appear, for example, in the middle of a generated report.
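As a sketch (in modern Python 3 syntax, with an illustrative message), such ill-advised code might look like this:

```python
import math

def sqrt(x):
    # Ill-advised: report a precondition violation by printing.
    # The calling program has no programmatic way to detect the failure.
    if x < 0:
        print("ERROR: sqrt called with a negative value")
        return None
    return math.sqrt(x)
```

Nothing about the printed message reaches the client program itself.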

Furthermore, the actual error message might go unnoticed. In fact, if this is a general-purpose library, it's very possible that the sqrt function is called within a GUI program, and the error message will not even appear anywhere at all. Most of the time, it is simply not appropriate for a function that implements a service to print out messages unless printing something is part of the specification of the method.

It would be much better if the function could somehow signal that an error has occurred and then let the client program decide what to do about the problem. For some programs, the appropriate response might be to terminate the program and print an error message; in other cases, the program might be able to recover from the error.

The point is that such a decision can be made only by the client. The function could signal an error in a number of ways. Sometimes, returning an out-of-range result is used as a signal: since the specification of sqrt clearly implies that the return value cannot be negative, the value -1 can be used to indicate an error.
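A sketch of this sentinel approach, assuming the same sqrt service:

```python
import math

def sqrt(x):
    # Signal failure with an out-of-range result: a real square root
    # is never negative, so -1 can safely serve as an error flag.
    if x < 0:
        return -1
    return math.sqrt(x)
```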

Client code can check the result to see if it is OK. Another technique that is sometimes used is to have a global variable (accessible to all parts of the program) that records errors. The client code checks the value of this variable after each operation to see if there was an error. Of course, the problem with this ad hoc approach to error detection is that a client program can become riddled with decision structures that constantly check to see whether an error has occurred.

The logic of the code starts looking something like this: the continual error checking with each operation obscures the intent of the original algorithm. Most modern programming languages now include exception handling mechanisms that provide an elegant alternative for propagating error information in a program. The basic idea behind exception handling is that program errors don't directly lead to a "crash"; rather, they cause the program to transfer control to a special section called an exception handler.
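A hypothetical client of a sentinel-returning sqrt shows how the checks pile up (the quadratic-root function is purely illustrative):

```python
import math

def sqrt(x):
    # Sentinel-style sqrt: -1 signals a domain error.
    return -1 if x < 0 else math.sqrt(x)

def solve_quadratic(a, b, c):
    # The real algorithm is a few lines of arithmetic, but every
    # operation that can fail needs its own explicit check, and the
    # error must be propagated by hand.
    discriminant = b * b - 4 * a * c
    root = sqrt(discriminant)
    if root == -1:      # did the sqrt fail?
        return None     # ...pass the failure along to our own caller
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))
```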

What makes this particularly useful is that the client does not have to explicitly check whether an error has occurred. The client just needs to say, in effect, "here's the code I want to execute, and here's what to do should any errors come up." In Python, run-time errors generate exception objects. A program can include a try statement to catch and deal with these errors.

For example, taking the square root of a negative number causes Python to generate a ValueError, which is a subclass of Python's general Exception class.

If this exception is not handled by the client, it results in program termination. The statements indented under try are executed, and if an error occurs, Python sees whether the error matches the type listed in any except clause.

The first matching except block is executed. If no except matches, then the program halts with an error message.
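A minimal sketch of the client-side pattern, catching the ValueError that math.sqrt raises for negative input:

```python
import math

def safe_sqrt(x):
    # Try the operation; if a ValueError comes up, handle it here
    # instead of letting the program terminate.
    try:
        return math.sqrt(x)
    except ValueError:
        return None   # the client decides what recovery means
```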

To take advantage of exception handling for testing preconditions, we just need to test the precondition in a decision and then generate an appropriate exception object. This is called raising an exception and is accomplished by the Python raise statement. When a raise statement executes, it causes the Python interpreter to interrupt the current operation and transfer control to an exception handler.

If no suitable handler is found, the program will terminate. The sqrt function in the Python library checks to make sure that its parameter is non-negative and also that the parameter has the correct type either int or float.

The code for sqrt could implement these checks as follows: notice that no else clauses are required on these conditions. When a raise executes, it effectively terminates the function, so the "compute square root" portion will only execute if the preconditions are met. Oftentimes, it is not really important which specific exception is raised when a precondition violation is detected.
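The library's actual source differs, but the checks described can be sketched like this:

```python
import math

def sqrt(x):
    """Compute the square root of x.
    pre: x is an int or float, and x >= 0
    post: returns the non-negative square root of x"""
    if not isinstance(x, (int, float)):
        raise TypeError("sqrt requires a numeric argument")
    if x < 0:
        raise ValueError("math domain error")
    # No else is needed: a raise terminates the function, so this
    # line runs only when both preconditions are met.
    return math.sqrt(x)
```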

The important thing is that the error is diagnosed as early as possible. Python provides a statement for embedding assertions directly into code. The statement is called assert. It takes a Boolean expression and raises an AssertionError exception if the expression does not evaluate to True. Using assert makes it particularly easy to enforce preconditions.

As you can see, the assert statement is a very handy way of inserting assertions directly into your code. This effectively turns the documentation of preconditions and other assertions into extra testing that helps to ensure that programs behave correctly, that is, according to specifications. One potential drawback of this sort of defensive programming is that it adds extra overhead to the execution of the program.
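For example, the precondition check on sqrt collapses to a single line:

```python
import math

def sqrt(x):
    # assert raises AssertionError when its condition is False,
    # documenting and enforcing the precondition at the same time.
    assert x >= 0, "sqrt requires a non-negative argument"
    return math.sqrt(x)
```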

A few CPU cycles will be consumed checking the preconditions each time a function is called. However, given the ever-increasing speed of modern processors and the potential hazards of incorrect programs, that is a price that is usually well worth paying.

That said, one additional benefit of the assert statement is that it is possible to turn off the checking of assertions, if desired. Executing Python with a -O switch on the command line causes the interpreter to skip testing of assertions.

That means it is possible to have assertions on during program testing but turn them off once the system is judged to be working and placed into production. Of course, checking assertions during testing and then turning them off in the production system is akin to practicing a tightrope act 10 feet above the ground with a safety net in place and then performing the actual stunt high above the ground on a windy day without the net. As important as it is to catch errors during testing, it's even more important to catch them when the system is in use.

Our advice is to use assertions liberally and leave the checking turned on. One popular technique for designing programs that you probably already know about is top-down design. Top-down design is essentially the direct application of functional abstraction to decompose a large problem into smaller, more manageable components.

As an example, suppose you are developing a program to help your instructor with grading. Your instructor wants a program that takes a set of exam scores as input and prints out a report that summarizes student performance. Specifically, the program should report the following statistics about the data:

- The maximum: the largest number in the data set.
- The minimum: the smallest number in the data set.
- The mean: the "average" score in the data set. It is often denoted x̄ and calculated using this formula: x̄ = (x1 + x2 + ... + xn) / n.
- The standard deviation: a measure of how spread out the scores are.

The Seven Spiritual Laws of Yoga: A Practical Guide to Healing

The standard deviation is given by the following formula:

    s = sqrt( Σ (x̄ - xi)² / (n - 1) )

In this formula x̄ is the mean, xi represents the ith data value, and n is the number of data values. The formula looks complicated, but it is not hard to compute. The expression (x̄ - xi)² is the square of the "deviation" of an individual item from the mean. The numerator of the fraction is the sum of the deviations squared across all the data values. As a starting point for this program, you might develop a simple algorithm such as this:

    Get scores from the user
    Calculate the minimum score
    Calculate the maximum score
    Calculate the average (mean) score
    Calculate the standard deviation

Suppose you are working with a friend to develop this program. You could divide this algorithm up into parts and each work on various pieces of the program.

Before going off and working on the pieces, however, you will need a more complete design to ensure that the pieces that each of you develops will fit together to solve the problem. Using top-down design, each line of the algorithm can be written as a separate function. The design will just consist of the specification for each of these functions. One obvious approach is to store the exam scores in a list that can be passed as a parameter to various functions.

Using this approach, here is a sample design: each function takes the list of scores as a parameter and is specified with pre- and postconditions. With the specification of these functions in hand, you and your friend should easily be able to divvy up the functions and complete the program in no time. Let's implement one of the functions just to see how it might look. Notice how this code relies on the average function.


Since we have that function specified, we can go ahead and use it here with confidence, thus avoiding duplication. This is a convenient way of accumulating a sum. The rest of the program is left for you to complete. As you can see, top-down design and functional specification go hand in hand. As necessary functionality is identified, a specification formalizes the design decisions so that each part can be worked on in isolation.
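A sketch of that implementation (the names average and stdDev come from the design; the divisor n - 1 matches the standard deviation formula given earlier):

```python
import math

def average(nums):
    # Accumulate a running sum, then divide by the number of scores.
    total = 0
    for score in nums:
        total = total + score
    return total / len(nums)

def stdDev(nums):
    # Relies on the already-specified average function rather than
    # duplicating its logic.
    xbar = average(nums)
    sumDevSq = 0
    for score in nums:
        dev = xbar - score
        sumDevSq = sumDevSq + dev * dev
    return math.sqrt(sumDevSq / (len(nums) - 1))
```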

You should have no trouble finishing up this program. In order for specifications to be effective, they must spell out the expectations of both the client and the implementation of a service.

Any effect of a service that is visible to the client should be described in the postcondition. Consider a version of average that uses the pop method of Python lists; this is bad code, so don't use it. The call to nums.pop() removes an item from the list and returns it, and the loop continues until all the items in the list have been processed. However, the list object nums passed as a parameter is mutable, and the changes to the list will be visible to the client. These sorts of interactions between function calls and other parts of a program are called side effects.
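A reconstruction of that bad version (again: don't use it):

```python
def average(nums):
    # Bad code. Don't use it: pop() mutates the caller's list.
    count = len(nums)
    total = 0
    while len(nums) > 0:
        total = total + nums.pop()   # removes and returns the last item
    # The loop ends once every item has been processed, but as a side
    # effect the client's list is now empty.
    return total / count
```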

Generally, it's a good idea to avoid side effects in functions, but a strict prohibition is too strong. Some functions are designed to have side effects. The pop method of the list class is a good example. It's used in the case where one wants to get a value and also, as a side effect, remove the value from the list. What is crucial is that any side effects of a function should be documented in its specification. The only visible effects of a function should be those that are described in its postcondition.

By the way, printing something or placing information in a file are also examples of side effects. When we said above that functions should generally not print anything unless that is part of their stated functionality, we were really just identifying one special case of potentially undocumented side effects.

When we start dealing with programs that contain collections of data, we often need to know more about a function than just its pre- and postconditions. Dealing with a modest list of exam scores is no problem, but a list of customers for an online business might contain tens or hundreds of thousands of items.

A programmer working on problems in biology might have to deal with a DNA sequence containing millions or even billions of nucleotides. Applications that search and index web pages have to deal with collections of a similar magnitude.

When collection sizes get large, the efficiency of an algorithm can be just as critical as its correctness. An algorithm that gives a correct answer but requires 10 years of computing time is not likely to be very useful. Algorithm analysis allows us to characterize algorithms according to how much time and memory they require to accomplish a task. In this section, we'll take a first look at techniques of algorithm analysis in the context of searching a collection.

Searching is the process of looking for a particular value in a collection.

For example, a program that maintains the membership list for a club might need to look up the information about a particular member. This involves some form of a search process. It is a good problem for us to examine because there are numerous algorithms that can be used, and they differ in their relative efficiency. Boiling the problem down to its simplest essence, we'll consider the problem of finding a particular number in a list. The same principles we use here will apply to more complex searching problems such as searching through a customer list to find those who live in Iowa.


The specification for our simple search problem looks like this:. II lI lI Locate target in item s pre: Here are a couple interactive examples that illustrate its behavior: In the first example, the function returns the index where 4 appears in the list. In the second example, the return value -1 indicates that 7 is not in the list. Using the built-in Python list methods, the search function is easily imple mented: The index method returns the first position in the list where a target value occurs.

If target is not in the list, index raises a ValueError exception. In that case, we catch the exception and return -1. Clearly, this function meets the specification; the interesting question for us is: how efficient is this method?
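A sketch of that implementation:

```python
def search(items, target):
    """Locate target in items.
    pre: items is a list
    post: returns the index of the first occurrence of target
          in items, or -1 if target is not in items"""
    try:
        return items.index(target)
    except ValueError:
        # index raises ValueError when target is absent
        return -1
```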

One way to determine the efficiency of an algorithm is to do empirical testing. We can simply code the algorithm and run it on different data sets to see how long it takes.

A simple method for timing code in Python is to use the time module's time function, which returns the number of seconds that have passed since January 1, 1970. We can just call that method before and after our code executes and print the difference between the times.

If we placed our search function in a module named search1.py, we could test it directly like this: try the code on your computer and note the time to search for the three numbers. What does that tell you about how the index method works? By the way, the Python library contains a module called timeit that provides a more accurate and sophisticated way of timing code. If you are doing much empirical testing, it's worth checking out this module. Let's try our hand at developing our own search algorithm using a simple "be the computer" strategy.
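A timing harness along those lines might look like this (the list contents and probe value are illustrative; the search function would normally be imported from its module):

```python
import time

def search(items, target):
    # the index-based implementation under test
    try:
        return items.index(target)
    except ValueError:
        return -1

nums = list(range(1000000))       # a million numbers, in order
start = time.time()               # seconds since January 1, 1970
position = search(nums, 999999)   # worst case: target at the far end
elapsed = time.time() - start
print("search took", elapsed, "seconds")
```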

Suppose that I give you a page full of numbers in no particular order and ask whether the number 13 is in the list. How will you solve this problem? If you are like most people, you simply scan down the list comparing each value to 13. When you see 13 in the list, you quit and tell me that you found it. If you get to the very end of the list without seeing 13, then you tell me it's not there.


This strategy is called a linear search. You are searching through the list of items one by one until the target value is found. This algorithm translates directly into simple code. You can see here that we have a simple for loop to go through the valid indexes for the list, range(len(items)). We test the item at each position to see if it is the target. If the target is found, the loop terminates by immediately returning the index of its position.

If this loop goes all the way through without finding the item, the function returns -1. One problem with writing the function this way is that the range expression creates a list of indexes that is the same size as the list being searched.
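In Python 3 syntax (where range no longer builds a list), the loop described reads:

```python
def search(items, target):
    # Linear search: examine each position in turn.
    for i in range(len(items)):
        if items[i] == target:
            return i   # found it; report the position immediately
    return -1          # exhausted the list without finding target
```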

Since an int generally requires four bytes (32 bits) of storage space, the index list in our test code would require four megabytes of memory for a list of one million numbers. In addition to the memory usage, there would also be considerable time wasted creating this second large list. Python has an alternative form of the range function called xrange that could be used instead. An xrange is used only for iteration; it does not actually create a list.

However, the use of xrange is discouraged in new Python code; a more idiomatic option is the built-in enumerate function. This elegant alternative allows you to iterate through a list and, on each iteration, you are handed the next index along with the next item. Here's how the search looks using enumerate. Notice that all of these search functions implement the same algorithm, namely linear search. How efficient is this algorithm? To get an idea, you might try experimenting with it.
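A sketch of the enumerate version:

```python
def search(items, target):
    # enumerate hands back each index together with its item,
    # without materializing a separate list of indexes.
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1
```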

Try timing the search for the three values as you did using the list index method.

The only code you need to change is the import of the actual search function, since the parameters and return values are the same. Because we wrote to a specification, the client code does not need to change, even when different implementations are mixed and matched. This is implementation independence at work. Pretty cool, huh? The linear search algorithm was not hard to develop, and it will work very nicely for modest-sized lists. For an unordered list, this algorithm is as good as any.

The Python in and index operations both implement linear searching algorithms. If we have a very large collection of data, we might want to organize it in some way so that we don't have to look at every single item to determine where, or if, a particular value appears in the list.

Suppose that the list is stored in sorted order lowest to highest. As soon as we encounter a value that is greater than the target value, we can quit the linear search without looking at the rest of the list. On average, that saves us about half of the work. But if the list is sorted, we can do even better than this.

When a list is ordered, there is a much better searching strategy, one that you probably already know.


Have you ever played the number guessing game? I pick a number between 1 and 100, and you try to guess what it is. Each time you guess, I will tell you if your guess is correct, too high, or too low. What is your strategy? If you play this game with a very young child, they might well adopt a strategy of simply guessing numbers at random.

An older child might employ a systematic approach corresponding to linear search, guessing 1, 2, 3, 4, and so on until the mystery value is found. Of course, virtually any adult will first guess 50. If told that the number is higher, then the range of possible values is 51 to 100, and the next logical guess is 75. Each time we guess the middle of the remaining numbers to try to narrow down the possible range. This strategy is called a binary search. Binary means two, and at each step, we are dividing the remaining numbers into two parts.

We can employ a binary search strategy to look through a sorted list. The basic idea is that we use two variables to keep track of the endpoints of the range in the list where the item could be. Initially, the target could be anywhere in the list, so we start with variables low and high set to the first and last positions of the list, respectively. The heart of the algorithm is a loop that looks at the item in the middle of the remaining range to compare it to x. If x is smaller than the middle item, then we move high, so that the search is narrowed to the lower half.

If x is larger, then we move low, and the search is narrowed to the upper half. The loop terminates when x is found or there are no longer any places to look (i.e., when low > high). The code below implements a binary search using our same search API. This algorithm is quite a bit more sophisticated than the simple linear search. You might want to trace through a couple of sample searches to convince yourself that it actually works.
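A sketch of that binary search, using the same search API:

```python
def search(items, target):
    # Binary search: items must already be in ascending order.
    low = 0
    high = len(items) - 1
    while low <= high:             # there is still a range to search
        mid = (low + high) // 2    # position of the middle item
        if items[mid] == target:
            return mid             # found it!
        elif target < items[mid]:
            high = mid - 1         # narrow the search to the lower half
        else:
            low = mid + 1          # narrow the search to the upper half
    return -1                      # no more places to look
```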

So far, we have developed two very different algorithms for our simple searching problem. Which one is better? Well, that depends on what exactly we mean by better.

The linear search algorithm is much easier to understand and implement. On the other hand, we expect that the binary search is more efficient, because it doesn't have to look at every value in the list. Intuitively, then, we might expect the linear search to be a better choice for small lists and binary search a better choice for larger lists.

How could we actually confirm such intuitions? One approach would be to do an empirical test. We could simply code both algorithms and try them out on various-sized lists to see how long the search takes. These algorithms are both quite short, so it would not be difficult to run a few experiments.

When this test was done on one of our computers (a somewhat dated laptop), linear search was faster for lists of length 10 or less, and there was not much noticeable difference for lists of moderate length. After that, binary search was clearly better. For a list of a million elements, linear search averaged over 2 seconds, while binary search finished almost instantly. The empirical analysis has confirmed our intuition, but these are results from one particular machine under specific circumstances (amount of memory, processor speed, current load, etc.).

How can we be sure that the results will always be the same? Another approach is to analyze our algorithms abstractly to see how efficient they are.

Other factors being equal, we expect the algorithm with the fewest number of "steps" to be the more efficient. But how do we count the number of steps? For example, the number of times that either algorithm goes through its main loop will depend on the particular inputs.

We have already guessed that the advantage of binary search increases as the size of the list increases.


Computer scientists attack these problems by analyzing the number of steps that an algorithm will take relative to the size or difficulty of the specific problem instance being solved.

For searching, the difficulty is determined by the size of the collection. Obviously, it takes more steps to find a number in a collection of a million than it does in a collection of ten. The pertinent question is how many steps are needed to find a value in a list of size n.

We are particularly interested in what happens as n gets very large. Let's consider the linear search first. If we have a list of 10 items, the most work our algorithm might have to do is to look at each item in turn. The loop will iterate at most 10 times. Suppose the list is twice as big. Then we might have to look at twice as many items.

If the list is three times as large, it will take three times as long, etc. In general, the amount of time required is linearly related to the size of the list n. This is what computer scientists call a linear time algorithm. Now you really know why it's called a linear search.

What about the binary search? Let's start by considering a concrete example. Suppose the list contains 16 items. Each time through the loop, the remaining range is cut in half.

After one pass, there are eight items left to consider. The next time through there will be four, then two, and finally one. How many times will the loop execute? It depends on how many times we can halve the range before running out of data.

This table might help you sort things out:

    List Size    Halvings
        1           0
        2           1
        4           2
        8           3
       16           4

Can you see the pattern here? Each extra iteration of the loop allows us to search a list that is twice as large. If the binary search loops i times, it can find a single value in a list of size 2^i. Each time through the loop, it looks at one value (the middle) in the list.

To see how many items are examined in a list of size n, we need to solve this relationship: n = 2^i. In this formula, i is just an exponent with a base of 2. Using the appropriate logarithm gives us this relationship: i = log2 n. If you are not entirely comfortable with logarithms, just remember that this value is the number of times that a collection of size n can be cut in half. OK, so what does this bit of math tell us? Binary search is an example of a log time algorithm. The amount of time it takes to solve a given problem grows as the log of the problem size.

In the case of binary search, each additional iteration doubles the size of the problem that we can solve. You might not appreciate just how efficient binary search really is. Let's try to put it in perspective. Suppose you have a New York City phone book with, say, 12 million names listed in alphabetical order.

You walk up to a typical New Yorker on the street and make the following proposition (assuming their number is listed): each time I guess a name, you tell me if your name comes alphabetically before or after the name I guess; how many guesses will it take me to determine your name?

Our analysis above shows the answer to this question is log2 12,000,000. If you don't have a calculator handy, here is a quick way to estimate the result: 2^10 = 1024, or roughly 1,000, so 2^20 is roughly 1,000 squared, or one million. So, searching a million items requires only 20 guesses. Continuing on, we need 21 guesses for two million, 22 for four million, 23 for eight million, and 24 guesses to search among sixteen million names. We can figure out the name of a total stranger in New York City using only 24 guesses!

By comparison, a linear search would require on average 6 million guesses. Binary search is a phenomenally good algorithm!

We said earlier that Python uses a linear search algorithm to implement its built in searching methods. If a binary search is so much better, why doesn't Python use it? The reason is that the binary search is less general; in order to work, the list must be in order.

If you want to use binary search on an unordered list, the first thing you have to do is put it in order, or sort it. This is another well-studied problem in computer science, and one that we will return to later on. In the comparison between linear and binary searches, we characterized both algorithms in terms of the number of abstract steps required to solve a problem of a given size. We determined that linear search requires a number of steps directly proportional to the size of the list, whereas binary search requires a number of steps proportional to the base 2 log of the list size.

The nice thing about this characterization is that it tells us something about these algorithms independent of any particular implementation. We expect binary search to do better on large problems because it is an inherently more efficient algorithm. When doing this kind of analysis, we are not generally concerned with the exact number of instructions an algorithm requires to solve a specific problem.

This is extremely difficult to determine, since it will vary depending on the actual machine language of the computer, the language we are using to implement the algorithm, and in some cases, as we saw with the searching algorithms, the specifics of the particular input. Instead, we abstract away many issues that affect the exact running time of an implementation of an algorithm; in fact, we can ignore all the details that do not affect the relative performance of an algorithm on inputs of various sizes.

Always keep in mind that our goal is to determine how the algorithm will perform on large inputs. After all, computers are fast; for small problems, efficiency is unlikely to be an issue. To summarize, in performing algorithm analysis, we can generally make the following simplifications.

We ignore the differences caused by using different languages and different machines to implement the algorithm. We ignore the differences in execution speed of various operations.

We assume all constant-time operations that are independent of the input size are equivalent (i.e., each counts as a single step). Obviously, each of these simplifications could make a significant difference in comparing the actual running time of two algorithms, or even two implementations of the same algorithm, but the result still shows us what to expect as a function of the input size.

Hence, the results do tell us what kind of relative performance to expect for larger problems. Computer scientists use a notation known as big O (or asymptotic) notation to specify the efficiency of an algorithm based on these simplifications. Before looking at the details of big O notation, let's look at a couple of simple mathematical functions to gain some intuition. Suppose you are trying to estimate the value of a polynomial such as n² + 100n + 1000 as n grows very large. You would be justified in considering only the first term.

Although for smaller values of n the 100n term dominates, when n gets large, the contributions of the lower-order terms are insignificant. To see why the first term dominates as n increases, you just have to look at the "shape" of the graphs for the first and second terms (see Figure 1). No matter what constants we multiply these functions by, the shape of the two graphs dictates that for sufficiently large values, the curve for n² will eventually dominate.

The idea of a dominating function is formalized in big O notation. To prove an algorithm is O(n²), we would have to find those two constants (a multiplier and a threshold beyond which the bound holds). In most cases, we do not need to care about having a tight bound. If an algorithm is 2n³, can we find two constants to prove it is O(n³)?

In practice, we generally do not worry about finding the constants. In most cases, it is fairly easy to convince ourselves of the relative growth rate. It should be clear that for any polynomial, it is the largest degree that matters, so any polynomial of degree x is O(n^x). Now that you've seen the mathematical details, let's look at some short examples and determine the running time.

This code fragment is O(n). The input size, n, determines how many operations occur: the print statement will be executed n times, while the input statement will be executed once. If we think about how the for statement works, we realize that the range statement generates a list of n items, which itself takes at least n steps.
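A fragment of that shape might look like this (returning the values as a list so the behavior is easy to inspect; the helper name is illustrative):

```python
def fragment(n):
    # n plays the role of the value read from input (obtained once);
    # the loop body then executes n times, so the fragment is O(n).
    lines = []
    for i in range(n):
        lines.append(i)   # stands in for the print statement
    return lines
```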

Program assertions document a program by stating what must be true at a given point of execution. The big O notation allows us to extrapolate and determine how long our program will take to run on a larger data set.

The rank and suit operations simply unpackage the appropriate part of the card tuple.

After completing the discussion of linked structures in Chapter 4, the basic concepts of stacks and queues can be covered quickly, or the example applications can be used to continue developing algorithm and design skills.

Read up on the recommended coding style for your language and stick to it. A code fragment that runs in the same constant time regardless of the input is referred to as simply O(1).