Data Structures (数据结构) Course 1: Pseudocode, ADT, Algorithm Efficiency


Vocabulary
• Pseudocode (伪代码)
• Algorithm (算法): a series of basic steps that solves a problem.
• Atomic data (原子数据): data that cannot be broken down further.
• Composite data (复合数据)
• Modular programming (模块化程序设计)
• Object-oriented programming (面向对象程序设计)
• Encapsulation (封装): one of the key ideas of object-oriented programming.
• Abstract data type (抽象数据类型)
• Algorithm efficiency (算法效率)

Highlights in this chapter
• Pseudocode
• Abstract data type
• Algorithm efficiency analysis

About Pseudocode
• Algorithm: a list of instructions that are carried out in a fixed order to find the answer to a question.
• Tools to define algorithms: several tools can be used to define algorithms, for example pseudocode and flowcharts.
• Pseudocode: one of the most common tools for defining algorithms. It is part English, part structured code.

About Pseudocode
• English part: provides a relaxed syntax.
• Code part: consists of an extended version of the basic algorithmic constructs (sequence, selection, iteration).
• We use pseudocode for both data structures and code.

About Pseudocode
• Basic format for a data declaration: the name followed by the type in angle brackets, for example
    count <integer>
• The structure of a node:
    node
      data <dataType>
      link <pointer to node>
    end node
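The node declaration above can be sketched in Python as a small data class. This is only an illustration: the field names `data` and `link` mirror the pseudocode, while `build_chain` and `chain_to_list` are hypothetical helpers added here to show how the link field is used.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """One node: a data field plus a link to the next node (or None)."""
    data: Any
    link: Optional["Node"] = None


def build_chain(values):
    """Hypothetical helper: link the values into a chain and return the first node."""
    head = None
    # Walk backwards so each new node can point at the node built before it.
    for value in reversed(values):
        head = Node(data=value, link=head)
    return head


def chain_to_list(head):
    """Hypothetical helper: follow the links from head and collect the data fields."""
    items = []
    node = head
    while node is not None:
        items.append(node.data)
        node = node.link
    return items
```

A chain built from a list of values can be read back in the same order by following each `link` until it is `None`.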

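Algorithm Header
• Each algorithm begins with a header.
• The algorithm header consists of:
  • Name: the name of the algorithm
  • Parameters: passed by reference or passed by value
  • Purpose: a short statement about what the algorithm does
  • Pre: the precondition, that is, what must hold before the algorithm is called
  • Post: the postcondition, that is, the state of things after the algorithm has run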

Purpose, Conditions and Return
• Purpose: it need not describe all of the algorithm's processing, only the general processing.
• Conditions
  • Pre: sometimes there is no precondition (Pre Nothing). If there are several parameters, a precondition should be shown for each.
  • Post: identifies any action taken and the status of any output parameters.
• Return: if a value is returned, it is identified by a return condition; otherwise no return condition is needed.

Purpose, Conditions and Return
Example:
  Algorithm search (val list <array>, val argument <integer>, ref location <index>)
  Search array for specific item and return index location.
    Pre    list contains data array to be searched
           argument contains data to be located in list
    Post   location contains index of element matching argument,
           or is undetermined if not found
    Return <Boolean> true if found, false if not found
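To show how such a header carries over to real code, here is a minimal Python sketch of the search header. Two assumptions are made: the slide gives only the header, so the sequential scan in the body is an illustrative choice, and because Python has no reference parameters the location is returned together with the Boolean instead of being passed by reference.

```python
from typing import List, Optional, Tuple


def search(items: List[int], argument: int) -> Tuple[bool, Optional[int]]:
    """Search items for argument and report where it was found.

    Pre:    items contains the data to be searched;
            argument contains the value to be located.
    Post:   the returned location is the index of the first element equal
            to argument, or None if no element matches.
    Return: (True, location) if found, (False, None) if not found.
    """
    # Sequential scan: an assumed body, since the source shows only the header.
    for location, value in enumerate(items):
        if value == argument:
            return True, location
    return False, None
```

The docstring plays the role of the header: a caller who reads only it knows what to pass in and what comes back.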

Algorithm Header
• Importance
  • The programmer using the algorithm often sees only the header information.
  • The header information must be complete enough to communicate to the programmer everything he or she must know to use the algorithm.

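Variables
• It is not necessary to define every variable in an algorithm, especially when the variable's name indicates its meaning. To make variables easy to understand, we use intelligent data names, that is, names that describe what the variable holds.
• Rules
  • Do not use single-character names (i, j).
  • Do not use generic names (max, count, sum).
  • Abbreviations may be used in intelligent data names (StuCnt, NumofStu, Nostu).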

Statement Constructs
• Sequence statements: a sequence is a series of statements that do not alter the execution path within an algorithm. Because an algorithm has only one entry and one exit, a statement that calls another algorithm can be regarded as a sequence statement.
• Selection statements: a condition is evaluated and, according to the result, a different execution path is taken. The pseudocode statement is:
    1 if (condition)
      1 action1 (statements group 1)
    2 else
      1 action2 (statements group 2)
    3 end if
• Loop statements: a loop iterates a block of statements until the condition becomes false. The pseudocode is:
    1 loop (condition)
      1 action (statements)
    2 end loop

Sample Algorithm
  Algorithm sample (ref pageNumber <integer>)
  This algorithm reads a file and prints a report.
    Pre    pageNumber must be initialized
    Post   Report printed. pageNumber contains number of pages in report.
    Return Number of lines printed.
  1 open file
  2 lines = 0
  3 loop (not end of file)
    1 read file
    2 if (full page)
      1 form feed
      2 pageNumber = pageNumber + 1
      3 write page heading
    3 end if
    4 write report line
    5 lines = lines + 1
  4 end loop
  5 close file
  6 return lines
  end sample
Notes on the sample:
• Algorithm header: algorithm name (sample), parameter(s) (pageNumber), purpose description, preconditions, postconditions, return.
• Intelligent data names: the name of a variable or data item tells the meaning of the data. This is very important for a programmer.
• Statement numbers: these numbers identify the individual statements. For instance, the statement "form feed" can be referred to as statement 3.2.1.
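The sample algorithm can be approximated in Python as follows. This is a sketch under stated assumptions: the pseudocode does not say when a page is "full", so a hypothetical `lines_per_page` parameter decides it; the page heading text is invented; the file is replaced by a list of records; and the output is collected in a list instead of being sent to a printer.

```python
def print_report(records, page_number=0, lines_per_page=3):
    """Build a paged report from records.

    Returns (lines, page_number, output): the number of report lines written,
    the updated page count, and the list of output lines.
    """
    output = []
    lines = 0
    for record in records:                  # loop (not end of file) / read file
        # Assumed meaning of "full page": a new page starts every
        # lines_per_page lines, including before the very first line.
        if lines % lines_per_page == 0:
            if lines > 0:
                output.append("\f")         # form feed between pages
            page_number += 1
            output.append(f"--- Page {page_number} ---")  # page heading
        output.append(str(record))          # write report line
        lines += 1
    return lines, page_number, output
```

Because Python cannot pass `page_number` by reference, the updated page count is returned alongside the line count.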

Development of programming techniques
• Non-structured, linear programs. In a linear program, the logic flow wound through the program like spaghetti on a plate.
• Modular programming. Programs were organized in functions or subroutines. In each function, statements were still organized in linear fashion.
• Structured programming. Formulated in the 1970s and still in use today.
• Object-oriented programming. Programs are treated as a series of objects. Each object encapsulates its properties, actions and operations. The objects interact with each other and exchange information, and in this way the task is accomplished.

Data type
• A data type consists of two parts: a set of data and the operations that can be performed on the data.
• A 16-bit integer can be taken as an example of a data type. The data set consists of the numbers in the range -32768 to 32767, and the operations are {+, -, *, /, …}.

Atomic Data
• Atomic data are single, non-decomposable entities.
• An atomic data type is defined as a set of atomic data and a set of operations on the data.
• An atomic data type has identical properties that distinguish it from other atomic data types.

Composite Data
• Composite data can be broken into subfields that have definite meaning.

Data Structure
A data structure can be defined as:
1. A combination of elements, each of which is either a data type or another data structure.
2. A set of associations or relationships (structure) involving the combined elements.

The Abstract Data Type
An abstract data type is a data declaration packaged together with the operations that are meaningful for the data type. That means:
1. Declaration of data
2. Declaration of operations
3. Encapsulation of data and operations

Four Logic structures

A Model for an Abstract Data Type
[Figure: model of an ADT. Labels: external interface, public function A, public function B, private functions, internal call, internal data flow, data structure, data.]

A Model for an Abstract Data Type (continued)
From the model we can see:
• The physical representation of the data type is encapsulated within the ADT (the irregularly outlined area).
• The operations of the data type are also encapsulated within the ADT.
• The ADT interacts and exchanges information with its environment through, and only through, its public functions.
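The three points of the model can be illustrated with a small Python class. This is a sketch, not a definitive ADT implementation: the class name `ItemList` and its methods are hypothetical, and Python marks private members only by convention (a leading underscore) rather than enforcing encapsulation.

```python
class ItemList:
    """A small list ADT: the data representation is hidden behind public functions."""

    def __init__(self):
        # Internal representation; callers are expected not to touch it.
        # The leading underscore is a convention, not enforced privacy.
        self._items = []

    def _find(self, target):
        """Private function: return the index of target, or -1 if absent."""
        for index, item in enumerate(self._items):
            if item == target:
                return index
        return -1

    def insert(self, item):
        """Public function: add item at the end."""
        self._items.append(item)

    def search(self, target):
        """Public function: report whether target is stored."""
        return self._find(target) != -1

    def delete(self, target):
        """Public function: remove the first occurrence of target.

        Returns True if something was removed, False otherwise.
        """
        index = self._find(target)
        if index == -1:
            return False
        del self._items[index]
        return True

    def traverse(self):
        """Public function: return a copy of the stored items, in order."""
        return list(self._items)
```

Returning a copy from `traverse` keeps the internal list out of the caller's hands, so the representation could later change (for example to a linked list) without affecting code that uses only the public functions.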

The Concept of ADT
[Figure: an application program uses an ADT class object. Labels: ADT class code (create, insert, delete, search, traverse, destroy), data structure, dynamic memory.]

Algorithm Efficiency
• Choose the right algorithm to solve a problem.
• The efficiency of a linear function (one that contains no loops) depends on the number of instructions it contains.
• The study of algorithm efficiency focuses on loops.
• We introduce f(n) to express the efficiency of an algorithm, where n represents the number of elements to be processed: f(n) = efficiency.

Typical Loops and their efficiency

Linear Loops
  1 i = 1
  2 loop (i <= 1000)
    1 application code
    2 i = i + 1
  3 end loop
Statements 2.1 and 2.2 are executed 1000 times. The number of iterations is directly proportional to the loop factor. The efficiency of this algorithm segment is f(n) = n.

  1 i = 1
  2 loop (i <= 1000)
    1 application code
    2 i = i + 2
  3 end loop
Statements 2.1 and 2.2 are executed 500 times. The number of iterations is again directly proportional to the loop factor. The efficiency of this algorithm segment is f(n) = n/2.

Logarithmic Loops
  1 i = 1
  2 loop (i <= 1000)
    1 application code
    2 i = i * 2
  3 end loop

  1 i = 1000
  2 loop (i >= 1)
    1 application code
    2 i = i / 2
  3 end loop
In each of these two loops the application code is executed 10 times, which is log2(1000) ≈ 9.97 rounded up. The efficiency of each algorithm segment is f(n) = log2(n).

Nested Loops

Linear logarithmic
  1 i = 1
  2 loop (i <= 10)
    1 j = 1
    2 loop (j <= 10)
      1 application code
      2 j = j * 2
    3 end loop
    4 i = i + 1
  3 end loop
The inner loop runs about log2(10) times (4 iterations after rounding up) and the outer loop runs 10 times. Because the inner loop runs in full on every iteration of the outer loop, the application code is executed 10 × log2(10) times. In general, f(n) = n log2(n).

Quadratic
  1 i = 1
  2 loop (i <= 10)
    1 j = 1
    2 loop (j <= 10)
      1 application code
      2 j = j + 1
    3 end loop
    4 i = i + 1
  3 end loop
The inner loop runs 10 times and the outer loop runs 10 times. Because the inner loop runs in full on every iteration of the outer loop, the application code is executed 10 × 10 times. In general, f(n) = n^2.

Dependent Quadratic
  1 i = 1
  2 loop (i <= 10)
    1 j = 1
    2 loop (j <= i)
      1 application code
      2 j = j + 1
    3 end loop
    4 i = i + 1
  3 end loop
The number of inner-loop iterations depends on the outer loop: it runs i times, so on average (10 + 1)/2 times. The outer loop runs 10 times, so the application code is executed 10 × (10 + 1)/2 = 55 times. In general, f(n) = n(n + 1)/2.
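The iteration counts quoted for these loops can be checked by counting them directly. The following Python sketch mirrors each pseudocode loop with a counter standing in for the application code; the function names are hypothetical, and the divide loop assumes integer division (with real division the count for 1000 is the same).

```python
def count_linear(n, step=1):
    """Iterations of: i = 1; loop (i <= n) ... i = i + step."""
    count, i = 0, 1
    while i <= n:
        count += 1          # stands in for the application code
        i += step
    return count


def count_multiply(n):
    """Iterations of: i = 1; loop (i <= n) ... i = i * 2."""
    count, i = 0, 1
    while i <= n:
        count += 1
        i *= 2
    return count


def count_divide(n):
    """Iterations of: i = n; loop (i >= 1) ... i = i / 2 (integer division assumed)."""
    count, i = 0, n
    while i >= 1:
        count += 1
        i //= 2
    return count


def count_linear_logarithmic(n):
    """Outer loop runs n times; the inner loop doubles j until it passes n."""
    count = 0
    for _ in range(n):
        count += count_multiply(n)
    return count


def count_quadratic(n):
    """Outer and inner loops both run n times."""
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
    return count


def count_dependent_quadratic(n):
    """Inner loop runs i times on the i-th pass of the outer loop."""
    count = 0
    for i in range(1, n + 1):
        for j in range(i):
            count += 1
    return count
```

With the limits used in the slides, these give 1000 and 500 for the linear loops, 10 for both logarithmic loops, and 40, 100 and 55 for the linear logarithmic, quadratic and dependent quadratic loops with a limit of 10.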

Big-O Notation
• Big-O and its purpose.
• The way to derive a Big-O expression:
  1. In each term, set the coefficient of the term to 1.
  2. Keep the largest term in the function and discard the others.
• Terms are ranked from lowest to highest as:
  constant, log2(n), n, n log2(n), n^2, n^3, …, n^k, 2^n, n!

Example of Big-O derivation
  5n^4 + 10n log2(n) + 100 log2(n)
• Set the coefficient of each term to 1:
  n^4 + n log2(n) + log2(n)
• Keep the largest term in the function and discard the others:
  O(n^4)
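One way to see why only the n^4 term survives is to compare the full expression with n^4 numerically as n grows. The sketch below is an illustration rather than a proof; the function names `f` and `ratio_to_dominant` are hypothetical.

```python
import math


def f(n):
    """The example function: 5n^4 + 10n log2(n) + 100 log2(n)."""
    return 5 * n**4 + 10 * n * math.log2(n) + 100 * math.log2(n)


def ratio_to_dominant(n):
    """Ratio of f(n) to its largest term's growth rate, n^4."""
    return f(n) / n**4
```

For small n the lower-order terms matter (the ratio is 12.5 at n = 2), but as n grows the ratio settles toward the constant 5. Big-O drops that constant as well, leaving O(n^4).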

Big-O analysis example (add two matrices)
Problem analysis

   4  2  1       6  1  7      10  3  8
   0 -3  4   +   3  2 -1   =   3 -1  3
   5  6  2       4  6  2       9 12  4

For example, 4 + 6 = 10 and 2 + 1 = 3.
• The calculation begins with the first column of the first row, followed by the second item in the first row, then the third, and so on up to the last item in the first row.
• After the first row is complete, the same work is done on the second row and then on each successive row until all rows have been processed.
• We can use a nested loop to solve this problem: the outer loop steps through the rows, and the inner loop processes each item in one row.

Algorithm for adding two matrices
  Algorithm addMatrix (val matrix1 <matrix>, val matrix2 <matrix>, val size <integer>, ref matrix3 <matrix>)
    Pre    matrix1 and matrix2 have data
           size is the number of columns and rows in each matrix
    Post   matrices added; result in matrix3
  1 r = 0
  2 loop (r < size)
    1 c = 0
    2 loop (c < size)
      1 matrix3[r,c] = matrix1[r,c] + matrix2[r,c]
      2 c = c + 1
    3 end loop
    4 r = r + 1
  3 end loop
  4 return
  end addMatrix
The outer loop in this algorithm repeats size times, and within each pass of the outer loop the inner loop also repeats size times. Therefore the number of additions of two items is size × size, or size^2. So the efficiency of this algorithm is O(size^2), or O(n^2).
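The addMatrix pseudocode translates almost line for line into Python. In this sketch the matrices are assumed to be lists of rows, the result is returned rather than passed by reference, and an `additions` counter (not part of the original algorithm) is added so the size × size claim can be checked.

```python
def add_matrix(matrix1, matrix2, size):
    """Add two size x size matrices.

    Returns (matrix3, additions): the element-wise sum and the number of
    element additions performed.
    """
    matrix3 = [[0] * size for _ in range(size)]
    additions = 0                      # counter added for illustration only
    for r in range(size):              # outer loop: one pass per row
        for c in range(size):          # inner loop: one pass per item in the row
            matrix3[r][c] = matrix1[r][c] + matrix2[r][c]
            additions += 1
    return matrix3, additions
```

For two 3 × 3 matrices the counter reaches 9, which matches size^2.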

Summary of this chapter
• One of the most common tools used to define algorithms is pseudocode.
• Pseudocode is an English-like representation of the code required for an algorithm. It is part English, part structured code.
• Atomic data are data that are single, non-decomposable entities.
• An atomic data type is a set of atomic data with identical properties.
• Atomic data types are defined by a set of values and a set of operations that act on the values.
• A data structure is an aggregation of atomic data and composite data types into a set with defined relationships.
• An ADT is a data declaration packaged together with the operations that are meaningful for the data type.
• Algorithm efficiency is generally defined as a function of the number of elements being processed and the type of loop being used.

Summary of this chapter (continued)
• The efficiency of a logarithmic loop is f(n) = log2(n).
• The efficiency of a linear loop is f(n) = n.
• The efficiency of a linear logarithmic loop is f(n) = n log2(n).
• The efficiency of a dependent quadratic loop is f(n) = n(n + 1)/2.
• The efficiency of a quadratic loop is f(n) = n^2.
• The efficiency of a cubic loop is f(n) = n^3.
• The simplification of efficiency is known as big-O notation.
• The seven standard measures of efficiency are O(log2(n)), O(n), O(n log2(n)), O(n^2), O(n^k), O(c^n), O(n!).

Exercises
Calculate the deviation from a mean (average).
Requirements:
• Read numbers into an array.
• Calculate their average and print it.
• Print the data in the array and each value's deviation.
Complete this algorithm in pseudocode:
  algorithm deviation
    Pre    Nothing
    Post   Average and numbers with their deviation printed
  1 i = 0
  2 loop (not end of file)
  …

Exercises
Determine the Big-O notation for the following:
• 5n^(5/2) + n^(2/5)
• 6 log2(n) + 9n
Calculate the run-time efficiency of the following program segment:
  1 i = 1
  2 loop (i <= n)
    1 print (i)
    2 i = i + 1
  3 end loop

Exercises
Calculate the run-time efficiency of the following program segment:
  1 i = 1
  2 loop (i <= n)
    1 j = 1
    2 loop (j <= n)
      1 k = 1
      2 loop (k <= n)
        1 print (i, j, k)
        2 k = k + 1
      3 end loop
      4 j = j + 1
    3 end loop
    4 i = i + 1
  3 end loop

Homework
Reorder the following efficiencies from smallest to largest:
  2^n, n!, n^5, 10000, n log2(n)
Determine the Big-O notation for the following:
• 3n^4 + n log2(n)
• 5n^2 + n^(3/2)
Calculate the run-time efficiency of the following program segment:
  1 i = 1
  2 loop (i <= n)
    1 print (i)
    2 i = i + 1
  3 end loop
