
Bioinformatics Programming

EE, NCKU. Tien-Hao Chang (Darby Chang).


Presentation Transcript


  1. Bioinformatics Programming EE, NCKU Tien-Hao Chang (Darby Chang)

  2. In the last slide • More Unix features worth mentioning • job control • I/O redirection and piping • text processing (vi, grep, sed, awk, …) • Programming vs. language

  3. Programming

  4. Before Learning advanced data structures and the associated algorithms

  5. struct: a building block for constructing advanced data structures in C

  6. struct • A struct is similar to an array in that both aggregate a set of objects into a single object (“object” here is not meant in the object-oriented sense) • array: aggregates objects of the same type • struct: aggregates objects of different types • struct is short for ‘structure’ • Each entry in a struct declaration is usually called a ‘field’ or ‘member’

  7. struct: Declaration
     • A struct declaration consists of a list of fields, each of which can have any type
         struct mydata {        // declare the structure of mydata
             char name[8];
             char id[10];
             int math;
             int eng;
         };
     • This defines a type, referred to as struct mydata
     • To create a new variable of this type
         // define a variable ‘student’ of the type ‘struct mydata’
         struct mydata student;

  8. struct: The Memory Space [Figure: memory layout of the variable student, with the fields name, id, math, and eng placed in declaration order]

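  9. struct: Test Memory Space
         #include <stdio.h>
         #include <stdlib.h>
         int main(void) {
             struct data {
                 char name[10];
                 char sex[2];
                 int math;
             };
             struct data student;
             // sizeof yields a size_t, so it is printed with %zu rather than %d
             printf("sizeof(student)=%zu\n", sizeof(student));
             return 0;
         }
     • Result: 16 (10 + 2 + 4 bytes when int is 4 bytes)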

  10. struct: Access Fields • The dot (.) operator • struct_variable.field_name • For example • student.math = 90; • student.eng = 20; • printf("%s's Math score is %d\n", student.name, student.math); • A convenient shortcut for initializing the members of a struct is shown below; the values are assigned to the fields in declaration order • struct data student={"Mary Wang",74}; (this assumes a struct data that declares name followed by math)

  11. struct: Array of Structures
     • You may define an array of structures
         struct student {       // declare the structure of student
             char name[8];
             char id[10];
             int math;
             int eng;
         };
         // define an array of 3 variables of the type ‘struct student’
         struct student stu[3];
     [Figure: stu drawn as consecutive records, each with the fields name, id, math, and eng]

  12. struct: Pointer to Structure
     • Pointers can be used to refer to a struct by its address
         struct mydata {        // declare the structure of mydata
             char name[8];
             char id[10];
             int math;
             int eng;
         } student;             // define a mydata variable, student
         struct mydata *ptr;    // define a pointer to mydata
         ptr = &student;        // point ptr to the variable student
     • Access fields through struct pointers
     • the arrow (->) operator, which dereferences the pointer and selects a field
     • struct_pointer_variable->field_name
     • ptr->math = 90; (equivalent to (*ptr).math = 90; note that student->math would not compile, because student is not a pointer)

  13. struct: Nested Structures
     • Since a struct declaration constructs a new type, a struct can be used as the type of a field just like built-in types such as int, double, …
         #include <stdio.h>
         #include <stdlib.h>
         int main(void) {
             struct date {              // declare date
                 int month;
                 int day;
             };
             struct student {           // declare nested structure, student
                 char name[10];
                 int math;
                 struct date birthday;
             } s1 = {"David Li", 80, {2, 10}};   // define a student variable, s1
             printf("student name:%s\n", s1.name);
             printf("birthday:%d month, %d day\n", s1.birthday.month, s1.birthday.day);
             printf("math grade:%d\n", s1.math);
             return 0;
         }

  14. struct: Self-referential Structure
     • A field may not have the same type as the struct declaration it belongs to
     • But a field may be a pointer to that type
     • A struct with pointer fields that reference its own struct type is called a self-referential structure
         struct PERSON {
             char name[8];
             int age;
             struct PERSON *son;    // self-referential pointer
         };

  15. Why are fields not allowed to have the same type as the declaration they belong to, while pointers to that type are allowed? Hint: think from the perspective of memory

  16. The closeness between C and the way data is actually represented in the machine is the reason both a) why C programs are so fast and b) why C is suitable for teaching

  17. Languages Comparison • Since the 1950s, computer scientists have devised thousands of programming languages. Many are obscure, perhaps created for a Ph.D. thesis and never heard of since. • Compiling to machine code • some languages transform programs directly into machine code, the instructions that a CPU understands directly • this transformation process is called compilation • examples: assembly, C, and C++ • Interpreted languages • other languages are interpreted, such as Basic, Perl, and JavaScript • or are a mixture of both, being compiled to an intermediate language, such as Java and C#

  18. Languages Comparison: Compile vs. Interpret • An interpreted language is processed at runtime: every line is read, analyzed, and executed. Having to reprocess a line on every pass through a loop is what makes interpreted languages slow. • as a rough figure, this overhead makes interpreted code run about 5–10 times slower than compiled code • their advantage is that programs need not be recompiled after changes, which is handy when you're learning to program • Because compiled programs almost always run faster than interpreted ones, languages such as C and C++ tend to be the most popular for writing games. • Java and C# both compile to an intermediate language that is interpreted very efficiently. Because the virtual machine that interprets Java and the .NET framework that runs C# are heavily optimized, it's claimed that applications in those languages are as fast as, if not faster than, compiled C++.

  19. Languages Comparison: Level of Abstraction • How close is a particular language to the hardware? • Machine code is the lowest level, followed by assembly. • C++ is higher than C because C++ offers greater abstraction. • Java and C# are higher than C++ because they compile to an intermediate language called bytecode. • When computers first became popular in the 1950s, programs were written in machine code. Programmers had to physically flip switches to enter values. This is such a tedious and slow way of creating an application that higher-level computer languages had to be created.

  20. http://www.evula.org/dragoon/pics/supercoder.jpg Super coder!

  21. Assembler: Fast to run, slow to write • The readable version of machine code • Mov A,$45 • Because it is tied to a particular CPU, assembly is not very portable. • Languages like C have reduced the need for assembly except where memory is limited or time-critical code is needed, typically in kernel code or in a driver. • Basic: For beginners • Basic is an acronym for Beginner's All-purpose Symbolic Instruction Code and was created in the 1960s to teach programming. • Microsoft has made the language its own with many different versions, including VBScript for websites and the very successful Visual Basic. • It is an interpreted language whose only advantage is being easy to learn. Today it serves more as an alternative syntax to C, because most programmers are lazy. • Pascal: Conscientious programming • Pascal was devised as a teaching language a few years before C but saw limited use. • Only when Borland's Turbo Pascal (for DOS) and Delphi (for Windows) appeared did it become suitable for commercial development. • However, Borland was up against Microsoft and lost the battle.

  22. C: System programming • C was devised in the early 1970s by Dennis Ritchie. It can be thought of as a general-purpose tool: very useful and powerful, but it easily lets through bugs that can make systems insecure. • C has been described as portable assembly. • The syntax of many scripting languages is based on C. • C++: A classy language • C++ (or C with Classes, as it was originally known) came about ten years after C and successfully introduced object-oriented programming to C, as well as features like exceptions and templates. • Learning all of C++ is a big task. It is by far the most complicated of the programming languages here, but once you have mastered it, you'll have no difficulty with any other language. • C#: Microsoft's big bet • C# was created by Delphi's architect, Anders Hejlsberg, after he moved to Microsoft, and Delphi developers will feel at home with features such as Windows Forms. • C# syntax is very similar to Java's, which is not surprising as Hejlsberg also worked on J++ after he moved to Microsoft. • Learn C# and you are well on the way to knowing Java. Both languages are semi-compiled: instead of compiling to machine code, they compile to bytecode, which is then interpreted.

  23. Perl: Websites and utilities • Very popular in the Linux world, Perl was one of the first web languages and remains very popular today. • For doing ‘quick and dirty’ programming on the web it remains unrivalled and drives many websites. • It has, though, been somewhat eclipsed by PHP as a web scripting language. • PHP: Website coding • PHP was designed as a language for web servers and is very popular in the combination of Linux, Apache, MySQL and PHP, or LAMP for short. • It is interpreted but pre-compiled, so code executes reasonably quickly. • It can be run on desktop computers but is not as widely used for developing desktop applications. • Based on C syntax, it also includes objects and classes. • JavaScript: Programs in your browser • JavaScript is nothing like Java; instead, it's a scripting language based on C syntax with the addition of objects, and it is used mainly in browsers. • JavaScript is interpreted and a lot slower than compiled code, but it works well within a browser. • It was invented by Netscape and was in the doldrums for years. It is popular again because of AJAX (Asynchronous JavaScript and XML), which allows parts of web pages to update from the server without redrawing the entire page.

  24. http://www.simplyhired.com/a/jobtrends/graph/q-Perl%2C+Ruby%2C+Python%2C+Php%2C+Javascript%2C+Flex%2C+Groovy/t-line

  25. Languages Comparison: Summary • Other noteworthy programming languages • Java, Python, Ruby, Go, … • Popularity arises for many reasons • history (programmers are lazy), business, and functionality • Lasting wars • Java vs. .NET (C will, in some form, live forever) • Perl vs. PHP vs. Ruby (web programming) • Perl vs. Python (scripting) • There might be one dominant system language and one dominant scripting language in the future, but the field will more likely converge to a world where languages coexist. [Figure: spectrum from lower level to higher level. Lower level: easy to debug, faster programs, general purpose, powerful enough to do evil. Higher level: more readable, faster to develop, more syntactic sugar, avoids careless mistakes]

  26. Algorithm

  27. Algorithm • Specification • a finite set of instructions that accomplishes a particular task • criteria • input: zero or more quantities that are externally supplied • output: at least one quantity is produced • definiteness: clear and unambiguous • finiteness: terminate after a finite number of steps • effectiveness: instruction is basic enough to be carried out • Representation • a natural language, like English or Chinese • a graphic, like flowcharts • a computer language, like C

  28. Algorithm: Selection Sort
     • From those integers that are currently unsorted, find the smallest and place it next in the sorted list
         i    [0]  [1]  [2]  [3]  [4]
         -     30   10   50   40   20
         0     10   30   50   40   20
         1     10   20   50   40   30
         2     10   20   30   40   50
         3     10   20   30   40   50

  29. Algorithm: Binary Search
     • Searching a sorted list
         [0]  [1]  [2]  [3]  [4]  [5]  [6]
          8   14   26   30   43   50   52
         left  right  middle  list[middle] : target
         0     6      3       30 < 43
         4     6      5       50 > 43
         4     4      4       43 == 43 (found)
         0     6      3       30 > 18
         0     2      1       14 < 18
         2     2      2       26 > 18
         2     1      -       (not found)
     • The first three rows search for 43, the last four rows search for 18
         while (there are more integers to check) {
             middle = (left + right) / 2;
             if (target < list[middle])
                 right = middle - 1;
             else if (target == list[middle])
                 return middle;
             else
                 left = middle + 1;
         }

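  30. Program 1.6: Searching an ordered list
         /* COMPARE is not defined on the slide; it is assumed to yield
            -1, 0, or 1 when x is less than, equal to, or greater than y */
         #define COMPARE(x, y) (((x) < (y)) ? -1 : ((x) == (y)) ? 0 : 1)
         int binsearch(int list[], int target, int left, int right)
         {
             int middle;
             while (left <= right) {
                 middle = (left + right) / 2;
                 switch (COMPARE(list[middle], target)) {
                     case -1: left = middle + 1; break;   /* target is to the right */
                     case 0:  return middle;              /* found */
                     case 1:  right = middle - 1;         /* target is to the left */
                 }
             }
             return -1;                                   /* not found */
         }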

  31. Algorithm: Recursive Algorithms • Beginning programmers view a function as something that is invoked (called) by another function • it executes its code and then returns control to the calling function • This perspective ignores the fact that functions can call themselves (direct recursion) • They may also call other functions that invoke the calling function again (indirect recursion) • Recursion is extremely powerful • it frequently allows us to express an otherwise complex process in very clear terms • We should write a recursive algorithm when the problem itself is defined recursively

  32. Program 1.7: Recursive implementation of binary search
         /* COMPARE is assumed to yield -1, 0, or 1, as in Program 1.6 */
         int binsearch(int list[], int target, int left, int right)
         {
             int middle;
             /* every branch returns, so an if suffices where the slide had while */
             if (left <= right) {
                 middle = (left + right) / 2;
                 switch (COMPARE(list[middle], target)) {
                     case -1: return binsearch(list, target, middle + 1, right);
                     case 0:  return middle;
                     case 1:  return binsearch(list, target, left, middle - 1);
                 }
             }
             return -1;
         }

  33. Data Abstraction

  34. Data Abstraction • Data type • A data type is a collection of objects and a set of operations that act on those objects • For example, the data type int consists of the objects {0, +1, -1, +2, -2, …, INT_MAX, INT_MIN} and the operations +, -, *, /, and % • The data types of C • basic data types: char, int, float, and double • group data types: array and struct • pointer data type • user-defined types • Abstract data type • An abstract data type (ADT) is a data type that is organized in such a way that the specification of the objects and the operations on the objects is separated from the representation of the objects and the implementation of the operations. • We know what it does, but not necessarily how it will do it.

  35. The array as an ADT

  36. To Evaluate which algorithm is better

  37. Algorithm: Performance Analysis • Criteria • Is it correct? • Is it readable? • … • Performance analysis (machine independent) • space complexity: storage requirement • time complexity: computing time • Performance measurement (machine dependent)

  38. Performance Analysis: Space Complexity • $S(P) = C + S_P(I)$ • Fixed space requirements ($C$) • independent of the inputs and outputs • instructions, constants, simple variables • Variable space requirements ($S_P(I)$) • depend on the instance characteristic $I$ • number, size, and values of the inputs and outputs associated with $I$ • recursive stack space, including formal parameters, local variables, and return address

  39. Analyze Someone’s exercise

  40. The recursion stack space needed is 6(n+1), since the depth of recursion is n+1.

  41. Performance Analysis: Time Complexity • $T(P) = C + T_P(I)$ • The time, $T(P)$, taken by a program, $P$, is the sum of its compile time $C$ and its run (or execution) time, $T_P(I)$ • $T_P(I) = c_a\,\mathrm{ADD}(I) + c_s\,\mathrm{SUB}(I) + \dots$ • Program step: a syntactically or semantically meaningful program segment whose execution time is independent of the instance characteristics. • Introduce a new variable, count, into the program • Tabular method

  42. Time Complexity: Iterative Summation
         float sum(float list[], int n) {
             float tempsum = 0; ++count;        // for assignment
             int i;
             for (i = 0; i < n; ++i) {
                 ++count;                       // for the for loop
                 tempsum += list[i]; ++count;   // for assignment
             }
             ++count;                           // last execution of for
             ++count;                           // for return
             return tempsum;
         }
     • 2n+3 steps

  43. Time Complexity: Tabular Method
