Comparison of Citation Discovery Methods


# Comparison of Citation Discovery Methods - PowerPoint PPT Presentation

Comparison of Citation Discovery Methods. Using AMS, AGU and IEEE Journal Search Tools: AMS 2 EASI 7 IEEE 2 Wiley 4 Elsevier 7 ( can not confirm ) Total 22 Advantages: C an specify month range in search Can search in full text of document Can time order search results


### CS201: PART 1

Data Structures & Algorithms

S. Kondakcı

Analysis of Algorithms

[Figure: Input → Algorithm → Output]

• An algorithm is a step-by-step procedure for solving a problem in a finite amount of time.
• Theoretical Analysis of Algorithms:
• Uses a high-level description of the algorithm instead of an implementation
• Characterizes running time as a function of the input size n.
• Takes into account all possible inputs
• Allows us to evaluate the speed of any design independent of its implementation.

Program Efficiency

Program efficiency is a measure of the amount of resources required to produce the desired results.

Efficiency Aspects:

1) What are the important resources we should try to optimize?

2) Where are the important efficiency gains to be made?

3) How important is efficiency in the first place?

Efficiency Today
• User Efficiency. The amount of time and effort users will spend to learn how to use the program, how to prepare the data, how to configure and customize the program, and how to interpret and use the output.
• Maintenance Efficiency. The amount of time and effort the maintenance group will spend reading a program and its technical documentation in order to understand it well enough to make any necessary modifications.
• Algorithmic Complexity. The inherent efficiency of the method itself, regardless of which machine we run it on or how we code it.
• Coding Efficiency. This is the traditional efficiency measure. Here we are concerned with how much processor time and memory space a computer program requires to produce desired results. Coding efficiency is the key step towards optimal usage of machine resources.

Programmer’s Duty
• Programmers should keep these in mind:
• Correct, robust, and reliable.
• Easy to use for its intended end-user group.
• Easy to understand and easy to modify.
• Portable.
• Consistency in Input/Output behavior.
• User documentation.

Optimization
• Optimization on CPU-Time: Consider a network security assessment tool as a real-time application. The application works like a security scanner protocol designed to audit, monitor, and correct all aspects of network security. Real-time processing of the intercepted network packets containing inspection information requires fast data processing. In addition, such a process should generate some auditing information.
• Optimization on Memory: Developing programs that must fit into the limited memory space available on your systems is often quite demanding. Kernel-level processing of network packets requires kernel memory optimization and a powerful, failsafe memory management capability.
• Providing Run-time Continuity: Extensive machine-level optimization is a major requirement for continuously running programs, such as security scanner daemons.
• Reliability and Correctness: One of the inevitable efficiency requirements is absolute reliability. The second important efficiency factor is correctness. That is, your program should do exactly what it is supposed to do. Choosing and implementing a reliable inspection methodology should be done with precision.
• Optimization on Programmer’s Time: How efficiently a programmer works depends on the choice of team policy and development tool selection.

Coding Efficiency: Unstructured Code

/Efficient Programming/S. Kondakci-1999

Coding Efficiency: Structured Code

/Efficient Programming/S. Kondakci-1999

Protecting Against Run-time Errors
• Illegal pointer operations.
• Array subscript out of bound.
• Endless loops may cause the stack to grow into the heap area.
• Representational errors, such as network byte order, number conversions, division by zero, and undefined results, e.g., tan(90°) is undefined.
• Trying to write over the kernel’s text area, or the data area.
• Referencing objects declared as prototype but not defined.
• Performing operations on a pointer pointing at NULL.
• Operating system weaknesses.

Assertions

A general pitfall: making assumptions that turn out not to be justified.

Most mistakes arise from simply misunderstanding the interaction between various pieces of code.

The assertion rule states that you should always state boldly and forcefully that there are things you have not yet covered clearly enough. Any assumptions you make in writing your programs should be documented somewhere in the code itself, particularly if you know or expect the assumption to be false in other environments.

Does the Machine Understand Your Assumptions?

Remember, those assumptions are yours: they must be presented to the machine through explicit code. The machine cannot check assumptions that it has not been told about. This is simply a matter of including explicit checks in your code, even for things that “cannot happen”.

    if (p == NULL)
        panic("Driver routine: p is NULL\n");
    if (p->p_flags & BUSY) /* Safe to continue */
        …<etcetera>

    ASSERT(p != NULL);
    if (p->p_flags & BUSY) /* Safe to continue */
        …<etcetera> …
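As a minimal sketch of the same idea in portable C, the standard `assert` macro from `<assert.h>` can play the role of `ASSERT`. The structure `proc_entry`, the flag value `BUSY`, and the function `is_busy` are hypothetical names chosen for illustration; they are not taken from any real driver.

```c
#include <assert.h>
#include <stddef.h>

#define BUSY 0x01u               /* hypothetical flag bit, for illustration only */

struct proc_entry {              /* hypothetical structure */
    unsigned p_flags;
};

/* Returns 1 if the entry is marked busy, 0 otherwise.
   The assumption "p is never NULL" is stated in code, not only in a comment. */
int is_busy(const struct proc_entry *p)
{
    assert(p != NULL);           /* aborts with a diagnostic unless NDEBUG is defined */
    return (p->p_flags & BUSY) != 0;
}
```

Compiling with `-DNDEBUG` removes the check, so an assertion of this kind documents an assumption during development rather than replacing error handling in production code.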

Guidelines for the implementation
• Protect input parameters using call-by-value.
• Avoid global variables and functions with side effects.
• Make all temporary variables local to functions where they are used.
• Never halt or sleep in a function. Spawn a dedicated function if necessary.
• Avoid producing output within a function unless the sole purpose of the function is output.
• Where appropriate use return values to return the status of function calls.
• Avoid confusing programming tricks.
• Always strive for simplicity and clarity. Never sacrifice clarity of expression for cleverness of expression.
• Never sacrifice clarity of expression for minor reductions in execution time.

Debugging and Tracing

Making use of the preprocessor allows you to incorporate many debugging aids in your module, for instance, the driver module. Later, in the production version, these debugging aids can be removed.

    #ifdef DEBUG
    #define TRACE_OPEN  (debugging & 0x01)  /* bitwise AND tests a single trace bit */
    #define TRACE_CLOSE (debugging & 0x02)
    #define TRACE_WRITE (debugging & 0x08)
    int debugging = -1; /* all bits set: enable all trace output */
    #else
    #define TRACE_OPEN  0
    #define TRACE_CLOSE 0
    #define TRACE_WRITE 0
    #endif
    ...

Tracing: Later in the Program

Later in the code, the trace information can be output in a manner similar to this:

    printf("Device driver read, Packet number (%d) \n", pack_no);
    … <etcetera>…
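The following is a small sketch of guarding trace output with a run-time bitmask, assuming one bit per trace category. The names `TRACE_OPEN_BIT`, `TRACE_CLOSE_BIT`, `TRACE_WRITE_BIT`, `trace_enabled`, and `trace_write` are hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <stdio.h>

#define TRACE_OPEN_BIT  0x01u    /* hypothetical trace categories */
#define TRACE_CLOSE_BIT 0x02u
#define TRACE_WRITE_BIT 0x08u

/* Trace mask: here only open and write tracing are switched on. */
static unsigned debugging = TRACE_OPEN_BIT | TRACE_WRITE_BIT;

/* Returns 1 when the given trace bit is set in the mask, 0 otherwise. */
int trace_enabled(unsigned bit)
{
    return (debugging & bit) != 0;   /* bitwise AND, not logical && */
}

/* Prints a trace line only when write tracing is on.
   Returns 1 if a line was printed, 0 otherwise. */
int trace_write(int pack_no)
{
    assert(pack_no >= 0);            /* assumption: packet numbers are non-negative */
    if (!trace_enabled(TRACE_WRITE_BIT))
        return 0;
    printf("Device driver write, packet number (%d)\n", pack_no);
    return 1;
}
```

Testing a single bit needs the bitwise `&`; with the logical `&&` every category would be enabled whenever the mask is non-zero.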

Checking Programs With lint (Unix)

The lint utility is intended to verify some facets of a C program, such as its potential portability. lint derives from the idea of picking the “fluff” out of a C program. It does this by advising on C constructs (including functions) and usage which might turn out to be bugs, portability problems, inconsistent declarations, bad function and argument types, or dead code. See the manual section lint(1) for further explanations.

Now, Lint’ing

\$ lint -hxa mytest.c

(8) warning: loop not entered at top

(8) warning: constant in conditional context

variable unused in function

(3) z in main

implicitly declared to return int

(10) printf

declaration unused in block

(5) duble

function returns value, which is always ignored

printf

Test Coverage Analysis

Yet another tool for execution tracing and analysis of programs is tcov. It can be used to trace and analyze source code and to produce a test coverage report. tcov does this by analysing the source code step by step. The extra code is generated by giving the -xa option to the compiler command, i.e.,

\$ cc -xa -o src src.c

The -xa option invokes a runtime recording mechanism that creates a .d file for every .c file. The .d file accumulates execution data for the corresponding source file.

The tcov utility can then be run on the source file to generate statistics about the program. The following example source file, getmygid.c, is analysed as:

\$ cc -xa -o getmygid getmygid.c

\$ tcov -a getmygid.c

\$ ls -l getmy???*

-rwxr-xr-x 1 staff 25120 Feb 11 12:07 getmygid

-rw------- 1 staff 519 Sep 9 1994 getmygid.c

-rw-r--r-- 1 staff 9 Feb 11 12:07 getmygid.d

-rw-r--r-- 1 staff 1025 Feb 11 12:08 getmygid.tcov

Example: getmygid.c

\$ cat getmygid.c

#include <stdio.h>

char *msg = "I am sorry I cannot tell you everything" ;

int gid,egid;

int uid,euid, pid ,ppid, i;

int main()

{

gid = getgid();

if (gid >= 0) printf("1- My GID is: %d\n", gid);

egid = getegid();

if (egid >=0 ) printf("2- My EGID is: %d\n", egid);

uid = getuid();

if ( uid >=0) printf("3- My uid is: %d\n", uid);

euid = geteuid();

if (euid >= 0) printf("4- My Euid is: %d\n", euid);

pid = getpid();

if ( pid >=0 ) printf("5- My pid is: %d\n", pid);

ppid = getppid();

if ( ppid >= 0) printf("6- My ppid is: %d\n", ppid);

prt_msg("We came to end!!!");

return 0;

prt_msg(msg);

}

prt_msg(char *mesg){

printf("%s \n", mesg);

}

Tcov’ing getmygid.c

\$ cat getmygid.tcov

##### -> #include <stdio.h>

##### -> char *msg = "I am sorry I cannot tell you everything" ;

##### ->

##### -> int gid,egid;

##### -> int uid,euid, pid ,ppid, i;

##### -> int main()

##### -> {

2 -> gid = getgid();

2 -> if (gid >= 0) printf("1- My GID is: %d\n", gid);

2 -> egid = getegid();

2 -> if (egid >=0 ) printf("2- My EGID is: %d\n", egid);

2 -> uid = getuid();

2 -> if ( uid >=0) printf("3- My uid is: %d\n", uid);

2 -> euid = geteuid();

2 -> if (euid >= 0) printf("4- My Euid is: %d\n", euid);

2 -> pid = getpid();

2 -> if ( pid >=0 ) printf("5- My pid is: %d\n", pid);

2 -> ppid = getppid();

2 -> if ( ppid >= 0) printf("6- My ppid is: %d\n", ppid);

2 -> prt_msg("We came to end!!!");

2 -> return 0;

2 -> prt_msg(msg);

2 -> }

2 -> prt_msg(mesg)

2 -> char *mesg;

2 -> {

2 -> printf("%s \n", mesg);

2 -> }

Tcov’ing getmygid.c

As shown, tcov(1) generates an annotated listing of the source file (getmygid.tcov), where each line is prefixed with a number indicating how many times the statements on that line were executed. Finally, per-line and per-block statistics are shown.

Top 10 Blocks

    Line   Count
       9       2
      11       2
      13       2
      15       2
      17       2
      19       2
      21       2
      29       2

         8   Basic blocks in this file
         8   Basic blocks executed
    100.00   Percent of the file executed
        16   Total basic block executions
      2.00   Average executions per basic block

Have a nice break!

### Analysis of Algorithms

Input

Algorithm

Output

An algorithm is a step-by-step procedure for

solving a problem in a finite amount of time.

Running Time
• Most algorithms transform input objects into output objects.
• The running time of an algorithm typically grows with the input size.
• Average case time is often difficult to determine.
• We focus on the worst case running time.
• Easier to analyze
• Crucial to applications such as games, finance and robotics

Experimental Studies
• Write a program implementing the algorithm
• Run the program with inputs of varying size and composition
• Use a function, like the built-in clock() function, to get an accurate measure of the actual running time
• Plot the results
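The steps above can be sketched in C with the standard `clock()` function from `<time.h>`. The workload `sum_array` and the helper `time_sum` are hypothetical names used only for this illustration; a real experiment would repeat each measurement and vary the input size.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Workload to be timed: sums n integers. */
long sum_array(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Runs sum_array once, stores its result in *result, and
   returns the processor time it used, in seconds. */
double time_sum(const int *a, size_t n, long *result)
{
    assert(a != NULL && result != NULL);
    clock_t start = clock();
    *result = sum_array(a, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Note that `clock()` reports processor time at a limited resolution, so small inputs may measure as zero seconds.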

Limitations of Experiments
• It is necessary to implement the algorithm, which may be difficult
• Results may not be indicative of the running time on other inputs not included in the experiment.
• In order to compare two algorithms, the same hardware and software environments must be used

Theoretical Analysis
• Uses a high-level description of the algorithm instead of an implementation
• Characterizes running time as a function of the input size, n.
• Takes into account all possible inputs
• Allows us to evaluate the speed of an algorithm independent of the hardware/software environment

Example: find max element of an array

    Algorithm arrayMax(A, n)
        Input: array A of n integers
        Output: maximum element of A
        currentMax ← A[0]
        for i ← 1 to n - 1 do
            if A[i] > currentMax then
                currentMax ← A[i]
        return currentMax

Pseudocode
• High-level description of an algorithm
• More structured than English prose
• Less detailed than a program
• Preferred notation for describing algorithms
• Hides program design issues
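For comparison with the pseudocode, here is one possible C rendering of arrayMax. The function name `array_max` and the precondition check are choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the maximum element of a[0..n-1]; requires n >= 1. */
int array_max(const int *a, size_t n)
{
    assert(a != NULL && n > 0);      /* an empty array has no maximum */
    int current_max = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] > current_max)
            current_max = a[i];
    }
    return current_max;
}
```

The C version has to decide details that the pseudocode hides, such as the index type and what to do with an empty array.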

Pseudocode Details

• Control flow
    if … then … [else …]
    while … do …
    repeat … until …
    for … do …
    Indentation replaces braces
• Method declaration
    Algorithm method(arg [, arg…])
    Input …
    Output …
• Method/Function call
    method(arg [, arg…])
• Return value
    return expression
• Expressions
    ← Assignment (like = in C++)
    = Equality testing (like == in C++)
    n² Superscripts and other mathematical formatting allowed


The Random Access Machine (RAM) Model
• A CPU
• A potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character
• Memory cells are numbered and accessing any cell in memory takes unit time.

Primitive Operations
• Basic computations performed by an algorithm
• Identifiable in pseudocode
• Largely independent from the programming language
• Exact definition not important
• Assumed to take a constant amount of time in the RAM model
• Examples:
• Evaluating an expression
• Assigning a value to a variable
• Indexing into an array
• Calling a method
• Returning from a method

Counting Primitive Operations

By inspecting the pseudocode, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size.

    Algorithm arrayMax(A, n)              # operations
        currentMax ← A[0]                 2
        for i ← 1 to n - 1 do             2 + n
            if A[i] > currentMax then     2(n - 1)
                currentMax ← A[i]         2(n - 1)
            { increment counter i }       2(n - 1)
        return currentMax                 1
                                  Total   7n - 1

Estimating Running Time

Algorithm arrayMax executes $7n - 1$ primitive operations in the worst case. Define:

a = Time taken by the fastest primitive operation

b = Time taken by the slowest primitive operation

Let $T(n)$ be the worst-case time of arrayMax. Then $a(7n - 1) \le T(n) \le b(7n - 1)$.

Hence, the running time $T(n)$ is bounded by two linear functions.

Growth Rate of Running Time
• Changing the hardware/ software environment
• Affects T(n) by a constant factor, but
• Does not alter the growth rate of T(n)
• The linear growth rate of the running time T(n) is an intrinsic property of algorithm arrayMax

Growth Rates
• Growth rates of functions:
• Linear $\approx n$
• Quadratic $\approx n^2$
• Cubic $\approx n^3$
• In a log-log chart, the slope of the line corresponds to the growth rate of the function

Constant Factors
• The growth rate is not affected by
• constant factors or
• lower-order terms
• Examples
• $10^2 n + 10^5$ is a linear function
• $10^5 n^2 + 10^8 n$ is a quadratic function

Big-Oh Notation
• Given functions $f(n)$ and $g(n)$, we say that $f(n)$ is $O(g(n))$ if there are positive constants $c$ and $n_0$ such that

$f(n) \le c\,g(n)$ for $n \ge n_0$

• Example: $2n + 10$ is $O(n)$
• $2n + 10 \le cn$
• $(c - 2)\,n \ge 10$
• $n \ge 10/(c - 2)$
• Pick $c = 3$ and $n_0 = 10$

Big-Oh Example
• Example: the function $n^2$ is not $O(n)$
• $n^2 \le cn$
• $n \le c$
• The above inequality cannot be satisfied since $c$ must be a constant

More Big-Oh Examples

• $7n - 2$

$7n - 2$ is $O(n)$

need $c > 0$ and $n_0 \ge 1$ such that $7n - 2 \le c\,n$ for $n \ge n_0$

this is true for $c = 7$ and $n_0 = 1$

• $3n^3 + 20n^2 + 5$

$3n^3 + 20n^2 + 5$ is $O(n^3)$

need $c > 0$ and $n_0 \ge 1$ such that $3n^3 + 20n^2 + 5 \le c\,n^3$ for $n \ge n_0$

this is true for $c = 4$ and $n_0 = 21$

• $3 \log n + \log\log n$

$3 \log n + \log\log n$ is $O(\log n)$

need $c > 0$ and $n_0 \ge 1$ such that $3 \log n + \log\log n \le c \log n$ for $n \ge n_0$

this is true for $c = 4$ and $n_0 = 2$
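The constants chosen for the cubic example can be sanity-checked numerically. The sketch below tests the inequality over a finite range only, so it illustrates the claim rather than proving it; the names `f` and `bound_holds` are hypothetical.

```c
#include <assert.h>

/* f(n) = 3n^3 + 20n^2 + 5, the polynomial from the second example. */
long long f(long long n)
{
    return 3 * n * n * n + 20 * n * n + 5;
}

/* Returns 1 if f(n) <= c * n^3 for every n in [n0, limit], else 0. */
int bound_holds(long long c, long long n0, long long limit)
{
    assert(n0 >= 1 && limit >= n0);
    for (long long n = n0; n <= limit; n++) {
        if (f(n) > c * n * n * n)
            return 0;                /* found a counterexample */
    }
    return 1;
}
```

With `c = 4`, the check passes from `n0 = 21` but fails from `n0 = 20`, because f(20) = 32005 exceeds 4 · 20³ = 32000.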

Big-Oh and Growth Rate
• The big-Oh notation gives an upper bound on the growth rate of a function
• The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n)
• We can use the big-Oh notation to rank functions according to their growth rate

Big-Oh Rules
• If $f(n)$ is a polynomial of degree $d$, then $f(n)$ is $O(n^d)$, i.e.,
• Drop lower-order terms
• Drop constant factors
• Use the smallest possible class of functions
• Say “$2n$ is $O(n)$” instead of “$2n$ is $O(n^2)$”
• Use the simplest expression of the class
• Say “$3n + 5$ is $O(n)$” instead of “$3n + 5$ is $O(3n)$”

Asymptotic Algorithm Analysis
• The asymptotic analysis of an algorithm determines the running time in big-Oh notation
• To perform the asymptotic analysis
• We find the worst-case number of primitive operations executed as a function of the input size
• We express this function with big-Oh notation
• Example:
• We determine that algorithm arrayMax executes at most $7n - 1$ primitive operations
• We say that algorithm arrayMax “runs in O(n) time”
• Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations

Computing Prefix Averages
• We further illustrate asymptotic analysis with two algorithms for prefix averages
• The i-th prefix average of an array X is the average of the first (i + 1) elements of X:

$A[i] = (X[0] + X[1] + \dots + X[i]) / (i + 1)$

• The following algorithm computes prefix averages in quadratic time by applying the definition

    Algorithm prefixAverages1(X, n)
        Input: array X of n integers
        Output: array A of prefix averages of X    # operations
        A ← new array of n integers                n
        for i ← 0 to n - 1 do                      n
            s ← X[0]                               n
            for j ← 1 to i do                      1 + 2 + … + (n - 1)
                s ← s + X[j]                       1 + 2 + … + (n - 1)
            A[i] ← s / (i + 1)                     n
        return A                                   1

Arithmetic Progression

The running time of prefixAverages1 is $O(1 + 2 + \dots + n)$

The sum of the first $n$ integers is $n(n + 1)/2$

There is a simple visual proof of this fact

Thus, algorithm prefixAverages1 runs in $O(n^2)$ time

Prefix Averages (Linear)

• The following algorithm computes prefix averages in linear time by keeping a running sum

    Algorithm prefixAverages2(X, n)
        Input: array X of n integers
        Output: array A of prefix averages of X    # operations
        A ← new array of n integers                n
        s ← 0                                      1
        for i ← 0 to n - 1 do                      n
            s ← s + X[i]                           n
            A[i] ← s / (i + 1)                     n
        return A                                   1

• Algorithm prefixAverages2 runs in O(n) time
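Both prefix-average algorithms can be sketched in C as follows. As an assumption of this sketch, the averages are stored as `double` in a caller-supplied array rather than in a newly allocated integer array; the function names are likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Quadratic version: recomputes every prefix sum from scratch. */
void prefix_averages1(const int *x, double *a, size_t n)
{
    assert(x != NULL && a != NULL);
    for (size_t i = 0; i < n; i++) {
        long s = x[0];
        for (size_t j = 1; j <= i; j++)
            s += x[j];
        a[i] = (double)s / (double)(i + 1);
    }
}

/* Linear version: keeps a running sum across iterations. */
void prefix_averages2(const int *x, double *a, size_t n)
{
    assert(x != NULL && a != NULL);
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        s += x[i];
        a[i] = (double)s / (double)(i + 1);
    }
}
```

The two functions produce the same output; only the inner loop of the first one, which repeats work already done, separates quadratic from linear time.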

Computing Spans
• We show how to use a stack as an auxiliary data structure in an algorithm
• Given an array X, the span S[i] of X[i] is the maximum number of consecutive elements X[j] immediately preceding X[i] and such that $X[j] \le X[i]$
• Spans have applications to financial analysis
• E.g., stock at 52-week high

[Figure: example array X and its spans S]

    Algorithm spans1(X, n)
        Input: array X of n integers
        Output: array S of spans of X              # operations
        S ← new array of n integers                n
        for i ← 0 to n - 1 do                      n
            s ← 1                                  n
            while s ≤ i ∧ X[i - s] ≤ X[i] do       1 + 2 + … + (n - 1)
                s ← s + 1                          1 + 2 + … + (n - 1)
            S[i] ← s                               n
        return S                                   1

• Algorithm spans1 runs in $O(n^2)$ time
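A direct C rendering of the quadratic algorithm might look like the sketch below. The output array is supplied by the caller, which is an assumption of this sketch, and the counter starts at 1 exactly as in the pseudocode.

```c
#include <assert.h>
#include <stddef.h>

/* Quadratic span computation: for each i, scan backwards while the
   preceding elements are not larger than x[i]. */
void spans1(const int *x, int *s, size_t n)
{
    assert(x != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t k = 1;                          /* counter starts at 1 */
        while (k <= i && x[i - k] <= x[i])
            k++;
        s[i] = (int)k;
    }
}
```

For the input {6, 3, 4, 5, 2} this yields the spans {1, 1, 2, 3, 1}.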

Have a nice break!

Recursion

Recursion: a function calls itself, a number of times that is not known in advance. Each such call is a recursive call.

    for (i = 1; i <= n-1; i++)
        sum = sum + 1;

    int sum(int n) {
        if (n <= 1)
            return 1;
        else
            return (n + sum(n-1));
    }

Recursive function

int f( int x )

{

if( x == 0 )

return 0;

else

return 2 * f( x - 1 ) + x * x;

}

Recursion

Calculate the factorial (n!) of a non-negative integer:

$n! = n(n-1)(n-2)\cdots(n-(n-1))$, $0! = 1! = 1$

int factorial(int n) {

if (n <= 1)

return 1;

else

return (n * factorial(n-1));

}

Fibonacci numbers: a bad algorithm for n > 40!

long fib(int n) {

if (n <= 1)

return 1;

else

return fib(n-1) + fib(n-2);

}

Algorithm IterativeLinearSum(A,n)

    Algorithm IterativeLinearSum(A, n):
        Input: An integer array A and an integer n ≥ 1 (size)
        Output: The sum of the first n integers in A
        if n = 1 then
            return A[0]
        else
            sum ← 0
            while n > 0 do
                sum ← sum + A[n - 1]
                n ← n - 1
            return sum

Algorithm LinearSum(A,n)

    Algorithm LinearSum(A, n):
        Input: An integer array A and an integer n ≥ 1 (size)
        Output: The sum of the first n integers in A
        if n = 1 then
            return A[0]
        else
            return LinearSum(A, n - 1) + A[n - 1]
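The recursive summation can be written in C as a short sketch; the function name `linear_sum` and the precondition check are choices made here.

```c
#include <assert.h>
#include <stddef.h>

/* Linear recursion: sum of a[0..n-1], requires n >= 1.
   Each call makes at most one recursive call, so the depth is n. */
int linear_sum(const int *a, size_t n)
{
    assert(a != NULL && n >= 1);
    if (n == 1)
        return a[0];
    return linear_sum(a, n - 1) + a[n - 1];
}
```

For example, `linear_sum` applied to {4, 3, 6, 2, 5} with n = 5 returns 20.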

Iterative Approach: Algorithm IterativeReverseArray(A,i,n)

    Algorithm IterativeReverseArray(A, i, n):
        Input: An integer array A and integers i and n
        Output: The reversal of n integers in A starting at index i
        while n > 1 do
            swap A[i] and A[i + n - 1]
            i ← i + 1
            n ← n - 2
        return

Algorithm ReverseArray(A,i,n)

    Algorithm ReverseArray(A, i, n):
        Input: An integer array A and integers i and n
        Output: The reversal of n integers in A starting at index i
        if n > 1 then
            swap A[i] and A[i + n - 1]
            call ReverseArray(A, i + 1, n - 2)
        return
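The recursive and iterative reversals can be compared side by side in C. The helper `swap_ints` and the function names are hypothetical, introduced only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Exchanges the two integers pointed to by p and q. */
static void swap_ints(int *p, int *q)
{
    int tmp = *p;
    *p = *q;
    *q = tmp;
}

/* Recursive version: reverses the n integers of a starting at index i. */
void reverse_array(int *a, size_t i, size_t n)
{
    assert(a != NULL);
    if (n > 1) {
        swap_ints(&a[i], &a[i + n - 1]);
        reverse_array(a, i + 1, n - 2);
    }
}

/* Iterative version of the same algorithm. */
void iterative_reverse_array(int *a, size_t i, size_t n)
{
    assert(a != NULL);
    while (n > 1) {
        swap_ints(&a[i], &a[i + n - 1]);
        i = i + 1;
        n = n - 2;
    }
}
```

Because the recursive call is the last action of the function, the loop in the iterative version is a mechanical rewrite of it.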

Higher-Order Recursion

Making more than a single recursive call at a time.

    Algorithm BinarySum(A, i, n):
        Input: An integer array A and integers i and n
        Output: The sum of n integers in A starting at index i
        if n = 1 then
            return A[i]
        return BinarySum(A, i, ⌈n/2⌉) + BinarySum(A, i + ⌈n/2⌉, ⌊n/2⌋)
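A C sketch of binary summation follows, assuming the two halves have sizes ⌈n/2⌉ and ⌊n/2⌋ so that odd lengths are handled. The name `binary_sum` is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Binary recursion: sum of the n integers of a starting at index i.
   Requires n >= 1. The first half gets ceil(n/2) elements and the
   second half gets floor(n/2) elements. */
int binary_sum(const int *a, size_t i, size_t n)
{
    assert(a != NULL && n >= 1);
    if (n == 1)
        return a[i];
    size_t first = (n + 1) / 2;      /* ceil(n/2) */
    size_t second = n / 2;           /* floor(n/2), at least 1 when n >= 2 */
    return binary_sum(a, i, first) + binary_sum(a, i + first, second);
}
```

The recursion depth is about log₂ n, although the total number of additions is still n - 1.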


kth Fibonacci Numbers

Linear recursion

    Algorithm LinearFibonacci(k):
        Input: An integer k
        Output: A pair (F_k, F_{k-1}) such that F_k is the kth Fibonacci number and F_{k-1} is the (k-1)st Fibonacci number
        if (k <= 1) then
            return (k, 0)
        else
            (i, j) ← LinearFibonacci(k - 1)
            return (i + j, i)

kth Fibonacci Numbers

Binary recursion

Algorithm BinaryFib(k):

Input: An integer k

Output: The kth Fibonacci number

if (k <= 1) then

return k

else

return BinaryFib(k-1)+BinaryFib(k-2)
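The linear and binary recursions can be compared in C. In this sketch the pair returned by the linear version is modelled with a return value plus an output parameter, which is an implementation choice, and both functions use the convention F(0) = 0, F(1) = 1.

```c
#include <assert.h>
#include <stddef.h>

/* Linear recursion: returns F(k) and stores F(k-1) in *prev
   (0 is stored for k <= 1). Makes k - 1 recursive calls. */
long linear_fib(int k, long *prev)
{
    assert(k >= 0 && prev != NULL);
    if (k <= 1) {
        *prev = 0;
        return k;
    }
    long j;                              /* receives F(k-2) */
    long i = linear_fib(k - 1, &j);      /* i is F(k-1) */
    *prev = i;
    return i + j;
}

/* Binary recursion: two recursive calls per level, so the number
   of calls grows exponentially with k. */
long binary_fib(int k)
{
    assert(k >= 0);
    if (k <= 1)
        return k;
    return binary_fib(k - 1) + binary_fib(k - 2);
}
```

Both functions return the same values; the linear version avoids recomputing the same subproblems, which is what makes the binary version impractical for large k.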

Math you need to Review
• Summations
• Logarithms and Exponents
• Proof techniques
• Basic probability
• properties of logarithms:

$\log_b(xy) = \log_b x + \log_b y$

$\log_b(x/y) = \log_b x - \log_b y$

$\log_b x^a = a \log_b x$

$\log_b a = \log_x a / \log_x b$

• properties of exponentials:

$a^{(b+c)} = a^b a^c$

$a^{bc} = (a^b)^c$

$a^b / a^c = a^{(b-c)}$

$b = a^{\log_a b}$

$b^c = a^{c \log_a b}$

Relatives of Big-Oh

• big-Omega
• $f(n)$ is $\Omega(g(n))$ if there is a constant $c > 0$ and an integer constant $n_0 \ge 1$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$
• big-Theta
• $f(n)$ is $\Theta(g(n))$ if there are constants $c' > 0$ and $c'' > 0$ and an integer constant $n_0 \ge 1$ such that $c'\,g(n) \le f(n) \le c''\,g(n)$ for $n \ge n_0$
• little-oh
• $f(n)$ is $o(g(n))$ if, for any constant $c > 0$, there is an integer constant $n_0 \ge 0$ such that $f(n) \le c\,g(n)$ for $n \ge n_0$
• little-omega
• $f(n)$ is $\omega(g(n))$ if, for any constant $c > 0$, there is an integer constant $n_0 \ge 0$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$

Intuition for Asymptotic Notation

Big-Oh

• $f(n)$ is $O(g(n))$ if $f(n)$ is asymptotically less than or equal to $g(n)$

big-Omega

• $f(n)$ is $\Omega(g(n))$ if $f(n)$ is asymptotically greater than or equal to $g(n)$

big-Theta

• $f(n)$ is $\Theta(g(n))$ if $f(n)$ is asymptotically equal to $g(n)$

little-oh

• $f(n)$ is $o(g(n))$ if $f(n)$ is asymptotically strictly less than $g(n)$

little-omega

• $f(n)$ is $\omega(g(n))$ if $f(n)$ is asymptotically strictly greater than $g(n)$

Example Uses of the Relatives of Big-Oh

• $5n^2$ is $\Omega(n^2)$

$f(n)$ is $\Omega(g(n))$ if there is a constant $c > 0$ and an integer constant $n_0 \ge 1$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$

let $c = 5$ and $n_0 = 1$

• $5n^2$ is $\Omega(n)$

$f(n)$ is $\Omega(g(n))$ if there is a constant $c > 0$ and an integer constant $n_0 \ge 1$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$

let $c = 1$ and $n_0 = 1$

• $5n^2$ is $\omega(n)$

$f(n)$ is $\omega(g(n))$ if, for any constant $c > 0$, there is an integer constant $n_0 \ge 0$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$

need $5n_0^2 \ge c\,n_0$; given $c$, the $n_0$ that satisfies this is $n_0 \ge c/5 > 0$