
Programming with Data (PWD 2019) Revision Class

This revision class covers exam clarifications, material to be covered, and example questions. It also provides resources for download, assessment details, expectations, and tips for approaching exam questions.


Presentation Transcript


  1. Programming with Data (PWD 2019) Revision Class • Tuesday, 7 May 2019 • Stelios Sotiriadis, Prof. Alessandro Provetti

  2. Agenda for today Part 1: • Exam clarifications and material to be covered Part 2: • Revision and example questions for exam

  3. Part 1 Clarifications for exam and material to be covered

  4. Preparing for the exam • All material seen in class is available for download from: • Moodle PWD • Stelios's website for FT students: • https://www.dcs.bbk.ac.uk/~stelios/pwd2018/ • Alessandro's website for PT students: • http://www.dcs.bbk.ac.uk/~ale/pwd/pwd-calendar.html • Key chapters from the book are available for download • Code samples: • https://www.dcs.bbk.ac.uk/~stelios/pwd2018/code/

  5. How is PWD going to be assessed? • The written examination is on: • Monday 10th of June 2019, 1:30-3:30pm • Please double check! • http://www.bbk.ac.uk/student-services/exams/timetable • There are FIVE questions in the exam paper • Answer only FOUR of the FIVE questions • Each question carries 25 marks

  6. What is expected? • To be able to critically analyse concepts taught in class • Explain a concept • To write code fragments (short)

  7. How to approach questions? • Be logical • Explain the concepts clearly in your own words • No need to memorize definitions, but be able to explain a concept simply. • Answers will be evaluated based on logic and critical understanding: • Arguments or code

  8. Answering coding questions • Python indentation might be confusing for markers • Provide clear code, and make sure you don't overthink it • Think about indentation to make sure your program could work • Example:
         i = 0
         for i in range(10):
             print(i)
             if i == 8:
                 break

  9. In practice • What is the role of Python? • Python will be used to answer a question • It is not examined per se • Ethics of Computing and Data Mining/Science? • Ethics material is not in the final exam. • Lecture 4 and Lecture 5 will not be examined • SQLite and Pandas will not be examined

  10. Topics to be examined (see past exam papers) • Topics fall into three groups: GROUP 1, GROUP 2 and GROUP 3 • There will always be a question from the first group and one from the second group • Computational problems, cost estimates, timing of a function • Random numbers and their application in algorithms • Probabilities and how to estimate events • Gradient descent • Informal database specification (E-R models) • SQL: • Create tables, adding constraints and primary, foreign keys • Update a table • Select statements (SELECT/FROM/WHERE) • Graphs and matrices • Greedy and dynamic programming algorithms • Complexity classes, intractable problems and approximation

  11. Part 2 Revision and example questions for exam

  12. Class 1 Big O & sorting and searching

  13. Lecture 1 • Big O: computational costs • [Figure: number of operations against number of elements for O(1), O(log n), O(n), O(n log n), O(n^2), O(2^n) and O(n!)] • Complete question 1 of the revision class quiz

  14. Lecture 1 • Example question: What are the differences between the following complexities: O(n^2), O(n) and O(log n)? Give an example of an algorithm for each complexity. [5 marks] • O(n^2): bubble sort • O(n): linear search • O(log n): binary search • O(n log n): merge sort • Sorting • Merge sort • Insertion sort • Timsort • Sorts small pieces using insertion sort • Merges the pieces using merge sort • Searching • Complete question 2 of the revision class quiz • Linear search/naive search: O(n) operations • Advanced search (e.g. binary search): O(log n) operations (divide and conquer)
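The contrast between linear and binary search can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in class; the function names and the sample list `data` are assumptions made for the example.

```python
def linear_search(items, target):
    """Check every element in turn: O(n) comparisons in the worst case."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1  # not found


def binary_search(sorted_items, target):
    """Halve the search range at each step: O(log n) comparisons.

    Only works when the input is already sorted.
    """
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1  # not found


# Hypothetical sample data: the sorted even numbers 0, 2, ..., 98.
data = list(range(0, 100, 2))
print(linear_search(data, 42), binary_search(data, 42))
```

Both functions return the same index, but on a list of n elements the linear version may look at all n of them, while the binary version looks at roughly log2(n).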

  15. Class 2 Linear vs. Binary search & code benchmarking

  16. Lecture 2 • Example question: • Explain briefly the binary and interpolation search algorithms [5 marks] • Recursive algorithms • A recursive algorithm is an algorithm that calls itself! • Linear search • Binary search • Interpolation search
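As a hedged sketch of the two ideas on this slide, the block below shows a recursive binary search (the function calls itself on a smaller range) and an interpolation search (it guesses the position from the values, which works best when the sorted values are roughly evenly spaced). The function names and the list `values` are assumptions for illustration, not the lecture's own code.

```python
def binary_search_recursive(sorted_items, target, low=0, high=None):
    """Recursive binary search: the function calls itself on half the range."""
    if high is None:
        high = len(sorted_items) - 1
    if low > high:
        return -1  # empty range: target is not present
    mid = (low + high) // 2
    if sorted_items[mid] == target:
        return mid
    if sorted_items[mid] < target:
        return binary_search_recursive(sorted_items, target, mid + 1, high)
    return binary_search_recursive(sorted_items, target, low, mid - 1)


def interpolation_search(sorted_items, target):
    """Estimate where the target should be from the values at both ends."""
    low, high = 0, len(sorted_items) - 1
    while low <= high and sorted_items[low] <= target <= sorted_items[high]:
        if sorted_items[low] == sorted_items[high]:
            # All remaining values are equal, so avoid dividing by zero.
            return low if sorted_items[low] == target else -1
        # Linear interpolation between the two ends of the range.
        pos = low + (target - sorted_items[low]) * (high - low) // (
            sorted_items[high] - sorted_items[low]
        )
        if sorted_items[pos] == target:
            return pos
        if sorted_items[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1


# Hypothetical, evenly spaced sample data.
values = [10, 20, 30, 40, 50, 60, 70]
print(binary_search_recursive(values, 50), interpolation_search(values, 50))
```

On evenly spaced data like `values`, interpolation search lands on the target with its first guess, whereas binary search needs several halvings.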

  17. Lab 2 • Example question: • Give an example of how to measure the running time of an algorithm in Python [5 marks] • Complete question 3 of the revision class quiz • Benchmarking • To measure the time cost of an algorithm we use the computer’s clock to obtain an actual run time. • The program implements an algorithm that counts from 1 to a given number.
          import time

          def some_algorithm(n):
              start = time.time()            # clock reading before the work
              for i in range(n):
                  pass                       # your actual algorithm goes here
              elapsed = time.time() - start  # seconds spent in the loop
              print(elapsed)

          some_algorithm(10)

  18. Class 3 Secretary problem & probabilities

  19. Lecture 3 • Optimal stopping strategy and the “Secretary problem” • Read the following: https://www.geeksforgeeks.org/secretary-problem-optimal-stopping-problem/ • Probabilities • What is the probability of tossing a fair coin 3 times and getting a head each time? • We have 2^3 = 8 equally likely outcomes: HHH HHT HTH HTT THH THT TTH TTT • Only one outcome (HHH) is all heads, so the probability is 1/8 • Let X be the number of heads: Pr[X=0] = 1/8 (no heads!), Pr[X=1] = 3/8 (1 head), Pr[X=2] = 3/8 (2 heads), Pr[X=3] = 1/8 (all heads!)
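The coin-toss probabilities can be checked by listing all outcomes, as in the following sketch. It assumes a fair coin and independent tosses; the variable names are chosen for this example only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Count how many outcomes have 0, 1, 2 or 3 heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value of X = number of heads.
distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(heads_counts.items())
}

for heads, probability in distribution.items():
    print(f"Pr[X={heads}] = {probability}")
```

Using `Fraction` keeps the results exact, so the probabilities print as fractions rather than rounded decimals.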

  20. Lab 3 • Consult Joel Grus’ Data Science from Scratch for the ‘probability of having two baby girls’ example • What is the probability that a couple has two girls, conditional on the event “the first child is a girl”? • 1/2
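The 1/2 answer can be reproduced by enumerating the four possible two-child families. The sketch assumes each child is equally likely to be a girl or a boy, independently of the other; the second quantity (conditioning on "at least one girl") is added for contrast and is not part of the slide's question.

```python
from fractions import Fraction
from itertools import product

# (first child, second child): GG, GB, BG, BB, all equally likely.
families = list(product("GB", repeat=2))

both_girls = [f for f in families if f == ("G", "G")]
first_girl = [f for f in families if f[0] == "G"]
at_least_one_girl = [f for f in families if "G" in f]

# Conditional probability = favourable outcomes / outcomes in the condition.
pr_both_given_first = Fraction(len(both_girls), len(first_girl))
pr_both_given_either = Fraction(len(both_girls), len(at_least_one_girl))

print(pr_both_given_first, pr_both_given_either)
```

Knowing that the first child is a girl leaves two families (GG, GB), while knowing only that at least one child is a girl leaves three (GG, GB, BG), which is why the two conditional probabilities differ.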

  21. Class 4 More probabilities

  22. Lab 4 • Example question: We expect 5 yellow and 4 black cars to pass by. What is the probability that the second car to pass by is black, conditional on the event that the first car that passed by was yellow? [1 mark] • 4 out of 8 = 0.5 • Example question: 2 blue and 3 red balls are in a bag. What are the chances of getting a blue ball? [1 mark] • 2 in 5 = 0.4 • Probabilities continued • Unconditional events • Conditional events • Probability definitions: • Pr[E,F] = Pr[E] * Pr[F] when events E and F are independent. • Pr[E,F] = Pr[F] * Pr[E|F] when events E and F are dependent. • Complete question 4 of the revision class quiz
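The car example and the rule Pr[E,F] = Pr[F] * Pr[E|F] can be verified by counting, as sketched below. The sketch assumes every ordering of the nine cars is equally likely; here F is "the first car is yellow" and E is "the second car is black", and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

cars = ["yellow"] * 5 + ["black"] * 4

# Every ordered choice of (first car, second car) among the 9 cars: 9 * 8 = 72.
pairs = list(permutations(range(len(cars)), 2))

first_yellow = [(a, b) for a, b in pairs if cars[a] == "yellow"]
yellow_then_black = [(a, b) for a, b in first_yellow if cars[b] == "black"]

pr_f = Fraction(len(first_yellow), len(pairs))             # Pr[F]
pr_e_and_f = Fraction(len(yellow_then_black), len(pairs))  # Pr[E,F]
pr_e_given_f = pr_e_and_f / pr_f                           # Pr[E|F]

print(pr_f, pr_e_and_f, pr_e_given_f)
```

The conditional probability comes out as 1/2, matching "4 out of 8": once a yellow car has passed, 8 cars remain and 4 of them are black.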

  23. Class 5 Knapsack fractional vs. Knapsack 0-1

  24. Lab 5 • Greedy algorithms • Make the local choice that maximizes a local (easy to check) criterion in the hope that the thus-generated solution will maximise the global (costly to check) criterion. • Knapsack fractional • n objects (n=7), m is the size of the bag (m=15):
          Objects O:   1   2   3   4   5   6   7
          Profits P:  10   5  15   7   6  18   3
          Weights W:   2   3   5   7   1   4   1
      • The question is: How to fill the bag such that the profit is maximized? (Solution is in notes of Lab 5)
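One common greedy rule for the fractional knapsack is to take objects in decreasing order of profit per unit of weight, splitting the last object if it does not fit. The sketch below applies that rule to the slide's data; it is an illustration under that assumption and may differ in detail from the Lab 5 notes. The function name is hypothetical.

```python
def fractional_knapsack(profits, weights, capacity):
    """Greedy fractional knapsack: best profit/weight ratio first."""
    # Sort (profit, weight) pairs by profit per unit of weight, highest first.
    items = sorted(
        zip(profits, weights), key=lambda pw: pw[0] / pw[1], reverse=True
    )
    total_profit = 0.0
    remaining = capacity
    for profit, weight in items:
        if remaining <= 0:
            break
        take = min(weight, remaining)           # whole object, or the part that fits
        total_profit += profit * take / weight  # profit scales with the fraction taken
        remaining -= take
    return total_profit


# Data from the slide: 7 objects, bag of size 15.
profits = [10, 5, 15, 7, 6, 18, 3]
weights = [2, 3, 5, 7, 1, 4, 1]
print(round(fractional_knapsack(profits, weights, 15), 2))
```

With this data the greedy rule takes objects 5, 1, 6, 3 and 7 whole and two thirds of object 2, for a total profit of about 55.33.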

  25. Lab 5 • Example question: Describe the Knapsack 0-1 problem [10 marks] • Knapsack fractional vs Knapsack 0-1 • Knapsack 0-1: study the Lab 5 notes and read the following: • https://www.geeksforgeeks.org/0-1-knapsack-problem-dp-10/
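In the 0-1 version each object is either taken whole or left out, so the greedy ratio rule is no longer guaranteed to be optimal and dynamic programming is the usual approach. The following is a minimal sketch of the standard table-based method, reusing the profits and weights of the fractional example as an assumption (the slide does not say which data the 0-1 question uses).

```python
def knapsack_01(profits, weights, capacity):
    """0-1 knapsack by dynamic programming over capacities 0..capacity."""
    # best[c] = best profit achievable with total weight at most c,
    # using only the objects considered so far.
    best = [0] * (capacity + 1)
    for profit, weight in zip(profits, weights):
        # Go from high to low capacity so each object is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + profit)
    return best[capacity]


# Same hypothetical data as the fractional example: 7 objects, bag of size 15.
profits = [10, 5, 15, 7, 6, 18, 3]
weights = [2, 3, 5, 7, 1, 4, 1]
print(knapsack_01(profits, weights, 15))
```

The running time is O(n * m) for n objects and bag size m. On this data the best 0-1 profit is 54, slightly below the fractional optimum, because no object can be split.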

  26. Class 6 Database design and SQL, Gradient descent

  27. Lecture 6: E-R diagram • Example question: Create an E-R diagram for a company to handle customer orders for products. A customer can place orders for one or more items. [10 marks] • Complete question 5 of the revision class quiz

  28. Lecture 6: Example of SQL statements • Complete question 6 of the revision class quiz • Example question: Give an SQL statement to delete all cities from table ‘cities’ whose country is US [1 mark] • DELETE FROM cities WHERE in_country = 'US'; • Examples: • CREATE TABLE cities (city_id integer primary key, name varchar(24) not null, in_country varchar(2)); • DELETE FROM cities WHERE in_country = 'GB'; • UPDATE actors SET birth_year=1974 WHERE actor_id=4; • SELECT film_id, title, release_year FROM films WHERE runtime_minutes >= 100;

  29. Lab 6: Gradient descent • A method to optimize a function; in our example, to minimize the error (MSE) in order to find the best-fit line! • Example question: Describe the gradient descent algorithm [15 marks]

  30. Lab 6: Gradient descent • Example question: Explain the differences between batch, mini-batch and stochastic gradient descent [5 marks] • Batch gradient descent: • Uses all the data for every update; in Class6-grad_descent(ax+b).py we have 5 points • But what happens if we have 1000 or 1 billion points? • The algorithm becomes very slow! • Mini-batch • Instead of going over all examples, mini-batch gradient descent sums over a smaller number of examples, given by the batch size. • Stochastic gradient descent • Shuffles the training data and uses a single randomly picked training example for each update
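The three variants differ only in how many points feed each update, which the sketch below makes explicit with a `batch_size` parameter. It fits a line y = m*x + b by minimizing the mean squared error. The function name, the learning rate, the number of epochs and the five sample points are assumptions for illustration and are not taken from Class6-grad_descent(ax+b).py.

```python
import random


def gradient_descent(points, batch_size, learning_rate=0.01, epochs=5000, seed=0):
    """Fit y = m*x + b by gradient descent on the mean squared error.

    batch_size == len(points): batch gradient descent
    1 < batch_size < len(points): mini-batch gradient descent
    batch_size == 1: stochastic gradient descent
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    data = list(points)
    m, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Partial derivatives of the MSE over this batch.
            grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in batch) / len(batch)
            grad_b = sum(-2 * (y - (m * x + b)) for x, y in batch) / len(batch)
            # Step against the gradient.
            m -= learning_rate * grad_m
            b -= learning_rate * grad_b
    return m, b


# Hypothetical data lying exactly on the line y = 2x + 1.
points = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
for size in (5, 2, 1):
    m, b = gradient_descent(points, batch_size=size)
    print(size, round(m, 3), round(b, 3))
```

With only five points all three settings are fast; the difference matters for large data sets, where batch gradient descent must read every point before making a single update.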

  31. Class 7 More SQL and gradient descent

  32. Lecture 7: SQL examples Cont. • Two SQL statement styles in class: • MySQL (lecture notes) • Oracle (lab exercises) • Main differences are in CREATE and INSERT statements • Feel free to use the style you prefer

  33. Lecture 7: SQL examples Cont. • Create a table with foreign keys (example of lecture 7) • Select data from two tables: SELECT * FROM actors, cities WHERE actors.birth_place = cities.city_id; • Select country and sum of populations from cities, grouping by country: SELECT in_country, SUM(population) FROM cities GROUP BY in_country;
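To try these statements out, one option is to run them from Python. The sketch below uses Python's built-in `sqlite3` module purely as a convenient way to execute SQL (SQLite itself is not examined). The column definitions follow the slides where they are given; the `population` column type, the `actors` columns other than `birth_place`, and all row values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only if asked

conn.executescript("""
CREATE TABLE cities (
    city_id    INTEGER PRIMARY KEY,
    name       VARCHAR(24) NOT NULL,
    in_country VARCHAR(2),
    population INTEGER
);
CREATE TABLE actors (
    actor_id    INTEGER PRIMARY KEY,
    name        VARCHAR(24) NOT NULL,
    birth_place INTEGER,
    FOREIGN KEY (birth_place) REFERENCES cities (city_id)
);
-- Illustrative rows only, not real data.
INSERT INTO cities VALUES (1, 'City A', 'GB', 100),
                          (2, 'City B', 'GB', 200),
                          (3, 'City C', 'US', 300);
INSERT INTO actors VALUES (1, 'Actor X', 1), (2, 'Actor Y', 3);
""")

# Select data from two tables by matching the foreign key to the primary key.
joined = conn.execute(
    "SELECT actors.name, cities.name FROM actors, cities "
    "WHERE actors.birth_place = cities.city_id ORDER BY actors.actor_id"
).fetchall()

# Sum of populations per country.
totals = conn.execute(
    "SELECT in_country, SUM(population) FROM cities "
    "GROUP BY in_country ORDER BY in_country"
).fetchall()

print(joined)
print(totals)
```

The `ORDER BY` clauses are added only to make the output order predictable; the slide's statements work without them.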

  34. Lab 7: Gradient descent: Class6-grad_descent(mx+b).py

  35. Class 8 Dynamic programming and Dijkstra algorithm, more SQL

  36. Lecture 8 • Example question: Explain the Dijkstra algorithm [10 marks] • Dynamic Programming • is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, • solving each of those subproblems just once, • and storing their solutions using a memory-based data structure (e.g. an array). • Dijkstra algorithm • Single source shortest path problem • Comparison of Divide and conquer, Greedy algorithms and Dynamic programming algorithms
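A compact way to express Dijkstra's algorithm in Python is with a priority queue from the standard `heapq` module, as sketched below. It assumes non-negative edge weights and a graph stored as a dictionary of dictionaries; the small graph and all names are hypothetical and not taken from the lecture.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distance from source to every node (non-negative weights)."""
    distances = {node: float("inf") for node in graph}
    distances[source] = 0
    queue = [(0, source)]  # (distance so far, node), smallest distance first
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances[node]:
            continue  # stale entry: a shorter route to node was already found
        for neighbour, weight in graph[node].items():
            candidate = dist + weight
            if candidate < distances[neighbour]:
                distances[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return distances


# Hypothetical directed graph: graph[u][v] is the weight of the edge u -> v.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}
print(dijkstra(graph, "A"))
```

In this graph the direct edge A -> B costs 4, but the route through C costs 1 + 2 = 3, so the algorithm settles on 3 for B and then 4 for D.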

  37. Lab 8: SQL • Example question: Find the sum of salaries of all the employees of the same department. [1 mark] Answer: SELECT Dept, SUM(Salary) FROM Employee GROUP BY Dept; • Find the monthly salary of the employees named White. SELECT Salary / 12 AS MonthlySalary FROM Employee WHERE Surname = 'White'; • Find the maximum salary among the employees who work in a department based in London. SELECT MAX(Salary) FROM Employee, Department WHERE Employee.Dept = Department.DeptName AND Department.City = 'London';

  38. Class 9 P vs. NP & Transactional systems

  39. Lecture 9: P vs. NP problems • We want to find algorithms faster than the existing ones • E.g. for sorting, from O(n^2) (insertion sort) we went to O(n log n) (merge sort). • For problems whose known methods need exponential time, we would like to find methods that run in polynomial time. • NP is the class of all problems for which checking a putative solution costs poly-time. • Example: Travelling Salesperson Problem (mentioned in FoC) • a closed tour of n cities, • of maximum length W km/mi, solved in O(2^n)
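The gap between checking and finding a solution can be illustrated with the Travelling Salesperson Problem. In the sketch below, `verify_tour` checks a proposed tour against a bound W in polynomial time, while `brute_force_tsp` finds the best tour by trying every ordering, which is only feasible for a handful of cities. The distance matrix and the function names are hypothetical, and the brute-force search is a simple stand-in rather than the O(2^n) method mentioned on the slide.

```python
from itertools import permutations


def tour_length(distances, tour):
    """Length of the closed tour that returns to its starting city."""
    return sum(
        distances[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour))
    )


def verify_tour(distances, tour, max_length):
    """Poly-time check: visits every city once and has length <= max_length."""
    visits_all_once = sorted(tour) == list(range(len(distances)))
    return visits_all_once and tour_length(distances, tour) <= max_length


def brute_force_tsp(distances):
    """Try all (n-1)! tours starting at city 0: only usable for tiny n."""
    n = len(distances)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    best = min(candidates, key=lambda tour: tour_length(distances, tour))
    return list(best), tour_length(distances, best)


# Hypothetical symmetric distances between 4 cities.
distances = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
best_tour, best_length = brute_force_tsp(distances)
print(best_tour, best_length, verify_tour(distances, best_tour, 18))
```

Verification takes time proportional to n, whereas the exhaustive search grows factorially with the number of cities.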

  40. Lab 9: Transactional systems • Example question: Explain the ACID properties (Atomicity, Consistency, Isolation, Durability) [5 marks] • SQL would rather generate errors than let you spoil the data • A rollback mechanism brings the DB back to its previous, consistent state. • ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee validity even in the event of errors or power failures.
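Atomicity and rollback can be seen in a small experiment, sketched here with Python's built-in `sqlite3` module as a convenient way to run SQL (SQLite itself is not examined). The `accounts` table, its CHECK constraint, the names and the balances are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # no overdrafts allowed
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money in one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed and the credit was undone


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 500), balances(conn))
print(transfer(conn, "alice", "bob", 30), balances(conn))
```

In the first call the credit to bob succeeds but the debit from alice would make her balance negative, so the database raises an error and the rollback removes the credit as well, leaving both balances unchanged.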

  41. Class 10 NP vs NP Hard

  42. Lecture 10: NP vs. NP Hard • Example question: Explain the differences between probabilistic and approximation algorithms [5 marks] • NP vs. NP Hard • What is NP Hard? • Probabilistic algorithms • Build algorithms using a ‘random’ element so as to gain improved performance. • For some cases, the improvement in performance is very dramatic, moving from intractable to tractable. • Approximation algorithms • An approximation algorithm is a way of dealing with NP-completeness for optimization problems. This technique does not guarantee the best solution. The goal of an approximation algorithm is to come as close as possible to the optimum value in a reasonable amount of time, which is at most polynomial time.
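As one concrete illustration of an approximation algorithm, the sketch below uses the well-known greedy method for minimum vertex cover (choose a set of vertices that touches every edge). The problem is NP-hard to solve exactly, but this polynomial-time method always returns a cover at most twice the optimal size. Vertex cover is chosen here only as an example and is not mentioned on the slide; the edge list and function name are hypothetical.

```python
def approx_vertex_cover(edges):
    """2-approximation: for each uncovered edge, take both of its endpoints."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            # This edge is not covered yet, so add both ends.
            cover.update((u, v))
    return cover


# Hypothetical graph: a triangle 1-2-3 with a tail 3-4-5.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
cover = approx_vertex_cover(edges)
print(sorted(cover))
```

For this graph the method returns 4 vertices, while the smallest possible cover has 3 (for example 1, 3 and 4): not the best solution, but within the guaranteed factor of 2 and found in a single pass over the edges.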

  43. Quote of the day “Do or do not. There is no try.” Thank you and good luck!
