
Error Tolerance and Feature Selection for the Logical Analysis of Data



  1. Error Tolerance and Feature Selection for the Logical Analysis of Data. Presenter: Kathryn Davidson, University of Pennsylvania. Mentor: Dr. Endre Boros, RUTCOR.

  2. The Problem. Before, doctors used a small amount of information for diagnoses: • a few tests on the patient • experience with past patients • limited acquired outside knowledge. Now, available information is too large for humans to analyze: • large-scale laboratory experiments • gene mapping

  3. A Solution • Partition the data into two classes (for example, one healthy and one sick). • Use these binary partitions to write a separating function that will place future patients into one of the two categories. The Catch • Medical data is subject to error: if we partition the data too strictly, we risk misclassifying points whose measurements are slightly off. Our Goal • We want to tolerate as much error as possible while still producing useful separating formulas.

  4. Example data set:

  5. Red shows the positive data points; blue shows the negative data points.

  6. Red shows the positive data points; blue shows the negative data points.

  7. Another way to write this information: we can make a binary column that cuts at each value halfway between a positive and a negative point. This means that a full binary table for this data will have 4 (positives) × 3 (negatives) × 2 (attributes) = 24 columns.
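A minimal sketch of this binarization step. The slide's actual data values are not reproduced in the transcript, so the points below are hypothetical, chosen so that all midpoints are distinct:

```python
# Hypothetical points (the slide's actual values are not in the transcript):
# 4 positive and 3 negative observations over 2 attributes, A1 and A2.
pos = [(1.0, 10.0), (2.0, 41.0), (3.3, 22.0), (5.1, 63.0)]
neg = [(1.5, 70.0), (4.0, 31.0), (6.2, 54.0)]

def cut_points(pos, neg, attr):
    """Midpoints between every positive/negative value pair on one attribute."""
    return sorted({(p[attr] + n[attr]) / 2 for p in pos for n in neg})

cuts = {a: cut_points(pos, neg, a) for a in range(2)}

def binarize(point, cuts):
    """One binary column per cut-point: is the value above the cut?"""
    return [int(point[a] > c) for a in sorted(cuts) for c in cuts[a]]

# All midpoints are distinct here, so the table has 4 * 3 * 2 = 24 columns.
print(sum(len(v) for v in cuts.values()))   # 24
```

If several positive/negative pairs happened to share a midpoint on the same attribute, the set comprehension would merge them and the table would have fewer than 24 columns.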

  8. What if we want to allow for error in the measuring of attributes? • The separation lines in the data graph will be surrounded by an “unsure” zone • Our binary chart starts to have missing pieces of information • More error means larger unsure zones and more missing information • How much error can we allow and still correctly separate the positive from the negative?
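A minimal sketch of how the unsure zones turn into missing bits. The cut-points below are hypothetical, but the error allowances match the next slide's values (0.6 for A1, 26 for A2):

```python
# Hypothetical cut-points per attribute (the slide's table is not in the transcript).
cuts = {0: [1.25, 2.5, 3.6], 1: [20.5, 36.0, 47.5]}
eps  = {0: 0.6, 1: 26.0}   # allowed measurement error per attribute

def binarize_with_error(point, cuts, eps):
    """One bit per cut-point; a value within eps of a cut is 'unsure' (None)."""
    row = []
    for a in sorted(cuts):
        for c in cuts[a]:
            if abs(point[a] - c) <= eps[a]:
                # Missing bit: measurement error could put the value
                # on either side of this cut.
                row.append(None)
            else:
                row.append(int(point[a] > c))
    return row

print(binarize_with_error((2.0, 41.0), cuts, eps))
# [1, None, 0, None, None, None]
```

Note how the large allowance on A2 wipes out all three of its bits for this point: bigger error tolerance means more missing information, exactly as the slide describes.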

  9. If we allow an error of 0.6 for A1 and 26 for A2, our (reduced) binary table will look like this. Are we able to allow this much error and still arrive at a formula that correctly categorizes each positive and negative entry?

  10. Answer: No. Since rows a and g can no longer be distinguished, we cannot have a reliable separating function. Theorem: There exists a robust separating function if and only if for each pair α positive and β negative there is an index i such that α_i ≠ β_i, α_i ≠ *, and β_i ≠ * (where * denotes a missing bit in the unsure zone).
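The theorem's condition is easy to test directly. A minimal sketch, using made-up rows in place of the slide's table (None marks a missing bit):

```python
def robustly_separable(pos_rows, neg_rows):
    """Theorem check: a robust separating function exists iff every
    positive/negative row pair differs in some position where
    neither bit is missing (None)."""
    return all(
        any(a is not None and b is not None and a != b for a, b in zip(p, n))
        for p in pos_rows for n in neg_rows
    )

# Like rows a and g on the slide: these two rows agree wherever
# both bits are known, so no robust separation is possible.
row_a = [1, None, 0]
row_g = [1, 0, None]
print(robustly_separable([row_a], [row_g]))   # False
```

With a positive row such as [0, 1, 0] against row_g, position 0 has two defined, differing bits, so the check returns True.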

  11. General Procedure: 1. Create the full binary table for the given input data. 2. Find the combinations of attribute errors (error vectors) that are maximal. Example: {(∞, 2.5), (0.7, 16), (0.55, 25.5), (0.05, ∞)}

  12. 3. For the most promising error vectors, create a binary table containing only the columns relevant for distinguishing positive from negative at that error tolerance. 4. Examine which attributes' columns were used, how they were used, and how much error is allowed when we use them. References: [1] Boros, E., Hammer, P.L., Ibaraki, T., and Kogan, A., "Logical Analysis of Numerical Data," Mathematical Programming, 79 (1997), 163-190. [2] Boros, E., Ibaraki, T., and Makino, K., "Variations on Extending Partially Defined Boolean Functions with Missing Bits," June 6, 2000.
