Machine Learning Applications in Biological Classification of River Water Quality

Saso Dzeroski, Jasna Grbovic and William J. Walley

98419-548

조 동 연

Contents
  • Introduction
  • Learning Rules for Biological Classification of British Rivers
    • The Data
    • The Experiment
  • Analysis of Data about Slovenian Rivers
    • The Influence of Physical and Chemical Parameters on Selected Organisms
    • Biological Classification
  • Discussion
Introduction
  • Indicator Organisms (Bioindicators)
    • Given a biological sample, information on the presence and density of all indicator organisms in the sample is usually combined to derive a biological index that reflects the quality of the water at the site where the sample was taken
  • Saprobic Index
    • The main problem: subjectivity
  • The subjectivity introduced at intermediate levels can and should be minimized.
Learning Rules for Biological Classification of British Rivers
  • Data
    • 292 samples covering 80 families of benthic macroinvertebrates
    • Abundance of animals
      • 0: no members of the particular family
      • 1: 1-2
      • 2: 3-9
      • 3: 10-49
      • 4: 50-99
      • 5: 100-999
      • 6: more than 1000
    • Sparse matrix
    • Five classes
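The abundance coding above can be sketched as a small lookup. This is a minimal illustration, not the authors' code: the function names and the example family names are hypothetical, and a count of exactly 1000 is assumed to fall in level 6, since the slide only says "more than 1000".

```python
from bisect import bisect_right

# Lower bounds of abundance levels 1..6, taken from the list above.
# Assumption: a count of exactly 1000 is treated as level 6; the slide
# says "more than 1000", which leaves that single value unspecified.
LEVEL_LOWER_BOUNDS = [1, 3, 10, 50, 100, 1000]


def abundance_level(count: int) -> int:
    """Map a raw animal count for one family to an abundance level 0..6."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # bisect_right counts how many lower bounds are <= count.
    return bisect_right(LEVEL_LOWER_BOUNDS, count)


def encode_sample(counts: dict[str, int]) -> dict[str, int]:
    """Encode one sample sparsely: families at level 0 are simply omitted."""
    return {family: abundance_level(n) for family, n in counts.items() if n > 0}


# Hypothetical sample; the family names are for illustration only.
sample = {"Baetidae": 12, "Gammaridae": 2, "Asellidae": 0}
print(encode_sample(sample))
```

Storing only the non-zero families mirrors the sparse-matrix remark: most of the 80 families are absent from any given sample.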
Experiment 1
    • Modified CN2 algorithm
      • Measure the relative information score
      • Use the m-estimate instead of the Laplace estimate
    • The rules were required to be highly significant (99%).
    • 15 different values of m were tried (0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024).
    • Criterion
      • Information Score
      • Accuracy
      • Smaller value of the parameter m
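For readers unfamiliar with the two estimates mentioned above, here is a minimal sketch of the standard definitions. The function and parameter names are hypothetical, and this is not the modified CN2 implementation itself; it only shows how m trades off the rule's observed frequency against the class prior.

```python
# Values of m listed on the slide.
M_VALUES = [0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]


def laplace_estimate(n_class: int, n_covered: int, n_classes: int) -> float:
    """Laplace estimate of a rule's accuracy: (n_c + 1) / (n + k)."""
    return (n_class + 1) / (n_covered + n_classes)


def m_estimate(n_class: int, n_covered: int, prior: float, m: float) -> float:
    """m-estimate of a rule's accuracy: (n_c + m * prior) / (n + m).

    n_class   -- covered examples that belong to the predicted class
    n_covered -- all examples covered by the rule
    prior     -- prior probability of the predicted class
    m         -- weight given to the prior (m = 0 is the relative frequency)
    """
    if n_covered + m == 0:
        return prior  # nothing observed and no weight: fall back to the prior
    return (n_class + m * prior) / (n_covered + m)


# A hypothetical rule covering 10 examples, 8 of them in the predicted class,
# in a five-class problem with a uniform prior of 0.2.
for m in (0, 5, 32):
    print(m, m_estimate(8, 10, prior=0.2, m=m))
```

With a uniform prior and m equal to the number of classes, the m-estimate coincides with the Laplace estimate; larger m pulls the estimate toward the prior, which penalizes rules that cover few examples.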
Result 1
    • 12 rules, m = 32
    • 83% accuracy on the training set, 75% information content
    • Each rule covered 25 examples and contained 5 conditions.
    • The expert’s conclusions confirmed the rules.
Experiment 2
    • The main criticism was that the rules use only a small number of taxa, whereas the expert takes into account the whole community.
    • Six additional attributes
      • MoreThan0, MoreThan1, …, MoreThan5
      • MoreThanK reflects the number of families present at an abundance level above K
  • Result 2
    • 13 rules, m = 64
    • accuracy 84%, information content 80%
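The six community-level attributes of Experiment 2 could be derived as in the sketch below. It assumes that MoreThanK counts the families in a sample whose abundance level is greater than K; the slide does not spell this out, and the function name and family labels are hypothetical.

```python
def more_than_attributes(levels: dict[str, int], n_attributes: int = 6) -> dict[str, int]:
    """Compute MoreThan0 .. MoreThan5 for one sample.

    levels maps each family to its abundance level (0..6).
    Assumption: MoreThanK is the number of families with level > K.
    """
    values = list(levels.values())
    return {
        f"MoreThan{k}": sum(1 for level in values if level > k)
        for k in range(n_attributes)
    }


# Hypothetical sample with four families (labels are placeholders).
sample_levels = {"family_a": 1, "family_b": 3, "family_c": 6, "family_d": 0}
print(more_than_attributes(sample_levels))
```

Under this reading, MoreThan0 is the number of families present at all, so the attributes summarize the whole community rather than individual taxa, which is what the expert's criticism asked for.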
Experiment 3
    • 195 training examples, 97 test examples
    • Obvious performance improvement from the original to the extended problem.
Analysis of Data about Slovenian Rivers
  • Data
    • 4 years (1990 - 1993)
    • Biological samples are taken twice a year (summer, winter).
    • Physical and chemical analyses are performed several times a year for each sampling site.
    • 698 water examples
    • training (70% - 489 cases), test (30% - 209 cases)
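A split of this size can be reproduced in outline as follows. The slide does not say how the examples were assigned to the two sets, so the random shuffle, the seed, and the function name are assumptions made only for illustration.

```python
import random


def split_indices(n_examples: int, n_train: int, seed: int = 0) -> tuple[list[int], list[int]]:
    """Randomly split example indices into a training part and a test part."""
    if not 0 <= n_train <= n_examples:
        raise ValueError("n_train must be between 0 and n_examples")
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(698, 489)
print(len(train_idx), len(test_idx))  # 489 209
```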
The Influence of Physical and Chemical Parameters on Selected Organisms
    • From an ecological and water quality point of view, this is an important research topic.
    • Binary Classification: Present / Absent
    • Attributes
      • Plants: Hardness, NO2, NO3, NH4, PO4, SiO2, Fe, Detergents, COD, BOD
      • Animals: Temperature, pH, O2, Saturation, COD, BOD
Result
    • Accuracy: 66% - 85%
    • Information score: 23% - 50%
    • 10 - 20 rules for each taxon
    • The average rule length was less than 5 conditions.
    • Average rule coverage was 15 to 45 examples.
Biological Classification
    • 13 physical and chemical parameters
    • 27 bioindicators
    • 7 classes
    • The majority class comprises 339 of the 698 examples, thus the default accuracy is 48.6%.
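The default accuracy quoted above is the accuracy of always predicting the most frequent class. A minimal sketch follows; the function name and the toy labels are hypothetical. Only the majority count (339) and the total (698) come from the slide, so the arithmetic for the real data uses just those two numbers.

```python
from collections import Counter


def default_accuracy(labels: list[str]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    if not labels:
        raise ValueError("labels must not be empty")
    _, majority_count = Counter(labels).most_common(1)[0]
    return majority_count / len(labels)


# Toy labels, for illustration only.
print(default_accuracy(["a", "a", "b", "c"]))  # 0.5

# Figures from the slide: 339 majority-class examples out of 698.
print(f"{339 / 698:.1%}")  # 48.6%
```

Any induced rule set has to beat this baseline on the test set to be worth reporting.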
Discussion
  • We have described several applications of rule induction in the domain of biological water quality classification.
    • The produced rules are transparent and can be easily understood by experts.
    • The induced rules contained valuable knowledge about the domain studied.
    • Machine learning techniques can be useful tools for classification and data analysis in the domain of river water quality and other ecological domains.