A fuzzy clustering approach to improve the accuracy of Italian students’ data

An experimental procedure to correct the impact of outliers on assessment test scores. Claudio Quintano, Rosalia Castellano, Sergio Longobardi, UNIVERSITY OF NAPLES “PARTHENOPE”

Presentation Transcript


  1. A fuzzy clustering approach to improve the accuracy of Italian students’ data. An experimental procedure to correct the impact of outliers on assessment test scores. Claudio Quintano, Rosalia Castellano, Sergio Longobardi, UNIVERSITY OF NAPLES “PARTHENOPE”. claudio.quintano@uniparthenope.it, lia.castellano@uniparthenope.it, sergio.longobardi@uniparthenope.it

  2. OUTLINE This work considers data on students’ performance assessments collected by the Italian National Evaluation Institute of the Ministry of Education (INVALSI). THE INVALSI SURVEY covers:
     • 5 SCHOOL LEVELS: 2nd and 4th year of primary school, 1st year of lower secondary, 1st and 3rd year of upper secondary
     • 3 AREAS: reading, mathematics and science
     The problem is the presence of OUTLIER UNITS at class level, which biases the distributions of the average scores by class. The AIM is to MITIGATE THE EFFECT of the outliers and to correct the overestimation of children’s ability.

  3. DISTRIBUTIONS OF MEAN SCORES AT CLASS LEVEL (MATHEMATICS ASSESSMENT) [Figure: mathematics class mean score, s.y. 2004/05; panels: II class primary school, IV class primary school, I class lower secondary school, I class upper secondary school, III class upper secondary school]

  4. CLASS MEAN SCORE, II CLASS PRIMARY SCHOOL [Figure: class mean score distributions for reading, mathematics and science; panels for s.y. 2004/05 and s.y. 2005/06]

  5. STEP I Deletion of the micro units (students) considered “PSEUDO NON-RESPONDENTS”: students who have not given the minimum number of answers needed to compute a performance score. The share of these units varies from 9% to 16%.

  6. SUMMARY: COMPUTATION OF CLASS-LEVEL INDICATORS As a first step, the micro units considered “pseudo non-respondents” are dropped from the dataset. Then the following indexes are computed for each class:
     • Class mean score
     • Standard deviation of mean score
     • Class non-response rate
     • Index of answers’ homogeneity
     The indexes are built from the following quantities: the score of the ith student of the jth class; the number of item non-responses plus invalid responses for the ith student of the jth class; the Gini measure of heterogeneity computed for each sth test question administered to the students of the jth class; the number of items administered to the jth class; the number of respondent students of the jth class.
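As an illustration of the first three indicators, here is a minimal Python sketch. The slide formulas are not part of the transcript, so the exact definitions are assumptions: the standard deviation is taken over the student scores of the class (population form), and the non-response rate is the total number of missing or invalid responses divided by the number of administered items times the number of respondent students. The function name `class_indicators` and the sample values are hypothetical.

```python
import numpy as np

def class_indicators(scores, n_missing, n_items):
    """Return mean score, standard deviation and non-response rate for one class.

    scores    : scores of the respondent students of the class
    n_missing : per-student count of item non-responses plus invalid responses
    n_items   : number of items administered to the class
    """
    scores = np.asarray(scores, dtype=float)
    n_missing = np.asarray(n_missing, dtype=float)
    n_students = scores.size

    mean_score = scores.mean()
    # Assumption: population standard deviation of the scores within the class.
    std_score = scores.std()
    # Assumption: share of missing/invalid responses over all administered items.
    nonresponse_rate = n_missing.sum() / (n_items * n_students)
    return mean_score, std_score, nonresponse_rate

# Hypothetical class: 4 respondent students, 20 items each.
mean_score, std_score, nonresponse_rate = class_indicators(
    scores=[18, 20, 19, 20], n_missing=[2, 0, 1, 1], n_items=20
)
```

A class with a very high mean, a very small standard deviation and few missing answers is the kind of profile the later steps flag as a potential outlier.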

  7. PRINCIPAL COMPONENT ANALYSIS (PCA) Through PCA we can describe the answer behaviour of each class by means of two variables. FIRST component: OUTLIERS IDENTIFICATION AXIS (contraposition). SECOND component: INDEX OF CLASS COLLABORATION TO SURVEY (class non-response rate).
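The reduction of the class-level indicators to two components can be sketched as follows. This is a generic PCA on standardized variables computed through a singular value decomposition, not necessarily the authors' exact procedure; the function name `first_two_components` and the random input matrix are hypothetical stand-ins for the real indicator table.

```python
import numpy as np

def first_two_components(X):
    """Project standardized indicators onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize each indicator (column) to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:2].T                      # factorial scores of each class
    explained = (S ** 2) / (S ** 2).sum()      # share of variance per component
    return scores, Vt[:2], explained[:2]

# Hypothetical data: 50 classes described by 4 indicators.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
scores, loadings, explained = first_two_components(X)
```

The rows of `loadings` show how each indicator contributes to an axis, which is what allows the axes to be read as an outlier identification axis and a collaboration index.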

  8. PRINCIPAL COMPONENT ANALYSIS (PCA) The outlier classes can be detected graphically. [Figure: projection of the second-year primary school classes on the plane of the first two factorial axes, with the OUTLIER CLASSES highlighted]

  9. THE FUZZY K-MEANS APPROACH On the basis of the two factorial dimensions, the students’ classes are grouped into 8 clusters by a FUZZY K-MEANS algorithm. A fuzzy partition matrix is computed in which, for each class (rows of the matrix), the degree of belonging to each cluster (columns of the matrix) is given.
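A minimal sketch of how a fuzzy partition matrix can be obtained is given below. It implements the standard fuzzy k-means (fuzzy c-means) updates with Euclidean distance; the fuzzifier `m = 2`, the random initialization, the stopping rule and the function name `fuzzy_kmeans` are assumptions, since the slides do not report these settings. The toy data use 2 clusters instead of the 8 used in the presentation.

```python
import numpy as np

def fuzzy_kmeans(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return the fuzzy partition matrix U (units x clusters) and the centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row summing to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centroids: membership-weighted means of the units.
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance of every unit from every centroid (floored to avoid division by zero).
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        # Membership update: inversely related to the relative distance.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centroids

# Hypothetical factorial scores: two well-separated groups of 20 classes each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(10.0, 0.5, size=(20, 2))])
U, centroids = fuzzy_kmeans(X, n_clusters=2)
```

Each row of `U` sums to 1, so the entry in the column of a given cluster can be read as the degree of belonging of that class to the cluster.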

  10. DETECTION OF OUTLIERS [Figure: projection of the centroids computed by fuzzy k-means] The OUTLIER CLUSTER is characterized by high negative scores on the “outliers identification axis” (x-axis), which indicate high class average scores and minimum within-class variability of scores and test answers, and by factorial scores close to zero on the “index of class collaboration to survey”.

  11. DETECTION OF OUTLIERS Denoting the outlier cluster by “a”, the degree of belonging of the jth class to this cluster is µja. This measure is treated as the “outlier probability” of the jth class; alternatively, it can be interpreted as the “outlier level” of each class.

  12. CORRECTION PROCEDURE On the basis of the degree of belonging to the outlier cluster, a weighting factor is developed: Wj = 1 - µja (weighting factor = 1 - outlier probability). Wj varies from 0 to 1: a class with a high probability of belonging to the outlier cluster receives a low weight, while a class very far from this cluster receives a weight close to 1.
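The weighting rule Wj = 1 - µja can be illustrated with a short sketch. How the weights enter the final estimates is not spelled out in the transcript; the code below assumes a weighted average of class mean scores, and the function name `corrected_mean` and the sample values are hypothetical.

```python
import numpy as np

def corrected_mean(class_means, outlier_membership):
    """Weighted average of class means with weights W_j = 1 - mu_ja."""
    class_means = np.asarray(class_means, dtype=float)
    weights = 1.0 - np.asarray(outlier_membership, dtype=float)
    # Assumption: the correction is applied as a weighted mean of class means.
    return np.average(class_means, weights=weights), weights

# Hypothetical example: the third class belongs fully to the outlier cluster.
class_means = [60.0, 62.0, 95.0]
mu_outlier = [0.0, 0.5, 1.0]
adjusted, weights = corrected_mean(class_means, mu_outlier)
unadjusted = float(np.mean(class_means))
```

In this toy case the class with membership 1 gets weight 0 and drops out of the average, so the adjusted mean is lower than the unadjusted one.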

  13. EFFECTS OF THE CORRECTION PROCEDURE [Figure: original distribution and adjusted distribution]

  14. THE INSPIRATION PRINCIPLE Go beyond the dichotomous logic OUTLIER / NOT OUTLIER: with a FUZZY APPROACH, compute an “OUTLIER LEVEL” measure for each unit to calibrate the correction.

  15. RELATIONSHIP BETWEEN THE SCHOOL LOCATION AND THE PRESENCE OF OUTLIER CLASSES [Figure: box plot of the outlier level µja, the degree of belonging to the outlier cluster (cluster n.2)]

  16. RELATIONSHIP BETWEEN THE SCHOOL LOCATION AND THE PRESENCE OF OUTLIER CLASSES [Figure: class average score distributions for the northern and central regions only]

  17. REGIONAL SCORES [Figure: regional scores, unweighted average and weighted average]

  18. Index of answers’ homogeneity: the mean of the Q Gini indexes (Esj) computed for each sth test question administered to the students of the jth class, where Esj is a Gini measure of heterogeneity based on the share of students of the jth class who gave the tth answer to the sth question. The Gini measure is equal to zero when all students of the jth class have given the same answer to the sth question. It reaches its maximum value, (h-1)/h (where h is the number of alternative answers to the sth question), when there is perfect heterogeneity of answers to the sth question in the jth class.
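The index described above can be sketched in Python as follows. The slide formula is not part of the transcript, so the code assumes the usual Gini heterogeneity measure, one minus the sum of squared answer shares, which matches the stated minimum of 0 and maximum of (h-1)/h. The function names, the integer coding of answers (0 to h-1) and the sample class are hypothetical.

```python
import numpy as np

def gini_heterogeneity(answers, n_alternatives):
    """Gini heterogeneity of the answers given by one class to one question."""
    answers = np.asarray(answers)
    # Share of students choosing each alternative (answers coded 0..h-1).
    counts = np.bincount(answers, minlength=n_alternatives)
    shares = counts / counts.sum()
    # Assumption: E = 1 - sum of squared shares.
    return 1.0 - np.sum(shares ** 2)

def homogeneity_index(answer_matrix, n_alternatives):
    """Mean of the per-question Gini measures (rows: students, columns: questions)."""
    answer_matrix = np.asarray(answer_matrix)
    return float(np.mean([gini_heterogeneity(column, n_alternatives)
                          for column in answer_matrix.T]))

# Hypothetical class: 4 students, 2 questions, h = 4 alternatives.
# Question 1: everyone gives the same answer; question 2: all answers differ.
answers = np.array([[2, 0],
                    [2, 1],
                    [2, 2],
                    [2, 3]])
e_same = gini_heterogeneity(answers[:, 0], 4)
e_spread = gini_heterogeneity(answers[:, 1], 4)
index = homogeneity_index(answers, 4)
```

Values of the index near zero mean that the students of a class gave almost identical answers, the pattern associated with outlier classes in the earlier slides.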

  19. EFFECTS OF THE CORRECTION PROCEDURE
