Prediction of Project Problem Effects on Software Risk Factors

Prediction of Project Problem Effects on Software Risk Factors • Yildiz Technical University, Computer Engineering Department, İstanbul

Content • Software Risk and Software Risk Management • Purpose • Related Works • Definition of Work • Ranking Risk Factors • Problem Impact Prediction • Results and Discussion • Future Work • References

Software Risk • Software risk can be defined as the uncertainty and potential loss in a software project process • Risks can include • Unspecified requirements • Unclear scope • Unrealistic time and cost estimates • Real-time performance shortfalls • Personnel shortfalls • Not managing change properly

Software Risk Management • Software Risk Management (SRM) steps • Identification • Estimation • Refinement • Mitigation • Monitoring and Maintenance • SRM has one objective: to reduce the harm due to risks • Risk management benefits fall into two categories: direct and indirect • Direct (Primary) • People, product, cost • Indirect (Secondary) • Optimization, pragmatic decision making, better process management and alternative approaches

Qualitative and Quantitative Methods • Different kinds of risk management methods allow software organizations to assess software risks • The first kind is qualitative methods: brainstorming, SWOT analysis, maps, checklists and interviews. Experts develop and evaluate these methods to select the most important risks • The second kind is quantitative methods: decision trees, Monte Carlo analysis, the Borda voting method, etc. Quantitative methods help project managers prioritize risks in terms of their occurrence probability and potential impact

Purpose • Problems in a software project can also be given values on risk factors, so evaluating these values can give clues about the importance of problems for software projects • The Turkcell ICT data set is used in this work

Purpose • The Turkcell ICT data set consists of • 384 software problems, each with a separate value on each of six risk factors • Each problem also has a severity value • Purposes • Predict the impact of problems using their values on the risk factors • Rank the risk factors by their distinctiveness to determine which are more valuable for predicting problem impact

Related Works • Hu and Huang have 120 software project results and 64 risk factors that are defined for each software project • 100 projects are used in the training phase and 20 software projects are used in the testing phase • They used SVM, MLP and a modified MLP to predict the results of software projects that have values on each of the 64 risk factors • Prediction results (classification results) are listed in terms of accuracy values • SVM ~80% acc. • MLP ~70% acc. • Modified MLP ~85% acc.

Related Works • Hu and Zhang use the same data set to compare classifiers for predicting the results of software projects, and they found that SVM is better than the other classifiers • Tang and Wang use fuzzy logic for prediction • Klair and Kaur use their own software data sets to gather prediction results

Definition of Our Work • Turkcell ICT data set • 384 problems • Each problem has values on six different risk factors • Each problem has one severity value (class label) • Each problem can be represented as a tuple • Each risk factor can be represented as a feature in the data set • So if a new problem occurs during the software development process, we can predict its severity value from its values on the risk factors
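The tuple representation above can be sketched in a few lines of Python. This is only an illustration: the six-value tuples and their labels below are made-up examples, not Turkcell ICT data, and a simple k-nearest-neighbour vote (one of the five classifiers evaluated later) stands in for whichever tooling the study actually used. The function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

def knn_predict(training, new_values, k=3):
    """Predict a severity label by majority vote among the k nearest problems.

    training   -- list of (risk_factor_values, severity_label) pairs
    new_values -- tuple of risk-factor values for the new problem
    """
    # Sort known problems by Euclidean distance to the new problem.
    nearest = sorted(training, key=lambda item: math.dist(item[0], new_values))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical problems: six risk-factor values each, plus a severity label.
training = [
    ((5, 5, 4, 5, 4, 5), "High"),
    ((4, 5, 5, 4, 5, 4), "High"),
    ((5, 4, 5, 5, 5, 5), "High"),
    ((3, 2, 3, 3, 2, 3), "Medium"),
    ((2, 3, 3, 2, 3, 2), "Medium"),
    ((1, 1, 2, 1, 1, 1), "Low"),
]

print(knn_predict(training, (5, 5, 5, 5, 5, 5)))        # three nearest are all "High"
print(knn_predict(training, (2, 2, 3, 3, 2, 3), k=1))   # nearest single problem is "Medium"
```

Each row plays the role of one problem (tuple) and each position in the tuple plays the role of one risk factor (feature).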

Definition of Our Work • SRM can struggle with a large number of risk factors, and some of these factors are more distinctive than others • Feature selection methods can rank risk factors by how well they distinguish the class label • The Turkcell ICT data set has only six risk factors, but we intend to find out which ones are important for determining the severity of problems

Risk Factors and Severity (Dataset)

Ranking Risk Factors • In this work, our first aim is to rank the risk factors • Two feature selection methods are used for ranking risk factors • Information Gain (IG): IG measures how much knowing a feature's value reduces the uncertainty of the class label. A high information gain value for a feature means that the feature is important for identifying the class label • Chi-Squared Statistic: measures the dependency between the feature and the specified class. A higher chi-squared statistic for a feature indicates a stronger relation between this feature and the specified class
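To make the two ranking criteria concrete, here is a minimal sketch in plain Python. It assumes discrete feature values and uses made-up toy data; the names `factor_a` and `factor_b` are hypothetical and do not correspond to the Turkcell risk factors, and the slides do not show the implementation actually used.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in class-label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class label within each feature value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic of the feature/class contingency table."""
    total = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    stat = 0.0
    for f, f_count in feature_counts.items():
        for c, c_count in label_counts.items():
            expected = f_count * c_count / total  # count expected under independence
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat

# Toy data (hypothetical): factor_a separates the classes, factor_b does not.
severity = ["High", "High", "High", "High", "Medium", "Medium", "Low", "Low"]
factor_a = [3, 3, 3, 3, 2, 2, 1, 1]
factor_b = [1, 2, 1, 2, 1, 2, 1, 2]

for name, values in [("factor_a", factor_a), ("factor_b", factor_b)]:
    print(name, information_gain(values, severity), chi_squared(values, severity))
```

On this toy data `factor_a` scores higher than `factor_b` under both criteria, which is the sense in which a feature is called more distinctive.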

Ranking Risk Factors

Problem Impact Prediction • The Turkcell data set supplies problem severity values, so predicting problem impact becomes a classification problem. The data set has six features (six risk factors) and each problem has a class label (severity value) • 10-fold cross-validation is used to obtain accuracy values in the classification phase. It randomly splits the data set into ten parts, then uses nine parts to build the training model and the remaining part as test data. This is repeated ten times, so that each part serves as test data once • Classifier performance is measured using Precision, Recall and F-measure values. The F-measure is the harmonic mean of recall and precision, so it balances the two in a single value
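The evaluation procedure described above can be sketched as follows. This is an illustrative plain-Python version with hypothetical function names; it shows the per-class form of precision, recall and F-measure, and the slides do not state how the per-class values were averaged into the reported figure.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def precision_recall_f(actual, predicted, positive):
    """Precision, recall and F-measure for one class treated as 'positive'."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0  # harmonic mean
    return precision, recall, f_measure

# 384 problems split into ten folds of 38 or 39 problems each.
for train, test in train_test_splits(384):
    assert len(train) + len(test) == 384

print(precision_recall_f(["High", "High", "Medium", "High"],
                         ["High", "Medium", "Medium", "High"],
                         positive="High"))
```

In a full run, a classifier would be trained on `train` and scored on `test` inside the loop, and the ten sets of test predictions would be pooled before computing the measures.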

Problem Impact Prediction • We selected these five classification methods because of their popularity and success rate • Naive Bayes (NB) • Support Vector Machines (SVMs) • Decision Tree (J48) • Multilayer Perceptrons (MLPs) • k-Nearest Neighbour (kNN) • The Turkcell ICT data set is not a homogeneous data set: its classes are imbalanced. The class label of 350 problems is “High”, of 30 problems “Medium” and of 4 problems “Low”. If a classifier predicts the class label of all 384 problems as “High”, it gets about 91 percent accuracy, so we also used the Kappa statistic for classification evaluation. The Kappa statistic takes chance agreement into account, so it is important to evaluate the results according to it
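The effect of class imbalance on accuracy, and why the Kappa statistic corrects for it, can be checked with a short sketch. It uses the class counts stated above (350 High, 30 Medium, 4 Low) and a hypothetical classifier that always answers “High”; the function names are illustrative, and Cohen's kappa is assumed to be the Kappa statistic meant here.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of instances whose predicted label matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement beyond what label frequencies alone would give."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(actual_counts[c] * predicted_counts[c]
                   for c in actual_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is returned as a convention
    return (observed - expected) / (1 - expected)

actual = ["High"] * 350 + ["Medium"] * 30 + ["Low"] * 4
always_high = ["High"] * 384  # hypothetical majority-class classifier

print(accuracy(actual, always_high))     # 350 / 384, roughly 0.91
print(cohen_kappa(actual, always_high))  # 0.0: no agreement beyond chance
```

The always-“High” classifier looks strong on accuracy but scores a kappa of zero, which is why kappa is the more informative figure on this data set.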

Problem Impact Prediction

Problem Impact Prediction • The highest F-measure value, 97.5 percent, is obtained from the MLPs classifier. MLPs also give a higher Kappa statistic than the other classifiers. It classified 376 problem severity values correctly • The results of SVMs follow the results of MLPs • The number of instances correctly classified by the J48 classifier is lower than the numbers for the kNN and NB classifiers, but the Kappa statistic and F-measure values of J48 are higher than those of kNN and NB

Problem Impact Prediction • F-measure and Kappa statistic are more reliable for non-homogeneous data sets in classification, so the J48 prediction performance is better than that of NB and kNN

Results and Discussion • In our study, we intend to find out which risk factors are more distinctive for risk management, and we also want to predict the impact of problems using the Turkcell ICT data set • Experimental results show that Regulation Effect and Financial Effect are more distinctive than the other risk factors. Problem values on these two factors are more informative for predicting problem impact • By the same logic, “Employee Effect” and “Brand Effect” are less distinctive than the other risk factors for predicting problem severity

Results and Discussion • Our second target is to predict the impact of software project problems using problem values on risk factors. Each problem has six values, one per risk factor, and these values indicate the severity of the problem. Our data set has 384 problems and six risk factors. Five different classification methods are used to predict problem impact • We obtain a 97.5 percent F-measure value when predicting the impact of the problems. The highest classification success rates are obtained from the MLPs and SVM classifiers. These results are similar to related works, which indicate that the success rates of SVM and MLPs are higher than those of other classifiers

Future Work • In the future we want to obtain more data to compare with our results. In our study we use the problems of only one project, but we could get more general results by using many different software project problems and many risk factors. In addition, the classes of our data set are not uniformly distributed and the Turkcell ICT data set is not a homogeneous data set, so using a homogeneous data set could be very useful for generalization. In the future we can also increase the number of classifiers to make a better comparison

References • K. Wiegers, “Know your enemy: software risk management”, Software Development, 1998, 6, 38-44. • C. R. Pandian, Applied Software Risk Management: A Guide for Software Project Managers, Auerbach Publications (T. & F. Group), 2007. • R. S. Pressman, D. Ince, Software Engineering: A Practitioner's Approach (Vol. 5), New York: McGraw-Hill, 1992. • J. McManus, Risk Management in Software Development Projects, Elsevier Butterworth-Heinemann, 2004. • R. N. Charette, “Large-scale project management is risk management”, IEEE Software, vol. 13, Jul. 1996, pp. 110-117. • S. McConnell, Software Project Survival Guide, Microsoft Press, 1997. • IRM, A Risk Management Standard, published by AIRMIC, ALARM, 2002. • CMMI Product Team, Capability Maturity Model Integration (CMMI℠), Version 1.1, March 2002. • Y. Hu, J. Huang, J. Chen, M. Liu, K. Xie, “Software Project Risk Management Modeling with Neural Network and Support Vector Machine Approaches”, International Conference on Natural Computation, 2007. • A. S. Klair, R. P. Kaur, “Software Effort Estimation using SVM and kNN”, International Conference on Computer Graphics, Simulation and Modeling, 2012, Pattaya (Thailand). • Y. Hu, X. Zhang, X. Sun, M. Liu, J. Du, “An Intelligent Model for Software Project Risk Prediction”, International Conference on Information Management, 2009. • A. Tang, R. Wang, “Software Project Risk Assessment Model Based on Fuzzy Theory”, International Conference on Computer and Communication Technologies in Agriculture Engineering, 2010. • T. M. Mitchell, Machine Learning, McGraw Hill, 1997. • R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, Wiley-Interscience, 2000. • http://www.cs.waikato.ac.nz/ml/weka
