Data Collection and Analysis

By C. L. Hsieh, Department of Industrial Management, Aletheia University

Introduction
• “You can observe a lot just by watching.”
• Data gathering results in a conceptual model of how the system operates.
• Data gathering should avoid ending up with lots of data but very little useful information.

Questions for Data Gathering
• What is the best procedure to follow?
• What types of data should be gathered?
• What sources should be used?
• What types of analyses should be performed on the data?
• How do you select the right probability distribution to represent the data?
• How should the data be documented?

Guidelines for Data Gathering
• Identify triggering events
  • Identify the causes or conditions that trigger activities, e.g. the causes of downtime: failure, idleness, unavailability of stock, ...
• Look for common groupings
  • Reduce the data to common behaviors and patterns
  • Identify general categories

Guidelines for Data Gathering
• Focus on key impact factors
  • Avoid information with little impact (e.g. off-hour performance, extremely rare downtimes, negligible move times)
• Separate input variables from response variables
  • Input variables define how the system works
  • Response variables do not “drive” model behavior

Guidelines for Data Gathering
• Focus on essence rather than substance
  • Capture cause-and-effect relationships and ignore meaningless details
  • Focus on the activity of using resources or the delay of entity flow (system abstraction)
• Isolate actual activity times
  • Exclude any extra waiting time

Steps to Gathering Data
• Determine data requirements
• Identify data sources
• Collect the data
• Make assumptions
• Analyze the data
• Document and approve the data

Determining Data Requirements
• Structural data
  • Involve all the objects in the system to be modeled
  • Describe the layout of the system
  • Identify the items to be processed (e.g. entities, resources, locations)

Determining Data Requirements
• Operational data
  • Explain how the system operates
  • Describe when, where, and how events and activities take place
  • Consist of the logic information about the system, e.g. routings, schedules, downtime behavior, and resource allocation

Determining Data Requirements
• Numerical data
  • Provide quantitative information about the system
  • Some values are easy to obtain; others are not
  • e.g. capacities, arrival rates, activity times

Determining Data Requirements
• Use of a questionnaire (for a sample, see p. 103)
  • A questionnaire helps in gathering the right information
  • If sample data are not available, it is useful to get at least estimates of the minimum, most likely, and maximum values until more precise data can be obtained

Identifying Data Sources
• Good sources of data
  • Historical records (e.g. production volumes, sales volumes)
  • System documentation (e.g. production plans, facility layouts)
  • Personal observation (e.g. work sampling, time and motion studies)
  • Personal interviews (e.g. operating methods, repair procedures, scheduling)
  • Comparison with similar systems
  • Vendor claims (e.g. processing times, reliability of new machines)
  • Design estimates (e.g. processing times, move times)
  • Research literature

Collecting the Data
• Defining entity flow
  • Entity flow establishes a skeletal framework to which additional data can be attached
  • Follow the entity movement
  • Use an entity flow diagram (EFD)
  • Difference between an entity flow diagram and a process flowchart


Developing a Description of Operation
• Description of operation
  • Explains how entities are processed and provides the details of the EFD
• Requirements
  • Time and resource requirements of the activity or operation
  • Where, when, and in what quantities entities get routed next
  • Time and resource requirements for moving to the next location

Entity Flow Diagram for Patient Processing

Process Description for Patient Processing

Defining Incidental Details
• Incidental details (downtimes, setups, and work priorities) are not essential to the basic entity flow, but they are necessary for a complete and accurate model
• Once a basic model has been constructed, any numerical values (e.g. activity times, arrival rates) should be firmed up

Making Assumptions
• A simulation cannot run with incomplete data, so assumptions are required for any unknown future conditions
• Assumptions must make sense in the overall operation of the model; seeing absurd behavior may tell us that certain assumptions do not make sense


Making Assumptions
• Sensitivity analysis assesses the influence of an assumption on the validity of a model
  • Best or most optimistic case
  • Worst or most pessimistic case
  • Most likely or best-guess case

Statistical Analysis of Numerical Data
• Data should be analyzed to ascertain their suitability for use
• Data characteristics:
  • Independence (randomness)
  • Homogeneity (the data come from the same distribution)
  • Stationarity (the distribution of the data does not change over time)

Statistical Analysis of Numerical Data
• Stat::Fit in ProModel can automatically analyze and test data for use in a simulation
• Common descriptive statistics
  • Mean: the average of the data
  • Median: the value of the middle observation when the data are sorted
  • Mode: the value with the greatest frequency

Descriptive Statistics
• Common descriptive statistics
  • Standard deviation: a measure of the average deviation from the mean
  • Variance: the square of the standard deviation
  • Coefficient of variation: the standard deviation divided by the mean
  • Skewness: a measure of symmetry
  • Kurtosis: a measure of flatness or peakedness
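
The statistics listed above can be sketched with Python's standard library. This is only an illustration, not what Stat::Fit computes internally: the function name `describe`, the sample data, and the choice of moment-based estimators for skewness and kurtosis (dividing by n) are assumptions made here.

```python
import statistics


def describe(data):
    """Return common descriptive statistics for a list of numbers."""
    n = len(data)
    mean = statistics.fmean(data)
    std_dev = statistics.stdev(data)  # sample standard deviation (n - 1)
    # Central moments (dividing by n), used for the shape measures.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "std_dev": std_dev,
        "variance": statistics.variance(data),  # square of std_dev
        "coeff_of_variation": std_dev / mean,
        "skewness": m3 / m2 ** 1.5,  # 0 for perfectly symmetric data
        "kurtosis": m4 / m2 ** 2,    # about 3 for normally distributed data
    }


# Hypothetical inspection times in minutes (not taken from the slides).
inspection_times = [2, 3, 3, 4, 8]
summary = describe(inspection_times)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A positive skewness here reflects the single long inspection time pulling the right tail out; a coefficient of variation well below 1 would suggest that an exponential model is a poor candidate.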

Suitability of Data for Use
• Test for independence: data are independent if the value of one observation is not influenced by the value of another observation
• Test for homogeneity: the data come from the same distribution
• Test for stationarity: the distribution of the data does not change over time

Test for Independence
• Scatter plot
  • A plot of adjacent points in the sequence of observed values, plotted against each other
  • Each point is a pair of consecutive observations (X_i, X_{i+1}), i = 1, ..., n-1
  • Positively correlated X_i's → positively sloped trend line
  • Negatively correlated X_i's → negatively sloped trend line

Test for Independence
• Autocorrelation plot
  • If the observations in a sample are independent, they are uncorrelated
  • Assumes that the data are taken from a stationary process
  • The measure of autocorrelation is called rho (ρ) (see p. 104)
  • Autocorrelation ρ lies in [-1, 1] (-1 <= ρ <= 1)
  • If ρ is near either extreme, 1 or -1, the data are autocorrelated
  • If ρ is near 0, there is little or no correlation in the data
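
As a rough sketch of both checks, the code below builds the consecutive pairs (X_i, X_{i+1}) used in a scatter plot and estimates the lag-k autocorrelation. The estimator shown is one common form and is an assumption here; the formula on p. 104 may normalize differently. The function names and sample series are hypothetical.

```python
def lag_pairs(xs, lag=1):
    """Pairs (X_i, X_{i+lag}) to plot against each other in a scatter plot."""
    return list(zip(xs, xs[lag:]))


def autocorrelation(xs, lag=1):
    """Estimate the lag-`lag` autocorrelation rho of a sequence."""
    n = len(xs)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(xs) - 1")
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return num / denom


# Hypothetical series: a steady upward trend and a strict alternation.
trending = list(range(1, 21))
alternating = [1, -1] * 10

print(autocorrelation(trending))     # close to +1: positively correlated
print(autocorrelation(alternating))  # close to -1: negatively correlated
```

Values near 0 at every lag are consistent with independent observations; values near either extreme suggest the data should not be treated as a random sample.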

Test for Independence
• Runs test
  • A run in a series of observations is an uninterrupted sequence of numbers showing the same trend, e.g. a run “up” or “down”

Test for Independence
• Types of runs tests: if there are too many or too few runs, the randomness of the series is rejected
  • Median test: measures the number of runs (sequences of numbers) above and below the median
  • Turning point test: measures the number of times the series changes direction
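
A minimal sketch of the two counts follows. The function names are hypothetical, and the z-score uses the standard large-sample result that a random series of n values has an expected 2(n-2)/3 turning points with variance (16n-29)/90; Stat::Fit may report its results differently.

```python
import math
import statistics


def runs_about_median(xs):
    """Count runs of consecutive values above or below the median."""
    med = statistics.median(xs)
    # Values equal to the median are dropped before counting runs.
    signs = [x > med for x in xs if x != med]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def turning_points(xs):
    """Count the points where the series changes direction."""
    return sum(
        1
        for a, b, c in zip(xs, xs[1:], xs[2:])
        if (b > a and b > c) or (b < a and b < c)
    )


def turning_point_z(xs):
    """Standardized turning-point count under the randomness hypothesis."""
    n = len(xs)
    expected = 2 * (n - 2) / 3
    variance = (16 * n - 29) / 90
    return (turning_points(xs) - expected) / math.sqrt(variance)


# Hypothetical series: a pure trend has too few runs and turning points.
trend = [1, 2, 3, 4, 5, 6, 7, 8]
print(runs_about_median(trend), turning_points(trend), turning_point_z(trend))
```

A z-score far from 0 in either direction (for example, beyond about ±1.96 at the 5% level) points to too few or too many turning points for a random series.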

Test for Homogeneity
• Test for identically distributed data: tests whether the data set comes from the same distribution
• Examples of nonhomogeneous data sets
  • Activity times that take longer or shorter depending on the type of entity being processed
  • Interarrival times that vary in length depending on the time of day or week

Test for Homogeneity
• Visually inspect the distribution to see if it has more than one mode (p. 118, Fig. 5.9)
• Analysis of variance (ANOVA) for normally distributed data
• Nonparametric methods: two-sample test, chi-square multi-sample test, Kruskal-Wallis test, ...
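
To make one of these tests concrete, here is a sketch of the Kruskal-Wallis H statistic in plain Python. It assumes average ranks for ties and omits the tie correction factor; the function names and the activity-time data are hypothetical. For three groups, H is compared with the chi-square critical value for 2 degrees of freedom, 5.991 at the 5% level.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    h = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)


# Hypothetical activity times (minutes) for three entity types.
by_entity_type = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(kruskal_wallis_h(by_entity_type))  # large H: groups differ
```

A value of H above the critical value suggests the groups do not share one distribution, so the activity times would be modeled separately for each entity type.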

Test for Homogeneity
• One type of nonhomogeneous data occurs when the distribution changes over time (nonstationary, or time-variant, data)
• Examples of time-changing distributions
  • Learning curve
  • Arrival rate of customers to a service facility

Tests for Stationarity
• Nonstationary data can be detected by plotting subgroups of data that occur within successive time intervals (Fig. 5.10)
• Run Stat::Fit and see which distribution best fits each subgroup; if the same distribution fits all of them, the same population is assumed
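
The subgrouping idea can be sketched as follows: split the observations, in time order, into successive subgroups and compare their summaries. The function name, the fixed group size, and the data are assumptions for illustration; in practice the subgroups would follow natural time intervals such as hours or shifts.

```python
import statistics


def subgroup_stats(observations, group_size):
    """Mean and standard deviation of successive, equally sized subgroups.

    Observations must be in time order; a trailing partial group is dropped.
    """
    groups = [
        observations[i:i + group_size]
        for i in range(0, len(observations), group_size)
    ]
    return [
        (statistics.fmean(g), statistics.pstdev(g))
        for g in groups
        if len(g) == group_size
    ]


# Hypothetical interarrival times (minutes): morning, then afternoon.
interarrival = [5, 6, 5, 6, 10, 11, 10, 11]
for mean, sd in subgroup_stats(interarrival, group_size=4):
    print(f"mean={mean:.2f} sd={sd:.2f}")
```

A clear shift in the subgroup means, as in this made-up example, is a sign that a single distribution should not be fitted to the whole data set.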

Distribution Fitting
• Three ways of representing data
  • Original data record
    • The data set is usually not large enough
  • Empirical distribution (characterizes the data)
    • Continuous frequency distribution: the percentage of values that fall within given intervals

Distribution Fitting
• Empirical distribution (characterizes the data)
  • Discrete frequency distribution: the percentage of times a particular value occurs
  • Drawbacks
    • An insufficient sample size may create an artificial bias
    • May fail to capture rare extreme values that exist in the population from which the data were sampled
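
Both kinds of empirical distribution can be sketched in a few lines. The function names, interval edges, and data below are hypothetical; the sampler also illustrates the drawback noted above, since it can only ever return values that were actually observed.

```python
import random
from collections import Counter


def discrete_frequency(data):
    """Fraction of observations taking each distinct value."""
    n = len(data)
    return {value: count / n for value, count in sorted(Counter(data).items())}


def continuous_frequency(data, edges):
    """Fraction of observations falling in each interval [lo, hi)."""
    n = len(data)
    return {
        (lo, hi): sum(1 for x in data if lo <= x < hi) / n
        for lo, hi in zip(edges, edges[1:])
    }


def sample_empirical(freq, k, rng):
    """Draw k values from a discrete frequency distribution."""
    return rng.choices(list(freq.keys()), weights=list(freq.values()), k=k)


# Hypothetical batch sizes and service times (minutes).
batch_sizes = [1, 1, 2, 3]
service_times = [0.5, 1.5, 2.5, 3.9]

print(discrete_frequency(batch_sizes))
print(continuous_frequency(service_times, edges=[0, 2, 4]))
print(sample_empirical(discrete_frequency(batch_sizes), 5, random.Random(0)))
```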

Distribution Fitting
• Theoretical distribution
  • Fit a theoretical distribution to the data
  • Random variates generated from the probability distribution provide the simulated random values

Distribution Fitting
• Theoretical distribution
  • Fitting a theoretical distribution to sample data smooths artificial irregularities
  • Ensures that extreme values are included
  • Most simulation software provides utilities for fitting distributions to numerical data

Theoretical Distribution
• Uniform distribution (see p. 124)
  • X ~ U(a, b) with E[X] = (a+b)/2, Var[X] = (b-a)^2/12
  • Used as a “first” model for a quantity that is felt to vary randomly between a and b but about which little else is known

Theoretical Distribution
• Triangular distribution (see p. 124)
  • X ~ Triang(a, m, b) with E[X] = (a+m+b)/3, Var[X] = (a^2+m^2+b^2-am-ab-bm)/18
  • Used as a rough model; a good approximation in the absence of data

Theoretical Distribution
• Normal distribution (see p. 125)
  • X ~ N(μ, σ^2) with E[X] = μ, Var[X] = σ^2
  • Symmetric (bell-shaped curve)
  • Physical measurements: height, length, ...
  • Certain activity times

Theoretical Distribution
• Poisson distribution (p. 126)
  • X ~ Po(λ) with E[X] = λ, Var[X] = λ
  • Models the number of events that occur in an interval of time when the events occur at a constant rate
  • e.g. number of items in a batch of random size
  • e.g. number of items demanded from an inventory

Theoretical Distribution
• Exponential distribution (p. 126)
  • X ~ Exp(μ) with E[X] = μ, Var[X] = μ^2
  • Used frequently for interarrival times of “customers” to a system that occur at a constant rate, or for the time to failure of a piece of equipment
  • If occurrences follow Po(λ), i.e. happen at rate λ, the time between occurrences is Exp(1/λ)
  • Exp(μ) is memoryless (useful for events that occur independently of one another)
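
The link between the Poisson and exponential distributions can be checked with a small simulation. This sketch assumes a hypothetical arrival rate of 10 customers per hour; note that Python's `random.expovariate` takes the rate λ, so it generates interarrival times with mean 1/λ.

```python
import random
import statistics


def arrivals_per_interval(rate, n_intervals, rng):
    """Count arrivals in each unit interval when interarrival times are
    exponential with mean 1/rate."""
    counts = [0] * n_intervals
    t = rng.expovariate(rate)
    while t < n_intervals:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts


rng = random.Random(42)
rate = 10  # hypothetical: 10 customer arrivals per hour
counts = arrivals_per_interval(rate, n_intervals=2000, rng=rng)

# For a Poisson count, the mean and the variance should both be near the rate.
print(statistics.fmean(counts), statistics.pvariance(counts))
```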

Theoretical Distribution
• Gamma distribution
  • X ~ Gamma(α, β) with E[X] = αβ, Var[X] = αβ^2
  • Used for the time to complete some task, e.g. customer service or machine repair

Theoretical Distribution
• Beta distribution
  • X ~ Beta(α1, α2)
  • Used as a rough model in the absence of data
  • Distribution of a random proportion, e.g. the proportion of defective items in a shipment; time to complete a task in a PERT network

Theoretical Distribution
• Weibull distribution
  • X ~ Weibull(α, β)
  • Exp(β) = Weibull(1, β)
  • Used for the time to complete some task or the time to failure of a piece of equipment

Fitting Theoretical Distributions
• Stat::Fit does a reasonable job of fitting distributions to data and ranks the candidate distributions (p. 127)
• Trial-and-error process
• A goodness-of-fit test evaluates each fitted distribution to ascertain its relative goodness of fit

Fitting Theoretical Distributions
• Two common goodness-of-fit tests: the chi-square (χ^2) and Kolmogorov-Smirnov tests
• If few data are available, a goodness-of-fit test is unlikely to reject any candidate distribution
• It is a good idea to look at a graphical display such as a histogram before making a decision
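
As an illustration of the Kolmogorov-Smirnov test, the sketch below computes the statistic D, the largest gap between the empirical and fitted cumulative distributions. The function names and the simulated data are hypothetical. The large-sample 5% critical value 1.36/sqrt(n) is used as a rough guide only, since it strictly applies when the parameters are not estimated from the same data.

```python
import math
import random


def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def exponential_cdf(mean):
    """CDF of an exponential distribution with the given mean."""
    return lambda x: (1.0 - math.exp(-x / mean)) if x > 0 else 0.0


# Hypothetical data: 1000 simulated repair times with a true mean of 5.
rng = random.Random(1)
sample = [rng.expovariate(1 / 5.0) for _ in range(1000)]
fitted_mean = sum(sample) / len(sample)

d = ks_statistic(sample, exponential_cdf(fitted_mean))
critical = 1.36 / math.sqrt(len(sample))
print(f"D = {d:.4f}, approximate 5% critical value = {critical:.4f}")
```

With only a handful of observations the critical value becomes large, which is why the test rarely rejects any candidate when data are scarce.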

Data Absence
• Most likely or mean value
  • About 10 customer arrivals per hour
  • Approximately 20 minutes to assemble parts
  • Around five machine failures per day
• Minimum and maximum values
  • 1.5 to 3 minutes to inspect items
  • 5 to 10 customer arrivals per hour
  • 4 to 6 minutes to set up a machine
• Minimum, most likely, and maximum values can easily be set up as a triangular distribution
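
A minimal sketch of the triangular approach, using the 4 to 6 minute machine setup time from the list above and assuming, purely for illustration, a most likely value of 5 minutes. Note that Python's `random.triangular` takes its arguments in the order (low, high, mode).

```python
import random


def triangular_mean(a, m, b):
    """E[X] = (a + m + b) / 3 for Triang(a, m, b)."""
    return (a + m + b) / 3


def triangular_variance(a, m, b):
    """Var[X] = (a^2 + m^2 + b^2 - am - ab - bm) / 18."""
    return (a * a + m * m + b * b - a * m - a * b - b * m) / 18


def sample_triangular(a, m, b, n, rng):
    """Draw n values from Triang(a, m, b): min a, most likely m, max b."""
    return [rng.triangular(a, b, m) for _ in range(n)]


# Setup time: minimum 4, maximum 6 minutes; most likely 5 is an assumption.
low, likely, high = 4, 5, 6
setup_times = sample_triangular(low, likely, high, 20000, random.Random(7))

print(triangular_mean(low, likely, high))
print(sum(setup_times) / len(setup_times))
```

The sample mean should land close to the theoretical mean of 5 minutes, and every sampled value stays within the stated minimum and maximum.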

Summary
• Data should be collected systematically
• There are three types of data: structural, operational, and numerical
• A questionnaire is a good way to request information

Summary
• Numerical data for random variables should be analyzed to test for independence and homogeneity
• A theoretical distribution should be fit to the data whenever possible
• Data should be documented, reviewed, and approved
