Six Sigma Quality Resources for Finance & Financial Services, in association with DeLeeuw Associates, a division of CSI

Handling Non-Normal Data

    By Shree Phadnis

    Figure 1: Bell-Shaped Curve

    Most processes, particularly those involving life data and reliability, are not normally distributed. Most Six Sigma and process capability tools, however, assume normality. Only by verifying data normality and selecting an appropriate data analysis method can the results be accurate. This article discusses the non-normality issue and helps the reader understand some options for analyzing non-normal data.

    Introduction
    Some years ago, some statisticians believed that when a process was not normally distributed, there was something "wrong" with it, or even that it was "out of control." In their view, the purpose of the control chart was to detect when processes were non-normal so they could be "corrected" and returned to normality. Most statisticians and quality practitioners today would recognize that there is nothing inherently normal (pun intended) about the normal distribution; it is used mainly because it is simple and well defined, which makes it convenient to assume normality when the errors associated with that assumption would be minor. In fact, most quality improvement efforts lead to non-normal processes, since they try to narrow the distribution using process stops. Similarly, nature itself can impose stops on a process, such as a service process whose waiting time is physically bounded at the lower end by zero. Designing such a waiting process would move it as close to zero as economically possible, pulling the process mode, median and average toward zero. This process would tend toward non-normality, regardless of whether it is stable or unstable.

    Many processes do not follow the normal distribution. Examples of process characteristics that are often non-normal include:

    • Cycle time
    • Calls per hour
    • Customer waiting time
    • Straightness
    • Perpendicularity
    • Shrinkage
    • And many others

    To illustrate the concept, consider a data set of process cycle times (Table 1). The process is bounded below by zero; the lower specification limit is zero and the upper specification limit is 30 days. The process capability calculated from the Table 1 data is displayed in Figure 2.

    Table 1: Cycle Time Data (days)
    19 11 30 24 24 20 28 27 26 20
    17 53 32 33 39 26 20 48 21 34
    36 43 42 43 41 40 35 24 21 23
    22 20 25 39 26 53 19 13 27 28
    35 11 42 38 32 27 24 22 18 17
    17 15 15  9 51 26 25 13 47 37
    52 17  4  8  9 18 16  5 49 11
     8 31 14 31 24 19 41  2 25  1
     8  7 16 34 21  9 14 31 16 14
     5  2 10 42 21 15 21 11 15 22
     9 32 56 48 27 24 21 24 31 33
    31 15 40 27 24 22 14 13 13 14
    14 43 37 18 17 47 10 13 14 22
     8 54  8 25  8 19 18  9  3 32
    21 16  6 36 36  9 21  7 28 28
    20 17 25 15 21 10 11  6  4  8
    21 23 22  5  5 21 15 13 14  6
    12 34 15 14  7  6  9 14 23 18
     7 10 14 26 12 28 30 26 34 14
    25 17 13 18 19 21 27 27 23 13
    12  2  2 24 35 12 28
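As a cross-check on the reconstructed Table 1, the sketch below reads the values row by row (left to right, top to bottom; treating that as the collection order is an assumption) and recomputes the basic statistics. The count, mean and standard deviation should agree with the summary reported later in Table 5 (N = 207, mean 21.787, StDev 12.135). Only Python's standard library is used; the name `CYCLE_TIME` is illustrative.

```python
import statistics

# Table 1 cycle times in days, transcribed row by row (left to right,
# top to bottom); row order is assumed to be the collection order.
CYCLE_TIME = [
    19, 11, 30, 24, 24, 20, 28, 27, 26, 20,
    17, 53, 32, 33, 39, 26, 20, 48, 21, 34,
    36, 43, 42, 43, 41, 40, 35, 24, 21, 23,
    22, 20, 25, 39, 26, 53, 19, 13, 27, 28,
    35, 11, 42, 38, 32, 27, 24, 22, 18, 17,
    17, 15, 15, 9, 51, 26, 25, 13, 47, 37,
    52, 17, 4, 8, 9, 18, 16, 5, 49, 11,
    8, 31, 14, 31, 24, 19, 41, 2, 25, 1,
    8, 7, 16, 34, 21, 9, 14, 31, 16, 14,
    5, 2, 10, 42, 21, 15, 21, 11, 15, 22,
    9, 32, 56, 48, 27, 24, 21, 24, 31, 33,
    31, 15, 40, 27, 24, 22, 14, 13, 13, 14,
    14, 43, 37, 18, 17, 47, 10, 13, 14, 22,
    8, 54, 8, 25, 8, 19, 18, 9, 3, 32,
    21, 16, 6, 36, 36, 9, 21, 7, 28, 28,
    20, 17, 25, 15, 21, 10, 11, 6, 4, 8,
    21, 23, 22, 5, 5, 21, 15, 13, 14, 6,
    12, 34, 15, 14, 7, 6, 9, 14, 23, 18,
    7, 10, 14, 26, 12, 28, 30, 26, 34, 14,
    25, 17, 13, 18, 19, 21, 27, 27, 23, 13,
    12, 2, 2, 24, 35, 12, 28,
]

def summarize(data):
    """Basic descriptive statistics for a list of observations."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample (n - 1) standard deviation
        "median": statistics.median(data),
        "min": min(data),
        "max": max(data),
    }

for name, value in summarize(CYCLE_TIME).items():
    print(f"{name:>6}: {value}")
```

The median (21) sits slightly below the mean (about 21.79), a first hint of the right skew that the normality check discusses.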

    Figure 2: Process Capability of Non-Normal Data

    In Figure 2, the fitted normal curves for both within and overall performance extend below zero, and the analysis estimates the long-term PPM below zero at 36465.67.

    Does this make sense for a process that is bounded by zero? Using the normal distribution to calculate process capability actually penalizes this process, because it predicts data points below the lower specification limit (zero) when such values cannot occur.
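The normal-model tail calculation behind Figure 2 can be sketched with Python's `statistics.NormalDist`, plugging in the overall mean and standard deviation reported in Table 5. Capability software may estimate sigma slightly differently, so the numbers come out close to, but not necessarily identical with, the PPM values quoted in the text. The function name `normal_ppm` is a hypothetical helper, not part of any library.

```python
from statistics import NormalDist

def normal_ppm(mean, sd, lsl=None, usl=None):
    """Expected parts per million below the LSL and above the USL under a normal model.

    Pass lsl=None to ignore the lower tail, e.g. when the lower limit is a
    hard physical bound (a "bounded" normal model).
    """
    dist = NormalDist(mean, sd)
    below = dist.cdf(lsl) * 1e6 if lsl is not None else 0.0
    above = (1.0 - dist.cdf(usl)) * 1e6 if usl is not None else 0.0
    return below, above

# Overall mean and standard deviation of the Table 1 data (Table 5);
# specification limits: LSL = 0 days, USL = 30 days.
below, above = normal_ppm(21.787, 12.135, lsl=0, usl=30)
print(f"PPM < LSL: {below:.0f}   PPM > USL: {above:.0f}   total: {below + above:.0f}")

# Hard limit at zero: the impossible lower tail is simply dropped.
_, above_bounded = normal_ppm(21.787, 12.135, usl=30)
```

The lower-tail term is the penalty described above: it counts cycle times below zero that can never be observed. Dropping it corresponds to the hard-limit approach discussed for Figure 4.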

    The first step in data analysis should therefore be to verify whether the data is normal. If it is not, other analysis methods must be employed to handle and understand the non-normal data.

    For the above data, basic descriptive statistics show whether the data is normal. Figure 3 indicates that it is not: the normality test's p-value of zero and the shape of the histogram both confirm non-normality. The fact that the process is bounded by zero is another important point to consider.

    Figure 3: Descriptive Statistics of Non-Normal Data
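The article does not say which normality test produced the p-value in Figure 3. As one common choice, the sketch below implements the Anderson-Darling test for normality with the mean and standard deviation estimated from the data, using the D'Agostino and Stephens (1986) approximation for the p-value. It is an illustration under that assumption, not necessarily the procedure behind Figure 3, and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def anderson_darling_normal(data):
    """Anderson-Darling normality test with mean and sigma estimated from the data.

    Returns (A2, adjusted A2, approximate p-value); the p-value uses the
    D'Agostino & Stephens (1986) approximation for this estimated-parameter case.
    """
    x = sorted(data)
    n = len(x)
    dist = NormalDist(fmean(x), stdev(x))
    cdf = [dist.cdf(v) for v in x]
    # A2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    a2s = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a2s >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2s + 0.0186 * a2s ** 2)
    elif a2s >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2s - 1.38 * a2s ** 2)
    elif a2s >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2s - 59.938 * a2s ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2s - 223.73 * a2s ** 2)
    return a2, a2s, p

# Hypothetical check: evenly spaced normal scores versus exponential scores.
normal_scores = [NormalDist().inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
exponential_scores = [-math.log(1 - (i - 0.5) / 100) for i in range(1, 101)]
print("normal-like p:", round(anderson_darling_normal(normal_scores)[2], 3))
print("exponential-like p:", round(anderson_darling_normal(exponential_scores)[2], 4))
```

A small p-value (below the chosen alpha, commonly 0.05) is evidence against normality; a large one only means the test found no strong evidence of departure.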

    The most common methods for handling non-normal data are:

    • Subgroup averaging
    • Segmenting data
    • Transforming data
    • Using different distributions
    • Non-parametric statistics

    Subgroup Averaging

  • Averaging subgroups (recommended size greater than 4) usually produces approximately normally distributed averages
  • This is often done with control charts
  • Relies on the central limit theorem
  • The more skewed the data, the larger the subgroups need to be
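A quick simulation illustrates the central-limit effect these bullets rely on. The exponential "cycle times" are hypothetical, chosen only because they are strongly right-skewed; for exponential data the skewness of averages of size n is expected to fall roughly as 2/√n, so subgroups of 5 should cut it from about 2 to under 1.

```python
import random
import statistics

def skewness(values):
    """Moment-based sample skewness: the mean of cubed z-scores."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def subgroup_means(values, size):
    """Averages of consecutive, non-overlapping subgroups (a partial last subgroup is dropped)."""
    usable = len(values) - len(values) % size
    return [statistics.fmean(values[i:i + size]) for i in range(0, usable, size)]

rng = random.Random(2008)  # fixed seed so the illustration is repeatable
# Hypothetical, strongly right-skewed "cycle times" with a mean of 20 days.
individuals = [rng.expovariate(1 / 20) for _ in range(5000)]
means = subgroup_means(individuals, 5)
print(f"skewness of individuals:    {skewness(individuals):.2f}")
print(f"skewness of subgroup means: {skewness(means):.2f}")
```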

    Segmenting Data

  • Data sets can often be segmented into smaller groups by stratification of data
  • These groups can then be examined for normality
  • Once segmented, non-normal data sets often become groups of normal data sets
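Stratification is mostly a bookkeeping step: split the observations by a candidate factor and examine each group on its own, for instance with a normality test. The `channel` field and the values below are hypothetical, constructed so that the pooled data are bimodal while each segment is compact and symmetric.

```python
from collections import defaultdict
import statistics

def segment(records, key):
    """Split records by a stratification key; return {label: (n, mean, stdev)}."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["value"])
    return {
        label: (len(values), statistics.fmean(values), statistics.stdev(values))
        for label, values in groups.items()
    }

# Hypothetical cycle times from two channels: pooled, the data are bimodal,
# but each channel on its own is compact and roughly symmetric.
records = (
    [{"channel": "online", "value": v} for v in (4, 5, 5, 6, 6, 6, 7, 7, 8)]
    + [{"channel": "branch", "value": v} for v in (24, 25, 25, 26, 26, 26, 27, 27, 28)]
)
summary = segment(records, "channel")
for label, (n, mean, sd) in summary.items():
    print(f"{label:>6}: n={n}  mean={mean:.1f}  sd={sd:.2f}")
```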

    Transforming Data

  • Box-Cox transformations of data
  • Logit transformation for Yes or No data
  • Truncating distributions for hard limits (like the data set presented here)
  • Application of meaningful transformations
  • Transformations do not always work

    Using Different Distributions

  • Weibull Distributions
  • Log Normal
  • Exponential
  • Extreme value
  • Logistic
  • Log logistic

    Non-Parametric Statistics

  • Used for statistical tests when data is not normal
  • Tests using medians rather than means
  • Most often used when sample sizes of groups being compared are less than 100, but just as valid for larger sample sizes
  • For larger sample sizes, the central limit theorem often allows you to use regular comparison tests

    When performing statistical tests on data, it is important to realize that many statistical tests assume normality. If you have non-normal data, the non-parametric equivalents of those tests should be employed. Table 2 summarizes the statistical tests to use with normal process data, along with their non-parametric equivalents.

    Table 2: Common Statistical Tests for Normal and Non-Normal Data
    Assumes Normality                   No Normality Assumption Required
    One-sample Z test                   One-sample sign test
    One-sample t test                   One-sample Wilcoxon test
    Two-sample t test                   Mann-Whitney test
    One-way ANOVA                       Kruskal-Wallis test; Mood's median test
    Randomized block (two-way ANOVA)    Friedman test
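As an example from the right-hand column of Table 2, the one-sample sign test needs nothing but counting: under the null hypothesis each observation is equally likely to fall above or below the hypothesized median, so the count above it is binomial with p = 0.5. The sketch computes an exact two-sided p-value by doubling the smaller tail, which is one common convention; the function name is illustrative.

```python
from math import comb

def sign_test(data, median0):
    """Exact two-sided one-sample sign test of H0: population median == median0.

    Observations equal to median0 carry no sign information and are dropped.
    Returns (number above median0, number used, p-value).
    """
    above = sum(1 for x in data if x > median0)
    below = sum(1 for x in data if x < median0)
    n = above + below
    k = min(above, below)
    # Under H0 the count above is Binomial(n, 0.5); double the smaller tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, n, min(1.0, 2 * tail)

# Hypothetical example: all ten observations exceed the hypothesized median.
print(sign_test([21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 20))
```

For the Table 1 data and a hypothesized median of 20 days, 105 observations lie above 20 and 97 below; the five values equal to 20 are dropped.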

    If we look back at our Table 1 data set where zero was the hard limit, we can illustrate what tests might be employed when dealing with non-normal data.

    Setting A Hard Limit
    If we set a hard limit at zero and re-run the process capability analysis, the results are presented in Figure 4.

    Figure 4: Process Capability for Cycle Time

    Figure 4 now indicates that the long-term PPM is 249535.66, as opposed to 286011.34 in Figure 2. This illustrates that the estimate can be made more accurate by first understanding whether the distribution is normal or non-normal.

    Weibull Distribution
    Taking this analysis a step further, we can determine which non-normal distribution fits the data best. Figure 5 shows probability plots of the data for four candidate distributions; of these, the Weibull distribution is the best fit.

    Figure 5: Four-Way Probability Plot for Cycle Time

    Knowing that the Weibull distribution is a good fit for the data, we can then recalculate the process capability. Figure 6 shows that a Weibull model with the lower bound at zero would produce a PPM of 233244.81. This estimate should be more accurate than the earlier bounded-normal estimate, since the Weibull distribution fits the data better.

    Figure 6: Process Capability Analysis for Cycle Time
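Assuming a two-parameter Weibull with its lower bound (threshold) fixed at zero, as in Figure 6, a maximum-likelihood fit and the tail calculation above the USL can be sketched as follows. This is a generic bisection solver written for illustration, not the routine that produced Figure 6, so applying it to the Table 1 values may give a PPM that differs somewhat from the one quoted above. The example sample is hypothetical (evenly spaced quantiles of a Weibull with shape 2 and scale 25).

```python
import math

def fit_weibull(data, lo=0.05, hi=50.0, tol=1e-10):
    """Two-parameter Weibull maximum-likelihood fit (threshold fixed at zero).

    The shape k solves  sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0,
    whose left side increases with k, so bisection on [lo, hi] finds it;
    the scale is then (mean of x^k) ** (1/k). All values must be positive.
    """
    x = [float(v) for v in data]
    if min(x) <= 0:
        raise ValueError("Weibull fit requires strictly positive data")
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(x)
    x_max = max(x)

    def shape_equation(k):
        # Dividing by x_max before raising to k avoids overflow; the ratio is unchanged.
        w = [(v / x_max) ** k for v in x]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shape_equation(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = (sum(v ** k for v in x) / len(x)) ** (1.0 / k)
    return k, scale

def weibull_ppm_above(usl, shape, scale):
    """Expected parts per million above the USL: 1e6 * exp(-(usl / scale) ** shape)."""
    return 1e6 * math.exp(-((usl / scale) ** shape))

# Hypothetical check on evenly spaced quantiles of a Weibull(shape=2, scale=25).
sample = [25.0 * (-math.log(1 - (i - 0.5) / 500)) ** 0.5 for i in range(1, 501)]
shape, scale = fit_weibull(sample)
print(f"shape = {shape:.3f}, scale = {scale:.3f}, "
      f"PPM > 30: {weibull_ppm_above(30, shape, scale):.0f}")
```

Because the Weibull density is zero below its threshold, no PPM is assigned to negative cycle times, which is the advantage over the plain normal model.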

    Box-Cox Transformation
    The other method that we discussed earlier was transformation of data. The Box-Cox transformation can be used to convert the data to an approximately normal distribution, which then allows the process capability to be determined easily. Figure 7 indicates that a lambda of 0.5 is most appropriate; this lambda corresponds to taking the square root of the data.

    Figure 7: Box-Cox Plot for Cycle Time
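The λ search summarized in Figure 7 can be sketched as a grid search over the standard Box-Cox profile log-likelihood. The sketch uses the (x^λ − 1)/λ form; software that reports the transform simply as x^λ differs only by a linear rescaling, so normality is unaffected, and λ = 0.5 gives 2(√x − 1), a linear function of the square root. The grid range and step are assumptions, and the example data are hypothetical (values whose square roots are evenly spaced normal scores, so the selected λ should land near 0.5).

```python
import math
from statistics import NormalDist

def box_cox(data, lam):
    """Box-Cox transform of positive data: (x**lam - 1) / lam, or ln(x) when lam == 0."""
    if lam == 0:
        return [math.log(x) for x in data]
    return [(x ** lam - 1.0) / lam for x in data]

def box_cox_loglik(data, lam):
    """Profile log-likelihood of lam (up to an additive constant), assuming the
    transformed data are normal with maximum-likelihood mean and variance."""
    y = box_cox(data, lam)
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in data)

def best_lambda(data, lo=-2.0, hi=2.0, step=0.01):
    """Grid search for the lam that maximizes the profile log-likelihood."""
    steps = round((hi - lo) / step)
    grid = [round(lo + i * step, 10) for i in range(steps + 1)]
    return max(grid, key=lambda lam: box_cox_loglik(data, lam))

# Hypothetical check: the square roots of these values are evenly spaced normal scores.
roots = [10 + 2.5 * NormalDist().inv_cdf((i - 0.5) / 200) for i in range(1, 201)]
example = [r * r for r in roots]
lam_hat = best_lambda(example)
print(f"selected lambda: {lam_hat:.2f}")
```

When capability is computed on the transformed scale, the specification limits must be transformed with the same λ as the data, e.g. `box_cox([30], 0.5)` for the 30-day USL.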

    After this data transformation, the process capability is presented in Figure 8. The transformed data give an estimated PPM of 227113.29, which is very close to the estimate provided by the Weibull model.

    Figure 8: Process Capability Analysis for Cycle Time

    We have now seen three methods for estimating process capability when the data comes from a non-normal source: setting a hard limit on a normal distribution, fitting a Weibull distribution, and applying the Box-Cox transformation.

    Subgroup Averaging
    Now let us assume that the data was collected in time sequence with a subgroup size of one. Because there are no subgroups, the X-bar R chart cannot be used. Had the data been collected in subgroups, the central limit theorem would have come in handy: the subgroup averages would have tended toward normality. Table 3 displays the results of the Individuals and Moving Range (I-MR) chart, which is the more appropriate chart in this case.

    Table 3: I/MR for Cycle Time
    Test Results for I Chart
    TEST 1. One point more than 3.00 sigmas from center line.
    Test Failed at points: 12 36 55 61 103 132

    TEST 2. 9 points in a row on same side of center line.
    Test Failed at points: 28

    TEST 3. 6 points in a row all increasing or all decreasing.
    Test Failed at points: 49 50

    TEST 5. 2 out of 3 points more than 2 sigmas from center line (on one side of CL).
    Test Failed at points: 23 24 25 61 80 104 203

    TEST 6. 4 out of 5 points more than 1 sigma from center line (on one side of CL).
    Test Failed at points: 15 22 23 24 25 26 27 45 82 159 160

    TEST 8. 8 points in a row more than 1 sigma from center line (above and below CL).
    Test Failed at points: 27

    Test Results for MR Chart
    TEST 1. One point more than 3.00 sigmas from center line.
    Test Failed at points: 12 55 62 69 70 78 127 132 133

    TEST 2. 9 points in a row on same side of center line.
    Test Failed at points: 52 53 54 200 201 202 203

    Figure 9: I and MR Chart for Cycle Time

    Figure 9 indicates that the process is plagued by special causes. If we focus only on the points beyond the three-sigma limits on the I chart, the following data points are flagged as special causes:

    TEST 1. One point more than 3.00 sigmas from center line.
    Test Failed at points: 12 36 55 61 103 132
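The Test 1 results can be reproduced with a short sketch, assuming the Table 1 values were collected in row order (left to right, top to bottom) and using the usual individuals-chart constants d2 = 1.128 and D4 = 3.267 for moving ranges of two. Only Test 1 is implemented; the run rules (Tests 2 to 8) are omitted. On the Table 1 data these limits flag points 12, 36, 55, 61, 103 and 132 on the I chart and points 12, 55, 62, 69, 70, 78, 127, 132 and 133 on the MR chart, matching Table 3.

```python
# Table 1 cycle times, in the assumed time order (row by row).
CYCLE_TIME = [
    19, 11, 30, 24, 24, 20, 28, 27, 26, 20,
    17, 53, 32, 33, 39, 26, 20, 48, 21, 34,
    36, 43, 42, 43, 41, 40, 35, 24, 21, 23,
    22, 20, 25, 39, 26, 53, 19, 13, 27, 28,
    35, 11, 42, 38, 32, 27, 24, 22, 18, 17,
    17, 15, 15, 9, 51, 26, 25, 13, 47, 37,
    52, 17, 4, 8, 9, 18, 16, 5, 49, 11,
    8, 31, 14, 31, 24, 19, 41, 2, 25, 1,
    8, 7, 16, 34, 21, 9, 14, 31, 16, 14,
    5, 2, 10, 42, 21, 15, 21, 11, 15, 22,
    9, 32, 56, 48, 27, 24, 21, 24, 31, 33,
    31, 15, 40, 27, 24, 22, 14, 13, 13, 14,
    14, 43, 37, 18, 17, 47, 10, 13, 14, 22,
    8, 54, 8, 25, 8, 19, 18, 9, 3, 32,
    21, 16, 6, 36, 36, 9, 21, 7, 28, 28,
    20, 17, 25, 15, 21, 10, 11, 6, 4, 8,
    21, 23, 22, 5, 5, 21, 15, 13, 14, 6,
    12, 34, 15, 14, 7, 6, 9, 14, 23, 18,
    7, 10, 14, 26, 12, 28, 30, 26, 34, 14,
    25, 17, 13, 18, 19, 21, 27, 27, 23, 13,
    12, 2, 2, 24, 35, 12, 28,
]

D2 = 1.128  # d2 constant for moving ranges of span 2 (sigma = MR-bar / d2)
D4 = 3.267  # D4 constant: upper limit of the moving-range chart = D4 * MR-bar

def imr_test1(data):
    """Points beyond the 3-sigma limits (Test 1) on the I and MR charts, as 1-based indices."""
    center = sum(data) / len(data)
    mr = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(mr) / len(mr)
    sigma = mr_bar / D2
    ucl_i, lcl_i = center + 3 * sigma, center - 3 * sigma
    ucl_mr = D4 * mr_bar  # the MR chart's lower limit is zero
    i_fail = [i for i, x in enumerate(data, start=1) if x > ucl_i or x < lcl_i]
    # mr[j] spans points j+1 and j+2 (1-based), so it is plotted at point j+2.
    mr_fail = [j + 2 for j, r in enumerate(mr) if r > ucl_mr]
    return i_fail, mr_fail

i_fail, mr_fail = imr_test1(CYCLE_TIME)
print("I chart, Test 1 failed at points:", *i_fail)
print("MR chart, Test 1 failed at points:", *mr_fail)
```

Passing `[math.sqrt(x) for x in CYCLE_TIME]` instead would mirror the λ = 0.5 transformation used for Figure 10; its output can be compared with Table 4.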

    The primary assumption behind the Figure 9 control chart is that the data is normal. If we plot the I-MR chart after applying the Box-Cox transformation described above, the chart looks much different (see Figure 10).

    Figure 10: I and MR Chart for Cycle Time

    Table 4: I/MR for Cycle Time (Box-Cox Transformed Data)
    Test Results for I Chart
    TEST 1. One point more than 3.00 sigmas from center line.
    Test Failed at points: 80

    TEST 2. 9 points in a row on same side of center line.
    Test Failed at points: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 110 111

    TEST 3. 6 points in a row all increasing or all decreasing.
    Test Failed at points: 49 50

    TEST 5. 2 out of 3 points more than 2 sigmas from center line (on one side of CL).
    Test Failed at points: 61 80 92 104 165 203

    TEST 6. 4 out of 5 points more than 1 sigma from center line (on one side of CL).
    Test Failed at points: 15 22 23 24 25 26 27 45 82 113 159 160

    TEST 8. 8 points in a row more than 1 sigma from center line (above and below CL).
    Test Failed at points: 27

    Test Results for MR Chart
    TEST 1. One point more than 3.00 sigmas from center line.
    Test Failed at points: 55 69 78 80 132 133 140

    TEST 2. 9 points in a row on same side of center line.
    Test Failed at points: 29 30 31 32 33 52 53 54 200 201

    If we study the points outside the three-sigma limits on the I chart, we note that the test now fails at only one point, number 80, and not at points 12 36 55 61 103 132 as indicated by the Figure 9 chart earlier. In fact, studying the other failure points reveals the serious consequences of assuming normality: doing so might cause you to react to common causes as if they were special causes, which would lead to tampering with the process. Note that for the X-bar R chart, non-normality is not a serious problem because of the central limit theorem, but for the Individuals and Moving Range chart the consequences can be serious.

    Now let's consider a case where we would like to test whether the mean cycle time of this process is 20 days. If we assume normality and run a one-sample t test of this hypothesis, we obtain the results in Table 5.

    Table 5: One-Sample T: Cycle Time
    Test of mu = 20 vs mu not = 20

    Variable       N    Mean   StDev  SE Mean
    turn around  207  21.787  12.135    0.843

    Variable     95.0% CI               T      P
    turn around  (20.125, 23.450)    2.12  0.035
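The SE Mean and T values in Table 5 follow directly from the summary statistics: SE = s/√n and T = (x̄ − 20)/SE. Python's standard library has no t distribution, so the sketch substitutes a normal approximation for the p-value; that is an assumption that is reasonable here only because n is large (an exact value would need a t-distribution CDF such as the one in `scipy.stats.t`).

```python
import math
from statistics import NormalDist

def one_sample_t(mean, sd, n, mu0):
    """Standard error of the mean and t statistic for H0: mu == mu0."""
    se = sd / math.sqrt(n)
    return se, (mean - mu0) / se

# Summary statistics from Table 5.
se, t = one_sample_t(mean=21.787, sd=12.135, n=207, mu0=20)
# Assumption: with 206 degrees of freedom the t distribution is close to the
# standard normal, so the normal tail area approximates the exact p-value.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"SE Mean = {se:.3f}, T = {t:.2f}, approximate P = {p_approx:.3f}")
```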

    Based on these statistics, one would conclude at an alpha risk of 5% that the mean of the data set is different from 20. Had we first verified that the data is not normal, we would have run the one-sample Wilcoxon signed-rank test, which is based on the median rather than the mean, and obtained the results in Table 6.

    Table 6: Wilcoxon Signed Rank Test: Cycle Time
    Test of median = 20.00 versus median not = 20.00

                    N for   Wilcoxon            Estimated
    Variable    N    Test  Statistic      P        Median
    turn aro  207     202    11163.5  0.273         21.00

    The Wilcoxon test (p = 0.273) indicates that the null hypothesis (median equal to 20) cannot be rejected: there is no statistical evidence that the median is different from 20.
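The Wilcoxon result in Table 6 can be reproduced with a standard-library sketch. It drops observations equal to the hypothesized median, gives tied absolute differences their average rank, and uses a large-sample normal approximation with the usual tie correction to the variance; that is one common convention, and other software may treat ties or continuity slightly differently. On the Table 1 data with a hypothesized median of 20 it gives 202 observations used and W+ = 11163.5, with a p-value of about 0.273, in line with Table 6.

```python
import math
from statistics import NormalDist

# Table 1 cycle times in days (row by row).
CYCLE_TIME = [
    19, 11, 30, 24, 24, 20, 28, 27, 26, 20, 17, 53, 32, 33, 39, 26, 20, 48, 21, 34,
    36, 43, 42, 43, 41, 40, 35, 24, 21, 23, 22, 20, 25, 39, 26, 53, 19, 13, 27, 28,
    35, 11, 42, 38, 32, 27, 24, 22, 18, 17, 17, 15, 15, 9, 51, 26, 25, 13, 47, 37,
    52, 17, 4, 8, 9, 18, 16, 5, 49, 11, 8, 31, 14, 31, 24, 19, 41, 2, 25, 1,
    8, 7, 16, 34, 21, 9, 14, 31, 16, 14, 5, 2, 10, 42, 21, 15, 21, 11, 15, 22,
    9, 32, 56, 48, 27, 24, 21, 24, 31, 33, 31, 15, 40, 27, 24, 22, 14, 13, 13, 14,
    14, 43, 37, 18, 17, 47, 10, 13, 14, 22, 8, 54, 8, 25, 8, 19, 18, 9, 3, 32,
    21, 16, 6, 36, 36, 9, 21, 7, 28, 28, 20, 17, 25, 15, 21, 10, 11, 6, 4, 8,
    21, 23, 22, 5, 5, 21, 15, 13, 14, 6, 12, 34, 15, 14, 7, 6, 9, 14, 23, 18,
    7, 10, 14, 26, 12, 28, 30, 26, 34, 14, 25, 17, 13, 18, 19, 21, 27, 27, 23, 13,
    12, 2, 2, 24, 35, 12, 28,
]

def wilcoxon_signed_rank(data, median0):
    """Two-sided one-sample Wilcoxon signed-rank test (large-sample normal approximation).

    Returns (n used, W+ = sum of ranks of positive differences, p-value).
    """
    d = [x - median0 for x in data if x != median0]  # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        t = j - i + 1
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected variance
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return n, w_plus, p

n_used, w_plus, p = wilcoxon_signed_rank(CYCLE_TIME, 20)
print(f"N for test = {n_used}, Wilcoxon statistic = {w_plus}, P = {p:.3f}")
```

Strictly, the signed-rank test assumes the distribution is symmetric about its median; when that is doubtful, the sign test from Table 2 makes fewer assumptions.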

    This example illustrates that it is dangerous to assume the data is normal before applying statistical tests. A better strategy is to verify the normality assumption first and then, based on the results, choose an appropriate data analysis method.

    About The Author
    Shree Phadnis is a Master Black Belt at KPMG India. Mr. Phadnis is an ASQ Certified Quality Manager and ASQ Certified Quality Engineer. Mr. Phadnis can be reached at shreephadnis@usa.net.

     
    Copyright © 2000-2008 iSixSigma – All Rights Reserved
    Reproduction Without Permission Is Strictly Prohibited

