Hello and welcome to the third section of the Lean Six Sigma Black Belt Training program offered by Simplilearn. This section, titled Measure, is divided into ten lessons. Before we start with this section, let us understand the agenda in the next slide.

In this section we will cover the details of the Measure phase. The Measure phase is the most important phase in any Six Sigma project, as it sets the foundation for all the calculations. If we cannot get the right measurements, we may not be solving the right problem and can head down a totally incorrect path. We will begin by discussing the agenda of this section. This section is divided into ten lessons. In lesson 1, we will understand some basic considerations before we start the Measure phase, and the tools associated with it. In lesson 2, we will discuss the different types of data and measurement scales. Then in lesson 3, we will highlight information about the measures of central tendency and dispersion. Next, in lesson 4, we will discuss Measurement System Analysis. In this lesson, we will cover variable Gage RR and attribute RR. This will help us validate whether the measurement system is accurate and capable of doing the right measurements. In lesson 5, stability conditions will be discussed, explaining why stability is so important for a process. Then in lesson 6, we will understand capability metrics in detail. Further, in lesson 7, we will discuss variations, variability, capability, and other process conditions. In lesson 8, we will discuss the different data distributions, which include the normal, binomial, Poisson, and exponential distributions. Next, in lesson 9, we will discuss sigma shift, mean shift, and reducing variations. And lastly, in lesson 10, we will highlight information about the baseline data. Let us move on to the next slide, where lesson 1 will be introduced.

Now we will look into the lesson named "Pre-Measure Considerations and Tools". In this lesson, we will discuss the prerequisites for the Measure phase, which include exiting from the earlier Define phase, building the Cause and Effect Matrix, etc.

In this lesson we will understand the Define phase toll gate review, followed by DFMEA, and finally, the cause and effect matrix.

Post completion of the Define phase, a toll gate review is conducted with the Six Sigma team and management. The toll gate review is also referred to as a phase gate review. The Sponsor presides over the toll gate review meeting. The Black Belt conducts the review meeting and presents the project in front of everybody, along with the Green Belt. The problem statement and the key project area opportunity are discussed. The project deadline and the possible allocation of funds for the project are finalized by the Sponsor. Understanding the extent of funding is important for the Black Belt as well. The Six Sigma team should prepare additional tools like the DFMEA (Design Failure Mode and Effects Analysis) and the project storyboard before the toll gate review meeting and before stepping into the Measure phase. Let us move on to the next slide, where we will understand DFMEA.

DFMEA stands for Design Failure Mode and Effects Analysis and is also known as design FMEA. Other types of FMEA include process FMEA and system FMEA. At the beginning of the Measure phase, the Six Sigma team must update only the key process steps in the FMEA document. These process steps then feed into the key process output variables (also known as KPOVs) that eventually drive the project forward. The DFMEA template is ideally updated after the completion of the Cause and Effect Matrix, which will be covered as the next tool. The DFMEA template is provided as part of the toolkit, in a separate Excel workbook titled Design FMEA Template. Also note that DFMEA (pronounced as D-F-M-E-A) is not typically used in the DMAIC (pronounced as "d-mack") approach. This tool is used more in the DFSS (pronounced as D-F-S-S) approach. In the next slide, we will understand the DFMEA tool.

Here we will discuss how to use the DFMEA (pronounced as D-F-M-E-A) tool. The DFMEA tool contains three main worksheets: the first is Prepare FMEA, the second is the FMEA itself, and the third is Data Drawings and Pictures. First, use the Prepare FMEA worksheet, which provides a checklist of all the items necessary to do an FMEA. Second, use the FMEA worksheet, which serves as the basic DFMEA template. The third and last worksheet, Data Drawings and Pictures, helps in providing supplementary drawings for the FMEA template. In the next slide, we will look into a snapshot of the Prepare FMEA worksheet.

This slide provides a snapshot of the Prepare FMEA worksheet. Here, the background information is selected and the necessary information is provided. Let us discuss the snapshot further in the next slide.

In this slide, we can see the details added for each part: its failure mode, occurrence, severity, and detectability. Ratings are provided for each, and once an action is identified, the updated ratings are captured.

In this slide, a step-by-step process is explained for updating the DFMEA template. We begin with the identification of the function, where one needs to update the key process step. The next step is to identify the failure mode for each line item and update the point, or key issue, that could happen. After this, identify the causes of failure, and update what caused this key issue to happen. Then, understand the customer effects: here, update the key issues caused to the customer due to the failure mode. In the ratings columns, we begin with the first rating; here, provide severity, occurrence, and detection ratings as per the guidelines. Then, update the recommended action; add the recommended action that is identified in the Improve phase. After that, document the planned action; add the planned action after the first-stage validation of the Improve phase pilot run. And finally, provide the second rating; here, provide the revised ratings after the Control phase deliverables are done.
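Although the slide does not name it, FMEA practitioners commonly combine the three ratings into a Risk Priority Number, RPN = severity × occurrence × detection, to rank failure modes. A minimal sketch in Python, with hypothetical process steps and ratings (1 to 10 scales assumed):

```python
# Hypothetical failure modes with severity, occurrence, and detection ratings (1-10).
failure_modes = [
    {"step": "Pack item", "severity": 8, "occurrence": 4, "detection": 3},
    {"step": "Label box", "severity": 5, "occurrence": 6, "detection": 2},
]

def rpn(fm):
    """Risk Priority Number: severity x occurrence x detection."""
    return fm["severity"] * fm["occurrence"] * fm["detection"]

# Rank failure modes by RPN, highest risk first.
ranked = sorted(failure_modes, key=rpn, reverse=True)
for fm in ranked:
    print(fm["step"], rpn(fm))
```

Here "Pack item" scores 8 × 4 × 3 = 96 and would be addressed before "Label box" at 60; the second rating pass after the Control phase should show these numbers falling.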

In the next few slides we will discuss the CE Matrix, also known as the Cause and Effect Matrix. The CE Matrix is a prioritization tool that ranks the causes, or input variables, based on the relationship of each input variable with the CTQs of the process. To understand it better, look into the CE Matrix template provided as part of the toolkit in the worksheet titled CE Matrix. The CTQs are placed along the top of the matrix. The input variables are placed along the vertical side of the matrix. The correlation scores for the input variables and the CTQs are provided in the matrix. The highest overall scores are then prioritized using Pareto charts. Let us proceed to the next slide, where we will continue discussing the Cause and Effect Matrix (CE Matrix).

In this slide, we will cover the details of how to update the CE Matrix. First, list the CTQs across the top of the matrix. Then, rank and assign a score to each CTQ according to its importance to the customer. Scores are provided on a scale of 1 to 9, and the CTQ with the least importance is given a score of 1. After this, list all input variables on the left side of the matrix. For each CTQ, determine the correlation scores in the matrix: 1 is given for a weak correlation, 3 for a moderate correlation, and 9 for a strong correlation. After that, prioritize the input variables as per their overall scores. Let us look into a snapshot of the CE Matrix in the next slide.
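The scoring arithmetic described above can be sketched in a few lines of Python. The CTQ weights and correlation scores below are hypothetical, chosen only to illustrate the calculation:

```python
# CTQ importance scores (scale of 1-9, per the slide) - hypothetical values.
ctq_weights = {"delivery hours": 9, "packaging defects": 7, "delivery cost": 5}

# Correlation of each input variable with each CTQ: 1 = weak, 3 = moderate, 9 = strong.
correlations = {
    "delivery allocation time": {"delivery hours": 9, "packaging defects": 1, "delivery cost": 3},
    "number of packaging resources": {"delivery hours": 3, "packaging defects": 9, "delivery cost": 9},
}

# Total score per input = sum of (CTQ weight x correlation); the highest totals
# are the inputs to prioritize on the Pareto chart.
totals = {
    inp: sum(ctq_weights[ctq] * score for ctq, score in scores.items())
    for inp, scores in correlations.items()
}
print(totals)
```

With these made-up scores, "number of packaging resources" totals 135 against 103 for "delivery allocation time", so it would rank first on the Pareto chart.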

By looking at the snapshot given on the slide, we can understand what a CE Matrix looks like. The first column is used to list the process inputs. Next, list all process output variables in their respective columns. Then, provide all the correlation numbers in the cells corresponding to each output and input variable. The last column is the total column, which computes the total for each input parameter.

This slide provides a CE Matrix template. The first column is used to list the process inputs, the KPIVs (Key Process Input Variables). In the next columns, fill in the KPOVs (Key Process Output Variables). Then, provide all the correlation numbers in the cells corresponding to each output and input.

With the help of an example, let us figure out how a CE Matrix is updated. A supply chain company works on the CTQs delivery hours, packaging defects, and delivery cost. These would be the KPOVs (Key Process Output Variables). The input variables that form an integral part of the process are the delivery allocation time, the delivery units, the number of resources for packaging, and the experience of those resources in packaging.

In this slide, we can see the updated matrix. You can update the matrix as mentioned in the previous slide. A sample matrix is updated here in the toolkit that can be referred to as a template. Let us now move on to the next slide, where we will look into the summary of this lesson.

In this lesson we started off with the Define phase toll gate review, and then we looked into Design Failure Mode and Effects Analysis. Finally, we understood the Cause and Effect Matrix. With this, we move on to the next lesson in the following slide.

After completion of all the pre-requisites of the Measure phase, we will now move on to do the measurements. In this lesson, we will cover in detail the different types of data and measurement scales.

In this lesson we will understand the objectives of the Measure phase. Then we will understand the process, flowcharts, and the SIPOC map. Later, we will understand metrics and measurement scales. Finally, we will look into the different types of data.

The first topic we are going to discuss in this lesson is the objectives of Measure. In this slide, we will cover all the objectives of the Measure phase. The first is process definition: we need to ensure that the specific process under investigation is clearly defined, to remove any confusion or misunderstanding. The second objective is metric definition: here we define a reliable means of measuring the process relative to the project deliverables. After this, the important step is to establish the project baseline: we establish the current operating metrics performance, document it, and then use it for future reference. And finally, the most important objective is to evaluate the measurement system: here, we need to validate the reliability of the data to ensure meaningful conclusions are drawn. In the next slide, we will understand what a process is.

So here is the question: what really is a process? To understand a process better, we should know that a process is a series of repeatable tasks carried out in a specific order. If a process cannot be defined by the people who are using it, it probably means multiple processes exist or a well-defined process doesn't exist. These are signals that need to be understood. Also note that a process is one of the prerequisites for doing a Six Sigma project: a well-defined process must exist as a basic precondition for the Six Sigma efforts to succeed. A process can also be described using flowcharts, SIPOC maps, and process maps. Let us discuss flowcharts in the next slide.

A flowchart is probably one of the simplest ways to represent a process. A flowchart is a basic graphical representation of how the current process works as of today. It is important to note that such flowcharts are called As-Is flowcharts, and they are often drawn in the Measure stage. A flowchart is a pictorial way of presenting a process, its logic flow, decision points, external and internal interfaces, etc. The figure in this slide shows the symbols used in a flowchart. These are ANSI's (or American National Standards Institute's) standard symbols. Let us look into these symbols and their meanings as shown in the figure. As we can see in the image, the oval box represents the start or end of a process. Next is the flow line, which looks like an arrow and denotes the direction of logic flow. The parallelogram denotes either an input operation (for example, INPUT) or an output operation (for example, PRINT). The rectangle box represents a process step to be carried out (for example, an addition). Lastly, the diamond symbol denotes a decision (or branch) to be made; the flow continues along one of two routes (for example, IF...THEN...ELSE). In the next slide we will understand the process through SIPOC.

SIPOC is a utility tool that is important for scoping a project. The Define phase helps to identify projects that can be implemented enterprise wide, yet many of these projects fail due to an overly large scope. A SIPOC map is a broad-level map that helps identify the scope of the project by setting project boundaries. A SIPOC map is a mandatory requirement for all Six Sigma projects, because without this map the scope of the project may not be determined. Scoping can be done on the process side as well as at the customer's end. Also note that a SIPOC map helps in setting boundaries for the project; especially during enterprise-wide deployments, this may be critical for a Black Belt. Now, we will look into the questions to be asked before preparing a SIPOC. This will be discussed further in the coming slide.

Here are the questions to be asked before preparing a SIPOC diagram. The first question is: who are the stakeholders for the process? To find the stakeholders, identify the people who are to be involved in the project, like the supplier, customer, partners, etc. The second question is: what is the value for the process, and what value is created for the customers? Value is what is considered and recognized by the customer, both internal and external. Next, find out who the owner of the process is. Then, work with the owner to get the details on who provides input for the process, and understand whether the inputs are consistently of good quality. Find out what the inputs for the process are, and also what resources the process uses. And the final question to be asked is: what process steps create the value? SIPOC stands for suppliers, inputs, process, outputs, and customers. This exercise is always conducted by starting off with a focused brainstorming session. In the next slide we will continue understanding more about SIPOC.

In this slide a snapshot of the SIPOC map is provided. The SIPOC map is also provided in the form of an Excel spreadsheet in the toolkit. Let us now analyze metrics in the next slide.

It is important to know about metrics, as it is one of the most used words by any Six Sigma practitioner. Metrics should be considered in the Measure stage, as a metric quantifies the project deliverable that was mentioned in the Define phase. Metrics are typically defined with the help of process expertise; this means that, while setting up a metric, a process expert should always be involved. Metrics in some cases could also be process end point measurements. For example, in the chemical and manufacturing sectors, the customers may define their end product requirements and not really the process metrics. A measurement is simply a numerical value assigned to an observation. Let us proceed to the next slide to explore the measurement scales.

Now let us discuss the measurement scales. This is an important precursor before we enter into the core Measure phase discussion. Broadly, we have four types of measurement scales, namely nominal, ordinal, interval, and ratio. Nominal measurements are not measurements in the purest sense; they are purely indicative. Nominal measurements are also considered the weakest form of measurements, or the weakest type of measurement scale, as they reveal very little information about the data itself. Nominal scales define or classify only the presence or absence of a certain attribute. With nominal scales, the items can only be counted, and the statistics that can be used are percentage, proportion, and chi-square tests (pronounced as kai-square tests). To understand more about measurement scales, let us move on to the next slide.

Here, we will understand ordinal scale. It defines one item having more or less of the attribute than the other item. In simple words it is a comparative scale of measurement. Let’s take up an example of taste and attractiveness. The food tastes better than the drink. As we can see, the ordinal scale does not reveal much information about the item being studied. As a statistical measure one can use the rank order correlation to quantify the comparative data between the two. We will now understand interval scale and ratio scale.

An interval scale defines the difference between any two successive points; these intervals are equal across the scale, but there is no true zero point. Take the examples of calendar time and temperature to understand this better. Correlations, F tests, t tests, and regression can be used for interval scales. Next we will understand ratio scales. A ratio scale has a true zero point, and one can perform addition, subtraction, multiplication, and division with such data points. For example, with elapsed time, distance, etc., we can do the standard mathematical calculations of addition, subtraction, multiplication, and division on the data. The types of data are explained in the next slide.

After understanding the measurement scales, it is important for us to know the type of data that we have for the project. Understanding the type of data being dealt with provides a more intuitive way of understanding the data. In other words, measurement scales form the foundation, while the types of data form the 'front end' of the understanding. We can categorize data into three divisions, namely continuous (or variables), discrete, and attribute data. Data collected on a nominal scale is attribute data. The only possible relations with attribute data on a nominal scale are equality and inequality. Ordinal attribute data is often converted to nominal data and then further analyzed using binomial and Poisson models. We will understand this in the later slides. Let us continue discussing the types of data in the next slide.

Discrete data can be defined as data values that can be counted and take the form of non-negative integers. For example, the number of defective parts in a shipment, the number of failed calls in a process, and so on. Another type of data is continuous data. These are data values that can be measured using a measurement system, and they can take any value within a finite or infinite range. For example, the time taken for a shipment to reach the customer can be called continuous data. To understand this better, let us take up a quick assignment: know what the customer wants in terms of deliverables, map the Project Y, and find out the type of data that would be worked with. Let us move on to the next slide, where we will summarize the topics we have learned.

In this lesson, we have learned the different objectives of the Measure phase. We also understood the significance of the process, flowcharts, and the SIPOC map in Six Sigma. Finally, we discussed the measurement scales and types of data. Let us now proceed towards the next slide, where we will discuss the third lesson of this section.

In this lesson, we will take a closer look at the data for Central Tendency and Dispersion. This will help us understand the data better, and to do a focused analysis.

Let us start this lesson by looking into the agenda. We will begin our discussion by introducing ourselves to central tendency and dispersion, and then we will move on to understand mean, median, mode, range, variance, and standard deviation. We will discuss mean deviation as the last topic of this lesson. Now let us start with the lesson's first topic, an introduction to central tendency and dispersion.

Let us first understand what we mean by central tendency and dispersion. A sample set of data can be easily represented with the help of measures of central tendency and dispersion. Central tendency represents the sample average, while dispersion broadly represents the spread or variation in the data. For example, if anyone wishes to report the scores of fifty students in a class, it will be cumbersome to represent the individual scores. The best method is to find out the class average and also the dispersion in the class scores. Put together, these two numbers represent the class effectively: for example, the class average is sixty-two and the range of the scores is fifty. Let us now move on to the next slide, where we will understand mean.

Mean is the most popular and most often used measure of central tendency in Six Sigma applications. It is the arithmetic average of all the data points and is popularly known as the arithmetic mean. Mean is extremely sensitive to the values in the data set: extreme values can skew the result of the mean, rendering it unusable. In the case of extreme values, also known as outliers, one should use the median instead of the mean.

We will now see how to calculate the mean of the data set mentioned in the previous slide. In Microsoft Excel, type all the numbers provided in the data set one below the other. Type the formula, equals AVERAGE, open the bracket, select all the numbers, and close the bracket. From the Excel sheet, we can find that the mean of the data set is four point six six. Now consider a modification: add a new number, hundred, to the list. The number hundred here is considered an extreme value (also known as an outlier). Use the same AVERAGE function in Microsoft Excel. The mean now changes to fourteen point two. Interpreting this mean gives us an interesting insight into why one shouldn't use the mean when the data has outliers. Manually check the data: it can be found that, out of the ten values, only one lies above fourteen point two and the rest of the points lie below fourteen point two. Clearly, this cannot be a measure of central tendency. Now let us discuss median in the next slide.
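The effect of an outlier on the mean can also be reproduced outside Excel. The data list below is hypothetical (the slide's own values are not reproduced in this transcript), but it is chosen so the results roughly match the slide's figures:

```python
import statistics

# Hypothetical data list; chosen so the mean is 42/9, roughly the slide's 4.66.
data = [2, 3, 3, 4, 5, 5, 6, 7, 7]
print(statistics.mean(data))          # arithmetic mean, like Excel's AVERAGE

# Adding the extreme value (outlier) 100 drags the mean up to 142/10 = 14.2.
print(statistics.mean(data + [100]))
```

One new extreme value triples the mean, which is exactly the sensitivity the slide warns about.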

Median, also known as the positional mean, is the central location or position of the data list. With the same data list taken in the example of the mean, the results are as below, using the function equals MEDIAN open the bracket select all the numbers and close the bracket. Median when the data is without the outlier is four point five. Median when the data is with the outlier is five. As we can see, the Median doesn’t change much even with the introduction of the outliers. Also, it is important to note that in the case of extreme values or outliers one should use Median instead of Mean.
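The median's robustness to outliers can be checked the same way; the data list is again hypothetical, so the exact values differ from the slide's:

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 6, 7, 7]   # hypothetical data list
print(statistics.median(data))           # like Excel's MEDIAN
print(statistics.median(data + [100]))   # the median barely moves when 100 is added
```

While the outlier tripled the mean, it leaves the median essentially unchanged, which is why the median is preferred for data with outliers.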

Mode, also known as the frequency mean, is probably one of the least used measures of central tendency in Six Sigma applications. Mode represents the value in a data list that repeats itself the most. The data list given in the example of the mean calculations is used here. Use the function, equals MODE, open the bracket, select all the numbers, and close the bracket, to find the mode of the data list. The result says #N/A (pronounced as hash N slash A), which means this data list doesn't have any mode. According to the interpretations, the presence of a single mode in a data list is required for anyone to proceed further in a Six Sigma project case. The presence of two modes, or bimodal data, means we need to look into the reasons for the two modes and screen out one. Also note that one cannot proceed ahead in the Measure phase with bimodal data. Let us move on to the next slide, where we will understand range.
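Excel's MODE returns #N/A when no value repeats; in Python, statistics.multimode makes the no-mode and bimodal cases from the slide explicit. The data lists here are hypothetical:

```python
from statistics import multimode

# multimode returns every value tied for the highest count, so the no-mode
# and bimodal cases flagged on the slide are easy to detect.
print(multimode([2, 3, 5, 7, 11]))    # all values tie at one occurrence: no meaningful mode
print(multimode([1, 2, 2, 3, 3, 4]))  # bimodal data: two modes, 2 and 3
print(multimode([1, 2, 2, 2, 3]))     # a single mode: 2
```

A result longer than one element signals either no mode or multimodal data, the cases in which the slide says one cannot proceed without further investigation.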

In this slide and the next few slides we will discuss the measures of dispersion, and we will first take up range. Range is a measure of dispersion, defined as the difference between the maximum value and the minimum value of the data list. Range does not need all the values in the data list to move ahead with the calculations; it needs only the largest and the smallest values in the data list. Range doesn't work well for sample sizes greater than ten; this is a statistically proven fact. The data list which we will use for calculating dispersion is given in the slide. Let us first find out the range. Use the MAX function in Excel to find out the largest value. Looking at the values mentioned on the slide, the value we get by using the MAX function is 7. Next, use the MIN function to find the smallest value from this data set; the value comes out to be 2. Finally, to find the range, subtract the minimum value from the maximum value. In this case, it will be 7 minus 2, which is 5. So, the range for this data set is 5.
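The MAX/MIN calculation above translates directly to code. The data list is hypothetical but shares the slide's largest and smallest values, 7 and 2:

```python
# Hypothetical data list with the slide's maximum (7) and minimum (2).
data = [2, 3, 4, 5, 6, 7]
data_range = max(data) - min(data)   # mirrors Excel's MAX(range) - MIN(range)
print(data_range)                    # 7 - 2 = 5
```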

Variance is a measure of dispersion and is defined as the average of the squared differences from the mean. With the same data list in Microsoft Excel, one can use the VAR (pronounced as V-A-R) function, which is given as equals VAR, open bracket, select the range, close bracket. Please note that the VAR function is used to calculate the sample variance, and the VARP function (equals VARP, open bracket, select the range, close bracket) is used to calculate the population variance; in recent Excel versions these functions are named VAR.S and VAR.P. In Six Sigma project applications, sample variance is used more than population variance. Using the VAR formula, the sample variance for the data list is two point eight seven five. Note that variance is expressed in the squared units of the data, so two point eight seven five represents the dispersion in the data, but not in the original units.
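In Python, the sample versus population distinction mirrors VAR versus VARP. The data list here is hypothetical, so the numbers differ from the slide's two point eight seven five:

```python
import statistics

data = [2, 3, 4, 5, 6, 7]            # hypothetical data list
print(statistics.variance(data))     # sample variance (Excel VAR / VAR.S): 3.5
print(statistics.pvariance(data))    # population variance (Excel VARP / VAR.P)
```

The sample variance divides the squared deviations by n minus 1, the population variance by n, which is why the sample figure is slightly larger.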

Standard deviation is a measure of dispersion and is technically defined as the square root of the variance. In simple words, standard deviation is a measure of how far the data values differ from the mean. It is represented by the Greek letter popularly called sigma. Standard deviation is a very popular measure of dispersion. Use the STDEV formula, which is given as equals STDEV, open bracket, select the range, close bracket, to calculate the sample standard deviation, and use STDEVP to calculate the population standard deviation. Six Sigma project applications work with the sample standard deviation most of the time.
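A quick check that the standard deviation really is the square root of the variance, using the same hypothetical data list:

```python
import math
import statistics

data = [2, 3, 4, 5, 6, 7]             # hypothetical data list
s = statistics.stdev(data)            # sample standard deviation (Excel STDEV / STDEV.S)
print(s)
# stdev is defined as the square root of the sample variance:
assert math.isclose(s, math.sqrt(statistics.variance(data)))
```

Unlike the variance, the standard deviation is back in the original units of the data, which is part of why it is the more popular measure of dispersion.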

Mean deviation is not a frequently used measure of dispersion. Technically, mean deviation is the mean of the absolute deviations from the mean. Use the same data list (the 2nd column in the table) and find out the deviation of every data point from the mean (shown in the 3rd column). In Excel, one can use the ABS function to get the absolute value of each deviation. Finally, take the average of all these absolute deviations; this gives the mean deviation. In the next slide, we will summarize the topics we have learned in this lesson.
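The three-step recipe above (deviations, ABS, average) reads almost the same in Python; the data list is hypothetical:

```python
import statistics

data = [2, 3, 4, 5, 6, 7]                    # hypothetical data list
mean = statistics.mean(data)                 # step 1: the mean (4.5 here)
abs_devs = [abs(x - mean) for x in data]     # step 2: absolute deviations (Excel's ABS)
mean_deviation = statistics.mean(abs_devs)   # step 3: their average = mean deviation
print(mean_deviation)                        # 1.5
```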

In this lesson we have discussed the introduction to central tendency and dispersion. We then moved on to understand mean, median, mode, range, variance, standard deviation, and lastly, mean deviation. Let us now move on to the fourth lesson, Measurement System Analysis, in the coming slides.

We started with understanding the data and measurement scales, and then looked at the central tendency and dispersion of the data. In this lesson we will understand how to validate the measurement system, to ensure that what we are measuring is accurate and repeatable.

Let us begin this lesson by discussing the agenda. In the beginning, we will understand the purpose of measurement system analysis. We will then discuss measurement system errors, the properties of good measurement systems, and measurement system errors illustrated. We will then move on to understand measurement system discrimination, the measurement system analysis process flow, and part variation. This will be followed by measurement systems analysis formulas, a measurement systems analysis example, and measurement systems analysis graphs. Towards the end, we will discuss attribute RR, when to do measurement system analysis, and the data collection plan. To understand the purpose of measurement system analysis, let us move on to the next slide.

MSA, also called measurement system analysis or Gage Repeatability and Reproducibility (also known as Gage RR), is a tool that is used to validate the measurement system, to see if the measurement system is able to give data that can be relied upon. If the measurement system is not validated, the data coming out of the measurement system would be questionable. The main purpose of MSA (measurement system analysis) is to verify whether the measurement system is able to provide representative values of the characteristic being measured, unbiased results, and minimal variability. Let us now proceed to the next slide to understand measurement system errors.

The main errors that can be noted in a measurement system are as follows. Resolution is one of the common errors that we generally come across: it is the lowest increment of measure that the measurement system can display, that is, the ability of the measurement system to provide granular readings. Next is accuracy, which can be defined as the difference between the observed measurement and the actual measurement. Precision is the difference observed when the same part is measured with the same equipment or instrument. Precision is further categorized into repeatability and reproducibility. Repeatability is when the same operator measures the same part with the same instrument, whereas reproducibility is when different operators measure the same part with the same instrument. Let us continue discussing measurement system errors in the next slide.

A graphical representation of measurement system errors is given in this slide. Observe all the errors that we commonly face in a measurement system. The image shows four types of measurement system errors related to accuracy and precision. We will extend our discussion on measurement system errors in the next slide as well.

Let us understand more about the measurement system errors. Accuracy has three components, to be studied in order. The first component is stability, which is the consistency and predictability of measurements over time. The second component is bias, which is the tendency of a measurement to provide a value different from the actual value. The last and final one is linearity, which is a measure of the bias values through the range of measurements. Precision has two components that need to be studied: one is repeatability, also known as equipment variation, and the other is reproducibility, also known as appraiser variation. Let us proceed to the next slide to understand the properties of good measurement systems.

Here we will discuss the different properties of a good measurement system. The measurement system, also referred to as MS, should produce a number that is close to the actual property being measured; this shows accuracy. The measurement system should produce the same measurements when the same part is measured again by the same equipment, by the same person, and under the same conditions; this shows repeatability. The measurement system should produce consistent results across the expected range; this shows linearity. The MS or measurement system should produce almost the same measurements (or the variation should be smaller than the agreed limit) when measured by different individuals, with different equipment, or at different locations; this shows reproducibility. The MS or measurement system should also produce the same results in the future as it did in the past; this shows stability. In the next slide, we will understand the illustration of measurement system errors.

In this slide, we have illustrated graphically the concepts of bias and reproducibility. In the first graph, on the left side, we can see that the measurement data is plotted and a normal curve is formed. The difference between the center of the normal curve and the reference value is called the bias; here it means that the measurement data will be on the higher side compared to the actual reference value. Next, we will look into reproducibility. Looking at the charts on the right side, there are three different people who have taken the measurements. When we plot all three people's data together, one can observe that there are differences in the measurements between the observers: Dick's measurements are on the lower side, while Jane's measurements are on the higher side, compared to Frank's measurements. This is termed the reproducibility of the measurements. We will understand measurement system discrimination in the next slide.

The Xbar-R chart plotted indicates adequate gage discrimination. The data is randomly distributed on the Xbar chart and the range chart. This helps us understand that the measurement tool (also called a gage) is providing results with adequate resolution to capture variation. As we have discussed before, discrimination is one of the important points to be determined in measurement system analysis. In the next slide, we will look into bias.

Let us take an example to understand how Bias is calculated. A standard with a known value of twenty five point four millimeters is checked ten times by one mechanical inspector using a dial caliper with a resolution of point zero two five millimeter. The readings obtained are given in the slide. Also note that the tolerance range of this gage is plus or minus point two five millimeters. Calculate the bias value and interpret. Now let us discuss how to calculate the bias value in the coming slide.

Here is the solution to the problem. First, find the average of the measurements. In this case, Xbar = 25.405 mm (the average of all the values from the sample). In the next step, find the bias, which is the average minus the reference value. So, bias is calculated as 25.405 minus 25.4 millimeters, which equals 0.005 millimeters. This shows how far the measured data deviates from the actual value. In the last step, the percentage bias is found. Percentage bias equals 100 multiplied by (bias divided by tolerance), which equals 100 multiplied by (0.005 divided by 0.5), which equals 1 percent. Based on what we have observed, the interpretation is that on an average, the measurement system will produce readings 0.005 mm greater than the actual reading. This shows that 1% of the acceptable variation is contributed by bias in the measurement system. This concludes that the measurement is within the acceptable range. Further, we will discuss the process flow of measurement system analysis.
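The arithmetic above can be sketched in a few lines of Python. The individual caliper readings are not reproduced in this transcript, so the list below uses assumed dummy values chosen to average 25.405 mm.

```python
# Assumed dummy readings (the slide's actual readings are not reproduced here),
# chosen so that they average 25.405 mm as in the worked example.
readings = [25.400] * 8 + [25.425] * 2   # ten inspector readings, mm

reference = 25.4     # known standard value, mm
tolerance = 0.5      # +/- 0.25 mm gives a total tolerance band of 0.5 mm

xbar = sum(readings) / len(readings)     # average of the measurements
bias = xbar - reference                  # bias = average - reference value
pct_bias = 100 * bias / tolerance        # bias as a percentage of tolerance
```

With these values, xbar works out to 25.405 mm, bias to 0.005 mm, and the percentage bias to 1 percent, matching the worked example.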

Here, we will discuss the process flow for the MSA or measurement system analysis. First step is to prepare for study. Second step is to evaluate the stability. Then in the third step, we will evaluate discrimination or resolution. Determine accuracy will occur as the fourth step in the process flow. Calibrate if necessary, is the fifth step. Next is the sixth step, which is to evaluate linearity. The final two steps of the process flow are to determine repeatability and reproducibility as the seventh step and determine part variation as the eighth. Let us understand part variation in the next slide.

The Xbar or Averages chart is plotted to capture part variation, or part-to-part variation, which is also known as time series variation. If the measurement system is adequate, most parts will fall outside the control limits of the averages chart. If very few parts fall outside the control limits of the averages chart, the measurement system is not adequate. Part-to-part variation is determined only when the measurement system has adequate discrimination and is stable, accurate, linear, and consistent in repeatability and reproducibility. These properties are required in any measurement system for it to be considered good enough to be used.

Here are some important Measurement System Analysis formulas. Sigma m represents gage variation and Sigma p represents part variation. The total process variation Sigma t equals the square root of (Sigma m squared plus Sigma p squared). Sigma m equals the square root of (Sigma r squared plus Sigma o squared), with Sigma r representing repeatability or equipment variation and Sigma o representing reproducibility or appraiser variation. Percent GRR equals hundred multiplied by Sigma m divided by the tolerance specified. Percent part variation equals hundred multiplied by part variation divided by the tolerance specified. The number of distinct categories, ndc, equals 1.41 multiplied by PV divided by GRR. If percent GRR is less than thirty percent, the gage is acceptable; otherwise it is not. If ndc is less than five, the gage needs to be replaced. Let us look into an example of Measurement Systems Analysis.
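As a rough sketch, the formulas above can be coded directly. The numeric inputs below are purely illustrative assumptions, not values from the course example.

```python
import math

def grr_metrics(sigma_r, sigma_o, sigma_p, tolerance):
    """GAGE R&R metrics from the formulas in the text. sigma_r is
    repeatability (equipment variation), sigma_o is reproducibility
    (appraiser variation), and sigma_p is part variation."""
    sigma_m = math.sqrt(sigma_r**2 + sigma_o**2)   # gage variation
    sigma_t = math.sqrt(sigma_m**2 + sigma_p**2)   # total process variation
    pct_grr = 100 * sigma_m / tolerance            # percent GRR
    pct_pv = 100 * sigma_p / tolerance             # percent part variation
    ndc = 1.41 * sigma_p / sigma_m                 # number of distinct categories
    return {"sigma_m": sigma_m, "sigma_t": sigma_t,
            "pct_grr": pct_grr, "pct_pv": pct_pv, "ndc": ndc}

# Illustrative (assumed) inputs:
m = grr_metrics(sigma_r=0.03, sigma_o=0.04, sigma_p=0.12, tolerance=2.0)
```

Here the gage variation comes out to 0.05, percent GRR to 2.5 (acceptable, being under thirty), and ndc to about 3.4, which by the rule above would call for a better gage.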

Here is an example of Measurement System Analysis. A measurement system study is conducted on ten parts. Three operators measure the ten parts over three trials. The specification limits provided are nine and eleven. Conduct a variables GAGE RR study and interpret the results. As a solution, first use the GAGE RR worksheet provided in the toolkit. Then, enter the operator readings in the worksheet under the columns highlighted part one, part two, and so on. Next, check the percent GRR value. It is five point five seven percent and therefore the gage is okay. Then check the ndc. It is three, and thus the gage discrimination needs improvement.

This slide shows a snapshot of the MSA GRR.

See the worksheet GRR Graphical provided in the GRR Variable Tool. A snapshot of the graph is shown on the slide. We can see that in the averages chart the values for each appraiser move close to each other. Similarly, it is observed that on the range chart, initially there is a lot of variation between the appraisers' data, and as things progress the values converge. Let us understand the Run chart in the next slide.

Continuing the discussion on the charts mentioned in the previous slide, we will see a run chart of all observations in this slide. On the X-Axis, it shows the part number, and the Y-axis shows the measurement value. The run chart plots the value for each part. We will discuss Whiskers chart in the next slide.

A whiskers chart is a convenient way of graphically depicting groups of numerical data through their quartiles. Whisker plots may also have lines extending vertically from the boxes (whiskers), indicating variability outside the upper and lower quartiles. Whisker charts are created by dividing the data set, sorted in ascending order, into four quartiles and finding the median of each quartile. Then a box is drawn that includes the second and third quartiles. The first quartile is plotted as a whisker extending from the box, and similarly the fourth quartile is plotted. This way, we get a box and a whisker, and hence the name. In the chart, one can see that there are two whisker plots from two appraisers' data points.
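The quartile logic behind a box-and-whisker plot can be sketched with the Python standard library; the sample values below are assumed purely for illustration.

```python
import statistics

def five_number_summary(data):
    """The five numbers behind a box-and-whisker plot:
    minimum, Q1, median (Q2), Q3, and maximum. The box spans
    Q1 to Q3; the whiskers extend to the minimum and maximum."""
    s = sorted(data)                            # ascending order, as in the text
    q1, q2, q3 = statistics.quantiles(s, n=4)   # the three quartile cut points
    return s[0], q1, q2, q3, s[-1]

# Assumed appraiser readings, for illustration only:
mn, q1, q2, q3, mx = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8])
```

For this toy data set, the box would span 2.25 to 6.75 around a median of 4.5, with whiskers out to 1 and 8.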

Here is an assignment that will help you understand gage repeatability and reproducibility (also called GRR) better. Conduct a measurement systems analysis for eight test parts. Two operators measure these eight test parts and conduct three trials for each part. Use the GAGE RR Variable Data Graphical Analysis and the GAGE RR Blank worksheet to write the readings. For training purposes, one can use dummy or assumed readings. The specification limits provided are eight point five and twelve point five. Interpret the readings and discuss with the facilitator. In the coming slide, we will understand Attribute RR.

Attribute data is about labeling a measurement based on a pre-defined operational definition. Measuring attribute data is fraught with variation, as these types of data rely on human judgment. Some examples of attribute data are good or bad, pass or fail, etcetera. Measuring attribute data may result in some variation because testers' interpretations of the operational definition of what is good and what is bad might differ. A lot of industries rate their products and internal measurements, hence it is important to work on a technique that eliminates these measurement variations. The technique we use is known as attribute RR, also known as attribute GAGE RR. We will continue discussing Attribute RR in the following slides as well.

Let us understand how to conduct an attribute RR study. In the first step, select a set of twenty samples. This should be a mix of okay and defective samples. Ideally, a thirty to seventy mix of good and defective samples is considered acceptable. As the second step, let a process expert do the rating for each of the twenty samples. This process expert is called a master appraiser. Snapshot of the findings can be found in the next slide. Use the Attribute RR Excel worksheet provided in the toolkit to do the Attribute RR study. In the next slide, we will look into the snapshot.

The snapshot in this slide shows the findings of the master appraiser. The process expert has evaluated the samples in terms of “Okay” and “Defective”.

In this slide, we will continue discussing the steps involved in conducting an Attribute RR study. Let us understand the third step. Here, select a minimum of two appraisers and name them operator 1 (Op1) and operator 2 (Op2). Ask them to rate the product without letting them know how the master appraiser has rated the products. The table provided here gives a snapshot of the data captured by the master appraiser (kept as the standard) and two sets of trials by operators 1 and 2. Using some of the standard functions in Excel, one can compute and fill the columns that list the variations within each operator, across operators, and in comparison with the standard. In the summary, the number of times the observations match is computed, and accordingly the percent of the time each operator agrees with the standard is computed. We will look into the next step, in the next slide.

In step four, use the 'IF' function in Excel to check whether the ratings of an individual operator match across both trials, i.e., whether a sample scored okay in one trial is also scored okay in the other. The individual repeatability score of Op1 is 85%. This may not be acceptable to the company. Therefore, operator 1 needs retraining. The fifth step is mentioned in the next slide.

In this step, check if the individual measurements of the operators tally with the standards set by the master appraiser. From the Excel tool, attribute RR snapshot we can see that the individual effectiveness of the operator 1 is eighty percent. This means only eighty percent of the time he could match up to the readings of the Master Appraiser and therefore, he needs retraining. The next slide shows a snapshot of the MSA attribute RR done until now.
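The agreement checks from steps four and five, done with the 'IF' function in the Excel toolkit, can be sketched in Python. The ratings below are dummy data assumed for illustration, not the toolkit's.

```python
def repeatability_pct(trial1, trial2):
    """Percent of samples on which an operator's two trials agree
    (the 'IF'-function check from step four)."""
    matches = sum(a == b for a, b in zip(trial1, trial2))
    return 100 * matches / len(trial1)

def effectiveness_pct(trial, standard):
    """Percent of an operator's ratings that match the master
    appraiser's standard (the check from step five)."""
    matches = sum(a == b for a, b in zip(trial, standard))
    return 100 * matches / len(standard)

# Dummy ratings for twenty samples, assumed for illustration:
standard = ["Okay"] * 12 + ["Defective"] * 8      # master appraiser's ratings
op1_trial1 = standard[:]                          # first trial matches the standard
op1_trial2 = standard[:17] + ["Okay"] * 3         # three disagreements in trial two
```

With these dummy lists, the operator's repeatability comes out to 85 percent, mirroring the Op1 figure discussed in the text.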

In the snapshot, we can see the data captured by the master inspector (kept as the standard) and the two sets of trials by operators 1 and 2. There are 20 sets of observations. Using some of the standard functions in Excel, one can compute and fill the columns that list the variation within the operator (provided in the column "within Op1") and compare it with the standard in the column "Op1 with standard". If the values match, it shows 1; otherwise it shows zero. In the summary, the number of times the observations match is computed, and accordingly the percent of the time the operator agrees with the standard is computed.

Now we will discuss the sixth step in conducting an Attribute RR study. Step 6 is to calculate the 'Between Operators Match' by finding the readings that all the operators agree upon. A snapshot from the toolkit is shown on the slide. We will introduce the seventh and final step in the next slide.

In the last step, calculate the overall effectiveness of the measurement system. In the toolkit, we just have to enter the observations in the respective columns; the computations are automatically done. The entire attribute RR working is shown in the toolkit. We will proceed to the next slide to look into the interpretations of Attribute RR.

Here are some interpretations of the attribute RR study. Operator 1's effectiveness is 85%, and when compared to the standard, it is 80%. This operator needs training with respect to the standard procedures, as 15% of the work performed by this operator is defective and will incur a loss to the organization. Once the operator improves his or her effectiveness, the effectiveness compared to the standard can be taken beyond 90%. Operator 2's effectiveness is ninety percent, and when compared to the standard, it is eighty five percent. This operator needs training, but not to the extent of operator 1. The measurements of both operators agree with the standards set only seventy percent of the time. Therefore, the overall measurement system is ineffective and needs improvement. Use the attribute RR tool provided in the toolkit to conduct an attribute RR study. In the coming slide, we will understand when to do measurement system analysis.

Measurement system analysis is the first activity done in the Measure stage, after conducting the first baseline study. Although the baseline study may still have some measurement system variations, the first baseline is an unbiased estimate of what happened in the Define phase. So, conduct a baseline, then perform the measurement system analysis, and finally repeat the baseline. We will understand the data collection plan in the following slide.

One of the integral parts of the Measure phase is the data collection plan. We need to know what to measure, how to measure, who will measure, and at what frequency. All of this is documented in a data collection plan. The primary goal is to develop a written strategy for collecting the data for the project. It helps in defining a clear strategy for collecting reliable data efficiently at the right points. It focuses on the project's output Y, and the operational definition and performance standards for Y in the project. The data collection plan uses the required performance range for the project Y to help us select a measurement tool. It also helps in ensuring that resources are used effectively to collect only data that is critical to the success of the project. The plan considers potential input variables or Xs for the selected project output variables or Ys. If we get the data collection plan in place in the right way, the team will have the same understanding of the measurements for the project. Next, we will look at a data collection plan template and example.

We can see the data collection plan template with an example. The project is on cycle time for hiring process. In the template, it can be seen that for every metric to be measured, there needs to be a detailed operational definition. It helps us to document the source of data along with its location. The sample size helps in determining the scope of the project and the number of samples to be collected during the particular set of dates or time period. And the last column specifies the information to be collected at the same time. Let us now summarize this lesson in the next slide.

In this lesson we have learned, the purpose of measurement system analysis and measurement system errors including accuracy and precision. Also, we have discussed linearity, resolution, bias, and stability, along with repeatability and reproducibility. We also looked into how to conduct a Variables GAGE RR and Attribute RR and how to interpret them. Finally, we understood the data collection plan on who will do the measurement, where and how. Now we will move on to the next lesson, Stability Conditions.

So far we have understood the data, measurement system, and the tendency of the data. In this lesson we will discuss various stability conditions and how to measure the stability of a process.

We will begin this lesson by discussing the agenda. The first topic that we are going to discuss in this lesson is the controlled process and variation, then the introduction to special and common causes of variation. Next, we will discuss other topics, such as stability introduction and SPC, stability check with Minitab, Stability conditions, and finally, the central limit theorem. So let us move on to the next slide, which will introduce us to the controlled process and variation.

According to Walter Shewhart, “A phenomenon is said to be controlled when through the past experiences we can predict within limits how the phenomenon can vary in the future.” This is what Walter Shewhart calls a ‘controlled’ phenomenon. A controlled phenomenon doesn’t mean absence of variation; it just means predictable variation. A controlled process would normally not deliver an out-of-specification or non-conforming product. In the next slide, we will understand the special causes of variation.

Special causes are unusual, not previously observed, non-quantifiable, or unanticipated sources of variation. They can bring an emergent change in the system or arise from previously neglected inherent phenomena within the system. The variation caused by special causes is usually higher in magnitude and can be attributed to a specific reason. It is usually unpredictable and outside the historical values and range. The presence of special causes of variation in a process leads to an unstable and uncontrolled process. Now, let us look into an example to understand special causes of variation. You normally take 35 to 40 minutes to reach office. However, today it took 65 minutes and you are late to work because of an accident on the road. This is a special cause of variation, as it is something that does not happen every day; it is unpredictable and outside the historical values. Let us discuss common causes of variation in the coming slide.

Any unknown random cause of variation in the process is known as a chance cause or random cause of variation. As these unknown random causes of variation keep happening regularly, they are referred to as common causes of variation. They cannot be assigned to any specific cause. Common-cause variation is the noise within the system. It is the usual, historical, quantifiable variation happening in a system. If the effect of each common cause is small and the number of common causes present is large, the variation becomes predictable and within limits. If a process has only common causes of variation and no other variation, one can call the process controlled. Let us look into an example to clarify the common cause of variation. You normally take 35 to 40 minutes to reach office. The range of 35 to 40 minutes is considered common cause variation. It takes more or less time due to traffic signals, variations in the number of cars on the road, etc. Here, the difference is small and within the expected limits of variation.

In short, all special causes of variation need to be identified and eliminated, and common causes of variation should be minimized where possible. Also, remember that one might not be able to eliminate all the common causes of variation. Does that mean we should leave common causes of variation unattended? No. Common causes of variation should be attended to, but as per Walter Shewhart, these variations are addressed in the process of long term improvement. One could also have a situation where the effects of common causes are marginal and the state of control is maintained. In such a scenario, do not make modifications. Most of the time, the distance between the control limits indicates the extent of common cause variation, and data falling within those limits is due to common causes. Special causes show up as trends, oscillations, and shifts. All of these special cause variations are detected with the help of control charts.

In this slide, we will understand the introduction to stability and statistical process control. A process is said to be stable when all the causes of variation are known. Even though some of the variation may not be desirable, as long as we know the cause, the process is considered stable. Once all the special causes of variation are eliminated, the process is governed only by common causes of variation, bringing it into statistical control. This results in process output that is fairly predictable and consistent. A stable process will have no special causes of variation; even if special causes do exist, they will be desirable special causes of variation. As per SPC or statistical process control, every measurable phenomenon is a statistical distribution, i.e., every set of observed data is a set of effects of unknown common causes. Even after special causes of variation are eliminated, some variability will exist, which is explained by common causes of variation. Six Sigma Black Belts should seek to reduce common causes of variation, though they may not be able to achieve complete elimination. It is important to understand that ZERO variation is not the goal; understanding and reducing the variation is the goal. After reducing variation beyond a certain level, the return on investment becomes negligible or negative. In a stable process, the location, spread, and shape of the measurement distribution curve will not change significantly over a period of time. Let us further understand the stability check with Minitab.

We can find the data for checking stability in the slide. A process measures delivery hours as the project Y. Day-wise recordings of the measurements are shown on the slide. Conduct a stability check and interpret. The table shows the details of delivery hours for 18 days. The chart plotted using Minitab is in the next slide.

For this example, the I-MR chart in Minitab is used. Open Minitab, click on Stat, then select Control Charts, then go to Variables Charts for Individuals, and finally to I-MR Chart. The output from Minitab is shown on the screen. The top chart is the Individuals chart, which plots the delivery hours data for each day. Minitab computes the upper and lower control limits (UCL and LCL) automatically and generates a control chart with all the values plotted against those limits. As one can see, the data point for the 10th day is out of control at the value 33 hours, since the UCL computed is 32.795. The next chart is the moving range chart. Here, the control chart is created with a UCL of 2.133 and an LCL of 0. All the moving range values are computed and plotted on the chart. Here, none of the points are outside the control limits. Please note that the reasoning for the use of the I-MR chart will be revealed in Section VI. We will continue discussing the stability check with Minitab in the next slide as well.
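For readers without Minitab, the control limits of an I-MR chart can be sketched in Python using the standard chart constants (2.66 for the individuals chart and 3.267 for the moving range chart). The short data series below is assumed for illustration; the slide's 18-day delivery-hours series is not reproduced in this transcript.

```python
def imr_limits(data):
    """Control limits for an I-MR chart: individuals limits at
    xbar +/- 2.66 * MRbar, moving-range UCL at 3.267 * MRbar
    (the moving-range LCL is 0)."""
    mrs = [abs(data[i] - data[i - 1]) for i in range(1, len(data))]
    xbar = sum(data) / len(data)        # center line of the individuals chart
    mrbar = sum(mrs) / len(mrs)         # center line of the moving range chart
    return {"i_lcl": xbar - 2.66 * mrbar, "i_cl": xbar,
            "i_ucl": xbar + 2.66 * mrbar,
            "mr_lcl": 0.0, "mr_cl": mrbar, "mr_ucl": 3.267 * mrbar}

# Assumed dummy delivery-hours data:
limits = imr_limits([30, 31, 30, 32, 31])
```

Any individual value plotting above `i_ucl` or below `i_lcl` would be flagged as a potential special cause, just as Minitab flags the 33-hour day in the example.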

As we can see from the graph, day ten shows a point out of control. This clearly indicates that the process is unstable with respect to its control limits. The observation of thirty three is likely due to a special cause of variation, which should trigger a series of actions to eliminate the SCV or special cause of variation. It is important to understand that on identification of a special cause of variation, an investigation needs to be triggered into what caused the variation and how to eliminate it. The SCV needs to be eliminated and the process needs to be brought back in control. Please note: details on how to eliminate special causes of variation will be discussed in Section V. We will understand more about the stability check with Minitab in the next slide as well.

After the identification of the special cause, the data point for the 10th day, which had delivery hours of 33, is eliminated, and the chart is redrawn in Minitab. The chart shown here is the updated chart after updating the data set. As one can observe, the control limits are updated for both the Individuals chart and the moving range chart. As we can see in the chart, the process is stable and we can now proceed with our Measure phase activities. In the next slide, we will understand the stability check using run charts.

One can also check stability using run charts. In Minitab, click on the Stat menu, select "Quality Tools" and then go to "Run Chart". Check the four p-values. If any of the p-values is less than zero point zero five, the process is unstable due to special causes of variation; otherwise, the process can be considered stable. Now we will move on to understand the stability conditions.

The process having the output variable (called Y) of delivery hours is seen to be stable or in control. Let us consider that the customer has given a target of twenty nine hours, with a Lower Specification Limit of twenty six and an Upper Specification Limit of thirty two. As we can see from the graph, the mean is running at thirty one, with a moving range of approximately two hours around the mean. From the available data, the process appears to be stable, but the confidence in this system would be low because of the high variation in the stability conditions. We can interpret that the process could be stable today, but the degree of confidence in its future stability is subjectively low. We will now move on to the next slide to understand the Central Limit Theorem.

Here we will understand what exactly the Central Limit Theorem is. Irrespective of the shape of the distribution of the population or universe, the distribution of the average values of samples drawn from that universe approaches a normal distribution as the sample size increases. The average of the sample averages, or the mean of the sample means, equals the mean of the universe or population. The standard deviation of the averages, denoted SEM or Standard Error of the Mean, equals the standard deviation of the population divided by the square root of the sample size. It is important to note that the Central Limit Theorem is the most powerful basis for Walter Shewhart's SPC charts or control charts. Let us move on to the next slide to conclude what we have learned so far.
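A quick simulation can illustrate the theorem: means of samples drawn from a clearly non-normal (uniform) population cluster around the population mean, with a standard error close to sigma divided by the square root of n. The sample size and counts below are arbitrary choices for the sketch.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Population: Uniform(0, 1) -- clearly not normal.
# Its mean is 0.5 and its standard deviation is sqrt(1/12).
n = 30              # sample size
num_samples = 2000  # number of sample means to collect

sample_means = [statistics.fmean(random.random() for _ in range(n))
                for _ in range(num_samples)]

grand_mean = statistics.fmean(sample_means)      # approaches the population mean
sem = statistics.stdev(sample_means)             # observed standard error of the mean
theoretical_sem = (1 / 12) ** 0.5 / n ** 0.5     # SEM = sigma / sqrt(n)
```

The grand mean lands very close to 0.5 and the observed standard error closely tracks the theoretical sigma over root n, even though the underlying population is uniform rather than normal.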

We started with understanding the controlled process and variation, moving on to understand special and common causes of variation. Then, we looked into the introduction to stability and the working of Statistical Process Control. We explored Minitab and learned how to do the stability check there, and then we covered the details of determining stability conditions. Finally, we covered the Central Limit Theorem. Now we will proceed to the next lesson of this section of Lean Six Sigma Black Belt.

In this lesson we will learn the types of Capability Metrics and understand how to compute them. We will cover process capability for continuous and discrete data. Let us look into the agenda in the next slide.

We will start this lesson with process capability pre-considerations, followed by process capability indices for continuous data. We will then look into process capability indices for discrete data and non-normal capability analysis.

Calculating process capability, either by calculating C-p or C-p-k or by measuring it in terms of D-P-M-O (Defects Per Million Opportunities) or P-P-M (Parts Per Million), is not correct without determining whether the process is in statistical control. This is a mistake made by many practitioners: they assume conditions are controlled and calculate capability, which is a mistake. It is also important that the process data follows a normal distribution. Without checking for normality, it is futile to do a Cp or Cpk calculation; for data that is not normal, Minitab allows a non-normal capability analysis. Let us take an example to understand this. An improvement team uses Lean techniques to reduce the time to process an order. A random sample of twenty five orders before the change had an average time of three and a half hours to process. A random sample of twenty five orders after the change has an average time of two hours to process. The team asserts that they have decreased the order processing time by more than forty percent. Is that credible? Or would it be more credible if the confidence interval for the order processing time was calculated for the original twenty five orders, and the improvement asserted only if the new average of two hours fell outside the original interval? We will find the answer to this in the next slide.

The answer to the question discussed in the previous slide is stability. Every distribution has three properties that need to be determined, and those are location, spread, and shape. Location and spread are determined by stability, while shape is determined by doing a normality check (which will be explained in detail in the future slides.) Mean shows the location of the distribution and standard deviation is used to show the dispersion or spread. Some of the questions that need to be asked before calculating the process capability are: Is the process central tendency stable over time or is it shifting? Is the process dispersion stable? Is the process distribution consistent over a period of time? Let us move on to the next slide where we will understand how to answer these queries.

If the answer is ‘No’ to any of the questions asked in the previous slide, we cannot and should not calculate the capability of the process. Once the answer is ‘Yes’ to all three questions, we can proceed to ask questions on the capability of the process. Here are some questions to be considered. Is the process meeting requirements? Is the process capable of meeting requirements and expectations? Can re-centering the process help improve its performance? How can we reduce variation in the process? We can answer all of these questions by calculating the Cp, Cpk, Pp, and Ppk indices. But before calculating the indices, check if the data is normal. Let us continue discussing process capability pre-considerations in the next slide.

Here, the Ryan-Joiner test is done in Minitab with the same data used for the stability check. The normality test results given in this slide indicate that the data is normal, so capability calculations can be done. Here is the graph that shows the probability plot of delivery hours using the Ryan-Joiner test method. All the data points are plotted and a best fit line is drawn. The statistics for the data are computed and shown on the top right of the chart. From these statistics, one can make out that the value of p is greater than 0.10. One can decide the sample is NOT normal only if p is less than 0.05, which is not the case with this data. So, we can conclude that the data is normal. Let us move on to the next slide to understand another process.

If normality fails with the same set of data, we can follow a different process. The process is as follows. Open Minitab and then click on Stat; there, click on Quality Tools. A number of tools will be listed. Select the Individual Distribution Identification tool and finally, select the variable and click OK to get the graph. This tool will show the probability plots for all the possible data distribution models. The Anderson-Darling test is a statistical test to find whether a given sample of data is drawn from a given probability distribution or not. The Anderson-Darling test for this set of data returns a p-value of zero point zero five zero. The Black Belt is not confident that the normal distribution is the best fit, and so decides to go for a recheck.

We can see the snapshot of the process mentioned in the previous slide.

Probability plots with 95% confidence intervals are drawn. Each of the individual charts shows the goodness-of-fit test. The charts drawn are normal, Box-Cox, lognormal, and 3-parameter lognormal, for a 95% confidence interval. Similarly, the second chart shows the plots for logistic, log-logistic, 3-parameter log-logistic, and normal (Johnson Transformation), for a 95% confidence interval. From the plots, the Box-Cox transformation and Johnson Transformation can be considered the best fit models (based on the p-value). In the next slide, two more images are given that come under the process capability pre-considerations.

In this slide, we will see the results from Minitab and understand the type of distribution that we should consider using. Probability plots for delivery hours using multiple methods are plotted here. They are smallest extreme value, largest extreme value, Gamma, and 3-parameter Gamma. On the second chart we have exponential, 2-parameter exponential, Weibull, and 3-parameter Weibull plots. All the above-mentioned plots are drawn with a 95% confidence interval. Each of the individual charts shows the goodness-of-fit test.

Minitab will also provide details of all the distributions in a tabular format for comparison and decision making. Let us look into the details of each distribution, with their Anderson-Darling test values and p-values listed. Reviewing the p-values for all the distributions, the noteworthy ones are mentioned in the table. Largest extreme value seems to be the best fit distribution for this data and thus should be considered as the data distribution model. Some Black Belts may also go with 3-parameter Weibull for simplicity considerations.

Let us learn how to calculate the process capability for continuous data. Cp is the process capability that shows how good the process is in delivering what the customer wants with respect to the specifications. Cpk refers to the process capability index that shows how good the process is in delivering what the customer wants with respect to mean. Pp is the process potential, which shows how good the process can perform in the long run with respect to the specifications. Ppk is the process performance index, which shows how good the process is in delivering what the customer wants in terms of performance to mean in the long run. To understand further, let us move on to the next slide.

Cp is represented by a formula, that is, USL minus LSL divided by six multiplied by the standard deviation (denoted by sigma), with USL minus LSL representing the process tolerance. USL is the upper specification limit and LSL is the lower specification limit. Note that Cp doesn’t work for a single specification limit. The customer should give both the specification limits for the value of Cp to be calculated. To calculate Cpk, we should find the nearest specification limit and use either Xbar minus LSL divided by three multiplied by the standard deviation, or USL minus Xbar divided by three multiplied by the standard deviation. If Cpk is less than Cp, the mean is not centered. And if Cpk equals Cp, the mean is centered and the process is accurate, but one still needs to know how much variation there is. In the next slide we will understand how to calculate Process Capability for continuous data with Minitab.
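The Cp and Cpk formulas can be sketched in a few lines of Python. The mean and standard deviation below are hypothetical values chosen only to mirror the delivery-hours example discussed in this lesson; a real analysis would take them from the measured data.

```python
def process_capability(usl, lsl, mean, sigma):
    """Return (Cp, Cpk) for a two-sided specification.

    Cp  = (USL - LSL) / (6 * sigma)
    Cpk = distance from the mean to the nearest spec limit / (3 * sigma)
    """
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))
    return cp, cpk

# Hypothetical inputs (not from the slide's data set):
cp, cpk = process_capability(usl=32.5, lsl=29.0, mean=30.94, sigma=0.648)
print(round(cp, 2), round(cpk, 2))  # Cp ≈ 0.90, Cpk ≈ 0.80
```

Note that Cpk takes the minimum of the two one-sided ratios, which is exactly the "find the nearest specification limit" rule stated above.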

Let us calculate the process capability for continuous data with Minitab. Project Y is delivery hours. Open Minitab, Go to menu, click on Stat and then click on Quality Tools. Now click on Capability Analysis and in that, click on Normal. A graph will be plotted for the data set along with the key metrics, control limits, and statistics. The graphs show the distribution in histogram form for delivery hours with a normal curve on top of it. The statistics shown here includes potential capability and overall capability of the process.

Let us interpret the capability indices. The LSL was defined at twenty nine and the USL at thirty two point five. The Cpk value of zero point eight zero is less than the Cp value of zero point nine zero, indicating the mean is not centered yet. The Cp value indicates that the variations are high, resulting in low capability. The same data was checked against control charts and found to be stable. It is due to the high common cause variation that we find the Cp value so low. Centering the mean in this case wouldn’t do much good. This is a clear case for reducing variation.

In this slide, let us learn how to calculate process capability for discrete data. Cp, Cpk, Pp, and Ppk cannot be used for discrete data. Discrete data is countable data, and in a sense we either count defects or defectives. DPMO or PPM calculations are to be used in finding process capability when the project Y is discrete. Let us take an example to understand this better. Over a period of twenty days, the numbers of defective items reported in a company are as mentioned in the slide. It has been verified that the rate at which defective items come out is stable. From the data we can see that the defectives are in the range of 19 to 25 per day. Calculate the process capability using the data provided.
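The DPMO calculation mentioned above can be sketched as follows. The transcript does not reproduce the daily sample size from the slide, so the defect count and unit count here are hypothetical figures for illustration only.

```python
def dpmo(defects, units, opportunities_per_unit=1):
    """Defects per million opportunities for discrete (count) data."""
    return defects * 1_000_000 / (units * opportunities_per_unit)

# Hypothetical: 440 defective items found in 10,000 inspected units
# over twenty days (one defect opportunity per unit).
print(dpmo(440, 10_000))  # 44000.0
```

The resulting DPMO can then be converted to a sigma level using a standard conversion table, which is how discrete-data capability is typically reported.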

Open Minitab, Go to menu, click on Stat and then click on Quality Tools and click on Capability Analysis. From the list of options given, click on Binomial. This will take all the data from the data set and plot the binomial distribution in Minitab.

This graph chart will help us understand the process capability analysis of the binomial data (discrete). In the next slide we will understand Non-Normal Capability Analysis.

We will now learn how to calculate capability for a non-normal data set. We may get data that is normal or non-normal. The data needs to be analyzed fully to confirm that the process is stable and the data is normal. Normality, along with stability, is one of the important pre-requisites for anyone to do a capability analysis on the data. The ‘delivery hours’ data collected as part of project Y is first checked to determine whether it is in statistical control. Then, it should be checked for normality, and finally for capability. This flow of events should never be violated. A process in statistical control will produce a normal distribution most of the time. Therefore, it is important to determine stability first. In the next slide, we will look at the stability check results.

We first check stability using control charts. Look at the chart; it plots the sample mean and range. As one can see, the data is fairly random. Neither is there a specific pattern in the control chart, nor is there any outlier. Based on these criteria, one can perform the stability checks and conclude that the process is stable. To analyze the normality check results, let us move on to the next slide.

Now check normality. In this chart, all the data is plotted and a best fit line is drawn. The statistics are listed on the right side. From the statistics, one can conclude that the data is non-normal, with a p-value equal to zero point zero five zero. In order to understand this analysis, let us look into a histogram in the next slide.

So far we have drawn the normality check and stability check graphs. The problem with these is that they reveal little about how the data fits a particular model. To get a better understanding of the data distribution, we use a histogram to study the fit. The histogram given in this slide with a normal fit shows that the normal distribution is not a good fit for the data. Let us proceed to the next slide to analyze one more chart related to non-normal capability analysis.

Open Minitab and click on Stat; after this, click on Quality Tools. A number of tools will be listed. Select the Individual Distribution Identification tool and click OK to get the graph. We have exponential, 2-parameter exponential, Weibull, and 3-parameter Weibull plots. All these are drawn at a 95% confidence interval. Each of the individual charts shows the goodness of fit test with the key statistics given for each of them. For each of the distributions, the Anderson-Darling test values and p-values are listed. From all the distributions, look at the p-values and select a distribution whose p-value is higher than 0.05. We can conclude that this data belongs to the 3-parameter Weibull distribution, as its p-value of 0.197 is higher than 0.05. The Weibull distribution is one of the most widely used lifetime distributions in reliability engineering. It is a versatile distribution that can take on the characteristics of other types of distributions based on the value of the shape parameter. When you use the 3-parameter Weibull distribution, Minitab calculates the threshold parameter utilizing an optimized Nelder-Mead algorithm, adjusts the points such that they fall on a straight line, and then plots both the adjusted and the original unadjusted points. This is a widely accepted method for analyzing distributions. Let us discuss another graph in the next slide.

Finally, using the non-normal capability analysis function in Minitab and the graph given in the slide, we can summarize capability as shown in the model. Here the Weibull distribution model is used. The data is plotted as a histogram and the Weibull distribution curve is plotted by Minitab. The descriptive statistics are computed and shown on the right side of the graph. The left side statistics show values like the Upper Control Limit, Lower Control Limit, Sample Mean, etc. The right side statistics show the process capability metrics and overall performance. Here, the overall Ppk calculated equals zero point seven seven and the PPM is eight thousand nine hundred six. These are the key metrics to demonstrate process capability. Other metrics are already shown on the graph.

Now we will take up a small assignment. In this assignment, you can go through the steps that we have already covered. The expectation is for you to understand the data and provide details on process capability. The detailed steps are as follows. First, plot your project Y data. Here, hypothetical values are okay. Assume an LSL and USL for continuous data, and a target for discrete data. In the second step, find out if the data is stable. In the third step, find out if the data is normal. At the end, calculate the capability indices using Minitab and interpret them from the descriptive statistics. This will help in understanding which distribution model the data fits into, and also in computing the process capability metrics. We will discuss the summary in the next slide.

In this lesson, we have learned the following details on capabilities. In the beginning, we covered the pre-considerations around process capability. After that, we went into the details of process capability for continuous data. Then, we analyzed process capability for discrete data and finally, reviewed non-normal capability analysis. Please note that the detailed properties of continuous and discrete distributions will be tackled in one of the later lessons. Let us move on to the next lesson, namely, Variations, Variability, Capability, and Process Conditions.

In this lesson we will understand the different types of variations like common cause and special cause. We will also understand common cause and variability within the process, capability, and process conditions.

Let us look into the agenda of this lesson. We will discuss variations and variability, followed by capability and process conditions. Variations are caused by a variety of reasons, and these reasons are classified into two types, namely, special cause and common cause. A special cause is something that can occur at any time and follows no pattern. Special causes of variation, if any, need to be identified and eliminated before we can comment on variability, capability, and other process conditions. Special causes of variation make the process unstable, and thus need to be arrested if they are found undesirable for the process. A common cause of variation is something that is inherent in the process and may follow some pattern. Common causes of variation need to be controlled. Excessive effects of CCV (common causes of variation) could result in a high process spread, and thus a low overall capability. To know more about this topic, let us look into the next slide.

The two charts shown in this slide represent the difference in variability. The left chart shows the Individual and Moving Range plot for Delivery Hours 2 and the right chart shows the same plot for Delivery Hours. The USL and LSL range for Delivery Hours 2 is much higher than in the other chart, which plots Delivery Hours. This shows that the variations in Delivery Hours 2 are much higher than in the Delivery Hours data.

Comparing the two charts for project Y (delivery hours) under different situations, the absence of special causes of variation is clearly seen. The common cause variation is lower in Delivery Hours than in Delivery Hours 2. The control limits of Delivery Hours are thirty two point five three and twenty nine point three four, with the moving range being in control. As one can see, the data plotted in the earlier slide are observations and are not a time series. However, if this were time-series data, we could have easily attributed the reduction in variation from Delivery Hours 2 to Delivery Hours, resulting in an improved capability. Let us discuss capability and process conditions in the next slide.

Assuming special causes of variation are eliminated, let us look into two conditions. If variations are high, variability would be high, capability would be low, and the process would be incapable, i.e., Cp or Cpk would be less than 1. If variations are low, variability would be low, capability would be high, and the process would be capable, i.e., Cp or Cpk would be more than 1. If the process is found to be capable, further analysis is done to check whether further reduction in variation is needed or not, and a decision is taken accordingly. If the analysis concludes that no further variation reduction is needed, only mean re-centering needs to be done by making fundamental changes to the process, and classical DMAIC (Define, Measure, Analyze, Improve, Control) need not be pursued. We will look into the summary in the next slide.

In this lesson, we understood variations and variability along with capability and process condition.

In this lesson, let us learn about Data Distributions. We will cover various distribution models, when and how to use them, and also how to interpret them.

In this lesson, we are going to discuss the topics that are mentioned as follows. We will start this lesson with permutations and combinations. Then, we will move on to frequency distributions and binomial distribution. Poisson distribution will be discussed next and later, normal and exponential distributions will be looked into as the last two topics of the lesson. Let us understand permutations and combinations, in the next slide.

The ordered arrangement of elements is known as a permutation. Let us take an example to understand this better. You are provided with five boxes, and five objects to be placed in the boxes. The objects are to be placed in such a way that every box contains only one object. How many total arrangements are possible in such a setting? Now let us look into the solution. Object one can be placed in any of the five boxes. But object two can only be placed in any of the four remaining boxes, as the fifth one is occupied. Object three can go into three boxes, and so on. Thus, the total number of arrangements is five multiplied by four multiplied by three multiplied by two multiplied by one, and that equals one hundred and twenty. In general, for n positions to be filled with n objects the equation is n multiplied by n minus one multiplied by n minus two and so on until n minus (n minus one). This is also known as n factorial. In the next slide, we will understand combinations.

Combination talks of arranging n objects by taking groups of r objects at a time. Let us take an example to understand combination. If three items are provided, such as X, Y, and Z, in groups of two items at a time, what could be the effective combinations of all three items? The formula for nCr given here is: nCr equals n factorial divided by r factorial multiplied by n minus r factorial. nCr is also referred to as n choose r. In this case, three C two is three factorial divided by two factorial multiplied by one factorial, which equals three. In other words, we can have three combinations here, and they are XY, YZ, and ZX. We will discuss frequency and cumulative distributions in the next slide.
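Both worked examples, the five-box permutation and the three-item combination, can be verified with Python's standard library:

```python
import math

# Permutations: 5 objects into 5 boxes, one object per box -> 5! arrangements
arrangements = math.factorial(5)
print(arrangements)  # 120

# Combinations: groups of 2 from the 3 items X, Y, Z -> 3C2
groups = math.comb(3, 2)
print(groups)  # 3  (XY, YZ, ZX)
```

`math.comb(n, r)` implements exactly the nCr formula above, n! / (r! (n - r)!).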

A frequency distribution is a presentation of an empirical set of observations. Commonly used frequency distribution tools are the histogram and the stem and leaf plot. Frequency distributions show the frequency of specific values or a group of values. Cumulative distributions show the cumulative frequency up to a particular item or group of items. If we can assume a state of SPC, we can answer questions such as: the probability that x will occur; the probability that a value less than x will occur; the probability that a value greater than x will occur; and the probability that a value between x and y will occur. Let us now move on to the next slide to discuss Binomial Distribution.
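The distinction between frequency and cumulative counts can be sketched with a few hypothetical delivery-hours observations (these values are illustrative, not from the course data set):

```python
from collections import Counter
from itertools import accumulate

data = [29, 30, 30, 31, 31, 31, 32, 32, 33]  # hypothetical observations
freq = Counter(data)                          # frequency of each value
values = sorted(freq)
cumulative = dict(zip(values, accumulate(freq[v] for v in values)))

print(freq[31])        # 3  -> how often 31 occurs
print(cumulative[31])  # 6  -> how many observations are <= 31
```

The cumulative counts are what let us answer questions like "what fraction of values fall at or below x".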

Binomial distribution is extremely useful in studying the probability of having x defectives in a sample of n units. In simple words, this distribution is used to study the probability of the occurrence of a defective item in a lot. The probability of having x defectives, or P of x, in a sample equals nCx multiplied by p raised to x, multiplied by one minus p raised to n minus x. By standard, this is known as the binomial probability distribution. For deeper understanding, we will work with Binomial Distributions in three modes, namely, manual calculations, Microsoft Excel, and Minitab. The next slide will help us understand Binomial Distribution with a relevant example.

Let us discuss an example that will help us understand Binomial Distribution. A process produces bottles on a continuous basis. Past process performance indicates a two percent chance of the bottles having one or more flaws. If we draw a sample of twenty units from the lot, what is the probability of getting zero defectives? Now, let us look into the solution. According to the information presented here, n equals twenty, x equals zero, and p equals zero point zero two. The standard binomial distribution equation is nCx multiplied by p raised to x, multiplied by one minus p raised to n minus x. Here, nCx equals twenty C zero, which is twenty factorial divided by zero factorial multiplied by twenty factorial, and that equals one. P raised to x is one, and one minus p raised to n minus x is zero point six six. By multiplying all of them, we get our result as sixty six percent. This means there is a sixty six percent probability that a process which on average produces two percent defectives will produce zero defectives in a sample of twenty items.
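The manual calculation above can be reproduced with a short Python function implementing the standard binomial probability mass function (nothing Excel- or Minitab-specific is assumed):

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x defectives in a sample of n, defect rate p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

p0 = binom_pmf(0, 20, 0.02)  # zero defectives in 20 units at 2% defect rate
print(round(p0, 4))  # 0.6676
```

The result, about sixty-six point seven six percent, matches both the manual calculation and the Minitab output discussed in the next slides.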

This slide shows the BINOMDIST function in Microsoft Excel. Use the formula BINOMDIST. The formula template is BINOMDIST(number_s, trials, probability_s, cumulative). The result returned by the BINOMDIST function is zero point six six seven. This means one has a sixty-six point seven percent probability of accepting such a process. Let us now move on to the next slide to understand Binomial Distribution in Minitab.

In this slide, we will see how to do Binomial Distribution problems in Minitab. Two snapshots that represent Binomial Distribution in Minitab are shown in this slide. Open Minitab and click on Calc; after this, click on Probability Distributions. Then click on Binomial. In the Binomial Distribution dialog, enter the number of trials as 20 and the probability as 0.02. The results from Minitab are given in the slide. The result shows that one has a 66.76% probability of accepting such a process.

Let us understand the Poisson distribution in this slide. The Poisson distribution is used when the subject of study is non-conformities, and not just non-conforming units. Non-conforming units are referred to as defectives, which are studied using the Binomial Distribution. Non-conformities in a unit are defects, which are studied using the Poisson distribution. For example, we check the quality of fifty computers and find minor non-conformances in each computer. Applying the non-conforming units logic, we would find a hundred percent non-conformance in this case, which in reality might not be the case. In such a scenario, we should use the Poisson distribution with its probability distribution function. The function P of x equals mu raised to x, multiplied by e raised to minus mu, divided by x factorial. Here, mu is the average number of non-conformances per unit, and x is the number of non-conformances in a sample. With P of x, we can find the probability of an exact number of non-conformances in the sample. In the next few slides, we will continue this discussion on the Poisson distribution.

Let us consider an example. A company producing marker pens conducts external audits once production of a lot is completed. Several audits on fifty pens revealed an average of three minor non-conformances. What is the probability that the next audit will reveal zero non-conformances? Let us look into the manual calculations. From the information, mu equals three and x equals zero. P of zero non-conformances equals three raised to zero, multiplied by e raised to minus three, divided by zero factorial, which gives zero point zero four nine. The probability of the pens having zero non-conformances is approximately five percent. That means the probability of the pens having at least one non-conformance is approximately ninety five percent.
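The same manual calculation, as a short sketch in Python using the Poisson probability mass function stated above:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of exactly x non-conformances when the mean is mu."""
    return mu**x * exp(-mu) / factorial(x)

p0 = poisson_pmf(0, 3)    # zero non-conformances, average of three
print(round(p0, 3))       # 0.05  -> about a five percent chance
print(round(1 - p0, 2))   # 0.95  -> at least one non-conformance
```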

Use the POISSON function (given in the slide) in Microsoft Excel to calculate the desired Poisson probability for Excel version 2007 and before. If you are using Excel 2010 and beyond, you can use the POISSON.DIST function (given in the slide). The template for using the function is as follows: POISSON.DIST(x, mean, cumulative), where x is the number of events; mean is the expected numeric value; and cumulative is a logical value that determines the form of the probability distribution returned. Let us understand how to do the Poisson distribution in Minitab. Click on Calc, then on Probability Distributions. Then click on Poisson to do the Poisson calculations. The Minitab results are shown on the slide. Let us now move on to the next slide to discuss the normal distribution.

The normal distribution, also known as the bell curve, is a distribution most processes desire to have. This is not a mandatory requirement or compulsion for processes. By now, we know how to calculate the µ (mean) and σ (standard deviation) values, using Excel as well as Minitab. With these two statistics a Normal Distribution can be expressed. The z-score is an important metric, which can be used with the normal distribution. z equals x minus mu divided by sigma, where x is a random variable from a normal distribution with mean mu and standard deviation sigma. In the next slide, we will continue our discussion on the Normal Distribution.

Let us consider an example. Assume that we have checked the breaking strength of a gold wire bonding process used in microcircuit production and have found that the process average strength is nine micron and the standard deviation is four micron. The process distribution is normal. If the engineering specification is three micron minimum, what percentage of the process will be below the lower specification? The solution is as follows: mu is nine micron and sigma is four micron. The z-score equals three minus nine divided by four, which equals negative one point five. A Z table has been provided with the toolkit. The corresponding probability value per the Z table is six point six eight percent. Let us discuss the exponential distribution in the next slide.
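The Z-table lookup can be reproduced with the standard normal cumulative distribution function, built here from the error function in Python's math module:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = (3 - 9) / 4  # (x - mu) / sigma for the wire-bonding example = -1.5
print(round(norm_cdf(z), 4))  # 0.0668 -> 6.68 percent below the lower spec
```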

The exponential distribution is extremely useful in analyzing reliability. The probability density function for the exponential distribution is f of x equals one divided by mu, multiplied by e raised to negative x divided by mu. A typical exponential distribution function is heavily skewed. In fact, more than sixty three percent of all the data falls below the mean. A representation of the exponential distribution can be seen on the slide. We will continue this discussion on the exponential distribution in the next slide as well.

Let us solve an exponential distribution problem with Excel. A city water company averages 500 system leaks per year. What is the probability that the weekend crew, which works from 6 p.m. Friday to 6 a.m. Monday, will get no calls? Let us look into the solution. A typical year, having three hundred sixty five days, will have a total of eight thousand seven hundred sixty hours. The average number of leaks is five hundred. Thus, the mean time between failures is seventeen point five two hours. Let us use the EXPONDIST function in Microsoft Excel (up to Excel 2007). For Excel 2010 and beyond, use the EXPON.DIST function as given in the slide. The time range illustrated is sixty hours. The probability of the staff getting at least one call is ninety six point seven percent. The staff thus has free time worth three point three percent of the weekend. We will summarize this lesson in the following slide.
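The weekend-leak calculation, as a sketch using the exponential survival function, e raised to negative t over mu:

```python
from math import exp

mtbf = 8760 / 500   # mean time between leaks: 8760 hours / 500 leaks = 17.52 h
t = 60              # 6 p.m. Friday to 6 a.m. Monday is 60 hours

p_no_call = exp(-t / mtbf)       # probability of zero leaks in 60 hours
print(round(p_no_call, 3))       # 0.033 -> about 3.3 percent free weekends
print(round(1 - p_no_call, 3))   # 0.967 -> about 96.7 percent chance of a call
```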

In this lesson we have learned permutations and combinations, frequency distributions, binomial distribution, Poisson distribution, normal distribution and finally, exponential distribution. Now, we will move to the next lesson which deals with Sigma shift, mean shift, and reducing variations.

In this lesson, we will understand sigma shift, mean shift, and reducing variations.

We will start this lesson with sigma shift, followed by mean shift or reducing variations. Finally, we will understand baseline data.

Conceptually, the long term sigma and short term sigma differ by one point five, as shown by the empirical studies conducted at Motorola. This shift is known as the one point five Sigma Shift. The LT and ST sigma could differ by a value more or less than one point five sigma. But over a period of time, with data collected across multiple groups, this difference is observed to be one point five sigma. The long term sigma level is less than the short term sigma level by one point five. For example, if the short term sigma level is six, the long term sigma level is four point five. This shift happens due to unexpected and unforeseen variations, because of which it is difficult to maintain statistical control at all times. In the next slide, we will discuss mean shift or reducing variations.
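The shift itself is simple arithmetic, sketched here as a one-line helper:

```python
def long_term_sigma(short_term_sigma, shift=1.5):
    """Conventional 1.5-sigma shift between short- and long-term sigma."""
    return short_term_sigma - shift

print(long_term_sigma(6.0))  # 4.5 -> a short-term 6-sigma process long term
```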

Mean shift or reducing variations is one of the key decisions a Black Belt may take, depending on how the Cp and Cpk values shape up. Of course, all this is done assuming the process to be in statistical control. Check the Cp value first and then the Cpk value. Is Cp greater than one? If it is greater than one and Cpk is also greater than one, it indicates the process is capable of performing to customer expectations. If both Cp and Cpk are greater than one and are equal, for example Cp equals one point three three and Cpk equals one point three three, the process is considered accurate and capable. The Black Belt can then decide to move on to the Control stage directly. If both Cp and Cpk are greater than one, and Cp is greater than Cpk, the Black Belt should first find out the difference. If Cp equals one point three and Cpk equals one point two seven, he may still want to move to the Control stage. We will continue this discussion on Mean Shift and Reducing Variations in the next slide.

If Cp equals one point three and Cpk equals one, the mean needs to be centered. Variations are still in control, which means post the centering of the mean, the Black Belt can move to the Control stage. If Cp equals one point three and Cpk equals zero point eight, mean has to be centered. If Cp equals zero point eight and Cpk equals zero point seven, the process is not capable because of variations. The project is a variation reduction project. It is important to note that these considerations should be thought of by the Black Belt only after he ascertains process stability and the normality status.
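The decision path described across these two slides can be sketched as a small function. The tolerance for treating Cp and Cpk as "close enough" is an assumption made here for illustration, not a standard figure; in practice the Black Belt judges each case after confirming stability and normality.

```python
def capability_decision(cp, cpk, tol=0.05):
    """Rough sketch of the Black Belt's decision path (assumes the
    process is already stable and normality has been checked)."""
    if cp < 1:
        return "not capable: run a variation reduction project"
    if cp - cpk <= tol:
        return "accurate and capable: move to Control"
    if cpk <= 1:
        return "center the mean, then move to Control"
    return "capable but off-center: judge whether centering is worthwhile"

print(capability_decision(1.3, 1.27))  # small gap -> move to Control
print(capability_decision(1.3, 1.0))   # center the mean, then move to Control
print(capability_decision(0.8, 0.7))   # variation reduction project
```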

Baseline data is the initial collection of data, which serves as a basis for comparison with the subsequently acquired data. It is the current snapshot of data using current process and current measurement system or tools. This data can also be used for calculating current sigma level or defect per million opportunities. In the next slide, we will summarize this lesson.

In this lesson, we understood sigma shift, mean shift or reducing variations, and baseline data.

Let us summarize the Measure phase. In this section, we have discussed pre-measure considerations and tools; types of data and measurement scales; central tendency and dispersion; and measurement system analysis, along with variables GAGE RR and attribute RR. We have also covered stability conditions; capability metrics (Cp, Cpk, Cpkm, Pp, Ppk); variations, variability, and capability; process conditions; and data distributions (normal, binomial, Poisson, exponential). Apart from this, we discussed the 1.5 sigma shift, mean shift or reducing variations, and baseline data.

In this slide we will discuss the summary of activities in the Measure phase leading to the Baseline of data, which is a pre-consideration in the Analyze phase. It includes understanding types of data, measurement system analysis, sampling, stability check, descriptive statistics, normality check, capability check, understanding the process on distribution, and location, shape, and spread determination. The next slide will provide us with a summary of measure tools.

The tools to use in the Measure phase, in chronological order, are mentioned here. These tools are provided to you with the toolkit. They include the CE Matrix; GAGE RR (variables, for continuous and discrete data) and attribute RR for attribute data; control charts; graphical summary in Minitab; the normality test with Anderson-Darling; capability analysis in Minitab; and reviewing current process conditions. This brings us to the end of this section. Please go through the quiz section in the following slides for a better understanding of the concepts that we have covered. In the next section, we will deal with the Analyze phase.
