Hello and welcome to the fifth lesson of the Lean Six Sigma Application in Information Technology course offered by Simplilearn. In this lesson, we will discuss Analyze tools used in IT. Let us explore the objectives of this lesson in the next screen.

After completing this lesson, you will be able to:
- Identify common tools used in the Analyze phase of a Lean Six Sigma project
- Describe the benefits and process of regression analysis and regression testing for software development
- Explain how to use hypothesis testing to analyze performance

Let’s start our discussion on the next screen with a review of the major activities that occur in the Analyze phase of a project.

The Analyze phase focuses on understanding the impact that various factors have on the process output and, ultimately, on identifying the critical root causes to correct in the project. Five key activities are completed within the Analyze phase of a project:
- Exploratory analysis: At the beginning of the Analyze phase, there are multiple inputs under study. Using graphical analysis techniques and tools such as FMEA, the inputs are screened down to 8-10 potentially critical ones.
- Statistical analysis: Once the inputs have been screened down, statistical analysis begins. The key tools here are hypothesis tests, which are used to understand and validate differences in performance across the inputs.
- Process modelling: The movement of the process output at different levels of the key inputs is identified and explored using regression analysis.
- Identification of critical inputs: After regression analysis, the 2-3 critical inputs to the process are identified and validated with tools including ANOVA and prioritization of root causes.
- Planning: Finally, the team plans what will occur in the Improve phase of the project, including a pilot of the solutions or a designed experiment to optimize the process.

There are many analysis and modelling tools common to Lean Six Sigma that can be used in an IT organization. This lesson will cover two of the most powerful tools: regression and hypothesis testing. We will begin our discussion of the first topic on the next screen.

The first topic of this lesson will focus on regression analysis for software development. In the next screen, we will begin our discussion on regression analysis.

Regression analysis is used to measure the impact of independent variables, or x’s, on a dependent variable y. The output of a regression study is a predictor equation that calculates the output at various levels of the independent variables. In addition to the predictor equation, a regression study also produces an r squared value, which measures the amount of variation explained by the model. If the number is low (less than 30%), it indicates that a critical independent variable is probably missing and you should revisit the model. In the next screen, we will discuss the different types of regression studies used in Lean Six Sigma projects.

Following are the types of regression analysis used in Lean Six Sigma projects:
- Simple regression: Models the linear relationship between a single continuous x and a single continuous y variable.
- Multiple regression analysis: Models the relationship between multiple x’s and a single continuous y variable.
- Logistic regression: Models the relationship between one or more continuous and/or attribute x’s and an attribute y.

In this course, we will focus on simple linear regression. In the next screen, we will define the process for developing and completing a simple linear regression analysis.

Listed below are the high-level steps to set up and run a regression analysis:
- Collect data: Collect data for both the independent and the dependent variable.
- Produce scatterplot: Graph the data in a scatterplot to evaluate whether the relationship is linear. If it is not, do not proceed with the linear regression analysis.
- Run regression study: Use statistical software to run the regression study.
- Interpret results: Evaluate the equation and the r squared value.

On the next screen, we will review the output of a simple linear regression study.
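As a rough sketch of the "run regression study" and "interpret results" steps, the least-squares line and r squared value can be computed by hand. The function name and the data below are invented for illustration and are not the data behind the lesson's graph. The scatterplot step is left to a plotting tool.

```python
from statistics import fmean


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For simple linear regression, r squared is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical data: daily requests (x) and daily revenue in dollars (y).
requests = [2, 4, 6, 8, 10]
revenue = [180, 225, 270, 310, 360]

intercept, slope, r_squared = simple_linear_regression(requests, revenue)
print(f"revenue = {intercept:.2f} + {slope:.2f} * requests")
print(f"r squared = {r_squared:.1%}")
```

In practice, statistical software also reports a p-value for the model, which this sketch does not compute.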

The graph given here is the output of the regression study. Looking at the fitted line plot, it is clear that the relationship between the number of requests and revenue is linear in nature. Directly under the title is the regression equation, which can be used to predict the revenue at any given level of requests. Using the equation is easy. If there are 10 requests in a day, the revenue we would expect is $134.90 + $22.32 x 10, which calculates to $358.10. The second data point of interest is the r squared percentage. For this model, the r squared is 80.5%, which indicates that the model accounts for 80.5% of the variation in revenue, driven by requests. The high r squared percentage indicates you have uncovered a large critical x in the process. Let us now move to a different spin on regression: regression testing.
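The arithmetic in this example can be checked with a few lines of code. The function name is hypothetical; the intercept and slope are the values quoted from the regression equation.

```python
def predicted_revenue(requests, intercept=134.9, slope=22.32):
    """Expected revenue from the fitted line: intercept + slope * requests."""
    return intercept + slope * requests


print(round(predicted_revenue(10), 2))  # 358.1
```

A prediction of this kind is most trustworthy within the range of request counts that was actually observed when the model was fitted.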

Regression testing for software development is conducted differently from classic statistical regression. However, while the method is different, the concept and purpose are the same. As covered in the previous screen, the purpose of statistical regression is to determine the impact of an independent variable on a dependent variable. In software regression testing, the purpose is to test the impact of changes, such as enhancements and defect fixes, on the software. Specifically, the purpose is to ensure the changes have no negative impact on the software. The enhancements can be thought of as the independent x’s and the software itself as the dependent y. The next screen will cover the benefits of regression testing.

The main benefit of regression testing is that it allows an organization to ensure the quality of software is maintained when changes are made to the environment. Regression testing can help developers evaluate the impact of possible changes prior to release, allowing more robust, data-driven decisions about which enhancements are ultimately released into the live environment. The bottom line is that strong regression testing allows the IT organization to proactively manage and report on the quality of what is being delivered. Next, we will discuss different methods of regression testing.

Regression testing may be completed at any level of testing, but it is generally most relevant during system testing. Ideally, full regression testing should be done for all changes. A full regression test is one in which all test cases are executed and the results are compared to the pre-change results to evaluate whether the change caused unforeseen negative consequences. This is not always possible due to time or resource constraints. In these cases, a variation of the benefit and effect analysis tool discussed in the Define lesson can be effective in identifying the parts of the software most likely to be impacted by the changes. The test cases specific to these high-priority items are then regression tested. In an environment where changes are frequent, it is desirable to explore automating the regression testing to ensure efficiency. Let us proceed to the next topic of this lesson in the next screen.
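The comparison at the heart of a regression test can be sketched as follows. This is a minimal illustration, not a real test framework: the function names, test case names, and area labels are all hypothetical. The first function compares pre-change and post-change results; the second selects only the test cases in areas flagged as likely to be impacted, for situations where a full regression run is not feasible.

```python
def find_regressions(baseline, current):
    """Return test cases that passed before the change but do not pass after it.

    A test case missing from `current` is treated as not passing.
    """
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )


def select_priority_tests(test_areas, impacted_areas):
    """Return the test cases whose area is flagged as likely impacted."""
    return sorted(
        name for name, area in test_areas.items() if area in impacted_areas
    )


# Hypothetical pass/fail results before and after a change.
baseline = {"login": True, "rate_quote": True, "export_pdf": True, "audit_log": False}
current = {"login": True, "rate_quote": False, "export_pdf": True, "audit_log": False}
print(find_regressions(baseline, current))  # ['rate_quote']

# Hypothetical mapping of test cases to the software area they cover.
test_areas = {
    "login": "auth",
    "rate_quote": "pricing",
    "export_pdf": "reports",
    "audit_log": "compliance",
}
print(select_priority_tests(test_areas, {"pricing", "reports"}))  # ['export_pdf', 'rate_quote']
```

Automating this comparison is what makes frequent regression runs practical.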

This topic covers the first case study of this lesson. The case study is on using simple linear regression to understand the impact of testing on cost. The next screen will provide the background for the case study.

A medium-sized mortgage broker company with 450 employees, operating in 3 states, serves both residential and commercial mortgage customers. The company has experienced cost overruns in its IT projects as a result of having to fix multiple defects during the warranty period of new technology solutions. These solutions were implemented to improve the efficiency of the business. A Lean Six Sigma Green Belt has been working on the problem with a team, and they have identified a potential link between test plan effectiveness and the cost to fix defects in the warranty period. Test plan effectiveness is defined as the number of defects discovered per 100 hours of test case execution. The team collected data points from 30 projects and decided to run a simple linear regression to identify the impact of test plan effectiveness on cost. Let us proceed to the next screen to see the data.

The graphic above shows 10 of the 30 data points used to run the regression study. The output from the regression study is on the next screen.

Please use the output to answer the questions in the following screens.

Is the regression analysis statistically valid? Yes, the p-value reported for the study is 0, which means the model is statistically valid.

Does the model data exhibit linearity? Yes, the scatterplot shows the data generally is linear in nature.

According to the regression equation, what cost overrun would you expect to see if the test plan effectiveness count is 45? The expected cost overrun would be $14.85.

Does the regression model explain a significant portion of the variation in the cost overrun? Yes, the r squared figure of 97.5% indicates almost all the variation in cost overrun is explained by test plan effectiveness. Let us proceed to the next topic of this lesson in the next screen.

This topic will focus on the concepts of hypothesis testing. The next screen will provide an overview of hypothesis testing.

Hypothesis testing is used across all scientific disciplines to test the validity of an idea using observed data. In Lean Six Sigma projects, the population being observed is usually an input or the output of a process. Every hypothesis test has both a null and an alternate hypothesis. The null hypothesis (H_0) is the status quo or current state, and it is presumed to be true unless the data provide statistically significant evidence against it. The alternate hypothesis (H_a) is the claim believed to be true that is being tested. We will discuss the rules for proving a hypothesis in the next screen.

Each hypothesis test has a significance level, also called the alpha level, used to decide the outcome of the test. A 5% significance level is standard for non-safety-related hypothesis tests. It means you accept a 5% chance of reaching a wrong conclusion by rejecting a null hypothesis that is actually true. Every hypothesis test calculates a probability value (p-value). When the p-value is less than alpha, reject the null hypothesis and accept the alternate hypothesis. The next screen provides a few statistical terms important to hypothesis testing.
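The meaning of the 5% significance level can be illustrated with a small simulation. This is a sketch under stated assumptions: it uses a one-sample z test with a known standard deviation (chosen only because it needs nothing beyond the Python standard library), and the mean of 100, standard deviation of 15, and sample size of 30 are arbitrary. Every sample is drawn so that the null hypothesis is true, so each rejection is a wrong conclusion.

```python
import random
from statistics import NormalDist, fmean


def one_sample_z_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, with known standard deviation."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_alarm_rate(alpha=0.05, trials=2000, n=30, seed=1):
    """Share of tests that reject H0 even though H0 is true by construction."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data are generated with mean 100, so H0 (mean == 100) is true.
        sample = [rng.gauss(100, 15) for _ in range(n)]
        if one_sample_z_p_value(sample, mu0=100, sigma=15) < alpha:
            rejections += 1
    return rejections / trials


print(false_alarm_rate())  # expected to land close to alpha
```

The printed rate should sit near 0.05, which is the risk the alpha level describes.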

An important statistical term used in hypothesis testing is power: the chance of detecting a difference between the null and alternate hypothesis when one exists. The higher the value the better; 1.0 is the maximum value. Results from a test with power below 0.8 should be used with caution. The factors that influence the power of a hypothesis test are:
- Delta (δ): Size of the difference to be tested. The larger the difference, the easier it is to detect.
- Sigma (σ): Measure of variation. The more variation in the data, the lower the power will be.
- Sample size: The larger the sample size, the higher the power will be.

Actions to be taken prior to conducting a hypothesis test are on the next screen.
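How delta, sigma, and sample size drive power can be sketched with a normal approximation. The choice of a two-sided, two-sample z test is an assumption made for illustration, as are the function name and the example numbers; statistical software typically bases power calculations on the t distribution instead.

```python
from statistics import NormalDist


def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z test.

    delta: true difference between the two means
    sigma: common standard deviation of the data
    n_per_group: observations in each of the two samples
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Difference expressed in standard errors of the difference in means.
    shift = abs(delta) / (sigma * (2 / n_per_group) ** 0.5)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenarios, varying one factor at a time.
print(f"baseline:       {two_sample_power(delta=2, sigma=4, n_per_group=30):.2f}")
print(f"larger delta:   {two_sample_power(delta=4, sigma=4, n_per_group=30):.2f}")
print(f"more variation: {two_sample_power(delta=2, sigma=8, n_per_group=30):.2f}")
print(f"larger sample:  {two_sample_power(delta=2, sigma=4, n_per_group=64):.2f}")
```

Relative to the baseline, the larger difference and the larger sample raise the power, while the extra variation lowers it.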

Below are a few items to consider prior to hypothesis testing:
- Examine the data over time: Use a control chart to ensure process stability. If special cause variation is present, explore the reasons prior to hypothesis testing.
- Determine the type of data to be used: The type of data being tested will determine the hypothesis test to run. This will be covered in more detail in future screens.
- Test for equal variances: Hypothesis testing can be done on data without equal variances; however, make sure to indicate this in the statistical software being used to conduct the testing. This is generally an option or check box.

In the next screen, we will build a recipe for hypothesis testing.

All hypothesis tests follow the same general recipe, which is listed below:
- State a practical problem. An example of a practical problem would be: I want to know if my average call handle time is better after the process improvements were implemented.
- Determine the null and alternate hypothesis.
- State the decision criteria by selecting a significance level.
- Run the appropriate test.
- Look at the p-value. If it is less than the significance level selected, you will reject the null hypothesis.
- Form the practical conclusion. In the example from earlier, if the p-value is less than the significance level, the practical conclusion is that the call handle time is better. If the p-value is greater than the significance level, the practical conclusion is that there is no statistically significant difference after the process improvement.
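The decision step of this recipe reduces to one comparison, sketched below. The function name is hypothetical, and the two p-values used as examples are the ones reported in the help desk case study later in this lesson.

```python
def statistical_conclusion(p_value, alpha=0.05):
    """Compare the p-value with the significance level and state the outcome."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(statistical_conclusion(0.573))  # fail to reject the null hypothesis
print(statistical_conclusion(0.01))   # reject the null hypothesis
```

The practical conclusion still has to be written by the team, in terms of the original practical problem.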

There are many options to choose from when selecting a hypothesis test, and one of the keys to successful hypothesis testing is selecting the correct test. The table lists the common hypothesis tests used with continuous data:
- One sample: 1-sample t or 1-sample z to test the mean; 1-sample sign or 1-sample Wilcoxon to test the median; 1 variance to test the variance.
- Two samples: 2-sample t, 2-sample z, or paired t to test the mean; Mann-Whitney to test the median; 2 variance and F ratio tests to test the variance.
- More than two samples: ANOVA to test the mean; Kruskal-Wallis and Mood's median to test the median.

The next screen covers hypothesis tests for discrete data.

When working with discrete data, there are fewer hypothesis tests to choose from. They are:
- One sample: 1 proportion test and Chi-square goodness of fit
- Two samples: 2 proportion test and Chi-squared test for association
- More than two samples: Chi-squared % defective

In the next screen, we will look at commonly used hypothesis tests in a help desk environment.
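The selection tables for continuous and discrete data can be captured in a simple lookup. The structure and names below are hypothetical; the test names are the ones listed in this lesson.

```python
# Candidate tests keyed by (data type, number-of-samples group).
TESTS = {
    ("continuous", "1"): [
        "1-sample t", "1-sample z", "1-sample sign", "1-sample Wilcoxon", "1 variance",
    ],
    ("continuous", "2"): [
        "2-sample t", "2-sample z", "paired t", "Mann-Whitney", "2 variance", "F ratio",
    ],
    ("continuous", "3+"): ["ANOVA", "Kruskal-Wallis", "Mood's median"],
    ("discrete", "1"): ["1 proportion", "Chi-square goodness of fit"],
    ("discrete", "2"): ["2 proportion", "Chi-squared test for association"],
    ("discrete", "3+"): ["Chi-squared % defective"],
}


def candidate_tests(data_type, n_samples):
    """Return the tests listed for this data type and number of samples."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    group = "1" if n_samples == 1 else "2" if n_samples == 2 else "3+"
    try:
        return TESTS[(data_type, group)]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None


print(candidate_tests("continuous", 4))
print(candidate_tests("discrete", 2))
```

A lookup like this only narrows the options; choosing between a test of the mean, the median, or the variance still depends on the practical question.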

Data coming out of a call center environment generally fall into three types: cycle time, defects or defects per unit, and % defective or error rate. Each of the three data types has specific tests that can be used, which are detailed in the table on the screen:
- Cycle time: t tests, variance tests, Mann-Whitney, Kruskal-Wallis, Mood's median, and ANOVA
- Defects or defects per unit: Poisson rate and Chi-squared tests
- % defective or error rate: proportion tests and Chi-squared tests

Let us proceed to the next topic of this lesson in the following screen.
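For % defective or error rate data, a 2 proportion test can be sketched with the standard pooled z statistic. The function name and the error counts below are invented for illustration; statistical software may use an exact method rather than this normal approximation.

```python
from statistics import NormalDist


def two_proportion_z_test(defects_a, n_a, defects_b, n_b):
    """Two-sided z test of H0: both groups have the same proportion defective.

    Returns (z, p_value). Assumes the pooled proportion is strictly
    between 0 and 1, otherwise the standard error would be zero.
    """
    p_a, p_b = defects_a / n_a, defects_b / n_b
    pooled = (defects_a + defects_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: tickets closed with an error before and after a change.
z, p_value = two_proportion_z_test(defects_a=30, n_a=200, defects_b=12, n_b=200)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these invented counts the p-value falls below 0.05, so the null hypothesis of equal error rates would be rejected.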

This topic covers the second case study of this lesson. We will now walk through a case study to demonstrate how hypothesis testing can be used to improve the break fix process. The case background can be found on the next screen.

A medium-sized mortgage broker company with 450 employees, operating in 3 states, serves both residential and commercial mortgage customers. The company has a dedicated internal help desk function that is responsible for answering employee questions and servicing laptops and peripherals when they malfunction. A recent customer survey was conducted to gauge satisfaction with the help desk services. The lowest rated item on the survey was the time to fix a non-functioning laptop. The current time to fix a non-functioning laptop is 21 hours versus a customer expectation of 4 hours. Preliminary analysis has indicated two possible areas of difference to be explored. The team has decided to conduct hypothesis tests to confirm the differences noted in the data. In the next screen, we will review the first scenario the company wishes to test.

The first hypothesis the team decided to test was whether there was a difference between individual employees in the time to close broken laptop tickets. The team set up the hypothesis test in the following manner:
- Practical question to test: Is there a difference in the time it takes employees to close broken laptop fix tickets?
- Null hypothesis: No difference between employees
- Alternative hypothesis: At least one employee has different performance in time to close laptop fix tickets
- Decision criteria: Significant at 0.05

Let us proceed to the next screen for the first question.

Considering the data type, which hypothesis test should the team run? ANOVA, as they have continuous (time) data and are looking to compare the mean performance for 4 employees. Let us proceed to the next screen to see the ANOVA results.
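To show what an ANOVA computes, the sketch below calculates the F statistic for four groups by hand. The close times are invented for illustration and are not the case study data. Turning the F statistic into a p-value requires the F distribution, which is left to statistical software here.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([value for g in groups for value in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((value - fmean(g)) ** 2 for g in groups for value in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical hours to close laptop tickets for four employees.
close_times = [
    [20.5, 22.0, 19.5, 23.0, 21.0],
    [21.5, 20.0, 22.5, 19.0, 23.5],
    [22.0, 21.0, 20.5, 23.5, 19.5],
    [20.0, 23.0, 21.5, 22.5, 20.5],
]
f_stat, df_between, df_within = one_way_anova_f(close_times)
print(f"F = {f_stat:.3f} with ({df_between}, {df_within}) degrees of freedom")
```

An F statistic near or below 1 means the differences between employee averages are small relative to the variation within each employee's tickets.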

The team ran the ANOVA test with the results as shown on the screen. Let us proceed to the next screen to answer a question about the ANOVA results.

What is the statistical conclusion of the analysis? Based on the p-value of 0.573, there is no statistically significant difference between employees. Therefore, we fail to reject the null hypothesis. The practical outcome is on the next screen.

The ANOVA found no statistically significant difference between employees in time to close laptop fix tickets; therefore, this idea will no longer be considered a key input requiring a solution. The second hypothesis that the team wanted to test begins in the next screen.

The second hypothesis test was to see whether customer availability impacts resolution time. The team set up the hypothesis test in the following manner:
- Practical question to test: Is there a difference in the time it takes employees to close broken laptop fix tickets when the customer is available at first contact to fix?
- Null hypothesis: No difference
- Alternative hypothesis: Time to close the ticket is shorter when the customer is available at first contact to fix
- Decision criteria: Significant at 0.05

Let us move on to the next screen to view a sample of the data collection.

The team collected data from 100 cases; a sample is shown on the screen. Please advance to the next screen to answer a case study question.

Considering the data type, which hypothesis test should the team run? A 2-sample t test is the correct answer because the team will be comparing the mean close time between 2 groups: tickets where the customer was available at first contact and tickets where the customer was not. Advance to the next screen to continue the case study.
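The calculation behind a 2-sample t test can be sketched as follows. Several assumptions apply: the data are invented and are not the case study data; the Welch form of the test is used, which does not assume equal variances; and the p-value comes from a normal approximation that is only reasonable for large samples, whereas statistical software uses the t distribution with the degrees of freedom computed below.

```python
from statistics import NormalDist, fmean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for mean(a) - mean(b)."""
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    t = (fmean(sample_a) - fmean(sample_b)) / (var_a + var_b) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


def approx_lower_tail_p(t):
    """Normal approximation to P(T <= t), for the alternative 'a is shorter'."""
    return NormalDist().cdf(t)


# Hypothetical hours to close a ticket.
available = [10, 12, 9, 14, 11, 13, 10, 12]
not_available = [22, 25, 19, 28, 24, 21, 26, 23]

t, df = welch_t(available, not_available)
p_value = approx_lower_tail_p(t)
print(f"t = {t:.2f}, df = {df:.1f}, approximate one-sided p-value = {p_value:.4f}")
```

The test is one-sided because the alternative hypothesis states a direction: close time is shorter when the customer is available.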

See above for the analysis and proceed to the next screen to answer the next question

What is the statistical conclusion of the analysis? Based on the p-value of 0.01, there is a statistical difference when the customer is available at first call. Therefore, we reject the null hypothesis. The practical outcome is in the next screen.

The hypothesis testing results indicate that it takes less time to close tickets when the customer is available at first call. Using the information validated by the hypothesis test, the team developed a new process in which the appointment to fix the broken laptop is made at the time of the first call, ensuring the customer is available when the technician arrives to fix the laptop.

Following is the quiz section to test your understanding of the topics covered in this lesson.

Regression analysis is used to identify the impact of variables on an output and provides an equation to predict process performance. Regression testing is a variation of regression analysis in which testing is done to measure the impact of enhancements and fixes on the current software. Hypothesis testing is used to statistically test assumptions made about a population using samples drawn from it. Hypothesis testing can be done on any type of data. Let us proceed to the next screen to conclude the lesson.

This concludes lesson five, Analyze. In the next lesson, we will discuss Improve tools for IT.
