Regression Analysis
 This topic has 1 reply, 2 voices, and was last updated 7 months, 3 weeks ago by Robert Butler.

AuthorPosts

February 5, 2022 at 5:38 pm #256218
John (Participant, @Sparky19722791)
Hi All, I am currently measuring the QA score improvement trend against a number of different improvement initiatives that are taking place. The population is a team of 80 agents, and I have measured the QA improvement trend for each agent along with a count of the improvement activities each agent received, such as 5 mentoring sessions, 10 x 1-to-1s, 8 x workshops, 2 x self-learning, etc. I now want to determine the impact of each activity. I ran a multiple regression. Can I use the coefficients to show the impact by activity? The p-values for all the independent variables are not statistically significant, but does this matter, given that I am looking at the full population of 80 agents? The coefficients will show me which activities have greater impact. Any thoughts or feedback would be appreciated.
February 6, 2022 at 11:02 am #256219
Robert Butler (Participant, @rbutler)
I’m not trying to be rude or offensive, but the short answer to your question is: start over.
1. You will need to provide more information about the QA score: what is it and how is it quantified?
a. Is this some kind of personal assessment by yourself (i.e. a 1-10 Likert score based on perception), or is it something that is measured?
b. If it is measured, how?
2. Your description of improvement activities is not clear.
a. When you say “10 1×1…” etc., what do you mean? Is this a score of 10 points for (I assume) each one-to-one encounter, or a maximum weighting of 10 for each (again I’m assuming) one-to-one encounter, or a simple count of whatever 1×1 is, or something else? The same holds for the rest of the items in that list.
3. Before running any regression you need to plot the data in any way that makes sense and see what you see. At a minimum you should build a plot of QA values against the values of each of the items you listed (include all of the agents’ data in each of these plots). If there is a trend, what does it look like? A simple straight line, some kind of curvature, a shotgun blast, clustering, banding, etc.? Just looking at plots of each individual X vs. all agents’ QA measures will tell you if you have a chance of detecting any kind of significant trend, and thus of identifying factors that might be connected with QA score changes. It will also highlight problems with your data that you will need to check and resolve before doing anything more. I can’t emphasize enough the importance of thorough graphing before running an analysis: if you don’t do this, you have an excellent chance of going wrong with great assurance.
4. When you run a multiple regression you need to take into account the independence of the various things you are treating as the X variables. If you don’t do this, your analysis will be of the garbage-in, gospel-out variety and will be of no value. You will need to run VIF (variance inflation factor) and condition index tests on the matrix of your X variables in order to identify those variables that exhibit sufficient independence and thus could be included in a multivariable regression analysis.
5. Once you have identified those X’s which are reasonably independent of one another, you would want to put them in a multivariable model and run both backward elimination and stepwise (forward selection with replacement) in order to generate a reduced model consisting of just those terms which are statistically significant (a cut point of P < .05 for term significance is one that is often used). Assuming the two methods converge to the same model, you would take that reduced model, run it against the data, and run a full residual analysis to see if the model provides an adequate fit to the data. Note that this is MUCH more than just looking at summary statistics such as the Anderson-Darling; check any good book on regression analysis to understand what is involved in real residual analysis. If the fit is adequate, then you will have identified terms that are significantly correlated with QA measures and you will be in a position to start thinking about activity impact.
6. The point of identifying terms that are statistically significant is to be able to find those variables whose changes are different from random noise. If, after all of the above, you have no statistically significant terms, then all you have is a model of noise, which means you really have no indication of term importance and no real assessment of variable changes and their possible impact on activity.
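For anyone wanting to try the plotting step from point 3, here is a minimal Python sketch that draws one scatter panel per activity, with the activity count on the x-axis and the QA change on the y-axis. It assumes NumPy and Matplotlib are available. The function name `plot_qa_vs_activities`, the activity names, and all of the data are made up for illustration; nothing here comes from the original poster's data set.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np


def plot_qa_vs_activities(activity_counts, qa_change):
    """Draw one scatter panel per activity: count per agent (x) vs QA change (y)."""
    names = list(activity_counts)
    fig, axes = plt.subplots(
        1, len(names), figsize=(4 * len(names), 3.5), sharey=True, squeeze=False
    )
    for ax, name in zip(axes[0], names):
        ax.scatter(activity_counts[name], qa_change, alpha=0.6)
        ax.set_xlabel(f"{name} (count per agent)")
        ax.set_title(name)
    axes[0][0].set_ylabel("QA score change")
    fig.tight_layout()
    return fig


# Hypothetical data for 80 agents (random numbers, illustration only).
rng = np.random.default_rng(0)
activity_counts = {
    "mentoring": rng.integers(0, 6, size=80),
    "one_to_one": rng.integers(0, 11, size=80),
    "workshops": rng.integers(0, 9, size=80),
    "self_learning": rng.integers(0, 3, size=80),
}
qa_change = rng.normal(0.0, 5.0, size=80)

fig = plot_qa_vs_activities(activity_counts, qa_change)
```

Calling `fig.savefig("qa_vs_activities.png")` would write the panels to disk. With purely random data like this, every panel should look like the "shotgun blast" described above, which is a useful reference picture to compare real data against.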
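The independence check in point 4 can be sketched with NumPy alone. The sketch below computes a variance inflation factor for each X column (by regressing it on the other columns plus an intercept) and condition indices from the singular values of the design matrix after scaling each column to unit length. The function names, the column names, and the data are assumptions for illustration; the third column is deliberately built to be almost a copy of the first so that the diagnostics have something to flag.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2) from regressing it on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        # 1 / (1 - R^2) simplifies to ss_tot / ss_res.
        vifs[j] = np.inf if ss_res == 0 else ss_tot / ss_res
    return vifs


def condition_indices(X):
    """Condition indices of the design matrix (intercept included),
    with each column scaled to unit length before the SVD."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    scaled = design / np.linalg.norm(design, axis=0)
    singular_values = np.linalg.svd(scaled, compute_uv=False)
    return singular_values.max() / singular_values


# Hypothetical counts for 80 agents (illustration only).
rng = np.random.default_rng(0)
mentoring = rng.integers(0, 6, size=80).astype(float)
one_to_one = rng.integers(0, 11, size=80).astype(float)
workshops = mentoring + rng.normal(0.0, 0.3, size=80)  # nearly collinear on purpose

X = np.column_stack([mentoring, one_to_one, workshops])
names = ["mentoring", "one_to_one", "workshops"]

vifs = variance_inflation_factors(X)
ci = condition_indices(X)
for name, vif in zip(names, vifs):
    print(f"{name}: VIF = {vif:.2f}")
print("condition indices:", np.round(ci, 1))
```

Commonly quoted rules of thumb treat a VIF above roughly 5 to 10, or a condition index above roughly 30, as a sign of trouble. Those cut-offs are not from the post above, so treat them as a starting point rather than a rule.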
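The variable selection step can be illustrated in the same spirit. The sketch below shows backward elimination only: fit the full model, drop the term with the largest p-value, and refit until every remaining term has P < .05. It does not implement the stepwise run, which would still be needed to check that both methods converge to the same model, and it does no residual analysis. It assumes NumPy and SciPy are available; the function names and the data are hypothetical, with only "mentoring" given a real effect on the simulated QA change.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Least-squares fit of y on X plus an intercept.
    Returns coefficients and two-sided p-values, intercept first."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stat = coef / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stat), dof)
    return coef, p_values


def backward_elimination(X, y, names, alpha=0.05):
    """Repeatedly drop the least significant term until all remaining p < alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        _, p_values = ols_p_values(X[:, keep], y)
        term_p = p_values[1:]  # the intercept is never a candidate for removal
        worst = int(np.argmax(term_p))
        if term_p[worst] < alpha:
            break
        del keep[worst]
    return [names[i] for i in keep]


# Hypothetical data for 80 agents (illustration only).
rng = np.random.default_rng(1)
names = ["mentoring", "one_to_one", "workshops", "self_learning"]
X = np.column_stack([
    rng.integers(0, 6, size=80),
    rng.integers(0, 11, size=80),
    rng.integers(0, 9, size=80),
    rng.integers(0, 3, size=80),
]).astype(float)
y = 2.0 * X[:, 0] + rng.normal(0.0, 2.0, size=80)  # only mentoring matters here

selected = backward_elimination(X, y, names)
print("terms kept:", selected)
```

If this procedure returns an empty list on real data, that is the "model of noise" situation described in the last point of the reply: no term can be distinguished from random variation, so the coefficients say nothing reliable about which activity matters more.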