What kind of data am I working with?
 This topic has 15 replies, 4 voices, and was last updated 13 years ago by Craig.


September 9, 2009 at 4:05 pm #52624
I am looking at what factors influence the use of a given piece of equipment on the shop floor. The intent is to encourage workers to maximize use of the equipment. Data exist as the number of times a given worker uses the tool during the day. The total number of events (i.e., the number of opportunities to use the tool) is not known. Resolution is limited to six distinct categories (0 – no use 6 – tool used up to 6 times a day).
So my question is, ‘What kind of data am I working with – binary (used the tool/did not use the tool) or count (number of times a given worker uses the tool per day) or ordinal (6 is better than 5 is better than 4, etc.)?’
My intent is to determine which logistic regression model best fits the data type so that I can quantify the effects of these relationships.
THANKS!!!

September 9, 2009 at 5:02 pm #185298
Ken Feldman (Participant, @Darth)
Your call. All will work. Are your factors continuous or discrete? Consider that before deciding whether log reg is appropriate.

September 9, 2009 at 7:11 pm #185303
Hi Doc,
Thanks so much for the response! I will be working primarily with categorical factors, with a limited number of covariates.
Is there a preference in which log reg you use (i.e., greater precision, etc.)?
THANKS!

September 9, 2009 at 7:18 pm #185304
Or would Chi Sq be a simpler approach, whereby I could run two-variable comparisons to determine significance using count data as the Response with two categorical variables? I would rather run the log reg, as it will give me significance and magnitude…I think…
Thanks for the help!

September 9, 2009 at 7:33 pm #185305
Ken Feldman (Participant, @Darth)
Well, you have a choice of binary (they either used the machine or not), categorical (doesn’t seem to apply), or ordinal. You have an ordinal scale, and if you want to predict the number of times they will use it rather than whether they will use it or not, I would tend towards the ordinal. Send me the data and I’ll try to look at it. [email protected]

September 9, 2009 at 7:35 pm #185306
Ken Feldman (Participant, @Darth)
I thought of Chi Square as well if both the X and Y are discrete. That will tell you whether there is a relationship, but you don’t get the prediction equation if that is important.
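
A minimal sketch of the chi-square test of independence mentioned here, using only the Python standard library. The table values and the factor names (shift, tool used) are hypothetical and are not the poster's data. The function returns only the statistic and degrees of freedom, which matches the point above: it indicates whether a relationship exists but gives no prediction equation.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = shift (day, night), columns = tool used (no, yes)
observed_table = [[30, 10],
                  [20, 20]]
stat, dof = chi_square_independence(observed_table)
print(f"chi-square = {stat:.3f}, dof = {dof}")
```

With 1 degree of freedom, the statistic would be compared against the 5% critical value of 3.841.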

September 9, 2009 at 7:49 pm #185308
Robert Butler (Participant, @rbutler)
Question: you said “Resolution is limited to six distinct categories (0 – no use 6 – tool used up to 6 times a day).” Does this mean that you will actually count the number of distinct times up to and including 5 but that for 6 or anything greater than 6 you will just record it as 6? Or is it a situation where 6 is the maximum and no one ever uses something 7 times?

September 9, 2009 at 8:00 pm #185309
The data is happenstance, and the counts within the data set range from 0–6, with “0” meaning an operator didn’t use the tooling at all for the day, and 1–6 being the number of times the tooling was used by the operator. Although currently no value exists beyond 6, it is expected that the counts should move steadily upward.
Is there a preferred method to deal with this upper category? I was thinking I would categorize it at >6 for now, and then once the numbers moved into the double digits, I could begin treating it as continuous. ??

September 9, 2009 at 8:10 pm #185311
Doc,
ok, that’s what I thought…a predictive quality would be preferred.

September 9, 2009 at 8:13 pm #185312
Ken Feldman (Participant, @Darth)
Kind of a shame to take counts and drive them to categories. I will defer to my friend Robert on this now that he has stopped in for a visit. He will likely quote you some great passage in his trailer full of books and a couple of citations to go with it. He is my idol for regression problems.

September 9, 2009 at 8:17 pm #185313
So keeping the count visible is desirable, got it. I will await further pearls of wisdom from Robert. Thanks Darth!

September 10, 2009 at 9:25 am #185334
Robert Butler (Participant, @rbutler)
Given that you really only have counts of 0–6 and, at least at the moment, no one has ever gone to 7 or more, so that the count of 6 is really six and not a catchall for 6 or greater, how you proceed will depend on the kind of software you have.
If your software allows you to do Poisson regression then this would be the method of choice. You will have to run the usual tests for overdispersion to make sure you aren’t underestimating the standard errors and overestimating the test statistics.
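
To make the Poisson regression idea concrete, here is a rough sketch under stated assumptions: a log-linear model with an intercept and a single 0/1 factor, fitted by Newton-Raphson using only the standard library, followed by the Pearson dispersion statistic as one common overdispersion check. The data, the factor `x` (say, 1 = operator received training), and the function names are all hypothetical. In practice a statistics package would do the fitting.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(mean) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def pearson_dispersion(x, y, b0, b1, n_params=2):
    """Pearson chi-square divided by residual degrees of freedom."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return pearson / (len(y) - n_params)


# Hypothetical data: x = 0/1 factor, y = tool uses per operator-day (0 to 6)
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 1, 2, 2, 3, 4, 3]
b0, b1 = fit_poisson(x, y)
dispersion = pearson_dispersion(x, y, b0, b1)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, rate ratio = {math.exp(b1):.2f}")
print(f"Pearson dispersion = {dispersion:.3f}")
```

A dispersion value well above 1 would suggest overdispersion, in which case the model's standard errors would be too small. The exponentiated coefficient `exp(b1)` reads as the ratio of expected daily uses between the two factor levels.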
If you don’t have this capability then the fallback would be straight linear regression. As noted in Allison’s book Logistic Regression, “For years, people analyzed count data by ordinary linear regression and, in most cases, that method was adequate to the task.” (per the prior posts of Darth – I didn’t want to disappoint and not provide at least one cite :) )
There are issues but I’m with Darth and I think these are far outweighed by the loss of information that would result if you chose to lump your data into categories and run things such as contingency tables.

September 10, 2009 at 10:49 am #185336
Darth/Robert,
You guys are awesome. Thanks so much for the help. Robert, one question (as if there is ever just one question): I would be using MTB 15 as my software. My log reg choices include binary, ordinal, and nominal. Which of these would fit into the Poisson Reg definition? Thanks!

September 10, 2009 at 12:35 pm #185338
Ken Feldman (Participant, @Darth)
I don’t believe any of them do. I also checked it out after Robert’s post. There appears to be some special case and approach for running the Poisson Reg. I did run across something about transforming Poisson data so possibly regular old regression will work. At this point, I would KISS and just use an ordinal log regression and see if it produces anything useful. You can worry about something more sophisticated later on. Good news is that nobody will have any idea what you are doing anyway so just BS them with a straight face and cite Robert and Darth, that will do it.

September 12, 2009 at 5:34 pm #185390
Gentlemen, Thanks again for your help. Have a great weekend.

September 13, 2009 at 9:53 am #185398
One approach is to capture the data in a binary format for each 30 minute or 1 hr interval during the day. (1 = equip used, 0 = equip not used).
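
The interval encoding described here could be sketched as follows. The usage log, the 8-hour shift length, and the function name are assumptions made for illustration only.

```python
def to_interval_flags(use_minutes, shift_length=480, interval=60):
    """Return one 0/1 flag per interval: 1 if the tool was used at least once."""
    n_intervals = shift_length // interval
    flags = [0] * n_intervals
    for minute in use_minutes:
        index = minute // interval
        if 0 <= index < n_intervals:  # ignore events outside the shift
            flags[index] = 1
    return flags


# Hypothetical log: minutes after shift start at which one operator used the tool
uses = [15, 50, 200, 410]
flags = to_interval_flags(uses)
print(flags)
```

Note that two uses inside the same interval collapse into a single 1, so a shorter interval (for example `interval=30`) keeps more of the original count information.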
There wouldn’t seem to be a large number of factors driving the use of the equipment. (Operator available?, materials available?, tool in working condition? Other tool selected?, etc.)
Logistic regression is OK if you want to confuse the heck out of everyone in your organization, except you! :)
This might be more of an OEE question.
Good luck