iSixSigma

What kind of data am I working with?


Viewing 16 posts - 1 through 16 (of 16 total)
  • #52624

    newbie
    Participant

    I am looking at what factors influence the use of a given piece of equipment on the shop floor. The intent is to encourage workers to maximize use of the equipment. The data record the number of times a given worker uses the tool during the day. The total number of events (i.e., the number of opportunities to use the tool) is not known. Resolution is limited to six distinct categories (0 – no use  6 – tool used up to 6 times a day).
    So my question is: what kind of data am I working with? Binary (used the tool/did not use the tool), count (number of times a given worker uses the tool per day), or ordinal (6 is better than 5 is better than 4, etc.)?
    My intent is to determine which logistic regression model best fits the data type so that I can quantify the effects of these relationships.
    THANKS!!!

    #185298

    Darth
    Participant

    Your call. All will work. Are your factors continuous or discrete? Consider that before deciding whether log reg is appropriate.

    #185303

    newbie
    Participant

    Hi Doc,
    Thanks so much for the response! I will be working primarily with categorical factors, with a limited number of covariates.
    Is there a preference in which log reg you use (e.g., for greater precision)?
    THANKS!

    #185304

    newbie
    Participant

    Or would Chi Sq be a simpler approach, whereby I could run two-variable comparisons to determine significance, using count data as the response with two categorical variables? I would rather run the log reg, as it will give me both significance and magnitude... I think.
    Thanks for the help!

    #185305

    Darth
    Participant

    Well, you have a choice of binary (they either used the machine or not), categorical (doesn’t seem to apply), or ordinal. You have an ordinal scale, and if you want to predict the number of times they will use it rather than whether they will use it or not, I would tend towards the ordinal. Send me the data and I’ll try to look at it.

    #185306

    Darth
    Participant

    I thought of Chi Square as well if both the X and Y are discrete. That will tell you whether there is a relationship, but you don’t get the prediction equation, if that is important.
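For readers who want to see what the Chi Square check involves, here is a minimal sketch in plain Python. The table values and the shift/used-the-tool labels are made up for illustration and are not from the poster's data; the function name is also hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical data: rows = shift (day, night), columns = tool used that day (no, yes).
table = [[12, 28],
         [22, 18]]
stat, dof = chi_square_independence(table)
print(f"chi-square = {stat:.3f}, dof = {dof}")
```

With 1 degree of freedom the 5% critical value is 3.841, so a statistic above that suggests the two variables are related. As noted in the post, this tells you that a relationship exists but gives no prediction equation.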

    #185308

    Robert Butler
    Participant

    Question:  you said “Resolution is limited to six distinct categories (0 – no use  6 – tool used up to 6 times a day).”  Does this mean that you will actually count the number of distinct times up to and including 5 but that for 6 or anything greater than 6 you will just record it as 6?  Or is it a situation where 6 is the maximum and no one ever uses something 7 times? 

    #185309

    newbie
    Participant

    The data is happenstance, and the counts in the data set range from 0 to 6, with “0” meaning an operator didn’t use the tooling at all for the day and 1-6 being the number of times the operator used the tooling. Although no value currently exists beyond 6, the counts are expected to move steadily upward.
    Is there a preferred method to deal with this upper category? I was thinking I would categorize it as >6 for now, and then once the numbers moved into the double digits, I could begin treating it as continuous?

    #185311

    newbie
    Participant

    Doc,
    ok, that’s what I thought…a predictive quality would be preferred. 

    #185312

    Darth
    Participant

    Kind of a shame to take counts and drive them to categories. I will defer to my friend Robert on this now that he has stopped in for a visit. He will likely quote you some great passage in his trailer full of books and a couple of citations to go with it. He is my idol for regression problems.

    #185313

    newbie
    Participant

    So keeping the count visible is desirable, got it.  I will await further pearls of wisdom from Robert.  Thanks Darth!

    #185334

    Robert Butler
    Participant

    Given that you really only have counts of 0-6 and, at least at the moment, no one has ever gone to 7 or more (so the count of 6 is really six and not a catchall for 6 or greater), how you proceed will depend on the kind of software you have.
    If your software allows you to do Poisson regression, then this would be the method of choice. You will have to run the usual tests for overdispersion to make sure you aren’t underestimating the standard errors and overestimating the test statistics.
    If you don’t have this capability, then the fallback would be straight linear regression. As noted in Allison’s book Logistic Regression, “For years, people analyzed count data by ordinary linear regression and, in most cases, that method was adequate to the task.” (Per the prior posts of Darth, I didn’t want to disappoint and not provide at least one cite :-) )
    There are issues, but I’m with Darth: I think these are far outweighed by the loss of information that would result if you chose to lump your data into categories and run things such as contingency tables.
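To make the Poisson regression suggestion concrete, here is a rough sketch in plain Python, assuming a single predictor coded 0/1 (a hypothetical two-level factor such as shift). The data are invented, the function names are illustrative, and a real analysis would use a statistics package; this only shows the mechanics of the fit and one common overdispersion check (Pearson chi-square divided by residual degrees of freedom).

```python
import math

def fit_poisson(x, y, iters=50, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood."""
    # Start at the intercept-only fit so the first step is well behaved.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (n - 2)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - 2)

# Hypothetical data: x = 0/1 factor level, y = daily tool-use counts (0-6).
x = [0] * 8 + [1] * 8
y = [0, 1, 1, 2, 0, 1, 2, 1, 2, 3, 4, 2, 3, 5, 3, 2]
b0, b1 = fit_poisson(x, y)
dispersion = pearson_dispersion(x, y, b0, b1)
print(f"rate ratio = {math.exp(b1):.3f}, dispersion = {dispersion:.3f}")
```

Here exp(b1) is the ratio of expected daily counts between the two factor levels. A dispersion value well above 1 would point to overdispersion, in which case the Poisson standard errors would be too small.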

    #185336

    newbie
    Participant

    Darth/Robert,
    You guys are awesome. Thanks so much for the help. Robert, one question (as if there is ever just one question): I would be using MTB 15 as my software. My log reg choices include binary, ordinal, and nominal. Which of these would fit into the Poisson Reg definition? Thanks!

    #185338

    Darth
    Participant

    I don’t believe any of them do. I also checked it out after Robert’s post. There appears to be some special case and approach for running the Poisson Reg. I did run across something about transforming Poisson data so possibly regular old regression will work. At this point, I would KISS and just use an ordinal log regression and see if it produces anything useful. You can worry about something more sophisticated later on. Good news is that nobody will have any idea what you are doing anyway so just BS them with a straight face and cite Robert and Darth, that will do it.
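The post does not say which transformation was meant. One commonly used option for Poisson-like counts is the square root, which roughly stabilizes the variance so that ordinary regression is more reasonable. The sketch below assumes that choice; the data, the 0/1 factor, and the function name are all hypothetical.

```python
import math

def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: x = 0/1 factor level, y = daily tool-use counts (0-6).
x = [0] * 8 + [1] * 8
y = [0, 1, 1, 2, 0, 1, 2, 1, 2, 3, 4, 2, 3, 5, 3, 2]

# Assumed transformation: square root of the count.
root_y = [math.sqrt(v) for v in y]
intercept, slope = simple_ols(x, root_y)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Note that the coefficients are then on the square-root scale, so they are harder to explain to others than a rate ratio from a Poisson model.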

    #185390

    newbie
    Participant

    Gentlemen, Thanks again for your help. Have a great weekend.

    #185398

    Craig
    Participant

    One approach is to capture the data in a binary format for each 30-minute or 1-hour interval during the day (1 = equip used, 0 = equip not used).
    There wouldn’t seem to be a large number of factors driving the use of the equipment (operator available? materials available? tool in working condition? other tool selected? etc.).
    Logistic regression is OK if you want to confuse the heck out of everyone in your organization, except you! :-)
    This might be more of an OEE question.
    Good luck
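As a rough illustration of the interval idea, the sketch below turns a list of usage times into one 0/1 flag per interval. The 8-hour shift, the 1-hour interval, the usage times, and the function name are assumptions made up for the example.

```python
def interval_flags(use_minutes, shift_minutes=480, interval=60):
    """Return one 0/1 flag per interval: 1 if the tool was used at least once."""
    n_intervals = shift_minutes // interval
    flags = [0] * n_intervals
    for minute in use_minutes:
        # Ignore any usage time that falls outside the shift.
        if 0 <= minute < n_intervals * interval:
            flags[minute // interval] = 1
    return flags

# Hypothetical usage times, in minutes after the start of the shift.
uses = [15, 40, 200, 455]
flags = interval_flags(uses)
print(flags)
```

One side effect of this layout is that the number of opportunities per day becomes fixed (the number of intervals), which the original count data did not provide.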
