Data Mining and Six Sigma

    How is “data mining” being used in Six Sigma projects? In any DMAIC project, data is collected and analyzed in the Measure/Analyze phases. So is the “hype” just about using queries and data mining tools to pull that data from systems and applications, or is there something more to it that adds real value to the project?



    Statistics lie, and liars use statistics!!!
    Sometimes you need to MINE the DATA to tell a specific story.
    What point are you trying to prove with the DATA?
    The real value is in understanding your data and making it useful and presentable for the project. There is MORE to the HYPE than collecting the data.
    Is this a homework question?



    IMHO, the real value of DMAIC lies in the time and effort you spend in D.
    What are you trying to do, and why? Tie your project to business needs ($) and get the attention of upper mgmt.
    Keep that D in mind as you work through the M and identify WHAT DATA you need to collect. Once you have the RIGHT DATA, the analysis should tell you the story.
    Statistics CAN lie if used improperly, and that misuse can be intentional or unintentional. If you’re just trying to make a point, the temptation is there. But if you are working on a project that you want to succeed, why would you lie to yourself?
    I saw a statistic that red cars are in more accidents than purple cars. So now I only buy purple cars.
    I saw a statistic that most accidents occur within 10 miles of home. So I moved.
    M



    You must understand what data mining is before you can use it for anything. So is it just queries? What do the queries really mean? Why would you be performing a data mining project in the first place? You’ve got to read the book before you call it hype.



    It normally depends on what kind of project you are working on. For most projects, standard statistical techniques are very helpful and you do not need to apply data mining algorithms.
    Sometimes, when the number of X’s is very large and there are interactions among them, it is better to apply data mining techniques. If you are very interested in outliers and want to understand the patterns in how they arise, data mining techniques are more powerful than standard statistics.
    Data mining is a huge field; it is definitely not hype. You just need to get an appreciation for it.
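    The point about outlier patterns can be made concrete with a small drill-down. This is a minimal sketch under stated assumptions, not a full data mining algorithm: the shift/machine/cycle-time records are invented, outliers are flagged with Tukey's 1.5×IQR fences (a simple technique swapped in for illustration), and the hypothetical helpers `tukey_fences` and `outlier_share` report the fraction of outliers per factor level and per combination of levels.

```python
import statistics
from collections import defaultdict

# Hypothetical process data: (shift, machine, cycle_time_minutes).
# The values are invented for illustration, not real measurements.
records = [
    ("day", "A", 10.1), ("day", "A", 9.8), ("day", "A", 10.0),
    ("day", "B", 10.3), ("day", "B", 9.9), ("day", "B", 10.2),
    ("night", "A", 10.0), ("night", "A", 10.4), ("night", "A", 9.7),
    ("night", "B", 10.1), ("night", "B", 15.7), ("night", "B", 16.1),
]


def tukey_fences(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_share(records, key):
    """Fraction of records outside the fences within each group given by key."""
    low, high = tukey_fences([r[2] for r in records])
    totals, flagged = defaultdict(int), defaultdict(int)
    for r in records:
        group = key(r)
        totals[group] += 1
        if not low <= r[2] <= high:
            flagged[group] += 1
    return {g: flagged[g] / totals[g] for g in totals}


print(outlier_share(records, key=lambda r: r[0]))          # by shift
print(outlier_share(records, key=lambda r: r[1]))          # by machine
print(outlier_share(records, key=lambda r: (r[0], r[1])))  # by shift x machine
```

    With these made-up numbers, grouping by shift alone or by machine alone shows a one-in-three outlier share for night shift and for machine B, while grouping by both puts every outlier in the night/B cell (two of its three runs). Searching for that kind of combination-level pattern automatically, across many X's, is where tree-based and other data mining methods help.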



    Here is a link to a presentation on data mining used in the Improve phase to support a DOE. I’m a Minitab guy, but this presentation uses JMP’s recursive partitioning feature. I’ve seen a lot of LSS presentations in my day, and this is the only one I’ve seen that discusses data mining and Six Sigma. Hope this helps.
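    Recursive partitioning is a decision-tree method: it repeatedly splits the runs on whichever X best separates the response. The sketch below is not JMP’s implementation; it is a minimal pure-Python regression tree, assuming a hypothetical 2^3 full-factorial DOE with invented responses and hypothetical factor names (`temp`, `pressure`, `speed`). It shows how greedy splitting surfaces an interaction: the response jumps only when temp and pressure are both high.

```python
def sse(ys):
    """Sum of squared deviations from the mean; 0.0 for an empty list."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def best_split(X, y, min_leaf):
    """Try every feature and every midpoint between adjacent levels; return
    (total_child_sse, feature_index, threshold) for the lowest child SSE."""
    best = None
    for j in range(len(X[0])):
        levels = sorted({row[j] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    return best


def grow(X, y, depth=0, max_depth=2, min_leaf=2):
    """Recursively partition the runs; each leaf stores the mean response."""
    leaf = {"leaf": sum(y) / len(y)}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return leaf
    split = best_split(X, y, min_leaf)
    if split is None or split[0] >= sse(y) - 1e-12:
        return leaf  # no split reduces the error
    _, j, t = split
    go_left = [row[j] <= t for row in X]
    children = {}
    for side, flag in (("left", True), ("right", False)):
        sub_X = [row for row, g in zip(X, go_left) if g == flag]
        sub_y = [v for v, g in zip(y, go_left) if g == flag]
        children[side] = grow(sub_X, sub_y, depth + 1, max_depth, min_leaf)
    return {"feature": j, "threshold": t, **children}


def predict(node, row):
    """Walk from the root to a leaf and return its mean response."""
    while "leaf" not in node:
        side = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["leaf"]


def show(node, names, indent=""):
    """Print the tree as indented split rules."""
    if "leaf" in node:
        print(f"{indent}predict {node['leaf']:.2f}")
        return
    name, t = names[node["feature"]], node["threshold"]
    print(f"{indent}{name} <= {t}")
    show(node["left"], names, indent + "  ")
    print(f"{indent}{name} > {t}")
    show(node["right"], names, indent + "  ")


# Hypothetical 2^3 DOE in coded units (0 = low, 1 = high). Invented response:
# +5 only when temp AND pressure are high, plus a tiny speed effect.
names = ["temp", "pressure", "speed"]
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [10 + 5 * a * b + 0.1 * c for a, b, c in X]

tree = grow(X, y)
show(tree, names)
```

    With these invented responses, the tree splits on temp first and then on pressure only inside the high-temp branch, which is how an interaction shows up in a partition tree. (Temp and pressure tie at the root here, and the code keeps the first candidate it finds.) Real tools add stopping rules, pruning, and validation that this sketch leaves out.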

