Big Data Analytics Vs Six Sigma


Viewing 9 posts - 1 through 9 (of 9 total)

    Can anyone throw some light on how the Six Sigma approach is different from Big Data analysis/analytics?


    Robert Butler

    I think it is more a matter of using six sigma methods when running analysis on big data.

    You still have to define the problem you want to address and you still have to define how you will go about addressing it.

    You still have to think about measuring. This may sound odd but just because you think you “have it all” with a big data set does not mean that you really do “have it all”. All of the problems associated with small data are still present with big data – missing entries, sampling bias, incorrect data entry, overlooked critical variables, etc…and you will need to address every one of these things and more before you move on to data analysis.
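    The data problems listed above can be screened for before any analysis starts. A minimal sketch, assuming made-up records and hypothetical field names ("pressure", "operator"), of two of those checks: counting missing entries and spotting a lopsided sample that hints at sampling bias:

    ```python
    from collections import Counter

    # Hypothetical process records; one row has a missing measurement.
    records = [
        {"pressure": 101.2, "operator": "A"},
        {"pressure": None,  "operator": "A"},   # missing entry
        {"pressure": 98.7,  "operator": "A"},
    ]

    # Missing entries: count rows where any field is None.
    missing = sum(1 for r in records if any(v is None for v in r.values()))

    # Sampling bias: check whether one level dominates a categorical field.
    ops = Counter(r["operator"] for r in records)
    dominant_share = max(ops.values()) / len(records)

    print(missing)          # 1 row with a missing entry
    print(dominant_share)   # 1.0 -> every row came from operator "A"
    ```

    A share of 1.0 means every record came from a single operator, so any "finding" about operators is built on a biased sample, no matter how many rows the big data set holds.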

    You still have to think about how you are going to analyze the data and the appropriateness of the methods of analysis you are using.

    You still have to think about the results of your analysis and what they will mean with respect to actual improvement – failure to do this will guarantee you will go wrong with great assurance.

    You still have to think about the kinds of controls you will put in place to guarantee that the improvements remain.

    A good book that provides insight into the pitfalls of big data analysis when you don’t take the time to do the things listed above is Weapons of Math Destruction. Check your local library to see if they have it or can get it through inter-library loan.

    In short it is not a case of Big Data vs Six Sigma rather it is a case of correctly analyzing Big Data using the methods of Six Sigma.



    I agree with @rbutler. Six Sigma is a systematic approach to process improvement, a methodology that relies on a set of tools (quality, statistical, project management). Big data, machine learning, etc. can complement Six Sigma in this work. Even if you find that Big Data tools help in the Analyze and Improve phases to confirm root causes and develop robust solutions, that will only be part of the work.



    I agree with Robert. After going through the recent literature and the universities offering big data analytics courses, the focus seems to be more on software like Hadoop and so on. I am afraid they will not really take care of what you describe about measuring, data accuracy, etc. I doubt that, without the basics of Six Sigma, it will be of much help in big data analysis. Thanks @rbutler and @ssobolev for your valuable inputs.


    I agree with the concepts of this thread; Robert gives a good answer. The main difference between DMAIC and data analytics in general is this: in DMAIC we are most typically solving a process-oriented problem, with a fundamental y=f(x) at play, and we are using analytics and modeling to find those Xs. This CAN be true in data analytics as well, but the context and scope can be much wider, and some of the tools we apply in analytics would not be as well suited for process problem-solving.

    Some tools, such as exploratory data analysis (EDA), correlation, and regression modeling, are common to both. But general analytics will go further into tools such as cluster analysis (special groupings of customers, for instance), association rules (customers who buy this might also like this…), Naive Bayes (classifying emails or social media content as spam or not, positive or negative), and so on.

    To the points above, however, both endeavors should follow a methodology. The point in analytics is to use data to gain INSIGHTS. Some writers focus on certain software. Don’t be misled. The software choice is really secondary.
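    The association-rule idea mentioned above ("customers who buy this might also like this") can be shown in a few lines. A minimal sketch with made-up baskets, computing only support and confidence for one candidate rule rather than a full Apriori pass:

    ```python
    # Hypothetical market baskets.
    baskets = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"bread", "milk"},
        {"butter", "milk"},
    ]

    # Candidate rule: bread -> butter.
    antecedent, consequent = {"bread"}, {"butter"}

    n = len(baskets)
    # Support: fraction of baskets containing the item set.
    support_a = sum(1 for b in baskets if antecedent <= b) / n
    support_ab = sum(1 for b in baskets if (antecedent | consequent) <= b) / n
    # Confidence: P(butter | bread).
    confidence = support_ab / support_a

    print(round(support_ab, 2))  # 0.5  -> bread+butter in 2 of 4 baskets
    print(round(confidence, 2))  # 0.67 -> 2 of 3 bread baskets also have butter
    ```

    Real analytics work would enumerate many candidate rules and filter on minimum support and confidence; the arithmetic per rule is exactly this.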


    Most Big Data analytics teams have now embraced an Agile approach to acquiring knowledge from their sprints. I am sure you have heard of analysis paralysis? Each agile sprint focuses on answering a single question from the data. That answer provides new and additional knowledge about the process or issue. Knowledge is acquired from the sprints, and the team tries to leverage it to benefit the business.
    Six Sigma is a methodology focused on making decisions based on skillfully acquired data about a problem. Big Data teams try to gather data to answer some plaguing question to which no one in the business appears to know the correct answer.
    The real difference is that Six Sigma uses statistically based small sample sizes to unlock the knowledge. Big Data uses relatively huge (almost population-sized) data and looks for statistically identified trends or patterns to unlock the knowledge. It sounds almost the same, except that Big Data analysis requires the horsepower of computers and computer arrays to crunch all the data relatively quickly in search of patterns. Some systems can gather and analyze data with almost real-time quickness. I have worked with automated systems that gather dozens of data points every second and make decisions from the data just as quickly. Traditional Six Sigma doesn’t necessarily have to solve a problem that quickly, and definitely not with that large a data set.


    One of the Six Sigma methods that has a good chance of helping here is Designed Experiments. The specific type I have in mind works with binary variable levels. These kinds of designed experiments are very useful for evaluating the statistical relevance of variables rather than drilling in toward influence and noise propagation effects. The other advantage of a designed experiment is that it creates search criteria for finding instances of the targeted test cases in a big data set. This would narrow the field of excessive bias in the data, expose gap areas where big data has holes in its coverage, and lead to a thoughtful subset of the big data that would be useful for machine learning.
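    The idea above can be sketched in a few lines: generate a two-level full-factorial design and use each run as a search criterion to pull matching records from a large data set, exposing coverage gaps along the way. The factor names and records are hypothetical:

    ```python
    from itertools import product

    # Two hypothetical factors, each at two levels.
    factors = {"temp": ("low", "high"), "speed": ("slow", "fast")}

    # Full-factorial design: every combination of factor levels.
    design = [dict(zip(factors, levels)) for levels in product(*factors.values())]

    # A stand-in for the big data set.
    big_data = [
        {"temp": "low", "speed": "slow", "yield": 0.91},
        {"temp": "high", "speed": "fast", "yield": 0.84},
        {"temp": "low", "speed": "slow", "yield": 0.93},
    ]

    # For each design run, collect the matching records; an empty list
    # exposes a gap where the big data set has no coverage of that run.
    subsets = {
        tuple(run.values()): [r for r in big_data
                              if all(r[f] == run[f] for f in factors)]
        for run in design
    }

    for run, rows in subsets.items():
        print(run, len(rows))
    ```

    Here two of the four design cells are empty, so any model trained on this data has never seen those factor combinations at all.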



    One thing to keep in mind is that “big data” is not about a massive number of data points for a particular variable. It’s about extrapolating inferences from essentially random data not related to any particular variable. Big data uses rules to exclude data that are probably irrelevant (noise) then uses algorithms to make statistical inferences. It can reveal insights but it can be grossly misunderstood and misused. I recommend that you read Cathy O’Neil’s book “Weapons of Math Destruction” or view her on C-SPAN Book TV before you give trust to big data.



    Preddy – it has been said here (best by @rbutler) that “Big Analytics” tends to assume that the data is good, whereas SS emphasizes that you need to validate the measurement system as one of the earliest tasks. Now, oftentimes the data is actually good, so you won’t necessarily go wrong just jumping in and analyzing, but in those instances where the data does have problems, or isn’t really measuring/recording what people think it is, you will end up with erroneous conclusions.

    SS also provides a process framework, whereas “Big Analytics” has nothing that is recognized and documented as a commonly understood methodology. There are many tools in the SS toolkit, but the overarching process steps are defined, and anyone who practices, or even merely observes, the activities can logically follow them. A practitioner of “Big Analytics” may do just about anything.

