iSixSigma

help required


Viewing 12 posts - 1 through 12 (of 12 total)
  • #36007

    indresh
    Participant

    Dear all,
    I am currently working on a project with three variables (time, load, and traffic) and their effect on the rise in temperature.
    Load is the number of hardware components installed, traffic is the number of people, and time is the time duration; the rise in temperature is mapped against these.
    What is the best method of analysis when the output is continuous but the input variables are both continuous and discrete?
    Rgds,
     

    #102604

    chhabra
    Participant

    Hi
    What exactly do you want to do in your analysis? I mean, what is the objective of your data collection?
     

    #102605

    indresh
    Participant

    To find a relation between the factors and how they affect the rise in temperature,
    the factors being load, time, and traffic.
    rgds,

    Load  Time   Mts  Rise (°C)  Temperature  Traffic
    6     14:30  35   5.00       28           198
    6     14:55  60   4.00       32           212
    6     15:30  90   1.00       33           235
    6     16:00  120  2.60       35.6         195
    6     16:30  150  1.00       36.6         206
    6     17:00  180  1.00       37.6         219
    12    15:00  30   11.70      32           206
    12    15:30  60   6.00       38           185
    12    16:00  90   4.00       42.1         166
    12    16:30  120  4.00       46.5         214
    12    17:00  150  2.00       48.4         229
    12    17:30  180  2.00       50.1         237
    10    13:00  60   11.00      34           540
    10    14:00  120  4.00       38           645
    10    14:30  150  2.00       40           597
    6     15:00  30   4.50       26           480
    6     15:30  60   4.00       29.6         456
    6     16:00  90   3.00       33           465
    6     16:30  120  5.00       38           427
    6     17:00  150  2.00       40           565
    6     17:30  180  2.00       42           611
    5     17:30  60   4.00       25           222
    5     18:00  90   2.20       27.2         278
    5     18:30  120  2.80       30           252
    5     19:00  150  2.60       32.6         275
    5     19:30  180  1.80       34.8         274
    15    13:20  60   5.00       31           374
    15    14:20  120  7.00       37.9         319
    15    15:20  180  7.00       45           370
    15    15:50  220  3.40       48.4         380
     

    #102608

    Ken Feldman
    Participant

    Looking for relationships between variables usually calls for some type of regression analysis.  If the Y is continuous and the Xs are both continuous and discrete, as you mentioned, then the discrete values need to be substituted with dummy or indicator variables.  Search the web or a stat book to learn the specifics of doing this.  It can easily be done in Minitab if you have access to that program.
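    A minimal sketch of the indicator (dummy) coding mentioned above, in plain Python. It assumes load is treated as a categorical factor; the function name `dummy_code` and the short list of loads are made up for illustration. One level is dropped as the baseline so the remaining 0/1 columns are not collinear with the regression constant.

```python
def dummy_code(values, baseline=None):
    """Return (kept_levels, rows) with one 0/1 column per non-baseline level."""
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # assumption: lowest level is the reference
    kept = [lv for lv in levels if lv != baseline]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows


# Illustrative loads only (the values 5, 6, 10, 12, 15 appear in the posted data).
loads = [6, 12, 10, 6, 5, 15]
kept, rows = dummy_code(loads)
print("indicator columns for load =", kept)
for load, row in zip(loads, rows):
    print(load, row)
```

    A row of all zeros marks the baseline level; each coefficient on an indicator column is then read as the shift relative to that baseline.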

    #102614

    Gollapudi
    Participant

    Indresh, it looks like your data is not summarized properly. Can you give me a brief description of your data collection procedure?

    #102617

    Tim F
    Member

    As Darth pointed out, it only takes a few minutes with Minitab. The fact that some of the data are discrete shouldn’t be a problem. Since you have five variables, you may as well fit all of them and see what happens. The only challenge is time, since it is coded as “16:30”, so these values were all converted to a numeric value, where 12:00 = 0.5, 18:00 = 0.75, etc.
    The results from Minitab are then:
    ----------------------------------------------
    Regression Analysis: Rise versus Load, Time, Mts, Temp, Trafic
    The regression equation is
    Rise = 2.92 + 0.467 Load + 2.04 Time - 0.0257 Mts - 0.066 Temp + 0.00278 Trafic

    Predictor  Coef      SE Coef   T      P
    Constant   2.924     7.316     0.40   0.693
    Load       0.4670    0.1676    2.79   0.010
    Time       2.041     9.403     0.22   0.830
    Mts        -0.02568  0.01334   -1.92  0.066
    Temp       -0.0665   0.1039    -0.64  0.528
    Trafic     0.002785  0.002902  0.96   0.347

    S = 1.87584   R-Sq = 56.7%   R-Sq(adj) = 47.7%

    Analysis of Variance
    Source          DF  SS       MS      F     P
    Regression      5   110.698  22.140  6.29  0.001
    Residual Error  24  84.450   3.519
    Total           29  195.148

    Source  DF  Seq SS
    Load    1   40.406
    Time    1   32.218
    Mts     1   33.109
    Temp    1   1.724
    Trafic  1   3.240

    Unusual Observations
    Obs  Load  Rise    Fit    SE Fit  Residual  St Resid
    7    12.0  11.700  7.481  0.855   4.219     2.53R
    13   10.0  11.000  6.404  0.838   4.596     2.74R
    27   15.0  5.000   8.504  1.100   -3.504    -2.31R
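    The clock-time conversion described in this post (12:00 = 0.5, 18:00 = 0.75) amounts to expressing the time as a fraction of a 24-hour day. A small Python sketch of that step, with `time_to_day_fraction` as a hypothetical helper name:

```python
def time_to_day_fraction(hhmm):
    """Convert an 'HH:MM' clock time to a fraction of a 24-hour day."""
    hours, minutes = hhmm.split(":")
    return (int(hours) * 60 + int(minutes)) / (24 * 60)


for clock in ["12:00", "16:30", "18:00"]:
    print(clock, time_to_day_fraction(clock))
```

    On this scale a 30-minute step is only about 0.02 units, which is worth keeping in mind when reading the size of the Time coefficient.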

    #102619

    Ken Feldman
    Participant

    Nice effort Tim.  What strikes me first is the fact that the R square adj. is only about 48%.  I hope the original poster has more Xs since he is only explaining about 1/2 of the variation in Y with the ones he has.  Did you try any analysis of residuals or multi-collinearity?  Did the p-values for the individual Xs show significance?
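    For readers following along, the R-Sq, R-Sq(adj), and F values under discussion can be reproduced from the sums of squares in Tim's Minitab output using the standard definitions. The sketch below only redoes that arithmetic; the variable names are illustrative, and the sample size of 30 is inferred from the total degrees of freedom (29).

```python
# Figures copied from the Analysis of Variance table in the Minitab output.
ss_regression = 110.698
ss_error = 84.450
ss_total = 195.148
n_obs = 30          # total DF (29) + 1
n_predictors = 5    # Load, Time, Mts, Temp, Trafic

df_error = n_obs - n_predictors - 1   # 24
df_total = n_obs - 1                  # 29

r_sq = ss_regression / ss_total
# Adjusted R-squared penalises for the number of predictors.
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
f_stat = (ss_regression / n_predictors) / (ss_error / df_error)

print(f"R-Sq      = {r_sq:.1%}")
print(f"R-Sq(adj) = {r_sq_adj:.1%}")
print(f"F         = {f_stat:.2f}")
```

    The gap between the two R-Sq figures comes from spending five degrees of freedom on predictors with only 30 observations.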

    #102620

    V. Laxmanan
    Member

    Dear Indresh:
    I just saw the data that you have posted.  Imagine the following two experiments.
    Experiment 1: Something is falling. 
    We can do experiments and find the rate at which it is falling. 
    Experiment 2: Something is alternately rising and falling.
    Again we can do experiments and find the rate at which it is rising and falling.
    Now, you want to understand what is going on. 
    Both these experiments that I have mentioned were done and changed our understanding of the universe.  The first experiment was done in the last part of the 16th century.  The second experiment was done in the first part of the 20th century.
    Also, interestingly, just as in your problem, the author of the second experiment suspected that there are three types of forces acting on that “something” that is alternately rising and falling.  He thought about what his experiments were telling him deeply. 
    Then he developed a simple mathematical model to explain what he was observing.  He then tested the model thoroughly and did hundreds of experiments. He kept such good records (his lab notebooks are written so neatly it is unbelievable) that even today we can go back and analyze the data and see if we agree with his conclusions.
    Interestingly, he never used any statistical arguments. After more than 8 years of experimentation he established something so important that he was awarded the Nobel Prize!
    We can all learn from what this first American-born and American-educated scientist did. And, what is more interesting is that he started his Nobel Prize winning experiments when he was well into what we call middle age.
    So, age is no barrier to the desire to learn and enquire deeply into our observations. We could all benefit from a deeper understanding of what this great scientist did.
    Now, let me give you his name.  His name is Robert A. Millikan and he is one of the few scientists who has written his autobiography.  I have learned a lot from what he did, and how he used pure mathematical logic, to discover two fundamental constants of nature. This was done, let me repeat, without the help of any of the statistical tools that we use today.
    Surprisingly, he does not even use least square regression, although he finds linear relations in his experiments!
    Your data reminded me of Millikan’s experiments.  I don’t know what we can learn, but I just wanted to share this perspective.  Good luck with your project. Regards.
    Laxman

    #102624

    Robert Butler
    Participant

    Your data looks like repeat measure data.  You appear to have set a load condition and then you let the process run (?) for a period of time which seems to be 30 minutes but can be longer. At the end of that initial time you take a measure of the temperature and the traffic. You then repeat this process at 30 minute intervals under the same load condition. The number of 30 minute intervals per load condition varies. 
      If you are interested in building a predictive equation connecting load, traffic, and time to either the absolute temperature or the delta temperature and if the data is in fact repeat measure data, you will have to find a package capable of properly handling repeat measure data.  Regular linear regression packages will treat the repeat measures as independent measures and the resultant model will be of little value.
      Providing answers to the following questions may allow others on this forum to offer additional advice.
    1. Are the assumptions I listed above valid?
    2. Is there a reason for the varying amount of minutes before first measurement?
    3. Is there a reason for the varying number of 30 minute intervals?
    4. Do you have any control over the variable “Traffic”?
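    One way to make the repeat-measure reading of the posted data concrete is to group the rows into runs. The Python sketch below uses the Load and Mts columns from the original table and assumes, purely for illustration, that a new run starts whenever the load changes or the elapsed minutes stop increasing; the original poster would need to confirm that rule. `split_into_runs` is a hypothetical helper name.

```python
# (Load, Mts) pairs copied in order from the posted data table.
rows = [
    (6, 35), (6, 60), (6, 90), (6, 120), (6, 150), (6, 180),
    (12, 30), (12, 60), (12, 90), (12, 120), (12, 150), (12, 180),
    (10, 60), (10, 120), (10, 150),
    (6, 30), (6, 60), (6, 90), (6, 120), (6, 150), (6, 180),
    (5, 60), (5, 90), (5, 120), (5, 150), (5, 180),
    (15, 60), (15, 120), (15, 180), (15, 220),
]


def split_into_runs(rows):
    """Group consecutive rows into runs (assumed rule: load change or minutes reset)."""
    runs = []
    for load, mts in rows:
        if runs:
            prev_load, prev_mts = runs[-1][-1]
        if not runs or load != prev_load or mts <= prev_mts:
            runs.append([])  # start a new run
        runs[-1].append((load, mts))
    return runs


runs = split_into_runs(rows)
for run_id, run in enumerate(runs, start=1):
    print(f"run {run_id}: load={run[0][0]}, minutes={[m for _, m in run]}")
```

    Under that assumed rule the 30 rows come from only a handful of runs, so the observations within a run are not independent. A repeated-measures analysis would carry the run number as a grouping factor.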
     

    #102659

    indresh
    Participant

    Dear all,
    Thanks for replying.
    I will answer all the questions that were asked.
    TIM, LAXMAN, DARTH, ROBERT
    The regression equation is Rise = 2.92 + 0.467 Load + 2.04 Time - 0.0257 Mts - 0.066 Temp + 0.00278 Trafic
    I tried that as well, but it shows no significance.
    Here is the data collection procedure:
    - This is data from individual GSM cell sites. Load is the number of sectors that the site supports, and traffic is the number of calls reported during that time period.
    - The aim is to schedule the battery ahead of the diesel generator (DG), so that when the mains power fails the site first goes on battery, thereby reducing the run time of the DG. When the site is on battery the air conditioner (AC) does not work, so the temperature rises. The temperature rise depends on the current the equipment draws from the battery, which in turn depends on traffic (number of calls). At every site the technician was asked to switch off the mains supply, putting the site on battery, and to note the temperature rise at intervals (at some sites at 30, 60, 90, 120, 150, and 180 minutes, at others hourly; statistically I don't think that should create much bias in the analysis). Battery life and equipment life are unaffected up to an average of 40 degrees C over the entire year; beyond that they are reduced to half.
    - We are looking for an optimised schedule for how long the battery should run before the DG is switched on to run the AC again and bring the temperature down, without affecting the efficiency of the site.
    On analysing the data, I found that the temperature rise is high in the first hour, but after the first hour it follows a normal pattern of rise. This can be attributed to the inside of the shelter (where the equipment is placed) catching up with the outside temperature; since the shelter is insulated, it takes almost an hour to reach the outside temperature.
    - If we remove the temperature rise for the first hour and then do the regression analysis, will that be of use? I still need to do that in Minitab, which I shall do today, and post it here.
    I WOULD REQUEST A LAYMAN'S EXPLANATION ALONG WITH THE ANALYSIS, FOR BETTER UNDERSTANDING.
    Thanks again for all efforts.
    Rgds,

    #102680

    Robert Butler
    Participant

      Based on your latest post it would appear that the issue is not temperature rise. Instead, your critical measure would seem to be when and how your system approaches the region of 40C.  If you take the measured temperatures, difference them from 40, use this as your Y variable, and plot your data, you will find a very nice linear relation between this difference and minutes, as well as what looks to be a log relation between load and the difference.  A plot of traffic against this difference doesn’t reveal any trend.  If you take the variables of traffic, load, and minutes, normalize each of them to a -1 to 1 range, and then run a simple linear regression against the difference, you will get a model with significant minutes and load effects and no significant traffic effect.
      I don’t like focusing on R² as a lone assessment of a regression equation, for the simple reason that R² is only one measure of one aspect of a regression and it is easily manipulated. However, since you seem to want the value, the R² for this simple model is .76 and the adjusted R² is .73.  If you build a simple linear model with only load and minutes and run a proper regression analysis, you will find the residual patterns to be acceptable.  The various statistics for residual normality are also within limits.  The residual pattern suggests you may get a better fit if you run the regression against normalized minutes and the log of the load. The mean square error for the model is 3.5.
      If you build the equation in this fashion and confirm it with additional observations, you will have an equation relating your system’s temperature difference from 40C to minutes on battery and load. Your data would also suggest that, for the ranges of traffic recorded, traffic isn’t the main issue.  If the equation is confirmed, you will have an equation that will permit you to make decisions concerning when to turn on the DG.
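      The two data-preparation steps described above (differencing the temperature from 40 and normalizing a predictor to a -1 to 1 range) can be sketched in Python as follows. This is only an illustration under stated assumptions: the sign convention 40 minus temperature is a guess, since the post does not say which way the difference is taken, `scale_to_unit_range` is a hypothetical helper name, and the six values are the first six rows of the posted table. The regression fit itself is not reproduced here.

```python
def scale_to_unit_range(values):
    """Linearly rescale values so the minimum maps to -1 and the maximum to +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    mid = (hi + lo) / 2
    half_range = (hi - lo) / 2
    return [(v - mid) / half_range for v in values]


# First six rows of the posted data (load = 6).
temps = [28, 32, 33, 35.6, 36.6, 37.6]
minutes = [35, 60, 90, 120, 150, 180]

# Assumed sign convention: positive means the site is still below 40 C.
diff_from_40 = [40 - t for t in temps]
scaled_minutes = scale_to_unit_range(minutes)

for m, s, d in zip(minutes, scaled_minutes, diff_from_40):
    print(f"minutes={m:>3}  scaled={s:+.3f}  40 - temp={d:.1f}")
```

      Putting every predictor on the same -1 to 1 scale makes the fitted coefficients directly comparable in size, which is one reason to normalize before judging which effects matter.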

    #105964

    indresh
    Participant

    No.  Site name  Configuration  Initial temp. inside shelter  Rise in temp.  Initial temp. outside shelter  Time elapsed (min)  Time   Traffic
    1    Hinjewadi  18             30                                           24                             0                   12:15  734
                    18             31.5                          1.5            34                             30                  12:45  799
                    18             35.3                          3.8            31                             60                  13:15  845
                    18             39.3                          3.9            29.4                           90                  13:45  836
                    18             42.9                          3.7            30                             120                 14:15  760
                    18             45.8                          2.9            33.3                           150                 14:45  671
                    18             48.8                          3.0            35                             180                 15:15  653
                    18             30.0                                         25                             0                   16:15  693
                    18             37.6                          7.6            29                             30                  16:45  705
                    18             41.7                          4.1            28                             60                  17:15  774
                    18             45.6                          3.9            28                             90                  17:45  941
                    18             48.9                          3.3            27                             120                 18:15  960
                    18             52.7                          3.7            27                             150                 18:45  892
                    18             54.9                          2.2            26                             180                 19:15  925
                    18             25.0                                         25                             0                   20:15  957
                    18             39.5                          14.5           25                             30                  20:45  957
                    18             44.0                          4.6            25                             60                  21:15  958
                    18             46.3                          2.3            24                             90                  21:45  898
                    18             50.4                          4.1            24                             120                 22:15  853
                    18             52.2                          1.8            24                             150                 22:45  824
    2    Khadki     6              21.3                                         32                             0                   11:00  359
                    6              29.2                          7.9            32.5                           30                  11:30  382
                    6              34.1                          4.9            34.5                           60                  12:00
