Python: Ventrial - Analysis of Variance (One-way)

Ventrial

Python File and Dataset
Python IDE PyCharm - pandas, numpy, matplotlib.pyplot, statsmodels.api.stats.anova_lm, patsy.dmatrices, statsmodels.formula.api

The Study
In a clinical trial to investigate the effects of different ventilation treatments on patients undergoing cardiac bypass surgery, twenty-two patients were randomized to receive one of three treatments as follows:

Treatment I - Patients received a 50% nitrous oxide and 50% oxygen mixture continuously for 24 hours
Treatment II - Patients received a 50% nitrous oxide and 50% oxygen mixture only during the operation
Treatment III - Patients received no nitrous oxide but received 35-50% oxygen for 24 hours

Among the information recorded about the patients was the level of folate in their red blood cells 24 hours after the start of ventilation. It was suspected that different types of ventilation, and in particular different amounts of nitrous oxide, might have an effect on this.

The data was stored in a csv datafile 'ventrial.csv', with a variate called 'folate' containing the red blood cell folate level (in ug/l), and the factor 'ventil' with three levels (I, II and III) indicating the treatment group. I investigate this dataset with a view to seeing if there are any differences in the treatment groups and which treatment performs the best.

Python File
In line 8 I load the datafile 'ventrial.csv' into a pandas dataframe object called 'df1'. Then, to get a feel for the data, in lines 11 to 13 I produce a boxplot of the data grouped by treatment I, II and III, and use the .describe method to calculate and print the summary statistics for the variable 'folate'.
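The loading and summary step described above can be sketched roughly as follows. This is a minimal sketch, not the original file: the rows in csv_text are made-up placeholder values in the layout described for 'ventrial.csv', and the helper boxplot_by_group is a hypothetical name introduced only for illustration.

```python
import io

import pandas as pd

# Made-up placeholder rows in the described layout; NOT the trial data.
# With the real file this would be: df1 = pd.read_csv('ventrial.csv')
csv_text = """folate,ventil
300,I
320,I
250,II
260,II
270,III
290,III
"""
df1 = pd.read_csv(io.StringIO(csv_text))

# Summary statistics for 'folate' within each treatment group.
summary = df1.groupby('ventil')['folate'].describe()
print(summary)


def boxplot_by_group(df):
    """Boxplot of folate grouped by treatment (hypothetical helper, needs matplotlib)."""
    return df.boxplot(column='folate', by='ventil')
```

Grouping before calling .describe gives one row of summary statistics per treatment, which is convenient for comparing the groups side by side with the boxplot.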

In lines 16 to 18 I fit the ordinary least squares regression model, then run the analysis of variance and print the summary table. In lines 19 and 20 I save the residuals and the fitted values of the model for later reference.

In lines 24 to 39 I produce a composite residuals plot of the deviance residuals as a test for an appropriate fit.
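One way to build such a composite plot is sketched below. The three-panel layout (histogram, residuals against fitted values, normal plot) is an assumption based on the panels discussed in the analysis, the function name residual_panel is hypothetical, and the demo residuals are randomly generated rather than taken from the fitted model.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def residual_panel(resid, fitted):
    """Draw a histogram, a residuals-vs-fitted plot and a normal plot side by side."""
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))

    axes[0].hist(resid, bins='auto')
    axes[0].set_title('Histogram of residuals')

    axes[1].scatter(fitted, resid)
    axes[1].axhline(0, linestyle='--')
    axes[1].set_title('Residuals vs fitted values')

    # probplot draws the ordered residuals against normal quantiles.
    stats.probplot(resid, dist='norm', plot=axes[2])
    axes[2].set_title('Normal plot')

    fig.tight_layout()
    return fig


# Randomly generated stand-ins for the model residuals and fitted values.
rng = np.random.default_rng(0)
demo_resid = rng.normal(size=22)
demo_fitted = np.repeat([320.0, 260.0, 290.0], [8, 9, 5])
fig = residual_panel(demo_resid, demo_fitted)
```

Calling plt.show() afterwards displays the figure; with a fitted statsmodels model the arguments would be its resid and fittedvalues attributes.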

In lines 42 to 44 I run a further analysis based on the treatment contrast O1 = ventil I - ventil II and print the summary table. In lines 46 to 48 I run a further analysis based on the treatment contrast O2 = ventil I - ventil III and print the summary table.

Analysis
This study is a controlled experiment because each person can be viewed as a single experiment taking one randomly selected treatment, so in theory this study could be repeated many times with different people. The treatment groups have been defined by the researcher, hence the experiment is controlled.

From the boxplots (fig 1) I can see that ventil treatment group I has the largest range and the highest median, 319. Its lower quartile contains the median of group III but not of group II. The boxplot for group I is also roughly symmetrical, which is consistent with the data being normally distributed.

fig 1

The ranges of groups II and III are very similar. The median of group II, 254, is the lowest, but it lies within the lower quartile of group III. The data for group II also appear to be normally distributed.

Lastly, the median of group III is 270, which lies between the medians of groups I and II. The data for this group show a slight negative skew, but with only 5 observations in group III this is well within the sampling variation of a normal distribution.

Independence of the results is assumed from the design of the experiment. The assumption of equal variance across all groups is judged by the 'rule of 4': the largest group variance is within 4 times the smallest, so this assumption holds as well.
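The 'rule of 4' check amounts to comparing the largest and smallest sample variances. A minimal sketch, using made-up values rather than the trial data and a hypothetical helper named variance_ratio:

```python
from statistics import variance


def variance_ratio(groups):
    """Return (largest variance / smallest variance, per-group sample variances)."""
    variances = {name: variance(values) for name, values in groups.items()}
    ratio = max(variances.values()) / min(variances.values())
    return ratio, variances


# Made-up illustrative values; NOT the trial data.
demo_groups = {
    'I': [300, 320, 340],
    'II': [250, 265, 280],
    'III': [270, 290, 310],
}

ratio, variances = variance_ratio(demo_groups)
print(variances)
print('equal-variance assumption looks reasonable:', ratio <= 4)
```

With a pandas dataframe the same variances could be obtained from a groupby on 'ventil' followed by .var() on 'folate'.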

Finally, the assumption that the data for each group are normally distributed is supported by the boxplots, which all appear roughly symmetrical about their group medians, although group III is slightly skewed. I can conclude it is reasonable to proceed with an analysis of variance.

From the model summary table the test statistic is 3.711. Compared with the F(2,19) distribution this gives a p value of p = 0.044, which is moderate evidence against the null hypothesis of equal means for all groups, i.e. mean(I)=mean(II)=mean(III). I can conclude from this that at least one of the treatment group means for folate level is not equal to the others.
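As a check on the quoted p value, the upper tail of an F distribution with 2 numerator degrees of freedom has a simple closed form, (1 + 2F/d)^(-d/2), where d is the denominator degrees of freedom. The sketch below uses it with the statistic from the summary table; the function name is hypothetical, and the shortcut holds only when the numerator degrees of freedom equal 2 (scipy.stats.f.sf covers the general case).

```python
def f_upper_tail_numerator_df_2(f_value, denom_df):
    """Upper-tail probability of F(2, denom_df); valid only for 2 numerator df."""
    return (1.0 + 2.0 * f_value / denom_df) ** (-denom_df / 2.0)


# Test statistic and degrees of freedom quoted in the text.
p_value = f_upper_tail_numerator_df_2(3.711, 19)
print(round(p_value, 3))  # 0.044
```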

From the composite residual plots (fig 2) of the fitted model above, the histogram looks slightly odd, but given the small number of observations (only 22) this is well within expected sampling variation. The residuals also appear to tail off towards the extreme values of the data, so it is not unreasonable to assume the residuals are normally distributed.

fig 2

The fitted values plot shows that the residual variance for group I appears to be higher than for the other groups, but altogether there is no apparent pattern to the residuals.

The normal plot shows that the two most extreme values, which could be considered potential outliers, give a slight curve to the extreme edges of the residuals. Again, due to the small number of observations this is not a cause for concern, although it might be beneficial to examine the analysis of variance results without these two points.

The composite residuals plots do not give any substantial evidence that the residuals are not normally distributed, so I can conclude that the one-way ANOVA model appears to be adequate.

I then proceed to test the null hypothesis that O1 = mean(I) - mean(II) = 0, that is, that the mean of group I minus the mean of group II is equal to 0, using the contrast group parameters [[1],[-1],[0]]. This gives the test statistic 7.62, which compared against the F(1,19) distribution gives a p value of p = 0.012. This is strong evidence against the null hypothesis that the means of groups I and II are the same.

I then proceed to test the null hypothesis that O2 = mean(I) - mean(III) = 0, that is, that the mean of group I minus the mean of group III is equal to 0, using the contrast group parameters [[1],[0],[-1]]. This gives the test statistic 2.82, which compared against the F(1,19) distribution gives a p value of p = 0.108. This is little to no evidence against the null hypothesis that the means of groups I and III are the same.
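Contrast tests like the two above could be run along the following lines. This is a sketch under assumptions, not the original file: the model is refitted without an intercept ('folate ~ C(ventil) - 1') so that the three coefficients are the group means and the weights apply to them directly, the weights are passed as a row rather than the column form quoted above, the function name contrast_f_test is hypothetical, and the demo values are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def contrast_f_test(df, weights):
    """F test of a contrast of the three group means (I, II, III)."""
    # Dropping the intercept makes each coefficient a group mean.
    model = smf.ols('folate ~ C(ventil) - 1', data=df).fit()
    result = model.f_test(np.array([weights]))
    # squeeze copes with scalar or array results across statsmodels versions.
    f_value = float(np.squeeze(result.fvalue))
    p_value = float(np.squeeze(result.pvalue))
    return f_value, p_value


# Made-up illustrative values; NOT the trial data.
demo = pd.DataFrame({
    'folate': [300, 320, 340, 250, 260, 270, 270, 290, 310],
    'ventil': ['I'] * 3 + ['II'] * 3 + ['III'] * 3,
})

f12, p12 = contrast_f_test(demo, [1, -1, 0])  # mean(I) - mean(II)
f13, p13 = contrast_f_test(demo, [1, 0, -1])  # mean(I) - mean(III)
print(f12, p12)
print(f13, p13)
```

Each contrast has one numerator degree of freedom and shares the residual degrees of freedom of the full model, which is why the statistics in the text are compared against F(1,19).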

In light of the above tests of group contrasts, there is evidence to suggest that treatment groups I and II have different means, that is, that giving the 50% nitrous oxide and 50% oxygen mixture for 24 hours (group I) rather than only during the operation (group II) makes a difference. In fact the evidence shows that group II has a lower mean than group I.

However, there is little evidence to support the study's main aim that nitrous oxide improves the folate levels of patients. Group I received 50% nitrous oxide and 50% oxygen while group III received only 35-50% oxygen for the same period of time, yet the test of O2 = mean(I) - mean(III) shows little evidence of different group means.

I can conclude that there is little to no evidence to support that nitrous oxide does affect the folate levels 24 hours after ventilation starts.

By Edward Adcock

Python

Monday, 12 September 2016
