Huey et al. studied the development of a fly (Drosophila subobscura) that had accidentally been introduced from Europe (EU) into North America (N.A.) around 1980. In Europe, characteristics of the flies’ wings follow a “cline” – a steady change with latitude. One decade after introduction, the N.A. population had spread throughout the continent, but no such cline could be found. After two decades, Huey and his team collected flies from 11 locations in western N.A. and native flies from 10 locations in EU at latitudes ranging from 35 to 55 degrees N. They maintained all samples in uniform conditions through several generations to isolate genetic differences from environmental differences. Then they measured about 20 adults from each group. The data set flies.txt shows average wing size in millimeters on a logarithmic scale.
(a) In their paper, Huey et al. used four separate regression models to suggest that female flies from both EU and N.A. have the same wing length–latitude relationship (identical slopes). For male flies from the two continents the relationships are close, but the authors were unable to say whether the slopes are the same.
We know that we can create a categorical variable to identify a fly’s origin and sex. This variable, FlyID, can be created by pasting the columns Continent and Sex into a single label such as Female.EU.
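A minimal sketch of the pasting step, written in plain Python as an assumption (the exercise does not show its own code here). The rows below are hypothetical stand-ins for flies.txt, and `make_fly_id` is an illustrative helper, not part of any library.

```python
# Hypothetical rows standing in for flies.txt; only the two columns
# named in the exercise (Continent, Sex) are assumed to exist.
rows = [
    {"Continent": "N.A.", "Sex": "Male"},
    {"Continent": "EU", "Sex": "Female"},
    {"Continent": "EU", "Sex": "Male"},
    {"Continent": "N.A.", "Sex": "Female"},
]


def make_fly_id(row, sep="."):
    """Paste Sex and Continent into one label, e.g. 'Female.EU'."""
    return f"{row['Sex']}{sep}{row['Continent']}"


for row in rows:
    row["FlyID"] = make_fly_id(row)

# Distinct labels, sorted alphabetically; the first one plays the
# role of the baseline level in the model described in the text.
levels = sorted({row["FlyID"] for row in rows})
print(levels)
# ['Female.EU', 'Female.N.A.', 'Male.EU', 'Male.N.A.']
```

The alphabetical sort is what makes Female.EU the default baseline.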
When wing size is regressed on latitude with FlyID as an interacting factor, we obtain a model with four intercepts and four slopes. The intercept and slope for the first level of FlyID (sorted alphabetically) are estimated and presented as the baseline.
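The baseline parameterization can be illustrated with a small sketch. When every level has its own intercept and slope, the coefficient estimates coincide with those of four separate simple regressions. The code therefore fits each group on its own and re-expresses the result as a baseline plus differences. The numbers are invented toy values, not taken from flies.txt, and `fit_line` is a hypothetical helper. Standard errors and tests are not computed here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Toy (latitude, log wing size) pairs, invented for illustration only.
toy = {
    "Female.EU":   [(36, 0.800), (45, 0.830), (54, 0.845)],
    "Female.N.A.": [(36, 0.805), (45, 0.825), (54, 0.850)],
    "Male.EU":     [(36, 0.700), (45, 0.720), (54, 0.750)],
    "Male.N.A.":   [(36, 0.710), (45, 0.715), (54, 0.735)],
}

levels = sorted(toy)          # alphabetical order of the FlyID labels
baseline = levels[0]          # first level acts as the baseline
fits = {g: fit_line(*zip(*toy[g])) for g in levels}

base_a, base_b = fits[baseline]
print(f"{baseline}: intercept={base_a:.4f}, slope={base_b:.5f} (baseline)")

# Every other level is reported as a difference from the baseline.
contrasts = {}
for g in levels[1:]:
    a, b = fits[g]
    contrasts[g] = (a - base_a, b - base_b)
    print(f"{g}: d_intercept={a - base_a:+.4f}, d_slope={b - base_b:+.5f}")
```

Adding a contrast back to the baseline recovers that group's own intercept and slope.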
Fit the linear model and interpret the results. Compare your results to the results presented in Huey et al. Comment on any differences and on why you feel you should use the approach we used here.
(b) The model we fitted here has its limitations. Only the slope and intercept of the first level are presented explicitly in the results. In this case, we will only see the intercept and slope for Female.EU, the baseline. Intercepts and slopes for the other three levels are presented in terms of their differences from the baseline. This parameterization is set up for hypothesis testing. That is, we can test whether the slopes for Female.N.A., Male.EU, and Male.N.A. are different from the slope for Female.EU. For this particular model, we can directly test whether the difference between the slope for Female.EU and the slope for Female.N.A. is different from 0, but we cannot directly compare the slopes and intercepts for Male.EU and Male.N.A. To make this comparison, we must first set Male.EU as the baseline by re-ordering the levels of FlyID.
This re-ordering changes FlyID into a numeric variable with integers 1 to 4, where 1 is “Male.EU”, 2 is “Male.N.A.”, 3 is “Female.EU”, and 4 is “Female.N.A.”. Now refit the same model as in (a), making sure that FlyID is still treated as a categorical variable rather than as a continuous predictor. Use the results from both (a) and (b) to assess whether the slope for male flies from N.A. differs from the slope for male flies from EU, and whether the slope for female flies from N.A. differs from the slope for female flies from EU.
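The re-ordering described above can be sketched as a simple mapping from labels to integer codes. This is a plain-Python illustration under assumed names (`new_order`, `codes`, and the sample `fly_ids` are hypothetical), not the exercise's own code.

```python
# Desired level order, with Male.EU first so it becomes the baseline.
new_order = ["Male.EU", "Male.N.A.", "Female.EU", "Female.N.A."]

# Integer code for each label: 1 = Male.EU, ..., 4 = Female.N.A.
codes = {label: i for i, label in enumerate(new_order, start=1)}

# A few hypothetical FlyID values as they might appear in the data.
fly_ids = ["Female.EU", "Male.N.A.", "Male.EU", "Female.N.A.", "Male.EU"]
numeric = [codes[f] for f in fly_ids]
print(numeric)
# [3, 2, 1, 4, 1]

# The codes are labels, not quantities: mapping them back recovers the
# original categories, and the model should use them as categories.
labels_back = [new_order[c - 1] for c in numeric]
print(labels_back == fly_ids)
# True
```

Because the integers only encode the order of the levels, entering them into the regression as a number would impose an artificial spacing between groups.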
(c) In their paper, the linear regression models have very low R² values, whereas the model we fit has a very high R² value. Why? Is our model that much better?
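One way to explore this question numerically is to compute R² = 1 − SSE/SST for separate fits and for a combined fit on the same toy data. The sketch below uses invented numbers (two groups only, not flies.txt) and hypothetical helper names. It is meant as a starting point for experimentation, not as the answer for the real data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def sse(xs, ys, a, b):
    """Residual sum of squares around the fitted line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def sst(ys):
    """Total sum of squares around the mean of ys."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


# Invented toy data: same noisy pattern in both groups, shifted by 0.2.
lat = [36, 40, 44, 48, 52]
toy = {
    "Female.EU": (lat, [0.80, 0.83, 0.79, 0.84, 0.82]),
    "Male.EU":   (lat, [0.60, 0.63, 0.59, 0.64, 0.62]),
}

r2_separate = {}
sse_total = 0.0
for group, (xs, ys) in toy.items():
    a, b = fit_line(xs, ys)
    err = sse(xs, ys, a, b)
    r2_separate[group] = 1 - err / sst(ys)   # SST around the group mean
    sse_total += err

# Combined model with group-specific lines: same residuals as above,
# but SST is now measured around the grand mean of all observations.
all_y = [y for _, ys in toy.values() for y in ys]
r2_combined = 1 - sse_total / sst(all_y)

print(r2_separate)
print(r2_combined)
```

The residuals are identical in both calculations. Only the reference point used for the total sum of squares changes.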