SL Paper 2

The manager of a folder factory recorded the number of folders produced by the factory (in thousands) and the production costs (in thousand Euros), for six consecutive months.

M17/5/MATSD/SP2/ENG/TZ2/03

Every month the factory sells all the folders produced. Each folder is sold for 2.99 Euros.

Draw a scatter diagram for this data. Use a scale of 2 cm for 5000 folders on the horizontal axis and 2 cm for 10 000 Euros on the vertical axis.

[4]
a.

Write down, for this set of data, the mean number of folders produced, \(\bar x\);

[1]
b.i.

Write down, for this set of data, the mean production cost, \(\bar C\).

[1]
b.ii.

Label the point \({\text{M}}(\bar x,{\text{ }}\bar C)\) on the scatter diagram.

[1]
c.

Use your graphic display calculator to find the Pearson’s product–moment correlation coefficient, \(r\).

[2]
d.

State a reason why the regression line \(C\) on \(x\) is appropriate to model the relationship between these variables.

[1]
e.

Use your graphic display calculator to find the equation of the regression line \(C\) on \(x\).

[2]
f.

Draw the regression line \(C\) on \(x\) on the scatter diagram.

[2]
g.

Use the equation of the regression line to estimate the least number of folders that the factory needs to sell in a month to exceed its production cost for that month.

[4]
h.



A manufacturer produces 1500 boxes of breakfast cereal every day.

The weights of these boxes are normally distributed with a mean of 502 grams and a standard deviation of 2 grams.

All boxes of cereal with a weight between 497.5 grams and 505 grams are sold. The manufacturer’s income from the sale of each box of cereal is $2.00.

The manufacturer recycles any box of cereal with a weight not between 497.5 grams and 505 grams. The manufacturer’s recycling cost is $0.16 per box.

A different manufacturer produces boxes of cereal with weights that are normally distributed with a mean of 350 grams and a standard deviation of 1.8 grams.

This manufacturer sells all boxes of cereal that are above a minimum weight, \(w\).

They sell 97% of the cereal boxes produced.

Draw a diagram that shows this information.

[2]
a.

(i)     Find the probability that a box of cereal, chosen at random, is sold.

(ii)     Calculate the manufacturer’s expected daily income from these sales.

[4]
b.

Calculate the manufacturer’s expected daily recycling cost.

[2]
c.

Calculate the value of \(w\).

[3]
d.
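
Parts (b) to (d) can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for the graphic display calculator; the variable names are illustrative and do not come from the question.

```python
from statistics import NormalDist

boxes_per_day = 1500
income_per_box = 2.00          # dollars per box sold
recycling_cost_per_box = 0.16  # dollars per box recycled

# First manufacturer: weights ~ N(502, 2^2) grams
weights = NormalDist(mu=502, sigma=2)

# (b)(i) A box is sold when its weight lies between 497.5 g and 505 g
p_sold = weights.cdf(505) - weights.cdf(497.5)

# (b)(ii) Expected daily income from the boxes that are sold
expected_income = boxes_per_day * p_sold * income_per_box

# (c) Every box that is not sold is recycled
expected_recycling_cost = boxes_per_day * (1 - p_sold) * recycling_cost_per_box

# (d) Second manufacturer: weights ~ N(350, 1.8^2) grams.
# 97% of boxes lie above w, so 3% lie below it.
other_weights = NormalDist(mu=350, sigma=1.8)
w = other_weights.inv_cdf(0.03)

print(round(p_sold, 3), round(expected_income, 2),
      round(expected_recycling_cost, 2), round(w, 1))
```

The key step in part (d) is converting "97% are sold" into a lower-tail area of 0.03 before applying the inverse normal function.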



As part of his IB Biology field work, Barry was asked to measure the circumference of trees, in centimetres, that were growing at different distances, in metres, from a river bank. His results are summarized in the following table.


State whether distance from the river bank is a continuous or discrete variable.

[1]
a.

On graph paper, draw a scatter diagram to show Barry’s results. Use a scale of 1 cm to represent 5 m on the x-axis and 1 cm to represent 10 cm on the y-axis.

[4]
b.

Write down

(i)     the mean distance, \(\bar x\), of the trees from the river bank;

(ii)     the mean circumference, \(\bar y\), of the trees.

[2]
c.

Plot and label the point \({\text{M}}(\bar x,{\text{ }}\bar y)\) on your graph.

[2]
d.

Write down

(i)     the Pearson’s product–moment correlation coefficient, \(r\), for Barry’s results;

(ii)     the equation of the regression line \(y\) on \(x\), for Barry’s results.

[4]
e.

Draw the regression line \(y\) on \(x\) on your graph.

[2]
f.

Use the equation of the regression line \(y\) on \(x\) to estimate the circumference of a tree that is 40 m from the river bank.

[2]
g.



The following table shows the number of bicycles, \(x\), produced daily by a factory and their total production cost, \(y\), in US dollars (USD). The table shows data recorded over seven days.

(i)     Write down the Pearson’s product–moment correlation coefficient, \(r\), for these data.

(ii)     Hence comment on the result.

[4]
a.

Write down the equation of the regression line \(y\) on \(x\) for these data, in the form \(y = ax + b\).

[2]
b.

Estimate the total cost, to the nearest USD, of producing \(13\) bicycles on a particular day.

[3]
c.

All the bicycles that are produced are sold. The bicycles are sold for 304 USD each.

Explain why the factory does not make a profit when producing \(13\) bicycles on a particular day.

[2]
d.

(i)     Write down an expression for the total selling price of \(x\) bicycles.

(ii)     Write down an expression for the profit the factory makes when producing \(x\) bicycles on a particular day.

(iii)     Find the least number of bicycles that the factory should produce, on a particular day, in order to make a profit.

[5]
e.



The following table shows the average body weight, \(x\), and the average weight of the brain, \(y\), of seven species of mammal. Both are measured in kilograms (kg).

M17/5/MATSD/SP2/ENG/TZ1/01

The average body weight of grey wolves is 36 kg.

In fact, the average weight of the brain of grey wolves is 0.120 kg.

The average body weight of mice is 0.023 kg.

Find the range of the average body weights for these seven species of mammal.

[2]
a.

For the data from these seven species calculate \(r\), the Pearson’s product–moment correlation coefficient;

[2]
b.i.

For the data from these seven species describe the correlation between the average body weight and the average weight of the brain.

[2]
b.ii.

Write down the equation of the regression line \(y\) on \(x\), in the form \(y = mx + c\).

[2]
c.

Use your regression line to estimate the average weight of the brain of grey wolves.

[2]
d.

Find the percentage error in your estimate in part (d).

[2]
e.

State whether it is valid to use the regression line to estimate the average weight of the brain of mice. Give a reason for your answer.

[2]
f.



A group of candidates sat a Chemistry examination and a Physics examination. The candidates’ marks in the Chemistry examination are normally distributed with a mean of \(60\) and a standard deviation of \(12\).

Draw a diagram that shows this information.

[2]
a.

Write down the probability that a randomly chosen candidate who sat the Chemistry examination scored at most 60 marks.

[1]
b.

Hee Jin scored 80 marks in the Chemistry examination.

Find the probability that a randomly chosen candidate who sat the Chemistry examination scored more than Hee Jin.

[2]
c.

The candidates’ marks in the Physics examination are normally distributed with a mean of \(63\) and a standard deviation of \(10\). Hee Jin also scored \(80\) marks in the Physics examination.

Find the probability that a randomly chosen candidate who sat the Physics examination scored less than Hee Jin.

[2]
d.

Determine whether Hee Jin’s Physics mark, compared to the other candidates, is better than her mark in Chemistry. Give a reason for your answer.

[2]
e.

To obtain a “grade A” a candidate must be in the top \(10\% \) of the candidates who sat the Physics examination.

Find the minimum possible mark to obtain a “grade A”. Give your answer correct to the nearest integer.

[3]
f.
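
The normal-distribution parts of this question can be sketched as follows, assuming Python's `statistics.NormalDist` in place of the graphic display calculator. The variable names are illustrative, and the comparison in part (e) uses the proportion of candidates scoring below Hee Jin in each examination, which is one possible justification among several.

```python
from statistics import NormalDist

chemistry = NormalDist(mu=60, sigma=12)
physics = NormalDist(mu=63, sigma=10)
hee_jin_mark = 80

# (b) Probability of scoring at most the mean
p_at_most_60 = chemistry.cdf(60)

# (c) Probability of scoring more than Hee Jin in Chemistry
p_above_chem = 1 - chemistry.cdf(hee_jin_mark)

# (d) Probability of scoring less than Hee Jin in Physics
p_below_phys = physics.cdf(hee_jin_mark)

# (e) Compare the proportion of candidates below her in each examination
p_below_chem = chemistry.cdf(hee_jin_mark)
physics_is_better = p_below_phys > p_below_chem

# (f) Top 10% means 90% of candidates lie below the cut-off
grade_a_cutoff = physics.inv_cdf(0.90)
grade_a_mark = round(grade_a_cutoff)

print(p_at_most_60, round(p_above_chem, 4), round(p_below_phys, 4),
      physics_is_better, grade_a_mark)
```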



The table below shows the distribution of test grades for 50 IB students at Greendale School.

M17/5/MATSD/SP2/ENG/TZ1/05

A student is chosen at random from these 50 students.

A second student is chosen at random from these 50 students.

The number of minutes that the 50 students spent preparing for the test was normally distributed with a mean of 105 minutes and a standard deviation of 20 minutes.

Calculate the mean test grade of the students;

[2]
a.i.

Calculate the standard deviation.

[1]
a.ii.

Find the median test grade of the students.

[1]
b.

Find the interquartile range.

[2]
c.

Find the probability that this student scored a grade 5 or higher.

[2]
d.

Given that the first student chosen at random scored a grade 5 or higher, find the probability that both students scored a grade 6.

[3]
e.

Calculate the probability that a student chosen at random spent at least 90 minutes preparing for the test.

[2]
f.i.

Calculate the expected number of students that spent at least 90 minutes preparing for the test.

[2]
f.ii.
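
Parts (f.i) and (f.ii) depend only on the stated normal model, so they can be sketched without the missing grade table. This is a minimal illustration assuming Python's `statistics.NormalDist`; the names are not part of the question.

```python
from statistics import NormalDist

n_students = 50
prep_time = NormalDist(mu=105, sigma=20)  # minutes

# (f.i) P(preparation time >= 90 minutes)
p_at_least_90 = 1 - prep_time.cdf(90)

# (f.ii) Expected number of students = n * p
expected_students = n_students * p_at_least_90

print(round(p_at_least_90, 3), round(expected_students, 1))
```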



In a school, all Mathematical Studies SL students were given a test. The test contained four questions, each one on a different topic from the syllabus. The quality of each response was classified as satisfactory or not satisfactory. Each student answered only three of the four questions, each on a separate answer sheet.

The table below shows the number of satisfactory and not satisfactory responses for each question.

M17/5/MATSD/SP2/ENG/TZ2/01

A \({\chi ^2}\) test is carried out at the 5% significance level for the data in the table.

The critical value for this test is 7.815.

If the teacher chooses a response at random, find the probability that it is a response to the Calculus question;

[2]
a.i.

If the teacher chooses a response at random, find the probability that it is a satisfactory response to the Calculus question;

[2]
a.ii.

If the teacher chooses a response at random, find the probability that it is a satisfactory response, given that it is a response to the Calculus question.

[2]
a.iii.

The teacher groups the responses by topic, and chooses two responses to the Logic question. Find the probability that both are not satisfactory.

[3]
b.

State the null hypothesis for this test.

[1]
c.

Show that the expected frequency of satisfactory Calculus responses is 12.

[1]
d.

Write down the number of degrees of freedom for this test.

[1]
e.

Use your graphic display calculator to find the \({\chi ^2}\) statistic for this data.

[2]
f.

State the conclusion of this \({\chi ^2}\) test. Give a reason for your answer.

[2]
g.



In the month before their IB Diploma examinations, eight male students recorded the number of hours they spent on social media.

For each student, the number of hours spent on social media (\(x\)) and the number of IB Diploma points obtained (\(y\)) are shown in the following table.

N16/5/MATSD/SP2/ENG/TZ0/01

Use your graphic display calculator to find

Ten female students also recorded the number of hours they spent on social media in the month before their IB Diploma examinations. Each of these female students spent between 3 and 30 hours on social media.

The equation of the regression line \(y\) on \(x\) for these ten female students is

\[y =  - \frac{2}{3}x + \frac{{125}}{3}.\]

An eleventh girl spent 34 hours on social media in the month before her IB Diploma examinations.

On graph paper, draw a scatter diagram for these data. Use a scale of 2 cm to represent 5 hours on the \(x\)-axis and 2 cm to represent 10 points on the \(y\)-axis.

[4]
a.

(i)     \({\bar x}\), the mean number of hours spent on social media;

(ii)     \({\bar y}\), the mean number of IB Diploma points.

[2]
b.

Plot the point \((\bar x,{\text{ }}\bar y)\) on your scatter diagram and label this point M.

[2]
c.

Write down the value of \(r\), the Pearson’s product–moment correlation coefficient, for these data.

[2]
d.

Write down the equation of the regression line \(y\) on \(x\) for these eight male students.

[2]
e.

Draw the regression line, from part (e), on your scatter diagram.

[2]
f.

Use the given equation of the regression line to estimate the number of IB Diploma points that this girl obtained.

[2]
g.

Write down a reason why this estimate is not reliable.

[1]
h.
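
Parts (g) and (h) use only the regression line given for the female students, so they can be checked directly. The sketch below uses exact fractions; the function and variable names are hypothetical.

```python
from fractions import Fraction

def estimate_points(hours):
    """Given regression line for the ten female students: y = -(2/3)x + 125/3."""
    return Fraction(-2, 3) * hours + Fraction(125, 3)

# Each of the ten female students spent between 3 and 30 hours on social media
data_range = (3, 30)
eleventh_girl_hours = 34

# (g) Estimated IB Diploma points for the eleventh girl
estimate = estimate_points(eleventh_girl_hours)

# (h) The estimate is an extrapolation if 34 lies outside the data range
is_within_range = data_range[0] <= eleventh_girl_hours <= data_range[1]

print(estimate, is_within_range)  # prints "19 False"
```

Because 34 hours lies outside the 3 to 30 hour range used to fit the line, the estimate is an extrapolation.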



The table below shows the scores for 12 golfers for their first two rounds in a local golf tournament.

(i) Write down the mean score in Round 1.

(ii) Write down the standard deviation in Round 1.

(iii) Find the number of these golfers that had a score of more than one standard deviation above the mean in Round 1.

[5]
a.

Write down the correlation coefficient, \(r\).

[2]
b.

Write down the equation of the regression line of \(y\) on \(x\).

[2]
c.

Another golfer scored 70 in Round 1.

Calculate an estimate of his score in Round 2.

[2]
d.

Another golfer scored 89 in Round 1.

Determine whether you can use the equation of the regression line to estimate his score in Round 2. Give a reason for your answer.

[2]
e.



In a mountain region there appears to be a relationship between the number of trees growing in the region and the depth of snow in winter. A set of 10 areas was chosen, and in each area the number of trees was counted and the depth of snow measured. The results are given in the table below.

In a study on \(100\) students there seemed to be a difference between males and females in their choice of favourite car colour. The results are given in the table below. A \(\chi^2\) test was conducted.

Use your graphic display calculator to find the mean number of trees.

[1]
A, a, i.

Use your graphic display calculator to find the mean depth of snow.

[1]
A, a, iii.

Use your graphic display calculator to find the standard deviation of the depth of snow.

[1]
A, a, iv.

The covariance, \(S_{xy} = 188.5\).

Write down the product-moment correlation coefficient, r.

[2]
A, b.

Write down the equation of the regression line of \(y\) on \(x\).

[2]
A, c.

If the number of trees in an area is 55, estimate the depth of snow.

[2]
A, d.

Use the equation of the regression line to estimate the depth of snow in an area with 100 trees.

[1]
A, e, i.

Decide whether the answer in (e)(i) is a valid estimate of the depth of snow in the area. Give a reason for your answer.

[2]
A, e, ii.

Write down the total number of male students.

[1]
B, a.

Show that the expected frequency for males, whose favourite car colour is blue, is 12.6.

[2]
B, b.

The calculated value of \({\chi ^2}\) is \(1.367\) and the critical value of \({\chi ^2}\) is \(5.99\) at the \(5\%\) significance level.

Write down the null hypothesis for this test.

[1]
B, c, i.

Write down the number of degrees of freedom.

[1]
B, c, ii.

Determine whether the null hypothesis should be accepted at the \(5\%\) significance level. Give a reason for your answer.

[2]
B, c, iv.
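
The decision rule for Part B can be sketched as below. The contingency table is not reproduced here, so the 2 by 3 shape (two genders, three colours) is an assumption; it is consistent with the stated critical value, because for 2 degrees of freedom the \(\chi^2\) distribution function is \(1 - e^{-x/2}\). All function names are hypothetical.

```python
import math

def degrees_of_freedom(rows, cols):
    """Degrees of freedom of a chi-squared test of independence."""
    return (rows - 1) * (cols - 1)

def null_hypothesis_is_kept(chi2_calc, critical_value):
    """H0 is not rejected when the statistic is below the critical value."""
    return chi2_calc < critical_value

chi2_calc = 1.367       # given in the question
critical_value = 5.99   # given in the question, 5% significance level

# Assumption: a 2 x 3 table, which the source does not show
df = degrees_of_freedom(2, 3)

# For df = 2 only, the 5% critical value has the closed form -2 ln(0.05)
critical_for_2_df = -2 * math.log(0.05)

# For df = 2 only, the p-value has the closed form exp(-x / 2)
p_value = math.exp(-chi2_calc / 2)

print(df, round(critical_for_2_df, 2),
      null_hypothesis_is_kept(chi2_calc, critical_value))
```

Comparing the statistic with the critical value and comparing the \(p\)-value with 0.05 are equivalent ways of reaching the same conclusion.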



A group of 800 students answered 40 questions on a category of their choice out of History, Science and Literature.

For each student the category and the number of correct answers, \(N\), were recorded. The results obtained are represented in the following table.

N17/5/MATSD/SP2/ENG/TZ0/01

A \({\chi ^2}\) test at the 5% significance level is carried out on the results. The critical value for this test is 12.592.

State whether \(N\) is a discrete or a continuous variable.

[1]
a.

Write down, for \(N\), the modal class;

[1]
b.i.

Write down, for \(N\), the mid-interval value of the modal class.

[1]
b.ii.

Use your graphic display calculator to estimate the mean of \(N\);

[2]
c.i.

Use your graphic display calculator to estimate the standard deviation of \(N\).

[1]
c.ii.

Find the expected frequency of students choosing the Science category and obtaining 31 to 40 correct answers.

[2]
d.

Write down the null hypothesis for this test;

[1]
e.i.

Write down the number of degrees of freedom.

[1]
e.ii.

Write down the \(p\)-value for the test;

[1]
f.i.

Write down the \({\chi ^2}\) statistic.

[2]
f.ii.

State the result of the test. Give a reason for your answer.

[2]
g.



Part A

100 students are asked what they had for breakfast on a particular morning. There were three choices: cereal (\(X\)), bread (\(Y\)) and fruit (\(Z\)). It is found that

10 students had all three

17 students had bread and fruit only

15 students had cereal and fruit only

12 students had cereal and bread only

13 students had only bread

8 students had only cereal

9 students had only fruit

Part B

The same 100 students are also asked how many meals on average they have per day. The data collected is organized in the following table.

A \({\chi ^2}\) test is carried out at the 5 % level of significance.

Represent this information on a Venn diagram.

[4]
A.a.

Find the number of students who had none of the three choices for breakfast.

[2]
A.b.

Write down the percentage of students who had fruit for breakfast.

[2]
A.c.

Describe in words what the students in the set \(X \cap Y'\) had for breakfast.

[2]
A.d.

Find the probability that a student had at least two of the three choices for breakfast.

[2]
A.e.

Two students are chosen at random. Find the probability that both students had all three choices for breakfast.

[3]
A.f.
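
The counts in Part A fully determine the Venn diagram, so parts (b), (c), (e) and (f) can be checked with simple arithmetic. This is a minimal sketch; the dictionary keys and variable names are illustrative.

```python
from fractions import Fraction

total_students = 100

# Number of students in each region of the Venn diagram
regions = {
    "all three": 10,
    "bread and fruit only": 17,
    "cereal and fruit only": 15,
    "cereal and bread only": 12,
    "only bread": 13,
    "only cereal": 8,
    "only fruit": 9,
}

# (b) Students outside all three sets
none_of_three = total_students - sum(regions.values())

# (c) Every region inside the fruit set Z
fruit = (regions["all three"] + regions["bread and fruit only"]
         + regions["cereal and fruit only"] + regions["only fruit"])
percent_fruit = 100 * fruit / total_students

# (e) At least two choices: the three pairwise regions plus the centre
at_least_two = (regions["all three"] + regions["bread and fruit only"]
                + regions["cereal and fruit only"] + regions["cereal and bread only"])
p_at_least_two = Fraction(at_least_two, total_students)

# (f) Two students chosen without replacement, both from the centre region
p_both_all_three = Fraction(10, 100) * Fraction(9, 99)

print(none_of_three, percent_fruit, p_at_least_two, p_both_all_three)
```

Part (f) is sampling without replacement, which is why the second factor uses 9 out of 99.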

Write down the null hypothesis, \({{\text{H}}_0}\), for this test.

[1]
B.a.

Write down the number of degrees of freedom for this test.

[1]
B.b.

Write down the critical value for this test.

[1]
B.c.

Show that the expected number of females that have more than 5 meals per day is 13, correct to the nearest integer.

[2]
B.d.

Use your graphic display calculator to find the \(\chi _{calc}^2\) for this data.

[2]
B.e.

Decide whether \({{\text{H}}_0}\) must be accepted. Justify your answer.

[2]
B.f.



The table shows the distance, in km, of eight regional railway stations from a city centre terminus and the price, in \(\$\), of a return ticket from each regional station to the terminus.


Draw a scatter diagram for the above data. Use a scale of \(1\) cm to represent \(10\) km on the \(x\)-axis and \(1\) cm to represent \(\$10\) on the \(y\)-axis.

[4]
a.

Use your graphic display calculator to find

(i)     \(\bar x\), the mean of the distances;

(ii)     \(\bar y\), the mean of the prices.

[2]
b.

Plot and label the point \({\text{M }}(\bar x,{\text{ }}\bar y)\) on your scatter diagram.

[1]
c.

Use your graphic display calculator to find

(i)     the product–moment correlation coefficient, \(r\,;\)

(ii)     the equation of the regression line \(y\) on \(x\).

[3]
d.

Draw the regression line \(y\) on \(x\) on your scatter diagram.

[2]
e.

A ninth regional station is \(76\) km from the city centre terminus.

Use the equation of the regression line to estimate the price of a return ticket to the city centre terminus from this regional station. Give your answer correct to the nearest \({\mathbf{\$ }}\).

[3]
f.

Give a reason why it is valid to use your regression line to estimate the price of this return ticket.

[1]
g.

The actual price of the return ticket is \(\$80\).

Using your answer to part (f), calculate the percentage error in the estimated price of the ticket.

[2]
h.



In a debate on voting, a survey was conducted. The survey asked people’s opinion on whether or not the minimum voting age should be reduced to 16 years of age. The results are shown as follows.

A \({\chi ^2}\) test at the 1% significance level was conducted. The \({\chi ^2}\) critical value of the test is 9.21.

State

(i)     \({{\text{H}}_0}\), the null hypothesis for the test;

(ii)     \({{\text{H}}_1}\), the alternative hypothesis for the test.

[2]
a.

Write down the number of degrees of freedom.

[1]
b.

Show that the expected frequency of those between the ages of 26 and 40 who oppose the reduction in the voting age is 21.5, correct to three significant figures.

[2]
c.

Find

(i)     the \({\chi ^2}\) statistic;

(ii)     the associated \(p\)-value for the test.

[3]
d.

Determine, giving a reason, whether \({{\text{H}}_0}\) should be accepted.

[2]
e.



The seniors from Gulf High School are required to participate in exactly one after-school sport. Data were gathered from a sample of 120 students regarding their choice of sport. The following data were recorded.

A \({\chi ^2}\) test was carried out at the 5 % significance level to analyse the relationship between gender and choice of after-school sport.

Write down the null hypothesis, \({{\text{H}}_0}\), for this test.

[1]
a.

Find the expected value of female footballers.

[2]
b.

Write down the number of degrees of freedom.

[1]
c.

Write down the critical value of \(\chi ^2\), at the 5 % level of significance.

[1]
d.

Use your graphic display calculator to determine the \(\chi _{calc}^2\) value.

[2]
e.

Determine whether \({{\text{H}}_0}\) should be accepted. Justify your answer.

[2]
f.

One student is chosen at random from the 120 students.

Find the probability that this student

(i) is male;

(ii) plays tennis.

[2]
g.

Two students are chosen at random from the 120 students.

Find the probability that

(i) both play football;

(ii) neither plays basketball.

[5]
h.



Daniel grows apples and chooses at random a sample of 100 apples from his harvest.

He measures the diameters of the apples to the nearest cm. The following table shows the distribution of the diameters.

Using your graphic display calculator, write down the value of

(i)     the mean of the diameters in this sample;

(ii)     the standard deviation of the diameters in this sample.

[3]
a.

Daniel assumes that the diameters of all of the apples from his harvest are normally distributed with a mean of 7 cm and a standard deviation of 1.2 cm. He classifies the apples according to their diameters as shown in the following table.

Calculate the percentage of small apples in Daniel’s harvest.

[3]
b.

Of the apples harvested, 5% are large apples.

Find the value of \(a\).

[2]
c.

Find the percentage of medium apples.

[2]
d.

This year, Daniel estimates that he will grow \({\text{100}}\,{\text{000}}\) apples.

Estimate the number of large apples that Daniel will grow this year.

[2]
e.



An agricultural cooperative uses three brands of fertilizer, A, B and C, on 120 different crops. The crop yields are classified as High, Medium or Low.

The data collected are organized in the table below.

The agricultural cooperative decides to conduct a chi-squared test at the 1 % significance level using the data.

State the null hypothesis, \({{\text{H}}_0}\), for the test.

[2]
a.

Write down the number of degrees of freedom.

[1]
b.

Write down the critical value for the test.

[1]
c.

Show that the expected number of Medium Yield crops using Fertilizer C is 17, correct to the nearest integer.

[2]
d.

Use your graphic display calculator to find for the data

(i) the \(\chi^2\) calculated value, \(\chi _{calc}^2\);

(ii) the p-value.

[3]
e.

State the conclusion of the test. Give a reason for your decision.

[2]
f.



The number of bottles of water, \(n\), sold at a railway station on each day and the temperature, \(T\), on that day are given in the following table.

Write down

(i)     the mean temperature;

(ii)    the standard deviation of the temperatures.

[2]
a.

Write down the correlation coefficient, \(r\), for the variables \(n\) and \(T\).

[1]
b.

Comment on your value for \(r\).

[2]
c.

The equation of the line of regression for \(n\) on \(T\) is \(n = dT - 100\).

(i)     Write down the value of \(d\).

(ii)    Estimate how many bottles of water will be sold when the temperature is \({19.6^ \circ }\).

[2]
d.

On a day when the temperature was \({36^ \circ }\) Peter calculates that \(314\) bottles would be sold. Give one reason why his answer might be unreliable.

[1]
e.



A speed camera on Peterson Road records the speed of each passing vehicle. The speeds are found to be normally distributed with a mean of \(67\,{\text{km}}\,{{\text{h}}^{ - 1}}\) and a standard deviation of \(3.4\,{\text{km}}\,{{\text{h}}^{ - 1}}\).

Sketch a diagram of this normal distribution and shade the region representing the probability that the speed of a vehicle is between \(60\) and \(70\,{\text{km}}\,{{\text{h}}^{ - 1}}\).

[2]
a.

A vehicle on Peterson Road is chosen at random.

Find the probability that the speed of this vehicle is

(i)      more than \(60\,{\text{km}}\,{{\text{h}}^{ - 1}}\);

(ii)     less than \(70\,{\text{km}}\,{{\text{h}}^{ - 1}}\);

(iii)    between \(60\) and \(70\,{\text{km}}\,{{\text{h}}^{ - 1}}\).

[3]
b.

It is found that \(19\,\% \) of the vehicles are exceeding the speed limit of \(s\,{\text{km}}\,{{\text{h}}^{ - 1}}\).

Find the value of \(s\), correct to the nearest integer.

[2]
c.

There is a fine of \({\text{US}}\$ 65\) for exceeding the speed limit on Peterson Road. On a particular day the total value of fines issued was \({\text{US}}\$ 14\,820\).

(i)     Calculate the number of fines that were issued on this day.

(ii)    Estimate the total number of vehicles that passed the speed camera on Peterson Road on this day.

[4]
d.
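
Parts (b) to (d) can be sketched numerically as follows, assuming Python's `statistics.NormalDist` in place of the graphic display calculator. The names are illustrative. Part (d)(ii) assumes that every vehicle exceeding the limit received exactly one fine, so that the fines represent 19% of all vehicles.

```python
from statistics import NormalDist

speeds = NormalDist(mu=67, sigma=3.4)  # km/h

# (b) Probabilities for a randomly chosen vehicle
p_more_than_60 = 1 - speeds.cdf(60)
p_less_than_70 = speeds.cdf(70)
p_between = speeds.cdf(70) - speeds.cdf(60)

# (c) 19% exceed the limit s, so 81% are at or below it
speed_limit = round(speeds.inv_cdf(0.81))

# (d)(i) Number of fines at US$65 each
n_fines = 14820 // 65

# (d)(ii) Fined vehicles are 19% of all vehicles (assumption: one fine each)
total_vehicles = round(n_fines / 0.19)

print(round(p_more_than_60, 3), round(p_less_than_70, 3),
      round(p_between, 3), speed_limit, n_fines, total_vehicles)
```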



One day the numbers of customers at three cafés, “Alan’s Diner” ( \(A\) ), “Sarah’s Snackbar” ( \(S\) ) and “Pete’s Eats” ( \(P\) ), were recorded and are given below.


     17 were customers of Pete’s Eats only
     27 were customers of Sarah’s Snackbar only
     15 were customers of Alan’s Diner only
     10 were customers of Pete’s Eats and Sarah’s Snackbar but not Alan’s Diner
     8 were customers of Pete’s Eats and Alan’s Diner but not Sarah’s Snackbar

Some of the customers in each café were given survey forms to complete to find out if they were satisfied with the standard of service they received.

Draw a Venn diagram, using sets labelled \(A\), \(S\) and \(P\), that shows this information.

[3]
A.a.

There were 48 customers of Pete’s Eats that day. Calculate the number of people who were customers of all three cafés.

[2]
A.b.

There were 50 customers of Sarah’s Snackbar that day. Calculate the total number of people who were customers of Alan’s Diner.

[3]
A.c.

Write down the number of customers of Alan’s Diner that were also customers of Pete’s Eats.

[1]
A.d.

Find \(n[(S \cup P) \cap A']\).

[2]
A.e.
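
The region counts for Part A follow from the given totals by subtraction. The sketch below works through parts (b) to (e); the variable names are illustrative and describe regions of the Venn diagram.

```python
# Regions given in the question
p_only = 17
s_only = 27
a_only = 15
p_and_s_not_a = 10
p_and_a_not_s = 8

# (b) Pete's Eats had 48 customers, so the centre region is what remains
pete_total = 48
all_three = pete_total - (p_only + p_and_s_not_a + p_and_a_not_s)

# (c) Sarah's Snackbar had 50 customers; solve for the S-and-A-only region
sarah_total = 50
s_and_a_not_p = sarah_total - (s_only + p_and_s_not_a + all_three)
alan_total = a_only + p_and_a_not_s + s_and_a_not_p + all_three

# (d) Customers of both Alan's Diner and Pete's Eats
a_and_p = p_and_a_not_s + all_three

# (e) n[(S union P) intersect A']: in S or P, but outside A
s_or_p_not_a = p_only + s_only + p_and_s_not_a

print(all_three, s_and_a_not_p, alan_total, a_and_p, s_or_p_not_a)
```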

One of the survey forms was chosen at random. Find the probability that the form showed “Dissatisfied”;

[2]
B.a.

One of the survey forms was chosen at random. Find the probability that the form showed “Satisfied” and was completed at Sarah’s Snackbar;

[2]
B.b.

One of the survey forms was chosen at random. Find the probability that the form showed “Dissatisfied”, given that it was completed at Alan’s Diner.

[2]
B.c.

A \({\chi ^2}\) test at the \(5\% \) significance level was carried out to determine whether there was any difference in the level of customer satisfaction in each of the cafés.

Write down the null hypothesis, \({{\text{H}}_0}\), for the \({\chi ^2}\) test.

[1]
B.d.

Write down the number of degrees of freedom for the test.

[1]
B.e.

Using your graphic display calculator, find \(\chi _{calc}^2\).

[2]
B.f.

State, giving a reason, the conclusion to the test.

[2]
B.g.



The Brahma chicken produces eggs with weights in grams that are normally distributed about a mean of \(55{\text{ g}}\) with a standard deviation of \(7{\text{ g}}\). The eggs are classified as small, medium, large or extra large according to their weight, as shown in the table below.

Sketch a diagram of the distribution of the weight of Brahma chicken eggs. On your diagram, show clearly the boundaries for the classification of the eggs.

[3]
a.

An egg is chosen at random. Find the probability that the egg is
(i)     medium;
(ii)    extra large.

[4]
b.

There is a probability of \(0.3\) that a randomly chosen egg weighs more than \(w\) grams.

Find \(w\).

[2]
c.

The probability that a Brahma chicken produces a large size egg is \(0.121\). Frank’s Brahma chickens produce \(2000\) eggs each month.

Calculate an estimate of the number of large size eggs produced by Frank’s chickens each month.

[2]
d.

The selling price, in US dollars (USD), of each size is shown in the table below.

The probability that a Brahma chicken produces a small size egg is \(0.388\).

Estimate the monthly income, in USD, earned by selling the \(2000\) eggs. Give your answer correct to two decimal places.

[3]
e.
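
Parts (c) and (d) do not depend on the classification table, so they can be sketched on their own. This is a minimal illustration assuming Python's `statistics.NormalDist`; the names are not from the question.

```python
from statistics import NormalDist

egg_weights = NormalDist(mu=55, sigma=7)  # grams

# (c) P(weight > w) = 0.3, so 70% of eggs weigh less than w
w = egg_weights.inv_cdf(0.7)

# (d) Expected number of large eggs among 2000, using the given probability
p_large = 0.121
eggs_per_month = 2000
expected_large = round(p_large * eggs_per_month)

print(round(w, 1), expected_large)
```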



Alex and Kris are riding their bicycles together along a bicycle trail and note the following distance markers at the given times.

Draw a scatter diagram of the data. Use 1 cm to represent 1 hour and 1 cm to represent 10 km.

[3]
a.

Write down, for this set of data, the mean time, \(\bar t\).

[1]
b.i.

Write down, for this set of data, the mean distance, \(\bar d\).

[1]
b.ii.

Mark and label the point \(M(\bar t,{\text{ }}\bar d)\) on your scatter diagram.

[2]
c.

Draw the line of best fit on your scatter diagram.

[2]
d.

Using your graph, estimate the time when Alex and Kris pass the 85 km distance marker. Give your answer correct to one decimal place.

[2]
e.

Write down the equation of the regression line for the data given.

[2]
f.

Using your equation, calculate the distance marker passed by the cyclists at 10.3 hours.

[2]
g.i.

Is this estimate of the distance reliable? Give a reason for your answer.

[2]
g.ii.



A store recorded their sales of televisions during the 2010 football World Cup. They looked at the numbers of televisions bought by gender and the size of the television screens.

This information is shown in the table below; \(S\) represents the size of the television screen in inches.

The store wants to use this information to predict the probability of selling these sizes of televisions for the 2014 football World Cup.

Use the table to find the probability that

(i) a television will be bought by a female;

(ii) a television with a screen size of \(32 < S \le 46\) will be bought;

(iii) a television with a screen size of \(32 < S \le 46\) will be bought by a female;

(iv) a television with a screen size greater than 46 inches will be bought, given that it is bought by a male.

[6]
a.

The manager of the store wants to determine whether the screen size is independent of gender. A Chi-squared test is performed at the 1 % significance level.

Write down the null hypothesis.

[1]
b.

Show that the expected frequency for females who bought a screen size of \(32 < S \le 46\) is 79, correct to the nearest integer.

[2]
c.

Write down the number of degrees of freedom.

[1]
d.

Write down the \({\chi ^2}\) calculated value.

[2]
e.

Write down the critical value for this test.

[1]
f.

Determine if the null hypothesis should be accepted. Give a reason for your answer.

[2]
g.
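
The expected-frequency and test-statistic steps asked for in parts (c) to (e) can be sketched in Python. This is a minimal illustration, not a replacement for the graphic display calculator: the store's table is not reproduced here, so the counts in `hypothetical_table` and all function names are assumptions made purely for the example.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical counts for illustration only (NOT the store's data).
hypothetical_table = [
    [10, 20, 30],
    [20, 20, 20],
]

expected = expected_frequencies(hypothetical_table)
chi2 = chi_squared_statistic(hypothetical_table)
df = degrees_of_freedom(hypothetical_table)
print(expected, round(chi2, 3), df)
```

The null hypothesis is not rejected when the calculated statistic is below the critical value for the chosen significance level and number of degrees of freedom.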



Pam has collected data from a group of 400 IB Diploma students about the Mathematics course they studied and the language in which they were examined (English, Spanish or French). The summary of her data is given below.

A student is chosen at random from the group. Find the probability that the student

(i)     studied Mathematics HL;

(ii)    was examined in French;

(iii)   studied Mathematics HL and was examined in French;

(iv)   did not study Mathematics SL and was not examined in English;

(v)    studied Mathematical Studies SL given that the student was examined in Spanish.

[8]
a.

Pam believes that the Mathematics course a student chooses is independent of the language in which the student is examined.

Using your answers to parts (a) (i), (ii) and (iii) above, state whether there is any evidence for Pam’s belief. Give a reason for your answer.

[2]
b.

Pam decides to test her belief using a Chi-squared test at the \(5\% \) level of significance.

(i)     State the null hypothesis for this test.

(ii)    Show that the expected number of Mathematical Studies SL students who took the examination in Spanish is \(41.3\), correct to 3 significant figures.

[3]
c.

Write down

(i)     the Chi-squared calculated value;

(ii)    the number of degrees of freedom;

(iii)   the Chi-squared critical value.

[4]
d.

State, giving a reason, whether there is sufficient evidence at the \(5\% \) level of significance that Pam’s belief is correct.

[2]
e.



Jorge conducted a survey of \(200\) drivers. He asked two questions:

How long have you had your driving licence?
Do you wear a seat belt when driving?

The replies are summarized in the table below.

Jorge applies a \({\chi ^2}\) test at the \(5\% \) level to investigate whether wearing a seat belt is associated with the time a driver has had their licence.

(i)     Write down the null hypothesis, \({{\text{H}}_0}\).

(ii)    Write down the number of degrees of freedom.

(iii)   Show that the expected number of drivers that wear a seat belt and have had their driving licence for more than \(15\) years is \(22\), correct to the nearest whole number.

(iv)   Write down the \({\chi ^2}\) test statistic for this data.

(v)    Does Jorge accept \({{\text{H}}_0}\) ? Give a reason for your answer.

[8]
a.

Consider the \(200\) drivers surveyed. One driver is chosen at random. Calculate the probability that

(i)     this driver wears a seat belt;

(ii)    the driver does not wear a seat belt, given that the driver has held a licence for more than \(15\) years.

[4]
b.

Two drivers are chosen at random. Calculate the probability that

(i)     both wear a seat belt.

(ii)    at least one wears a seat belt.

[6]
c.
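
Part (c) involves choosing two drivers without replacement, so the second probability is conditional on the first choice. A minimal sketch of that reasoning follows; Jorge's table is not reproduced here, so `hypothetical_wearers` is a placeholder count and the function names are assumptions for illustration.

```python
from fractions import Fraction


def both_have_property(count, total):
    """P(both chosen items have the property), sampling without replacement."""
    return Fraction(count, total) * Fraction(count - 1, total - 1)


def at_least_one_has_property(count, total):
    """1 - P(neither chosen item has the property)."""
    without = total - count
    neither = Fraction(without, total) * Fraction(without - 1, total - 1)
    return 1 - neither


TOTAL_DRIVERS = 200          # stated in the question
hypothetical_wearers = 150   # placeholder, NOT taken from Jorge's table

both = both_have_property(hypothetical_wearers, TOTAL_DRIVERS)
at_least_one = at_least_one_has_property(hypothetical_wearers, TOTAL_DRIVERS)
print(both, at_least_one)
```

Using the complement for "at least one" avoids adding up three separate cases.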



The figure below shows the lengths in centimetres of fish found in the net of a small trawler.

Find the total number of fish in the net.

[2]
a.

Find (i) the modal length interval,

(ii) the interval containing the median length,

(iii) an estimate of the mean length.

[5]
b.

(i) Write down an estimate for the standard deviation of the lengths.

(ii) How many fish (if any) have length greater than three standard deviations above the mean?

[3]
c.

The fishing company must pay a fine if more than 10% of the catch have lengths less than 40 cm.

Do a calculation to decide whether the company is fined.

[2]
d.

A sample of 15 of the fish was weighed. The weight, W was plotted against length, L as shown below.

Exactly two of the following statements about the plot could be correct. Identify the two correct statements.

Note: You do not need to enter data in a GDC or to calculate r exactly.

(i) The value of r, the correlation coefficient, is approximately 0.871.

(ii) There is an exact linear relation between W and L.

(iii) The line of regression of W on L has equation W = 0.012L + 0.008.

(iv) There is negative correlation between the length and weight.

(v) The value of r, the correlation coefficient, is approximately 0.998.

(vi) The line of regression of W on L has equation W = 63.5L + 16.5.

[2]
e.



On one day 180 flights arrived at a particular airport. The distance travelled and the arrival status for each incoming flight was recorded. The flight was then classified as on time, slightly delayed, or heavily delayed.

The results are shown in the following table.

A \({\chi ^2}\) test is carried out at the 10 % significance level to determine whether the arrival status of incoming flights is independent of the distance travelled.

The critical value for this test is 7.779.

A flight is chosen at random from the 180 recorded flights.

State the alternative hypothesis.

[1]
a.

Calculate the expected frequency of flights travelling at most 500 km and arriving slightly delayed.

[2]
b.

Write down the number of degrees of freedom.

[1]
c.

Write down the \({\chi ^2}\) statistic.

[2]
d.i.

Write down the associated p-value.

[1]
d.ii.

State, with a reason, whether you would reject the null hypothesis.

[2]
e.

Write down the probability that this flight arrived on time.

[2]
f.

Given that this flight was not heavily delayed, find the probability that it travelled between 500 km and 5000 km.

[2]
g.

Two flights are chosen at random from those which were slightly delayed.

Find the probability that each of these flights travelled at least 5000 km.

[3]
h.



A random sample of 167 people who own mobile phones was used to collect data on the amount of time they spent per day using their phones. The results are displayed in the table below.

Manuel conducts a survey on a random sample of 751 people to see which television programme type they watch most from the following: Drama, Comedy, Film, News. The results are as follows.

Manuel decides to ignore the ages and to test at the 5 % level of significance whether the most watched programme type is independent of gender.

State the modal group.

[1]
i.a.

Use your graphic display calculator to calculate approximate values of the mean and standard deviation of the time spent per day on these mobile phones.

[3]
i.b.

On graph paper, draw a fully labelled histogram to represent the data.

[4]
i.c.

Draw a table with 2 rows and 4 columns of data so that Manuel can perform a chi-squared test.

[3]
ii.a.

State Manuel’s null hypothesis and alternative hypothesis.

[1]
ii.b.

Find the expected frequency for the number of females who had ‘Comedy’ as their most-watched programme type. Give your answer to the nearest whole number.

[2]
ii.c.

Using your graphic display calculator, or otherwise, find the chi-squared statistic for Manuel’s data.

[3]
ii.d.

(i) State the number of degrees of freedom available for this calculation.

(ii) State his conclusion.

[3]
ii.e.



Francesca is a chef in a restaurant. She cooks eight chickens and records their masses and cooking times. The mass m of each chicken, in kg, and its cooking time t, in minutes, are shown in the following table.

Draw a scatter diagram to show the relationship between the mass of a chicken and its cooking time. Use 2 cm to represent 0.5 kg on the horizontal axis and 1 cm to represent 10 minutes on the vertical axis.

[4]
a.

Write down for this set of data

(i) the mean mass, \(\bar m\) ;

(ii) the mean cooking time, \(\bar t\) .

[2]
b.

Label the point \({\text{M}}(\bar m,\bar t)\) on the scatter diagram.

[1]
c.

Draw the line of best fit on the scatter diagram.

[2]
d.

Using your line of best fit, estimate the cooking time, in minutes, for a 1.7 kg chicken.

[2]
e.

Write down the Pearson’s product–moment correlation coefficient, r.

[2]
f.

Using your value for r, comment on the correlation.

[2]
g.

The cooking time of an additional 2.0 kg chicken is recorded. If the mass and cooking time of this chicken are included in the data, the correlation is weak.

(i) Explain how the cooking time of this additional chicken might differ from that of the other eight chickens.

(ii) Explain how a new line of best fit might differ from that drawn in part (d).

[2]
h.
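
The quantities requested in parts (b), (d) and (f) are linked: the least-squares regression line always passes through the mean point M, and r is built from the same sums. A minimal sketch follows, using hypothetical data because Francesca's table is not reproduced here; the function name and the sample values are assumptions for illustration only.

```python
from math import sqrt


def regression_summary(xs, ys):
    """Return (mean_x, mean_y, slope, intercept, r) for the line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # forces the line through M
    r = sxy / sqrt(sxx * syy)            # Pearson's product-moment coefficient
    return mean_x, mean_y, slope, intercept, r


# Hypothetical data for illustration only (NOT the chicken masses and times).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y, slope, intercept, r = regression_summary(xs, ys)
print(mean_x, mean_y, round(slope, 3), round(intercept, 3), round(r, 3))
```

A single point far from the others can change both the slope and r noticeably, which is the effect part (h) asks about.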



A survey of \(400\) people is carried out by a market research organization in two different cities, Buenos Aires and Montevideo. The people are asked which brand of cereal they prefer out of Chocos, Zucos or Fruti. The table below summarizes their responses.

The following table shows the cost in \({\text{AUD}}\) of seven paperback books chosen at random, together with the number of pages in each book.

One person is chosen at random from those surveyed. Find the probability that this person

(i) does not prefer Zucos;

(ii) prefers Chocos, given that they live in Montevideo.

[4]
i.a.

Two people are chosen at random from those surveyed. Find the probability that they both prefer Fruti.

[3]
i.b.

The market research organization tests the survey data to determine whether the brand of cereal preferred is associated with a city. A chi-squared test at the \(5\% \) level of significance is performed.

State the null hypothesis.

[1]
i.c.

State the number of degrees of freedom.

[1]
i.d.

Show that the expected frequency for the number of people who live in Montevideo and prefer Zucos is \(63\).

[2]
i.e.

Write down the chi-squared statistic for this data.

[2]
i.f.

State whether the market research organization would accept the null hypothesis. Clearly justify your answer.

[2]
i.g.

Plot these pairs of values on a scatter diagram. Use a scale of \(1{\text{ cm}}\) to represent \(50\) pages on the horizontal axis and \(1{\text{ cm}}\) to represent \(1{\text{ AUD}}\) on the vertical axis.

[3]
ii.a.

Write down the linear correlation coefficient, \(r\), for the data.

[2]
ii.b.

Stephen wishes to buy a paperback book which has \(350\) pages in it. He plans to draw a line of best fit to determine the price. State whether or not this is an appropriate method in this case and justify your answer.

[2]
ii.c.



Part A

A university required all Science students to study one language for one year. A survey was carried out at the university amongst the 150 Science students. These students all studied one of either French, Spanish or Russian. The results of the survey are shown below.

Ludmila decides to use the \({\chi ^2}\) test at the \(5\% \) level of significance to determine whether the choice of language is independent of gender.

At the end of the year, only seven of the female Science students sat examinations in Science and French. The marks for these seven students are shown in the following table.

State Ludmila’s null hypothesis.

[1]
A.a.

Write down the number of degrees of freedom.

[1]
A.b.

Find the expected frequency for the females studying Spanish.

[2]
A.c.

Use your graphic display calculator to find the \({\chi ^2}\) test statistic for this data.

[2]
A.d.

State whether Ludmila accepts the null hypothesis. Give a reason for your answer.

[2]
A.e.

Draw a labelled scatter diagram for this data. Use a scale of \(2{\text{ cm}}\) to represent \(10{\text{ marks}}\) on the \(x\)-axis (\(S\)) and \(10{\text{ marks}}\) on the \(y\)-axis (\(F\)).

[4]
B.a.

Use your graphic display calculator to find

(i)     \({\bar S}\), the mean of \(S\);

(ii)    \({\bar F}\), the mean of \(F\).

[2]
B.b.

Plot the point \({\text{M}}(\bar S{\text{, }}\bar F)\) on your scatter diagram.

[1]
B.c.

Use your graphic display calculator to find the equation of the regression line of \(F\) on \(S\) .

[2]
B.d.

Draw the regression line on your scatter diagram.

[2]
B.e.

Carletta’s mark on the Science examination was \(44\). She did not sit the French examination.

Estimate Carletta’s mark for the French examination.

[2]
B.f.

Monique’s mark on the Science examination was 85. She did not sit the French examination. Her French teacher wants to use the regression line to estimate Monique’s mark.

State whether the mark obtained from the regression line for Monique’s French examination is reliable. Justify your answer.

[2]
B.g.



For an ecological study, Ernesto measured the average concentration \((y)\) of the fine dust, \({\text{PM}}10\), in the air at different distances \((x)\) from a power plant. His data are represented on the following scatter diagram. The concentration of \({\text{PM}}10\) is measured in micrograms per cubic metre and the distance is measured in kilometres.

His data are also listed in the following table.

Use the scatter diagram to find the value of \(a\) and of \(b\) in the table.

[2]
a.

Calculate

i)      \({\bar x}\) , the mean distance from the power plant;

ii)     \({\bar y}\) , the mean concentration of \({\text{PM}}10\) ;

iii)    \(r\) , the Pearson’s product–moment correlation coefficient.

[4]
b.

Write down the equation of the regression line \(y\) on \(x\) .

[2]
c.

Ernesto’s school is located \(14\,{\text{km}}\) from the power plant. He uses the equation of the regression line to estimate the concentration of \({\text{PM}}10\) in the air at his school.

i)     Calculate the value of Ernesto’s estimate.

ii)    State whether Ernesto’s estimate is reliable. Justify your answer.

[4]
d.



A manufacturer claims that fertilizer has an effect on the height of rice plants. He measures the height of fertilized and unfertilized plants. The results are given in the following table.

A chi-squared test is performed to decide if the manufacturer’s claim is justified at the 1 % level of significance.

The population of fleas on a dog after t days, is modelled by

\[N = 4 \times {(2)^{\frac{t}{4}}},{\text{ }}t \geqslant 0\]

Some values of N are shown in the table below.

Write down the null and alternative hypotheses for this test.

[2]
i, a.

For the number of fertilized plants with height greater than 75 cm, show that the expected value is 97.5.

[3]
i, b.

Write down the value of \(\chi_{calc}^2\).

[2]
i, c.

Write down the number of degrees of freedom.

[1]
i, d.

Is the manufacturer’s claim justified? Give a reason for your answer.

[2]
i, f.

Write down the value of p.

[1]
ii, a, i.

Write down the value of q.

[2]
ii, a, ii.

Using the values in the table above, draw the graph of N for 0 ≤ t ≤ 20. Use 1 cm to represent 2 days on the horizontal axis and 1 cm to represent 10 fleas on the vertical axis.

[6]
ii, b.

Use your graph to estimate the number of days for the population of fleas to reach 55.

[2]
ii, c.
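
The flea model \(N = 4 \times 2^{t/4}\) can be evaluated and inverted directly, which gives a check on a value read from the graph in part (ii, c). This is a minimal sketch; the function names are assumptions, and the sample values of t are chosen for illustration rather than taken from the table.

```python
from math import log2


def flea_population(t):
    """N = 4 * 2^(t/4), for t >= 0 days."""
    return 4 * 2 ** (t / 4)


def days_to_reach(target):
    """Invert the model: t = 4 * log2(N / 4)."""
    return 4 * log2(target / 4)


# The population doubles every 4 days.
for t in range(0, 21, 4):
    print(t, flea_population(t))

# Time for the population to reach 55 (a little over 15 days).
print(round(days_to_reach(55), 2))
```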



The weight, W, of basketball players in a tournament is found to be normally distributed with a mean of 65 kg and a standard deviation of 5 kg.

The probability that a basketball player has a weight that is within 1.5 standard deviations of the mean is q.

A basketball coach observed 60 of her players to determine whether their performance and their weight were independent of each other. Her observations were recorded as shown in the table.

She decided to conduct a \({\chi ^2}\) test for independence at the 5% significance level.

Find the probability that a basketball player has a weight that is less than 61 kg.

[2]
a.i.

In a training session there are 40 basketball players.

Find the expected number of players with a weight less than 61 kg in this training session.

[2]
a.ii.

Sketch a normal curve to represent this probability.

[2]
b.i.

Find the value of q.

[1]
b.ii.

Given that P(W > k) = 0.225, find the value of k.

[2]
c.

For this test state the null hypothesis.

[1]
d.i.

For this test find the p-value.

[2]
d.ii.

State a conclusion for this test. Justify your answer.

[2]
e.
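
The normal-distribution parts (a) to (c) use only the stated mean of 65 kg and standard deviation of 5 kg, so they can be sketched with Python's standard-library `statistics.NormalDist`. The variable names are assumptions for illustration; the chi-squared parts need the coach's table and are not covered here.

```python
from statistics import NormalDist

weights = NormalDist(mu=65, sigma=5)

# (a)(i) P(W < 61)
p_below_61 = weights.cdf(61)

# (a)(ii) expected number out of 40 players
expected_players = 40 * p_below_61

# (b)(ii) q = P(mean - 1.5 sd < W < mean + 1.5 sd)
q = weights.cdf(65 + 1.5 * 5) - weights.cdf(65 - 1.5 * 5)

# (c) P(W > k) = 0.225 is equivalent to P(W < k) = 0.775
k = weights.inv_cdf(1 - 0.225)

print(round(p_below_61, 3), round(expected_players, 2), round(q, 3), round(k, 1))
```

Converting the upper-tail condition in part (c) to a lower-tail probability is the step most often missed, since inverse-normal functions generally take the area to the left.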



The heat output in thermal units from burning \(1{\text{ kg}}\) of wood changes according to the wood’s percentage moisture content. The moisture content and heat output of \(10\) blocks of the same type of wood each weighing \(1{\text{ kg}}\) were measured. These are shown in the table.

Draw a scatter diagram to show the above data. Use a scale of \(2{\text{ cm}}\) to represent \(10\% \) on the x-axis and a scale of \(2{\text{ cm}}\) to represent \(10\) thermal units on the y-axis.

[4]
a.

Write down
(i)     the mean percentage moisture content, \(\bar x\) ;
(ii)    the mean heat output, \(\bar y\) .

[2]
b.

Plot the point \((\bar x{\text{, }}\bar y)\) on your scatter diagram and label this point M .

[2]
c.

Write down the product-moment correlation coefficient, \(r\) .

[2]
d.

The equation of the regression line \(y\) on \(x\) is \(y = - 0.470x + 83.7\) . Draw the regression line \(y\) on \(x\) on your scatter diagram.

[2]
e.

The equation of the regression line \(y\) on \(x\) is \(y = - 0.470x + 83.7\) . Estimate the heat output in thermal units of a \(1{\text{ kg}}\) block of wood that has \(25\% \) moisture content.

[2]
f.

The equation of the regression line \(y\) on \(x\) is \(y = - 0.470x + 83.7\) . State, with a reason, whether it is appropriate to use the regression line \(y\) on \(x\) to estimate the heat output in part (f).

[2]
g.
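
Part (f) is a direct substitution into the given regression line \(y = -0.470x + 83.7\). A minimal sketch, with a function name chosen for illustration:

```python
def heat_output(moisture_percent):
    """Regression line of y on x given in the question."""
    return -0.470 * moisture_percent + 83.7


estimate = heat_output(25)
print(round(estimate, 2))
```

Whether the estimate is appropriate for part (g) depends on whether 25% lies inside the range of the measured moisture contents and on the strength of the correlation, both of which must be read from the table.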



In an environmental study of plant diversity around a lake, a biologist collected data about the number of different plant species (y) that were growing at different distances (x) in metres from the lake shore.

Draw a scatter diagram to show the data. Use a scale of 2 cm to represent 10 metres on the x-axis and 2 cm to represent 10 plant species on the y-axis.

[4]
a.

Using your scatter diagram, describe the correlation between the number of different plant species and the distance from the lake shore.

[1]
b.

Use your graphic display calculator to write down \(\bar x\), the mean of the distances from the lake shore.

[1]
c.i.

Use your graphic display calculator to write down \(\bar y\), the mean number of plant species.

[1]
c.ii.

Plot the point (\(\bar x\), \(\bar y\)) on your scatter diagram. Label this point M.

[2]
d.

Write down the equation of the regression line y on x for the above data.

[2]
e.

Draw the regression line y on x on your scatter diagram.

[2]
f.

Estimate the number of plant species growing 30 metres from the lake shore.

[2]
g.



George leaves a cup of hot coffee to cool and measures its temperature every minute. His results are shown in the table below.

Write down the decrease in the temperature of the coffee

(i) during the first minute (between t = 0 and t = 1);

(ii) during the second minute;

(iii) during the third minute.

[3]
a.

Assuming the pattern in the answers to part (a) continues, show that \(k = 19\).

[2]
b.

Use the seven results in the table to draw a graph that shows how the temperature of the coffee changes during the first six minutes.

Use a scale of 2 cm to represent 1 minute on the horizontal axis and 1 cm to represent 10 °C on the vertical axis.

[4]
c.

The function that models the change in temperature of the coffee is \(y = p({2^{ - t}}) + q\).

(i) Use the values t = 0 and y = 94 to form an equation in p and q.

(ii) Use the values t = 1 and y = 54 to form a second equation in p and q.

[2]
d.

Solve the equations found in part (d) to find the value of p and the value of q.

[2]
e.

The graph of this function has a horizontal asymptote.

Write down the equation of this asymptote.

[2]
f.

George decides to model the change in temperature of the coffee with a linear function using correlation and linear regression.

Use the seven results in the table to write down

(i) the correlation coefficient;

(ii) the equation of the regression line y on t.

[4]
g.

Use the equation of the regression line to estimate the temperature of the coffee at t = 3.

[2]
h.

Find the percentage error in this estimate of the temperature of the coffee at t = 3.

[2]
i.
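
Parts (d) to (f) can be checked with a short calculation. The sketch below assumes the model is the decaying exponential \(y = p \cdot 2^{-t} + q\), which is consistent with a cooling cup and with \(k = 19\) in part (b); the function names are assumptions for illustration.

```python
def solve_p_q(y_at_0, y_at_1):
    """Solve p + q = y_at_0 and p/2 + q = y_at_1 for the assumed model."""
    # Subtracting the second equation from the first gives p/2 = y_at_0 - y_at_1.
    p = 2 * (y_at_0 - y_at_1)
    q = y_at_0 - p
    return p, q


def temperature(t, p, q):
    """Assumed model y = p * 2^(-t) + q."""
    return p * 2 ** (-t) + q


p, q = solve_p_q(94, 54)
print(p, q)

# As t grows, p * 2^(-t) tends to 0, so the horizontal asymptote is y = q.
for t in range(7):
    print(t, temperature(t, p, q))
```

Under this assumption the successive temperature drops halve each minute, which is the pattern part (b) relies on.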



A biologist is studying the relationship between the number of chirps of the Snowy Tree cricket and the air temperature. He records the chirp rate, \(x\), of a cricket, and the corresponding air temperature, \(T\), in degrees Celsius.

The following table gives the recorded values.

Draw the scatter diagram for the above data. Use a scale of 2 cm for 20 chirps on the horizontal axis and 2 cm for 4°C on the vertical axis.

[4]
a.

Use your graphic display calculator to write down the Pearson’s product–moment correlation coefficient, \(r\), between \(x\) and \(T\).

[2]
b.

Interpret the relationship between \(x\) and \(T\) using your value of \(r\).

[2]
c.

Use your graphic display calculator to write down the equation of the regression line \(T\) on \(x\). Give the equation in the form \(T = ax + b\).

[2]
d.

Calculate the air temperature when the cricket’s chirp rate is \(70\).

[2]
e.

Given that \(\bar x = 70\), draw the regression line \(T\) on \(x\) on your scatter diagram.

[2]
f.

A forest ranger uses her own formula for estimating the air temperature. She counts the number of chirps in 15 seconds, \(z\), multiplies this number by \(0.45\) and then she adds \(10\).

Write down the formula that the forest ranger uses for estimating the temperature, \(T\).

Give the equation in the form \(T = mz + n\).

[1]
g.

A cricket makes 20 chirps in 15 seconds.

For this chirp rate

(i)     calculate an estimate for the temperature, \(T\), using the forest ranger’s formula;

(ii)     determine the actual temperature recorded by the biologist, using the table above;

(iii)     calculate the percentage error in the forest ranger’s estimate for the temperature, compared to the actual temperature recorded by the biologist.

[6]
h.
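
Parts (g) and (h) combine a linear formula with a percentage-error calculation. A minimal sketch follows; the biologist's table is not reproduced here, so `hypothetical_actual` is a placeholder value and the function names are assumptions for illustration.

```python
def ranger_estimate(z):
    """Forest ranger's formula: T = 0.45 z + 10, with z chirps in 15 seconds."""
    return 0.45 * z + 10


def percentage_error(estimate, actual):
    """|estimate - actual| / actual * 100."""
    return abs(estimate - actual) / actual * 100


estimate = ranger_estimate(20)

hypothetical_actual = 20.0  # placeholder, NOT the value in the biologist's table
error = percentage_error(estimate, hypothetical_actual)
print(round(estimate, 2), round(error, 2))
```

The percentage error is always taken relative to the actual (recorded) value, not the estimate.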