Mathematical Statistics

This Course Guide has been taken from the most recent presentation of the course. It would be useful for reference purposes but please note that there may be updates for the following presentation.

STAT S347

Mathematical Statistics

Welcome to STAT S347 Mathematical Statistics. This course provides the mathematical theory underlying the methods and concepts used in practical statistical analyses. In this course, we adopt materials from The Open University (UK) course M347 Mathematical Statistics.

The course consists of 14 units, divided into four blocks.

  • Block 1: Review and distribution theory
  • Block 2: Classical inference
  • Block 3: Bayesian statistics
  • Block 4: Linear modelling

The first block (Units 1-3) starts with a review unit of basic statistical ideas and mathematical techniques required for the course, followed by two units introducing distribution theory. The second block (Units 4-7) is about the 'classical' approach to statistical inference, while the third block (Units 8-10) covers the 'Bayesian' approach to statistical inference. The subject of the final block (Units 11-14) is linear modelling, which is considered from both the classical and Bayesian viewpoints.

 

Professional accreditation

STAT S347 is one of the core modules to fulfill the requirements of the Graduate Statistician Membership professional award conferred by the Hong Kong Statistical Society (HKSS). For further information, see the HKSS website.

 

Prerequisite knowledge

The course assumes that you have a basic knowledge of the ideas and concepts of statistical science at the level of MATH S280 Statistical Methods for Decision Analysis. Relevant topics include:

  • histograms and scatterplots;
  • normal, Poisson and binomial distributions;
  • the Central Limit Theorem;
  • point estimation;
  • maximum likelihood estimation;
  • confidence intervals;
  • hypothesis testing;
  • simple linear regression;
  • correlation.

Developing the theory presented in the course also requires, at times, a considerable amount of mathematics. As such, you are expected to have a reasonable degree of mathematical competence, up to the level of MATH S122 Fundamental Applied Mathematics or MATH S221 Mathematical Methods. The most relevant mathematical techniques are:

  • logarithmic and exponential functions;
  • calculus (including Taylor series);
  • algebra (including manipulation of inequalities);
  • matrices.

All of these statistical and mathematical topics are reviewed in the course, but if you are not familiar with these areas from other courses you have studied, you are strongly advised to begin your study of Unit 1 as early as possible, and to allow plenty of time for working through that unit.

 

Course aims

This course aims to:

  • provide an advanced understanding of the principles of probability, standard univariate and multivariate continuous distributions and their applications in a variety of real-world problems;
  • develop a solid understanding of decision theory, point estimation, confidence intervals, and hypothesis testing;
  • equip students with a theoretical foundation in the methods of statistical inference with practical applications in data analysis;
  • provide practical training in Bayesian statistics and its applications in science, business, economic and financial mathematics; and
  • develop knowledge and understanding of the use of the Bayesian approach to modelling regression and generalised linear models.

Course learning outcomes

Upon completion of STAT S347, you should be able to:

  • discuss and apply the concepts of various probability distributions including chi-squared, Student's t, F, gamma and beta distributions;
  • perform basic operations with the bivariate normal distribution in both its general and 'standard' forms;
  • work with the expectation and variance of a linear combination of random variables;
  • understand the elements of hypothesis testing, including null and alternative hypotheses, significance level and power, the test statistic, null distribution, rejection region and p-value;
  • understand the formal statement of the Central Limit Theorem, use its result in practice, and appreciate its limitation to random variables with finite variance;
  • compare some popular link functions and their properties, and use general results for the exponential dispersion family to calculate the canonical link function for a member of the family; and
  • understand how Bayesian inference is set up as a statistical decision problem involving a set of possible decisions and a loss function measuring how good each decision is.

The following chart gives a general overview of the course structure.

Unit  Title                                        Weeks spent  Assessment activity (end of unit)
Block 1
1     Starting points                              2
2     Univariate continuous distribution theory    2
3     Multivariate continuous distribution theory  3            Assignment 1
Block 2
4     Basic ideas of statistical inference         2
5     Point estimation                             2
6     Hypothesis tests and confidence intervals    2
7     Asymptotic theory                            2            Assignment 2
Block 3
8     Prior to posterior                           2
9     Bayesian inference                           3
10    Markov chain Monte Carlo                     2            Assignment 3
Block 4
11    Linear regression                            2
12    Multiple regression                          2
13    General and generalised linear models        4            Assignment 4
14    Bayesian modelling                           3
      Revision                                     3
      Total                                        36

The original OU (UK) course M347 was delivered online. Here in this HKMU course, however, the content will mainly be delivered through printed units, with the PDF files uploaded to the course Online Learning Environment (OLE). You will therefore need to be aware of some special features you may come across as you work through this adapted HKMU print-based course. Firstly, you should ignore references to the course code M347 in the units and assume that they refer to STAT S347. The word 'module' means the course. In the original OU online course, animations and screencasts were linked directly into the units. When you come across references to these in STAT S347, please be aware that these items can be found on the course OLE instead of through a Web link. In addition, you may sometimes find icons in the units referring to certain terms. As these relate only to the original materials, please ignore them.

In this section, you will find further details about working through the course.

 

Study units

A summary of the contents of the units is provided below.

 

Unit 1

Unit 1 is largely a review unit. For some of you it might be wholly a review unit; for others it might introduce one or two additional things that you have not previously studied in detail. Unit 1 reviews some parts of the statistical background that you should already have, or can readily attain, in order to study the rest of the course; it also reviews the main mathematical techniques that will be employed during the course. (Review of other relevant elements of statistical background will be delayed until nearer to the time they are used, in Units 4 and 8, the first units of Blocks 2 and 3.)

The style of Unit 1 and the quantity and role of its extra exercises are somewhat different from later units, as described in the Unit 1 introduction.

The other two units in Block 1 comprise an introduction to the theory of continuous distributions, that is, models for quantities that vary randomly on a continuous scale.

 

Unit 2

Unit 2 specifically concerns models for 'univariate' continuous random variables. 'Univariate' is the statistician's favoured word for 'one-dimensional'. The unit concerns a number of basic properties of univariate continuous distributions, many of which you are probably already familiar with. (No real problem if you aren't; the unit takes things pretty much from scratch.)

In Unit 2, however, these properties are developed in a more mathematical manner than you might have seen before. You will develop the skills to calculate and derive formulae for these quantities yourself. Calculus, especially integration, will be particularly important here. (This is one of the mathematical techniques reviewed in Unit 1.)

 

Unit 3

In Unit 3, the mathematical structure of 'multivariate' continuous distributions will be explored. 'Multivariate' is statistician-speak for 'multi-dimensional'. Once there is more than one variable involved, new issues arise about the way variables depend on one another. The joint behaviour of collections of variables is important. Multivariate variables therefore have joint distributions. These are defined in Unit 3, although some important aspects of joint distributions are still univariate, such as distributions of individual variables (so-called 'marginal distributions') and distributions of individual variables conditional on the values of other variables (so-called 'conditional distributions').

Dependence between variables can partly be understood through the concepts of 'covariance' and 'correlation', which are also investigated in this unit. Another statistical notion that you are probably already aware of, concerned with dependence structure, is regression (and allied methods). This will not be studied until Block 4.

Many of the general issues concerning dependence between several variables are present in the bivariate case, so much of Unit 3 takes place in this context, with extension to the full multivariate case only towards its end. (Yes, you have the idea by now: 'bivariate' is statistician-speak for 'bi-dimensional', or 'two-dimensional'.)

You should be warned that Unit 3 is the longest in the whole course although, of course, you are given correspondingly longer time in which to study it.

 

Unit 4

Block 2 starts in Unit 4 with a review of the key concepts of classical statistical inference. All of these concepts are introduced without mathematical detail, the purpose being to review the basic ideas of classical inference before Units 5–7 delve into the underlying mathematical statistics. Unit 4 is quite short; indeed, it is the shortest unit in the whole course.

 

Unit 5

It is often desirable to give a single estimate for an unknown parameter. The process of obtaining such a single estimate is known as 'point estimation' and is the subject of Unit 5. The main method of point estimation considered in Unit 5 is 'maximum likelihood estimation', in which a single estimate for a parameter is found by maximizing the 'likelihood function' using calculus. A number of general properties associated with point estimation are also considered in Unit 5.
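As a minimal sketch of the idea of maximising a likelihood by calculus, the Python fragment below assumes an exponential model with unknown rate and a small made-up data set; neither the model nor the data come from the course units, and the function names are hypothetical. Setting the derivative of the log-likelihood, n/rate - sum(data), to zero gives the closed-form estimate n/sum(data).

```python
import math


def exponential_log_likelihood(rate, data):
    """Log-likelihood of an exponential model with the given rate."""
    n = len(data)
    return n * math.log(rate) - rate * sum(data)


def exponential_mle(data):
    """Maximum likelihood estimate of the rate.

    Differentiating the log-likelihood with respect to the rate gives
    n / rate - sum(data); setting this to zero yields n / sum(data).
    """
    return len(data) / sum(data)


# Made-up waiting times, for illustration only.
waiting_times = [0.8, 1.9, 0.4, 2.6, 1.3]
rate_hat = exponential_mle(waiting_times)
print(f"estimated rate: {rate_hat:.3f}")
```

Evaluating exponential_log_likelihood at values slightly above and below rate_hat gives smaller values, which is a quick numerical check that the stationary point is a maximum.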

 

Unit 6

Because of sampling variability, it is almost certain that a single estimate of a parameter is not equal to the true value of the parameter. It may therefore be desirable to obtain instead an interval of values which one is confident contains the true value of the parameter. Such intervals are known as 'confidence intervals' and are explored in Unit 6. The majority of Unit 6 is taken up with the related concept of 'hypothesis tests', which considers evidence for and against contrasting hypotheses about the true value, or range of values, of a parameter. General, principled, approaches are taken to derive hypothesis tests and confidence intervals in this unit, in contrast, perhaps, to more ad hoc or specific approaches that you might have encountered before (and will have been briefly reminded of in Unit 4).

 

Unit 7

Finally in Block 2, Unit 7 explores 'asymptotic theory' which describes the behaviour of quantities based on a sample of observations of a random variable as the sample size gets large. Asymptotic theory is very important to statistics as it provides the theoretical justification for many of the approximations — with some of which you are already familiar, such as the Central Limit Theorem — which are applied to practical problems. As well as a number of important general results, asymptotic properties of maximum likelihood estimation are investigated in some detail in this unit.

Unit 7 is probably the most 'abstractly mathematical' in STAT S347, and so might have a rather different feel than most other units in the course. (If you don't like it too much, be assured that the rest of the course is not written in the same vein.)

 

Units 8 and 9

As you might have suspected, Bayes' Theorem plays a major role in the Bayesian approach to statistics and is used to combine the information contained in the data with any information external to the data. Unit 8 focuses on the process of using Bayes' Theorem to combine the two sources of information about any unknown parameters. How this combined information can then be used for Bayesian inference is the subject of Unit 9.
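To illustrate how Bayes' Theorem combines the two sources of information, here is a small sketch under assumed conditions: a Beta prior for an unknown success probability and binomial data. This particular model, the numbers and the function names are illustrative choices, not taken from the units. With a Beta(a, b) prior and s successes in s + f trials, the posterior is Beta(a + s, b + f).

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Return the parameters of the Beta posterior.

    Posterior is proportional to likelihood times prior; for a Beta prior
    and binomial data this simply adds the counts to the prior parameters.
    """
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Illustrative prior belief and made-up data.
prior = (2.0, 2.0)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print("prior mean:    ", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

The posterior mean lies between the prior mean and the observed proportion of successes, which shows the data and the external information being weighed against each other.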

 

Unit 10

Bayesian statistics can be computationally difficult to implement in practice. Unit 10 explores a computational technique known as Markov chain Monte Carlo. This technique is the principal reason why Bayesian statistics became computationally feasible, and consequently popular, in the late 20th century.
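For a flavour of what a Markov chain Monte Carlo method looks like, the sketch below implements a random-walk Metropolis sampler, one common member of this family; whether Unit 10 presents this exact variant is an assumption. The target is taken to be a standard normal density (known only up to a constant), and the step size, seed and function names are illustrative.

```python
import math
import random


def log_target(x):
    """Log of an unnormalised standard normal density (assumed target)."""
    return -0.5 * x * x


def metropolis(log_density, n_samples, start=0.0, step=1.0, seed=0):
    """Random-walk Metropolis sampler returning a list of draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current value
    return samples


draws = metropolis(log_target, n_samples=20000)
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
print(f"mean ~ {sample_mean:.2f}, variance ~ {sample_var:.2f}")
```

Only ratios of the target density are used, so the normalising constant is never needed; this is what makes the approach attractive for posterior distributions that cannot be normalised analytically.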

 

Unit 11

Block 4 starts, in Unit 11, with a detailed study of linear regression with a single explanatory variable. The random variation in the response variable over and above the contribution to its value made by the explanatory variable is here modelled by a normal distribution.

Linear regression with one explanatory variable is treated first in a classical manner (resulting in various formulae with which you might already be familiar), and then in a Bayesian manner. In this unit, the Bayesian treatment is restricted to a specific improper prior, which leads to the same results as the classical case, albeit with a different interpretation.
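The familiar classical formulae for a single explanatory variable are the least-squares estimates of the intercept and slope, which coincide with the maximum likelihood estimates when the errors are normal. The sketch below computes them; the data and the function name are made up for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (intercept, slope) for y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data, roughly following a straight line.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept_hat, slope_hat = fit_simple_linear_regression(x_values, y_values)
print(f"fitted line: y = {intercept_hat:.2f} + {slope_hat:.2f} x")
```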

 

Unit 12

Unit 12 extends the ideas of Unit 11 to 'multiple regression', the name given to linear regression with two or more explanatory variables.

This unit is a little shorter than other 'full length' (whatever that might be!) units.

 

Unit 13

The normality assumption made in Units 11 and 12 is, in some ways, rather limiting, and Unit 13 considers two important extensions to such regression models.

  • The 'General Linear Model', which despite its grand title is a relatively minor extension of multiple regression, discards normality, but continues with normality-related 'least squares' parameter estimation. The focus here is more on the structure of the explanatory variables and how to cope with awkward, but practically important, cases (such as the effects of 'treatments' in medical, industrial and agricultural experiments).
  • The 'Generalised Linear Model', another grand title, is a more far-reaching extension of regression. It allows a much wider variety of response variable types in the same unified framework, no longer needing to be normally distributed or even continuous; responses can even be binary.

Unit 14

Unit 14 considers further Bayesian linear modelling, incorporating more general prior information than used in Unit 11. This unit will explore the Bayesian approach to modelling each of the following:

  • linear regression models with one explanatory variable;
  • multiple regression models;
  • the generalised linear model.

This closing unit is between a half and two-thirds the length of most units in the course.

 

Format of the units

The units contain the following elements:

  • Main course text
  • Exercises — see the subsection below for more details
  • Examples — to assist with your learning
  • Bold terms within text — to highlight important terms
  • Boxed material — to emphasize important material
  • Animations — see the subsection below for more details
  • Screencasts and audio — see the subsection below for more details.

Exercises

Throughout STAT S347 there are many exercises integrated into each unit. To help your learning you should try to do each exercise as you come to it, and so for this purpose you should always have a pen and paper handy next to you. The solutions to each part of an exercise can be found at the end of the unit.

A note on accuracy: It is worth noting that statisticians are pretty relaxed about the number of decimal places or significant figures to which numerical answers are given. In the numerical exercises in the text, a desired number of decimal places is sometimes specified. Often, however, you are not asked to display your answers to a given number of decimal places. In such cases, you should use a sensible number, displaying your final answer in a way that is consistent with your intermediate working. Remember, there is no sense in being ultra-exact in (most of) your numerical computations when the statistical modelling process concerns numerous assumptions that are rarely exactly true in practice. Modelling approximations therefore correspond to approximations in answers to real-life questions, the inaccuracies of which far outweigh mathematical worries about, say, the fifth or sixth decimal place.

Having said this, if you are asked to give your answer to a specified number of decimal places in an assignment or the exam, do follow the instructions you are given.

Animations

Many of the units contain animations which are designed to help your understanding of various aspects of STAT S347. The animations can be viewed in the Course Materials section of the course OLE. To enhance your learning, we recommend that you view the animations as you come to them.

If an animation has an associated audio description, you will need to have speakers or headphones connected to your computer in order to hear the audio.

Screencasts and audio

Several units have screencasts, which are short audio-visual presentations explaining a particular aspect of a unit. You will need to have speakers or headphones connected to your computer to hear the screencast. When a screencast is referred to in a unit, turn to the Course Materials section of the OLE to find and play it.

 

Extra exercises

In addition to the exercises which are integrated into the units and which you should attempt, each unit also has a set of extra exercises which is optional. The extra exercises for each unit can be found in PDF form on the course OLE. Please note that these are in soft copy format only and you will not be sent a printed version. If you feel that you would like to have some extra practice with a particular topic, then it would be a good idea to have a look at the extra exercises.

You might also find the extra exercises useful for your revision. But please do not feel that you need to do all (or indeed any!) of the extra exercises: they are there as additional help for those students who would like to use them. The extra exercises have been written so that you can 'dip into' them and do as many (or as few) as you wish to do. As such, even though many of the extra exercises do follow on from each other, they are written as 'stand-alone' exercises and each extra exercise is written on the assumption that students may not have done any of the previous extra exercises.

Optional material

Some units (specifically Units 5, 6, 9 and 13) have some optional material associated with them. This material has been uploaded to the OLE for completeness of the course for those of you who would like to see proofs of some of the more difficult results in these units. It should be emphasized, though, that you do not have to look at this material and it certainly won't be assessed: the optional material is most definitely optional!

 

The OLE

As mentioned previously, you will need to access the OLE in order to access some of the course materials. In addition, you can use the OLE to submit your assignments, view course announcements and communicate with your tutor and fellow students on the course discussion board.

 

Presentation Schedule

The Presentation Schedule for this course can be found on the course OLE. It shows you how long to spend on each unit and when to attend tutorials and submit assignments.

 

Equipment needed

Calculator

You will need a calculator with basic mathematical functions (exp, log, square root, etc.), but not necessarily with statistical functions. You will be allowed to bring a calculator into the examination, but only an HKMU-approved model. A list of approved calculator models can be found on the STAT S347 OLE.

Home computer

You will also need to have access to a computer with an Internet connection to access the course OLE.

STAT S347 assessment consists of continuous assessment and a final examination. The continuous assessment counts for 30% and the final examination counts for 70% of the overall grade.

To pass STAT S347 you must obtain at least 40% in both the continuous assessment and the final examination.

 

Assignment booklets

The assignment booklets contain more information about which units are covered by each assignment, and when you should submit your assignments. The assignment booklets will be sent to you during the presentation and posted on the OLE for you to download.

 

Continuous assessment

The continuous assessment for STAT S347 consists of four assignments. The best three out of the four assignments will count towards the final continuous assessment mark. Your tutor will mark your assignments and return them to you with your scores, comments and feedback.

Type          Coverage                                 Weighting
Assignment 1  3–4 questions covering Units 2 to 4      33.3%
Assignment 2  3–4 questions covering Units 5 to 7      33.3%
Assignment 3  3–4 questions covering Units 8 to 10     33.3%
Assignment 4  3–4 questions covering Units 11 to 13    33.3%

Requirements: 3 out of 4 assignments are required. Total: 100%
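The arithmetic behind the assessment scheme can be sketched as follows. This is an illustration only, under stated assumptions: all marks are percentages out of 100, the best three assignments are weighted equally (so the continuous assessment mark is their average), and the pass rule is read as requiring at least 40% in each component. The function names are hypothetical.

```python
def continuous_assessment_mark(assignment_marks):
    """Average of the best three of the four assignment marks."""
    best_three = sorted(assignment_marks, reverse=True)[:3]
    return sum(best_three) / 3


def overall_result(assignment_marks, exam_mark):
    """Return (overall percentage, passed) for the course.

    Assumes a 30% / 70% split between continuous assessment and the
    examination, and a 40% threshold applied to each component.
    """
    ca_mark = continuous_assessment_mark(assignment_marks)
    overall = 0.3 * ca_mark + 0.7 * exam_mark
    passed = ca_mark >= 40 and exam_mark >= 40
    return overall, passed


# Made-up marks: one assignment was not submitted, so it scores 0
# and is dropped as the lowest of the four.
overall, passed = overall_result([50, 80, 0, 65], exam_mark=60)
print(f"overall: {overall:.1f}%, passed: {passed}")
```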

 

Examination

There is a three-hour examination at the end of the course which is based on the whole course. The passing threshold is 40% for the examination.

Unlike other HKMU statistics courses, there is no Formula Handbook associated with STAT S347. Any non-trivial formulae required in the examination will be included with the examination paper itself. Some will be printed in the examination questions.

The exam paper is divided into two parts:

  • Part I contains short-answer and multiple-choice (MC) questions that assess your general knowledge of the course material from all units.
  • Part II comprises several challenging long questions based on a problem-solving approach. The questions will assess your ability to use methods of Bayesian statistics and use statistical inference for solving problems.

Specimen examination paper

To help you prepare for the examination, you will be given a specimen examination paper on the STAT S347 OLE. You should work through it carefully together with the sample solutions.

Several kinds of HKMU support are available to you during the course. They include:

  • direct tutor support; and
  • electronic support.

Direct support

The course supports you through telephone tutoring, tutorials and surgeries.

Tutors

Each student is assigned a personal tutor. Your assignments will be marked and commented on by your tutor, who will keep an eye on your progress and assist you if you encounter problems during the course. Marked assignments will be returned to you as soon as possible.

It is good practice to keep a copy of each assignment submitted for marking, so that you can refer to it when raising queries with your tutor by telephone. Please contact your tutor if any of the following arise:

  1. You do not understand any part of the study units or the assigned readings.
  2. You have any difficulty with self-tests.
  3. You have a question or problem with the assignments, or with your tutor's comments or grading on an assignment.

Telephone tutoring

If you have any difficulties in your studies, you may consult your tutor by telephone in the assigned time slots. Telephone tutoring is available for up to four hours per week. During telephone tutoring, you can seek advice on the study topics, guidance on assignments, and help in preparing for the examination.

Tutorials

The course includes five tutorial meetings of two hours each (10 contact hours in total). The tutorials give you an opportunity to receive guidance from the tutors on your progress through the course, and to share your study experiences and difficulties in group discussions with your peers. You may bring along to the tutorial any queries on the study units, assignments and specimen examination paper. Although the tutorials are not compulsory, you are encouraged to attend as many of them as possible.

Details of the dates, times and location of the tutorials as well as the name and phone number of your tutor will be sent to you in due course.

Surgeries

As a supplement to the telephone tutoring, the course also supplies ten surgery sessions. An assigned tutor will take care of each surgery. Each surgery aims to provide face-to-face consultation on individual students' study problem areas. You may bring along to the surgeries any queries on the study units, assignments and specimen examination paper.

 

Electronic support

Electronic mail

You may also submit your study problems to your assigned tutor through email.

Email gives both tutors and students more flexibility than telephone tutoring, particularly for resolving more technical issues.

OLE

As mentioned earlier, a course webpage will be established through the OLE for disseminating the latest information on the course, course announcements, course scheduling, and assignment submission.

STAT S347 Mathematical Statistics is a 14-unit course structured around four blocks:

  • Block 1: Review and distribution theory
  • Block 2: Classical inference
  • Block 3: Bayesian statistics
  • Block 4: Linear modelling

In addition to the printed study units, you will also be provided with extra exercises to work through on the course OLE in order to practise your skills. Support will be provided throughout the course by your tutor both online and in face-to-face sessions.

Good luck and enjoy the course!

The original OU course M347 Mathematical Statistics was produced by the following team.

Course team chairs: Chris Jones and Catriona Queen

Academic contributors: Paddy Farrington, Ian Martin, Jane Mitchell and Heather Whitaker

External assessor: Merrilee Hurn (University of Bath)

Curriculum manager: Gloria Baldi

Media project manager: Stephen Clift

Editorial media developer: Lucinda Simpson

Technical developers: Robert Hasson and Jonathan Fine

Interactive media developers: Callum Lester and Lynn Short

Graphics media developer: Jon Owen

With the assistance of: Andy Allum, Karim Anaya-Izquierdo, Robert Brignall, Alison Cadle, Jim Campbell, Martin Chiverton, Heather Clark, Mark Daniels, Anna Edgley-Smith, Rafael Hidalgo, Carol Houghton, Alba Madriz, Kevin McConway, Sandy Nicholson, Angela Noufaily, Steve Rycroft, Sue Stavert, Martin Stephenson and Jane Williams.