Linear Statistical Modelling


This Course Guide has been taken from the most recent presentation of the course. It would be useful for reference purposes but please note that there may be updates for the following presentation.

MATH S346

Linear Statistical Modelling

Welcome to MATH S346 Linear Statistical Modelling.

This high-level course is designed to be a second course in the statistical modelling of data. Your first course should have given you a wide-ranging and realistic introduction to the modern practice of the science of statistics: MATH S248 Analyzing Data or MATH S280 Statistical methods for decision analysis fits that bill perfectly. Indeed, since much of the outlook and approach of MATH S346 mirrors that of MATH S248/MATH S280, a few introductory words from the MATH S248/MATH S280 Course Guide remain pertinent.

Few things in the world are clear-cut: it is rare that when information is sought to shed light on some question of interest, a definite and definitive answer emerges. Instead, a myriad of complications and provisos and apparently random elements combine to muddy the waters. Making sense of data, and hence obtaining useful and well-founded answers to important questions, is the major goal of statistics in general, and hence of this course. (Deciding what data to collect and how best to do so is another very important element of statistics.)

The statistical analysis of data is usually based on some kind of modelling of the situation of interest. However, large amounts of data and realistic models together lead to many difficult calculations. This is where the power of computers comes in. The computer can take the drudgery out of the computational side of statistics, and thus leave the analyst free to concentrate on understanding and interpreting the results.

This course is built around a powerful commercial statistical software package called GENSTAT and you will be taught how to use many of its features. A student version of GENSTAT is provided with the course and this runs under Windows and takes full advantage of the Windows interface (see Section 3.3 of this guide).

MATH S346 focuses on the statistical modelling of data using a wide variety of models that have, in a sense described in the course, a linear element to them. This may sound, at the beginning of the course, as though MATH S346 deals only with a narrow specialism, but not so: it should soon become clear that the term 'linear statistical modelling' covers a huge area of statistical science, providing a wide range of practically important and useful statistical tools. In a way, linear statistical models are just extensions of the simple linear regression model with which you should already be familiar. The latter model links a normally distributed response variable to a single quantitative explanatory variable; the more general models of this course also explore how a response variable depends on explanatory variables, but the response variable need not be normally distributed, nor even continuous (it could even just take values 0 or 1) and the explanatory variables can number more, perhaps many more, than one and take any of a variety of types.
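As a reminder of the starting point, the simple linear regression model fits a straight line, response = intercept + slope × explanatory variable, by least squares. The sketch below is only an illustration in Python: the course itself carries out such fits in GENSTAT, and the data values and function name here are hypothetical, invented for this example.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: one quantitative explanatory variable, one response.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_linear_regression(x, y)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
```

The more general models in the course keep this linear combination of explanatory variables but allow several of them, and allow the response to follow distributions other than the normal.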

In common with MATH S248/MATH S280, in MATH S346 the subject is developed mainly through the computer-aided analysis of data, with just a little supporting theory where necessary. Virtually all of the datasets, which are taken from a huge variety of sources and subject areas, have arisen from real-world examples addressing real-world problems.

 

1.1 The purpose of the Course Guide

This Course Guide will help you become familiar with the overall shape and structure of MATH S346. You'll also be introduced to all the components of the course. You'll find information about how the course is run and what you have to do to pass it. The real purpose of the Course Guide is to help prepare you to get the most out of the course.

You should read through this Course Guide very carefully as the first step in your study of MATH S346, and refer back to it later, whenever you have questions about how the course is run. We expect you to spend some time in the first week of MATH S346 working through the Course Guide and becoming familiar with all the parts of the course. You can begin to learn more about MATH S346 by becoming familiar with the aims and objectives for the course.

 

1.2 Course Aims

This course is about the statistical modelling of situations in which a response variable depends on at least one explanatory variable. It offers a practical treatment of an important area of statistical methodology, applicable in a wide variety of situations. This course aims to enable students to deal with questions such as how well patients will respond to treatment given their age and the severity of their disease; or how different strains of wheat compare when grown in various conditions; or how loss due to abrasion may depend on the hardness and tensile strength of samples of rubber, etc.

 

1.3 Course Objectives

  • To understand the statistical methods taught in the course, including simple linear regression, one-way analysis of variance, multiple linear regression, two-way factorial analysis of variance, binary regression, Poisson regression and loglinear models for contingency tables.
  • To decide which methods are appropriate to answer a given question.
  • To apply the methods and interpret the answers appropriately.

2.1 Preparatory work

The recommended prerequisite for this course is MATH S248 Analyzing Data or MATH S280 Statistical methods for decision analysis. There is no need to revise the prerequisite before MATH S346 starts because Chapters 1 and 3 of MATH S346 are largely concerned with revision of the most relevant topics from MATH S248/MATH S280. These topics include: histograms, probability plots and transformations; the normal, t, χ², Bernoulli, binomial and Poisson distributions; confidence intervals and hypothesis testing, including t testing; maximum likelihood estimation; the central limit theorem and normal-based confidence intervals; correlation; and, of course, various aspects of simple linear regression, such as: fitting lines and making inferences; prediction intervals; and using residuals to check the assumptions.

Students who studied MATH S280 (or the earlier course MATH S248) will also have covered the necessary statistical topics, but with a slightly different emphasis: the MATH S248/MATH S280/MATH S346 approach can be picked up in Chapters 1 and 3. Other students are expected to have knowledge of the same topics, and can also 'get up to speed' from these chapters.

Mathematically, you need to be able to understand simple tables and graphs. Some familiarity with basic calculus and matrix algebra would be helpful, but is not essential. Complicated algebraic arguments are kept to a minimum. The course relies on computer-aided analysis of data rather than on pencil-and-paper arithmetic (because this is what statisticians do in practice). Students who have successfully studied MATH S248/MATH S280 will be well prepared.

 

2.2 Calculator

You will need a calculator with the basic mathematical functions and a memory. Basic statistical functions (sample summaries and linear regression) are also recommended but are not absolutely essential. However, your calculator must be one of the models on the list of approved calculators. The list will be posted on the OLE.

 

2.3 Computer

You will need access to a computer capable of running the GENSTAT software. If possible, you should install the software before the course begins. This will give you time to sort out any installation difficulties before you start studying Chapter 2 of the course book, where you are first required to use the software.

This course is developed by the Open University, UK, and therefore you will see the course code M346 on the cover and in the course material.

 

3.1 Course units

The core material for this course is the book Statistical Modelling Using GENSTAT, which is written by the Open University, UK. Its contents are as follows.

Chapter 1 Introduction and Review of statistical concepts

Chapter 2 Introduction to GENSTAT

Chapter 3 Linear regression with one explanatory variable

Chapter 4 One-way analysis of variance

Chapter 5 Multiple linear regression

Chapter 6 Analysis of factorial experiments

Chapter 7 Experiments with blocking

Chapter 8 Binary regression

Chapter 9 What are generalized linear models?

Chapter 10 Diagnostic checking

Chapter 11 Loglinear models for contingency tables

Chapter 12 Further data analyses

Each chapter concludes with a brief summary of the main methodological content of the chapter. A final short postscript section, which is not assessed, mentions yet more statistical modelling tools, related to those in the course, but not covered in detail.

The course is based on data and motivated by problems. The book therefore contains many sets of data. A number of examples analyzing these datasets are worked through in the text and lots of exercises for you to do yourself are also provided. The majority of these will require use of your computer.

Each chapter, from Chapters 1 to 12, is timetabled for one to four weeks' study; see the Presentation Schedule.

The numbering of examples, exercises, tables and figures restarts in each chapter: for example, Exercise 4.12 refers to the twelfth exercise in Chapter 4 while Figure 11.3 is the third figure in Chapter 11.

 

3.2 Solutions to exercises

Solutions are provided to all the exercises. Because you will have the opportunity, through the exercises, to work a number of ideas out for yourself, solutions to some of the exercises are quite long and, it is hoped, informative!

 

3.3 GENSTAT

GENSTAT is introduced in Chapter 2 of the main course text, with the assistance of the Genstat Guide. You need not familiarize yourself with GENSTAT at all before then. GENSTAT will be introduced from scratch, and all students, regardless of statistical computing background (provided you know the basics of Windows), will be on a level playing field. The remainder of the course, from Chapter 2 onwards, will always assume use of GENSTAT and indeed will be explicit about how to use GENSTAT to achieve the necessary ends.

 

3.4 Handbook

A Handbook is provided to give you a convenient source of basic definitions and concepts for use throughout the year and during the examination. The Handbook entries are slightly expanded versions of the summaries of methodology given at the end of the course chapters. You will not be allowed to bring your own copy of the Handbook to the examination. Instead, another copy of the Handbook will be given to you together with the examination paper.

 

3.5 Broadcasts

There are no TV or radio programmes associated with this course. Nor is there any recorded audio or video material.

 

3.6 Presentation Schedule

The Presentation Schedule sets out an overall schedule for the study weeks for each unit. Assignment cut-off dates and the dates for face-to-face sessions are also incorporated.

Although the suggested study week for each unit is for your reference only, it is important to keep to the schedule. For most assignments, the cut-off date is very soon after the end of the study week for the last of the relevant units. We recommend that you try to finish the assignment questions for each unit as soon as you finish the unit. Otherwise, you will have a lot of work to do in the few days before each assignment cut-off date.

 

3.7 Stop presses

Stop presses act as a sort of course newsletter, containing useful information of various types. Note that all the stop presses will be posted on the OLE only and no hard copy will be sent to you. You should check the OLE regularly for new stop presses.

 

3.8 Errata

The errata list the mistakes found in any of the printed course materials. When you read the errata, you should correct your copy of the text immediately. Note that all the errata will be posted on the OLE only.

Your mastery of the materials in the course will be tested at regular intervals through tutor-marked assignments (TMAs). Your marks for this type of continuous assessment will be combined with your final examination mark to produce an overall result for the course.

 

4.1 Continuous assessment

The assignments of the course are arranged as below. (See the Presentation Schedule for their cut-off dates.)

There are four TMAs, each with equal weighting, from which the best three scores will be used to count towards your final continuous assessment grade. It is to your advantage to submit all the assignments. Note that all the assignments will be posted on the OLE and you can download the assignments if necessary.

You can choose to submit your TMAs to your tutor through the assignment system in the OLE (e-submission) or by post. If you use e-submission, make sure that any graphs and diagrams display clearly. All your TMAs will be returned to you with comments. All your TMAs must reach your tutor by the cut-off dates.

Overall, the continuous assessment contributes 30% of your final score for the course. See the Assignment Booklet for a detailed breakdown of marks within each assignment.

 

4.2 The examination

There is a three-hour examination at the end of the course, based upon the whole course. You will be sent a specimen examination paper, which you should work through carefully some time before the examination. Sample solutions will be provided. The examination mark constitutes the remaining 70% of your overall score for the course.

 

4.3 How to pass the course

You will be awarded a grade for each TMA that you submit (and zero for each assignment that is not submitted), and the overall continuous assessment score is the average of your best three individual TMA grades. The preliminary overall course score is calculated as 30% of your overall continuous assessment score plus 70% of your final examination mark. If your overall continuous assessment score and your final examination mark are both higher than 40%, then you will be certain to pass MATH S346.
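The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming all marks are on a 0 to 100 scale. The function names and the marks are hypothetical, and the final check encodes only the sufficient condition for passing stated in this guide.

```python
def continuous_assessment_score(tma_marks):
    """Average of the best three of the four TMA marks.

    A TMA that was not submitted should be entered as 0.
    """
    if len(tma_marks) != 4:
        raise ValueError("expected exactly four TMA marks")
    best_three = sorted(tma_marks, reverse=True)[:3]
    return sum(best_three) / 3


def overall_course_score(tma_marks, exam_mark):
    """30% continuous assessment plus 70% examination."""
    return 0.3 * continuous_assessment_score(tma_marks) + 0.7 * exam_mark


def certain_to_pass(tma_marks, exam_mark):
    """True when both components are higher than 40 (sufficient condition)."""
    return continuous_assessment_score(tma_marks) > 40 and exam_mark > 40


# Hypothetical student: TMA 2 was not submitted, so it counts as zero.
tma_marks = [62, 0, 75, 58]
exam_mark = 55

ca_score = continuous_assessment_score(tma_marks)
overall = overall_course_score(tma_marks, exam_mark)
print(f"continuous assessment: {ca_score:.1f}")
print(f"overall course score:  {overall:.1f}")
print(f"certain to pass:       {certain_to_pass(tma_marks, exam_mark)}")
```

In this example the missed TMA is the one dropped, which is why submitting all four assignments can only help your continuous assessment score.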

In keeping with the nature of distance learning, face-to-face sessions are designed to provide supplementary support. They are therefore not compulsory: you can attend them according to your own needs, and there is no penalty for being absent.

 

5.1 Tutorials

There are tutorial sessions scheduled throughout the course. These sessions will be conducted by your tutor. It is hoped that you will attend, although attendance at these sessions is not compulsory. You should take with you any current course material (including Solutions to the Exercises) and your calculator. In particular, you should take along any notes that you have made, to remind yourself of points from the course material you would like to discuss. The precise times, dates and locations of your tutorial sessions will be posted on the OLE and sent to you through your University email account (see the 'Using Email at HKMU' booklet). The tentative dates on which tutorials are expected to be held have already been indicated on the Presentation Schedule.

 

5.2 Surgeries

There is one surgery session scheduled before each TMA cut-off date. A tutor will be in charge of these surgery sessions. There is no lecture during the surgeries. You are expected to bring your own questions when you attend the surgeries. A student can sit next to the tutor and ask him/her questions. The precise times, dates and locations of your surgery sessions will be posted on the OLE and sent to you through your University email account (see the 'Using Email at HKMU' booklet). The tentative dates on which surgeries are expected to be held have already been indicated on the Presentation Schedule.

In the face-to-face sessions described in the previous section, you will have chances to seek the help of HKMU staff, especially your tutor, if you have any problems studying the course. But distance learners like you often need help at other times. We've designed a number of other ways for you to get the help you need to succeed in this course, whatever point of the course you have reached. Some of these are described below.

 

6.1 From your tutor

Your tutor is there to help you understand the ideas in the course and the best way for him/her to do this is through the comments given on your TMAs. Go through the script and take note of the comments written by your tutor. They will help you to avoid similar errors in later assignments and in the examination. Try to attend tutorials and/or surgeries, because that is where you will have the opportunity to talk to your tutor directly and, just as important, to talk to other students. Your tutor can give you help with both statistical and computing problems in the course.

 

6.1.1 Telephone contact

Tutors are also available for you to phone for immediate help or advice. Contact details for your tutor will be sent to you by Registry before the course starts. Your tutor will also let you know what hours he or she is available for telephone tutoring. Keep your tutor's mailing and email addresses since you have to send your TMAs to him/her directly.

 

6.1.2 Email

You can also email your tutor through the OLE to ask questions. However, do not always expect an immediate reply, since tutors work on a part-time basis.

 

6.2 The Internet

The OLE will be used to facilitate communication and discussion among the students and tutors in MATH S346, and for posting supplementary course materials such as stop presses, errata, and assignments.

Using the OLE is compulsory for this course because all the supplementary course materials will ONLY be posted on the OLE. In your course material package, you will find an OLE User Guide which provides information on how to use the OLE.

 

6.3 From your fellow students

One of the best ways of learning is by talking about your work with fellow students. However, you may only meet them at the infrequent face-to-face sessions. It will be helpful if you could arrange to have the telephone numbers or email addresses of some MATH S346 students so that you can stay in touch. You might even like to form your own study group to meet regularly. This is often a good way of getting together to discuss common difficulties, especially in the assessment questions. Again, you may use the OLE to communicate with other MATH S346 students.

A word of warning about study groups: although students are encouraged to discuss their problems with one another, including those relating to assignments, it is essential that you submit your own attempt at a TMA and not one you have copied. You must remember that the assignments are part of the teaching and learning process, so it is in your own interest to ensure that you submit your own work.

 

6.4 From the Course Coordinator

If there are any queries which your tutor cannot resolve for you, he/she will probably advise you to contact the Course Coordinator. The Course Coordinator is a full-time member of the School of Science and Technology at HKMU who is responsible for organizing the face-to-face and distance tuition. Ways of contacting the Course Coordinator can be found in the 'Letter to Students', which is included in the package of course materials that you have received.
