MEAFA Professional Development Workshop in Quantitative Analysis Using Stata, 21-25 June 2010


20 April 2010: Following popular demand, we are very glad to announce a 2-day workshop on Multiple Imputation (MI) by Yulia Marchenko (the developer of the -mi- suite in Stata). The MI workshop has been integrated into MEAFA's annual 5-day professional development workshop in quantitative analysis. This gives Stata novices the opportunity to gain exposure to Stata, and to quantitative analysis using Stata, prior to attending the MI workshop. The first three days of the workshop remain open to those not interested in MI.

Brief description of Multiple Imputation (MI)

Many real-world datasets are incomplete, in the sense that some observations may not have data for all the variables you wish to analyse. A popular way to handle these situations is to use Donald Rubin's multiple imputation (MI). Like listwise deletion, which is another common way of handling missing data, MI is applicable to a wide variety of analyses. Unlike listwise deletion, which results in a loss of data, MI preserves all available information in the dataset and can therefore lead to more efficient estimation. MI is a simulation-based method consisting of three steps: (1) imputation creates multiple imputed (completed) datasets according to a chosen imputation model, (2) complete-data analysis performs the primary analysis of interest on each of the imputed datasets, (3) pooling consolidates the results from step 2 into one MI inference using Rubin's combination rules. In Stata 11, you can use the -mi- suite of commands to perform multiple imputation; for an example see The Stata News, 2010, vol 25, No 1 (PDF).
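As an illustration of the pooling step (3) only, the following is a minimal sketch of Rubin's combination rules for a single coefficient, written in Python rather than Stata. The function name pool_rubin and the numbers in the usage example are hypothetical and chosen for illustration; this is not the workshop's material and not how Stata's -mi- commands are implemented. It assumes each of the m imputed datasets has already produced a point estimate and its squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets (hypothetical helper).

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b  # Rubin's total variance
    return q_bar, total


# Made-up results from m = 3 imputed datasets, for illustration only.
q, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, t)
```

The pooled estimate is the plain average of the per-dataset estimates, while the total variance adds the between-imputation spread, inflated by (1 + 1/m), to the average within-imputation variance. This is why MI standard errors reflect the uncertainty due to the missing data.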

Workshop description

You may attend any one or any combination of the following days:

Day 1 (Monday, June 21): Introduction to Stata 11 and data management by Demetris Christodoulou, MEAFA General Convener

This day assumes no previous knowledge of Stata. An overall introduction to Stata 11 will be provided and ways to customise/personalise the software will be discussed. Basic data structures, the analysis of different types of variables and various data management techniques will be discussed. Some examples of graphing, tables and the management of output will be presented. This day is of interest to those who are new to Stata or have limited experience with it.

Day 2 (Tuesday, June 22): Two parallel sessions - you can choose only one to attend

Proficient Programming with Stata 11 by Demetris Christodoulou

This day assumes working knowledge of Stata but no knowledge of programming with Stata or with any other software. By the end of this day you will be able to produce tractable, reproducible and automated routines for data management, statistical analysis, econometric estimation, creation of tables, graphing etc. This day is appropriate for those who wish to become more efficient in working with Stata and have an appreciation for programming.

Econometric Modelling and Statistical Testing by Andrey Vasnev

This day assumes familiarity with Stata 11 and a basic understanding of quantitative methods. It uses applications to demonstrate the use of statistical analysis, hypothesis testing and basic econometric modelling for validating assumptions and expectations. This day is of interest to those who wish to know how to apply various quantitative methods using Stata. Detailed notes on theory will be provided as background reading.
N.B.: MEAFA reserves the right to cancel a parallel session in case of low demand.

Day 3 (Wednesday, June 23): Two parallel sessions - you can choose only one to attend

Panel Data Analysis by Vasilis Sarafidis

This day assumes working knowledge of Stata 11 and basic knowledge of econometrics. It explains the rationale of panel data methods and demonstrates the use of static, dynamic and nonlinear models of panel data. This day is of interest to those who wish to learn how to carry out panel data analysis using Stata. Detailed notes on theory will be provided as background reading.

Time Series Analysis by Richard Gerlach

This day assumes working knowledge of Stata 10 and basic knowledge of econometrics. It details the theory for modelling univariate time series, forecasting and bivariate causality relationships, and offers extensive applications using Stata. This day is of interest to those who wish to learn how to carry out time series analysis using Stata. Detailed notes on theory will be provided as background reading.
N.B.: MEAFA reserves the right to cancel a parallel session in case of low demand.

Days 4-5 (Thursday-Friday, June 24-25): Multiple Imputation Using Stata by Yulia Marchenko, Senior Statistician at StataCorp LP

These two days assume working knowledge of Stata and of standard statistical techniques such as linear/logistic regression. The course provides a brief introduction to multiple imputation (MI) analysis and a detailed description of the three stages of MI (imputation, complete-data analysis, pooling) with applications in Stata 11. Various imputation techniques will be discussed, with the main focus on multivariate normal imputation. A number of examples demonstrating how to safely and efficiently manage multiply-imputed data will be provided. Linear and logistic regression analysis of multiply-imputed data as well as several post-estimation features will be presented. Detailed notes will be provided outlining all theory and applications. The presenter, Yulia Marchenko, is the chief developer of Stata's routines for multiple imputation -mi- and co-author of An Introduction to Survival Analysis using Stata.

Enrollment and Fees

You may attend any one or any combination of days. See the description of each day to determine which days are of most interest to you. Fees vary on the days attended. The following combinations are possible (prices exclude GST):

  • Attend one of Days 1, 2 or 3: $500
  • Attend two of Days 1, 2 or 3: $900
  • Attend Days 1, 2 and 3: $1300
  • Attend Days 4 & 5 on Multiple Imputation: $1300
  • Attend Days 4 & 5 and one of Days 1, 2 or 3: $1700
  • Attend Days 4 & 5 and two of Days 1, 2 or 3: $2100
  • Attend all five days: $2500

Fees include extensive course material, a detailed guide to Stata 11, data sets, lectures, use of computing facilities, temporary use of Stata 11 licenses and full catering. Numbers are limited and places are reserved on a first-come first-served basis following the completion of the online Reservation Form. Successful attendees will be notified shortly after reservation and invoices will be issued accordingly. Due to the limited places, MEAFA maintains a no refund policy following payment. For more information on enrollment and fees contact


You may qualify for one of the following discounts:

  • 35% discount for a restricted number of non-employed full-time PhD students.
  • 15% discount for additional attendees from the same organisation or academic unit.


The workshop will take place at Sydney University at the Faculty of Economics and Business computer labs, ground level of Building H69, cnr Codrington & Rose streets (see interactive map). You do not need to bring your own laptop. PCs and Stata 11 licenses for Microsoft Windows will be provided.


All days have the following schedule:

    08:40-09:00 - Welcome tea and coffee
    09:00-10:30 - Session 1
    10:30-10:45 - Morning break
    10:45-12:15 - Session 2
    12:15-13:15 - Lunch
    13:15-14:45 - Session 3
    14:45-15:00 - Afternoon break
    15:00-16:30 - Session 4
    16:30-17:00 - Buffer-time and user-specific questions

Detailed Programme

Day 1 (Monday, 21 June): Introduction to Stata 11 and Data Management

Session 1: Introduction to Stata 11 environment
The Stata environment; configuration; special features; updates; personalised system; obtain help and perform search; Stata syntax
Session 2: Data formats and data handling
Data formats; import, export, load and save datasets; simulated datasets; document the dataset; sorting and ordering; display formatting; append and merge
Session 3: Data structures and types of variables
Categorical vs. continuous data; numerical, string and date/time variables; missing data; generate variables; dummy variables; special purpose variables
Session 4: Data management and output management
Logs for output; prefixes; tables and graphs; export output; stored and saved results

Day 2 (Tuesday, 22 June): Parallel sessions

Proficient Programming with Stata 11

Session 1: Basics of Stata programming
Executing commands using do-files; proper structure of do-files; using comments; writing long commands; do vs. run; combination of preserve and restore; the command display; accessing Stata parameters and Stata constants
Session 2: It's all about Macros!
What is a Stata macro; local macros; global macros; numerical macros; string macros; compound punctuation; macro evaluation; formatting macro output; nested macros
Session 3: Special features of macros and loops
Incrementing/decrementing macros; combining incrementation with evaluation; macro expansion; function keys and global macros; foreach loop; forvalues loop; nested loops; using _rc (return codes)
Session 4: Automating routines and other special features
Capturing saved results as macros; macro evaluation and saved results; scalars and precision; stored results; creating complex tables and complex graphs using stored results; explicit subscripting; matrix programming; creating programs

Econometric Modelling and Statistical Testing

Session 1: Statistical description and linear regression analysis
Means, variances and higher order moments; medians and modes; confidence intervals; ordinary least squares; predicted values and residuals; correlation and standardized regression coefficients; hypothesis testing; problems with regression
Session 2: Multiple regression analysis
Multiple regression models; partial effects; variable selection; t-tests and confidence intervals for individual coefficients; F-tests for sets of coefficients; multicollinearity; interaction effects; intercept and slope dummy variables; logarithmic regression
Session 3: Statistical description and nonlinear regression functions
Graphing the data; a general strategy for modelling nonlinear regression functions; transformations; polynomials and logarithms; interactions (incl. continuous and dummy variables); internal and external validity
Session 4: Regression with a binary dependent variable
Binary dependent variables and the linear probability model; Probit and Logit regression; estimation and inference in the Logit and Probit models; application to mortgage approval data

Day 3 (Wednesday, 23 June): Parallel sessions

Panel Data Analysis

Session 1: Introduction to panel data analysis
Advantages of panel data analysis; panel data sets; balanced and unbalanced panels; panel data dimensions and frequencies; properties of estimators; unbiasedness; efficiency; consistency; describing panel data; graphing panel data
Session 2: Static linear models
Specification and estimation; one-way and two-way error components; fixed and random effects; the Least Squares Dummy Variable model; the Within, Between and GLS estimators; the Hausman test; variance decomposition
Session 3: Dynamic linear models
Nickell biases; Anderson-Hsiao IV estimation; the problem and tests of weak instruments; the Generalised Method of Moments; testing for overidentifying restrictions; cross sectional dependence
Session 4: Nonlinear panel data models
Poisson Regression Model; Probit and Logit: latent variable representations; marginal effects; model diagnostics

Time Series Analysis

Session 1: Introduction to forecasting and time series
Qualitative and quantitative forecasts; data structure; describing time series; graphing time series; smoothing; time series components; data transformations; basic trend modelling and forecasting; forecast accuracy
Session 2: Stationarity and time series models
Time series decomposition; stationarity; auto-correlation; time series regression; non-seasonal exponential forecasting; seasonal Holt-Winters methods
Session 3: Non-seasonal Box-Jenkins models
AR, MA and ARMA processes; ARIMA and further trend modelling; detecting trends and/or mean non-stationarity; Box-Jenkins model forecast behaviour
Session 4: Seasonal Box-Jenkins and intervention models
ARIMA; Seasonal ARIMA models; pure additive and factored models; models for outliers, level shifts and other interventions

Day 4 (Thursday, 24 June): Multiple Imputation Using Stata, Part A

Session 1: Statistical overview of multiple imputation
Introduction to MI; MI as a statistical procedure; Stages of MI: imputation, complete-data analysis, pooling
Session 2: Multiple imputation using Stata
MI in Stata; overview of the mi suite of commands
Session 3: Methods and applications of imputation
Imputation techniques; univariate imputation; multivariate imputation
Session 4: Advanced imputation
Imputing complex data: survival, panel; checking sensibility of imputations

Day 5 (Friday, 25 June): Multiple Imputation Using Stata, Part B

Session 1: Basic management of imputed data
Storing multiply-imputed data; importing existing multiply-imputed data; verification of multiply-imputed data
Session 2: Advanced management of imputed data
Variable management (passive variables); merging, appending, and reshaping multiply-imputed data; exporting multiply-imputed data to a non-Stata application
Session 3: Basic estimation of multiply-imputed data
Analysis and pooling stages of MI in one easy step; overview and applications of mi estimate
Session 4: Advanced estimation of multiply-imputed data
Estimating linear and nonlinear functions of coefficients; testing linear and nonlinear hypotheses

N.B. The precise content may be subject to minor changes.

Reservation Form

Numbers are limited and places are reserved on a first-come first-served basis following the completion of the Reservation Form.