MEAFA Professional Development Workshop on Panel Data Analysis using Stata, 8-12 February 2016

Panel data analysis

This workshop showcases two days on Panel Data Analysis (Thursday-Friday 11-12 Feb), delivered by Vasilis Sarafidis, Department of Econometrics at Monash University. Vasilis is a panel data expert, with extensive consulting experience on panel data problems, including work with the UK Office for Water Services, Telestyrelsen Denmark, Jersey Competition Regulatory Authority, UK Electricity Association, Sydney Water Corporation, and others.

Working efficiently with Stata, Programming and Data management

The first three days of the workshop provide a thorough introduction to Stata, including Working Efficiently with Stata (Monday 8 Feb), Stata Programming (Tuesday 9 Feb), and Data Management (Wednesday 10 Feb). These days are delivered by Demetris Christodoulou, MEAFA General Convenor and architect of the MEAFA Professional Development Workshops. Demetris has extensive consulting experience on Stata and data analysis, including work with the NSW Bureau of Crime Statistics and Research, The George Institute of International Health, the Australian Institute of Family Studies, AusAid, NERA Economics, the NSW Department of Education and Communities, and others.

Workshop description

You may attend any one day or any combination of the following days.

Day 1 (Monday, 8 Feb 2016): Working Efficiently with Stata by Demetris Christodoulou, MEAFA General Convenor

This day assumes no previous knowledge of Stata 14. It describes the environment of Stata, its limitations and strengths, and core syntactic features. It demonstrates ways of working efficiently with Stata, including the use of logs and do-files. It discusses key principles and presents tools for developing work that is reproducible and verifiable. The day is of interest to those who are new to Stata 14 or have limited experience with earlier versions of Stata. It is also useful to more experienced users who wish to attain a more structural understanding of Stata from first principles.

Day 2 (Tuesday, 9 Feb 2016): Introduction to Programming by Demetris Christodoulou, MEAFA General Convenor

This day assumes working knowledge of Stata 14 but no knowledge of programming with Stata or any other software. By the end of this day you will be able to produce efficient, tractable and automated routines for data management, statistical analysis and estimation, and the creation of tables and graphs. The day covers key programming tools (saved results, stored results, macros, scalars, loops), and the fundamentals of building your own commands in Stata (programs or ado-files). This day is appropriate for those who wish to attain a deeper knowledge of Stata and achieve the aforementioned attributes in their work. This day assumes knowledge of the material presented in Day 1.

Day 3 (Wednesday, 10 Feb 2016): Management of Raw Data by Demetris Christodoulou, MEAFA General Convenor

This day assumes working knowledge of Stata and basic programming skills but not of data management. The day demonstrates ways to import and export different data formats. It demonstrates the management of numerical variables, string variables and date/time variables, and the implications of missing values. It explores key data structures including cross-sectional, time-series and panel data in long and wide formats. It covers the management of data attributes, the organisation of data and the importance of metadata. It also demonstrates strategies for working efficiently with very large datasets. Dataset organisation, archiving, combinations, and transformations will also be discussed. If you have little or no experience with Stata 14 then you are strongly advised to attend Day 1 first. Some programming tools will also be applied from Day 2 (stored results, macros, scalars and loops).

Days 4-5 (Thursday-Friday, 11-12 Feb 2016): Panel Data Analysis by Vasilis Sarafidis, Department of Econometrics at Monash University

These two days assume working knowledge of Stata and basic knowledge of econometrics. The first part begins with a discussion of panel data fundamentals and presents key Stata operations for specifying panel data structure and exploring variation in a longitudinal setting. It then lays the foundations for panel data analysis with a discussion of fixed effects, between effects and random effects, within a static linear framework. It then proceeds to the problem of endogenous regressors in panel data, plus other forms of misspecification such as cross-sectional dependence and serial correlation. It then discusses dynamic panel data specifications with lagged dependent variables, and the Generalised Method of Moments for developing workable models. The second part begins with non-linear panel models using fixed effects or random effects, for binary responses and count responses. It then goes into the realm of mixed/multilevel models with hierarchical panel data, and presents models with random intercepts, and random coefficients with a reference to crossed effects. It concludes with a discussion of generalised models with mixed/multilevel effects applied to different families and link functions that also allow for complex treatment of correlation structures. If you have no experience with Stata then you are required to attend at least Day 1.


All days have the same time schedule:

  • 08:40-09:00 - Welcome tea and coffee
  • 09:00-10:30 - Session 1
  • 10:30-10:45 - Morning break
  • 10:45-12:15 - Session 2
  • 12:15-13:15 - Lunch
  • 13:15-14:45 - Session 3
  • 14:45-15:00 - Afternoon break
  • 15:00-16:30 - Session 4
  • 16:30-17:00 - Buffer time and user-specific questions

The computer labs will be accessible from 8am to 8pm every day. Catering is provided at each break.

Detailed Programme

Day 1 (Monday, 8 Feb 2016): Working efficiently with Stata

Session 1: The Stata environment
Stata interface; configuration; limits; system constants and parameters; updates; profile and system directories; help files; manual entries; open-source programs; other help resources.
Session 2: Syntactic features
General syntax; parsing; strings and double quotes; wildcards; operators; functions; qualifiers; missing values; Boolean evaluation; prefixes.
Session 3: Do-files and Logs
The Do-file Editor; key do-file commands; master do-files; comments; errors and prevention; troubleshooting; suppression of errors; recording selective output in logs; command logs; common headings; log archive; printing and translation.
Session 4: Data fundamentals
How Stata understands data; memory requirements and dataset size; data types; physical limitations; numerical precision; string precision; long strings.

Day 2 (Tuesday, 9 Feb 2016): Introduction to Programming

Session 1: Macros and scalars
Local and global macros; macro evaluation; macro expansion; prevent macro expansion; scalars; scalar evaluation; applications.
Session 2: Saved results
r-class values; e-class values; evaluating saved results; estimation and postestimation; storing estimates; results in tables; applications.
Session 3: Loops
Types of loops; initialising values; macro incrementation; nested macros; nested loops; rereferencing macros within loops; debugging loops; applications.
Session 4: Programs
Programming new commands; structure of an ado-file; storing and accessing programs; programs as 'wrappers'; softcoding.

Day 3 (Wednesday, 10 Feb 2016): Management of Raw Data

Session 1: Raw data fundamentals
The importance of raw data; use, save and describe data; decimal separator; data formats; inspecting data attributes; data elements; explicit subscripting; record data subsets in logs; preserve, destroy and restore.
Session 2: Data types and computational memory
Numerical variables; string variables; date/time variables; missing values; memory requirements and physical limitations; settings and Stata limitations; numerical and string precision; working efficiently with large datasets.
Session 3: Data organisation and metadata
Naming rules for variables; order of variables; sorting of observations; data and variable labels; notes; value labels; labels in graphs; display format; data signature; archiving data and metadata.
Session 4: Data combinations and transformations
Dataset comparisons; appending datasets; one-to-one merging; match-merging; reshape long and wide; collapse to summaries; contract to frequencies.

Day 4 (Thursday, 11 Feb 2016): Panel Data Analysis (Part A)

Session 1: Panel data fundamentals
Panel data dimensions and structure; balanced vs. unbalanced; description of continuous variables; description of categorical variables; error components; standard error correction.
Session 2: Static linear panel data
Between effects; fixed effects; random effects; Hausman specification test; estimation of time-invariant variables.
Session 3: Mis-specification
Endogenous regressors; instrumental variables; serial correlation; cross-sectional dependence; seemingly unrelated regressions for T>N; model comparison.
Session 4: Dynamic linear panel data
Biases from OLS and Within effects; first differences and IV estimation; deterministic trend; system Generalised Method of Moments.

Day 5 (Friday, 12 Feb 2016): Panel Data Analysis (Part B)

Session 1: Mixed/multilevel models
Linear mixed/multilevel models; syntax; fixed part; level 1 random effects; inconsistent random effects; computational considerations.
Session 2: Random coefficients
Random coefficients; higher level random effects and random coefficients; covariance structures for random effects.
Session 3: Non-linear panel data
Poisson count outcome models and negative binomial; binary outcome models; latent variable representation; sensitivity of quadrature approximation; perfect prediction.
Session 4: Generalised framework for mixed/multilevel models
Distribution families and link functions for a variety of mixed/multilevel models; summary and review.

N.B. The precise content per session is subject to reshuffling and fine-tuning.