MEAFA Professional Development Workshop on Survival Analysis Using Stata, 18-22 July 2011

Brief description of Survival Analysis

Survival analysis is just another name for time-to-event analysis or reliability analysis.

The term survival analysis was coined in the biomedical sciences, where the focus is on measuring the time to death of subjects (hence their 'survival'). Time-to-event analysis is the term mostly used in the social sciences, where researchers measure the time to an event such as entry into or exit from the job market, household changes, bankruptcies, equity restructuring etc. Reliability analysis, sometimes also referred to as failure time analysis, is the term used in the engineering sciences, where the quantity measured is the time to failure of a machine element.

Survival analysis spans many disciplines, and many of its methods have been developed more than once, each time under a different name. The resulting field is replete with jargon: censoring, hazards, delayed entry, competing risks etc. The key to mastering survival analysis lies in decoding the jargon and in realizing that survival analysis is not really a distinct field of statistics requiring its own theory. Instead, it consists of a set of adjustments that need to be made to standard, well-known analyses. These adjustments, while necessary when your data measure time to failure or to some other event of interest, are more easily understood when compared directly with their non-survival analogs. This 2-day course will be taught in that spirit.
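As an illustration of one such adjustment, here is a minimal sketch of the Kaplan-Meier estimator of the survival function. It is written in Python rather than Stata, and the function name `kaplan_meier` and the toy data are illustrative assumptions, not workshop material. A censored subject is one whose event has not yet been observed when follow-up ends. Simply dropping such subjects, or counting them as survivors, would bias a naive "proportion still alive" estimate. The estimator instead keeps each censored subject in the risk set up to its censoring time. In Stata, the corresponding estimate is obtained by declaring the data with `stset` and then using `sts list` or `sts graph`.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t), returned as (t, S(t)) at each event time.

    times  -- follow-up time for each subject
    events -- 1 if the event (death, failure, exit, ...) was observed,
              0 if the subject was censored at that time
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)   # subjects still under observation
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Pool all subjects with the same time t; censored subjects at t
        # are still counted as at risk at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Conditional probability of surviving past t, given at risk at t
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censorings leave the risk set after t
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Six hypothetical subjects; events == 0 marks censored follow-up
    print(kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1]))
```

For the toy data, the estimated survival drops to 5/6 at t = 2, 2/3 at t = 3, 4/9 at t = 5 and 0 at t = 8. The subject censored at t = 6 contributes to the risk set up to that point and is then removed without counting as a failure. When no observation is censored, the estimator reduces to the ordinary empirical proportion of subjects still event-free. This is the sense in which survival methods are adjustments to familiar non-survival analyses.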

Presenter of Survival Analysis

Rory Wolfe is an expert in Stata, Survival Analysis and Panel Data Analysis. He is an established biostatistician, the PhD Program Coordinator in Epidemiology and Preventive Medicine and the co-director of the Biostatistics Consulting Service at Monash University. He is also an expert at the NHMRC Centre for Clinical Research Excellence in Gait Analysis. He has a long history with Stata: he has published in the Stata Technical Bulletin and the Stata Journal, and has contributed open-source Stata commands to the Statistical Software Components (SSC) library. Rory also runs short courses on Survival Analysis with Stata for the Australian Psychological Society.

Workshop description

You may attend any one or any combination of the following days:

Day 1 (Monday, July 18): Working efficiently with Stata 11 and intro to data management by Demetris Christodoulou, MEAFA General Convener

This day assumes no previous knowledge of Stata. An overall introduction to Stata will be provided, and ways to customise and personalise the software will be discussed. The day also covers the handling of key data structures, the analysis of different types of variables and various introductory data management techniques. Some examples of graphing, tables and the management of output will be presented. The focus of the day is on working efficiently with reproducible and tractable routines. This day is of interest to those who are new to Stata or have limited experience with it, or who want to become more efficient in their work.

Day 2 (Tuesday, July 19): Two parallel sessions - you can choose only one to attend

Introduction to Stata programming by Demetris Christodoulou

This day assumes working knowledge of Stata but no knowledge of programming with Stata or with any other software. By the end of this day you will be able to produce fast automated routines for data management, statistical analysis, econometric estimation, creation of tables, graphing etc. This day is appropriate for those who wish to step up their knowledge of statistical computing and start producing more complex routines with Stata.

Econometric modelling and statistical testing using Stata by Andrey Vasnev

This day assumes familiarity with Stata and a basic understanding of quantitative methods. It uses applications to demonstrate statistical analysis, hypothesis testing and basic econometric modelling for validating assumptions and expectations. This day is of interest to those who wish to know how to apply various quantitative methods using Stata. Detailed notes on theory will be provided as background reading.
N.B.: MEAFA reserves the right to cancel a parallel session in case of low demand.

Day 3 (Wednesday, July 20): Two parallel sessions - you can choose only one to attend

Graphing with Stata 11 by Demetris Christodoulou

This day assumes working knowledge of Stata but no knowledge of graphing with Stata or any other software. The day provides an in-depth analysis of Stata's graphing logic, syntax and capabilities. Graphing examples will be demonstrated for a variety of data structures. By the end of this day you should be able to produce informative, robust, complex and beautiful graphs using reproducible routines. If you have no or limited experience with Stata, you are strongly advised to attend Day 1 first. Programming elements from Day 2 will also be used for producing more complex graphs.

Time series analysis by Richard Gerlach

This day assumes working knowledge of Stata and basic knowledge of econometric principles. It details the theory for modelling univariate time series and forecasting, and offers extensive applications using Stata. This day is of interest to those who wish to learn how to model and estimate univariate time series using Stata. Detailed notes on theory will be provided as background reading.
N.B.: MEAFA reserves the right to cancel a parallel session in case of low demand.

Days 4-5 (Thursday-Friday, July 21-22): Survival analysis using Stata by Rory Wolfe, Associate Professor, Epidemiology and Preventive Medicine, Monash University.

These two days assume basic knowledge of Stata and of working with Stata do-files. A basic knowledge of standard statistical techniques (such as linear and logistic regression) is also assumed. The course will be taught from first principles (see also the description at the top of this page). Following the introduction to survival analysis, the 2-day workshop will break the topic down by method: non-parametric, semi-parametric and parametric analysis. More advanced topics will be addressed at the end of the second day. Detailed notes, log-files, do-files and datasets outlining all theory and applications will be provided. The course will be interactive, use real data, and offer ample opportunity to work through exercises that reinforce what is learned.

Enrollment and Fees

You may attend any one day or any combination of days. See the description of each day to determine which days are of most interest to you. Fees are fixed at $500 per day, but the two days on Survival Analysis are sold together as a package (prices exclude GST):

  • Each one of Day 1, Day 2 and Day 3 at $500 per day
  • Days 4 & 5 on Survival Analysis at $1000

Fees include extensive course material, do-files and data sets, use of computing facilities, temporary use of Stata 11 licenses and full catering.

Numbers are limited and places are reserved on a first-come, first-served basis following the completion of the online Reservation Form. Successful applicants will be notified shortly after reservation and invoices will be issued accordingly. Due to the limited places, MEAFA maintains a no-refund policy following payment. For more information on enrollment and fees contact meafa@econ.usyd.edu.au.

N.B. Proceeds from the workshop go towards funding MEAFA PhD scholarships.

Discounts

You may qualify for one of the following discounts:

  • 35% discount for a restricted number of non-employed full-time PhD students.
  • 10% discount for additional attendees from the same organisation or academic unit.

Venue and computing facilities

The workshop will take place in the computer labs of The University of Sydney Business School, on the ground level of Building H69, cnr Codrington & Rose Streets.

PCs and Stata 11 licenses for Microsoft Windows will be provided onsite. It is also possible to work on your own laptop, but you will not be able to access the web. You can also install a temporary one-month license for Stata 11.

Timetable

All days have the following schedule:

  • 08:40-09:00 - Welcome tea and coffee
  • 09:00-10:30 - Session 1
  • 10:30-10:45 - Morning break
  • 10:45-12:15 - Session 2
  • 12:15-13:15 - Lunch
  • 13:15-14:45 - Session 3
  • 14:45-15:00 - Afternoon break
  • 15:00-16:30 - Session 4
  • 16:30-17:00 - Buffer-time and user-specific questions

Detailed Programme

Day 1 (Monday, 18 July): Working efficiently with Stata 11 and intro to data management

Session 1: Introduction to Stata 11 environment
The Stata environment; configuration; special features; updates; personalised system; obtaining help and performing searches; Stata syntax; working with do-files.
Session 2: Data handling and adding metadata
Data formats; import, export, load and save datasets; simulated datasets; sorting and ordering; review and document the dataset; display formatting; append and merge.
Session 3: Data structures and types of variables
Categorical vs. continuous data; numerical, string and date/time variables; missing data; dummy variables; special purpose variables.
Session 4: Output management and special features
Logs for output; tables; export output; some statistical and estimation commands; prefixes; stored and saved results.

Day 2 (Tuesday, 19 July): Parallel sessions

Introduction to Stata programming

Session 1: Basics of Stata programming
Properly structured do-files; comments; writing long commands; do vs. run; combination of preserve and restore; the command display; accessing Stata parameters and Stata constants.
Session 2: It's all about Macros!
What is a Stata macro; local macros; global macros; numerical macros; string macros; compound punctuation; macro evaluation; formatting macro output; nested macros.
Session 3: Special features of macros, and loops
Incrementing/decrementing macros; combining incrementation with evaluation; macro expansion; foreach loop; forvalues loop; nested loops; return codes.
Session 4: Automating routines and other special features
Capturing saved results; macro evaluation with saved results; scalars and precision; creating tables using stored results; the command file; explicit subscripting.

Econometric modelling and statistical testing using Stata

Session 1: Statistical description and linear regression analysis
Means, variances and higher order moments; medians and modes; confidence intervals; ordinary least squares; predicted values and residuals; correlation and standardized regression coefficients; hypothesis testing; problems with regression.
Session 2: Multiple regression analysis
Multiple regression models; partial effects; variable selection; t-tests for individual coefficients; F-tests for sets of coefficients; multicollinearity; interaction effects; intercept and slope dummy variables.
Session 3: Statistical description and nonlinear regression functions
Graphing the data; modelling nonlinear regression functions; transformations; polynomials and logarithms; interactions (incl. continuous and dummy variables); internal and external validity.
Session 4: Regression with a binary dependent variable
Binary dependent variables and the linear probability model; Probit and Logit regression; estimation and inference in the binary models; applications.

Day 3 (Wednesday, 20 July): Parallel sessions

Graphing with Stata 11

Session 1: Basics of graphing with Stata 11
Dialog boxes vs. do-file routines; inspecting the data prior to graphing; reducing the data dimension to speed up graphing; setting range of variation; graph example - histogram; titles; axes; labels; bars; adding notes; the concepts of box, position, line, text, colour and font.
Session 2: Subgroups and overlays
Graphing by categorical groups; subgroup options; formatting graph text; using special characters; graph aspect and size; superimposing densities and other graphs; legends for multiple graphs; multiple axes; graph help files; saving and modifying graphs; graph export formats.
Session 3: So many graphs!
The twoway command; an example - the scattergraph; nonparametric density estimators; parametric density estimators; patterns in variance and nonlinearities; time-series graphs; panel data graphs.
Session 4: Advanced graphing
Using loops for multiple overlays; combining multiple graphs side-by-side; recasting twoway plots; reproducing formatting; do-file options; graph editor recording; existing graph schemes; creating your own graph scheme; the graph editor as a scheme maker; special graphs.

Time series analysis using Stata

Session 1: Introduction to forecasting and time series
Why forecast?; Stata time series structure; describing and graphing time series; smoothing and time series components; data transformations; exponential smoothing and forecasting; forecast accuracy; the concept and application of stationarity; autocorrelation and ACF plots; modelling and forecasting Australian beer production.
Session 2: Time series modelling and forecasting
The autoregressive process (AR); the moving average process (MA); ARMA processes; basic time series regression; Holt-Winters for trends; seasonal Holt-Winters; modelling and forecasting electricity production data.
Session 3: Integrated and seasonal Box-Jenkins models
Trends and integration; ARIMA processes; detecting trends and/or mean non-stationarity; ARIMA model forecast behaviour; seasonal ARIMA models; pure additive and factored models; models for outliers, level shifts and other interventions; modelling and forecasting sales data.
Session 4: Time series regression and volatility modelling
Advanced time series regression; distributed lag models; modelling and forecasting inflation and unemployment data; the concept of conditional heteroskedasticity (CH); ARCH and Generalised ARCH processes (GARCH); modelling and forecasting asset return volatility and Value-at-Risk.

Day 4 (Thursday, 21 July): Survival analysis using Stata, Part A

Session 1: Introduction to survival analysis
The problem of survival analysis; hazards; cumulative hazards; survival functions; censoring; truncation; delayed entry.
Session 2: Survival analysis with Stata
Span data versus snapshot data; the -st- (survival-time data) suite of commands; stset; summary statistics: stdes, stsum etc.
Session 3: Nonparametric analysis
Kaplan-Meier curves; Nelson-Aalen curves; estimating the hazard function via smoothing; mean and median survival time; tests of hypotheses.
Session 4: Semiparametric analysis
The Cox regression; basics of stcox; hazard ratios and proportional hazards; estimating baseline functions.

Day 5 (Friday, 22 July): Survival analysis using Stata, Part B

Session 1: Semiparametric analysis
Stratified models; using stcurve for predicted functions; time-varying covariates/coefficients; model diagnostics.
Session 2: Parametric analysis
The basics of streg; parametric proportional-hazards models; accelerated failure time models; using stcurve for predicted functions; predictions and diagnostics.
Session 3: Complex data for survival analysis
Complex survey data and survival analysis; missing data and multiple imputation with survival analysis.
Session 4: Advanced survival analysis
Frailty models; power and sample size calculations; competing risks.

N.B. The precise content is subject to minor adjustments.

Reservation Form

Numbers are limited and places are reserved on a first-come, first-served basis following the completion of the Reservation Form.