Doctoral Dissertation

1.10.2016 M.Sc. Satu Helske (Faculty of Mathematics and Science)


Date: 1.10.2016, 12:00–15:00

Location: Seminaarinmäki, S212, Vanha juhlasali
M.Sc. Satu Helske defends her doctoral dissertation in Statistics, "Statistical analysis of life sequence data". The opponent is Professor Emeritus (Professeur honoraire) Gilbert Ritschard, NCCR LIVES and Institute of Demography and Socioeconomics (IDESO), University of Geneva, Switzerland, and the custos is Professor Juha Karvanen, University of Jyväskylä. The defence is held in English.



Life courses are studied across disciplines to understand how life transitions affect different aspects of life. Life course trajectories include, e.g., family trajectories, residential histories, and occupational careers. Trajectories embed events and transitions that may occur once or repeatedly. Because events and choices in different life domains are linked into an interdependent system, the different dimensions often need to be analysed jointly.

This thesis considers and compares three statistical approaches to the analysis of complex life sequence data: event history analysis (EHA), hidden Markov models (HMMs), and sequence analysis (SA). EHA is the traditional method for analysing the effects of time-constant and time-varying covariates on the timing and duration of events and transitions. Hidden Markov modelling assumes a latent (hidden) level, i.e., one or more unobservable states that may be constant or time-varying; the observed states are regarded as being generated by this hidden Markov chain. SA is a more recent, model-free, data-mining type of approach that focuses on comparing whole trajectories. It is a descriptive tool, typically used for finding and visualizing groups of individuals with similar trajectories.
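To make the hidden Markov idea concrete, the following minimal Python sketch computes the probability of one observed partnership sequence with the standard forward algorithm, summing over all possible paths of the hidden chain. The hidden life stages ("early", "settled"), the observed states, and every probability are hypothetical values chosen for illustration; this is not the software presented in the thesis.

```python
from typing import Dict, Sequence

# Hypothetical two-state HMM: hidden life stages emit observed partnership
# states. All probabilities below are illustrative assumptions, not estimates.
HIDDEN = ["early", "settled"]
INITIAL: Dict[str, float] = {"early": 0.8, "settled": 0.2}
TRANSITION: Dict[str, Dict[str, float]] = {
    "early": {"early": 0.7, "settled": 0.3},
    "settled": {"early": 0.1, "settled": 0.9},
}
EMISSION: Dict[str, Dict[str, float]] = {
    "early": {"single": 0.6, "cohabiting": 0.3, "married": 0.1},
    "settled": {"single": 0.1, "cohabiting": 0.3, "married": 0.6},
}


def forward_likelihood(observations: Sequence[str]) -> float:
    """Return P(observations) for a non-empty sequence, summed over hidden paths."""
    # alpha[s] = P(observations up to time t, hidden state at time t is s)
    alpha = {s: INITIAL[s] * EMISSION[s][observations[0]] for s in HIDDEN}
    for obs in observations[1:]:
        # Propagate through the hidden chain, then weight by the emission.
        alpha = {
            s: EMISSION[s][obs] * sum(alpha[r] * TRANSITION[r][s] for r in HIDDEN)
            for s in HIDDEN
        }
    return sum(alpha.values())


if __name__ == "__main__":
    sequence = ["single", "single", "cohabiting", "married"]
    print(f"P(sequence) = {forward_likelihood(sequence):.6f}")
```

The forward recursion needs on the order of T·K² operations for T time points and K hidden states, instead of enumerating all K^T hidden paths.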

These methods are described and tested in empirical analyses, e.g., to study which types of joint family and career trajectories are typical and which are atypical, to find associations between individuals' childhood characteristics and their future partnership trajectories, and to compress information across various life domains into more general life stages. This thesis also presents new software for the analysis and visualization of complex sequence data.

The three approaches provide different kinds of information on the phenomena of interest, because each method captures time in a different way. The choice of method(s) depends on the type of data and the aims of the study. Applying both model-free and model-based approaches, or even combining them, is often beneficial: they are not substitutes but complement each other in the analysis of life course data.