# Analyzing longitudinal orthodontic data. Part 4: Latent growth curve models

American Journal of Orthodontics and Dentofacial Orthopedics, 2013-12-01, Volume 144, Issue 6, Pages 927-930, Copyright © 2013 American Association of Orthodontists

In previous statistical articles, we demonstrated how to use multilevel linear or nonlinear models to analyze longitudinal orthodontic growth data. Here, we will explain how to use latent growth curve models for data with repeated measurements. From a statistical perspective, latent growth curve models and multilevel models for longitudinal data analysis are actually equivalent. However, because these 2 methods have been implemented in software packages in a different manner, they have their own advantages and limitations in practical data analyses.

Multilevel models are most useful when the number of repeated measurements is large and these measurements have been undertaken at different times for different subjects. Current software packages for latent growth curve models do not provide the same level of flexibility in coping with different time intervals between measurements among subjects. Latent growth curve models and multilevel models are equally useful when (1) the repeated measurements were undertaken at the same intervals for all subjects and (2) the number of repeated measurements is limited. When the number of repeated measurements is limited but the outcome shows a nonlinear change pattern, latent growth curve models can be a better choice. This scenario usually occurs when interventions are given to the subjects at baseline or over a period of time, such as in clinical trials.

Latent growth curve models have been popular in social and psychological sciences for analyzing experimental and observational data. For this article, we used a simple example to show how latent growth curve models can be used for analyzing orthodontic pain data. Since latent growth curve models can be viewed as a special application of structural equation models, all software packages for structural equation models can be used for latent growth curve model analysis. All of our analyses were undertaken with the software Mplus (version 7.11; Muthen & Muthen, Los Angeles, Calif).

The data consist of repeated pain measurements on a visual analog scale (VAS) for 53 patients after placement of 2 types of orthodontic appliances, A (28 subjects) and B (25 subjects), at 4 hours, 24 hours, 3 days, and 7 days. The aim of our analysis was to evaluate whether there were differences in the pain levels at baseline and the changes in pain between the 2 groups.

## Statistical analysis

We first plotted the VAS scores for the 53 patients. Figure 1 shows individual VAS scores over 7 days for each patient in the 2 groups. For both groups, the pain levels seemed to increase with time until day 3 but slightly decreased at day 7. Table I summarizes the VAS scores on the 4 occasions for each group. Although group B showed greater levels of pain on the 4 occasions than did group A, the 2-sample t test shows that their differences were not statistically significant at the baseline (8.74; 95% confidence interval [CI], −8.34 to 25.82) and at 24 hours (8.67; 95% CI, −8.38 to 25.72), but they were significant on day 3 (20.28; 95% CI, 4.11 to 36.45) and day 7 (19.23; 95% CI, 4.19 to 34.26). When the 4 measurements of pain were summed as a total score, the differences in the total scores between the 2 groups were also statistically significant (56.91; 95% CI, 6.69 to 107.14). These results are statistically significant when the significance level is set at 5%. As we carried out multiple testing procedures for the same data, a more stringent significance level (such as the adjustment made by the Bonferroni correction) might be required to control the false-positive rates.

Fig 1. Observed trends in VAS scores over 7 days for the 53 patients in the 2 groups. The thick blue lines are the fitted average growth curves for each group.

Table I. Summary of VAS scores for the 2 treatment groups

| Group | Occasion | Mean | SD | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- |
| A (n = 28) | 4 hours | 40.54 | 33.03 | 0 | 98.2 |
| A (n = 28) | 24 hours | 56.45 | 34.13 | 0 | 100 |
| A (n = 28) | Day 3 | 40.17 | 31.29 | 0 | 99.1 |
| A (n = 28) | Day 7 | 22.07 | 23.57 | 0 | 84.68 |
| B (n = 25) | 4 hours | 49.28 | 28.35 | 5.86 | 94.59 |
| B (n = 25) | 24 hours | 65.12 | 26.72 | 4.5 | 99.55 |
| B (n = 25) | Day 3 | 60.45 | 26.82 | 1.35 | 99.55 |
| B (n = 25) | Day 7 | 41.30 | 30.82 | 0.45 | 94.59 |
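
The between-group comparisons described above can be approximated from the summary statistics in Table I alone. The following is a minimal sketch, assuming an equal-variance (pooled) 2-sample t interval; the source does not state which variant of the t test was used. The function name `pooled_diff_ci` and the hard-coded critical value are illustrative choices, not part of the original analysis. The differences and intervals should come out close to those quoted in the text.

```python
import math

# (mean, SD) per occasion, copied from Table I.
GROUP_A = {"4 hours": (40.54, 33.03), "24 hours": (56.45, 34.13),
           "Day 3": (40.17, 31.29), "Day 7": (22.07, 23.57)}
GROUP_B = {"4 hours": (49.28, 28.35), "24 hours": (65.12, 26.72),
           "Day 3": (60.45, 26.82), "Day 7": (41.30, 30.82)}
N_A, N_B = 28, 25

# Assumption: 0.975 quantile of Student's t with 28 + 25 - 2 = 51 df,
# hard-coded (about 2.0076) so the sketch needs only the standard library.
T_CRIT_51 = 2.0076


def pooled_diff_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, t_crit):
    """Return (B - A difference, lower, upper) for a pooled-variance t interval."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean_b - mean_a
    return diff, diff - t_crit * se, diff + t_crit * se


results = {}
for occasion, (mean_a, sd_a) in GROUP_A.items():
    mean_b, sd_b = GROUP_B[occasion]
    results[occasion] = pooled_diff_ci(mean_a, sd_a, N_A, mean_b, sd_b, N_B, T_CRIT_51)
    diff, lower, upper = results[occasion]
    print(f"{occasion}: difference = {diff:.2f}, 95% CI {lower:.2f} to {upper:.2f}")

# Bonferroni adjustment for the 5 tests reported (4 occasions plus the total score).
bonferroni_alpha = 0.05 / 5
print(f"Bonferroni-adjusted significance level: {bonferroni_alpha:.3f}")
```

With 5 tests, the Bonferroni-adjusted level is 0.01 rather than 0.05, so an interval that only just excludes 0 at the 5% level would not necessarily remain significant after adjustment.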

For latent growth curve models, it is helpful to use a path diagram (Fig 2), commonly used to portray a structural equation model, to explain the underlying statistical concepts. In our analysis, 4 VAS scores were recorded at 4 hours, 24 hours, day 3, and day 7. These 4 variables are represented by X1, X2, X3, and X4, respectively, in Figure 2. Latent growth curve models use the 4 variables to estimate the average baseline VAS and the average change in VAS over the observation period by first estimating each patient's baseline VAS and change in VAS. This is exactly the same approach as in the multilevel models that we demonstrated previously. In multilevel models, the variations in baseline VAS are denoted as random intercepts (each patient’s baseline VAS score), whereas the variations in slopes for the changes in VAS are denoted as random slopes (each patient’s pattern of change in VAS score over time); they were treated as 2 random variables in the models. In latent growth curve models, these 2 random variables are explicitly depicted as 2 latent (unobserved and to be estimated) variables, “Intercept” and “Slope.” Following the convention in structural equation models, latent variables are in circles, whereas observed variables such as VAS scores are in squares.

Fig 2. Path diagram for the linear latent growth curve model with 4 repeated measurements of VAS scores. Two parameters were estimated for Intercept: M1 is the average value of Intercept (estimated average baseline VAS) for group A, and D1 is the residual variance for Intercept. Two parameters were estimated for Slope: M2 is the average value of Slope (estimated average velocity of change in VAS scores) for group A, and D2 is the residual variance for Slope.

For the 2 latent variables Intercept and Slope in Figure 2 to represent the estimated intercepts and slopes, we needed to carefully set up their relationships to the 4 VAS scores; this was done by the arrows from the 2 latent variables to each X. The direction of the arrows in Figure 2 from Intercept and Slope to X and their associated factor loadings (the number next to an arrow) define their functions in the models. For instance, the relationship between VAS scores and intercept/slopes can be written as the following regression equations:

$$
\begin{aligned}
X1 &= 1 \times \text{Intercept} + 0 \times \text{Slope} + E1 \\
X2 &= 1 \times \text{Intercept} + 1 \times \text{Slope} + E2 \\
X3 &= 1 \times \text{Intercept} + 3 \times \text{Slope} + E3 \\
X4 &= 1 \times \text{Intercept} + 7 \times \text{Slope} + E4
\end{aligned}
$$

where E1, E2, E3, and E4 are residual error terms, and their variances were assumed to be equal. Because X1 and X2 are VAS scores at 4 and 24 hours, respectively, a simple algebraic manipulation shows that Slope is estimated by X2−X1 (estimated difference between baseline VAS and VAS at 24 hours), and Intercept is the estimated baseline VAS score. If we ignored the 4-hour lag between placement of the orthodontic appliances and the first VAS measurement, X3 and X4 could be viewed as the sum of the baseline VAS and the changes in pain level after 3 and 7 days, respectively. Consequently, these 4 equations and the path diagram in Figure 2 describe a linear growth curve model: ie, the change in pain was assumed to follow a linear trend. The variable “Group” in Figure 2 was a dummy variable (group A was coded 0; group B was coded 1), and the 2 arrows from Group to Intercept and Slope aimed to estimate the differences in baseline VAS and in changes in VAS scores between the 2 groups.
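
As a compact restatement of the 4 regression equations, they can be stacked into matrix form. This is only a notational sketch: the matrix layout is an assumption introduced here for illustration and does not appear in the original figures.

$$
\begin{pmatrix} X1 \\ X2 \\ X3 \\ X4 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 7 \end{pmatrix}
\begin{pmatrix} \text{Intercept} \\ \text{Slope} \end{pmatrix}
+
\begin{pmatrix} E1 \\ E2 \\ E3 \\ E4 \end{pmatrix}
$$

Subtracting the first row from the second gives the algebraic step mentioned above:

$$
X2 - X1 = (1 - 1) \times \text{Intercept} + (1 - 0) \times \text{Slope} + (E2 - E1) = \text{Slope} + (E2 - E1)
$$

so, apart from residual error, Slope corresponds to the change in VAS per unit of the time coding (here, 1 day), and the second column of the loading matrix (0, 1, 3, 7) is what imposes the linear trend.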

Figure 1 shows, however, that the change in pain level does not appear to be linear, and this was confirmed by the model fit statistics for the linear latent growth curve model: the model’s chi-square value was 52.34 with 10 degrees of freedom (df), which was highly significant. In latent growth curve models and structural equation models in general, the null hypothesis is that the relationships among variables implied by our model are the same as those observed in the data, and the chi-square test evaluates the difference between the estimated and observed relationships among variables in the model. Therefore, if our model is correct, the chi-square test should be nonsignificant, and a significant chi-square test suggests that something is wrong with our proposed model.
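
To illustrate how a chi-square fit statistic translates into a P value, the sketch below evaluates the upper-tail probability of a chi-square distribution. It relies on a closed-form expression that holds only when the degrees of freedom are even, which happens to be the case here; the function name `chi2_sf_even_df` is hypothetical, and a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For even df, P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term = 1.0   # current term (x/2)^k / k!, starting at k = 0
    total = 1.0  # running sum of the terms
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Fit statistic reported for the linear latent growth curve model.
p_linear = chi2_sf_even_df(52.34, 10)
print(f"Linear model: chi-square = 52.34, df = 10, P = {p_linear:.2e}")
```

A very small P value here means that the data are unlikely under the null hypothesis that the model is correct, which is why a significant chi-square counts against the model rather than in its favor.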

A quadratic curve model including another latent variable to capture the nonlinear change in pain did not resolve the problem because the chi-square test was still highly significant (χ² = 39.3; df = 8). Since there were only 4 measurements of VAS scores, it made little sense to estimate a cubic or even higher-order growth curve model. An alternative approach, which is not feasible in multilevel models, is to allow the 2 factor loadings associated with the arrows from Slope to X2 and X3 to be freely estimated. If the change in VAS scores was indeed linear, the estimated factor loadings would be close to the time intervals when these measurements were made; otherwise, these estimated factor loadings would try to reproduce the real changes in the VAS scores.

The results from the final model, where a few other changes to the original model were made, including a correlated error covariance between E3 and E4, are shown in Table II. The factor loadings for the arrows from Slope to X2 and X3 were −8.30 and −2.54, respectively. Because the estimated Slope is −2.04 per day, indicating an overall decrease in VAS scores on day 7 compared with the baseline, the negative factor loadings mean that the pain level actually increased on days 1 and 3. The difference in baseline VAS scores between the 2 appliance groups was 13.81 (SE = 6.10; P = 0.024), and the difference in change in pain level was 0.23 per day (SE = 0.48; P = 0.63). This suggests that although patients in group B experienced significantly higher levels of pain than did those in group A, the overall change in pain remained similar. The chi-square value was 6.13 with 6 df, and the associated P value was 0.41, which was not significant.

Table II. Results from the final nonlinear latent growth curve model

| Parameter | Estimate | SE | P value |
| --- | --- | --- | --- |
| Intercept ← Group | 13.81 | 6.10 | 0.024 |
| Intercept for group A | 38.12 | 5.23 | <0.001 |
| Slope ← Group | 0.23 | 0.48 | 0.633 |
| Slope for group A | −2.04 | 0.73 | 0.005 |
| X1 ← Slope | 0 (fixed) | | |
| X2 ← Slope | −8.30 | 4.53 | 0.067 |
| X3 ← Slope | −2.54 | 3.06 | 0.405 |
| X4 ← Slope | 7 (fixed) | | |
| **Variances** | | | |
| E1 | 642.09 | 147.83 | |
| E2 | 435.24 | 84.90 | |
| E3 | 435.24 | 84.90 | |
| E4 | 435.24 | 84.90 | |
| Intercept | 327.78 | 103.28 | |
| Slope | 0.45 | 0.87 | |
| **Covariances** | | | |
| X3 with X4 | 273.10 | 77.21 | <0.001 |
| Intercept with Slope | −9.25 | 6.08 | 0.128 |
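
One way to read the freely estimated loadings in Table II is to convert them into model-implied average VAS scores for each group. The sketch below assumes that the implied mean at each occasion equals the mean Intercept plus the factor loading times the mean Slope, which follows from the regression equations if the residuals have mean zero. The function name `implied_means` is hypothetical and the calculation is illustrative only, not output from Mplus.

```python
# Estimates copied from Table II.
INTERCEPT_A = 38.12     # mean Intercept for group A
INTERCEPT_DIFF = 13.81  # Intercept <- Group (group B minus group A)
SLOPE_A = -2.04         # mean Slope for group A
SLOPE_DIFF = 0.23       # Slope <- Group (group B minus group A)

# Factor loadings from Slope to each VAS score (X1 and X4 fixed, X2 and X3 estimated).
LOADINGS = {"4 hours": 0.0, "24 hours": -8.30, "Day 3": -2.54, "Day 7": 7.0}


def implied_means(group):
    """Model-implied mean VAS per occasion; group is 0 for A and 1 for B."""
    intercept = INTERCEPT_A + group * INTERCEPT_DIFF
    slope = SLOPE_A + group * SLOPE_DIFF
    return {occasion: intercept + loading * slope
            for occasion, loading in LOADINGS.items()}


means_a = implied_means(0)
means_b = implied_means(1)
for occasion in LOADINGS:
    print(f"{occasion}: group A = {means_a[occasion]:.2f}, group B = {means_b[occasion]:.2f}")
```

Because Slope is negative, the negative loadings at 24 hours and day 3 produce implied means above the baseline, and the positive loading at day 7 produces an implied mean below it, which can be compared with the observed pattern in Table I.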

## Discussion

We demonstrated how to use latent growth curve models to analyze data with repeated measurements. Simple comparisons show that the pain levels of the 2 groups are different at 4 and 24 hours, and those differences become larger on days 3 and 7. The latent growth curve model analysis tests a slightly different hypothesis: the differences in the estimated baseline pain levels and the changes in pain. When there are only a few repeated measurements at the same time intervals across all subjects, latent growth curve models can be flexible in modeling nonlinear curves, as shown in our example. Another potential advantage is that it is quite straightforward for a latent growth curve model to test the relationship between growth parameters and distant outcomes. For instance, suppose we also measured patients’ overall experiences with the orthodontic appliances at the end of 7 days; it would be straightforward to test the relationships between their overall experiences, their baseline pain levels, and the changes in pain within a latent growth curve model, whereas such an analysis could be quite cumbersome with multilevel models. Although the application of latent growth curve models might have a smaller scope than multilevel models, latent growth curve models can be a useful tool for analysis of data from clinical trials when a limited number of repeated observations of the same outcomes are undertaken after the intervention.

## Acknowledgments

We thank Padhraig Fleming for kindly providing the example data.