Time series analysis



American Journal of Orthodontics and Dentofacial Orthopedics, 2022-04-01, Volume 161, Issue 4, Pages 605-608, Copyright © 2022


Introduction

This article describes a simple method of applying a time series analysis to sample data sets using a free and open statistical software program, Language R.

Methods

Records of new patients who visited 2 different university-affiliated orthodontic departments in 2 different countries were collected. Time series analysis was performed by applying Language R software. The data sets and codes were provided for tutorial and illustrative purposes.

Results

Using time series decomposition, the trend component and the seasonal variation were separated and visualized graphically.

Conclusions

Time series analysis may be helpful to clinicians by providing a simple tool to evaluate patient characteristics and manage the practice.

By definition, a time series is a sequence of observations arranged chronologically. Time series analysis is frequently used to evaluate changes over time, especially in the social sciences. Charts of stock market prices may be the best-known example of a time series. In contrast, time series analysis has rarely been applied to studies in orthodontics.

Decomposing a time series separates the data into a trend component and a seasonal component. For example, because most patients seeking orthodontic treatment are growing children and young adults, the number of new patients might show seasonal variations that follow the academic calendar or school schedules. The trend in the number of new patients may also be related to the socioeconomic characteristics of society.

Clinicians might also want to know about changes or potential trends in practice management, especially regarding the number of new patients and variations in office income. Although several commercially available statistical software packages can perform time series analyses, these packages may have drawbacks. For example, the user manual may not be user-friendly, especially for practicing clinicians. Moreover, although many academic clinicians can access commercial software through licenses obtained by their universities, private practitioners may not be able to afford these packages. For example, the cost of statistical software can range from $1100 per user per year for SPSS (IBM Corp, Armonk, NY) to $9000 per year for the professional edition of SAS (SAS, Cary, NC).

The present article describes a simple method by which practicing clinicians can perform time series analysis using a free and open software package, Language R (R Foundation for Statistical Computing, Vienna, Austria). For illustrative and comparative purposes, real data were collected, analyzed, and compared. The Language R codes and the data sets used in this article are provided in the Supplementary Data or are available on request from the authors.

Material and methods

The time series data were drawn from patient records collected at 2 different university-affiliated orthodontic departments in 2 different countries, 1 in Gainesville, Florida, and the other in Seoul, Korea. The records of new patients from January 1, 2012, through December 31, 2019, were selected, and the numbers per month were arranged in a spreadsheet using Microsoft Excel software (Microsoft Corp, Redmond, Wash). However, these data could also be entered using any text editing software, which may suit clinicians who do not wish to purchase commercial software packages (Table I).

Table I. Example time series data: monthly records of new patients collected at 2 different university-affiliated orthodontic departments in 2 different countries

Year  Month      new.patients.gville  new.patients.seoul
2012  January    26                   169
2012  February   34                   129
2012  March      22                   69
2012  April      16                   53
2012  May        30                   65
2012  June       18                   59
...   ...        ...                  ...
2019  July       9                    115
2019  August     28                   121
2019  September  21                   64
2019  October    25                   59
2019  November   17                   98
2019  December   23                   114
Note. The data can be loaded by reading the spreadsheet saved from Microsoft Excel in csv format. The file is provided as Supplementary Data or is available on request from the authors.
> Patients <- read.csv("Example.Data.csv")
> patients.gville <- Patients$new.patients.gville
> patients.seoul <- Patients$new.patients.seoul
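The commands in the note assume a comma-separated file whose first line holds the 4 column names shown in the table. As a minimal sketch of that layout, the code below writes the 6 rows for 2012 that are visible in the table to a temporary file and reads them back. The temporary file is only a stand-in for Example.Data.csv, and the comma delimiter is an assumption about how the supplementary file was saved.

```r
# Write the six 2012 rows shown in the table to a temporary CSV file.
# The temporary file is a stand-in for Example.Data.csv (assumed to be
# comma-separated with a header line).
csv.lines <- c(
  "Year,Month,new.patients.gville,new.patients.seoul",
  "2012,January,26,169",
  "2012,February,34,129",
  "2012,March,22,69",
  "2012,April,16,53",
  "2012,May,30,65",
  "2012,June,18,59"
)
csv.file <- tempfile(fileext = ".csv")
writeLines(csv.lines, csv.file)

# read.csv() treats the first line as column names by default.
Patients <- read.csv(csv.file)
str(Patients)

# Each column is extracted as a plain vector of monthly counts with `$`.
patients.gville <- Patients$new.patients.gville
patients.seoul <- Patients$new.patients.seoul
print(patients.gville)
print(patients.seoul)
```

Calling str() right after reading the file is a quick way to confirm that the counts were read as numbers rather than as text.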

The time series analysis used the basic commands in Language R software. The R programming language is the result of a worldwide collaborative project involving many contributors, including the authors' research team. The program can be downloaded for free from the R project homepage https://cran.r-project.org/ and runs on a wide variety of platforms, including Windows, Mac OS, and Linux.

The format of the spreadsheet is unimportant when acquiring the data. Entering the numbers directly, as shown below, may be simpler than working with spreadsheet software. Lines beginning with > indicate code entered in the R console.

> patients.gville <- scan()
26 34 22 16 30 18 20 18 16 30 18 16
22 26 17 19 26 8 31 33 22 28 16 21
16 22 8 24 43 28 47 50 50 28 14 29
41 40 34 20 12 24 20 29 20 36 29 17
21 30 24 21 26 17 22 43 21 20 16 22
21 12 18 18 21 12 16 24 21 30 21 18
23 25 21 20 28 26 18 20 21 22 27 16
24 19 13 19 9 10 9 28 21 25 17 23
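When scan() is called without arguments, it reads from the console and stops at the first empty line, so it is best suited to interactive use. For a script saved to a file, the same values can be entered with c(). The following is a minimal sketch of that alternative using the Gainesville numbers listed above; the year comments and the length check are additions for illustration.

```r
# Monthly counts of new patients in Gainesville, one row per year.
patients.gville <- c(
  26, 34, 22, 16, 30, 18, 20, 18, 16, 30, 18, 16,  # 2012
  22, 26, 17, 19, 26,  8, 31, 33, 22, 28, 16, 21,  # 2013
  16, 22,  8, 24, 43, 28, 47, 50, 50, 28, 14, 29,  # 2014
  41, 40, 34, 20, 12, 24, 20, 29, 20, 36, 29, 17,  # 2015
  21, 30, 24, 21, 26, 17, 22, 43, 21, 20, 16, 22,  # 2016
  21, 12, 18, 18, 21, 12, 16, 24, 21, 30, 21, 18,  # 2017
  23, 25, 21, 20, 28, 26, 18, 20, 21, 22, 27, 16,  # 2018
  24, 19, 13, 19,  9, 10,  9, 28, 21, 25, 17, 23   # 2019
)

# Eight years of monthly data should give 8 * 12 = 96 values;
# stopifnot() halts with an error if a value was dropped or duplicated.
stopifnot(length(patients.gville) == 8 * 12)
print(length(patients.gville))
```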

In the second step, the data are converted to a time series structure by specifying both the starting and ending dates, which produces a time series table. In this table, the rows are the years from 2012 to 2019, and the columns are the months from January to December.

> patients.gville.ts <- ts(patients.gville,
+   start = c(2012, 1),
+   end = c(2019, 12), frequency = 12)

> print(patients.gville.ts)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012  26  34  22  16  30  18  20  18  16  30  18  16
2013  22  26  17  19  26   8  31  33  22  28  16  21
2014  16  22   8  24  43  28  47  50  50  28  14  29
2015  41  40  34  20  12  24  20  29  20  36  29  17
2016  21  30  24  21  26  17  22  43  21  20  16  22
2017  21  12  18  18  21  12  16  24  21  30  21  18
2018  23  25  21  20  28  26  18  20  21  22  27  16
2019  24  19  13  19   9  10   9  28  21  25  17  23
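Once the data are stored as a time series, base R can report the time span and extract sub-periods directly. The sketch below is an illustration rather than part of the original analysis: it rebuilds the Gainesville series and then uses start(), end(), frequency(), window(), and aggregate() to inspect it. The object names patients.2014 and annual.totals are chosen here for illustration only.

```r
patients.gville <- c(
  26, 34, 22, 16, 30, 18, 20, 18, 16, 30, 18, 16,
  22, 26, 17, 19, 26,  8, 31, 33, 22, 28, 16, 21,
  16, 22,  8, 24, 43, 28, 47, 50, 50, 28, 14, 29,
  41, 40, 34, 20, 12, 24, 20, 29, 20, 36, 29, 17,
  21, 30, 24, 21, 26, 17, 22, 43, 21, 20, 16, 22,
  21, 12, 18, 18, 21, 12, 16, 24, 21, 30, 21, 18,
  23, 25, 21, 20, 28, 26, 18, 20, 21, 22, 27, 16,
  24, 19, 13, 19,  9, 10,  9, 28, 21, 25, 17, 23
)

# 'end' can be omitted here: 96 monthly values starting in January 2012
# necessarily end in December 2019.
patients.gville.ts <- ts(patients.gville, start = c(2012, 1), frequency = 12)

# Confirm the time span and the number of observations per year.
print(start(patients.gville.ts))      # year and month of the first value
print(end(patients.gville.ts))        # year and month of the last value
print(frequency(patients.gville.ts))  # observations per year

# window() extracts a sub-period, here the 12 months of 2014.
patients.2014 <- window(patients.gville.ts,
                        start = c(2014, 1), end = c(2014, 12))
print(patients.2014)

# aggregate() with nfrequency = 1 sums the 12 months of each year.
annual.totals <- aggregate(patients.gville.ts, nfrequency = 1, FUN = sum)
print(annual.totals)
```

Annual totals remove the seasonal pattern by construction, so they give a rough first look at the long-term trend before any decomposition.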

The third step is decomposition of the time series into seasonal variations and trends.

> patients.gville.ts.decomposed <- decompose(patients.gville.ts)

The next command produces a default plot similar to the multiple-frame graphic shown in Figure 1.

> plot(patients.gville.ts.decomposed)

Fig 1. Time series graphs. A, raw data; B, trend component; C, seasonal variation.
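It may help to look inside the object returned by decompose(). By default, the function fits an additive model, in which each observation is the sum of a trend, a seasonal, and a random component, and for monthly data it estimates the trend with a centered 12-month moving average. The sketch below checks both properties on the Gainesville data. The names rebuilt, ma.weights, and manual.trend are introduced here for illustration and do not appear in the original code.

```r
patients.gville <- c(
  26, 34, 22, 16, 30, 18, 20, 18, 16, 30, 18, 16,
  22, 26, 17, 19, 26,  8, 31, 33, 22, 28, 16, 21,
  16, 22,  8, 24, 43, 28, 47, 50, 50, 28, 14, 29,
  41, 40, 34, 20, 12, 24, 20, 29, 20, 36, 29, 17,
  21, 30, 24, 21, 26, 17, 22, 43, 21, 20, 16, 22,
  21, 12, 18, 18, 21, 12, 16, 24, 21, 30, 21, 18,
  23, 25, 21, 20, 28, 26, 18, 20, 21, 22, 27, 16,
  24, 19, 13, 19,  9, 10,  9, 28, 21, 25, 17, 23
)
patients.gville.ts <- ts(patients.gville, start = c(2012, 1), frequency = 12)
patients.gville.ts.decomposed <- decompose(patients.gville.ts)

# The result is a list holding the trend, seasonal, and random components.
print(names(patients.gville.ts.decomposed))

# Additive model: observed = trend + seasonal + random.
# The moving average is undefined for the first and last 6 months,
# so those positions are NA and are excluded from the comparison.
rebuilt <- patients.gville.ts.decomposed$trend +
  patients.gville.ts.decomposed$seasonal +
  patients.gville.ts.decomposed$random
defined <- !is.na(rebuilt)
print(all.equal(as.numeric(rebuilt)[defined],
                as.numeric(patients.gville.ts)[defined]))

# Trend by hand: centered moving average with half weights on the
# two end months, so that the 13 weights span exactly 12 months.
ma.weights <- c(0.5, rep(1, 11), 0.5) / 12
manual.trend <- stats::filter(patients.gville.ts, ma.weights, sides = 2)
print(all.equal(as.numeric(manual.trend),
                as.numeric(patients.gville.ts.decomposed$trend)))
```

Because the trend cannot be computed for the first and last 6 months, the trend panel of the plot is shorter than the raw data panel.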

The entire code file provided as Supplementary Data, Time_Series_Example_Code.R, can be dragged into the program window or R console, or the code can be entered line by line. Both methods yield the same results.

Results

The trends and seasonal variations may be difficult to determine from the time-related table alone (Table I). Graphic visualization of these data is more informative and easier to understand than a table consisting only of raw numbers. Through decomposition of the raw data (Fig 1, A), the time series analysis can separate meaningful signals from noise. The trend component showed that the number of patients increased in late 2014, decreased afterward, and then increased slightly during 2018 (Fig 1, B).

Further evaluation showed clear seasonal variations, with peaks and valleys observed at certain periods in each year (Fig 1, C). These peaks seem to coincide with the summer and winter breaks at schools.

Detailed evaluation of seasonal variations showed the highest peaks every summer in Gainesville, Florida (Fig 2, A) and every winter in Seoul, Korea (Fig 2, B).

Fig 2. Seasonal variations in new orthodontic patients in (A) Gainesville, Florida; and (B) Seoul, South Korea.
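The month-by-month seasonal effect behind a plot of this kind can also be read as numbers. In the object returned by decompose(), the figure component holds one estimated seasonal effect per month, centered so that the 12 values sum to 0. The sketch below extracts these values for the Gainesville series only, because the full Seoul series is not listed in the text; the names seasonal.effect and peak.month are introduced here for illustration.

```r
patients.gville <- c(
  26, 34, 22, 16, 30, 18, 20, 18, 16, 30, 18, 16,
  22, 26, 17, 19, 26,  8, 31, 33, 22, 28, 16, 21,
  16, 22,  8, 24, 43, 28, 47, 50, 50, 28, 14, 29,
  41, 40, 34, 20, 12, 24, 20, 29, 20, 36, 29, 17,
  21, 30, 24, 21, 26, 17, 22, 43, 21, 20, 16, 22,
  21, 12, 18, 18, 21, 12, 16, 24, 21, 30, 21, 18,
  23, 25, 21, 20, 28, 26, 18, 20, 21, 22, 27, 16,
  24, 19, 13, 19,  9, 10,  9, 28, 21, 25, 17, 23
)
patients.gville.ts <- ts(patients.gville, start = c(2012, 1), frequency = 12)
patients.gville.ts.decomposed <- decompose(patients.gville.ts)

# One seasonal effect per month. The series starts in January, so the
# first value belongs to January; month.abb is a built-in R constant.
seasonal.effect <- setNames(patients.gville.ts.decomposed$figure, month.abb)
print(round(seasonal.effect, 1))

# Month with the largest positive seasonal effect.
peak.month <- names(which.max(seasonal.effect))
print(peak.month)
```

A positive value means that the month tends to bring more new patients than the trend alone would predict, and a negative value means fewer. Passing seasonal.effect to barplot() would draw the same information as a chart.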
