Full text link: http://tecdat.cn/?p=5438
Survival analysis is used in various fields, for example:
in cancer research, to analyze patients' survival time;
in sociology, as "event history analysis";
in engineering, as "failure time analysis".
In cancer research, typical research questions include:
What is the impact of certain clinical characteristics on patient survival?
What is the probability that an individual survives three years?
Are there differences in survival between groups of patients?
Here, we start by defining the basic terms of survival analysis:
Survival time and events
Survival function and hazard function
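For reference, the standard definitions behind these two functions, for a random survival time T, are given below. The survival function is the probability of remaining event-free beyond time t, the hazard is the instantaneous event rate among those still at risk, and the cumulative hazard H(t) integrates the hazard (the last identity assumes a continuous survival time).

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
H(t) = \int_0^t h(u)\, du = -\log S(t).
```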
Survival time and event types in cancer research
There are different types of events of interest, including disease recurrence and death.
The time from the start of observation until the event occurs (or until observation ends) is commonly called the survival time (or time to event).
The two most important measures in cancer studies are: i) the time to death; and ii) the relapse-free survival time, which corresponds to the time between response to treatment and recurrence of the disease. The latter is also known as disease-free survival time or event-free survival time.
As mentioned above, survival analysis focuses on the expected duration of time until the event of interest (recurrence or death) occurs. However, for some individuals the event may not be observed during the study period; such observations are called censored.
Kaplan-Meier survival estimate
The Kaplan-Meier (KM) method is a non-parametric method used to estimate the survival probability from observed survival times (Kaplan and Meier, 1958).
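The product-limit form of the estimator is shown below as a worked formula. Here t_1 < t_2 < ... are the distinct event times, n_i is the number of subjects still at risk just before t_i, and d_i is the number of events at t_i. Censored subjects count towards n_i until the time they are censored, which is how the method uses their partial follow-up.

```latex
\hat S(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right),
\qquad \text{equivalently} \qquad
\hat S(t_i) = \hat S(t_{i-1}) \left(1 - \frac{d_i}{n_i}\right).
```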
The survival curve plots the survival probability against time. It provides a useful summary of the data and can be used to estimate measures such as the median survival time.
Survival analysis in R
We use two R packages: survival for computing survival analyses, and survminer for summarizing and visualizing the results.
Sample data set
We will use the lung cancer data provided in the survival package.
```r
library(survival)  # provides the lung data set
head(lung)
##   inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1    3  306      2  74   1       1       90       100     1175      NA
## 2    3  455      2  68   1       0       90        90     1225      15
## 3    3 1010      1  56   1       0       90        90       NA      15
## 4    5  210      2  57   1       1       90        60     1150      11
## 5    1  883      2  60   1       0      100        90       NA       0
## 6   12 1022      1  74   1       1       50        80      513       0
```
inst: institution code
time: survival time in days
status: censoring status, 1 = censored, 2 = dead
age: age in years
sex: male = 1, female = 2
ph.ecog: ECOG performance score (0 = normal, 5 = dead)
ph.karno: Karnofsky performance score (0 = poor, 100 = normal) as rated by the physician
pat.karno: Karnofsky performance score as rated by the patient
meal.cal: calories consumed at meals
wt.loss: weight loss in the last six months
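A quick way to confirm the coding described above is to tabulate the status and sex columns directly. This is a minimal sketch using only the survival package; the logical column `dead` and the data frame `lung_events` are hypothetical helpers added for illustration, not part of the original data.

```r
library(survival)  # the lung data set is lazy-loaded with the package

# Number of censored (1) and dead (2) patients
table(lung$status)

# Number of patients by sex (1 = male, 2 = female)
table(lung$sex)

# Hypothetical helper column: a logical event indicator (TRUE = died)
lung_events <- transform(lung, dead = status == 2)
sum(lung_events$dead)
```

The event counts obtained this way should agree with the totals reported later by print(fit) and survdiff().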
We want to compute the survival probability by sex.
The function survfit() [in the survival package] can be used to compute the Kaplan-Meier survival estimate. Its main arguments are a survival object created with the function Surv() and the data set containing the variables.
To compute the survival curves, type the following:
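The command for this step does not appear in the text. Judging from the print(fit) output (strata sex=1 and sex=2 with n = 138 and 90), it was most likely a call like the one below; treat it as a reconstruction rather than the author's verbatim code.

```r
library(survival)  # Surv(), survfit() and the lung data

# Build the survival object inside the formula and stratify by sex.
# lung$status uses 1 = censored, 2 = dead, a coding Surv() accepts directly.
fit <- survfit(Surv(time, status) ~ sex, data = lung)
```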
```r
print(fit)
##         n events median 0.95LCL 0.95UCL
## sex=1 138    112    270     212     310
## sex=2  90     53    426     348     550
```
By default, the function print() displays a short summary of the survival curves. It shows the number of observations, the number of events, the median survival time and the confidence limits for the median.
To display a more complete summary of the survival curves, type the following:
```r
# Complete summary of the survival curves
summary(fit)
# Summary table only (one row per group)
summary(fit)$table
```
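summary() also accepts a times argument, which answers questions of the form "what is the probability of surviving to time t?". The sketch below assumes a one-year horizon of 365 days, chosen only for illustration, and collects the estimates and 95% confidence limits for each sex into a small data frame.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Survival probability (with 95% confidence limits) for each sex at 365 days
one_year <- summary(fit, times = 365)
data.frame(
  strata = one_year$strata,
  surv   = one_year$surv,
  lower  = one_year$lower,
  upper  = one_year$upper
)
```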
Visualize survival curves
We use the function ggsurvplot() [in the survminer package] to produce the survival curves for the two groups of subjects.
```r
library(survminer)  # provides ggsurvplot()

ggsurvplot(fit,
           pval = TRUE, conf.int = TRUE,
           risk.table = TRUE,          # add risk table
           risk.table.col = "strata")  # change risk table color by group
```
The argument legend.labs changes the legend labels. The plot can be customized further with other arguments:
```r
ggsurvplot(
  fit,                     # survfit object with calculated statistics
  pval = TRUE,             # show the p-value of the log-rank test
  conf.int = TRUE,         # show confidence intervals for the point estimates
  conf.int.style = "step", # customize the style of the confidence intervals
  xlab = "Time in days",   # customize the x-axis label
  break.time.by = 200,     # break the x-axis in intervals of 200 days
  ggtheme = theme_bw(),    # theme for plot and risk table (name garbled in source; theme_bw() assumed)
  risk.table = "abs_pct"   # risk table with absolute numbers and percentages
)
```
The median survival time of each group is the time at which the survival probability S(t) reaches 0.5.
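To make the definition concrete, the sketch below reads the median directly off the Kaplan-Meier curves as the first time at which the estimated survival drops to 0.5 or below, and compares it with the values reported by survfit. It ignores the special case where S(t) sits exactly at 0.5 over an interval, in which survfit reports the midpoint; the variable names `stratum` and `km_median` are illustrative.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# fit$time and fit$surv stack the curves of all strata;
# fit$strata holds the number of rows belonging to each stratum.
stratum <- rep(names(fit$strata), fit$strata)

# Median = first time at which the estimated survival is <= 0.5
km_median <- sapply(split(seq_along(fit$time), stratum), function(i) {
  min(fit$time[i][fit$surv[i] <= 0.5])
})
km_median

# Same quantity as reported by survfit
summary(fit)$table[, "median"]
```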
The range of the survival curves can be shortened using the argument xlim.
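The plotting command for this step is not shown. As a hedged alternative that needs only the survival package, base R's plot method for survfit objects also takes an xlim argument; the cut-off of 600 days below is an arbitrary illustrative value, not the one used in the original figure.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Restrict the x-axis to the first 600 days (illustrative cut-off)
plot(fit, xlim = c(0, 600), lty = c(1, 2),
     xlab = "Time in days", ylab = "Survival probability")
legend("topright", legend = c("sex=1 (male)", "sex=2 (female)"), lty = c(1, 2))
```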
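Note that three frequently used transformations can be specified with the argument fun:
"log": log transformation of the survivor function;
"event": plots cumulative events (f(y) = 1 - y), also known as the cumulative incidence;
"cumhaz": plots the cumulative hazard function (f(y) = -log(y)).
The cumulative hazard is commonly used to estimate the hazard probability. It is defined as H(t) = -log(S(t)).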
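The three transformations are simple functions of the Kaplan-Meier estimate, so they can be computed by hand from a survfit object. This sketch applies f(y) = log(y), f(y) = 1 - y and f(y) = -log(y) to the stored survival values, following the transformations as described above; it does not call ggsurvplot itself.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

S <- fit$surv           # Kaplan-Meier survival S(t) at each time in fit$time
log_surv <- log(S)      # "log": log S(t)
cum_events <- 1 - S     # "event": cumulative incidence 1 - S(t)
cum_hazard <- -log(S)   # "cumhaz": H(t) = -log(S(t))

head(data.frame(time = fit$time, S, log_surv, cum_events, cum_hazard))
```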
Kaplan-Meier life table: summary of survival curves
As mentioned above, the function summary() gives a complete summary of the survival curves.
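The life table returned by summary() has one row per event time and group, with the number at risk and the number of events. The sketch below collects those columns into a data frame (the name `life_table` is illustrative) and checks the product-limit formula: within each group, the cumulative product of (1 - n.event / n.risk) reproduces the survival column.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)
res <- summary(fit)  # one row per event time and stratum

life_table <- data.frame(
  strata  = res$strata,
  time    = res$time,
  n.risk  = res$n.risk,
  n.event = res$n.event,
  surv    = res$surv
)
head(life_table)

# Product-limit check: within each stratum, S(t) = cumulative product of (1 - d/n)
by_hand <- ave(1 - life_table$n.event / life_table$n.risk,
               life_table$strata, FUN = cumprod)
all.equal(by_hand, life_table$surv)
```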
Log-rank test: survdiff()
The log-rank test is the most widely used method for comparing two or more survival curves. The null hypothesis is that there is no difference in survival between the groups.
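For two groups, the statistic reported in the (O-E)^2/V column can be written as follows (standard textbook form, given here for reference). At each distinct event time t_j, n_j subjects are at risk, n_{1j} and n_{2j} of them in groups 1 and 2, d_j events occur in total, and O_1 is the total observed number of events in group 1; terms with n_j = 1 are taken as zero. With k groups the statistic has k - 1 degrees of freedom.

```latex
E_1 = \sum_j \frac{n_{1j}\, d_j}{n_j}, \qquad
V_1 = \sum_j \frac{n_{1j}\, n_{2j}\, d_j\, (n_j - d_j)}{n_j^2\, (n_j - 1)}, \qquad
\chi^2 = \frac{(O_1 - E_1)^2}{V_1} \sim \chi^2_1 \text{ under } H_0 .
```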
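The function survdiff() [in the survival package] can be used as follows:
```r
surv_diff <- survdiff(Surv(time, status) ~ sex, data = lung)
surv_diff
##         N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 138      112     91.6      4.55      10.3
## sex=2  90       53     73.4      5.68      10.3
##
##  Chisq= 10.3  on 1 degrees of freedom, p= 0.00131
```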
The log-rank test for a difference in survival gives a p-value of p = 0.0013, indicating that survival differs significantly between the male and female groups.
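The printed p-value is the upper tail of a chi-squared distribution with (number of groups - 1) degrees of freedom, evaluated at the test statistic. A minimal sketch recomputing it from the fields of the survdiff object; the names `dof` and `p_value` are illustrative.

```r
library(survival)

surv_diff <- survdiff(Surv(time, status) ~ sex, data = lung)

# Degrees of freedom = number of groups - 1
dof <- length(surv_diff$n) - 1
p_value <- pchisq(surv_diff$chisq, df = dof, lower.tail = FALSE)
p_value

# By construction, total observed and total expected events agree
c(observed = sum(surv_diff$obs), expected = sum(surv_diff$exp))
```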
Complex survival curves
In this section, we compute survival curves using a combination of several factors and then visualize the output with ggsurvplot().
```r
ggsurvplot(fit, conf.int = TRUE,
           risk.table.col = "strata", # change risk table color by group
           ggtheme = theme_bw())      # change ggplot2 theme
```
In the resulting figure, the survival curves for the sex variable are shown separately for each combination of the values of rx and adhere.
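The multi-factor fit itself is not shown. The variables rx and adhere suggest the colon data set that ships with the survival package, so the sketch below is a reconstruction under that assumption. colon contains two records per patient (etype 1 = recurrence, etype 2 = death); keeping only the death records is an additional choice made here for illustration.

```r
library(survival)  # also provides the colon data set

# Assumption: keep only the death records (etype == 2), one row per patient
colon_death <- subset(colon, etype == 2)

# One Kaplan-Meier curve per combination of sex, rx and adhere
fit2 <- survfit(Surv(time, status) ~ sex + rx + adhere, data = colon_death)

# Events and median survival for each combination (median is NA if never reached)
summary(fit2)$table[, c("events", "median")]
```

With 2 levels of sex, 3 of rx and 2 of adhere, this gives 12 curves, which is why a faceted plot is easier to read than a single panel.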
Survival analysis is a set of statistical methods for analyzing data in which the outcome variable of interest is the time until an event occurs.
In this article, we demonstrated how to perform and visualize survival analysis using two R packages: survival and survminer.