# Survival analysis and visualization in R

Time: 2021-4-21

Survival analysis refers to a set of statistical methods for investigating the time it takes for an event of interest to occur.

Survival analysis is used in a variety of fields, such as:

- Cancer research, for analyzing patients' survival time
- Sociology, for "event history analysis"
- Engineering, for "failure time analysis"

In cancer research, typical research questions include:

- What is the impact of certain clinical characteristics on patient survival?
- What is the probability that an individual survives three years?
- Are there differences in survival between groups?


# Basic concepts

Here, we start by defining the basic terms of survival analysis:

- Survival time and event
- Survival function and hazard function

## Survival time and event types in cancer research

There are different types of events, including:

- Relapse
- Death

The time from the beginning of observation to the occurrence of the event (or the end of observation) is commonly called the _survival time_ (or time to event).

The two most important measures in cancer research are: (i) the time to death; and (ii) the _relapse-free survival time_, which corresponds to the time between response to treatment and recurrence of the disease. The latter is also known as _disease-free survival time_ and _event-free survival time_.

As mentioned above, survival analysis focuses on the expected duration until the occurrence of an event of interest (recurrence or death).

# Kaplan-Meier survival estimate

The Kaplan-Meier (KM) method is a nonparametric method used to estimate the survival probability from observed survival times (Kaplan and Meier, 1958).

The KM survival curve is a plot of the KM survival probability against time. It provides a useful summary of the data and can be used to estimate measures such as the median survival time.
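To make the estimator concrete, here is a minimal sketch that computes the KM product-limit estimate by hand, S(t_i) = S(t_{i-1}) * (1 - d_i / n_i), where n_i is the number at risk just before t_i and d_i the number of events at t_i. The toy data and all variable names below are hypothetical and only serve as an illustration; the result is checked against survfit().

```r
library(survival)

# Toy data (hypothetical): follow-up times and event indicator
# status: 1 = event observed, 0 = censored
time   <- c(5, 8, 8, 12, 15, 20, 22, 30)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)

# Distinct times at which at least one event occurred
event_times <- sort(unique(time[status == 1]))

# n_i: number at risk just before t_i; d_i: number of events at t_i
n_risk  <- sapply(event_times, function(t) sum(time >= t))
n_event <- sapply(event_times, function(t) sum(time == t & status == 1))

# Product-limit estimate: cumulative product of (1 - d_i / n_i)
km_manual <- cumprod(1 - n_event / n_risk)

# Reference values from survfit(); summary() reports event times only
km_fit <- survfit(Surv(time, status) ~ 1)
km_ref <- summary(km_fit)$surv

data.frame(time = event_times, n_risk, n_event, surv = km_manual)
all.equal(km_manual, km_ref)
```

The hand-computed probabilities should agree with the ones returned by survfit(), which is the function used in the rest of this article.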

# Survival analysis in R

We use two R packages: survival, for computing survival analyses, and survminer, for summarizing and visualizing the results.
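A minimal setup sketch, assuming both packages are installed from CRAN (survival for the computations, survminer for the ggsurvplot() plots used later in this article):

```r
# Install once if needed (uncomment the next line):
# install.packages(c("survival", "survminer"))

library(survival)   # computing survival analyses
library(survminer)  # summarizing and visualizing the results
```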

## Sample data set

We will use the lung cancer data provided in the survival package.

``````
head(lung)

  inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1    3  306      2  74   1       1       90       100     1175      NA
2    3  455      2  68   1       0       90        90     1225      15
3    3 1010      1  56   1       0       90        90       NA      15
4    5  210      2  57   1       1       90        60     1150      11
5    1  883      2  60   1       0      100        90       NA       0
6   12 1022      1  74   1       1       50        80      513       0
``````

- inst: institution code
- time: survival time in days
- status: censoring status, 1 = censored, 2 = dead
- age: age in years
- sex: male = 1, female = 2
- ph.ecog: ECOG performance score (0 = good, 5 = dead)
- ph.karno: Karnofsky performance score (bad = 0, good = 100) as rated by the physician
- pat.karno: Karnofsky performance score as rated by the patient
- meal.cal: calories consumed at meals
- wt.loss: weight loss in the last six months

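## Computing survival curves: survfit()

We want to compute the survival probability by sex.

The function _survfit_() can be used to compute the Kaplan-Meier survival estimate.

Its main arguments are a survival object created with the function _Surv_() and the data set containing the variables.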
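As a small illustration of what Surv() produces, the sketch below builds the survival object for the lung data on its own. The variable name surv_obj is only illustrative.

```r
library(survival)

# In lung, status is coded 1 = censored, 2 = dead;
# with a 1/2 coding, Surv() treats 2 as the event
surv_obj <- Surv(time = lung$time, event = lung$status)

# Censored observations are printed with a trailing "+"
head(surv_obj)

# Number of observed events (deaths) in the data set
sum(surv_obj[, "status"])
```

The same Surv(time, status) expression is what goes on the left-hand side of the model formula passed to survfit().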

To compute the survival curves, enter the following:

``````
fit <- survfit(Surv(time, status) ~ sex, data = lung)
print(fit)

        n events median 0.95LCL 0.95UCL
sex=1 138    112    270     212     310
sex=2  90     53    426     348     550
``````

By default, the function print() displays a short summary of the survival curves: the number of observations, the number of events, the median survival time, and the confidence limits for the median.

To display a more complete summary of the survival curves, enter the following:

``````
# Summary of the survival curves
summary(fit)
# Access the summary table only
summary(fit)$table
``````
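The fitted object can also be inspected directly. The following sketch collects some of the components stored in a survfit object into a data frame; the name d is arbitrary, and the fit is recomputed so the block runs on its own.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Collect per-time-point components of the survfit object
d <- data.frame(
  time     = fit$time,      # time points on the curves
  n.risk   = fit$n.risk,    # number of subjects at risk at each time
  n.event  = fit$n.event,   # number of events at each time
  n.censor = fit$n.censor,  # number of censored subjects at each time
  surv     = fit$surv,      # estimated survival probability
  lower    = fit$lower,     # lower confidence limit
  upper    = fit$upper      # upper confidence limit
)
head(d)
```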

# Visualizing survival curves

We use the function ggsurvplot() to draw the survival curves for the two groups of subjects.

``````
ggsurvplot(fit, data = lung,
           pval = TRUE, conf.int = TRUE,
           risk.table = TRUE,          # add risk table
           risk.table.col = "strata")  # change risk table color by group
``````

The plot can be customized further. For example, the argument _legend.labs_ changes the legend labels.

``````
ggsurvplot(
  fit,                                # survfit object with calculated statistics
  data = lung,                        # data used to fit the curves
  pval = TRUE,                        # show the p-value of the log-rank test
  conf.int = TRUE,                    # show confidence intervals for the point estimates
  conf.int.style = "step",            # customize the confidence interval style
  xlab = "Time in days",              # customize the x-axis label
  break.time.by = 200,                # break the x-axis at intervals of 200
  ggtheme = theme_bw(),               # customize plot and risk table with a ggplot2 theme
  risk.table = "abs_pct",             # absolute number and percentage at risk
  legend.labs = c("Male", "Female")   # change the legend labels
)
``````

The median survival time of each group is the time at which the survival probability, S(t), is 0.5.
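As a quick check, the medians can be read from the summary table of the fitted curves. This is a small sketch; the name med is arbitrary.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Median survival time (in days) for each sex group
med <- summary(fit)$table[, "median"]
med
```

These are the same values as the median column printed by print(fit): 270 days for sex=1 and 426 days for sex=2.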

The displayed range of the survival curves can be shortened with the argument _xlim_.
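A minimal sketch of the xlim argument follows. The upper limit of 600 days is an arbitrary choice for illustration, not a value taken from the text.

```r
library(survival)
library(survminer)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Restrict the x-axis to the first 600 days (illustrative limit)
p <- ggsurvplot(fit, data = lung,
                conf.int = TRUE,
                risk.table.col = "strata",  # change risk table color by group
                ggtheme = theme_bw(),       # change ggplot2 theme
                xlim = c(0, 600))
print(p)
```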

Note that three frequently used transformations can be specified with the argument _fun_.

The cumulative hazard is commonly used to estimate the hazard probability.
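The survminer documentation lists "log" (log transformation of the survivor function), "event" (cumulative events, f(y) = 1 - y) and "cumhaz" (cumulative hazard, f(y) = -log(y)) as values for fun. The sketch below draws two of them; the object names are illustrative.

```r
library(survival)
library(survminer)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Cumulative events (cumulative incidence): f(y) = 1 - y
p_event <- ggsurvplot(fit, data = lung,
                      conf.int = TRUE,
                      risk.table.col = "strata",
                      ggtheme = theme_bw(),
                      fun = "event")

# Cumulative hazard: f(y) = -log(y)
p_cumhaz <- ggsurvplot(fit, data = lung,
                       conf.int = TRUE,
                       risk.table.col = "strata",
                       ggtheme = theme_bw(),
                       fun = "cumhaz")

print(p_event)
print(p_cumhaz)
```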

## Kaplan-Meier life table: summary of survival curves

As mentioned above, you can use the function _summary_() to obtain a complete summary of the survival curves:

``summary(fit)``
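The full life table is long. As a sketch, the times argument of summary() restricts the output to selected time points; the three time points below (roughly 6, 12 and 24 months) are chosen only for illustration.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Life table restricted to selected time points (in days)
res <- summary(fit, times = c(180, 365, 730))

# Compact view: one row per group and time point
data.frame(
  strata = res$strata,
  time   = res$time,
  n.risk = res$n.risk,
  surv   = round(res$surv, 3)
)
```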

# Log-rank test: survdiff()

The _log-rank test_ is the most widely used method for comparing two or more survival curves. The null hypothesis is that there is no difference in survival between the groups.

The function _survdiff_() can be used as follows:

``````
surv_diff <- survdiff(Surv(time, status) ~ sex, data = lung)
surv_diff

        N Observed Expected (O-E)^2/E (O-E)^2/V
sex=1 138      112     91.6      4.55      10.3
sex=2  90       53     73.4      5.68      10.3

 Chisq= 10.3  on 1 degrees of freedom, p= 0.00131
``````

The log-rank test for a difference in survival gives a p-value of p = 0.0013, indicating that survival differs significantly between the male and female groups.
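If the p-value is needed as a number rather than as printed output, it can be recomputed from the chi-square statistic stored in the survdiff object. This is a small sketch; p_value is an illustrative name.

```r
library(survival)

surv_diff <- survdiff(Surv(time, status) ~ sex, data = lung)

# Degrees of freedom: number of groups minus one
df <- length(surv_diff$n) - 1

# Upper-tail probability of the chi-square statistic
p_value <- 1 - pchisq(surv_diff$chisq, df = df)
p_value
```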

# Complex survival curves

In this section, we compute survival curves using a combination of multiple factors. Next, we use ggsurvplot() to plot the result.

``````
ggsurvplot(fit,
           conf.int = TRUE,
           risk.table.col = "strata",  # change risk table color by group
           ggtheme = theme_bw())       # change ggplot2 theme
``````

The resulting figure shows the survival curves for the sex variable, faceted by the values of rx and adhere.
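The text does not show how the multi-factor curves are fitted, so the following is only a sketch under stated assumptions: it assumes the colon data set that ships with the survival package and a model stratified by sex, rx and adhere. The names fit2 and ggsurv are hypothetical.

```r
library(survival)
library(survminer)

# Assumption: colon data from the survival package, stratified by three factors
fit2 <- survfit(Surv(time, status) ~ sex + rx + adhere, data = colon)

# Draw the curves (here as cumulative events) with confidence intervals
ggsurv <- ggsurvplot(fit2, data = colon,
                     fun = "event",
                     conf.int = TRUE,
                     ggtheme = theme_bw())

# Facet the sex curves by rx (rows) and adhere (columns)
ggsurv$plot +
  theme_bw() +
  theme(legend.position = "right") +
  facet_grid(rx ~ adhere)
```

Faceting keeps each panel down to the two sex curves, which is easier to read than overlaying every combination of the three factors in a single plot.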

# Summary

Survival analysis is a set of statistical approaches for data analysis in which the outcome variable of interest is the time until an event occurs.

In this article, we demonstrate how to use two R packages to perform and visualize survival analysis.
