## Survival Design

1. Create timediff variable from date of diagnosis to date of death in days.
2. Create flag variable from status at event time. 1 = dead, 0 = alive.
3. Clean the data by converting the raw continuous data to binary data: 0 remains 0 and every other value becomes 1.
4. Perform a lifetest on each row. For each row: timediff by flag (0 = alive, 1 = dead), stratified by the current row (binary data).
• Grab ChiSq and ProbChiSq for Log-Rank, Wilcoxon, -2Log(LR)
• Name as LogRank, wilcoxen, LRatio and prob_LogRank, prob_wilcoxen, prob_LRatio
5. Perform phreg on each row (log10 data).
• Grab Estimate, ChiSq and ProbChiSq for Cox.
• Name as estimate, cox, prob_cox
6. Perform Exact Log Rank (binaryData).
• Grab "Exact conditional on follow-up" and its mid-p (two-sided)
• Name as exactLR
7. For each row, get the count of columns that are 1.
8. Build summary table of summed p-values for (Log-Rank, Wilcoxon, Likelihood Ratio).
9. Build summary table of p-values for Cox.
10. Build summary table of Exact Log Rank values.
11. For each row, set the sign of each chisq value to the sign of the estimate.
12. Select a subset of genes (sumPvalue < j, cox_pvalue < k, ExactLR > l, count > m, ...).
13. For each column in the subset, standardize the values for that column.
14. For each row, sum the chisq values.
15. For each row, multiply the logged data values for that row by the sum value.
16. For each column, sum the data values for that column (score for that patient).
17. Run phreg using score, timediff and flag.
18. Sort score values and split in the middle (or at another cut point).
19. Run Lifetest using timediff and flag, stratified by split score values.
20. Use the candidate set (pick list of row ids) from the training set to generate a score for each testing patient.
21. Apply to testing.
• 1) Sort by the testing score and cut in the middle to generate two groups, then run lifetest, stratifying by the group.
• 2) Compare each testing score to the group means from the training set and determine which group it is closer to.
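
The scoring steps above (11 and 13 through 16) and the middle split in step 18 can be sketched in Python. This is only an illustration under stated assumptions, not the original lifetest/phreg workflow: the function names (`signed`, `standardize_columns`, `patient_scores`, `median_split`) are hypothetical, the values standardized in step 13 are assumed to be the logged values used in step 15, and the sample standard deviation is assumed for standardization.

```python
import math
from statistics import mean, median, stdev

def signed(chisq, estimate):
    """Step 11: give a chi-square value the sign of the Cox estimate."""
    return math.copysign(chisq, estimate)

def standardize_columns(rows):
    """Step 13: z-score each column (patient) across the selected rows.

    Assumption: sample standard deviation; needs at least two rows.
    """
    out_cols = []
    for col in zip(*rows):
        mu, sd = mean(col), stdev(col)
        out_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]

def patient_scores(log_rows, chisq_rows, estimates):
    """Steps 11 and 13-16: one weighted score per patient (column).

    log_rows   : selected rows of logged data, one list per row
    chisq_rows : chi-square values per row (e.g. LogRank, wilcoxen, LRatio, cox)
    estimates  : Cox estimate per row, used only for its sign
    """
    # Step 14: sum the signed chi-square values of each row.
    weights = [sum(signed(c, est) for c in chis)
               for chis, est in zip(chisq_rows, estimates)]
    z = standardize_columns(log_rows)
    n_patients = len(log_rows[0])
    # Steps 15-16: weight each row, then sum down each column.
    return [sum(w * row[j] for w, row in zip(weights, z))
            for j in range(n_patients)]

def median_split(scores):
    """Step 18: 1 for scores above the median, 0 otherwise."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]
```

The resulting group labels would then play the role of the stratification variable in step 19. A cut point other than the median could be substituted in `median_split` without changing the rest of the sketch.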

• Alternative to filtering: use the top N method to generate training candidate sets and apply them to the testing set. Not currently available.
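
To contrast the threshold filter of step 12 with a top N selection, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the per-row summary dictionary, its keys (`sum_p`, `cox_p`, `count`), and the function names are hypothetical, and because the source does not say how rows would be ranked for top N, ranking by smallest Cox p-value is assumed. The exact log-rank criterion from step 12 is left out.

```python
def filter_rows(stats, max_sum_p, max_cox_p, min_count):
    """Threshold filter in the spirit of step 12.

    stats maps a row id to a dict with hypothetical keys
    'sum_p', 'cox_p' and 'count'. Returns the ids passing every cutoff.
    """
    return [rid for rid, s in stats.items()
            if s["sum_p"] < max_sum_p
            and s["cox_p"] < max_cox_p
            and s["count"] > min_count]

def top_n_rows(stats, n, key="cox_p"):
    """Top N alternative: the n row ids with the smallest value of `key`.

    Assumption: smaller is better (a p-value); ties keep input order.
    """
    return sorted(stats, key=lambda rid: stats[rid][key])[:n]
```

Either function returns a pick list of row ids, which is the form of candidate set that step 20 applies to the testing patients.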

Needed to run survival:
1. Data set to build scores
2. Data set to modify with scores.
3. timediff variable (per patient)
4. Censor variable (per patient): indicates whether or not the patient died at the event date (0 means they left the study alive).
5. path variable (per patient)
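
Given the per-patient timediff and censor values listed above, the two-group log-rank chi-square that the lifetest steps report for one binarized row could be computed roughly as follows. This is a plain-Python sketch of the standard log-rank test, not a reproduction of the lifetest procedure's output; the function names `binarize` and `logrank` are hypothetical, and the Wilcoxon, likelihood ratio, and exact tests are not covered.

```python
import math

def binarize(row):
    """Step 3: 0 stays 0, every other value becomes 1."""
    return [0 if v == 0 else 1 for v in row]

def logrank(timediff, flag, group):
    """Two-group log-rank test.

    timediff : follow-up time per patient (days)
    flag     : 1 = dead, 0 = alive (censored)
    group    : 0/1 stratum per patient, e.g. one binarized row
    Returns (chi-square with 1 df, p-value).
    """
    event_times = sorted({t for t, f in zip(timediff, flag) if f == 1})
    obs1 = exp1 = var = 0.0
    for t in event_times:
        # Patients still at risk just before time t.
        n = sum(1 for tt in timediff if tt >= t)
        n1 = sum(1 for tt, g in zip(timediff, group) if tt >= t and g == 1)
        # Deaths occurring exactly at time t.
        d = sum(1 for tt, f in zip(timediff, flag) if tt == t and f == 1)
        d1 = sum(1 for tt, f, g in zip(timediff, flag, group)
                 if tt == t and f == 1 and g == 1)
        obs1 += d1
        exp1 += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 death count.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0  # only one group at risk: nothing to compare
    chisq = (obs1 - exp1) ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chisq, math.erfc(math.sqrt(chisq / 2))
```

Calling `sum(binarize(row))` on the same row gives the count of columns equal to 1 from step 7, so one pass over a row could collect both the count and the test statistic.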

Topic revision: r8 - 19 Oct 2004, JeremyRoberts
