A few assignment details: Assume that patients remain on antiretroviral therapy (ART) once they start it and that the hazard of death at time t depends on a subject's ART history only through its current value. Use CD4 count as the time-varying covariate ("cd4_v" in the "lab_cd4_sim" table). HAART (highly active ART) use is designated in the "art_class" variable of the "art_sim" table. For simplicity, exclude anyone who started a non-HAART regimen at any time. (This could of course cause some selection bias, but assume it is negligible.) Date of ART initiation is given in the "art_sd" variable of the "art_sim" table. You are only interested in the first "art_sd" date per patient, the rest of the dates (when they changed regimens) can be ignored. Do not worry about the type of regimen patients started (only that it's HAART). Therefore, the only information needed in the "art_sim" table is the patient identifier to merge datasets ("patient"), the indicator of HAART use ("art_class"), and the date of ART initiation ("art_sd"). The date of first visit is recorded in the "basic_sim" table as "baseline_d". Other demographic variables are included in this table that should be used as potential confounders including study site ("site"), sex ("male"), age at baseline ("age"), prior AIDS diagnosis at baseline ("aids_y" -- assume 9 [unknown] equals 0 [no prior AIDS]). Exclude all patients with "recart_y"=1 (prior ART use). Mode of transmission ("mode") can be considered for inclusion as a covariate, but there are a huge number where this is "Unknown" including for all patients in Haiti, so I'm OK if you do not adjust for this variable. (Of course, not adjusting for potential confounders makes the sequential ignorability assumption less plausible.) Death is recorded in the "follow_sim" table in the "death_y" variable. The date of death is given as "l_alive_d" in the "follow_sim" table for those who died and for those living, "l_alive_d" gives the date of the last visit. In your analysis, you will need to discretize time into monthly intervals from the "baseline_d" to the "l_alive_d". You will need to determine whether or not a patient started ART during each month and what their CD4 count was during that month. As CD4 is not measured monthly, you will likely need to carry the last observation forward. Include only patients whose first visit was on or after January 1, 2000. Incorporate weights for loss to follow-up / censoring.