• Log of relevant Kristen Kotter POEM and other related VA work, with completion dates
Actionable Items and date assigned
Completed Items (and date completed)
  • generate a test set of the PSI for the Boston group, using the exact same algorithm as the final decisions for the development set, and email Shibei the location (7/10)
  • Create folder for power calculating on res4 POEM Server. \\Vhatvhres4\data\POEM_Project\Test_Data\Creating PSI File for Boston group\Checking Power (7/10)
  • Case Level Power Calculations:
    • Pilot taking a sub-sample of NSQIP cases, capturing all cases with complications and randomly assigning exactly 10% of cases without complications to the cohort (7/10). Note: I used the RANUNI function in SAS with a seed of 0 so that it uses the internal time clock to initiate the sequence (see the sketch after this list)
    • Simulate a fake response at case level for the POEM classifier, PoemComplication = 0 or 1. Simulate with sensitivity = 70% and specificity = 90% (7/15) Note: RANUNI(0)
    • code a macro to run 1000 of the previous simulations and build a SAS dataset with each run's sensitivity, specificity, and PPV (7/20) Note: I did this one for fun. Not sure yet if they'll want to use this...
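    • Note: a minimal SAS sketch of the sampling and simulation steps above. Dataset and variable names (work.nsqip_cases, HasComp) are illustrative, not the actual project names:

        /* exact 10% subsample of the non-complication cases; seed 0 lets
           RANUNI initialize the stream from the internal clock */
        data nocomp;
            set work.nsqip_cases(where=(HasComp = 0));
            u = ranuni(0);
        run;
        proc sort data=nocomp; by u; run;
        data nocomp10;
            set nocomp nobs=ntot;
            if _n_ <= ceil(0.10 * ntot);   /* first 10% in random order */
        run;

        /* pilot cohort = all complication cases + the 10% subsample,
           with a fake POEM response at sensitivity 70% / specificity 90% */
        data sim_cohort(drop=u);
            set work.nsqip_cases(where=(HasComp = 1)) nocomp10;
            if HasComp = 1 then PoemComplication = (ranuni(0) <= 0.70);
            else PoemComplication = (ranuni(0) > 0.90);
        run;

        /* the 1000-run macro simply wraps these steps in a %do loop and
           appends each run's sensitivity, specificity, and PPV */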
  • Patient Level Power Calculations:
    • Email Fern, Ted, Rob to verify that the list I have of pertinent complications for our study is correct. They spoke of adding/dropping some over the past few weeks on the POEM call, so Rob says let's make sure (7/23)
      • Note: Fern emailed most up to date list of relevant complications on 7/24:
        • cdarrest, cdmi, oprenafl, othdvt, othsysep, oupneumo, pulembol, supinfec, urninfec, wndinfd, orgspcssi
    • Transpose the case-level NSQIP dataset to patient level with PROC TRANSPOSE (7/24)
    • Create Variable PatientHasAnyComp = 1 if the patient has any complications in any of their cases (7/24)
    • Sample exactly 100% of patients where PatientHasAnyComp = 1. All of these cases will have sampling weight = 1 (Sw = 1). Note: this will include some cases that have no complications. (7/24)
    • Randomly sample exactly 10% of the patients, with all their corresponding cases, where PatientHasAnyComp = 0. Sampling weight = 10. (7/24)
    • transpose patient level back to case level and get sensitivity, specificity, PPV incorporating sampling weights for one simulation (7/27); see the sketch after this list
    • code a macro to run 1002 of the previous simulations and build a SAS dataset with each run's sensitivity, specificity, and PPV (7/28)
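    • Note: a minimal sketch of the weighted diagnostics step, assuming a case-level dataset case_sim (the transpose back from patient level) with gold standard HasComp, simulated PoemComplication, and sampling weight sw (1 for sampled complication patients, 10 for the 10% subsample):

        proc sql;
            create table diag as
            select sum(sw * (HasComp=1 and PoemComplication=1)) as tp,
                   sum(sw * (HasComp=1 and PoemComplication=0)) as fn,
                   sum(sw * (HasComp=0 and PoemComplication=1)) as fp,
                   sum(sw * (HasComp=0 and PoemComplication=0)) as tn,
                   calculated tp / (calculated tp + calculated fn) as sensitivity,
                   calculated tn / (calculated tn + calculated fp) as specificity,
                   calculated tp / (calculated tp + calculated fp) as ppv
            from case_sim;
        quit;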
  • Simple Descriptives of Characteristics to decide if we will incorporate some Stratified Sampling or Oversampling into our current simulations: (7/30)

  • Update the master NSQIP dataset to include an identifier for the following 3 waves to be parsed and indexed, and email locations and details to Fern and the group (7/31)
    • CaseHasAnyComp (to be included in Wave 1 parsing) = 1 if case has any of our 11 surgical complications of interest
    • CaseNoCompPatientComp (Wave 2) = 1 if PatientHasAnyComp = 1 and CaseHasAnyComp = 0; i.e., a complication did not occur during the operation indicated by this case, but a complication did occur for this patient at another operation (case)....
    • PatientNoCompSubSample01 (Wave 3) = 1 if this "no complication" case was sampled to be included in parsing and indexing using our best "matching" algorithm as described in aforementioned tasks
  • Generate stats on overlap of cases to get an idea of best way to parse free text jobs without redundancy (8/7):
    • Create CasesOverlap = 1 if days from this case to the previous case or days from this case to the next case are <= 61 (see the sketch after these findings)
      • Calculate Num patients and Num cases where CasesOverlap = 1
      • Calculate Num patients and Num cases where CasesOverlap = 1 and CaseHasAnyComp = 1
    • Calculate Unique "clusters" of overlap
      • Calculate Num unique case clusters
      • Calculate Num unique case clusters with a case where CaseHasAnyComp = 1 and one where CaseHasAnyComp = 0
    • Findings (where case to case difference is capped at 61 days):
      • Of the 45,159 unique cases (operations) in the NSQIP dataset:
        • 6476 unique cases overlap with either the previous case or the next case (where <= 61 days either way defines an overlap)
        • 2730 unique patients have overlapping NSQIP cases (operations)
      • Of those 6476 cases that overlap:
        • 1156 also have at least one of the 11 complications of interest = 1
        • the remaining 5320 have none of the 11 complications of interest = 1.
    • Findings (where case to case difference is capped at 30 days; mistakenly ran this one first so just kept for comparison to 61 day cap or if curious)
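    • Note: a minimal sketch of the 61-day overlap flag, assuming one row per case and a date variable opdate (the real data carry opdt as a datetime, so differences there are in seconds, not days):

        proc sort data=nsqip out=fwd; by patientid opdate; run;
        data fwd;
            set fwd; by patientid;
            gap_prev = dif(opdate);              /* days since previous case */
            if first.patientid then gap_prev = .;
        run;
        proc sort data=fwd out=bwd; by patientid descending opdate; run;
        data overlap;
            set bwd; by patientid;
            gap_next = abs(dif(opdate));         /* days until next case */
            if first.patientid then gap_next = .;
            CasesOverlap = (. < gap_prev <= 61) or (. < gap_next <= 61);
        run;
        proc sql;   /* the counts reported in the findings above */
            select count(*) as n_cases, count(distinct patientid) as n_patients
            from overlap where CasesOverlap = 1;
        quit;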
    • Review Ted's sample STATA code for logic on NSQIP operation overlap problem and generate applicable descriptives (8/13-8/17)
    • Complete steps on the NSQIP dataset for dealing with the overlap problem before sampling (these steps were created by Ted on 8/23 and sent to Kristen as a dataset):
      • Step 1 Related to finding: 86 cases in the NSQIP dataset are duplicates on opdt (date and time) for a unique patient. Reconcile these into 43 cases. Pool all complications: if a complication occurs in either of the duplicate cases, give it a 1. For variable values that differ for othcpt1, prncpt1, and speccode, create var1 and var2, with var1 equal to the 1st case's value for that variable and var2 equal to the 2nd case's value, so that we have no lost information. For all other variables, go with the 1st case's value and delete the 2nd case's value. (8/20)
      • Step 2 Pool surgical cases that occur within 30 days post op into a single episode/case.
      • Step 3 Flag those cases with 30 < lagpre < 46. The intention is to not pull the documents related to these cases under the current 30-day pre and post algorithm, because that would result in "double pulling" of documents
    • Import Ted's .dta dataset with corrected overlap into SAS and run some counts to verify he had the same complications for Step 1 that I found independently. Note: Everything checked out and was placed on the res4 Server with the other POEM work. Now proceed with subsampling (8/24)
    • Discuss subsampling plan and overlap algorithm with Rob (8/25)
    • Using Ted's subset of NSQIP cases (which was created from the original NSQIP to avoid double pulling of documents) follow next steps: (8/27)
      • Step 1. Look at the wave 1 and wave 3 Subsample algorithm to create the subsample groups for case and control cases.
        • Create CaseHasAnyComp = 1 (Wave1) if any c_complication variables = 1
        • Create PatientNoCompSubSample01 = 1 (Wave3a) by randomly assigning exactly 10% of non-complication development and 10% of non-complication test.
        • Create PatientNoCompSubSample02 = 1 (Wave3b) by randomly assigning the next 10% of non-complication development and the next 10% of non-complication test, and so on.
      • Step 2. Create summary variables that identify the Case-Development, Control-Development, Case-Test, and Control-Test groups.
      • Step 3. Merge my POEM_Sample data and the Case-Control group variables into the NSQIP dataset for Fern to use. Caution: a unique record identifier should be used when merging datasets, i.e., each line has a unique ID. We know that patientID is not unique, Oprymd is not unique, and the combination patientID-Oprymd is not unique. Note: use the patientid-opdt combo, but check to make sure it is unique first (see the uniqueness check sketched below)
      • Note: Dataset Location: file://vhatvhres4/data/POEM_Project/Test_Data/Creating%20PSI%20File%20for%20Boston%20group/Create_Subsample_Groups
        • Called "teds_nsqip_and_subsample_waves.sas7bdat"
      • meet with Fern to verify everything in teds_nsqip_and_subsample_wavescollapsed, import it into SQL Server, and make sure she can begin pulling documents. There were some problems with dates to possibly look at and reformat for her. (8/29)
      • Set control waves Wave1_a, Wave2_a, Wave3_a equal to the number of cases for development and test, because the aforementioned method of selecting a random group of 10% of the non-complications produced fewer controls than cases (8/31)
      • Use the excel table from Fern's pull to manually check 3 patients and their pulled documents and make sure they match up to NSQIP table (8/31)
  • table 1 baseline statistics (9/9/2009)
    • Patient-level include vars: race, age (per Ted: take the 1st occurring age at case level for a patient), sex, hispanic, comorbidities (AHRQ)
      • Of the 33,565 patients in Ted's NSQIP level dataset where POEM_Sample = 1,
        • 163/33565 patients have differing values for race. (solution?: mode, as done for PSI)
        • 4/33565 patients have differing values for sex. (solution?: because there are only 2 in the ones with differing values, 1st occurring)
        • 3/33565 patients have differing values for hispanic. (hispanic created from race variable: if race = 1 (Hispanic, white) or race = 2 (Hispanic, black) then hispanic = 1; else hispanic = 0. solution?: if hispanic ever = 1 then patientlevel_hispanic = 1; else patientlevel_hispanic = 0) (see the reconciliation sketch below)
      • latex_patientlevel_1.pdf (need to add in the AHRQ comorbidities at the patient level)
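      • Note: a minimal sketch of the proposed patient-level reconciliation, assuming poem_sample is sorted by patientid and case date; names are illustrative:

          /* mode of race within patient (as done for PSI) */
          proc freq data=poem_sample noprint;
              by patientid;
              tables race / out=race_counts(keep=patientid race count);
          run;
          proc sort data=race_counts; by patientid descending count; run;
          data pt_race;
              set race_counts; by patientid;
              if first.patientid;        /* keep the most frequent category */
          run;

          /* first-occurring sex and "ever hispanic" within patient */
          data pt_sexhisp(keep=patientid pt_sex pt_hispanic);
              set poem_sample; by patientid;
              retain pt_sex pt_hispanic;
              if first.patientid then do; pt_sex = sex; pt_hispanic = 0; end;
              if hispanic = 1 then pt_hispanic = 1;
              if last.patientid then output;
          run;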
    • Case-level include vars: 11 C_Complications, Surgical Procedures, Medical Center (STN1 in NSQIP dataset)
  • read POEM grant (9/14)
  • discuss analysis plan with Rob (9/16)
  • Add both test and development PSI-level indicators to PoemSample dataset (9/24)
  • get case level counts of PRNCPTX on PoemSample (the previously generated CPT counts incorporated OTHCPT1-OTHCPT10) (9/24)
  • Table 1 case-level fix: merge site 622 into site 626 (Murfreesboro merged into Nashville in 2005?)
  • run analysis code for PSI test set using best definitions for accepted and experimental PSIs previously decided as the maximum sensitivities for respective groups of the development set (10/7)
  • per ted, talk to Fern and Michael about whether inpatient CPT procedure codes are equivalent to ICD9 procedure codes, so that maybe we could just input them into the CCS procedure code classifier. Per Fern and Michael: no, they are not the same, but Michael knows of a mapping system that might be able to track ICD9 codes from CPT codes, at which point we could use the CCS ICD9 procedure code classifier to get generalizations from the CPT codes we have. (10/8)
  • meet with coder expert, Lee Carr, to figure out best way to produce the baseline case level procedure code classifications when all we have are CPT procedure codes (10/15)
  • find the single-level CCS classifier link for CPT procedure codes on the AHRQ website and email it to the group. Write SAS code that uses this crosswalk to get case-level CCS classifications for the Poem_Sample dataset using PRNCPTX (Principal Operating CPT Procedure), and send the results to the group (see the sketch below). (10/15)
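    • Note: a minimal sketch of the crosswalk join; ccs_cpt is a stand-in for the AHRQ file after it has been read in. The published crosswalk maps CPT code ranges, assumed here to be normalized into cpt_lo/cpt_hi character columns:

        proc sql;
            create table poem_ccs as
            select a.*, b.ccs_category
            from poem_sample as a
            left join ccs_cpt as b
              on a.prncptx between b.cpt_lo and b.cpt_hi;
        quit;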
  • update publications workbook for POEM related projects (11/14)
  • generate report of case-level CCS classifications for PRNCPTX on the POEM_Sample dataset and get Michael and Harvey's feedback on generalizing further, per Ted (11/19)
  • generate report of PSI test and development set results and give to Ted for feedback (11/19)
  • Import Fern's extraction table for Cardiac Arrest, transpose it to PoemSample level and then work with Ted on creating some rules in SAS for flagging Cardiac Arrest on the Beta set. Basically we will settle on the combination that maximizes sensitivity of: (11/24)
    • TIU Notes: #85 + #86
    • Discharge Summaries: #88 + #89
    • SNOMED: #85 + #88
    • Regular Expressions: #86 + #89 (also, decomposing the various clusters)
    • ALL: #85 + #86 + #88 + #89
  • email Fern 2 pieces of information (11/24)
    • 1) The CCS categories from CPT Procedure codes and also the more general mapping of those results created by Harvey and Michael
    • 2) indicator for all the Beta controls and their CCS mapping to see characteristics for possible matching
  • generate printouts of relevant characteristics of the Beta dataset created at \\Vhatvhres4\data\POEM_Project\Test_Data\Creating PSI File for Boston group\Data_for_Fern_Matching for discussion on possible matching (11/24)

  • create a dataset called "Beta_CCS_Michael_Harvey_DA_category.csv", located at \\Vhatvhres4\data\POEM_Project\Test_Data\Creating PSI File for Boston group\Data_for_Fern_Matching, that contains all Beta cases with the categories from Harvey, Dominic, and Michael, as well as a binary variable called discrepancy that fires when any parties disagreed, and email Michael for a consensus (11/25)
  • get counts on how many times different rules fired from the evidence table (11/30)
  • Program matched cases for Pulembol_DVT on: (12/2)
  • Get Descriptives of tables indicated by Ted in the PTSD/TBI Project Data Model and email to group. First verify with Elliot that the data model has been updated and there is a templated variable as well as a file size variable for exclusion purposes (12/4)
  • Create sampling groups of PTSD documents per Ted's specifications: (12/9)
    • Gender. Oversample females to equal 10% of Development Set and 10% of Test Set.
    • Era: sample WWII at 5% (Development Set N = 2, Test Set N = 7), Vietnam at 40%, Persian Gulf War at 55%
    • Marital Status: Married at 50%, Divorced or Widowed at 20%, Single (S) at 10%, Other at 20%
  • run query for MI on database and then generate analysis of results (12/11)
  • run query for updated cdarrest on database and then generate analysis of results (12/15)
  • generate results of CDArrest for nsqip gold standard, reviewed gold standard, and nsqip or reviewed gold standard (12/16)
  • email Harvey deidentified table of Beta Categories for final review by him (12/16)
  • generate results of number of times MI rules fired (12/16)
  • write SAS code for generating results of the MI "Brown" Standard, which occurs when any of the following ICD9 codes are found in variable CDICD: (411.89, 410.91, 410.9, 410.81, 410.8, 410.72, 410.71, 410.7, 410.21, 410.2, 410.11, 410.1, 401.9) (12/17); see the sketch below
    • Findings: No Hits
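    • Note: a minimal sketch of the flagging code, with poem_sample as a stand-in input and CDICD assumed to be a character ICD9 field:

        data mi_brown;
            set poem_sample;
            mi_brown = (strip(cdicd) in
                ('411.89','410.91','410.9','410.81','410.8','410.72','410.71',
                 '410.7','410.21','410.2','410.11','410.1','401.9'));
        run;
        proc freq data=mi_brown; tables mi_brown; run;   /* confirmed no hits */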
  • rerun analysis of MI with Q-Wave separated. Email results to group (12/17)
  • run Fern's query for UTI evidence table (12/18)
  • generate results of the following query combinations for UTI evidence table: (12/18)

  • generate results of number of times different UTI rules fired by NSQIP complication for UTI (12/18)
  • manually check some TBI docs in SQL Server for consistency. Discovered that some short exams accidentally leaked through Elliot and Zhao's import of 3241 TBI documents. Per Ted, wrote SQL query that excluded those TBI exams with fewer than 19 sentences, bringing us to 1,929 TBI documents to sample from (12/29)
  • write the TBI query from which we will sample and import the results into SAS to run sampling code on (12/29)
  • generate descriptives for making informed decisions about sampling proportions (emailed to group for input on 12/29)
  • write sas sampling code for TBI subsample and then import results in SQL Server (1/6)
  • document distributions of beta, development, and test for our target variables and get N's of strata (1/6)
  • meet with George about doctypeid for TBI, the location of TBI, etc. George also caught me up on his organization of the databases in SQL Server (1/8)
  • Zhao has to restuff PTSD and TBI documents because there was an error in his tagging... so need to verify that he will not be changing documentid or anything else on his restuff before sampling. Verified by Elliot on 1/10
  • rerun random sampling code and reimport onto the res6 SQL Server for PTSD s.t. they're stratified by (1/12; see the sketch after this list):
    • Gender: oversample females at 10%
    • Era: Active Duty Navy or Army at 5%
      • Vietnam at 40%
      • Persian Gulf War at 55%
    • Marital: Married at 50%
      • D or W at 20%
      • Single at 10%
      • Other at 20%
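    • Note: a minimal sketch of one way to draw such a stratified sample, using PROC SURVEYSELECT with a per-stratum allocation dataset. ptsd_docs, the target size, the seed, and the single gender stratifier are illustrative (the real draw crossed gender, era, and marital status):

        %let n_total = 200;                      /* hypothetical target size */
        data alloc;
            length gender $1;
            input gender $ share;
            _nsize_ = round(&n_total * share);   /* per-stratum n */
            drop share;
            datalines;
        F 0.10
        M 0.90
        ;

        proc sort data=ptsd_docs; by gender; run;
        proc surveyselect data=ptsd_docs out=ptsd_sample
                          method=srs sampsize=alloc seed=20100112;
            strata gender;
        run;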
  • Generate Sampling weights for stratified PTSD and TBI sample in case we want to extrapolate our results to the entire sample (1/12)
    • Because Ted didn't want to use sampling weights, I also generated code that would set each stratum's sample size proportional to its joint probability of occurring in the original sample. This is another way of making extrapolation acceptable? But I need to double check my logic on this construction just to make sure
  • write queries for Beta, Development, and Test for Enlai to pull TBI subsample (1/14)
  • re-write query for Enlai to pull new PTSD subsample (1/14)
  • meet with Ted and Ruth to go over Annotation Study and talk about how to analyze the data for this study (1/15)
  • meet with Fern to talk about next steps for POEM rule building meetings and prepare handouts for that meeting. She also walked me through some of her queries and data checking of George's load of PTSD and TBI in the data model in case I need to do more checking in the future (1/19)
  • meet with Fern, Rob, Ted, and Kim for POEM progress reports and next steps. Also received some training from Fern on the MCVS data model so I can start querying the evidence tables (1/21)
  • sample 20 ptsd docs for the annotation study. Use all of beta and a stratified random group of 13 from development. Email sample and distribution to group (1/22)
  • generate descriptives of study samples for annotation, ptsd, and tbi and email to group (1/25)
  • generate results for combo of rules 1& 2 for UTI (1/26)
  • generate 5 random extra ptsd documents not in beta, development, or test for testing in the annotation study (1/27)
  • query evidence table for DVT (1/28)
  • run analysis for evidence table pull of DVT rule combinations (1/28)
  • query evidence table for pulmonary embolism (2/1)
  • run analysis for evidence pull for pulmonary embolism combinations (2/1)
  • get number of times rule fired for Beta--pulmonary embolism and dvt (2/1)
  • meet with Fern to cross-train for querying evidence table (2/3)
  • run query in POEM Rules for queryid 8 (2/5)
  • rerun analysis for MI with QWave excluded (2/5)
  • get report of counts on dvt hits within hospitalization, out of hospitalization, etc (2/8)
  • generate flags and counts for MI bronze standard (2/8)
  • generate table for all waves parsed as well as beta controls and flag for MI bronze standard and give to michael so that he can set up reviewer in Access (2/8)
  • Meet with Ruth and Ferdo to look at outputted data from Knowtator and find out more about Annotation study (2/9)
  • annotate my own fake document and export resulting xml marked up data to see how it could be analyzed (2/9)
  • fix date and import waves parsed and mi bronze standard into sql server with properly formatted variables so Michael could get the dataset into Access for their reviewing (2/9)
  • Compile information about the SAS XML Mapper tool for Ruth (2/9)
  • Query evidence table for renal failure (2/9)
  • generate "analysis" for renal failure (sensitivities, specificities, ci's)(2/9)
  • generate contingency tables for number of times each rule fired by gold standard complication for renal failure (2/9)
  • import xml marked up data from our sample annotation in knowtator into sas with sas xml mapper tool and export as structured csv dataset (2/12)
  • generate dataset for final adjudication on ccs categories for poem sample (previously this had only been completed for beta) (2/17)
  • send Harvey simplified table for his adjudication (only relevant variables and no duplicates) (2/22)
  • send Michael sample SAS code for fixing format issue for SQL Server import (2/22)
  • meet Fern and Vinnie to assign tasks for validation of some documents of Enlai's indexing runs (2/23)
  • generate results for additional rules for MI: (2/23)
    • 17 or 94
    • 17 and 94 and 105
    • Include stats for queries with no hits
    • Results for Qwave and NonQwave
    • MI_queries_rules_results_beta.pdf
  • generate results for Pneumonia queries (2/23)
  • generate tables for number of times queries/rules fired--Pneumonia (2/23)
  • incorporate harvey's adjudications into poemsample so that we have a final_specialty and final_physiological_category for all of poemsample (2/24)
  • randomly select 10 dvt cases from poem development set that meet following criteria for Fern's poster: (2/24)
    • inpatient (inout = 1)
    • date of dvt occurs within hospitalization admit and discharge date
    • positive for dvt
  • program a random set of 3 waves of control matches for the 10 dvt cases s.t.: (2/24)
    • inout = 1
    • age +- 10 years
    • final_physiological_category equivalent
    • operation date +- 365 days
  • randomly select 10 dvt cases from poem development set s.t. (2/26):
    • inpatient (inout = 1)
    • date of dvt occurs outside of hospitalization admit and discharge date
    • positive for dvt
  • program a random set of 2 waves of control matches for these 10 dvt cases s.t. (2/26):
    • inout = 1
    • age +- 10 years
    • final_physiological_category equivalent
    • operation date +- 365 days
  • randomly select 7 dvt cases from poem development set s.t. (3/1):
    • outpatient (inout = 0)
    • positive for dvt
  • program a random set of 2 waves of control matches for these 7 dvt cases s.t. (3/1):
    • inout = 0
    • age +- 10 years
    • final_physiological_category equivalent
    • operation date +- 365 days
  • write code to structure annotated free text data and then generate results for Fern's NLP poster (3/7)
  • review Fern's NLP poster for feedback (3/7)
  • pull results from SQL Server database and generate some basic stats (kappa, sensitivities, specificity, etc.) to test agreement of the 2 reviewers for Ruth's note title study poster (3/15)
  • program the anion gap rule into results (3/22) (see the sketch after these definitions)
    • Anion gap acidosis: this is defined by either:
      • [Na + K] - [Cl + HCO3 (or serum CO2)] > 16
      • Na - [Cl + HCO3 (or serum CO2)] > 12
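    • Note: a minimal sketch of the rule, assuming a lab dataset labs with numeric na, k, cl, and hco3 (or serum CO2 in its place):

        data anion;
            set labs;
            gap_k  = (na + k) - (cl + hco3);   /* rule 1: > 16 */
            gap_nk =  na      - (cl + hco3);   /* rule 2: > 12 */
            anion_gap_acidosis = (gap_k > 16) or (gap_nk > 12);
        run;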
  • pull all queryid results for sepsis from sql server (3/23)
  • create table of sensitivities, specificities, and confidence intervals on results for sepsis (3/23) (see the sketch below)
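    • Note: a minimal sketch of how such a table can be built. Sensitivity and specificity are binomial proportions, so PROC FREQ's exact binomial CIs apply; gold and poem are assumed 0/1 flags at the operation level:

        proc freq data=results;        /* sensitivity = P(poem=1 | gold=1) */
            where gold = 1;
            tables poem / binomial(level='1');
            exact binomial;            /* Clopper-Pearson interval */
        run;
        proc freq data=results;        /* specificity = P(poem=0 | gold=0) */
            where gold = 0;
            tables poem / binomial(level='0');
            exact binomial;
        run;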
  • query results for queryid 108 and incorporate it into MI evidence table. Validate data: Found error in evidence table and sent to Fern for repair. Also change MI table formats (3/23)
  • add Harvey's following proposed rule/query combinations to results tables:(3/24)
  • per Ted, add stats for exclusion queries for Renal Failure to current Renal Failure table (3/24)
  • obtain indexing run tables from poem_nextgeneration and conduct freq distribution on app_ids for each completion status on all 4 servers (3/24)
  • take unstructured xml marked up data from annotated documents sampled by Ted and Ruth and then write code to make the free text markups structured: (3/25)
    • I've done this using the SAS XML Mapper tool. Further, Tampa sent Python code to do this as well, and Michael has a script. Might want to see if Cole can look into the Python code and help if available
  • generate table of some POEM related work (general descriptions with one example of how I filled the role) from last annual performance review until this annual performance review (1 year period), as well as goals, for Kim and Ted to look over (3/29)
  • generate basic stats on overlap between NSQIP and PSI and bring to Poem rule building group (3/31)
  • calculate prevalence rates of each type of wound infection: orgspcssi, wndinfd, supinfec (3/31)
  • obtain new indexing run tables from poem_next generation and additional app_ids for tbi and ptsd for each completion status on all 4 servers (4/1)
  • generate results for PSI Development and Test Sets on POEMSample (previous results were for entire NSQIP dataset) for complications occurring at any time relative to hospitalization discharge (4/6)
  • generate results for PSI Development and Test Sets on POEMSample for complications occurring before hospitalization discharge (4/6)
  • generate tables of overlap between POEMSample and PSI (4/6)
  • query results for wound infections evidence table and then generate results on Beta (4/6)
  • generate documentation for matching from nsqip to psi and give to ted and group (4/8)
  • generate documentation for how poemsample was created and give to ted and group (4/8)
  • organize and compile information from emails between amy rosen and shibei to ted about their exclusion process (4/8)
  • generate table of number and types of complications occurring for those PSI hospitalizations that hit on multiple nsqip operations (4/9)
  • meet with cole and look at python script for creating csv dataset from xml-marked up data, make a few changes to the code and generate csv data from the data from fern's dvt study (4/12)
  • generate raw dataset of hospitalizations with multiple nsqip operation hits and relevant variables (4/12)
  • generate PSI results on development (minus beta) and test sets of poemsample for only those expected to be indexed when complication occurs before hospitalization or complication occurs at any time (4/14)
  • From 4/14 to 5/25 I didn't update. But the following are some of the relevant updates for that time period.
  • review the AHRQ published PSI-generating SAS code, determine when/where they're excluding, and facilitate feedback from the group
  • edit various exclusion components of the PSI-generating SAS script to get a better idea of how exclusions are impacting our results
    • Findings: The entire PSI algorithm is virtually dependent on formatting standards in ICD9 codes. However, we found much minor formatting variability in the input ICD9 codes that is not accounted for in the PSI-generating SAS code, which is causing some loss in the ability to correctly flag complications. Not only does the code fail to accurately flag the proper ICD9 codes because of formatting, those records are also excluded from the denominator in the assessment of how well the PSI is performing. Upon discussion with Shibei and other members of the group, and after extensively reviewing the SAS code line by line and its impact on our data, we discovered that there are no standards accounting for all the potential variability in formatting of the ICD codes that the program accepts. There are some published standards, such as that the string code must be left-justified and padded with spaces to the right, which was originally accounted for and addressed in our input dataset. But there are many instances of random padding with 0's or spaces and other permutations of the code input that are simply not accounted for consistently in their SAS code, causing additional exclusions.
    • Current solution: Since we are comparing POEM results to the established PSI algorithm, we will have to publish the PSI as it was written, knowing the issue, but they thought it could be a good paper. We still aim to compare this to POEM results for those hospitalizations that were not excluded by the PSI algorithm (many excluded due to incorrectly identified formatting variability and others because they were legitimately on the published list of exclusions): the "one to one" matches for Harvey's paper... In addition, some information will be provided about how POEM performed on those hospitalizations that were excluded from the PSI analysis
  • revisions on informatics paper "Are Posttraumatic Stress Disorder Mental Health Terms Found in SNOMED-CT Medical Terminology"
  • update Table 1's (patient level and case level) by Development and Test for POEM including only the indexed samples and also add some other relevant variables brought up by Harvey
  • add combinations to wound table plus Fern's updated queryid 25. Note: there were several query updates and revisions to wound table.
  • Prepare a more detailed project flow plan of next steps for the group in words and project flow chart (Visio diagram) now that Development set is 95% indexed and almost ready to go (5/24)
  • Generate Table 1 for Harvey's POEM-PSI comparison paper (5/24)
  • Make template changes to MI and Pneumonia of Harvey's flow tables based on what is in the code and produce counts for Renal Failure, MI, Pneumonia, PEDVT, Sepsis on our subsample for POEM-PSI paper (6/9)
  • Table 1--Subsample description for POEM-PSI paper (6/9)
  • Generate Appropriate Descriptives for Ruth's poster (6/12)
  • Create tables for Michael (6/14)
  • Produce Updated Table 1's for POEM paper (6/16)
  • finish the 4 page flow diagram of the impact of the different PSI rule components and produce baseline statistics at various levels of the diagram (6/23)
  • trainings (6/28)
  • merge in PTF race information to PSI dataset and generate a contingency table for comparing Race variable from NSQIP data to the Race variable from the PTF files (6/28)
  • update Table 1's with additional variables including new race variables, age at hospitalization admit date, etc (6/29)
  • tweak PSI-generating SAS code to account for exclusions occurring in unintended order and update findings in the PE/DVT portion of the contributions table (6/29)
  • update pe/dvt flow table with findings (6/29)
  • run the rules Fern has completed from structured data for uti (6/30)
  • generate a consensus race variable from multiple sources based on the following criteria (7/7; see the sketch after this list):
    • if either PTF race or NSQIP race is unknown then consensus_race = known race category
    • else consensus_race = the rarer category between differing race categories of the 2 sources
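    • Note: a minimal sketch of the consensus rule, assuming character variables ptf_race and nsqip_race on a patient-level table pt_races and a lookup race_freq holding each category's overall count n (used to pick the rarer category):

        proc sql;
            create table consensus as
            select p.*,
                   case
                     when p.ptf_race   = 'Unknown' then p.nsqip_race
                     when p.nsqip_race = 'Unknown' then p.ptf_race
                     when f1.n <= f2.n             then p.ptf_race   /* rarer wins */
                     else                               p.nsqip_race
                   end as consensus_race
            from pt_races as p
            left join race_freq as f1 on p.ptf_race   = f1.race
            left join race_freq as f2 on p.nsqip_race = f2.race;
        quit;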
  • recode the Atype variable and rerun the PSI algorithm (previously, if any of the NSQIP operations within a PSI hospitalization had NSQIP variable emergncy = 1 (indicated for an emergent case), then Atype = 3 (emergent); now give Atype its value according to emergncy for the 1st consecutive NSQIP operation only. The group thought this recode would be more accurate and appropriate, and would slightly lower the number of exclusions due to this category) (7/7)
  • Produce Contingency Tables and diagnostic tests results for PSI outcomes according to updated Atype (7/7)
  • Update Ted's flow table for Renal Failure, Pe/DVT, Sepsis according to updated Atype (7/7)
  • run results of Rule 106 for pneumonia (7/7)
  • produce raw frequencies at the sub-concept level for Annotations completed so far for Ruth's Annotation Paper. Show Ruth several instances where subconcepts were not marked at all but rather the "super-concepts" were marked. My advice was for her to show these few instances to the annotators and have them indicate the appropriate subconcept for the phrases they flagged so all the data at this subconcept level can be most appropriately compared. Ruth fielded to Ted for his preference (7/9)
  • generate CCS comorbidities for POEMSample. Also read literature on how these have been applied (7/14)
  • generate more results for Ruth's Annotation paper so that the group understands the issues (7/14)
  • generate DVT development minus beta set rules 81,84 results for POEM paper (remind group that Development -beta results for POEM-PSI comparison paper will be different sample at hospitalization level) (7/19)
  • send comparison of results for DVT from original indexed beta to those beta indexed in development set (7/19)
  • add 7 comorbidity variables produced by the CCS comorbidity algorithm by Elixhauser et al to Harvey's Table 1 per Ted's email (7/21)
  • per ted's request of sending out more informal intermediary tables as I produce them, I sent out all the contingency tables for the 15 POEM rules queryid's Fern sent me on 7/20....at operation/case level for development minus beta set (7/22)
  • program an algorithm for, and then send Ruth and Ted, a dataset of the structured data for the 5 documents annotated between Ted and Elliot for Ruth's annotation project --- where an observation was a phrase that was annotated, and there was information indicating a match between annotators if the start and end span of the phrase was exactly equivalent (7/26)
  • generate the remaining POEM-paper procedure-level contingencies for primary and alternative rule components determining cdarrest and urinary tract infection (7/28)
  • PSI hospitalization-level "results" for PE/DVT including (7/29)
    • contingency of PSI algorithm minus exclusions where NSQIP gold standard must occur prior to or on hospitalization discharge date
    • contingency of POEM primary algorithm minus PSI PE/DVT exclusions where NSQIP gold standard must occur prior to or on hospitalization discharge date
    • contingency of POEM primary algorithm where the DataDate for the specific queryid's that make up the rule components firing must occur prior to or on hospitalization discharge date and where NSQIP gold standard must occur prior to or on hospitalization discharge date
    • contingency of POEM alt algorithm under the same restrictions as above
    • Some crude tables giving a first-glance view of how the number of days in a hospitalization for which POEM is not indexing results might be related to outcomes
  • Update Pneumonia and MI portion of Teds flow table accounting for updated Atype (8/3)
  • add Hispanic, Marital, Period of Service, and a raw count of comorbidities from Elixhauser et al's list of relevant comorbidities (per their ICD9 comorbidity-generating algorithm, which I ran on POEMSample) to Table 1 (at the last meeting Ted said he wants Hispanic to be based on the ethnicity var in NSQIP) (8/2)
  • update Harvey's separate flow tables for the 5 PSI complications for his paper with PSI results altered according to updated atype variable (8/9)
  • send out a bunch of results on corrupted res8 data that they knew they were going to have to correct (8/24)
  • send out contingencies for POEM level reverse-engineered pulembol, dvt, pneumonia (8/31)
  • send out contingencies of PSI-level reverse engineered pneumonia for Harvey's rules (8/31)
  • send out contingencies of PSI results updated with corrected Atype variable definition (8/31)
  • edit the Python script from Tampa so that Ruth can pull out structured data from XML files produced in Knowtator for analysis. Then meet with Ruth and show her how to run it (9/1)
  • run reverse engineered MI rules and send out contingencies of POEM level MI rules for development set (9/4)
  • during my regular database checks of evidence table, I discovered incorrect operation dates (they were in the early 1900's!) and reported to Fern for correction (9/7)
  • send out contingencies of PSI level reverse engineered pedvt rules for development set (9/10)
  • send out contingencies of PSI level reverse engineered MI rules for development set (9/14)
  • program rule for anion gap (rule for sepsis) from raw data for each poem operation of development set and send out contingencies (9/15):
    • [Na + K] - [Cl + HCO3 (or serum CO2)] > 16
    • Na - [Cl + HCO3 (or serum CO2)] > 12
  • run all rules for reverse engineered sepsis at POEM (operation) level for development and send out contingencies for all rules (9/20)
  • send out all confidence intervals for Harvey's POEM-PSI paper abstract for top performing sensitivity-specificity combination (9/28)
  • program rule for cardiac arrest searching the free text data for the text "CRASH" or "CODE"; findings for each POEM-level operation (9/28)
  • generate contingencies for all rules at POEM operation level for cardiac arrest of development set (9/29)
  • generate latex table 1 for only those elements Ted requested for POEM abstract to send to group (9/30)
  • generate contingencies for renal failure at PSI level of development set and forward to group (by 10/1)
  • generate results and confidence intervals for all levels of POEM-PSI abstract, discuss interpretation with Harvey, and review Harvey's abstract for submission (by 10/1)
  • generate results and confidence intervals for all levels of POEM abstract and review for submission (by 10/1)
  • generate confidence intervals and discuss interpretation with Ruth for her Time abstract and review for submission (by 10/1)
  • clean Fern's PAIN database, discuss output to be produced, and generate pain to LOS descriptives for Fern's abstract and review for submission (by 10/1)
  • indicate findings to Ted where Shibei's alternative definition PSI: Pneumonia incorrectly flagged 2 cases (by 10/1)
  • program an algorithm for, and send Ruth and Ted, a dataset of the structured data for the 5 documents annotated between Ted and Elliot for Ruth's annotation project --- where an observation is a unique phrase that was annotated by either annotator, and there was information indicating a match between annotators if the start and end spans of the phrase overlap by one character (10/10)
  • run all rules for UTI reverse engineered dev, export from SQL Server, generate contingencies, and send out to group (assigned Thursday 10/14; sent out 10/19)
  • run all rules for Wound reverse engineered dev, export from SQL Server, generate contingencies, and send out to group. Note: NSQIP wound infection is defined by hits on orgspcssi, wndinfd, or supinfec. (assigned Monday 10/18; sent out 10/19)
  • double check sensitivities and specificities that concerned Ted in POEM rules meeting (assigned 10/19; completed 10/20) Findings: they were correct.
  • read Harvey's POEM-PSI paper draft and, per Ted's request, meet with him to go over ambiguous areas, edits to the initial draft, and code that still needs to be written for production of descriptives/tables (assigned 10/19, completed 10/21)
  • send patient-level findings for DVT that were completed last March for Fern's poster (Fern requested 10/23; sent out on 10/24 and 10/25)
  • meet with Fern and help design dvt poster and run code to generate descriptives (10/25)
  • complete Tables in Harvey's manuscript per Harvey's request (10/27...Note: finding as of 10/28 that these may be incorrect because they found more errors in the xml tagged documents that were reverse engineered)
  • read initial draft of Annotation analysis plan (10/29)
  • write and run code to get a better idea of how the c_complication variables were originally created by Ted, because I was unexpectedly finding instances where a complication was associated with a particular operation even though it occurred up to 2 months after the operation date, despite the understanding that the rule for NSQIP nurses was to associate complications with operations occurring within 30 days. Produced an update and indicated my findings regarding these variables at the POEM Rules meeting 11/1.... Solution: They decided they want to continue to use the c_complication variables Ted created for the operation-level gold standard rather than the original NSQIP variables.... and the resolution was to explain this method in the methods section of the paper.
  • Per Ted: Meet Ruth to go over signature study (basic study to assess how accurately medical profession titles can be extracted from free text doctor's notes) , provide feedback on design, and discuss code I will need to write (11/2)
  • Fern sent: "updated evidence table with Urine nitrite and esterase--diagnostic tests for query 126 in the StructuredPlusOrigDevEvidenceUnion" ~ export raw data from SQL Server, transpose at POEM level, and email contingencies (with CIs) to group (Fern sent 10/27; it was then cancelled on 10/28 when the XML problem was found by Harvey, and reassigned on 11/1 by Fern. I completed and sent on 11/2)
  • generate an operation-level table of all false negatives for DVT according to preferred POEM rule combination: 70,81,82,84 for Harvey to start manual reviews (11/3)
  • meet with Ruth, Kim, and Ted -- discuss edits for PTSD annotation analysis plan (11/4)
  • per Ted's request, discuss the output I initially helped design with Ruth 1-on-1 for her signature study poster (study to assess how accurately medical profession titles can be extracted from free text doctor's notes) -- (update: she received a distinguished poster award at AMIA and is presenting--yay!), make edits to the design, and write code to run results for development and test, calculating sensitivities, specificities, accuracy & confidence intervals, as well as kappa/agreement (11/9)
  • preliminary checking and cleaning of raw data for the PTSD annotation project... send findings to Ruth... discuss handling of documents marked up for a single concept that are separated in the free text. Met with Ruth to go over this and discussed how to handle it. (11/11)
  • create and send Fern spreadsheet so she can do error analysis for iteration on development set --calculates sensitivities & specificities given false positives and true positives from her SQL Server output (assigned 12/2 sent out 12/2)
  • export raw data from evidence table in SQL Server--post deletion of incorrectly mapped documents, transpose on PSI level for Harvey's paper, and email contingencies for original hospital complications for PSI paper (assigned by Ted 12/2 sent out 12/3)
  • fix error in pedvt psi level calculation code and send out updated results (assigned 12/6 corrected and sent out 12/9)
  • export raw data from evidence table in SQL Server--post deletion of incorrectly mapped documents, transpose on PSI level for Harvey's paper, and email contingencies for alternative PSI complication: Pneumonia (requested by harvey 12/8, completed 12/13)
  • export raw data from evidence table in SQL Server--post deletion of incorrectly mapped docs, transpose on PSI level for Harvey's paper, and email contingencies for alternative PSI complication: MI (requested by harvey 12/8, completed 12/14)
  • generate confidence intervals for all final rules (requested by harvey 12/8, completed 12/14)
  • export evidence tables from SQL Server, transpose and generate contingencies, diagnostic tests, with CI's for all development set final rules associated with each complication at POEM paper/operation level for the following complications (requested (Ted) 12/14, completed 12/15)
    • Renal Failure
    • PE
    • DVT
    • Sepsis
    • MI
    • Cardiac Arrest
    • Pneumonia
    • UTI
    • Wound Infection
  • Update Latex Table 1's for POEM paper ~ operation level (Development and Test) (assigned Ted 12/14, completed 12/15)
  • Complete Tables Ted sent to me and forward back to him for his presentation (assigned Ted 12/14, completed 12/15)
  • export evidence tables from SQL Server, transpose and generate contingencies, diagnostic tests, with CI's for all TEST set final rules from development associated with each complication at POEM paper/operation level for the following complications (fern sent data 12/18, completed 12/20)
    • Renal Failure
    • PE
    • DVT
    • Sepsis
    • MI
    • Cardiac Arrest
    • Pneumonia
    • UTI
    • Wound Infection
  • as I was asked yesterday to run the final rules without the relevant indexing components from imaging notes and discharge summaries, and today Fern and Vinnie were able to update that data on the Server, I need to rerun: export evidence tables from SQL Server, transpose, and re-generate contingencies and diagnostic tests, with CIs, for the TEST set final rules for the following complications that were previously missing relevant components from imaging and discharge summaries (Fern/Vinnie updated data 12/21, completed 12/21)
    • Renal Failure
    • PE
    • DVT
    • Sepsis
    • Pneumonia
  • Send Ted raw counts of "Cases" and "Controls" for Sampled Development and Test. Where "case" is defined by having any of the 9 relevant complications (Ted requested 12/21, completed 12/21)
  • Send output of prevalence rates of all 11 complications as well as all 11......... across all VISN9 data (that we originally sampled from) for Ted's report (Ted requested 12/21, completed and sent out 12/22)
  • Harvey sent a list of dev set alternative rules for me to run for his appendix, including all rules run on beta. However, since there were several rules from beta that were not updated, as well as a couple of additions to the development evidence table (as reflected in Fern's POEM Rules documentation), I sent him back his list with this information incorporated. Export evidence tables from SQL Server, transpose at PSI (hospitalization) level, import PSI outcomes data, generate contingencies for available alternative rules for Harvey's appendix (Harvey sent his old version of the beta list 12/15; I sent contingencies for the following available rules of MI, Pneumonia, and Sepsis on 12/22)
    • PNEUMONIA:
      • 59, 60, 61, 63, 43 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 44 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 106, 107
    • MYOCARDIAL INFARCTION:
      • 17, 18 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 19, 20, 21 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 22 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 92 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 93, 94, 92, 105, 19 or 92, 20 or 93, 17 or 94, + 108 (LISTED IN POEM RULES DEV DOC BUT WAS NOT IN BETA FOR THIS COMP)
    • SEPSIS:
      • 69, 113 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 29 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 49, 51, 66 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 54 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 55, 56 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 57, 109, 43, 44, 25, 48, 50, 111 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 62, 67 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 52(LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 53 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 112 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP), 64 (LISTED IN POEM RULES DEV DOC AS NOT AVAILABLE FOR THIS COMP)
  • Below I have listed the latest query IDs for renal. These are from the most up to date Rules document
    • 8, 14, 98, 101, 103, 104, 120 (Harvey sent updated list 1/6; I exported from SQL server, and generated contingencies, CI's, and diagnostic tests.. sent on 1/7)
  • I need the new confidence intervals for the sepsis final rule sensitivity (Table 3) ~ (Harvey sent on 1/6; returned 1/7)
  • CI's for the rules included in the appendix (Appendix) ~ (Harvey sent 1/6; returned 1/7)
  • rules run without time exclusions (i.e. event or alert occurring anytime) (Table 4)
    • combined Renal Postop 8,14,98,101,103,104,120 minus Renal PSI exclusions by NSQIP gold standard Renal (anytime): (Harvey sent 1/6; returned 1/7)
  • Export evidence tables from SQL Server, transpose at PSI (hospitalization) level, import PSI outcomes data, generate contingencies for available alternative rules for Harvey's appendix for the following remaining ~ (Harvey sent 1/6; returned 1/10)
    • DVT
      • 70, 79, 81, 82, 84
    • Pulmonary Embolism
      • 68, 75, 76
  • rules run without time exclusions (i.e. event or alert occurring anytime) (Table 4)
    • Contingency of POEM Rules for combined 68, 75, 76 and DVT 70, 81, 82, 84 minus PE/DVT PSI exclusions by NSQIP gold standard PE/DVT (Harvey sent 1/6; returned 1/10)
  • Ruth sent 1/31: "Is there any way that today you could re-run the SAS code such that it produces a file like “test signature” only it would be for development instead of test? The SAS code spit out exactly what I needed in the file called “test signature” (the performance of national and local titles in the test set) I attached it here. But it does not produce the same kind of file for the development set (even though it accesses it). The SAS code accesses the file called Prov DevRoundII Results, which I attached. I think at the time I asked you for it, I thought I wouldn’t need to report the performance of the development set." I emailed her returned contingencies and conf intervals of each "type" of signature performance (ie nurse, md, etc), the overall development set kappa agreement, and sent her links to papers that I think would reference the type of statistics used for her signature study paper on 2/1
  • generate code and datasets for the annotation project (raw annotated documents in endnote for 4 reviewers, 2 docs each, 54 possible concepts -> cleaned dataset of variables (reviewer, concept, document, FN, TP, precision) where an observation is defined by unique document*concept*reviewer)... using substitute gold standard data because the actual data for this is still not available (2/10-2/24)
  • meet with Ted & Kim & Ruth. Go over the output for the complete factorial design ANOVA at concept level for model y(recall) = u + r(reviewer) + c(concept) + r*c + d(document) + r*d + e. Note: the actual data for this study is still not available, although Ted said it might be in next week (on 2/24)
  • add following variables to concept level dataset: (2/26)
    • FP, precision
    • run "example" analysis w/ precision as dependent variable at concept level
  • Harvey's request 2/23, he updated that he needed more info on 2/28: generate datasets for all false negatives for each of the complications in the poem-level development sets. Email 3/3: I generated 8 csv files in the folder: \\Vhatvhres4\data\POEM_Project\Test_Data\Creating PSI File for Boston group\DevelopmentPOEMRules_Res8issue\POEM_Results_Post_Deletion called:
    fn_cdarrest
    fn_renal
    fn_mi
    fn_pe
    fn_dvt
    fn_sepsis
    fn_pneumo
    fn_wound
    Individual observations represent a false negative in the POEM-level full development set. Each contains the following variables in this order:
    1) patientid
    2) nsqip operation date
    3) nsqip operation date time
    4) c_complication (Ted's rolled up nsqip complication indicator)
    5) ssn
    6) poem complication indicator for best performing @ dev
    7) fn for poem complication indicator (will always be = 1 for all observations in dataset)
    8) date of nsqip complication
  • Specific details to address for example data management and programming of analysis plan for PTSD annotation on SAMPLE data (3/8)
    • import class via crosswalk link from concept and generate dataset at reviewer*document*class level. Write code for descriptives, graphics, and complete factorial anova code at this level for dependent variables precision and recall
    • finish formatting dataset and writing code for second sub-study: incomplete block design analysis of variance
  • email Ted & Kim to see if they have any specific changes to analysis plan with sample code/output I produced from the last meeting and to let them know that I've finished first run of analysis plan on sample data so that can make changes to analysis plan and output before running analysis on official data (update: emailed 3/3 and 3/8 ~ we are meeting 3/11)
  • Produce scatter plots of Precision by Concept, Precision by Class, Recall by Concept, Recall by Class (assigned 3/9 produced 3/11. Note: Ruth updated raw data for this 3/10 so it looks like we should now have the official, unchanging raw dataset to use from here on out)
  • Assigned Friday 3/11:
    • Produce Jitter boxplots of recall & precision outcomes:
        • for various classes at class level for sub-study 1 (complete factorial)
        • for various concepts at concept level for sub-study 1 (complete factorial)
    • Background information and Steps Required:
    • Background information about raw data I'm given for this study and data management required to get data into format for analysis and plots (completed steps are listed):
        • 4 reviewers and an adjudicator tag raw text of 10 clinical documents each (with a total of 20 unique documents to choose from with only 2 documents shared by all 5) for any of a list of 55 potential clinical concepts. Any combination or amount of text span can be tagged/associated with a particular concept (of those 55 listed) in the annotation tool: Knowtator. Knowtator's export capabilities are such that it can produce XML output of all this information (note: this is the most structured form of data that Knowtator is able to export). ~Ruth sent data on 3/1

    • Step 1) write a script that parses out the relevant structured elements of this XML output (all XML output follows the same general schema) into the following variables for each reviewer, where a single observation is any span of text: noteid (ie docid), person (ie reviewer), classtype (ie concept tagged for the associated text span), start (ie start of span location), end (ie end of span location), slotvalue (ie associated positive, negative, or neutral assertion), text (ie the raw text tagged). (this code was written months ago ~ but the actual data arrived 3/1)

    • Step 2) Use these 5 | delimited datasets (separate dataset for each reviewer where an observation is a single span of text) and produce 2 datasets for sub-study 1:

      • 2a) concept level sub-study 1: complete factorial including data only from the 2 documents shared by all

        • variables in dataset: documentid, reviewer id, concept id, tp, fn, fp, precision, recall

        • observation unique key identifier = documentid * reviewer * concept

        • TP for obs @ unique documentB * reviewerB * conceptB = number of times we see any span of text overlap between reviewer B and the adjudicator (by 1 or more characters) that was tagged with concept B by both reviewer B and the adjudicator on the same document B

        • FN for unique obs @ documentB * reviewerB * conceptB = number of times that, for each span of text tagged by the gold standard for document B and concept B, reviewer B either did not tag any corresponding overlapping text (by 1 or more characters) OR did not tag it with concept B

        • FP for unique obs @ documentB * reviewerB * conceptB = number of times that, for each span of text tagged by reviewer B for concept B on document B, the gold standard either did not tag any corresponding overlapping text (by 1 or more characters) or did not mark concept B for the overlapping text (see the overlap-matching sketch below)
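
        • Note: a minimal SAS sketch of the overlap-matching step, assuming reviewer_spans and gold_spans each hold one row per tagged span with the Step 1 variables (noteid, person, classtype, start, end):

            proc sql;
                create table tp_counts as
                select g.noteid, r.person, g.classtype,
                       count(distinct catx('-', g.start, g.end)) as tp
                from gold_spans as g
                inner join reviewer_spans as r
                  on  g.noteid = r.noteid
                  and g.classtype = r.classtype
                  and r.start <= g.end and g.start <= r.end  /* >=1 char overlap */
                group by g.noteid, r.person, g.classtype;
            quit;

        • FN counts follow from the gold spans left unmatched by this join (a left join with a null check on the reviewer side), and FP counts symmetrically from the unmatched reviewer spans.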

      • 2 b) class level sub-study 1: complete factorial including data only from 2 docs shared by all

        • variables in dataset: documentid, reviewerid, class id, tp, fn, fp, precision, recall

        • observation unique key identifier = documentid * reviewer * class

        • class is a "superset" of concept so hopefully we can follow how this dataset would be created.

    • Step 3) Produce Graphs:

      • Scatter Plot overlaid with Box plot for Outcome Precision by Concept using dataset from Step 2a.

      • Box Plot of Outcome Precision by concept w/inset statistics listed for each concept using data from Step 2a.

      • Scatter Plot overlaid with Box plot for Outcome Recall by Class using dataset from Step 2b.

      • Box Plot of Outcome Recall by class w/inset statistics listed for each class using data from step 2b.

      • Previous Steps Completed 3/16

  • From Friday meeting (3/11, changed/updated 3/16): was assigned the following tasks by Kim & Ted
    • Scatter plot overlaid boxplots as well as boxplots w/inset statistics of recall & precision outcomes:
      • for various classes at class level for data from sub-study 2 (incomplete block design)
      • for various concepts at concept level for data from sub-study 2 (incomplete block design)
    • also produced table of TP, FN, FP, precision, recall for concepts by reviewer @ concept level and class by reviewer @ class level
    • update from 3/16 meeting: still produce these as planned, but instead of producing these for the incomplete block of 18 documents, do the IBD for all 20 docs combined. Also, document is no longer an independent variable in the model. Therefore, dataset observations should be defined by a unique reviewer*concept key for the concept-level analysis and a reviewer*class key for the class-level analysis. Further, we should consider doing the analysis at criterion level as well. Kim will update the analysis plan. I need to produce the new plots for this and email them to Kim and Ted when done.
    • previous steps emailed to Ted & Kim @ 6:00 am on 3/21
  • complete expired VA training: citi good clinical practices (ted requested 3/23 sent 3/23)
  • update descriptives/graphics output for complete factorial model data (2 docs) to include table of precision, recall, tp, fp, fn for each concept/class per reviewer per document and email all updated output to group (completed & sent 3/25)
  • findings of eliminating uncertain assertions of text tagging (previously counted both positive and uncertain) as as indicators for "POEM"-identified complication for each of the complications on development set. Re-exported raw data of all "hits" for each type of queryid from SQL Server database, tranposed data at POEM operation level dataset (unique patientid*operationdate combination), calculate dichotomous hits for each proposed POEM rule at this level (combination of queryids required ie OR AND statements), produce descriptive tables and contingency tables relative to VASQIP nurse defined complication at this level for all complications: any vasqip wound infection, cardiac arrest, renal failure, sepsis, urinary tract infection, pulmonary embolism, dvt, ... (completed 3/28)
  • "We have observed on the false positive review that the many of our wound infection hits are related to operative findings. We also wanted to run the wound infection queries but excluding any hits occurring within 48 hours of the operative date. This strategy may help reduce the number of false positive hits (or may not) for development set" (Note: as of 3/29: have written code, however, Res10 Server on my computer has been down and cannot access R or SAS ) ~ completed & distributed at meeting 3/30
  • "We have observed on the false positive review that the many of our wound infection hits are related to operative findings. We also wanted to run the wound infection queries but excluding any hits occurring within 48 hours of the operative date. This strategy may help reduce the number of false positive hits (or may not) for development set" addendum from 3/30 meeting: they thought it might be beneficial to exclude ALL POEM hits (even if some occurred after 48 hours of operation) if any hits occurred within 48 hours for that particular person's operation (as they were already designated w/complication "preoperatively")... for the following complications:
    • wound
    • uti
    • pneumo
    • renal
    • sepsis ~completed and emailed 3/30
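    • A minimal sketch of this exclusion, assuming a hits dataset with datetime variables operationdatetime and hitdatetime (all names hypothetical):

        proc sql;
          /* operations with at least one hit within 48 hours of surgery */
          create table early_ops as
          select distinct patientid, operationdate
          from hits
          where intck('hour', operationdatetime, hitdatetime) between 0 and 48;

          /* per the 3/30 addendum: drop ALL hits for those operations */
          create table hits_excl as
          select h.*
          from hits as h
          left join early_ops as e
            on h.patientid = e.patientid
           and h.operationdate = e.operationdate
          where e.patientid is null;
        quit;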
  • generate p-values for the POEM-PSI comparison paper's sensitivities & specificities (I previously produced confidence intervals; Harvey thinks p-values will be necessary) (Note: as of 3/29, code is written; however, the Res10 server on my computer has been down and I cannot access R or SAS) ~ completed 4/7
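    • The test used is not recorded here; for paired classifiers scored on the same cases, one natural choice is McNemar's test, sketched below with hypothetical names (poem_hit and psi_hit are 0/1 calls; restricting to complication = 1 compares sensitivities, to complication = 0 specificities):

        proc freq data=devset;
          where complication = 1;               /* sensitivity comparison */
          tables poem_hit * psi_hit / agree;    /* AGREE prints McNemar   */
        run;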
  • generate csv of false negatives for UTI for POEM unstructured data ~ completed 4/12
  • Export evidence tables from SQL Server, transpose at POEM (operation) level, and generate contingencies for UTI development set rule: 2, 7, 9, 91 unless present in 150 ~ completed 4/20
  • email Kim and Ted the analysis output from sub-study 1 for the Thurs 1:00 pm mtg: factorial ANOVA of the 2 documents shared by all reviewers; met with Ted to go over output and next steps ~ sent and met to discuss 4/21
    • output included: 2 pdf's:
      • Class Level Analysis Included
        • Descriptive of Raw Data Collapsed by Reviewer, Document, Class
        • Model w Dependent Variable Precision:
          • Standard output for the model Precision = u + r(reviewer) + c(class) + d(document) + r*d + e
            • where r, c are fixed effects and d, r*d are random (a PROC MIXED sketch follows this output list)
          • Residual Plot, a histogram with normal density overlaid, a Q-Q plot, and fit stats (AIC, etc)
          • Studentized Residual Plot and associated histogram, Q-Q plot, and fit stats
          • Pearson Residual Plot and the same associated diagnostics
          • calculated ICC from output of model
        • Model w Dependent Variable Recall
          • repeat above...
      • Concept Level Analysis:
        • repeat class level analysis at concept level
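    • A minimal PROC MIXED sketch of the precision model above (dataset and variable names are hypothetical, and class is renamed classvar so it does not collide with the CLASS statement keyword):

        proc mixed data=class_level method=reml;  /* or method=type3, per the 4/26 request below */
          class reviewer classvar document;
          model precision = reviewer classvar;    /* r, c fixed    */
          random document reviewer*document;      /* d, r*d random */
        run;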
  • Next Steps from output discussed with Ted on 4/21 (completed and sent out on 4/28 for 1:00 meeting)
    • at concept level he only wants descriptives for 20 documents which he would like me to send as soon as completed (prior to weekly mtg)
    • at class level and "pooled class level" he still wants to run full analyses which I will send by 1:1 meeting next week
    • other than that, he said the output looked like more than they'll need for publication, but I should see if Kim has any feedback/requests (emailed Kim 4/21)
    • Kim emailed addendum to tasks 4/26:
      • found a typo in my code:
        • your formula for ICC is not right, it should be (var(doc) - var(reviewer*doc)/3) / (var(doc) + var(reviewer*doc) + var(residual)) and you have: data class; set class; precision_ICC = ((0.001601 - 0.001575)/3)/(0.001601 + 0.001575 + 0.03585); run;
        • You are dividing the whole numerator by 3 and it should only be the variance of the interaction (k-1 df, k=4, # reviewers).
      • fix ICC and rerun models with REML in addition to Type3 options at class and pooled class levels.
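      • Applying the correction, the fixed line divides only the interaction variance by 3 (k - 1 with k = 4 reviewers):

          data class;
            set class;
            /* ICC = (var(doc) - var(reviewer*doc)/3)
                   / (var(doc) + var(reviewer*doc) + var(residual)) */
            precision_ICC = (0.001601 - 0.001575/3)
                          / (0.001601 + 0.001575 + 0.03585);
          run;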
  • Emailed an explanation of changes to the data programming and data checks, as well as model results for Kim's most updated analysis plan for the Annotation project (5/24)
    • She responded 7/11 to remove interaction terms from model
  • programmed Anion Gap and SENT FINAL programmed contingencies of data transposed to PSI level (i.e., minus PSI exclusions for the corresponding complications) of POEM searches on the development set, for the following complications and queryids: (6/21)
    • RENAL
      • Single-Rules: 8 (Dialysis postop snomed lvg TIU), 13 (Post Dialy% in title of note TIU), 14 (Dialysis postop keyword TIU), 16 (Keyword dialysis pre op TIU), 96 (SM dialysis pre op TIU), 97 (CPT OP procedure code post op OPCPT), 99 (CPT OP procedure code pre op OPCPT), 98 (Post DCSum SM dialysis DCSum), 100 (Dialysis title note pre op TIU), 101 (Post SM Acute renal failure TIU), 103 (Post TIU KW Acute renal failure TIU), 104 (Post DCSum KW acute renal failure DCSum), 120 (Dialysis post op XML keyword leftovers TIU), 121 (Dialysis pre op XML keyword leftovers TIU); Combo-Rules: 101 AND 103, 8 OR 14 OR 98 OR 120, 8 OR 14 OR 98 OR 101 OR 103 OR 104 OR 120, 101 AND 103 OR 98 AND 104
    • MI
      • Single-Rules: 17 (Cardiac biomarkers Chlab), 18 (Cardiology in note title TIU), 19 (Q wave concept TIU), 20 (ST segment ischemia TIU), 92 (Q wave DCSum), 93 (ST segment ischemia DCSum), 94 (MI Exclude Q-wave TIU), 95 (MI Exclude Q-wave DCSum), 105 (Troponin ≥ 0.5 Chlab), 108 (CK or Troponin Chlab); Combo-Rules: 17 OR 18 OR 19 OR 20 OR 93 OR 94 OR 95 OR 105, 19 OR 92, 20 OR 93, 17 OR 94, 17 AND 94 AND 105
    • Sepsis
      • Single-Rules: 25 (Inflammatory response, purulence TIU), 29 (Inflammatory response, purulence DCSum), 43 (Blood culture Micro reg exp), 44 (Blood culture, mycology Micro reg exp), 48 (Septic Shock TIU), 49 (Septic Shock DCSum), 50 (Post op systemic infection (sepsis) TIU), 51 (Systemic infection DCSum), 52 (Heart rate > 90 Vital signs), 53 (Respiration > 20 breaths/minute Vital signs), 54 (PaCO2 Chlab), 55 (WBC > 12,000 cells/mm3 or < 4,000 cells/mm3 Chlab), 56 (Bands Chlab), 57 (Anion Gap Chlab), 62 (Shock with children TIU), 64 (Wound culture Micro reg exp), 66 (Pressor drug IVRx), 67 (Blood pressure systolic less than 90 Vital signs), 69 (Patient on respirator TIU), 109 (Anion Gap (as final score) Chlab), 312 (Temperature > 38 degrees or < 36 degrees Vital signs), 313 (Patient on a respirator DCSum); Combo-Rules: 109 OR 57, 43 OR 44 OR 51 OR 50 AND 48 OR 49 (see the grouping note below)
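    • Note on combo-rule grouping: SAS evaluates AND before OR, so a rule written "43 OR 44 OR 51 OR 50 AND 48 OR 49" groups as shown below; the explicit parentheses and the 0/1 flag names (q43, q44, ...) are assumptions for illustration:

        data rules;
          set oplevel;
          sepsis_combo = (q43 or q44 or q51 or (q50 and q48) or q49);
          renal_combo  = ((q101 and q103) or (q98 and q104));  /* "101 AND 103 OR 98 AND 104" */
        run;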
  • PSI level POEM development set results for PE, DVT, and pneumonia (7/6)
  • PSI level POEM sequential tests for combinations for PE, DVT, Pneumonia, MI, Sepsis, and Renal (7/11)
  • Sent Harvey code for calculation of Wilson confidence intervals in R (7/10)
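    • The code sent was in R (not reproduced here); for reference, the Wilson score interval written as a SAS data step (the 85-of-100 counts are placeholders):

        data wilson;
          x = 85; n = 100;            /* successes, trials (placeholders) */
          z = probit(0.975);          /* 1.96 for a 95% interval          */
          phat   = x / n;
          denom  = 1 + z**2/n;
          center = (phat + z**2/(2*n)) / denom;
          half   = z * sqrt(phat*(1 - phat)/n + z**2/(4*n**2)) / denom;
          lower  = center - half;
          upper  = center + half;
          put lower= upper=;
        run;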
  • met with Ruth to go over additional data for annotation project per Ted's request (Arm 3) and ran Kim's updated model for Arm 1 (take out interaction effects) (7/18)
  • sent PSI-level POEM sequential test contingencies and calculations (with code) for all complications: PE/DVT, Pneumonia, MI, Sepsis, Renal, with first-pass POEM queryids and second-pass the corresponding PSI, per feedback (7/20)
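    • A minimal sketch of the two-pass (sequential) positivity described above, with hypothetical flags poem_rule (first-pass queryid rule) and psi_flag (second-pass PSI):

        data sequential;
          set oplevel;
          seq_positive = (poem_rule and psi_flag);   /* positive only if both passes fire */
        run;

        proc freq data=sequential;
          tables seq_positive * nurse_complication;  /* contingency vs the VASQIP reference */
        run;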
Topic attachments:
  • Age_by_patienthasanycomp.rtf (2.9 K, 03 Aug 2009 - 16:26, KristenKotter)
  • Beta_Complication_Counts.rtf (4.6 K, 25 Nov 2009 - 08:31, KristenKotter)
  • Beta_Controls.rtf (8.2 K, 25 Nov 2009 - 08:31, KristenKotter)
  • Beta_with_Pulembol_or_DVT_categories.rtf (7.5 K, 25 Nov 2009 - 08:32, KristenKotter)
  • CDArrest_Output.pdf (16.5 K, 24 Nov 2009 - 15:29, KristenKotter)
  • CDArrest_all.pdf (24.8 K, 16 Dec 2009 - 17:24, KristenKotter)
  • Checking_power_patientlevel_simulation.sas (8.5 K, 28 Jul 2009 - 14:14, KristenKotter)
  • Comparing_PSI_to_NSQIP.rtf (113.1 K, 14 Sep 2009 - 16:27, KristenKotter)
  • DVT_EVIDENCE.pdf (16.3 K, 28 Jan 2010 - 15:45, KristenKotter)
  • Findings_where_casetocaseoperation_lessthan_orequalto_30days_forcomparison.rtf (31.6 K, 10 Aug 2009 - 13:02, KristenKotter)
  • MI_Output.pdf (14.9 K, 11 Dec 2009 - 14:16, KristenKotter)
  • MI_queries_rules_results_beta.pdf (25.7 K, 23 Feb 2010 - 22:23, KristenKotter)
  • MI_updated.pdf (16.4 K, 05 Feb 2010 - 15:15, KristenKotter)
  • Number_of_times_rules_fired_for_96_Beta_Cases.rtf (36.1 K, 30 Nov 2009 - 15:29, KristenKotter)
  • OVERLAP_simple_stats_08182009.rtf (68.3 K, 17 Aug 2009 - 10:38, KristenKotter)
  • OperationYear_by_patienthasanycomp.rtf (4.7 K, 03 Aug 2009 - 15:17, KristenKotter)
  • POEMsample.do (3.9 K, 13 Aug 2009 - 17:55, KristenKotter; Ted's STATA code)
  • PSI_Dev_Results.pdf (15.8 K, 19 Nov 2009 - 12:10, KristenKotter)
  • PSI_Test_Results.pdf (15.8 K, 19 Nov 2009 - 12:09, KristenKotter)
  • Summary_Simulation_Results_1000_iterations_patientlevel_7282009.rtf (4.8 K, 28 Jul 2009 - 14:24, KristenKotter)
  • Summary_simulation_results_1002_iterations_patientlevel_07282009_histograms.rtf (215.2 K, 28 Jul 2009 - 16:39, KristenKotter)
  • UTI_number_times_rule_fired.rtf (46.8 K, 19 Jan 2010 - 17:14, KristenKotter)
  • Using_Ted_STATA_code_and_Robs_input_simple_stats_OVERLAP_08172009.rtf (61.3 K, 16 Aug 2009 - 21:00, KristenKotter)
  • Weighted_SubSampling_Page1.pdf (817.5 K, 23 Jul 2009 - 16:34, KristenKotter; Rob's Weighted SubSampling Part1)
  • Weighted_SubSampling_Page2.pdf (603.6 K, 23 Jul 2009 - 16:34, KristenKotter)
  • Weighted_SubSampling_Page3.pdf (509.2 K, 23 Jul 2009 - 16:35, KristenKotter)
  • case_level_prncptx_procs_descending_frequence.rtf (212.9 K, 25 Sep 2009 - 15:00, KristenKotter)
  • distribution_of_beta_development_test_for_tbi.rtf (63.3 K, 06 Jan 2010 - 19:56, KristenKotter)
  • latex_CaseLevel_Baseline_2.pdf (22.6 K, 24 Sep 2009 - 18:10, KristenKotter)
  • latex_caselevel_1.pdf (22.9 K, 09 Sep 2009 - 21:16, KristenKotter)
  • latex_patientlevel_1.pdf (24.5 K, 09 Sep 2009 - 17:12, KristenKotter)
  • number_times_MI_rule_fired.rtf (33.6 K, 17 Dec 2009 - 12:00, KristenKotter)
  • number_times_rule_fired_pneumonia.rtf (62.2 K, 23 Feb 2010 - 22:24, KristenKotter)
  • pneumonia_queries_rules_results_beta.pdf (22.2 K, 23 Feb 2010 - 22:24, KristenKotter)
  • pulembol.pdf (15.7 K, 01 Feb 2010 - 17:38, KristenKotter)
  • race_by_patienthasanycomp.rtf (3.6 K, 30 Jul 2009 - 13:32, KristenKotter)
  • race_by_patienthasanycomp_withmissing.rtf (40.0 K, 31 Jul 2009 - 15:47, KristenKotter)
  • sex_by_patienthasanycomp.rtf (4.9 K, 30 Jul 2009 - 13:32, KristenKotter)
  • site_by_patienthasanycomp.rtf (3.5 K, 03 Aug 2009 - 15:16, KristenKotter)
  • uti_evidence_results.pdf (16.5 K, 18 Dec 2009 - 16:50, KristenKotter)