QDQ Project Meetings

28Oct10

Notes by: Afshartous

Question development: The proposed 10-point scale should be fine. It is better to start with a finer scale than a coarser one, since a finer scale can be collapsed to a coarser one if necessary, but not vice versa. A finer scale is also better for the classification algorithm, since it allows more flexibility in the partitioning schemes. Question wording should be as objective as possible, to avoid biasing responses and to best represent the underlying characteristic of the item.
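
A minimal sketch of the finer-to-coarser point, assuming equal-width bins and a hypothetical 5-point target scale (neither is specified in these notes). The function name collapse_scale is illustrative only.

```python
def collapse_scale(score, fine_points=10, coarse_points=5):
    """Map a response on a 1..fine_points scale to a 1..coarse_points scale."""
    if not 1 <= score <= fine_points:
        raise ValueError("score outside the fine scale")
    # Integer ceiling of score * coarse_points / fine_points (equal-width bins).
    return (score * coarse_points + fine_points - 1) // fine_points

# Made-up responses on the 10-point scale.
responses = [1, 2, 5, 6, 9, 10]
collapsed = [collapse_scale(r) for r in responses]
print(collapsed)
```

Several fine-scale values share each coarse value (e.g., 5 and 6 both become 3), so the original responses cannot be recovered from the collapsed ones. That is the sense in which the transformation only goes one way.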

Number of items per characteristic and disease: Minimum would be 8 if there were one unique characteristic per disease. Suppose there are K characteristics per disease. Could possibly have more than one question per characteristic for some diseases. E.g., suppose one of the diseases accounts for 80% of the population. By having several questions for each of the characteristics of this disease, we reduce the probability of misclassifying a subject who should be in this dominant category. On the other hand, rare diseases might have certain characteristics that differentiate them from the more dominant diseases (is this true?) and thus might not require as many questions to identify and classify. This issue is also affected by the loss function employed, e.g., how costly are the various misclassifications? Which are the diseases that we do not want to miss? Perhaps these should have more questions directed towards their defining characteristics. Need to discuss this further once we have a better idea of the population distribution and the loss functions.
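
To illustrate how the loss function interacts with the population distribution, here is a minimal sketch of a minimum-expected-loss decision rule. The two class labels, the probabilities, and the costs are all made up for illustration; the actual loss function and the distribution over the 8 disease classes are still to be determined.

```python
def min_expected_loss_class(posterior, loss):
    """Return the class that minimizes expected loss, plus all expected losses.

    posterior[j] : estimated probability that the subject truly has disease j
    loss[j][k]   : cost of assigning class k when the truth is j
    """
    classes = list(posterior)
    expected = {
        k: sum(posterior[j] * loss[j][k] for j in classes)
        for k in classes
    }
    return min(expected, key=expected.get), expected

# Hypothetical two-class illustration (labels and numbers are assumptions).
posterior = {"common": 0.8, "rare": 0.2}
loss = {
    "common": {"common": 0.0, "rare": 1.0},
    "rare": {"common": 10.0, "rare": 0.0},  # missing the rare disease is costly
}
choice, expected = min_expected_loss_class(posterior, loss)
print(choice, expected)
```

With these made-up numbers the rule picks the rare class even though the common class is four times more probable, because a missed rare case is assumed to cost ten times as much. Under equal costs the same rule would simply pick the most probable class.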

Data reduction: Current proposal discusses the development of the alpha QDQ and use of factor analysis on the pilot data. This step reduced the number of questions from 39 to 34. In terms of using the QDQ for classification, however, this step is not really necessary. The classification algorithms will do no better (and potentially worse) with the reduced number of questions, since they will be based on less data. If the overall length of the questionnaire does not change much, the reduction is probably not worth the effort.

Machine learning techniques: Various methods are available for the multiclass classification problem. These include linear discriminant analysis (LDA), support vector machines (SVM), random forests (RF), and perhaps some Bayesian and EM-based methods. Rifkin (J of Machine Learning Res, 2004, 5:101-141) argues that performance in the multiclass problem using one-versus-all (OVA) methods is comparable to more complex approaches. A possible R software package that can be leveraged is pamr (prediction analysis for microarrays). We can propose several of these methods and assess performance on the data.
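
The OVA idea can be sketched as follows: fit one "class vs. rest" scorer per class, then assign a subject to the class whose scorer is most confident. This sketch uses a simple centroid-distance scorer as a stand-in for the binary classifier. That choice is an assumption made to keep the example self-contained; it is not one of the methods listed above, and the toy data are made up.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length response vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_ova(X, y):
    """Fit one 'class vs. rest' scorer per class label."""
    models = {}
    for label in sorted(set(y)):
        pos = [x for x, lab in zip(X, y) if lab == label]
        neg = [x for x, lab in zip(X, y) if lab != label]
        models[label] = (centroid(pos), centroid(neg))
    return models

def predict_ova(models, x):
    """Assign the class whose binary scorer gives the highest score."""
    def score(label):
        pos_c, neg_c = models[label]
        # Positive when x is closer to the class centroid than to the rest.
        return sq_dist(x, neg_c) - sq_dist(x, pos_c)
    return max(models, key=score)

# Toy data: three made-up classes, two items each on a 10-point scale.
X = [[9, 1], [8, 2], [1, 9], [2, 8], [5, 5], [6, 5]]
y = ["A", "A", "B", "B", "C", "C"]
models = fit_ova(X, y)
print(predict_ova(models, [9, 2]))
```

In practice the centroid scorer would be replaced by whichever binary method is being assessed (e.g., SVM or LDA), with the OVA wrapper unchanged.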

26Oct10

Attending: Jacobson, McCaslin, Afshartous

Discussed overall project and next steps:

Afshartous to investigate options for item scales that minimize nonresponse and are better for classification algorithms.

Next, Jacobson & McCaslin to define characteristics (questions) that differentiate the 8 disease classes. Some of these will be shared by more than one disease.

Jacobson & McCaslin to provide information on population distribution of the 8 disease classes.

Need to determine number of items directed towards each characteristic and disease.

Need to determine loss function for misclassifications.

Once questions are formed, they will be given to dizziness subjects. Should they also be given to nondizzy control subjects? Perhaps useful to form a baseline for each measure and identify questions that can be discarded. Note that even if the distribution of responses to a question is similar across dizzy and control subjects, the question may still be useful for differentiating among dizzy subjects.

Other topics discussed: ordering of questions; consistency in the manner in which questions are answered, e.g., with clinician present or not, possible touch screen.
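
A small illustration of the note above on control subjects: a question can have the same response distribution in the pooled dizzy group as in controls and still separate dizzy subclasses. All responses and class labels below are made up.

```python
from statistics import mean

# Made-up responses to one question on a 10-point scale.
controls = [1, 2, 2, 3, 7, 8, 8, 9]
dizzy = {
    "class_1": [2, 1, 3, 2],
    "class_2": [8, 9, 7, 8],
}

# Pool all dizzy subjects, ignoring their disease class.
pooled = [r for responses in dizzy.values() for r in responses]

same_distribution = sorted(pooled) == sorted(controls)
class_means = {label: mean(responses) for label, responses in dizzy.items()}
print(same_distribution, class_means)
```

Here the pooled dizzy responses match the controls exactly, yet the two hypothetical classes sit at opposite ends of the scale. Discarding the question on the basis of the dizzy-versus-control comparison alone would lose that information.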

-- DavidAfshartous - 26 Oct 2010
Topic revision: r2 - 28 Oct 2010, DavidAfshartous
 
