SAM Algorithm

The data are $x_{ij}$, for genes i = 1, 2, ..., p and samples j = 1, 2, ..., n, together with response data $y_j$, j = 1, 2, ..., n ($y_j$ may be a vector).

Generic SAM procedure :

  • Compute a statistic : $d_i = \frac{r_i}{s_i+s_0}$; i = 1, 2, ... p
    • $r_i$ is a score.
    • $s_i$ is a standard deviation.
    • $s_0$ is a fudge factor.
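
As a small worked illustration of the role of $s_0$, the sketch below evaluates $d_i$ for two made-up genes, once with $s_0 = 0$ and once with $s_0 = 0.5$. The function name `sam_statistic` and all numbers are hypothetical, chosen only to show that a gene with a very small $s_i$ no longer dominates once $s_0$ is added to the denominator.

```python
def sam_statistic(r, s, s0):
    """Return d_i = r_i / (s_i + s0) for paired lists of scores r and standard deviations s."""
    return [ri / (si + s0) for ri, si in zip(r, s)]

# Two illustrative genes: the first has a tiny score and a tiny standard deviation.
r = [0.1, 2.0]
s = [0.01, 1.0]

d_without = sam_statistic(r, s, 0.0)  # first gene gets the larger statistic
d_with = sam_statistic(r, s, 0.5)     # second gene gets the larger statistic
print(d_without, d_with)
```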

Details of $r_i$ and $s_i$ for Two Class, Unpaired Data response type :

  • $r_i = \bar x_{i2} - \bar x_{i1}$
    • $\bar x_{i1} = \frac{\sum\limits_{j \in C_1} x_{ij}}{n_1}$.
    • $\bar x_{i2} = \frac{\sum\limits_{j \in C_2} x_{ij}}{n_2}$.
  • $s_i = \sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\frac{\sum\limits_{j \in C_1} (x_{ij} - \bar x_{i1})^2 + \sum\limits_{j \in C_2} (x_{ij} - \bar x_{i2})^2}{n_1+n_2-2}}$
    • $C_k = \{j : y_j = k\}$; k = 1,2. $y_j$ = 1 or 2.
    • $n_k$ is the number of observations in $C_k$.
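
The two-class, unpaired formulas above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `two_class_unpaired` and the layout of `x` (one list per gene, one entry per sample) are assumptions made for the example.

```python
import math

def two_class_unpaired(x, y):
    """Compute (r, s) for two-class unpaired data.

    x: list of p rows (genes), each a list of n expression values.
    y: list of n class labels, each 1 or 2.
    """
    c1 = [j for j, label in enumerate(y) if label == 1]  # C_1
    c2 = [j for j, label in enumerate(y) if label == 2]  # C_2
    n1, n2 = len(c1), len(c2)
    r, s = [], []
    for row in x:
        mean1 = sum(row[j] for j in c1) / n1
        mean2 = sum(row[j] for j in c2) / n2
        # Pooled within-class sum of squares.
        ss = (sum((row[j] - mean1) ** 2 for j in c1)
              + sum((row[j] - mean2) ** 2 for j in c2))
        r.append(mean2 - mean1)
        s.append(math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2)))
    return r, s

# One made-up gene measured on three samples per class.
r, s = two_class_unpaired([[1, 2, 3, 5, 6, 7]], [1, 1, 1, 2, 2, 2])
print(r, s)
```

For this toy gene the class means are 2 and 6, so $r_i = 4$, and the pooled sum of squares is 4, so $s_i = \sqrt{2/3}$.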

Computation of $s_0$ :

  1. Compute the percentiles of the $s_i$ values, denoted by $q_0 < q_1 < q_2 < ... < q_{100}$.
  2. For $\alpha \in$ (0, .01, .02, ... 1.0)
    1. Let $d_i^{\alpha} = \frac{r_i}{s_i+s^{\alpha}}$, where $s^{\alpha}$ is the $\alpha$ quantile (that is, the $100\alpha$-th percentile) of the $s_i$ values.
    2. Compute $v_j = mad(d_i^{\alpha} | s_i \in [q_j, q_{j+1}))$, j = 0, 1, ... 99, where _mad_ is the median absolute deviation from the median, multiplied by 1.4826.
    3. Compute cv($\alpha$) = $\frac{stdev(v_j)}{mean(v_j)}$.
  3. Choose $\hat \alpha$ = argmin[cv($\alpha$)].
  4. Finally compute $\hat s_0$ = $s^{\hat \alpha}$. $s_0$ is henceforth fixed at the value $\hat s_0$.
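
The steps above can be sketched in plain Python as follows. This is a hedged illustration under several assumptions that the procedure itself does not spell out: quantiles use linear interpolation, the last bin is closed on the right so that the largest $s_i$ is included, empty bins are skipped, and every $s_i$ is strictly positive. The names `quantile`, `mad`, and `estimate_s0` are hypothetical helpers introduced for this example.

```python
import random
import statistics
from bisect import bisect_right

def quantile(sorted_vals, prob):
    """Linear-interpolation quantile of an ascending list (assumed convention)."""
    pos = prob * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def mad(values):
    """Median absolute deviation from the median, multiplied by 1.4826."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])

def estimate_s0(r, s, n_bins=100):
    """Return (alpha_hat, s0_hat) minimising cv(alpha); assumes all s_i > 0."""
    s_sorted = sorted(s)
    q = [quantile(s_sorted, k / n_bins) for k in range(n_bins + 1)]  # q_0 .. q_100
    # Bin j holds genes with s_i in [q_j, q_{j+1}); the maximum goes in the last bin.
    bin_of = [min(bisect_right(q, si) - 1, n_bins - 1) for si in s]
    best_cv, best_alpha, best_s0 = None, None, None
    for step in range(101):
        alpha = step / 100
        s_alpha = quantile(s_sorted, alpha)
        groups = {}
        for ri, si, b in zip(r, s, bin_of):
            groups.setdefault(b, []).append(ri / (si + s_alpha))
        v = [mad(g) for g in groups.values()]
        if len(v) < 2:
            continue  # stdev needs at least two bins
        mean_v = statistics.mean(v)
        if mean_v == 0:
            continue  # cv is undefined
        cv = statistics.stdev(v) / mean_v
        if best_cv is None or cv < best_cv:
            best_cv, best_alpha, best_s0 = cv, alpha, s_alpha
    return best_alpha, best_s0

# Synthetic data for illustration only: 500 genes with made-up r_i and s_i.
rng = random.Random(0)
s_demo = [rng.uniform(0.1, 2.0) for _ in range(500)]
r_demo = [rng.gauss(0.0, si) for si in s_demo]
alpha_hat, s0_hat = estimate_s0(r_demo, s_demo)
print(alpha_hat, s0_hat)
```

Because the bins depend only on the $s_i$ values, the sketch assigns each gene to its bin once and reuses that assignment for every $\alpha$.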
Topic revision: r12 - 06 Feb 2012, JoanZhang
 
