SAM Algorithm

The data are $x_{ij}$, with i = 1, 2, ... p genes and j = 1, 2, ... n samples, together with response data $y_j$, j = 1, 2, ... n ($y_j$ may be a vector).

Generic SAM procedure :

  • Compute a statistic : $d_i = \frac{r_i}{s_i+s_0}$; i = 1, 2, ... p
    • $r_i$ is a score.
    • $s_i$ is a standard deviation.
    • $s_0$ is a fudge factor.
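
The effect of the fudge factor can be illustrated with a minimal sketch. The function name `sam_statistic` and the three example genes are hypothetical and chosen only for illustration; the formula is the one given above.

```python
def sam_statistic(r, s, s0):
    """Return d_i = r_i / (s_i + s0) for each gene i."""
    if s0 < 0:
        raise ValueError("s0 must be non-negative")
    return [r_i / (s_i + s0) for r_i, s_i in zip(r, s)]

# Hypothetical scores and standard deviations for three genes.
# The first gene has a tiny score but an even tinier standard deviation.
r = [0.02, 1.0, -0.5]
s = [0.01, 0.5, 0.25]

d_no_fudge = sam_statistic(r, s, s0=0.0)  # all three genes have |d_i| close to 2
d_fudged = sam_statistic(r, s, s0=0.1)    # the low-variance gene is damped most

print(d_no_fudge)
print(d_fudged)
```

With $s_0 = 0$ the three genes are indistinguishable in absolute value, whereas a positive $s_0$ shrinks the statistic of the gene whose $s_i$ is close to zero far more than the others.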

Details of $r_i$ and $s_i$ for Two Class, Unpaired Data response type :

  • $r_i = \bar x_{i2} - \bar x_{i1}$
    • $\bar x_{i1} = \sum\limits_{j \in C_1} x_{ij}/n_1$.
    • $\bar x_{i2} = \sum\limits_{j \in C_2} x_{ij}/n_2$.
  • $s_i = [(1/n_1+1/n_2)\{\sum\limits_{j \in C_1}^{} (x_{ij} - \bar x_{i1})^2 + \sum\limits_{j \in C_2}^{} (x_{ij} - \bar x_{i2})^2\}/(n_1+n_2-2)]^{.5}$
    • $C_k = \{j : y_j = k\}$; k = 1,2. $y_j$ = 1 or 2.
    • $n_k$ is the number of observations in $C_k$.
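
The two-class, unpaired formulas above can be sketched for a single gene as follows. The function name `two_class_unpaired` and the example values are assumptions made for illustration, not part of any SAM software.

```python
import math

def two_class_unpaired(x_row, y):
    """Return (r_i, s_i) for one gene.

    x_row: expression values x_ij of gene i across the n samples.
    y:     class labels y_j, each equal to 1 or 2.
    """
    c1 = [x for x, label in zip(x_row, y) if label == 1]
    c2 = [x for x, label in zip(x_row, y) if label == 2]
    n1, n2 = len(c1), len(c2)
    if n1 == 0 or n2 == 0 or n1 + n2 <= 2:
        raise ValueError("need both classes and more than two samples in total")
    mean1 = sum(c1) / n1
    mean2 = sum(c2) / n2
    r = mean2 - mean1
    # Pooled within-class sum of squared deviations.
    ss = sum((x - mean1) ** 2 for x in c1) + sum((x - mean2) ** 2 for x in c2)
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return r, s

# Hypothetical gene measured on three samples per class.
x_row = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
y = [1, 1, 1, 2, 2, 2]
r_i, s_i = two_class_unpaired(x_row, y)
print(r_i, s_i)
```

Here the class means are 2 and 6, so $r_i = 4$, and the pooled sum of squares is 10, so $s_i = [(2/3) \cdot 10/4]^{1/2}$.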

Computation of $s_0$ :

  1. Compute the percentiles of the $s_i$ values, denoted by $q_0 < q_1 < q_2 < ... < q_{100}$, where $q_j$ is the j-th percentile.
  2. For $\alpha \in$ (0, .01, .02, ... 1.0)
    1. Let $d_i^{\alpha} = r_i/(s_i+s^{\alpha})$, where $s^{\alpha}$ is the $\alpha$ quantile (the 100$\alpha$-th percentile) of the $s_i$ values.
    2. Compute $v_j = mad(d_i^{\alpha} | s_i \in [q_j, q_{j+1}))$, j = 0, 1, ... 99, where _mad_ is the median absolute deviation from the median, multiplied by 1.4826.
    3. Compute cv($\alpha$) = $\frac{stdev(v_j)}{mean(v_j)}$.
  3. Choose $\hat \alpha$ = argmin[cv($\alpha$)].
  4. Finally compute $\hat s_0$ = $s^{\hat \alpha}$. $s_0$ is henceforth fixed at the value $\hat s_0$.
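
The selection of $s_0$ can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the reference implementation: the names `quantile`, `mad` and `choose_s0` are hypothetical, quantiles are computed by linear interpolation, the last bin is closed on the right so that the largest $s_i$ is included, empty bins are skipped, `statistics.stdev` (the sample standard deviation) is used for stdev, and every $s_i$ is assumed to be strictly positive. The input data are synthetic.

```python
import bisect
import random
import statistics

def quantile(sorted_vals, alpha):
    """Quantile of pre-sorted values by linear interpolation, alpha in [0, 1]."""
    pos = alpha * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def mad(values):
    """Median absolute deviation from the median, multiplied by 1.4826."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])

def choose_s0(r, s, n_bins=100, n_alpha=101):
    """Return (s0_hat, alpha_hat) minimising cv(alpha); assumes all s_i > 0."""
    s_sorted = sorted(s)
    # Step 1: bin edges q_0 ... q_100 and the gene indices falling in each bin.
    edges = [quantile(s_sorted, j / n_bins) for j in range(n_bins + 1)]
    bins = [[] for _ in range(n_bins)]
    for i, s_i in enumerate(s):
        j = min(bisect.bisect_right(edges, s_i) - 1, n_bins - 1)
        bins[j].append(i)
    best = None
    # Step 2: scan alpha over 0, .01, ..., 1.0.
    for k in range(n_alpha):
        alpha = k / (n_alpha - 1)
        s_alpha = quantile(s_sorted, alpha)
        d = [r_i / (s_i + s_alpha) for r_i, s_i in zip(r, s)]
        v = [mad([d[i] for i in members]) for members in bins if members]
        if len(v) < 2:
            continue
        mean_v = statistics.mean(v)
        if mean_v == 0:
            continue
        cv = statistics.stdev(v) / mean_v
        # Step 3: keep the alpha with the smallest coefficient of variation.
        if best is None or cv < best[0]:
            best = (cv, alpha, s_alpha)
    if best is None:
        raise ValueError("could not evaluate cv(alpha) for any alpha")
    # Step 4: s0_hat is the alpha_hat quantile of the s_i values.
    return best[2], best[1]

# Synthetic data for 500 hypothetical genes (fixed seed for repeatability).
rng = random.Random(0)
s = [rng.uniform(0.1, 2.0) for _ in range(500)]
r = [rng.gauss(0.0, 1.0) * (s_i + 0.3) for s_i in s]

s0_hat, alpha_hat = choose_s0(r, s)
print(alpha_hat, s0_hat)
```

The bin membership depends only on the $s_i$ values, so it is computed once outside the loop over $\alpha$; only $d_i^{\alpha}$ and the per-bin mad values change from one $\alpha$ to the next.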
Topic revision: r11 - 13 Sep 2005, ColeBeck
 
