Info Algorithm

Performed on an ordered (ascending) set of numbers and labels of size N. There must be only 2 unique labels (Label 1 and label 2).

Seperate the list into N - 1 grouping sets such that:
Set Left Right
1 $v_1$ $v_2$, $v_3$, ...,$v_N$
2 $v_1$, $v_2$ $v_3$, ...,$v_N$
$\vdots$ $\vdots$ $\vdots$
N-1 $v_1$, $v_2$, $v_3$,...,$v_{N-1}$ $v_N$

For each grouping set generate the folling values:
  • $n_1$: the number of values in group 1
  • $n_2$: the number of values in group 2

  • a: Percent of values on the left side.
  • 1 - a: Percent of values on the right side.

  • b: Percent of left side that is label 1.
  • 1 - b: Percent of left side that is label 2.

  • c: Percent of right side that is label 1.
  • 1 - c: Percent of right side that is label 2.

  • Entropy: $E(x)=-x\frac{\log(x)}{\log(2)}-(1-x)\frac{\log(1-x)}{\log(2)}$, $f(1)=0$, $f(0)=0$

  • Info: $I(x)=(a)E(b) + (1-a)E(c)$

Choose the smallest Info value as the final info value.

Sorting: Sort first by value (ascending), then by labels (ascending).

Duplicate values: Calculate the info_n for the section that has duplicates. Then reorder the section by sorting labels(desc) and calculate the interior of the duplicate section (of size duplicateN 1) again. Do the same for all sections that have duplicates.

Examples:

Example 1

grouping 1 2 1 1 2   ordered 1 1 2 1 2
values   1 8 3 7 5           1 3 5 7 8

5 values so there will be 4 cut points
1          1 2 1 2
a=1/5      b=1/1     c=2/4
entropy(b)= 0        entropy(c)= 1
info = 0.8000

1 1        2 1 2
a=2/5      b=2/2      c=1/3
entropy(b)= 0         entropy(c)= 0.9183
info = 0.5510

1 1 2      1 2
a=3/5      b=2/3      c=1/2
entropy(b)= 0.9183    entropy(c)= 1
info = 0.9510

1 1 2 1    2
a=4/5      b=3/4      c=0/1
entropy(b)= 0.8113    entropy(c)= 0
info = .6490

The smallest value is 0.5510

Example 2

grouping 1 2 2 1 1   ordered 1 2 1 1 2
values   1 8 5 8 8           1 5 8 8 8

5 values so there will be 4 cut points

1          2 1 1 2
1          5 8 8 8
a=1/5      b=1/1      c=2/4
entropy(b)= 0         entropy(c)= 1
info =0.8

-- Info won't change if we reorder the the duplicate by group (desc)
1 2        1 1 2
1 5        8 8 8
a=2/5      b=1/2      c=2/3
entropy(b)= 1         entropy(c)= 0.9183
info = 0.9510

-- Duplicate section first run
1 2 1      1 2
1 5 8      8 8
a=3/5      b=2/3      c=1/2
entropy(b)= 0.9183    entropy(c)= 1
info = 0.9510

1 2 1 1    2
1 5 8 8    8
a=4/5      b=3/4      c=0/1
entropy(b)= 0.8113    entropy(c)= 0
info = .6490

-- Duplicate section second run. Reordered by group(desc)
1 2 2      1 1
1 5 8      8 8
a=3/5      b=1/3      c=2/2
entropy(b)= 0.9183    entropy(c)= 0
info = 0.5510

1 2 2 1    1
1 5 8 8    8
a=4/5      b=2/4      c=1/1
entropy(b)= 1         entropy(c)= 0
info = 0.8000

The smallest value is 0.5510

Topic revision: r4 - 21 Jan 2005, WillGray
 

This site is powered by FoswikiCopyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback