Info Algorithm

Performed on an ordered (ascending) set of numbers and labels of size N. There must be only 2 unique labels (Label 1 and label 2).

Seperate the list into N - 1 grouping sets such that:
Set Left Right
1 $v_1$ $v_2$, $v_3$, ...,$v_N$
2 $v_1$, $v_2$ $v_3$, ...,$v_N$
$\vdots$ $\vdots$ $\vdots$
N-1 $v_1$, $v_2$, $v_3$,...,$v_{N-1}$ $v_N$

For each grouping set generate the folling values:
  • $n_1$: the number of values in group 1
  • $n_2$: the number of values in group 2

  • a: Percent of values on the left side.
  • 1 - a: Percent of values on the right side.

  • b: Percent of left side that is label 1.
  • 1 - b: Percent of left side that is label 2.

  • c: Percent of right side that is label 1.
  • 1 - c: Percent of right side that is label 2.

  • Entropy: $E(x)=-x\frac{\log(x)}{\log(2)}-(1-x)\frac{\log(1-x)}{\log(2)}$, $f(1)=0$, $f(0)=0$

  • Info: $I(x)=(a)E(b) + (1-a)E(c)$

Choose the smallest Info value as the final info value.

Sorting: Sort first by value (ascending), then by labels (ascending).

Duplicate values: Calculate the info_n for the section that has duplicates. Then reorder the section by sorting labels(desc) and calculate the interior of the duplicate section (of size duplicateN – 1) again. Do the same for all sections that have duplicates.

Examples:

Example 1

grouping 1 2 1 1 2   ordered 1 1 2 1 2
values   1 8 3 7 5           1 3 5 7 8

5 values so there will be 4 cut points
1          1 2 1 2
a=1/5      b=1/1     c=2/4
entropy(b)= 0        entropy(c)= 1
info = 0.8000

1 1        2 1 2
a=2/5      b=2/2      c=1/3
entropy(b)= 0         entropy(c)= 0.9183
info = 0.5510

1 1 2      1 2
a=3/5      b=2/3      c=1/2
entropy(b)= 0.9183    entropy(c)= 1
info = 0.9510

1 1 2 1    2
a=4/5      b=3/4      c=0/1
entropy(b)= 0.8113    entropy(c)= 0
info = .6490

The smallest value is 0.5510

Example 2

grouping 1 2 2 1 1   ordered 1 2 1 1 2
values   1 8 5 8 8           1 5 8 8 8

5 values so there will be 4 cut points

1          2 1 1 2
1          5 8 8 8
a=1/5      b=1/1      c=2/4
entropy(b)= 0         entropy(c)= 1
info =0.8

-- Info won't change if we reorder the the duplicate by group (desc)
1 2        1 1 2
1 5        8 8 8
a=2/5      b=1/2      c=2/3
entropy(b)= 1         entropy(c)= 0.9183
info = 0.9510

-- Duplicate section first run
1 2 1      1 2
1 5 8      8 8
a=3/5      b=2/3      c=1/2
entropy(b)= 0.9183    entropy(c)= 1
info = 0.9510

1 2 1 1    2
1 5 8 8    8
a=4/5      b=3/4      c=0/1
entropy(b)= 0.8113    entropy(c)= 0
info = .6490

-- Duplicate section second run. Reordered by group(desc)
1 2 2      1 1
1 5 8      8 8
a=3/5      b=1/3      c=2/2
entropy(b)= 0.9183    entropy(c)= 0
info = 0.5510

1 2 2 1    1
1 5 8 8    8
a=4/5      b=2/4      c=1/1
entropy(b)= 1         entropy(c)= 0
info = 0.8000

The smallest value is 0.5510

Topic revision: r4 - 21 Jan 2005, WillGray
 

This site is powered by FoswikiCopyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback