Approximate String Matching

Here is a collection of algorithms that can be used to compare strings.

Levenshtein distance

Using the basic operations of insertion, deletion, and substitution, the distance between two strings can be calculated.

For example: calculate the distance between lunch and lentil

  1. lunch -> lench (substitution)
  2. lench -> lenth (substitution)
  3. lenth -> lenti (substitution)
  4. lenti -> lentil (insertion)

The distance is 4.

Reference: Levenshtein, VI. “Binary Codes with Correction of Deletions, Insertions and Substitution of Symbols.” In Dokl. Akad. Nank. SSSR, 163:845–848, 1965.

-- JeremyStephens - 28 Feb 2013
Edit | Attach | Print version | History: r4 < r3 < r2 < r1 | Backlinks | View wiki text | Edit WikiText | More topic actions...
Topic revision: r1 - 28 Feb 2013, JeremyStephens
 

This site is powered by FoswikiCopyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback