Monthly Archives: September 2014

Data Preparation – Normalization Subsystem – Clustering using Tokens

Continuing on the subject of clustering text to facilitate normalizing a data set, in this blog post I will examine clustering using tokens. Token-based clustering uses tokens to evaluate the similarity between two strings and determine membership in a cluster. The … Continue reading

Data preparation – Normalization subsystem – Clustering Text using Distance Methods

Continuing from my previous blog post (Data preparation – Normalization subsystem – Clustering Text using Fingerprinting), in this post I will examine the distance approach, a.k.a. nearest neighbor, to clustering text strings. The distance approach to clustering provides better flexibility in finding … Continue reading

