Features of the DataCleanser

Declarative Rule Programming Module
Rules of Equivalence determine when two different records refer to the same entity and should be linked.

("SAL STOLFO" and "SAL STOLPHO" --> the same person!)
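
As a rough illustration of what a declarative equivalence rule could look like, the Python sketch below lists rules as small predicates and links two records when any rule fires. The rule names, the "PH" to "F" normalization, and the 0.8 similarity threshold are assumptions made for this example; they are not the DataCleanser's actual rule language.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Assumed normalization: uppercase, collapse whitespace, treat "PH" as "F".
    return " ".join(name.upper().split()).replace("PH", "F")


def same_after_normalization(a, b):
    # Rule 1 (hypothetical): names are identical once normalized.
    return normalize(a) == normalize(b)


def nearly_identical(a, b, threshold=0.8):
    # Rule 2 (hypothetical): names are similar enough character by character.
    return SequenceMatcher(None, a.upper(), b.upper()).ratio() >= threshold


# The rule set is plain data: add or remove rules without touching the matcher.
RULES = [same_after_normalization, nearly_identical]


def same_person(a, b, rules=RULES):
    # Two records are linked if any rule declares them equivalent.
    return any(rule(a, b) for rule in rules)


print(same_person("SAL STOLFO", "SAL STOLPHO"))
print(same_person("SAL STOLFO", "JANE DOE"))
```

Keeping the rules in a list separates what counts as equivalent from how records are compared, which is the point of a declarative approach.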

Flexible "Key Extraction" from your data
Quick Scanning: sorting the data by an extracted key places similar records, which are likely to match, close together.

Key("SAL STOLFO") = "SLSTL"

Key("SAL STOLPHO") = "SLSTL"
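
One key rule that reproduces the two keys above is "keep the first five consonants of the name". That rule is an assumption chosen for this sketch; the source does not state how the DataCleanser builds its keys. The Python sketch below sorts records by such a key and pairs each record only with its neighbors inside a small window, so similar records are compared without examining every possible pair. The function names and the window size are illustrative.

```python
VOWELS = set("AEIOU")


def extract_key(name, length=5):
    # Assumed rule: keep the first `length` consonants, ignoring spaces and vowels.
    consonants = [c for c in name.upper() if c.isalpha() and c not in VOWELS]
    return "".join(consonants[:length])


def candidate_pairs(names, window=2):
    # Sort by key, then pair each record with the next (window - 1) records.
    ordered = sorted(names, key=extract_key)
    pairs = []
    for i, current in enumerate(ordered):
        for neighbor in ordered[i + 1:i + window]:
            pairs.append((current, neighbor))
    return pairs


names = ["SAL STOLPHO", "JANE DOE", "SAL STOLFO", "ANN LEE"]
for pair in candidate_pairs(names):
    print(pair)
```

With a window of 2, four records produce only three candidate pairs, and the two spellings of the same name end up side by side.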

Multiple Passes to boost accuracy
Reordering the data and rescanning identifies matching records that an earlier pass missed.
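
The benefit of a second pass can be sketched in Python as follows. A record whose first letter was dropped ("AL STOLFO") sorts far from "SAL STOLFO" under a first-name-first key, but lands next to it when the key starts with the last name. Both key functions, the similarity test, and the window size are assumptions made for this example, not the DataCleanser's own.

```python
from difflib import SequenceMatcher

VOWELS = set("AEIOU")


def consonants(text, length=5):
    # Assumed key rule: first `length` consonants of the text.
    return "".join([c for c in text.upper() if c.isalpha() and c not in VOWELS][:length])


def key_first_name_first(name):
    return consonants(name)


def key_last_name_first(name):
    # Reverse the word order so the last name dominates the key.
    return consonants(" ".join(reversed(name.split())))


def is_match(a, b, threshold=0.8):
    # Hypothetical equivalence test based on character similarity.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def one_pass(names, key, window=2):
    # Sort by the given key and compare each record with its window neighbors.
    ordered = sorted(names, key=key)
    found = set()
    for i, current in enumerate(ordered):
        for neighbor in ordered[i + 1:i + window]:
            if is_match(current, neighbor):
                found.add(frozenset((current, neighbor)))
    return found


def multi_pass(names, keys, window=2):
    # Union of the matches found under each ordering.
    found = set()
    for key in keys:
        found |= one_pass(names, key, window)
    return found


names = ["SAL STOLFO", "AL STOLFO", "MARY SMITH", "PAT JONES"]
print(one_pass(names, key_first_name_first))
print(multi_pass(names, [key_first_name_first, key_last_name_first]))
```

Each pass stays cheap because the window is small; accuracy comes from combining passes that sort on different keys.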

Incremental Processing
Efficiently incorporates new data into a previously cleaned dataset.
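
One way incremental processing might work, sketched in Python under stated assumptions: the cleaned dataset is kept sorted by key, a new record is placed by binary search, and it is compared only with the records around its insertion point. The key rule, the similarity test, and the `add_record` helper are hypothetical and are not taken from the DataCleanser itself.

```python
import bisect
from difflib import SequenceMatcher

VOWELS = set("AEIOU")


def extract_key(name, length=5):
    # Assumed key rule: first `length` consonants of the name.
    return "".join([c for c in name.upper() if c.isalpha() and c not in VOWELS][:length])


def is_match(a, b, threshold=0.8):
    # Hypothetical equivalence test based on character similarity.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def add_record(sorted_entries, name, window=2):
    # sorted_entries is a list of (key, name) tuples kept in sorted order.
    entry = (extract_key(name), name)
    pos = bisect.bisect_left(sorted_entries, entry)
    # Look only at the (window - 1) records on each side of the insertion point.
    lo = max(0, pos - (window - 1))
    hi = pos + (window - 1)
    matches = [other for _, other in sorted_entries[lo:hi] if is_match(name, other)]
    sorted_entries.insert(pos, entry)
    return matches


cleaned = sorted((extract_key(n), n) for n in ["JANE DOE", "SAL STOLFO", "ANN LEE"])
print(add_record(cleaned, "SAL STOLPHO"))
```

The existing records are never rescanned against each other; only the new record's neighborhood is examined.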

Parallel and Distributed Processing
Scalable Implementation for Data Warehouse-sized problems.
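
A windowed scan over sorted data splits naturally into independent pieces. The Python sketch below cuts the sorted records into chunks that overlap by (window - 1) records, so no window is split at a chunk boundary, and scans the chunks concurrently. It uses a thread pool only to keep the example short and portable; a real deployment would more likely use separate processes or machines. The chunk size, helper names, and matching rule are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher

VOWELS = set("AEIOU")


def extract_key(name, length=5):
    # Assumed key rule: first `length` consonants of the name.
    return "".join([c for c in name.upper() if c.isalpha() and c not in VOWELS][:length])


def is_match(a, b, threshold=0.8):
    # Hypothetical equivalence test based on character similarity.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def scan(chunk, window):
    # Compare each record in the chunk with its next (window - 1) neighbors.
    found = set()
    for i, current in enumerate(chunk):
        for neighbor in chunk[i + 1:i + window]:
            if is_match(current, neighbor):
                found.add(frozenset((current, neighbor)))
    return found


def parallel_scan(names, window=2, chunk_size=2, workers=2):
    ordered = sorted(names, key=extract_key)
    # Each chunk carries (window - 1) extra records from the start of the next one.
    chunks = [ordered[start:start + chunk_size + window - 1]
              for start in range(0, len(ordered), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda chunk: scan(chunk, window), chunks))
    found = set()
    for result in results:
        found |= result  # pairs seen in an overlap are deduplicated by the set
    return found


names = ["SAL STOLPHO", "JANE DOE", "SAL STOLFO", "ANN LEE"]
print(parallel_scan(names))
```

Because of the overlap, the chunked scan finds the same pairs as a single sequential scan over the whole sorted list.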


The DataCleanser is

Scalable
Extensible
Accurate
Efficient