Electronic Digital Documents, Inc.
80 Kenilworth Rd, Ridgewood, NJ 07450 - 201-444-3730 - edd@npsa.com

PRESS RELEASE

ELECTRONIC DIGITAL DOCUMENTS, INC. SHIPPING DATACLEANSER(R) DATABLADE(R) FOR INFORMIX(R)-UNIVERSAL SERVER


Market Leading Data Integrity Solution is an Informix DataBlade.

New York, NY (December 3, 1996) - Electronic Digital Documents, Inc. (EDD) today announced it is shipping a new DataBlade, called the DataCleanser DataBlade. The new DataBlade module will plug reliable data analysis capabilities directly into INFORMIX-Universal Server, allowing users to immediately "clean" their data sets of unwanted duplication and errors. Clean data improves information accuracy, allowing commercial organizations to make more precise, informed decisions and reduce maintenance and support costs. As a member of the DataBlade Developers Program, EDD is demonstrating its technology in the Informix Partner Pavilion at the DB/EXPO Conference in New York City this week.

"We're very excited about DataBlade technology, because it truly redefines the rules of information management," said Salvatore Stolfo, chief scientific advisor of EDD. "Never before have third parties been able to extend the core functionality of the database. With DataBlade technology, we can easily encapsulate our technology into a module that transfers all of the intelligence and knowledge of the DataCleanser technology to the database, immediately providing Informix customers with access to our sophisticated data integrity solution."

"Customers are continually searching for new and better ways to preserve the integrity of their data, and EDD's DataCleanser DataBlade module will provide them with the data integrity solution they have been searching for in the form of a DataBlade module that is fully integrated into their overall information management system," said David Cope, general manager, DataBlade Business Development Unit, Informix Software, Inc.

About DataCleanser Technology

DataCleanser technology provides a convenient way to identify redundant items within a database through an easy-to-use intelligent matching process. This process of identifying common information and cleaning a data set is called DataCleansing. Different industries know this problem by a variety of names including merge/purge, de-dupe and record linking. The DataCleanser DataBlade module accomplishes the task of cleaning data sets efficiently over a single large database as well as a collection of heterogeneous databases gathered from different sources. For example, several lists of names of potential customers in a direct-marketing application gathered from credit bureaus, magazine subscription lists, and other sources can be easily and efficiently merged into one set of names that uniquely identifies an individual customer. Some example applications include:

  • Mailing Lists for Retail Customers: Remove unwanted duplicate entries that needlessly increase postage costs.
  • Credit Card Solicitations: Remove multiple offerings to the same individual.
  • Fraud Reduction for Insurance Companies: Detect repetitive and unwanted transactions.
  • Logistical Support and Parts Procurement for Government Agencies: Identify common parts from multiple vendors at the best price.
  • Data Mining and Statistical Analysis by Corporate Planners: Identify and remove duplicate, frequently occurring information that leads to faulty or misleading market analyses.

    The naive way to clean data is to run a duplicate-elimination sort or to compare each record with every other. These strategies, however, assume that a total ordering of the data exists and that the data is not corrupted. In many real-world data sets we can assume neither a total ordering nor a simple duplicate-matching process, so even slight errors in the data mean that some "matches" of common data about the same entity may not be found.
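
    As a rough illustration of why the naive strategies fall short, the following Python sketch runs an exact duplicate-elimination sort over a few made-up records and counts the comparisons a full pairwise scan would need. The record layout and the function names exact_dedupe and pairwise_comparisons are assumptions made for this example only; they are not part of the DataCleanser product.

```python
def exact_dedupe(records):
    """Duplicate-elimination sort: sort, then drop records identical to their predecessor."""
    result = []
    for rec in sorted(records):
        if not result or rec != result[-1]:
            result.append(rec)
    return result


def pairwise_comparisons(n):
    """Number of comparisons needed to compare each of n records with every other."""
    return n * (n - 1) // 2


# Hypothetical sample data: (name, street) pairs.
records = [
    ("John Smith", "12 Oak St"),
    ("John Smith", "12 Oak St"),      # exact duplicate: removed by the sort
    ("Jon Smith", "12 Oak Street"),   # same person with small errors: not removed
]

cleaned = exact_dedupe(records)
print(len(cleaned), "records remain after exact duplicate elimination")
print(pairwise_comparisons(1_000_000), "comparisons for a full scan of one million records")
```

    The exact sort removes only the byte-for-byte duplicate and keeps the misspelled record, while the full pairwise scan grows quadratically with the number of records.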

    In many situations, the data sets involved are so large that it is not feasible to compare each record to every other in the database. This is where the DataCleanser's patented architecture is most effective. Data is first cleansed using one scan of the data set. The flexible Key Extraction function of the DataCleanser speeds up scanning by localizing similar records that are likely to match. A second pass over the data set, using a different scanning procedure, reorders the data and identifies additional matching records. When the results of the two passes are merged, the DataCleanser produces highly accurate final results, much better than was previously possible.

    Whether two records are equivalent often cannot be specified as a simple arithmetic predicate, and is instead determined by a set of rules. The user-specified rules determine when two distinct records provide information about the same entity. The DataCleanser provides a high-level, declarative, rule-based knowledge base that includes primitive match functions that can be used directly in rules. This is an important benefit to organizations that work under strict time constraints and have little time to experiment with alternative matching criteria specified in low-level languages.
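
    The multi-pass idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DataCleanser implementation: it assumes each pass sorts the records on an extracted key and compares only records that fall next to each other in that order, and that the passes are merged by treating matches as transitive. The release states neither detail. The record fields, the key functions, the 0.75 similarity threshold, and the rule in same_entity are all hypothetical.

```python
from difflib import SequenceMatcher


def similar(x, y, threshold=0.75):
    """Primitive match function (assumed): fuzzy string equality."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio() >= threshold


def same_entity(a, b):
    """Hypothetical rule: IF zip and street agree AND both names are similar THEN match."""
    return (a["zip"] == b["zip"]
            and a["street"].lower() == b["street"].lower()
            and similar(a["first"], b["first"])
            and similar(a["last"], b["last"]))


def scan(records, key, window=2):
    """One pass: sort by an extracted key, compare each record with its next window-1 neighbors."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    matches = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + window]:
            if same_entity(records[i], records[j]):
                matches.add((min(i, j), max(i, j)))
    return matches


def merge_passes(n, *match_sets):
    """Merge the passes: union-find groups records linked by any chain of matches."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for matches in match_sets:
        for i, j in matches:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up sample data; record 2 has an error in the first letter of the last name.
records = [
    {"first": "John", "last": "Smith", "street": "12 Oak St", "zip": "07450"},
    {"first": "Jon", "last": "Smith", "street": "12 Oak St", "zip": "07450"},
    {"first": "John", "last": "Zmith", "street": "12 Oak St", "zip": "07450"},
    {"first": "Mary", "last": "Taylor", "street": "9 Elm Ave", "zip": "07450"},
    {"first": "Alan", "last": "Turner", "street": "40 Pine Rd", "zip": "07450"},
]

pass1 = scan(records, key=lambda r: r["last"].upper())    # key on last name
pass2 = scan(records, key=lambda r: r["street"].upper())  # key on street
clusters = merge_passes(len(records), pass1, pass2)
print(pass1, pass2, clusters)
```

    In this sample, sorting on last name places the "Zmith" record far from the two "Smith" records, so the first pass misses it. Sorting on street brings all three together, and merging the two passes groups them as one individual.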

    The DataCleanser Rule-Programming module is easy to program and effective at finding duplicates. A client-side GUI guides the user through the process of defining rules and tailoring the cleansing process. The GUI controls the following components:

  • Rule editor. Rules can be expressed in simple IF-THEN format to compare two records at a time. External C/C++ functions can be used to implement complex comparison operations between record fields.
  • Rule Compiler. A rule compiler translates rules into a language understood by the server-side DataCleanser Engine.
  • A library of predicate functions. A library of pre-compiled and commonly used functions to compare different data types is provided.
  • Tuning Parameters. Forms, menus, and buttons guide the user in entering several of the parameters needed by the DataCleanser.
  • Debugging mode support. The entire DataCleanser application can be compiled in debugging mode to facilitate testing the effect of different rules.

    Availability

    EDD is shipping its DataCleanser technology with the release of the INFORMIX-Universal Server.

    About Electronic Digital Documents, Inc.

    EDD is a leader in the Database Analysis marketplace. Dr. Stolfo, EDD's chief scientific advisor, has developed several widely deployed database analysis applications in the areas of Telecommunications and Financial Analysis. EDD's patented DataCleanser technology has been rigorously evaluated by the Child Welfare Department of the State of Washington and has been demonstrated to be highly accurate and efficient on demanding database cleaning applications. More information about EDD's DataCleanser technology is available at the World Wide Web site http://www.npsa.com/edd.

    About Informix

    Informix Software, based in Menlo Park, Calif., provides innovative database technology that enables the world's leading corporations to manage and grow their business. Informix is widely recognized as the technology leader for corporate computing environments ranging from workgroups to very large OLTP and data warehouse applications. Informix's database servers, application development tools, superior customer service, and strong partnerships enable the company to be at the forefront of many leading-edge information technology solution areas. With the acquisition of Illustra Information Technology's completely extensible database technology, Informix is now positioned as the first information management company capable of meeting the market's exploding need for a sophisticated database engine that combines enterprise scalability, robustness and parallel processing with the ability to store, retrieve, manage and manipulate virtually any kind of rich content data. More information about Informix is available via the World Wide Web at http://www.informix.com and http://www.illustra.com.

    About INFORMIX-Universal Server

    INFORMIX-Universal Server will change the way companies store, retrieve, manage, and use their information assets. It will allow customers to develop solutions for the World Wide Web, financial markets, data warehousing and other evolving complex data markets, by integrating core business systems and traditional data with new, emerging data types such as time series, geospatial, multimedia, HTML and more.

    About DataBlade Technology

    A DataBlade module is a software component that extends the relational database to manage new kinds of data. DataBlade modules add domain-specific expertise and key functionality required for native support of specific data types. DataBlade modules plug directly into the database, making the newly defined types and functions first-class citizens in the database. Informix customers may use a single DataBlade module or a number of DataBlade modules simultaneously to create a unique, integrated information management solution customized for their business needs. Customers can choose from a growing portfolio of packaged DataBlade modules from Informix or third parties, or define and create their own with the DataBlade Developers Kit.


    INFORMIX, INFORMIX-UNIVERSAL SERVER and DATABLADE are registered trademarks of Informix Software, Inc. DATACLEANSER is a registered trademark of Electronic Digital Documents, Inc.