Data Quality

The popularity and growth of the World Wide Web have dramatically increased the number of information sources available for use, and with it the opportunity for important new information-intensive applications such as massive data warehouses, integrated supply chain management, global risk management, and in-transit visibility. Unfortunately, realizing this opportunity requires overcoming significant challenges in data extraction and data interpretation.
The data extraction problem refers to the difficulty of easily and automatically extracting very specific data elements from Web sites for use by operational systems. New technologies, such as XML and Web Querying/Wrapping, offer possible solutions to this problem. The data interpretation problem refers to the existence of heterogeneous contexts: each source of information, and each potential receiver of that information, may operate with a different context, leading to large-scale semantic heterogeneity.
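As a rough illustration of how XML can ease the data extraction problem, the sketch below uses Python's standard `xml.etree.ElementTree` module to pull one specific data element out of a source document automatically. The document layout, the element and attribute names (`quotes`, `quote`, `symbol`, `price`), the symbols and values, and the helper `extract_price` are all hypothetical; a real source would publish its own structure, and sites that publish only HTML would typically call for the Web querying/wrapping approach instead.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical XML published by a source Web site; names and values are illustrative only.
QUOTE_XML = """<quotes>
  <quote symbol="ACME"><price>112.50</price></quote>
  <quote symbol="GLOBEX"><price>86.25</price></quote>
</quotes>"""


def extract_price(xml_text: str, symbol: str) -> Optional[float]:
    """Return the <price> of the quote whose symbol attribute matches, or None if absent."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including [@attr='value'] predicates.
    node = root.find(f"./quote[@symbol='{symbol}']/price")
    return float(node.text) if node is not None else None


print(extract_price(QUOTE_XML, "GLOBEX"))  # 86.25
```

Because the data element is tagged rather than embedded in page layout, the extraction does not break when the presentation of the page changes, which is the property the text attributes to XML.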
The data extraction and data interpretation problems correspond, respectively, to the accessibility and interpretability dimensions of data quality. This chapter presents examples of important context challenges and discusses the critical role of metadata, in the form of context knowledge.
© 1999 IEEE. Reprinted from the Proceedings of the 1999 IEEE Meta-Data Conference, Stuart Madnick, "Metadata Jones and the Tower of Babel: The Challenge of Large-Scale Semantic Heterogeneity."
A context is the collection of implicit assumptions about the context definition (i.e., meaning) and context characteristics (i.e., quality) of the information.
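To make this definition concrete, the following sketch records two assumptions that are often left implicit, the currency and the scale factor of a reported figure, as explicit metadata, and uses them to reinterpret a value for a receiver operating in a different context. This is a minimal illustration under stated assumptions, not the mediation approach of the original paper: the `Context` attributes, the `convert` helper, and the rates in `RATES_TO_USD` are hypothetical and are not real market data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Explicit metadata for assumptions that are normally implicit (hypothetical attributes)."""
    currency: str      # e.g. "USD", "JPY"
    scale_factor: int  # 1 = reported in units, 1000 = reported in thousands


# Hypothetical conversion rates into USD, for illustration only (not real market data).
RATES_TO_USD = {"USD": 1.0, "JPY": 0.01}


def convert(value: float, source: Context, receiver: Context) -> float:
    """Reinterpret a value reported in the source context for the receiver's context."""
    # Undo the source scale factor to get the amount in whole currency units.
    amount = value * source.scale_factor
    # Convert between currencies, using USD as the pivot currency.
    amount_usd = amount * RATES_TO_USD[source.currency]
    amount_receiver = amount_usd / RATES_TO_USD[receiver.currency]
    # Express the result in the receiver's scale factor.
    return amount_receiver / receiver.scale_factor


source_ctx = Context(currency="JPY", scale_factor=1000)  # "figures in thousands of yen"
receiver_ctx = Context(currency="USD", scale_factor=1)   # "figures in dollars"
print(convert(5_000, source_ctx, receiver_ctx))
```

Without the context metadata, a receiver reading the raw figure 5000 could easily take it to mean 5,000 dollars rather than 5 million yen; making the assumptions explicit is what allows the reinterpretation to be automated.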