Use this identifier to cite or link to this record: http://biblioteca.unisced.edu.mz/handle/123456789/2671
Title: Quantitative Data Cleaning for Large Databases
Authors: Hellerstein, Joseph M.
Keywords: Database
Database System
Analysis
data structures
Date: 15-Oct-2013
Publisher: UC Berkeley
Citation: 42 pages
Abstract: Data collection has become a ubiquitous function of large organizations, not only for record keeping but also to support a variety of data analysis tasks that are critical to the organizational mission. Data analysis typically drives decision-making processes and efficiency optimizations, and in an increasing number of settings is the raison d'être of entire agencies or firms. Despite the importance of data collection and analysis, data quality remains a pervasive and thorny problem in almost every large organization. The presence of incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of information-driven approaches. As a result, there has been a variety of research over the last decades on various aspects of data cleaning: computational procedures to automatically or semi-automatically identify (and, when possible, correct) errors in large data sets. In this report, we survey data cleaning methods that focus on errors in quantitative attributes of large databases, though we also provide references to data cleaning methods for other types of attributes. The discussion is targeted at computer practitioners who manage large databases of quantitative information, and at designers developing data entry and auditing tools for end users. Because of our focus on quantitative data, we take a statistical view of data quality, with an emphasis on intuitive outlier detection and exploratory data analysis methods based on robust statistics. In addition, we stress algorithms that can be implemented easily and efficiently in very large databases, and that are easy to understand and visualize graphically. The discussion mixes statistical intuitions and methods, algorithmic building blocks, efficient relational database implementation strategies, and user interface considerations. Throughout the discussion, references are provided for deeper reading on all of these issues.
URI: http://biblioteca.unisced.edu.mz/handle/123456789/2671
Appears in collections: Bancos de dados

Files in this record:
File: Quantitative-Data-Cleaning-for-Large-Databases.pdf
Size: 861.3 kB
Format: Adobe PDF


All records in the repository are protected by copyright law, with all rights reserved.