Use this identifier to cite or link to this record: http://biblioteca.unisced.edu.mz/handle/123456789/2672
Title: Effective and Efficient Similarity Search in Databases
Authors: Lange, Dustin
Keywords: Database
Database System
indexing
algorithmics
Date: 8-Nov-2013
Publisher: Universität Potsdam
Citation: 117 pages
Abstract: With ever-growing amounts of data and the ability and desire to integrate and query more and more databases, there is a need for efficient processing of this data. Traditional relational database systems are built for fast retrieval of data from a large corpus. With SQL and efficient index structures, such as the B+-tree, retrieval of records with exact matches in their attribute values from even very large databases can be implemented with little effort. However, a query may also be inaccurate, as it may contain typing errors or missing values, and a database record may likewise contain incorrect or incomplete information. In this case, an index that only finds exact matches cannot be used. A traditional database system neither offers the possibility to define what constitutes a similar record, nor does it perform a fast retrieval of those records. The field of research that addresses this problem is called similarity search: given a set of records in a database and a query record, similarity search aims to find all records in the database that are sufficiently similar to the query record. This thesis is structured as follows. We begin with an overview of our similarity search system in Chapter 2 before describing the components of the system in detail in the following chapters. Chapter 3 introduces the similarity model used throughout the thesis. We also propose a novel similarity measure for comparing database records that exploits frequencies of values. Chapter 4 contains an introduction to similarity indexes for fast retrieval of similar values given specific similarity measures. We present an index structure for string similarity search, the State Set Index (SSI), and compare the method with previous index structures. For subsequent chapters, we assume that we have created one similarity index for each attribute, and that we have an overall similarity measure composed of attribute-specific measures.
In Chapter 5, we then introduce query plans as a means of describing how to access the similarity indexes and how to combine the results. We describe static and query-specific algorithms for selecting query plans based on two criteria: result completeness and execution cost. Chapter 6 adds the BSA method for answering top-k queries with similarity indexes by retrieving bulks of IDs of relevant records and combining results into a priority queue. For Chapters 3 to 6, related work is described at the end of each chapter. We conclude the thesis and give an overview of open research questions for future work in Chapter 7.
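As a rough illustration of the two query types the abstract mentions (finding all sufficiently similar records, and top-k retrieval), the following is a minimal sketch and not the method of the thesis. It assumes a normalized edit-distance similarity and a plain linear scan as stand-ins for the frequency-aware measure, the State Set Index, and the BSA method, whose details are not given in this record. All function names and the sample data are hypothetical.

```python
import heapq


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed measure: normalized similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def range_search(records, query, threshold):
    """All records whose similarity to the query is at least `threshold`."""
    return [r for r in records if similarity(r, query) >= threshold]


def top_k_search(records, query, k):
    """The k most similar records, chosen with a heap-based selection."""
    return heapq.nlargest(k, records, key=lambda r: similarity(r, query))


# Hypothetical sample data: the query contains a typing error.
names = ["Dustin Lange", "Dustin Lang", "Justin Lange", "Alice Smith"]
print(range_search(names, "Dustin Lagne", 0.8))
print(top_k_search(names, "Dustin Lagne", 2))
```

The linear scan compares the query against every record, which is exactly the cost a similarity index is meant to avoid; the sketch only makes the query semantics concrete.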
URI: http://biblioteca.unisced.edu.mz/handle/123456789/2672
Appears in collections: Bancos de dados (Databases)

Files in this record:
File: Effective-and-Efficient-Similarity-Search-in-Databases.pdf
Size: 8.76 MB
Format: Adobe PDF


All records in the repository are protected by copyright law, with all rights reserved.