Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading

De Andrade, Tiago Luís; De Souza, Rogéria Cristiane Gratão; Babini, Maurizio; Valêncio, Carlos Roberto

Please use this identifier to cite or link to this item: http://acervodigital.unesp.br/handle/11449/72860

Full metadata record

DC Field	Value	Language
dc.contributor.author	De Andrade, Tiago Luís	-
dc.contributor.author	De Souza, Rogéria Cristiane Gratão	-
dc.contributor.author	Babini, Maurizio	-
dc.contributor.author	Valêncio, Carlos Roberto	-
dc.date.accessioned	2014-05-27T11:26:14Z	-
dc.date.accessioned	2016-10-25T18:35:52Z	-
dc.date.available	2014-05-27T11:26:14Z	-
dc.date.available	2016-10-25T18:35:52Z	-
dc.date.issued	2011-12-01	-
dc.identifier	http://dx.doi.org/10.1109/PDCAT.2011.58	-
dc.identifier.citation	Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings, p. 299-304.	-
dc.identifier.uri	http://hdl.handle.net/11449/72860	-
dc.identifier.uri	http://acervodigital.unesp.br/handle/11449/72860	-
dc.description.abstract	To ensure greater reliability and consistency of the data stored in a database, the data cleaning stage takes place early in the Knowledge Discovery in Databases (KDD) process. It is responsible for eliminating problems and adjusting the data for the later stages, especially data mining. Such problems occur at both the instance level and the schema level and include missing values, null values, duplicate tuples, and values outside the domain, among others. Several algorithms have been developed to perform the cleaning step in databases; some were designed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution an optimization of an algorithm that detects duplicate tuples in databases through phonetic similarity, based on multithreading, without the need for training data, as well as a language-independent environment to support it. © 2011 IEEE.	en
dc.format.extent	299-304	-
dc.language.iso	eng	-
dc.source	Scopus	-
dc.subject	Algorithm	-
dc.subject	Data cleansing	-
dc.subject	Duplicated tuples	-
dc.subject	Data cleaning	-
dc.subject	Knowledge discovery in database	-
dc.subject	Missing values	-
dc.subject	Multi-threading	-
dc.subject	Null value	-
dc.subject	Database systems	-
dc.subject	Linguistics	-
dc.subject	Optimization	-
dc.subject	Algorithms	-
dc.title	Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading	en
dc.type	other	-
dc.contributor.institution	Universidade Estadual Paulista (UNESP)	-
dc.description.affiliation	Depto. de Ciências de Computação e Estatística Universidade Estadual Paulista - Unesp, São José do Rio Preto	-
dc.description.affiliation	Departamento de Letras Modernas Universidade Estadual Paulista - Unesp, São José do Rio Preto	-
dc.description.affiliationUnesp	Depto. de Ciências de Computação e Estatística Universidade Estadual Paulista - Unesp, São José do Rio Preto	-
dc.description.affiliationUnesp	Departamento de Letras Modernas Universidade Estadual Paulista - Unesp, São José do Rio Preto	-
dc.identifier.doi	10.1109/PDCAT.2011.58	-
dc.rights.accessRights	Restricted access	-
dc.relation.ispartof	Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings	-
dc.identifier.scopus	2-s2.0-84856660893	-
Appears in Collections:	Artigos, TCCs, Teses e Dissertações da Unesp

There are no files associated with this item.
