Article published in the TKDE
Tuesday, 10 Jul 2012
The researcher Moises Carvalho had his article “A Genetic Programming Approach to Record Deduplication” published in the Transactions on Knowledge and Data Engineering (TKDE), one of the most respected and prestigious journals of the IEEE Computer Society. The journal is one of the leading venues for work on data and knowledge management in computing. The article presents a proposal to solve an old database problem: the identification of duplicate records.
Databases very often receive duplicate information. This happens because of problems identifying information at the data collector, similar inputs, incorrect or unreliable passwords and, especially, the integration of separate information associated with the same user.
Duplicate information of this kind not only increases maintenance costs but also makes queries take longer to process.
The solution proposed by Moises defines a way to clean the database through a function that can identify these duplicate entries. Once the repeated data is identified, the databases can be made more streamlined and concise without losing information. Moises proposes using a technique called Genetic Programming to generate and properly configure the function that identifies the replicas.
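To give a rough idea of what such a duplicate-identifying function can look like, here is a minimal Python sketch. It is not the method from the article: the field names (`name`, `city`), the weights and the threshold are hypothetical and written by hand, whereas the article proposes using Genetic Programming to generate and configure the function automatically.

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def duplicate_score(r1: dict, r2: dict) -> float:
    """Combine per-field similarities into one score.

    The 0.7 / 0.3 weights are hand-picked assumptions for illustration;
    a Genetic Programming approach would search for this combination instead.
    """
    name = field_similarity(r1["name"], r2["name"])
    city = field_similarity(r1["city"], r2["city"])
    return 0.7 * name + 0.3 * city


def find_duplicates(records: list, threshold: float = 0.85) -> list:
    """Return index pairs of records whose score reaches the (assumed) threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if duplicate_score(a, b) >= threshold
    ]
```

Calling `find_duplicates` on a list of records returns the pairs of positions flagged as likely replicas, which can then be reviewed or merged. Comparing every pair is only practical for small datasets.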
The article can be accessed at the link below:
http://www.computer.org/csdl/trans/tk/2012/03/ttk2012030399-abs.html