Main Reference Paper: Record Matching over Query Results from Multiple Web Databases, IEEE Transactions on Knowledge and Data Engineering, April 2010

+ Description
  • Duplicate detection is an important step in data integration, and most state-of-the-art methods rely on offline learning techniques that require training data. In the Web database scenario, where the records to match are highly query-dependent, a pretrained approach is not applicable because the set of records in each query’s results is a biased subset of the full data set. To overcome this problem, this project presents an unsupervised, online approach, UDD, for detecting duplicates over the query results of multiple Web databases.

+ Aim & Objectives
  • To match records across multiple Web databases.

  • To improve the efficiency of record matching.

  • To identify duplicate records.

+ Contribution
  • Two classifiers, WCSS and SVM, are used cooperatively in the convergence step of record matching to iteratively identify the duplicate pairs among all potential duplicate pairs. The contributed approach is comparable to previous work that requires training examples for identifying duplicates from the query results of multiple Web databases.
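
The cooperative, iterative use of two classifiers described above can be sketched roughly as follows. This is a minimal illustration in Java, not the project's implementation: the class `UddSketch` and its methods are hypothetical names, the weights and threshold are arbitrary, and a simple nearest-centroid rule stands in for the SVM so that the example stays self-contained. It also assumes that a non-empty set of known non-duplicate pairs is available as negative examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical.
class UddSketch {

    // WCSS-style score: weighted sum of the per-field similarities of one record pair.
    static double wcss(double[] sim, double[] weights) {
        double score = 0.0;
        for (int i = 0; i < sim.length; i++) {
            score += weights[i] * sim[i];
        }
        return score;
    }

    // Component-wise mean of a non-empty list of similarity vectors.
    static double[] centroid(List<double[]> vectors) {
        double[] c = new double[vectors.get(0).length];
        for (double[] v : vectors) {
            for (int i = 0; i < c.length; i++) {
                c[i] += v[i] / vectors.size();
            }
        }
        return c;
    }

    static double squaredDistance(double[] a, double[] b) {
        double d = 0.0;
        for (int i = 0; i < a.length; i++) {
            d += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return d;
    }

    // Returns the similarity vectors judged to be duplicates.
    // Assumption: nonDuplicates is non-empty.
    static List<double[]> detect(List<double[]> potential, List<double[]> nonDuplicates,
                                 double[] weights, double threshold) {
        List<double[]> remaining = new ArrayList<>(potential);
        List<double[]> duplicates = new ArrayList<>();

        // Step 1: the weighted-sum classifier picks the initial duplicates.
        for (Iterator<double[]> it = remaining.iterator(); it.hasNext();) {
            double[] v = it.next();
            if (wcss(v, weights) >= threshold) {
                duplicates.add(v);
                it.remove();
            }
        }

        // Step 2: a second classifier (nearest centroid, standing in for the SVM)
        // is refit on the current duplicates and reapplied until nothing new is found.
        double[] nonDupCenter = centroid(nonDuplicates);
        boolean changed = !duplicates.isEmpty();
        while (changed && !remaining.isEmpty()) {
            double[] dupCenter = centroid(duplicates);
            changed = false;
            for (Iterator<double[]> it = remaining.iterator(); it.hasNext();) {
                double[] v = it.next();
                if (squaredDistance(v, dupCenter) < squaredDistance(v, nonDupCenter)) {
                    duplicates.add(v);
                    it.remove();
                    changed = true;
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Each vector holds two field similarities (for example title and author).
        List<double[]> potential = Arrays.asList(
                new double[]{1.0, 0.95}, new double[]{0.9, 0.8}, new double[]{0.2, 0.1});
        List<double[]> nonDuplicates = Arrays.asList(
                new double[]{0.1, 0.2}, new double[]{0.0, 0.1});
        List<double[]> found = detect(potential, nonDuplicates, new double[]{0.5, 0.5}, 0.9);
        System.out.println(found.size()); // prints 2 for this toy input
    }
}
```

In this toy run the weighted sum accepts only the first pair, and the second classifier then picks up the second pair because it lies closer to the known duplicates than to the known non-duplicates. A full implementation would train a real SVM and would also readjust the component weights between iterations, both of which are omitted here.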

+ Software Tools & Technologies
  • Java JDK 1.8, MySQL 5.5.40

  • Netbeans 8.0.1, J2SE

+ Project Recommended For
  • B.E / B.Tech / M.E / M.Tech

Professional Ethics: S-Logix appreciates students who willingly contribute at least a line of their own thinking while preparing the project with us. The project we provide should be treated only as a model project; apply it with confidence as a basis for contributing your own ideas, with our expert guidance, and for enriching your knowledge.