CONSchema: Schema matching with semantics and constraints Open Access

Wu, Kevin (Spring 2023)

Permanent URL: https://etd.library.emory.edu/concern/etds/12579t714?locale=en
Published

Abstract

Schema matching aims to establish correspondences between the attributes of database schemas. It has been regarded as the most difficult and crucial stage in the development of many contemporary database and semantic web systems. Manual mapping is a lengthy and laborious process, yet a low-quality algorithmic matcher may cause more trouble. Moreover, data privacy in certain domains, such as healthcare, poses further challenges: instance-level data should not be used, to prevent the leakage of sensitive information. To address these challenges, we propose CONSchema, a model that combines the textual descriptions of attributes with the constraints of the schemas to learn a better matcher. We also propose a new experimental setting to assess the practical performance of schema matching models. Our results on six benchmark datasets across various domains, including healthcare and movies, demonstrate the robustness of CONSchema.
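The abstract does not spell out the model's internals. As a rough illustration only, the Python sketch below shows what matching from schema metadata alone can look like: each candidate attribute pair becomes one text-similarity feature plus several constraint-agreement features, and no instance-level data is touched. Everything here is an assumption rather than the thesis's method: the hypothetical `Attribute` fields, the token-Jaccard similarity (a simple stand-in for the textual similarity embedding named in Section 2.2.1), and the hand-set weights (the table of contents lists CONSchema-RF and CONSchema-MLP variants, presumably trained random-forest and multilayer-perceptron classifiers).

```python
"""Hypothetical sketch: score a candidate attribute pair using only schema
metadata (a text description plus declared constraints), never instance data."""
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    # Illustrative metadata fields; these names are assumptions, not CONSchema's.
    name: str
    description: str
    data_type: str
    nullable: bool = True
    primary_key: bool = False
    unique: bool = False


def tokens(text: str) -> set[str]:
    """Lower-case alphanumeric tokens; 'birth_date' splits into 'birth', 'date'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def text_similarity(a: Attribute, b: Attribute) -> float:
    """Jaccard overlap of name + description tokens (stand-in for embeddings)."""
    ta = tokens(a.name + " " + a.description)
    tb = tokens(b.name + " " + b.description)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def constraint_features(a: Attribute, b: Attribute) -> list[float]:
    """1.0 where the two attributes agree on a declared constraint, else 0.0."""
    return [
        float(a.data_type == b.data_type),
        float(a.nullable == b.nullable),
        float(a.primary_key == b.primary_key),
        float(a.unique == b.unique),
    ]


def pair_features(a: Attribute, b: Attribute) -> list[float]:
    """Concatenate the textual feature with the constraint features."""
    return [text_similarity(a, b)] + constraint_features(a, b)


# Hand-set logistic weights, purely illustrative; a real matcher learns them.
WEIGHTS = [6.0, 1.0, 0.5, 0.5, 0.5]
BIAS = -4.0


def match_probability(a: Attribute, b: Attribute) -> float:
    """Logistic score over the pair features: higher means 'likely a match'."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, pair_features(a, b)))
    return 1.0 / (1.0 + math.exp(-z))


# Usage with made-up attributes from two hypothetical healthcare schemas.
src = Attribute("dob", "patient date of birth", "date", nullable=False)
tgt = Attribute("birth_date", "date of birth of the patient", "date", nullable=False)
other = Attribute("zip", "postal code of residence", "varchar")
print(round(match_probability(src, tgt), 3), round(match_probability(src, other), 3))
```

Here `src` and `tgt` agree both textually and on every declared constraint, so they score well above `other`, which shares a single token with `src` and differs in type and nullability. Keeping the textual and constraint signals as separate features lets a learned classifier decide how much each one should count.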

Table of Contents

1 Introduction
1.1 Motivation
1.2 Related Work

2 CONSchema
2.1 Problem Statement
2.2 Model
2.2.1 Textual Similarity Embedding
2.2.2 Constraint Encoding

3 Experiment Setting
3.1 Datasets
3.2 Baseline Methods

4 Results
4.1 Random Partition Evaluation
4.2 Unseen Partition Evaluation
4.2.1 CONSchema-RF
4.2.2 CONSchema-MLP
4.3 Explaining CONSchema Matching Decisions
4.4 CMS Case Study
4.5 Precision-Recall Analysis

5 Conclusion

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English