Entity identification in database integration

Ee Peng Lim; Jaideep Srivastava; Satya Prabhakar; James Richardson

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

93 Scopus citations

Abstract

The objective of entity identification is to determine the correspondence between object instances from more than one database. This paper examines the problem at the instance level assuming that schema level heterogeneity has been resolved a priori. Soundness and completeness are defined as the desired properties of any entity identification technique. To achieve soundness, a set of identity and distinctness rules are established for entities in the integrated world. We propose the use of extended key, which is the union of keys (and possibly other attributes) from the relations to be matched, and its corresponding identity rule, to determine the equivalence between tuples from relations which may not share any common key. Instance level functional dependencies (ILFD), a form of semantic constraint information about the real-world entities, are used to derive the missing extended key attribute values of a tuple.
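The extended-key idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; the relation contents, attribute names (`name`, `cuisine`, `speciality`), the ILFD rules, and the helper functions `apply_ilfds` and `identity_rule_fires` are all hypothetical and chosen only for illustration. The sketch assumes that an ILFD can be written as "if attribute A has value v, then attribute B has value w", and that the identity rule fires only when every extended-key attribute is known and equal in both tuples.

```python
# Hypothetical ILFDs: (condition attribute, value) -> (derived attribute, value).
ILFDS = [
    (("speciality", "Hunan"), ("cuisine", "Chinese")),
    (("speciality", "Gyros"), ("cuisine", "Greek")),
]

# Hypothetical extended key: attributes drawn from the keys of both relations.
EXTENDED_KEY = ("name", "cuisine")


def apply_ilfds(row, ilfds):
    """Return a copy of row with missing attribute values derived via ILFDs."""
    derived = dict(row)
    changed = True
    while changed:  # repeat until no rule adds a new value
        changed = False
        for (cond_attr, cond_val), (tgt_attr, tgt_val) in ilfds:
            if derived.get(cond_attr) == cond_val and derived.get(tgt_attr) is None:
                derived[tgt_attr] = tgt_val
                changed = True
    return derived


def identity_rule_fires(row_a, row_b, extended_key, ilfds):
    """True only if both rows have known, equal values on every extended-key attribute."""
    a = apply_ilfds(row_a, ilfds)
    b = apply_ilfds(row_b, ilfds)
    return all(
        a.get(attr) is not None and a.get(attr) == b.get(attr)
        for attr in extended_key
    )


# Two made-up tuples from relations that share no common key.
r = {"name": "Golden Wok", "cuisine": "Chinese", "street": "Main St"}
s = {"name": "Golden Wok", "speciality": "Hunan", "county": "Hennepin"}

print(identity_rule_fires(r, s, EXTENDED_KEY, ILFDS))  # True
```

In this sketch a tuple whose extended-key values cannot be derived is never matched, which favors soundness over completeness.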

Original language	English (US)
Title of host publication	1993 IEEE 9th International Conference on Data Engineering
Publisher	IEEE
Pages	294-301
Number of pages	8
ISBN (Print)	0818635703
State	Published - 1993
Event	1993 IEEE 9th International Conference on Data Engineering - Vienna, Austria Duration: Apr 19 1993 → Apr 23 1993

Publication series

Name	Proceedings - International Conference on Data Engineering

Other

Other	1993 IEEE 9th International Conference on Data Engineering
City	Vienna, Austria
Period	4/19/93 → 4/23/93

Bibliographical note

Funding Information:
in part by contract F30602-91-C-0128 from Rome Laboratory of the U.S.

Copyright:
Copyright 2004 Elsevier B.V., All rights reserved.

Cite this

@inproceedings{7accdabcb8834a65b4b31c4dc7f91dbb,

title = "Entity identification in database integration",

abstract = "The objective of entity identification is to determine the correspondence between object instances from more than one database. This paper examines the problem at the instance level assuming that schema level heterogeneity has been resolved a priori. Soundness and completeness are defined as the desired properties of any entity identification technique. To achieve soundness, a set of identity and distinctness rules are established for entities in the integrated world. We propose the use of extended key, which is the union of keys (and possibly other attributes) from the relations to be matched, and its corresponding identity rule, to determine the equivalence between tuples from relations which may not share any common key. Instance level functional dependencies (ILFD), a form of semantic constraint information about the real-world entities, are used to derive the missing extended key attribute values of a tuple.",

author = "Lim, {Ee Peng} and Jaideep Srivastava and Satya Prabhakar and James Richardson",

note = "Funding Information: in part by contract F30602-91-C-0128 from Rome Laboratory of the U.S. Copyright: Copyright 2004 Elsevier B.V., All rights reserved.; 1993 IEEE 9th International Conference on Data Engineering ; Conference date: 19-04-1993 Through 23-04-1993",

year = "1993",

language = "English (US)",

isbn = "0818635703",

series = "Proceedings - International Conference on Data Engineering",

publisher = "IEEE",

pages = "294--301",

booktitle = "1993 IEEE 9th International Conference on Data Engineering",

}

TY - GEN

T1 - Entity identification in database integration

AU - Lim, Ee Peng

AU - Srivastava, Jaideep

AU - Prabhakar, Satya

AU - Richardson, James

PY - 1993

Y1 - 1993

AB - The objective of entity identification is to determine the correspondence between object instances from more than one database. This paper examines the problem at the instance level assuming that schema level heterogeneity has been resolved a priori. Soundness and completeness are defined as the desired properties of any entity identification technique. To achieve soundness, a set of identity and distinctness rules are established for entities in the integrated world. We propose the use of extended key, which is the union of keys (and possibly other attributes) from the relations to be matched, and its corresponding identity rule, to determine the equivalence between tuples from relations which may not share any common key. Instance level functional dependencies (ILFD), a form of semantic constraint information about the real-world entities, are used to derive the missing extended key attribute values of a tuple.

UR - http://www.scopus.com/inward/record.url?scp=0027189241&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027189241&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0027189241

SN - 0818635703

T3 - Proceedings - International Conference on Data Engineering

SP - 294

EP - 301

BT - 1993 IEEE 9th International Conference on Data Engineering

PB - IEEE

T2 - 1993 IEEE 9th International Conference on Data Engineering

Y2 - 19 April 1993 through 23 April 1993

ER -
