Advancing the discovery of unique column combinations (Technische Berichte des Hasso-Plattner-Instituts für Softwaresystemtechnik ; 51)

Authors

  • Ziawasch Abedjan
  • Felix Naumann
Abstract

Unique column combinations of a relational database table are sets of columns that contain only unique values. Discovering such combinations is a fundamental research problem and has many different data management and knowledge discovery applications. Existing discovery algorithms are either brute force or have a high memory load and can thus be applied only to small datasets or samples. In this paper, the well-known Gordian algorithm [19] and “Apriori-based” algorithms [5] are compared and analyzed for further optimization. We greatly improve the Apriori algorithms through efficient candidate generation and statistics-based pruning methods. A hybrid solution, HCA-Gordian, combines the advantages of Gordian and our new algorithm HCA, and it significantly outperforms all previous work in many situations.

1 Unique Column Combinations

Unique column combinations are sets of columns of a relational database table that fulfill the uniqueness constraint. Uniqueness of a column combination K within a table can be defined as follows:

Definition 1 Given a relational database schema R = {C1, C2, . . . , Cm} with columns Ci and an instance r ⊆ C1 × . . . × Cm, a column combination K ⊆ R is a unique, iff

∀ t1, t2 ∈ r : (t1 ≠ t2) ⇒ (t1[K] ≠ t2[K])

Discovered uniques are good candidates for primary keys of a table. Therefore, some literature refers to them as “candidate keys” [17]. The term “composite key” is also used to highlight the fact that they comprise multiple columns [19]. We want to stress that the detection of uniques is a problem that can be solved exactly, while the detection of keys can only be solved heuristically. Uniqueness is a necessary precondition for a key, but only a human expert can “promote” a unique to a key, because uniques can appear by coincidence for a certain state of the data. In contrast, keys are consciously specified and denote a schema constraint. An important property of uniques and keys is their minimality.
Minimal uniques are uniques of which no strict subset holds the uniqueness property.
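
The uniqueness and minimality conditions described above can be illustrated with a small brute-force sketch in Python. This is only an illustration of the definitions, not the HCA or Gordian algorithm from the paper. The function names `is_unique` and `minimal_uniques`, the representation of a table as a list of tuples, and the sample data are all assumptions made for this example. It also assumes that the rows are pairwise distinct, as in a set-based instance r.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]


def is_unique(rows: Sequence[Row], cols: Tuple[int, ...]) -> bool:
    """Return True if no two rows agree on every column index in `cols`.

    Assumes the rows themselves are pairwise distinct (set semantics).
    """
    seen = set()
    for row in rows:
        projection = tuple(row[c] for c in cols)
        if projection in seen:
            return False  # two rows share the same projected values
        seen.add(projection)
    return True


def minimal_uniques(rows: Sequence[Row], num_cols: int) -> List[Tuple[int, ...]]:
    """Enumerate minimal uniques by brute force, smallest combinations first.

    A combination is skipped when it contains an already found unique,
    because it cannot be minimal in that case.
    """
    found: List[Tuple[int, ...]] = []
    for size in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), size):
            if any(set(u) <= set(cols) for u in found):
                continue  # superset of a known unique, hence not minimal
            if is_unique(rows, cols):
                found.append(cols)
    return found


# Hypothetical sample table: (first name, last name, age)
people = [
    ("Max", "Payne", 32),
    ("Eve", "Smith", 24),
    ("Eve", "Payne", 24),
    ("Max", "Smith", 24),
]

print(minimal_uniques(people, 3))
```

In this sample, no single column is unique, and the only minimal unique is the pair of first and last name. The enumeration visits an exponential number of column combinations in the worst case, which is the kind of cost that motivates the pruning techniques discussed in the abstract.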

Similar resources

Web-based development in the lively kernel (Technische Berichte des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam ; 53)

The World Wide Web as an application platform becomes increasingly important. However, the development of Web applications is often more complex than for the desktop. Web-based development environments like Lively Webwerkstatt can mitigate this problem by making the development process more interactive and direct. By moving the development environment into the Web, applications can be developed...

The effect of tangible media on individuals in business process modeling : a controlled experiment (Technische Berichte des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam ; 41)

In current practice, business process modeling is done by trained method experts. Domain experts are interviewed to elicit their process information but are not involved in modeling. We created a haptic toolkit for process modeling that can be used in process elicitation sessions with domain experts. We hypothesize that this leads to more effective process elicitation. This paper breaks down ”eff...

Design Thinking - Nur Hype oder auch Chance für die UX Community? (Design Thinking: Just Hype or Also an Opportunity for the UX Community?)

Design thinking has been experiencing a wave of hype in recent years. Yet design thinking is not easy to define, according to a recent study by the Hasso Plattner Institute titled “Parts without a whole – The Current State of Design Thinking Practice in Organizations” (Schmiedgen et al. 2015, p. 49). When asked for a definition, the study participants name terms such as toolbox, P...

Late-Materialization using Sort-merge Join Algorithm

1. Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, Samuel R. Madden. Materialization Strategies in a Column-Oriented DBMS. Proceedings of ICDE 2007, Istanbul, Turkey.
2. Daniel J. Abadi, Samuel R. Madden, Nabil Hachem. Column-Stores vs. Row-Stores: How Different Are They Really? SIGMOD’08, June 9–12, 2008, Vancouver, BC, Canada.
3. Daniel Abadi, Peter Boncz, Stavros Harizopoulos, Stratos Idreos, Sam...

Publication date: 2011