Meaningful keyword search in relational databases with large and complex schema


Keyword search over relational databases offers an alternative way to SQL to query and explore databases that is effective for lay users who may not be well versed in SQL or the database schema. This becomes more pertinent for databases with large and complex schemas. An answer in this context is a join tree spanning tuples containing the query’s keywords. As there are potentially many answers to the query, and the user is often only interested in seeing the top-k answers, how to rank the answers based on their relevance is of paramount importance. We focus on the relevance of join as the fundamental means to rank answers.

We devise means to measure relevance of relations and foreign keys in the schema over the information content of the database. This can be done offline with no need for external models. We compare the proposed measures against a gold standard we derive from a real workload over TPC-E and evaluate the effectiveness of our methods. Finally, we test the performance of our measures against existing techniques to demonstrate a marked improvement, and perform a user study to establish naturalness of the ranking of the answers.