In this paper we tackle the problem of answering SPARQL queries over virtually integrated databases. We assume that the entity resolution problem has already been solved and explicit information is available about which records in the different databases refer to the same real world entity. Surprisingly, to the best of our knowledge, there has been no attempt to extend the standard Ontology-Based Data Access (OBDA) setting to take into account these DB links for SPARQL query-answering and consistency checking. This is partly because the OWL built-in owl:sameAs property, the most natural representation of links between data sets, is not included in OWL 2 QL, the de facto ontology language for OBDA. We formally treat several fundamental questions in this context: how links over database identifiers can be represented in terms of owl:sameAs statements, how to recover rewritability of SPARQL into SQL (lost because of owl:sameAs statements), and how to check consistency. Moreover, we investigate how our solution can be made to scale up to large enterprise datasets. We have implemented the approach, and carried out an extensive set of experiments showing its scalability.
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-25007-6_12