Abstract
Linked open data provides a powerful way to publish data on the Web. However, most publishers still choose to publish their data in tabular formats. Tools for transforming tabular data to linked data are useful, but they are still immature and do not provide enough support to address this issue. Firstly, tabular data cleaning is complex and involves a wide variety of operations on tuples. Users nevertheless tend to need a number of common operations in certain situations, but existing tools do not take advantage of that information. Secondly, transforming data to RDF is not trivial in itself: it involves RDF mapping, which requires deep knowledge of, and prior research into, the relevant ontologies. However, few tools currently facilitate RDF mapping by providing relevant information about ontologies, which makes mapping an intricate process. Finally, because the mapping process is largely manual and thus error-prone, successfully mapped data may still contain errors, yet there are very few facilities for validating the produced mappings. This thesis aims to address these issues by identifying relevant algorithms, tools and methodologies, and applying them in the context of linked data transformation. Firstly, we propose a methodology for suggesting data cleaning operations based on measurements of their use in given contexts. Secondly, we apply an existing algorithm that analyses the content of a table to suggest appropriate RDF annotations. Furthermore, we describe facilities that help manage RDF ontologies effectively and that validate RDF mappings against the constraints described within those ontologies. As a proof of concept, this thesis provides a working prototype of these functionalities, organised as a web service. The prototype has been partially integrated into the live version of the DataGraft platform.