One study reported that the hidden, or deep, web holds 500 times as much data as crawlable, indexable web pages. Most of this data is stored in relational databases. Hence, the Semantic Web’s promise of Web-scale data integration will only be fulfilled with the inclusion of legacy relational database management systems (RDBMS). Two new, interrelated and complementary W3C standards, “Direct Mapping of Relational Data to RDF” and “R2RML: RDB to RDF Mapping Language”, are poised to bring this data into the web of Linked Data. This tutorial will introduce, contrast and explain the solutions embodied by these two standards. In particular, the suitability of each approach will be characterized by both the architectural concerns of the larger system and the skill sets that exist in the supporting organizations. Architectural aspects include comparing ‘Extract, Transform, Load’ (ETL) of relational data to RDF into a large cached, or warehoused, triplestore with loosely coupled, distributed Linked Data served by answering SPARQL queries against a live relational database. Organizational issues concern the scope of in-house ontology development and the accompanying Semantic Web components. The material will be organized around case studies spanning reference architectures suggested by a number of vendors and open-source contributors. The tutorial will also include a hands-on session.
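As a rough illustration of the ETL-style approach mentioned above, the following Python sketch exports the rows of one table as N-Triples in the spirit of the Direct Mapping. It is a simplified sketch under stated assumptions, not an implementation of the standard: the `People` table, the base IRI `http://example.org/db/`, and the function names are hypothetical, and foreign keys, composite keys, IRI percent-encoding and most datatypes are left out.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def literal(value):
    """Render a column value as an N-Triples literal (integers and strings only)."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    # Escape backslashes and double quotes; other escapes are omitted in this sketch.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def direct_map(conn, table, pk, base):
    """Return N-Triples lines for every row of `table`, keyed by the single column `pk`."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        # Row IRI: <base>Table/pk=value
        subject = f"<{base}{table}/{pk}={record[pk]}>"
        # Each row is typed with an IRI derived from its table name.
        triples.append(f"{subject} <{RDF_TYPE}> <{base}{table}> .")
        for col, value in record.items():
            if value is None:
                continue  # NULL values produce no triple
            triples.append(f"{subject} <{base}{table}#{col}> {literal(value)} .")
    return triples


def demo():
    """Build a tiny in-memory database (hypothetical data) and map it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, fname TEXT, nick TEXT)")
    conn.execute("INSERT INTO People VALUES (7, 'Bob', NULL)")
    return direct_map(conn, "People", "ID", "http://example.org/db/")


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The resulting lines could then be loaded into a triplestore. The alternative architecture discussed above would instead translate SPARQL queries to SQL at query time, leaving the data in the relational database.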
This tutorial will introduce participants to RDB2RDF. We will start out with the historical context and then present an overview of the new W3C RDB2RDF standards, Direct Mapping and R2RML, and how they complement each other. We will then present recent scientific results, followed by an overview of different RDB2RDF tools such as D2R, Karma, SparqlMap and Ultrawrap. In the afternoon, we will share several case studies of integrating relational databases in semantic systems. We will present the MusicBrainz case study, in which RDB2RDF tools were evaluated for extracting relational data, transforming it to RDF and loading it into OWLIM. Additionally, we will present the case study of rCAD, an RNA database that has been mapped to the Gene Ontology and RNA Ontology in order to support semantic search.
Throughout the tutorial, participants will be able to use Ultrawrap to run examples for the hands-on session, and OWLIM to gain practical experience in the use of R2RML for ETL into a semantic database. We will provide a virtual machine which will include an RDBMS, Ultrawrap, OWLIM and tutorial examples. Participants are also free to use any other tool they want.
The objectives of the tutorial are the following:
- present an overview of the W3C Direct Mapping and R2RML standards;
- summarize scientific results on RDB2RDF;
- communicate case studies of RDBMS in Linked Data systems;
- provide practical experience in two scenarios (dynamic querying and ETL) for providing SPARQL capabilities over real-world data.
The intended audience of this tutorial includes web developers who want to integrate their SQL-backed websites into the Semantic Web, researchers who want to learn about recent scientific results in RDB2RDF, and practitioners and representatives of governments and funding agencies who want to learn about new web technologies and web standards.
We expect participants to have at least a passing knowledge of relational databases and SQL, and to be interested in learning how to integrate their relational databases with the Semantic Web. We also expect participants to have a working understanding of RDF, OWL and SPARQL.