RDF aims to be the universal abstract data model for structured data on the Web. While there are efforts to convert data to RDF, the vast majority of data available on the Web does not conform to RDF. Indeed, exposing data as RDF, either natively or through wrappers, can be very costly. Furthermore, in the emerging Web of Things, resource constraints prevent devices from processing RDF graphs. Hence, one cannot expect all the data on the Web to be available as RDF anytime soon.
SPARQL-Generate is an extension of SPARQL for querying not only RDF datasets but also documents in arbitrary formats.
SPARQL-Generate has a first reference implementation on top of Apache Jena, which currently supports querying and transforming web documents in XML, JSON, CSV, HTML, CBOR, and plain text with regular expressions.
To cite our work:
Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. A SPARQL extension for generating RDF from heterogeneous formats. In Proc. Extended Semantic Web Conference, ESWC, May 2017, Portoroz, Slovenia (long paper).
Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. Flexible RDF generation from RDF and heterogeneous data sources with SPARQL-Generate. In Proc. the 20th International Conference on Knowledge Engineering and Knowledge Management, EKAW, Nov 2016, Bologna, Italy (demo track).
Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. Génération de RDF à partir de sources de données aux formats hétérogènes. Actes de la 17ème conférence Extraction et Gestion des Connaissances, EGC, Jan 2017, Grenoble, France.
SPARQL-Generate is an extension of SPARQL 1.1 for querying not only RDF datasets but also documents in arbitrary formats. It offers a simple template-based option to generate RDF Graphs from documents, and presents the following advantages:
Because SPARQL-Generate leverages the expressiveness of SPARQL and its function extension mechanism, implementing it on top of an existing SPARQL engine is straightforward. This website describes a first implementation over Apache Jena, which currently supports querying and transforming web documents in XML, JSON, CSV, HTML, CBOR, and plain text with regular expressions.
All these formats are supported through our predefined SPARQL binding functions and SPARQL-Generate iterator functions. To support any other format, you can use the SPARQL 1.1 extension mechanism and implement your own functions.
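To make the roles of iterator functions, binding functions, and the template more concrete, here is a minimal conceptual sketch in Python. It is not SPARQL-Generate syntax and does not use the Jena implementation or its API: the function names (`iterate_items`, `bind_value`, `generate_triples`), the sample JSON document, and the `http://example.org/` namespace are all assumptions made up for illustration. It only mimics the general idea: an iterator yields one sub-document per item, a binding function extracts a value from it, and a fixed triple template is filled once per result.

```python
import json

# Hypothetical input document, invented for this illustration.
DOCUMENT = '{"sensors": [{"id": "s1", "temp": 21.5}, {"id": "s2", "temp": 19.0}]}'

# Assumed namespace, for illustration only.
BASE = "http://example.org/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def iterate_items(document, key):
    """Play the role of an iterator function: yield one sub-document per item."""
    for item in json.loads(document)[key]:
        yield item


def bind_value(item, field):
    """Play the role of a binding function: extract one value from a sub-document."""
    return item[field]


def generate_triples(document):
    """Fill a fixed triple template once per item and return N-Triples lines."""
    triples = []
    for item in iterate_items(document, "sensors"):
        sensor_id = bind_value(item, "id")
        temp = bind_value(item, "temp")
        subject = f"<{BASE}sensor/{sensor_id}>"
        predicate = f"<{BASE}temperature>"
        obj = f'"{temp}"^^<{XSD_DECIMAL}>'
        triples.append(f"{subject} {predicate} {obj} .")
    return triples


if __name__ == "__main__":
    for line in generate_triples(DOCUMENT):
        print(line)
```

In this sketch, supporting another format would amount to swapping in a different `iterate_items` and `bind_value` pair, which is loosely analogous to implementing your own functions through the extension mechanism.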
You can start using SPARQL-Generate as:
SPARQL-Generate is already in use in the following projects: