NitrosBase

SP²Bench (SPARQL Performance Benchmark)

A competitive benchmark of NitrosBase Universal DBMS versus a well-known Graph DBMS shows that NitrosBase is significantly faster.

Generally, it is tens or hundreds of times faster. In the worst case (query 3b) it is at least 8 times faster. In the best case (query 11) it is 300 000 times faster.

Abstract

This document reports NitrosBase Universal DBMS’s SP²Bench benchmark results.

SP²Bench comprises a data-generator for arbitrarily large documents, which builds upon a library model close to a real-world application scenario. The benchmark queries implement meaningful requests on top of this data, thereby testing typical SPARQL operator constellations and RDF access patterns.

For more information on SP²Bench see the project’s site:

http://dbis.informatik.uni-freiburg.de/index.php?project=SP2B

Benchmark results

A logarithmic scale is used so that large differences in query time remain readable.
Shorter bars indicate better performance.

Query1

Return the year of publication of 'Journal 1 (1940)'.

Query2

Extract all inproceedings with properties 'dc:creator', 'bench:booktitle', 'dc:title', 'swrc:pages', 'dcterms:partOf', 'rdfs:seeAlso', 'foaf:homepage', 'dcterms:issued', and optionally 'bench:abstract', including these properties.

Query3a

Select all articles with property swrc:pages.

Query3b

Select all articles with property swrc:month.

Query3c

Select all articles with property swrc:isbn.

Query4

Select all distinct pairs of article author names for authors that have published in the same journal.

Query5a

Return the names of all persons that occur as author of at least one inproceeding and at least one article.

Query5b

Return the names of all persons that occur as author of at least one inproceeding and at least one article (a slightly different formulation of Q5a).

Query6

Return, for each year, the set of all publications authored by persons that have not published in years before.

Query7

Return the titles of all papers that have been cited at least once, but not by any paper that has not been cited itself. This query implements double negation.

Query8

Compute authors that have published with Paul Erdoes or with an author that has published with Paul Erdoes.

Query9

Return incoming and outgoing properties of persons.

Query10

Return all subjects that stand in any relation to person “Paul Erdoes”. In our scenario the query can be reformulated as: return publications and venues in which “Paul Erdoes” is involved either as author or as editor.

Query11

Return (up to) 10 electronic edition URLs starting from the 51st publication, in lexicographical order.

Query12a

Return yes if a person is an author of at least one inproceeding and at least one article.

Query12b

Return yes if an author has published with “Paul Erdoes” or with an author that has published with “Paul Erdoes”.

Query12c

Return yes if person “John Q. Public” exists.

Experiments

All experiments were conducted on a computer with an Intel® Core™ i5-3570 CPU @ 3.40GHz and 16GB of DDR3 1600 MHz physical memory.

BENCHMARK DATASETS

We used the SP²Bench generator to produce test RDF documents comprising 50k, 250k, 1M, 5M and 25M triples. We then ran the full set of queries on each generated dataset.

BENCHMARK QUERIES

The benchmark queries vary in general characteristics such as selectivity, query and output size, and the types of joins involved.
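The listings that follow use prefixed names without declaring them. To run any of them standalone, a prologue along these lines would have to precede the query. This is a sketch, not part of the original listings: the rdf, rdfs, xsd, dc, dcterms and foaf namespaces are the standard ones, while the swrc:, bench: and person: IRIs are an assumption based on the vocabulary the SP²Bench generator emits and should be checked against the generated data.

```sparql
# Standard W3C, Dublin Core and FOAF namespaces
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>

# Assumed: namespaces used by the SP²Bench data generator
PREFIX swrc:    <http://swrc.ontoware.org/ontology#>
PREFIX bench:   <http://localhost/vocabulary/bench/>
PREFIX person:  <http://localhost/persons/>
```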

Query 1

Return the year of publication of 'Journal 1 (1940)'.

This simple query returns exactly one result (for arbitrarily large documents).

   SELECT ?yr
   WHERE
   {
   ?journal rdf:type bench:Journal .
   ?journal dc:title "Journal 1 (1940)"^^xsd:string .
   ?journal dcterms:issued ?yr 
   }

Query 2

Extract all inproceedings with properties 'dc:creator', 'bench:booktitle', 'dc:title', 'swrc:pages', 'dcterms:partOf', 'rdfs:seeAlso', 'foaf:homepage', 'dcterms:issued', and optionally 'bench:abstract', including these properties.

This query implements a bushy graph pattern. It contains a single, simple OPTIONAL expression, and accesses large strings (i.e. the abstracts). Result size grows with database size, and a final result ordering is necessary due to operator ORDER BY.

   SELECT 
   ?inproc ?author ?booktitle ?title ?proc ?ee ?page 
   ?url ?yr ?abstract
   WHERE {
   ?inproc rdf:type bench:Inproceedings .
   ?inproc dc:creator ?author .
   ?inproc bench:booktitle ?booktitle .
   ?inproc dc:title ?title .
   ?inproc dcterms:partOf ?proc .
   ?inproc rdfs:seeAlso ?ee .
   ?inproc swrc:pages ?page .
   ?inproc foaf:homepage ?url .
   ?inproc dcterms:issued ?yr 
   OPTIONAL {
       ?inproc bench:abstract ?abstract
   }
   }
   ORDER BY ?yr

Query 3

Select all articles with property (a) swrc:pages, (b) swrc:month, or (c) swrc:isbn.

This query tests FILTER expressions with varying selectivity. According to Table I, the FILTER expression in Q3a is not very selective (i.e. retains about 92.61% of all articles). Data access through a secondary index for Q3a is probably not very efficient, but might work well for Q3b, which selects only 0.65% of all articles. The filter condition in Q3c is never satisfied, as no articles have swrc:isbn predicates.

Q3a

   SELECT ?article 
   WHERE { 
   ?article rdf:type bench:Article.
   ?article ?property ?value
       FILTER (?property=swrc:pages) 
   }

Q3b Like Q3a, but "swrc:month" instead of "swrc:pages"

Q3c Like Q3a, but "swrc:isbn" instead of "swrc:pages"
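Written out in full, Q3b and Q3c differ from Q3a only in the constant compared inside the FILTER. Prefix declarations are omitted, as in the other listings.

```sparql
# Q3b: same pattern as Q3a, filtering on swrc:month
SELECT ?article
WHERE {
  ?article rdf:type bench:Article .
  ?article ?property ?value
  FILTER (?property = swrc:month)
}
```

```sparql
# Q3c: same pattern as Q3a, filtering on swrc:isbn (expected to return no rows)
SELECT ?article
WHERE {
  ?article rdf:type bench:Article .
  ?article ?property ?value
  FILTER (?property = swrc:isbn)
}
```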


Query 4

Select all distinct pairs of article author names for authors that have published in the same journal.

Q4 contains a comparably long graph chain, i.e. variables ?name1 and ?name2 are linked through articles that (different) authors have published in the same journal. The query computes very large result sets.

   SELECT DISTINCT ?name1 ?name2 
   WHERE { 
   ?article1 rdf:type bench:Article.
   ?article2 rdf:type bench:Article.
   ?article1 dc:creator ?author1.
   ?author1 foaf:name ?name1.
   ?article2 dc:creator ?author2.
   ?author2 foaf:name ?name2.
   ?article1 swrc:journal ?journal.
   ?article2 swrc:journal ?journal
       FILTER (?name1 < ?name2)
   }

Query 5a

Return the names of all persons that occur as author of at least one inproceeding and at least one article.

Queries Q5a and Q5b test different variants of joins. Q5a implements an implicit join on author names, which is encoded in the FILTER condition, while Q5b explicitly joins the authors on variable ?person.

   SELECT DISTINCT ?person ?name 
   WHERE { 
   ?article rdf:type bench:Article.
   ?article dc:creator ?person.
   ?inproc rdf:type bench:Inproceedings.
   ?inproc dc:creator ?person2.
   ?person foaf:name ?name.
   ?person2 foaf:name ?name2
       FILTER(?name=?name2) 
   }

Query 5b

Return the names of all persons that occur as author of at least one inproceeding and at least one article (the same request as Q5a).

Unlike Q5a, which joins implicitly on author names inside a FILTER condition, Q5b joins the authors explicitly on the variable ?person.

   SELECT DISTINCT ?person ?name
   WHERE {
   ?article rdf:type bench:Article .
   ?article dc:creator ?person .
   ?inproc rdf:type bench:Inproceedings .
   ?inproc dc:creator ?person .
   ?person foaf:name ?name
   }

Query 6

Return, for each year, the set of all publications authored by persons that have not published in years before.

Q6 implements negation, expressed through a combination of operators OPTIONAL, FILTER, and BOUND. The idea of the construction is that the block outside the OPTIONAL expression computes all publications, while the inner one constitutes earlier publications from authors that appear outside. The outer FILTER expression then retains publications for which ?author2 is unbound, i.e. exactly the publications of those authors that have not published in earlier years.

   SELECT ?yr ?name ?doc 
   WHERE {
   ?class rdfs:subClassOf foaf:Document.
   ?doc rdf:type ?class.
   ?doc dcterms:issued ?yr.
   ?doc dc:creator ?author.
   ?author foaf:name ?name
   OPTIONAL {
       ?class2 rdfs:subClassOf foaf:Document.
       ?doc2 rdf:type ?class2.
       ?doc2 dcterms:issued ?yr2.
       ?doc2 dc:creator ?author2
       FILTER (?author=?author2 && ?yr2 < ?yr) 
   }
       FILTER (!bound(?author2)) 
   }
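For comparison, the same negation can be written more directly with FILTER NOT EXISTS. This is a SPARQL 1.1 construct and is not the query that was measured; the benchmark uses the OPTIONAL / FILTER / BOUND encoding above. The sketch below is offered only to make the intent of that encoding easier to read, and it should select the same publications. Prefix declarations are omitted, as in the other listings.

```sparql
# Illustrative SPARQL 1.1 rewrite of Q6 (not part of SP²Bench)
SELECT ?yr ?name ?doc
WHERE {
  ?class rdfs:subClassOf foaf:Document .
  ?doc rdf:type ?class .
  ?doc dcterms:issued ?yr .
  ?doc dc:creator ?author .
  ?author foaf:name ?name
  # Keep the row only if the same author has no publication in an earlier year
  FILTER NOT EXISTS {
    ?class2 rdfs:subClassOf foaf:Document .
    ?doc2 rdf:type ?class2 .
    ?doc2 dcterms:issued ?yr2 .
    ?doc2 dc:creator ?author
    FILTER (?yr2 < ?yr)
  }
}
```

Here the inner pattern reuses ?author directly instead of introducing ?author2 and testing whether it stayed unbound.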

Query 7

Return the titles of all papers that have been cited at least once, but not by any paper that has not been cited itself. This query implements double negation.

   SELECT DISTINCT ?title 
   WHERE {
   ?class rdfs:subClassOf foaf:Document.
   ?doc rdf:type ?class.
   ?doc dc:title ?title.
   ?bag2 ?member2 ?doc.
   ?doc2 dcterms:references ?bag2
   OPTIONAL {
       ?class3 rdfs:subClassOf foaf:Document.
       ?doc3 rdf:type ?class3.
       ?doc3 dcterms:references ?bag3.
       ?bag3 ?member3 ?doc
       OPTIONAL {
           ?class4 rdfs:subClassOf foaf:Document.
           ?doc4 rdf:type ?class4.
           ?doc4 dcterms:references ?bag4.
           ?bag4 ?member4 ?doc3 
       }
       FILTER (!bound(?doc4)) 
   }
       FILTER (!bound(?doc3)) 
   }

Query 8

Compute authors that have published with Paul Erdoes or with an author that has published with Paul Erdoes.

   SELECT DISTINCT ?name 
   WHERE 
   {
       ?erdoes rdf:type foaf:Person.
       ?erdoes foaf:name "Paul Erdoes"^^xsd:string.
       {
           ?doc dc:creator ?erdoes.
           ?doc dc:creator ?author.
           ?doc2 dc:creator ?author.
           ?doc2 dc:creator ?author2.
           ?author2 foaf:name ?name
           FILTER (?author!=?erdoes 
               && ?doc2!=?doc 
               && ?author2!=?erdoes 
               && ?author2!=?author)
       } UNION {
           ?doc dc:creator ?erdoes.
           ?doc dc:creator ?author.
           ?author foaf:name ?name
           FILTER (?author!=?erdoes) 
       } 
   }

Query 9

Return incoming and outgoing properties of persons.

   SELECT DISTINCT ?predicate 
   WHERE 
   {
       { 
           ?person rdf:type foaf:Person.
           ?subject ?predicate ?person 
       } UNION { 
           ?person rdf:type foaf:Person.
           ?person ?predicate ?object 
       } 
   }

Query 10

Return all subjects that stand in any relation to person “Paul Erdoes”. In our scenario the query can be reformulated as: return publications and venues in which “Paul Erdoes” is involved either as author or as editor.

   SELECT ?subj ?pred 
   WHERE { ?subj ?pred person:Paul_Erdoes }

Query 11

Return (up to) 10 electronic edition URLs starting from the 51st publication, in lexicographical order.

   SELECT ?ee
   WHERE { ?publication rdfs:seeAlso ?ee }
   ORDER BY ?ee LIMIT 10 OFFSET 50

Query 12

(a) Return yes if a person is an author of at least one inproceeding and at least one article;

(b) Return yes if an author has published with “Paul Erdoes” or with an author that has published with “Paul Erdoes”;

(c) Return yes if person “John Q. Public” exists.

Q12a and Q12b share the properties of their SELECT counterparts Q5a and Q8, respectively. Both return yes for sufficiently large documents.

Q12c asks for a single triple that is not present in the database.

(a) Q5a as ASK query

(b) Q8 as ASK query

(c) ASK { person:John_Q_Public rdf:type foaf:Person }
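Spelled out, variants (a) and (b) wrap the graph patterns of Q5a and Q8 in ASK, dropping the SELECT DISTINCT projection. The listings below are a sketch assembled from the Q5a and Q8 patterns given earlier, with prefix declarations omitted as elsewhere.

```sparql
# Q12a: the Q5a pattern as an ASK query
ASK {
  ?article rdf:type bench:Article .
  ?article dc:creator ?person .
  ?inproc rdf:type bench:Inproceedings .
  ?inproc dc:creator ?person2 .
  ?person foaf:name ?name .
  ?person2 foaf:name ?name2
  FILTER (?name = ?name2)
}
```

```sparql
# Q12b: the Q8 pattern as an ASK query
ASK {
  ?erdoes rdf:type foaf:Person .
  ?erdoes foaf:name "Paul Erdoes"^^xsd:string .
  {
    # Authors at distance two from Paul Erdoes
    ?doc dc:creator ?erdoes .
    ?doc dc:creator ?author .
    ?doc2 dc:creator ?author .
    ?doc2 dc:creator ?author2 .
    ?author2 foaf:name ?name
    FILTER (?author != ?erdoes
         && ?doc2 != ?doc
         && ?author2 != ?erdoes
         && ?author2 != ?author)
  } UNION {
    # Direct co-authors of Paul Erdoes
    ?doc dc:creator ?erdoes .
    ?doc dc:creator ?author .
    ?author foaf:name ?name
    FILTER (?author != ?erdoes)
  }
}
```

Because ASK only has to establish that one solution exists, an engine may stop at the first match instead of materialising the full result set.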

Conclusion

The SP²Bench benchmark shows that NitrosBase Universal DBMS is tens or hundreds of times faster than a well-known Graph DBMS on most queries. In the worst case (query 3b), it is at least 8 times faster. In the best case (query 11), it is 300 000 times faster.