key: |?article_uri|
value: {http://news.yahoo.com/...}
key: |?city_name|
value: {Burlington}
;; etc.
The SPARQL ASK command checks whether a given query produces any results.
The following example requests a Lisp true/false return value for the question “Does
any article contain the city Chicago?”:
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
ASK
{
?any_article kb:containsCity 'Chicago'
}"
:results-format :boolean)
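Because the :boolean format returns an ordinary Lisp generalized boolean, the result
can be used directly in Lisp control flow. A minimal sketch reusing the query above:
(if (sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
ASK { ?any_article kb:containsCity 'Chicago' }"
     :results-format :boolean)
    (format t "At least one article mentions Chicago~%")
    (format t "No article mentions Chicago~%"))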
There are many possible options for the :results-format keyword argument, including
the following (a short :count example follows this list):
• :sparql-xml – serializes the results as XML to output-stream
• :sparql-json – serializes the results as JSON data to output-stream
• :sparql-ttl – serializes the results as Turtle encoding to output-stream (Turtle is
a simplified version of N3, like N-Triples with namespaces)
• :hashes – returns a list of hash tables (as seen in a previous example)
• :arrays – returns a list of arrays, one array per result
• :lists – returns a list of lists, one list per result
• :count – returns an integer for the number of results
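As a quick sketch of one of these options, the following counts how many articles
contain any city (the number returned depends, of course, on the triples in your store):
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
SELECT ?article_uri WHERE { ?article_uri kb:containsCity ?city_name }"
 :results-format :count)
;; => an integer; the value depends on the data in your store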
At some loss of efficiency, it is sometimes useful to match string values against regular
expressions; for example:
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
SELECT ?article_uri
WHERE {
?article_uri kb:containsPerson ?person_name .
FILTER regex(?person_name, '.*Putin.*')
}"
:results-format :lists)
;; output:
(({http://news.yahoo.com/s/nm/20080616/ts_nm/worldleaders/}))
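The SPARQL regex filter also accepts a flags argument; for example, passing "i"
makes the match case insensitive. A sketch of the same query with a case-insensitive
pattern:
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
SELECT ?article_uri
WHERE {
  ?article_uri kb:containsPerson ?person_name .
  FILTER regex(?person_name, 'putin', 'i')
}"
 :results-format :lists)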
7.4. Wrap Up
In this chapter you learned how to query RDF triples in a repository using the Lisp
AllegroGraph APIs. We only considered triples that were explicitly added to the triple
store; in later chapters we will automate the collection of data from the Internet,
convert it to RDF, and add it to a local triple store for reuse. Much of the power
of Semantic Web technologies in general, and AllegroGraph in particular, is the ability
to use triples that are inferred from RDFS without being explicitly created. This
capability is covered in the next chapter, along with techniques for using data
sources implemented with different schemas.
8. AllegroGraph Reasoning
System
In the last chapter we saw how SPARQL queries can be used to find specific data
in an RDF graph. So far we have only seen examples of finding data that has been
explicitly added to an RDF data repository.
However, RDFS, RDFS++, and OWL reasoners can also return results that are known
only implicitly, by using inference. We have already seen that AllegroGraph supports
reasoning using the following predicates, which can be used to infer new relationships
that are not explicitly stated in the RDF data stored in AllegroGraph:
• rdf:type – discussed in Chapter 3
• rdf:property – discussed in Chapter 3
• rdfs:subClassOf – discussed in Chapter 4
• rdfs:range – discussed in Chapter 4
• rdfs:domain – discussed in Chapter 4
• rdfs:subPropertyOf – discussed in Chapter 4
• owl:sameAs – discussed in Chapter 6
• owl:inverseOf – discussed in Chapter 6
• owl:TransitiveProperty – discussed in Chapter 6
8.1. Enabling RDFS++ Reasoning on a Triple
Store
We will look at the AllegroGraph APIs and programming techniques for reasoning
in detail in this chapter. By default, AllegroGraph triple stores do not support RDFS++
reasoning. You must enable RDFS++ reasoning functionality by calling:
(apply-rdfs++-reasoner :db *db*)
This function works via side effect: the specified data store is converted to support
inferencing. Since the default database *db* can be assumed, this can be shortened
to:
(apply-rdfs++-reasoner)
The remainder of this chapter uses reasoning to infer1 new information. If you use
multiple data stores at the same time, you can use different inference support for each.
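For example, you might keep one store without inferencing and a second store with
RDFS++ reasoning enabled. A minimal sketch (the store names and paths here are
hypothetical):
(defvar *plain-db* (create-triple-store "/tmp/rdfstore_plain"))
(defvar *inferring-db* (create-triple-store "/tmp/rdfstore_inferring"))
;; only the second store is converted to support RDFS++ inference:
(apply-rdfs++-reasoner :db *inferring-db*)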
8.2. Inferring New Triples: rdf:type vs.
rdfs:subClassOf Example
In the following example, we define two triples and then perform a SPARQL query
that answers a question based on a new inferred triple that has not been explicitly
added to the triple store:
(add-triple !kb:man !rdf:type !kb:person)
(add-triple !kb:sam !rdf:type !kb:man)
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK
{
kb:sam rdf:type kb:man
}"
:results-format :boolean)
This query returns a Lisp true value T. You might think that, since kb:man is declared
to be of rdf:type kb:person, the following query would also return a true value:
(add-triple !kb:man !rdf:type !kb:person)
(add-triple !kb:sam !rdf:type !kb:man)
1Implementations of RDF triple stores that support RDFS, RDFS++, or OWL reasoning can implement
inferred triples in different ways. One approach is to “pre-calculate” inferred triples using forward
chaining inference; this approach is used by the Sesame library. A different approach, used in
AllegroGraph, is to infer triples at query time. The results should (hopefully) be the same.
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK
{
kb:sam rdf:type kb:person
}"
:results-format :boolean)
This, however, returns a Lisp false value NIL. To get the subclass behavior that you
probably expected, we can use rdfs:subClassOf:
(add-triple !kb:man !rdfs:subClassOf !kb:person)
Now the last query returns a true (Lisp T) value.
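To see both the asserted and the inferred type triples, we can list everything that
kb:sam has as an rdf:type. A sketch, assuming the triples above are still in the store:
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type WHERE { kb:sam rdf:type ?type }"
 :results-format :lists)
;; the results should include both kb:man (asserted) and kb:person (inferred)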
8.3. Using Inverse Properties
Properties define a one-way relationship between resources. Sometimes a property
like “husband of” has an inverse property like “wife of,” so when we say that Mark is
the husband of Carol we would like the system to automatically infer that Carol is the
wife of Mark.
(require :agraph)
(in-package :db.agraph.user)
(create-triple-store "/tmp/rdfstore_2")
(register-namespace "kb" "http:://knowledgebooks.com/ontology#")
(apply-rdfs++-reasoner)
(enable-!-reader)
(add-triple !kb:Mark !kb:husband-of !kb:Carol)
(add-triple !kb:wife-of !owl:inverseOf !kb:husband-of)
We can now infer wife-of relationships:
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
SELECT ?y ?x WHERE { ?y kb:wife-of ?x }")
Since I did not specify the output data format, the default is used and the results are serialized as SPARQL XML:
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="y"/>
<variable name="x"/>
</head>
<results>
<result>
<binding name="y">
<uri>http://knowledgebooks.com/ontology#Carol</uri>
</binding>
<binding name="x">
<uri>http://knowledgebooks.com/ontology#Mark</uri>
</binding>
</result>
</results>
</sparql>
I find the other output formats generally easier to use; for example, specifying
:results-format :lists yields:
(({Carol} {Mark}))
and specifying :results-format :hashes, as in:
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
SELECT ?y ?x WHERE { ?y kb:wife-of ?x }" :results-format :hashes)
will yield a list of hash tables, one hash table per result (with keys and values like
those shown at the start of this chapter).
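A minimal sketch that re-runs the query and prints each hash table's keys and values
(this assumes the key naming shown at the start of this chapter):
(dolist (h (sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
SELECT ?y ?x WHERE { ?y kb:wife-of ?x }" :results-format :hashes))
  (maphash (lambda (key value)
             (format t "key: ~a value: ~a~%" key value))
           h))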
Specifying an output format of :results-format :sparql-json yields:
{
"head" : {
"vars" : ["y", "x"]
},
"results" : {
"bindings" : [
{
"y":{"type":"uri",
"value":
"http:://knowledgebooks.com/ontology#Carol"},
"x":{"type":"uri",
"value":
"http:://knowledgebooks.com/ontology#Mark"}
}
]
}
}
Yet another output option is :results-format :alists.
8.4. Using the Same As Property
It is often useful to make an RDF statement that two different resources are equivalent,
as in this example:
(add-triple !kb:Mark !kb:name !"Mark Watson")
(register-namespace "test_news" "http://news.yahoo.com/s/nm/20080616/ts_nm")
(add-triple !test_news:Mark !kb:height !"6 feet 4 inches")
(add-triple !kb:Mark !owl:sameAs !test_news:Mark)
(apply-rdfs++-reasoner)
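Querying for every property and value attached to kb:Mark can be done with a query
along these lines (a sketch; I assume the :lists result format here):
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
SELECT ?p ?o WHERE { kb:Mark ?p ?o }"
 :results-format :lists)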
You may be surprised to see only the following results:
(({name} {Mark Watson})
({sameAs} {test_news:Mark}))
With just RDFS++ reasoning, the height of Mark will not be inferred. It would have
been inferred if we were using a full OWL reasoner.
8.5. Using the Transitive Property
If we start by making a few statements about family relationships:
(add-triple !kb:relativeOf !rdf:type !owl:TransitiveProperty)
(add-triple !kb:Mark !kb:relativeOf !kb:Ron)
(add-triple !kb:Ron !kb:relativeOf !kb:Julia)
(add-triple !kb:Julia !kb:relativeOf !kb:Ken)
And run a query like:
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
SELECT ?relative WHERE { kb:Mark kb:relativeOf ?relative }"
:results-format :sparql-json)
If you forget to enable reasoning, then you will get results computed using just RDF +
RDFS, which is probably not what you expect:
{
"head" : {
"vars" : ["relative"]
},
"results" : {
"bindings" : [
{
"relative":{"type":"uri",
"value":
"http:://knowledgebooks.com/ontology#Ron"}
}
]
}
}
We need to enable reasoning with:
(apply-rdfs++-reasoner)
and we then get all expected relatives listed:
{
"head" : {
"vars" : ["relative"]
},
"results" : {
"bindings" : [
{
"relative":{"type":"uri",
"value":
"http:://knowledgebooks.com/ontology#Ron"}
},
{
"relative":{"type":"uri",
"value":
"http:://knowledgebooks.com/ontology#Ken"}
},
{
"relative":{"type":"uri",
"value":
"http:://knowledgebooks.com/ontology#Julia"}
}
]
}
}
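With reasoning enabled you can also ask directly about a relationship that is only
implied by transitivity; a sketch:
(sparql:run-sparql "
PREFIX kb: <http://knowledgebooks.com/ontology#>
ASK { kb:Mark kb:relativeOf kb:Ken }"
 :results-format :boolean)
;; => T, because kb:relativeOf is declared to be an owl:TransitiveProperty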
8.6. Wrap Up
We saw in the “same as” example that RDFS++ does not always make inferences that
we might expect, so it is best to test reasoning that you depend on for an application
with a small example. At this point, you know how to use AllegroGraph and Lisp
to write your own applications. I am going to take a short tangent in the next chapter
to show you a short-hand Prolog notation for using AllegroGraph embedded in Lisp
applications.
9. AllegroGraph Prolog Interface
This chapter contains optional material about Franz’s proprietary Prolog query
interface to AllegroGraph RDF stores. I will not use the Prolog interface in examples
later in this book because I wanted to stress standard technologies like SPARQL for
accessing RDF data. However, the Prolog interface is very convenient to use and if
you will be using AllegroGraph in your work projects I suggest that you take the time
to learn it.
The documentation for the Prolog interface1 consists of Franz’s tutorial and reference
web pages. I am going to give you a brief introduction in this chapter, and you can later
review Franz’s documentation if you choose to use the Prolog interface in your projects.
I suggest that you open a Lisp repl and follow along with the examples in the file
quick_start_allegrograph_lisp_embedded/prolog.lisp. We start by opening a new
triple store and loading some example RDF N-Triple data to experiment with:
(require :agraph)
(in-package :db.agraph.user)
(enable-!-reader) ; enable the ! reader macro
(create-triple-store "/tmp/rdfstore_prolog_1"
:if-exists :supersede)
(register-namespace
"kb"
"http://knowledgebooks.com/ontology/#")
(load-ntriples
#p"quick_start_allegrograph_lisp_embedded/sample_news.nt")
I am assuming that you started the Lisp repl in the main examples directory for this
book, so adjust the path in the load-ntriples statement if you started the repl in a
different location. I am going to show you a query example and then explain the
function of the Prolog operators in the example query:
1The Prolog interface is based on Peter Norvig’s Prolog implementation written in Common Lisp.
(select (?s ?p ?o)
(q- ?s ?p ?o))
This query contains no conditions, so every triple is displayed. In Prolog, terms
starting with a ? are variables that will later get value bindings. The select Lisp macro
is used to perform a query and return results in a convenient Lisp list notation. Each
q- term in a query is used to define variables and optionally conditions. You will see
the close correspondence with SPARQL queries as we look at more examples. The
output showing all N-Triples in the example data file looks like:
(("http://kbsportal.com/oak_creek_flooding /"
"http://knowledgebooks.com/ontology/#storyType"
"http://knowledgebooks.com/ontology/#disaster")
("http://kbsportal.com/oak_creek_flooding /"
"http://knowledgebooks.com/ontology/#summary"
"Oak Creek flooded last week affecting 5 businesses")
("http://kbsportal.com/bear_mountain_fire /"
"http://knowledgebooks.com/ontology/#storyType"
"http://knowledgebooks.com/ontology/#disaster")
("http://kbsportal.com/bear_mountain_fire /"
"http://knowledgebooks.com/ontology/#summary"
"The fire on Bear Mountain was caused by lightening")
("http://kbsportal.com/trout_season /"
"http://knowledgebooks.com/ontology/#storyType"
"http://knowledgebooks.com/ontology/#sports")
("http://kbsportal.com/trout_season /"
"http://knowledgebooks.com/ontology/#storyType"
"http://knowledgebooks.com/ontology/#recreation")
("http://kbsportal.com/trout_season /"
"http://knowledgebooks.com/ontology/#summary"
"Trout fishing season started last weekend")
("http://kbsportal.com/jc_basketball /"
"http://knowledgebooks.com/ontology/#storyType"
"http://knowledgebooks.com/ontology/#sports"))
We can refine this example query by only requesting news stories that have a summary:
(select (?news_uri ?summary)
(q- ?news_uri !kb:summary ?summary))
Now the results are:
(("http://kbsportal.com/oak_creek_flooding /"
"Oak Creek flooded last week affecting 5 businesses")
("http://kbsportal.com/bear_mountain_fire /"
"The fire on Bear Mountain was caused by lightening")
("http://kbsportal.com/trout_season /"
"Trout fishing season started last weekend"))
If we are only interested in news stories of type disaster, then we can add another
condition filtering against the story type:
(select (?news_uri ?summary)
(q- ?news_uri !kb:summary ?summary)
(q- ?news_uri !kb:storyType !kb:disaster))
Now we only get two results:
(("http://kbsportal.com/oak_creek_flooding /"
"Oak Creek flooded last week affecting 5 businesses")
("http://kbsportal.com/bear_mountain_fire /"
"The fire on Bear Mountain was caused by lightening"))
The Franz Prolog interface tutorial and reference web pages also show examples of
performing RDFS++ type inference and further Prolog techniques. Since we will not
use the Prolog interface in application examples in this book I refer you to the Franz
documentation if you are interested in using the Prolog interface.
Part III.
Portable Common Lisp
Utilities for Information
Processing
10. Linked Data and the World
Wide Web
It has been a decade since Tim Berners-Lee started writing about “version 2” of the
World Wide Web: the Semantic Web. His new idea was to augment HTML anchor
links with typed links using RDF data. As we have seen in detail in the last several
chapters, RDF is encoded as data triples with the parts of each triple identified as the
subject, predicate, and object. The predicate identifies the type of link between the
subject and the object in an RDF triple.
You can think of a single RDF graph as being hosted in one web service, SPARQL
endpoint service, or a downloadable set of RDF files. Just as the value of the web
is greatly increased with relevant links between web pages, the value of RDF graphs
is increased when they contain references to triples in other RDF graphs. In theory,
you could think of all linked RDF data that is reachable on the web as being a single
graph but in practice graphs with billions of nodes are difficult to work with. That
said, handling very large graphs is an active area of research both in university labs
and in industry.
URIs refer to things, acting as unique identifiers. An important idea is that URIs
in linked data sources can also be “dereferenceable”: a URI can serve as a unique
identifier for the Semantic Web, and if you follow the link you can find HTML, RDF, or
any other document type that might better inform both human readers and software agents.
Typically, a dereferenceable URI is “followed” by using the HTTP protocol’s GET
method.
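As a sketch, a URI can be dereferenced from Common Lisp with a simple HTTP GET;
here I assume the Drakma HTTP client library and use a DBpedia resource URI as an
example (both are assumptions, not requirements of anything in this book):
;; assumption: Drakma has been loaded, e.g. with (ql:quickload :drakma)
(drakma:http-request "http://dbpedia.org/resource/Berlin"
                     :method :get
                     :accept "application/rdf+xml")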
The idea of linking data resources using RDF extends the web so that both human
readers and software agents can use data resources. In Tim Berners-Lee’s 2009 TED
talk on Linked Data he discusses the importance of getting governments, companies
and individuals to share Linked Data and to not keep it private. He makes the great
point that the world has many challenges (medicine, stabilizing the economy, energy
efficiency, etc.) that can benefit from unlocked Linked Data sources.
10.1. Linked Data Resources on the Web
There are already many useful public Linked Data sources, with more being developed.
Some examples are:
1. DBpedia contains the “info box” data automatically collected from Wikipedia
(see Chapter 14).
2. FOAF (Friend of a Friend) Ontology for specifying information about people
and their social and business relationships.
3. GeoNames (http://www.geonames.org/) links place names to DBpedia (see Chapter 15).
4. Freebase (http://freebase.com) is a community driven web portal that allows
people to enter facts as structured data. It is possible to query Freebase and get
results as RDF. (See Chapter 13).
We have already used the FOAF RDFS definitions in examples in this book1 and we
will use DBpedia, GeoNames, and Freebase in later chapters.
10.2. Publishing Linked Data
Leigh Dodds and Ian Davis have written an online book “Linked Data Patterns”2 that
provides useful patterns for defining and using Linked Data. I recommend their book
as a more complete reference than this short chapter.
I have used a few reasonable patterns in this book for defining RDF properties, some
examples being:
<http://knowledgebooks.com/ontology/containsPlace>
<http://knowledgebooks.com/ontology/containsCity>
<http://knowledgebooks.com/rdf/discusses/person>
<http://knowledgebooks.com/rdf/discusses/place>
It is also good practice to name resources automatically using a root URI followed by
a unique ID based on the data source; for example: a database row ID or a Freebase
ID.
<http://knowledgebooks.com/rdf/datasource/freebase/20121>
1As an example, for people’s names, addresses, etc.
2Available under a Creative Commons License at http://patterns.dataincubator.org/book/
<http://knowledgebooks.com/rdf/datasource/psql/testdb/testtable/21198>
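A trivial helper for generating such resource URIs might look like the following (a
hypothetical sketch; this function is not part of any library used in this book):
(defun make-resource-uri (data-source id)
  "Build a resource URI from a data source name and a unique ID."
  (format nil "http://knowledgebooks.com/rdf/datasource/~a/~a" data-source id))
;; (make-resource-uri "freebase" "20121")
;; => "http://knowledgebooks.com/rdf/datasource/freebase/20121"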
For all of these examples (properties and resources) it would be good practice to make
these URIs dereferenceable.
10.3. Will Linked Data Become the Semantic
Web?
There has not been much activity building large systems using Semantic Web technologies.
That said, I believe that RDF is a natural data format to use for making
statements about data found on the web and I expect the use of RDF data stores
to increase. The idea of linked data seems like a natural extension: making URIs
dereferenceable lets people follow URIs and get additional information on commonly
used RDFS properties and resources. I am interested in Natural Language Processing
(NLP) and it seems reasonable to expect that intelligent agents can use natural
(human) language dereferenced descriptions of properties and resources.
10.4. Linked Data Wrapup
I have defined the relevant terms for using Linked Data in this short chapter and
provided references for further reading and research. Much of the rest of this book is
comprised of Linked Data application examples using some utilities for information
extraction and processing with existing data sources.
11. Common Lisp Client Library
for Open Calais
The Open Calais web services are available for free use with some minor limitations.
This service is also available for a fee with additional functionality and
guaranteed service levels. We will use the free service in this chapter. Although I
made this chapter self-contained, you may also want to read the documentation at
www.opencalais.com.
You will need to apply for a free developer’s key. On my development systems I define
an environment variable for the value of my key using the following (the key shown is
not a valid key, by the way):
export OPEN_CALAIS_KEY=po2eq112hkf985f3k
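Inside Lisp, the key can then be read back from the environment. A sketch (sys:getenv
is specific to Allegro CL, which is an assumption here; other Lisp implementations
provide their own equivalents):
;; read the Open Calais key from the environment (Allegro CL assumed)
(defvar *open-calais-key* (sys:getenv "OPEN_CALAIS_KEY"))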
The example source files are found in lisp_practical_semantic_web/opencalais:
• load.lisp – loads and runs the demo
• opencalais-lib.lisp – performs web service calls to find named entities in text
• opencalais-data-store.lisp – maintains an RDF data store for named entities
• test-opencalais.lisp – demo test program
11.1. Open Calais Web Services Client
The Open Calais web services return RDF payloads serialized as XML data that you
can print out1 if you want to see what it looks like.
For our purposes, we will not use the returned XML data and instead parse the comment
block to extract named entities that Open Calais identifies. There is a possibility
in the future that the library in this section may need modi