Practical Semantic Web and Linked Data Applications by Mark Watson - HTML preview

key: |?article_uri|

value: {http://news.yahoo.com/...}

key: |?city_name|

value: {Burlington}

;; etc.

The SPARQL ASK command checks whether a given query produces any results. The following example requests a Lisp true/false return value for the question “Does any article contain the city Chicago?”:

(sparql:run-sparql "

PREFIX kb: <http:://knowledgebooks.com/ontology#>

ASK

{

?any_article kb:containsCity 'Chicago'

}"

:results-format :boolean)

There are many possible options for the :results-format keyword argument, including:

• :sparql-xml – serializes the results as XML to output-stream

• :sparql-json – serializes the results as JSON data to output-stream

• :sparql-ttl – serializes the results as Turtle encoding to output-stream (Turtle is

a simplified version of N3, like N-Triples with namespaces)

• :hashes – returns a list of hash tables (as seen in a previous example)

• :arrays – returns a list of arrays, one for each result

• :lists – returns a list of lists, one for each result

• :count – returns an integer for the number of results

At some loss of efficiency it is sometimes useful to match string values against regular

expressions; for example:

(sparql:run-sparql "

PREFIX kb: <http:://knowledgebooks.com/ontology#>

SELECT ?article_uri

WHERE {

?article_uri kb:containsPerson ?person_name .

FILTER regex(?person_name, 'Putin')

}"

:results-format :lists)

;; output:

(({http://news.yahoo.com/s/nm/20080616/ts_nm/worldleaders /}))

7.4. Wrap Up

In this chapter you learned how to query RDF triples in a repository using the Lisp AllegroGraph APIs. We only considered triples that we explicitly added to the triple store. In later chapters we will automate the collection of data from the Internet, convert it to RDF, and add it to a local triple store for reuse. Much of the power of Semantic Web technologies in general, and AllegroGraph in particular, comes from the ability to use triples that are inferred from RDFS without being explicitly created. The next chapter covers this capability, along with techniques for using different data sources implemented using different schemas.

8. AllegroGraph Reasoning System

In the last chapter we saw how SPARQL queries can be used to find specific data

in an RDF graph. So far we have only seen examples of finding data that has been

explicitly added to an RDF data repository.

However, RDFS, RDFS++, and OWL reasoners can return results that are known implicitly by using inference. We have already seen that AllegroGraph supports reasoning using the following RDF, RDFS, and OWL terms, which can be used to infer new relationships that are not explicitly stated in the RDF data stored in AllegroGraph:

• rdf:type – discussed in Chapter 3

• rdf:Property – discussed in Chapter 3

• rdfs:subClassOf – discussed in Chapter 4

• rdfs:range – discussed in Chapter 4

• rdfs:domain – discussed in Chapter 4

• rdfs:subPropertyOf – discussed in Chapter 4

• owl:sameAs – discussed in Chapter 6

• owl:inverseOf – discussed in Chapter 6

• owl:TransitiveProperty – discussed in Chapter 6

8.1. Enabling RDFS++ Reasoning on a Triple Store

We will look at the AllegroGraph APIs and programming techniques for reasoning in detail in this chapter. By default, AllegroGraph triple stores do not support RDFS++ reasoning. You must enable RDFS++ reasoning functionality by calling:

(apply-rdfs++-reasoner :db *db*)

This function works via side effect: the specified data store is converted to support

inferencing. Since the default database *db* can be assumed, this can be shortened

to:

(apply-rdfs++-reasoner)

If you use multiple data stores at the same time you can use different inference support for each. The remainder of this chapter uses reasoning to infer[1] new information.

[1] Implementations of RDF triple stores that support RDFS, RDFS++, or OWL reasoning can implement inferred triples in different ways. One approach is to “pre-calculate” inferred triples using forward chaining inference; this approach is used by the Sesame library. A different approach, used in AllegroGraph, is to infer triples at query time. The results should (hopefully) be the same.

8.2. Inferring New Triples: rdf:type vs. rdfs:subClassOf Example

In the following example, we define two triples and then run SPARQL queries to see which answers require an inferred triple that has not been explicitly added to the triple store:

(add-triple !kb:man !rdf:type !kb:person)
(add-triple !kb:sam !rdf:type !kb:man)

(sparql:run-sparql "
PREFIX kb: <http:://knowledgebooks.com/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK
{
  kb:sam rdf:type kb:man
}"
:results-format :boolean)

This query returns a Lisp true value T because the triple it asks about was added explicitly. You might think that since kb:man is declared to be of rdf:type kb:person, the following query would also return a true value:

(add-triple !kb:man !rdf:type !kb:person)
(add-triple !kb:sam !rdf:type !kb:man)

(sparql:run-sparql "
PREFIX kb: <http:://knowledgebooks.com/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK
{
  kb:sam rdf:type kb:person
}"
:results-format :boolean)

This, however, returns a Lisp false value NIL: rdf:type states class membership and is not transitive, so nothing links kb:sam to kb:person. To get the subclass behavior that you probably expected, use rdfs:subClassOf:

(add-triple !kb:man !rdfs:subClassOf !kb:person)

Now the last query returns a true (Lisp T) value.
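
The difference between the two predicates can be sketched without a triple store. The following is a minimal, portable Common Lisp illustration and not AllegroGraph's implementation: triples are plain lists of keywords, and the names *triples*, objects-of, subclass-p, and instance-of-p are hypothetical helpers introduced only for this sketch.

```lisp
;; Hypothetical, store-independent sketch: each triple is a list
;; (subject predicate object) of keywords, not an AllegroGraph triple.
(defparameter *triples*
  '((:man :sub-class-of :person)
    (:sam :type :man)))

(defun objects-of (subject predicate triples)
  "Return every object O such that (SUBJECT PREDICATE O) is in TRIPLES."
  (loop for (s p o) in triples
        when (and (eq s subject) (eq p predicate))
          collect o))

(defun subclass-p (class super triples &optional seen)
  "True if CLASS is SUPER or reaches SUPER through :sub-class-of links."
  (cond ((eq class super) t)
        ((member class seen) nil)   ; guard against cyclic hierarchies
        (t (some (lambda (parent)
                   (subclass-p parent super triples (cons class seen)))
                 (objects-of class :sub-class-of triples)))))

(defun instance-of-p (resource class triples)
  "True if RESOURCE has a :type that is CLASS or a subclass of CLASS."
  (some (lambda (type) (subclass-p type class triples))
        (objects-of resource :type triples)))
```

With the data above, (instance-of-p :sam :person *triples*) returns T. If the first triple is replaced by (:man :type :person), the same call returns NIL, because the sketch follows only :sub-class-of links upward, which mirrors the behavior described in this section.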

8.3. Using Inverse Properties

Properties define a one-way relationship between resources. Sometimes a property like “husband of” has an inverse property like “wife of”, so when we say that Mark is the husband of Carol we would like the reasoner to infer automatically that Carol is the wife of Mark.

(require :agraph)

(in-package :db.agraph.user)

(create-triple-store "/tmp/rdfstore_2")

(register-namespace "kb" "http:://knowledgebooks.com/ontology#")

(apply-rdfs++-reasoner)

(enable-!-reader)

(add-triple !kb:Mark !kb:husband-of !kb:Carol)

(add-triple !kb:wife-of !owl:inverseOf !kb:husband-of)

We can now infer wife-of relationships:

(sparql:run-sparql "

PREFIX kb: <http:://knowledgebooks.com/ontology#>

SELECT ?y ?x WHERE { ?y kb:wife-of ?x }")

Since I did not specify the output data format, the results are returned in the default format, the SPARQL query results XML format:

<?xml version="1.0"?>

<sparql xmlns="http://www.w3.org/2005/sparql-results#">

<head>

<variable name="y"/>

<variable name="x"/>

</head>

<results>

<result>

<binding name="y">

<uri>http:://knowledgebooks.com/ontology#Carol</uri>

</binding>

<binding name="x">

<uri>http:://knowledgebooks.com/ontology#Mark</uri>

</binding>

</result>

</results>

</sparql>

I find other output formats generally easier to use; for example, specifying :results-format :lists yields:

(({Carol} {Mark}))

and specifying :results-format :hashes like:

(sparql:run-sparql "

PREFIX kb: <http:://knowledgebooks.com/ontology#>

SELECT ?y ?x WHERE { ?y kb:wife-of ?x }" :results-format :hashes)

will yield a list of hash tables, one per result, keyed by the variable names.

Specifying an output format of :results-format :sparql-json yields:

{

"head" : {

"vars" : ["y", "x"]

},

"results" : {

"bindings" : [

{

"y":{"type":"uri",

"value":

"http:://knowledgebooks.com/ontology#Carol"},

"x":{"type":"uri",

"value":

"http:://knowledgebooks.com/ontology#Mark"}

}

]

}

}

Yet another output option is :results-format :alists.
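
The owl:inverseOf inference used in this section can be illustrated with a small, store-independent sketch. This is an assumption-laden illustration rather than how AllegroGraph works internally: triples are plain lists of keywords, and inverse-of and with-inverse-triples are hypothetical helper names.

```lisp
;; Hypothetical sketch: INVERSE-PAIRS plays the role of the
;; owl:inverseOf declarations, as a list of (property inverse) pairs.
(defun inverse-of (property inverse-pairs)
  "Return the inverse of PROPERTY, checking both directions, or NIL."
  (loop for (p q) in inverse-pairs
        when (eq property p) return q
        when (eq property q) return p))

(defun with-inverse-triples (triples inverse-pairs)
  "Return TRIPLES followed by the triples implied by INVERSE-PAIRS."
  (append triples
          (loop for (s p o) in triples
                for inverse = (inverse-of p inverse-pairs)
                when inverse
                  collect (list o inverse s))))
```

For example, (with-inverse-triples '((:mark :husband-of :carol)) '((:wife-of :husband-of))) returns ((:mark :husband-of :carol) (:carol :wife-of :mark)). This sketch materializes the inferred triple up front, whereas a query-time reasoner would derive it only when a query asks for it.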

8.4. Using the Same As Property

It is often useful to make an RDF statement that two different resources are equivalent

as in this example:

(add-triple !kb:Mark !kb:name !"Mark Watson")

(register-namespace "test_news" "http://news.yahoo.com/s/nm/20080616/ts_nm")

(add-triple !test_news:Mark !kb:height !"6 feet 4 inches")

(add-triple !kb:Mark !owl:sameAs !test_news:Mark)

(apply-rdfs++-reasoner)

If you then query for the properties of kb:Mark, you may be surprised to see only the following results:

(({name} {Mark Watson})

({sameAs} {test_news:Mark}))

With just RDFS++ reasoning, the height of Mark was not inferred in this example. A full OWL reasoner would have inferred it.

8.5. Using the Transitive Property

We start by making a few statements about family relationships:

(add-triple !kb:relativeOf !rdf:type !owl:TransitiveProperty)

(add-triple !kb:Mark !kb:relativeOf !kb:Ron)

(add-triple !kb:Ron !kb:relativeOf !kb:Julia)

(add-triple !kb:Julia !kb:relativeOf !kb:Ken)

We then run a query like:

(sparql:run-sparql "

PREFIX kb: <http:://knowledgebooks.com/ontology#>

SELECT ?relative WHERE { kb:Mark kb:relativeOf ?relative }"

:results-format :sparql-json)

If you forget to enable reasoning, you get only the result that follows from the explicitly stated triples, which is not what you expect:

{

"head" : {

"vars" : ["relative"]

},

"results" : {

"bindings" : [

{

"relative":{"type":"uri",

"value":

"http:://knowledgebooks.com/ontology#Ron"}

}

]

}

}

We need to enable reasoning with:

(apply-rdfs++-reasoner)

and we then get all expected relatives listed:

{

"head" : {

"vars" : ["relative"]

},

"results" : {

"bindings" : [

{

"relative":{"type":"uri",

"value":

"http:://knowledgebooks.com/ontology#Ron"}

},

{

"relative":{"type":"uri",

"value":

"http:://knowledgebooks.com/ontology#Ken"}

},

{

"relative":{"type":"uri",

"value":

"http:://knowledgebooks.com/ontology#Julia"}

}

]

}

}
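
What an owl:TransitiveProperty declaration adds can be sketched as a reachability search. The following is a minimal, portable illustration under stated assumptions, not AllegroGraph's reasoner: triples are plain lists of keywords and reachable is a hypothetical helper name.

```lisp
;; Hypothetical sketch: follow PROPERTY from START one or more times.
(defun reachable (start property triples)
  "Return every resource reachable from START by following PROPERTY
one or more times, in breadth-first order."
  (let ((found '())
        (queue (list start)))
    (loop while queue
          do (let ((node (pop queue)))
               (loop for (s p o) in triples
                     when (and (eq s node)
                               (eq p property)
                               (not (member o found)))
                       do (push o found)
                          (setf queue (append queue (list o))))))
    (nreverse found)))
```

Given the triples ((:mark :relative-of :ron) (:ron :relative-of :julia) (:julia :relative-of :ken)), the call (reachable :mark :relative-of ...) returns (:ron :julia :ken). Without the transitive step, only :ron would be found, which corresponds to the single binding returned when reasoning is not enabled. The order of results here is breadth-first and need not match the order a triple store returns.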

8.6. Wrap Up

We saw in the “same as” example that RDFS++ does not always make the inferences that we might expect, so it is best to use a small example to test any reasoning that your application depends on. At this point, you know how to use AllegroGraph and Lisp to write your own applications. I am going to take a short tangent in the next chapter to show you a shorthand Prolog notation for using AllegroGraph embedded in Lisp applications.

9. AllegroGraph Prolog Interface

This chapter contains optional material about Franz’s proprietary Prolog query interface to AllegroGraph RDF stores. I will not use the Prolog interface in examples later in this book because I wanted to stress standard technologies like SPARQL for accessing RDF data. However, the Prolog interface is very convenient to use, and if you will be using AllegroGraph in your work projects I suggest that you take the time to learn it.

The documentation for the Prolog interface[1] is Franz’s tutorial and reference web pages. I am going to give you a brief introduction in this chapter and you can later review Franz’s documentation if you choose to use the Prolog interface in your projects. I suggest that you open a Lisp REPL and follow along with the examples that are in the file quick_start_allegrograph_lisp_embedded/prolog.lisp. We start by opening a new triple store and loading some example RDF N-Triple data to experiment with:

(require :agraph)

(in-package :db.agraph.user)

(enable-!-reader) ; enable the ! reader macro

(create-triple-store "/tmp/rdfstore_prolog_1"

:if-exists :supersede)

(register-namespace

"kb"

"http://knowledgebooks.com/ontology/#")

(load-ntriples

#p"quick_start_allegrograph_lisp_embedded/sample_news.nt")

[1] The Prolog interface is based on Peter Norvig’s Prolog implementation written in Common Lisp.

I am assuming that you started the Lisp REPL in the main examples directory for this book, so adjust the path in the load-ntriples statement if you started the REPL in a different location. I am going to show you a query example and then explain the function of the Prolog operators in the example query:

(select (?s ?p ?o)

(q- ?s ?p ?o))

This query contains no conditions so every triple is displayed. In Prolog, terms starting with a ? are variables that will later get value bindings. The select Lisp macro is used to perform a query and return results in a convenient Lisp list notation. Each q- term in a query is used to define variables and optionally conditions. You will see the close correspondence with SPARQL queries as we look at more examples. The output showing all N-Triples in the example data file looks like:

(("http://kbsportal.com/oak_creek_flooding/"
  "http://knowledgebooks.com/ontology/#storyType"
  "http://knowledgebooks.com/ontology/#disaster")
 ("http://kbsportal.com/oak_creek_flooding/"
  "http://knowledgebooks.com/ontology/#summary"
  "Oak Creek flooded last week affecting 5 businesses")
 ("http://kbsportal.com/bear_mountain_fire/"
  "http://knowledgebooks.com/ontology/#storyType"
  "http://knowledgebooks.com/ontology/#disaster")
 ("http://kbsportal.com/bear_mountain_fire/"
  "http://knowledgebooks.com/ontology/#summary"
  "The fire on Bear Mountain was caused by lightening")
 ("http://kbsportal.com/trout_season/"
  "http://knowledgebooks.com/ontology/#storyType"
  "http://knowledgebooks.com/ontology/#sports")
 ("http://kbsportal.com/trout_season/"
  "http://knowledgebooks.com/ontology/#storyType"
  "http://knowledgebooks.com/ontology/#recreation")
 ("http://kbsportal.com/trout_season/"
  "http://knowledgebooks.com/ontology/#summary"
  "Trout fishing season started last weekend")
 ("http://kbsportal.com/jc_basketball/"
  "http://knowledgebooks.com/ontology/#storyType"
  "http://knowledgebooks.com/ontology/#sports"))

We can refine this example query by only requesting news stories that have a summary:

(select (?news_uri ?summary)

(q- ?news_uri !kb:summary ?summary))

Now the results are:

(("http://kbsportal.com/oak_creek_flooding/"
  "Oak Creek flooded last week affecting 5 businesses")
 ("http://kbsportal.com/bear_mountain_fire/"
  "The fire on Bear Mountain was caused by lightening")
 ("http://kbsportal.com/trout_season/"
  "Trout fishing season started last weekend"))

If we are only interested in news stories of type disaster, then we can add another

condition filtering against the story type:

(select (?news_uri ?summary)

(q- ?news_uri !kb:summary ?summary)

(q- ?news_uri !kb:storyType !kb:disaster))

Now we only get two results:

(("http://kbsportal.com/oak_creek_flooding/"
  "Oak Creek flooded last week affecting 5 businesses")
 ("http://kbsportal.com/bear_mountain_fire/"
  "The fire on Bear Mountain was caused by lightening"))

The Franz Prolog interface tutorial and reference web pages also show examples of

performing RDFS++ type inference and further Prolog techniques. Since we will not

use the Prolog interface in application examples in this book I refer you to the Franz

documentation if you are interested in using the Prolog interface.
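
To make the meaning of a query with several q- clauses concrete, here is a small sketch of conjunctive triple-pattern matching in portable Common Lisp. It is only an illustration of the idea under stated assumptions and is not how Franz's Prolog interface is implemented: triples are plain lists, variables are symbols whose names start with ?, and variable-p, match-term, match-triple, and query are hypothetical names.

```lisp
;; Hypothetical sketch: bindings are an alist of (variable . value);
;; the keyword :fail signals that a match is impossible.
(defun variable-p (term)
  "True for symbols whose name starts with ?, such as ?news_uri."
  (and (symbolp term)
       (plusp (length (symbol-name term)))
       (char= (char (symbol-name term) 0) #\?)))

(defun match-term (pattern-term value bindings)
  "Extend BINDINGS so that PATTERN-TERM matches VALUE, or return :fail."
  (cond ((eq bindings :fail) :fail)
        ((variable-p pattern-term)
         (let ((bound (assoc pattern-term bindings)))
           (cond ((null bound) (acons pattern-term value bindings))
                 ((equal (cdr bound) value) bindings)
                 (t :fail))))
        ((equal pattern-term value) bindings)
        (t :fail)))

(defun match-triple (pattern triple bindings)
  "Match a three-element PATTERN against TRIPLE, threading BINDINGS."
  (reduce (lambda (acc pair) (match-term (car pair) (cdr pair) acc))
          (mapcar #'cons pattern triple)
          :initial-value bindings))

(defun query (patterns triples &optional (bindings '()))
  "Return one binding alist for every way all PATTERNS match TRIPLES."
  (if (null patterns)
      (list bindings)
      (loop for triple in triples
            for extended = (match-triple (first patterns) triple bindings)
            unless (eq extended :fail)
              append (query (rest patterns) triples extended))))
```

A call such as (query '((?uri :summary ?summary) (?uri :story-type :disaster)) triples) returns one alist of bindings per story that has both a summary and the disaster story type. A variable that appears in two patterns, like ?uri here, must take the same value in both, which is what makes the second clause act as a filter.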

Part III.

Portable Common Lisp Utilities for Information Processing

10. Linked Data and the World Wide Web

It has been a decade since Tim Berners-Lee started writing about “version 2” of the

World Wide Web: the Semantic Web. His new idea was to augment HTML anchor

links with typed links using RDF data. As we have seen in detail in the last several

chapters, RDF is encoded as data triples with the parts of each triple identified as the

subject, predicate, and object. The predicate identifies the type of link between the

subject and the object in an RDF triple.

You can think of a single RDF graph as being hosted in one web service, SPARQL

endpoint service, or a downloadable set of RDF files. Just as the value of the web

is greatly increased with relevant links between web pages, the value of RDF graphs

is increased when they contain references to triples in other RDF graphs. In theory,

you could think of all linked RDF data that is reachable on the web as being a single

graph but in practice graphs with billions of nodes are difficult to work with. That

said, handling very large graphs is an active area of research both in university labs

and in industry.

URIs refer to things, acting as unique identifiers. An important idea is that URIs in linked data sources can also be “dereferenceable”: a URI can serve as a unique identifier for the Semantic Web, and if you follow the link you can find HTML, RDF, or any document type that might better inform both human readers and software agents. Typically, a dereferenceable URI is “followed” by using the HTTP protocol’s GET method.

The idea of linking data resources using RDF extends the web so that both human

readers and software agents can use data resources. In Tim Berners-Lee’s 2009 TED

talk on Linked Data he discusses the importance of getting governments, companies

and individuals to share Linked Data and to not keep it private. He makes the great

point that the world has many challenges (medicine, stabilizing the economy, energy

efficiency, etc.) that can benefit from unlocked Linked Data sources.

10.1. Linked Data Resources on the Web

There are already many useful public Linked Data sources, with more being developed. Some examples are:

1. DBpedia contains the “info box” data automatically collected from Wikipedia (see Chapter 14).

2. FOAF (Friend of a Friend) Ontology for specifying information about people

and their social and business relationships.

3. GeoNames (http://www.geonames.org/) links place names to DBpedia (see Chapter 15).

4. Freebase (http://freebase.com) is a community driven web portal that allows

people to enter facts as structured data. It is possible to query Freebase and get

results as RDF. (See Chapter 13).

We have already used the FOAF RDFS definitions in examples in this book[1] and we will use DBpedia, GeoNames, and Freebase in later chapters.

10.2. Publishing Linked Data

Leigh Dodds and Ian Davis have written an online book, “Linked Data Patterns”[2], that provides useful patterns for defining and using Linked Data. I recommend their book as a more complete reference than this short chapter.

I have used a few reasonable patterns in this book for defining RDF properties, some

examples being:

<http://knowledgebooks.com/ontology/containsPlace>

<http://knowledgebooks.com/ontology/containsCity>

<http://knowledgebooks.com/rdf/discusses/person>

<http://knowledgebooks.com/rdf/discusses/place>

[1] As an example, for people’s names, addresses, etc.

[2] Available under a Creative Commons License at http://patterns.dataincubator.org/book/

It is also good practice to name resources automatically using a root URI followed by a unique ID based on the data source, for example a database row ID or a Freebase ID:

<http://knowledgebooks.com/rdf/datasource/freebase/20121>

<http://knowledgebooks.com/rdf/datasource/psql/testdb/testtable/21198>

For all of these examples (properties and resources) it would be good practice to make

these URIs dereferenceable.
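
The resource naming pattern described above can be automated with a few lines of portable Common Lisp. This is a minimal sketch, and resource-uri is a hypothetical helper name that does not appear in the book's example code.

```lisp
;; Hypothetical helper: build a resource URI from a root URI, a data
;; source name, and further path parts such as a table name and row ID.
(defun resource-uri (root source &rest path)
  "Join ROOT, SOURCE, and PATH parts with slashes into one URI string."
  (format nil "~a/~a~{/~a~}"
          (string-right-trim "/" root)   ; tolerate a trailing slash
          source
          path))
```

For example, (resource-uri "http://knowledgebooks.com/rdf/datasource" "psql" "testdb" "testtable" 21198) returns "http://knowledgebooks.com/rdf/datasource/psql/testdb/testtable/21198". The sketch does no escaping, so it assumes that every part is already safe to use in a URI path.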

10.3. Will Linked Data Become the Semantic Web?

There has not been much activity building large systems using Semantic Web technologies. That said, I believe that RDF is a natural data format to use for making statements about data found on the web and I expect the use of RDF data stores to increase. The idea of linked data seems like a natural extension: making URIs dereferenceable lets people follow URIs and get additional information on commonly used RDFS properties and resources. I am interested in Natural Language Processing (NLP) and it seems reasonable to expect that intelligent agents can use natural (human) language dereferenced descriptions of properties and resources.

10.4. Linked Data Wrap Up

I have defined the relevant terms for using Linked Data in this short chapter and provided references for further reading and research. Much of the rest of this book consists of Linked Data application examples using some utilities for information extraction and processing with existing data sources.

11. Common Lisp Client Library for Open Calais

The Open Calais web services are available for free use with some minor limitations. This service is also available for a fee with additional functionality and guaranteed service levels. We will use the free service in this chapter. Although I made this chapter self-contained, you may also want to read the documentation at www.opencalais.com.

You will need to apply for a free developer’s key. On my development systems I define an environment variable for the value of my key (the key shown is not a valid key, by the way):

export OPEN_CALAIS_KEY=po2eq112hkf985f3k

The example source files are found in lisp_practical_semantic_web/opencalais:

• load.lisp – loads and runs the demo

• opencalais-lib.lisp – performs web service calls to find named entities in text

• opencalais-data-store.lisp – maintains an RDF data store for named entities

• test-opencalais.lisp – demo test program

11.1. Open Calais Web Services Client

The Open Calais web services return RDF payloads serialized as XML data that you can print out[1] if you want to see what it looks like.

For our purposes, we will not use the returned XML data and instead parse the comment block to extract named entities that Open Calais identifies. There is a possibility in the future that the library in this section may need modi