Practical Semantic Web and Linked Data Applications by Mark Watson - HTML preview

Acknowledgements

I would like to thank my wife Carol Watson for copyediting this book.

[7] For your convenience, I include in the code ZIP file third party libraries, most of which are released under MIT, BSD, Lisp LGPL, or Apache licenses.

[8] Downloading the free PDF from http://markwatson.com/opencontent does not give you the rights to this waiver.

1. Introduction

Franz has good online documentation [1] for all of their AllegroGraph products. One

purpose of this book is to provide a brief introduction to AllegroGraph but I assume

that you also reference the documentation on the Franz web site. The broader purpose

of this book is to provide application programming examples using AllegroGraph and

Linked Data sources on the web. This book also covers some of my own open source

Common Lisp projects that you may find useful for Semantic Web applications. The

combination of interactive Lisp development with embedded AllegroGraph and my

utilities covered later should provide you with an agile development environment for

writing knowledge-based and Semantic Web applications.

AllegroGraph is an RDF data repository that can use RDFS and RDFS+ inferencing.

AllegroGraph also provides three non-standard extensions:

1. Text indexing and search

2. Geo Location support

3. Network traversal and search for social network applications

1.1. Who is this Book Written For?

I assume both that you already know how to program in Common Lisp and that

you write applications that require handling large amounts of unstructured

information. AllegroGraph is a powerful tool for handling large amounts of data and Lisp

programming environments are excellent for rapidly prototyping new applications.

Along with extra libraries I have written for using linked data sources on the web, this

book will hopefully provide you with new tools to rapidly solve application problems

that would be more difficult to handle using relational databases.

Franz also provides support for embedding AllegroGraph in Lisp applications and

for using it in a client mode with external AllegroGraph servers. Since the APIs

are almost identical, I take a shortcut in writing this book and concentrate on using

AllegroGraph in embedded mode.

[1] http://franz.com/agraph/support/documentation/current/agraph-introduction.html

[Figure: Typical Semantic Web Application. Labeled components: Information Sources (web sites, relational databases, document repositories); Data to RDF Filters; RDF Repository; RDF/RDFS/OWL APIs; Application Program.]

Figure 1.1: Example Semantic Web Application

There are many books, good tutorials and software about the Semantic Web on the

web. However, there is not a single reference for developers who want to use the

combination of Common Lisp and AllegroGraph for development using technologies

like RDF/RDFS/OWL modeling, description logic reasoners, and the SPARQL query

language.

If you own a Franz Lisp and AllegroGraph development license, then you are set to

go. If not, you need to download and install a copy of the free edition from:

http://www.franz.com/downloads/

You may also want to download and install the free versions of the AllegroGraph

standalone server, Gruff, and WebView. [2]

Franz Inc. has supported my writing of this book by providing technical reviews.

My understanding is that even though you will need to periodically refresh

your free non-commercial license, there is no inherent time limit for non-commercial

use. I would also like to thank Franz for providing me with an Enterprise developer's

license for my MacBook that I use for my own research and development projects.

[2] I do not use these associated products in this book but I do in the Java, Clojure, Scala, and JRuby edition of this book.

1.2. Why a PDF Copy of this Book is Available Free on the Web

As an author I want to earn a living writing and have many people read and enjoy my

books. By offering for sale the print version of this book I can earn some money for

my efforts and also allow readers who cannot afford to buy many books or may only

be interested in a few chapters to read it from my web site. If you support my future

writing projects by purchasing either the print or PDF version of this book, I thank

you by offering you more flexibility in the software license terms for the example

programs and libraries I developed (see Section 6 in the Preface).

Please note that I do not give permission to post the PDF version of this book on other

people’s web sites: I consider this to be at least indirectly commercial exploitation in

violation of the Creative Commons License that I have chosen for this book.

1.3. Book Software

You can download a large ZIP file containing all code and test data from the URL:

http://markwatson.com/opencontent/lisp_semantic_web_code.zip

The book example code, libraries, and applications are arranged in subdirectories

by topic:

1. dbpedia - use the DBPedia web services

2. freebase client - use the Freebase web services

3. geonames - use the Geonames web service

4. knowledgebooks nlp - my natural language processing library

5. opencalais - use the OpenCalais web services

6. quick start allegrograph lisp embedded - code snippets used to introduce AllegroGraph

7. quick start allegrograph standalone server - code snippets for Chapter 2

8. rdf - additional code snippets for creating RDF triples and making queries

9. reasoning - code snippets for Chapter 8

10. sparql - code snippets and sample data for SPARQL queries

11. test data - miscellaneous test data files

12. utils - third party libraries [3] that I use for the book examples

13. web app - both backend code from Chapter 16 and the front end web application code from Chapter 17

1.4. Why Graph Data Representations are Better than the Relational Database Model for Dealing with Rapidly Changing Data Requirements

When people are first introduced to Semantic Web technologies their first reaction is

often something like, “I can just do that with a database.” The relational database

model is an efficient way to express and work with slowly changing data models.

There are some clever tools for dealing with data change requirements in the database

world (ActiveRecord and migrations being a good example) but it is awkward to have

end users and even developers tacking new data attributes onto relational database

tables.

A major theme in this book is convincing you that modeling data with RDF and

RDFS facilitates freely extending data models and also allows fairly easy integration

of data from different sources using different schemas without explicitly converting

data from one schema to another for reuse. You will learn how to use the SPARQL

query language to use information in different RDF repositories. It is also possible to

publish relational data with a SPARQL interface. [4]
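
To make the "freely extending data models" point concrete, here is a minimal sketch in plain Common Lisp. It does not use the AllegroGraph API: the names *facts*, add-fact, and properties-of, and the sample article data, are hypothetical and exist only for this illustration. Each fact is a three-element list, so recording a new kind of attribute means adding one more list rather than altering a table definition.

```lisp
;; Plain Common Lisp sketch; NOT the AllegroGraph API.
;; *facts*, add-fact, and properties-of are hypothetical names.
;; Each fact is a (subject predicate object) list of strings.
(defparameter *facts*
  (list (list "article-1" "title" "Election News")
        (list "article-1" "author" "Jane Doe")))

(defun add-fact (subject predicate object)
  "Append one fact and return the new number of facts.
No schema has to change when PREDICATE is new."
  (setf *facts*
        (append *facts* (list (list subject predicate object))))
  (length *facts*))

(defun properties-of (subject)
  "Return the (predicate object) pairs recorded for SUBJECT,
in the order they were added."
  (loop for (s p o) in *facts*
        when (string= s subject)
          collect (list p o)))

;; A new kind of attribute is just another triple:
(add-fact "article-1" "containsPerson" "Barack Obama")
(properties-of "article-1")
```

In a relational design the third attribute would typically require a new column or a join table; here the existing facts and the query function are untouched.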

1.5. What if You Use Programming Languages Other Than Lisp?

If you are a Java programmer, you probably still want to learn about AllegroGraph

because Franz distributes a free Java version of AllegroGraph that can be used for any

purposes (including commercial applications) – the free Java version is limited to 50

million RDF triples. The Java version is a natively compiled Franz Lisp application

that provides plain socket and HTTP/REST interfaces.

[3] cl-json, s-xml, split-sequence, usocket, trivial-gray-streams, flexi-streams, chunga, cl-base64, puri, drakma, and cl-geonames

[4] The open source D2R project provides a wrapper for relational databases that provides a SPARQL query interface.

If you do most of your development in other languages like Ruby and Python then

you can run the free server edition using the HTTP/Sesame client protocol. Sesame

is a high quality “batteries included” Java library for Semantic Web development; the

Sesame client protocol is well documented and simple to use but will not be covered

here. If you use the Sesame protocol then you have the flexibility of using both

Franz’s free server edition of AllegroGraph and Sesame, which is open source with a

BSD-style license.

2. AllegroGraph Embedded Lisp Quick Start

The first section of this book will cover Semantic Web technologies from a theoretical

and reference point of view. Since I want you to follow along with the book material

as I present it, this chapter is intended to get you comfortable using Lisp and embedded

AllegroGraph: it will be easier to work through the theory in Chapters 3, 4, and 6

if you understand the basics of AllegroGraph. After this more detailed look at some

theory we will dig deeper into AllegroGraph development techniques in Chapters 7,

8, and 9.

2.1. Starting AllegroGraph

In this chapter and in much of this book, you can save some effort by copying and

pasting the code snippets into the Lisp listener. The code snippets used in this chapter

are contained in the source file quick start lisp embedded.lisp. I assume that

most readers are trying AllegroGraph using the free non-commercial use version so

that is what I will use here. If you are using a commercially licensed version the

examples will work the same but the initial banner displayed by alisp (conventional

case insensitive Lisp shell) and mlisp (“modern” case sensitive Lisp shell) will be

slightly different. While I usually use alisp in my work (I have been using Lisp for

professional development since 1982), Franz recommends using mlisp for AllegroGraph

development so we will use mlisp in this book. You will need to follow the

directions in acl81 express/readme.txt to build an mlisp image to use. When showing

interactive examples in this chapter I remove some Lisp shell messages so when you

work along with these examples expect to see more output than what is shown here: [1]

markw$ mlisp

International Allegro CL Free Express Edition

8.2 [Mac OS X (Intel)] (Jul 9, 2009 17:15)

Copyright (C) 1985-2007, Franz Inc., Oakland, CA, USA.

All Rights Reserved.

This development copy of Allegro CL is licensed to:

Trial User

;; Current reader case mode: :case-sensitive-lower

cl-user(1): (require :agraph)

AllegroGraph Lisp Edition 3.2 [built on March 16, 2009 15:05:15 GMT-0700]

t

cl-user(2): (in-package :db.agraph.user)

#<The db.agraph.user package>

TRIPLE-STORE-USER(3):

[1] I use OS X and Linux for my development. If you are a Windows user, follow the installation instructions on the AllegroGraph download web page and expect to see slight differences to the interactive example sessions that I use in this book.

Please note that you will see many lines of output that I did not show. Here I

required the :agraph package and changed the current Common Lisp package to

db.agraph.user. In examples later in this book when we develop complete application

examples we will be using our own application-specific packages and I will show

you then what you need in general to import from db.agraph and db.agraph.user.

We will continue this interactive example Lisp session in the following sections.

I use interactive sessions in a command window for the examples in this book. If you

are a Windows user then you may want to try the Windows-specific

IDE instead. I recommend that OS X, Linux, and Windows users use Emacs to develop Lisp

code. [2]

If you run Franz Lisp in a terminal shell then I recommend that you start it using

rlwrap. As an example, using OS X and Linux, I create an alias like:

alias lisp='rlwrap alisp'

Using rlwrap lets you use the up arrow key to rerun previous commands, edit previous

commands, etc.

2.2. Working with RDF Data Stores

RDF data stores provide the services for storing RDF triple data and provide some

means of making queries to identify some subset of the triples in the store. It is

important to keep in mind that the mechanism for maintaining triple stores varies in

different implementations. Triples can be stored in memory, in disk-based btree stores

like BerkeleyDB, in relational databases, and in custom stores like AllegroGraph.

[2] Franz provides their own Emacs tools: look for instructions for installing ELI. However, I also use the SLIME Emacs Lisp development tools that are compatible with all versions of Lisp that I use: Franz, SBCL, ClozureCL, and Gambit-C Scheme. Franz provides SLIME installation instructions for Franz Common Lisp.

While much of this book is specific to Common Lisp and AllegroGraph, the concepts

that you will learn and experiment with can be useful if you also use other languages

and platforms like Java (Sesame, Jena, OwlAPIs, etc.), Ruby (Redland RDF), etc.

For Java developers Franz offers a Java version of AllegroGraph (implemented in

Lisp with a network interface that also supports Python and Ruby clients) that I cover

in the Java edition of this book.

2.2.1. Creating Repositories

AllegroGraph uses disk-based RDF storage with automatic in-memory caching. For

the examples in this book I will assume that all RDF stores are kept in the temporary

directory /tmp. For deployed systems you will clearly want to use a permanent location.

For Windows(tm) development you can either change this location or create a

new directory c:\tmp. In the examples in this book, I assume a Mac OS X, Linux,

or other Unix type file system:

TRIPLE-STORE-USER(3): (create-triple-store

"/tmp/rdfstore_1")

#<db.agraph::triple-db /tmp/rdfstore_1, open @ #x109682>

I hope that you are following along with this running example – you will better understand

this material if you type it into a Lisp shell.

While it is possible to simultaneously work with multiple repositories (and this is

well documented in Franz’s online documentation for the non-free versions of AllegroGraph),

for all of the tutorials, examples, and sample applications in this book we

need just a single open repository in order to be compatible with the free versions of

AllegroGraph.

We will see in Chapter 3 how to partition RDF triples into different namespaces and

to use existing RDF data and schemas in different namespaces. In the following code

snippet I introduce the AllegroGraph APIs for defining new namespaces and listing

all namespaces defined in the current repository:

TRIPLE-STORE-USER(4): (register-namespace "kb"

"http://knowledgebooks.com/rdfs#")

"http://knowledgebooks.com/rdfs#"

TRIPLE-STORE-USER(5): (display-namespaces)

rdfs => http://www.w3.org/2000/01/rdf-schema#

err => http://www.w3.org/2005/xqt-errors#

fn => http://www.w3.org/2005/xpath-functions#

rdf => http://www.w3.org/1999/02/22-rdf-syntax-ns#

xs => http://www.w3.org/2001/XMLSchema#

xsd => http://www.w3.org/2001/XMLSchema#

owl => http://www.w3.org/2002/07/owl#

kb => http://knowledgebooks.com/rdfs#

Here I created a new namespace that has an abbreviation (or nickname) kb: and

then printed out all registered namespaces. To ensure data integrity be sure to call

(close-triple-store) to close an RDF triple store when you are done with it. I leave

the connection open because we will continue to use it in this chapter.

2.2.2. AllegroGraph Lisp Reader Support for RDF

In the RDF model, the subject and predicate of a triple are URIs (a subject may also be

a blank node), while the object can be either a URI or a literal.

AllegroGraph provides a Lisp reader macro ! that makes it easier to enter URIs and

literals. For example, the following two URIs are functionally equivalent given the

(register-namespace "kb" ...) in the last section:

<http://knowledgebooks.com/rdfs#containsPerson>

!kb:containsPerson

String literals are also defined using the ! reader macro; for example:

!"Barack Obama"

!"101 Main Street"
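
The equivalence between the abbreviated and full URI forms can be pictured as a simple table lookup. The following is a minimal sketch in plain Common Lisp and is not how the AllegroGraph ! reader macro is implemented; the names *prefixes*, define-prefix, and expand-name are hypothetical and used only for illustration.

```lisp
;; Plain Common Lisp sketch of prefix expansion.
;; This is NOT the AllegroGraph ! reader macro; *prefixes*,
;; define-prefix, and expand-name are hypothetical names.
(defparameter *prefixes* (make-hash-table :test #'equal)
  "Maps a namespace abbreviation such as \"kb\" to its full URI prefix.")

(defun define-prefix (abbreviation uri)
  "Remember that ABBREVIATION stands for the URI prefix URI."
  (setf (gethash abbreviation *prefixes*) uri))

(defun expand-name (qualified-name)
  "Expand a \"prefix:local\" string to a full URI string.
Signals an error if the colon is missing or the prefix is unknown."
  (let ((colon (position #\: qualified-name)))
    (unless colon
      (error "No prefix in ~S" qualified-name))
    (let ((uri (gethash (subseq qualified-name 0 colon) *prefixes*)))
      (unless uri
        (error "Unknown prefix in ~S" qualified-name))
      (concatenate 'string uri (subseq qualified-name (1+ colon))))))

(define-prefix "kb" "http://knowledgebooks.com/rdfs#")
(expand-name "kb:containsPerson")
;; => "http://knowledgebooks.com/rdfs#containsPerson"
```

The sketch works on strings only; the real reader macro produces AllegroGraph resource and literal objects rather than strings.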

2.2.3. Adding Triples

A triple consists of a subject, predicate, and object. We refer to these three values as

symbols :s, :p, and :o when using the AllegroGraph APIs. We saw the use of literals

with the ! Lisp reader macro in the last section. If we need to refer to either a subject,

predicate, or object as a web URI then we use the function resource:

TRIPLE-STORE-USER(15): (resource "http://demo_news/12931")

!<http://demo_news/12931>

TRIPLE-STORE-USER(16): (defvar *demo-article*

(resource

"http://demo_news/12931"))

*demo-article*

TRIPLE-STORE-USER(17): *demo-article*

!<http://demo_news/12931>

The function add-triple takes three arguments for the subject, predicate, and object

in a triple:

TRIPLE-STORE-USER(18): (add-triple *demo-article*

!rdf:type

!kb:article)

1

TRIPLE-STORE-USER(19): (add-triple *demo-article*

!kb:containsPerson

!"Barack Obama")

2

We used a combination of a generated resource, two predicates defined in the rdf:

and kb: namespaces, and a string literal to define two triples. Notice that the

function add-triple returns an integer as its value: this is a unique ID for the newly

created triple.

2.2.4. Fetching Triples by ID

Triples in an AllegroGraph RDF store can be identified by a unique ID; this ID value

is returned as the value of calling add-triple and can be used to fetch a triple:

TRIPLE-STORE-USER(20): (get-triple-by-id 2)

<12931 containsPerson Barack Obama>

TRIPLE-STORE-USER(21): (defvar *triple*

(get-triple-by-id 2))

*triple*

TRIPLE-STORE-USER(22): *triple*

<12931 containsPerson Barack Obama>

We will seldom access triples by ID – we will see shortly how to query an RDF store

to find triples.

2.2.5. Printing Triples

The function print-triple prints a triple in the NTriple format by default; adding

the arguments :format :concise prints a short form of the triple instead:

TRIPLE-STORE-USER(23): (print-triple *triple*

:format :concise)

<4: http://demo_news/12931 kb:containsPerson

Barack Obama>

<12931 containsPerson Barack Obama>

TRIPLE-STORE-USER(24): (print-triple *triple*)

<http://demo_news/12931>

<http://knowledgebooks.com/rdfs#containsPerson>

"Barack Obama" .

<12931 containsPerson Barack Obama>

Function print-triple prints a triple to standard output and returns the triple value in

the short notation. We will see later in Section 2.2.6 how to create something like

a database cursor for iterating through multiple triples that we find by querying a

triple store. For now we will use the query function get-triples-list that returns all triples

matching a query in a list. The utility function print-triples prints all triples in a list:

TRIPLE-STORE-USER(27): (print-triples (list *triple*))

<http://demo_news/12931>

<http://knowledgebooks.com/rdfs#containsPerson>

"Barack Obama" .

TRIPLE-STORE-USER(28): (print-triples (get-triples-list))

<http://demo_news/12931>

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>

<http://knowledgebooks.com/rdfs#article> .

<http://demo_news/12931>

<http://knowledgebooks.com/rdfs#containsPerson>

"Barack Obama" .

When get-triples-list is called with no arguments it simply returns all triples in a data

store. We can specify query matching values for any combination of :s, :p, and :o.

We can look at all triples that have their subject equal to the resource we created for

the demo article:

TRIPLE-STORE-USER(31): (print-triples

(get-triples-list :s *demo-article*))

<http://demo_news/12931>

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>

<http://knowledgebooks.com/rdfs#article> .

<http://demo_news/12931>

<http://knowledgebooks.com/rdfs#containsPerson>

"Barack Obama" .

We can limit query results further; in this case we add the condition that the object

must equal the value of the type !kb:article:

12

2.2. Working with RDF Data Stores

TRIPLE-STORE-USER(33): (print-triples

(get-triples-list :s *demo-article*

:o !kb:article))

<http://demo_news/12931>

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>

<http://knowledgebooks.com/rdfs#article> .

I often need to manually reformat program example text and example program output

in this book. The last three lines in the last example would appear on a single line if

you are following along with these tutorial examples in a Lisp listener (as you should

be!). In any case, RDF triple data in the NTriple format that we are using here is

free-format: a triple is defined by three tokens (each with no embedded whitespace

unless inside a string literal) and ended with a period character.
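
The queries in this section treat any part that is not specified as a wildcard. As a minimal sketch of that matching rule, here is a toy version in plain Common Lisp. It is not the AllegroGraph implementation: *toy-store* and match-triples are hypothetical names, and the store is just a list of string triples copied from the running example.

```lisp
;; Plain Common Lisp sketch of wildcard matching on triples.
;; NOT the AllegroGraph API; *toy-store* and match-triples are
;; hypothetical names. Each triple is a (subject predicate object) list.
(defparameter *toy-store*
  '(("http://demo_news/12931"
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
     "http://knowledgebooks.com/rdfs#article")
    ("http://demo_news/12931"
     "http://knowledgebooks.com/rdfs#containsPerson"
     "Barack Obama")))

(defun match-triples (&key s p o)
  "Return the triples whose parts equal every supplied keyword value.
A part left as NIL acts as a wildcard."
  (flet ((ok (wanted actual)
           (or (null wanted) (string= wanted actual))))
    (remove-if-not
     (lambda (triple)
       (destructuring-bind (subj pred obj) triple
         (and (ok s subj) (ok p pred) (ok o obj))))
     *toy-store*)))

;; No arguments: every triple matches.
(length (match-triples))

;; Subject and object fixed, predicate left as a wildcard.
(match-triples :s "http://demo_news/12931"
               :o "http://knowledgebooks.com/rdfs#article")
```

This sketch scans the whole list for every query; a real triple store uses indices so that it does not have to examine every triple.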

2.2.6. Using Cursors to Iterate Through Query Results

You are probably familiar with relational databases, the SQL query language, and

client libraries that allow you to iterate through very large result sets. AllegroGraph

provides a cursor API for doing the same thing, as seen in this example:

TRIPLE-STORE-USER(39): (setq a-cursor (get-triples

:s

*demo-article*))

#<DB.AGRAPH::FILTERED-CURSOR

#<DB.AGRAPH::ROW-CURSOR

#<DB.AGRAPH::TRIPLE-RECORD-FILE @ #x113fd61a> ...

#x11672082>

@ #x1167219a>

TRIPLE-STORE-USER(40): (while (cursor-next-p a-cursor)

; cursor-next returns a vector, not a triple:

(print (cursor-next-row a-cursor)))

<12931 type article>

<12931 containsPerson Barack Obama>

NIL

TRIPLE-STORE-USER(41):

I usually find it simpler to use the get-triples-list API that returns a list of results. I

only use cursors when a query may return hundreds or thousands of results.
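
The idea behind a cursor, asking "is there another row?" and then fetching one row at a time, can be sketched in plain Common Lisp with closures. This is only an illustration and not the AllegroGraph cursor API: make-list-cursor and drain are hypothetical names, and the sketch walks an in-memory list, whereas a real cursor would typically fetch rows from the store on demand.

```lisp
;; Plain Common Lisp sketch of the cursor idea using closures.
;; NOT the AllegroGraph cursor API; make-list-cursor and drain
;; are hypothetical names.
(defun make-list-cursor (items)
  "Return two closures as multiple values: the first tests whether an
item remains, the second returns the next item and advances."
  (let ((remaining items))
    (values (lambda () (not (null remaining)))
            (lambda () (pop remaining)))))

(defun drain (items)
  "Visit ITEMS one at a time through a cursor and collect them."
  (multiple-value-bind (more-p next) (make-list-cursor items)
    (loop while (funcall more-p)
          collect (funcall next))))

(drain '(1 2 3))
;; => (1 2 3)
```

The caller only ever holds the current item, which is why the cursor style suits result sets that are too large to collect into a single list.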

2.2.7. Saving Triple Stores to Disk as XML, N-Triples, and N3

It is often useful to copy either all triples in a data store or triples matching a query to

a flat disk file in N-Triples format:

(with-open-file (output "/tmp/sample.ntriple"

:direction :output

:if-exists :supersede ; overwrite a file left by an earlier run

:if-does-not-exist :create)

(print-triples (get-triples-list)

:stream output :format :ntriple))

In this example, I did not use any query filtering when calling get-triples-list so the

entire contents of the data store are written to a local flat file. Note that in this last

example, everything gets read into memory; this could