I would like to thank my wife Carol Watson for copyediting this book.
[7] For your convenience, I include in the code ZIP file third-party libraries, most of which are released under MIT, BSD, Lisp LGPL, or Apache licenses.
[8] Downloading the free PDF from http://markwatson.com/opencontent does not give you the rights to this waiver.
1. Introduction
Franz has good online documentation [1] for all of their AllegroGraph products. One
purpose of this book is to provide a brief introduction to AllegroGraph, but I assume
that you will also refer to the documentation on the Franz web site. The broader purpose
of this book is to provide application programming examples using AllegroGraph and
Linked Data sources on the web. This book also covers some of my own open source
Common Lisp projects that you may find useful for Semantic Web applications. The
combination of interactive Lisp development with embedded AllegroGraph and my
utilities covered later should provide you with an agile development environment for
writing knowledge-based and Semantic Web applications.
AllegroGraph is an RDF data repository that can use RDFS and RDFS++ inferencing.
AllegroGraph also provides three non-standard extensions:
1. Text indexing and search
2. Geo Location support
3. Network traversal and search for social network applications
1.1. Who is this Book Written For?
I assume both that you already know how to program in Common Lisp and that
you write applications that require handling large amounts of unstructured information.
AllegroGraph is a powerful tool for handling large amounts of data and Lisp
programming environments are excellent for rapidly prototyping new applications.
Along with the extra libraries I have written for using linked data sources on the web, this
book will hopefully provide you with new tools to rapidly solve application problems
that would be more difficult to handle using relational databases.
Franz also provides support for embedding AllegroGraph in Lisp applications and
for using it in a client mode with external AllegroGraph servers. Since the APIs
are almost identical, I take a shortcut in writing this book and concentrate on using
AllegroGraph in embedded mode.
[1] http://franz.com/agraph/support/documentation/current/agraph-introduction.html
[Figure: a block diagram titled "Typical Semantic Web Application" with boxes labeled Information Sources (web sites, relational databases, document repositories), Data to RDF Filters, RDF Repository, RDF/RDFS/OWL APIs, and Application Program.]
Figure 1.1.: Example Semantic Web Application
There are many books, good tutorials, and software packages for the Semantic Web available on the
web. However, there is not a single reference for developers who want to use the
combination of Common Lisp and AllegroGraph for development using technologies
like RDF/RDFS/OWL modeling, description logic reasoners, and the SPARQL query
language.
If you own a Franz Lisp and AllegroGraph development license, then you are set to
go. If not, you need to download and install a copy of the free edition from:
http://www.franz.com/downloads/
You may also want to download and install the free versions of the AllegroGraph
standalone server, Gruff, and WebView. [2]
Franz Inc. has provided support for my writing this book in the form of technical
reviews. My understanding is that even though you will need to periodically refresh
your free non-commercial license, there is no inherent time limit for non-commercial
use. I would also like to thank Franz for providing me with an Enterprise developer's
license for my MacBook that I use for my own research and development projects.
[2] I do not use these associated products in this book but I do in the Java, Clojure, Scala, and JRuby edition of this book.
1.2. Why a PDF Copy of this Book is Available Free on the Web
As an author I want to earn a living writing and have many people read and enjoy my
books. By offering the print version of this book for sale I can earn some money for
my efforts, and by offering a free PDF I allow readers who cannot afford to buy many
books, or who may only be interested in a few chapters, to read it from my web site. If you support my future
writing projects by purchasing either the print or PDF version of this book, I thank
you by offering you more flexibility in the software license terms for the example
programs and libraries I developed (see Section 6 in the Preface).
Please note that I do not give permission to post the PDF version of this book on other
people’s web sites: I consider this to be at least indirectly commercial exploitation in
violation of the Creative Commons License that I have chosen for this book.
1.3. Book Software
You can download a large ZIP file containing all code and test data from the URL:
http://markwatson.com/opencontent/lisp_semantic_web_code.zip
The book example code, libraries, and applications are in subdirectories
organized by topic:
1. dbpedia - use the DBPedia web services
2. freebase client - use the Freebase web services
3. geonames - use the Geonames web service
4. knowledgebooks nlp - my natural language processing library
5. opencalais - use the OpenCalais web services
6. quick start allegrograph lisp embedded - code snippets used to introduce AllegroGraph
7. quick start allegrograph standalone server - code snippets for Chapter 2
8. rdf - additional code snippets for creating RDF triples and making queries
9. reasoning - code snippets for Chapter 8
10. sparql - code snippets and sample data for SPARQL queries
11. test data - miscellaneous test data files
12. utils - third party libraries [3] that I use for the book examples
13. web app - both backend code from Chapter 16 and the front end web
application code from Chapter 17
1.4. Why Graph Data Representations are Better
than the Relational Database Model for
Dealing with Rapidly Changing Data
Requirements
When people are first introduced to Semantic Web technologies their first reaction is
often something like, “I can just do that with a database.” The relational database
model is an efficient way to express and work with slowly changing data models.
There are some clever tools for dealing with data change requirements in the database
world (ActiveRecord and migrations being a good example) but it is awkward to have
end users and even developers tacking new data attributes on to relational database
tables.
A major theme in this book is convincing you that modeling data with RDF and
RDFS facilitates freely extending data models and also allows fairly easy integration
of data from different sources using different schemas without explicitly converting
data from one schema to another for reuse. You will learn how to use the SPARQL
query language to use information in different RDF repositories. It is also possible to
publish relational data with a SPARQL interface. [4]
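As a rough illustration of why triples make it easy to extend a data model, here is a minimal sketch in plain Common Lisp that does not use AllegroGraph at all. The variable *toy-triples* and the functions add-fact and facts-about are hypothetical names invented for this illustration. The only point is that a new attribute is just another triple, so no table definition has to be migrated.

```lisp
;; Toy model only: a "store" is a list of (subject predicate object) lists.
;; None of these names are part of the AllegroGraph API.
(defvar *toy-triples* '())

(defun add-fact (subject predicate object)
  "Add one triple to the toy store and return the new number of triples."
  (push (list subject predicate object) *toy-triples*)
  (length *toy-triples*))

(defun facts-about (subject)
  "Return all (predicate object) pairs recorded for SUBJECT, oldest first."
  (loop for (s p o) in (reverse *toy-triples*)
        when (equal s subject)
          collect (list p o)))

;; Initial data model: a person has a name.
(add-fact "person:1" "name" "Mark")

;; A new requirement arrives later: people can also have a home page.
;; Nothing about the existing data changes; we just add another triple.
(add-fact "person:1" "homepage" "http://markwatson.com")

(facts-about "person:1")
```

A real RDF store adds indexing, persistence, and namespaces on top of this idea, but the data model stays this open-ended.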
1.5. What if You Use Programming
Languages Other Than Lisp?
If you are a Java programmer, you probably still want to learn about AllegroGraph
because Franz distributes a free Java version of AllegroGraph that can be used for any
purpose (including commercial applications); the free Java version is limited to 50
million RDF triples. The Java version is a natively compiled Franz Lisp application
that provides plain socket and HTTP/REST interfaces.
[3] cl-json, s-xml, split-sequence, usocket, trivial-gray-streams, flexi-streams, chunga, cl-base64, puri, drakma, and cl-geonames
[4] The open source D2R project provides a wrapper for relational databases that provides a SPARQL query interface.
If you do most of your development in other languages like Ruby and Python then
you can run the free server edition using the HTTP/Sesame client protocol. Sesame
is a high quality “batteries included” Java library for Semantic Web development; the
Sesame client protocol is well documented and simple to use but will not be covered
here. If you use the Sesame protocol then you have the flexibility of using both
Franz’s free server edition of AllegroGraph and Sesame, which is open source with a
BSD style license.
2. AllegroGraph Embedded Lisp
Quick Start
The first section of this book will cover Semantic Web technologies from a theoretical
and reference point of view. Since I want you to follow along with the book material
as I present it, this chapter is intended to get you comfortable using Lisp and embedded
AllegroGraph: it will be easier to work through the theory in Chapters 3, 4, and 6
if you understand the basics of AllegroGraph. After this more detailed look at some
theory we will dig deeper into AllegroGraph development techniques in Chapters 7,
8, and 9.
2.1. Starting AllegroGraph
In this chapter and in much of this book, you can save some effort by copying and
pasting the code snippets into the Lisp listener. The code snippets used in this chapter
are contained in the source file quick start lisp embedded.lisp. I assume that
most readers are trying AllegroGraph using the free non-commercial use version, so
that is what I will use here. If you are using a commercially licensed version the
examples will work the same, but the initial banner displayed by alisp (the conventional
case-insensitive Lisp shell) and mlisp (the “modern” case-sensitive Lisp shell) will be
slightly different. While I usually use alisp in my work (I have been using Lisp for
professional development since 1982), Franz recommends using mlisp for AllegroGraph
development, so we will use mlisp in this book. You will need to follow the
directions in acl81 express/readme.txt to build an mlisp image to use. When showing
interactive examples in this chapter I remove some Lisp shell messages, so when you
work along with these examples expect to see more output than what is shown here: [1]
markw$ mlisp
International Allegro CL Free Express Edition
8.2 [Mac OS X (Intel)] (Jul 9, 2009 17:15)
Copyright (C) 1985-2007, Franz Inc., Oakland, CA, USA.
All Rights Reserved.
[1] I use OS X and Linux for my development. If you are a Windows user, follow the installation instructions on the AllegroGraph download web page and expect to see slight differences from the interactive example sessions that I use in this book.
This development copy of Allegro CL is licensed to:
Trial User
;; Current reader case mode: :case-sensitive-lower
cl-user(1): (require :agraph)
AllegroGraph Lisp Edition 3.2 [built on March 16, 2009 15:05:15 GMT-0700]
t
cl-user(2): (in-package :db.agraph.user)
#<The db.agraph.user package>
TRIPLE-STORE-USER(3):
Please note that you will see many lines of output that I did not show. Here I
required the :agraph module and changed the current Common Lisp package to
db.agraph.user. In examples later in this book, when we develop complete application
examples, we will be using our own application-specific packages and I will show
you then what you need in general to import from db.agraph and db.agraph.user.
We will continue this interactive example Lisp session in the following sections.
I use interactive sessions in a command window for the examples in this book. If you
are a Windows user then you may want to try the Windows-specific
IDE instead. I recommend that OS X, Linux, and Windows users use Emacs to develop Lisp
code. [2]
If you run Franz Lisp in a terminal shell then I recommend that you start it using
rlwrap. As an example, using OS X and Linux, I create an alias like:
alias lisp='rlwrap alisp'
Using rlwrap lets you use the up arrow key to rerun previous commands, edit previous
commands, etc.
2.2. Working with RDF Data Stores
RDF data stores provide the services for storing RDF triple data and provide some
means of making queries to identify some subset of the triples in the store. It is
important to keep in mind that the mechanism for maintaining triple stores varies in
different implementations. Triples can be stored in memory, in disk-based B-tree stores
like BerkeleyDB, in relational databases, and in custom stores like AllegroGraph.
[2] Franz provides their own Emacs tools: look for instructions for installing ELI. However, I also use the SLIME Emacs Lisp development tools that are compatible with all versions of Lisp that I use: Franz, SBCL, ClozureCL, and Gambit-C Scheme. Franz provides SLIME installation instructions for Franz Common Lisp.
While much of this book is specific to Common Lisp and AllegroGraph, the concepts
that you will learn and experiment with can be useful if you also use other languages
and platforms like Java (Sesame, Jena, OwlAPIs, etc.), Ruby (Redland RDF), etc.
For Java developers Franz offers a Java version of AllegroGraph (implemented in
Lisp with a network interface that also supports Python and Ruby clients) that I cover
in the Java edition of this book.
2.2.1. Creating Repositories
AllegroGraph uses disk-based RDF storage with automatic in-memory caching. For
the examples in this book I will assume that all RDF stores are kept in the temporary
directory /tmp. For deployed systems you will clearly want to use a permanent location.
For Windows(tm) development you can either change this location or create a
new directory c:\tmp. In the examples in this book, I assume a Mac OS X, Linux,
or other Unix type file system:
TRIPLE-STORE-USER(3): (create-triple-store
"/tmp/rdfstore_1")
#<db.agraph::triple-db /tmp/rdfstore_1, open @ #x109682>
I hope that you are following along with this running example: you will better
understand this material if you type it into a Lisp shell.
While it is possible to work with multiple repositories simultaneously (and this is
well documented in Franz’s online documentation for the non-free versions of
AllegroGraph), all of the tutorials, examples, and sample applications in this book
need just a single open repository in order to be compatible with the free versions of
AllegroGraph.
We will see in Chapter 3 how to partition RDF triples into different namespaces and
to use existing RDF data and schemas in different namespaces. In the following code
snippet I introduce the AllegroGraph APIs for defining new namespaces and listing
all namespaces defined in the current repository:
TRIPLE-STORE-USER(4): (register-namespace "kb"
"http://knowledgebooks.com/rdfs#")
"http://knowledgebooks.com/rdfs#"
TRIPLE-STORE-USER(5): (display-namespaces)
rdfs => http://www.w3.org/2000/01/rdf-schema#
err => http://www.w3.org/2005/xqt-errors#
fn => http://www.w3.org/2005/xpath-functions#
rdf => http://www.w3.org/1999/02/22-rdf-syntax-ns#
xs => http://www.w3.org/2001/XMLSchema#
xsd => http://www.w3.org/2001/XMLSchema#
owl => http://www.w3.org/2002/07/owl#
kb => http://knowledgebooks.com/rdfs#
Here I created a new namespace that has an abbreviation (or nickname) kb: and
then printed out all registered namespaces. To ensure data integrity, be sure to call
(close-triple-store) to close an RDF triple store when you are done with it. I leave
the connection open because we will continue to use it in this chapter.
2.2.2. AllegroGraph Lisp Reader Support for RDF
In general, the subject, predicate, and object parts of a triple are entered as either URIs
or literals. (Strictly, the RDF specification permits literals only in the object position.)
AllegroGraph provides a Lisp reader macro ! that makes it easier to enter URIs and
literals. For example, the following two URIs are functionally equivalent given the
(register-namespace "kb" ...) in the last section:
<http://knowledgebooks.com/rdfs#containsPerson>
!kb:containsPerson
String literals are also defined using the ! reader macro; for example:
!"Barack Obama"
!"101 Main Street"
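To make the idea of namespace abbreviations concrete, here is a minimal sketch in plain Common Lisp of the kind of expansion that turns kb:containsPerson into a full URI. This is not how AllegroGraph implements the ! reader macro; the variable *toy-namespaces* and the function expand-qname are hypothetical names used only for this illustration.

```lisp
;; Toy model only: these names are not part of the AllegroGraph API.
(defvar *toy-namespaces*
  '(("rdf" . "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
    ("kb"  . "http://knowledgebooks.com/rdfs#"))
  "Association list mapping namespace abbreviations to URI prefixes.")

(defun expand-qname (qname)
  "Expand a string like \"kb:containsPerson\" into a full URI string.
Signals an error if the abbreviation is not registered."
  (let* ((colon (position #\: qname))
         (prefix (and colon (subseq qname 0 colon)))
         (entry (and prefix
                     (assoc prefix *toy-namespaces* :test #'string=))))
    (unless entry
      (error "Unknown namespace abbreviation in ~S" qname))
    ;; Glue the registered URI prefix onto the local name after the colon.
    (concatenate 'string (cdr entry) (subseq qname (1+ colon)))))

(expand-qname "kb:containsPerson")
;; => "http://knowledgebooks.com/rdfs#containsPerson"
```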
2.2.3. Adding Triples
A triple consists of a subject, predicate, and object. We refer to these three values as
symbols :s, :p, and :o when using the AllegroGraph APIs. We saw the use of literals
with the ! Lisp reader macro in the last section. If we need to refer to either a subject,
predicate, or object as a web URI then we use the function resource:
TRIPLE-STORE-USER(15): (resource "http://demo_news/12931")
!<http://demo_news/12931>
TRIPLE-STORE-USER(16): (defvar *demo-article*
(resource
"http://demo_news/12931"))
*demo-article*
TRIPLE-STORE-USER(17): *demo-article*
!<http://demo_news/12931>
The function add-triple takes three arguments for the subject, predicate, and object
in a triple:
TRIPLE-STORE-USER(18): (add-triple *demo-article*
!rdf:type
!kb:article)
1
TRIPLE-STORE-USER(19): (add-triple *demo-article*
!kb:containsPerson
!"Barack Obama")
2
We used a combination of a generated resource, two predicates defined in the rdf:
and kb: namespaces, the resource !kb:article, and a string literal to define two triples.
Notice that the function add-triple returns an integer as its value: this is a unique ID
for the newly created triple.
2.2.4. Fetching Triples by ID
Triples in an AllegroGraph RDF store can be identified by a unique ID; this ID value
is returned as the value of calling add-triple and can be used to fetch a triple:
TRIPLE-STORE-USER(20): (get-triple-by-id 2)
<12931 containsPerson Barack Obama>
TRIPLE-STORE-USER(21): (defvar *triple*
(get-triple-by-id 2))
*triple*
TRIPLE-STORE-USER(22): *triple*
<12931 containsPerson Barack Obama>
We will seldom access triples by ID; we will see shortly how to query an RDF store
to find triples.
2.2.5. Printing Triples
The function print-triple prints a triple in the N-Triples format by default; by adding
the arguments :format :concise we can instead print a short form of the triple:
TRIPLE-STORE-USER(23): (print-triple *triple*
:format :concise)
<4: http://demo_news/12931 kb:containsPerson
Barack Obama>
<12931 containsPerson Barack Obama>
TRIPLE-STORE-USER(24): (print-triple *triple*)
<http://demo_news/12931>
<http://knowledgebooks.com/rdfs#containsPerson>
"Barack Obama" .
<12931 containsPerson Barack Obama>
Function print-triple prints a triple to standard output and returns the triple value in
the short notation. We will see later in Section 2.2.6 how to create something like
a database cursor for iterating through multiple triples that we find by querying a
triple store. For now we will use the query function get-triples-list, which returns all triples
matching a query in a list. The utility function print-triples prints all triples in a list:
TRIPLE-STORE-USER(27): (print-triples (list *triple*))
<http://demo_news/12931>
<http://knowledgebooks.com/rdfs#containsPerson>
"Barack Obama" .
TRIPLE-STORE-USER(28): (print-triples (get-triples-list))
<http://demo_news/12931>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://knowledgebooks.com/rdfs#article> .
<http://demo_news/12931>
<http://knowledgebooks.com/rdfs#containsPerson>
"Barack Obama" .
When get-triples-list is called with no arguments it simply returns all triples in a data
store. We can specify query matching values for any combination of :s, :p, and :o.
We can look at all triples that have their subject equal to the resource we created for
the demo article:
TRIPLE-STORE-USER(31): (print-triples
(get-triples-list :s *demo-article*))
<http://demo_news/12931>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://knowledgebooks.com/rdfs#article> .
<http://demo_news/12931>
<http://knowledgebooks.com/rdfs#containsPerson>
"Barack Obama" .
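The matching behavior described above can be pictured with a small model in plain Common Lisp. This is only a sketch of the idea, not AllegroGraph's implementation: the variable *toy-store* and the function toy-get-triples-list are hypothetical names, and the toy store holds plain strings rather than real RDF parts. Any part that is not supplied acts as a wildcard.

```lisp
;; Toy model only: not the AllegroGraph implementation or API.
(defvar *toy-store*
  '(("http://demo_news/12931" "rdf:type" "kb:article")
    ("http://demo_news/12931" "kb:containsPerson" "Barack Obama"))
  "A list of (subject predicate object) triples.")

(defun toy-get-triples-list (&key s p o)
  "Return the triples in *TOY-STORE* that match every supplied part.
A part that is omitted (NIL) acts as a wildcard."
  (remove-if-not
   (lambda (triple)
     (destructuring-bind (subject predicate object) triple
       (and (or (null s) (equal s subject))
            (or (null p) (equal p predicate))
            (or (null o) (equal o object)))))
   *toy-store*))

;; No arguments: every triple matches.
(toy-get-triples-list)

;; Only the predicate is constrained; subject and object are wildcards.
(toy-get-triples-list :p "kb:containsPerson")
```

A real triple store answers the same kind of pattern using indices rather than a linear scan.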
We can limit query results further; in this case we add the condition that the object
must equal the value of the type !kb:article:
TRIPLE-STORE-USER(33): (print-triples
(get-triples-list :s *demo-article*
:o !kb:article))
<http://demo_news/12931>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://knowledgebooks.com/rdfs#article> .
I often need to manually reformat program example text and example program output
in this book. The last three lines in the last example would appear on a single line if
you are following along with these tutorial examples in a Lisp listener (as you should
be!). In any case, RDF triple data in the N-Triples format that we are using here is
free-format: a triple is defined by three tokens (each with no embedded whitespace
unless inside a string literal) and is ended with a period character.
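The shape of an N-Triples statement can be sketched in a few lines of plain Common Lisp. This is an illustration under simplifying assumptions, not AllegroGraph's serializer: a term is represented as a hypothetical two-element list, (:uri "...") or (:literal "..."), and only the double quote and backslash characters are escaped inside literals.

```lisp
;; Toy model only: a term is (:uri "...") or (:literal "...").
;; All function names here are invented for this sketch.
(defun escape-literal (string)
  "Backslash-escape the double quote and backslash characters in STRING."
  (with-output-to-string (out)
    (loop for ch across string
          do (when (member ch '(#\" #\\))
               (write-char #\\ out))
             (write-char ch out))))

(defun term->ntriples (term)
  "Render one term: URIs in angle brackets, literals in double quotes."
  (destructuring-bind (kind value) term
    (ecase kind
      (:uri (format nil "<~A>" value))
      (:literal (format nil "\"~A\"" (escape-literal value))))))

(defun triple->ntriples (subject predicate object)
  "Render a triple as one N-Triples statement ending with a period."
  (format nil "~A ~A ~A ."
          (term->ntriples subject)
          (term->ntriples predicate)
          (term->ntriples object)))

(triple->ntriples '(:uri "http://demo_news/12931")
                  '(:uri "http://knowledgebooks.com/rdfs#containsPerson")
                  '(:literal "Barack Obama"))
```

The result is a single line with three tokens and a closing period, the same statement that the listener examples above show wrapped over several lines.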
2.2.6. Using Cursors to Iterate Through Query Results
You are probably familiar with relational databases, the SQL query language, and
client libraries that allow you to iterate through very large result sets. AllegroGraph
provides a cursor API for doing the same thing, as seen in this example:
TRIPLE-STORE-USER(39): (setq a-cursor (get-triples
:s
*demo-article*))
#<DB.AGRAPH::FILTERED-CURSOR
#<DB.AGRAPH::ROW-CURSOR
#<DB.AGRAPH::TRIPLE-RECORD-FILE @ #x113fd61a> ...
#x11672082>
@ #x1167219a>
TRIPLE-STORE-USER(40): (while (cursor-next-p a-cursor)
; cursor-next returns a vector, not a triple:
(print (cursor-next-row a-cursor)))
<12931 type article>
<12931 containsPerson Barack Obama>
NIL
TRIPLE-STORE-USER(41):
I usually find it simpler to use the get-triples-list API that returns a list of results. I
only use cursors when a query may return hundreds or thousands of results.
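The difference between a list of results and a cursor can be sketched in plain Common Lisp. In this toy model a cursor is just a pair of closures over a list, and the names make-list-cursor, toy-cursor-next-p, toy-cursor-next-row, and collect-rows are invented for the illustration. A real cursor would fetch rows from the store on demand instead of holding them all in memory, which is what makes it attractive for large result sets.

```lisp
;; Toy model only: a "cursor" here is a cons of two closures over a list.
;; None of these names are part of the AllegroGraph API.
(defun make-list-cursor (rows)
  "Return a cursor object over ROWS."
  (let ((remaining rows))
    (cons (lambda () (not (null remaining)))  ; is there another row?
          (lambda () (pop remaining)))))      ; return it and advance

(defun toy-cursor-next-p (cursor)
  "True if CURSOR has at least one more row."
  (funcall (car cursor)))

(defun toy-cursor-next-row (cursor)
  "Return the next row of CURSOR and advance past it."
  (funcall (cdr cursor)))

(defun collect-rows (cursor)
  "Drain CURSOR one row at a time, returning the rows in order."
  (loop while (toy-cursor-next-p cursor)
        collect (toy-cursor-next-row cursor)))

(collect-rows (make-list-cursor '(1 2 3)))
;; => (1 2 3)
```

The caller only ever asks two questions, "is there another row?" and "give me the next row", which is the same protocol the cursor-next-p and cursor-next-row calls above follow.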
2.2.7. Saving Triple Stores to Disk as XML, N-Triples, and
N3
It is often useful to copy either all triples in a data store or triples matching a query to
a flat disk file in N-Triples format:
(with-open-file (output "/tmp/sample.ntriple"
                        :direction :output
                        ;; replace the file if it is left over from an earlier run
                        :if-exists :supersede
                        :if-does-not-exist :create)
  (print-triples (get-triples-list)
                 :stream output :format :ntriple))
In this example, I did not use any query filtering when calling get-triples-list so the
entire contents of the data store is written to a local flat file. Note that in this last
example, everything gets read into memory; this could