3Uniform Resource Identifiers (URIs) are special in the sense that they (are supposed to) represent unique things or ideas. As we will see in Chapter 10, URIs can also be “dereferenceable” in that we can treat them as URLs on the web and “follow” them using HTTP to get additional information about a URI.
4We will model classes (or types) using RDFS and OWL but the difference is that an object in an OO
language is explicitly declared to be a member of a class while a subject URI is considered to be in a class depending only on what properties it has. If we add a property and value to a subject URI then we may immediately change its RDFS or OWL class membership.
5I think that there is some similarity between modeling with RDF and document oriented data stores like MongoDB or CouchDB where any document in the system can have any attribute added at any time.
This is very similar to being able to add additional RDF statements that either add information about a subject URI or add another property and value that somehow narrows the “meaning” of a subject URI.
3. RDF
http://knowledgebooks.com/ontology/#containsPerson
The first part of this URI is considered to be the namespace6 for (what we will
use as a predicate) “containsPerson.” Once we associate an abbreviation like kb
for http://knowledgebooks.com/ontology/ then we can just use the QName (“qualified
name”) with the namespace abbreviation; for example:
kb:containsPerson
Being able to define abbreviation prefixes for namespaces makes RDF and RDFS
files shorter and easier to read.
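As a minimal sketch of how a prefix abbreviation expands back into a full URI, here is a small Python example. The names PREFIXES and expand_qname are made up for this illustration and are not part of any RDF library; the kb prefix and its namespace URI are the ones used in the text above.

```python
# Hypothetical prefix table for illustration; "kb" is the abbreviation from the text.
PREFIXES = {
    "kb": "http://knowledgebooks.com/ontology/",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand a prefixed name like 'kb:containsPerson' into a full URI."""
    prefix, sep, local_name = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in %r" % qname)
    # A QName is just the namespace URI with the local name appended.
    return prefixes[prefix] + local_name


print(expand_qname("kb:containsPerson"))
```

Real RDF parsers do this expansion for you when they read @prefix declarations; the sketch only shows that a QName is a shorthand and carries no extra meaning.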
When different RDF triples use this same predicate, this is some assurance to us that
all users of this predicate subscribe to the same meaning. Furthermore, we will see
in Section 4.1 that we can use RDFS to state equivalency between this predicate (in
the namespace http://knowledgebooks.com/ontology/) with predicates represented by
different URIs used in other data sources. In an “artificial intelligence” sense, soft-
ware that we write does not understand a predicate like “containsPerson” in the way
that a human reader can by combining understood common meanings for the words
“contains” and “person” but for many interesting and useful types of applications that
is fine as long as the predicate is used consistently.
Because there are many sources of information about different resources, the ability
to define different namespaces and associate them with unique URI prefixes makes it
easier to keep identifiers from different sources from colliding.
A statement in N-Triple format consists of three URIs (or string literals – any
combination) followed by a period to end the statement. While statements are often
written one per line in a source file they can be broken across lines; it is the ending
period which marks the end of a statement. The standard file extension for N-Triple
format files is *.nt and the standard file extension for N3 format files is *.n3.
My preference is to use N-Triple format files as output from programs that I write to
save data as RDF. I often use either command line tools or the Java Sesame library to
convert N-Triple files to N3 if I will be reading them or even hand editing them. You
will see why I prefer the N3 format when we look at an example:
@prefix kb: <http://knowledgebooks.com/ontology#> .
<http://news.com/201234/> kb:containsCountry "China" .
6You have seen me use the domain knowledgebooks.com several times in examples. I have owned this
domain and used it for business since 1998 and I use it here for convenience. I could just as well
use example.com. That said, the advantage of using my own domain is that I then have the flexibility to make this URI “dereferenceable” by adding an HTML document using this URI as a URL that
describes what I mean by “containsPerson.” Even better, I could have my web server look at the request header and return RDF data if the requested content type was an RDF media type such as “application/rdf+xml.”
3.1. RDF Examples in N-Triple and N3 Formats
Here we see the use of an abbreviation prefix “kb:” for the namespace for my com-
pany KnowledgeBooks.com ontologies. The first term in the RDF statement (the
subject) is the URI of a news article. When we want to use a URL as a URI, we
enclose it in angle brackets – as in this example. The second term (the predicate) is
“containsCountry” in the “kb:” namespace. The last item in the statement (the object)
is a string literal “China.” I would describe this RDF statement in English as, “The
news article at URI http://news.com/201234 mentions the country China.”
This was a very simple N3 example which we will expand to show additional features
of the N3 notation. As another example, suppose that this news article also mentions
the USA. Instead of adding a whole new statement like this:
@prefix kb: <http://knowledgebooks.com/ontology#> .
<http://news.com/201234/> kb:containsCountry "China" .
<http://news.com/201234/> kb:containsCountry "USA" .
we can combine them using N3 notation. N3 allows us to collapse multiple RDF
statements that share the same subject and optionally the same predicate:
@prefix kb: <http://knowledgebooks.com/ontology#> .
<http://news.com/201234/> kb:containsCountry "China" ,
                                             "USA" .
We can also add in additional predicates that use the same subject:
@prefix kb: <http://knowledgebooks.com/ontology#> .
<http://news.com/201234/> kb:containsCountry "China" ,
                                             "USA" ;
    kb:containsOrganization "United Nations" ;
    kb:containsPerson "Ban Ki-moon" , "Gordon Brown" ,
                      "Hu Jintao" , "George W. Bush" ,
                      "Pervez Musharraf" ,
                      "Vladimir Putin" ,
                      "Mahmoud Ahmadinejad" .
This single N3 statement represents ten individual RDF triples. Within the statement,
objects that share the same subject and predicate are separated by commas, groups
that share only the subject are separated by semicolons, and the whole statement ends
with a period. Please note that whatever RDF storage system we use (we will be using
AllegroGraph) it makes no difference if we load RDF as XML, N-Triple, or N3 format
files: internally subject, predicate, and object triples are stored in the same way and
are used in the same way.
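To make the count of ten triples concrete, here is a rough Python sketch that expands the grouped form back into individual statements. The function name expand and the dictionary layout are assumptions made for this illustration only; this is not how an N3 parser is actually implemented.

```python
KB = "http://knowledgebooks.com/ontology#"


def expand(subject, predicate_objects):
    """Yield one (subject, predicate, object) triple per object value."""
    for predicate, objects in predicate_objects.items():
        for obj in objects:
            yield (subject, predicate, obj)


article = "http://news.com/201234/"
# One subject, three predicates, each with a list of string-literal objects.
grouped = {
    KB + "containsCountry": ["China", "USA"],
    KB + "containsOrganization": ["United Nations"],
    KB + "containsPerson": [
        "Ban Ki-moon", "Gordon Brown", "Hu Jintao", "George W. Bush",
        "Pervez Musharraf", "Vladimir Putin", "Mahmoud Ahmadinejad",
    ],
}

triples = list(expand(article, grouped))
for s, p, o in triples:
    # N-Triple style: full URIs in angle brackets, literals in double quotes.
    print('<%s> <%s> "%s" .' % (s, p, o))
print(len(triples), "triples")
```

Each printed line is one statement in N-Triple style, which is why the N-Triple format is convenient as program output while N3 is easier on human readers.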
I promised you that the data in RDF data stores was easy to extend. As an example,
let us assume that we have written software that is able to read online news articles
and create RDF data that captures some of the semantics in the articles. If we extend
our program to also recognize dates when the articles are published, we can simply
reprocess articles and for each article add a triple to our RDF data store using the
N-Triple format to set a publication date7.
<http://news.com/2034/> kb:datePublished "2008-05-11" .
Furthermore, if we do not have dates for all news articles that is often acceptable
depending on the application.
3.2. The RDF Namespace
You just saw an example of using namespaces when I used my own namespace
<http://knowledgebooks.com/ontology#>.
When you define a namespace you can assign any “qualified name” prefix (QName, or
abbreviation) to the URI that uniquely identifies the namespace if you are using the
N3 format.
The RDF namespace <http://www.w3.org/1999/02/22-rdf-syntax-ns#> is usually reg-
istered with the QName rdf: and I will use this convention. The next few sections
show the definitions in the RDF namespace that I use in this book.
3.2.1. rdf:type
The rdf:type property is used to specify the type (or class) of a resource. Notice that
we do not capitalize “type” because by convention we do not capitalize RDF property
names. Here is an example in N3 format (with long lines split to fit the page width):
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix kb: <http://knowledgebooks.com/rdf/publication#> .
<http://demo_news/12931> rdf:type kb:article .
7This example is simplified: since we can apply XML Schema (XSD) data types to literal string values, the date could be more accurately specified as "2008-05-11"^^<http://www.w3.org/2001/XMLSchema#date>
Here we are converting the URL of a news web page to a resource and then defining
a new triple that specifies the web page resource is of type kb:article (again, using the
QName kb: for my knowledgebooks.com namespace).
3.2.2. rdf:Property
The rdf:Property class is, as you might guess from its name, used to describe and
define properties. Notice that “Property” is capitalized because by convention we
capitalize RDF class names.
This is a good place to show how we define new properties, using a previous example:
@prefix kbcontains: <http://knowledgebooks.com/rdf/contains#> .
<http://demo_news/12931> kbcontains:person "Barack Obama" .
I might make an additional statement about this URI stating that it is a property:
kbcontains:person rdf:type rdf:Property .
When we discuss RDF Schema (RDFS) in Chapter 4 we will see how to create sub-
types and sub-properties.
3.3. Dereferenceable URIs
We have been using URIs as unique identifiers representing either physical objects
(e.g., the moon), locations (e.g., London, England), ideas or concepts (e.g.,
Christianity), etc. Additionally, a URI is dereferenceable if we can follow the URI with a web
browser or software agent to fetch information from the URI. As an example, we
often use the URI
http://xmlns.com/foaf/0.1/Person
to represent the concept of a person. This URI is dereferenceable because if we use
a tool like wget or curl to fetch the content from this URI then we get an HTML
document for the FOAF Vocabulary Specification. Dereferenceable content could
also be an RDFS or OWL document describing the URI, a text document, etc.
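Following a URI from software can be sketched with nothing more than the Python standard library. The helper names build_request and dereference are my own for this illustration, and whether a given server returns HTML or RDF for a particular Accept header depends entirely on how that server is configured, so treat the RDF media type below as an assumption to experiment with.

```python
import urllib.request


def build_request(uri, accept="text/html"):
    """Build (but do not send) an HTTP GET request for a URI."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def dereference(uri, accept="text/html", timeout=10):
    """Fetch the URI and return (content type, body bytes). Needs network access."""
    with urllib.request.urlopen(build_request(uri, accept), timeout=timeout) as resp:
        return resp.headers.get("Content-Type"), resp.read()


# Only the request is built here, so this part runs without a network connection.
req = build_request("http://xmlns.com/foaf/0.1/Person", accept="application/rdf+xml")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling dereference on the same URI performs the actual fetch, much as wget or curl would; comparing the returned content type for different Accept values is a quick way to see whether a server does content negotiation.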
3.4. RDF Wrap Up
If you read the World Wide Web Consortium’s RDF Primer (highly recommended)
at http://www.w3.org/TR/REC-rdf-syntax/ you will see many other classes and prop-
erties defined that, in my opinion, are often most useful when dealing with XML
serialization of RDF. Using the N-Triple and N3 formats, I find that I usually just use
rdf:type and rdf:Property in my own modeling efforts, along with a few identifiers
defined in the RDFS namespace that we will look at in the next chapter.
An RDF triple has three parts: a subject, predicate, and object.8 By itself, RDF
is good for storing and accessing data but lacks functionality for modeling classes,
defining properties, etc. We will extend RDF with RDF Schema (RDFS) in the next
chapter.
8AllegroGraph also stores a unique integer triple ID and a graph ID for partitioning RDF data and to support graph operations. While using the triple ID and graph ID can be useful, my own preference is to stick with using just what is in the RDF standard.
4. RDFS
The World Wide Web Consortium RDF Schema (RDFS) definition can be read at
http://www.w3.org/TR/rdf-schema/ and I recommend that you use this as a reference
because I will discuss only the parts of RDFS that are required for implementing
the examples in this book. The RDFS namespace
http://www.w3.org/2000/01/rdf-schema# is usually registered with the QName rdfs:
and I will use this convention1.
4.1. Extending RDF with RDF Schema
RDFS supports the definition of classes and properties based on set inclusion. In
RDFS classes and properties are orthogonal. We will not simply be using properties
to define data attributes for classes – this is different than object modeling and object
oriented programming languages like Java. RDFS is encoded as RDF – the same
syntax.
Because the Semantic Web is intended to be processed automatically by software sys-
tems it is encoded as RDF. There is a problem that must be solved in implementing
and using the Semantic Web: everyone who publishes Semantic Web data is free to
create their own RDF schemas for storing data; for example, there is usually no single
standard RDF schema definition for topics like news stories, stock market data,
people’s names, organizations, etc. Understanding the difficulty of integrating different
data sources in different formats helps to understand the design decisions behind the
Semantic Web: the designers wanted to make it not only possible but also easy to
use data from different sources that might use similar but not identical schemas to
define properties and classes. One common usage pattern is using RDFS to declare
that two properties that both define a person’s last name have the same meaning, so
that we can combine data that use different schemas.
We will start with an example that also uses RDFS and is an extension of the example
in the last section. After defining kb: and rdfs: namespace QNames, we add a few
additional RDF statements (that are RDFS):
@prefix kb: <http://knowledgebooks.com/ontology#> .
1The actual namespace abbreviations that you use have no effect as long as you consistently use whatever QName values you set for URIs in the RDF statements that use the abbreviations.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
kb:containsCity rdfs:subPropertyOf kb:containsPlace .
kb:containsCountry rdfs:subPropertyOf kb:containsPlace .
kb:containsState rdfs:subPropertyOf kb:containsPlace .
The last three lines, which are themselves valid RDF triples, declare that:
• The property containsCity is a subproperty of containsPlace.
• The property containsCountry is a subproperty of containsPlace.
• The property containsState is a subproperty of containsPlace.
Why is this useful? For at least two reasons:
• You can query an RDF data store for all triples that use property containsPlace
and also match triples with property equal to containsCity, containsCountry, or
containsState. There may not even be any triples that explicitly use the property
containsPlace.
• Consider a hypothetical case where you are using two different RDF data stores
that use different properties for naming cities: “cityName” and “city.” You
can define “cityName” to be a subproperty of “city” and then write all queries
against the single property name “city.” This removes the necessity to convert
data from different sources to use the same Schema.
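The first point can be sketched in a few lines of Python. This is a toy in-memory illustration, not how AllegroGraph or Sesame implement RDFS inference; the names SUB_PROPERTY_OF, TRIPLES, is_subproperty, and query are made up for the example, the "Beijing" triple is invented sample data, and each property is assumed to have at most one direct super-property to keep the code short.

```python
# subPropertyOf declarations from the example above (child -> parent).
SUB_PROPERTY_OF = {
    "kb:containsCity": "kb:containsPlace",
    "kb:containsCountry": "kb:containsPlace",
    "kb:containsState": "kb:containsPlace",
}

# Sample data; no triple uses kb:containsPlace directly.
TRIPLES = [
    ("<http://news.com/201234/>", "kb:containsCountry", "China"),
    ("<http://news.com/201234/>", "kb:containsCity", "Beijing"),
    ("<http://news.com/201234/>", "kb:containsPerson", "Hu Jintao"),
]


def is_subproperty(prop, target):
    """True if prop equals target or reaches it by following subPropertyOf links."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == target:
            return True
        seen.add(prop)  # guards against accidental cycles in the declarations
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def query(predicate):
    """Return triples whose predicate is the given property or one of its subproperties."""
    return [t for t in TRIPLES if is_subproperty(t[1], predicate)]


matches = query("kb:containsPlace")
for triple in matches:
    print(triple)
```

The query for kb:containsPlace returns the city and country triples even though neither mentions kb:containsPlace, while the kb:containsPerson triple is left out.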
In addition to providing a vocabulary for describing properties and class membership
by properties, RDFS is also used for logical inference to infer new triples, combine
data from different RDF data sources, and to allow effective querying of RDF data
stores. We will see examples of more RDFS features in Chapter 5 when we perform
SPARQL queries.
4.2. Modeling with RDFS
While RDFS is not as expressive a modeling language as RDFS++2 or OWL, the
combination of RDF and RDFS is adequate for many semantic web applications.
Reasoning with and using more expressive modeling languages requires increasingly
more processing time. Given this cost and the simplicity of RDF and RDFS, it is a
good idea to start with a less expressive language and only “move up the expressivity
scale” as needed.
2RDFS++ is a Franz extension to RDFS that adds some parts of OWL. I cover RDFS++ in some detail in
the Lisp Edition of this book and mention some aspects of RDFS++ in Section 4.3, the Java Edition.
Here is a short example on using RDFS to extend RDF (assume that my namespace
kb: and the RDFS namespace rdfs: are defined):
kb:Person rdf:type rdfs:Class .
kb:Person rdfs:comment "represents a human" .
kb:Manager rdf:type rdfs:Class .
kb:Manager rdfs:subClassOf kb:Person .
kb:Engineer rdf:type rdfs:Class .
kb:Engineer rdfs:subClassOf kb:Person .
Here we see rdfs:comment used to add a human readable comment to the new class
kb:Person. When we define the new classes kb:Manager and kb:Engineer we make
them subclasses of kb:Person instead of the top level super class rdfs:Class. We will
look at examples later that demonstrate the utility of models using class hierarchies
and hierarchies of properties – for now it is enough to introduce the notation.
The rdfs:domain of an rdf:Property specifies the class of the subject in a triple
while rdfs:range of an rdf:Property specifies the class of the object in a triple. Just
as strongly typed programming languages like Java help catch errors by performing
type analysis, creating (or using existing) good RDFS property and class definitions
helps RDFS, RDFS++, and OWL description logic reasoners to catch modeling and
data definition errors. These definitions also help reasoning systems infer new triples
that are not explicitly defined in a triple data store.
We continue the current example by adding property definitions and then asserting a
triple that is valid given the type and property restrictions that we have defined using
RDFS:
kb:supervisorOf rdfs:domain kb:Manager .
kb:supervisorOf rdfs:range kb:Engineer .
"Mary Johnson" rdf:type kb:Manager .
"John Smith" rdf:type kb:Engineer .
"Mary Johnson" kb:supervisorOf "John Smith" .
If I tried to add a triple with “Mary Johnson” and “John Smith” reversed in the last
RDF statement then an RDFS inference/reasoning system could catch the error. This
example is not ideal because I am using string literals as the subjects in triples. In
general, you probably want to define a specific namespace for concrete resources
representing entities like the people in this example.
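How a reasoner uses rdfs:domain and rdfs:range can be sketched as follows. This is a simplified Python illustration under my own assumptions: the names DOMAIN, RANGE, and infer_types are hypothetical, and the code covers only the two entailments in which the subject of a triple is inferred to belong to the property's domain class and the object to its range class.

```python
RDF_TYPE = "rdf:type"

# Property declarations from the example above.
DOMAIN = {"kb:supervisorOf": "kb:Manager"}
RANGE = {"kb:supervisorOf": "kb:Engineer"}


def infer_types(triples):
    """Return the rdf:type triples implied by domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))  # subject belongs to the domain class
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))   # object belongs to the range class
    return inferred


facts = [("Mary Johnson", "kb:supervisorOf", "John Smith")]
new_triples = infer_types(facts)
for triple in sorted(new_triples):
    print(triple)
```

If the subject and object were reversed, the same rules would conclude that "John Smith" is a kb:Manager; comparing such inferred types against the explicitly asserted ones is one way a tool could flag the inconsistency.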
The property rdfs:subClassOf is used to state that all instances of one class are also
instances of another class. The property rdfs:subPropertyOf is used to state that
all resources related by one property are also related by another; for example, given
the following N3 statements that use string literals as resources to make this example
shorter:
kb:familyMember rdf:type rdf:Property .
kb:ancestorOf rdf:type rdf:Property .
kb:parentOf rdf:type rdf:Property .
kb:ancestorOf rdfs:subPropertyOf kb:familyMember .
kb:parentOf rdfs:subPropertyOf kb:ancestorOf .
"Mary Smith" kb:parentOf "Sam" .
then the following is valid:
"Mary Smith" kb:ancestorOf "Sam" .
"Mary Smith" kb:familyMember "Sam" .
We have just seen that a common use of RDFS is to define additional application or
data-source specific properties and classes in order to express relationships between
resources and the types of resources. Whenever possible you will want to reuse
existing RDFS properties and resources that you find on the web. For instance, in an
earlier example I defined my own class kb:Person instead of using the Friend of a
Friend (FOAF) namespace’s definition of person. I did this for pedagogical reasons: I
wanted to show you how to define your own classes and properties.
4.3. AllegroGraph RDFS++ Extensions
The unofficial version of RDFS/OWL called RDFS++ is a practical compromise
between OWL DL and RDFS inferencing. AllegroGraph supports the following
predicates:
• rdf:type – discussed in Chapter 3
• rdf:Property – discussed in Chapter 3
• rdfs:subClassOf – discussed in Chapter 4
• rdfs:range – discussed in Chapter 4
• rdfs:domain – discussed in Chapter 4
• rdfs:subPropertyOf – discussed in Chapter 4
• owl:sameAs
• owl:inverseOf
• owl:TransitiveProperty
We will now discuss owl:sameAs, owl:inverseOf, and owl:TransitiveProperty to
complete the discussion of the frequently used RDFS++ predicates begun earlier in
this chapter.
4.3.1. owl:sameAs
If the same entity is represented by two distinct URIs, owl:sameAs can be used to
assert that the URIs refer to the same entity. For instance, two different knowledge
sources might define different URIs in their own namespaces for President
Bill Clinton. Rather than translate data from one knowledge source to another it is
simpler to equate the two unique URIs. For example:
kb:BillClinton rdf:type kb:Person .
kb:BillClinton owl:sameAs mynews:WilliamClinton .
Then the following can be verified using an RDFS++ or OWL DL capable reasoner:
mynews:WilliamClinton rdf:type kb:Person .
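The reasoning step can be illustrated with a small Python sketch. It is a deliberate simplification and not the algorithm any particular reasoner uses: the function name apply_same_as is hypothetical, and statements are simply copied between URIs that an owl:sameAs triple links, repeating until nothing new appears.

```python
SAME_AS = "owl:sameAs"


def apply_same_as(triples):
    """Copy statements between URIs that owl:sameAs declares equivalent."""
    facts = set(triples)
    pairs = {(s, o) for s, p, o in facts if p == SAME_AS}
    pairs |= {(o, s) for s, o in pairs}  # sameAs is symmetric
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        for a, b in pairs:
            for s, p, o in list(facts):
                if p == SAME_AS:
                    continue
                # Substitute b for a in the subject and object positions.
                new = (b if s == a else s, p, b if o == a else o)
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts


facts = apply_same_as([
    ("kb:BillClinton", "rdf:type", "kb:Person"),
    ("kb:BillClinton", "owl:sameAs", "mynews:WilliamClinton"),
])
print(("mynews:WilliamClinton", "rdf:type", "kb:Person") in facts)
```

The membership test at the end corresponds to verifying the inferred triple shown above.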
4.3.2. owl:inverseOf
We can use owl:inverseOf to declare that one property is the inverse of another.
:parentOf owl:inverseOf :childOf .
"John Smith" :parentOf "Nellie Smith" .
There is something new in this example: I am using a “default namespace” for
:parentOf and :childOf. A default namespace is assumed to be application specific;
the assumption is that no external software agents will need access to resources
defined in the default namespace.
Given the two previous RDF statements we can infer that the following is also true:
"Nellie Smith" :childOf "John Smith" .
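As a rough sketch of this inference in Python (the function name apply_inverse is made up for this illustration and the code is not taken from any reasoner), each triple that uses a property with a declared inverse produces a second triple with the subject and object swapped:

```python
INVERSE_OF = "owl:inverseOf"


def apply_inverse(triples):
    """Add the swapped triple for every property that has a declared inverse."""
    facts = set(triples)
    inverse = {}
    for s, p, o in facts:
        if p == INVERSE_OF:
            inverse[s] = o  # the declaration works in both directions
            inverse[o] = s
    for s, p, o in list(facts):
        if p in inverse:
            facts.add((o, inverse[p], s))
    return facts


facts = apply_inverse([
    (":parentOf", "owl:inverseOf", ":childOf"),
    ("John Smith", ":parentOf", "Nellie Smith"),
])
print(("Nellie Smith", ":childOf", "John Smith") in facts)
```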
4.3.3. owl:TransitiveProperty
As its name implies owl:TransitiveProperty is used to declare that a property is
transitive as the following example shows:
kb:ancestorOf a owl:TransitiveProperty .
"John Smith" kb:ancestorOf "Nellie Smith" .
"Nellie Smith" kb:ancestorOf "Billie Smith" .
There is something new in this example: in N3 you can use a as shorthand for
rdf:type. Given the last three RDF statements we can infer that:
"John Smith" kb:ancestorOf "Billie Smith" .
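The inference for a transitive property amounts to computing a transitive closure. Here is a naive Python sketch, with the hypothetical function name transitive_closure, that works on (subject, object) pairs for a single property declared to be an owl:TransitiveProperty; it favors readability over efficiency.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs reachable by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:  # keep chaining until no new pair appears
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


ancestor_of = transitive_closure({
    ("John Smith", "Nellie Smith"),
    ("Nellie Smith", "Billie Smith"),
})
print(("John Smith", "Billie Smith") in ancestor_of)
```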
4.4. RDFS Wrapup
I find that RDFS provides a good compromise: it is simpler to use than the Web On-
tology Language (OWL) and is expressive enough for many linked data applications.
As we have seen, AllegroGraph supports RDFS++ which is RDFS with a few OWL
extensions:
1. rdf:type
2. rdfs:subClassOf
3. rdfs:domain
4. rdfs:range
5. rdfs:subPropertyOf
6. owl:sameAs
7. owl:inverseOf
8. owl:TransitiveProperty
Since I only briefly covered these extensions you may want to read the documentation
on Franz’s web site3.
Sesame supports RDFS “out of the box” and back end reasoners are available for
Sesame that support OWL4. Sesame is likely to have OWL reasoning built in to the
3http://www.franz.com/agraph/support/learning/Overview-of-RDFS++.html
4You can download SwiftOWLIM or BigOWLIM at http://www.ontotext.com/owlim/ and use either as a
SAIL backend repository to get OWL reasoning capability.
standard distribution in the future. My advice is to start building applications with
RDF and RDFS with a view to using OWL as the need arises. If you are using
AllegroGraph for your application development then certainly use the RDFS++ ex-
tensions if RDFS is too limited for your applications.
We have been using SPARQL in examples and in the next chapter we will look at
SPARQL in some detail.
5. The SPARQL Query Language
SPARQL is a query language used to query RDF data stores. While SPARQL may
initially look like SQL you will see that there are important differences: because the
data is graph-based, queries match graph patterns instead of using SQL’s relational
matching operations. The syntax is similar, but SPARQL queries graph data and SQL
queries relational data in tables.
We have already been using SPARQL queries in examples in this book. I will give you
more introductory material in this chapter before using SPARQL in larger example
programs later in this book.
5.1. Example RDF Data in N3 Format
We will use the N3 format RDF file data/news.n3 for examples in this chapter. We
use the N3 format because it is easier to read and understand. There is an equivalent
N-Triple format file data/news.nt because AllegroGraph does not currently support
loading N3 files. I created these files automatically by spidering Reuters news stories
on the news.yahoo.com web site and automatically extracting named entiti