Practical Semantic Web and Linked Data Applications by Mark Watson - HTML preview


3 Uniform Resource Identifiers (URIs) are special in the sense that they (are supposed to) represent unique things or ideas. As we will see in Chapter 10, URIs can also be “dereferenceable” in that we can treat them as URLs on the web and “follow” them using HTTP to get additional information about a URI.

4 We will model classes (or types) using RDFS and OWL but the difference is that an object in an OO language is explicitly declared to be a member of a class while a subject URI is considered to be in a class depending only on what properties it has. If we add a property and value to a subject URI then we may immediately change its RDFS or OWL class membership.

5 I think that there is some similarity between modeling with RDF and document oriented data stores like MongoDB or CouchDB where any document in the system can have any attribute added at any time. This is very similar to being able to add additional RDF statements that either add information about a subject URI or add another property and value that somehow narrows the “meaning” of a subject URI.


3. RDF

http://knowledgebooks.com/ontology/#containsPerson

The first part of this URI is considered to be the namespace6 for (what we will use as a predicate) “containsPerson.” Once we associate an abbreviation like kb for http://knowledgebooks.com/ontology/ then we can just use the QName (“qualified name”) with the namespace abbreviation; for example:

kb:containsPerson

Being able to define abbreviation prefixes for namespaces makes RDF and RDFS

files shorter and easier to read.
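As a minimal sketch of what expanding a QName involves, the following Python looks up the prefix and appends the local name to the namespace URI. The function name expand_qname and the PREFIXES table are hypothetical names chosen for this illustration, and the kb namespace is assumed to be the “#” form used in the N3 examples of this chapter.

```python
# Hypothetical prefix table for illustration; the kb namespace is assumed to
# be the "#" form that appears in the N3 examples.
PREFIXES = {
    "kb": "http://knowledgebooks.com/ontology#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand a QName such as 'kb:containsPerson' into a full URI."""
    prefix, separator, local_name = qname.partition(":")
    if not separator:
        raise ValueError("not a QName: %r" % qname)
    if prefix not in prefixes:
        raise KeyError("unknown namespace prefix: %r" % prefix)
    return prefixes[prefix] + local_name


print(expand_qname("kb:containsPerson"))
```

Real RDF libraries perform this expansion when they parse a file, so application code rarely needs to do it by hand.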

When different RDF triples use this same predicate, this is some assurance to us that all users of this predicate subscribe to the same meaning. Furthermore, we will see in Section 4.1 that we can use RDFS to state equivalency between this predicate (in the namespace http://knowledgebooks.com/ontology/) and predicates represented by different URIs used in other data sources. In an “artificial intelligence” sense, software that we write does not understand a predicate like “containsPerson” in the way that a human reader can by combining understood common meanings for the words “contains” and “person” but for many interesting and useful types of applications that is fine as long as the predicate is used consistently.

Because there are many sources of information about different resources, the ability to define different namespaces and associate them with unique URI prefixes makes it easier to deal with situations where different sources use the same local name for different things.

A statement in N-Triple format consists of three URIs (or string literals, in any combination) followed by a period to end the statement. While statements are often written one per line in a source file they can be broken across lines; it is the ending period which marks the end of a statement. The standard file extension for N-Triple format files is *.nt and the standard file extension for N3 format files is *.n3.

My preference is to use N-Triple format files as output from programs that I write to

save data as RDF. I often use either command line tools or the Java Sesame library to

convert N-Triple files to N3 if I will be reading them or even hand editing them. You

will see why I prefer the N3 format when we look at an example:

@prefix kb: <http://knowledgebooks.com/ontology#> .
<http://news.com/201234/> kb:containsCountry "China" .

6 You have seen me use the domain knowledgebooks.com several times in examples. I have owned this domain and used it for business since 1998 and I use it here for convenience. I could just as well use example.com. That said, the advantage of using my own domain is that I then have the flexibility to make this URI “dereferenceable” by adding an HTML document using this URI as a URL that describes what I mean by “containsPerson.” Even better, I could have my web server look at the request header and return RDF data if the requested content type was an RDF type such as “application/rdf+xml”.


3.1. RDF Examples in N-Triple and N3 Formats

Here we see the use of an abbreviation prefix “kb:” for the namespace for my company KnowledgeBooks.com ontologies. The first term in the RDF statement (the subject) is the URI of a news article. When we want to use a URL as a URI, we enclose it in angle brackets, as in this example. The second term (the predicate) is “containsCountry” in the “kb:” namespace. The last item in the statement (the object) is a string literal “China.” I would describe this RDF statement in English as, “The news article at URI http://news.com/201234 mentions the country China.”

This was a very simple N3 example which we will expand to show additional features

of the N3 notation. As another example, suppose that this news article also mentions

the USA. Instead of adding a whole new statement like this:

@prefix kb: <http://knowledgebooks.com/ontology#> .
<http://news.com/201234/> kb:containsCountry "China" .
<http://news.com/201234/> kb:containsCountry "USA" .

we can combine them using N3 notation. N3 allows us to collapse multiple RDF

statements that share the same subject and optionally the same predicate:

@prefix kb: <http://knowledgebooks.com/ontology#> .
<http://news.com/201234/> kb:containsCountry "China" , "USA" .

We can also add in additional predicates that use the same subject:

@prefix kb: <http://knowledgebooks.com/ontology#> .
<http://news.com/201234/> kb:containsCountry "China" , "USA" ;
    kb:containsOrganization "United Nations" ;
    kb:containsPerson "Ban Ki-moon" , "Gordon Brown" ,
                      "Hu Jintao" , "George W. Bush" ,
                      "Pervez Musharraf" ,
                      "Vladimir Putin" ,
                      "Mahmoud Ahmadinejad" .

This single N3 statement represents ten individual RDF triples. Within each section that defines triples with the same subject and predicate, the objects are separated by commas; sections with different predicates are separated by semicolons, and the whole statement ends with a period. Please note that whatever RDF storage system we use (we will be using AllegroGraph) it makes no difference if we load RDF as XML, N-Triple, or N3 format files: internally subject, predicate, and object triples are stored in the same way and are used in the same way.
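To make the count of ten triples concrete, here is a small sketch that flattens the combined statement into individual triples. It is not an N3 parser: the statement is written by hand as a Python dictionary, and the names expand, article and statement are hypothetical.

```python
def expand(subject, predicate_objects):
    """Flatten one combined N3 statement (one subject, several predicates,
    several objects per predicate) into (subject, predicate, object) triples."""
    return [(subject, predicate, obj)
            for predicate, objects in predicate_objects.items()
            for obj in objects]


article = "http://news.com/201234/"
statement = {
    "kb:containsCountry": ["China", "USA"],
    "kb:containsOrganization": ["United Nations"],
    "kb:containsPerson": ["Ban Ki-moon", "Gordon Brown", "Hu Jintao",
                          "George W. Bush", "Pervez Musharraf",
                          "Vladimir Putin", "Mahmoud Ahmadinejad"],
}

triples = expand(article, statement)
for s, p, o in triples:
    # One line per triple, in the style of the uncombined N3 example.
    print('<%s> %s "%s" .' % (s, p, o))
print(len(triples), "triples")
```

Two countries, one organization and seven people give the ten triples mentioned above.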


I promised you that the data in RDF data stores was easy to extend. As an example,

let us assume that we have written software that is able to read online news articles

and create RDF data that captures some of the semantics in the articles. If we extend

our program to also recognize dates when the articles are published, we can simply

reprocess articles and for each article add a triple to our RDF data store using the

N-Triple format to set a publication date7.

<http://news.com/2034/> kb:datePublished "2008-05-11" .

Furthermore, if we do not have dates for all news articles that is often acceptable

depending on the application.

3.2. The RDF Namespace

You just saw an example of using namespaces when I used my own namespace

<http://knowledgebooks.com/ontology#>.

When you define a namespace you can assign any “qualified name” (QName, or abbreviation) to the URI that uniquely identifies a namespace if you are using the N3 format.

The RDF namespace <http://www.w3.org/1999/02/22-rdf-syntax-ns#> is usually registered with the QName rdf: and I will use this convention. The next few sections show the definitions in the RDF namespace that I use in this book.

3.2.1. rdf:type

The rdf:type property is used to specify the type (or class) of a resource. Notice that

we do not capitalize “type” because by convention we do not capitalize RDF property

names. Here is an example in N3 format (with long lines split to fit the page width):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix kb: <http://knowledgebooks.com/rdf/publication#> .
<http://demo_news/12931> rdf:type kb:article .

7 To be precise, since we can apply XML Schema (XSD) data types to literal string values, this could be more accurately specified as "2008-05-11"^^<http://www.w3.org/2001/XMLSchema#date>


Here we are converting the URL of a news web page to a resource and then defining a new triple that specifies the web page resource is of type kb:article (again, using the QName kb: for my knowledgebooks.com namespace).

3.2.2. rdf:Property

The rdf:Property class is, as you might guess from its name, used to describe and

define properties. Notice that “Property” is capitalized because by convention we

capitalize RDF class names.

This is a good place to show how we define new properties, using a previous example:

@prefix kbcontains: <http://knowledgebooks.com/rdf/contains#> .
<http://demo_news/12931> kbcontains:person "Barack Obama" .

I might make an additional statement about this URI stating that it is a property:

kbcontains:person rdf:type rdf:Property .

When we discuss RDF Schema (RDFS) in Chapter 4 we will see how to create sub-

types and sub-properties.

3.3. Dereferenceable URIs

We have been using URIs as unique identifiers representing either physical objects (e.g., the moon), locations (e.g., London England), ideas or concepts (e.g., Christianity), etc. Additionally, a URI is dereferenceable if we can follow the URI with a web browser or software agent to fetch information from the URI. As an example, we often use the URI

http://xmlns.com/foaf/0.1/Person

to represent the concept of a person. This URI is dereferenceable because if we use

a tool like wget or curl to fetch the content from this URI then we get an HTML

document for the FOAF Vocabulary Specification. Dereferenceable content could also be an RDFS or OWL document describing the URI, a text document, etc.
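A software agent can dereference a URI with nothing more than an HTTP client. The sketch below uses Python's standard urllib; the helper names build_request and dereference are hypothetical, and the default Accept value of application/rdf+xml is an assumption about what a client might ask for. What a given server actually returns is up to that server.

```python
import urllib.request


def build_request(uri, accept="application/rdf+xml"):
    """Build (but do not send) an HTTP GET request that asks for a content type."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def dereference(uri, accept="application/rdf+xml", timeout=10):
    """Follow a URI over HTTP and return (content type, body bytes).

    This needs network access, and the server decides which representation
    it sends back regardless of what was requested.
    """
    request = build_request(uri, accept)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()


# Inspect the request without touching the network.
request = build_request("http://xmlns.com/foaf/0.1/Person")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling dereference("http://xmlns.com/foaf/0.1/Person") would perform the actual fetch; it is left out of the sketch so that the example runs offline.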


3.4. RDF Wrap Up

If you read the World Wide Web Consortium’s RDF Primer (highly recommended) at http://www.w3.org/TR/REC-rdf-syntax/ you will see many other classes and properties defined that, in my opinion, are often most useful when dealing with XML serialization of RDF. Using the N-Triple and N3 formats, I find that I usually just use rdf:type and rdf:Property in my own modeling efforts, along with a few identifiers defined in the RDFS namespace that we will look at in the next chapter.

An RDF triple has three parts: a subject, predicate, and object.8 By itself, RDF

is good for storing and accessing data but lacks functionality for modeling classes,

defining properties, etc. We will extend RDF with RDF Schema (RDFS) in the next

chapter.

8 AllegroGraph also stores a unique integer triple ID and a graph ID for partitioning RDF data and to support graph operations. While using the triple ID and graph ID can be useful, my own preference is to stick with using just what is in the RDF standard.


4. RDFS

The World Wide Web Consortium RDF Schema (RDFS) definition can be read at http://www.w3.org/TR/rdf-schema/ and I recommend that you use this as a reference because I will discuss only the parts of RDFS that are required for implementing the examples in this book. The RDFS namespace http://www.w3.org/2000/01/rdf-schema# is usually registered with the QName rdfs: and I will use this convention1.

4.1. Extending RDF with RDF Schema

RDFS supports the definition of classes and properties based on set inclusion. In

RDFS classes and properties are orthogonal. We will not simply be using properties

to define data attributes for classes – this is different than object modeling and object

oriented programming languages like Java. RDFS is encoded as RDF – the same

syntax.

Because the Semantic Web is intended to be processed automatically by software systems it is encoded as RDF. There is a problem that must be solved in implementing and using the Semantic Web: everyone who publishes Semantic Web data is free to create their own RDF schemas for storing data; for example, there is usually no single standard RDF schema definition for topics like news stories, stock market data, people’s names, organizations, etc. Understanding the difficulty of integrating different data sources in different formats helps to understand the design decisions behind the Semantic Web: the designers wanted to make it not only possible but also easy to use data from different sources that might use similar schemas to define properties and classes. One common usage pattern is using RDFS to state that two properties that both define a person’s last name have the same meaning, so that we can combine data that use different schemas.

We will start with an example that also uses RDFS and is an extension of the example in the last section. After defining kb: and rdfs: namespace QNames, we add a few additional RDF statements (that are RDFS):

@prefix kb: <http://knowledgebooks.com/ontology#> .

1 The actual namespace abbreviations that you use have no effect as long as you consistently use whatever QName values you set for URIs in the RDF statements that use the abbreviations.


@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
kb:containsCity rdfs:subPropertyOf kb:containsPlace .
kb:containsCountry rdfs:subPropertyOf kb:containsPlace .
kb:containsState rdfs:subPropertyOf kb:containsPlace .

The last three lines, which are themselves valid RDF triples, declare that:

• The property containsCity is a subproperty of containsPlace.

• The property containsCountry is a subproperty of containsPlace.

• The property containsState is a subproperty of containsPlace.

Why is this useful? For at least two reasons:

• You can query an RDF data store for all triples that use property containsPlace

and also match triples with property equal to containsCity, containsCountry, or

containsState. There may not even be any triples that explicitly use the property

containsPlace.

• Consider a hypothetical case where you are using two different RDF data stores

that use different properties for naming cities: “cityName” and “city.” You

can define “cityName” to be a subproperty of “city” and then write all queries

against the single property name “city.” This removes the necessity to convert

data from different sources to use the same Schema.
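The first reason can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular RDF store implements it: the sample triples (including the city “Beijing”) are hypothetical, each property is assumed to have at most one parent property, and the names PARENT_PROPERTY, is_subproperty and query are made up for this sketch.

```python
# child property -> parent property, taken from the three RDFS statements above
PARENT_PROPERTY = {
    "kb:containsCity": "kb:containsPlace",
    "kb:containsCountry": "kb:containsPlace",
    "kb:containsState": "kb:containsPlace",
}

# Hypothetical sample data; no triple uses kb:containsPlace directly.
TRIPLES = [
    ("http://news.com/201234/", "kb:containsCountry", "China"),
    ("http://news.com/201234/", "kb:containsCity", "Beijing"),
    ("http://news.com/201234/", "kb:containsPerson", "Hu Jintao"),
]


def is_subproperty(prop, ancestor):
    """True if prop is ancestor itself or reaches it via subPropertyOf links."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == ancestor:
            return True
        seen.add(prop)  # guards against accidental cycles in the table
        prop = PARENT_PROPERTY.get(prop)
    return False


def query(triples, prop):
    """Return every triple whose predicate is prop or one of its subproperties."""
    return [t for t in triples if is_subproperty(t[1], prop)]


places = query(TRIPLES, "kb:containsPlace")
for s, p, o in places:
    print(p, o)
```

The query for kb:containsPlace returns the country and city triples even though neither was stored with that property.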

In addition to providing a vocabulary for describing properties and class membership

by properties, RDFS is also used for logical inference to infer new triples, combine

data from different RDF data sources, and to allow effective querying of RDF data

stores. We will see examples of more RDFS features in Chapter 5 when we perform

SPARQL queries.

4.2. Modeling with RDFS

While RDFS is not as expressive a modeling language as RDFS++2 or OWL, the combination of RDF and RDFS is usually adequate for many semantic web applications. Reasoning with and using more expressive modeling languages will require increasingly more processing time. Given this cost and the simplicity of RDF and RDFS, it is a good idea to start with a less expressive language and only “move up the expressivity scale” as needed.

2 RDFS++ is a Franz extension to RDFS that adds some parts of OWL. I cover RDFS++ in some detail in the Lisp Edition of this book and mention some aspects of RDFS++ in Section 4.3 of the Java Edition.


Here is a short example on using RDFS to extend RDF (assume that my namespace

kb: and the RDFS namespace rdfs: are defined):

kb:Person rdf:type rdfs:Class .
kb:Person rdfs:comment "represents a human" .
kb:Manager rdf:type rdfs:Class .
kb:Manager rdfs:subClassOf kb:Person .
kb:Engineer rdf:type rdfs:Class .
kb:Engineer rdfs:subClassOf kb:Person .

Here we see rdfs:comment used to add a human readable comment to the new class kb:Person. When we define the new classes kb:Manager and kb:Engineer we make them subclasses of kb:Person instead of the top level super class rdfs:Class. We will look at examples later that demonstrate the utility of models using class hierarchies and hierarchies of properties; for now it is enough to introduce the notation.

The rdfs:domain of an rdf:Property specifies the class of the subject in a triple while the rdfs:range of an rdf:Property specifies the class of the object in a triple. Just as strongly typed programming languages like Java help catch errors by performing type analysis, creating (or using existing) good RDFS property and class definitions helps RDFS, RDFS++, and OWL description logic reasoners to catch modeling and data definition errors. These definitions also help reasoning systems infer new triples that are not explicitly defined in a triple data store.

We continue the current example by adding property definitions and then asserting a

triple that is valid given the type and property restrictions that we have defined using

RDFS:

kb:supervisorOf rdfs:domain kb:Manager .
kb:supervisorOf rdfs:range kb:Engineer .
"Mary Johnson" rdf:type kb:Manager .
"John Smith" rdf:type kb:Engineer .
"Mary Johnson" kb:supervisorOf "John Smith" .

If I tried to add a triple with “Mary Johnson” and “John Smith” reversed in the last RDF statement then an RDFS inference/reasoning system could catch the error. This example is not ideal because I am using string literals as the subjects in triples. In general, you probably want to define a specific namespace for concrete resources representing entities like the people in this example.
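The check described here can be sketched as follows. The sketch treats rdfs:domain and rdfs:range as constraints against explicitly declared types, which is an assumption made for this illustration: an RDFS reasoner may instead use domain and range to infer additional types for the subject and object. The table names and the check function are hypothetical.

```python
DOMAIN = {"kb:supervisorOf": "kb:Manager"}    # class expected for the subject
RANGE = {"kb:supervisorOf": "kb:Engineer"}    # class expected for the object

# Types asserted with rdf:type in the example above.
TYPES = {
    "Mary Johnson": {"kb:Manager"},
    "John Smith": {"kb:Engineer"},
}


def check(subject, predicate, obj):
    """Return a list of problems; an empty list means the triple passes."""
    problems = []
    domain = DOMAIN.get(predicate)
    range_ = RANGE.get(predicate)
    if domain and domain not in TYPES.get(subject, set()):
        problems.append("%s is not declared as %s" % (subject, domain))
    if range_ and range_ not in TYPES.get(obj, set()):
        problems.append("%s is not declared as %s" % (obj, range_))
    return problems


ok = check("Mary Johnson", "kb:supervisorOf", "John Smith")
reversed_roles = check("John Smith", "kb:supervisorOf", "Mary Johnson")
print(ok)
print(reversed_roles)
```

The triple as asserted passes, while the reversed triple is reported twice: once for the subject and once for the object.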

The property rdfs:subClassOf is used to state that all instances of one class are also

instances of another class. The property rdfs:subPropertyOf is used to state that


all resources related by one property are also related by another; for example, given

the following N3 statements that use string literals as resources to make this example

shorter:

kb:familyMember rdf:type rdf:Property .

kb:ancestorOf rdf:type rdf:Property .

kb:parentOf rdf:type rdf:Property .

kb:ancestorOf rdfs:subPropertyOf kb:familyMember .

kb:parentOf rdfs:subPropertyOf kb:ancestorOf .

"Marry Smith" kb:parentOf "Sam" .

then the following is valid:

"Marry Smith" kb:ancestorOf "Sam" .

"Marry Smith" kb:familyMember "Sam" .

We have just seen that a common use of RDFS is to define additional application or data-source specific properties and classes in order to express relationships between resources and the types of resources. Whenever possible you will want to reuse existing RDFS properties and resources that you find on the web. For instance, in the last example I defined my own class kb:Person instead of using the Friend of a Friend (FOAF) namespace’s definition of person. I did this for pedagogical reasons: I wanted to show you how to define your own classes and properties.

4.3. AllegroGraph RDFS++ Extensions

The unofficial version of RDFS/OWL called RDFS++ is a practical compromise between OWL DL and RDFS inferencing. AllegroGraph supports the following predicates:

• rdf:type – discussed in Chapter 3

• rdf:Property – discussed in Chapter 3

• rdfs:subClassOf – discussed in Chapter 4

• rdfs:range – discussed in Chapter 4

• rdfs:domain – discussed in Chapter 4

• rdfs:subPropertyOf – discussed in Chapter 4


• owl:sameAs

• owl:inverseOf

• owl:TransitiveProperty

We will now discuss owl:sameAs, owl:inverseOf, and owl:TransitiveProperty to complete the discussion of the frequently used predicates seen earlier in this chapter.

4.3.1. owl:sameAs

If the same entity is represented by two distinct URIs, owl:sameAs can be used to assert that the URIs refer to the same entity. For example, two different knowledge sources might define different URIs in their own namespaces for President Bill Clinton. Rather than translate data from one knowledge source to another it is simpler to equate the two unique URIs. For example:

kb:BillClinton rdf:type kb:Person .
kb:BillClinton owl:sameAs mynews:WilliamClinton .

Then the following can be verified using an RDFS++ or OWL DL capable reasoner:

mynews:WilliamClinton rdf:type kb:Person .
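The inference can be sketched as a small fixed-point loop in Python. This is a simplified illustration, not the algorithm used by AllegroGraph or any other reasoner; the function name apply_same_as is hypothetical, and QNames are kept as plain strings.

```python
def apply_same_as(triples):
    """Copy statements between resources that are declared owl:sameAs."""
    known = set(triples)
    pairs = {(s, o) for s, p, o in known if p == "owl:sameAs"}
    pairs |= {(b, a) for a, b in pairs}  # owl:sameAs holds in both directions
    changed = True
    while changed:
        changed = False
        for s, p, o in list(known):
            if p == "owl:sameAs":
                continue
            for a, b in pairs:
                # Substitute b for a in the subject and in the object position.
                candidates = ((b if s == a else s, p, o),
                              (s, p, b if o == a else o))
                for candidate in candidates:
                    if candidate not in known:
                        known.add(candidate)
                        changed = True
    return known


asserted = [
    ("kb:BillClinton", "rdf:type", "kb:Person"),
    ("kb:BillClinton", "owl:sameAs", "mynews:WilliamClinton"),
]
result = apply_same_as(asserted)
print(("mynews:WilliamClinton", "rdf:type", "kb:Person") in result)
```

The result holds the two asserted triples plus the one inferred rdf:type statement.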

4.3.2. owl:inverseOf

We can use owl:inverseOf to declare that one property is the inverse of another.

:parentOf owl:inverseOf :childOf .

"John Smith" :parentOf "Nellie Smith" .

There is something new in this example: I am using a “default namespace” for :parentOf and :childOf. A default namespace is assumed to be application specific, and it is assumed that no external software agents will need access to resources defined in the default namespace.

Given the two previous RDF statements we can infer that the following is also true:

"Nellie Smith" :childOf "John Smith" .
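A minimal sketch of this rule in Python, assuming triples are plain tuples of strings; the function name apply_inverse is hypothetical and the code illustrates only the rule itself, not how a reasoner schedules it.

```python
def apply_inverse(triples):
    """For each 'p owl:inverseOf q', add (o, q, s) for every (s, p, o),
    and likewise (o, p, s) for every (s, q, o)."""
    inverse = {}
    for s, p, o in triples:
        if p == "owl:inverseOf":
            inverse[s] = o
            inverse[o] = s
    known = set(triples)
    for s, p, o in triples:
        if p in inverse:
            known.add((o, inverse[p], s))
    return known


asserted = [
    (":parentOf", "owl:inverseOf", ":childOf"),
    ("John Smith", ":parentOf", "Nellie Smith"),
]
result = apply_inverse(asserted)
print(("Nellie Smith", ":childOf", "John Smith") in result)
```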


4.3.3. owl:TransitiveProperty

As its name implies owl:TransitiveProperty is used to declare that a property is

transitive as the following example shows:

kb:ancestorOf a owl:TransitiveProperty .
"John Smith" kb:ancestorOf "Nellie Smith" .
"Nellie Smith" kb:ancestorOf "Billie Smith" .

There is something new in this example: in N3 you can use a as shorthand for rdf:type. Given the last three RDF statements we can infer that:

"John Smith" kb:ancestorOf "Billie Smith" .
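Transitivity amounts to computing a closure over the pairs related by the property. The following sketch uses a simple fixed-point loop, which is fine for small examples but is an assumption of this illustration rather than what a production reasoner would do; the names transitive_closure and ancestor_of are hypothetical.

```python
def transitive_closure(pairs):
    """Return every (a, c) reachable by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# (subject, object) pairs for the kb:ancestorOf statements above
ancestor_of = {
    ("John Smith", "Nellie Smith"),
    ("Nellie Smith", "Billie Smith"),
}
closure = transitive_closure(ancestor_of)
print(("John Smith", "Billie Smith") in closure)
```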

4.4. RDFS Wrapup

I find that RDFS provides a good compromise: it is simpler to use than the Web Ontology Language (OWL) and is expressive enough for many linked data applications.

As we have seen, AllegroGraph supports RDFS++ which is RDFS with a few OWL

extensions:

1. rdf:type

2. rdfs:subClassOf

3. rdfs:domain

4. rdfs:range

5. rdfs:subPropertyOf

6. owl:sameAs

7. owl:inverseOf

8. owl:TransitiveProperty

Since I only briefly covered these extensions you may want to read the documentation

on Franz’s web site3.

Sesame supports RDFS “out of the box” and back end reasoners are available for Sesame that support OWL4. Sesame is likely to have OWL reasoning built in to the standard distribution in the future. My advice is to start building applications with RDF and RDFS with a view to using OWL as the need arises. If you are using AllegroGraph for your application development then certainly use the RDFS++ extensions if RDFS is too limited for your applications.

3 http://www.franz.com/agraph/support/learning/Overview-of-RDFS++.html

4 You can download SwiftOWLIM or BigOWLIM at http://www.ontotext.com/owlim/ and use either as a SAIL backend repository to get OWL reasoning capability.

We have been using SPARQL in examples and in the next chapter we will look at

SPARQL in some detail.


5. The SPARQL Query Language

SPARQL is a query language used to query RDF data stores. While SPARQL may initially look like SQL you will see that there are important differences because the data is graph-based, so queries match graph patterns instead of using SQL’s relational matching operations. So the syntax is similar but SPARQL queries graph data and SQL queries relational data in tables.

We have already been using SPARQL queries in examples in this book. I will give you

more introductory material in this chapter before using SPARQL in larger example

programs later in this book.

5.1. Example RDF Data in N3 Format

We will use the N3 format RDF file data/news.n3 for examples in this chapter. We

use the N3 format because it is easier to read and understand. There is an equivalent

N-Triple format file data/news.nt because AllegroGraph does not currently support

loading N3 files. I created these files automatically by spidering Reuters news stories

on the news.yahoo.com web site and automatically extracting named entiti