RDFGrowth 0.1

User Documentation



NOTE: This documentation is old. Those interested in an update should contact us and ask.


Introduction and Overview

The RDFGrowth algorithm is a P2P algorithm in which clients locally "harvest" Semantic Web information for later use. Since each peer will have a local, "semi-complete" database, it will be able to:

All of this happens in a scalable way, since "using the Semantic Web" generates no external burden (as opposed to approaches which distribute and replicate queries).

A good, in-depth description of this philosophy, and a comparison with other approaches, is given in our AcademicPapers section. The same documents also serve as the detailed reference for how the algorithm works.

While RDFGrowth was made to suit the needs of "Dbin", this section explains how to use RDFGrowth in standalone applications or web sites.

The algorithm works on top of a local Sesame Repository interface (usually a local installation of the Sesame Semantic Web Database, http://openrdf.org/). Once a Knowledge Growth Agent is up and running, this database will grow by discovering and importing information found on the other peers about the URIs that have been defined as interesting (through the GUED).

What is a GUED

A GUED (Group URIs Exposing Definition) is basically a way to select which URIs we are interested in exchanging in a P2P group.

As an example, in a group dedicated to the discussion surrounding a particular musician, a GUED might specify "select all the URIs belonging to his songs, albums, concerts and interviews".

The set of URIs selected by the GUED on each peer will be "exposed" to the group: the peer advertises that these URIs are of interest to it and is willing to answer requests from other peers about their "RDF Neighbours" (RDFN, see next section).

When starting an RDFGrowth agent, a GUED must be specified. GUEDs are currently implemented as a set of SeRQL queries. SeRQL is the RDF query language defined inside the Sesame project; see the SeRQL online manual.

If you want to run an RDFGrowth agent in a standalone Java application, just refer to the Javadoc. If you are using a servlet container, the GUED must be specified in a simple text file (see the section "Using RDFGrowth in a Servlet Container").

An example GUED file (again, needed only for the servlet container installation), defining a GUED for a group interested in beers:

Beer's Brainlet GUED
SELECT X FROM {X} <rdfs:subClassOf> {<!http://www.purl.org/net/ontology/beer#Beer>}
SELECT X FROM {X} <rdf:type>  {<!http://www.purl.org/net/ontology/beer#Beer>}
SELECT X FROM {X} <rdfs:subClassOf> {<!http://www.purl.org/net/ontology/beer#Ingredient>}
SELECT X FROM {X} <rdf:type>  {<!http://www.purl.org/net/ontology/beer#Ingredient>}

The first line defines the group name; the following lines are SeRQL queries (one per line) that define the URIs to be exposed to the group.
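The file layout described above (group name on the first line, one SeRQL query on each following line) is simple enough to read with a few lines of Java. The following is only a minimal sketch, not the parser shipped with RDFGrowth: the class name GuedFile and its members are hypothetical, and skipping blank lines is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the RDFGrowth API:
// holds the content of a GUED text file.
final class GuedFile {
    final String groupName;
    final List<String> queries;

    private GuedFile(String groupName, List<String> queries) {
        this.groupName = groupName;
        this.queries = Collections.unmodifiableList(queries);
    }

    // Parses the text of a GUED file: the first non-blank line is the group
    // name, every later non-blank line is taken as one SeRQL query.
    static GuedFile parse(String text) {
        String groupName = null;
        List<String> queries = new ArrayList<String>();
        for (String rawLine : text.split("\r?\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            if (groupName == null) {
                groupName = line;
            } else {
                queries.add(line);
            }
        }
        if (groupName == null) {
            throw new IllegalArgumentException("GUED file is empty");
        }
        return new GuedFile(groupName, queries);
    }

    public static void main(String[] args) {
        String text = "Beer's Brainlet GUED\n"
            + "SELECT X FROM {X} <rdf:type>  {<!http://www.purl.org/net/ontology/beer#Beer>}\n";
        GuedFile gued = GuedFile.parse(text);
        System.out.println(gued.groupName + " (" + gued.queries.size() + " queries)");
    }
}
```

The queries are kept as plain strings here; evaluating them against the repository is left to Sesame.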

RDFN and the Subgraph of Interest

In short, the RDFN of a URI is composed of all the triples that have it as subject or object. If some of these triples have blank nodes "on the other side", the RDFN also recursively includes the triples attached to those blank nodes, until only "ground" nodes (URIs or literals) form the "edge" of the RDFN.

The following picture might shed some light; otherwise, please read the paper (http://www.dbin.org/twiki/pub/About/WebHome/RDFGROWth_workshopISWC2004.pdf):


In this sense, an RDFN might be called the "blank node closure graph" around a given URI. Please keep in mind that the RDFN definition and implementation are subject to change in subsequent revisions, e.g. to incorporate reifications and other constructs which might be useful to consider inside the RDFN.
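The RDFN definition given in this section can be illustrated with a small, self-contained Java sketch. It does not use the Sesame API and is not the RDFGrowth implementation: the Triple and Rdfn classes are hypothetical, nodes are plain strings, and blank nodes are assumed to be written with a "_:" prefix as in N-Triples.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal triple model; every node is a plain string.
final class Triple {
    final String subject, predicate, object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override public String toString() {
        return subject + " " + predicate + " " + object;
    }
}

final class Rdfn {
    // Assumption: blank nodes are labelled "_:something".
    static boolean isBlank(String node) {
        return node.startsWith("_:");
    }

    // Collects every triple that has `uri` as subject or object, then keeps
    // expanding through blank nodes until only ground nodes are on the edge.
    static Set<Triple> of(String uri, List<Triple> graph) {
        Set<Triple> result = new LinkedHashSet<Triple>();
        Set<String> visited = new HashSet<String>();
        Deque<String> toExpand = new ArrayDeque<String>();
        toExpand.add(uri);
        visited.add(uri);
        while (!toExpand.isEmpty()) {
            String node = toExpand.remove();
            for (Triple t : graph) {
                if (!t.subject.equals(node) && !t.object.equals(node)) {
                    continue; // triple does not touch the current node
                }
                result.add(t);
                String other = t.subject.equals(node) ? t.object : t.subject;
                if (isBlank(other) && visited.add(other)) {
                    toExpand.add(other); // follow blank nodes only
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> graph = Arrays.asList(
            new Triple("ex:song1", "ex:title", "\"Hello\""),
            new Triple("ex:song1", "ex:recordedAt", "_:b1"),
            new Triple("_:b1", "ex:city", "\"Rome\""),
            new Triple("_:b1", "ex:studio", "ex:studioA"),
            new Triple("ex:studioA", "ex:owner", "ex:bob"),
            new Triple("ex:album1", "ex:contains", "ex:song1"));
        for (Triple t : Rdfn.of("ex:song1", graph)) {
            System.out.println(t);
        }
    }
}
```

In the example graph, the triple about ex:studioA's owner stays outside the RDFN of ex:song1, because ex:studioA is a ground node and expansion stops there.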

Transport infrastructure

A few high-level nuts and bolts need to be explained:

At the logical, algorithmic level, an agent communicates directly with the other agents that are "visible" to it. This "visibility" is provided by the KEL (Knowledge Exchange Layer) driver one chooses. A KEL driver implements the few basic functions needed by the algorithm, such as publishing or looking up a URI; see the paper for details.

A KEL driver could operate in a fully distributed P2P fashion, as the algorithm itself is well suited to a Distributed Hash Table (DHT) (e.g. the jxta project). At this point of the implementation, however, the only working KEL drivers follow a simple client-server model: an RDFGrowth server stands in for the "network" and takes care of storing the "publishes" and answering the "lookup" queries.

Given the design of the algorithm, multiple servers can be used if needed thus sharing the load.
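To make the publish and lookup roles more concrete, here is a sketch of what a KEL driver contract and a trivial in-memory stand-in for the server could look like. The interface name KelDriver, its method signatures and the InMemoryKel class are illustrative assumptions only; the actual KEL API is documented in the Javadoc and the paper.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical contract, loosely following the two operations named in the text.
interface KelDriver {
    // Advertise that `peerName` exposes `uri` to the group.
    void publish(String peerName, String uri);

    // Return the names of the peers that have published `uri`.
    Set<String> lookup(String uri);
}

// In-memory stand-in for the server role: stores publishes, answers lookups.
final class InMemoryKel implements KelDriver {
    private final Map<String, Set<String>> peersByUri = new HashMap<String, Set<String>>();

    public synchronized void publish(String peerName, String uri) {
        Set<String> peers = peersByUri.get(uri);
        if (peers == null) {
            peers = new LinkedHashSet<String>();
            peersByUri.put(uri, peers);
        }
        peers.add(peerName);
    }

    public synchronized Set<String> lookup(String uri) {
        Set<String> peers = peersByUri.get(uri);
        if (peers == null) {
            return Collections.<String>emptySet();
        }
        return new LinkedHashSet<String>(peers); // defensive copy
    }

    public static void main(String[] args) {
        KelDriver kel = new InMemoryKel();
        kel.publish("foo1", "http://www.dbin.org/");
        kel.publish("foo2", "http://www.dbin.org/");
        System.out.println(kel.lookup("http://www.dbin.org/"));
    }
}
```

Because each URI is handled independently in this sketch, URIs could in principle be partitioned across several such servers, for instance by hashing the URI. Whether the shipped drivers share load this way is not stated here.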

In order to reach the RDFGrowth server, the KEL driver needs a low-level communication layer. The first driver that was developed worked over Jabber. While this worked fine, it required special server configurations and performance was limited anyway. Since we could not quickly find an alternative, we decided to brew our own and developed JaSiMPA (Java Simple Messaging API); for details see its homepage.

Currently, JaSiMPA works by connecting to a central server which acts as a message dispatcher and helps clients that are behind firewalls/NATs. This way, RDFGrowth over JaSiMPA currently works in basically any network environment (provided outgoing connections are possible; there is no proxy support at the moment).

From the point of view of the JaSiMPA transport layer, the RDFGrowth server and the client-side implementations are equal users. The RDFGrowth server has a fixed JaSiMPA username (usually "RDFGrowthServer"), so that a KEL RDFGrowth client knows how to reach the RDFGrowth server upon joining the JaSiMPA "network".

Here is an overall schema of the KEL/RDFGrowth client-server implementation over JaSiMPA:

          AGENT <--RDFGROWTH ALGORITHM->  AGENT
                           |
                           | Knowledge Exchange Layer (KEL)
                           |
                           | 
                  Client Server implementations:
               RDFGrowthClient <---> RDFGrowthServer
                     |              |
                     |     (use)    |
                     |              |
                Jasimpa Client    Jasimpa Client: 
                     |           Jasimpa username="dbinserver"
                     |               |
                     | (connect to)  |
                     |               |
                      JASIMPA SERVER                          

Bottom line: to run an RDFGrowth peer you need an RDFGrowth server and a JaSiMPA server running. Luckily, the two servers come in one easy package, so just run

org.dbin.rdfgrowth.knowledgegrowth.transport.JaSiMPA.server.RDFGrowthJasimpaServer

Running it as a command line application automatically starts a JaSiMPA server as well, listening on the default JaSiMPA port, 20002. The RDFGrowth server's "JaSiMPA username" can be specified via a command line switch (--servername my_jasimpa_server); otherwise it is "RDFGrowthServer" by default.

Once the server is up and running it is ready to serve as exchange point for RDFGrowth agents.

There are currently 2 ways to use RDFGrowth in your application:

A demonstrative command line application

This release also provides a toy application that lets a simple P2P group share documents and the references between them. Just run the org.dbin.rdfgrowth.knowledgegrowth.liteclient.LiteDBinClient class and type help:

   help                                           : this menu
   all                                            : list all the known documents
   add DOCUMENTURI                                : adds the document
   addreference ORIGINALDOCUMENT TARGETDOCUMENT   : adds the reference to targetdocument from originaldocument
   references DOCUMENTURI                         : list all the references to the given document
   referenced DOCUMENTURI                         : list all the documents referencing this one
   startagent NAME                                : Starts the knowledge exchange agent. 
                                                    IMPORTANT: the agent must have a unique name!
   stopagent                                      : Stops the knowledge exchange agent.
   startall                                       : Starts a set of knowledge exchange agents, corresponding to the set of
                                                    presetted GUEDs (txt files in 'gueds' directory)

Using simple commands on the console you can add URIs of documents (e.g. http URLs) and references to a local Sesame repository. Run multiple clients (even on the same machine) to see how they learn from each other when new information is inserted.

E.g.

from client 1

   startagent foo1
   add http://www.dbin.org/
   addreference http://www.dbin.org/ http://www.w3.org/

from client 2

   startagent foo2
   ... time passes ...
   "cool, I have learnt something about http://www.dbin.org/"
   referenced http://www.w3.org/
   http://www.dbin.org/

RDFGrowth in your standalone Java application

It's simple. We suggest you take a look at org.dbin.rdfgrowth.knowledgegrowth.liteclient.LiteDBinClient, that is, the code of the application we just introduced. In outline:

First you need to have a local Sesame repository. The following snippet will create one for you, but of course refer to the Sesame documentation for any serious setup.

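// Sesame 1.x package names; check them against the Sesame version you use.
import org.openrdf.sesame.Sesame;
import org.openrdf.sesame.config.ConfigurationException;
import org.openrdf.sesame.config.RepositoryConfig;
import org.openrdf.sesame.config.SailConfig;
import org.openrdf.sesame.config.SystemConfig;
import org.openrdf.sesame.repository.SesameRepository;
import org.openrdf.sesame.repository.local.LocalRepository;
import org.openrdf.sesame.repository.local.LocalService;
import org.openrdf.sesame.server.SesameServer;

// Creates a world-readable, world-writeable local repository named "MyRepository".
private static SesameRepository createSesame() {
   LocalRepository sr = null;

   // Use Xerces as the SAX parser.
   System.setProperty("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser");
   SystemConfig sysConfig = SesameServer.getSystemConfig();
   LocalService ls = Sesame.getService();
   RepositoryConfig config = new RepositoryConfig("MyRepository");

   config.setWorldReadable(true);
   config.setWorldWriteable(true);
   // Sail stack: a synchronization layer on top of an in-memory RDF Schema store.
   config.addSail(new SailConfig("org.openrdf.sesame.sailimpl.sync.SyncRdfSchemaRepository"));
   SailConfig sailconf = new SailConfig("org.openrdf.sesame.sailimpl.memory.RdfSchemaRepository");

   // Uncomment to give the in-memory store a persistence file.
   //sailconf.setParameter("file","db/persistenceFile.sesame");
   sailconf.setParameter("compressFile", "yes");
   config.addSail(sailconf);
   try {
      sr = ls.createRepository(config);
   } catch (ConfigurationException e) {
      // Creation failed: report it; the method then returns null.
      e.printStackTrace();
   }
   // Register the new repository in the system configuration.
   sysConfig.addRepositoryConfig(config);
   SesameServer.setSystemConfig(sysConfig);
   return sr;
}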

Once a repository is available, you are ready to start a "Knowledge Growth Agent" (KGA).

A Knowledge Growth Agent is responsible for a single instance of the RDFGrowth algorithm. It will "hang around" the specified P2P group and synchronize its knowledge, as defined by the GUED (see the paper and the GUED documentation), with the other peers. To operate, an agent also needs a local Sesame repository, the name of a Knowledge Exchange Layer (KEL) and a configuration object for it.

Parameters in the constructor:

A code snippet starting a knowledge agent in the LiteClient application:

import java.net.InetAddress;
import java.net.UnknownHostException;

// Note: agent (GrowthAgent), sr (the local Sesame repository) and rdfGrowthServerName
// are assumed here to be fields of the enclosing class.
public void startKnowledgeAgent(String agentName, String groupName, GUED gued, String jasimpaServerHostName, int jasimpaServerPort) {
      if (agent==null) {
         JasimpaConfiguration configuration= new JasimpaConfiguration();
         try {
            // JaSiMPA username: "ip" + local IP address without dots + agent name.
            configuration.OurName="ip" + InetAddress.getLocalHost().getHostAddress().replaceAll("\\.","") + agentName;
            System.out.println("OurName: " + configuration.OurName);
         } catch (UnknownHostException e) {
            // The local address could not be resolved; OurName is left unset.
            e.printStackTrace();
         }
         configuration.OurPassword=configuration.OurName+"pwjk";
         // Create the JaSiMPA account on first use.
         configuration.createIfNotExists=true;
   //         configuration.serverHostName="invincible.deit.univpm.it";
         configuration.jasimpaServerHostName=jasimpaServerHostName;
         configuration.jasimpaServerPort=jasimpaServerPort;
         // JaSiMPA username of the RDFGrowth server ("RDFGrowthServer" by default).
         configuration.rdfGrowthServerName=rdfGrowthServerName;
         Class  groupFactory=JasimpaGroupFactory.class;
         agent = new GrowthAgent(
                     agentName,
                     groupFactory,configuration,
                     groupName,
                     gued,
                     sr);   
         System.out.println("user: " + configuration.OurName + " pwd: " + configuration.OurPassword);
         System.out.println("Agent " + agent.getName() + " started on group: " + groupName);
      } 
      // Start the agent unless it is already running.
      if (agent.getState()!=GrowthAgent.STATE_RUNNING) {
         agent.start();
      }      
      
   }

Once the agent has started correctly, your RDF repository should become synchronized along the topic "described" by the GUED.

Using RDFGrowth in a Servlet Container

Installing RDFGrowth in a servlet container allows it to be controlled over HTTP. First set up a Sesame web installation; see the Sesame online user manual.

Once Sesame has been put in the webapps directory, get RDFGrowth-webinst and (on Tomcat 5):

If all went well, you will be able to start an RDFGrowth agent simply by making an HTTP call to:

HOSTNAME/sesame/RDFGrowth/startAgent.jsp?repository=repName&name=guedFile&jserver=jasimpaServerName&jport=jasimpaServerPort

where

To stop a previously started agent, just make an HTTP call to HOSTNAME/sesame/RDFGrowth/stopAgent.jsp?name=guedFile.
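The start and stop calls can also be prepared from code. The sketch below only builds the two request URLs from the parameter names shown in this section. The class AgentUrls, the example host and all parameter values are placeholders, and nothing here is part of RDFGrowth itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles the startAgent.jsp / stopAgent.jsp URLs.
final class AgentUrls {
    // URL-encodes one query parameter value.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String startAgent(String host, String repository, String guedFile,
                             String jasimpaServerName, int jasimpaServerPort) {
        return host + "/sesame/RDFGrowth/startAgent.jsp"
            + "?repository=" + enc(repository)
            + "&name=" + enc(guedFile)
            + "&jserver=" + enc(jasimpaServerName)
            + "&jport=" + jasimpaServerPort;
    }

    static String stopAgent(String host, String guedFile) {
        return host + "/sesame/RDFGrowth/stopAgent.jsp?name=" + enc(guedFile);
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with those of your installation.
        System.out.println(startAgent("http://localhost:8080", "MyRepository",
            "beer.txt", "localhost", 20002));
        System.out.println(stopAgent("http://localhost:8080", "beer.txt"));
    }
}
```

The resulting strings can be opened in a browser or passed to any HTTP client.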