Version 9, last updated by Robert Isele at September 07, 2011 15:18 UTC

Introduction

This document describes Silk Single Machine, which generates RDF links on a single machine. The datasets to be interlinked can reside either on the same machine or on remote machines that are accessed via the SPARQL protocol. Silk Single Machine provides multithreading and caching, and its performance can be further enhanced with an optional blocking feature.
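Accessing a dataset via the SPARQL protocol comes down to an HTTP request in which the query is URL-encoded and passed as the `query` parameter of the endpoint URL. The sketch below builds such a GET request URI. `SparqlRequest` and `queryUri` are hypothetical helpers for illustration, not part of Silk. The DBpedia endpoint in `main` is only an example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Silk: builds the HTTP GET request URI
// that the SPARQL protocol uses to send a query to an endpoint.
class SparqlRequest {

    // Appends the URL-encoded query as the "query" parameter of the endpoint URL,
    // keeping any parameters the endpoint URL already carries.
    static URI queryUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        String separator = endpoint.contains("?") ? "&" : "?";
        return URI.create(endpoint + separator + "query=" + encoded);
    }

    public static void main(String[] args) {
        // Example endpoint only; substitute the endpoint of the dataset to be interlinked.
        URI uri = queryUri("http://dbpedia.org/sparql",
                "SELECT ?city WHERE { ?city a <http://dbpedia.org/ontology/City> } LIMIT 10");
        System.out.println(uri);
    }
}
```

The resulting URI can be sent with `java.net.http.HttpClient`. Endpoints generally honor an `Accept` header such as `application/sparql-results+json` for SELECT results.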

Using the declarative Silk Link Specification Language (Silk-LSL), data publishers can specify which types of RDF links should be discovered between data sources and which conditions data items must fulfill in order to be interlinked. These linkage rules may combine various similarity metrics and can take the graph around data items into account; this graph is addressed using an RDF path language. Silk accesses the datasets to be interlinked via the SPARQL protocol and can thus be used against local as well as remote SPARQL endpoints.
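The following minimal Java sketch illustrates a linkage rule that combines similarity metrics. It assumes each data item has already been reduced to a map from a property key (standing in for an RDF path such as `?a/rdfs:label`) to a single value. The rule averages a normalized Levenshtein similarity on labels with an exact-match check on a country value, and accepts the pair if the average reaches a threshold. All names (`LinkageRuleSketch`, the `country` key, the 0.9 threshold) are hypothetical. This is neither Silk-LSL syntax nor Silk's own metric implementation.

```java
import java.util.Map;

// Hypothetical illustration, not Silk code: a linkage rule that combines two
// similarity metrics on property values and accepts a link above a threshold.
class LinkageRuleSketch {

    // Classic dynamic-programming Levenshtein edit distance (two rolling rows).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;  // prev now holds row i
        }
        return prev[b.length()];
    }

    // Maps edit distance to a similarity in [0, 1]; 1 means identical strings.
    static double labelSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // 1 if the values are equal, otherwise 0.
    static double equality(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }

    // Averages both metric scores; keys stand in for path expressions.
    static double score(Map<String, String> source, Map<String, String> target) {
        double label = labelSimilarity(source.get("rdfs:label").toLowerCase(),
                                       target.get("rdfs:label").toLowerCase());
        double country = equality(source.get("country"), target.get("country"));
        return (label + country) / 2.0;
    }

    static boolean isLink(Map<String, String> source, Map<String, String> target, double threshold) {
        return score(source, target) >= threshold;
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("rdfs:label", "Berlin", "country", "Germany");
        Map<String, String> b = Map.of("rdfs:label", "berlin", "country", "Germany");
        System.out.println(score(a, b) + " -> link: " + isLink(a, b, 0.9));
    }
}
```

Averaging is only one way to aggregate metric scores. A rule could equally require every metric to pass or accept a pair as soon as one metric does.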

The main features of the Silk Single Machine are:

  • Flexible, declarative language for specifying linkage rules
  • Support for generating RDF links (owl:sameAs links as well as other link types)
  • Use in distributed environments (by accessing local and remote SPARQL endpoints)
  • Usable in situations where terms from different vocabularies are mixed and where no consistent RDFS or OWL schemata exist
  • Scalability and high performance through efficient data handling (Silk 2.0 is about 20 times faster than Silk 0.2):
    • Reduced network load through caching and reuse of SPARQL result sets
    • Multithreaded computation of data item comparisons (over 1 billion comparisons per hour on a Core i7 with 4 GB RAM)
    • Optional blocking directive, which lets users reduce the number of comparisons at the cost of recall, if necessary
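The trade-off in the last item can be seen in a small, hypothetical example of key-based blocking. This is a generic technique, not Silk's actual blocking algorithm. Items are grouped by a cheap key (here the first letter of the label), and only items that share a key are compared. `BlockingSketch` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of key-based blocking, not Silk's implementation:
// only items that share a blocking key become candidate pairs.
class BlockingSketch {

    // Cheap blocking key: the lower-cased first character of the label.
    static String blockingKey(String label) {
        return label.isEmpty() ? "" : label.substring(0, 1).toLowerCase();
    }

    // Without blocking, every source item is compared with every target item.
    static long comparisonsWithoutBlocking(List<String> source, List<String> target) {
        return (long) source.size() * target.size();
    }

    // With blocking, a source item is only paired with targets in its own block.
    static List<String[]> candidatePairs(List<String> source, List<String> target) {
        Map<String, List<String>> blocks = new HashMap<>();
        for (String t : target) {
            blocks.computeIfAbsent(blockingKey(t), k -> new ArrayList<>()).add(t);
        }
        List<String[]> pairs = new ArrayList<>();
        for (String s : source) {
            for (String t : blocks.getOrDefault(blockingKey(s), List.of())) {
                pairs.add(new String[] {s, t});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> source = List.of("Berlin", "Bonn", "Paris", "Rome", "Vienna");
        List<String> target = List.of("berlin", "Paris", "Roma", "Wien");
        System.out.println(comparisonsWithoutBlocking(source, target)); // 20
        System.out.println(candidatePairs(source, target).size());     // 4
    }
}
```

With five source items and four target items, full comparison needs 20 comparisons, while blocking leaves 4. The cost shows up as lost recall: Vienna and Wien name the same city but land in different blocks, so that pair is never compared.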

In order to run Silk Single Machine, developers need to:

  1. Have SPARQL access to the datasets that should be interlinked.
  2. Write a link specification. For details, refer to the documentation of the Silk Link Specification Language.
  3. Install the Silk framework as described in the Installation and Usage section.

Installation and Usage

Running Silk from the Command Line

In order to use Silk Single Machine, you need:

  1. Silk Link Discovery Framework: Get the most recent version.
  2. Java Runtime Environment: The Silk Link Discovery Framework runs on top of the JVM. Get the most recent JRE.

What to do:

  1. Write a Silk-LSL configuration file to specify which resources should be interlinked.
  2. Run Silk Single Machine:
    java -DconfigFile=<Silk-LSL file> [-DlinkSpec=<Interlink ID>] [-Dthreads=<threads>] [-DlogQueries=(true/false)] [-Dreload=(true/false)] -jar silk.jar
  3. Review Results: Open the output files designated in the Silk-LSL configuration and review the generated links.
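For scripted runs, the documented command line can also be assembled and started from Java. This sketch assumes silk.jar is in the working directory and uses only the -D options from the usage line above. `SilkLauncher` is a hypothetical helper, and `linkSpec.xml` and `movies` are placeholder values for the configuration file and interlink ID.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, not part of Silk: assembles the documented
// command line and optionally starts it as a child process.
class SilkLauncher {

    // Builds "java -DconfigFile=... [optional -D settings] -jar <silkJar>".
    // Pass null (or threads <= 0) to leave an optional setting out entirely.
    static List<String> command(String configFile, String linkSpec, int threads,
                                Boolean logQueries, Boolean reload, String silkJar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-DconfigFile=" + configFile);
        if (linkSpec != null) cmd.add("-DlinkSpec=" + linkSpec);
        if (threads > 0) cmd.add("-Dthreads=" + threads);
        if (logQueries != null) cmd.add("-DlogQueries=" + logQueries);
        if (reload != null) cmd.add("-Dreload=" + reload);
        // System properties must precede -jar; anything after the jar name
        // is passed to the application as a program argument instead.
        cmd.add("-jar");
        cmd.add(silkJar);
        return cmd;
    }

    // Runs the command with Silk's output forwarded to this console; returns the exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = command("linkSpec.xml", "movies", 4, null, null, "silk.jar");
        System.out.println(String.join(" ", cmd));
        // System.exit(run(cmd));  // uncomment to actually start Silk
    }
}
```

Optional settings that are not given are simply not passed to Silk.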

Using the Silk API

In order to use the Silk API, you need:

  1. Silk Link Discovery Framework: Check out the most recent version from the Silk SVN repository.
  2. Java Development Kit: The Silk Link Discovery Framework runs on top of the JVM. Get the most recent JDK from http://java.sun.com.
  3. Maven: Used for project management and build automation. Get it from http://maven.apache.org.

What to do:

  1. Write a Silk-LSL configuration file to specify which resources should be interlinked.
  2. Call executeFile on the Silk object.
    Silk.executeFile(configFile, [linkSpecId], [numThreads])
  3. Review Results: Open the output files designated in the Silk-LSL configuration and review the generated links.
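The review step can be partly automated by listing the links from an output file. This sketch assumes the output was configured to be written as N-Triples, with one IRI-only triple per link. `LinkReview` is a hypothetical helper. Lines it does not recognize (comments, blank lines, triples with literals) are skipped rather than reported.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical review helper, not part of Silk. Assumes N-Triples output in
// which each link is a line "<source> <linkType> <target> ."
class LinkReview {

    record Link(String source, String predicate, String target) {}

    private static final Pattern TRIPLE =
            Pattern.compile("^<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*$");

    // Parses IRI-only triples; every other line is ignored.
    static List<Link> parse(List<String> lines) {
        List<Link> links = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TRIPLE.matcher(line.trim());
            if (m.matches()) links.add(new Link(m.group(1), m.group(2), m.group(3)));
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LinkReview <output file>");
            return;
        }
        for (Link link : parse(Files.readAllLines(Path.of(args[0])))) {
            System.out.println(link.source() + " -> " + link.target() + " (" + link.predicate() + ")");
        }
    }
}
```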