Version 6, last updated by Robert Isele at November 21, 2011 17:55 UTC

Overview

A data source holds the access parameters for a local or remote SPARQL endpoint or an RDF file. Once defined, a data source can be referenced by its ID. Data sources can be defined using either the Scala API or XML.

Available Data Sources

SPARQL Endpoint Data Source Definitions

For SPARQL endpoints (dataSource type: sparqlEndpoint) the following parameters exist:

| Parameter | Description | Default |
|---|---|---|
| endpointURI (mandatory) | The URI of the SPARQL endpoint. | |
| login | Login required for authentication. | No login |
| password | Password required for authentication. | No password |
| instanceList | A list of instances to be retrieved, separated by spaces. If not given, all instances are retrieved. | Retrieve all instances |
| pageSize | Limits each SPARQL query to a fixed number of results. Silk implements a paging mechanism which translates the pageSize parameter into SPARQL LIMIT and OFFSET clauses. | 1000 |
| graph | Only retrieve instances from a specific graph. | |
| pauseTime | The number of milliseconds to wait between subsequent queries. This allows rate-limiting of queries to public SPARQL servers. | 0 |
| retryCount | The number of times to retry connecting, in order to recover from intermittent SPARQL endpoint connection failures. | 3 |
| retryPause | Specifies how long to wait between retries. | 1000 |
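
The paging behaviour described for pageSize and pauseTime can be sketched as follows. This is a minimal illustration, not Silk's actual implementation: PagingSketch, pagedQuery, fetchAll and the execute callback are hypothetical names, and the sketch assumes that a page with fewer than pageSize results marks the end of the result set.

```scala
// Hypothetical helper, not part of the Silk API.
object PagingSketch {

  /** Appends LIMIT and OFFSET clauses for the page with the given 0-based index. */
  def pagedQuery(baseQuery: String, pageSize: Int, page: Int): String = {
    require(pageSize > 0, "pageSize must be positive")
    require(page >= 0, "page must not be negative")
    s"$baseQuery LIMIT $pageSize OFFSET ${page.toLong * pageSize}"
  }

  /** Fetches all pages, sleeping pauseTime milliseconds between queries.
    * `execute` stands in for whatever sends a query to the endpoint. */
  def fetchAll[A](baseQuery: String, pageSize: Int, pauseTime: Long = 0)
                 (execute: String => Seq[A]): Vector[A] = {
    val results = Vector.newBuilder[A]
    var page = 0
    var done = false
    while (!done) {
      val rows = execute(pagedQuery(baseQuery, pageSize, page))
      results ++= rows
      // Assumption: a short page means the endpoint has no further results.
      done = rows.size < pageSize
      page += 1
      if (!done && pauseTime > 0) Thread.sleep(pauseTime)
    }
    results.result()
  }
}
```

With pageSize = 1000, the third page (index 2) of a query is requested with `LIMIT 1000 OFFSET 2000`.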

Example (XML)

<DataSource id="dbpedia" type="sparqlEndpoint">
  <Param name="endpointURI" value="http://dbpedia.org/sparql" />
  <Param name="retryCount" value="100" />
</DataSource>

Example (Scala API)
Note that all parameters except endpointURI are optional.

Source("dbpedia",
  SparqlDataSource(
    endpointURI = "http://dbpedia.org/sparql",
    login = "user",
    password = "password",
    graph = "http://dbpedia.org",
    pageSize = 1000,
    pauseTime = 0,
    retryCount = 3,
    retryPause = 1000
  )
)
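
The effect of the retryCount and retryPause parameters can be pictured with the sketch below. It is not taken from Silk: RetrySketch and withRetry are hypothetical names, and the sketch assumes that retryCount counts the retries after the first attempt and that retryPause is given in milliseconds.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of the Silk API.
object RetrySketch {

  /** Runs `operation`; on failure, waits retryPause milliseconds and tries again,
    * up to retryCount further times. The last error is rethrown if all attempts fail. */
  @tailrec
  def withRetry[A](retryCount: Int = 3, retryPause: Long = 1000)(operation: () => A): A =
    Try(operation()) match {
      case Success(result) => result
      case Failure(_) if retryCount > 0 =>
        Thread.sleep(retryPause)
        withRetry(retryCount - 1, retryPause)(operation)
      case Failure(error) => throw error
    }
}
```

Under these assumptions, the defaults (retryCount = 3, retryPause = 1000) give up to four attempts in total, with a one-second wait before each retry.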

RDF File Data Source Definitions

For RDF files (dataSource type: file) the following parameters exist:

| Parameter | Description | Default |
|---|---|---|
| file (mandatory) | The location of the RDF file. | |
| format (mandatory) | The format of the RDF file. Allowed values: “RDF/XML”, “N-TRIPLE”, “TURTLE”, “TTL”, “N3” | |

Currently, the entire data set is held in memory, so the file must fit into the available memory.
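
Because format accepts only a fixed set of values, a configuration can be checked before a file is loaded. The following is a minimal sketch of such a check. RdfFormatSketch and validate are hypothetical names, not part of Silk, and the sketch assumes that values must match the listed spelling exactly.

```scala
// Hypothetical helper, not part of the Silk API.
object RdfFormatSketch {

  // The allowed values listed for the format parameter.
  val allowedFormats: Set[String] = Set("RDF/XML", "N-TRIPLE", "TURTLE", "TTL", "N3")

  /** Returns Right(format) if the value is allowed, or Left with an error message. */
  def validate(format: String): Either[String, String] =
    if (allowedFormats.contains(format)) Right(format)
    else Left(
      s"Unsupported RDF format '$format'; expected one of: " +
        allowedFormats.toSeq.sorted.mkString(", ")
    )
}
```

For example, `RdfFormatSketch.validate("N-TRIPLE")` returns a `Right`, while a misspelled value such as `"NTRIPLES"` returns a `Left` carrying the error message.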

Example (XML)

<DataSource id="musicbrainz" type="file">
  <Param name="file" value="musicbrainz_dump.nt" />
  <Param name="format" value="N-TRIPLE" />
</DataSource>

Example (Scala API)

Source("musicbrainz",
  FileDataSource(
    file = "musicbrainz_dump.nt",
    format = "N-TRIPLE"
  )  
)