Version 7, last updated by Robert Isele at October 22, 2011 16:15 UTC
Performance
This page discusses how to achieve maximum performance using Silk and presents a number of experiments.
If you are experiencing performance problems, please follow the best practices below. If performance is still significantly worse than in the experiments presented here, please contact us so we can look into the cause.
Best Practices
Recommended Distance Measures
Silk is extensible and allows arbitrary distance measures to be plugged in. As a result, not all distance measures have the same level of maturity. The following distance measures are recommended for performance-critical applications:
- Equality/Inequality
- Levenshtein Distance
- Jaccard Distance
- wgs84
- Numeric similarity
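As a minimal sketch, a comparison using one of the recommended measures might look like the following. The metric identifier levenshteinDistance and the threshold of 1 are taken from the experiments on this page; the property paths are illustrative assumptions and need to be adapted to your data sets.

```xml
<!-- Sketch only: the property paths below are illustrative assumptions -->
<Compare metric="levenshteinDistance" threshold="1">
  <Input path="?a/rdfs:label" />
  <Input path="?b/rdfs:label" />
</Compare>
```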
Only compare properties of a single language
Many databases, such as DBpedia, provide labels in multiple languages for each entity. In many use cases it is sufficient to compare only the English labels of the entities in the link specification. In these cases, matching performance can be improved significantly by applying a language filter to the properties.
Example:
<Input path="?a/rdfs:label[@lang='en']" />
Experiments
All experiments were executed using Silk 2.5.1 on a 3 GHz Intel® Core i7 CPU with 4 cores, with the heap restricted to 2 GB.
Interlinking cities in DBpedia and LinkedGeoData
Input
- 101,928 settlements from DBpedia
- 560,123 settlements from LinkedGeoData
Linkage Rule
<LinkageRule>
  <Aggregate type="min">
    <Aggregate type="max" required="true" >
      <!-- Two comparators are needed because some resources in LinkedGeoData do not provide an English label -->
      <Compare metric="{see results}" threshold="{see results}">
        <Input path="?a/rdfs:label[@lang='en']" />
        <Input path="?b/rdfs:label[@lang='en']" />
      </Compare>
      <Compare metric="{see results}" threshold="{see results}">
        <Input path="?a/rdfs:label[@lang='en']" />
        <Input path="?b/rdfs:label[@lang='']" />
      </Compare>
    </Aggregate>
    <Compare metric="wgs84" threshold="30" required="true">
      <Input path="?a/wgs84:geometry" />
      <Input path="?b/wgs84:geometry" />
      <Param name="unit" value="km"/>
    </Compare>
  </Aggregate>
</LinkageRule>
Results
| Distance Measure | Distance Threshold | Generated Links | Runtime |
|---|---|---|---|
| levenshteinDistance | 1 | 32,728 | 98 s |
| qGrams | 0.1 | 30,969 | 114 s |