SemMF - A Semantic Matching Framework

How to Implement New Matchers

Radoslaw Oldakowski
November 2006

 

Integrating new matchers into SemMF consists of three steps:

  1. Creating a matcher class

    First, write a matcher class that performs the desired matching operation on two given RDF nodes. The new class must implement the simple Matcher interface, which defines the method:

    float calcSim(com.hp.hpl.jena.rdf.model.RDFNode queryNode, com.hp.hpl.jena.rdf.model.RDFNode resNode)

    The first parameter represents a property value of a query object (e.g. the price of a product). It is matched against the second parameter, a semantically corresponding property value of a resource object. Because query and resource objects are described as RDF graphs, the property values of both objects are RDF nodes in these graphs. Note that each RDF node may be either a Resource or a Literal. It is up to the matcher implementation to handle these parameters correctly according to their type. The float value returned by this method indicates the similarity between the two parameters and MUST lie in the interval [0,1].
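    The scoring logic of a matcher can be sketched independently of Jena. The example below is only an illustration, not part of SemMF: the class NumericSimilarity and its method score are hypothetical names, and the relative-difference formula is one possible choice among many. It assumes that both nodes turned out to be Literals and that their lexical forms have already been extracted (e.g. via Literal.getLexicalForm()); a real calcSim implementation would perform that check and then delegate to a method like this one.

```java
// Hypothetical helper, not part of SemMF: scores two numeric lexical forms.
// A real matcher's calcSim(RDFNode, RDFNode) would first check that both
// nodes are Literals, read their lexical forms, and then call score(...).
final class NumericSimilarity {

    private NumericSimilarity() {}

    /**
     * Returns a similarity in [0,1] based on the relative difference of the
     * two values. Returns 0 if either value cannot be read as a finite number.
     */
    static float score(String queryLexical, String resLexical) {
        double q;
        double r;
        try {
            q = Double.parseDouble(queryLexical.trim());
            r = Double.parseDouble(resLexical.trim());
        } catch (NumberFormatException | NullPointerException e) {
            return 0f; // not comparable as numbers
        }
        if (Double.isNaN(q) || Double.isInfinite(q)
                || Double.isNaN(r) || Double.isInfinite(r)) {
            return 0f;
        }
        double max = Math.max(Math.abs(q), Math.abs(r));
        if (max == 0.0) {
            return 1f; // both values are zero, hence identical
        }
        double sim = 1.0 - Math.abs(q - r) / max;
        // Clamp so that the result always lies in [0,1], as calcSim requires.
        return (float) Math.max(0.0, Math.min(1.0, sim));
    }

    public static void main(String[] args) {
        System.out.println(score("100", "80"));
        System.out.println(score("100", "abc"));
    }
}
```

    The final clamping step matters: values with opposite signs would otherwise yield a negative score, which would violate the [0,1] contract.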

  2. Creating a matcher assembler class

    Next, write a matcher assembler class, which the matching engine uses at start-up to create an instance of your matcher from the parameters provided in a matching description. Note that SemMF assemblers build on the Jena Assembler classes, so the fastest way to write a new matcher assembler is to extend the Jena abstract class AssemblerBase and override its open(Assembler a, Resource root, Mode mode) method (see the code example for StringMatcherAssembler).

    As a simple example, consider the following snippet of a matching description, which specifies that the built-in StringMatcher should be used. Note the semmf:caseSensitive parameter, which requests case-sensitive matching.

    [] a semmf:NodeMatchingDescription ;
       semmf:matcher _:b1 ;
       semmf:label "foo" ;
       semmf:queryNodePath "(<http://example.org/someResource> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?x)" ;
       semmf:resNodePath "(<#graphEntryURI#> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?x)" ;
       semmf:weight "0.3333" .

    _:b1 a semmf:StringMatcher ;
         semmf:caseSensitive "true" .

    If the MatchingDescriptionReader encounters a Resource holding matcher information (in this example, _:b1), it passes that Resource to the corresponding matcher assembler (in this example, StringMatcherAssembler). The assembler retrieves the descriptions of all matcher parameters hanging from the root (_:b1) and returns a new instance of StringMatcher configured accordingly.

    package de.fuberlin.wiwiss.semmf.engine;

    import de.fuberlin.wiwiss.semmf.vocabulary.MD;
    import com.hp.hpl.jena.assembler.Assembler;
    import com.hp.hpl.jena.assembler.assemblers.AssemblerBase;
    import com.hp.hpl.jena.assembler.Mode;
    import com.hp.hpl.jena.rdf.model.Literal;
    import com.hp.hpl.jena.rdf.model.Resource;

    public class StringMatcherAssembler extends AssemblerBase {

        public Object open (Assembler ignore, Resource matcherNode, Mode ignore2) {

            // default value
            boolean caseSensitive = false;

            try {
                // retrieve value of parameter caseSensitive hanging from the matcherNode
                String cs = ( (Literal) matcherNode.getProperty(MD.caseSensitive).getObject() ).getLexicalForm();

                if (cs.equalsIgnoreCase("true"))
                    caseSensitive = true;
            }
            catch (Exception e) {
                // parameter missing or not a literal: keep the default value
            }

            return new StringMatcher (caseSensitive);
        }
    }
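    When a new assembler reads several parameters, the "fall back to a default" pattern used above can be factored into small helpers that work on the lexical form of a parameter. The following is a minimal sketch under that assumption; the class MatcherParams and both method names are hypothetical and not part of SemMF or Jena. Unlike the catch-all block above, it falls back to the default only for missing or malformed values.

```java
// Hypothetical helpers, not part of SemMF or Jena: they turn the lexical form
// of a matcher parameter into a typed value and fall back to a default when
// the parameter is missing (null) or malformed.
final class MatcherParams {

    private MatcherParams() {}

    /** Accepts "true" or "false" in any case; anything else yields the default. */
    static boolean booleanOrDefault(String lexical, boolean defaultValue) {
        if (lexical == null) {
            return defaultValue;
        }
        String s = lexical.trim();
        if (s.equalsIgnoreCase("true")) {
            return true;
        }
        if (s.equalsIgnoreCase("false")) {
            return false;
        }
        return defaultValue;
    }

    /** Accepts a number in [0,1]; anything else yields the default. */
    static float unitIntervalOrDefault(String lexical, float defaultValue) {
        if (lexical == null) {
            return defaultValue;
        }
        try {
            float v = Float.parseFloat(lexical.trim());
            // NaN fails both comparisons and therefore yields the default.
            return (v >= 0f && v <= 1f) ? v : defaultValue;
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

    Inside an open(...) method, an assembler could pass the lexical form of a parameter (or null if the property is absent) to such a helper and hand the result to the matcher's constructor.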

  3. Mapping your matcher to a corresponding assembler class

    Finally, you must provide a mapping of your matcher's URI to the corresponding assembler class using Jena Assembler vocabulary, as shown below:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:owl="http://www.w3.org/2002/07/owl#"
             xmlns:ja="http://jena.hpl.hp.com/2005/11/Assembler#">

       <rdfs:Class rdf:about="http://semmf.ag-nbi.de/vocabulary/1.1/semmf.rdfs#StringMatcher">
          <rdfs:subClassOf rdf:resource="http://jena.hpl.hp.com/2005/11/Assembler#Object"/>
          <ja:assembler>de.fuberlin.wiwiss.semmf.engine.StringMatcherAssembler</ja:assembler>
       </rdfs:Class>

    </rdf:RDF>

    Insert these RDF statements into your matching description, or add them to the file config\assemblerMappings.rdf, which stores the mappings for all built-in matchers and is loaded by the engine at start-up.