Sindice @ 20+ Millions and Openings

Work on Sindice is proceeding at full speed, and so is the indexing of the Semantic Web.

Sindice now indexes more than 20 million Semantic Web documents (21.5 million as I type) and will usually index your submitted RDF documents in less than 30 minutes. This great result is entirely due to the dedication of the Sindice development team.

Some of the geeky bits :-). We now have version 2 of the indexing pipeline up and running (Renaud, Eyal, Michele).

The Sindice indexing pipeline does a job that is anything but trivial, and it does it at an amazing speed.

Basically, each document is expanded by recursively resolving the URIs of the properties and classes it uses, thus computing a “Web closure” of the explicitly or implicitly imported ontologies. Once this is done, reasoning is applied using RDFS and some OWL (e.g. FunctionalProperty, TransitiveProperty, sameAs, inverseOf, InverseFunctionalProperty, SymmetricProperty). Sindice has done this for each of the 22 million sources independently, in less than 3 weeks (plus the actual indexing and all sorts of other processes), on a relatively small cluster (4-6 Xeon cores). Not bad? :-)
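
For the curious, here is a minimal sketch of what computing such a closure can look like. It is not the actual Sindice pipeline code: triples are plain Python tuples, the `fetch` callable stands in for HTTP dereferencing, and the names `vocabulary_uris` and `web_closure` are made up for illustration. It also uses a work queue instead of literal recursion, and only follows predicates plus the objects of `rdf:type` and `owl:imports`, which is a simplification.

```python
from collections import deque

RDF_TYPE = "rdf:type"
OWL_IMPORTS = "owl:imports"


def vocabulary_uris(triples):
    """Return the URIs of the properties and classes used in the triples."""
    uris = set()
    for _subject, predicate, obj in triples:
        uris.add(predicate)  # every predicate is a property in use
        if predicate in (RDF_TYPE, OWL_IMPORTS):
            uris.add(obj)  # a class in use, or an explicitly imported ontology
    return uris


def web_closure(document, fetch):
    """Return the document's triples plus those of every ontology reached
    by repeatedly resolving property and class URIs.

    `fetch` is a hypothetical callable mapping a URI to a set of triples
    (an empty set when nothing can be retrieved).
    """
    closure = set(document)
    seen = set()  # URIs already resolved, so cycles terminate
    queue = deque(vocabulary_uris(document))
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        fetched = set(fetch(uri))
        new_triples = fetched - closure
        closure |= fetched
        # Newly found ontologies may use further vocabularies: resolve those too.
        queue.extend(vocabulary_uris(new_triples) - seen)
    return closure
```

Calling `web_closure(doc, fetch)` with a `fetch` that looks URIs up in a dictionary is enough to try it out offline. The RDFS and OWL reasoning would then run over the returned set of triples.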

Thanks to this processing, we can be as precise and complete as possible in solving tasks such as computing the IFP index, composing human-readable descriptions of documents, and powering the forthcoming entity-based APIs as well as possible.

Notably, all large datasets (e.g. the huge UniProt) are now proudly processed using our brand new Hadoop-based Semantic Sitemap processor, a special courtesy of Holger Stenzhorn, who joined the team last month.
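
To give an idea of what an IFP index is about: if a property is an owl:InverseFunctionalProperty, two resources that share the same value for it denote the same entity. The sketch below is only an illustration under that assumption, not the structure Sindice actually uses; the function names and the tuple representation of triples are made up.

```python
from collections import defaultdict


def build_ifp_index(triples, ifps):
    """Map each (property, value) pair to the subjects that carry it.

    `ifps` is the set of properties known to be inverse functional,
    for example as inferred from the ontologies in the closure.
    """
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in ifps:
            index[(predicate, obj)].add(subject)
    return index


def same_entity_groups(index):
    """Return the groups of subjects that share an IFP value."""
    return [subjects for subjects in index.values() if len(subjects) > 1]
```

With `foaf:mbox` declared inverse functional, two documents describing people with the same mailbox end up in the same group, even if they use different URIs for the person.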

Sindice is Hiring!

In the context of the EU project OKKAM, which starts in January 2008, we are now looking for candidates who are interested in developing highly scalable and innovative Semantic Web infrastructures and applications. Positions are open for interns, Master's and Ph.D. students, postdocs, and scientific developers.

While we of course highly value academic brilliance, we’re especially looking for candidates who, like us, believe that it is through clever but hardcore software engineering and development that we can make a difference on the Semantic Web.

Successful candidates will be rewarded with top salaries and working conditions.

Written by Giovanni Tummarello on Nov 22, 2007. Post filed under Announcements.

2 comments

  1. Comment by Shantanu on March 4, 2008 

    The work is really interesting. One would have worked on this project even without the salary

  2. Comment by Vinay Kumar on June 11, 2008 

    Im a student of india, i love to work here as an intern. How to apply?
