Sindice Blog
-
Sindice: Developer and Cluster Infrastructure Responsible needed
To further develop Sindice and to support a new project on large-scale analysis of Web Data, the data-intensive infrastructure group at DERI (yes, us) currently has openings for two positions:
- Scientific Developer. We’re looking for a highly skilled and motivated individual to help us develop research demonstrators and prototypes. We require advanced, proven experience with the Java technology stack. Experience with Lucene, Hadoop and/or Ajax development is a strong plus.
- Linux Cluster Responsible. We will soon start building a 500-core cluster. The candidate will be a wiz at building, improving and managing it, its users and the experiments that will run on it.
We offer:
- Highly competent team members to work with. A very supportive environment.
- Work that is strictly connected to research on new topics and new projects, with opportunities to experiment with and deploy new ideas.
- For academia, a very competitive salary.
- Location is the DERI institute in Galway, Ireland. Galway is a very picturesque city in the West of Ireland.
Positions are available immediately; salary on application. Please write to giovanni.tummarello@deri.org
Update 1/1/2009: These positions have been filled, thanks to all those who responded.
-
Sindice Awarded by Saltlux and ESTC
We’re happy to report that Sindice and the Sindice Team have recently received awards on two separate occasions.
Sindice was awarded 1st prize at the European Semantic Web Technology Conference Business Idea Contest ( http://www.estc2008.com/index.php/contest ). Unfortunately it is not reported on the homepage just yet, but you can find a nice account here.
Stefan Decker presented on behalf of the team, and apparently the pitch was very well received. He received the prize itself during the ESTC gala dinner.
In somewhat related news, a few weeks earlier, “Sindice.com: A Document-oriented Lookup Index for Open Linked Data”, by E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn and G. Tummarello, received the Saltlux prize for the best 2007 DERI paper.
As a team, we wish to thank everyone for this recognition.
As should be clear by now, the Semantic Web is anything but an easy thing to get started with.
We at Sindice have taken our own route and decided to invest a lot of time in getting a quality infrastructure up for everybody’s use. In an academic setting, this is a somewhat bold choice. As we know, academia is mostly based on publications, and therefore actual infrastructures (which cost a LOT of time and sweat and carry much more uncertainty than, say, a conference submission) are, generally speaking, an investment that few make.
It is therefore particularly nice, and useful in fact, to receive these recognitions. They do and will help us continue our commitment toward a more and more useful Web of Data.
-
Sindice Beta 1 index
The Sindice Beta 1 index is now online.
Apart from the exciting geek wizardry (e.g. the crawling, the reasoning and the indexing are now all performed in pipelined Hadoop jobs), there are plenty of interesting new features to play with:
- Much improved support for free text queries. Great improvements in the ranking of the results.
- Much improved support for structured queries: ask for documents where a value appears specifically at certain properties. Examples: “self confessed researchers” or “who claims to know Richard”?
- Mixed filtering operations allow restrictions on datasets or classes (e.g. try Washington as a person).
- Microformats! Support for plenty of new microformats has been added, and previous support has been improved.
- Pings are working fine, so anything pinged to us will show up after just a short while.
Please note, however, that Sindice results are not per se “answers”; they are pointers to documents which should contain the answer you seek (providing answers directly is, on the other hand, the strong point of SWSE).
How to turn them into answers? 2 ways:
a) Process the results from your applications. See examples in Java, Ruby and PHP here. This will not only get you the piece of information that you need but will also naturally filter out any non-relevant documents returned (as Sindice query expressivity is necessarily more limited than that of full SPARQL). You can also process Microformats directly as RDF: just access our cached, reasoned RDF version of the document.
b) Bug us to improve and provide the functionality you need as a readily consumable API, such as our Sindice SIOC API (which nicely returns a JSON object, no further processing needed).
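The client-side filtering described in option (a) can be sketched roughly as follows. This is only an illustration, not Sindice client code: it assumes the candidate documents have already been fetched and parsed into (subject, predicate, object) tuples, and the names `candidates`, `matches` and `filter_candidates`, as well as the example.org URLs, are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def matches(triples: List[Triple], predicate: str, obj: str) -> bool:
    """True if the document states the value at exactly this property."""
    return any(p == predicate and o == obj for _, p, o in triples)


def filter_candidates(candidates: Dict[str, List[Triple]],
                      predicate: str, obj: str) -> List[str]:
    """Keep only the candidate documents that really contain the triple pattern."""
    return [url for url, triples in candidates.items()
            if matches(triples, predicate, obj)]


# Hypothetical search results, already fetched and parsed (assumption).
candidates = {
    "http://example.org/alice.rdf": [
        ("http://example.org/alice#me", FOAF_KNOWS, "http://example.org/richard#me"),
    ],
    "http://example.org/bob.rdf": [
        # Mentions "Richard" only as a name, so it is not an answer here.
        ("http://example.org/bob#me", FOAF_NAME, "Richard"),
    ],
}

relevant = filter_candidates(candidates, FOAF_KNOWS, "http://example.org/richard#me")
print(relevant)
```

In a real application the parsing step would be done by an RDF library, and the pattern check could just as well be a full SPARQL query over the fetched documents.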
Sorry that some things on the visualization end are still not working (e.g. visualization of most microformats), but to see the data, all you really have to do is hit that “cache” button and you’ll see the bare triples, which is all you wanted anyway, right? Neat visualization will come.
Also, some datasets are still missing; just watch the number of RDF sources grow as more large datasets offering semantic sitemaps are added to the index.
What to say? We are very much looking forward to helping you implement your cool web of data applications and enhancements. Just drop us a message here and we’ll be right there with you.
-
An Exciting Hard Hat area
As you might have noticed by the look of the site we’re now in a very exciting transition phase. Here is a short summary of the main news with more posts about the details of it to follow in the next days:
The new look: it’s for developers!
Now we finally don’t look like Google anymore. We’re working on making Sindice a great place for developers to go, discover and experiment with querying the web of data.
We know perfectly well how noisy and creatively complex the Web of Data can be, and that’s why we’ve prepared a cosy Web of Data forum where you can ask questions and exchange tips.
But what data is out there? See it all in the Web Data Map (Alpha!!). Watch it as we update it nightly with new data sources pinged or picked up by the crawlers. Coming soon, the map will also provide examples of interesting queries that span datasets and entity types.
Once you know what you’re looking for, use our new query language to find the Semantic Sources you need.
STATUS: Less than beta.
But isn’t this exciting? So keep your hard hat on, and thanks for bearing with both the issues (report them!) and our hearty enthusiasm in seeing the next generation of the Web come to life.
-
Sindice in use
Sindice is really meant to be used by your project, and for us it couldn’t be any nicer than seeing this happen more and more.
Thanks to Sergio Fernández, the SWAML project (which converts mailing list archives to RDF using SIOC and friends) now uses Sindice to find the URIs for email authors, using our IFP (Inverse Functional Property) lookup on their email addresses.
Also, Alexandre Passant developed a Drupal plugin that uses Sindice to find appropriate URIs for each of your post tags.
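To give an idea of what an IFP lookup on email addresses involves, here is a minimal in-memory sketch. It is not how Sindice implements its index: the function names `build_ifp_index` and `lookup` and the example.org data are hypothetical. The only fact relied upon is that `foaf:mbox` is an inverse functional property, so a given mailbox value identifies a single person.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"

# Properties treated as inverse functional in this sketch (assumption:
# a real system would derive this set from the ontologies themselves).
IFPS = {FOAF_MBOX}


def build_ifp_index(triples: Iterable[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (IFP, value) pair to the subject URIs described with it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in IFPS:
            index[(predicate, obj)].add(subject)
    return index


def lookup(index: Dict[Tuple[str, str], Set[str]], prop: str, value: str) -> List[str]:
    """Return all known URIs for the entity identified by (prop, value)."""
    return sorted(index.get((prop, value), set()))


# Hypothetical data: two sources describe the same mailbox under different URIs.
triples = [
    ("http://example.org/people/alice", FOAF_MBOX, "mailto:alice@example.org"),
    ("http://example.com/staff#a", FOAF_MBOX, "mailto:alice@example.org"),
    ("http://example.org/people/bob", FOAF_MBOX, "mailto:bob@example.org"),
]

index = build_ifp_index(triples)
alice_uris = lookup(index, FOAF_MBOX, "mailto:alice@example.org")
print(alice_uris)
```

A mailing list converter can then reuse one of the returned URIs for an author instead of minting a new one.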
To interact with Sindice, there is now a small Python library, and even a Sindice module for SWI Prolog.
More developer-related information can now be found on our developer page at http://www.sindice.com/dev, including the new RPC ping API.
-
Sindice @ 20+ Millions and Openings
Work on Sindice is proceeding at full speed, and so is the indexing of the Semantic Web.
Sindice now indexes over 20 million Semantic Web documents (21.5 million as I type) and will usually index your submitted RDF in less than 30 minutes. This great result is entirely due to the dedication of the Sindice development team.
Some of the geeky bits: we now have version 2 of the indexing pipeline up and running (Renaud, Eyal, Michele).
The Sindice indexing pipeline does a job which is anything but trivial, and does it at an amazing speed.
Basically, each document is integrated by recursively resolving the URIs of the properties and classes in use, thus calculating a “Web closure” of the explicitly or implicitly imported ontologies. Once this is done, reasoning is performed using RDFS and some OWL (e.g. FunctionalProperty, TransitiveProperty, sameAs, inverseOf, InverseFunctionalProperty, SymmetricProperty). Sindice has done this for each of the 22 million sources independently, in less than 3 weeks (plus the actual indexing and all sorts of other processes), on a relatively small cluster (4-6 Xeon cores). Not bad?
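The two steps just described (computing the closure of ontologies reachable from the terms a document uses, then reasoning over the result) can be illustrated with a toy sketch. This is not the Sindice pipeline: the `WEB` dictionary is a hypothetical stand-in for dereferencing URIs over HTTP, the prefixed names stand in for full URIs, and only two rules (rdfs:subClassOf typing and owl:inverseOf) are implemented.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
INVERSE = "owl:inverseOf"

# Hypothetical stand-in for the Web: what dereferencing each term would return.
WEB = {
    "ex:Student": [("ex:Student", SUBCLASS, "ex:Person")],
    "ex:Person": [("ex:Person", SUBCLASS, "ex:Agent")],
    "ex:supervises": [("ex:supervises", INVERSE, "ex:supervisedBy")],
}


def terms_used(triples: Iterable[Triple]) -> Set[str]:
    """Properties and classes mentioned by a set of triples."""
    terms = set()
    for _, p, o in triples:
        terms.add(p)
        if p in (RDF_TYPE, SUBCLASS, INVERSE):
            terms.add(o)
    return terms


def web_closure(doc: List[Triple]) -> List[Triple]:
    """Recursively resolve every term in use and collect the ontology triples."""
    seen: Set[str] = set()
    ontology: List[Triple] = []
    queue = deque(terms_used(doc))
    while queue:
        term = queue.popleft()
        if term in seen:
            continue
        seen.add(term)
        fetched = WEB.get(term, [])  # unknown terms resolve to nothing
        ontology.extend(fetched)
        queue.extend(terms_used(fetched))
    return ontology


def reason(doc: List[Triple], ontology: List[Triple]) -> Set[Triple]:
    """Apply the two rules repeatedly until no new triple is derived."""
    facts = set(doc) | set(ontology)
    changed = True
    while changed:
        new = set()
        for s, p, o in facts:
            for a, q, b in facts:
                if q == SUBCLASS and p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))
                if q == INVERSE and p == a:
                    new.add((o, b, s))
                if q == INVERSE and p == b:
                    new.add((o, a, s))
        changed = not new <= facts
        facts |= new
    return facts


doc = [("ex:alice", RDF_TYPE, "ex:Student"),
       ("ex:bob", "ex:supervises", "ex:alice")]
inferred = reason(doc, web_closure(doc))
print(sorted(inferred - set(doc)))
```

Because the closure is computed per document, each source can be processed independently of the others, which is what makes the job easy to spread over a cluster.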
Thanks to this processing, we can be as precise and complete as possible in solving tasks such as computing the IFP index, composing human-readable descriptions of documents and powering the forthcoming entity-based APIs as well as possible.
Notably, all large datasets (e.g. the huge UniProt) are now proudly processed using our brand new Hadoop-based Semantic Sitemap processor, special courtesy of Holger Stenzhorn, who joined the team last month.
Sindice is Hiring!
In the context of the EU project OKKAM, to start Jan 2008, we are now looking for candidates who’re interested in developing highly scalable and innovative Semantic Web infrastructures and applications. Positions include Interns, Masters, Ph.D. students, Postdocs and Scientific Developers.
While we of course highly value academic brilliance, we’re especially looking for candidates who, like us, believe that it is through clever but hard-core software engineering and development that we can make the difference on the Semantic Web.
Successful candidates will be rewarded with top salaries and working conditions.
-
Sindice Search in your Web Browser
We received a couple of requests for this so here it is: an OpenSearch plugin is now available for Sindice.
To start using the plugin, you can either install it directly in your browser or download the plugin xml file and install it manually. As a result, you’ll be able to use Sindice Search (text mode) directly from the browser toolbar.
-
Sindice Beta 0: Say Hello to (most of) the Semantic Web
We are extremely proud to announce today the first beta version of Sindice, a Semantic Web Search Engine centered on indexing and ranking online RDF data sources.
Sindice is focused on scalability: most of the Semantic Web, to the best of our knowledge, is already indexed (currently 11M documents). Sindice includes big datasets like DBLP in RDF, DBPedia, Geonames, Uniprot, plenty of FOAF and more.
Lots of data to index?
Sindice manages big datasets the smart way. You can easily create a Sitemap using the Semantic Sitemap Extension [1], which Sindice then uses to process huge databases quickly and without overloading your server.
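As a rough illustration of why a sitemap helps, the sketch below reads a sitemap that advertises a data dump and extracts the dump location, so that a crawler can fetch one file instead of requesting every resource. The element names and the `sc` namespace URI are written from memory of the Semantic Sitemap proposal and should be treated as assumptions to check against the specification referenced as [1]. The sample document and the function name `dump_locations` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed namespace URI of the Semantic Sitemap extension (verify against [1]).
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"

# Hypothetical sitemap; the element names are assumptions, not a verified schema.
SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Example product catalog</sc:datasetLabel>
    <sc:linkedDataPrefix>http://example.com/products/</sc:linkedDataPrefix>
    <sc:dataDumpLocation>http://example.com/data/catalog.rdf.gz</sc:dataDumpLocation>
    <changefreq>weekly</changefreq>
  </sc:dataset>
</urlset>
"""


def dump_locations(xml_text: str) -> List[str]:
    """Return the URLs of all data dumps advertised in the sitemap."""
    root = ET.fromstring(xml_text)
    tag = "{%s}dataDumpLocation" % SC_NS
    return [el.text.strip() for el in root.iter(tag) if el.text]


dumps = dump_locations(SITEMAP)
print(dumps)
```

With the dump location known, a single download replaces a separate request for every resource under the advertised prefix, which is how the publisher’s server is spared.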
Data changes frequently?
Sindice is live! Ping us and see your Semantic Web source indexed within 15 minutes.
Ontology Import and Reasoning at Web Scale!
Sindice does Web scale, online reasoning: for each source, it will recursively resolve, compose and reason on top of all the related ontologies. Because of this we can accurately index things like document labels, Inverse Functional Properties and others independently for each RDF file published on the Semantic Web.
Applications:
* As an application developer, you can use our scalable API to serendipitously find data on the Semantic Web: find sources and automatically ping us with your data to make sure that it is found by others!
* As a Semantic Web user, it is already pretty useful to find appropriate URIs around. Please read well: no excuses now for minting yet more URIs for SW people:)
Credits: The results were achieved thanks to the ideas and dedication of the Sindice Beta 0 team: Eyal Oren, Renaud Delbru, Michele Catasta and Giovanni Tummarello.
Sindice is made possible thanks to the investment from DERI Galway, supported by Science Foundation Ireland (www.sfi.ie).
Sindice is available at http://www.sindice.com
-
Weblog launched
We’re happy to announce the launch of this official Sindice weblog. We’ll be updating this space regularly with information about Sindice, our goals, our status, and our progress in indexing Semantic Web resources. Stay tuned…














