Sig.ma – Live views on the Web of Data
Today we release Sig.ma. Hurray \o/!
Sig.ma is a fairly advanced application built on top of Sindice that gives very visual, interactive access to the “Web of Data” as a whole. The best thing to do, really, is to watch the screencast. Bear with the first 60 seconds, where I introduce the Web of Data; it moves pretty fast after that.
The demo is probably... agreeably cool, but there is more to talk about.
While Sig.ma is by no means the first data aggregator for the Semantic Web, its contribution is to show that the whole is really bigger than the sum of its parts, and that exciting possibilities lie in a holistic approach to the automatic discovery and consolidation of semistructured data.
In Sig.ma, elements such as large-scale Semantic Web indexing, logic reasoning, data aggregation heuristics, pragmatic ontology alignments and, last but not least, user interaction and refinement all work together to provide entity descriptions that become live, embeddable data mashups.
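To make the “data aggregation heuristics” part a bit more concrete, here is a minimal sketch of one very simple heuristic: group the values found for an entity by property and rank each value by how many independent sources state it. This is not Sig.ma's actual code; the triple format and the `aggregate` function are hypothetical, and it assumes the triples have already been judged to describe the same entity.

```python
from collections import defaultdict

# Hypothetical input: (source_url, property, value) triples that are assumed
# to already describe one and the same entity.
Triple = tuple[str, str, str]


def aggregate(triples: list[Triple]) -> dict[str, list[tuple[str, int]]]:
    """Group values by property and rank each value by its number of distinct sources."""
    support: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for source, prop, value in triples:
        # Collapse whitespace and lowercase so trivially different spellings merge
        # (this also means the displayed value is the normalized one).
        key = " ".join(value.split()).lower()
        support[prop][key].add(source)
    # Most-supported values first; ties broken alphabetically for stable output.
    return {
        prop: sorted(
            ((value, len(sources)) for value, sources in values.items()),
            key=lambda pair: (-pair[1], pair[0]),
        )
        for prop, values in support.items()
    }
```

Counting distinct sources rather than raw triples means a single noisy file repeating a value cannot outvote several independent publishers.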
An interesting example:
When the B&W pictures (see the demo) popped up automatically the first time we ran Sig.ma, we were really excited: that DERI data had been there forever, yet had never been meaningfully used or integrated, let alone automatically! That DERI RDF file does not reuse the right URIs for people, doesn't use Inverse Functional Properties such as email addresses, and uses only one of the many ways to say “author”.
But there it was! The file had been discovered automatically and was contributing marvelously to the mashup, providing information about papers (including technical reports that would not be listed otherwise), an extra picture, the phone number, a confirmation of the personal homepage, research projects and more.
Note: this doesn't mean that the DERI file is bad at all. It's simply not unrealistically great; in other words, it was created with a realistic amount of effort, the same we can expect from any data publisher.
There was no way to get that very useful data with classic Semantic Web inference and rule-based consolidation alone. What it took instead was a mix of Semantic Web practices and tricks with pragmatic heuristics and elements of soft computing (quite basic ones, in fact).
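The difference between strict and soft consolidation can be illustrated with a toy example. The sketch below is hypothetical and is not the technique Sig.ma actually uses: the alias table stands in for a “pragmatic ontology alignment”, strict matching only merges two descriptions that share a value for an Inverse Functional Property (here just email), and soft matching scores the Jaccard overlap of the aligned (property, value) pairs.

```python
# Hypothetical alignment table: several ways to say the same thing map to one key.
ALIASES = {
    "foaf:name": "name",
    "rdfs:label": "name",
    "foaf:homepage": "homepage",
    "foaf:phone": "phone",
    "vcard:tel": "phone",
    "foaf:mbox": "email",
}
IFPS = {"email"}  # properties treated as inverse functional in this sketch


def normalize(description):
    """Map (property, value) pairs to aligned properties and normalized values."""
    return {(ALIASES.get(p, p), " ".join(v.split()).lower()) for p, v in description}


def same_by_ifp(a, b):
    """Strict consolidation: merge only if both share a value for an IFP."""
    return any(prop in IFPS for prop, _ in normalize(a) & normalize(b))


def overlap_score(a, b):
    """Soft consolidation: Jaccard similarity of the normalized descriptions."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return 0.0
    return len(na & nb) / len(na | nb)
```

For a profile with an email address and a file like the DERI one, which has no email but shares name, homepage and phone, `same_by_ifp` returns `False` while `overlap_score` returns 0.6, which a threshold-based heuristic could accept. It also shows why richer descriptions help: every extra shared attribute raises the overlap.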
In our opinion it all makes sense and inspires the following thoughts:
- A little semantics might in fact go a long way: there is no way anything comparable to Sig.ma could exist had we not had a large core of semantically structured data (the Web of Data itself). So please publish way more, in whatever format can be converted to RDF!
- … and it goes even further when the user is involved and can steer and validate the results with pragmatic actions (e.g. “reject” or “approve”).
- For data publishers: just like on the HTML web, you can simply care about your own site. If you don't reuse other people's URIs, don't add “sameAs” links, or don't really use the ontology everyone else is using, it will most likely still work, and for most applications!
- … but overdescribe.
Be more verbose in your semantic descriptions than you would be for a human reader. A well-described entity is the best possible “entity” identifier one could think of: it automatically generates invisible but robust links to other entity descriptions. So don't just write name = fooguy; expose everything you have (and are willing to share) and let aggregation engines use this data to do the best consolidation possible. Good descriptions will also make you show up more often in semantic aggregations, foster new applications and make people more likely to integrate with you.
- For data consumers: we are really working for you, and we are willing to do the hard work.
This is again very similar to the HTML world. How difficult is it to make sense of all the broken HTML out there? Very! How many people actually have to do it? Just a few: the browser makers. Everyone else can reuse their efforts and concentrate on other aspects. Sig.ma and Sindice are engines that do the hard part for you as a Web of Data developer. We provide open services and open source components (heck, at the end of the week we're even releasing our index as open source, and the reasoning engine will follow). If there is interest and a market, others will come and there will be more choice.
So let me conclude with a fortunate Sig.ma of Stefan Decker (50 sources: sigma “Stefan Decker”, then add info “Stefan Decker DERI”, with a couple of sources manually added or deleted).
The rest is taken from the small FAQ on the Sig.ma About page. Cheers!
Why is this potentially revolutionary?
As appropriate data sources become available (pages annotated with RDFa or Microformats), Sig.ma is in a different league in terms of information richness and precision compared to methods solely based on web text analysis.
Sig.ma can be used by humans and software agents alike to obtain structured data about any entity.
Is Sig.ma noise-free?
Not yet. Sig.ma still employs heuristics for many aspects and has to deal with heterogeneous data in the current Web of Data, a very early-stage environment! What we can say, however, is:
- Sig.ma is interactive and can learn from its usage: when a user deletes a piece of information or a source, Sig.ma records it, and that piece of information is less likely to show up again later.
- At this point we have deliberately chosen very simple strategies, to test the general idea rather than advanced techniques: the potential for improvement is tremendous.
- The Web of Data itself is very new: until very recently there was basically no way to see this data in action, and markup has been done on a best-effort, hacker-enthusiast, leap-of-faith basis. Now that Google and Yahoo are starting to recognize the value of page markup, it is realistic to expect improvements in data coverage and quality.
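The learning-from-deletions behaviour described above could work in many ways. As a purely illustrative sketch (the class, its names and the multiplicative penalty are assumptions, not Sig.ma's implementation), each rejection of a source or value multiplies its future weight by a fixed factor, so repeatedly rejected items sink in the ranking without being banned outright.

```python
from collections import Counter


class FeedbackRanker:
    """Hypothetical sketch: demote sources or values that users have rejected."""

    def __init__(self, penalty: float = 0.5):
        self.penalty = penalty        # weight multiplier applied per rejection
        self.rejections = Counter()   # item (source URL or value) -> rejection count

    def reject(self, item: str) -> None:
        """Record that a user deleted this source or piece of information."""
        self.rejections[item] += 1

    def weight(self, item: str) -> float:
        # With the default penalty, each rejection halves the weight;
        # items never rejected keep a weight of 1.0.
        return self.penalty ** self.rejections[item]

    def rank(self, scored_items):
        """Re-rank (item, base_score) pairs by base_score times feedback weight."""
        return sorted(
            scored_items,
            key=lambda pair: pair[1] * self.weight(pair[0]),
            reverse=True,
        )
```

A soft penalty rather than a hard blacklist keeps a single mistaken deletion from hiding a good source forever.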
Why does my phone number/picture/favourite movie not appear?
Information on pages exposing RDF, RDFa or Microformats will appear. If you or your company want information to be found on the Web of Data, simply mark up your HTML using RDFa, then submit the page to Sindice. You will see it returned by Sig.ma within 10 to 15 minutes.
How is Sig.ma built? Can I build applications like Sig.ma?
Sig.ma is enabled by Sindice, an index of the Web of Data. Thanks to Sindice, Sig.ma can accurately locate sources of web data using not only text but also precise attribute-value searches and more. Sindice is alive and growing: it constantly finds new information, receives “pings” and immediately adds new documents. Where to start? Please write on our forum.
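The difference between plain text search and attribute-value search can be shown with a toy in-memory index. This is a hypothetical sketch only: it is not Sindice's API or data structure, and the `ToyIndex` class and its methods are invented for illustration. It keeps one inverted index from words to documents and another from exact (property, value) pairs to documents.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical index supporting keyword and exact (property, value) lookups."""

    def __init__(self):
        self.by_term = defaultdict(set)   # lowercase word -> document URLs
        self.by_pair = defaultdict(set)   # (property, value) -> document URLs

    def add(self, url, triples):
        """Index a document given as (subject, property, value) triples."""
        for _subject, prop, value in triples:
            self.by_pair[(prop, value)].add(url)
            for word in value.lower().split():
                self.by_term[word].add(url)

    def keyword(self, text):
        """Documents containing every word of the query text."""
        words = text.lower().split()
        if not words:
            return set()
        return set.intersection(*(self.by_term.get(w, set()) for w in words))

    def attribute(self, prop, value):
        """Documents stating exactly this property value."""
        return set(self.by_pair.get((prop, value), set()))
```

A keyword query for “decker” matches any document mentioning the word, whereas an attribute query on a specific email value returns only documents that state exactly that value, which is what makes precise source selection possible.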
Sig.ma and Sindice are built at DERI, mainly within the OKKaM Project (ICT-215032), but also with the support of Science Foundation Ireland under Grant No. SFI/02/CE1/I131, the ROMULUS project (ICT-217031) and the iMP project.