Plone as a semantic aggregator

Here is a product of my imagination (no code, sorry, just talk): what if a CMS such as Plone could be turned into a universal content aggregator? It would be able to retrieve any properly packaged content or data from the Web and import it so that it can be reused, enhanced, and processed with Plone's content management features. As a universal content aggregator, it would be able to "import" (or "aggregate") any content, whatever its structure and semantics may be. Buzzwords ahead: Plone would be a schema-agnostic aggregator. It would be a semantic-enabled aggregator.

Example: beer lovers gather on site A. Site A's webmaster has set up a specific data schema for describing beers, beer flavours, beer makers, beer drinkers, and so on. Since site A is rich in content and its community of users is enthusiastic, plenty of beers have been described there. Site B, powered by a semantic aggregator (and CMS), is interested in any data about beverages and their impact on human health. So site B retrieves beer data from site A. In fact it retrieves both the descriptions of beer1, beer2, beerdrinker1, … and the description of what a beer is: how data is structured when it describes a beer, and what the relationship is between a beer and a beer drinker. Site B now knows many things about beer in general (data structure = schema) and about many beers specifically (beer data). All this beer data on site B is presented and handled as specific content types. Site B's users are now able to handle beer descriptions as content items, to process them through workflows, to rate them, to blog about them, and so on. And finally site B republishes its own output in such a way that it can be aggregated again by other sites. That would be the definitive birth of the semantic web!

There are many news aggregators (RSSBandit, …) that know how to retrieve news items from remote sites. But they are only able to aggregate news data. They know only one possible schema for retrievable data: the structure of a news item (a title + a link + a description + a date + …). This schema is specified in the (many) RSS standard(s).
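
To make that limitation concrete, here is a minimal sketch of what such a news-only aggregator boils down to. It is an illustration only, not code from any real aggregator: the NewsItem class, the parse_rss_items function, the sample feed and the <abv> element are all made up for this example, and it assumes a plain RSS 2.0 feed without namespaces.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class NewsItem:
    """The one and only schema a news aggregator knows about."""
    title: str
    link: str
    description: str
    pub_date: str


def parse_rss_items(rss_xml: str) -> List[NewsItem]:
    """Extract the fixed news-item fields from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append(NewsItem(
            title=item.findtext("title", default=""),
            link=item.findtext("link", default=""),
            description=item.findtext("description", default=""),
            pub_date=item.findtext("pubDate", default=""),
        ))
    return items


# Hypothetical feed from site A; <abv> is a made-up, beer-specific element.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Site A</title>
<item>
  <title>New stout released</title>
  <link>http://site-a.example/stout</link>
  <description>A dark beer.</description>
  <pubDate>Mon, 01 Jan 2007 00:00:00 GMT</pubDate>
  <abv>7.5</abv>
</item>
</channel></rss>"""

items = parse_rss_items(SAMPLE_FEED)
print(items[0])
```

The beer-specific <abv> value is silently dropped, because the schema is hard-coded in the aggregator rather than learned from the source site.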

But now that CMSs such as Plone are equipped with schema management engines (called "Archetypes" in Plone), they are able to learn new data schemas specified in XML files. Currently, Plone's Archetypes can import any schema specified as an XMI file output by any UML modelling tool.

But XMI files are not that common on the Web. The W3C, however, has published some information showing that any UML schema (a class diagram, I mean) is equivalent to an RDF-S schema. There is even a testbed converter from RDF-S to XMI. There are even web directories cataloguing existing RDF schemas as RDF-S files. Plus, RSS 1.0 is based on RDF. Plus, the designers of Atom designed it so that it can easily be converted to RDF.
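
As a rough illustration of why an RDF-S schema can be read like a class diagram, here is a minimal sketch that turns RDF-S classes and properties into a "class with typed fields" model. It is not the W3C converter mentioned above: the beer schema, its URIs and the function names are invented for this example, and it only understands one simple RDF/XML layout (a real tool would use a proper RDF parser).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical schema published by site A.
SAMPLE_SCHEMA = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://site-a.example/schema#Beer"/>
  <rdfs:Class rdf:about="http://site-a.example/schema#BeerMaker"/>
  <rdf:Property rdf:about="http://site-a.example/schema#flavour">
    <rdfs:domain rdf:resource="http://site-a.example/schema#Beer"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
  <rdf:Property rdf:about="http://site-a.example/schema#brewedBy">
    <rdfs:domain rdf:resource="http://site-a.example/schema#Beer"/>
    <rdfs:range rdf:resource="http://site-a.example/schema#BeerMaker"/>
  </rdf:Property>
</rdf:RDF>"""


def local_name(uri):
    """Keep only the part of a URI after the last '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def rdfs_to_class_model(schema_xml):
    """Map each rdfs:Class to a dict of {property name: range name}."""
    root = ET.fromstring(schema_xml)
    about = "{%s}about" % RDF
    resource = "{%s}resource" % RDF
    model = {}
    # Every rdfs:Class becomes a class in the model.
    for cls in root.findall("{%s}Class" % RDFS):
        model[local_name(cls.get(about))] = {}
    # Every rdf:Property becomes a field of its domain class.
    for prop in root.findall("{%s}Property" % RDF):
        domain = prop.find("{%s}domain" % RDFS)
        rng = prop.find("{%s}range" % RDFS)
        if domain is None or rng is None:
            continue  # this sketch skips properties without domain or range
        owner = local_name(domain.get(resource))
        field_name = local_name(prop.get(about))
        model.setdefault(owner, {})[field_name] = local_name(rng.get(resource))
    return model


model = rdfs_to_class_model(SAMPLE_SCHEMA)
print(model)
```

In class diagram terms, a property whose range is a literal reads like an attribute (flavour), and a property whose range is another class reads like an association (brewedBy).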

So here is my easy talk (no code): let's build an RDF aggregator product for Plone. This product would retrieve any RDF file from any web site. (It would store it in a Plone triplestore, ROPE for instance.) It would then retrieve the associated RDF-S file (and store it in the same triplestore). It would convert it to an XMI file and import it as an Archetypes content type with the help of ArchGenXML. Then it would import the RDF data as AT items conforming to the newly created AT content type. Here is a diagram summarizing this: [diagram: Plone as a semantic aggregator]
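
The steps above can be sketched as a small pipeline. This is only an outline under loud assumptions: it calls no real ROPE, ArchGenXML or Archetypes API. Every step is passed in as a plain function, and the stand-in functions and URLs at the bottom are invented so that the sketch runs without Plone or network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AggregationResult:
    stored: List[str] = field(default_factory=list)  # URLs whose RDF was stored
    content_type: str = ""                            # name of the generated type
    imported: int = 0                                 # number of items created


def aggregate(data_url: str, schema_url: str,
              fetch: Callable[[str], str],
              store: Callable[[str], None],
              schema_to_xmi: Callable[[str], str],
              generate_type: Callable[[str], str],
              import_items: Callable[[str, str], int]) -> AggregationResult:
    """Run the aggregation steps in order; each step is a pluggable function."""
    result = AggregationResult()
    # 1. Retrieve the RDF data and keep it in the triplestore.
    rdf_data = fetch(data_url)
    store(rdf_data)
    result.stored.append(data_url)
    # 2. Retrieve the associated RDF-S schema and store it as well.
    rdf_schema = fetch(schema_url)
    store(rdf_schema)
    result.stored.append(schema_url)
    # 3. Convert the schema to XMI and generate a content type from it.
    xmi = schema_to_xmi(rdf_schema)
    result.content_type = generate_type(xmi)
    # 4. Import the data as items of the newly created content type.
    result.imported = import_items(rdf_data, result.content_type)
    return result


# Stand-in steps (all hypothetical): a fake web, a list as triplestore,
# and toy converters that only pass strings around.
FAKE_WEB = {
    "http://site-a.example/beers.rdf": "beer1\nbeer2",
    "http://site-a.example/schema.rdfs": "Beer",
}
triplestore: List[str] = []

result = aggregate(
    "http://site-a.example/beers.rdf",
    "http://site-a.example/schema.rdfs",
    fetch=FAKE_WEB.__getitem__,
    store=triplestore.append,
    schema_to_xmi=lambda schema: "<XMI>%s</XMI>" % schema,
    generate_type=lambda xmi: xmi[len("<XMI>"):-len("</XMI>")],
    import_items=lambda data, type_name: len(data.splitlines()),
)
print(result)
```

Keeping each step as a separate function means the RDF-S to XMI conversion or the triplestore could be swapped out without touching the overall flow.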

By the way, Gillou (from Ingeniweb) did not wait for the output of my imagination to propose a similar project. He called it ATXChange. The only differences I see between his proposal and what is said above are, first, that Gillou might not be aware of the capabilities of RDF and RDF-S (so he might end up with an Archetypes-specific aggregator that moves content between Plone sites only) and, second, that Gillou should be able to provide code sooner or later, whereas I may not be!

Last but not least: WordPress is going in somewhat the same direction. The semweb community is showing some interest in WP's structured blogging features. And some plugins are appearing that try to incorporate more RDF features into WP (see also seeAlso).

11 thoughts on "Plone as a semantic aggregator"

  1. leobard

    You are not alone with these ideas. Michael Zeltner from Vienna and Netalleynetworks are on this. I heard they are actually coding it.

  2. Sig

    Fortunately I am definitely not alone. Reinout Van Rees points out the follow-up to Gillou's announcement of ATXChange: Sidnei da Silva has already coded some advanced precursors with his Marshall package for Archetypes (although he has not run into RDF magic yet). And, as I said, the semantic web community is hoping someone will eventually come up with such a semantic aggregator, to go beyond news aggregation and right into the "semweb".

    Thank you, leobard, for mentioning Michael Zeltner. Unfortunately I couldn't find any online trace of the project you mention. Do you have a link to share here?

  3. Gillou

    Yes, my (stillborn) ATXChange project was focused on mass import/export of Archetypes-based objects, not on playing with semantic aggregators.

    Sidnei Da Silva has already done a great job with the Marshall product, which I wasn't aware of when I mailed about ATXChange. And Marshall does everything ATXChange was supposed to do and more (except the DTD stuff).

    Cheers.

  4. Pingback: AkaSig » Blog Archive » From OWL to Plone

  5. Megan's Purses

    Wow. I stumbled on this post by accident. A lot has changed since you originally wrote this, and I guess your last option, WordPress, has found its place in history :)

  6. Elliot

    Hmmm interesting. Plone is a really good tool and yeh, having it as a generic semantic aggregator would work really cool. Nice thinking. all the best with the coding n stuff. Nice blog too. Cheers.

  7. Jerry

    great stuff, i think that is a really interesting idea, i think it could really change the game.

  8. Manly Jobs

    Yeah, the feedfeeder extension capability can be quite powerful; we recently used it to import a whole lot of badly formatted course data, raising exceptions and resolving collisions where necessary. Your idea for a semantic aggregator is quite interesting, but I guess that, like RSS, semantic standards need to provide immediate and obvious benefits for publishers.

  9. Antonio De Marinis

    I like the idea.

    At the European Environment Agency we are already developing packages for Plone which can be used for the semantic web and linked data. For example, eea.rdfmarshaller is a Plone product which lets Plone export any content as RDF. eea.rdfmarshaller uses introspection of the Archetypes schema to auto-generate RDF. It also generates the OWL for each portal type.

    Moreover, the product makes Zope/Plone respond correctly to RDF browsers and spiders, by honouring the "application/rdf+xml" value of the 'HTTP_ACCEPT' request header.

    We do also have other specific packages to import RDF, but this task is a bit more complicated, as it can blow up the ZODB with too many objects and make it slow at the end. We mostly export RDF data and we think it is better to use other databases and software to handle RDF data, e.g. openlink virtuoso. However we are still investigating this area.

    You can find the eea.rdfmarshaller package here:
    https://svn.eionet.europa.eu/repositories/Zope/trunk/eea.rdfmarshaller/

    or as an egg:
    http://eggrepo.eea.europa.eu/simple

    There are doctests that show how it works. Let me know if you have any comments.

    Cheers

  10. Dave

    @Antonio: Your project looks really interesting. Can you provide a link to the best place to keep up with its development?

Comments are closed.