Video Active – European Television Heritage online
By Johan Oomen - Netherlands Institute for Sound and Vision, The Netherlands; Vassilis Tzouvaras - National Technical University of Athens, Greece and Marco Rendina - Cinecittà Luce, Italy
Only a few percent of the many millions of hours of archived audiovisual material can be found online. Before audiovisual archives can set up meaningful online services, they must overcome obstacles in intellectual property management, digitisation technologies, metadata standardisation, and source presentation.
Video Active aims to address these challenges to create multilingual access to Europe’s television heritage, providing access to a balanced collection of thousands of videos from 14 archives across Europe, each selected to reflect the cultural and historical similarities and differences of television across the European Union. Complementing this archive is a set of well-defined contextual metadata, allowing the portal to support textual search modes as well as faceted, thematic and timeline-based browsing.
Video Active uses Semantic Web technologies to provide expressive representation of its metadata, mapping of heterogeneous metadata schemas into the common Video Active schema, sophisticated query services, and interactive presentation modes. Video Active is thus fully compliant with the interoperability specifications of Europeana, the EU’s massive digital library, due to launch in 2010 with links to 10 million cultural items.
Defining user requirements
To encourage user satisfaction and return visits, Video Active studied user requirements through surveys, interviews, usability tests and desk research, and used the results to help define its technical specifications and architecture. The excellence of the portal was acknowledged during the 2009 Museums and the Web conference, where Video Active won the Best of the Web award.
High level architecture
The Video Active system comprises modules that manage its entire workflow: annotating and uploading material, transcoding it, extracting keyframes, and storing and searching metadata. Each module exploits Semantic Web technologies, enabling automation, sophisticated query services (based on the SPARQL standard) and semantic interoperability with other digital archives. The metadata are represented using the Resource Description Framework (RDF) and the Simple Knowledge Organization System (SKOS), and are stored in the Sesame semantic metadata repository. This enables light reasoning services (the use of implicit knowledge through subsumption and equivalence relations) and the merging and aligning of metadata from heterogeneous sources. Relational databases and full-text search technologies are used where semantic processing is not required. The Video Active metadata are public and can be harvested using OAI-PMH.
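To make the idea of light reasoning through subsumption concrete, the following is a minimal Python sketch, not the project's Sesame-based implementation. It assumes a small, invented hierarchy of terms linked by "broader" relations and shows how a query for a general term can also retrieve items annotated with narrower ones. The term names and the functions ancestors and matches are hypothetical.

```python
# Hypothetical hierarchy: each term maps to its direct broader terms.
BROADER = {
    "football": {"sport"},
    "cycling": {"sport"},
    "sport": {"leisure"},
}


def ancestors(term, broader=BROADER):
    """Return every term reachable from `term` by following broader links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in broader.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def matches(item_terms, query_term, broader=BROADER):
    """True if an annotation equals the query term or is narrower than it."""
    return any(
        term == query_term or query_term in ancestors(term, broader)
        for term in item_terms
    )
```

Under these assumptions, an item annotated only with "football" is matched by a query for "leisure", even though that link was never stated explicitly on the item.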
Storing and querying data, the semantic way
The Video Active metadata schema is based on the Dublin Core metadata element set, with additional elements where necessary (e.g. genre, English title). The video metadata are generated automatically and represented in an MPEG-7-based schema. The metadata are then transformed into RDF triples and stored in a semantic metadata repository.
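As an illustration of what transforming a metadata record into RDF triples can look like, here is a minimal Python sketch that serialises a flat record as N-Triples lines. The Dublin Core namespace is the real one; the project namespace, the item URI, the field selection and the function names are assumptions made for the example and are not taken from the Video Active schema.

```python
DC = "http://purl.org/dc/elements/1.1/"        # Dublin Core element namespace
VA = "http://example.org/videoactive/terms/"   # hypothetical project namespace

# Illustrative subset of Dublin Core elements; anything else is treated
# as a project-specific extension (e.g. genre).
DC_FIELDS = {"title", "creator", "date", "language", "description"}


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(subject_uri, record):
    """Turn a {field: value} record into one N-Triples line per field."""
    lines = []
    for field in sorted(record):
        namespace = DC if field in DC_FIELDS else VA
        literal = escape_literal(str(record[field]))
        lines.append(f'<{subject_uri}> <{namespace}{field}> "{literal}" .')
    return lines
```

Calling record_to_ntriples("http://example.org/item/1", {"title": "Evening news", "genre": "news"}) yields two lines, one using the Dublin Core title property and one using the hypothetical genre property.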
The annotation process is either manual or semi-automatic. When semi-automatic, the archives export their metadata using a common XML schema. Elements that cannot be mapped to the Video Active schema are inserted manually using the Web Annotation Tool. This tool contains a module that transcodes the original format to Flash and Windows Media streaming formats, creates low and medium bit-rate versions for the streaming service, and extracts keyframes for thumbnail creation. The Web Annotation Tool produces an XML file that contains the metadata, based on Dublin Core, together with content encoding and keyframe extraction information. The XML is then transformed into RDF triples and stored in the Sesame semantic repository.
Sesame is an open source Java framework for storing, querying and reasoning with RDF. It allows RDF triples to be stored in several storage systems (e.g. a Sesame local repository or a MySQL database). Because RDF has formal semantics, its use enables rich representation and reasoning services that facilitate sophisticated queries, process automation and semantic interoperability.
Search and retrieval in Video Active combine structured RDF queries in SeRQL (Sesame's own RDF query language, comparable in purpose to SPARQL) with full-text queries using Lucene, a high-performance full-text search engine library.
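The combination of structured and full-text search described above can be sketched in a few lines of Python. This is a toy illustration only: it uses neither SeRQL nor Lucene, the catalogue entries and field names are invented, and the real portal's ranking and query handling are not shown. The structured part filters on exact field values, and the full-text part intersects that result with an inverted index built over a free-text field.

```python
import re

# Hypothetical catalogue entries; ids, fields and texts are invented.
ITEMS = [
    {"id": "va-001", "genre": "news", "year": 1969,
     "text": "Moon landing special report"},
    {"id": "va-002", "genre": "sport", "year": 1969,
     "text": "Cycling tour final stage"},
    {"id": "va-003", "genre": "news", "year": 1989,
     "text": "Report on the fall of the Berlin Wall"},
]


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(items):
    """Build an inverted index mapping each token to a set of item ids."""
    index = {}
    for item in items:
        for token in tokenize(item["text"]):
            index.setdefault(token, set()).add(item["id"])
    return index


def structured_filter(items, **criteria):
    """Return ids of items whose fields equal every given criterion."""
    return {
        item["id"]
        for item in items
        if all(item.get(key) == value for key, value in criteria.items())
    }


def search(items, index, keywords, **criteria):
    """Keep items that pass the structured filter and contain every keyword."""
    ids = structured_filter(items, **criteria)
    for token in tokenize(keywords):
        ids &= index.get(token, set())
    return sorted(ids)
```

With the index built once via build_index(ITEMS), a call such as search(ITEMS, index, "report", genre="news") narrows the news items to those whose text mentions "report".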
All metadata stored in Sesame are exposed to external systems and archives through an OAI-PMH-compliant repository. Distributed OWL/RDF query mechanisms will be employed in a future release.
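For readers unfamiliar with OAI-PMH, the sketch below shows how a harvester might build a ListRecords request and read titles from a response, using only the Python standard library. The verb and argument names (ListRecords, metadataPrefix, resumptionToken) and the oai_dc format come from the OAI-PMH specification; the base URL, the sample response and the function names are assumptions, since the article does not give the address of the Video Active endpoint.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def titles_from_response(xml_text):
    """Extract every dc:title value from a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{DC_NS}title")]


# Invented response fragment, trimmed to the elements used above.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example broadcast</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()
```

A harvester would fetch the URL returned by list_records_url, pass the response body to titles_from_response, and repeat the request with the resumption token for as long as the repository returns one.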
Multilingual access: using SKOS
Video Active supports eleven languages in four distinct ways.
- Includes localized interfaces for each of the languages covered
- Translates key metadata elements into English, thus providing the platform with a monolingual baseline
- Uses a timeline view to provide a visual overview of milestones in the development of television in Europe
- Includes multilingual controlled vocabularies for Keywords, Genre and Location metadata: the keyword vocabulary stems from the International Press Telecommunications Council’s 1500-term thesaurus (translated by the Video Active project into eleven languages); the genre vocabulary uses ESCORT 2007, the EBU System of Classification of Radio and Television Programmes; and geographical names use the ISO 3166 English Country Names and Code Elements. A specialized application called ThesauriX handles the translation of these terms and their export to machine-readable XML. To achieve semantic interoperability, the thesaurus taxonomy has been transformed into a Semantic Web language using the recommended Simple Knowledge Organization System (SKOS) standard, which is built on top of RDF.
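The value of a multilingual controlled vocabulary can be illustrated with a short Python sketch. It models a concept as a URI with one preferred label per language, in the spirit of SKOS preferred labels, and falls back to English when a translation is missing, echoing the English baseline mentioned in the list above. The concept URI, the labels and the function names are invented for the example and do not come from the Video Active thesaurus or from ThesauriX.

```python
# Hypothetical concept store: concept URI -> {language code: preferred label}.
CONCEPTS = {
    "http://example.org/va/keyword/election": {
        "en": "election",
        "nl": "verkiezing",
        "it": "elezione",
    },
}


def pref_label(concept_uri, lang, concepts=CONCEPTS, fallback="en"):
    """Return the label in `lang`, or the fallback language if it is missing."""
    labels = concepts[concept_uri]
    return labels.get(lang, labels[fallback])


def find_concepts(term, concepts=CONCEPTS):
    """Return URIs of concepts that carry `term` as a label in any language."""
    needle = term.casefold()
    return [
        uri
        for uri, labels in concepts.items()
        if any(label.casefold() == needle for label in labels.values())
    ]
```

Because items are annotated with the concept rather than with a word, a user searching in Italian and a user searching in Dutch reach the same concept and therefore the same set of videos.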
Conclusion
Video Active exploits recent advances in Semantic Web technologies to provide sophisticated web services using machine-understandable metadata. Semantic Web technologies such as RDF, SKOS, OWL and SPARQL have been used for the representation, querying, presentation and exchange of the Video Active metadata.
Video Active is a content enrichment project under the eContentPlus programme, and an invited member of EDLnet, the network initiated in 2006 to build consensus to create the European Digital Library. Europeana, a library bringing together hundreds of collections across Europe, has already indexed the data from the Video Active repository (www.videoactive.eu).
