Outcomes of the BELIEF workshop on Data Provenance
By Eleni Toli, Michael Pantazoglou and Dimitra Keramida, National and Kapodistrian University of Athens, Greece
A hot topic for scientific communities is the challenge of dealing with massive volumes of digital research objects. For such data to be of long-term worth, they must be traceable in terms of provenance (data origin) and authenticity (data validity). Achieving this would also increase the return on e-Infrastructure investment and improve competitiveness. However, managing provenance and authenticity data poses significant challenges of its own in terms of formal modelling, storage, and maintenance.
The Athens Core
The 5th BELIEF brainstorming workshop, held in Athens, April 2009, aimed to tackle these issues, furthering discussions started at the 6th eConcertation Meeting in Lyon, November 2008. The Athens workshop led to several tangible results and recommendations, including the need to formally define a minimal set of provenance information and to develop a roadmap towards data provenance. Together, these actions form what is called the “Athens Core”.
The roadmap, which is to be defined at a high level, would cover the definition, implementation, testing, and real-world application of Communicable Information Packages (CIPs).
The “Athens Core”, if further elaborated and accepted, could act as a cornerstone for future research and standardisation activities in data provenance and authenticity.
Other important technical outcomes included the need for:
- Abstract definition of data provenance information as a “relationship”, with a graph model
- Layered viewing of provenance information to reflect its granularity and differences in stakeholder perspectives
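As an illustration only, the two technical outcomes above can be sketched in a few lines of Python. The sketch assumes that each piece of provenance information is stored as a labelled relationship between two objects, and that each relationship is tagged with the layer it belongs to. The class names, predicates, layer names, and file names are all hypothetical and are not taken from the workshop report.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """One provenance statement: subject --predicate--> obj (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    layer: str  # assumed tag used to build layered views


@dataclass
class ProvenanceGraph:
    edges: list = field(default_factory=list)

    def add(self, subject, predicate, obj, layer):
        self.edges.append(Relationship(subject, predicate, obj, layer))

    def view(self, layer):
        """Return only the relationships that belong to one layer."""
        return [e for e in self.edges if e.layer == layer]

    def origins(self, item, predicate="derived_from"):
        """Follow `predicate` edges back from `item` to objects with no ancestors."""
        seen, stack, roots = set(), [item], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            parents = [e.obj for e in self.edges
                       if e.subject == node and e.predicate == predicate]
            if parents:
                stack.extend(parents)
            elif node != item:
                roots.add(node)
        return roots


# Hypothetical example data
graph = ProvenanceGraph()
graph.add("figure.png", "derived_from", "cleaned.csv", layer="dataset")
graph.add("cleaned.csv", "derived_from", "raw_sensor.dat", layer="dataset")
graph.add("cleaned.csv", "curated_by", "archive_team", layer="organisation")

print(graph.origins("figure.png"))
print([e.obj for e in graph.view("organisation")])
```

In this sketch a data manager could query the "dataset" layer to trace an object back to its source, while a stakeholder interested in responsibility could look only at the "organisation" layer. A real provenance model would need richer semantics than a single tag per relationship.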
Recommended actions
The European Commission will play an important role in directing policy in this area, and the workshop recommended that it take a number of actions:
- Encourage and fund seminal research to reach a critical mass of provenance offerings, thus encouraging investors to adopt provenance technology.
- Use digital resources produced by the EU (e.g. Official Journal [OJEU]) as vehicles/pilots for the systematic provision of provenance information.
- Encourage key stakeholders (e.g. EMBL, ECMWF, ESA, CERN) and providers of digital information to include provenance information in their digital assets.
- Set the provision of provenance information as a primary topic in the research agendas of future EU- or nationally funded grid/e-Science/data preservation programmes.
- Fund pilot projects/demonstrations that address provenance within the lifecycle management of domain-specific information.
The workshop also recommended that the initial roadmap be further elaborated and refined into a revised roadmap that can contribute to and influence future research agendas in data provenance and e-Infrastructures.
A report on the Athens Brainstorming and the full presentations are available in the BELIEF Digital Library.
