Report and Recommendations to HEFCE 19th December 2008


The UK Research Data Service Feasibility Study

TABLE OF CONTENTS

EXECUTIVE SUMMARY
1. INTRODUCTION
2. STUDY METHODOLOGY APPROACH
3. FEASIBILITY STUDY
3.1 High-level analysis of the potential shared service
3.2 Project partners
3.3 Analysis of costs of current provision
3.4 Appropriate structures for a National Research Data Service
3.5 Assessment of costs, benefits and payback
3.6 Analysis of potential to scale up the service
3.7 Non-financial benefits
3.8 Potential constraints to the development of UKRDS
3.9 Proposed corporate structure of UKRDS
3.10 VAT implications and proposals
3.11 Learning from the project and plans for further dissemination
3.12 Analysis of potential to extend beyond the HE sector
3.13 Success criteria
3.14 Risk analysis
3.15 Engagement with the Corporate Social Responsibility agenda
4. BUSINESS CASE AND PLAN
4.1 Project and service objectives
4.2 Benefits of UKRDS
4.3 Development plans
4.4 Estimated costs
4.5 Costs / Benefits analysis
4.6 Taxation
4.7 Funding propositions
4.8 Assessment of Sustainability
4.9 Success criteria, constraints and dependencies
4.10 Key risks and mitigating actions
5. OUTLINE GOVERNANCE AND MANAGEMENT PROPOSALS
6. CONCLUSIONS, RECOMMENDATIONS AND NEXT STEPS
7. REFERENCES
8. GLOSSARY

APPENDICES

Appendices to the Feasibility Study
A1: Stakeholders list and examples
A2: Summary of research data policies of research funders
A3: Roles and responsibilities for research data

Appendices to the Business Case and Plan
B1: UKRDS ‘Line of Sight’ route-map and High-level Plan
B2: Work packages descriptions
B3: Detailed plans and resource costs
B4: Detailed benefits analysis

EXECUTIVE SUMMARY

The Challenge

The volume and complexity of digital research data are increasing significantly. Studies over the last ten years [1] have shown that:

• Research data remains a substantially untapped resource beyond the originators in many disciplines
• Management of research data is not consistent or coherent and only a minority of researchers have access to national or international facilities
• Provision of skills for data curation support and training for researchers is under-developed
• Research data is often unstructured and inaccessible to others
• There is no consistency of policy or practice across funders and disciplines
• Pressure is growing on HEI library and IT organisations to assist researchers with these issues and it is unlikely that the necessary data curation capacity can be provided locally.

Moreover, the UK Government, in producing the “Science and Innovation Investment Framework 2004-2014” [2], argued that the UK research base must have ready and efficient access to digital information of all kinds. Similar views have been expressed internationally, notably in the USA and Australia, where decisions have already been made to develop regional and/or national services for research data management.

The UK is fortunate to have a number of important building blocks in place and a robust technological infrastructure which will facilitate the setting up of a coherent national approach to the issues. These include significant investments by the Funding Councils, in JISC activities in particular: for example JISC Services and Collections, JANET, federated access management, the Digital Curation Centre and institutional repositories. Such services are complemented on the other side of the dual support system by facilities such as the data centres run by ESRC, NERC and STFC. Meanwhile, the jointly funded Research Information Network is providing an increasingly important evidence base of researchers’ needs.

The Aims of the UKRDS Feasibility Study

The study was set up to test the feasibility of a national shared service for managing research data. This would build on existing investment and good practice, fill gaps and develop capacity for the long term. A successful UKRDS should:

• Advocate practical data management methods for researchers and funders
• Co-ordinate dataset management in accordance with protocols established between data providers, data users and data storage and curation facilities
• Enable reliable data discovery
• Provide access to tools and expertise
• Be a focus for the development of policies and standards
• Provide access to effective training services and materials
• Engage existing facilities and expertise
• Be more cost effective as a shared service than institutions acting independently

[1] E.g. Lievesley and Jones 1998, Lord and MacDonald 2003, Tessella 2006, OSI 2006, Lyons 2007

[2] HMSO 2004

The Methodology and Key Milestones for the Study

The study involved three key areas of work:

1. The assessment of researchers’ requirements through the use of questionnaires, interviews and workshops in the four case study Universities [3]
2. Engagement with a wide range of stakeholder groups including major funding bodies, archives and libraries, existing facility and service providers and others, including those involved in this work internationally.
3. Wide ranging desk research aimed particularly at assessing the UK provision in an international context.

The study was set up as a project using the principles of the PRINCE2 methodology [4] with a Steering Committee comprising senior representatives of stakeholder bodies and a Project Management Board of senior IT and library staff, recognising the roles of the Russell Group IT Directors (RUGIT) and Research Libraries UK (RLUK) as project sponsors. The Project Management Board appointed Serco Consulting to carry out the feasibility study. The key milestones in the project plan were:

No.  Description                                                                       Milestone delivery date
1    Project initiation and mobilisation                                               End March 2008
2    Initial workshop options long-list                                                End April 2008
     Feasibility study analysis and interim report                                     01/8/2008
3    Development of business plan for proposed viable options                          15/8/2008
4    Shared service vehicle design, governance definition and management structuring   10/10/2008
5    Business plan update, final reporting and presentations                           28/11/2008

Key Findings

The case study work involved consultation with groups representing approximately 700 researchers and showed a number of issues:

• An increasing number of disciplines are producing electronic research data;
• There is an acknowledged difficulty for researchers in retaining or managing research data beyond the life of a project once the funding associated with the project ceases;
• Most research data is stored at faculty or department level unless there is a national facility;
• 21% of researchers use a national or international facility;
• Most researchers share data - only c12% do not make their data available in any way. Informal peer exchange networks within research teams and with collaborators are predominant. Only c18% share via a data centre;
• Although a relatively small percentage (c18%) deposit data with a data centre, a much higher percentage (c43%) expressed the need to access other researchers’ data;
• Those who did not have access to an established facility were particularly keen on a UKRDS.

[3] Bristol, Leeds, Leicester and Oxford all agreed to participate as case studies and are considered to be representative of the range of research intensive Universities

[4] See www.prince2.com for background on the project methodology PRojects IN Controlled Environments (PRINCE2), recommended by the Office of Government Commerce

The engagement with stakeholders and desk research indicated that there was substantial infrastructure and expertise in the UK, but that this was set up as “islands”: each facility had been established to address a particular problem, and coherence and communication between islands was limited. Although the conclusion is that the UK has a sound basis on which to build, there is work underway in Europe, the USA, Canada and Australia, and the UK needs to maintain its competitive position.

Options

Three options were considered for a way forward:

• No Change - This would leave the current situation in place and UKRDS would not exist in any form. Some disciplines would be well provided for, others would not;
• Highly Centralised - The study identified this option as the most invasive and expensive of the options, creating a new monolithic institution with responsibilities in every area of data management;
• Co-operative Service - Under this option, UKRDS would act as an enabling service, working with the many UK stakeholders. Such a service would be well placed to act as a catalyst for new services and partnerships, as a centre of excellence, as a standards-guiding body and as a source of expertise and information about data management and repositories, building on current best practice and facilities.

The Steering Committee and the project team concluded that the third of these options was feasible and most likely to succeed. The development of the recommendations and costings in this report has concentrated on that approach.

The Features and Benefits of the Chosen Approach

Research costs are growing and the management of research data is a significant cost. A shared service approach to data management holds the promise of minimising the long-term impact and adding value through better exploitation of datasets. Central to the co-operative service model is the development of data management plans for the data life cycle as described by the Digital Curation Centre [5]. This approach allows for a UKRDS to bring increasing coherence to the data management task through exploiting the existing facilities within the UK and enabling the filling of gaps in provision. Benefits include:

• Protection of investment in research and extraction of greater value;
• Preservation of opportunities for future research;
• Promotion of the work of the institution and researcher;
• Informing the strategic development of the research infrastructure;
• Reduction of duplication, recreation and errors, and unplanned data loss;
• Volume growth/capacity planning is more cost-effective;
• More opportunity for re-use, cross reference and dataset integration;
• Better targeted retention and disposal;
• Shared skills giving better coverage and thus better productivity in both service providers and researchers;
• Proper focus for practical best practice.

Additional direct benefits to the institution, to the researcher and to funders could include:

• Providing a focus for promoting the work of the institution and the researcher;
• Guidance on which repository to get research data from and a gateway to approved service providers;
• Working with institutions, researchers and funders to promote and encourage the use of Data Management Plans to facilitate life cycle management of datasets;
• Informing strategic development of the research infrastructure both at local and national levels, and working with stakeholders to inform policy and resourcing of post-project long-term data management;
• Commissioning new services to fill gaps in data management.

[5] http://www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf

Cost Benefit Assessment

The feasibility study has demonstrated the potential for a shared services model to achieve significant long-term savings for the UK HE sector and to add value to the national research base. The consultants undertook significant cost analysis in order to develop a model which would facilitate a phased implementation and would enhance the knowledge base required to scale to a national service.

In costing the service and assessing its benefits we have assumed therefore that the service will start up with a relatively small group of key stakeholders which we term the Pathfinder phase. The Pathfinder represents an integrated approach rather than a pilot, and implies that a complete service can be implemented with a limited set of stakeholders so that a level of certainty in the practical implications of the service delivery infrastructure can be tested before the service is scaled up to serve the whole sector. Once established the service will scale up driven by demand while constrained by available resources at the time.

The project team is satisfied that there is a cost benefit that will build up from the foundations laid in the Pathfinder stage and the costs and benefits analysis shows this in detail in the full report. Based on modelling across the sector in the UK, the projected savings which could be delivered by a scaled-up UKRDS service have been estimated to be the financial equivalent of 63.5 FTEs. This represents approximately 20% of what it would cost, on average, for each institution to develop its own central data management, training and advisory capabilities, and is a deliberately conservative estimate.

The following table summarises the detailed financial assessment in section 4 of the report and in the appendices. It shows that on the most conservative assumptions the sector should realise the equivalent of £6m after 5 years in increased efficiency in research from better management of data sets.

Activities                                      Pathfinder        Scaling Up Phase   Totals
                                                Years 1 and 2     Years 2 to 5
Operational development and service delivery    £2.29m            £1.94m             £4.23m
Capabilities - development and delivery         £3.02m            £11.33m            £14.35m
Totals                                          £5.31m            £13.27m            £18.58m
Benefits/Return on investment                   0                 £24.42m            £24.42m
Overall Savings Equivalent                      -£5.31m           £11.15m            £5.84m
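The arithmetic behind the summary table can be cross-checked with a short calculation. The sketch below is illustrative only: the figures are transcribed from the table, while the names `COSTS`, `BENEFITS` and `summarise` are hypothetical and do not come from the study. It assumes that the "Overall Savings Equivalent" row is simply benefits minus costs for each phase.

```python
from decimal import Decimal

# Figures in £m, transcribed from the summary table.
# Each tuple is (Pathfinder phase, Scaling Up phase).
COSTS = {
    "Operational development and service delivery": (Decimal("2.29"), Decimal("1.94")),
    "Capabilities - development and delivery": (Decimal("3.02"), Decimal("11.33")),
}
BENEFITS = (Decimal("0"), Decimal("24.42"))


def summarise(costs, benefits):
    """Return phase totals and net savings (hypothetical helper, not part of the study)."""
    pathfinder_cost = sum(row[0] for row in costs.values())
    scaling_cost = sum(row[1] for row in costs.values())
    total_cost = pathfinder_cost + scaling_cost
    total_benefit = sum(benefits)
    return {
        "pathfinder_cost": pathfinder_cost,
        "scaling_cost": scaling_cost,
        "total_cost": total_cost,
        # Assumption: net savings = benefits minus costs for the same period.
        "pathfinder_net": benefits[0] - pathfinder_cost,
        "scaling_net": benefits[1] - scaling_cost,
        "overall_net": total_benefit - total_cost,
    }


if __name__ == "__main__":
    for name, value in summarise(COSTS, BENEFITS).items():
        print(f"{name}: {value} (£m)")
```

Under that assumption the computed values agree with the table: the Pathfinder phase carries the whole £5.31m cost before any benefit is counted, and the overall net figure of £5.84m is the basis for the "equivalent of £6m after 5 years" quoted above.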

Governance

The most effective long-term structure is likely to be a Company Limited by Guarantee with a Board made up of stakeholder representatives. For the initial Pathfinder implementation a suitable host institution needs to be identified where the organisation can be established and supported until it reaches an appropriate level of maturity to stand alone. Further discussions about the most appropriate future governance model will take place during the Pathfinder phase in the light of lessons learned.

Conclusions

1. The need for a UK-wide approach to research data management has been established through case study work, engagement with stakeholders and existing providers, and desk research on initiatives in the UK and abroad.
2. There are significant gaps in provision to be filled.
3. There are also significant building blocks in place with which to develop a UK-wide shared service for maximum cost-effectiveness and efficiency.
4. Considerable efforts are being made in other countries to surface and exploit data on a national or regional scale and the UK should ensure that it does not fall behind.
5. A UKRDS is deemed to be feasible and would offer the following strategic benefits:

• Capitalising on existing investment by the Funding Councils by providing coherence to extract maximum value from infrastructures and services already in place;
• Providing a shared service across the dual support system and thereby supporting DIUS strategies;
• Achieving leverage of additional research value and output, and increasing the UK’s global competitiveness in research;
• Gap-filling at marginal cost;
• Avoiding the opportunity costs of leaving individual HEIs to manage their own research data as opposed to taking the shared service UKRDS approach.

Key Recommendations

1. That a UKRDS is feasible and should be considered for funding over a period of at least 5 years;
2. That in the first instance a 2-year Pathfinder phase should be funded at a cost of £5.31m.

1. INTRODUCTION

The UK Research Data Service (UKRDS) proposals to HEFCE

This document presents the findings of the feasibility study into the potential for developing and maintaining UKRDS - a national shared research data service for UK Higher Education Institutions (HEIs). The information is set out in line with HEFCE guidelines for the reporting of such studies [6] and has been carefully structured to articulate as clearly as possible the overall business case for UKRDS.

The document draws on an extensive range of source material, including an interim feasibility study report produced in July 2008 (Serco 2008a) [7] which identified the available options for proceeding with the development of UKRDS. The UKRDS Steering Committee was content with the recommended way forward. A summary of the options considered and the agreed way forward is highlighted in this document which contains full details of the analysis and other options considered. All previous supporting reports (Serco 2008a, 2008b, 2008c) are available to HEFCE, as is an extensive file-store of materials which includes a contemporaneous record of the research results, interview notes and analyses.

The remainder of the report is structured as follows:

• Section 2 - Study methodology and approach;
• Section 3 - Feasibility study report;
• Section 4 - Business case and plan;
• Section 5 - Outline governance and management proposals;
• Section 6 - Conclusions;
• Section 7 - References; and
• Section 8 - Glossary.

Supporting appendices are also included as appropriate. The study was conducted using many sources, but critical to the outcome was the work conducted with four case study sites – the Universities of Bristol, Oxford, Leicester and Leeds – whose input and support were invaluable. The study team thanks the case study institutions for their support during the work, especially as it frequently coincided with intense periods of activity at each institution. Their commitment and continuing enthusiasm serve only to underline further the value which is recognised and potentially realisable from UKRDS.

[6] HEFCE Shared Services Guidance for Feasibility Study and Business Plan

[7] UKRDS Interim Report to UKRDS Steering Committee www.ukrds.ac.uk

2. STUDY METHODOLOGY APPROACH

Following an ITT, Serco Consulting in partnership with Charles Beagrie Limited and Grant Thornton were appointed as consultants for the study. During the course of the study, a very substantial amount of information was gathered, all of which has been indexed and is held in a dedicated project file-store. It is therefore available for reference by HEFCE as required. Two major documents were produced – UKRDS Interim Report (July 2008) and UKRDS Feasibility Study Report (October 2008) which form the basis for the final submission to HEFCE. All supporting documents contain many cross-references to source material to ensure key statements, wherever possible, are linked to the evidential basis.

Workplan and milestones

The following main activities were undertaken in conducting and managing the study:

• A JISCmail stakeholder list was drawn up which covered over 40 stakeholder agencies including all HE Funding Councils and Research Councils, RCUK, BL and TNA, UUK and Russell Group, SCONUL and UCISA;
• A UKRDS Steering Committee was constituted comprising a smaller group of members from the list of stakeholders. The study was overseen by a RUGIT/RLUK appointed UKRDS Project Manager reporting to a UKRDS Project Management Board;
• Four ‘case study’ volunteer universities were identified - Bristol, Leeds, Leicester and Oxford;
• Case study focus groups, interviews and questionnaires for researchers were carried out (c700 researchers represented across a wide range of disciplines), together with validation of research findings and the development and testing of real-life scenarios against emerging UKRDS process options;
• In addition to working with the case studies, over 20 interviews were conducted with key stakeholders including all the Research Councils, Wellcome Trust, JISC, HEFCE, UKDA and JANET(UK); international bodies such as ANDS were also interviewed;
• Significant desk research was conducted, improving our understanding of current UK provision, and creating a significant body of intelligence about initiatives in the EU and elsewhere (notably the US, Canada, and Australia);
• Based on the detailed appraisal of the research data and feedback from interviews, a clearer picture emerged of the gaps in the services provided for research data management and of the options for dealing with those gaps;
• From the gap analysis, a number of key themes emerged, leading to the identification of a list of ‘capabilities’ which a UKRDS should ideally be able to deliver;
• The detailed appraisal, gap analysis and options for moving forward were included in an interim report [8] to the Steering Committee, who then advised on the recommended option for proceeding in terms of producing detailed costs and the associated business case;
• The UKRDS model was defined, costs for delivering the service were assessed and benefits – both financial and non-financial - were affirmed, leading to the development of the business case;
• The business case and plan to create UKRDS was designed and ratified by the Project Management Board;
• The optimal governance model of UKRDS was defined and agreed for insertion into the final HEFCE submission.

[8] Serco 2008a

Overall, the following milestone checkpoints were followed:

No.  Description                                                                       Milestone delivery date
1    Project initiation and mobilisation                                               End March 2008
2    Initial workshop options long-list                                                End April 2008
     Feasibility study analysis and interim report                                     01/8/2008
3    Development of business plan for proposed viable options                          15/8/2008
4    Shared service vehicle design, governance definition and management structuring   10/10/2008
5    Business plan update, final reporting and presentations                           28/11/2008

3. FEASIBILITY STUDY

3.1 HIGH-LEVEL ANALYSIS OF THE POTENTIAL SHARED SERVICE

Background – The challenges to be addressed

Data has always been fundamental to research, but in recent years it has become central to more disciplines and inter-disciplinary projects and has grown substantially in scale and complexity. There is increasing awareness of its strategic importance as a resource in addressing modern global challenges such as climate change, and of the possibilities being unlocked by rapid technological advances and their application in research.

In the US the National Science Board has stated that: “It is exceedingly rare that fundamentally new approaches to research and education arise. Information technology has ushered in such a fundamental change. Digital data collections are at the heart of this change. They enable analysis at unprecedented levels of accuracy and sophistication and provide novel insights through innovative information integration. Through their very size and complexity, such digital collections provide new phenomena for study.” (National Science Board 2005).

Similar views have been expressed internationally through the International Council for Science: “Because of the critical importance of data and information in the global scientific enterprise, the international research community must address a series of new challenges if it is to take full advantage of the data and information resources available for research today. Equally, if not more important than its own data and information needs, today’s research community must also assume responsibility for building a robust data and information infrastructure for the future.” (International Council for Science 2004).

In a UK context, the UK Government is a signatory to the OECD’s Declaration on Access to Research Data from Public Funding (OECD 2004) and has a strong policy commitment to supporting science and innovation. The Treasury, Department of Trade and Industry (DTI) and the Department for Education and Skills (DfES) published in 2004 the “Science and Innovation Investment Framework 2004-2014”, which set out the government’s ambitions for UK science and innovation over that period, in particular their contribution to economic growth and public services. A section of the Framework addressed the need for an e-infrastructure for research. It argued that over the next decade the growing UK research base must have ready and efficient access to digital information of all kinds such as experimental data sets, journals, theses, conference proceedings and patents. This is the life blood of research and innovation but presents a number of major risks due to unresolved challenges in their long-term management (HMSO 2004).

These challenges in the management, preservation and curation of research data have been recognised over many years and in different studies (e.g. Lievesley and Jones 1998, Lord and MacDonald 2003, Tessella 2006, OSI 2006, Lyons 2007). Although encouraging progress has been made towards fulfilling some of the recommendations of these studies, much remains to be done.

Over the next decade, we expect an increasing proportion of the UK’s research activity to acquire the characteristics of e-science; and, in particular, for the volumes of data it generates to grow very rapidly. In other words, e-research is likely to become mainstream.

It is important to understand that the data management challenge is by no means restricted to so-called “big science”, although large-scale facilities in areas such as particle physics generate huge data volumes. More modestly-funded projects in all disciplines will also bring data challenges of varying complexity. The issue of the data deluge is not going to be restricted to Russell Group universities, although they can reasonably expect to have the largest volumes of research data to deal with.
All UK universities will be affected by the problem and all should be involved in benefiting from a UK-wide research data service.

Demands from researchers for data management support at institutional level are becoming more frequent as data volumes grow, and as more research funders (notably the UK’s research councils) develop policies for the management of data outputs generated by grant holders. Researchers’ own expectation is often that the institutional library and/or IT service will make the necessary provision; although this has started to happen on a small scale, HE managers have serious concerns about the cost, scalability and sustainability of purely local solutions, and the duplication of effort that may result.

Although some of the UK’s Research Councils have data centres for their outputs, and there are some discipline-based repositories at national and international level, the large gaps in the UK’s current provision, and the lack of a coherent and consistent UK policy framework, mean that the challenge of managing this data and ensuring its long-term sustainability and potential re-use defaults to the individual HEI.

HEFCE has invested significant sums of money with the JISC to build the Integrated Information Environment [9], and consequently the infrastructure now exists in the UK to deliver the vision of the shared research data service. The RUGIT/RLUK proposal aims to realise the excellent R&D work that the JISC has been doing in the field, including the work on federated access and Shibboleth, and turn it into a workable service in support of research.

This is not, however, just a question of data storage capacity. In itself, that could indeed be delivered as a shared service and would produce savings for the sector, but it would add no value to the research process. Instead, what is proposed is much more comprehensive and has the potential to add immense value to researchers. The active management of the creation, selection, ingestion, storage, retrieval and preservation of research data - the “data lifecycle” - is now recognised as a complex process requiring an integrated approach. Increasingly, researchers will require access to previously generated data sets, and the facility to annotate existing data and to undertake new analyses and syntheses of it; a national managed service could facilitate this.

The full scope of research data includes the widest possible range of data volumes, from relatively small data sets up to vast data volumes generated by research in fields such as particle physics. It also includes great variety and heterogeneity of data. Examples could include: complex data used in climate modelling, aerodynamics, molecular modelling, bioinformatics; video and image archives used in archaeology, anthropology and drama; massively large data sets used in particle physics.
There are significant cultural issues relating to the willingness of researchers individually or in groups to consider the sustainability of their data, and to place trust in digital data repositories whether as depositors or users. These are likely to require a collective stakeholder approach. At the same time significant differences are already becoming apparent between disciplines in terms of their attitude towards and engagement with these issues. The development of the service will need to take careful account of these.

RUGIT and RLUK, supported by their Library and IT colleagues in universities across the UK, believe that this feasibility study makes a strong case for a national shared service in order to provide the capacity, skills, and R&D investment needed to sustain UK research data cost-effectively. A national shared service might take a number of forms, with components provided by HEIs, the Research Councils, and other agencies. An important example of an existing shared service in the HE sector on the scale envisaged is JANET, which provides continually developing infrastructure and network applications through a collaborative shared service model. The feasibility study has reviewed and appraised the possible options.

Such a shared service would represent a critical component of the national e-infrastructure for research identified in the Treasury’s Science and Innovation Investment Framework 2004-2014 [10]. In February 2007 the JISC published on behalf of the Office of Science and Innovation (OSI) a report, Developing the UK’s e-infrastructure for science and innovation [11], intended to set out in more detail what the e-infrastructure should comprise and to help define its future development. The report makes several references to data management; one of the six sub-groups commissioned by the OSI was specifically tasked to look at data preservation and curation.
(The OSI has now become the Science and Innovation Group of the Department for Innovation, Universities and Skills.) The OSI report stopped short of setting out a detailed roadmap for action. Nevertheless, RUGIT and RLUK see the publication of the report, and the current government interest in shared services, as a crucial opportunity to take this work forward to the next stage.

There is a clear risk that the UK’s position among the global leaders in e-research and digital library and information management will be lost unless effective national solutions are developed for the long-term management of research data. The UK’s competitors are already investing in this area: in particular, the US is prioritising the delivery of a national network of digital data stores within the development of its national e-infrastructure. The US National Science and Technology Council has established an Interagency Working Group on Digital Data to develop a national approach to data storage and curation. The National Science Foundation’s Office of Cyberinfrastructure, with an initial annual budget allocation of $127M, has just delivered NSF’s Cyberinfrastructure Vision for 21st Century Discovery [12].

In Australia, the government has accepted the recommendation of the proposal report Towards the Australian Data Commons: A proposal for an Australian National Data Service (ANDS Technical Working Group 2007) to establish an Australian National Data Service (ANDS) and provided funding of AU$24 million (£11.5 million) over 4 years. In Canada, at the beginning of this year, a working group comprising a number of Canadian organizations and agencies was established to provide recommendations and an action plan on a new national approach to the stewardship of research data in Canada.

Other European countries are also keenly aware of the challenges and opportunities associated with this aspect of e-infrastructure, and the Alliance of German Science Organisations has already published some ideas. The EU has commissioned a major study under FP6 aimed at enhancing the use of digital repositories in science (e-SciDR). The European Strategy Forum on Research Infrastructures (ESFRI), a high-level advisory group set up by the EC, published a European roadmap for research infrastructures in 2006 [13]. This states:

“Data management and curation is becoming more and more important. While data quality has always been a key issue in scientific research, new measurement methods have increased the amount of data in many areas by orders of magnitude. This makes data management much more difficult, and curation of the data by humans becomes impossible. Combining data from different sources and measurements is crucial in many areas of, e.g., environmental and medical research, posing difficult issues of data integration.”

9 Integrated Information Environment: http://www.jisc.ac.uk/whatwedo/themes/information_environment.aspx

10 Science and Innovation Investment Framework 2004-2014. HM Treasury, 2004. http://www.hmtreasury.gov.uk/spending_review/spend_sr04/associated_documents/spending_sr04_science.cfm

11 Developing the UK’s e-infrastructure for science and innovation: http://www.nesc.ac.uk/documents/OSI/report.pdf

Project objectives

The goal of the project is to establish whether there is a need for a national data management service and determine its feasibility. Should this be the case then, as a community-owned national organisation of shared digital research data management services, UKRDS is seen by the project sponsors as forming a crucial component of the UK’s e-infrastructure for research and innovation, and one which will add significantly to the UK’s global competitiveness.

The ultimate goal is the development of a national research data service which is:
a) supported by RLUK and RUGIT members as one they believe would be best suited to the needs of researchers;
b) supported by the JISC and HEFCE, for funding the realisation of the project, as being of benefit to the entire HE sector;
c) attractive as a partner in research support to research councils and other funders, service providers and other agencies.

Sector Analysis

The potential for a shared service

In order to assess the potential for a shared digital research data service, the feasibility study examined the positions of the main groups of stakeholders. Direct stakeholders fall into four key groups, though it is acknowledged that there is a very wide range of potential ‘indirect stakeholders’. The four direct groups are:

12 NSF’s Cyberinfrastructure Vision for 21st Century Discovery (NSF 07-28). Arlington VA: National Science Foundation, 2007.

13 European roadmap for research infrastructures: report 2006. Luxembourg: Office for Official Publications of the European Communities, 2006. ftp://ftp.cordis.europa.eu/pub/esfri/docs/esfri-roadmap-report-26092006_en.pdf


1. Research Funding Bodies
2. Research Data Creators
3. Research Data Consumers
4. Repository Providers

Some stakeholders fall into more than one grouping (e.g. researchers are both creators and consumers of data). Organisations working on the government’s digital preservation shared services programme (such as TNA), the World Health Organisation and search providers (such as Google) are examples of the significant community of ‘indirect stakeholders’ whose involvement could benefit the direction and success of UKRDS. Within each group there is a wide range of organisations and teams, large and small, but their motivations and obligations in the context of UKRDS show a broadly consistent and coherent pattern.

Research Funding Bodies - All those interviewed during the course of the feasibility study expressed similar concerns about the difficulty of obtaining the maximum value out of research data created by the research groups that they fund, and were keen to see improvement in this through cost-effective curation and preservation of data sets. Concern was expressed about the management of IPR and of personal data, but these were felt to be surmountable difficulties if a solid business case for data sharing could be developed. A number of major funders such as Research Councils and the Wellcome Trust are starting to encourage or mandate research groups to prepare management plans for the data sets they create and/or use, and to apply for funding to support this work as part of the overall grant application. The Wellcome Trust in particular has already made such grants and is expecting to see the first fruits of this initiative from late 2009.

Research Data Creators - While research data creators were in many cases willing to share data sets, they were keen to ensure that they had extracted as much research value as possible from their datasets before they were willing to share them further.
They were also anxious about the potential effort and cost of making data sets shareable and how the skills, tools and safeguards on IPR and personal data would be made available or managed.

Research Data Consumers - Data is the life blood of research and every research group is critically dependent on access to data. In most cases establishing the availability of suitable data is a study in itself, as there is no coherent method of searching for relevant data other than through references in publications and trawling the many institutions where repositories are maintained. In “big science” much investment has been made and the relative narrowness of the major topics means that (volume aside) data access and processing is relatively simple. In arts and humanities and “small science” the problem is more complex and intractable.

Repository Providers - A wide range of national and institutional repositories already exist. Considerable effort has also been expended on looking at the standards, practices and skills needed to support the curation and preservation of digital research data sets, notably through ADS, DCC and UKDA. However, there is as yet no coherent framework crossing discipline boundaries for the provision of these services to research groups, and more work has been done in stating problems than in suggesting solutions. It seems clear that a key goal for any UKRDS implementation must be to provide a coherent framework to extract further value from existing provision.

Gap Analysis - Identification of Needs

The feasibility study produced a detailed analysis of current service provision for researchers and HEIs (see Appendix A2). It also considered the research data policies of the research funders (Appendix A3) and conducted a gap analysis based on desk research of all available information, detailed interviews with key stakeholders and workshops with the case study sites. The following conclusions were then drawn:

• It is clear from this analysis that different disciplines have different requirements and approaches to research data. Some, such as NERC and ESRC, have a significant track record of support and good practice in this area. Many funders are now including guidance for research data outputs in terms and conditions of grants and have established policies on retention, preservation and access, for example BBSRC and MRC. Some actively support archive and data services and require deposit with specific archives, such as the UK Data Archive and the Archaeology Data Service. Some other funders place the responsibility for preserving data with the grant holder.

• Current provision of facilities to encourage and ensure that researchers have data stores where they can deposit their valuable data for safe keeping and for sharing as appropriate varies from discipline to discipline. Those looking for facilities to perform the function of a secure archive in case of major disaster, and also to provide the data distribution function to interested parties, often turn to specialist national or international services where they exist. The skills provided by such centres are highly valued.

• Local data management and preservation activity is very important, with most data being held locally. From the feedback from the case study sites, this appears to be department and faculty based, with central services providing some capacity, although acknowledging great difficulty in maintaining up-to-date information about the amount of research data across the institution and projected demand for additional storage capacity. This situation may be improved in future by the output expected from the JISC’s Data Audit Framework development project, referred to later under the section on projecting the future.

• Expectations about the rate of increase in research data generated point not only to higher data volumes but also to an increase in different types of data and in data generated by disciplines that have not until recently been producing volumes of digital output. This brings with it challenges around data management and preservation that include informing and assisting researchers in best practice to facilitate data re-use and sharing.

• The survey data shows higher volumes of research data associated with research teams, particularly in scientific, technical and medical (STM) disciplines, but also that very high and atypical volumes can be associated with particular individuals or areas of research elsewhere. It has also documented variations in requirements for retention and access within and between different disciplines.

• There is no concept of required provision and best practice existing for all research data. There are examples of excellence and good practice, but these are not available to all.

• Several themes have emerged during the study, not least that of the nature of research funding and how this impacts on the sustainability of data beyond the life of a project, raising the question of how research data repository services are funded to ensure long-term viability.

In further support of this analysis, the RIN 2007 report “Research Funders’ Policies for the management of information outputs” provides some insight into local provision. The report indicated that although some universities aspire to curating data, none interviewed did so in their institutional repositories, leaving this task to individual researchers and departments. Central advice and storage for data is available in Cambridge and King’s College London via DSpace@Cambridge and the Centre for e-Research respectively. A recent report on research data digital preservation costs funded by HEFCE provides costs for these facilities and notes the significantly higher level of staffing and equipment required for these central data services compared to institutional repositories for e-prints (Beagrie et al 2008). The University of Glasgow is also developing an institutional repository for data based on the DSpace system from MIT, to hold a broader range of data rather than just publications. A similar facility is under consideration at Oxford.

The case study sites also provided substantial information, from both the researcher and the support professional viewpoint, about local provision within their institutions. When researchers were asked about their data location and storage, the response was that most research data is held locally, with fewer than 20% of respondents using an international or national facility for data deposit. They also indicated that most of their data is on PCs and departmental servers. It is difficult to extrapolate more general conclusions about the UK’s researcher population from this sample; however, common aspects emerge from all four case study sites:

• Most research data is dealt with at departmental or faculty level;
• There is a growing awareness of the increase in research data being generated and of the need for more storage and managed environments, and as a consequence central provision is being reviewed, e.g. Bristol is developing a petascale storage facility for HPC, Leeds has investigated requirements extensively through the University’s Storage Evolution Project, and Oxford is in the process of undertaking a university-wide scoping study;
• An increasing number of disciplines are producing electronic research data;
• Researchers acknowledge difficulty in retaining research data beyond the life of a project once the funding associated with the project ceases.

Where the case study sites were able to provide cost data, it related mainly to central provision only. Although the capital costs were readily available, it was more difficult to estimate costs relating to staffing and support. The FEC costs were again difficult to ascertain.


Projecting the future

There is a general consensus that we are in a period of significant increase in the generation of digital research data. However, the rate of increase does vary across disciplines. The case study sites found it difficult to assess what the per annum rate of growth might be across their institutions. Bristol, which has done a lot of work on assessing demand for storage, predicted a 127% p.a. growth rate in demand for storage from researchers within their institution. This came close to an estimate volunteered by a representative from the National e-Science Centre (NeSC), who estimated that national requirements for storage appear to be doubling every 9 months (about 125% p.a.).

The JISC’s Data Audit Framework Development Project [14] aims to enable all universities and colleges to carry out an audit of departmental research data collections, awareness, policies and practice for research data curation and preservation. The success and take-up of the project’s outputs will facilitate the gathering of more accurate information, thereby enabling better prediction of requirements over time at both a local and national level.
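
The relationship between a doubling period and an annual growth rate, as used in the two storage estimates above, can be sketched as follows. This is only an illustration under the assumption of steady compound growth; the function names are hypothetical and do not come from the study, which does not say how either figure was derived. On that assumption, doubling every nine months works out at roughly 152% p.a., somewhat above the rounded figure quoted, while 127% p.a. corresponds to doubling roughly every ten months, so the two estimates remain of broadly similar magnitude.

```python
import math


def annual_growth_percent(doubling_months: float) -> float:
    """Annual percentage growth implied by a doubling period (assumes compound growth)."""
    yearly_factor = 2 ** (12 / doubling_months)
    return (yearly_factor - 1) * 100


def doubling_period_months(growth_percent_per_year: float) -> float:
    """Doubling period in months implied by an annual percentage growth rate."""
    yearly_factor = 1 + growth_percent_per_year / 100
    return 12 * math.log(2) / math.log(yearly_factor)


if __name__ == "__main__":
    # NeSC-style estimate: storage requirement doubles every 9 months.
    print(f"Doubling every 9 months: {annual_growth_percent(9):.0f}% p.a.")
    # Bristol-style estimate: 127% p.a. growth in storage demand.
    print(f"127% p.a. growth: doubles every {doubling_period_months(127):.1f} months")
```

A conversion of this kind may be useful when comparing institutional estimates expressed in different forms, although any projection over several years is very sensitive to the growth assumption chosen.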
The work conducted by the OSI e-infrastructure Working Group, which was formed to explore the current provision of the UK’s e-infrastructure and to help define its future development, provides interesting insight into the core importance placed on research data as a major element of “The Body of Knowledge”:

“A national e-infrastructure needs: the means of producing, managing and preserving vast amounts of digital data; sophisticated means of accessing an ever-increasing range of electronic resources of all kinds; technologies and structures to support dynamic and virtual communities of researchers; unprecedented network, grid and computational capacity; and the necessary national services and systems” [15]

The working group was formed in response to the ‘Science and Innovation Investment Framework 2004-2014’, which was published by the Treasury, the DTI and the DfES in 2004 (HMSO 2004). The report (OSI 2007) details the findings of six working groups. The key findings and recommendations of the working group for Preservation and Curation [pp. 18-19] set a direction for the future that takes into account the growth of research data and its potential. The following sub-set of key findings is particularly relevant to any UKRDS development:

14 http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2007/dataauditframework.aspx

15 Quoted from the Executive Summary of the OSI 2007 report “DEVELOPING THE UK’S E-INFRASTRUCTURE FOR SCIENCE AND INNOVATION”


2. Continued dramatic growth in digital research data over the next decade means that the e-infrastructure must evolve so that UK researchers can benefit from the new opportunities created.
3. Major challenges in the preservation and curation of digital information require a strategic approach to policy and development of the national e-infrastructure.
4. Where disciplinary data centres and services exist across the Research Councils, they represent approx 1.4–1.5% of total research expenditure in their disciplines.
5. Major investments are being made in the US information and cyber infrastructure, and the gap between the UK and the USA is growing.
7. This is not only a problem for the UK government to solve and there will need to be close partnership with industry.

The recommendations from the working group reflect the anticipated impact of the growth of data over the next 5-10 years. Recommendation 2 below is relevant to any consideration of a UKRDS development:

“2. Persistent National Information Infrastructure Development Programme. We envisage a national information infrastructure will incorporate and expand existing facilities and services and develop new ones to provide:
1. One or more very large-scale national research data repositories.
2. National discipline-based research data centres and services.
3. Major national digital libraries and university digital libraries and an electronic network of UK legal deposit libraries.
4. Federated institutional repositories based in universities and colleges.
5. A UK web archive.
6. Major national digital archives and shared services for the records of government and the public sector.
7. National or regional shared repositories to meet the needs of small and medium-size public organisations.
8. Data and publication repositories maintained by publishers and major private-sector industries.
9. Links to international facilities, data sources and data centres such as CERN and the network of Global Data Centres.”

The analysis provided in the “Dealing with Data: Roles, Rights, Responsibilities and Relationships” report (Lyon 2007) resulted in recommendations across eight main areas: Co-ordination and Strategy, Policy and Planning, Practice, Technical Integration and Interoperability, Legal and Ethical Issues, Sustainability, Advocacy, and Training and Skills. These included a recommendation that “Research funding organisations should jointly develop a co-ordinated Data Curation and Preservation Strategy to address critical data issues over the longer term.” This chimes well with the working group recommendation above.

Emerging Themes

There are excellent facilities currently in place which provide outstanding services on behalf of their communities for access to and storage of research data (see Appendix A). However, a holistic look at the situation within HEIs, and at the opportunities and guidance available to researchers for preserving and providing access to research data across all disciplines, does not show a picture that is coherent or consistent, or one in which researchers' needs are fully met. The researcher surveys that were conducted at the Case Study Sites had over 200 responses from individuals or research groups covering over 700 researchers in total. The findings were followed up and discussed at four focus group sessions attended by researchers and service support professionals. In addition, the feedback from the interviews has also informed the emerging picture of need within the research community. From this input the following main themes emerged:

Theme 1: Advocacy - A requirement for more advice on practical issues related to managing data, such as producing a data management/sharing plan, best formats on data creation, and options for storing and sharing data securely, publishing and preserving them. This sentiment was also expressed in some of the interviews conducted with the Research Councils as a need for researcher awareness and institutional support for researchers. The Director of Communication and Information at ESRC commented that “It remains difficult for national subject centres to reach researchers in institutions. Institutions and their researchers are leading on data creation and use with the subject centres keen to engage them more in the preparation of research data for long-term preservation. It would be helpful to be able to develop greater coherence and mechanisms for sharing expertise between the two.”

Theme 2: Coordination and information - Providing information about and links to established repositories, both national and international, helping to inform researcher choice and encouraging them to deposit their data. Also providing information about and links to established national advisory services for data management (e.g. national and regional e-science centres, the National Grid Service), digital curation and preservation (the Digital Curation Centre), and subject specific advice (e.g. MRC Data Support Service).

Theme 3: Coherence - The need to focus on actions to improve linkages and collaboration at institutional and national level across dual support and all disciplines, with the aim of progressing a coherent approach to the application of standards and good practice, facilitating inter-disciplinary re-use of research data and general interoperability.

Theme 4: Data Depository - The need to address concerns about local provision and capacity, together with the problem of retaining data beyond the life of a project due to lack of funding for capacity provision to do this. In some disciplines instrument/observational data is valuable for reuse but the increasing volumes of such data make it difficult to retain.
Theme 5: Skills and Training - A deficit of relevant skills, training and career structures for data management and data scientists has been raised in some UKRDS focus groups and in recent reports (e.g. Lyon 2007, Brown and Swan 2008). There are several different initiatives attempting to address these needs, which could be enhanced by greater coordination and sharing of outcomes.

Theme 6: Seeding the Data Commons - “Seeding the Data Commons is seen as a priority activity in the Australian National Data Service” (Rhys Francis, pers. comm.). Within the UKRDS case studies and surveys, researchers have drawn attention to the effort and investment needed to upgrade data collections developed for sharing between the research team and collaborators to the higher standards and documentation needed for wider use. Not all data collections have the potential to support wider use which would justify this additional effort. National subject centres to undertake this selection and added value role do not exist for all disciplines.

Theme 7: Data Security and Controlled Access - Research data originating from third parties or containing information on individuals will often only be available to researchers under strict legal or contractual conditions. Providing access to such resources requires secure environments for data storage, controlled access and secure data handling procedures. Wider availability of these environments, combined with training of researchers and co-ordination of policies on consent and data re-use, will improve availability of such data for future research.


The following diagram provides the vision in terms of how these themes translate into supporting the researcher of the future.

[Diagram: elements of the vision, as labelled in the figure]

• World-class policies and procedures: by propagating and influencing best practice in research data management planning, policies and procedures
• Expert support for Data Management Plans: by establishing coherence, rigour, trust and consistency throughout the stakeholder communities through a structured approach to the long-term management of valuable research data
• Extensive discovery services: by providing trusted data sources as inputs to research projects through the registration and analysis of large numbers of DMPs
• A clear framework for capacity development: by providing comprehensive capacity planning both institutionally and nationally to inform development decisions for research data support infrastructures
• The ‘Standards Champion’: by working with all stakeholders to facilitate the development and adoption of appropriate standards for research data management
• Secure access management: by establishment and operation of recognised access control protocols and their supporting systems in line with respective stakeholder policy
• Consistent international links: by establishing mutually beneficial access rights to research data and helping to develop global standards in research data management
• State of the Art Tools and Methodologies: by sustained investment in data management tools, discovery capabilities, methodologies and handbooks to support researchers and their funders

Assessment of the market and existing provision

The Dual Support System for UK Research

Under the dual support system, the seven Research Councils provide grants for specific projects and programmes or for research via their own institutes, while the UK’s Funding Councils provide block grant funding to support the research infrastructure in universities and enable institutions to undertake ground-breaking research of their choosing. Such funding also provides the capacity to undertake research commissioned by the private sector, Government Departments, charities, the European Union and other international bodies. Research Council funds are awarded on the basis of applications made by individual researchers, are assessed on the basis of research potential, and are awarded irrespective of geographical location in the UK. There are four Funding Councils in the UK, supported by the Department for Innovation, Universities and Skills and the devolved Departments of Education. Funding Council support for research (Quality Related or QR funding) is distributed on the basis of the excellence of individual departments in higher education institutions, using the results of the Research Assessment Exercise (RAE) and, at a future date, the proposed Research Excellence Framework (REF).

Roles and Responsibilities for Research Data

Research data is highly variable in terms of its formats, processing, interpretation and use, and there are highly diverse organisational provisions and inter-relationships between researchers, institutions and funders in the UK and beyond, extending to international research and organisations. Roles and responsibilities for UK research data have been reviewed and summarized by Lyon (Lyon 2007). Appendix A provides details taken from that report with the addition of the aggregator role proposed by the Australian National Data Service (ANDS 2007).
Although very useful in the context of addressing the nature of human and organisational roles, the information does not address the specific landscape of provision and dual support that UKRDS needs to take into account and work with. As noted above under Dual Support for Research, there are a number of inter-dependent roles and responsibilities for researchers, institutions, and a range of research funders, which are briefly summarized below.

Within this landscape HEIs and their employees are responsible for:

• Creating and managing data over the life of research projects;
• Maintaining “the record of research” and retaining research data for a specified duration after completion of the project to meet best practice, legal and contractual obligations;
• In many cases long-term or indefinite retention and re-use, particularly for those subject areas which have no national or international data centres and services.

Some Research Councils fund or co-fund:

• A number of national data services including the NERC data centres, the Economic and Social Data Service, and the Atlas Data Store;
• Advisory Services such as the national and regional e-science centres;
• A proportion (80%) of the direct and indirect costs of research projects in universities or Research Council institutes;
• Brokering of access to external datasets in other sectors, e.g. government and the NHS, via strategic collaboration and/or data centres and data services.

The Funding Councils are responsible for and fund or co-fund:

• The JISC and its national services and programmes;
• Advisory services such as the Digital Curation Centre (via the JISC);
• Research Infrastructure including data storage and HPC facilities in HEIs via capital programmes or QR block grants;
• Independent research in HEIs via the QR grants.

Other parties including charities and industry can play major roles in respect of research data and its infrastructure, and in some disciplines may be the major funder of research. There are several ongoing or proposed initiatives to improve co-ordination and inter-operability between these efforts and the organisations involved. In some cases they may be focused on specific disciplines or research areas (e.g. the National Data Strategy and UK Data Forum for the Social Sciences, the UK Clinical Research Collaboration, the National Cancer Informatics Initiative, the Environment Research Funders Forum), practical information exchange (e.g. the Research Data Management Forums jointly supported by the Digital Curation Centre and the Research Information Network), or broader national co-ordination and strategies (e.g. proposed follow-on work for the OSI e-infrastructure steering committee or the RCUK Research Outputs Working Group).

The development of a model for a UKRDS requires an understanding not only of the technical landscape and current infrastructure but also of the cultural and economic framework upon which a UKRDS might build, including the dual support system for research noted above.

Learning from existing provision

The feasibility study has recognised the need to take cognisance of funders’ policies and guidance to researchers to ensure that any proposed UKRDS model will be in tune with the strategic directions and policies of key stakeholder groups. As noted in “Research Funders’ Policies for the management of information outputs”, a report commissioned by the Research Information Network and published in January 2007 (RIN 2007), “Councils differ significantly in the stances they adopt towards unpublished outputs, especially data.” The report goes on to note that: “Differences in the nature and origins of data bring with them differences in value, and implications for policy and practice. And there are major differences in the degree to which funders see it as their responsibility to make data accessible and act as long-term guardians.”

The Research Councils are a major source of funding for UK research. The analysis from the case study sites [16] indicates that across these four institutions between 30% and 40% of research income comes from the Research Councils. Collectively and separately, they are key stakeholders in deliberations about the fate of research outputs in this country. Charities and other funders make up the remaining funding. Significant funding from such sources comes in the areas of biological and medical sciences, and charities such as the Wellcome Trust have been pro-active in their policy and guidance regarding research outputs and data. The following pie chart gives an indication of this distribution for the Case Study Sites.

[16] Serco 2008a


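[Pie chart: summary of research income by funding source for the Case Study Sites. Legend entries: British Academy; AHRC; BBSRC; EPSRC; ESRC; MRC; NERC; PPARC; STFC; Other RC; UK Government; UK Charity; UK, EU & Other Overseas Industry; EU Government (including EC).]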

Summary of funding distribution for Case Study Sites

In June 2005 RCUK published a draft position statement on 'access to research outputs'. The RCUK Executive Group received many comments on this statement and, in the light of these, published an updated position paper in June 2006 (RCUK 2006). To date the RCUK position statement on access to research outputs has largely focused upon published research outputs. However, the original 2005 position paper (RCUK 2005) did note the importance of data outputs as well:

“8. RCUK also notes that one of the benefits of digitisation and publication in digital formats is the ability to provide access to primary research data alongside the traditional article; and it shares the Select Committee’s and the Government’s view that the data underpinning the published results of publicly-funded research should be made available as widely and rapidly as possible. For a number of years, Research Councils including the AHRB, ESRC and NERC have funded data centres and services which are responsible for preserving, managing and providing access to research data; and these Councils have well-established policies and procedures for preservation and access……. Further work is needed to develop a common framework of policies and procedures for determining what sets of data are collected, whether in university or in Council-run repositories or elsewhere; and how and on what terms they are made accessible to the research community and others [our emphasis].”

Since 2005 individual research councils have been developing or building upon existing data policies and there have been a number of major changes.
Notably the AHRC, which had a track record in this area, has changed its stance and no longer funds the Arts & Humanities Data Service in the way that it did in 2005. Its position now is that “Council believes that long term storage of digital materials and sustainability is best dealt with by an active engagement with HEIs rather than through a centralised service” (AHRC 2007).

In January 2008 the RIN published “Stewardship of Digital Research Data - Principles & Guidelines”. It sets out five principles:

• I The roles and responsibilities of researchers, research institutions and funders should be defined as clearly as possible and they should collaboratively establish a framework of codes of practice to ensure that creators and users of research data are aware of and fulfil their responsibilities in accordance with these principles.
• II Digital research data should be created and collected in accordance with applicable international standards, and the processes for selecting those to be made available to others should include proper quality assurance.
• III Digital research data should be easy to find, and access should be provided in an environment which maximises ease of use; provides credit for and protects the rights of those who have gathered or created data; and protects the rights of those who have legitimate interests in how data are made accessible and used.
• IV The models and mechanisms for managing and providing access to digital research data must be both efficient and cost-effective in the use of public and other funds.
• V Digital research data of long term value arising from current and future research should be preserved and remain accessible for current and future generations.

The document expands on each principle with guidance on how policy and practice may need to change in order to comply with it, and elaborates on how the five principles together provide a broad framework for developing good practice. It calls for a coordinated approach to provide a cohesive framework of policies and procedures for key agents and stakeholders in order to maximise the potential benefits of digital research data. Clearly much effort is being put into developing policy and guidance, not just through the overarching activity of the RIN and RCUK but also by research funders and key stakeholders in the international community such as the OECD (OECD 2004) and the EU (European Commission 2007), amongst others. For some funders this serves to support the infrastructure that they have developed and/or funded over many years for supporting research data.

Funders’ Policies and Guidelines

“Research Funders’ Policies for the management of information outputs”, a report commissioned by the RIN and published in January 2007 (RIN 2007) [17], provides an excellent summary analysis of the situation. Although published just two years ago, it requires updating in part, as some of the Research Councils have already modified or changed their policy. Appendix A2 sets this out, based on the RIN report (p.57-59), updated and/or elaborated where such update information has been readily available.

(Note that the Sherpa JULIET project [18] includes, where they exist, summaries of and links to research funders’ archiving mandates and guidelines, and provides a quick reference point for researchers seeking information.)

Where funders lay the responsibility for data with the grant holder it is difficult to assess how well policy and guidance are being met. Even where funders stipulate potential sanctions if data is not offered for deposit there is still a feeling that more could be done to improve researcher awareness and institutional support for researchers. For example, the Case Study Sites’ responses to this study’s Researchers Questionnaire showed that 36% of respondents did not know if there were any grant or legal requirements to retain their data. Responses to questions about sharing data indicated that most researchers share data (only c.12% do not make their data available) but that informal peer exchange and networks within research teams and with collaborators predominate. Only c.19% of data producers share data via a data centre; in contrast, c.43% of data users make use of data centres (e.g. the European Bioinformatics Institute or the NERC data centres) to access other researchers’ data. Researchers noted the difficulty in retaining their data beyond the life of a project, largely because of lack of funding to do so.

National Facilities

For many years, Research Councils including the AHRB, ESRC and NERC have funded data centres and services which are responsible for preserving, managing and providing access to research data; and these Councils (with the exception of the AHRC, which has recently modified its guidance) have well-established policies and procedures for preservation and access. For example, the network of seven NERC data centres provides support and guidance in data management to those funded by NERC, is responsible for the long-term curation of data and provides access to NERC's data holdings. Two of these data centres are located at the Rutherford Appleton Laboratory (RAL), part of the Science and Technology Facilities Council (STFC), a Council that provides major data services to a range of disciplines through its Atlas Data Store (ADS). STFC also undertakes both long term research and short term development projects in order to ensure that its data-centric services apply the most advanced technology to meet the stringent requirements of the scientists and engineers who use them. The approach to secured funding for most of the national facilities is based upon grant horizons of 3-5 years. Although some of the services have been in place for decades, long term planning has to proceed on the assumption of continued funding rather than on the basis of guaranteed funding.

[17] http://www.rin.ac.uk/policy-information-outputs

[18] http://www.sherpa.ac.uk/juliet/


International repositories

International repositories tend to be discipline based and are often supported by multiple agencies. Many are well established and accept data from the global community. Minimum requirements for metadata and data formats are normally applied. Institutes like the EBI, detailed below, will enhance appropriate data and make it available to the wider research community. The repositories listed below are examples only. For those outside the discipline it can be difficult to discover some of these important services.

• European Bioinformatics Institute - The European Bioinformatics Institute (EBI) is a non-profit academic organisation that forms part of the European Molecular Biology Laboratory (EMBL). It is a centre for research and services in bioinformatics. The Institute manages databases of biological data including nucleic acid and protein sequences and macromolecular structures. It is the European node for globally coordinated efforts to collect and disseminate biological data. Many of its databases are household names to biologists: they include EMBL-Bank (DNA and RNA sequences), Ensembl (genomes), ArrayExpress (microarray-based gene-expression data), UniProt (protein sequences), InterPro (protein families, domains and motifs) and MSD (macromolecular structures). Others, such as IntAct (protein–protein interactions), Reactome (pathways) and ChEBI (small molecules), are new resources that help researchers to understand not only the molecular parts that go towards constructing an organism, but how these parts combine to create systems. EBI is the custodian (not the owner) of biological data provided by the community, and progress in biological research depends on completely open access to these data. All data and tools are therefore freely available to the research community, without restriction.

• European Union Joint Research Centre - The EU Joint Research Centre runs a number of activities under its “Scientific and technical reference function”, including INSPIRE (Europe-wide access to geo-data). The planning, implementation, monitoring and evaluation of European environmental policy require high-quality, reliable geo-information. This information already exists in various countries and institutions, but often not in a coherent form that can easily be collated for analysis, and it is not always available unconditionally and at any time. It is to meet these circumstances that INSPIRE (Infrastructure for Spatial Information in Europe) is being established, as a framework for the construction of a European Spatial Data Infrastructure (SDI). Once fully installed, INSPIRE will provide political and economic decision-takers, as well as scientists, with reliable geo-data for assessing environmental conditions across the whole of Europe.

• European Centre for Medium-Range Weather Forecasts - The European Centre for Medium-Range Weather Forecasts (ECMWF) is an international organisation supported by 31 States. The Meteorological Archival and Retrieval System (MARS) is the main repository of meteorological data at ECMWF. It contains terabytes of operational and research data as well as data from special projects. MARS data is freely available to registered users in the Member States and Co-operating States. There is no public access to MARS. For research and commercial use, data can be obtained through the Data Services. For research use only, some datasets are freely available.

• IRIS Data Management Center - The IRIS Data Management Center (Incorporated Research Institutions for Seismology) is a U.S. university research consortium dedicated to exploring the Earth's interior through the collection and distribution of seismographic data. IRIS is funded by the National Science Foundation's Division of Earth Sciences and is located in Seattle, Washington. The IRIS DMC receives earthquake and seismic data from a variety of Data Collection Centres and is responsible for the long term archive and distribution of all IRIS generated data. Seis-UK has an agreement with IRIS to archive and distribute data collected by Seis-UK projects worldwide. IRIS is able to restrict access to datasets, e.g. for 3 years, if required for legal or contractual reasons. Entire data sets of unprocessed seismic waveform data (not the raw instrument output) are archived, as well as processed data sets. IRIS also provides tools for automated retrieval of data and seismic data analysis/QC software.

Other data services hosting data used by researchers

In addition to the facilities above there are a number of other national services that host and provide access to research data and outputs. They are not repositories for research data outputs from researchers in HEIs, although some of the services, for example UKPMC, may allow for certain aspects of this. These organisations include the following:


• EDINA [19] - a JISC national academic data centre based at the University of Edinburgh, which hosts reference, map and image data resources.
• MIMAS [20] - a JISC and ESRC-supported national data centre based at The University of Manchester, which provides access to key data and information resources across a wide range of disciplines. The portfolio includes UKPMC [21], a service led by The British Library.
• NDAD [22] - The National Digital Archive of Datasets (NDAD) preserves and provides online access to archived digital datasets and documents from UK central government departments. Its collection spans 40 years of recent history, with the earliest available dataset dating back to about 1963. The data remains in the legal custody of The National Archives, but is managed by the University of London Computer Centre. NDAD preserves this important data from the ravages of time and technology, and makes it freely available on the web.

The options for addressing the challenges

The interim report [23] to the UKRDS Steering Committee identified three main options for consideration:

No Change - This would leave the current situation in place and UKRDS would not exist in any form. Some disciplines would be well-provided for, others would not. The study was advised by stakeholders in a wide variety of fields that doing nothing and expecting the problem to sort itself out is not a realistic option. The study explored the reasoning behind this option at great length and concluded it would be high-risk, as it would rely on a solution arising through market forces.

Highly Centralised - It became clear during the study that some stakeholders assumed that UKRDS was all about providing a very large national data repository, either a new facility at a single location or a virtual repository spread over a number of institutions. The study identified this as the most invasive and expensive of the options, creating a new monolithic institution with responsibilities in every area of data management. Although such an institution would leverage the traditional Shared Services approach, the issues around migration to such a state are problematic and the risks are higher than would be advisable.

Co-operative Service - In this model, UKRDS acts as an enabling service, representing the interests of many UK stakeholders. Such a service would be well-placed to provide an enabling framework, to act as an advocate for standards and as a centre of excellence for information about data management and repositories, and to support many other stakeholders, such as the Digital Curation Centre (DCC), who already have processes in place. This approach brings the Shared Services model into the current environment of grid computing and cloud-based data storage, with an emphasis on distributed, rather than centralised, shared services.

The UKRDS Steering Committee supported the Co-operative Service concept, and the feasibility study and subsequent planning proceeded with this option.

3.2 PROJECT PARTNERS

The Business Plan recommends the establishment of a Pathfinder piloting stage as the starting point for UKRDS. This stage will work with three main groups of partners:

• An expanded group of case study institutions comprising the four universities that took part at the feasibility study stage, plus two universities from Scotland and one from Wales; others may be invited to join but at present the business planning assumes seven institutions;
• A group of funders comprising at least one Research Council and the Wellcome Trust;
• A number of other stakeholders who provide services and processes, including DCC, RIN and UKDA.

This list is not yet complete and no potential project partners have yet been formally approached, although all current case study sites have already indicated their enthusiasm to remain closely involved with the project.

[19] EDINA is not primarily concerned with preservation of data. http://edina.ac.uk/

[20] MIMAS is not primarily concerned with preservation of data. http://www.mimas.ac.uk/

[21] http://ukpmc.ac.uk/ - Based on PubMed Central (PMC), the U.S. National Institutes of Health (NIH) free digital archive of biomedical and life sciences journal literature, UK PubMed Central (UKPMC) provides a stable, permanent, and free-to-access online digital archive of full-text, peer-reviewed research publications.

[22] http://www.ndad.nationalarchives.gov.uk/about/

[23] Serco 2008a

3.3 ANALYSIS OF COSTS OF CURRENT PROVISION

Current Data on Research Data Preservation Costs

To date there is relatively little data or research on preservation costs for research data. Members of the OSI e-infrastructure working group on preservation and curation reported that funding bodies and organisations had been able to provide only relatively limited data on costs when the working group requested it. For example, individual research councils and other funders had no metrics or means of tracking the proportion of research grants allocated for data management. The only clearly identifiable costs for data curation and preservation were the budgets allocated for national data centres and services. The OSI working group reported that where data centres and services exist they represent approximately 1.4-1.5% of total research expenditure by the research council (OSI 2007).

More recently HEFCE has funded a study on research data preservation costs. This provides the first methodology for establishing the full economic costs of research data preservation, and extensively documents costs on a comparable basis in four case studies: the Centre for eResearch (King’s College London); DSpace@Cambridge (Cambridge University); the Archaeology Data Service (York University); and the Department of Chemistry (Southampton University) (Beagrie et al 2008). The study notes the complexity of costing on a comparative basis across different organisations and the lack of previous data on preservation costs. Of particular importance for the UKRDS study are the detailed staffing and equipment costs for the two existing central research data storage and advisory services in UK universities (Cambridge and KCL). There are ongoing studies at Glasgow and Oxford Universities aimed at establishing similar facilities.

In short, national surveys of digital preservation provision and work for the UKRDS feasibility study show that there are islands of good practice and provision but, at institutional (and national) level, many gaps. Data on research data preservation costs exists for some of these areas of existing provision. The HEFCE study provides detailed comparative costs for the facilities at Cambridge and KCL, which may provide an initial baseline for costing the introduction of similar local facilities across all individual universities in the sector.

Costs Data from the Case Study Sites

The project team anticipated that the provision of research data storage within the case study sites would be highly distributed between the PCs of individual researchers, departmental servers, and central facilities, with very limited institution-wide data being available. This proved to be the case, with no central overview of institutional provision being possible at the case study sites. However, we have data from the central services at each case study site on data storage provision and planning. All have also provided examples of capital costs for purchasing equipment for research data storage, or estimates of pro rata allocations for research data. However, all noted that there was no history or data to allow costing on a full economic cost basis (including staff, utility costs etc.), which would allow costings to be compared with a projected shared service.

The researcher survey has also provided a sample of researcher data storage requirements and of where the data is stored. However, the sample is comparatively small and we found very wide variation within and across disciplines, with individual researchers and projects often having significantly large and non-standard “average” requirements. It would therefore not be safe to project the storage requirements of individual researchers to the wider UK sector without significantly expanding the sample. Information was supplied by the case study sites on a confidential basis and is therefore not included in the body of this report.


The UK University Sector and Baseline

Statistics for the UK Higher Education sector are available from the Higher Education Statistics Agency (HESA) and are updated annually by individual universities. The figures for all UK universities for the quality-related research (QR) funding provided by the funding councils are shown in the figure below. This provides one measure of ranking research activity in universities and correlates quite closely with the figures for research projects and numbers of researchers provided by our case study sites. The position of each of the case study sites in this ranking is also shown below.

QR Rankings of Case Study Sites

Projection of Current Costs

Using QR as a broad indicator of overall research volume and thus also of research data output, we can develop an approximation of the requirement for data curation capacity at individual HEI level across the UK as a whole. The small number of appropriate local analogues for UKRDS identified in the HEFCE study, together with information obtained from the case studies, has enabled the team to deduce that the minimum staffing for maintaining viable technical and curation skills in one university’s central data curation support service would be in the order of 2.5 FTEs. The figures pertaining to the analogues were provided under terms of confidentiality but their use in the context of the UKRDS feasibility study was accepted. Based on the 127 UK institutions currently receiving QR research funding from funding councils, the following calculations could be made:

• The idealised UK cost of local central services for data curation, using Analogue A (minus London weightings) as a baseline and weighting cost per institution proportionately by QR funding, is £11.7 million per annum (staff, overheads, equipment capital costs);
• However, this is an “efficient” idealised projection: in practice each institution would not be able to employ the small fractions of FTEs embedded in the projection, or individually recruit the expert staff with the rare skills necessary for the functions;
• A more accurate estimate of the projected cost, reflecting the actual inefficiencies of complete distribution to each individual university, would use the estimate of a minimum of 2.5 FTE to establish a central service as a viable long-term unit locally;
• On this basis the estimated cost (staff, overheads, equipment capital costs) for all 127 QR institutions of an independent minimum central data curation support service of 2.5 FTE would be c. £31.75 million per annum; the larger research institutions would require more than this minimum figure, and adjusting for this provides an estimated figure of £34.31 million per annum.

It should be noted that the QR projections suggest that it is uneconomic for any institution outside the top 15 to provide a service locally. For the remaining 112 institutions, anything other than a nationally supported service which they can utilise is not viable. Even for most of the top 15 institutions the viability of a solely local service would be marginal over the long term; a mix of local and central provision would be more economic and viable in terms of shared skills and expertise. These cost projections form the basis of the projected savings to be realised from the development of UKRDS as set out in the Business Case and Plan.
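The arithmetic behind the minimum-service projection can be checked with a short sketch. The inputs (127 institutions, 2.5 FTE, £31.75 million, £34.31 million and £11.7 million) are the figures quoted in this section. The per-institution and per-FTE values are not stated in the report; they are only what the quoted figures imply when divided through, so they should be read as illustrative rather than as the study's own unit costs. All variable names are hypothetical.

```python
# Figures quoted in the cost projection (pounds per annum).
QR_INSTITUTIONS = 127          # institutions receiving QR funding
MIN_FTE_PER_SERVICE = 2.5      # minimum viable local curation team
MINIMUM_TOTAL = 31_750_000     # c. £31.75 million: 2.5 FTE at every institution
ADJUSTED_TOTAL = 34_310_000    # £34.31 million: adjusted for larger institutions
IDEALISED_TOTAL = 11_700_000   # £11.7 million: QR-weighted "efficient" projection

# Implied values (assumptions derived by division, not stated in the report).
cost_per_institution = MINIMUM_TOTAL / QR_INSTITUTIONS
cost_per_fte = cost_per_institution / MIN_FTE_PER_SERVICE
total_fte = QR_INSTITUTIONS * MIN_FTE_PER_SERVICE

# Extra cost attributed to the larger research institutions.
large_institution_uplift = ADJUSTED_TOTAL - MINIMUM_TOTAL

# Cost of full distribution relative to the idealised projection.
distribution_ratio = ADJUSTED_TOTAL / IDEALISED_TOTAL

print(f"Implied cost per institution: £{cost_per_institution:,.0f}")
print(f"Implied fully loaded cost per FTE: £{cost_per_fte:,.0f}")
print(f"Total minimum staffing: {total_fte} FTE")
print(f"Uplift for larger institutions: £{large_institution_uplift:,}")
print(f"Distributed vs idealised cost: {distribution_ratio:.2f}x")
```

On these figures the minimum service works out at £250,000 per institution, and the adjusted distributed model costs close to three times the idealised projection.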

3.4 APPROPRIATE STRUCTURES FOR A NATIONAL RESEARCH DATA SERVICE

The structuring of the shared service provision envisaged by UKRDS is dependent on three key aspects:

1. Clear identification of the stakeholders it is designed to serve;
2. The service capabilities it will deliver;
3. The optimal vehicle for delivering the services.

Each of these aspects is expanded below.

Identification of stakeholders

A lengthy list of stakeholders was compiled at the onset of the feasibility study. This list has been refined over the course of the project but, essentially, the initial groups of stakeholders, or ‘communities of interest’, have been confirmed. A small number of additional communities have also been identified. The model described below is based around all of these communities, considering both their service needs and what they can provide for the service.

UKRDS is intended to be an enabling framework which will encourage participation from communities not currently well served. It will also potentially add value to some communities who, whilst using a current service, require more support in areas such as data management planning and preservation. In such cases UKRDS can fill gaps in existing provision. This will provide the opportunity for a significant number of ‘quick wins’ across communities with the most pressing needs which, in turn, will accelerate the visibility, success and ultimately broad-based acceptance of the importance of UKRDS.

In the model representation below, the communities have been identified under a high-level generic title. The composition of the communities at a more detailed level and, in particular, their differing service needs will be examined further as part of the business planning stage of the study. Two distinct community groupings have been identified:

• ‘Early adopters’ will form the initial “critical mass” of skills and capability for the establishment of a comprehensive and coherent UKRDS service. Early adopters may grow in scale over time, but their efforts will be focused initially on areas of most pressing need; it is likely that at the detailed planning stage early adopters will be further qualified in the light of practical priorities;
• ‘Later adopters’ may be added subsequently, once UKRDS is firmly established. Later adopters may start to come on board roughly two years after the launch of UKRDS and, as for early adopters, may also be scaled up over time.

It is important to stress, however, that these groupings are not rigid.


It is probable that not all members of a community described as ‘early adopters’ will use the services of UKRDS in its earliest stages; they will come on board in phases and some will be late joiners. In addition, whilst the later adopters are less likely to be involved at the outset, it could make sense for one or two to be included at the early stages, for example where a particular research programme needs to engage with specific publications.

[Figure: UKRDS model of communities of interest, grouped as Early Adopters and Later Adopters, with processes defined by the needs of the communities. Communities shown: HEIs & Research Institutes (academic status), comprising researchers, IT Directors, librarians, archivists and other experts; funders; commercial users & generators of data; service providers; public sector users & generators of data; vendors; other educational institutions; venture capitalists; international links; journal and data publishers; others.]

Further explanation of the constituent members of each community is provided in Appendix A1.

Proposed service capabilities and processes

UKRDS will provide an enabling framework for all the identified communities to work within, with the emphasis being placed on the communities themselves to see and grasp the value the service will provide. Based on analysis of service needs and the gaps in service provision described in the section above, the feasibility study identified the main capabilities which will be required to deliver UKRDS. Capabilities are listed below in descending priority order in terms of likely development.

• UKRDS management and administration - to provide the central coordination and organisational infrastructure to sustain and develop UKRDS. This will include establishing an organisation and reporting structure capable of moving the initiative forward and of being accountable to key stakeholders, and dealing with finance, HR and other administrative services;
• Service Provider contract negotiation, letting and management - UKRDS will work with established and expert service providers in the field and seek to establish formal relationships with those who join together to collaborate on delivering the UKRDS vision and service. It is envisaged that there will be a spectrum of such relationships with partner organisations. This will involve collating the common operational requirements of a diverse and autonomous community and articulating that need to public/private sector suppliers in a coherent manner;
• Policy setting to aid in the development of data management strategies for researchers and related data management plans - UKRDS will work with organisations leading the way with best practice with the aim of propagating this to the widest community, and influencing how good practice is embedded and applied throughout the UK research community. This will be an important step in ensuring that research data can be shared and preserved for the future;
• Access Management including access control using internationally agreed methods and tools - UKRDS will minimise barriers to use while recognising the importance of secure access to data that are subject to licences and restricted access. To facilitate this, data stores accessed through UKRDS will where possible use community standard solutions for access control. A lead will be taken from the developments in this area that the JISC has encouraged and supported with the move to Federated Access Management. This area will be monitored and kept in step with developments over time, including challenges such as the NHS and VREs as appropriate;
• Training in Lifecycle based Data Management - UKRDS has identified through the case study sites the need for such training at researcher level and, to some extent, at the service professional level. Training is available for certain disciplines. Some research councils, the DCC and RAL already have a focus in this area.
Current efforts could be further supported and augmented to give UK researchers the benefit of a generic, wide-reaching approach to such training. [24] The service will seek to collaborate with those organisations leading this kind of training and coordinate a “training the trainers” programme of activity, responsive to feedback and need over time;
• Relationship management - this will encompass both stakeholder and user groups and will ensure consistency and professionalism, delivering to an agreed Communications Strategy that will include:
  o Advocacy
  o Researchers, librarians etc support service
  o Research funders
  o Commercial users
  o General public;
• Advisory services - UKRDS will provide advisory services to its contributors and users, including guidance about where information and help can be obtained on a variety of topics;
• Foresight development (for example working with Research Councils and Funding Councils using DCC or RIN) - the assessment and tracking of research community needs; of technical developments across a spectrum of activity that might impact UKRDS; and of national and international initiatives that may influence the development of UKRDS.
Both DCC and RIN are well placed to play a major role in this area given their current remits and contact base, and could also provide UKRDS with a ‘Quick Win’;
• Tools, Methodologies and Handbooks for data curation (which may initially include Discovery tools, though this will only be a step towards a Discovery service which will exist in its own right) - UKRDS will work with experts in this area to establish current best practice where it exists and to identify gaps and areas of particular need, putting in place a strategy and resources to deliver and achieve positive impact across the widest community. The JISC support activity in this area, including the work of the DCC, as well as work at the research council level, particularly ESRC, means that there is a ready expert community to liaise with;
• Capacity planning and investment (with Service Providers, Funders, HMG) - UKRDS must be well positioned to work with stakeholders to provide a UK-wide multi-disciplinary perspective upon research data generation and management over time. It will actively work with funders and institutions to encourage the use and deposit of Data Management Plans (DMPs) and will provide a central repository or collective view of Data Management Plans. This will enable value added services to be developed by mining the plans, providing information on research topics, projected storage requirements, archive requirements etc across the UK research base. This would take into account all funders of UK HEI based research including: the Research Councils, who provide approximately 35% [25] of research funding in HEIs; charities; government; and others;
• Accreditation/Certification, standards setting and representation on relevant international bodies - UKRDS must be able to ensure that it is trusted by all stakeholders. Its staff and service providers will therefore need a high level of credibility in the field both nationally and internationally.
A key element of success for long term data management will be the agreement of standards and practices across the globe that earn the confidence and trust of researchers and their funders in particular. UKRDS will need to establish a strong position in all relevant fora;
• International access services - International research collaborations require that UKRDS provides service beyond UK borders. Access to data from outside the UK will be controlled and monitored appropriately, taking into consideration whether data is open or restricted. Reciprocal arrangements with similar organisations will be looked into as opportunities arise;
• Facilities for repositories and publishers of journals and other output to cite datasets reliably - There is an increasing trend to make research data available with the published article, often by providing links to the data. It is important that these links are reliable and persist over time and that a standard approach to citing research data is encouraged. There is evidence to indicate that this may convince more researchers to share their data [26]. In an article published in 2007, H. Piwowar and colleagues put forward evidence indicating that there is a correlation between publicly available data and increased literature impact, and that this may further motivate investigators to share their detailed research data.

[24] Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practice & Future Needs, Alma Swan & Sheridan Brown 2008 http://www.jisc.ac.uk/publications/publications/dataskillscareersfinalreport.aspx

[25] Requires further verification as this figure is based on case study sites only

The model below highlights the main process interfaces between UKRDS and the respective communities it will serve.

[Figure: main process interfaces between UKRDS and the communities it serves]

Communities shown:
• HEIs and research institutes (academic status): researchers, IT directors, librarians, archivists and other experts;
• Funders;
• Public sector users and generators of data;
• Commercial users and generators of data;
• Other educational institutions;
• Service providers;
• Vendors;
• Venture capitalists;
• International links;
• Journal and data publishers.

Interfaces shown:
• Work with funders on policy issues and data management planning;
• Services covering: data management advice, DCC lifecycle adoption and guidance, training in DMPs, tools / discovery development, and accession planning;
• Provision of conditional data set access;
• Coordinate capacity planning and help address implications for long-term storage and infrastructure investment;
• Engage as appropriate to maximise exploitation of financial support for long-term data management capabilities;
• Engage as appropriate to maximise exploitation of vendor support for long-term data management capabilities;
• Ensure provision of accession and access procedures;
• Facilitate provision of persistent citation links.

[26] “Sharing Detailed Research Data Is Associated with Increased Citation Rate”, H. A. Piwowar, Roger S. Day, Douglas B. Fridsma: retrieved 10.09.08 from http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0000308

The importance of Data Management Plans for UKRDS

Data Management Plans (DMPs) have long been a feature of research data collection programmes in bodies such as NASA and NERC. More recently, a growing focus on data sharing has begun to broaden out their application and take-up by other research funders such as the Wellcome Trust and by inter-disciplinary research programmes such as the Rural Economy and Land-Use (RELU) Programme. UK training in preparing DMPs is also becoming available through the DCC 101 research data management training course.

Work to promote and utilise DMPs is seen as central to the early phases of work in the development of UKRDS and to the subsequent operational phases of the service. There are two main reasons for this:

1. DMPs could have a valuable role in increasing awareness and understanding of data management issues amongst researchers and in providing structured support and training for them to address these issues. Data management training that promotes the use of a tangible working tool, the DMP, could, if supported by funders, gradually begin to change current working practice and help researchers and support services to address data management not only during the lifetime of the project but also beyond it.
2. Establishing a UKRDS registry for DMPs could help leverage and seed data audit and capacity planning for participating institutions, UKRDS services, and funders. Subject to appropriate safeguards and consents, the registry could also contribute to resource discovery by identifying new data sources being made available for research.

It is important to note that DMPs would not be the sole means of achieving these objectives and are also currently unknown to, or distrusted by, many existing researchers. There is likely to be a need to work with both researchers and funders to develop DMPs and their utility and acceptance. In particular, thought will need to be given to how to move beyond or update the basic data in DMPs and selectively create or capture the additional metadata and documentation needed for resource discovery, especially for disciplines where no national or international subject services and processes for this exist. However, over time, and used in combination with related documentation (such as research designs) and resources (such as existing and new discovery services for data), DMPs could be a key mechanism for achieving major objectives for UKRDS and its stakeholders.
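As an illustration only, the kind of capacity-planning view that a DMP registry might support can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a design for the registry: the record fields, the function names, the ten-year retention threshold and the sample entries are all hypothetical and do not come from the feasibility study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DMPRecord:
    """One registered Data Management Plan (all field names are hypothetical)."""
    project_id: str
    institution: str
    funder: str
    discipline: str
    projected_storage_gb: float  # researcher's own estimate taken from the DMP
    retention_years: int         # how long the plan says the data should be kept


def projected_storage_by(records, key):
    """Sum projected storage (GB) grouped by one attribute, e.g. 'institution'."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.projected_storage_gb
    return dict(totals)


def long_term_archive_candidates(records, min_years=10):
    """Return plans whose stated retention period meets an assumed threshold."""
    return [r for r in records if r.retention_years >= min_years]


# Invented sample entries, for illustration only.
registry = [
    DMPRecord("P1", "Institution A", "Funder X", "social science", 50.0, 10),
    DMPRecord("P2", "Institution A", "Funder Y", "clinical medicine", 2000.0, 25),
    DMPRecord("P3", "Institution B", "Funder X", "humanities", 5.0, 5),
]

by_institution = projected_storage_by(registry, "institution")
by_funder = projected_storage_by(registry, "funder")
archive_ids = [r.project_id for r in long_term_archive_candidates(registry)]
```

Grouping the same records by institution, funder or discipline is what would allow one registry to serve the different audiences named above (participating institutions, UKRDS services and funders), subject to the safeguards and consents already noted.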

UKRDS process flows

The process diagram below places the UKRDS capabilities into a fuller context, including the vital role played by DMPs - showing how they provide enabling support for researchers (in this example). It should be noted that all researchers should benefit and not just those who already have funding. During the course of the study, process scenarios were developed and tested with the case studies against real-life research projects at various stages of their development. Whilst it became clear that much work will be needed on developing the processes in detail, the underlying structure as represented below was supported by the case studies.

[Figure: UKRDS Basic Processes]

Research Project Process:
• Research professional formulates research concept;
• Researcher consults UKRDS about other research and data sources;
• Research professional prepares proposal (including DMP);
• Funder approves proposal (Y/N): if NO, researchers review for possible re-submission; if YES, research team carries out research project;
• Data Management Plan: update for status and changes since initial registration;
• Other elements shown: Project Report; Research Data; Institutional Data Services.

Research Data Sharing Process:
• Enquiry, discovery and advisory services;
• On-line authorised user enquiry;
• Manual user enquiry;
• UKRDS Registry;
• UKRDS commissioned storage;
• National and international data centres;
• Government and other sectors.

UKRDS Services and Administration:
• Advisory Services;
• Access Management;
• Relationship Management;
• Foresight Development;
• Capacity planning and Investment;
• International Access Services;
• Policy and Strategy;
• Training and Development;
• Service Provider Administration;
• Tools, Methodologies and Handbooks;
• Accreditation and Certification;
• Citation Repositories.

UKRDS Management and administration

Initial process mapping of all community roles and inter-relationships has been completed by the feasibility study team and is available if required. It should be noted that a significant part of these processes embraces the current, familiar DCC lifecycle.

UKRDS – the optimal vehicle for delivering the service

The UKRDS steering committee recommended that the feasibility study should examine embedding UKRDS, if at all possible, within an existing legal entity/host organisation and avoid creating a new legal entity. To help with the analysis of the options, the feasibility study team developed a set of criteria which describes the ideal characteristics of a UKRDS host institution. These are set out below.

• Preferably an existing organisation;
• Not committee-based, but delivery-based;
• Service delivery/service contract management capability;
• Well-regarded by stakeholders;
• Broad disciplinary base and stakeholder appeal;
• Independent legal entity (ideally in its own right);
• Proven capability in developing and delivering quality national service to the HE and RC sectors;
• Technically able and mature organisation;
• Good at partnerships with other organisations;
• Accountability to main funders;
• Synergies with other data activities;
• Data preservation and management expertise/reputation;
• Proven commercial contracting expertise.

Options identification and evaluation

The following four main hosting options were identified:

• Option 1: House within current data repositories/archives;
• Option 2: House within current discipline-based data services;
• Option 3: Develop a separate quasi-commercial organisation;
• Option 4: Build on a current not-for-profit organisation with a broad existing HEI user base.

Aided by the identification of the ideal hosting characteristics, the team considered the above options in detail and came to the following conclusions.

Option 1: The type of service or organisation that this option points to would be existing services such as RAL; NDAD (National Digital Archive of Datasets); EDINA; and MIMAS. With this option one would need to overcome the potential difficulties of these being data service providers in their own right. It is not thought appropriate for one particular operation to “wear two hats”. In addition, NDAD, EDINA and MIMAS are not legal entities in their own right, each being a part of a university. RAL is part of STFC and as such is different to the other three. Given the focus of UKRDS on coordinating and sub-contracting major activity, being an independent legal entity would be beneficial.

Option 2: There are a number of subject-based data services but perhaps the broadest based and longest established is the UKDA. Again, with this type of service/organisation there are similar difficulties to Option 1. In addition, having a broad disciplinary base and stakeholder appeal is going to be important to the success of UKRDS. The project team took the view that an organisation that presents the same face to all stakeholders is essential to a successful implementation.

Option 3: This would undoubtedly ensure an appropriately balanced relationship with all stakeholders, but has the disadvantage that a suitable legal entity would have to be established, with all the cost and complexity that may imply. Moreover, it would also be necessary to establish independent overheads to manage and administer the organisation, and this was felt to be counter to the spirit of shared services, a point endorsed by the UKRDS Steering Committee.

Option 4: There are a number of current not-for-profit organisations with a broad existing HEI user base. The advantage of using one of these as the foundation organisation for UKRDS is that investment risk is minimised by building upon an existing organisation and leveraging investment already made, existing infrastructure and networks.

The project sponsors have concluded that Option 3 or Option 4 most closely matches the suggested criteria.

Draft organisation

Based on the assessment given above, the eventual outcome is likely to be a structure such as that shown below. The timing and governance of such a structure are discussed further in section 3.9 below.

[Figure: draft UKRDS organisation chart]

• UKRDS Board;
• UKRDS Advisory Board;
• Head of UKRDS, with the following functions:
o Capacity planning and investment: includes working with providers, funders and HMG; maybe also foresight development (with RIN and appropriate agencies e.g. JISC);
o Relationship management: includes contract letting and management, service management, international, all user communities;
o Operations and support: includes access management, advisory and support services, close links to current JANET (UK) operations including FAM;
o Policy development, advocacy and communications: includes policy setting plus working to develop and advocate best practice etc in close contact with JISC/RCs/DCC/RIN etc;
o Training and development: includes tools, methods and handbooks development, training tools and training in lifecycle data management (working with experts in ESRC, DCC etc);
o Finance and administration: enhanced to include UKRDS needs including publications services and UKRDS user communities.

Development of UKRDS

The way UKRDS develops its capabilities, its priorities and timing, has a direct impact on the build-up of the costs. Its success (or otherwise) is totally dependent on the value it is seen to be providing to the communities involved in research programmes - particularly the researchers themselves, their respective institutions and the funders. UKRDS will develop gradually, beginning with a meaningful set of services delivered to a realistic community size - a UKRDS ‘critical mass’. This will be founded on the institutions who took part as case studies during the feasibility study. This initial development will be defined as a ‘Pathfinder’ piloting stage and is set out in detail in the Business Case and Plan.

3.5 ASSESSMENT OF COSTS, BENEFITS AND PAYBACK

The full benefits of UKRDS, build-up of costs, quantification of potential financial benefits, costs/benefits analysis and pay-back are set out in the Business Case and Plan.

3.6 ANALYSIS OF POTENTIAL TO SCALE UP THE SERVICE

We believe there are good prospects for scaling up the service over time.

The four case study sites in the feasibility study were selected to be broadly representative of the large and medium-sized research-led universities. In terms of UK ranking by Quality Related funding (2005-6), Oxford is 2nd, Leeds 9th, Bristol 13th and Leicester 27th. We believe the universities in this group, ranked from 1 to 27 in QR funding, account for the majority of funded research projects and a large part of the potential market of UKRDS for research data services.

The online surveys, focus groups, and discussions held with the Universities of Bristol, Leeds, Leicester, and Oxford provide findings which suggest that participating in the UKRDS service could be of interest to these institutions and comparable universities of similar scale: a suggestion which is supported by the expressions of interest from peer institutions to be involved in future phases of UKRDS.

What is currently unknown, however, is the potential level of interest in much smaller institutions ranking between 27 and 127 in QR funding, which undertake some research and have much lower levels of research funding and numbers of research active staff and research projects. This could be a subject for future investigation by UKRDS. However, it seems likely that individual active researchers or research teams within such institutions would also benefit from institutional participation in UKRDS, and the potential long-term scope of the service would be the 127 HE institutions across the UK undertaking research.

The HEFCE shared services programme envisages that services will be developed from pilot implementations. We believe that an incremental development of UKRDS, building up a network of provision and partnerships, would be appropriate given the complexity of the prevailing landscape and the financial outlook for Government funding in coming years. Because institutions and researchers are looking to contribute to a sustainable service which can persist over time, we prefer the term (and approach) of a “pathfinder” for UKRDS. The UKRDS project team’s recommended approach to developing a Pathfinder programme as the starting point of the future UKRDS has been referred to earlier and is set out in the Business Plan.

It should be noted that the full benefits of UKRDS will only come from the pro-active engagement of a wide range of institutions and from growing the network of participants. Network effects should leverage the benefits of its services significantly for participating institutions over time.

3.7 NON-FINANCIAL BENEFITS

The potential non-financial benefits of UKRDS are described in detail within the consolidated benefits section of the Business Case and Plan.

3.8 POTENTIAL CONSTRAINTS TO THE DEVELOPMENT OF UKRDS

Possible constraints to the development of UKRDS include:

Funding constraints:
• Severe economic downturn and inability to fund UKRDS development;
• Benefits to participants and the potential for diversifying income streams will be less in early years until the service reaches a critical mass;
• The dual funding system and the variable funding terms and conditions of research funders may impact on the service.

Cultural constraints:
• Funders and research disciplines have variable practice and experience of data management;
• The academic reward system provides insufficient incentives for data stewardship;
• Training and skills in data curation are under-developed;
• UKRDS may be seen as a threat rather than an opportunity by existing players.

Misrepresentation of the shared services vision:
• Negative comment on the proposed scheme;
• Mistrust of the key participants by stakeholders;
• Division of opinion between research-intensive universities and others;
• Unrealistic stakeholder expectations.

Inadequate or ineffective communications with stakeholders:
• Messages to and from stakeholders are not received;
• Messages to and from stakeholders are not understood correctly;
• There is a failure to engage effectively with the stakeholder group;
• There is a failure to reach all levels of the stakeholder group;
• Mixed, conflicting or inconsistent messages are given to stakeholders;
• Too many messages are provided that blur the strategic vision and focus recipients on minor issues.

3.9 PROPOSED CORPORATE STRUCTURE OF UKRDS

As was noted earlier, the ‘blueprint’ for the creation of UKRDS shows a staged approach, beginning with the Pathfinder stage. The corporate governance and structure of UKRDS will therefore emerge over time. At present, the most likely scenario covers three stages:

• Stage 1: Pathfinder Stage, covering the initial development of UKRDS to an agreed funding level, under the management of a lead-institution, reporting progress to HEFCE;
• Stage 2: UKRDS build-up, agreement to open up UKRDS to all early adopter communities and preparations made to install a corporate structure along the lines indicated above;
• Stage 3: UKRDS constituted as a company limited by guarantee, managed by a board drawn from its major funders and other key stakeholders.

Stages 2 and 3 could be combined, if the lead-institution could not accept responsibility for the extension of UKRDS to the wider communities.

3.10 VAT IMPLICATIONS AND PROPOSALS

The VAT implications for UKRDS were considered at length by the study team, including expert advice from Grant Thornton. The outcomes of this investigation and salient recommendations are contained within the Business Case and Plan.

3.11 LEARNING FROM THE PROJECT AND PLANS FOR FURTHER DISSEMINATION

The project produced a rolling plan of communications, dissemination and advocacy which operated throughout the feasibility study stage. Already, the July interim findings of the study have been made available on the project website and one-to-one briefings given to key stakeholders in addition to presentations at a series of workshops, focus groups, and conferences. This is because it is important to influence and continue to engage with a wide range of agencies and individuals along the way, including those at the highest level in the academic and research community and amongst the potential funders beyond. Ideally when the project finishes there should be no hiatus before the next steps begin. Effectively this means priming key stakeholders about the likely conclusions well in advance and preparing the ground for the next steps which require major decisions (especially funding decisions) to be taken. The project’s communications and advocacy plan should form the basis of the longer-term communications strategy as part of the UKRDS implementation plan. Plans for further dissemination in 2009 include an international conference in February to disseminate project findings.

3.12 ANALYSIS OF POTENTIAL TO EXTEND BEYOND THE HE SECTOR

A primary area for extension beyond the UK HE sector should be to related sectors overseas. National initiatives such as Australia’s National Data Service (ANDS) and the NSF’s Datanets programme are well underway and they are looking to collaborate with comparable programmes internationally. These, together with initiatives in Germany, Canada and Europe, are already tackling similar issues collaboratively and some are ahead of the UK. There are potential roles for UKRDS as a contact point and partner for these programmes and others. Initial contact made by UKRDS with ANDS during this feasibility stage has demonstrated an openness and willingness to share experience and expertise. For example, the indications are that the initial main focus for ANDS will be on their programme of “Feeding the Data Commons”. This work could help to inform one of the main themes identified by the UKRDS case study sites: how to annotate and document research data adequately so that it is suitable for preserving and sharing widely in a cost effective and resource efficient way.

UK researchers are involved in large international projects and initiatives, and for those who are not, there is a growing need to be aware of and to exploit outputs, including data, from all around the world to inform and enhance their work, and to increase the visibility of their own outputs with researchers elsewhere. The UK is a signatory to the OECD Declaration (OECD 2004) on public access to research data outputs and to Council communications and decisions of the European Union on scientific publication (European Commission 2007) supporting wider access to research data, and these will continue to influence UK government and UK organisations’ policies for research data. Therefore, UKRDS needs to be capable of inter-operating with other national and international initiatives.
Other sectors and organisations working on the government’s digital preservation shared services programme (such as TNA), international organisations such as the World Health Organisation, and search providers (such as Google) are examples of the significant community of other ‘indirect stakeholders’ whose involvement, if only to the extent that their activities are monitored, could also benefit the direction and success of UKRDS.

3.13 SUCCESS CRITERIA

The table below summarises the main criteria for UKRDS’ subsequent success, against which robust KPIs must be constructed. Each entry gives the headline followed by the detail.

• Engagement with researchers and suppliers - The main measures of success for a service such as UKRDS will be the willingness of potential service providers to engage in the service delivery process through a UKRDS and the willingness of researchers to seek education in data management and opportunities to deposit their datasets with service providers via the UKRDS route.
• Targeted delivery - For this to work and be measured in a reasonable fashion, it is essential that availability is controlled and targeted initially at low risk, low cost providers and researchers.
• Relevant data - It is clear that there is a considerable degree of existing data management that can be adopted unchanged by UKRDS, as it is well established and targeted at a highly specialist research community. Much of this is in “big science” such as astronomy, particle physics and the like. It will clearly militate against the success of any UKRDS if its advent has any negative impact on such work.
• Clear added-value - It is likely that UKRDS will succeed by adding value to the work of arts and humanities, social sciences and small science. In this context there is some evidence that supporting overlap in dataset usage between social sciences and clinical medicine would be of benefit.
• Pragmatic build-up - The major constraint in this work will be the need to engage with suitable institutions and national service providers; the four case study institutions and UKDA and ADS are thought to hold out the opportunity of a carefully controlled early implementation.
• Embrace, don’t threaten, value where it already exists - Any service such as UKRDS must recognise current areas of excellence such as ESDA/UKDA and not detract from them; the model should embrace and enhance them where possible.
• Good communications to target community - It remains difficult for national subject centres to reach researchers in institutions. Institutions and their researchers are leading on data creation and use, with the subject centres keen to engage them more in the preparation of research data for long-term preservation. Any service such as UKRDS must ensure its role is well communicated and constantly refreshed, and ideally help to address the problem faced by all repositories.
• Must be built to succeed, not to fail - Key elements are: an affordable, sustainable business model; a focus on ease of use, with no heavy metadata overhead; and presentation. UKRDS must ensure the message gets across that this is a journey, the first phase of which alone is likely to last three years, and not a quick fix; there are unlikely to be any ‘quick wins’ other than proper representation of UKRDS’ intent.
• Quality as well as quantity - Data stored in UKRDS must carry a quality stamp, be trusted and secure, and be made accessible through simple but effective data mining tools.
• Availability of resources skilled in data management - Sufficient support/advisory staff who are familiar with the data at discipline level must be available and costed into the funding models.

Stakeholder management and communications

Engagement with stakeholders during the feasibility study provided valuable insight into the key communications aspects which will impact the success or failure of UKRDS. The degree of consistency that exists across all stakeholders was significant. This list will form the basis for a consistent UKRDS communications strategy.

• Any ‘solution’ promoted by UKRDS must not threaten a status quo where much value already exists. Rather it must build on this success to help create something of greater value. In so doing, UKRDS could fulfil a valuable role in becoming ‘an agent of change’ for strengthening data preservation standards;
• Continue to recognise the significant differences between disciplines;
• UKRDS must present a consistent, clear and regularly refreshed articulation of the needs it is aiming to address:
o What gaps is it filling?
o What ‘problems’ is it trying to solve?
o Where can it add real value?
o Can it continue to enable those groups which are currently under-represented or not represented at all to become involved and to benefit from the excellent solutions developed elsewhere?
o Can it be trusted?
o Is it truly sustainable?
• All propositions must be recognisably attainable and realistically planned;
• On-going affordability must be closely monitored, especially in relation to technology challenges and the threat of obsolescence;
• Continue to monitor availability of the resources/skills needed to manage the data.

3.14 RISK ANALYSIS

The challenges as set out in this document are not going to be solved by ‘doing nothing’. This feasibility study has delivered not only a clear articulation of the current situation but also a practical pathway forward which will provide the research community in the UK, for the first time, with a coherent approach to full life-cycle data management including the long-term preservation and security of valuable research data - irrespective of discipline. At the same time, the approach recommended by the study team ensures investment risk is minimised by building upon an existing organisation and leveraging investment already made in existing repositories, infrastructure and networks.

There is no doubt that falling behind the competitors (US, Germany, Australia, Canada) is a real risk. Slipping in global research ratings is a real danger, as other countries have recognised that unlocking research data and making it shareable, accessible etc will move their research capabilities and fleetness of foot ahead. The biggest risk is to do nothing.

3.15 ENGAGEMENT WITH THE CORPORATE SOCIAL RESPONSIBILITY AGENDA

The beneficial contribution UKRDS can make in this area is documented under the benefits section of the Business Case and Plan.

4. BUSINESS CASE AND PLAN

4.1 PROJECT AND SERVICE OBJECTIVES

The objectives of UKRDS have been clearly set out in the feasibility report section. The business case and plan for UKRDS, as set out below, is built around an incremental development approach to develop its capabilities and capacity, beginning with the solid foundations laid during a Pathfinder piloting stage. The design of UKRDS ensures that the service is scalable and the single most important driver to this scalability is the value it brings and is seen to be bringing to the research communities. The planning is therefore inextricably linked to the benefits both tangible and intangible which UKRDS will deliver. Therefore this section begins by setting out those benefits in considerable detail.

4.2 BENEFITS OF UKRDS

Business case value proposition – outcomes and benefits to the HE sector

Clear evidence emerged from the work conducted with the UKRDS case study sites that the majority of HEI-based research projects produce valuable data that varies considerably in size and complexity across all disciplines. A key message that came across from both support professionals and researchers is that obtaining support expertise and structuring the data for long-term preservation management can be problematic for these projects.

The feasibility study also identified that consideration of the problem is not confined to the UK but has also been recognised elsewhere, for example: in the paper by Lynch, Nature 2008, where he references studies conducted in the USA reporting similar problems; in the recent Priority Initiative on Digital Information by the Alliance of German Scientific Organisations, which prioritises action on preservation and re-use of research data; in the Australian Commonwealth Government’s National Collaborative Research Infrastructure Strategy, which has funded the creation of the Australian National Data Service (ANDS); and in the Canadian National Consultation on Access to Scientific Research Data Final Report, which established the research data strategy group that is undertaking a gap analysis of Canadian data services.

These initiatives send a clear signal to the UK in terms of actions planned or already underway across the international community. The option for the UK to ‘Do Nothing’ may still be viewed as acceptable in some quarters but the increasing pace of activity amongst our ‘competitors’ offers a clear challenge to this view. Closer to home, the option of doing nothing has its own more ‘localised’ implications.
These include the potential economic consequences of lost data where experiments have to be re-run, or the lost opportunity to exploit data that cannot be recreated; both sets of outcomes must be taken very seriously in an age where increasingly our knowledge base is held electronically.

The development of UKRDS begins to address the issues that arise from the generation of increasing amounts of research data, acknowledging the need for a collective approach and a broad multi-disciplinary view of the UK’s research data “bank”, thus bringing about a radical improvement in the management and exploitation of research data. It will promote dialogue and shared practice between communities of interest, with the aim of protecting investment in research and preserving opportunities for future research. The benefits of bringing a coherent approach to the application of standards and good practice will be to facilitate and actively foster inter-disciplinary re-use of research data and general interoperability, thus improving linkages and collaboration institutionally, nationally, and internationally.

UKRDS has the potential to be the keystone in the infrastructure necessary to deliver the government policy commitments to support science and innovation [27], supporting the need for an e-infrastructure for research and facilitating ready and efficient access to research data in the growing UK research base.

The key value proposition elements of UKRDS can therefore be summarised as follows:

• Protecting investment in research and delivering more value;
• Preserving opportunities for future research;
• Promoting the work of the institution and researcher;
• Informing the strategic development of the research infrastructure;
• Reduce duplication, recreation and errors;
• Researchers better equipped in data handling, save time and focus on better use of their time;
• Data management strategies can be recycled;
• Volume growth/capacity planning is more cost effective;
• More opportunity for re-use;
• More opportunity for cross reference and dataset integration;
• Better targeted retention and disposal;
• Shared skills give better coverage, thus better productivity in both service providers and researchers;
• Proper focus for practical best practice;
• Reduced unplanned data loss;
• Greater transparency and visibility of research data [Data as an ‘opportunity’];
• Falling behind the competitors is a real risk, therefore protects international competitiveness – EU, USA, Australia, Canada are all investing;
• Aggregating requirements will give payback to funders;
• Consistent with and complementary to other initiatives;
• Maintain research standing, as other countries have recognised that unlocking research data and making it shareable, accessible etc will move their research capabilities and fleetness of foot ahead;
• HEFCE is considering the inclusion of research data as a category for evaluation as part of the Research Excellence Framework (REF). It would be a significant development and would mean that research data could therefore contribute to a university’s overall research rating and the resultant research funding it receives from HEFCE.

[27] “Science and Innovation Investment Framework 2004-2014” (HMSO 2004)

Additional benefits include direct benefits to the institution, to the researcher and to funders. For example, UKRDS:

• Would provide the focus for promoting the work of the institution and the researcher;
• Would provide guidance on which repository to get research data from and act as a gateway to approved service providers;
• Would work with institutions, researchers and funders to promote and encourage the use of Data Management Plans;
• Would inform strategic development of the research infrastructure both at local and national levels, and work with stakeholders to inform policy and resourcing of post-project long-term data management.

Savings and value-added benefits

A UKRDS service will be highly attractive and provide significant economies of scale for HEIs and additional benefits to many departments, especially those that do not have subject/national data services for their discipline. Other value-added benefits will be possible from the greater coherence established thanks to the simple expedient of improved knowledge of data store availability and capacity plus facilitated ease of access. Although there are very few existing comparators, the following areas of realisable financial benefits were identified during the course of the study:

1. Based on the QR modelling set out in the feasibility study, the projected savings which could be delivered by a fully scaled-up UKRDS service have been estimated by the study team to be the financial equivalent of 63.5 FTEs across the sector. This is a deliberately conservative estimate in that local central data services would be unlikely to replace data curation provision in larger departments within institutions.

2. Clear evidence emerged from the case studies that demand for local data storage frequently exceeds supply. Whilst storage itself is now inexpensive and its cost continues to fall, when staffing costs, air conditioning and other energy costs are taken into account, the overall service (as provided by data centres) is becoming increasingly expensive. Economies of scale are therefore possible, especially through shared facilities which, when extrapolated across the full HEI community, will undoubtedly provide substantial marginal cost benefits.

3. The recent HEFCE Keeping Research Data Safe study emphasised the importance of economies of scale and the impact this has on unit costs for digital preservation. As an example, it quotes the University of London Computer Centre (ULCC), which runs the National Digital Archive of Datasets and provided the study with costs for accession rates of 10 or 60 data collections: a 600% increase in accessions only increased costs by 325% as a result of economy of scale effects brought about by centralising support and guidance. The study also has a detailed case study for the Department of Chemistry at Southampton. This includes a comparison of current costs for an outsourced national datastore with projected costs for an institutional central datastore, which shows an institutional solution’s costs would be c.8% higher. A UKRDS service which had a UK-wide view of capacity and a current view of market costs would add considerable value to such decisions.

Estimating the financial benefits of UKRDS

The long-term benefits of UKRDS should be viewed primarily in terms of the added value it gives to the research process, to researchers, and to UK research competitiveness. However, expressing these benefits to the UK and research in economic terms remains extremely challenging. Other mature national services engaged in support of research have undertaken consultancies which seek to establish this value in economic terms.

The British Library study Measuring our Value (British Library 2004) used the process of contingent valuation to measure the direct and indirect contribution of the British Library as a national service to research and the British economy. It concluded that for every £1 of public funding spent on the BL, £4.40 was generated for the British economy.

The JISC has also undertaken a study of the value for money of its national services to UK universities (JISC 2006). It concluded that on average for every £1 of the JISC budget the community received at least £4.86 of demonstrable value. The majority of financial benefits identified arise from gains in quality or efficiency in the work environment. In some areas the value is significantly higher, for example national content negotiation via JISC Collections is seen as generating £26 of savings for every £1 spent.

The UK Research Councils are also working on establishing the impact of research on the UK economy. The ESRC are examining ways of measuring the economic impact of their research funding and the impact of their data support services within this (Frontier Economics 2007). Their study notes the challenges in establishing economic value and proposes a framework that will enable ESRC to begin to assess the economic impact of their funding activities more effectively in the future. Section 3.1 focuses on ESRC supported datasets and sets out how one could measure:

• Aggregate benefits of the datasets ESRC funds under the headings of users, volume and value;
• The “counterfactual” - benefits that could be expected to accrue under the counterfactual of no ESRC; and
• Impact and effectiveness of datasets for high impact, low volume users.

It is important to note that all the above studies are setting out to measure the economic impact and value of established services, in many cases already in existence for several decades. It is clear that the economic benefits of these services have increased over time as the services have matured. Although these studies cannot be used for projecting value of a proposed service in its early years, we believe they provide valuable pointers to the likely economic value of a mature UKRDS in terms of the benefits to research data collection, accessibility and re-use. We also believe metrics being developed in these studies may also be helpful in measuring the impact and value of UKRDS over time.

Basis for benefits calculations

The table below highlights the key baseline components, with a link to appropriate commentaries, used in the calculation of financial benefits.

Savings within HEIs (based on QR modelling) arising from a centralised UKRDS support service
• On average each FTE, in a similar role, costs £95,000 at FEC per HEI [28]
• QR modelling calculations have resulted in an assessment of 317.5 FTEs, i.e. an average of 2.5 FTEs per HEI to operate an equivalent local service
• At least 20% of the QR figure or 63.5 FTEs can be saved by using UKRDS
• Savings are scaled-up on the basis of 10% of available savings by Year 3, 15% by Year 4 and 20% by Year 5

Build-up of value added by satisfying demand for capacity building which cannot be satisfied locally*
• To be determined during Pathfinder stage

Build-up of value added to research programmes
• Based on extrapolation of case studies’ data, almost £4 billion is spent on research in UK universities pa
• 50% of this expenditure will produce research data to which UKRDS can add value
• The added value is linked to the growth in DMPs (2% in Year 3, 10% in Year 4, 20% by Year 5)
• Added value is scaled-up on the basis of £1 added for every £50 spent (or, a ‘saving’ of 2%)

Build-up of value added from driving down unit costs for digital preservation*
• To be determined during Pathfinder stage

*Note: These two benefits are closely linked but, at present, insufficient baseline information exists from which to produce reasoned and robust financial benefits. However, the baselines will be established during the Pathfinder stage and the activity will also benefit from other initiatives known to be looking at these issues [29].
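The baseline components above lend themselves to a short worked calculation. The following is a minimal sketch only, not the study team's model (which is held in the Appendix B4 spreadsheets). It assumes a rounded research spend of £4 billion, and it reads the 10%, 15% and 20% scaling as shares of the 317.5 FTE QR figure; both are assumptions made for illustration, so the outputs are approximations and are not expected to match the spreadsheet figures exactly. All function and constant names are hypothetical.

```python
# Illustrative sketch of the baseline arithmetic; all names are hypothetical.

FTES_FOR_LOCAL_SERVICE = 317.5   # QR modelling estimate across the sector
FTES_PER_HEI = 2.5               # average per HEI for an equivalent local service

# Assumed reading: share of the QR figure saved in each scaled-up year.
QR_SAVING_SHARE = {3: 0.10, 4: 0.15, 5: 0.20}

RESEARCH_SPEND_GBP = 4_000_000_000   # "almost £4 billion", rounded here (assumption)
DATA_PRODUCING_SHARE = 0.50          # share of spend producing data UKRDS can add value to
DMP_TAKE_UP = {3: 0.02, 4: 0.10, 5: 0.20}   # growth in DMPs by year
VALUE_ADDED_RATE = 1 / 50            # £1 added for every £50 spent


def implied_hei_count() -> float:
    """Number of HEIs implied by the sector total and the per-HEI average."""
    return FTES_FOR_LOCAL_SERVICE / FTES_PER_HEI


def ftes_saved(year: int) -> float:
    """FTEs saved in a given year under the assumed scaling."""
    return FTES_FOR_LOCAL_SERVICE * QR_SAVING_SHARE[year]


def value_added_gbp_m(year: int) -> float:
    """Approximate value added to research programmes, in £m."""
    addressable = RESEARCH_SPEND_GBP * DATA_PRODUCING_SHARE
    return addressable * DMP_TAKE_UP[year] * VALUE_ADDED_RATE / 1_000_000


if __name__ == "__main__":
    print(f"Implied HEIs: {implied_hei_count():.0f}")
    for year in (3, 4, 5):
        print(f"Year {year}: {ftes_saved(year):.2f} FTEs saved, "
              f"about £{value_added_gbp_m(year):.2f}m value added")
```

Under these assumptions the Year 5 saving equals the 63.5 FTEs quoted above, which suggests the reading of the scaling percentages is at least self-consistent.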

Benefits summary

The following table summarises the financial benefits from UKRDS which can reasonably be constructed at this time. Detailed spreadsheets on which these calculations are based are attached as Appendix B4.

                                                  PATHFINDER          SCALED-UP UKRDS
                                                  Year 1    Year 2    Year 3    Year 4    Year 5    TOTAL
                                                  £m        £m        £m        £m        £m        £m
Build-up of savings from QR analysis              0         0         2.41      3.62      4.83      10.87
Build-up of value added from demand for
  capacity building which cannot be satisfied
  locally (to be assessed during Pathfinder)      0         0         tbc       tbc       tbc       tbc
Build-up of value added to research programmes    0         0         0.79      4.10      8.67      13.56
Build-up of value added by reducing unit costs
  (to be assessed during Pathfinder)              0         0         tbc       tbc       tbc       tbc
TOTALS                                            0         0         3.20      7.73      13.50     24.42

[28] Based on data point calculations set out in Appendix B3
[29] Reference SHED initiative

Further non-financial benefits of UKRDS

Widening Participation

As noted by the National Science Foundation in the US, investment in e-infrastructure (“Cyberinfrastructure”) and research data collections can promote wider objectives for social inclusion and widening participation in Higher Education and Research. Digital data collections give researchers access to data from a variety of sources and enable them to integrate data across fields. The relative ease of sharing digital data allows researchers, students, and lecturers from different disciplines, institutions, and geographical locations to contribute to the research. It has the potential to broaden participation in research by expanding the opportunity for all who have access to these data collections to make a contribution.

Sustainability

Sustainability specifically relating to global warming and environmental issues is a major concern to government, the funding councils and HEIs. Growing price inflation and uncertain lines of supply for energy are acting to reinforce environmental concerns. UKRDS has the potential to support green computing initiatives and their contribution to environmental sustainability through efficiencies in collaborative data storage.

Enhancing the contribution of HE to the economy and society

The growth of the UK’s knowledge-based economy depends significantly upon the continued support of the research community and in particular its activities to engage with industry and to apply its world-leading innovations to commercial use. A national e-infrastructure for research provides a vital foundation for the UK’s science base, supporting not only rapidly advancing technological developments, but also the increasing possibilities for knowledge transfer and the creation of wealth.

A knowledge infrastructure supporting the deployment of repositories of research information is also seen as a key requirement for UK science, technology, public policy and economy. Ready and efficient access to digital information of all kinds, such as experimental data sets, journals, theses, conference proceedings and patents, is the life blood of research and innovation. The HEFCE strategy states that HEFCE will: “...continue to encourage the effective sharing of research findings and outcomes, both to support research and teaching within HE and to inform the wider public. To achieve this we will work with partners to improve systems for researchers to share information and disseminate outputs as widely as possible, including through new technology.” (HEFCE 2007). The work of UKRDS to promote curation and access to research data outputs supports these aims.

Enhancing excellence in learning and teaching

The benefits from preserving and enhancing access to digital research data also support excellence in learning and teaching. Research data is an essential input to scholarly endeavour, whether that endeavour is focused on extending the frontiers of knowledge, or understanding the discoveries of the past. Research and learning are facilitated when the scholarly record is complete and easily accessible. Properly curated digital research data can be readily integrated into research and learning workflows now and in the future. The data is easily located when needed, and maintained in forms compatible with contemporary technology environments. This can lead to lower costs and higher productivity in research and learning, as less time is spent searching for needed data and converting it to usable forms. Digital preservation services are part of the general information infrastructure needed to support research and learning workflows at HEIs.


4.3

DEVELOPMENT PLANS

Overview

The feasibility report includes the vision of a UKRDS that will have the capacity to develop all of the capabilities for which it has been designed and deliver services to an extensive, multi-disciplinary research community. The project sponsors recognise that it will take several years to reach this point - always assuming that the value of UKRDS is recognised, delivered consistently and built upon. The development plan is founded on a Pathfinder stage which will build the initial capabilities of UKRDS, working with a small group of stakeholders. Thereafter, UKRDS growth will depend on the results obtained during the Pathfinder and the ongoing success of the service.

Key milestones

The diagram below illustrates the key milestones, particularly during the Pathfinder stage, showing the two separate strands of activity: Operational development, which essentially covers the establishment of the UKRDS central functions, and Capability development, which covers the design and development of the service capabilities UKRDS will deliver to the research community.

[Diagram: Key UKRDS milestones over a 5 year period. Pathfinder (Year 1 and Year 2, by quarter): Capacity planning and investment; Establish governance and management structure; Access management; Recruit staff; Training, tools, methodologies, handbooks; Engage with Pathfinder community; Develop communications strategy and deliver communications programme; Develop performance measurement criteria and monitor UKRDS development and take-up; Review and agree scaling-up plans; Assess additional service benefits; Develop policy, Relationship mgmt. and Advisory services, Service provider administration. Post-Pathfinder scaled-up UKRDS (Year 3 to Year 5, by quarter): Scaled-up UKRDS service developed on value model and financial basis agreed during Pathfinder. Legend: Operational development; Capability development.]

UKRDS Pathfinder The UKRDS Pathfinder will build on the work carried out during the feasibility stage by including the current four case study institutions, plus at least three more - potentially two from Scotland and one from Wales - as partners. This group of Pathfinder partners will be widened to include representation from the funding community (at least one Research Council and a major third-sector funder such as the Wellcome Trust) and service providers such as DCC, UKDA, RIN and JANET (UK).

Development will be structured around a number of work packages with clearly defined scope and outputs. Full specifications of the work packages will be an early deliverable of the Pathfinder but the study team was able to define these requirements to a sufficient level of detail to support the development estimates presented later. Work package high-level descriptions are contained in Appendix B2.

The main objectives of the Pathfinder stage are to:

1. Provide effective management, communication and co-ordination across all Pathfinder activity;
2. Produce an effective critical mass of services to ensure a workable and measurable UKRDS is created;
3. Ensure all customers of the services are prepared and trained as necessary;
4. Build expertise in Data Management Plans (DMPs) and all aspects of using a DMP Registry;
5. Ensure Pathfinder performance and usage is monitored to feed into timely response to UKRDS scaling-up decisions;
6. Develop appropriate relationships with Service Providers.

The Pathfinder stage is aimed at delivering the following key outcomes:

• Outcome 1: At an appropriate review point, the Pathfinder stage is ended and UKRDS enters a 'steady state' stage;
• Outcome 2: Confidence and trust in the coherence UKRDS can deliver has been established;
• Outcome 3: The most appropriate governance structure for UKRDS is established;
• Outcome 4: The financial and operational basis for scaling-up of UKRDS is confirmed;
• Outcome 5: The long-term security of UKRDS has been established.

The table below summarises the key outputs and target milestones at each step of the Pathfinder stage. Detailed resource plans are provided in Appendix B3.

UKRDS OPERATIONAL DEVELOPMENT WORKSTREAMS

WP1.1 Establish governance and management
    Key outputs/capabilities: UKRDS Pathfinder contract with lead institution
    Milestones: Year 1 Q1

WP1.2 Mobilisation (1) ~ Recruit and initiate staff, install infrastructure etc
    Key outputs/capabilities: Pathfinder organisation structure
    Milestones: Year 1 Q1/Q2

WP1.3 Mobilisation (2) ~ Engage with and establish commitment from Pathfinder community members
    Key outputs/capabilities: Agreements with each member of the Pathfinder community
    Milestones: Year 1 Q1/Q2

WP1.4 Communications ~ Pathfinder stakeholder management
    Key outputs/capabilities: Communications plan and stakeholder management plan
    Milestones: Year 1 Q2

WP1.5 Performance and Usage Measurement
    Key outputs/capabilities: KPIs and Measurement Plan
    Milestones: Year 1 Q2

WP1.6 Work packages detailed planning
    Key outputs/capabilities: Detailed work packages specifications; UKRDS implementation planning documentation (e.g. detailed plans and risk register)
    Milestones: Year 1 Q2/Q3

UKRDS CAPABILITIES DEVELOPMENT WORKSTREAMS

Work package groups: GROUP A (WP2.1); GROUP B (WP2.2, WP2.3); GROUP C (WP2.4, WP2.5); GROUP D (WP2.6, WP2.7); GROUP E (WP2.8)

Policy and strategy
    Key outputs/capabilities: Capabilities in policy direction and standards
    Milestones: Year 1 Q3

Relationship Management
    Key outputs/capabilities: Capabilities in communications and stakeholder management
    Milestones: Year 1 Q3

Advisory Services
    Key outputs/capabilities: Capabilities in specialist data management advice and first and second-line support
    Milestones: Year 1 Q2/Q3

Training and Development
    Key outputs/capabilities: Data management courseware, DMP templates and exemplars
    Milestones: Year 1 Q3/Year 2 Q2; Year 1 Q4/Year 2 Q2

Tools, Methodologies and Handbooks
    Key outputs/capabilities: Detailed procedures (incorporating existing best practice), ingestion and discovery tools and associated documentation
    Milestones: Year 1 Q3/Q4

Service Provider Administration
    Key outputs/capabilities: Capabilities in procurement and negotiations; expertise in the current supply-side market and third-party vendor market
    Milestones: Year 1 Q4/Year 2 Q2

Capacity planning and Investment
    Key outputs/capabilities: Analysis of current availability (maintained regularly); trend analysis; evaluation and recommendations on commissioning new capacity

Database Registry
    Key outputs/capabilities: DMPs registry; capabilities in registry development
    Milestones: Year 1 Q3 for initial version

Access Management
    Key outputs/capabilities: Capability in access management, specialism in secure access management and tools
    Milestones: Year 1 Q3/Year 2 Q1

Review and prepare to implement
    Key outputs/capabilities: Analysis of build-up of value of UKRDS; transition planning recommendations
    Milestones: Year 2 Q2 onwards

Post-Pathfinder development

The performance of UKRDS and its ability to attract support from the research community will be constantly monitored during the course of the Pathfinder stage. Over the final three quarters of the stage, a specific work package (WP 2.7) will consider how UKRDS is developing, in the context of moving to the next stage. If UKRDS is to be scaled-up, as the project sponsors believe at present it will, the subsequent detailed development work for UKRDS will be determined during this period. The transition to a longer-term governance model will also be finalised as part of this work package.

4.4

ESTIMATED COSTS

Overview

All costs and benefits have been calculated using the full economic costs, based on current economics, and are inclusive of value added tax.

During the first two years of the five-year planning period it is assumed that the operation of UKRDS will be managed from within a host institution and that it will not, at that stage, be a separate legal entity. It is assumed that the costs during this period are limited to those arising out of the specific definition of the Pathfinder approach and are designed to encompass the outcomes documented above.

During the subsequent years, indicative costs have been included based on the assumption that the service builds gradually, based on a gradual increase in the use of its services, as the value it delivers is accepted by an increasing proportion of researchers and funders. The evolving outcomes of the Pathfinder will significantly influence the exact scope and structure of future developments and therefore those costs and benefits carry considerably less certainty. Of particular note during this period will be the legal structure of the operating entity, as this will have a bearing not only on operating cost structure but also on taxation, which is covered later in the section on VAT.

In the calculation of costs and financial benefits it has been assumed that no financial benefits will accrue during the Pathfinder period. Whilst we would expect that some benefit would be derived and that this would be in line with the scale of benefits delivered later, it is felt that the costs of ramping up to that level mean that during the first two years there will be limited financial benefit.

UKRDS forecast costs

The forecast costs of UKRDS are shown in the table below and are based on current economics, inclusive of input VAT where appropriate.

The basis for the costs

Full costs details are provided in Appendix B3.

Base data: A number of key data points were calculated for use in the costs model as prerequisites for the calculations, the most important of which are the data points relating to staff costs. Based on HEI spine rates for appropriate grades, average salaries were converted to daily unit costs against which an FEC uplift was applied, in line with the costs assumptions given below.

To shape the costs, the project team divided the development of UKRDS into the series of work packages listed above and, as stated earlier, the work packages were defined to the level of detail necessary at this stage to estimate required resource levels. In the summary table below:

UKRDS Operational development: covers resource costs to establish the central services. These are almost entirely staffing costs, with a small budget set aside for workshops and marketing events. There is also a budget of £250,000 for each of the two years of the Pathfinder stage to provide financial support to the participating institutions.

UKRDS capabilities development and delivery: It is anticipated that large parts of the development costs will be procured from suitable third party providers or, potentially, participating institutions that have both the capability and experience of similar development tasks. The developments include a scalable approach to building up the registry, beginning with a prototype and, provided it can be justified, moving to a substantial database environment.

Capital Investment: Capital investment sufficient to provide the infrastructure for the project team and initial operating capability of UKRDS. It is assumed that staff will be allocated space within existing facilities and will be provided with appropriate furniture, fixtures and fittings, with desktop PC configuration.

Thereafter, it is anticipated that both capital investment and development costs will be expanded to more significant levels designed to achieve expanded capability and capacity to service the future demands of a growing UKRDS community. In particular these costs will include purchase/licensing of specialist data management software, the creation of extensive discovery capabilities together with the associated tools, documentation and training.

                                                      PATHFINDER          SCALED-UP UKRDS
                                                      Year 1    Year 2    Year 3    Year 4    Year 5    TOTAL
                                                      £m        £m        £m        £m        £m        £m
UKRDS Operational Development SUB TOTAL               1.18      1.11      0.62      0.66      0.66      4.24
UKRDS Capabilities Development and Delivery
  (near-term) SUB TOTAL                               0.74      2.25      2.99      3.30      4.71      13.98
UKRDS Capabilities Development and Delivery
  (future) SUB TOTAL                                  0.00      0.03      0.11      0.11      0.11      0.37
COSTS TOTAL                                           1.92      3.36      3.61      3.95      5.37      18.22

COSTS ANALYSIS (of the above)
UKRDS running costs                                   1.87      2.73      2.67      3.59      4.74      15.60
External development costs and ongoing maintenance    0.05      0.63      0.93      0.36      0.63      2.61

CAPITAL
Pathfinder development environment: 50,000
UKRDS initial scaled up environment: 250,000

The main costs inclusions and exclusions

The main costs assumptions in the model are:

1. Funding of institutions' activities and infrastructure is currently excluded, both at the Pathfinder stage and on-going;
2. A Full Economic Costs (FEC) basis has been used for staff costs and includes salaries, NI, benefits, office accommodation and facilities including individual ICT needs, staff recruitment and general training costs and input VAT; the uplift used for this model is 100% of salary, which is in line with HEI practice;
3. Following the building of the prototype register, an Oracle environment has been used as the benchmark and HEI discounts (currently 85%) factored into the calculations;
4. NPV calculations have not been included at this stage;
5. Inflation has been ignored at this stage;
6. Capital costs are shown separately and include computer hardware but exclude software licences and development;
7. Capital is generally shown as a depreciation line over three years, apart from the Pathfinder stage, which is two years.

Growth assumptions

Two growth scenarios post the Pathfinder stage have been considered on the following basis:

1. Conservative growth: Doubles by year 5, and
2. Ambitious growth: Quadruples by year 5.

Growth assumption (1) has been used in the current modelling.
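The staff cost basis described in the assumptions (salary plus a 100% FEC uplift, converted to a daily unit cost) can be sketched as follows. This is an illustration only: the number of working days per year and the example salary are not stated in the study and are assumptions chosen for the example, and the function names are hypothetical.

```python
# Illustrative sketch of the FEC staff cost basis; names are hypothetical.

FEC_UPLIFT = 1.00             # 100% of salary, as stated in the costs assumptions
WORKING_DAYS_PER_YEAR = 220   # assumption for illustration; not given in the study


def fec_annual_cost(salary: float, uplift: float = FEC_UPLIFT) -> float:
    """Annual full economic cost of a post: salary plus the FEC uplift."""
    return salary * (1 + uplift)


def fec_daily_cost(salary: float,
                   working_days: int = WORKING_DAYS_PER_YEAR,
                   uplift: float = FEC_UPLIFT) -> float:
    """Daily unit cost at FEC for a given annual salary."""
    return fec_annual_cost(salary, uplift) / working_days


def implied_salary(fec_cost: float, uplift: float = FEC_UPLIFT) -> float:
    """Salary implied by a quoted FEC figure under the same uplift."""
    return fec_cost / (1 + uplift)


if __name__ == "__main__":
    example_salary = 44_000  # hypothetical salary, for illustration only
    print(f"Annual FEC: £{fec_annual_cost(example_salary):,.0f}")
    print(f"Daily FEC:  £{fec_daily_cost(example_salary):,.2f}")
    # The £95,000 FEC per FTE quoted in the benefits baseline would imply,
    # under a 100% uplift, a salary of:
    print(f"Implied salary: £{implied_salary(95_000):,.0f}")
```

The daily figure is sensitive to the working-days assumption, so the Appendix B3 data points should be preferred wherever actual unit costs are needed.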

4.5

COSTS / BENEFITS ANALYSIS

Based on the information above, the forecast payback model is shown below. Whilst this currently shows a payback by Year 5, the project sponsors firmly believe this is a conservative assessment, as there are further financial benefits still to be assessed, which will be calculated during the Pathfinder stage.

                                                      Year 1    Year 2    Year 3    Year 4    Year 5    Overall
                                                      £m        £m        £m        £m        £m        £m
COSTS
UKRDS Operational Costs                               1.18      1.11      0.62      0.66      0.66      4.24
UKRDS Capabilities Development and Delivery           0.74      2.28      3.10      3.41      4.82      14.34
TOTAL COSTS                                           1.92      3.40      3.72      4.07      5.48      18.58

BENEFITS
Build-up of savings from QR analysis                  0.00      0.00      2.41      3.62      4.83      10.86
Build-up of value added from demand for data
  storage which cannot be satisfied locally*
Build-up of value added to research programmes        0.00      0.00      0.79      4.10      8.67      13.56
Build-up of value added by reducing unit costs*
TOTAL BENEFITS                                        0.00      0.00      3.20      7.73      13.50     24.42

Net beneficial gain (p.a.)                            -1.92     -3.40     -0.52     3.66      8.02
Net beneficial gain (cumulative)                      -1.92     -5.31     -5.83     -2.17     5.84

* To be assessed during Pathfinder stage

The on-going operational costs, once the initial development stage is completed, are modest in comparison with the continuing costs for development of capabilities and service delivery. In steady-state, they represent approximately 12% of overall costs. The costs for continuing to develop capabilities and deliver services to a growing community will only be justified if the commensurate benefits are being delivered. As said several times in this document, and worth repeating in this context, such a situation only arises if the community sees the value of UKRDS and demonstrates this by increasing the demand on its services.
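The payback position can be re-derived from the total cost and total benefit lines. The sketch below is illustrative only and uses the rounded £m figures published in the table, so intermediate cumulative values can differ from the table by £0.01m; the function and variable names are hypothetical.

```python
# Illustrative payback calculation from the published (rounded) £m figures.
from itertools import accumulate

YEARS = [1, 2, 3, 4, 5]
TOTAL_COSTS = [1.92, 3.40, 3.72, 4.07, 5.48]        # £m per year
TOTAL_BENEFITS = [0.00, 0.00, 3.20, 7.73, 13.50]    # £m per year
OPERATIONAL_COSTS = [1.18, 1.11, 0.62, 0.66, 0.66]  # £m per year


def net_gain(costs, benefits):
    """Annual net beneficial gain: benefits minus costs, to 2 decimal places."""
    return [round(b - c, 2) for c, b in zip(costs, benefits)]


def cumulative(values):
    """Running total of the annual figures, to 2 decimal places."""
    return [round(v, 2) for v in accumulate(values)]


def payback_year(years, cumulative_gain):
    """First year in which the cumulative net gain is non-negative, if any."""
    for year, total in zip(years, cumulative_gain):
        if total >= 0:
            return year
    return None


annual = net_gain(TOTAL_COSTS, TOTAL_BENEFITS)
running = cumulative(annual)

if __name__ == "__main__":
    for year, a, r in zip(YEARS, annual, running):
        print(f"Year {year}: net {a:+.2f}, cumulative {r:+.2f}")
    print("Payback year:", payback_year(YEARS, running))
    share = OPERATIONAL_COSTS[-1] / TOTAL_COSTS[-1]
    print(f"Operational share of Year 5 costs: {share:.0%}")
```

On these figures the cumulative position first turns positive in Year 5, and operational costs are about 12% of Year 5 total costs, in line with the statements above.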

4.6

TAXATION

UKRDS expenditure

The following analysis lays out the position regarding VAT. It is envisaged that this would apply after the initial Pathfinder period. It is fair to say that the majority of the costs incurred by UKRDS will be taxable and VAT will be charged at 17.5%. Where this VAT cannot be recovered it will be a cost to a new entity.

UKRDS income

The UKRDS entity will be making taxable and exempt supplies and it will therefore be partially exempt. It will also receive grant income which is outside the scope of VAT and it may also be giving access to certain categories of customers without a charge being made. This is considered non-business income for VAT purposes. Therefore a business/non-business calculation may need to be carried out. UKRDS will therefore incur VAT which it cannot recover unless some effective planning is put in place.

The VAT status of UKRDS customers

The VAT status of UKRDS customers will vary depending on the nature of the body. Some customers will be able to recover all the VAT, such as commercial bodies, local authorities and NHS Trusts, subject to certain exceptions [30]. However, customers such as educational institutes and research institutes can only recover VAT incurred on UKRDS services in part or not at all.

UKRDS' potential VAT structure

There are four potential options:

Option 1 - If UKRDS is set up as part of a university then its VAT recovery position will be that of the university. Depending on the type of university, its VAT recovery percentage is typically in the range of 3% to 30%; this depends on the partial exemption method of the university. Therefore, it will incur irrecoverable VAT.

Option 2 - If UKRDS is set up as part of an organisation such as UKDA then its activities will become part of UKDA's (the university host) VAT registration and its recovery percentage will be that of UKDA. The recovery percentage of UKDA is not public domain information.

Option 3 - If UKRDS was set up as a commercial organisation, the new entity would be partially exempt and, as it would also receive non-business income, it would need to carry out a business/non-business calculation. It will suffer irrecoverable VAT.

Option 4 - If UKRDS becomes part of an organisation such as JANET(UK) then its activities will be subsumed into JANET(UK)'s single VAT registration. As UKRDS will be both making taxable supplies and receiving non-business income, adding UKRDS' activities to JANET(UK) may change its overall VAT recovery percentage, which is currently 15% - 20%. Any change of activity will need agreeing with HMRC. It will incur irrecoverable VAT.

With all these options UKRDS will incur and suffer irrecoverable VAT. For options 1, 2 and 4 UKRDS will be included within an existing VAT registration and will therefore have an impact on the existing entity’s recovery position, and the existing entity's position will impact on UKRDS' recovery.

Potential Solution 1

There is provision in EU law for a group of entities which incur costs for non-business purposes to recharge those costs to its members, at exact reimbursement, as an exempt supply.

Potential Solution 2

If UKRDS was set up as an 'eligible body' [31] then it could be argued that the UKRDS services supplied to Educational Institutes were 'closely related' to education and should be exempt. This means that VAT will not be charged to Educational Institutes, thus keeping their irrecoverable VAT costs down. If this 'eligible body' is VAT grouped with a subsidiary which is not an eligible body, the subsidiary could be used to supply UKRDS services to its fully taxable customers and charge them VAT. The VAT group would be partially exempt but VAT would not need to be charged to its Educational customers.

Recommended way forward

The project team recommends that UKRDS undertakes further detailed financial modelling to understand the trade-off between providing taxable services and the VAT cost to its customers. There may be a need to obtain information from the encompassing entity. This work should take place during the Pathfinder stage.
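As a purely illustrative aid to the kind of modelling recommended here, the irrecoverable VAT cost can be sketched as the VAT charged on taxable spend multiplied by the share that cannot be reclaimed. The function name, the spend figure of 1,000,000 and the use of the two ends of the 3% to 30% recovery range quoted under Option 1 are assumptions made for this example; they are not figures from the Grant Thornton analysis.

```python
from decimal import Decimal

# Standard VAT rate quoted in this section (17.5%).
VAT_RATE = Decimal("0.175")


def irrecoverable_vat(net_cost, recovery_rate, vat_rate=VAT_RATE):
    """Return the VAT on net_cost that the entity cannot reclaim.

    net_cost      -- taxable spend before VAT (hypothetical input)
    recovery_rate -- fraction of input VAT the host can recover (0 to 1)
    """
    net_cost = Decimal(str(net_cost))
    recovery_rate = Decimal(str(recovery_rate))
    if not Decimal("0") <= recovery_rate <= Decimal("1"):
        raise ValueError("recovery_rate must be between 0 and 1")
    vat_charged = net_cost * vat_rate
    return vat_charged * (Decimal("1") - recovery_rate)


# Hypothetical taxable spend of 1,000,000, evaluated at both ends of the
# 3%-30% recovery range quoted for universities under Option 1.
for rate in ("0.03", "0.30"):
    print(rate, irrecoverable_vat(1_000_000, rate))
```

A fuller model would also need the business/non-business apportionment and the host entity's actual partial exemption method, neither of which is specified in this report.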

[30] Reference Grant Thornton’s report

[31] As defined in the Grant Thornton report

4.7

FUNDING PROPOSITIONS

UKRDS would need to work within the existing dual support funding model for sustaining research. As noted in the feasibility study, UK research has a wide funding base including substantial contributions from charities who do not contribute overheads or indirect costs for sustaining research infrastructure such as UKRDS. Similarly the research councils have differing policies on direct funding of research data infrastructure, although all contribute to this to some degree indirectly via institutional overheads included in research grants. The UK funding councils are the major funders of institutional infrastructure for research via Quality Related funding streams to HEIs, and also fund the shared services programme for universities and many existing shared services via the JISC.

In the first instance we believe the Pathfinder model for UKRDS would need central core funding from the funding councils, and the principle of their central funding would need to be maintained into subsequent phases, with additional income streams from elsewhere contributing over time. Other stakeholders participating in the Pathfinder and subsequent phases might contribute either funding or resources in kind to leverage this funding.

Potential analogues for longer-term funding of UKRDS include other established shared services within the sector such as JISC Collections and JANET UK. They are particularly relevant to UKRDS given their focus on electronic services and innovation. In particular we would point to models within JANET UK, which has mixed funding streams: central funding from the funding councils (via the JISC), subscription income direct from HEIs, and ad hoc funding and subscriptions from Research Councils and other bodies for services. These have been developed over a long time period as the network and services reached critical mass. In a similar vein, JISC Collections has full central funding for new resources and services to encourage take-up and innovation until a critical mass is achieved, at which point subscription fees are introduced to allow investment in new resources.

With the example of these organisations it is clear that important initiatives can be sustained through a mixed funding model and that UKRDS beyond the Pathfinder stage can usefully employ these strategies to ensure appropriate levels of sustainable funding. For the Business Plan, as presented, we have assumed the following sources of revenue:

• HEFCE to provide grant funding for the Pathfinder stage;

• HEFCE to provide grant funding for the subsequent development programme during years 3 to 4;

• After the Pathfinder stage, the Research Councils provide project funding for the project teams to develop data management plans for their respective projects;

• Registration of DMPs with UKRDS may eventually be charged for, but such income is excluded from the current model;

• Charges could eventually be levied for training courses for research staff/librarians taking up responsibility for Data Management Plans, but such income is also excluded from the current model.

4.8

ASSESSMENT OF SUSTAINABILITY

During the course of this project a number of workshops were carried out with the case study institutions and without exception there was considerable support for the concept. There were however concerns expressed about differentiating between project costs, data curation costs and infrastructure costs. Currently, under the dual funding model, institutions provide the staff and facilities with assumptions about infrastructure costs inherent within them. This is especially the case with the infrastructure required to provide digital data storage. Where research councils or other funding bodies support specific projects, they do not currently do so on a full economic cost basis, and it is true to say that medium to long term data curation and management are underprovided for.

The issue, therefore, is that we are not simply looking to replace a fragmented way of achieving an end result with a structured approach that is more coherent. There is actually an argument for saying that UKRDS will expand the extent to which the services are provided at the same time as enhancing the structure, consistency, coherence and professionalism embodied in the approach.

During the course of the Pathfinder, it is anticipated that one of the aspects of work carried out by the programme team will be to explore, evaluate and propose detailed funding models. During this time, UKRDS would need to work within the existing dual support funding model for sustaining research. As noted in the feasibility study, UK research has a wide funding base including substantial contributions from charities who do not contribute overheads or indirect costs for sustaining research infrastructure such as UKRDS. Similarly the research councils have differing policies on direct funding of research data infrastructure, although all contribute to this to some degree indirectly via institutional overheads included in research grants.

4.9

SUCCESS CRITERIA, CONSTRAINTS AND DEPENDENCIES

These aspects have been previously covered in the feasibility section.

4.10

KEY RISKS AND MITIGATING ACTIONS

A full risk register will be defined to recognised standards during the detailed planning of the Pathfinder work packages. At this stage the following table summarises the main risks as judged by the project team.

Risk: There are persistent negative perceptions of UKRDS among funders leading to lack of support
Sample risk factors: Lack of confidence in governance, management, or program delivery
Mitigating actions: Maintain regular communications to allow input to decision making; provide a central point where progress towards UKRDS is tracked and promote positive feedback ‘good news’ from the research community

Risk: UKRDS Pathfinder is not governed effectively
Sample risk factors: Lack of effective mechanisms for planning, leadership and collaboration between work packages
Mitigating actions: Management and planning processes are established in line with recognised industry standards; appointments to key positions, such as chair of the steering committee, executive director and steering committee members, are made with a view to gaining the necessary experience and commitment for a major program such as UKRDS

Risk: UKRDS stakeholders are not effectively engaged (Pathfinder initially)
Sample risk factors: Stakeholders are not prepared to undertake the changes within their own organisations that are necessary for the UKRDS Pathfinder to make progress; they do not see their interests in data management and those of UKRDS as being properly aligned
Mitigating actions: Build on the effectiveness of the extensive consultation already begun with case study universities and work with the Pathfinder institutions to address their concerns; ensure current best practice and complementary initiatives are fully appreciated and work constructively to improve such initiatives (e.g. JISC support activity including the work of the DCC, as well as work at the research council level, particularly ESRC)

Risk: UKRDS service providers do not contribute effectively
Sample risk factors: Service providers see themselves as disconnected from UKRDS decision making or strategic planning
Mitigating actions: Put in place formal procurement processes to ensure that the requirements are understood and that potential providers are in a position to meet the set criteria; ongoing contract management to ensure the continuation of required services to contracted service levels

Risk: Re-users of research data do not use UKRDS supplied mechanisms to discover and access it (i.e. the DMP register approach)
Sample risk factors: Access control mechanisms are too restrictive, complex or simply don’t exist whilst other sources of data for re-use are more attractive and easier to use
Mitigating actions: Ensure use of best practice and include adequate time at the design stages of the appropriate work packages to ensure attractive tools are made available; involve a small number of key vendors early in the Pathfinder stage to consider available options

Risk: High quality UKRDS staff are hard to recruit
Sample risk factors: The constant problem of limited availability of skilled staff in key data management roles
Mitigating actions: Commence recruitment as early as possible and look very closely at the market rates for key individuals; look to attract interims if necessary on short contracts which will also help control unit costs

Risk: Funding for UKRDS is inadequate to achieve its objectives
Sample risk factors: Government budget constraints and/or competition from other programs results in reduced funding
Mitigating actions: Ensure UKRDS delivers early wins during the first year of the Pathfinder stage to generate momentum and be prepared to re-focus UKRDS activity to improve likelihood of successful outcomes

5.

OUTLINE GOVERNANCE AND MANAGEMENT PROPOSALS

The project team believes that UKRDS needs to be established as a service delivery organisation first and foremost and must be seen to be an integral part of the HEI research infrastructure. There are a number of ways that this may be achieved and we propose that these be addressed during the Pathfinder phase. It may be that an organisation analogous to JISC Collections, JANET(UK) or the JISC Services Company would be suitable. The Pathfinder may be hosted as seems convenient while the long term structure is considered in detail by all of the key stakeholders.

6.

CONCLUSIONS, RECOMMENDATIONS AND NEXT STEPS

The proposition that there are significant gaps to be filled in relation to meeting the UK’s research data management challenge has been borne out by the detailed research conducted during the feasibility study. A UK-wide approach is feasible and a UKRDS could be a cost effective response to growing demand for research data management generated by the increasing volumes of research data.

However, UKRDS is not solely about data storage: it is essentially about the management of the whole data lifecycle and the co-operation, training, and infrastructure this requires. It is also about leveraging more research value and a higher global research reputation for the UK from the investment made by funding bodies. This underlines the proposition made by other UK studies, now also being articulated by overseas competitors, that research data can be a significant research resource in its own right and can potentially enhance the UK’s research output and reputation.

A UKRDS would embrace rather than replace existing facilities: many first-class building blocks are already in place. It is vital therefore not to reinvent any wheels or interfere with existing good data management practice. Thus the Co-operative model approach has much existing excellence and potential best practice on which to build. This should provide strong assurances to HEFCE that investment in UKRDS is soundly based.

Recommendation

Overall, the cost-benefit analysis, whilst challenging to produce, has shown how clear financial benefits can be delivered and hence the investment in UKRDS justified. Equally importantly, this investment is scalable and can be managed so that it only continues as the value of UKRDS is first established via a Pathfinder approach and subsequently grows in acceptance among the UK’s research communities.

Next steps

The project team believes that it is imperative that momentum be maintained. Therefore we propose that:

1. Detailed planning of the Pathfinder implementation is undertaken in co-operation with case study institutions and DCC, RIN and UKDA and other stakeholders as may be appropriate;

2. A suitable host institution is identified to establish a Pathfinder Organisation;

3. HEFCE give urgent consideration to the funding of the Pathfinder phase.

7.

REFERENCES

Arts and Humanities Research Council (AHRC), 2007, Information for applicants to AHRC June Deadline, cited in AHDS News. Retrieved 20/5/2008 from http://ahds.ac.uk/exec/news/ahrc-news-may07.htm

ANDS Technical Working Group, 2007, Towards the Australian Data Commons: A proposal for an Australian National Data Service (Commonwealth of Australia, Canberra). Retrieved 20/5/2008 from http://www.pfc.org.au/twiki/pub/Main/Data/TowardstheAustralianDataCommons.pdf

Beagrie, N., Chruszcz, J. and Lavoie, B., 2008, Keeping Research Data Safe: a cost model and guidance for UK Universities (Joint Information Systems Committee 2008). Retrieved 20/5/2008 from http://www.jisc.ac.uk/publications/publications/keepingresearchdatasafe.aspx

Brown, S., and Swan, A., 2008, Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practice & Future Needs. Retrieved 01/10/08 from http://www.jisc.ac.uk/media/documents/programmes/digitalrepositories/dataskillscareersfinalreport.pdf

Commonwealth of Australia, 2004, Backing Australia’s Ability – Building our Future through Science and Innovation (Commonwealth of Australia, Canberra). Retrieved 20/5/2008 from http://backingaus.innovation.gov.au/info_booklet.htm

European Commission, 2007, Communication from the Commission to the European Parliament, the Council and the European Economic and Social Committee on Scientific Information in the Digital Age: Access, Dissemination and Preservation (Commission of the European Communities). Retrieved 20/5/2008 from http://ec.europa.eu/research/science-society/document_library/pdf_06/communication022007_en.pdf

Grant Thornton, 2008a, UKRDS Shared Services VAT Report, 8th October 2008.

Grant Thornton, 2008b, UK Research Data Services, case studies of comparators, 15th October 2008.

Her Majesty’s Stationery Office (HMSO), 2004, Science and innovation investment framework 2004-2014 (Her Majesty’s Stationery Office, London).

Hey, T., and Trefethen, A., 2003, “The Data Deluge: an e-science Perspective” in: Berman, Fran (Ed.) et al, 2003, Grid Computing: Making the Global Infrastructure a Reality (John Wiley and Sons).

Higher Education Funding Council for England, 2007, Shared services: invitation to submit expressions of interest, Circular letter 09/2007 (HEFCE, Bristol). Retrieved from http://www.hefce.ac.uk/pubs/circlets/2007/cl09_07/

International Council for Science, 2004, ICSU Report of the CSPR Assessment Panel on Scientific Data and Information (International Council for Science).

Lievesley, D. and Jones, S., 1998, An Investigation into the Digital Preservation Needs of Universities and Research Funders: the Future of Unpublished Research Materials, British Library Research and Innovation Centre Report no.109 (British Library 1998). Retrieved 29 June 2008 from http://www.ukoln.ac.uk/services/papers/bl/blri109/

Lord, P., and Macdonald, A., 2003, e-Science curation report (Joint Information Systems Committee).

Lyon, E., 2007, Dealing with Data: Roles, Rights, Responsibilities and Relationships (UKOLN University of Bath). Retrieved 3/1/08 from http://www.jisc.ac.uk/media/documents/programmes/digitalrepositories/dealing_with_data_report-final.pdf

Martinez-Uribe, L., 2008, Scoping Digital Repository Services for Research Data Management: Project Plan, v2.2 date 27/2/08 (University of Oxford). Retrieved 20/4/08 from http://www.ict.ox.ac.uk/odit/projects/digitalrepository/docs/DigRepoProjectPlan.pdf

National Science Board (NSB), 2005, Long-lived Digital Data Collections: Enabling Research and Education in the 21st century, September 2005 (National Science Foundation). Retrieved 10/12/07 from http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf

National Science Foundation, 2007, Cyberinfrastructure Vision for 21st Century Discovery (National Science Foundation, Washington DC). Retrieved 10/12/07 from http://www.nsf.gov/pubs/2007/nsf0728/index.jsp

OECD, 2004, Declaration on Access to Research Data from Public Funding (Organisation for Economic Co-operation and Development, Paris). Retrieved 20/12/07 from http://www.codataweb.org/UNESCOmtg/dryden-declaration.pdf

OECD, 2007, Principles and Guidelines for Access to Research Data from Public Funding (Organisation for Economic Co-operation and Development, Paris). Retrieved 20/12/07 from http://www.oecd.org/dataoecd/9/61/38500813.pdf

OSI e-infrastructure Working Group, 2007, Developing the UK’s e-infrastructure for Science and Innovation (National e-Science Centre Edinburgh). Retrieved 20/5/2008 from http://www.nesc.ac.uk/documents/OSI/report.pdf

Research Councils UK, 2005, Research Council UK Position Statement on Access to Research Outputs. Retrieved 20/5/2008 from http://www.rcuk.ac.uk/cmsweb/downloads/rcuk/documents/2005statement.pdf

Research Councils UK, 2006, Research Council UK’s Updated Position Statement on Access to Research Outputs. Retrieved 20/5/2008 from http://www.rcuk.ac.uk/cmsweb/downloads/rcuk/documents/2006statement.pdf

Research Information Network, 2007, Research Funders’ Policies for the management of information outputs. Retrieved 23/4/08 from http://www.rin.ac.uk/files/Funders'%20Policy%20&%20Practice%20%20Final%20Report.pdf

Research Information Network, 2008, Stewardship of digital research data: a framework of principles and guidelines. Retrieved 23/4/08 from http://www.rin.ac.uk/files/Research%20Data%20Principles%20and%20Guidelines%20full%20version%20-%20final.pdf

Serco, 2008a, UKRDS Interim Report, Version v0.1a.030708, 7th July 2008.

Serco, 2008b, UKRDS Feasibility Study, November 2008.

Tessella, 2006, Mind the Gap: Assessing Digital Preservation Needs in the UK (Digital Preservation Coalition, York).

8.

GLOSSARY

Term: Definition

ADS: Archaeology Data Service
ADS: Atlas Data Store
AHDS: Arts & Humanities Data Service
AHRB: Arts & Humanities Research Board
ANDS: Australian National Data Service
APAC: Australian Partnership for Advanced Computing
APSR: Australian Partnership for Sustainable Repositories
BERR: Department for Business Enterprise and Regulatory Reform
CERN: European Organization for Nuclear Research
Cyberinfrastructure: American equivalent of eScience
Datanets: An NSF program to create several centers in the US where the collection and preservation of scientific data is closely integrated with the research that will make use of it.
DCC: Digital Curation Centre
DCMS: Department for Culture, Media and Sport
DCSF: Department for Children, Schools and Families
DfES: Department for Education and Skills (defunct - now DCSF)
DIUS: Department for Innovation, Universities and Skills
DTI: Department of Trade and Industry (defunct - now BERR)
EBI: European BioInformatics Institute
ECMWF: European Centre for Medium-Range Weather Forecasts
EDINA: A JISC national academic data centre based at the University of Edinburgh
EGEE: Enabling Grids for E-sciencE
EMBL: European Molecular Biology Laboratory
ESA: European Space Agency
eScience: “The systematic development of research methods that exploit advanced computational thinking” (Prof M. Atkinson)
ESDS: Economic and Social Data Service
ESRC: Economic and Social Research Council
FEC: Full Economic Costing
FTE: Full-time Equivalent, referring to the costs of employing a member of staff on a full time basis
GEANT: Pan-European research and education network
GridPP: Computing/Storage grid developed by the UK for use by the LHC at CERN
HEFCE: Higher Education Funding Council for England
HEI: Higher Education Institution
HPC: High Performance Computing
HTC: High Throughput Computing
INSPIRE: Infrastructure for Spatial Information in Europe
IPR: Intellectual Property Rights
JISC: Joint Information Systems Committee
KPI: Key Performance Indicator
LHC: Large Hadron Collider
MARS: Meteorological Archival and Retrieval System
MIMAS: A JISC and ESRC-supported national data centre, based at The University of Manchester
MIT: Massachusetts Institute of Technology
NCeSS: National Centre for e-Social Science
NDAD: National Digital Archive of Datasets
NERC: Natural Environment Research Council
NESC: National e-Science Centre
NGS: National Grid Service
NSB: National Science Board
NSF: National Science Foundation
OECD: Organisation for Economic Co-operation and Development
OSCHR: Office for Strategic Coordination of Health Research
OSI: Office of Science and Innovation (was OST)
OST: Office of Science and Technology (defunct - now the OSI)
Petabyte: 1000 terabytes (or 10 to the power 15 bytes)
QR: Quality Related
RAL: Rutherford Appleton Laboratory
RCUK: Research Councils UK
REF: Research Excellence Framework
Research Data: Research data is the evidence base on which academic researchers build their analytic or other work. This evidence base is continually refined as it is gathered, collated, structured, and described, sometimes (but not always) according to declared and accepted protocols. It includes the widest possible range of data volumes from relatively small data sets up to vast data volumes generated by research in fields such as particle physics. It also includes great variety and heterogeneity of data and its accompanying metadata and documentation to make it usable and understood, or the digital representations and records for physical research data. Examples could include: complex data used in climate modelling, aerodynamics, molecular modelling, bioinformatics; video and image archives used in archaeology, art history, anthropology and performance works; quantitative and qualitative data used in the social sciences; or electronic data and indices for fossils or skin tissue samples.
RIN: Research Information Network
RLUK: Research Libraries UK
RUGIT: The Russell Group IT Directors
SDI: Spatial Data Infrastructure
STFC: Science and Technology Facilities Council
Terabyte: 1000 gigabytes (or 10 to the power 12 bytes)
TNA: The National Archives
UKCRC: UK Clinical Research Collaboration
UKDA: UK Data Archive
UKDF: UK Data Forum
UKRDS: UK Research Data Service


APPENDICES

APPENDICES TO THE FEASIBILITY STUDY

A1: STAKEHOLDERS LIST AND EXAMPLES

The table below sets out in more detail the expected constituents of each main community of stakeholders, presented in the proposals for UKRDS structuring.

Adopter

Community

Main Constituents

Examples only (to be confirmed)

Early Adopters

HEIs and Research institutes

Self evident

Pathfinder phase universities followed by all universities as UKRDS is scaled up

Funders

Research Councils

Pathfinder phase participation to be determined

Charities

Wellcome Trust

Service providers

HEFCE / JISC supported and research council supported providers

DCC; ESDS; Atlas

Vendors

Leaders in methods; tools and storage with interest in “Open” agenda, preservation, and education space

SUN Micro Systems; Google; Microsoft etc Providers of storage ‘cloud’ services

International links

Leaders of national initiatives

ANDS; etc

Commercial users & generators of data

NGO’s; Eurostat; World Bank; IMF

Satellite and Radar data; Economic Time Series; etc.

Public sector users & generators of data

Local and national government departments and services

Ordnance Survey, Meteorological, Census and Health data

Other educational institutions

Self evident

UK based HEIs and others

Journals and data publishers

Society; STMs and other major Publishers

Thomson Reuters; Elsevier; etc.

Venture capitalists

Financial investors

Later Adopters


A2: SUMMARY OF RESEARCH DATA POLICIES OF RESEARCH FUNDERS

Note that update material is marked in blue and is either additional or modified from the original RIN report.

Organisation: AHRC
Data management requirements: No specific requirements; to be addressed by researchers' own organisations.
Data deposit/sharing requirements: Make available in an accessible depository for at least three years after the end of their grant.
Data curation and preservation: Archaeologists should continue to use the ADS in York for the time being both for advice and deposit. For others, data to be managed securely by institution of origin.

Organisation: BBSRC
Data management requirements: Prior to setting up a project, good scientific practice is to plan for the research data element. Throughout their work, BBSRC requires researchers to keep clear and accurate records of the scientific procedures followed and of the results obtained, including interim results. Data generated in the course of research must be kept securely in paper or electronic form. BBSRC expects data to be securely held for a period of ten years after the completion of a research project, and institutions receiving funding from the BBSRC to have guidelines setting out responsibilities and procedures for keeping data.
Data deposit/sharing requirements: Data Sharing Policy [32] published April 07. Applications for grants should include concise plans for data management and sharing. Expects data-sharing in line with best practice and within three years of the generation of the dataset where best practice does not exist; recognition that BBSRC covers communities with different needs. BBSRC is participating in the UK PubMed Central initiative, which will provide a place for supplemental material that has been submitted to the accepting journal in support of the manuscript to be accepted.
Data curation and preservation: BBSRC expects data to be kept for ten years after the completion of a project and institutions receiving Council funding must have guidelines setting out responsibilities and procedures for keeping data.

Organisation: EPSRC
Data management requirements: Encourages researchers to manage primary data as the basis for publications, securely and for an appropriate time, in a durable form under the control of the institution of their origin.
Data deposit/sharing requirements: No.
Data curation and preservation: Data to be managed securely by institution of origin.

Organisation: ESRC
Data management requirements: Must carry out data review to ensure funding is not requested for data that already exists. Also recommended: contact the ESDS Acquisitions team at the UKDA prior to making their application.
Data deposit/sharing requirements: Datasets Policy in Annex C of the Funding Guide [33]. Data must be offered to UKDA/ESDS for deposit within three months of end of project.
Data curation and preservation: UKDA/ESDS.

[32] www.bbsrc.ac.uk/publications/policy/data_sharing_policy.html

[33] www.esrcsocietytoday.ac.uk/ESRCInfoCentre/Images/ESRC_Research_Funding_Guide_May_2008_tcm6-9734.pdf

The UK Research Data Service Feasibility Study Organisation MRC

NERC

PPARC and CCLRC were subsumed into a new council in 2007 - the Science and Technology Facilities Council – STFC Wellcome Trust

Leverhulme

Data management requirements All proposals must include (costed) plans for preparing and documenting research data for preservation for sharing in line with MRC data sharing policy, access principles, and guidance. As part of the end of grant reporting process, MRC funded researchers are expected to report on data management and sharing activities relating to these plans. Data must be properly curated throughout its life-cycle and released with the appropriate high-quality metadata. This is the responsibility of the data custodians, who are usually those individuals or institutes that received MRC funding to create or collect the data. Programmes must have data management plans.

New projects must have plans that formalise ownership and agree distribution mechanisms for data before they are funded.

Data deposit/sharing requirements MRC is participating in the UK PubMed Central initiative, which will provide a place for supplemental material that has been submitted to the accepting journal in support of the manuscript to be accepted. As yet no formal requirements for deposit, but grant-holders must give explicit reasons for not sharing data. Recognition that communities differ and flexibility in the policy is required in terms of e.g. the length of period of exclusive use. Sharing should always take account of enhancing the long-term value of the data.

Data curation and preservation Encourages data preservation but does not provide centres. MRC Policy on Data 34 Sharing and Preservation MRC is also currently tendering for a MRC Data Support Service to provide advice and advocacy to its grant holders.

Must be offered to data centres, after a reasonable time has elapsed for exclusive use by the data creator.

Designated data centres.

Data should be made available to all, where possible and economic (with an exclusive period where appropriate).

35

The NERC Data Policy is dated 2002 and is currently under review. Data centres have been supported as projects. Recommended by strategy review that projects are required to address long term data curation, while recognising the need for flexibility. Funding for data curation to be reviewed every two years and priorities identified. Atlas Petabyte Store

Data management requirements: Developing a policy to require researchers to have a data management plan, of which a key element should be how data will be shared. The review of the quality of the data management plan and its costs should be an integral part of the funding decision. They are just starting to fund data management plans specifically.
Data deposit/sharing requirements: Recommends deposit with AHDS for the history of medicine programme [note that this will now need review]; for biomedical research, the open access policy does not require the deposit of data, although researchers can deposit supplemental material alongside their papers in PubMed Central/UKPMC.
Data curation and preservation: Advocates use of PubMed Central/UKPMC to integrate data with the research paper; other public data repositories can also be used.

Organisation: Trust
Data management requirements: No specific obligations on researchers in relation to data management.
Data deposit/sharing requirements: Recommends deposit of digital resources with AHDS where the grant is in a relevant discipline [note that this will now need review].
Data curation and preservation: No.

[34] http://www.mrc.ac.uk/PolicyGuidance/EthicsAndGovernance/DataSharing/PolicyonDataSharingandPreservation/index.htm
[35] http://www.nerc.ac.uk/research/sites/data/policy.asp

Organisation: Universities
Data management requirements: n/a
Data deposit/sharing requirements: n/a
Data curation and preservation: Some universities aspire to curating data, but none interviewed in the RIN study did so in their institutional repositories, leaving this task to individual researchers and departments. Central advice and storage for data is, however, available in Cambridge and King's College via DSpace@Cambridge and the Centre for eResearch respectively. A similar facility is under consideration at Oxford. Some universities also have curation activities located at them, funded by Research Councils.

Organisation: Commercial Organisations
Data management requirements: Data may be shared between university researchers and companies for the purposes of a project.
Data deposit/sharing requirements: n/a
Data curation and preservation: Internal data curation is important to some companies, e.g. GSK, Vodafone.

Organisation: Government Departments
Data management requirements: n/a
Data deposit/sharing requirements: n/a
Data curation and preservation: The responsibility of the funding recipient and their institution. DFID expects the data to be retained and accessible. DH also states that data relevant to the findings of research should be accessible. ESRC has funded a National Data Strategy for the Social Sciences and is working closely with relevant government departments. The National Archives is also developing a shared service (the Digital Continuity Project) for digital preservation to be available to government departments.

A3: ROLES AND RESPONSIBILITIES FOR RESEARCH DATA (Lyons 2007 and ANDS 2007)

Role: Scientist: creation and use of data
Rights: Of first use. To be acknowledged. To expect IPR to be honoured. To receive data training and advice.
Responsibilities: Manage data for life of project. Meet standards for good practice. Comply with funder / institutional data policies and respect IPR of others. Work up data for use by others.
Relationships: With institution as employee. With subject community. With data centre. With funder of work.

Role: Institution: curation of and access to data
Rights: To be offered a copy of data.
Responsibilities: Set internal data management policy. Manage data in the short term. Meet standards for good practice. Provide training and advice to support scientists. Promote the repository service.
Relationships: With scientist as employer. With data centre through expert staff.

Role: Data centre: curation of and access to data
Rights: To be offered a copy of data. To select data of long-term value.
Responsibilities: Manage data for the long-term. Meet standards for good practice. Provide training for deposit. Promote the repository service. Protect rights of data contributors. Provide tools for re-use of data.
Relationships: With scientist as “client”. With user communities. With institution through expert staff. With funder of service.

Role: User: use of 3rd party data
Rights: To re-use data (non-exclusive licence). To access quality metadata to inform usability.
Responsibilities: Abide by licence conditions. Acknowledge data creators / curators. Manage derived data effectively.
Relationships: With data centre as supplier. With institution as supplier.

Role: Funder: set/react to public policy drivers
Rights: To implement data policies. To require those they fund to meet policy obligations.
Responsibilities: Consider wider public-policy perspective & stakeholder needs. Participate in strategy co-ordination. Develop policies with stakeholders. Participate in policy co-ordination, joint planning & fund service delivery. Monitor and enforce data policies. Resource post-project long-term data management. Act as advocate for data curation & fund expert advisory service(s). Support workforce capacity development of data curators.
Relationships: With scientist as funder. With institution. With data centre as funder. With other funders. With other stakeholders as policymaker and funder of services.

Role: Publisher: maintain integrity of the scientific record
Rights: To expect data are available to support publication. To request pre-publication data deposit in long-term repository.
Responsibilities: Engage stakeholders in development of publication standards. Link to data to support publication standards. Monitor & enforce public standards.
Relationships: With scientist as creator, author and reader. With data centres and institutions as suppliers.

Role: Aggregator: enable federated discovery and access
Rights: To be offered metadata describing data held in data centres. To enable the development of more specialised lenses.
Responsibilities: Engage stakeholders in building a federated metadata repository. Maintain a registry of contributors. Promote the discovery service. Enable harvesting of or access to subsets.
Relationships: With data centre as contributor. With user as primary target audience. With scientist as developer of specialised lens.

APPENDICES TO THE BUSINESS CASE AND PLAN

B1: UKRDS ‘LINE OF SIGHT’ ROUTE-MAP AND HIGH-LEVEL PLAN

E:\DG Files\DFES\ Serco jobs\Job 10010229 - UKRDS\Live Job\Report\Final HEFCE Report\v0.2\UKRDS Planning document issued.xls

B2: WORK PACKAGE DESCRIPTIONS

WP1.1

Establishing Governance and Management
• Draw up specification and requirements for letting the contract for UKRDS to a lead institution
• Assign UKRDS to the lead institution and agree terms, objectives and accountability structures, including steering
• Appoint Director/Senior manager to head UKRDS Pathfinder

WP1.2

Mobilisation (1)
• Recruit and establish staff in the organisation
• Procure the equipment required to support staff and initial work plans
• Establish initial outward-facing website (transition from current UKRDS website)
Other Overheads:
• Development environment (exact technical environment will be confirmed during Pathfinder)

WP1.3

Mobilisation (2): Commitment from Pathfinders
• Agreements with Pathfinder case study sites on level of involvement, commitment and deliverables, e.g. identification of research projects that will be included in the Pathfinder, points of contact, plans for cascading training etc.
• Agreeing scope and approach to their input with each of the Pathfinder institutions and briefing their staff
• Update internal UKRDS website for sharing information
• Establish UKRDS single point of contact enquiry desk / advisory service for Pathfinders. This will form the basis of the “Advisory Services” intended to grow over time
Other Overheads:
• Local Case Study Site’s management fee to cover local costs at Pathfinder institutions (£35k per institution)
• Staff costs associated with specification covered in WP 2.1

WP1.4

Communication: Pathfinder and Stakeholder management
• Develop Pathfinder and stakeholder communication plan
• Plan implementation, including establishing devices for news etc. as required, e.g. email lists, news bulletins, RSS feeds from the website
• Develop a “measurement plan” including KPIs
Other Overheads:
• Printing costs for periodic brochures and publications
• Workshops for all participants. Assumed 30 delegates at £100 per head at each of four functions (£12k)
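The workshop provision for WP1.4 can be checked with a short calculation. The following is only a sketch of the arithmetic stated above (30 delegates, £100 per head, four functions); the function name `workshop_cost` and its parameter names are illustrative and do not come from the cost model itself.

```python
def workshop_cost(delegates: int, cost_per_head_gbp: int, functions: int) -> int:
    """Total workshop cost in pounds: delegates x cost per head x number of functions."""
    return delegates * cost_per_head_gbp * functions


# Figures taken from the WP1.4 overhead assumption.
total = workshop_cost(delegates=30, cost_per_head_gbp=100, functions=4)
print(f"£{total:,}")  # prints £12,000
```

Changing any one assumption (for example, a fifth function) scales the provision linearly, which may help when the plan is revisited during the Pathfinder.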

WP1.5

Performance and Usage Measurement
• Map performance against agreed objectives and work plans for the Pathfinder phase. Put in place monitoring tools, procedures and reporting.
• Develop a performance and usage measurement plan and propose KPIs to be implemented early in steady state service.

WP1.6

Detailed Planning
• Develop and agree a detailed work plan, and agree what is to be carried out internally and what will be contracted out.

GROUP A

WP2.1

Policy and strategy
The function is primarily the province of the organisation set up in work package 1. It will be a continuing role of the organisation that will work through an initial creation phase before evolving into an ongoing maintenance phase. It will last throughout the life of the organisation and its life may well be punctuated by periodic, major reviews. It is envisaged, however, that towards the end of the Pathfinder stage a formal appointment will be made to cover the combined role of public policy and standards manager.
Other Overheads:
• Provision for legal fees in year 2 when potentially transitioned to a new organisation
• Potentially a need for a formal review that could result in additional cost (not provided for in this model)
• Additional requirement to attend relevant conferences

Relationship Management
This is similar to policy and strategy and should evolve from the level of work package one to include the following elements:
• Broader marketing and communications capability
• Professional publications liaison
• Event management
Each of these capabilities would require appropriate resources, and it is envisaged that the manpower requirement to the end of the Pathfinder stage will comprise a marketing communications manager, a professional publications liaison manager and an events co-ordinator.
Other Overheads:
• Need to establish and provide for a programme of workshops for scheme participants
• Advertorial fees for insertions into publications
• Print and publication provision (general)
• Ad hoc advice and support in writing and securing publication of articles

Advisory Services
This is a much broader category of work that should evolve into a major part of the capability of UKRDS. This capability should be developed at the following three levels:
• Web based and first line telephone support and advice for basic information and queries. In order to support this capability there needs to be a web interface, a knowledge base, a case management system, managed telephony and associated performance management. Resources planned to deliver this comprise two research assistants from the end of the Pathfinder stage.
• Second line support provided on the same infrastructure as first level support, with appropriate escalation routines. This could potentially be provided by specialists; the exact capabilities of individuals would be defined during the organisation phase carried out under work package one. This plan assumes 2 heads from the end of the Pathfinder stage.
• A consulting capability: 1 to 1 advice and support in the case of major programme work, or assisting with Research Council or institutional strategy development. This is provided by two research data specialists and two data management plan specialists from the end of the Pathfinder stage, and these staff would also provide support to training and development and to the development of handbooks and the knowledge base.
The provision of advisory services would be free of charge at levels one and two and probably (though not in every case) chargeable at level 3.
Other Overheads:
• Development of the system infrastructure to support the provision of all advisory services

GROUP B

WP2.2

Training and Development
Training and development represents a significant part of the potential value added by UKRDS through providing a consistent platform of training in professional data management and the development of Data Management Plans. Therefore, this is one of the core areas of development that begins during the Pathfinder stage and evolves into part of the full ongoing offering of the organisation. We envisage that it would comprise the following stages:
• Design of the initial courses to be offered during the Pathfinder stage
• Promotion of those courses to the Pathfinder communities in order to test, develop and roll out these courses to the Pathfinder community and wider audiences
• Towards the end of the Pathfinder phase, identification and development of a wider range of course offerings appropriate to the needs of the research community
It is envisaged that these training courses would comprise, as a minimum, the use of UKRDS capability, development of data management plans and a framework for professional data management. In terms of resources we anticipate that most of the training delivery would be on the basis of using contractors (those who are already active in this area within the sector) with full cost recovery. In addition we would anticipate the need for two training and development specialists, one of whom would be appointed relatively early in the Pathfinder stage, the second being appointed towards the end of that stage.
Other Overheads:
• Externally procured support in the initial development of training courses
• It is assumed that all subsequent course costs are covered in full by delegate fees, resulting in no net charge to UKRDS

WP2.3

Tools, Methodologies and Handbooks
This area would work hand in glove with advisory services and training & development because much of the collateral that is developed would feed into the knowledgebase, training courses and advice provision. This would be an ongoing requirement but would be front-end loaded in the development of handbooks, knowledgebase, and training materials during the Pathfinder stage. We envisage that much of this would be done by the same individuals as provide level 2 support, training and development. The initial workload would suggest that some of this is outsourced because the nature of the other aspects of work (especially training) would not be at a consistent level of demand.
Tools to be developed would cover tools to access the database registry including ingestion and discovery.
Other Overheads:
• Externally procured development of collateral
• Initial licence fees and set-up costs for system tools (discovery and search)

WP2.4

Service Provider Administration and Vendor relations
It is less easy to define this element of work, although clearly there is going to be quite a high workload in identifying suitable providers, going through a structured procurement process and selecting suitable individuals or organisations. In addition it is a complex area because it covers IT development capability, consultancy and training, as well as general procurement. The resources required would have to start quite early on in the overall life of UKRDS and would build to the following capability:
• One procurement specialist to begin with, growing to two at the end of year 2 to cope with the growing complexity of UKRDS

WP2.5

Capacity planning and Investment
Part of the performance and usage management aspects of WP1.5 would be to inform the development of capacity planning and investment. This would be less of an issue in the early part of the Pathfinder stage but, as the development of UKRDS progresses, this would represent a significant part of its activities. We would therefore envisage that this aspect would be addressed by appointing the following team towards the end of the Pathfinder stage:
• Capacity planning manager
• Capacity planning specialist
• Forward planning

GROUP D

WP2.6

Database registry
At the core of the UKRDS capability is the need to develop a registry of data management plans. This needs to be set up together with databases of research institutions, members with associated access privileges, etc. Because it is such an important element of core capability it will be initiated early on, with development of an initial operating capability based upon a simplified database. This will then be elaborated and developed over the life of the Pathfinder programme into the full operating capability necessary to support the future UKRDS. In order to achieve this we envisage an internal team supported by outsourced services during the development phase. The in-house team would comprise or call on the following:
• Research data architecture manager
• The research data expertise built into 2.1
• The DMP expertise built into 2.1
• Metadata schema design specialist
• Web interface programmer
This staff complement would work in support of the database registry development and also assist in advisory services, training and development. The research data architecture manager would be appointed during the second quarter of the Pathfinder phase, with the other staff being recruited progressively through the balance of the Pathfinder phase.
Other Overheads:
• Outsourced development of database registry inclusive of relevant licence fees
• Provision of a fully resilient system with backup/mirroring capability

WP2.7

Access Management
Once again, access management represents part of the core infrastructure capability, and therefore value added, of UKRDS. It is envisaged that the capability will evolve into one with web enabled, layered, secure access, based upon managed privileges, to the core UKRDS database with pass through to the appropriate institutional / service organisation databases.
In order to support the development of this capability we envisage the appointment of an access administrator, initially reporting to the development manager, in the second half of the Pathfinder stage. Prior to this, access management will be a significant part of the early development of capability, also managed by the development manager.
Other Overheads:
• Outsourced development of full web interface, layered security access etc.
• Shibboleth will be the main means of providing layered secure access. However, because this is a complex, critical area, the requirements must be developed in detail during the Pathfinder and the appropriateness of Shibboleth may require a rethink
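As an illustration of what "layered, secure access, based upon managed privileges" to a registry of data management plans could mean in practice, the sketch below filters registry entries by a member's privilege level. It is an assumption-laden toy, not a design for UKRDS: the three levels, the record fields, the institution names and the function `visible_plans` are all hypothetical, and it does not model Shibboleth or any real authentication system.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical privilege layers; the report does not define them."""
    PUBLIC = 0
    MEMBER = 1
    ADMINISTRATOR = 2


@dataclass(frozen=True)
class PlanRecord:
    """One registry entry describing a data management plan (illustrative fields)."""
    plan_id: str
    institution: str
    title: str
    required_level: AccessLevel


@dataclass(frozen=True)
class Member:
    """A registered user with an associated access privilege."""
    name: str
    institution: str
    level: AccessLevel


def visible_plans(member: Member, registry: list[PlanRecord]) -> list[PlanRecord]:
    """Return the records whose required level does not exceed the member's level."""
    return [record for record in registry if member.level >= record.required_level]


# Made-up registry contents for demonstration only.
registry = [
    PlanRecord("dmp-001", "Institution A", "Survey data plan", AccessLevel.PUBLIC),
    PlanRecord("dmp-002", "Institution B", "Imaging data plan", AccessLevel.MEMBER),
    PlanRecord("dmp-003", "Institution A", "Clinical data plan", AccessLevel.ADMINISTRATOR),
]
alice = Member("Alice", "Institution A", AccessLevel.MEMBER)

print([record.plan_id for record in visible_plans(alice, registry)])
# prints ['dmp-001', 'dmp-002']
```

An ordered enumeration keeps the privilege check to a single comparison; a real service would more likely derive the member's level from attributes released by the federated identity provider, which is one of the requirements to be detailed during the Pathfinder.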

GROUP E WP2.8

Review and prepare to implement
The criteria for measuring success need to be agreed and implemented at the start of the Pathfinder stage so that reporting against these will provide the information required for decisions to be made regarding continuity and operational performance. The service needs to be transitioned smoothly, without a break in the services and stakeholder engagement established during WP1. For this reason a more traditional “evaluation”, which can realistically take several months in itself, is not the appropriate model in this case. Assuming the decision to move forward is taken, the transition should take 6 months, starting in July and maintaining service throughout the move. It should be as transparent to users as possible. To ensure this is possible, clear internal documentation of procedures, processes, programmes etc. should be kept throughout the Pathfinder. Anything produced or developed during the Pathfinder stage should be created on the understanding that its IPR will be transferred to a new host at some point. All this affects the way the Pathfinder phase is undertaken, and the resources in WP1 therefore need to be examined to reflect this.

WP3.1

International Access Services
Work with relevant organisations in the international community to establish mutually beneficial access rights to research data and to develop global standards in research management. The earliest contact will be with the Australian ANDS and the work they are doing on “feeding the data commons”. Some resource should be assigned during the Pathfinder to start to foster these links. This will grow beyond the WP1 phase. At present we believe the additional resources from both the technical and liaison perspectives are covered by other WPs.

WP3.2

Research Data Citations
This may grow out of WP2.1 and WP2.6 as standard ways of citing data become established. There are also implied links with publishers and with the work that some institutional repositories are doing. This is an area that could be included in a foresight advisory group (we suggested the RIN might lead). Costs are currently assumed to be included in the other relevant WPs.

WP3.3

Accreditation & Certification
UKRDS staff and service providers will need a high level of credibility in the field both nationally and internationally. A key element for successfully establishing this credibility will be the agreement of standards and practices across the globe that earn the confidence and trust of researchers and their funders in particular. UKRDS will need to establish a strong position in all relevant fora. Resource costs are largely assumed to be included within 2.1.

WP3.4

Foresight Development
This activity will be informed by both internal and external developments. It will seek to forecast lines of development for research data outputs and for the capacity of the underlying infrastructure. It will establish a broad vision of “all” funded research in the UK and undertake forecast modelling. It draws on WP2.5, and resource costs are included in that WP.

B3: DETAILED PLANS AND RESOURCE COSTS

E:\DG Files\DFES\ Serco jobs\Job 10010229 - UKRDS\Live Job\Report\Final HEFCE Report\v0.2\UKRDS Cost Analysis issued.xls

B4: DETAILED BENEFITS ANALYSIS

E:\DG Files\DFES\ Serco jobs\Job 10010229 - UKRDS\Live Job\Report\Final HEFCE Report\v0.2\UKRDS Benefits Analysis issued.xls
