What is data archiving, and why is it required?
Data archiving is the long-term storage of research datasets, materials and documentation. After the project has finished, all project data should be stored for (usually) a minimum of 10 years. This is in line with the RDM policy at EUR to ensure scientific integrity, and it is also often required by research funders. Archiving also gives researchers the opportunity to bring more structure to the data landscape of their research.
What is EUR Yoda Vault?
EUR Yoda Vault is a data storage facility where a copy of your data, associated materials and documents is stored when you decide to archive them. When archiving, a ‘snapshot’ of the data is placed in the vault, from which it can be retrieved at a later point in time.
How can I archive my research project?
Step 1: Collect and organise your data
First, prepare a data package containing the dataset(s) and all relevant project documentation. You are encouraged to include both raw and processed datasets, their documentation (at least a README file) and other research materials used, such as codebooks, instruments, analysis scripts and notes. The project documentation should include, for example, agreements, contracts, ethical approval, signed informed consent forms, grant agreements, and terms of service for software. This infographic gives a good overview of the elements that are needed for archiving.
It is important to ensure that your data package is organised with a logical folder structure, consistent file naming, and sustainable file formats. Tips on data documentation and organisation can be found here and here. If needed, your faculty Research Data Steward can also advise on this.
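Before uploading, a quick local check of the data package can catch a missing README or inconsistent file names. The sketch below is only an illustration and is not part of Yoda: the function name check_package and the naming rule (letters, digits, dots, hyphens and underscores only) are assumptions, not an EUR requirement. Adapt the rule to the convention agreed within your own project.

```python
import re
from pathlib import Path

# Assumed naming rule (hypothetical, not an EUR requirement):
# letters, digits, dots, hyphens and underscores only.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def check_package(root):
    """Return a list of human-readable issues found in the data package."""
    root = Path(root)
    issues = []

    # A README at the top level is the minimum documentation.
    has_readme = any(
        p.is_file() and p.name.lower().startswith("readme")
        for p in root.iterdir()
    )
    if not has_readme:
        issues.append("No README file in the top-level folder")

    # Flag file and folder names that break the assumed naming rule.
    for path in sorted(root.rglob("*")):
        if not SAFE_NAME.match(path.name):
            relative = path.relative_to(root).as_posix()
            issues.append(f"Inconsistent name: {relative}")

    return issues
```

Calling check_package on the top-level folder of your data package returns an empty list when a README is present and all names follow the assumed rule.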
Step 2: Fill in the intake form
While preparing your data package, you can already request a research group folder in Yoda. To do this, please fill in the intake form to provide basic information about your project. If any additional information is needed, your faculty Research Data Steward will contact you. Otherwise, you will receive an email notification to access the project workspace within a few working days.
The research group folder is the main, top-level folder for your project, and its name always starts with 'research-'. It is important to remember that the request to set up a research group folder is project-based. If you are involved in multiple research projects, each of them must have its own research group folder, so a separate request must be submitted for each project.
Step 3: Log in to the EUR Yoda Vault
To access the EUR Yoda Vault, log in to the Yoda portal via SURFconext with your institutional email address. After logging in, you will see a welcome page with ‘Research’ and ‘Vault’ tabs in the navigation bar.
Step 4: Upload your data
After clicking on ‘Research’, you will see a folder called ‘research-[name-of-your-project]’, where you can upload and store folders and files.
Files and (big) datasets can be easily uploaded using a browser. If you decide to upload via a network disk, there is a 4 GB limit in Windows. With a separate WebDAV client such as Cyberduck or YodaDrive, there is no limit.
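To decide in advance which upload route to use, you can list the files in your data package that exceed the 4 GB limit. The sketch below is only an illustration: it assumes that the limit applies per file and that 4 GB means 4 × 1024³ bytes, and the function name files_over_limit is hypothetical. Files it reports would need a browser upload or a separate WebDAV client rather than the Windows network disk.

```python
from pathlib import Path

# The 4 GB limit mentioned above, read here as a per-file limit
# of 4 * 1024**3 bytes (both are assumptions).
WINDOWS_NETWORK_DISK_LIMIT = 4 * 1024**3


def files_over_limit(root, limit=WINDOWS_NETWORK_DISK_LIMIT):
    """Return (relative path, size in bytes) for every file larger than limit."""
    root = Path(root)
    oversized = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                oversized.append((path.relative_to(root).as_posix(), size))
    return oversized
```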
Step 5: Archive your data
To archive your data, go to ‘Research’ in the navigation bar, and then to the folder that you want to archive. This will become your archival package. Make sure you are in the folder that you want to submit to the vault.
Note that it is possible to archive the whole project at the end, or to start archiving sub-parts of the project while it is still running.
1) Provide metadata
Information about your archival package is needed to find it in the EUR Yoda Vault, to inform EUR support staff about the retention period, and to provide details about its content. Filling in metadata at project level is an obligatory step when you want to deposit your archival package in the EUR Yoda Vault. The metadata form is a built-in function, which can be found in the top right corner under the Metadata button.
2) Check compliance (optional step)
EUR Yoda Vault provides you with the option to double-check the file types, which is a good step to ensure the sustainability of your archival package. You can do it by choosing Check for compliance with policy under Actions. This allows the Yoda Vault to run through all the files to flag ones that can be adjusted to improve the FAIRness of your data.
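If you want a rough idea of which file types are in your package before running the built-in check, you can count them locally. The sketch below is not how Yoda performs its compliance check: the list of preferred extensions is a hypothetical example, not the EUR policy, and the function name flag_formats is an assumption. Replace the list with the formats recommended by your Research Data Steward.

```python
from collections import Counter
from pathlib import Path

# Hypothetical list of preferred formats, for illustration only.
PREFERRED_EXTENSIONS = {".csv", ".txt", ".md", ".pdf", ".json", ".xml"}


def flag_formats(root, preferred=PREFERRED_EXTENSIONS):
    """Count files per extension that is not in the preferred set."""
    flagged = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Compare extensions case-insensitively (.XLSX equals .xlsx).
            ext = path.suffix.lower()
            if ext not in preferred:
                flagged[ext or "(no extension)"] += 1
    return dict(flagged)
```

The result maps each flagged extension to the number of files that use it, which shows where converting to a more sustainable format would have the most effect.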
3) Submit your archival package for review
At this point, your archival package is ready to be submitted for a final assessment before it is deposited in the EUR Yoda Vault. The assessment includes a check against EUR policies and other relevant regulations, as well as a check of whether the package is self-explanatory to others. Your Research Data Steward or the Data Curator from the EUR Library will perform an initial check. If adjustments need to be made, you will be notified. Once your archival package is approved, you will receive a confirmation email.
Step 6: Notifications
Yoda uses a notification system to inform you when the status of a data package that was submitted for archiving to the vault changes. If you have unread notifications, a bell icon will be shown next to your email address in the top right button.
Click on the button and then on “Notifications” to view them. If you want to configure email notifications, click on Settings in the same menu.
How much time and effort does it take to archive your data?
The time and effort needed to archive data will mostly depend on your level of RDM knowledge and data preparation. The most time-consuming part of the process might be preparing the archival package itself, if proper data management has not taken place throughout the project. Insufficient documentation might lead to requests for adjustments (before the archival package is accepted into the EUR Yoda Vault), slowing down the whole process. Information about archival policies and a checklist for curators will be published soon.
What are the differences between archiving and publishing data?
Below are some differences between storing data in an archive and in a repository:
- In an archive, data and materials are stored for audits and verification of scientific integrity. In a repository, data and research materials deposited are curated for further reuse by humans and machines.
- Archival packages usually include all data (raw and processed) and project documentation (e.g., ethical approval, signed informed consent forms, grant agreements), while a re-usable package contains processed datasets (e.g., datasets without personal information) and supplementary material related to a scientific publication.
- The archival package is typically stored for 10 years and is not accessible (except to group members). Depending on the repository, the re-usable package might remain accessible for the lifetime of the repository.
- The archival package cannot be altered. On the other hand, you can usually update your materials in a repository.
Who should I contact for support?
If you have any questions about the process or how to prepare your data for archiving, get in touch with your faculty Research Data Steward.
This page was last updated in July 2023. Did you find a broken link or (seemingly) incorrect information? Please send an email with the title 'Website content' to datasteward@eur.nl.