Before you upload your data to the EUR Data Repository, you need to prepare it. Good preparation helps you use the EUR Data Repository effectively and lets your data be published without unnecessary delay.
Below you will find a list of things to keep in mind when preparing your data. Ideally, these have already been addressed in your data management plan at the start of the research project. If so, preparing your data will be relatively quick. If not, getting your data ready to share may take some time.
EU privacy law, the General Data Protection Regulation (GDPR), applies to all personal data. Personal data is any piece of information, or any combination of pieces of information, that can directly or indirectly identify your research participant. Examples are: name, e-mail address, IP address, geospatial location.
GDPR sets out requirements that must be met before you can publish personal data. These include removing personal data that is not needed for reuse of the data, making your data as difficult as possible to trace back to a research participant, and, if applicable, correctly interpreting the informed consent that has been given.
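As an illustration of removing personal data that is not needed for reuse, the following is a minimal sketch that copies a CSV file while leaving out columns containing direct identifiers. The column names (`name`, `email`, `ip_address`) and the function name `drop_columns` are assumptions made for this example, not part of any EUR tooling. Dropping direct identifiers alone does not make a dataset anonymous, because combinations of the remaining variables may still identify a participant.

```python
import csv
import io

# Hypothetical column names; replace with the direct identifiers in your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "ip_address"}


def drop_columns(source, target, columns_to_drop=DIRECT_IDENTIFIERS):
    """Copy CSV data from source to target, leaving out the listed columns.

    Returns the list of column names that were kept.
    """
    reader = csv.DictReader(source)
    kept = [f for f in (reader.fieldnames or []) if f.lower() not in columns_to_drop]
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(target, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


# Usage with in-memory data; with real files, open them with newline="".
raw = io.StringIO("participant_id,name,email,age\n1,Ann,ann@example.org,34\n")
cleaned = io.StringIO()
kept_columns = drop_columns(raw, cleaned)
print(kept_columns)
print(cleaned.getvalue())
```

Whether the remaining columns are safe to share still needs to be assessed case by case, for example together with your faculty data steward.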
If you use proprietary data or have entered into a contract or agreement with a third party, check if you are allowed to share the data and under what conditions. This also includes Intellectual Property Rights, Copyright, and Terms of Use that might apply when you have used e.g. social media or other online platforms to collect data.
Choose a file naming convention that reflects what each file contains. This way, you and others can easily identify the files needed. Elements to consider including are the date of creation, a description, the location, the project number, and the version number. In addition, name files consistently, keep file names short but descriptive (fewer than 25 characters), avoid special characters and spaces, use capitals and underscores instead of periods, spaces, or slashes, use a fixed date format (e.g. ISO 8601: YYYYMMDD), and include version numbers.
Examples: 20200125_DMP_V3.pdf, 20200211_IC_Template.pdf, 20190719_Image_Cropped.jpg, 20210628_Data_Processed.sav.
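The naming guidelines above can be sketched as a small helper that builds and checks file names. This is only one possible reading of the convention, inferred from the examples: the pattern `YYYYMMDD_Description_Vn.ext`, the function names `build_file_name` and `check_file_name`, and the choice to apply the 25-character limit to the name without its extension are all assumptions made for illustration.

```python
import re
from datetime import date
from typing import List, Optional

# Assumed pattern: 8-digit date, then underscore-separated alphanumeric parts, then an extension.
NAME_PATTERN = re.compile(r"^(\d{8})(?:_[A-Za-z0-9]+)+\.[A-Za-z0-9]+$")
MAX_LENGTH = 25  # assumption: limit applied to the name without its extension


def build_file_name(created: date, description: str, extension: str,
                    version: Optional[int] = None) -> str:
    """Compose a name such as 20200125_DMP_V3.pdf from its elements."""
    parts = [created.strftime("%Y%m%d")]
    # Split the description on anything that is not a letter or digit,
    # then capitalize the first letter of each word.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", description) if w]
    parts.extend(w[0].upper() + w[1:] for w in words)
    if version is not None:
        parts.append(f"V{version}")
    return "_".join(parts) + "." + extension.lstrip(".").lower()


def check_file_name(name: str) -> List[str]:
    """Return a list of problems; an empty list means the name fits the assumed convention."""
    problems = []
    stem = name.rsplit(".", 1)[0]
    if len(stem) > MAX_LENGTH:
        problems.append("name is longer than 25 characters")
    match = NAME_PATTERN.match(name)
    if match is None:
        problems.append("name does not follow YYYYMMDD_Description.ext")
    else:
        digits = match.group(1)
        try:
            # Constructing a date rejects impossible values such as month 13.
            date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
        except ValueError:
            problems.append("date prefix is not a valid calendar date")
    return problems


print(build_file_name(date(2020, 1, 25), "DMP", "pdf", version=3))
print(check_file_name("final version (2).docx"))
```

A check like this could be run over a folder before upload to catch names with spaces, special characters, or a missing date prefix.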
Data that is shared should be in a file format that ensures long-term access. It is recommended to use formats that are frequently used, have open specifications, and are independent of specific software, developers, or vendors.
Although frequently used, formats such as Word and Excel are not preferred. Instead, consider converting Word files to PDF and Excel files to CSV. You can find a list of preferred formats here.
Open data is not just about the data file itself. Equally important is the accompanying documentation that provides the context in which the data was collected and describes how the data was collected and analyzed. This collection of files is often called the ‘publication package’ and it includes everything that is needed to reproduce the research or reuse the data.
Examples of files to include in a publication package are: the [raw] data file, a codebook listing the variables and categories, the syntax or code used to analyze the data, a copy of the actual survey or questionnaire that was used, a list of interview questions, the [transcripts of] audio or video recordings, and a readme file describing the method and steps you used to analyze the data.
If you have any questions or need help preparing your data, contact your faculty data steward. If needed, your faculty data steward will involve other research support staff such as a privacy or legal officer.