Repositories and Persistent Research Identifiers

Hello, and welcome to this short talk on the related topics of Repositories and Persistent Research Identifiers. We begin by saying what we mean by a repository, and how repositories relate to open research. We’ll give some examples of the alternative kinds of repositories you might use, depending on the needs of your project. We will then move on to describe Persistent Research Identifiers, which are codes that make it easy to find and share materials stored in a repository. We’ll conclude with a pointer to guidance on how to prepare information before deposit, and some tips on working with persistent identifiers.

What is a repository?

A repository is, basically, a place to put stuff. Material is “deposited” in it so that people can retrieve it later and, generally, allow others to access it. In the area covered here, we are usually referring to a digital repository: essentially a database with an interface that makes it straightforward to deposit, search for, retrieve, and share materials. Most research projects have data storage needs: if you’re being systematic about how you handle data, you’re probably already using a repository.

How are repositories related to open research?

Repositories range from simple data stores to complex digital library systems, and they vary widely in their contents and protocols. One way in which they can contribute to open research is on the sharing side: making research material viewable to the public, and available under the terms of an open licence and/or open access.

They can also contribute to open research in another way, by being “open” to submission of materials from a variety of people. This may be at the level of a single project with multiple collaborators — or the repository may collect materials from multiple projects.

⚠ Practice Example

For example, the Research And Digital Assets Repository (RADAR) at Brookes is open to submissions from people at Brookes and is open to anyone to browse. RADAR also allows people depositing material to choose how the material they deposit can be reused, for example, under the terms of a Creative Commons licence.

The people maintaining repositories which are open to contributions from the general public — like the Open Science Framework which hosts a “free, open platform to support your research and enable collaboration” — may enforce some light-weight moderation processes, though these are less intensive than scholarly peer review. This helps researchers share and discuss ideas at an early stage.

Flow

Alternatives when selecting a repository

Use of an institutional repository is strongly encouraged for a practical reason: publications must be deposited in an institutional repository no later than three months from the date of acceptance in order to be eligible for the REF. You may additionally (or instead, if the material isn’t REF-able) choose to upload some of your research materials to another research repository, perhaps a discipline-specific archive like arXiv or bioRxiv, or a general-purpose repository like Zenodo or Figshare.

Another example that may be of particular interest for people working in the social sciences is the UK Data Archive. The UK Data Archive provides curation and data handling safeguards, which distinguish it from a self-service data depository. If your concern has more to do with setting up a repository for an active project, one good alternative is Git: although it is typically used for sharing code, there is a lot of documentation available on how to use it to share research materials more broadly. Sites like GitHub and GitLab make using Git fairly straightforward.

Persistent Research Identifiers

As mentioned above, repositories not only support deposit, search, and retrieval, they also support sharing. One well-established way to do this is for the repository to systematically assign a unique identifier to every deposit. This is referred to as a Persistent Research Identifier. The most common type, used across many different repositories, is the DOI or “Digital Object Identifier”. A DOI is a reference code which might appear rather inscrutable on its own. However, doi.org provides a service that turns the code into a hyperlink, which enables people to share, access, and cite the published material.
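To make the doi.org idea concrete, here is a minimal Python sketch that turns a DOI into a shareable link. The function name `doi_to_url` and the example DOI `10.1000/182` are illustrative choices for this sketch, not something prescribed by doi.org; the only thing taken from the text above is that doi.org resolves a DOI when it is appended to the site's address.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI, as people typically paste it, into a resolvable link."""
    cleaned = doi.strip()
    # Remove prefixes that are often copied along with the DOI itself.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Every DOI starts with "10." and has a "/" between prefix and suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"this does not look like a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL, but keep "/" readable.
    return DOI_RESOLVER + quote(cleaned, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1000/182"))  # https://doi.org/10.1000/182
```

The check on the `10.` prefix is deliberately loose: it catches obvious slips (such as pasting a title instead of a DOI) without trying to confirm that the DOI has actually been registered.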

Some other persistent research identifiers

The DOI is not the only persistent identifier out there. For example, researchers themselves can have a persistent identifier (ORCiD), as can research institutions (ROR). If you’re not familiar with ORCiDs, you may wonder why you might want your own personal digital identifier. Much as DOIs do for documents, ORCiDs make it easy for people to find and share information about you. For example, the UKRN’s Open Research Project plans to use ORCiD profile pages to check whether participants are making use of open research practices (like creating preprints) and also plans to use the profiles hosted on orcid.org as a place to post certifications of open research training. Another benefit of the ORCiD system is that in many cases, your profile on orcid.org is kept up to date automatically by computational services running behind the scenes. While the focus for most people is on the identifier, the website orcid.org also has a repository of information about researchers. It also has an advanced search function which can show how many people at an institution have ORCiDs.

⚠ Practice Example

At Oxford Brookes University, for example, the search finds 1866 people with ORCiDs.
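If you handle ORCiDs in your own scripts or spreadsheets, it can help to catch typos before sharing a profile link. The sketch below is one way to do that in Python, assuming the published structure of the identifier: sixteen characters in four hyphen-separated groups, where the last character is a check character computed from the first fifteen digits (ISO 7064 MOD 11-2). The function names are made up for this example, and `0000-0002-1825-0097` is the sample identifier used in ORCID's own documentation.

```python
import re

# Four groups of four; the final character may be a digit or "X".
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_character(first_15_digits: str) -> str:
    """Compute the check character (ISO 7064 MOD 11-2) for the first 15 digits."""
    total = 0
    for digit in first_15_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string has the right shape and a matching check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


def orcid_url(orcid: str) -> str:
    """Build the profile link on orcid.org for a well-formed identifier."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"not a well-formed ORCID iD: {orcid!r}")
    return "https://orcid.org/" + orcid


if __name__ == "__main__":
    print(orcid_url("0000-0002-1825-0097"))
```

A passing check only means the identifier is well formed; it does not confirm that the profile exists or belongs to the person you have in mind.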

Repositories with additional features

Having had a look at repositories of documents and repositories of profiles, it’s worth a quick look at another repository, Wikidata, which is a collection of all kinds of facts. It can be queried with the SPARQL language to answer simple statistical and inference questions, like “What are the largest cities in the world that have a female mayor?” and “What airports are located within 100 km of Berlin?” Its particular relevance to open research is that Wikidata also allows you to upload your own structured data, which can support downstream uses in ways that a more static “data dump” would not. The broader point to consider when selecting a repository is: what added value does it have? How can people use the material they find there?
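As a rough illustration of what such a query looks like, here is a Python sketch that sends a simplified version of the “female mayor” question to Wikidata's public SPARQL endpoint. The item and property identifiers in the query (Q515 for city, P6 for head of government, P21 for sex or gender, Q6581072 for female, P1082 for population) are given as I understand them and are worth checking on wikidata.org before relying on the results; the function names and the User-Agent string are invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Simplified query: cities whose head of government is recorded as female,
# largest population first. Identifiers are assumptions to be double-checked.
QUERY = """
SELECT ?cityLabel ?mayorLabel ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;
        wdt:P6 ?mayor ;
        wdt:P1082 ?population .
  ?mayor wdt:P21 wd:Q6581072 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_query_url(sparql: str) -> str:
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    return WIKIDATA_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


def run_query(sparql: str, timeout: float = 60.0) -> list:
    """Send the query (needs network access) and return the list of result rows."""
    request = Request(
        build_query_url(sparql),
        # A descriptive User-Agent is good practice for public endpoints.
        headers={"User-Agent": "open-research-talk-example/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


if __name__ == "__main__":
    for row in run_query(QUERY):
        print(row["cityLabel"]["value"], "-", row["mayorLabel"]["value"])
```

Because the data is structured, the same pattern works for quite different questions: only the text of the query changes, not the code around it.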

Keep it practical

Working with DOIs and persistent identifiers

As mentioned above, the website doi.org is useful for looking up a paper from its DOI. Additional tools like doi2bib.org can turn the DOI into a usable citation. Although it’s a somewhat self-explanatory example, shortdoi.org can be used to shorten DOIs, which could be useful if you’re ever scrambling for column space. If you have set up a working repository on GitHub, it may be useful to know that you can use Zenodo to obtain a DOI for the repository, making it easier to cite work in progress there. Other identifiers like ORCiDs and RORs may be helpful when submitting material to journals, repositories, or other platforms (for example, if you upload to Octopus, you can flag the organisation that funded your work using their ROR).
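If you would rather fetch a citation from a script than from a website, one option is “content negotiation”: asking doi.org for a BibTeX entry instead of the landing page. The Python sketch below assumes the DOI was registered with an agency that supports this (Crossref and DataCite do; not every DOI will work). It is not a description of how doi2bib.org works internally, and the function names and the placeholder DOI are hypothetical.

```python
from urllib.request import Request, urlopen


def bibtex_request(doi: str) -> Request:
    """Build (but do not send) a request asking doi.org for a BibTeX entry."""
    return Request(
        "https://doi.org/" + doi,
        # The Accept header tells the resolver which format we would like back.
        headers={"Accept": "application/x-bibtex"},
    )


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # "10.1234/example" is a placeholder; substitute a DOI from your own reading list.
    print(fetch_bibtex("10.1234/example"))
```

Keeping the request-building step separate from the network call makes it easy to check what will be sent before anything leaves your machine.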

Reference

Wattanakriengkrai, S., Chinthanet, B., Hata, H., Kula, R. G., Treude, C., Guo, J., & Matsumoto, K. (2022). GitHub repositories with links to academic papers: Public access, traceability, and evolution. Journal of Systems and Software, 183, 111117.

CC BY-SA 4.0 Joe Corneli et al. Last modified: April 24, 2025. Website built with Franklin.jl and the Julia programming language.