Tips for using the Crossref REST API
We love to share our data and we want it to be used. Yes, really—and by as many people as possible. Here we share tips to help you get the most out of our REST API. Everyone benefits if you use our REST API responsibly and efficiently. Very occasionally we have had to block users who misuse our APIs, usually through carelessness rather than malice. If you follow the advice on this page you should have no problem.
Note that for conciseness, the examples on this page omit the API URL, which is https://api.crossref.org/v1, so for an example like /members/120 the full query is https://api.crossref.org/v1/members/120.
Where do I start?
If you are just getting started with our API, we recommend heading to the learning hub. There you can get to grips with APIs, what kind of metadata you can retrieve, and how to formulate queries. For a description of all of the API features, see our Swagger documentation, where you can also try out some queries.
Get the right endpoint
Most queries to our API go through the ‘works’ endpoint. If you’re looking for research outputs, that’s probably where you want to start.
Bear in mind that we also have other endpoints, including ‘prefixes’, ‘journals’, and ‘licenses’. You can, for example, get all works in a journal by using its ISSN, e.g. /journals/0003-3804/works. The members endpoint contains summary metadata about organisations that have deposited metadata, including which metadata fields they deposit and the fraction of records with certain properties, e.g. /members/120.
Use filter and query parameters
Our API contains a range of parameters that can be used to pull out records that match certain criteria. For example, there are various kinds of date filters: /works?filter=from-pub-date:2024-01-01,until-pub-date:2024-12-31 will get all records with a publication date in 2024.
There are filters to detect the presence of certain properties, e.g. /works?filter=has-references:1,has-orcid:1 will return only works that have references and at least one author with an ORCID iD. Note that multiple filters go into a single filter parameter, separated by commas.
There are also filters for specific values, e.g. /works?filter=type:journal-article,funder:10.13039/100000040 retrieves journal articles funded by the organisation with the specified funder ID.
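The filter syntax above can be sketched in Python using only the standard library. This is a minimal sketch, assuming you build request URLs yourself; the helper `works_url` is hypothetical and not part of any Crossref client library. All filters are combined into one `filter` parameter, separated by commas.

```python
from urllib.parse import urlencode

# Public API base URL, as given at the top of this page.
BASE_URL = "https://api.crossref.org/v1"


def works_url(filters: dict, **params) -> str:
    """Build a /works URL from a dict of filters plus other query parameters.

    Every filter is joined into ONE comma-separated `filter` value, e.g.
    {"has-references": 1, "has-orcid": 1} -> filter=has-references:1,has-orcid:1
    """
    query = dict(params)
    if filters:
        query["filter"] = ",".join(f"{name}:{value}" for name, value in filters.items())
    # Keep ':', ',' and '/' readable; everything else is percent-encoded.
    return f"{BASE_URL}/works?{urlencode(query, safe=':,/')}"


# Records with a publication date in 2024 (both bounds are inclusive).
print(works_url({"from-pub-date": "2024-01-01", "until-pub-date": "2024-12-31"}))
# Journal articles funded by the given funder ID.
print(works_url({"type": "journal-article", "funder": "10.13039/100000040"}))
```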
If you are looking to analyse a large number of metadata records, such as all records from a single journal or publisher, it is almost always more efficient to query using a filter than for each DOI individually.
Be selective with fields, if you need to be
If you are only interested in 2 or 3 fields of the output from the works endpoint, you can use select to retrieve only that metadata. For example, /works?rows=10&select=DOI returns only the DOI field of 10 records. Don’t do this if you need more than 3 or 4 fields: the longer your list of fields, the slower the query. Instead, retrieve the whole record and discard the information you don’t need.
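As an illustration, here is a minimal sketch of requesting and reading a `select=DOI` result. It assumes the usual response shape, with matching records under `message.items`; the function name is hypothetical and the sample body is made up (it uses the 10.5555 test prefix).

```python
import json
from urllib.parse import urlencode

# Ten records, DOI field only.
SELECT_URL = "https://api.crossref.org/v1/works?" + urlencode({"rows": 10, "select": "DOI"})


def dois_from_response(body: str) -> list:
    """Return the DOIs from a /works response requested with select=DOI.

    With `select`, each item in message.items carries only the requested fields.
    """
    message = json.loads(body)["message"]
    return [item["DOI"] for item in message["items"]]


sample = '{"status": "ok", "message": {"items": [{"DOI": "10.5555/abc"}, {"DOI": "10.5555/def"}]}}'
print(SELECT_URL)
print(dois_from_response(sample))  # ['10.5555/abc', '10.5555/def']
```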
You can make more efficient use of the API and get results more quickly by thinking in advance about how many results you need per request, which you set with the rows parameter. The default is 20 and you can increase it up to 1000. If you only need to know the total number of results, use rows=0. For requests with a query parameter, 2–5 rows might be enough (see more on that below), whereas to look at a few examples of records with a certain property, maybe 10 records is enough.
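For example, to count the records matching a filter without downloading any of them, you can request `rows=0` and read `total-results` from the response. A minimal sketch; the helper names are hypothetical and the sample response is made up.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.crossref.org/v1"


def count_url(filter_value: str) -> str:
    # rows=0: return no items, only the summary that includes total-results.
    return f"{BASE_URL}/works?{urlencode({'filter': filter_value, 'rows': 0}, safe=':,')}"


def total_results(body: str) -> int:
    """Read the number of matching records from a /works response."""
    return json.loads(body)["message"]["total-results"]


print(count_url("type:journal-article,from-pub-date:2024-01-01,until-pub-date:2024-12-31"))
print(total_results('{"status": "ok", "message": {"total-results": 42, "items": []}}'))  # 42
```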
If you need to get all the results returned by a request, use cursor=* and a high number of rows (up to 1000). Each response includes a next-cursor field, which you pass as the cursor value in your next request to get the next page of results. Note that our REST API returns a cursor even on the last page. To stop your script at the correct point, check the number of items returned: you have reached the last page when it is less than the number of rows you requested.
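The paging logic can be sketched as a generator. `fetch_page` is a hypothetical function you supply: it requests your query with the given `cursor` and `rows` values and returns the parsed `message` object. The in-memory fake below only stands in for the API so the loop can be tried offline; its "cursor" is a plain offset, unlike real cursors.

```python
from typing import Callable, Dict, Iterator


def iter_all_items(fetch_page: Callable[[str, int], Dict], rows: int = 1000) -> Iterator[Dict]:
    """Yield every item of a query by following next-cursor from page to page."""
    cursor = "*"  # the first request uses cursor=*
    while True:
        message = fetch_page(cursor, rows)
        items = message.get("items", [])
        yield from items
        # A next-cursor comes back even on the last page, so stop on a short page.
        if len(items) < rows:
            return
        cursor = message["next-cursor"]


def fake_fetch_page(cursor: str, rows: int, total: int = 5) -> Dict:
    """Offline stand-in for the API: here the 'cursor' is just an offset."""
    start = 0 if cursor == "*" else int(cursor)
    items = [{"DOI": f"10.5555/{i}"} for i in range(start, min(start + rows, total))]
    return {"items": items, "next-cursor": str(start + rows)}


print(len(list(iter_all_items(fake_fetch_page, rows=2))))  # 5
```

If the total is an exact multiple of `rows`, the final request returns an empty page, which also counts as a short page and ends the loop.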
Large numbers of queries and very large results sets
If you are planning to retrieve hundreds of thousands or even millions of records from our API, read this section before you get started!
First, determine whether you really need to use the REST API. We have an annual public data file that contains all of our data. If you are a Metadata Plus subscriber, you have access to a monthly snapshot. Using these and setting up a local database means that you can run more custom queries than with the API and get results more quickly.
Second, cache your results. Try to avoid making the same requests repeatedly. Our metadata does change over time, but the majority of records change infrequently, if at all.
Third, if you do need to make a query with a very large result set, we recommend splitting it into a series of smaller queries. You can use cursors to page through results, but if you’re running to thousands of pages, the chance of a cursor failing or expiring at some point becomes much higher. If you break a request down by day or week, for example, and one part fails, it is much easier to go back and pick up only the missing data. Also pay attention to the HTTP status code and back off if you start seeing 4XX statuses.
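One way to split a large request is to generate one window per day or week and run a separate filtered query for each. A minimal sketch; `date_windows` is a hypothetical helper, and it relies on the date filters being inclusive at both ends.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, step_days: int = 7) -> Iterator[Tuple[str, str]]:
    """Yield inclusive (from, until) ISO date pairs covering start..end.

    Because both bounds are inclusive, each window ends the day before the
    next one starts, so no day is fetched twice.
    """
    current = start
    while current <= end:
        last = min(current + timedelta(days=step_days - 1), end)
        yield current.isoformat(), last.isoformat()
        current = last + timedelta(days=1)


for frm, until in date_windows(date(2025, 1, 1), date(2025, 1, 20)):
    # Run (and record the outcome of) one query per window; a failed window
    # can then be retried on its own.
    print(f"filter=from-created-date:{frm},until-created-date:{until}")
```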
How to keep your local data synced with the REST API
You might have a request that you want to make repeatedly and keep local results cached. You may even want to have a complete copy of the Crossref database and keep it up to date (in this case, consider whether Metadata Plus is a good option for you). Here are a few suggestions and tips for how to do that.
Choose your date filter. There are three types of date filters that can help you pick up items. The created date (from-created-date, until-created-date) will return any new records. Using the updated date (from-update-date, until-update-date) will give you both new records and those with any changes deposited by the member. If you use the indexed date (from-index-date, until-index-date), you will get all created and updated records, as well as those that were reindexed in the REST API, which may or may not have changed. As you can see, the third of these options gives you the most results, but that can mean a very large number of records. Which you choose will depend on how you want to use the metadata. For this application, we wouldn’t recommend using the published date, as this can change over time and might be very different from when the record was created (a long time in the past or in the future), meaning that you are likely to miss results.
Choose your frequency. How often do you want to get new records? All of these filters let you specify times down to the second, but you might decide that once an hour, once a day, or once a month is OK. Note that the timestamps are inclusive, so to get everything created between 12:00 and 13:00 on 1 January 2025, you can use: filter=from-created-date:2025-01-01T12,until-created-date:2025-01-01T12. Using 13 instead of 12 in the until-created-date filter will get you two hours of data, not one.
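Putting the two choices together, a sync job that runs hourly might build its filter like this. A minimal sketch with a hypothetical helper; it follows the inclusive behaviour described above, so the same hour appears in both bounds.

```python
from datetime import datetime

# Filter name stems for the three date types described above.
DATE_FILTERS = {"created": "created-date", "update": "update-date", "index": "index-date"}


def one_hour_filter(start: datetime, kind: str = "created") -> str:
    """Filter selecting the single hour that begins at `start`.

    Bounds are inclusive at the precision given, so repeating the same
    hour-precision value (e.g. 2025-01-01T12) covers one hour, not two.
    """
    stem = DATE_FILTERS[kind]
    hour = start.strftime("%Y-%m-%dT%H")
    return f"from-{stem}:{hour},until-{stem}:{hour}"


print(one_hour_filter(datetime(2025, 1, 1, 12)))
# from-created-date:2025-01-01T12,until-created-date:2025-01-01T12
```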
Use cursors. If your time range is reasonably large or you aren’t using any other filters, you will probably exceed the page limit of 1000 items. Use cursors in your request to make sure you get all of the results (see above).
Cache-cache. To avoid repeatedly retrieving the same unchanged records, save the responses locally. If you are looking for updates or newly indexed works, you will need to replace the old records in your cache with the newest versions.
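A minimal sketch of such a cache, using an in-memory dict for illustration (a real job would more likely use a database). It assumes each record carries the `indexed` timestamp returned by the REST API, and keys records by lower-cased DOI because DOIs are case-insensitive; `upsert_records` is a hypothetical helper.

```python
from typing import Dict, Iterable


def _indexed_ts(record: dict) -> int:
    # Assumes the REST API's "indexed": {"timestamp": <milliseconds>} field.
    return record.get("indexed", {}).get("timestamp", 0)


def upsert_records(cache: Dict[str, dict], records: Iterable[dict]) -> int:
    """Insert new records and replace older cached copies; return how many were written."""
    written = 0
    for record in records:
        key = record["DOI"].lower()  # DOIs are case-insensitive
        cached = cache.get(key)
        # Keep the cached copy only if it is strictly newer than the incoming one.
        if cached is None or _indexed_ts(record) >= _indexed_ts(cached):
            cache[key] = record
            written += 1
    return written


cache: Dict[str, dict] = {}
upsert_records(cache, [{"DOI": "10.5555/ABC", "indexed": {"timestamp": 2}}])
upsert_records(cache, [{"DOI": "10.5555/abc", "indexed": {"timestamp": 1}}])  # older copy: ignored
print(cache["10.5555/abc"]["indexed"]["timestamp"])  # 2
```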
What if I want to do reference matching?
Great, please do! Before using the API, you might want to check out the simple text query page, where you can paste a list of references and get the matching DOIs. This is useful if you are looking to add DOIs to manuscripts prior to publication.
If you are using the API, use the ‘query.bibliographic’ parameter. You might be tempted to parse your references and use a series of our field queries (query.title, query.author, etc.). This is slow and expensive, and it gives you worse results. That might sound counter-intuitive, but in tests with real-world data we have found that parsing references introduces new errors or amplifies existing ones in the original reference string. Put the full reference into ‘query.bibliographic’ and you’re more likely to get the result you are looking for, and more quickly.
If you are using the API for simple reference matching and are not doing any post-validation (e.g. your own ranking of the returned results), just ask for the first two results (rows=2). This allows you to identify the best result and to ignore cases where the first two results have the same score (i.e. an inconclusive match). On the other hand, if you are analysing and ranking the results yourself, requesting five results (rows=5) should be sufficient. Anything beyond that is very unlikely to be a match.
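The simple case (no post-validation, `rows=2`) can be sketched as follows. It assumes the returned items are sorted by relevance and each carries a `score` field; the helper names are hypothetical and the sample items are made up.

```python
from typing import List, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.crossref.org/v1"


def match_url(reference: str, email: str) -> str:
    # Send the whole, unparsed reference string; two rows are enough here.
    params = {"query.bibliographic": reference, "rows": 2, "mailto": email}
    return f"{BASE_URL}/works?{urlencode(params)}"


def best_doi(items: List[dict]) -> Optional[str]:
    """Return the top DOI, or None when there is no result or the top two scores tie."""
    if not items:
        return None
    if len(items) > 1 and items[0]["score"] == items[1]["score"]:
        return None  # inconclusive match
    return items[0]["DOI"]


print(best_doi([{"DOI": "10.5555/aaa", "score": 61.2}, {"DOI": "10.5555/bbb", "score": 18.4}]))  # 10.5555/aaa
print(best_doi([{"DOI": "10.5555/aaa", "score": 30.0}, {"DOI": "10.5555/bbb", "score": 30.0}]))  # None
```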
Character encoding
Escape special characters if you’re using them in queries, e.g. by replacing & with %26. Characters that need escaping might turn up in DOIs or query strings, and most programming languages have a library that does this for you.
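In Python, for instance, the standard library's `urllib.parse` handles this. The DOI and reference text below are made-up examples.

```python
from urllib.parse import quote, urlencode

# A made-up DOI containing characters that have special meaning in URLs.
doi = "10.5555/a&b#c"
# quote() keeps '/' by default and percent-encodes '&' and '#'.
print("/works/" + quote(doi))  # /works/10.5555/a%26b%23c

# urlencode() escapes query values: '&' becomes %26, spaces become '+'.
print(urlencode({"query.bibliographic": "Smith & Jones 2020"}))
# query.bibliographic=Smith+%26+Jones+2020
```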
We recommend that you identify yourself so that we can get in touch if there is a problem. The main way to do this is to include a mailto parameter with a valid email address. If we see unusual behaviour, we’ll use this to contact you rather than blocking you outright. We count the number of users with email addresses, but we don’t store or use them for deeper data analysis, marketing, or other nefarious purposes.
If you’re using a script or app that regularly queries our API, add a User-Agent header. This can help us to troubleshoot issues and give you more specific feedback if we do need to contact you.
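A minimal sketch of both, using Python's standard library. The email address and the application name and version are placeholders, the `User-Agent` format shown (name/version plus a contact) is just one reasonable choice, and `polite_request` is a hypothetical helper. Building a `Request` sends nothing; pass it to `urllib.request.urlopen` to make the call.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.crossref.org/v1"


def polite_request(path: str, params: dict, email: str, app: str) -> Request:
    """Build a GET request that carries a mailto parameter and a User-Agent header."""
    query = dict(params, mailto=email)
    url = f"{BASE_URL}{path}?{urlencode(query, safe=':,@')}"
    return Request(url, headers={"User-Agent": f"{app} (mailto:{email})"})


req = polite_request("/works", {"rows": 5}, "you@example.org", "MyCitationTool/1.0")
print(req.full_url)  # https://api.crossref.org/v1/works?rows=5&mailto=you@example.org
# urllib normalises header names, so the stored key is "User-agent".
print(req.get_header("User-agent"))  # MyCitationTool/1.0 (mailto:you@example.org)
```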
You don’t have to identify yourself: we allow anonymous access to our APIs. In this case, we collect minimal user information, but note that our logs contain IP addresses and details of your queries.
Pay attention to errors
Check whether your requests are returning 4XX or 5XX HTTP statuses. In particular, a 429 (Too Many Requests) response means that you’ve been blocked. If you see a high rate of these responses, back off, ideally exponentially. We block automatically, so if you are blocked, pause for a few minutes and then try again.
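A minimal sketch of exponential backoff with jitter. `fetch` is a hypothetical zero-argument callable that makes one request and returns `(status, body)`, and `sleep` is injectable so the example runs instantly. The set of retried statuses (429 plus common transient 5XX codes) and the delays are assumptions to tune. Other 4XX errors usually mean the request itself needs fixing, so they are returned straight away.

```python
import random
import time
from typing import Callable, Tuple

# Statuses worth retrying after a pause (an assumption; adjust to taste).
RETRY_STATUSES = {429, 500, 502, 503, 504}


def get_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() until it returns a non-retryable status or max_tries is reached."""
    for attempt in range(max_tries):
        status, body = fetch()
        if status not in RETRY_STATUSES or attempt == max_tries - 1:
            return status, body
        delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
        sleep(delay + random.uniform(0, delay / 2))  # jitter avoids synchronised retries
    raise ValueError("max_tries must be at least 1")


responses = iter([(429, ""), (503, ""), (200, "ok")])
waits = []
result = get_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, len(waits))  # (200, 'ok') 2
```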
There are plenty of libraries that make this very easy. Since a lot of our API users seem to use Python, here are links to a few libraries that allow you to do this properly:
Help! Something isn’t working
If you aren’t getting the results you expect from our APIs, check the return status of your query (which should be 200 if everything is working well). You can also check our status page where we report any disruption to the API.
We do our best, but no software is perfect! We have a backlog of bugs and feature requests. If you see a new bug, please report it on our community forum.