So you want to reuse some of your existing bib files, but the BibTeX processor complains that they contain duplicate entries. How can you remove them?
Introducing bibtex-tidy
To remove the duplicates in BibTeX, we will use bibtex-tidy, a web app that formats your BibTeX source. You can open it at https://flamingtempura.github.io/bibtex-tidy/.
Once you open it, you will see that bibtex-tidy has two parts. The left side is the editor window, where you paste your BibTeX code. The right side has a few options to format your BibTeX code:
- Whitespace options
  - Convert tabs to spaces and vice versa in your BibTeX source
  - Align values: align the equals signs in your BibTeX entries
- Values options
  - Enclose BibTeX values in curly braces
    - "BibTeX Guide" becomes {BibTeX Guide}
  - Use numeric values where possible
    - "1942" becomes 1942
  - Strip double-braced values
    - {{March}} becomes {March}
  - Drop all caps: convert all-caps values to title case
    - {MARCH} becomes {March}
  - Escape special characters: to make sure no weird characters appear in your LaTeX output
  - Encode URLs with percent-encoded values
- Sorting
  - Sort bibliography entries
    - You can then add the field to sort by, such as the BibTeX key
    - Multiple fields are separated by a space, such as key year
    - For descending order, prefix the field with a dash, such as -key
  - Sort bibliography entry fields
    - You can then add the field order
    - Multiple fields are also separated by a space, such as title author
    - The default list is title, shorttitle, author, year, month, day, journal, booktitle, location, on, publisher, address, series, volume, number, pages, doi, isbn, issn, url, urldate, copyright, category, note, metadata
- Duplicates
  - As this is our focus here, we will discuss it in the next section
- Clean up
  - Remove fields within BibTeX entries
    - You can type in the fields that you would like to remove from your BibTeX entries
  - Remove comments in your BibTeX code
  - Tidy comments: remove whitespace surrounding comments
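To make the sort syntax a bit more concrete, here is a small Python sketch of how a specification such as key -year could be interpreted. This is not bibtex-tidy's own code: the function names parse_sort_spec and sort_entries, and the idea of representing each entry as a plain dictionary of strings, are assumptions made purely for illustration.

```python
def parse_sort_spec(spec: str) -> list[tuple[str, bool]]:
    """Turn a spec like "key -year" into [("key", False), ("year", True)].

    The boolean is True when the field should be sorted in descending order.
    """
    fields = []
    for token in spec.split():
        descending = token.startswith("-")
        fields.append((token.lstrip("-"), descending))
    return fields


def sort_entries(entries: list[dict], spec: str) -> list[dict]:
    """Return a new list of entries ordered according to the sort spec.

    Hypothetical representation: each entry is a dict of string values,
    for example {"key": "smith2020", "year": "2020"}.
    """
    result = list(entries)
    # Python's sort is stable, so sorting by the last field first and the
    # first field last produces the usual multi-field ordering.
    for field, descending in reversed(parse_sort_spec(spec)):
        result.sort(key=lambda entry: entry.get(field, ""), reverse=descending)
    return result
```

With this sketch, sorting by key -year would group entries by key in ascending order and, within the same key, put the most recent year first.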
Use bibtex-tidy to remove duplicates in BibTeX
First, copy the content of your BibTeX (.bib) file and paste it into the editor on the left side.
In the sidebar on the right, scroll down and look for the “DUPLICATES” option.
You can check for duplicates using four criteria:
- Matching Keys: this is the preferred option if you copy and paste BibTeX entries from Google Scholar
- Matching DOIs: this works only if your BibTeX entries contain Digital Object Identifiers (DOIs). Most publishers, such as ACM and IEEE, include the DOI field, but Google Scholar does not
- Similar author and title: use with caution, as two entries can look similar without being true duplicates
- Similar abstracts: use with caution, as it is uncommon to have abstracts in your BibTeX source
Then click the blue “Tidy” button, which removes the duplicates that match the selected criteria.
Copy the result and paste it back into your BibTeX (.bib) file. It is always a good idea to check the LaTeX output for any new errors. You can also check the generated PDF document to make sure the tool did not remove any entries that were not duplicates.
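If you are curious what the “Matching Keys” criterion amounts to, here is a minimal Python sketch that keeps the first entry for each citation key and drops any later entry with the same key. It is not how bibtex-tidy is implemented. The regular expression is a naive stand-in for a real BibTeX parser, keys are compared exactly as written, and the names split_entries and drop_duplicate_keys are made up for this example.

```python
import re

# Matches the start of an entry such as "@article{smith2020,".
# Naive assumption: the text "@type{key," never appears inside a field value.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def split_entries(bibtex: str) -> list[tuple[str, str]]:
    """Return (key, raw_text) pairs, one per entry, in file order."""
    matches = list(ENTRY_START.finditer(bibtex))
    entries = []
    for i, match in enumerate(matches):
        # An entry runs until the next entry starts, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(bibtex)
        entries.append((match.group(2), bibtex[match.start():end].strip()))
    return entries


def drop_duplicate_keys(bibtex: str) -> str:
    """Keep only the first entry seen for each citation key."""
    seen = set()
    kept = []
    for key, raw in split_entries(bibtex):
        if key in seen:
            continue  # a later entry with the same key is treated as a duplicate
        seen.add(key)
        kept.append(raw)
    return "\n\n".join(kept) + "\n"
```

A sketch like this simply discards the later copy, so any fields that appear only in the discarded entry are lost. That is one more reason to review the result before overwriting your .bib file.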
In A Nutshell
How to remove duplicates in BibTeX
- Open the bibtex-tidy web app: https://flamingtempura.github.io/bibtex-tidy/
- Paste your BibTeX source into the editor pane
- Scroll down the right pane to “DUPLICATES”
- Select the duplicate criteria
- Click the blue “Tidy” button at the bottom right
- Copy the BibTeX source from the editor pane and paste it back into your .bib file
References:
- bibtex-tidy website: https://flamingtempura.github.io/bibtex-tidy/
- bibtex-tidy source: https://github.com/FlamingTempura/bibtex-tidy