Organizing and Combining LaTeX Acronyms/Glossary Entries with Python (glossaries package)

Introduction

Problem: you write your documents in LaTeX and use the glossaries package to define acronym and glossary entries. You organize the entries into .tex files for each project, e.g., "Acronyms.tex" and "Glossary.tex". However, you've ended up with multiple versions of these .tex files from multiple projects, and now you need ALL the unique acronyms in one file for a new project. How do you go about doing that? You could do it manually, but that gets tedious if there are a lot of differences between the files. Instead, you could use Python to automate the task. As a bonus, the entries can be organized along the way.

Set-up

I'm going to use acronyms in this project, but they could just as well be glossary entries, since the glossaries package handles both in nearly the same way.

First I'll set up some example Acronyms.tex files. Note that they are unsorted, which is something we can improve on later.
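As a stand-in for the example files, here is a minimal sketch that writes two small, unsorted, partly overlapping acronym files to a temporary folder. The file names, the `cpu` and `led` entries, and the `write_sample_files` helper are all made up for illustration.

```python
from pathlib import Path
import tempfile

# Hypothetical sample content: unsorted, with some overlap between projects.
SAMPLE_FILES = {
    "Acronyms_projectA.tex": r"""
\newabbreviation[longplural="Ruminant Under Test"]{rut}{RUT}{Ruminant Under Test}
\newabbreviation{abba}{ABBA}{Björn \& Benny, Agnetha \& Frida}
\newabbreviation{cpu}{CPU}{Central Processing Unit}
""".lstrip(),
    "Acronyms_projectB.tex": r"""
\newabbreviation{led}{LED}{Light-Emitting Diode}
\newabbreviation{cpu}{CPU}{Central Processing Unit}
\newabbreviation{abba}{ABBA}{Björn \& Benny, Agnetha \& Frida}
""".lstrip(),
}


def write_sample_files(folder):
    """Write every sample file into `folder` and return the created paths."""
    folder = Path(folder)
    paths = []
    for name, text in SAMPLE_FILES.items():
        path = folder / name
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for path in write_sample_files(tmp):
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        print(path.name, line_count, "entries")
```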

Match Pattern

We need a pattern that captures the two different forms of acronym entry for the glossaries package. The first is the standard form, where no optional parameters are set:

\newabbreviation{abba}{ABBA}{Björn \& Benny, Agnetha \& Frida}

The other has optional parameters:

\newabbreviation[longplural="Ruminant Under Test"]{rut}{RUT}{Ruminant Under Test}

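In the pattern defined below, the optional portion is covered by (\[.*?\])?, where the escaped brackets match a literal [ and ] and the trailing ? makes the whole group optional.

The other three parameters (entry ID, short form, long form) are covered by three \{(.*?)\} groups.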
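Putting those pieces together, the pattern might look something like the sketch below. The name `ENTRY_PATTERN` is my own. Note that the lazy `.*?` stops at the first closing brace, so this version assumes no nested braces inside any of the three arguments.

```python
import re

# One optional [..] block followed by three {..} arguments.
ENTRY_PATTERN = re.compile(
    r"\\newabbreviation"
    r"(\[.*?\])?"   # optional parameters, e.g. [longplural=...]
    r"\{(.*?)\}"    # entry ID
    r"\{(.*?)\}"    # short form
    r"\{(.*?)\}"    # long form
)

plain = r"\newabbreviation{abba}{ABBA}{Björn \& Benny, Agnetha \& Frida}"
with_options = (
    r'\newabbreviation[longplural="Ruminant Under Test"]'
    r"{rut}{RUT}{Ruminant Under Test}"
)

# The first group is None when the entry has no optional parameters.
print(ENTRY_PATTERN.match(plain).groups())
print(ENTRY_PATTERN.match(with_options).groups())
```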

Run pattern against file content
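A rough sketch of this step: loop over each file's text, collect every match with `findall`, and remember which file each entry came from. The file contents are inlined here as hypothetical strings. In practice they would be read from disk, e.g. with `Path.read_text`. The `extract_entries` helper and the dictionary keys are my own naming.

```python
import re

ENTRY_PATTERN = re.compile(
    r"\\newabbreviation(\[.*?\])?\{(.*?)\}\{(.*?)\}\{(.*?)\}"
)

# Hypothetical file contents keyed by file name.
FILE_CONTENTS = {
    "Acronyms_projectA.tex": r"""
\newabbreviation[longplural="Ruminant Under Test"]{rut}{RUT}{Ruminant Under Test}
\newabbreviation{abba}{ABBA}{Björn \& Benny, Agnetha \& Frida}
""",
    "Acronyms_projectB.tex": r"""
\newabbreviation{cpu}{CPU}{Central Processing Unit}
\newabbreviation{abba}{ABBA}{Björn \& Benny, Agnetha \& Frida}
""",
}


def extract_entries(file_contents):
    """Return one dict per matched entry, tagged with its source file."""
    entries = []
    for source, text in file_contents.items():
        # findall yields one tuple per match; a missing optional group is "".
        for options, entry_id, short, long_form in ENTRY_PATTERN.findall(text):
            entries.append({
                "source": source,
                "options": options,
                "id": entry_id,
                "short": short,
                "long": long_form,
            })
    return entries


entries = extract_entries(FILE_CONTENTS)
for entry in entries:
    print(entry["source"], entry["id"], entry["short"])
```

One caveat with this approach: a commented-out entry (a line starting with %) would still match, so such lines would need to be filtered out first if the files contain any.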

Dataframize

I load the matches into a pandas DataFrame, because I like pandas.
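Something along these lines would do it, assuming the matches are already available as a list of dicts. The column names and sample rows are illustrative. Dropping rows that are identical across projects at this point is my own assumption about what "unique" should mean.

```python
import pandas as pd

# Hypothetical matches from two files; the abba entry appears in both.
entries = [
    {"id": "rut", "short": "RUT", "long": "Ruminant Under Test",
     "options": '[longplural="Ruminant Under Test"]', "source": "Acronyms_projectA.tex"},
    {"id": "abba", "short": "ABBA", "long": r"Björn \& Benny, Agnetha \& Frida",
     "options": "", "source": "Acronyms_projectA.tex"},
    {"id": "cpu", "short": "CPU", "long": "Central Processing Unit",
     "options": "", "source": "Acronyms_projectB.tex"},
    {"id": "abba", "short": "ABBA", "long": r"Björn \& Benny, Agnetha \& Frida",
     "options": "", "source": "Acronyms_projectB.tex"},
]

df = pd.DataFrame(entries, columns=["id", "short", "long", "options", "source"])

# Keep one copy of entries that are identical everywhere except the source file.
df = (
    df.drop_duplicates(subset=["id", "short", "long", "options"])
      .sort_values("id")
      .reset_index(drop=True)
)

print(df[["id", "short", "long"]])
```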

Duplicate Handling

When you combine acronym entries from different documents, you'll probably find at some point that some have the same ID or the same long form. Below I identify these and set up a flag for when we generate the final .tex file. Flagging them makes it easy to manually correct the file once it's generated, which I've found works better than trying to automate a correction (e.g., appending "2" to a duplicate entry ID).
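One way to set up such a flag, sketched with `DataFrame.duplicated` and `keep=False` so that every member of a clashing group is marked, not just the later ones. The column names (`dup_id`, `dup_long`, `flag`) and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical entries: two share an ID, two others share a long form.
df = pd.DataFrame([
    {"id": "cpu", "short": "CPU", "long": "Central Processing Unit"},
    {"id": "cpu", "short": "CPU", "long": "Central Processor Unit"},
    {"id": "rut", "short": "RUT", "long": "Ruminant Under Test"},
    {"id": "ruminant", "short": "RuT", "long": "Ruminant Under Test"},
    {"id": "abba", "short": "ABBA", "long": r"Björn \& Benny, Agnetha \& Frida"},
])

# keep=False marks all rows in a duplicated group.
df["dup_id"] = df.duplicated(subset="id", keep=False)
df["dup_long"] = df.duplicated(subset="long", keep=False)

# A single flag to consult when writing the final file.
df["flag"] = df["dup_id"] | df["dup_long"]

print(df[["id", "long", "dup_id", "dup_long", "flag"]])
```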

Create Organized Content, Write It

I take the dataframe and use it as a base for building a string which will be the contents of the final Acronyms.tex file.

I can organize the entries while I'm at it. The first letter of the entry ID is used to alphabetize the entries, and a large comment block marks each letter grouping in the file.
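Here is a rough sketch of that last step. To keep it dependency-free it starts from a list of dicts (for a dataframe, `df.to_dict("records")` gives the same shape). The `build_tex` helper, the banner style, and the wording of the duplicate comment are my own choices, and entry IDs are assumed to be non-empty.

```python
from itertools import groupby
from pathlib import Path
import tempfile

# Hypothetical flagged entries, deliberately unsorted.
entries = [
    {"options": '[longplural="Ruminant Under Test"]', "id": "rut",
     "short": "RUT", "long": "Ruminant Under Test", "flag": False},
    {"options": "", "id": "cpu", "short": "CPU",
     "long": "Central Processing Unit", "flag": True},
    {"options": "", "id": "abba", "short": "ABBA",
     "long": r"Björn \& Benny, Agnetha \& Frida", "flag": False},
    {"options": "", "id": "cpu", "short": "CPU",
     "long": "Central Processor Unit", "flag": True},
]


def build_tex(entries):
    """Return the file contents: entries sorted by ID, grouped by first letter."""
    ordered = sorted(entries, key=lambda e: e["id"].lower())
    banner = "%" * 60
    lines = []
    for letter, group in groupby(ordered, key=lambda e: e["id"][0].upper()):
        # Large comment marking the start of each letter group.
        lines += [banner, f"% {letter}", banner]
        for e in group:
            line = (
                f"\\newabbreviation{e['options']}"
                f"{{{e['id']}}}{{{e['short']}}}{{{e['long']}}}"
            )
            if e["flag"]:
                line += "  % DUPLICATE? check ID and long form"
            lines.append(line)
        lines.append("")
    return "\n".join(lines)


content = build_tex(entries)

# Written to a temporary folder here; point this at the real project instead.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "Acronyms.tex"
    out_path.write_text(content, encoding="utf-8")
    print(out_path.read_text(encoding="utf-8"))
```

Because the flag ends up as a trailing LaTeX comment, the generated file still compiles as far as comments go, and searching for "DUPLICATE" finds every entry that needs a manual fix.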

Conclusions

I can take this Acronyms.tex file, plop it into my Overleaf+LaTeX project, and optimize it from there. This script becomes especially handy when you want to combine several large (200+ entry) acronym lists that are floating around.

This little project also highlights one of the benefits of building documents in LaTeX: the inputs are plain text, so you can automate their manipulation.