Turns out losslessly removing a PDF page isn’t as easy as you’d think. You don’t want to use your OS’s “print to file” feature, as you might initially be inclined to. That can strip out valuable text data and make the PDF harder to search, and depending on how the fonts are embedded, it can also change the fonts.
I looked into a few tools for doing this. First was mupdf’s mutool, which has a
“clean” mode that takes a range of pages and returns a cleaned-up PDF containing
just that page range. Simple enough, or so I thought. It wound up stripping out
the PDF’s index (the outline you can view in most PDF viewers). I don’t know if
that’s intentional, but I searched the documentation and bug reports and found
nothing on it. So I looked into other tools. I found pdftk, which seemed very
similar with its “cat” mode. It also stripped out the index, BUT it has a mode to
dump the index to a file and another to apply said index to a PDF. So it works,
but it’s a bit of a hack.
Here’s how to remove the last page from a 400-page document with pdftk:

    pdftk original.pdf cat 1-399 output tmp.pdf
    pdftk original.pdf dump_data output index.info
    pdftk tmp.pdf update_info index.info output out.pdf
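If you don’t want to hard-code the page count, the same three steps can be wrapped in a small shell function. This is only a rough sketch: the function name remove_last_page and the temp file layout are my own invention, and it assumes that pdftk is on your PATH and that its dump_data output contains a “NumberOfPages: N” line.

```shell
#!/bin/sh
# Sketch: drop the last page of a PDF but keep its index (outline).
# remove_last_page is a hypothetical helper name, not part of pdftk.
# Assumes dump_data output includes a line like "NumberOfPages: 400".

remove_last_page() {
    src=$1
    dst=$2
    tmpdir=$(mktemp -d) || return 1
    status=0

    # 1. Save the metadata (page count, index) to a file.
    # 2. Read the page count back out of that file.
    # 3. Keep pages 1..N-1.
    # 4. Re-apply the saved metadata to the trimmed PDF.
    pdftk "$src" dump_data output "$tmpdir/index.info" &&
    pages=$(sed -n 's/^NumberOfPages: *//p' "$tmpdir/index.info") &&
    [ "${pages:-0}" -gt 1 ] &&
    pdftk "$src" cat "1-$((pages - 1))" output "$tmpdir/tmp.pdf" &&
    pdftk "$tmpdir/tmp.pdf" update_info "$tmpdir/index.info" output "$dst" ||
    status=1

    rm -rf "$tmpdir"
    return $status
}
```

Source the file and call it as remove_last_page original.pdf out.pdf. It refuses to run on a single-page document, since there would be nothing left.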