distillers
go
https://github.com/JohannesKaufmann/html-to-markdown
https://github.com/go-shiori/go-readability
https://github.com/philipjkim/goreadability
https://github.com/markusmobius/go-domdistiller
https://github.com/rubenfonseca/fastimage
https://github.com/rogchap/v8go
python
https://github.com/codelucas/newspaper
js
https://github.com/postlight/mercury-parser
https://github.com/mozilla/readability
https://www.npmjs.com/package/readability-cli
html to md
https://github.com/JohannesKaufmann/html-to-markdown
this one uses goquery?
https://github.com/rsc/tmp/tree/master/md2html
https://github.com/mattn/godown
about
I've been wanting to make a nice web portal on Gemini for a while. The most important tool is a reader mode/DOM distiller, which I have another note detailing. Long story short, I've found two good ones for Go. Both are Go ports of the two best reader mode tools: Firefox's (Readability) and Chrome's old one (DOM Distiller). The Firefox one is more actively developed.
The rough process could go one of two ways:
- Accept the Gemini request, probably via SCGI.
- Download the HTML.
- "Distill" the HTML with one of those libraries.
- Walk the simplified HTML tree and write gemtext directly.
- Send back the gemtext.
Alternatively, an easier but probably worse route:
- Accept the Gemini request, probably via SCGI.
- Download the HTML.
- "Distill" the HTML with one of those libraries.
- Convert the HTML to Markdown using any of several good libraries.
- Convert the Markdown to gemtext using my goldmark renderer.
- Send back the gemtext.