Some code to rescue your Diigo bookmarks

Sometimes, you notice you have some digital housekeeping to do, so you think: easy, I’ll just write a couple of lines of code to do the job. Bad idea – this will give you another project to abandon in no time, and take lots of time into the bargain. But it may have been worth it – here are some Python routines to manage your bookmarks on diigo.com and in your Nextcloud.

Diigo was unique when I first started using it around 2010, after having to leave delicious. It let you browse the web and keep what you saw, and it made you tag and describe what you found and why you thought it was worth keeping. It became an essential tool in managing my knowledge, and so I have been a paying customer for years.

But the service is a dead end, and one that is becoming more and more uncomfortable for a couple of reasons. There are those things that seemed like a good idea back then – putting most of what you read and what you think of it in a public directory – but that was when social media seemed much more of a promise than a nuisance. I no longer think that the value this might have to a few hypothetical followers offsets the privacy risks, and I don’t want my bookmarks to become training data for AIs.

The second, and more important, reason for leaving Diigo is that it has started acting up: bookmarking is out of sync, searches do not turn up what I know I have saved, and the service has been unreliable. So I guess that Diigo is no longer properly maintained, and it has seen no further development for years. As of 2024, it is still running, but I guess they might take it down at any time.

Doing it by the book: What the official Diigo API can do for you

At least, there is hope. Diigo offers you some tools for exporting to an HTML file or a CSV table, and there is an API. The CSV is usable, so the first thing I wrote was code that took that CSV and uploaded it to my Nextcloud. (That is not perfect but it works, although you lose the original creation date of the bookmarks.) But you would still have to delete all your bookmarks on Diigo by hand – or, at least, set them to private. The Diigo site gives you options to edit in bulk, but that would still be a piece of work – I gathered a couple of thousand bookmarks over the years. So let’s try the API to automate that.
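To give an idea of what the CSV step involves, here is a minimal sketch that reads an export and turns each row into a payload for the Nextcloud Bookmarks app. It is not the code from the repository: the column names (title, url, tags, description), the comma as tag separator, and the endpoint mentioned in the comment are assumptions for illustration, so check them against your own export and your Nextcloud instance.

```python
import csv
import io

# Assumed endpoint of the Nextcloud Bookmarks app (verify on your instance):
# POST https://<your-nextcloud>/index.php/apps/bookmarks/public/rest/v2/bookmark


def row_to_payload(row, tag_separator=","):
    """Map one CSV row to a bookmark payload (field names are assumptions)."""
    raw_tags = row.get("tags") or ""
    tags = [t.strip() for t in raw_tags.split(tag_separator) if t.strip()]
    return {
        "url": row["url"],
        "title": row.get("title") or "",
        "description": row.get("description") or "",
        "tags": tags,
    }


def read_export(csv_text):
    """Parse CSV text with a header line; rows without a URL are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_payload(row) for row in reader if row.get("url")]


# Made-up sample data, only to show the shape of the result
sample = 'title,url,tags,description\nExample,https://example.org,"python, api",A test\n'
payloads = read_export(sample)
print(payloads)
```

Each payload could then be sent as JSON to the Bookmarks app with your Nextcloud user name and an app password. The sketch stops short of that so it can be run without a server.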

The Diigo API gives a strong vibe of “I’ll complete that when I get round to it”. Documentation is sketchy, you have to generate an API token to use along with user and password, and it basically offers you only two functions: Get a bookmark, write or overwrite a bookmark.
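The two documented calls might be built roughly like this. This is a hedged sketch: the base URL and the parameter names (key, user, start, count, filter, url, title, shared) follow the public API documentation as far as it goes, but treat them as assumptions and verify them; the function names are hypothetical. The requests are only constructed here, not sent.

```python
import base64
import urllib.parse
import urllib.request

# Assumed from the public API documentation; verify before use
API_URL = "https://secure.diigo.com/api/v2/bookmarks"


def _basic_auth(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def build_get_request(user, password, api_key, start=0, count=10):
    """Request for reading bookmarks (user/password plus API key)."""
    query = urllib.parse.urlencode(
        {"key": api_key, "user": user, "start": start, "count": count, "filter": "all"}
    )
    req = urllib.request.Request(f"{API_URL}?{query}", method="GET")
    req.add_header("Authorization", _basic_auth(user, password))
    return req


def build_save_request(user, password, api_key, url, title, shared="no"):
    """Request for writing or overwriting one bookmark; shared='no' means private."""
    data = urllib.parse.urlencode(
        {"key": api_key, "url": url, "title": title, "shared": shared}
    ).encode()
    req = urllib.request.Request(API_URL, data=data, method="POST")
    req.add_header("Authorization", _basic_auth(user, password))
    return req


# Placeholder credentials, for illustration only
get_req = build_get_request("alice", "secret", "my-api-key", count=5)
save_req = build_save_request("alice", "secret", "my-api-key", "https://example.org", "Example")
print(get_req.full_url)
```

Passing one of these objects to urllib.request.urlopen() would actually send it.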

Failed to delete bookmark 'Nowcasting fatal COVID‐19 infections on a regional level in Germany - Schneble - - Biometrical Journal - Wiley Online Library' (https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000143). Status code: 400 API Message: {

After some experimentation, I found a third, undocumented function to delete bookmarks with a DELETE call, which was the whole point of using the API. This sort of works, but these API calls are heavily rate-limited. After a dozen deletes, you are forced to wait some minutes, then you may try to delete another 10 bookmarks until the API stops you once again. And even with precautions for that, my first scripts kept on crashing.
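One way to live with that kind of rate limit is to retry each delete after a pause instead of giving up. The sketch below shows the idea only; delete_one is a hypothetical callable standing in for the actual DELETE call, and the pause length and retry count are guesses, not values from my scripts.

```python
import time


def delete_in_batches(urls, delete_one, pause_seconds=300, max_retries=5, sleep=time.sleep):
    """Delete bookmarks one by one; after a refusal, wait and retry the same one.

    delete_one(url) is a hypothetical callable: True on success, False when
    the API refuses (for example because of rate limiting).
    Returns the URLs that could not be deleted after max_retries attempts.
    """
    failed = []
    for url in urls:
        for attempt in range(max_retries):
            try:
                if delete_one(url):
                    break
            except Exception as exc:  # keep going instead of crashing the whole run
                print(f"Error deleting {url}: {exc}")
            if attempt < max_retries - 1:
                sleep(pause_seconds)
        else:
            # the inner loop ended without a successful delete
            failed.append(url)
    return failed


# Demo with a fake delete function: every third call is "rate-limited"
calls = {"n": 0}


def fake_delete(url):
    calls["n"] += 1
    return calls["n"] % 3 != 0


waits = []  # record the pauses instead of really sleeping
left = delete_in_batches(["a", "b", "c", "d"], fake_delete, sleep=waits.append)
print(left, waits)
```

Passing the sleep function in as a parameter makes the loop easy to try out without waiting for minutes.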

Surely, there’s a better way?

Well, there is a better way, at least for some of the things you may need to do. The Diigo web page isn’t limited to those two-and-a-half methods of the official API. It uses additional endpoints for getting the actual work done, and an external script can use them, too.

Browser tools, network tab; showing the diigo.com page using a /interact_api/load_user_items endpoint to get bookmarks

This approach makes getting things to work a bit easier and a bit more difficult at the same time. Rather than using an API token to authenticate external calls, the website uses session cookies. So you will have to authenticate your account in a browser window, solving one of those annoying Click-all-traffic-lights-while-we-move-them-round CAPTCHAs.

But after that, you’re good – the script saves the cookies from that session, and reuses them to authenticate calls to find, list, filter, and modify bookmarks. This method – going through what I call the Interaction API – is much faster, and there is no rate limitation either.
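The save-and-reuse part can be sketched with the standard library alone. This is an illustration, not the repository code: the cookie name and value are placeholders, the host name www.diigo.com is an assumption, and the parameters that /interact_api/load_user_items expects are not shown because they are not documented. The request is only built, not sent.

```python
import json
import os
import tempfile
import urllib.request


def save_cookies(cookies, path):
    """Store the session cookies (a name -> value dict) as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh)


def load_cookies(path):
    """Read the cookies back from the JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_session_request(url, cookies):
    """Attach the saved cookies to a request as a single Cookie header."""
    req = urllib.request.Request(url)
    req.add_header("Cookie", "; ".join(f"{k}={v}" for k, v in cookies.items()))
    return req


path = os.path.join(tempfile.mkdtemp(), "diigo_cookies.json")
save_cookies({"session_id": "abc123"}, path)  # placeholder name and value
req = build_session_request(
    "https://www.diigo.com/interact_api/load_user_items",  # host is an assumption
    load_cookies(path),
)
print(req.get_header("Cookie"))
```

In a real run, the dict would be filled from the browser session after the login and the CAPTCHA, and the file would be kept somewhere private, since these cookies are as good as a password.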

You’re not allowed to do things fast, though

The “Write Bookmark” method requires the code to spoof a browser, otherwise it doesn’t work. That can easily be done in the header of the API calls, but it is not enough for three more API calls that would speed things up enormously: delete_b, mark_readed, and convert_mode.
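Spoofing a browser in the header can be as simple as the following sketch. Which headers the Diigo server actually checks is unknown; the set below (User-Agent, Accept, X-Requested-With, Referer) and the concrete values are assumptions, and the helper name is hypothetical.

```python
import urllib.request

# Header values a desktop browser might send; all of them are assumptions
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Accept": "application/json, text/plain, */*",
    "X-Requested-With": "XMLHttpRequest",
}


def with_browser_headers(req, referer="https://www.diigo.com/"):
    """Add browser-like headers to an existing urllib request and return it."""
    for name, value in BROWSER_HEADERS.items():
        req.add_header(name, value)
    req.add_header("Referer", referer)
    return req


req = with_browser_headers(urllib.request.Request("https://www.diigo.com/"))
print(req.header_items())
```

Note that urllib stores header names in a normalized form ("User-agent"), which is why the printed names differ slightly from the ones in the dict.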

These calls return a “403 Forbidden” message, and I cannot get them to work. Which is a shame: The website uses them to bulk-edit and bulk-delete, but as long as I cannot use them, I will have to send 100 single API calls instead of one call with a list of 100 bookmarks.

My guess is that these methods only work on the Diigo server itself. I’ve asked Diigo; when I get a reply, I will try to post an update.

What the code does…

The main.py script features a simple command-line menu system to allow you to do some work:

  • Set all Diigo bookmarks to private
  • Export, delete, and re-import to Nextcloud
  • Export and import Nextcloud bookmarks in the same CSV format that Diigo uses

…and I’ll still have to do the impressive stuff

We live in the time of AI, and of course, I intended to have an AI language model do some maintenance work for me: Check the bookmarks, augment the description with an AI-generated summary, suggest tags. (Tags! Don’t get me started on tags! Let’s just note that humans are not very consistent in tagging – a sizeable portion of the tags in my Diigo file has been used only once.) Maybe, some day, write an embedding for that summary into a vector database to allow you to query your bookmarks with a chatbot.
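The claim about one-off tags is easy to check once the bookmarks are loaded. A small sketch, assuming each bookmark has already been reduced to a list of tag strings (the sample data is made up, and the function name is hypothetical):

```python
from collections import Counter


def singleton_tags(bookmark_tags):
    """Return the tags used exactly once and their share of all distinct tags."""
    counts = Counter(tag.lower() for tags in bookmark_tags for tag in tags)
    singles = sorted(t for t, n in counts.items() if n == 1)
    share = len(singles) / len(counts) if counts else 0.0
    return singles, share


# Made-up example: three bookmarks with two tags each
sample = [["python", "api"], ["Python", "diigo"], ["api", "nextcloud"]]
singles, share = singleton_tags(sample)
print(singles, f"{share:.0%}")
```

Lower-casing the tags before counting merges variants like "Python" and "python", which is one of the inconsistencies that would otherwise inflate the number of one-off tags.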

But this will have to wait.
