kota's memex

Core Concepts

Purpose

I'm not particularly interested in creating a general purpose operating system. But that's a very broad term.

The operating system on the nintendo ds is purpose built for playing ds games. For this purpose it works well and has more or less the features you would want without anything you wouldn't want. On my ereader the situation is less ideal, but still clearly purpose built for reading.

My interest is in making an operating system for journaling, art, and research. It should feel like a peaceful, playful, simple, and coherent system without any of the distractions, inconsistencies ... and practicality that come with more general systems. A cozy system.

It will not be able to run steam, zoom, or practically any currently existing software. It will not have a web browser. However, it will have tools and systems for photo editing, music creation, writing, reading, playing and making games, and many other such activities. You should be excited to power up this system on a rainy evening with a mug of tea by your side. All components and programs should be designed cohesively to work well with each other. A unified system.

Storage system

I'm calling it a "storage" system instead of a "file" system. This is by far the largest and most experimental feature I have planned. Instead of a traditional filesystem we will use a relational database. This allows for many more interesting usage paradigms and, in many cases, large performance increases.

Traditional filesystem issues

A traditional filesystem functions similarly to a key-value database. The "key" is the filepath, which is a string containing path separators / to create a hierarchical structure. Each key must be unique: you can have files /images/a and /video/a, but not two files both named /images/a.

The value pointed to by the key is the raw binary blob representing the file's contents as well as a tiny bit of metadata: size, owner, group, permission mode, modification time, and perhaps creation time if you're lucky. There are many things not present here like the "filetype" or for a video the "length in seconds" or "camera lens", or anything else for that matter.
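To make that list concrete, here is a small sketch of asking a filesystem for everything it knows about a file. It uses Python's standard os.stat; the describe helper and the sample file are made up for illustration.

```python
import os
import stat
import tempfile


def describe(path: str) -> dict:
    """Collect the metadata a POSIX-style filesystem records for a path."""
    info = os.stat(path)
    return {
        "size": info.st_size,
        "owner": info.st_uid,
        "group": info.st_gid,
        "mode": stat.filemode(info.st_mode),
        "modified": info.st_mtime,
    }


# Write a PNG signature into a file named .jpg; the filesystem does not notice.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
    f.write(b"\x89PNG\r\n\x1a\n")
    sample_path = f.name

sample = describe(sample_path)
os.remove(sample_path)
print(sample)
```

Nothing in the result says what kind of data the file holds.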

Unknown filetypes

The biggest missing feature is the type information. Without the type information we can only guess, using part of the filename or by reading part of the file, at what sort of data is contained within.

Why should the filetype even be part of the name anyway? Shouldn't it be a separate field we can read from, like the created time? In other words not only is the UI leaking irrelevant technical details, but it's leaking technical details which can be wrong. This leads to confusing situations for non-technical users in which a file's name can change from IMG.jpg to IMG.png, but the contents of the file are still encoded as a jpeg.
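A rough sketch of the two guessing strategies and how they can disagree. The magic byte prefixes are the real signatures for these formats; the function names and the sample bytes are made up for illustration.

```python
import os
from typing import Optional

# Leading "magic" bytes for a few image formats.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def type_from_name(name: str) -> str:
    """Guess from the extension, which is just part of a user-editable string."""
    ext = os.path.splitext(name)[1].lower().lstrip(".")
    return {"jpg": "jpeg"}.get(ext, ext)


def type_from_content(data: bytes) -> Optional[str]:
    """Guess by reading the first few bytes of the file."""
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    return None


# A jpeg whose name was changed to IMG.png: the two guesses now disagree.
jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16
by_name = type_from_name("IMG.png")
by_content = type_from_content(jpeg_bytes)
print(by_name, by_content)
```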

How do people actually use their computers? I often want to "see all the photos I took in New York last year" or "see the videos recorded on my sony camera". Even a task as simple as "getting a list of all the photos I took this year" is incredibly difficult with a filesystem unless you've happened to diligently move every photo into folders by year.

If you haven't done that, you need a program which will look at every file on your computer, read part of it to determine the file type, and then, if the program supports that image format's metadata, possibly extract the time the photo was taken.

Every single image viewer needs to include code for parsing jpeg, png, tiff, bmp, webp, gif, heic, avif, and many other formats. Libraries written in dozens of different languages do help with this matter, but ultimately, every single time you wish to browse your photo collection, your image viewer needs to read at least part of each file and call some library to determine the file type or, worse, make a guess based on the name. Only then can it hope that the format your image is stored in happens to contain the correct metadata. This is incredibly slow and error prone, and is a large part of why people feel the need to install and import their photos into large bespoke "photo managing software". The operating systems have failed them.
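For contrast, here is a sketch of what "the photos I took in New York last year" could look like if the type and metadata were ordinary columns. It uses sqlite purely as a stand-in; the table names, column names, and rows are all hypothetical and not a proposed schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, name TEXT);
    CREATE TABLE photo_meta (
        item_id INTEGER REFERENCES item(id),
        taken_at TEXT,   -- ISO 8601 timestamp
        place TEXT,
        camera TEXT
    );
""")
db.executemany("INSERT INTO item VALUES (?, ?, ?)", [
    (1, "photo", "skyline"),
    (2, "photo", "garden"),
    (3, "video", "subway"),
])
db.executemany("INSERT INTO photo_meta VALUES (?, ?, ?, ?)", [
    (1, "2023-06-02T18:30:00", "New York", "sony"),
    (2, "2024-03-11T09:00:00", "Portland", "phone"),
])

# No file is opened and no format is parsed; the type is just a column.
rows = db.execute("""
    SELECT item.name FROM item
    JOIN photo_meta ON photo_meta.item_id = item.id
    WHERE item.kind = 'photo'
      AND photo_meta.place = 'New York'
      AND photo_meta.taken_at BETWEEN '2023-01-01' AND '2023-12-31T23:59:59'
""").fetchall()
print(rows)
```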

No gods, no masters

A filesystem is a hierarchy. When you organize your photos, documents, books, music or anything else you must first devise an artificial structure, filing them by year, by type of media, or perhaps by activity or location. Each item can only be stored in one place, unless duplicates are created. Are your nature videos with your nature photos? Or are they in separate video and photo folders?

The human mind does not work this way. It operates by association. With one item in your mind you can imagine related items as an intricate web of associative thoughts.

"We could not suppose that there is a house besides the particular houses."

  • Aristotle

Our world is not fundamentally hierarchical in nature. Everything in nature exists in a relationship with all else around it. Consider the statement "The Mississippi is a river". We cannot simply express this mathematically as Mississippi = River. If we could, the terms River and Mississippi would be interchangeable in all cases. We are conveying a more complex relationship. We are saying that Mississippi belongs to the category of river. We could use the set membership symbol Mississippi ∈ River, but even still, there's a bit more to consider. There is no such thing as "A River" besides all particular rivers. Our understanding of a river is cobbled together from common aspects of individual rivers: Fresh water, flowing to another body of water, fed by glaciers, precipitation, and aquifers. This abstract concept has no concrete reality and does not exist outside of the particulars. In this view, "River" comes about from Mississippi (and others like it).

ECS Inspirations

For better or for worse a computer is a precise and exacting machine which cannot perfectly represent our world, but we can certainly do a bit better than enforcing a strict and arbitrary hierarchy upon everything.

The Entity Component System paradigm is a programming pattern originally created for use in games, as games are often complex interconnected systems with very high performance requirements. Inheritance-based models (especially object-oriented design) proved unsuitable in some cases. There are two main goals achieved with this pattern:

  1. Provide a way to represent complex relationships and interactions while remaining flexible to new and distinctive additions.
  2. Arrange data in memory to take as much advantage of the CPU cache as possible.

In the ECS pattern an Entity is a unique identifier, usually just a long number. To give these entities meaning we associate pieces of data, called Components, with these identifiers. Many Components can be composed together to represent complex entities. The final piece of the puzzle is Systems. A System processes a selection of components.

TODO: Code example
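A minimal sketch of the three pieces, assuming a toy World class of my own invention. It only illustrates the first goal (flexible composition); a dict of dicts does nothing for cache-friendly memory layout, which real ECS implementations handle with packed arrays.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Velocity:
    dx: float
    dy: float


class World:
    def __init__(self):
        self._ids = count(1)
        self.components = {}  # component type -> {entity id -> component}

    def spawn(self, *components) -> int:
        """Create an entity (just a number) and attach components to it."""
        entity = next(self._ids)
        for c in components:
            self.components.setdefault(type(c), {})[entity] = c
        return entity

    def query(self, *types):
        """Yield (entity, components...) for entities having every requested type."""
        stores = [self.components.get(t, {}) for t in types]
        for entity in stores[0]:
            if all(entity in s for s in stores):
                yield (entity, *(s[entity] for s in stores))


def movement_system(world: World, dt: float) -> None:
    """A System: processes every entity that has both Position and Velocity."""
    for _, pos, vel in world.query(Position, Velocity):
        pos.x += vel.dx * dt
        pos.y += vel.dy * dt


world = World()
bird = world.spawn(Position(0.0, 0.0), Velocity(2.0, 1.0))
rock = world.spawn(Position(5.0, 5.0))  # no Velocity, so the system skips it
movement_system(world, dt=0.5)
```

Nothing about the rock says it "is not a moving thing"; it simply lacks the component the movement system selects on.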

https://www.scattered-thoughts.net/writing/sql-needed-structure/

Nice article about why your database should be capable of returning structured data.

Application platform

There should be a well designed "application" platform that allows for displaying text, bitmaps, differing fonts and weights and layouts. All programs should be built on top of this interface.

Let's consider some common methods for building a new application on Unix: web apps, TUI apps, and drawing directly with the windowing system (Wayland or X11), often via a GUI toolkit like GTK or Qt. Each of these has issues and does not align with the goals of our operating system.

Issues with drawing to a surface directly

The third category, native programs, has the best performance and nearly unlimited potential. You are given a surface upon which to draw anything you please. Maybe you're creating a game and want to show a custom mouse cursor. You can do that. Perhaps you have some crazy new vision of a 3D UI for your application; it can be done. But there are many drawbacks to this approach.

Creating a simple application with a few basic options is exceedingly complex; you need to define how a button looks and works and specify that the mouse should be shown. You need to create concepts for things like text wrapping and displaying images. Each program might decide separately what a button is and how it works. This makes each program vastly more difficult to learn as you cannot immediately apply your shared knowledge of the system as a whole. It's also pretty terrible for accessibility: will this particular program have a high contrast mode? Can I increase the font size? This cohesion issue became a large problem on early graphical operating systems, and in 1987 IBM published guidelines (Common User Access) in an attempt to address the situation.

In WordPerfect, the command to open a file was F7, 3. In Lotus 1-2-3, a file was opened with / (to open the menus), F (for File), R (for Retrieve). In Microsoft Word, a file was opened with Esc (to open the menus), T (for Transfer), L (for Load). In WordStar, it was ^KD (to get to the Opening Menu), followed by D.

The solutions to this have been published guidelines and the creation of widget libraries like GTK and Qt. These libraries make it much easier to create simple applications and, more importantly, they allow a wide collection of different programs to have shared functionality and aesthetic. However, they have limitations: each program needs to decide to use a given toolkit. On a normal Linux distro you may have several programs written with GTK, others with Qt, and still others with more niche libraries. This leads to an inconsistent experience lacking both design and usability cohesion.

I feel none of these toolkits go far enough to ensure the experience is unified and customizable for the user. The user should be able to globally configure hotkeys for saving, closing, splitting into tabs, or any other common operation. Each program should just know that the user pressed their "save key" rather than which key they pressed in particular. Similarly, programs shouldn't have any say over whether the user is using a dark theme, a light theme, or has decided to make everything green. They should only be able to use "foreground color", "background color", "highlight color", and so on.
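A rough sketch of the "save key" idea, assuming a hypothetical user-owned keymap that the system consults before a program hears anything. The Program class, the dispatch function, and the key names are all made up for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical system-wide mapping from physical keys to named actions.
# It belongs to the user, not to any program.
user_keymap: Dict[str, str] = {
    "ctrl+s": "save",
    "ctrl+w": "close",
}


class Program:
    """A program registers handlers for actions and never sees raw keys."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[], str]] = {}

    def on(self, action: str, handler: Callable[[], str]) -> None:
        self.handlers[action] = handler


def dispatch(program: Program, key: str) -> Optional[str]:
    """Translate a key press into an action, then call the program's handler."""
    action = user_keymap.get(key)
    handler = program.handlers.get(action) if action else None
    return handler() if handler else None


editor = Program()
editor.on("save", lambda: "saved")

before = dispatch(editor, "ctrl+s")

# The user rebinds save; the program is untouched.
del user_keymap["ctrl+s"]
user_keymap["f2"] = "save"
after = dispatch(editor, "f2")
stale = dispatch(editor, "ctrl+s")
```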

Issues with terminal user interfaces

Terminal programs have some structure; you can expect to select text and some universal hotkeys will work. You can exit out of a terminal application and start another one. You can use them over remote connections. I've spent many years using and creating terminal programs because of these useful features, but they come with huge drawbacks.

In the terminal you are strictly limited to monospace text for your program displays, scrolling within an application like vim can only be line by line, getting the mouse position is tricky, displaying images is only possible with obscure hacks, and there are even problems with certain key combinations (Control + Shift) which are simply not possible to bind. Not to mention the abysmal performance of using a strange ancient text protocol for building pseudo-graphical applications.

Issues with web apps

Web browsers have a larger base structure, but it is one based on a document sharing format and an inheritance-based one at that. There are a good number of universal features: you can almost always save an image if you can view it (sadly this is changing), the URL bar gives some information about the state and information being viewed, and a user can open up a different resource / application either in the same tab, a new tab, or even a new window.

But the web has a whole other host of problems as an application framework. It's vastly too complex, slow by way of needing to transfer the entire program's code nearly every time you load it, and yet still lacks many basic UI widgets needed to build most useful program interfaces. CSS is effectively required because browsers do a poor job of styling content. It's pretty cool that each person's site can have a different aesthetic, but for an application platform it's very frustrating that each webapp you use has a different control and UI scheme, each one unpredictable and feeling out of place compared with the last. It brings back those same issues dealt with in the 80s, but with added performance and correctness issues that come from building all of this on top of a document sharing protocol.

UI Building blocks

I've already hinted at what our solution should look like: a fast binary format for specifying a UI. Each program will specify its Content, Styling, and Layout.

Content

Content will be specified in terms of basic elements: buttons, text fields, paragraphs, images, headings, times, locations, etc. This information is crucial for ensuring consistent behavior, making application development quicker and more focused on the purpose of the application, and also allowing for the development of a useful and powerful screen reader.

Styling

The styling will be dramatically more limited from the application's point of view. Instead of being able to specify exact colors, fonts, and shapes you will select between colors such as "foreground color" or "secondary color", fonts such as "monospace", "title", and "body", and listen for key presses such as "select", "save", "close", "previous selection", etc. The operating system will have good defaults for this and allow users to switch to a dark theme, a theme appropriate for an e-ink screen, or larger and darker fonts for limited eyesight.
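A small sketch of how role-based styling might resolve, assuming a hypothetical table of themes owned by the user. The theme names, hex values, and font sizes are placeholders, not a proposed palette.

```python
# Hypothetical user-owned themes: semantic roles mapped to concrete values.
THEMES = {
    "light": {
        "colors": {"foreground": "#222222", "background": "#fafafa", "secondary": "#666666"},
        "fonts": {"body": ("serif", 12), "title": ("serif", 20), "monospace": ("mono", 12)},
    },
    "low-vision": {
        "colors": {"foreground": "#000000", "background": "#ffffff", "secondary": "#000000"},
        "fonts": {"body": ("serif", 20), "title": ("serif", 32), "monospace": ("mono", 20)},
    },
}


def resolve(theme_name: str, color_role: str, font_role: str):
    """Turn the roles a program asked for into concrete values for one theme."""
    theme = THEMES[theme_name]
    return theme["colors"][color_role], theme["fonts"][font_role]


# The program's request never changes; only the user's theme does.
request = ("foreground", "title")
light = resolve("light", *request)
low_vision = resolve("low-vision", *request)
```

The program only ever holds the request tuple, so it has no way to hard-code a color or font size.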

Layout

The arrangement of elements within a window should be separately defined from the base styling of these elements (as is the case in CSS). A series of layout primitives can be composed together to create just about any complex UI needed: Stack, Center, Cluster, Sidebar, Switcher, Cover, Grid, Frame, Reel, Imposter, Icon.

TODO: Flesh this out a bit more.

Customizability

It should feel like your computer, not a computer.

TODO: Expand on ways things can be customized due to the unified application platform.

A different user model

With the recent invention of "personal computers" it's become quite common for a single computer to be used primarily by a single person. There are still exceptions to this, but the design of this operating system will be heavily leaning on the "personal" side of things, similar, I suppose, to how phones are typically used.

There will still be support for multiple users, mostly in the context of networking, but it will also be possible to have several accounts on a single computer. Each user will have their own public/private keypair, similar to ssh keys. These will be generated automatically at the creation of the account. The keys will be used to support a TOFU (trust on first use) style networking system. If you and your friend are sitting at a cafe together you should be able to see their computer, add it (accepting the key the first time), and then share with them some photos.
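A sketch of the trust-on-first-use bookkeeping, assuming public keys arrive as raw bytes. The KnownPeers class is hypothetical and the keys below are placeholder bytes, not real key material; no actual cryptography or networking happens here.

```python
import hashlib


def fingerprint(public_key: bytes) -> str:
    """A short, comparable identifier for a public key."""
    return hashlib.sha256(public_key).hexdigest()


class KnownPeers:
    def __init__(self):
        self._pinned = {}  # peer name -> fingerprint accepted on first contact

    def check(self, name: str, public_key: bytes) -> str:
        pinned = self._pinned.get(name)
        if pinned is None:
            return "unknown"  # first contact: ask the user whether to accept
        return "trusted" if pinned == fingerprint(public_key) else "mismatch"

    def accept(self, name: str, public_key: bytes) -> None:
        self._pinned[name] = fingerprint(public_key)


peers = KnownPeers()
friend_key = b"placeholder public key bytes"

first = peers.check("friend", friend_key)    # never seen before
peers.accept("friend", friend_key)           # the user says yes, once
later = peers.check("friend", friend_key)    # same key on every later visit
spoofed = peers.check("friend", b"some other key")
```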

A user on the system should be strictly designed for use by an actual person. We will not be using "system" users as a hacked together permissions model. There will similarly be no true concept of "ownership", but rather just one or more users who have access to read or write a particular piece of data.

These permissions can be set using a query rather than by manually setting the permissions on each new photo, video, etc.
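One way this could work, as a rough sketch: a rule stores a description of the items it covers rather than a list of item ids, so new matching items are covered automatically. sqlite is only a stand-in here, and the table and column names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, kind TEXT, place TEXT);
    -- A rule describes which items it covers; NULL in a column means any value.
    CREATE TABLE access_rule (user TEXT, mode TEXT, kind TEXT, place TEXT);
""")
db.executemany("INSERT INTO item VALUES (?, ?, ?)", [
    (1, "photo", "cafe"),
    (2, "photo", "home"),
    (3, "video", "cafe"),
])
# Sam may read every photo taken at the cafe.
db.execute("INSERT INTO access_rule VALUES ('sam', 'read', 'photo', 'cafe')")


def visible_to(user: str, mode: str) -> list:
    """Return the ids of items covered by any of the user's rules."""
    rows = db.execute("""
        SELECT DISTINCT item.id FROM item
        JOIN access_rule AS r
          ON (r.kind IS NULL OR r.kind = item.kind)
         AND (r.place IS NULL OR r.place = item.place)
        WHERE r.user = ? AND r.mode = ?
        ORDER BY item.id
    """, (user, mode)).fetchall()
    return [row[0] for row in rows]


before = visible_to("sam", "read")
# A new cafe photo is covered without anyone touching its permissions.
db.execute("INSERT INTO item VALUES (4, 'photo', 'cafe')")
after = visible_to("sam", "read")
```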

TODO: Explain how sharing content works under the hood.

Push instead of pull

When items are renamed, deleted, created, or modified, messages should be sent out to any programs that are listening. In general the system should be designed to avoid polling for updates whenever possible.

Changing a configuration, or even editing a photo which is separately open in your photo viewer, should show the changes instantly as they are saved.
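A minimal sketch of the push model, assuming a hypothetical in-process ChangeBus. A real system would deliver these messages between programs; the item ids and event names are made up.

```python
from collections import defaultdict


class ChangeBus:
    """Delivers change messages to whoever subscribed; nobody polls."""

    def __init__(self):
        self._listeners = defaultdict(list)  # item id -> callbacks

    def subscribe(self, item_id, callback):
        self._listeners[item_id].append(callback)

    def publish(self, item_id, event):
        for callback in list(self._listeners.get(item_id, ())):
            callback(item_id, event)


bus = ChangeBus()
viewer_log = []

# The photo viewer asks to hear about the one photo it has open.
bus.subscribe("photo:42", lambda item, event: viewer_log.append((item, event)))

bus.publish("photo:42", "modified")  # the editor saved; the viewer hears at once
bus.publish("photo:7", "deleted")    # nobody is listening, so nothing happens
```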

Window management

Workspaces are good, but tags are better.

TODO: Add details.

Creating a new window

Right clicking on the desktop and selecting new window or pressing `super + enter` opens a blank window running a shell.

TODO: Explain what a blank window does and how to launch programs / find items.

Undo instead of confirm

Prompting users to confirm actions constantly is bad design. It's frustrating, slows you down, and with enough time you just develop muscle memory to quickly click confirm which negates any advantages of having the confirmation in the first place.

A better solution is to do actions immediately, but allow undoing them, at least for a reasonable period of time afterwards. Delete an image? We mark it as deleted, but do not actually remove it until either hard drive space becomes limited or a configurable amount of time has passed. Many operating systems do this to some extent, with the trashcan or other mechanisms, but like most things it's not a consistent pattern across all programs and systems. When I close a window, I should be able to undo closing the window. To achieve this we will simply tag the window as closed, but only truly free the memory after some period of time or if memory becomes scarce.
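A sketch of mark-then-purge, assuming a hypothetical Store class. Time is passed in explicitly to keep the example deterministic, and only the time-based trigger is modeled (not the "space is running out" one).

```python
class Store:
    def __init__(self, retention_seconds: float):
        self.items = {}        # key -> data
        self.deleted_at = {}   # key -> time it was marked deleted
        self.retention = retention_seconds

    def delete(self, key, now: float) -> None:
        """Acts immediately from the user's point of view, but only marks."""
        if key in self.items:
            self.deleted_at[key] = now

    def undo_delete(self, key) -> bool:
        """Returns True if there was still something to bring back."""
        return self.deleted_at.pop(key, None) is not None

    def visible(self) -> list:
        return sorted(k for k in self.items if k not in self.deleted_at)

    def purge(self, now: float) -> list:
        """Really remove items whose undo window has passed."""
        expired = sorted(k for k, t in self.deleted_at.items()
                         if now - t >= self.retention)
        for key in expired:
            del self.items[key]
            del self.deleted_at[key]
        return expired


store = Store(retention_seconds=60)
store.items.update({"a.png": b"cat", "b.png": b"dog"})
store.delete("a.png", now=0)
store.delete("b.png", now=0)
store.undo_delete("a.png")      # changed my mind about this one
purged = store.purge(now=120)   # the undo window has passed for b.png
```

The same shape would apply to closed windows: tag as closed, free later.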

Components

The basic building blocks of the system. Each of these is a tiny piece of unique functionality.

Link

"This is the essential feature of the memex. The process of tying two items together is the important thing."

  • Vannevar Bush

The link is the most important feature.

Each link is simply a source address and a destination address. Every component being used by a resource is addressable, meaning you can create a link from one heading to another, from an image to a video, or anything else for that matter.
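As a sketch, assuming an address is just a (resource, component) pair of ids. That address shape is my own guess for illustration, as are the ids used below.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Address:
    resource: int   # which document, image, video, ...
    component: int  # which heading, passage, ... inside that resource


@dataclass(frozen=True)
class Link:
    source: Address
    destination: Address


def links_from(links: List[Link], addr: Address) -> List[Address]:
    return [link.destination for link in links if link.source == addr]


def links_to(links: List[Link], addr: Address) -> List[Address]:
    """Backlinks come for free because a link is plain data."""
    return [link.source for link in links if link.destination == addr]


heading = Address(resource=1, component=3)
photo = Address(resource=2, component=0)
video = Address(resource=5, component=0)

links = [Link(heading, photo), Link(photo, video)]
```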

Text

A string of text of an arbitrary length without any other markup. Text components have no special semantics and will be presented in a visually pleasing manner for general reading.

Heading

The heading of a section. Headings can be used to build a table of contents of a document and split the document into sections.

Quote

Resources

Each composite type is a combination of a number of components. From a user's perspective these composite types are the actual resources they interact with on their computer.

Documents

A document primarily contains marked-up text, occasionally illustrated with images. While reading a document you should be able to highlight an interesting passage which will then be stored and searchable. I should be able to search any passage I've highlighted and jump from there to the source.

Image

https://phoboslab.org/log/2021/11/qoi-fast-lossless-image-compression

Audio

https://phoboslab.org/log/2023/02/qoa-time-domain-audio-compression

Video

TODO

Tag

TODO

Location

TODO

Dictionary

TODO

Permissions

TODO

Code

https://github.com/yairchu/awesome-structure-editors

Each function or block in a codebase should be stored as its own row in a table. When browsing a codebase you will have relevant functions above and below the one you are editing, rather than always having them in the order they were written in.

This is similar to how code is currently split into different files by purpose, but more flexible. The editor must do a very good job making it seamless to scroll through functions as if they're in one big file. For example, you might browse all functions related to "networking", or view a lower level database interaction function next to all the functions that call it.
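A rough sketch of both browsing modes, again using sqlite as a stand-in. The func and calls tables, the topic column, and the sample functions are all hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE func (id INTEGER PRIMARY KEY, name TEXT UNIQUE, topic TEXT, body TEXT);
    CREATE TABLE calls (caller INTEGER REFERENCES func(id),
                        callee INTEGER REFERENCES func(id));
""")
db.executemany("INSERT INTO func VALUES (?, ?, ?, ?)", [
    (1, "db_get", "database", "def db_get(key): pass"),
    (2, "fetch_profile", "networking", "def fetch_profile(user): pass"),
    (3, "render_profile", "ui", "def render_profile(user): pass"),
    (4, "send_request", "networking", "def send_request(url): pass"),
])
db.executemany("INSERT INTO calls VALUES (?, ?)", [(2, 1), (3, 1)])


def by_topic(topic: str) -> list:
    """Everything about one topic, regardless of where it was written."""
    rows = db.execute(
        "SELECT name FROM func WHERE topic = ? ORDER BY name", (topic,))
    return [r[0] for r in rows]


def callers_of(name: str) -> list:
    """The functions an editor could show alongside a low level function."""
    rows = db.execute("""
        SELECT caller.name FROM calls
        JOIN func AS caller ON caller.id = calls.caller
        JOIN func AS callee ON callee.id = calls.callee
        WHERE callee.name = ?
        ORDER BY caller.name
    """, (name,))
    return [r[0] for r in rows]


networking = by_topic("networking")
db_callers = callers_of("db_get")
```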

Programs

Search

Alternate name idea: Browse

Tabulate

Alternate name idea: Spreadsheet

Document Editor

Alternate name idea: Write

Document / Book Reader

Alternate name idea: Read

Code Editor

Alternate name idea: Code

Image viewer

Alternate name idea: Images

Image editor

Alternate name idea: Draw

Video player

Alternate name idea: Watch

Video editor

Alternate name idea: Film

Calculator

Alternate name idea: Calculate

Calendar

Clock

Chat

Alternate name idea: Chat

Maps

Alternate name idea: Map

Recorder

Alternate name idea: Record

Music Player

Alternate name idea: Listen

Audio Mixer

Alternate name idea: Volume

System Settings / Stats

Alternate name idea: Monitor

Emulator

Alternate name idea: Emulate

Font Editor

Alternate name idea: Typograph

Weather

Workspace

Alternate name idea: Snapshot

Resources

https://www.nayuki.io/page/designing-better-file-organization-around-tags-not-hierarchies

Someone with similar ideas. Broadly speaking there are two differences between their ideas and my own. They seek to maintain some compatibility with existing software and systems, a noble goal, but not one I'm interested in myself. Perhaps as a result of that they use the term "file" where I am using the term "component" and additionally some things feel like they're still thinking with a hierarchical filesystem (HFS) mindset. Still, this is an extremely inspirational piece of work for me.

https://pspodcasting.net/dan/blog/2019/plan9_desktop.html

A detailed guide / explanation of Plan 9's interesting features / design ideas. There are lots and lots of great ideas. Plan 9 is a huge source of inspiration for me. In particular the way Plan 9 handles mounting and virtualization is brilliant.

https://files.spritely.institute/papers/petnames.html

A brilliant way to handle human-readable, but decentralized and globally unique naming.

https://www.scattered-thoughts.net/writing/against-sql

Some interesting thoughts about the usefulness of relational databases and the shortcomings of sql. Mistakes I seek to avoid.

https://borretti.me/article/composable-sql

A great article about making sql more composable which addresses one of the bigger issues with it as a declarative language.

https://borretti.me/article/how-australs-linear-type-checker-works

This is how you write a mother-fuckin linear type checker I'll tell you what.

https://borretti.me/article/the-design-space-of-wikis

This borretti has really thought about some things eh? A great article about wiki systems, comparing their features, shortcomings, and design questions. I don't fully agree with all of this one, but it's a great resource.

Operating Systems: Three Easy Pieces

A book about modern operating systems. Topics are broken down into three major conceptual pieces: Virtualization, Concurrency, and Persistence. Includes all major components of modern systems including scheduling, virtual memory management, disk subsystems and I/O, file systems, and even a short introduction to distributed systems.

https://operating-system-in-1000-lines.vercel.app/en/

A guide to writing your own very basic operating system.

https://every-layout.dev/

A book about CSS which describes a number of useful layout primitives. Generally a good idea for our own application platform.

https://bernsteinbear.com/blog/simple-search/

Nice article about implementing a basic search engine.