Core Concepts
Purpose
I'm not particularly interested in creating a general-purpose operating system. But that's a very broad term.
The operating system on the Nintendo DS is purpose-built for playing DS games. For this purpose it works well and has more or less the features you would want without anything you wouldn't want. On my e-reader the situation is less ideal, but it is still clearly purpose-built for reading.
My interest is in making an operating system for journaling, art, and research. It should feel like a peaceful, playful, simple, and coherent system without any of the distractions, inconsistencies ... and practicality that come with more general systems. A cozy system.
It will not be able to run Steam, Zoom, or practically any currently existing software. It will not have a web browser. However, it will have tools and systems for photo editing, music creation, writing, reading, playing and making games, and many other such activities. You should be excited to power up this system on a rainy evening with a mug of tea by your side. All components and programs should be designed cohesively to work well with each other. A unified system.
Storage system
I'm calling it a "storage" system instead of a "file" system. This is by far the largest and most experimental feature I have planned. Instead of a traditional filesystem we will use a relational database. This allows for many more interesting usage paradigms and, in many cases, large performance increases.
Traditional filesystem issues
A traditional filesystem functions similarly to a key-value database. The "key"
is the filepath, which is a string containing path separators / to create a
hierarchical structure. Each key must be unique: you can have files /images/a
and /video/a, but not two files that are both named /images/a.
The value pointed to by the key is the raw binary blob representing the file's contents, plus a tiny bit of metadata: size, owner, group, permission mode, modification time, and perhaps creation time if you're lucky. Many things are not present here, like the "filetype", or for a video the "length in seconds" or "camera lens", or anything else for that matter.
Unknown filetypes
The biggest missing feature is the type information. Without the type information we can only guess, using part of the filename or by reading part of the file, at what sort of data is contained within.
Why should the filetype even be part of the name anyway? Shouldn't it be a
separate field we can read from, like the created time? In other words, not only
is the UI leaking irrelevant technical details, but it's leaking technical
details which can be wrong. This leads to confusing situations for
non-technical users in which a file's name can change from IMG.jpg to
IMG.png, but the contents of the file are still encoded as a JPEG.
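To make the mismatch concrete, here is a small sketch of the two kinds of guessing a program has to do today. The helper names are made up for illustration; only the PNG and JPEG leading-byte signatures are real.

```python
# Hypothetical helpers; only the magic-byte signatures are real.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_type(data: bytes) -> str:
    """Guess the encoding from the first bytes of the content."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return "unknown"


def type_from_name(name: str) -> str:
    """Guess the encoding from the filename extension alone."""
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return {"jpg": "jpeg", "jpeg": "jpeg", "png": "png"}.get(ext, "unknown")


# A JPEG that has been renamed, so the name now claims PNG.
name = "IMG.png"
content = JPEG_MAGIC + b"\xe0" + b"\x00" * 16

claimed = type_from_name(name)   # what the name says
actual = sniff_type(content)     # what the bytes say
print(claimed, actual)
```

Both functions are guesses, and here they disagree. A stored type field would remove the need for either.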
How do people actually use their computers? I often want to "see all the photos I took in New York last year" or "see the videos recorded on my Sony camera". Even a task as simple as "getting a list of all the photos I took this year" is incredibly difficult with a filesystem unless you've happened to diligently move every photo into folders by year.
If you haven't done that, you need a program which will look at every file on your computer and read part of it to determine the file type. Then, if your program supports that image format's metadata, it can possibly extract the time the photo was taken.
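For contrast, here is a rough sketch of the same request when the encoding and capture date are ordinary fields in a relational store. It uses SQLite only because it ships with Python; the table name, column names, and sample rows are all assumptions for illustration, not a planned schema.

```python
import sqlite3
from typing import Optional

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE image (
        id        INTEGER PRIMARY KEY,
        encoding  TEXT NOT NULL,   -- a real field, not guessed from a name
        taken_at  TEXT NOT NULL,   -- ISO 8601 date
        camera    TEXT,
        place     TEXT,
        pixels    BLOB
    )
    """
)
db.executemany(
    "INSERT INTO image (encoding, taken_at, camera, place) VALUES (?, ?, ?, ?)",
    [
        ("jpeg", "2023-05-14", "sony", "New York"),
        ("png", "2023-11-02", "phone", "Lisbon"),
        ("heic", "2022-07-30", "sony", "New York"),
    ],
)


def photos_taken_in(year: int, place: Optional[str] = None) -> list:
    """Return (id, encoding, taken_at) rows for one year, optionally one place."""
    sql = "SELECT id, encoding, taken_at FROM image WHERE taken_at LIKE ?"
    args = [f"{year}-%"]
    if place is not None:
        sql += " AND place = ?"
        args.append(place)
    return db.execute(sql + " ORDER BY taken_at", args).fetchall()


print(photos_taken_in(2023))
print(photos_taken_in(2022, place="New York"))
```

No file is opened and no format is parsed to answer either question.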
Every single image viewer needs to include code for parsing JPEG, PNG,
TIFF, BMP, WebP, GIF, HEIC, AVIF, and many other formats. Libraries
written in dozens of different languages do help with this matter, but
ultimately, every single time you wish to browse your photo collection, your
image viewer needs to read at least part of each file and call some library to
determine the file type, or worse, make a guess based on the name. Only then
can it hope that the format your image is stored in happens to contain the
correct metadata. This is incredibly slow and error prone, and it is a large
part of why people feel the need to install and import their photos into large
bespoke "photo managing software". The operating systems have failed them.
No gods, no masters
A filesystem is a hierarchy. When you organize your photos, documents, books, music, or anything else, you must first devise an artificial structure, filing them by year, by type of media, or perhaps by activity or location. Each item can only be stored in one place, unless duplicates are created. Are your nature videos with your nature photos? Or are they in separate video and photo folders?
The human mind does not work this way. It operates by association. With one item in your mind you can imagine related items as an intricate web of associative thoughts.
"We could not suppose that there is a house besides the particular houses."
- Aristotle
Our world is not fundamentally hierarchical in nature. Everything in nature
exists in a relationship with all else around it. Consider the statement "The
Mississippi is a river". We cannot simply express this mathematically as
Mississippi = River. If we could, the terms River and Mississippi would be
interchangeable in all cases. We are conveying a more complex relationship. We
are saying that Mississippi belongs to the category of river. We could use the
set membership symbol Mississippi ∈ River, but even still, there's a bit more
to consider. There is no such thing as "A River" besides all particular rivers.
Our understanding of a river is cobbled together from common aspects of
individual rivers: Fresh water, flowing to another body of water, fed by
glaciers, precipitation, and aquifers. This abstract concept has no concrete
reality and does not exist outside of the particulars. In this view, "River"
comes about from Mississippi (and others like it).
ECS Inspirations
For better or for worse, a computer is a precise and exacting machine which cannot perfectly represent our world, but we can certainly do a bit better than enforcing a strict and arbitrary hierarchy upon everything.
The Entity Component System paradigm is a programming pattern originally created for use in games, as games are often complex interconnected systems with very high performance requirements. Inheritance-based models (especially object-oriented design) were proving unsuitable in some cases. There are two main goals achieved with this pattern:
- Provide a way to represent complex relationships and interactions while remaining flexible to new and distinctive additions.
- Arrange data in memory to take as much advantage of the CPU cache as possible.
In the ECS pattern an Entity is a unique identifier, usually just a long number. To give these entities meaning we associate pieces of data, called Components, with these identifiers. Many Components can be composed together to represent complex entities. The final piece of the puzzle is Systems. A System processes a selection of components.
TODO: Code example
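As a placeholder for that example, here is a minimal sketch of the three pieces. The names World, Position, Velocity, and movement_system are invented for illustration. It uses plain dictionaries, so it shows the composition side of ECS but makes no attempt at the cache-friendly memory layout.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Velocity:
    dx: float
    dy: float


class World:
    def __init__(self) -> None:
        self._next_id = 0
        # One table per component type, keyed by entity id.
        self._components = {}

    def spawn(self, *components: object) -> int:
        """Create an entity (just a number) and attach components to it."""
        entity = self._next_id
        self._next_id += 1
        for component in components:
            self._components.setdefault(type(component), {})[entity] = component
        return entity

    def query(self, *types: type):
        """Yield (entity, components...) for entities that have every type."""
        tables = [self._components.get(t, {}) for t in types]
        if not tables:
            return
        for entity in sorted(set(tables[0]).intersection(*tables[1:])):
            yield (entity, *(table[entity] for table in tables))


def movement_system(world: World, dt: float) -> None:
    """A System: processes every entity that has a Position and a Velocity."""
    for _, pos, vel in world.query(Position, Velocity):
        pos.x += vel.dx * dt
        pos.y += vel.dy * dt


world = World()
boat = world.spawn(Position(0.0, 0.0), Velocity(1.0, 2.0))
rock = world.spawn(Position(5.0, 5.0))  # no Velocity, so it never moves
movement_system(world, dt=0.5)
```

Nothing says what a "boat" or a "rock" is. An entity's behavior follows entirely from which components are attached to it.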
https://www.scattered-thoughts.net/writing/sql-needed-structure/
Nice article about why your database should be capable of returning structured data.
Application platform
There should be a well designed "application" platform that allows for displaying text, bitmaps, differing fonts and weights and layouts. All programs should be built on top of this interface.
Let's consider some common methods for building a new application on Unix: web apps, TUI apps, and working directly with the windowing system (Wayland or X11), often via a GUI toolkit like GTK or Qt. Each of these has issues and does not align with the goals of our operating system.
Issues with drawing to a surface directly
The third category, native programs, has the best performance and nearly unlimited potential. You are given a surface upon which to draw anything you please. Maybe you're creating a game and want to show a custom mouse cursor. You can do that. Perhaps you have some crazy new vision of a 3D UI for your application; it can be done. But there are many drawbacks to this approach.
Creating a simple application with a few basic options is exceedingly complex; you need to define how a button looks and works and specify that the mouse should be shown. You need to create concepts for things like text wrapping and displaying images. Each program might decide separately what a button is and how it works. This makes each program vastly more difficult to learn, as you cannot immediately apply your shared knowledge of the system as a whole. It's also pretty terrible for accessibility: will this particular program have a high contrast mode? Can I increase the font size? This cohesion issue started becoming a large problem on early graphical operating systems, and in 1987 IBM attempted to publish some guidelines to address the situation.
In WordPerfect, the command to open a file was F7, 3. In Lotus 1-2-3, a file was opened with / (to open the menus), F (for File), R (for Retrieve). In Microsoft Word, a file was opened with Esc (to open the menus), T (for Transfer), L (for Load). In WordStar, ^ K D (to get to the Opening Menu), followed by D.[2]
The solutions to this have been published guidelines and the creation of widget libraries like GTK and Qt. These libraries make it much easier to create simple applications and, more importantly, they allow a wide collection of different programs to have shared functionality and aesthetics. However, they have a limitation: each program needs to decide to use a given toolkit. On a normal Linux distro you may have several programs written with GTK, others with Qt, and still others with more niche libraries. This leads to an inconsistent experience lacking both design and usability cohesion.
I feel none of these toolkits go far enough to ensure the experience is unified and customizable for the user. The user should be able to globally configure hotkeys for saving, closing, splitting into tabs, or any other common operation. Each program should just know that the user pressed their "save key" rather than which key they pressed in particular. Similarly, they shouldn't have any say over if the user is using a dark theme, a light theme, or if they've decided to make everything green. They should only be able to use "foreground color", "background color", "highlight color", and so on.
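A rough sketch of what "the program only knows the save key was pressed" could look like. Everything here is hypothetical (the keymap format, the Program class, the action names); the point is only that the key-to-action table lives with the user and the system, while the program sees action names.

```python
# Hypothetical names throughout; the real platform is not specified yet.
user_keymap = {
    "ctrl+s": "save",
    "ctrl+w": "close",
    "ctrl+t": "new_tab",
}


class Program:
    """A program registers handlers for actions, never for raw keys."""

    def __init__(self):
        self.handlers = {}
        self.log = []

    def on(self, action, handler):
        self.handlers[action] = handler

    def dispatch(self, action):
        handler = self.handlers.get(action)
        if handler is not None:
            handler()


def key_pressed(keymap, program, key):
    """System side: translate the key with the user's keymap, then notify."""
    action = keymap.get(key)
    if action is not None:
        program.dispatch(action)
    return action


editor = Program()
editor.on("save", lambda: editor.log.append("saved"))

key_pressed(user_keymap, editor, "ctrl+s")

# The user rebinds save to F2. The program is untouched and still works.
rebound_keymap = {"f2": "save"}
key_pressed(rebound_keymap, editor, "f2")
```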
Issues with terminal user interfaces
Terminal programs have some structure; you can expect to select text and some universal hotkeys will work. You can exit out of a terminal application and start another one. You can use them over remote connections. I've spent many years using and creating terminal programs because of these useful features, but they come with huge drawbacks.
In the terminal you are strictly limited to monospace text for your program's display. Scrolling within an application like Vim can only be line by line, getting the mouse position is tricky, displaying images is only possible with obscure hacks, and there are even certain key combinations (Control + Shift) which are simply not possible to bind. Not to mention the abysmal performance of using a strange ancient text protocol for building pseudo-graphical applications.
Issues with web apps
Web browsers have a larger base structure, but it is one based on a document sharing format and an inheritance-based one at that. There are a good number of universal features: you can almost always save an image if you can view it (sadly this is changing), the URL bar gives some information about the state and information being viewed, and a user can open up a different resource / application either in the same tab, a new tab, or even a new window.
But the web has a whole other host of problems as an application framework. It's vastly too complex, slow by way of needing to transfer the entire program's code nearly every time you load it, and yet it still lacks many basic UI widgets needed to build most useful program interfaces. CSS is effectively required because browsers do a poor job of styling content by default. It's pretty cool that each person's site can have a different aesthetic, but for an application platform it's very frustrating that each web app you use has a different control and UI scheme, each one being unpredictable and feeling out of place compared with the last. It brings back those same issues dealt with in the 80s, but with added performance and correctness issues that come from building all of this on top of a document sharing protocol.
UI Building blocks
I've already hinted at what our solution should look like. A fast binary format for specifying a UI. Each program will specify its Content, Styling, and Layout.
Content
Content will be specified in terms of basic elements: buttons, text fields, paragraphs, images, headings, times, locations, etc. This information is crucial for ensuring consistent behavior, making application development quicker and more focused on the purpose of the application, and also allowing for the development of a useful and powerful screen reader.
Styling
The styling will be dramatically more limited from the application's point of view. Instead of being able to specify exact colors, fonts, and shapes, you will select between colors such as "foreground color" or "secondary color" and fonts such as "monospace", "title", and "body", and listen for key presses such as "select", "save", "close", "previous selection", etc. The operating system will have good defaults for this and allow users to switch to a dark theme, a theme appropriate for an e-ink screen, or larger and darker fonts for limited eyesight.
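One way to picture this: a program's style only ever names roles, and a theme owned by the user maps those roles to concrete values at draw time. This is a sketch under that assumption. The enum members follow the role names mentioned above, but the theme contents (hex colors, font families, sizes) are placeholders I made up.

```python
from enum import Enum


class Color(Enum):
    FOREGROUND = "foreground"
    BACKGROUND = "background"
    SECONDARY = "secondary"


class Font(Enum):
    BODY = "body"
    TITLE = "title"
    MONOSPACE = "monospace"


# Themes belong to the user. The concrete values are placeholders.
LIGHT = {
    Color.FOREGROUND: "#222222",
    Color.BACKGROUND: "#fafafa",
    Color.SECONDARY: "#5a6e8c",
    Font.BODY: ("serif", 12),
    Font.TITLE: ("serif", 20),
    Font.MONOSPACE: ("mono", 12),
}
E_INK_LARGE = {
    Color.FOREGROUND: "#000000",
    Color.BACKGROUND: "#ffffff",
    Color.SECONDARY: "#000000",
    Font.BODY: ("serif", 18),
    Font.TITLE: ("serif", 30),
    Font.MONOSPACE: ("mono", 18),
}


def resolve(style: dict, theme: dict) -> dict:
    """Turn a program's role-based style into concrete values for drawing."""
    return {prop: theme[role] for prop, role in style.items()}


# All a program can say about its title: which roles it uses.
title_style = {
    "color": Color.FOREGROUND,
    "background": Color.BACKGROUND,
    "font": Font.TITLE,
}

print(resolve(title_style, LIGHT))
print(resolve(title_style, E_INK_LARGE))
```

The program's style dictionary never changes. Switching themes only changes what the roles resolve to.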
Layout
The arrangement of elements within a window should be separately defined from the base styling of these elements (as is the case in CSS). A series of layout primitives can be composed together to create just about any complex UI needed: Stack, Center, Cluster, Sidebar, Switcher, Cover, Grid, Frame, Reel, Imposter, Icon.
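A small sketch of how content elements and layout primitives might compose into one tree. Stack and Sidebar are taken from the list above, and Heading, Paragraph, and Button from the content elements. Their fields and the read_aloud function are assumptions for illustration. Because the tree says what each element is, a screen reader can walk it without guessing.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    text: str


@dataclass
class Paragraph:
    text: str


@dataclass
class Button:
    label: str
    action: str  # a named action such as "save", not a key


@dataclass
class Stack:
    children: list  # elements laid out one after another


@dataclass
class Sidebar:
    side: object  # the narrow pane
    main: object  # the wide pane


def read_aloud(node) -> list:
    """Walk the tree depth-first and describe each content element."""
    if isinstance(node, Heading):
        return [f"heading: {node.text}"]
    if isinstance(node, Paragraph):
        return [node.text]
    if isinstance(node, Button):
        return [f"button: {node.label}"]
    if isinstance(node, Stack):
        return [line for child in node.children for line in read_aloud(child)]
    if isinstance(node, Sidebar):
        return read_aloud(node.side) + read_aloud(node.main)
    raise TypeError(f"unknown element: {node!r}")


window = Sidebar(
    side=Stack([Heading("Notebooks"), Button("New page", action="new")]),
    main=Stack([Heading("Rainy evening"), Paragraph("Tea is ready.")]),
)
print(read_aloud(window))
```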
TODO: Flesh this out a bit more.
Customizability
It should feel like your computer, not a computer.
TODO: Expand on ways things can be customized due to the unified application platform.
A different user model
With the recent invention of "personal computers" it's become quite common for a single computer to be used, primarily, by a single person. There are still exceptions to this, but the design of this operating system will be heavily leaning on the "personal" side of things, similar, I suppose, to how phones are typically used.
There will still be support for multiple users, mostly in the context of networking, but it will be possible to have several accounts on a single computer. Each user will have their own public/private keypair, similar to SSH keys. These will be generated automatically at the creation of the account. The keys will be used to support a TOFU (trust on first use) style networking system. If you and your friend are sitting at a cafe together you should be able to see their computer, add it (accepting the key the first time), and then share with them some photos.
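The trust-on-first-use check itself is small. Here is a sketch, assuming public keys arrive as opaque bytes and that peers are remembered by a SHA-256 fingerprint. The class and method names are hypothetical, and a real flow would ask the user to accept the key on first sight rather than accept it silently.

```python
import hashlib


class KeyMismatch(Exception):
    """The peer presented a different key than the one first accepted."""


class KnownPeers:
    def __init__(self):
        self._fingerprints = {}  # peer name -> fingerprint of the accepted key

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        return hashlib.sha256(public_key).hexdigest()

    def check(self, peer: str, public_key: bytes) -> str:
        """Accept a key the first time, then insist on the same key after."""
        presented = self.fingerprint(public_key)
        remembered = self._fingerprints.get(peer)
        if remembered is None:
            # First contact. A real system would prompt the user here.
            self._fingerprints[peer] = presented
            return "trusted-on-first-use"
        if remembered == presented:
            return "known"
        raise KeyMismatch(f"key for {peer!r} changed since first use")


peers = KnownPeers()
first = peers.check("friend-laptop", b"friend public key bytes")
again = peers.check("friend-laptop", b"friend public key bytes")
print(first, again)
```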
A user on the system should be strictly designed for use by an actual person. We will not be using "system" users as a hacked together permissions model. There will similarly be no true concept of "ownership", but rather just one or more users who have access to read or write a particular piece of data.
These permissions can be set using a query rather than by manually setting the permissions on each new photo, video, etc.
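A sketch of what a query-based grant could mean in practice: the grant stores a condition rather than a list of items, so anything created later that matches is covered automatically. The Item fields, the Grant shape, and the use of a Python predicate in place of a real storage query are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Item:
    id: int
    kind: str
    tags: frozenset


@dataclass(frozen=True)
class Grant:
    user: str
    mode: str  # "read" or "write"
    matches: Callable[[Item], bool]  # stands in for a stored query


def can(user: str, mode: str, item: Item, grants: list) -> bool:
    """True if any grant for this user and mode matches the item."""
    return any(
        g.user == user and g.mode == mode and g.matches(item) for g in grants
    )


# One rule: my friend may read every photo tagged "cafe".
grants = [
    Grant("friend", "read", lambda item: item.kind == "photo" and "cafe" in item.tags),
]

old_photo = Item(1, "photo", frozenset({"cafe"}))
diary = Item(2, "document", frozenset({"private"}))
# Created after the grant was written, but covered by it anyway.
new_photo = Item(3, "photo", frozenset({"cafe", "rain"}))

print(can("friend", "read", new_photo, grants))
print(can("friend", "read", diary, grants))
```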
TODO: Explain how sharing content works under the hood.
Push instead of pull
When items are renamed, deleted, created, or modified messages should be sent out to any programs that are listening. In general the system should be designed to avoid polling for updates whenever possible.
Changing a configuration, or even editing a photo which is separately open in your photo viewer, will show the changes instantly as they are saved.
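In code, the push model is a subscription. Here is a minimal in-process sketch. The ChangeFeed name, the event strings, and the callback signature are made up, and a real system would deliver these messages between programs rather than within one.

```python
class ChangeFeed:
    """Programs subscribe to an item and are called when it changes."""

    def __init__(self):
        self._listeners = {}  # item id -> list of callbacks

    def subscribe(self, item_id, callback):
        self._listeners.setdefault(item_id, []).append(callback)

    def publish(self, item_id, event):
        """Notify every listener of this item. Returns how many were told."""
        listeners = list(self._listeners.get(item_id, []))
        for callback in listeners:
            callback(item_id, event)
        return len(listeners)


feed = ChangeFeed()
redraws = []

# The photo viewer has photo 7 open and asks to be told about changes.
feed.subscribe(7, lambda item_id, event: redraws.append((item_id, event)))

# The photo editor saves photo 7. The viewer never polls.
feed.publish(7, "modified")
feed.publish(8, "deleted")  # nobody is listening to photo 8

print(redraws)
```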
Window management
Workspaces are good, but tags are better.
TODO: Add details.
Creating a new window
Right clicking on the desktop and selecting new window, or pressing super + enter, opens a blank window running a shell.
TODO: Explain what a blank window does and how to launch programs / find items.
Undo instead of confirm
Prompting users to confirm actions constantly is bad design. It's frustrating, slows you down, and with enough time you just develop muscle memory to quickly click confirm, which negates any advantage of having the confirmation in the first place.
A better solution is to do actions immediately, but allow undoing them, at least for a reasonable period of time afterwards. Delete an image? We mark it as deleted, but do not actually remove it until either hard drive space becomes limited or a configurable amount of time has passed. Many operating systems do this to some extent, with the trashcan or other mechanisms, but like most things it's not a consistent pattern across all programs and systems. When I close a window, I should be able to undo closing the window. To achieve this we will simply tag the window as closed, but only truly free the memory after some period of time or if memory becomes scarce.
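The mark-then-purge idea can be sketched in a few lines. The Store class and its method names are hypothetical. Time is passed in explicitly to keep the example deterministic, and the "space becomes limited" trigger is left out.

```python
class Store:
    def __init__(self, retention: float):
        self.retention = retention  # seconds a deleted item stays recoverable
        self.items = {}             # id -> content
        self.deleted_at = {}        # id -> time the item was marked deleted

    def delete(self, item_id, now: float) -> None:
        """Happens immediately, with no confirmation. Only sets a mark."""
        if item_id in self.items:
            self.deleted_at[item_id] = now

    def undo_delete(self, item_id) -> bool:
        """Clear the mark. Returns False if there was nothing to undo."""
        return self.deleted_at.pop(item_id, None) is not None

    def visible(self) -> list:
        return sorted(i for i in self.items if i not in self.deleted_at)

    def purge(self, now: float) -> list:
        """Really remove items whose retention period has passed."""
        expired = [i for i, t in self.deleted_at.items() if now - t >= self.retention]
        for item_id in expired:
            del self.items[item_id]
            del self.deleted_at[item_id]
        return expired


store = Store(retention=3600)
store.items = {"cat.img": b"...", "dog.img": b"..."}

store.delete("cat.img", now=0)
store.delete("dog.img", now=0)
store.undo_delete("cat.img")    # changed my mind
purged = store.purge(now=7200)  # two hours later
print(store.visible(), purged)
```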
Components
The basic building blocks of the system. Each of these is a tiny piece of unique functionality.
Link
"This is the essential feature of the memex. The process of tying two items together is the important thing."
- Vannevar Bush
The link is the most important feature.
Each link is simply a source address and a destination address. Every component being used by a resource is addressable, meaning you can create a link from one heading to another, from an image to a video, or anything else for that matter.
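As a data structure this is about as small as it gets. A sketch, assuming an address is a pair of (resource id, component id). That pairing and the helper names are my guesses for illustration, not a settled design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    resource: int   # which document, image, video, ...
    component: int  # which component within that resource


@dataclass(frozen=True)
class Link:
    source: Address
    destination: Address


def links_from(links, address):
    """Everything this address points at."""
    return [link.destination for link in links if link.source == address]


def backlinks(links, address):
    """Everything that points at this address."""
    return [link.source for link in links if link.destination == address]


heading = Address(resource=1, component=4)        # a heading in a document
other_heading = Address(resource=2, component=1)  # a heading elsewhere
video = Address(resource=9, component=0)

links = [Link(heading, other_heading), Link(heading, video)]

print(links_from(links, heading))
print(backlinks(links, video))
```

Since a link is stored apart from both ends, following it backwards costs the same as following it forwards.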
Text
A string of text of an arbitrary length without any other markup. Text components have no special semantics and will be presented in a visually pleasing manner for general reading.
Heading
The heading of a section. Headings can be used to build a table of contents of a document and split the document into sections.
Quote
Resources
Each composite type is a combination of a number of components. From a user's perspective these composite types are the actual resources they interact with on their computer.
Documents
A document primarily contains marked-up text, occasionally illustrated with images. While reading a document you should be able to highlight an interesting passage which will then be stored and searchable. I should be able to search any passage I've highlighted and jump from there to the source.
Image
https://phoboslab.org/log/2021/11/qoi-fast-lossless-image-compression
Audio
https://phoboslab.org/log/2023/02/qoa-time-domain-audio-compression
Video
TODO
Tag
TODO
Location
TODO
Dictionary
TODO
Permissions
TODO
Code
https://github.com/yairchu/awesome-structure-editors
Each function or block in a codebase should be stored as its own row in a table. When browsing a codebase you will have relevant functions above and below the one you are editing, rather than always having them in the order they were written in.
This is similar to how code is currently split into different files by purpose, but more flexible. The editor must do a very good job of making it seamless to scroll through functions as if they're in one big file. Perhaps you are browsing all functions related to "networking", or viewing a lower-level database interaction function next to all the functions that call it.
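A rough sketch of both views using SQLite, which is used here only because it ships with Python. The table names (code_block, calls), the topic column, and the sample functions are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE code_block (
        id    INTEGER PRIMARY KEY,
        name  TEXT UNIQUE NOT NULL,
        topic TEXT NOT NULL,
        body  TEXT NOT NULL
    );
    CREATE TABLE calls (
        caller INTEGER NOT NULL REFERENCES code_block(id),
        callee INTEGER NOT NULL REFERENCES code_block(id)
    );
    """
)
db.executemany(
    "INSERT INTO code_block (id, name, topic, body) VALUES (?, ?, ?, ?)",
    [
        (1, "db_read", "database", "..."),
        (2, "load_profile", "users", "..."),
        (3, "sync_inbox", "networking", "..."),
        (4, "open_socket", "networking", "..."),
    ],
)
db.executemany("INSERT INTO calls VALUES (?, ?)", [(2, 1), (3, 1), (3, 4)])


def by_topic(topic: str) -> list:
    """Names of all blocks related to one topic."""
    rows = db.execute(
        "SELECT name FROM code_block WHERE topic = ? ORDER BY name", (topic,)
    )
    return [name for (name,) in rows]


def callers_of(name: str) -> list:
    """Names of all blocks that call the named block."""
    rows = db.execute(
        """
        SELECT caller.name
        FROM calls
        JOIN code_block AS caller ON caller.id = calls.caller
        JOIN code_block AS callee ON callee.id = calls.callee
        WHERE callee.name = ?
        ORDER BY caller.name
        """,
        (name,),
    )
    return [caller for (caller,) in rows]


print(by_topic("networking"))
print(callers_of("db_read"))
```

An editor could render either result as one scrollable buffer, which is the "one big file" feeling without a fixed file order.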
Programs
Search
Alternate name idea: Browse
Tabulate
Alternate name idea: Spreadsheet
Document Editor
Alternate name idea: Write
Document / Book Reader
Alternate name idea: Read
Code Editor
Alternate name idea: Code
Image viewer
Alternate name idea: Images
Image editor
Alternate name idea: Draw
Video player
Alternate name idea: Watch
Video editor
Alternate name idea: Film
Calculator
Alternate name idea: Calculate
Calendar
Clock
Chat
Alternate name idea: Chat
Maps
Alternate name idea: Map
Recorder
Alternate name idea: Record
Music Player
Alternate name idea: Listen
Audio Mixer
Alternate name idea: Volume
System Settings / Stats
Alternate name idea: Monitor
Emulator
Alternate name idea: Emulate
Font Editor
Alternate name idea: Typograph
Weather
Workspace
Alternate name idea: Snapshot
Resources
https://www.nayuki.io/page/designing-better-file-organization-around-tags-not-hierarchies
Someone with similar ideas. Broadly speaking there are two differences between their ideas and my own. They seek to maintain some compatibility with existing software and systems, a noble goal, but not one I'm interested in myself. Perhaps as a result of that, they use the term "file" where I am using the term "component", and some things feel like they're still thinking with a hierarchical filesystem (HFS) mindset. Still, this is an extremely inspirational piece of work for me.
https://pspodcasting.net/dan/blog/2019/plan9_desktop.html
A detailed guide / explanation of Plan 9's interesting features / design ideas. There are lots and lots of great ideas. Plan 9 is a huge source of inspiration for me. In particular the way Plan 9 handles mounting and virtualization is brilliant.
https://files.spritely.institute/papers/petnames.html
A brilliant way to handle human-readable, but decentralized and globally unique naming.
https://www.scattered-thoughts.net/writing/against-sql
Some interesting thoughts about the usefulness of relational databases and the shortcomings of SQL. Mistakes I seek to avoid.
https://borretti.me/article/composable-sql
A great article about making SQL more composable, which addresses one of the bigger issues with it as a declarative language.
https://borretti.me/article/how-australs-linear-type-checker-works
This is how you write a mother-fuckin linear type checker I'll tell you what.
https://borretti.me/article/the-design-space-of-wikis
This Borretti has really thought about some things eh? A great article about wiki systems, comparing their features, shortcomings, and design questions. I don't fully agree with all of this one, but it's a great resource.
Operating Systems: Three Easy Pieces
A book about modern operating systems. Topics are broken down into three major conceptual pieces: Virtualization, Concurrency, and Persistence. Includes all major components of modern systems including scheduling, virtual memory management, disk subsystems and I/O, file systems, and even a short introduction to distributed systems.
https://operating-system-in-1000-lines.vercel.app/en/
A guide to writing your own very basic operating system.
https://every-layout.dev/
A book about CSS which describes a number of useful layout primitives. Generally a good idea for our own application platform.
https://bernsteinbear.com/blog/simple-search/
Nice article about implementing a basic search engine.