Package Management - Identifiers 2

It's been many (seven) months since I started working on the ideas for a package management system. A lot has happened, but something got me down the path again and I decided to take another look. This isn't a straight journey where I write every post ahead of time; instead, it's more a collection of ramblings and thoughts.

Or, in the words of one of my favorite puzzle games, The 7th Guest:

> Feeling lonely?

Series

This is going to be a series of posts, but I have no idea how fast I'll be writing them. I want to work out my ideas, maybe have a few conversations, and then start to move to more technical concepts.

2023-02-07 Package Management - Introduction

2023-02-08 Package Management - Versions

2023-02-12 Package Management - Identifiers

2023-02-13 Package Management - Dependencies

2023-09-20 Package Management - Identifiers 2

2023-11-30 Package Management - Formats and Registries

Mistakes Were Made

While I was reading the first identifiers post[1], I realized I made a few mistakes. One was that I was using a URL instead of a URN[2] to identify a package:

1: /blog/2023/02/12/package-management-identifiers/

2: https://en.wikipedia.org/wiki/Uniform_Resource_Name

> In contrast, URNs were conceived as persistent, location-independent identifiers assigned within defined namespaces, typically by an authority responsible for the namespace, so that they are globally unique and persistent over long periods of time, even after the resource which they identify ceases to exist or becomes unavailable.

When it comes to identifying a package, that is exactly what we want: a persistent identifier that doesn't point to a specific location. When we want Markdowny, we don't want to mandate where to get it, only to say enough to identify it.

Rehashing as URN Components

URNs always start with `urn:` followed by a registered namespace identifier (the “code”). We're going to pretend `bakfu` is the registered code, which means all the package identifiers would be `urn:bakfu:` followed by something else.

Likewise, I think it is important to know when something is a package identifier (say `pkg:`) or a reference to a package which has ranges (`ref:`).

EDIT: After looking at my notes, I realized I had already figured out why I didn't need this, so I removed it from the content below.

From earlier posts, I decided that the package type would be a component with well-known values (npm, nuget, deno) and arbitrary ones (domain-based).

`urn:bakfu:npm`

`urn:bakfu:minetest`

`urn:bakfu:authorintrusion.com/spell-check`

In the examples below, I'm going to drop the `urn:bakfu:` prefix as noise, so `urn:bakfu:npm` becomes just `npm`.
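As a rough sketch of that convention, the following Python drops or restores the prefix and tells the two kinds of package types apart. The `WELL_KNOWN_TYPES` set and the rule that a type whose first segment contains a dot is domain-based are assumptions for illustration, not settled parts of the design.

```python
# Sketch only: WELL_KNOWN_TYPES and the "contains a dot means domain-based"
# rule are assumptions, not settled parts of the design.
URN_PREFIX = "urn:bakfu:"
WELL_KNOWN_TYPES = {"npm", "nuget", "deno", "minetest"}  # hypothetical list


def strip_prefix(text: str) -> str:
    """Drop the noisy urn:bakfu: prefix if present (the URN scheme and
    namespace are case-insensitive, so compare in lowercase)."""
    if text.lower().startswith(URN_PREFIX):
        return text[len(URN_PREFIX):]
    return text


def add_prefix(text: str) -> str:
    """Restore the full URN form from the short form."""
    return text if text.lower().startswith(URN_PREFIX) else URN_PREFIX + text


def classify_type(package_type: str) -> str:
    """Return "well-known" or "domain", or raise for an unrecognized type."""
    if package_type in WELL_KNOWN_TYPES:
        return "well-known"
    if "." in package_type.split("/", 1)[0]:
        return "domain"  # e.g. authorintrusion.com/spell-check
    raise ValueError(f"unknown package type: {package_type!r}")
```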

Authority

I don't have a lot of problems or doubts with the components above. The next part, on the other hand, is a bit more complicated and fluid. The more I develop, the more I think we should have separate package repositories/registries instead of putting everything on npmjs.com or nuget.org. However, that leads to potential name and identifier conflicts.

To use an example from my own life: when I started Nitride, I made all the namespaces “Nitride” and was going to buy a developer certificate to push it up to nuget.org. However, by the time I got it to a stable point, there was already a Nitride package there and that didn't work.

(This is also one reason why I don't like identifiers that aren't namespaced.)

At the same time, I want to keep these URNs relatively “simple” for the 99% case. That means I want to aim for something like:

`npm:markdowny`

`npm:@mfgames-writing/format`

`nuget:Humanizer`

But, if there is a non-default location, the URN needs to have some mechanism that identifies the “authority” of a package. This authority doesn't need to be a URL, just a unique key to distinguish between two packages with the same identifier.

Originally, I thought about something like `npm:///markdowny`, modeled on how `file:///` references the local file system while still allowing a domain and directory to be used:

`npm://mfgames.com/markdown`

`npm://mfgames.com/@mfgames-writing/epub2`

`npm://example.org/~user/@example-organization/example-package`

The problem is the last one. Where does the directory structure end and where does the package begin? If the entire URL were opaque, it would be easy to leave as-is, but because this has to be translated into a location, there needs to be an unequivocal way of splitting it into an authority and a package identifier.
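To make the ambiguity concrete, this small sketch lists every way a naive parser could split the last example into an authority and an identifier. The helper name `candidate_splits` is made up for illustration.

```python
# Sketch: with unencoded slashes, every "/" is a possible boundary between
# the authority (domain plus directories) and the package identifier.


def candidate_splits(rest: str) -> list[tuple[str, str]]:
    """List every (authority, identifier) pair a naive parser could pick
    for the text after "npm://"."""
    parts = rest.split("/")
    return [
        ("/".join(parts[:i]), "/".join(parts[i:]))
        for i in range(1, len(parts))
    ]


for authority, identifier in candidate_splits(
    "example.org/~user/@example-organization/example-package"
):
    print(f"authority={authority!r:45} identifier={identifier!r}")
```

It finds three equally plausible answers, and nothing in the text says which one was meant.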

URL encoding to the rescue.

If we treat the optional directory structure (on the optional authority domain) as a single “unit” by URL-encoding its slashes as `%2F`, then we can keep using a slash to separate the authority from the identifier and still keep the identifier in its most common format (NPM uses slashes):

`npm:markdowny`

`npm:///markdowny` (same as above)

`npm://npmjs.com/@mfgames-writing/epub2`

`npm://npmjs.com%2F~dmoonfire/@mfgames-writing/epub2`

I think this would work because we can have a rule: the component after the package type either has no leading slashes (the default authority) or starts with two slashes for an authority, in which case the next slash ends that component. Everything after that is the full identifier.
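A minimal parser for that rule might look like the following. It assumes the package type never contains a colon and leaves any `?` qualifiers attached to the identifier; `PackageId` and `parse_package_id` are hypothetical names, and percent-decoding uses Python's standard `urllib.parse.unquote`.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass(frozen=True)
class PackageId:
    package_type: str         # e.g. "npm"
    authority: Optional[str]  # decoded authority; None means the default
    identifier: str           # e.g. "@mfgames-writing/epub2" (qualifiers left as-is)


def parse_package_id(text: str) -> PackageId:
    """Split "type:identifier" or "type://authority/identifier".

    Two slashes start an authority and the next slash ends it; everything
    after that is the identifier. Slashes inside the authority must be
    percent-encoded as %2F, so the first real slash is always the boundary.
    """
    package_type, sep, rest = text.partition(":")
    if not sep or not package_type:
        raise ValueError(f"missing package type: {text!r}")
    authority = None
    if rest.startswith("//"):
        raw_authority, slash, rest = rest[2:].partition("/")
        if not slash:
            raise ValueError(f"no identifier after the authority: {text!r}")
        # "npm:///markdowny" has an empty authority, i.e. the default one.
        authority = unquote(raw_authority) or None
    if not rest:
        raise ValueError(f"missing identifier: {text!r}")
    return PackageId(package_type, authority, rest)
```

Under this sketch, `npm:markdowny` and `npm:///markdowny` parse to the same value, and the `%2F` in the last example decodes back to `npmjs.com/~dmoonfire` as the authority.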

Well-Known URLs

I'm fond of the .well-known/[3] infrastructure that has built up over the years. I can easily envision an identifier translating into an actual URL that helps locate the packages when their location isn't known.

3: https://en.wikipedia.org/wiki/Well-known_URI

https://npmjs.com/.well-known/bakfu/npm/npmjs.com%2F~dmoonfire/@mfgames-writing/epub2

The resulting JSON file would list common locations to find the package. So going to the @mfgames-writing/epub2 well-known URL would return the URLs for the official servers or locations, such as npmjs.com, my local package repository, an IPFS address, or whatever makes sense.
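Building that URL from the pieces could be as simple as the sketch below. The function name is hypothetical, the host to query is left to the caller, and the post doesn't define how a default (empty) authority or the JSON response would look, so this only covers the path layout from the example above.

```python
from urllib.parse import quote


def well_known_url(host: str, package_type: str, authority: str, identifier: str) -> str:
    """Build a lookup URL in the same shape as the example above (sketch).

    The authority is encoded with no safe characters, so its slashes become
    %2F and it stays one path segment. The identifier keeps "/" and "@" so
    NPM-style names read the same as in the identifier itself.
    """
    return "/".join([
        f"https://{host}/.well-known/bakfu",
        quote(package_type, safe=""),
        quote(authority, safe=""),
        quote(identifier, safe="/@"),
    ])


# prints https://npmjs.com/.well-known/bakfu/npm/npmjs.com%2F~dmoonfire/@mfgames-writing/epub2
print(well_known_url("npmjs.com", "npm", "npmjs.com/~dmoonfire", "@mfgames-writing/epub2"))
```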

It won't use query strings like WebFinger[4] because query strings don't play well with static sites, and I use static sites pretty heavily.

4: https://webfinger.net/

Qualified Identifiers

I think the ideas for qualified identifiers from the original identifiers post still have merit, but without the `bakfu:` prefix because it ends up just being noise. I think these qualifiers should be limited and defined ahead of time, since features already provide the flexibility.

`java:org.example.hyphenated_name`

`npm:markdowny?version=1.1.0`

`cargo:serde?version=1.0.152&feature[]=derive&feature[]=rc`

`cargo:serde?version=1.0.152&platform=x86_64-unknown-linux-gnu`
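Splitting off and checking the qualifiers can lean on the standard query-string parser. This is a sketch: the `ALLOWED_QUALIFIERS` set is a hypothetical stand-in for whatever list gets defined ahead of time, and `feature[]` is kept as a literal key because `parse_qs` doesn't give brackets any special meaning.

```python
from urllib.parse import parse_qs

# Hypothetical: the post says qualifiers should be defined ahead of time;
# this particular set only covers the examples above.
ALLOWED_QUALIFIERS = {"version", "feature[]", "platform"}


def split_qualifiers(text: str) -> tuple[str, dict[str, list[str]]]:
    """Split "cargo:serde?version=...&feature[]=..." into the base identifier
    and its qualifiers, rejecting any key that isn't on the allowed list."""
    base, _, query = text.partition("?")
    # parse_qs groups repeated keys into lists and treats "[]" as plain text.
    qualifiers = parse_qs(query, keep_blank_values=True)
    unknown = set(qualifiers) - ALLOWED_QUALIFIERS
    if unknown:
        raise ValueError(f"unknown qualifiers: {sorted(unknown)}")
    return base, qualifiers
```

Repeated keys such as `feature[]` come back as a list, so `derive` and `rc` both survive. The `package` and `bakfu` qualifiers described under Additional Versions would be added to the allowed set the same way.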

Additional Versions

If the package version (as opposed to the content version) is needed, then `&package=1.0.0` can be used. Likewise, if the Bakfu format itself needs to be bumped, then `&bakfu=1.2.0` can be used.

I thought about making versions arbitrary, but I couldn't imagine a case where a package would have two different versions for two purposes. Those would be two separate packages in that case.

Overriding Packages

One of the reasons for this exercise is figuring out how to modify a library that has already been released when users learn after the fact that it breaks semantic versioning (as with SlimMessageBus). One constraint is that, according to the specification[5], the contents of a version cannot change once published.

5: https://semver.org/#what-do-i-do-if-i-accidentally-release-a-backward-incompatible-change-as-a-minor-version

That is why the package has its own version: it indicates that the package metadata, such as the dependencies and requirements, can change independently of the contents. In most cases, this would be `package=1.0.0`, but a proxy service could add in the modified dependency and call it `package=1.0.1-service`, which would then cause the packaging system to prefer the highest package version for the same content version.
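The “prefer the highest package version” step can be sketched as choosing by Semantic Versioning precedence among candidates that share a content version. The tuple layout and the function names are assumptions for illustration; the ordering follows the SemVer 2.0.0 precedence rules, under which `1.0.1-service` sorts above `1.0.0`.

```python
# Sketch: each candidate is a hypothetical (content_version, package_version,
# source) tuple, and "highest" means SemVer 2.0.0 precedence.


def semver_key(version: str):
    """Sort key following SemVer 2.0.0 precedence (build metadata ignored)."""
    core, _, _build = version.partition("+")
    core, _, prerelease = core.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A release outranks every prerelease of the same major.minor.patch.
        pre_key = (1,)
    else:
        identifiers = []
        for ident in prerelease.split("."):
            if ident.isdigit():
                identifiers.append((0, int(ident), ""))  # numeric: compared as numbers, sorts first
            else:
                identifiers.append((1, 0, ident))        # alphanumeric: compared as text
        pre_key = (0, tuple(identifiers))
    return (major, minor, patch, pre_key)


def pick_package(candidates, content_version: str):
    """Return the candidate with the highest package version for the
    requested content version."""
    matching = [c for c in candidates if c[0] == content_version]
    if not matching:
        raise LookupError(f"no package for content version {content_version}")
    return max(matching, key=lambda c: semver_key(c[1]))
```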

There are some gaps: if we had a Bakfu-aware packaging system, someone could create a package and keep bumping the package version higher to override everyone else's overrides. I think this is a case where an upstream package modification should be blocked if an override has been given, at least until that new version can be reviewed and accepted.
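The blocking rule could be sketched like this: once an override exists, upstream package versions are ignored no matter how high they climb, until a specific one is marked as accepted. `OverridePolicy` and its fields are hypothetical, and how a version actually gets reviewed is left out.

```python
# Hypothetical sketch: overrides beat upstream package versions until a
# reviewer explicitly accepts a specific upstream version.
from dataclasses import dataclass, field


@dataclass
class OverridePolicy:
    # (package id, content version) -> package version chosen locally
    overrides: dict = field(default_factory=dict)
    # (package id, content version, upstream package version) accepted after review
    accepted: set = field(default_factory=set)

    def choose(self, package_id: str, content_version: str, upstream: str) -> str:
        """Return the package version to use for one content version."""
        key = (package_id, content_version)
        if key not in self.overrides:
            return upstream  # no override: use upstream as-is
        if (package_id, content_version, upstream) in self.accepted:
            return upstream  # this exact upstream version was reviewed
        # Otherwise the override wins, however high upstream bumps its version.
        return self.overrides[key]
```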

Conclusion

The main reason to have these package identifiers is simply to distinguish a package uniquely across the entire ecosystem. I strongly believe there needs to be a decoupling of location versus identifier because of the other goals in this project: moving from one host to another, caching packages, being able to provide a curated list, blocking malicious packages, and adding after-the-fact changes.

Overall, I think this fits my need for something that is roughly aesthetic (`urn:bakfu:npm:markdowny?version=1.0.1`), keeps the most common use simple and readable (`urn:bakfu:npm:markdowny`), but still allows distributed packages and cases where there are name collisions (`urn:bakfu:nuget://mfgames.com/Nitride`).

It can also be reduced to a shorter form based on context, such as removing the `urn:bakfu:` prefix, which makes the simplest version `npm:markdowny`.

Also, it shouldn't be hard to create a normalization rule that turns it into a proper C# or Rust structure for the next steps.
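As a hypothetical stand-in for that C# or Rust structure, here is a Python dataclass with a single canonical string form. The normalization choices are assumptions: a default authority is written without `//`, a non-default authority is percent-encoded into one segment, and qualifiers are sorted by key while keeping the order of repeated values.

```python
# Hypothetical Python stand-in for the "proper C# or Rust structure":
# one value type plus one canonical spelling, so equivalent identifiers
# render identically. The canonical rules are assumptions.
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, urlencode


@dataclass(frozen=True)
class BakfuId:
    package_type: str                # "npm", "nuget", or a domain-based type
    identifier: str                  # e.g. "@mfgames-writing/epub2"
    authority: Optional[str] = None  # decoded; None or "" means the default location
    qualifiers: tuple = ()           # ((key, value), ...) pairs

    def canonical(self, with_prefix: bool = False) -> str:
        """Render one normalized spelling:
        - the default authority is written without "//" (never "npm:///x"),
        - a non-default authority is percent-encoded into one segment,
        - qualifiers are sorted by key; sorted() is stable, so repeated
          keys keep their original value order."""
        if self.authority:
            encoded = quote(self.authority, safe="")
            text = f"{self.package_type}://{encoded}/{self.identifier}"
        else:
            text = f"{self.package_type}:{self.identifier}"
        if self.qualifiers:
            ordered = sorted(self.qualifiers, key=lambda pair: pair[0])
            text += "?" + urlencode(ordered, safe="[]")
        return ("urn:bakfu:" + text) if with_prefix else text
```

An empty or missing authority renders as `npm:markdowny`, so the `npm:///markdowny` spelling never appears in the canonical form and two equivalent identifiers can be compared as plain strings.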