Improving package managers

I noticed two posts on improving package managers, neither of which mentions Debtags.

Daniel Burrows mentions various issues:

David Nusinov suggests that the ideal package manager should look like Google: you search using just a simple one-line text entry, then pick what you want to install from the results.

I should probably recap some of the things that have been going on.

I'll go through that list again:

Agreed. There used to be a bug about this, which Debtags closed more than a year ago. We now have much more useful category data for about 73% of the archive (including experimental); what we lack is software that uses it.

Here's a quick trick to try:

  1. install debtags: this gives you an easy-to-read text file in /var/lib/debtags/package-tags.
  2. from that file, pick the packages that have the tags role::program, scope::application and interface::x11.
  3. display the results, and use the works-with::* and use::* tags to navigate them.
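The three steps above fit in a few lines of Python. This is a minimal sketch, assuming each line of package-tags has the form "package: tag1, tag2, ..." (possibly with several comma-separated packages before the colon). The function names, and the sample packages and tags, are made up for illustration and are not copied from the real database.

```python
from collections import Counter

# Assumed format of /var/lib/debtags/package-tags (one entry per line):
#   "pkg1, pkg2: tag1, tag2, ..."
# Debian package names cannot contain ":", so the first colon on the
# line separates packages from tags even though tags contain "::".
def parse_tag_lines(lines):
    """Return a dict mapping each package name to its set of tags."""
    db = {}
    for line in lines:
        pkgs, sep, tags = line.strip().partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        tagset = {t.strip() for t in tags.split(",") if t.strip()}
        for pkg in (p.strip() for p in pkgs.split(",")):
            if pkg:
                db.setdefault(pkg, set()).update(tagset)
    return db


def load_package_tags(path="/var/lib/debtags/package-tags"):
    with open(path, encoding="utf-8") as f:
        return parse_tag_lines(f)


# Step 2: keep only packages that carry all of the required tags.
REQUIRED = {"role::program", "scope::application", "interface::x11"}

def select(db, required=REQUIRED):
    return {pkg: tags for pkg, tags in db.items() if required <= tags}


# Step 3: count works-with::* and use::* tags to offer as navigation choices.
def facet_counts(db, prefixes=("works-with::", "use::")):
    return Counter(tag for tags in db.values() for tag in tags
                   if tag.startswith(prefixes))


def narrow(db, tag):
    """Follow one navigation choice: keep only packages that have this tag."""
    return {pkg: tags for pkg, tags in db.items() if tag in tags}


# Hypothetical sample entries, for illustration only.
SAMPLE = [
    "imgedit: role::program, scope::application, interface::x11, works-with::image, use::editing",
    "imgview: role::program, scope::application, interface::x11, works-with::image, use::viewing",
    "wordproc: role::program, scope::application, interface::x11, works-with::text, use::editing",
    "imgconvert: role::program, scope::utility, interface::commandline, works-with::image",
]

if __name__ == "__main__":
    apps = select(parse_tag_lines(SAMPLE))
    for tag, count in facet_counts(apps).most_common():
        print(f"{tag} ({count})")
    print(sorted(narrow(apps, "works-with::image")))
```

To try it on a real system, replace parse_tag_lines(SAMPLE) with load_package_tags().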

There is a python-debian package in experimental that has a debtags module you could play with.

Why is it that, so far, no one has written a simple package manager just for gamers, using only the game::* tags?
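A gamer-oriented front end could start from nothing more than a grouping of packages by their game::* tags. This sketch assumes the tag data are already loaded into a dict mapping package names to sets of tags; the package names and the game::board and game::arcade tags in the sample are illustrative assumptions.

```python
from collections import defaultdict

def games_menu(db):
    """Group packages under each game::* tag they carry, ignoring other tags."""
    menu = defaultdict(list)
    for pkg, tags in sorted(db.items()):
        for tag in sorted(tags):
            if tag.startswith("game::"):
                menu[tag].append(pkg)
    return dict(sorted(menu.items()))

# Hypothetical data: names and tags are illustrative only.
GAMES_DB = {
    "chess-ui": {"game::board", "interface::x11"},
    "go-board": {"game::board"},
    "space-shooter": {"game::arcade", "interface::x11"},
    "text-editor": {"use::editing"},
}

if __name__ == "__main__":
    for category, games in games_menu(GAMES_DB).items():
        print(category + ": " + ", ".join(games))
```

Packages without any game::* tag simply never appear, so the whole non-game archive disappears from the gamer's view.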

Do you think Debtags gives you too many tags? Then check out:

To summarise so far: we not only have better categories, but also a number of cool algorithms that use them, and some interface prototypes too. Just don't expect me to write a package manager as well: so far I have decided to leave that job to someone else. Adept gave it a try, with positive results.

Indeed: Xapian, for example. I use it as part of the backend of the Debtags smart search, and here's our Xapian-powered keyword-based package search interface, which does stemming, indexing and everything you would want from a serious full-text index.

That page doesn't show all of Xapian's nice features, only the ones I needed for my evil Debtags plans. Have a look at the documentation and give it a try.

Here is a way to see Xapian's similarity matching in action:

  1. go to the Go tagging! page
  2. click on a random untagged package
  3. the system gives you a rather relevant selection of tags
  4. look at it again: the package was untagged, so how could the web engine possibly have figured out those tags?

What is happening behind the scenes is this:

  1. I ask Xapian: "what packages are similar to this one?".
  2. I aggregate the tags of the resulting packages.
  3. I rank the tags by how many resulting packages have them.
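The three steps can be sketched as follows. To stay self-contained, the sketch does not call Xapian: it swaps in a plain bag-of-words cosine similarity over package descriptions as a stand-in for Xapian's similarity query, then aggregates and ranks the tags as described. The function names, the parameter k and the sample packages are hypothetical.

```python
import math
import re
from collections import Counter

def words(text):
    """Bag of lowercase words; no stemming, unlike a real Xapian index."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(count * b[w] for w, count in a.items())
    if dot == 0:
        return 0.0
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm

def suggest_tags(description, tagged, k=3):
    """tagged maps package -> (description, set of tags)."""
    query = words(description)
    # Step 1: find the k tagged packages most similar to the untagged one.
    scored = sorted(((cosine(query, words(desc)), pkg)
                     for pkg, (desc, _tags) in tagged.items()),
                    key=lambda sp: (-sp[0], sp[1]))
    similar = [pkg for score, pkg in scored[:k] if score > 0]
    # Step 2: aggregate the tags of the similar packages.
    votes = Counter()
    for pkg in similar:
        votes.update(tagged[pkg][1])
    # Step 3: rank tags by how many similar packages have them.
    return votes.most_common()

# Hypothetical tagged packages, for illustration only.
TAGGED = {
    "imgedit": ("image editor for photos and drawings",
                {"works-with::image", "use::editing", "interface::x11"}),
    "imgview": ("fast image viewer for photos",
                {"works-with::image", "use::viewing", "interface::x11"}),
    "mailer": ("console mail reader",
               {"works-with::mail", "interface::commandline"}),
}

if __name__ == "__main__":
    print(suggest_tags("simple photo image viewer", TAGGED))
```

With a real index the similarity step would come from Xapian instead, but the aggregation and ranking stay the same.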

While we are on the topic, why don't we agree to maintain a Xapian index of our package descriptions in, for example, /var/lib/apt/fulltext/, so that various applications can share it?

Indeed. Would anyone like to implement this little "popcon" tool? Having the data easily accessible locally could encourage people to use them.

The Debtags Go tagging! page already uses popcon data to show the most common untagged packages at the top, for two reasons: it shows packages that more people are likely to know (and are therefore likely to categorise), and it pushes the most common packages to be tagged more urgently.
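Ordering a tagging queue this way amounts to dropping the packages that already have tags and sorting the rest by popcon installation count. A minimal sketch, assuming the popcon data have already been reduced to a dict of package name to install count; the function name, package names and numbers are made up.

```python
def tagging_queue(packages, tags, popcon):
    """Untagged packages, most installed first (ties broken by name)."""
    untagged = [pkg for pkg in packages if not tags.get(pkg)]
    return sorted(untagged, key=lambda pkg: (-popcon.get(pkg, 0), pkg))

if __name__ == "__main__":
    packages = ["editor", "viewer", "obscure-tool", "player"]
    tags = {"editor": {"use::editing"}}
    popcon = {"viewer": 1200, "player": 5400}
    print(tagging_queue(packages, tags, popcon))
```

Packages missing from the popcon data count as zero installations, so they end up at the bottom of the queue.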

Indeed. Any volunteers to implement a prototype? The full unaggregated (but anonymised) popcon data are accessible to every Debian Developer on the host gluck.debian.org in the directory /org/popcon.debian.org/popcon-mail/popcon-entries.

One can do many interesting things with this concept: besides tag suggestions, one could identify the packages that are most representative of an installed system, and also offer negative suggestions like: "people who have packages like yours usually don't have this package: would you like to remove it?".
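One possible prototype of these suggestions is a nearest-neighbour search over popcon submissions. It finds the systems whose package sets overlap most with yours (Jaccard similarity here), suggests packages that most of them have and you lack, and flags installed packages that almost none of them have. This assumes each submission has already been turned into a set of installed package names; the thresholds, function names and sample data are hypothetical choices, not a tested recipe.

```python
from collections import Counter

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggestions(system, submissions, k=3, common=0.5, rare=0.2):
    """Return (packages to suggest adding, packages to suggest removing)."""
    # Nearest neighbours: the k submissions most similar to this system.
    near = sorted(submissions, key=lambda s: jaccard(system, s), reverse=True)[:k]
    if not near:
        return [], []
    # How many neighbours have each package.
    have = Counter(pkg for s in near for pkg in s)
    n = len(near)
    add = sorted(p for p, c in have.items() if p not in system and c / n >= common)
    remove = sorted(p for p in system if have[p] / n < rare)
    return add, remove

# Hypothetical submissions, for illustration only.
SUBMISSIONS = [
    {"imgedit", "vector-draw", "font-tool"},
    {"imgedit", "vector-draw", "font-tool", "scanner-ui"},
    {"imgedit", "font-tool"},
    {"mailer", "irc-client"},
]

if __name__ == "__main__":
    print(suggestions({"imgedit", "vector-draw", "oddball"}, SUBMISSIONS))
```

The same neighbour counts could also rank which installed packages are most typical of systems like yours.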

Much more could be done beyond all this. Recently, almost by accident, I had the idea of querying packages by example, such as pointing to a file and finding the packages that can work with it. I've asked Jeroen to have Mole collect information on all files that could possibly get installed in /usr/lib/mime/packages/ (as suggested by Bernhard R. Link), to see whether that prototype can be made more accurate.
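Query by example could, for instance, work by guessing the file's MIME type and looking it up in the mailcap-style entries that packages ship in /usr/lib/mime/packages/. This sketch assumes each file there is named after its package and uses the simple "type/subtype; command" line form (continuation lines and flags are ignored). It relies on Python's standard mimetypes module for the guess, and the package names are invented.

```python
import fnmatch
import mimetypes

def parse_entries(package, lines):
    """Yield (mime pattern, package) pairs from mailcap-style lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern = line.split(";", 1)[0].strip().lower()
        if pattern:
            yield pattern, package

def build_index(files):
    """files maps a package name to the lines of its /usr/lib/mime/packages/ file."""
    return [entry for pkg, lines in files.items() for entry in parse_entries(pkg, lines)]

def packages_for_file(filename, index):
    """Packages declaring support for the file's guessed MIME type (wildcards allowed)."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        return []
    return sorted({pkg for pattern, pkg in index if fnmatch.fnmatch(mime, pattern)})

# Hypothetical entries, for illustration only.
FILES = {
    "imgview": ["image/*; imgview %s"],
    "pdfread": ["application/pdf; pdfread %s"],
    "texteditor": ["# comment", "text/plain; texteditor %s"],
}

if __name__ == "__main__":
    index = build_index(FILES)
    print(packages_for_file("holiday.png", index))
```

Guessing from the file name is the weak point; content sniffing would be more robust, which is where more complete data on the installed MIME entries would help.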

Query by similarity would be nice: "I don't like this program, but what else do we have that does the same job?" This is best implemented using Debtags data, since tags map directly to semantic properties. Note that you don't have to show a single tag to the user to implement this kind of interface. Do we have a way to point at an application's X window and get the name of the package that installed it? Isn't it about time we had one?
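With Debtags, "what else does the same job?" can be approximated by comparing tag sets. This sketch ranks other packages by the Jaccard similarity of their tags to the tags of the program the user dislikes, and returns only package names, so no tag ever has to be shown. The scoring function, the min_score cut-off and the sample tags are assumptions for illustration.

```python
def similar_packages(target, db, k=3, min_score=0.5):
    """Packages whose tag sets best match the target's, best first."""
    target_tags = db[target]
    scored = []
    for pkg, tags in db.items():
        if pkg == target:
            continue
        union = target_tags | tags
        score = len(target_tags & tags) / len(union) if union else 0.0
        if score >= min_score:
            scored.append((score, pkg))
    scored.sort(key=lambda sp: (-sp[0], sp[1]))
    return [pkg for _score, pkg in scored[:k]]

# Hypothetical tag data, for illustration only.
TAGS_DB = {
    "mailer-a": {"works-with::mail", "interface::x11", "role::program", "network::client"},
    "mailer-b": {"works-with::mail", "interface::x11", "role::program", "network::client"},
    "mailer-c": {"works-with::mail", "interface::commandline", "role::program", "network::client"},
    "imgview": {"works-with::image", "interface::x11", "role::program"},
}

if __name__ == "__main__":
    print(similar_packages("mailer-a", TAGS_DB))
```

Lowering min_score widens the net: at 0.3, imgview would also match, purely because it shares the interface and role tags.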

Why don't we have a system updater utility that shows the Debian weather?

Why aren't more people playing with semantic web?

More generally, though, the problem with package managers is that we seem irrationally compelled to make the one and only big, easy and complete interface for everyone. More reasonable people would tell you that if you have two very different kinds of users, you may want to consider having two different user interfaces.

Ubuntu, for example, installs three package manager interfaces by default: Synaptic; the tool you open from the application menu to add applications to it; and the update manager. Does that sound like a waste? To me it makes a lot of sense.

We have lots of interesting, usable metadata; we have algorithms; we have prototypes; we have ideas for lots of cool, implementable features. The question is: are we able to write applications that combine just what is needed from all this treasure to provide the right interface(s) for our user base(s)?

Even though my English in 2004 wasn't easy to understand, a read here might still be useful.

There is so much really cool stuff to be written, just within reach.