Latest posts for tag devel

Things I learnt in March 2023

  • str.endswith() can take a tuple of possible endings instead of a single string

About JACK and Debian

  • There are 3 JACK implementations: jackd1, jackd2, pipewire-jack.
  • jackd1 is mostly superseded in favour of jackd2, and as far as I understand, can be ignored
  • pipewire-jack integrates well with pipewire and the rest of the Linux audio world
  • jackd2 is the native JACK server. When started it handles the sound card directly, and will steal it from pipewire. Non-JACK audio applications will likely cease to see the sound card until JACK is stopped and wireplumber is restarted. Pipewire should be able to keep working as a JACK client but I haven't gone down that route yet
  • pipewire-jack mostly works. At some point I experienced glitches in complex JACK apps like giada or ardour that went away after switching to jackd2. I have not investigated further into the glitches
  • So: try things with pw-jack. If you see odd glitches, try without pw-jack to use the native jackd2. Keep in mind, if you do so, that you will lose standard pipewire until you stop jackd2 and restart wireplumber.

Heart-driven drum loop

I have Python code for reading a heart rate monitor.

I have Python code to generate MIDI events.

Could I resist putting them together? Clearly not.

Here's Jack Of Hearts, a JACK MIDI drum loop generator that uses the heart rate for BPM, and an improvised way to compute heart rate increase/decrease to add variations in the drum pattern.

It's very simple minded and silly. To me it was a fun way of putting unrelated things together, and Python worked very well for it.

Generating MIDI events with JACK and Python

I had a go at trying to figure out how to generate arbitrary MIDI events and send them out over a JACK MIDI channel.

Setting up JACK and Pipewire

Pipewire has a JACK interface, which in theory means one could use JACK clients out of the box without extra setup.

In practice, one need to tell JACK clients which set of libraries to use to communicate to servers, and Pipewire's JACK server is not the default choice.

To tell JACK clients to use Pipewire's server, you can either:

  • on a client-by-client basis, wrap the commands with pw-jack
  • to change the system default: cp /usr/share/doc/pipewire/examples/ld.so.conf.d/pipewire-jack-*.conf /etc/ld.so.conf.d/ and run ldconfig (see the Debian wiki for details)

Programming with JACK

Python has a JACK client library that worked flawlessly for me so far.

Everything with JACK is designed around minimizing latency. Everything happens around a callback that gets called form a separate thread, and which gets a buffer to fill with events.

All the heavy processing needs to happen outside the callback, and the callback is only there to do the minimal amount of work needed to shovel the data your application produced into JACK channels.

Generating MIDI messages

The Mido library can be used to parse and create MIDI messages and it also worked flawlessly for me so far.

One needs to study a bit what kind of MIDI message one needs to generate (like "note on", "note off", "program change") and what arguments they get.

It also helps to read about the General MIDI standard which defines mappings between well-known instruments and channels and instrument numbers in MIDI messages.

A timed message queue

To keep a queue of events that happen over time, I implemented a Delta List that indexes events by their future frame number.

I called the humble container for my audio experiments pyeep and here's my delta list implementation.

A JACK player

The simple JACK MIDI player backend is also in pyeep.

It needs to protect the delta list with a mutex since we are working across thread boundaries, but it tries to do as little work under lock as possible, to minimize the risk of locking the realtime thread for too long.

The play method converts delays in seconds to frame counts, and the on_process callback moves events from the queue to the jack output.

Here's an example script that plays a simple drum pattern:

#!/usr/bin/python3

# Example JACK midi event generator
#
# Play a drum pattern over JACK

import time

from pyeep.jackmidi import MidiPlayer

# See:
# https://soundprogramming.net/file-formats/general-midi-instrument-list/
# https://www.pgmusic.com/tutorial_gm.htm

DRUM_CHANNEL = 9

with MidiPlayer("pyeep drums") as player:
    beat: int = 0
    while True:
        player.play("note_on", velocity=64, note=35, channel=DRUM_CHANNEL)
        player.play("note_off", note=38, channel=DRUM_CHANNEL, delay_sec=0.5)
        if beat == 0:
            player.play("note_on", velocity=100, note=38, channel=DRUM_CHANNEL)
            player.play("note_off", note=36, channel=DRUM_CHANNEL, delay_sec=0.3)
        if beat + 1 == 2:
            player.play("note_on", velocity=100, note=42, channel=DRUM_CHANNEL)
            player.play("note_off", note=42, channel=DRUM_CHANNEL, delay_sec=0.3)

        beat = (beat + 1) % 4
        time.sleep(0.3)

Running the example

I ran the jack_drums script, and of course not much happened.

First I needed a MIDI synthesizer. I installed fluidsynth, and ran it on the command line with no arguments. it registered with JACK, ready to do its thing.

Then I connected things together. I used qjackctl, opened the graph view, and connected the MIDI output of "pyeep drums" to the "FLUID Synth input port".

fluidsynth's output was already automatically connected to the audio card and I started hearing the drums playing! 🥁️🎉️

Monitoring a heart rate monitor

I bought myself a cheap wearable Bluetooth LE heart rate monitor in order to play with it, and this is a simple Python script to monitor it and plot data.

Bluetooth LE

I was surprised that these things seem decently interoperable.

You can use hcitool to scan for devices:

hcitool lescan

You can then use gatttool to connect to device and poke at them interactively from a command line.

Bluetooth LE from Python

There is a nice library called Bleak which is also packaged in Debian. It's modern Python with asyncio and works beautifully!

Heart rate monitors

Things I learnt:

How about a proper fitness tracker?

I found OpenTracks, also on F-Droid, which seems nice

Why script it from a desktop computer?

The question is: why not?

A fitness tracker on a phone is useful, but there are lots of silly things one can do from one's computer that one can't do from a phone. A heart rate monitor is, after all, one more input device, and there are never enough input devices!

There are so many extremely important use cases that seem entirely unexplored:

  • Log your heart rate with your git commits!
  • Add your heart rate as a header in your emails!
  • Correlate heart rate information with your work activity tracker to find out what tasks stress you the most!
  • Sync ping intervals with your own heartbeat, so you get faster replies when you're more anxious!
  • Configure workrave to block your keyboard if you get too excited, to improve the quality of your mailing list contributions!
  • You can monitor the monitor script of the heart rate monitor that monitors you! Forget buffalo, be your monitor monitor monitor monitor monitor monitor monitor monitor...

Released staticsite 2.x

In theory I wanted to announce the release of staticsite 2.0, but then I found bugs that prevented me from writing this post, so I'm also releasing 2.1 2.2 2.3 :grin:

staticsite is the static site generator that I ended up writing after giving other generators a try.

I did a big round of cleanup of the code, which among other things allowed me to implement incremental builds.

It turned out that staticsite is fast enough that incremental builds are not really needed, however, a bug in caching rendered markdown made me forget about that. Now I fixed that bug, too, and I can choose between running staticsite fast, and ridiculously fast.

My favourite bit of this work is the internal cleanup: I found a way to simplify the core design massively, and now the core and plugin system is simple enough that I can explain it, and I'll probably write a blog post or two about it in the next days.

On top of that, staticsite is basically clean with mypy running in strict mode! Getting there was a great ride which prompted a lot of thinking about designing code properly, as mypy is pretty good at flagging clumsy hacks.

If you want to give it a try, check out the small tutorial A new blog in under one minute.

Really lossy compression of JPEG

Suppose you have a tool that archives images, or scientific data, and it has a test suite. It would be good to collect sample files for the test suite, but they are often so big one can't really bloat the repository with them.

But does the test suite need everything that is in those files? Not necesarily. For example, if one's testing code that reads EXIF metadata, one doesn't care about what is in the image.

That technique works extemely well. I can take GRIB files that are several megabytes in size, zero out their data payload, and get nice 1Kb samples for the test suite.

I've started to collect and organise the little hacks I use for this into a tool I called mktestsample:

$ mktestsample -v samples1/*
2021-11-23 20:16:32 INFO common samples1/cosmo_2d+0.grib: size went from 335168b to 120b
2021-11-23 20:16:32 INFO common samples1/grib2_ifs.arkimet: size went from 4993448b to 39393b
2021-11-23 20:16:32 INFO common samples1/polenta.jpg: size went from 3191475b to 94517b
2021-11-23 20:16:32 INFO common samples1/test-ifs.grib: size went from 1986469b to 4860b

Those are massive savings, but I'm not satisfied about those almost 94Kb of JPEG:

$ ls -la samples1/polenta.jpg
-rw-r--r-- 1 enrico enrico 94517 Nov 23 20:16 samples1/polenta.jpg
$ gzip samples1/polenta.jpg
$ ls -la samples1/polenta.jpg.gz
-rw-r--r-- 1 enrico enrico 745 Nov 23 20:16 samples1/polenta.jpg.gz

I believe I did all I could: completely blank out image data, set quality to zero, maximize subsampling, and tweak quantization to throw everything away.

Still, the result is a 94Kb file that can be gzipped down to 745 bytes. Is there something I'm missing?

I suppose JPEG is better at storing an image than at storing the lack of an image. I cannot really complain :)

I can still commit compressed samples of large images to a git repository, taking very little data indeed. That's really nice!

Mock syscalls with C++

I wrote and maintain some C++ code to stream high quantities of data as fast as possible, and I try to use splice and sendfile when available.

The availability of those system calls varies at runtime according to a number of factors, and the code needs to be written to fall back to read/write loops depending on what the splice and sendfile syscalls say.

The tricky issue is unit testing: since the code path chosen depends on the kernel, the test suite will test one path or the other depending on the machine and filesystems where the tests are run.

It would be nice to be able to mock the syscalls, and replace them during tests, and it looks like I managed.

First I made catalogues of the mockable syscalls I want to be able to mock. One with function pointers, for performance, and one with std::function, for flexibility:

/**
 * Linux versions of syscalls to use for concrete implementations.
 */
struct ConcreteLinuxBackend
{
    static ssize_t (*read)(int fd, void *buf, size_t count);
    static ssize_t (*write)(int fd, const void *buf, size_t count);
    static ssize_t (*writev)(int fd, const struct iovec *iov, int iovcnt);
    static ssize_t (*sendfile)(int out_fd, int in_fd, off_t *offset, size_t count);
    static ssize_t (*splice)(int fd_in, loff_t *off_in, int fd_out,
                             loff_t *off_out, size_t len, unsigned int flags);
    static int (*poll)(struct pollfd *fds, nfds_t nfds, int timeout);
    static ssize_t (*pread)(int fd, void *buf, size_t count, off_t offset);
};

/**
 * Mockable versions of syscalls to use for testing concrete implementations.
 */
struct ConcreteTestingBackend
{
    static std::function<ssize_t(int fd, void *buf, size_t count)> read;
    static std::function<ssize_t(int fd, const void *buf, size_t count)> write;
    static std::function<ssize_t(int fd, const struct iovec *iov, int iovcnt)> writev;
    static std::function<ssize_t(int out_fd, int in_fd, off_t *offset, size_t count)> sendfile;
    static std::function<ssize_t(int fd_in, loff_t *off_in, int fd_out,
                                 loff_t *off_out, size_t len, unsigned int flags)> splice;
    static std::function<int(struct pollfd *fds, nfds_t nfds, int timeout)> poll;
    static std::function<ssize_t(int fd, void *buf, size_t count, off_t offset)> pread;

    static void reset();
};

Then I converted the code to templates, parameterized on the catalogue class.

Explicit template instantiation helps in making sure that one doesn't need to include template code in all sorts of places.

Finally, I can have a RAII class for mocking:

/**
 * RAII mocking of syscalls for concrete stream implementations
 */
struct MockConcreteSyscalls
{
    std::function<ssize_t(int fd, void *buf, size_t count)> orig_read;
    std::function<ssize_t(int fd, const void *buf, size_t count)> orig_write;
    std::function<ssize_t(int fd, const struct iovec *iov, int iovcnt)> orig_writev;
    std::function<ssize_t(int out_fd, int in_fd, off_t *offset, size_t count)> orig_sendfile;
    std::function<ssize_t(int fd_in, loff_t *off_in, int fd_out,
                                 loff_t *off_out, size_t len, unsigned int flags)> orig_splice;
    std::function<int(struct pollfd *fds, nfds_t nfds, int timeout)> orig_poll;
    std::function<ssize_t(int fd, void *buf, size_t count, off_t offset)> orig_pread;

    MockConcreteSyscalls();
    ~MockConcreteSyscalls();
};

MockConcreteSyscalls::MockConcreteSyscalls()
    : orig_read(ConcreteTestingBackend::read),
      orig_write(ConcreteTestingBackend::write),
      orig_writev(ConcreteTestingBackend::writev),
      orig_sendfile(ConcreteTestingBackend::sendfile),
      orig_splice(ConcreteTestingBackend::splice),
      orig_poll(ConcreteTestingBackend::poll),
      orig_pread(ConcreteTestingBackend::pread)
{
}

MockConcreteSyscalls::~MockConcreteSyscalls()
{
    ConcreteTestingBackend::read = orig_read;
    ConcreteTestingBackend::write = orig_write;
    ConcreteTestingBackend::writev = orig_writev;
    ConcreteTestingBackend::sendfile = orig_sendfile;
    ConcreteTestingBackend::splice = orig_splice;
    ConcreteTestingBackend::poll = orig_poll;
    ConcreteTestingBackend::pread = orig_pread;
}

And here's the specialization to pretend sendfile and splice aren't available:

/**
 * Mock sendfile and splice as if they weren't available on this system
 */
struct DisableSendfileSplice : public MockConcreteSyscalls
{
    DisableSendfileSplice();
};

DisableSendfileSplice::DisableSendfileSplice()
{
    ConcreteTestingBackend::sendfile = [](int out_fd, int in_fd, off_t *offset, size_t count) -> ssize_t {
        errno = EINVAL;
        return -1;
    };
    ConcreteTestingBackend::splice = [](int fd_in, loff_t *off_in, int fd_out,
                                        loff_t *off_out, size_t len, unsigned int flags) -> ssize_t {
        errno = EINVAL;
        return -1;
    };
}

It's now also possible to reproduce in the test suite all sorts of system-related issues we might observe in production over time.

Software development links

Next time we'll iterate on Himblick design and development, Raspberry Pi 4 can now run plain standard Debian, which should make a lot of things easier and cleaner when developing products based on it.

Somewhat related to nspawn-runner, random links somehow related to my feeling that nspawn comes from an ecosystem which gives me a bigger sense of focus on security and solidity than Docker:

I did a lot of work on A38, a Python library to deal with FatturaPA electronic invoicing, and it was a wonderful surprise to see a positive review spontaneously appear! ♥: Fattura elettronica, come visualizzarla con python | TuttoLogico

A beautiful, hands-on explanation of git internals, as a step by step guide to reimplementing your own git: Git Internals - Learn by Building Your Own Git

I recently tried meson and liked it a lot. I then gave unity builds a try, since it supports them out of the box, and found myself with doubts. I found I wasn't alone, and I liked The Evils of Unity Builds as a summary of the situation.

A point of view I liked on technological debt: Technical debt as a lack of understanding

Finally, a classic, and a masterful explanation for a question that keeps popping up: RegEx match open tags except XHTML self-contained tags

Boring restaurants

While traveling around Germany, one notices that most towns have a Greek or Italian restaurant, and they all kind of have the same names. How bad is that lack of fantasy?

Let's play with https://overpass-turbo.eu/. Select a bounding box and run this query:

node
  [cuisine=greek]
  ({{bbox}});
out;

Export the results as gpx and have some fun on the command line:

sed -nre 's/^name=([^<]+).*/\1/p' /tmp/greek.gpx \
   | sed -re 's/ *(Grill|Restaurant|Tavern[ae]) *//g' \
   | sort | uniq -c | sort -nr > /tmp/greek.txt

Likewise, with Italian restaurants, you can use cuisine=italian and something like:

sed -nre 's/^name=([^<]+).*/\1/p' /tmp/italian.gpx \
   | sed -re 's/ *(Restaurant|Ristorante|Pizzeria) *//g' \
   | sort | uniq -c | sort -nr > /tmp/italian.txt

Here are the top 20 that came out for Greek:

    162 Akropolis
     91 Delphi
     86 Poseidon
     78 Olympia
     78 Mykonos
     78 Athen
     76 Hellas
     74 El Greco
     71 Rhodos
     57 Dionysos
     53 Kreta
     50 Syrtaki
     49 Korfu
     43 Santorini
     43 Athos
     40 Mythos
     39 Zorbas
     35 Artemis
     33 Meteora
     29 Der Grieche

Here are the top 20 that came out for Italian, with a sadly ubiquitous franchise as an outlier:

     66 Vapiano
     64 Bella Italia
     59 L'Osteria
     54 Roma
     43 La Piazza
     38 La Dolce Vita
     38 Dolce Vita
     35 Italia
     32 Pinocchio
     31 Toscana
     30 Venezia
     28 Milano
     28 Mamma Mia
     27 Bella Napoli
     25 San Marco
     24 Portofino
     22 La Piazzetta
     22 La Gondola
     21 Da Vinci
     21 Da Pino

One can play a game while traveling: being the first to spot a Greek or Italian restaurant earns more points the more unusual its name is. But beware of being too quick! If you try to claim points for one of the restaurant with the top-5 most common names, you will actually will actually lose points!

Have fun playing with other combinations of areas and cuisine: the Overpass API is pretty cool!

Update:

Rather than running xml through sed, one can export geojson, then parse it with the excellent jq:

jq -r '.features[].properties.name' italian.json \
   | sed -re 's/ *(Restaurant|Ristorante|Pizzeria) *//g' \
   | sort | uniq -c | sort -nr > /tmp/italian.txt
© 2023 Enrico Zini. Generated with staticsite on 2024-12-07 02:00 CET.