Injecting Audio Sidetone using Pipewire or PulseAudio

My Razer headset has a built-in microphone which recently started to fail: people began to note that they couldn't really hear me (a particular highlight being "We're having a hard time understanding you because your voice is so feeble").

Whilst it's possible to replace the headset's mic, they do get knocked a lot and I didn't really want to replace it with something that was just going to fail again, so I decided to get a boom mic instead.

It's not a particularly expensive one, but is a nice bit of kit. The only issue is that I'm never quite sure how I'm coming across on it. Am I too loud, am I too quiet? Am I disturbing the rest of the house?

So, I decided I wanted to enable Mic Monitoring (also known as sidetone or audible feedback): having the microphone's input play in my headphones, so that I can hear and moderate my own voice when I speak.

This post talks about enabling sidetone on a Linux box running Pipewire or PulseAudio (the same pactl commands work on both, because Pipewire provides a PulseAudio-compatible interface).


Identifying Device Names

Unless told otherwise, the command used to add the sidetone will use the default audio source and sink. That is not necessarily desirable: my laptop seems to list 11 possible input sources, with the default being the mic built into my webcam.

To have the module use a different source, we need to know its name, so list the available sources with pactl:

pactl list sources

Somewhere in the list will be your device. Mine looks like this:

Source #10
    State: RUNNING
    Name: alsa_input.usb-3142_Fifine_Microphone-00.mono-fallback
    Description: Fifine Microphone Mono
    Driver: module-alsa-card.c
    Sample Specification: s16le 1ch 44100Hz
    Channel Map: mono
    Owner Module: 29
    Mute: no
    Volume: mono: 81912 / 125% / 5.81 dB
            balance 0.00
    Base Volume: 19944 /  30% / -31.00 dB
    Monitor of Sink: n/a
    Latency: 0 usec, configured 40000 usec
    Flags: HARDWARE HW_MUTE_CTRL HW_VOLUME_CTRL DECIBEL_VOLUME LATENCY 
    Properties:
        alsa.resolution_bits = "16"
        device.api = "alsa"
        device.class = "sound"
        alsa.class = "generic"
        alsa.subclass = "generic-mix"
        alsa.name = "USB Audio"
        alsa.id = "USB Audio"
        alsa.subdevice = "0"
        alsa.subdevice_name = "subdevice #0"
        alsa.device = "0"
        alsa.card = "2"
        alsa.card_name = "Fifine Microphone"
        alsa.long_card_name = "Fifine Microphone at usb-0000:00:14.0-7.3, full speed"
        alsa.driver_name = "snd_usb_audio"
        device.bus_path = "pci-0000:00:14.0-usb-0:7.3:1.0"
        sysfs.path = "/devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.3/3-7.3:1.0/sound/card2"
        udev.id = "usb-3142_Fifine_Microphone-00"
        device.bus = "usb"
        device.vendor.id = "3142"
        device.vendor.name = "3142"
        device.product.id = "5060"
        device.product.name = "Fifine Microphone"
        device.serial = "3142_Fifine_Microphone"
        device.form_factor = "microphone"
        device.string = "hw:2"
        device.buffering.buffer_size = "176400"
        device.buffering.fragment_size = "88200"
        device.access_mode = "mmap+timer"
        device.profile.name = "mono-fallback"
        device.profile.description = "Mono"
        device.description = "Fifine Microphone Mono"
        module-udev-detect.discovered = "1"
        device.icon_name = "audio-input-microphone-usb"
    Ports:
        analog-input-mic: Microphone (type: Mic, priority: 8700, availability unknown)
    Active Port: analog-input-mic
    Formats:
        pcm

In my case, the name is alsa_input.usb-3142_Fifine_Microphone-00.mono-fallback
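The full listing is quite verbose when all we want is the name. A rough sketch of a filter that keeps only the Name: values is below. device_names is a hypothetical helper name of my own (it isn't part of pactl) and it simply reads the listing on stdin:

```shell
# device_names: hypothetical helper, not part of pactl.
# Reads the output of "pactl list sources" (or "pactl list sinks")
# on stdin and prints only the value of each "Name:" line.
device_names() {
    awk '$1 == "Name:" { print $2 }'
}

# Usage (needs a running PulseAudio or Pipewire server):
#   pactl list sources | device_names
#   pactl list sinks   | device_names
```

If a one-line-per-device summary is enough, pactl list short sources prints the index and name without the property dump.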

Then, we list the output devices:

pactl list sinks

Again, we want the name:

Sink #0
    State: IDLE
    Name: alsa_output.usb-C-Media_Electronics_Inc._Mpow-224_20200316-00.analog-stereo
    Description: Audio Adapter (Unitek Y-247A) Analogue Stereo
    Driver: module-alsa-card.c
    Sample Specification: s16le 2ch 44100Hz
    Channel Map: front-left,front-right
    Owner Module: 8
    Mute: no
    Volume: front-left: 53733 /  82% / -5.17 dB,   front-right: 53733 /  82% / -5.17 dB
            balance 0.00
    Base Volume: 65536 / 100% / 0.00 dB
    Monitor Source: alsa_output.usb-C-Media_Electronics_Inc._Mpow-224_20200316-00.analog-stereo.monitor
    Latency: 0 usec, configured 40000 usec
    Flags: HARDWARE HW_MUTE_CTRL HW_VOLUME_CTRL DECIBEL_VOLUME LATENCY 
    Properties:
        alsa.resolution_bits = "16"
        device.api = "alsa"
        device.class = "sound"
        alsa.class = "generic"
        alsa.subclass = "generic-mix"
        alsa.name = "USB Audio"
        alsa.id = "USB Audio"
        alsa.subdevice = "0"
        alsa.subdevice_name = "subdevice #0"
        alsa.device = "0"
        alsa.card = "0"
        alsa.card_name = "Mpow-224"
        alsa.long_card_name = "C-Media Electronics Inc. Mpow-224 at usb-0000:00:14.0-7.2, full speed"
        alsa.driver_name = "snd_usb_audio"
        device.bus_path = "pci-0000:00:14.0-usb-0:7.2:1.0"
        sysfs.path = "/devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.2/3-7.2:1.0/sound/card0"
        udev.id = "usb-C-Media_Electronics_Inc._Mpow-224_20200316-00"
        device.bus = "usb"
        device.vendor.id = "0d8c"
        device.vendor.name = "C-Media Electronics, Inc."
        device.product.id = "0014"
        device.product.name = "Audio Adapter (Unitek Y-247A)"
        device.serial = "C-Media_Electronics_Inc._Mpow-224_20200316"
        device.string = "front:0"
        device.buffering.buffer_size = "352800"
        device.buffering.fragment_size = "176400"
        device.access_mode = "mmap+timer"
        device.profile.name = "analog-stereo"
        device.profile.description = "Analogue Stereo"
        device.description = "Audio Adapter (Unitek Y-247A) Analogue Stereo"
        module-udev-detect.discovered = "1"
        device.icon_name = "audio-card-usb"
    Ports:
        analog-output: Analog Output (type: Analogue, priority: 9900, availability unknown)
    Active Port: analog-output
    Formats:
        pcm

Enabling Sidetone

The Pulse/Pipewire loopback module allows us to loop a source into an output, with the effect of creating sidetone.

Assuming you only have 1 input and output device (or, if you have more, that the ones you care about are the defaults), you can trigger sidetone by simply loading the module with (almost) default settings:

pactl load-module module-loopback latency_msec=1
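Before relying on the defaults, it may be worth checking what they actually are. pactl info includes "Default Sink:" and "Default Source:" lines, and the sketch below pulls those two out. default_devices is a hypothetical helper name, and the output format (sink=... / source=...) is just my own choice:

```shell
# default_devices: hypothetical helper, not part of pactl.
# Reads "pactl info" output on stdin and prints the names of the
# default sink and default source.
default_devices() {
    awk '/^Default Sink:/   { print "sink=" $3 }
         /^Default Source:/ { print "source=" $3 }'
}

# Usage (needs a running PulseAudio or Pipewire server):
#   pactl info | default_devices
```

If either name isn't the device you expected, use the explicit source= and sink= form of the command instead.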

If, however, you need to control which devices are used, you can provide those on the command line:

pactl load-module module-loopback \
source=alsa_input.usb-3142_Fifine_Microphone-00.mono-fallback \
sink=alsa_output.usb-C-Media_Electronics_Inc._Mpow-224_20200316-00.analog-stereo \
latency_msec=1

The important thing, in both cases, is the use of latency_msec=1: the default latency is 200ms which is... unnerving.

When you're ready to disable sidetone, you just unload the module (unloading by name removes every loaded instance of module-loopback):

pactl unload-module module-loopback
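If you use loopbacks for other things too, unloading by name may be too blunt, because pactl unload-module also accepts a numeric module index. A minimal sketch for finding the loopback indexes is below. It assumes that pactl list short modules prints tab-separated columns starting with the index and the module name, and loopback_ids is a hypothetical helper name:

```shell
# loopback_ids: hypothetical helper, not part of pactl.
# Reads "pactl list short modules" output on stdin (assumed to be
# tab-separated: index, name, arguments) and prints the index of
# every module-loopback instance.
loopback_ids() {
    awk -F'\t' '$2 == "module-loopback" { print $1 }'
}

# Usage (needs a running PulseAudio or Pipewire server):
#   pactl list short modules | loopback_ids
#   pactl unload-module 42     # 42 is a placeholder index
```

pactl load-module also prints the index of the module it has just loaded, so a script could capture that instead of searching for it afterwards.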

Scripting

Wrapping the two commands in a script is pretty trivial:

#!/bin/bash
#
#

SOURCE=${SOURCE:-"alsa_input.usb-3142_Fifine_Microphone-00.mono-fallback"}
SINK=${SINK:-"alsa_output.usb-C-Media_Electronics_Inc._Mpow-224_20200316-00.analog-stereo"}

function stop(){
    pactl unload-module module-loopback
}

function start(){
    pactl load-module module-loopback \
    source="$SOURCE" \
    sink="$SINK" \
    latency_msec=1
}

if [ "$1" == "-k" ]
then
   stop
else
   start
fi

I saved this as ~/bin/sinky.sh, so, before a call, I could just run it and get my sidetone. If I need to stop the sidetone, I can run sinky.sh -k


Automating

It's not realistic, though, to think that I'll remember to run sinky.sh before each call; it really needs to be triggered automatically.

Some calling software will, undoubtedly, allow you to specify a script to trigger at meeting start, but Zoom is not one of those.

Luckily, though, there's a meeting-software agnostic approach to this: we can use pactl subscribe to subscribe to a feed of Pulse/Pipewire events and trigger the sidetone when we see something start recording from a mic (a new source-output appearing).

#!/bin/bash
#
#

SOURCE=${SOURCE:-"alsa_input.usb-3142_Fifine_Microphone-00.mono-fallback"}
SINK=${SINK:-"alsa_output.usb-C-Media_Electronics_Inc._Mpow-224_20200316-00.analog-stereo"}

function stop(){
        pactl unload-module module-loopback
}

function start(){
        pactl load-module module-loopback \
        source="$SOURCE" \
        sink="$SINK" \
        latency_msec=1
}

# Initialise
MIC_SOURCE=-1

# Subscribe and read
pactl subscribe | while read a event b type sourcenum
do
        # Is it a new source-output coming online
        # and have we not already triggered for a mic?
        if [ "$event" == "'new'" -a "$type" == 'source-output' -a "$MIC_SOURCE" == "-1" ]
        then
                start
                echo "Mic $sourcenum on"
                MIC_SOURCE=$sourcenum
        # Otherwise, is it our mic going offline
        elif [ "$event" == "'remove'" -a "$type" == 'source-output' -a "$MIC_SOURCE" == "$sourcenum" ]
        then
                stop
                echo "Mic $sourcenum off"
                MIC_SOURCE=-1
        fi
done

The script runs pactl subscribe which (outside of unexpected events) doesn't exit. It then reads each event line and checks whether a source-output (a recording stream attached to a source) has come online.
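To make the field splitting a bit more concrete: each line from pactl subscribe looks like Event 'new' on source-output #123, which is why the script reads five words and compares against 'new' with the quotes still attached. The sketch below does the same split but also strips the quotes and the leading #. parse_event is a hypothetical helper name and isn't used by sinky.sh itself:

```shell
# parse_event: hypothetical helper that splits one "pactl subscribe" line.
# Assumed input format:  Event 'new' on source-output #123
# Prints:                new source-output 123
parse_event() {
    printf '%s\n' "$1" | awk '{
        gsub(/\047/, "", $2)   # drop the quotes around the event name
        sub(/^#/, "", $5)      # drop the leading # from the index
        print $2, $4, $5
    }'
}

# Usage:
#   parse_event "Event 'remove' on source-output #123"
```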

We use the variable MIC_SOURCE as a sort of mutex, ensuring that we only trigger for the first source-output bind that we see. The reason for this is that we'd otherwise get an infinite loop:

10 Script sees Mic come online 
20 Script triggers loopback
30 This creates a new source-output that comes online
40 Script sees the new source-output come online
50 GOTO 20

If you're on a system with multiple Mics and you sometimes use a range of them, the script may not be complex enough (because it'll see Mic B come online and then enable sidetone from Mic A), but it should be sufficient for most other purposes.
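One possible way around the multiple-mic limitation is to check which source the new source-output is attached to before starting the loopback. This is only a sketch and I haven't wired it into sinky.sh. It assumes that pactl list short sources and pactl list short source-outputs print tab-separated columns (index then name for sources, index then source index for source-outputs), and both helper names are hypothetical:

```shell
# source_index_of: print the index of the source named $1, reading
# "pactl list short sources" output on stdin
# (assumed columns: index, name, ...).
source_index_of() {
    awk -F'\t' -v name="$1" '$2 == name { print $1 }'
}

# output_source_index: print the index of the source that source-output $1
# is attached to, reading "pactl list short source-outputs" output on stdin
# (assumed columns: index, source index, ...).
output_source_index() {
    awk -F'\t' -v id="$1" '$1 == id { print $2 }'
}

# Usage inside the loop (sourcenum still carries its leading '#'):
#   want=$(pactl list short sources | source_index_of "$SOURCE")
#   got=$(pactl list short source-outputs | output_source_index "${sourcenum#\#}")
#   [ -n "$want" ] && [ "$want" = "$got" ] && start
```

With that check in place, a call using Mic B would no longer switch on sidetone for Mic A.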

The next thing to do is to have sinky.sh run at startup. For that, we're going to create a small unit file and then have systemd treat sinky as a service (remember to update the path used in ExecStart).

There's a downloadable copy here

[Unit]
Description=Autodetect Mic coming in use and enable sidetone

[Service]
ExecStart=/bin/bash /home/ben/bin/sinky.sh
Restart=always

[Install]
WantedBy=default.target

Save it as sinky.service and then move and enable it:

mkdir -p ~/.config/systemd/user
mv sinky.service ~/.config/systemd/user
systemctl --user daemon-reload
systemctl --user start sinky
systemctl --user enable sinky

Systemd should now trigger the script when you log into your system.


One last enhancement - Locking

Although the script does what I want, it occurred to me that there may be times that I don't actually want sidetone (I might, perhaps, find it distracting because of environmental background noise, or find it's lagging too much in video heavy calls).

So, I wanted to add the ability to temporarily disable it.

Manually unloading the loopback module would stop the sidetone, but the script would likely reintroduce it later.

The solution is to give sinky.sh the ability to set a lock:

#!/bin/bash
#
# sinky: enable sidetone for a mic using Pulseaudio or Pipewire
# Author: Ben Tasker
# License: BSD 3 Clause (https://www.bentasker.co.uk/pages/licenses/bsd-3-clause.html)
#

SOURCE=${SOURCE:-"alsa_input.usb-3142_Fifine_Microphone-00.mono-fallback"}
SINK=${SINK:-"alsa_output.usb-C-Media_Electronics_Inc._Mpow-224_20200316-00.analog-stereo"}
LOCKDIR=${LOCKDIR:-"/var/lock/"}
LOCKFILE=${LOCKFILE:-"$LOCKDIR/sinky.$USER.lock"}

function stop(){
        pactl unload-module module-loopback
}

function start(){
        pactl load-module module-loopback \
        source="$SOURCE" \
        sink="$SINK" \
        latency_msec=1
}

# Have we been invoked manually?
case "$1" in
    "disable")
        stop
        touch "$LOCKFILE";;
    "lock")
        touch "$LOCKFILE";;
    "unlock")
        rm "$LOCKFILE" 2> /dev/null;;
    "enable"|"start")
        start;;
    "stop")
        stop;;
esac

# If a command was provided, exit rather than entering the loop
if [ ! "$1" == "" ]
then
    exit
fi

# Initialise
MIC_SOURCE=-1

# Subscribe and read
pactl subscribe | while read a event b type sourcenum
do
        # Is it a new source-output coming online
        # and have we not already triggered for a mic?
        if [ "$event" == "'new'" -a "$type" == 'source-output' -a "$MIC_SOURCE" == "-1" ]
        then
            if [ ! -f "$LOCKFILE" ]
            then
                start
                echo "Mic $sourcenum on"
                MIC_SOURCE=$sourcenum
            else
                echo "Skipping start for $sourcenum - lockfile exists"
            fi

        # Otherwise, is it our mic going offline
        elif [ "$event" == "'remove'" -a "$type" == 'source-output' -a "$MIC_SOURCE" == "$sourcenum" ]
        then
            if [ ! -f "$LOCKFILE" ]
            then        
                stop
                echo "Mic $sourcenum off"
                MIC_SOURCE=-1
            else
                echo "Skipping stop for $sourcenum - lockfile exists"
                # Still forget this mic so that later calls can trigger again
                MIC_SOURCE=-1
            fi
        fi
done

There's a copy of this script in my article-scripts repo.

I can now manually invoke sinky with one of the following arguments:

  • disable: Turn off any active sidetone and then set a lock
  • lock: Set a lock
  • unlock: Remove a lock
  • enable or start: Turn on sidetone
  • stop: Turn off any existing sidetone but don't set a lock
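One caveat with the lock: /var/lock may not be writable by ordinary users on every distribution, in which case the touch will fail. A minimal sketch of a fallback is below, along with a small check of the current lock state. pick_lockdir and lock_state are hypothetical helper names, and the candidate directories in the usage example are only suggestions:

```shell
# pick_lockdir: hypothetical helper that prints the first writable
# directory among its arguments, or nothing if none is writable.
pick_lockdir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# lock_state: report whether the lockfile passed as $1 exists.
lock_state() {
    if [ -f "$1" ]; then
        echo "locked"
    else
        echo "unlocked"
    fi
}

# Usage:
#   LOCKDIR=$(pick_lockdir /var/lock "$XDG_RUNTIME_DIR" /tmp)
#   lock_state "$LOCKDIR/sinky.$USER.lock"
```

Because sinky.sh reads LOCKDIR from the environment, the chosen directory can be exported before running it, but the systemd service would need the same value for the lock to be seen by both.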

Zoom Specific Notes

Having used this with Zoom, there are a few things worth noting.

First, Zoom has an option to let it automatically control your mic's volume. You're going to want to disable this:

Screenshot of Zoom's audio settings showing the auto volume adjust being disabled

If Zoom is allowed to automatically adjust mic volume, the sidetone that you hear will often not be representative of the volume that other call participants are hearing - you may boom into your own ears and then be normal volume to everyone else, or you may be mousy quiet and feel the need to repeat yourself despite Zoom having boosted your voice.

You also, probably, do not want Suppress background noise to be on anything but the default (or perhaps low) for similar reasons.

Other than that, it all just seems to work.