I touched X11 for the first time in 30 years to run wrangler login. YouTube audio came with it.
What this post is about

Here’s what I actually ended up doing:

- I wanted to run wrangler login on a GUI-less Ubuntu Server for a Cloudflare Workers project
- I already knew that an API token is the proper solution
- But I still wanted to see whether X11 forwarding could push the browser window to macOS
- It worked
- Then I got carried away, added PulseAudio forwarding, and ended up playing both YouTube video and audio on the macOS side

Environment:

- macOS host
- VMware Fusion
- Ubuntu Server 24.04 guest

I was working on a Cloudflare Workers project and needed to run:

    wrangler login

That launches a browser for OAuth authentication. The problem was simple: my working environment was Ubuntu Server, not Ubuntu Desktop. No GUI, no local browser, no OAuth screen.

At that point there were two obvious options:

| Method | Description |
| --- | --- |
| API Token | The proper way. Generate a token in the Cloudflare Dashboard and export it |
| X11 forwarding | Push the Ubuntu-side browser window to macOS |

I knew from the beginning that API Token was the sane answer. I used X11 anyway. Partly because I wanted to know whether it still worked. Partly because it had been about 30 years since I last touched X11, and apparently I make bad decisions in historically accurate ways.

On macOS, install and start XQuartz:

    brew install --cask xquartz
    open -a XQuartz

On Ubuntu, install the X11 client pieces and edit the sshd config:

    sudo apt install -y xauth x11-apps
    sudo vi /etc/ssh/sshd_config

Uncomment or set:

    X11Forwarding yes
    X11DisplayOffset 10

Then restart SSH (on Ubuntu the service unit is named ssh):

    sudo systemctl restart ssh

X11DisplayOffset tells sshd which display number to start using for forwarded X11 sessions. The default is 10. In practice, $DISPLAY often ends up around localhost:10.0, unless that display number is already occupied.

From macOS, connect from XQuartz's xterm first:

    ssh -Y user@ubuntu-vm

Then on Ubuntu:

    echo $DISPLAY
    xclock &

If a clock window appears on your Mac, X11 forwarding is working.

In my environment, an already-open Terminal.app session did not reliably get a usable $DISPLAY, while XQuartz's xterm worked consistently.
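For something a little more explicit than eyeballing the echo output, here is a minimal sketch of a check to run inside the SSH session. classify_display is a hypothetical helper name, not part of OpenSSH or XQuartz. It only does a loose shape check and assumes the default X11DisplayOffset of 10, so forwarded displays are expected to look like localhost:10.0 or higher.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSSH or XQuartz): loosely classify a
# DISPLAY value.
#   unset     -> the value is empty
#   forwarded -> starts with "localhost:" plus a display number of two or
#                more digits, the shape sshd hands out when the offset is 10
#   other     -> anything else, for example ":0" on a local desktop
classify_display() {
    case "$1" in
        "") echo "unset" ;;
        localhost:[1-9][0-9]*) echo "forwarded" ;;
        *) echo "other" ;;
    esac
}

# Report on the current session.
classify_display "${DISPLAY:-}"
```

Seeing unset or other in a session opened with ssh -Y suggests that forwarding was never set up, or that something later overwrote the variable.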
Terminal.app failing here does not mean it is fundamentally incompatible. Possible causes include:

- XQuartz was installed but I had not fully logged out and back in yet
- shell startup files like ~/.bashrc or ~/.zshrc were overwriting $DISPLAY

So if X11 forwarding behaves strangely, trying XQuartz's own xterm is a fast sanity check.

My first attempt was the obvious one:

    firefox &

That failed with:

    X11 connection rejected because of wrong authentication.

On Ubuntu 22.04 and later, Firefox is usually delivered via Snap. Snap isolation and X11 forwarding do not get along particularly well in this setup. You can confirm it with:

    which firefox
    # /snap/bin/firefox

So I switched to a non-Snap build.

firefox-esr with APT pinning

I used the mozillateam PPA and installed firefox-esr.

Important detail: if you do this carelessly, later apt install / apt upgrade operations may drag you back toward Ubuntu's Firefox transition package and the Snap path. So I pinned it explicitly.

    sudo add-apt-repository -y ppa:mozillateam/ppa

    sudo tee /etc/apt/preferences.d/mozilla-firefox << 'EOF'
    Package: firefox*
    Pin: release o=LP-PPA-mozillateam
    Pin-Priority: 1001

    Package: firefox*
    Pin: release o=Ubuntu
    Pin-Priority: -1
    EOF

    sudo apt update
    sudo apt install -y firefox-esr

A couple of notes:

- this mozillateam PPA is not the same thing as Mozilla's own packages.mozilla.org APT repository
- I used it here because it solved the practical problem at hand
- Pin-Priority: 1001 strongly prefers that source
- Pin-Priority: -1 blocks Ubuntu's matching package candidates

Then:

    firefox-esr &

And this time the browser window actually appeared on macOS.

I also got these warnings:

    No matching fbConfigs or visuals found
    glx: failed to create drisw screen

They looked dramatic, but in practice they just meant GPU acceleration was not happening. The browser still worked.

wrangler login finally works

Now the original goal:

    wrangler login

Firefox ESR opened, Cloudflare's authentication page appeared, and login completed successfully.
At that point the real task was done. Which should have been the end.

Instead, my brain immediately asked a much worse question:

If Firefox is already showing up on macOS, can I make YouTube audio come through too?

This is how projects rot.

At this point YouTube video rendered fine over X11. Audio did not. That is expected, because X11 was never designed to forward audio.

X11 forwards:

- window drawing commands
- keyboard input
- mouse input

X11 does not forward:

- audio

So if I wanted audio too, I needed a second path. The idea was:

- run a PulseAudio server on macOS
- expose it to Ubuntu through SSH reverse port forwarding
- tell Firefox on Ubuntu to use that forwarded PulseAudio endpoint

On macOS, install PulseAudio:

    brew install pulseaudio
    mkdir -p ~/.config/pulse

Then run it in the foreground:

    pulseaudio --load="module-native-protocol-tcp auth-anonymous=1 auth-ip-acl=127.0.0.1" \
      --exit-idle-time=-1 --daemonize=no

You may see warnings like:

    W: [] caps.c: Normally all extra capabilities would be dropped now...
    W: [] socket-util.c: IP_TOS failed: Invalid argument

In my case they did not break anything.

auth-anonymous=1 is only acceptable here because this was a temporary setup behind SSH tunneling and limited to localhost. Exposing it more broadly would be reckless. If you want the socket itself bound to loopback only, the module also accepts listen=127.0.0.1.

Yes, this is sketchy. No, I would not do this on a real server.

Also worth noting: PulseAudio is still usable here, but its own maintainers have said development has slowed down considerably, with bigger new work moving toward PipeWire / WirePlumber. For this kind of hack, it is still convenient.

On Ubuntu, install codecs and PulseAudio:

    sudo apt install -y ubuntu-restricted-extras ffmpeg libavcodec-extra pulseaudio

This post assumes Ubuntu Server 24.04. On Ubuntu Desktop 24.04, PipeWire / pipewire-pulse is the more standard baseline, so blindly adding PulseAudio can make the setup messier.
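If you are not sure which of those two situations a machine is in, one way to find out before installing anything is to look at the Server Name line that pactl info prints. The sketch below assumes pactl is available (it ships in pulseaudio-utils on Ubuntu) and that pipewire-pulse reports itself with "PipeWire" in that line. pulse_backend is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: read `pactl info` output on stdin and report which
# server is answering on the PulseAudio socket.
# Assumption: pipewire-pulse includes the word "PipeWire" in its
# "Server Name:" line, while a real PulseAudio daemon does not.
pulse_backend() {
    server_line=$(grep '^Server Name:' || true)
    case "$server_line" in
        *PipeWire*) echo "pipewire-pulse" ;;
        "Server Name:"*) echo "pulseaudio" ;;
        *) echo "unknown" ;;
    esac
}

# Only query the sound server if pactl is actually installed.
if command -v pactl >/dev/null 2>&1; then
    pactl info 2>/dev/null | pulse_backend
fi
```

A result of pipewire-pulse suggests the machine already has a PulseAudio-compatible server, in which case adding the pulseaudio package on top is probably the messier route.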
From macOS:

    ssh -Y -R 24713:localhost:4713 user@ubuntu-vm

Meaning:

| Option | Purpose |
| --- | --- |
| -Y | Trusted X11 forwarding |
| -R 24713:localhost:4713 | Reverse forward Ubuntu port 24713 to macOS PulseAudio port 4713 |

The 24713 port number was not magical. I just took PulseAudio's default 4713 and added 20000 so it would be easy to recognize and unlikely to collide.

Why -Y and not -X?

| Option | Behavior |
| --- | --- |
| -X | Untrusted X11 forwarding. Safer, but some apps break |
| -Y | Trusted X11 forwarding. Less restricted, more dangerous |

This part matters:

Only use -Y with machines you fully control.

A trusted X11 client can do nasty things, including keystroke monitoring. On your own VM behind your own SSH connection, fine. On someone else's server, absolutely not.

On Ubuntu:

    export PULSE_SERVER=tcp:localhost:24713
    firefox-esr &

At that point, both YouTube video and audio came through on the macOS side. Against all dignity, it worked.

A fair question here is:

X11 and PulseAudio are two different protocols on two different forwarding paths. Why didn't the audio drift badly?

The answer is that Firefox handled synchronization, not X11 or PulseAudio.

- the media stream contains timestamps
- Firefox schedules video frames and audio chunks against those timestamps
- both streams are delayed by the tunnel, but roughly in similar ways
- so the playback remains acceptably synchronized

So no, there is no elegant protocol-level sync mechanism here. This is application-layer behavior carrying the whole mess on its back.

| Item | Status | Notes |
| --- | --- | --- |
| X11 forwarding | OK-ish | Encrypted through SSH |
| PulseAudio forwarding | OK-ish | Also inside SSH |
| auth-anonymous=1 | Risky unless contained | Keep it local and temporary |
| auth-ip-acl=127.0.0.1 | Good idea | Restricts access to localhost |
| -Y | Dangerous | Use only with hosts you trust completely |
| SSH auth | Critical | Use keys, not password login |

The weak point here is not X11 itself. It is you doing something sloppy with trust boundaries. As usual.

The design is old, backwards-seeming, and slightly cursed.
And yet:

- the display server lives on the machine with the screen
- the application runs somewhere else
- the UI still shows up where you need it

That idea is ancient, but not dead.

- SSH handled encryption and tunneling
- X11 handled remote windows
- PulseAudio handled audio transport
- Firefox glued the user experience together

No single part was especially elegant. Together, they were good enough to get something undeniably stupid working.

Let me be very clear:

If your real goal is just to use Cloudflare Workers cleanly on a headless machine, use API Token.

That remains the proper answer. This post is about the side quest.

I started out trying to solve a boring problem:

- run wrangler login on Ubuntu Server without a local GUI

I ended up with something much dumber and much more entertaining:

- X11-forwarded Firefox on macOS
- successful Cloudflare OAuth login
- PulseAudio tunneled over SSH
- YouTube video and audio playing through the host machine

Was it necessary? No.
Was it the right solution? Also no.
Did it work? Annoyingly, yes.

Moral of the story:

Sometimes the correct solution is boring. The fun solution is X11 + PulseAudio + questionable life choices.
