Trying xdotool and pyautogui

I tried xdotool and it’s pretty cool. I used it to automate setting my Slack status: it finds the Slack window, focuses it, clicks here and there and, ta-da, my status is updated. It totally depends on things being in fixed places, but it works.
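The gist of it is just a handful of xdotool calls chained together, something along these lines (the window name, coordinates and status text below are placeholders, not my actual script):

WIN=$(xdotool search --name "Slack" | head -n 1)   # find the Slack window
xdotool windowactivate --sync "$WIN"               # focus it
xdotool mousemove 600 400 click 1                  # click where the status field sits
xdotool type "Out for lunch"
xdotool key Return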

Then I tried pyautogui, planning to port this script and see how it compares with xdotool, but I was surprised right from the start to discover that pyautogui doesn’t have any feature to find or operate on windows. So I’ve decided to stick with xdotool and dive deeper; I might try pyautogui some other time.

One thing I liked about pyautogui is its «failsafe»: if you slam the mouse and move the cursor to any screen corner, pyautogui will stop running. This is a great feature. One of my xdotool automations didn’t quite work (the popup appeared too late) and suddenly xdotool was blindly doing stuff in the wrong places and I couldn’t stop it. No damage was done, but it was scary.

I totally needed something like pyautogui’s failsafe, but that required a higher-level language, so I went with Ruby. My failsafe is the top or bottom edge of the screen: before executing any xdotool command, the script checks the mouse location and throws an exception if the cursor is at either edge. It works pretty well, and I’ve already fallen back on it when I made a mistake and the automation misbehaved.
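The idea is roughly this, sketched here as a shell wrapper rather than the Ruby I actually used (the command at the end is just an example):

# refuse to run xdotool when the cursor sits at the top or bottom edge
safe_xdotool() {
  eval "$(xdotool getmouselocation --shell)"        # sets X, Y, SCREEN, WINDOW
  read -r _ HEIGHT <<< "$(xdotool getdisplaygeometry)"
  if [ "$Y" -le 0 ] || [ "$Y" -ge $((HEIGHT - 1)) ]; then
    echo "failsafe triggered: mouse at screen edge, aborting" >&2
    exit 1
  fi
  xdotool "$@"
}

safe_xdotool mousemove 600 400 click 1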

Lesson learnt: automating the GUI is powerful, but also dangerous. Unlike a shell script, the GUI is an unpredictable environment and things can quickly derail. I like the way pyautogui’s documentation describes this:

Like the enchanted brooms from the Sorcerer’s Apprentice programmed to keep filling (and then overfilling) the bath with water, a bug in your program could make it go out of control. It’s hard to use the mouse to close a program if the mouse cursor is moving around on its own.

It’s a powerful but double-edged sword. And lots of fun too! Automating the GUI is a whole other dimension and a very useful skill to have.

GUI Automation on Linux

I’m curious to learn how to automate GUI stuff on Linux, that is, moving the mouse, clicking buttons, pressing keys, etc.

I’m planning to learn two: xdotool and pyautogui. I like that you can give an image to pyautogui and it will find it on the screen and click on it. Pretty powerful stuff.

Cropping my webcam with ffmpeg

My laptop is on a riser, at a healthy distance and height from me. But when I do video calls, you see a lot of my surroundings (which aren’t great) and very little of me. I sometimes tilt the screen so that I’m vertically centered in the frame.

Since I’d learnt that ffmpeg handles real-time processing just fine, I decided this would be a good problem to solve and learn with!

I created a loopback video4linux device, then ran ffmpeg to take my built-in laptop webcam’s stream, crop it, sharpen it, add a tiny bit of color saturation, and feed the result into the loopback device. It looks pretty good! Not MacBook-Pro-webcam good, but good!
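Creating the loopback device is basically a matter of loading the v4l2loopback kernel module, something along these lines (the label is arbitrary, video_nr=3 makes it show up as /dev/video3 to match the ffmpeg command below, and exclusive_caps=1 helps some apps recognize it as a proper webcam):

sudo modprobe v4l2loopback video_nr=3 card_label="Cropped webcam" exclusive_caps=1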

Here’s the original webcam output and the final ffmpeg result, side by side:

Here’s my ffmpeg command, broken up for readability:

ffmpeg -f v4l2 \
  -framerate 30 \
  -video_size 640x480 \
  -input_format mjpeg \
  -i /dev/video1 \
  -vcodec rawvideo \
  -pix_fmt yuv420p \
  -filter:v "crop=494:370:103:111,scale=640:480,setsar=1,unsharp,eq=saturation=1.2" \
  -f v4l2 \
  /dev/video3

The magic happens in -filter:v. If you split it at the commas you’ll see the crop, the scale back to 640×480, the (un)sharpening and the color saturation boost.

I had never taken the time to properly learn ffmpeg and, while the learning curve is a bit steep, it makes perfect sense: it’s very well designed.

I won’t be posting on Saturdays and Sundays for now. I wanted to post every day, but I really need to take a break. See you on Monday!

ffmpeg can do that?

I was reading Drew DeVault’s In praise of ffmpeg and came across this part:

I was recently hanging out at my local hackerspace and wanted to play some PS2 games on my laptop. My laptop is not powerful enough to drive PCSX2, but my workstation on the other side of town certainly was. So I forwarded my game controller to my workstation via USB/IP and pulled up the ffmpeg manual to figure out how to live-stream the game to my laptop. ffmpeg can capture video from KMS buffers directly, use the GPU to efficiently downscale them, grab audio from pulse, encode them with settings tuned for low-latency, and mux it into a UDP socket.

And I was like… ffmpeg can do that? I didn’t know it was possible to do such a complex thing using just ffmpeg. Fair enough, there’s the USB-over-IP thing for the gamepad, but still.

I mentioned it to Oliver and he explained to me some stuff I know very little about and should set aside time to learn: KMS, the role of the compositor, Wayland (I use X11), etc.

Creating a timelapse video with ffmpeg

If you have a collection of image files, you can build a timelapse video with ffmpeg like this:

ffmpeg -r 30 -pattern_type glob -i "*.png" -vcodec libx264 output.mp4

-r sets the number of images (frames) per second. For example, -r 1 will show each image for one second, while -r 30 gives you a 30-frames-per-second animation.

And here’s Tumbler

I’ve written so much about Tumbler without a single picture. Let me rectify that. Here it is in tablet mode, while I was testing my rotation script. The photo doesn’t do the screen justice; it really looks a lot better in person.