Command Line Follies

Why use the command line?

One of the less fun parts of the creative process of bringing the Coiled Spring into the world is the fiddly business of using Audacity to assemble the final product. That’s no slight against Audacity, which is a great tool; it’s just that after doing the actual recording, wouldn’t it be nice to be able to just press a button and have a complete episode assembled and ready to upload?

Well, I’ve been fiddling around with some command-line FOSS tools, most of which are already installed as part of the Ubuntu Linux distribution, and I’ve been able to achieve what I wanted, at least partly. Here are the tools I ended up using:

  • espeak – for voice synthesis
  • SoX – for audio creation and manipulation
  • ImageMagick – for graphic creation and manipulation
  • eyeD3 – for MP3 file tagging and cover art embedding
  • bash – for scripting

I’ve added these tools to the list on the How It’s Made page, with descriptions and links.

How did I do it?

I’ll go through these tools one by one at a later date, but first let’s figure out exactly what it was I wanted to achieve. I started with a basic idea, which became more complex as I came to understand what was possible.

Beginnings

The first idea was to take previously created bits and pieces, and assemble them into a single file. This is easy with SoX.

  1. Provide pre-created audio parts (old bits, or record new bits)
  2. Feed audio parts to SoX, which concatenates them into a WAV file.
  3. Convert WAV to MP3 file using SoX.
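
As a rough sketch, with made-up filenames, that whole first idea comes down to a couple of SoX commands (converting to MP3 needs SoX built with MP3 support, e.g. the libsox-fmt-mp3 package on Ubuntu):

    # Concatenate the pre-made parts into a single WAV file
    sox intro.wav part1.wav part2.wav outro.wav episode.wav

    # Convert the WAV to an MP3
    sox episode.wav episode.mp3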

Scripting

Actually, this would be a lot easier if the steps could be put in a script which could be run again and again with different commands and settings. So I created a bash script. As this was almost my first attempt at bash scripting, I found ShellCheck, the online bash script checker and debugger, to be invaluable. I also used Geany as my script editor. It’s a simple IDE, but powerful enough for me.
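
For the record, here is a minimal sketch of the sort of script I mean, with placeholder filenames rather than my real ones; the quoting and the set line at the top are the kind of thing ShellCheck nags you into doing:

    #!/bin/bash
    # Minimal sketch of the kind of assembly script described here
    set -euo pipefail          # stop on errors and unset variables

    out="episode"              # base name for the output files

    # Concatenate the parts, then convert to MP3
    sox intro.wav part1.wav outro.wav "${out}.wav"
    sox "${out}.wav" "${out}.mp3"

    echo "Created ${out}.mp3"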

Speech Synthesis

Some of the parts are created using a speech synthesizer, such as the introductory “episode number” bit (an idea I nicked from Escape Pod). So why not have that created in the script?

  1. Provide pre-created audio parts (old bits, or record new bits).
  2. Generate speech parts using espeak.
  3. Feed audio parts to SoX, which concatenates them into a WAV file.
  4. Convert WAV to MP3 file using SoX.
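
espeak can write its output straight to a WAV file with the -w option, so the “episode number” bit is a one-liner. The voice, speed and pitch values here are just guesses to play with, not my final settings:

    # Generate the spoken episode number directly as a WAV file
    espeak -v en-gb -s 140 -p 40 -w epnumber.wav "Episode fifteen"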

Artwork

I very much enjoy creating the artwork for the podcast, but it has to be said that sometimes I get carried away with distractions rather than creating the podcast itself. (Ironically, the exercise described here is itself an attempt to automate the tedious parts so that podcasts can be created and released more quickly, but it quickly became a time-sucking procrastination of its own. Oh well.)

Given that I like a particular theme for the artwork, which uses the same text arrangement and filter styles each time, I realized I could use ImageMagick, the powerful command-line graphics creation and manipulation tool, to create the artwork.

  1. Provide pre-created audio parts (old bits, or record new bits).
  2. Generate speech parts using espeak.
  3. Generate artwork using ImageMagick.
  4. Feed audio parts to SoX, which concatenates them into a WAV file.
  5. Convert WAV to MP3 file using SoX.
  6. Tag MP3 with artwork manually.
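
As an illustration of the idea (the size, font and colours below are placeholders, not the real artwork settings), a basic square cover with centred text can be knocked up with a single ImageMagick command:

    # Plain square cover with the title text in the middle
    convert -size 1400x1400 xc:black \
        -gravity center -fill white -pointsize 96 \
        -annotate +0+0 "The Coiled Spring\nEpisode 15" \
        cover.png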

Episode number and title

Hold on, the script could ask the user for the episode number and title, which can then be used by the various tools.

  1. Ask user for episode number and title.
  2. Provide pre-created audio parts (old bits, or record new bits).
  3. Generate speech parts using espeak and provided episode number and title.
  4. Generate artwork using ImageMagick and provided episode number and title.
  5. Feed audio parts to SoX, which concatenates them into a WAV file.
  6. Convert WAV to MP3 file using SoX.
  7. Tag MP3 with artwork and other information manually.
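
In bash that prompting is just a couple of read commands, and the answers can then be dropped straight into the espeak and ImageMagick calls (filenames and image settings here are placeholders):

    # Ask for the episode details up front, then reuse them everywhere
    read -rp "Episode number: " epnum
    read -rp "Episode title: " eptitle

    espeak -w epnumber.wav "Episode ${epnum}"
    convert -size 1400x1400 xc:black -gravity center -fill white \
        -pointsize 96 -annotate +0+0 "${eptitle}" cover.png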

Tagging

There must be a way to tag MP3s at the command line. There is, and it’s called eyeD3.

  1. Ask user for episode number and title.
  2. Provide pre-created audio parts (old bits, or record new bits).
  3. Generate speech parts using espeak and provided episode number and title.
  4. Generate artwork using ImageMagick and provided episode number and title.
  5. Feed audio parts to SoX, which concatenates them into a WAV file.
  6. Convert WAV to MP3 file using SoX.
  7. Tag MP3 with artwork and other information (artist, album) using eyeD3.
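
Something along these lines does the tagging and embeds the cover art in one go; the tag values are placeholders (in the script the title comes from the prompts above), and the exact option names can vary a little between eyeD3 versions, so check eyeD3 --help before trusting mine:

    eyeD3 --artist "The Coiled Spring" \
          --album "The Coiled Spring" \
          --title "Episode 15: Example Title" \
          --add-image cover.png:FRONT_COVER \
          episode.mp3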

More artwork

At first I was using an existing image of the Coiled Spring logo. This was fine, but I was enjoying the idea of creating everything from scratch as much as possible. ImageMagick has powerful drawing tools based on the SVG vector system, so why not draw the logo directly into the artwork? This took some work using a calculator and graph paper to get the vectors and coordinates correct. It was just like drawing UDGs for my Spectrum, as in days of yore.

I also decided to incorporate the episode number into the artwork more prominently, in the absence of any other ideas. It is surrounded by “rays”, the number of which is dictated by the episode number. The rays are then given my beloved newsprint half-toning, because it’s become a standard part of the palette.

  1. Ask user for episode number and title.
  2. Provide pre-created audio parts (old bits, or record new bits).
  3. Generate speech parts using espeak and provided episode number and title.
  4. Generate artwork text using ImageMagick and provided episode number and title.
  5. Generate logo using ImageMagick.
  6. Generate rays using ImageMagick and episode number.
  7. Feed audio parts to SoX, which concatenates them into a WAV file.
  8. Convert WAV to MP3 file using SoX.
  9. Tag MP3 with artwork and other information (artist, album) using eyeD3.
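
To give a flavour of the rays idea (very much a sketch, with placeholder sizes and colours), the script can loop once per episode number, work out each ray’s corner coordinates with awk, since bash has no floating point, and hand the whole lot to ImageMagick as one -draw string:

    epnum=15
    size=1400
    cx=$((size / 2)); cy=$((size / 2))
    draw=""
    for ((i = 0; i < epnum; i++)); do
        # Each ray is a thin triangle from the centre out towards the edge
        coords=$(awk -v i="$i" -v n="$epnum" -v cx="$cx" -v cy="$cy" 'BEGIN {
            a1 = 2 * 3.14159265 * i / n; a2 = a1 + 3.14159265 / n; r = cx
            printf "%d,%d %d,%d %d,%d", cx, cy, cx + r*cos(a1), cy + r*sin(a1), cx + r*cos(a2), cy + r*sin(a2)
        }')
        draw+=" polygon $coords"
    done
    convert -size "${size}x${size}" xc:black -fill yellow -draw "$draw" rays.png

The newsprint half-toning could then be approximated with something like ImageMagick’s -ordered-dither h8x8a option, although getting it to match the existing artwork would take some fiddling.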

Logotones

A long episode needs to be broken up somehow. In the past, I’ve used gimmicky little identifiers, in various styles, to provide structure. These took some time to create, so I decided to automatically generate a logotone, similar to the manually-created versions used in the first 14 episodes. Another way to simplify and streamline the production process, while maintaining the character.

To do this, I generated a radio-style “You’re listening…” using the speech synth, and backed it with some random atmospheric beeping.

  1. Ask user for episode number and title.
  2. Provide pre-created audio parts (old bits, or record new bits).
  3. Generate speech parts using espeak and provided episode number and title.
  4. Generate artwork text using ImageMagick and provided episode number and title.
  5. Generate logo using ImageMagick.
  6. Generate rays using ImageMagick and episode number.
  7. Generate random beeping using SoX. Generate logotone speech using espeak. Mix together using SoX.
  8. Feed audio parts to SoX, which concatenates them into a WAV file.
  9. Convert WAV to MP3 file using SoX.
  10. Tag MP3 with artwork and other information (artist, album) using eyeD3.
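
Here’s a sketch of that logotone step, with the wording, durations, frequencies and filenames all made up: a handful of random sine beeps from SoX’s synth effect, strung together and mixed under the espeak speech.

    # Build five short beeps at random frequencies and string them together
    beeps=()
    for i in 1 2 3 4 5; do
        freq=$((RANDOM % 800 + 400))                  # somewhere between 400 and 1199 Hz
        sox -n "beep${i}.wav" synth 0.3 sine "$freq"  # a 0.3-second sine tone
        beeps+=("beep${i}.wav")
    done
    sox "${beeps[@]}" beeps.wav

    # The spoken part, then mix the two together
    espeak -w logotone_speech.wav "You're listening to The Coiled Spring"
    sox -m beeps.wav logotone_speech.wav logotone.wav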

Fading and mixing

In the manually created episodes, I had the theme music (by Kevin MacLeod) duck when the introduction started, then fade out completely over a few seconds. This was quite difficult to achieve with the command line, so in the end instead of a nice fading duck, I just abruptly drop the volume a second before the intro speech. It’s a compromise, but not a bad one. The SoX fade command fades the music completely out a few seconds after that.
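
Here’s roughly what that looks like in SoX terms, with all the times and levels as guesses rather than my actual values: split the theme at 20 seconds, drop the tail to a quarter of the volume and fade it out, glue the two halves back together, then mix in the intro speech padded with 21 seconds of silence.

    sox theme.wav theme_head.wav trim 0 20            # first 20 seconds, full volume
    sox theme.wav theme_tail.wav trim 20              # the rest
    sox theme_tail.wav theme_tail_quiet.wav vol 0.25 fade t 0 10 6
    sox theme_head.wav theme_tail_quiet.wav theme_ducked.wav

    sox intro_speech.wav intro_padded.wav pad 21 0    # speech starts just after the duck
    sox -m theme_ducked.wav intro_padded.wav intro_with_music.wav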

Final result

Here’s a summary of the overall process, with extra detail.

  1. Set up
    1. Ask user for episode number and title.
    2. Provide pre-created audio parts (old bits, or record new bits).
    3. Set standard filenames to use.
    4. Set variables to use.
  2. Create audio files
    1. Generate speech parts using espeak and provided episode number and title.
    2. Generate logotone.
      1. Generate random beeping using SoX.
      2. Generate logotone speech using espeak.
      3. Mix together using SoX.
    3. Generate intro with music
      1. Drop volume of theme music 20 seconds in (just after the drums start) using SoX.
      2. Add silence of about 21 seconds to the start of the intro speech using SoX.
      3. Mix together using SoX.
  3. Create artwork
    1. Generate logo using ImageMagick.
    2. Generate background image using ImageMagick to create rays and a big episode number.
    3. Generate artwork text using ImageMagick and provided episode number and title.
    4. Combine artwork and text elements using ImageMagick.
  4. Feed audio parts to SoX, which concatenates them into a WAV file, in the following order:
    1. episode number speech
    2. intro with music (“Hello, and welcome…”)
    3. logotone
    4. part 1
    5. logotone
    6. part 2
    7. logotone
    8. part 3
    9. logotone
    10. outro (“That’s all…”)
    11. music and credits
  5. Convert WAV to MP3 file using SoX.
  6. Tag MP3 with artwork and other information (artist, album) using eyeD3.
  7. Apply compressor to improve audio balance and quality (a starting point is sketched after this list).
  8. Delete temporary files.
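
For the compressor step, I haven’t settled on final numbers; the settings below are just the example from the SoX manual’s compand documentation, applied to the concatenated WAV as a starting point:

    sox episode.wav episode_compressed.wav compand 0.3,1 6:-70,-60,-20 -5 -90 0.2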

Result

The resulting file has all the right parts and sounds good, the graphics do the job, and the whole thing only takes about 15 seconds to generate. I’m pleased with the result, and I look forward to changing the script to take some recorded podcast content and combine it with the automatically generated stuff. That will be the real test, and the whole point of this exercise.

Thoughts

One of the fun parts of the Coiled Spring was creating all the little bits and pieces, which would differ from episode to episode. Will they be sacrificed in this new drive for efficiency? No, they will remain, when I get the chance to make them, but I don’t want them getting in the way of actually releasing material. I’d rather have a bunch of cookie-cutter episodes with similar structures than nothing at all. And as I become more proficient with the tools, maybe new features will be added to keep things interesting.

In the meantime, I will use this new system to bash out (geddit???!?!) some episodes while I have the urge.

What’s Next?

I’m sure the script as it stands has some inefficiencies in it. I have some items I would like to fix if I get the chance, or if anyone fancied lending me a hand.

  • Get rid of so many temporary files by using pipes and other means to feed the results of one command directly to another (see the sketch after this list).
  • Tweak the settings to make the voice sound a little better (the espeak voice, obviously, ahem).
  • Tweak the compression settings to improve the audio quality.
  • Tweak the sample rate to improve the audio quality.
  • Generate multiple logotones instead of just having one repeated. Perhaps say, “End of Part 1 of the Coiled Spring” etc. Use different random beeps in each one, or even a different sound effect.
  • Put the reused files in a different directory, and create the new files in their own directory. This is tricky with bash.
  • Get the script to open a browser, go to OpenStreetMap, find a relevant place (maybe ask for a location (or just a URL to be simple)), use a command-line screenshot tool to grab the map, use ImageMagick to manipulate the image (rotate, crop, filter), and then use that as a background. Maybe just provide a map image, and randomly crop it for each episode.
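
On the first of those, SoX can read from and write to a pipe with its -p option, so two steps can be chained without writing an intermediate file. A sketch with placeholder filenames and settings:

    # Trim the theme and fade it out in one pipeline, no temporary WAV needed
    sox theme.wav -p trim 0 20 | sox -p intro_bed.wav fade t 0 20 6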