Thursday 1 June 2023

Amiga module player for DOS - the first draft!

After one week, I've finally pulled off what I previously thought impossible - an entire module player, running in real mode DOS! I'm super proud of this achievement, as it means I can use sample-based music in my projects, and it's as easy as composing in OpenMPT, saving it, and building the project again. What you hear is what you get (70% of the time). With this, I've added volume and speed scaling to Blastlib. One day I might write a series of posts, showing how I made this player!

Here's the source code, and here's the demo above, which plays one of the many test modules that I made.

Wednesday 24 May 2023

Fractional sample stepping

For the past few weeks, I've been working on an Amiga module player for DOS, using the Sound Blaster. There are two main reasons I wanted to do this: one is that I wanted to use sample-based music in my projects, and the other is that I have a long background with music trackers (I've been using OpenMPT to produce music for nearly a decade!). I thought it would be a fun challenge, but I couldn't have done anything without the invaluable FMODDOC.ZIP, which tells you everything you need to know about the format.

I managed to get all the module info read in successfully, but then I hit the stumbling block of pitching samples around. It isn't something I was ever able to pull off successfully, but thanks to the documentation, I found a shockingly simple way of pulling it off! Seeing as it was so simple, I decided to make it part of Blastlib, and as a result, module playing will depend on this library. The actual player isn't written yet, I was too excited to share this first :P

First, we need to figure out the scale factor. This involves dividing the target sample rate by your overall mix rate. Let's say Blastlib's currently mixing sounds at 22050Hz, and we want to play a sample at 13400Hz (because it's a funny number). We could just divide one by the other, but integer division only gives us a whole result (0 in this case). So, we need to keep a fractional part too, using 2 16-bit variables for 32 bits of precision (or 2 32-bit variables, if you're using long samples): one to hold the whole part, and the other to hold the fractional part. These can be calculated as follows:

  • Whole: Target Rate / Mix Rate
  • Fractional (16-bit): ((Target Rate % Mix Rate) << 16) / Mix Rate
  • Fractional (32-bit): ((Target Rate % Mix Rate) << 32) / Mix Rate

So our resulting formulae will be:

  • Whole: 13400 / 22050 = 0
  • Fractional (16-bit): ((13400 % 22050) << 16) / 22050 = 9B92h (word)
  • Fractional (32-bit): ((13400 % 22050) << 32) / 22050 = 9B92DDC0h (dword)

For the fractional part, just use the remainder of the previous division. This is especially handy if you're using assembly like me, because div leaves the remainder in dx (or edx for a 32-bit divide), ready to use!

Here's an example in assembly language, adapted from Blastlib: (assuming eax contains the target rate)

xor edx,edx
mov ebx,mix_rate
div ebx ; sample rate / mix rate
mov dword [sample_scale_whole],eax
xor eax,eax ; edx still holds the remainder, so edx:eax = remainder << 32
div ebx ; ((sample rate % mix rate)<<32)/mix rate
mov dword [sample_scale_frac],eax
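
To sanity-check the numbers without firing up DOSBox, here's a minimal Python sketch of the same arithmetic. It isn't part of Blastlib; the function name scale_factor and the frac_bits parameter are made up for illustration.

```python
def scale_factor(target_rate, mix_rate, frac_bits=32):
    """Split target_rate / mix_rate into a whole part and a fixed-point fraction."""
    # divmod gives the quotient and the remainder in one go,
    # much like div leaves them in eax and edx.
    whole, remainder = divmod(target_rate, mix_rate)
    # Scale the remainder up before dividing so the fraction survives.
    frac = (remainder << frac_bits) // mix_rate
    return whole, frac


whole, frac16 = scale_factor(13400, 22050, frac_bits=16)
_, frac32 = scale_factor(13400, 22050, frac_bits=32)
print(whole, f"{frac16:04X}h", f"{frac32:08X}h")  # 0 9B92h 9B92DDC0h
```

A target rate above the mix rate simply gives a non-zero whole part, so scale_factor(44100, 22050) comes out as a whole part of 2 with no fraction.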

Now that we have the scale factor, we need to bring in another variable, which is simply a counter. It accumulates the fractional part, and its only purpose is to set the carry flag when it overflows, but more on that shortly. We'll refer to this counter as the scale counter. In my mixing routine, I was simply stepping through the sample, byte by byte. If we're scaling a sample, we need to perform some extra steps.

First, add the fractional part to the scale counter. If the number overflows, the carry will be set. Then, we need to add the whole part to the sample position, with carry. What does that mean? Well, let's say the scale counter overflowed. The carry would be set, so we add an extra 1 alongside our whole value.

Here's the source code in assembly language (using 32-bit values):
mov eax,[sample_scale_frac]
add dword [sample_scale_count],eax ; add fractional value to counter
mov eax,[sample_position] ; mov doesn't touch the flags, so the carry survives
adc eax,[sample_scale_whole] ; add the whole part to the sample position + carry
mov dword [sample_position],eax
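
If the carry trick is hard to picture, this rough Python model of the add/adc pair might help. It's only a sketch: step_positions is a hypothetical name, and the 32-bit registers are imitated by masking. The fraction 0x9B92DDC0 is the 32-bit value for 13400Hz at a 22050Hz mix rate.

```python
MASK32 = 0xFFFFFFFF


def step_positions(whole, frac, count):
    """Yield `count` sample positions, imitating add/adc on 32-bit values."""
    position = 0
    scale_count = 0
    for _ in range(count):
        yield position
        total = scale_count + frac
        carry = total >> 32              # 1 if the 32-bit add overflowed
        scale_count = total & MASK32     # what's left in the counter
        position = (position + whole + carry) & MASK32  # the adc step


# 13400Hz sample played through a 22050Hz mixer
print(list(step_positions(0, 0x9B92DDC0, 8)))  # [0, 0, 1, 1, 2, 3, 3, 4]
```

Some positions repeat because the sample is being played slower than the mix rate. After 22050 steps (one second of output), the position has moved through just under 13400 source bytes.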

Pretty neat trick, right? That's how you do it! I've made a varispeed demo using this method; the source is available here, and the executable is available here. It works with any .raw file because it uses streaming. Simply pass the filename as the argument when running the file!

Amendment

When it comes to writing a module player, notes are stored as period values. These values have to be converted to a frequency before you can scale the samples! Use the following formula:

Frequency = 7159091 / (Period * 2)

7159091 is the clock speed of an NTSC Amiga machine in Hz, rounded up (roughly 7.16MHz). This is important because on an Amiga, sample speed is linked directly to the system clock. The period value is how many ticks to wait before grabbing the next sample byte, counted at half the CPU clock rate, which is where the * 2 comes from.

So, if we wanted to get the sample rate of a sample playing at period 428 (middle C), we'd do the following:

7159091 / (428 * 2) = 8363Hz
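
Here's the same formula as a small Python sketch, in case you want to check a few periods quickly. The names are made up for illustration, and it rounds down the way an integer div would.

```python
AMIGA_NTSC_CLOCK = 7159091  # Hz, the constant from the formula above


def period_to_frequency(period):
    """Convert a module period value to a playback rate in Hz (rounded down)."""
    if period <= 0:
        raise ValueError("period must be positive")
    return AMIGA_NTSC_CLOCK // (period * 2)


print(period_to_frequency(428))  # 8363
print(period_to_frequency(214))  # 16726, one octave up
```

Halving the period doubles the frequency, which is why each octave in a period table is half the one below it.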

Thursday 11 May 2023

The Chilli Analogy

  • Game Maker, Fusion, Stencyl - Store-bought chilli. Everything's mixed together already, beef ground, and spices made. It's quick and easy to prepare.
  • C/C++, PyGame, Java - The chilli has to be manually prepared. Although the beef is pre-ground and beans are tinned, the herbs are all separated, and everything has to be put together manually. The result can be more personalized to your tastes.
  • x86 Assembly - You're on a farm, and you're raising some calves. You've started to grow various types of herbs, as well as chilli peppers, which you'll grind and mix with the herbs to create chilli powder. The kidney beans have just been planted. You've made all the cooking utensils yourself by sourcing wood from trees, which you planted a few years ago. Everything has to be assembled by hand, and the result is something purely homemade.

Wednesday 10 May 2023

Super long streaming sounds, and system timer!

This is a follow-up to a post I made yesterday regarding Blastlib, my Sound Blaster library. I mentioned how 4-voice playback was tied to the vertical retrace, and how big a disadvantage it was. Today, I experimented with different methods, and tried using the system timer, which I briefly mentioned. Turns out it's an extremely good solution, and it works so much better than using the retrace period. Of course, it does have a catch, but it's a very small one. More on that later.

By default, the system timer runs at 18.2Hz, which at the time, didn't seem like enough to me. So, I ended up trying different frequencies, and after some experimenting, I discovered that the default speed actually gives the best results! I couldn't believe it; it was the first time I'd ever heard the mixing routine sound so good. Clicks and pops were a huge issue before, probably because the retrace and the buffer didn't match up perfectly. Using the timer is way more accurate, providing a much cleaner sound. It also means that if you're running a game and the main loop slows down, the timer keeps on streaming sounds without stuttering once. A huge improvement!

However, the timer does have a small catch, and it depends on the amount of code that's running at once. You'll have to make very fine adjustments to the buffer size if you want to get the cleanest sound, so more code will require the buffer to be minutely bigger, but only by a few samples!
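
As a starting point for that tuning, here's a rough Python sketch that works out how many samples fit in one timer tick. The timer constants are the standard PC values rather than anything taken from Blastlib, and the padding parameter is a hypothetical stand-in for the "few samples" of manual adjustment.

```python
PIT_CLOCK_HZ = 1193182   # input clock of the PC's system timer chip
DEFAULT_DIVISOR = 65536  # default reload value, which gives about 18.2Hz


def samples_per_tick(mix_rate, divisor=DEFAULT_DIVISOR):
    """How many samples play during one timer tick (as a float)."""
    tick_rate = PIT_CLOCK_HZ / divisor
    return mix_rate / tick_rate


def buffer_size(mix_rate, padding=0):
    """Baseline buffer size in samples, plus any hand-tuned padding."""
    return round(samples_per_tick(mix_rate)) + padding


print(buffer_size(11025))  # 606
print(buffer_size(22050))  # 1211
```

The exact figure isn't a whole number (about 1211.1 at 22050Hz), which may be part of why a little hand adjustment is needed.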

I've also been making Blastlib way more dynamic. Before, mixing was fixed to 11025Hz, but I've discovered how to use %define in NASM, so I've made the sample rate changeable. Now, all you need to do is add a %define at the beginning of your code, and it'll handle all the buffer-related things for you. I also made an appropriate timer library, so you can use the libraries together to get the best possible sound. All of these things open up some huge possibilities previously unknown to me!

To finish off, here's a demo I made, streaming a 3-minute song at 22kHz with minimal stuttering, using the system timer. Check it out, it's pretty incredible:



Tuesday 9 May 2023

Real-time audio mixing using the Sound Blaster

A few weeks ago, I managed to program a real-time audio mixer that can play up to 4 voices at once through the Sound Blaster! This is also a rare occasion where a theory in my head actually works out. The steps to reach this goal are pretty hilarious, and may be useful to fellow masochists like me.

Immediately, we have a huge advantage in the form of the Sound Blaster's DMA. This allows for audio to be sent to the Sound Blaster and played independently, leaving the computer to do its own thing. The only way I could think to approach this was to create a buffer of fixed size and tie the mixing routine to the vertical retrace. There's probably a better way, but I'm not known for that, and it works well enough!

So, the first thing to do was measure the samples between each retrace period. I did this by playing a click on every retrace, and recording the output from DOSBox-X to a wave file, which I could then analyze in Audacity. The period turned out to be 625 samples. Because it recorded at 44100Hz, and my target sample rate was 11025Hz, I divided the result by 4 to get the final buffer size of 156 samples (plus one to reduce clicking!).
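
The arithmetic above can be sketched in a few lines of Python. The function name is hypothetical; the numbers are the ones measured in this post.

```python
def rescale_period(samples, recorded_rate, target_rate):
    """Convert a period measured at one sample rate to another (rounded down)."""
    return samples * target_rate // recorded_rate


period = rescale_period(625, 44100, 11025)
buffer_size = period + 1    # the extra sample to reduce clicking
retrace_hz = 44100 / 625    # retrace rate implied by the measurement

print(period, buffer_size, retrace_hz)  # 156 157 70.56
```

The implied retrace rate of about 70Hz lines up with the refresh rate of the standard VGA modes, which is a nice sign that the measurement was sound.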

The next step was to do the exciting stuff, actually mixing the sounds together, dumping the result into the buffer, and then sending it to the Sound Blaster. My plan was to have a bunch of states for each voice, which determined how it would be played. These included:

  • Is the sample playing?
  • Memory offset to the sample (where the sample pointer starts)
  • Current position in the sample (added to the memory offset, to get the current byte)
  • Length of the sample (so the sample can end)
  • Is the sample looping?
With these in mind, I got to programming! Using 4 voices has many advantages, mainly speed: it's a small number of voices, and being a power of 2, it only requires bit-shifting, which is much quicker than division. The whole mixing routine is performed in a loop that runs through the entire buffer length. Inside that is another loop, which goes through each of the 4 voices.

First, we clear the sample buffer, ready to change. Then we need to check if the current voice is even playing. If not, add a null byte (127) and check the next voice. Otherwise, get the sample byte, shift it to the right (by 2 in this case), and add it to the current buffer byte. Then we increase the sample position, and check if we've reached the end of the sample, using our length variable. If it's reached the end, check if it's looping. If it's not looping, stop the sample from playing, otherwise, reset the sample position back to 0. Once all voices have been processed, go to the next buffer byte (bufferbyte bufferbyte bufferbyte).

You're probably wondering why we need to do some of that stuff, so let me explain!

Firstly, summing audio is very easy, but it comes with a catch. To add 2 audio files together, you could just step through each byte and add them together, but it's not that simple. We're using 8-bit samples here, so a number can only range from 0 to 255. If you add 2 numbers together, the result could be over 255, which no longer fits in a byte and distorts the sound. That's why we need to divide (or bit shift) by the number of samples we're adding together. The volume will be lower, but all sounds will play at once. I used this principle to make a replayer for my music format (Unrefined Sequencing Format), which isn't real-time, but has way more features. But that's a topic for another day! As for 127, that's like the centre point when it comes to amplitude. Each byte is a different point in a waveform, with 0 at one extreme and 255 at the other. Using 127 prevents any clicking when a sample stops!
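
To make the steps concrete, here's a minimal Python model of the mixing loop. It's a sketch of the idea rather than Blastlib's actual code: the Voice class and its field names are made up, and it assumes the silence byte gets shifted just like a real sample, which the description above doesn't spell out.

```python
SILENCE = 127  # centre point for unsigned 8-bit audio
SHIFT = 2      # divide by 4 voices using a shift


class Voice:
    def __init__(self, data=b"", looping=False):
        self.data = data
        self.position = 0
        self.playing = len(data) > 0
        self.looping = looping

    def next_byte(self):
        """Return the next sample byte, or silence if the voice is stopped."""
        if not self.playing:
            return SILENCE
        value = self.data[self.position]
        self.position += 1
        if self.position >= len(self.data):
            if self.looping:
                self.position = 0     # back to the start
            else:
                self.playing = False  # stop the voice
        return value


def mix(voices, length):
    """Fill a buffer of `length` bytes by summing 4 shifted voices."""
    out = bytearray(length)  # starts cleared to 0
    for i in range(length):
        for voice in voices:
            out[i] += voice.next_byte() >> SHIFT
    return bytes(out)


voices = [Voice(bytes([255, 255])), Voice(bytes([0]), looping=True), Voice(), Voice()]
print(list(mix(voices, 3)))  # [125, 125, 93]
```

Each voice contributes at most 63 after the shift, so 4 of them can never exceed 252 and the sum always fits in a byte. One side effect of shifting the silence value is that 4 idle voices sum to 124 rather than exactly 127.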

Once all that is done, we write the whole thing to the Sound Blaster's DMA, and start playback. At this point, the computer can do whatever it wants, without interfering with sample playback, because the Sound Blaster is handling it all!

While this method does work well, it has a few caveats. The most prominent issue is the fact it's tied to the vertical retrace. This means that if your code slows down, the sound stutters, because the buffer size doesn't match anymore. Secondly, the retrace period isn't perfect, so you'll get some slight clicking. I think that's impossible to avoid when using this method. I could try tying it to the system timer instead, but that only runs at 18.2Hz, and results in severe time travel if you mess with it.

That's a lot of explaining, but hopefully that shows you the basics! I won't cover streaming here, because I already made a post about that. This method can also be applied to any kind of sound driver, since it's just adding bytes together!

Friday 5 May 2023

Streaming audio to the Sound Blaster

A few weeks ago, after many hours of patience, I managed to get the Sound Blaster blasting audio for the first time! I've since made some huge advancements, from getting a single sound to play, to mixing 4 sounds in real-time, while other code is running. It's exactly what I wished to do from the very beginning, and I finally managed to pull it off. However, because .com files have a limit of 64kb (this seems to be a running theme here), you can only fit so many sounds. What if I wanted long music loops to play in the background, or play a sound effect that won't fit?

The solution: streaming! The way I approached it might seem a bit strange, but it was the easiest to figure out. First, you need to open the file, and store the resulting handle in a variable. Then, whenever the mixer needs the next sample, use the handle you saved, and read one byte at a time. This is important, because in my tests, reading multiple bytes at a time was actually slower in the long run (like... significantly slower!). It's important to use 32-bit registers for the file position here, because that allows for files bigger than 65,536 bytes. In fact, your file can be up to 4,294,967,296 bytes large (about 4.3GB!) which is super excessive, but necessary to see the advantage of streaming from disk.
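
Here's a loose Python sketch of a streamed voice, just to show the shape of the idea. It isn't how Blastlib does it (that works with a DOS file handle in assembly); StreamedVoice is a made-up name, and io.BytesIO stands in for a real file opened with open(path, "rb"). Python buffers file reads behind the scenes, so the byte-at-a-time speed observation above doesn't carry over.

```python
import io


class StreamedVoice:
    """Pulls one byte per mixer step from an open file-like handle."""

    def __init__(self, handle, looping=True):
        self.handle = handle
        self.looping = looping
        self.playing = True

    def next_byte(self, silence=127):
        if not self.playing:
            return silence
        chunk = self.handle.read(1)
        if not chunk:                     # reached the end of the file
            if self.looping:
                self.handle.seek(0)       # rewind and carry on
                chunk = self.handle.read(1)
            if not chunk:                 # not looping, or the file is empty
                self.playing = False
                return silence
        return chunk[0]


voice = StreamedVoice(io.BytesIO(bytes([1, 2, 3])))
print([voice.next_byte() for _ in range(5)])  # [1, 2, 3, 1, 2]
```

Only one byte of the file is held at a time, which is the whole point: the sound's length no longer counts against the 64kb limit.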

The result's pretty incredible, because you can have a super long music loop playing in the background, all while mixing 3 other sounds! For example, a .com file containing Blastlib on its own, streaming an extremely long file, is only 1.6kb in size. I strongly advise streaming one sound at a time though, because the disk access might go a bit mad. I've only tested it on a USB drive, but I plan to try it on my old laptop with a hard drive. I'll update this post when that happens!

View the source code for Blastlib here, and a jazzy example of streaming audio here! (make sure to download the appropriate file alongside it)

Monday 1 May 2023

High quality VGA graphics!

So, this post is a bit of a flex of how code can work first try, especially when you least expect it! The other day, I had the idea of adding a routine to my graphics library that draws graphics using its own set of colours, instead of the VGA's default palette. I already have a function that draws full-screen graphics, but it uses the default palette, so the colour range is hugely limited. Here's a test picture, converted using the default palette:


As you can see, it looks pretty awful. There are a lot of red shades in this picture, but the default palette only has so many to choose from, so the colour range ends up hugely reduced. As a result, it looks very blotchy. Dithering would take up too much space in this case, since I'm dealing with .com files, and they can't be above 64kb!

I spent time writing a simple conversion program in Python, based on the one I'd written already, but this time using a shoddy quantization routine that only uses 254 colours instead of 256, because integer division. The layout of the resulting file is ridiculously simple: the first 768 bytes consist of the palette entries, and after that, you have the picture, encoded in RLE. For a brief rundown, it works in pairs of bytes (or words), with the first byte being the colour to use, and the second byte deciding how many times to draw that colour. This saves a lot of space, especially if your image has lots of solid colours. Once I wrote that, I added a routine to my graphics library that gets the palette and draws the image.
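
For anyone curious, the RLE part of that layout can be sketched in Python like this. It's not my actual conversion program (no quantization here), the function names are made up, and it assumes a run is capped at 255 since the count has to fit in one byte.

```python
PALETTE_BYTES = 768  # 256 entries x 3 bytes (red, green, blue)


def rle_encode(pixels):
    """Encode palette indices as (colour, count) byte pairs."""
    out = bytearray()
    i = 0
    while i < len(pixels):
        colour = pixels[i]
        run = 1
        # Extend the run while the colour repeats, up to 255 per pair.
        while i + run < len(pixels) and pixels[i + run] == colour and run < 255:
            run += 1
        out += bytes([colour, run])
        i += run
    return bytes(out)


def rle_decode(data):
    """Expand (colour, count) pairs back into palette indices."""
    out = bytearray()
    for k in range(0, len(data) - 1, 2):
        colour, run = data[k], data[k + 1]
        out += bytes([colour]) * run
    return bytes(out)


def pack_image(palette, pixels):
    """Palette first, then the RLE-encoded picture."""
    if len(palette) != PALETTE_BYTES:
        raise ValueError("palette must be exactly 768 bytes")
    return bytes(palette) + rle_encode(pixels)


print(list(rle_encode(bytes([5, 5, 5, 9]))))  # [5, 3, 9, 1]
```

Note the worst case: a picture where no two neighbouring pixels match doubles in size, because every pixel costs 2 bytes.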

Things got exciting, because I hadn't tested if my theory worked, so I assumed that my conversion program did the right thing and my drawing routine worked. Imagine my surprise when I assembled the program for the first time, and saw this:


Isn't it beautiful? Even though it only uses 254 colours, it still looks amazing, and the picture is only 46kb after conversion! I can't get over how good it looks, especially without dithering. This feature probably isn't so useful for pictures with solid colours, but for photos like the above, it works amazingly well. I can't wait to try it out with other pictures!

So there you have it, the miracles of code working the first time! The source code to the conversion program is available here, and the graphics library here.
