Monday, March 7, 2011

Read in Wave Files

In this post I want to show how to read Wave files using the programming language C#. The post is not about playing these files, but about the exact file format: how a Wave file is structured and how it can be read byte by byte (for playing, see for example here).

A Wave file is an audio file that stores the sampled amplitudes of a sound signal.
It is made up of multiple chunks (blocks). A lot can be done with these, but in this post I will only describe the simplest format (which is also the default one), consisting of 3 chunks. For the technical specification of the Wave format see this page. That page (and of course many others) explains the format very neatly; I just want to summarize the information here and leave out a lot.
A little tip: Wave files with exactly the structure described here can easily be created with the free program Audacity (for example from MP3 files).

The first 2 entries of a chunk are always the same: the name and the size of the chunk, encoded with 4 bytes each. All numbers in the file are stored in little-endian byte order.
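
The common chunk header can be read on its own. The following is only a minimal sketch: the class name ChunkHeaderSketch and the method ReadHeader are made up for this illustration and are not part of the program further below. It relies on BinaryReader, which always decodes integers as little-endian.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only.
public static class ChunkHeaderSketch
{
    // Reads the 8-byte header that starts every chunk:
    // a 4-character ASCII name followed by a 32-bit little-endian size.
    public static void ReadHeader(Stream stream, out string id, out int size)
    {
        // The reader is deliberately not disposed, so the stream stays open.
        BinaryReader reader = new BinaryReader(stream, Encoding.ASCII);

        byte[] idBytes = reader.ReadBytes(4);
        if (idBytes.Length < 4)
            throw new EndOfStreamException("Incomplete chunk name.");

        id = Encoding.ASCII.GetString(idBytes);
        // ReadInt32 throws EndOfStreamException if fewer than 4 bytes remain.
        size = reader.ReadInt32();
    }
}
```

Called on a stream that starts with the bytes of "fmt " followed by 16, 0, 0, 0, this yields the name "fmt " and the size 16.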

The first chunk is named "RIFF". The following size field (1) holds the size of the whole Wave file minus 8 bytes, since the name and the size field themselves are not counted. The third field of the first chunk contains the string "WAVE" (2).
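
To make the relation between the RIFF size field and the file length concrete, here is a small sketch that checks the first 12 bytes of a file. The names RiffHeaderSketch and IsValid are my own for this example, and files found in the wild may deviate from the strict size rule, so treat it as a plausibility check only.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class RiffHeaderSketch
{
    // Returns true if the first 12 bytes look like a RIFF/WAVE header
    // for a file with the given total length in bytes.
    public static bool IsValid(byte[] header, long fileLength)
    {
        if (header == null || header.Length < 12)
            return false;

        string id = Encoding.ASCII.GetString(header, 0, 4);

        // Assemble the little-endian size by hand, so the result does not
        // depend on the byte order of the machine running the code.
        int size = header[4] | (header[5] << 8) | (header[6] << 16) | (header[7] << 24);

        string format = Encoding.ASCII.GetString(header, 8, 4);

        // The size field counts everything after the first 8 bytes.
        return id == "RIFF" && format == "WAVE" && size == fileLength - 8;
    }
}
```

For a file of 44 bytes the size field would therefore have to contain 36.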

Next comes the 2nd chunk, named "fmt " (the trailing space is important). Its size field (3) has the value 16, which is the size of the rest of this chunk. The next 2 bytes store the audio format (4): 1 means uncompressed PCM, other values indicate compression. The next 2 bytes encode the number of audio channels (5). The next 4 bytes hold the sample rate (6), i.e. how many values of the audio signal are stored per second for each channel. The next 4 bytes encode the byte rate (7), i.e. how many bytes per second have to be read to play the audio signal. The next 2 bytes give the number of bytes used for a single sample across all audio channels (8). The last 2 bytes of this chunk encode the number of bits (!) used to store a single sample value of one channel (9).
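
For uncompressed files, the values (7) and (8) follow from the channel count, the sample rate and the bits per sample. The sketch below spells out that arithmetic; the class FmtFieldsSketch and its methods are hypothetical names chosen for this example, and the formulas assume PCM data whose bits per sample are a multiple of 8.

```csharp
using System;

// Hypothetical helper for illustration only; assumes uncompressed PCM.
public static class FmtFieldsSketch
{
    // (8) bytes for one sample across all channels
    public static int BlockAlign(int numChannels, int bitsPerSample)
    {
        return numChannels * bitsPerSample / 8;
    }

    // (7) bytes that have to be read per second of audio
    public static int ByteRate(int sampleRate, int numChannels, int bitsPerSample)
    {
        return sampleRate * BlockAlign(numChannels, bitsPerSample);
    }

    // Checks whether the values read from a "fmt " chunk agree with each other.
    public static bool IsConsistent(int numChannels, int sampleRate,
                                    int byteRate, int blockAlign, int bitsPerSample)
    {
        return blockAlign == BlockAlign(numChannels, bitsPerSample)
            && byteRate == ByteRate(sampleRate, numChannels, bitsPerSample);
    }
}
```

For a stereo file with 44100 samples per second and 16 bits per sample, this gives 4 bytes per sample over both channels and a byte rate of 176400.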

Then follows the 3rd chunk, the actual data chunk.
As always, the first 2 x 4 bytes encode the name ("data") and the size (10) of the chunk. Then comes the actual data of the Wave file (11), which is basically a sequence of audio amplitudes.
The samples are stored one after another, with the audio channels interleaved. That means the "data" chunk starts with the values of sample 1 (first audio channel 1, then 2, etc.), followed by sample 2 with the same structure, and so on. The bytes belonging to one sample of one channel, interpreted as an integer, give the amplitude at that point in time.
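
The interleaved layout can be illustrated with a short sketch that splits the raw bytes of a "data" chunk into one array per channel. It only covers 16-bit samples, and the names DeinterleaveSketch and Split16Bit are invented for this example.

```csharp
using System;

// Hypothetical helper for illustration only; handles 16-bit PCM samples.
public static class DeinterleaveSketch
{
    // Splits interleaved 16-bit little-endian samples into one array per channel.
    // Trailing bytes that do not fill a complete sample are ignored.
    public static short[][] Split16Bit(byte[] data, int numChannels)
    {
        int sampleCount = data.Length / (2 * numChannels);

        short[][] channels = new short[numChannels][];
        for (int c = 0; c < numChannels; c++)
            channels[c] = new short[sampleCount];

        for (int s = 0; s < sampleCount; s++)
        {
            for (int c = 0; c < numChannels; c++)
            {
                // sample s, channel c starts at this byte offset
                int offset = (s * numChannels + c) * 2;
                channels[c][s] = (short)(data[offset] | (data[offset + 1] << 8));
            }
        }
        return channels;
    }
}
```

With 2 channels, the bytes 1, 0, 2, 0, 255, 255, 4, 0 are split into the values 1 and -1 for channel 1 and the values 2 and 4 for channel 2.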

That concludes my short description of the Wave format. Now comes the code of the C# program, which reads Wave files structured as described above.
The core of the program is the class WaveFile. It provides a function for reading Wave files, and an instance of it stores the information of the file. The characteristic values mentioned above (number of channels etc.) are marked in the source code with the corresponding numbers.
The function for reading a Wave file is LoadWave(), which expects the path of the file.
This function calls LoadChunk() 3 times, and each call reads one chunk. LoadChunk() first reads the name of the chunk and then decides what to do based on it.
I hope the source code is clear.
As with all projects presented here, the program is just a starting point for further work.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            WaveFile WF1 = new WaveFile();
            WF1.LoadWave(@"C:\Users\User\Desktop\mix.wav");
        }
    }

    public class WaveFile 
    {
        int FileSize; // 1
        string Format; // 2
        int FmtChunkSize; // 3
        int AudioFormat; // 4
        int NumChannels; // 5
        int SampleRate; // 6
        int ByteRate; // 7
        int BlockAlign; // 8
        int BitsPerSample; // 9
        int DataSize; // 10

        int[][] Data; // 11

        public void LoadWave(string path)
        {
            // open Wave file; the using block closes the stream when done
            using (System.IO.FileStream fs = System.IO.File.OpenRead(path))
            {
                LoadChunk(fs); // read RIFF chunk
                LoadChunk(fs); // read fmt chunk
                LoadChunk(fs); // read data chunk
            }
        }

        private void LoadChunk(System.IO.FileStream fs)
        {
            System.Text.ASCIIEncoding Encoder = new ASCIIEncoding();

            byte[] bChunkID = new byte[4];
            /* read the first 4 bytes, which should be the name */
            fs.Read(bChunkID, 0, 4);
            string sChunkID = Encoder.GetString(bChunkID); // decode the name

            byte[] ChunkSize = new byte[4];
            /* the next 4 bytes code the size */
            fs.Read(ChunkSize, 0, 4);

            if (sChunkID.Equals("RIFF"))
            {
                // what to do with the RIFF chunk:
                // save size in FileSize
                FileSize = System.BitConverter.ToInt32(ChunkSize, 0);
                // determine the format
                byte[] Format = new byte[4];
                fs.Read(Format, 0, 4);
                // should be "WAVE" as string
                this.Format = Encoder.GetString(Format);
            }

            if (sChunkID.Equals("fmt "))
            {
                // in the fmtChunk: Save size in FmtChunkSize
                FmtChunkSize = System.BitConverter.ToInt32(ChunkSize, 0);
                // readout all the other header information
                byte[] AudioFormat = new byte[2];
                fs.Read(AudioFormat, 0, 2);
                this.AudioFormat = System.BitConverter.ToInt16(AudioFormat, 0);
                byte[] NumChannels = new byte[2];
                fs.Read(NumChannels, 0, 2);
                this.NumChannels = System.BitConverter.ToInt16(NumChannels, 0);
                byte[] SampleRate = new byte[4];
                fs.Read(SampleRate, 0, 4);
                this.SampleRate = System.BitConverter.ToInt32(SampleRate, 0);
                byte[] ByteRate = new byte[4];
                fs.Read(ByteRate, 0, 4);
                this.ByteRate = System.BitConverter.ToInt32(ByteRate, 0);
                byte[] BlockAlign = new byte[2];
                fs.Read(BlockAlign, 0, 2);
                this.BlockAlign = System.BitConverter.ToInt16(BlockAlign, 0);
                byte[] BitsPerSample = new byte[2];
                fs.Read(BitsPerSample, 0, 2);
                this.BitsPerSample = System.BitConverter.ToInt16(BitsPerSample, 0);
            }

            if (sChunkID == "data")
            {
                // dataChunk: Save size in DataSize
                DataSize = System.BitConverter.ToInt32(ChunkSize, 0);

                // the first index of Data specifies the audio channel, the second the sample
                Data = new int[this.NumChannels][];
                // temporary array for reading in bytes of one channel per sample
                byte[] temp = new byte[BlockAlign / NumChannels];
                // for every channel, initialize data array with the number of samples
                for (int i = 0; i < this.NumChannels; i++)
                {
                    Data[i] = new int[this.DataSize / (NumChannels * BitsPerSample / 8)];
                }

                // traverse all samples
                for (int i = 0; i < Data[0].Length; i++)
                {
                    // iterate over all channels of the current sample
                    for (int j = 0; j < NumChannels; j++)
                    {
                        // read the correct number of bytes per sample and channel
                        if (fs.Read(temp, 0, BlockAlign / NumChannels) > 0)
                        {   // depending on how many bytes were used,
                            // interpret amplitude as Int16 or Int32
                            if (BlockAlign / NumChannels == 2)
                                Data[j][i] = System.BitConverter.ToInt16(temp, 0);
                            else if (BlockAlign / NumChannels == 4)
                                Data[j][i] = System.BitConverter.ToInt32(temp, 0);
                            // other sizes than 2 or 4 bytes are not handled here;
                            // ToInt32 would throw on a buffer shorter than 4 bytes
                        }
                    }
                }
            }
        }
    }
}
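
The listing above expects exactly the three chunks "RIFF", "fmt " and "data" in this order. Some files contain further chunks in between, which this program does not handle. As a hedged sketch of one possible extension (not part of the program above), unknown chunks could be skipped by their size field. The names ChunkScanSketch and SeekToChunk are hypothetical, and the stream is assumed to be seekable and positioned at the start of a chunk header.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only.
public static class ChunkScanSketch
{
    // Moves the stream to the body of the first chunk with the given name
    // and returns its size, or -1 if no such chunk is found.
    public static int SeekToChunk(Stream stream, string wantedId)
    {
        byte[] header = new byte[8];
        while (ReadFully(stream, header, 8))
        {
            string id = Encoding.ASCII.GetString(header, 0, 4);
            int size = header[4] | (header[5] << 8) | (header[6] << 16) | (header[7] << 24);

            if (size < 0)
                return -1; // sizes above 2 GB are not handled in this sketch
            if (id == wantedId)
                return size;

            // skip the body; RIFF pads chunk bodies to an even number of bytes
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }
        return -1;
    }

    // Reads exactly count bytes; returns false if the stream ends first.
    private static bool ReadFully(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0)
                return false;
            total += n;
        }
        return true;
    }
}
```

After the 12 bytes of the "RIFF" header have been read, calling SeekToChunk(fs, "fmt ") and later SeekToChunk(fs, "data") would take the place of the fixed sequence of three LoadChunk() calls.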

3 comments:

  1. how can i do this in steps??? plzzzzz i want it

  2. when you run this code whats output ,output how to show please explain this
