Friday, October 15, 2010

Compression with C#, Part 1 - Simple Strings

The .NET Framework provides a standard class for compressing and decompressing data in the gzip format: System.IO.Compression.GZipStream.
gzip is a good, freely available format which, for those interested in the theory, uses the Deflate algorithm, just like the ZIP format.
GZipStream is a stream: like any other stream it can be used to write and read files, except that the data is compressed or decompressed on the fly.
As a first example I want to show a simple way to compress strings to a file and to decompress them from it again.
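All of the following code examples need these using directives:

using System.IO.Compression;
using System.IO;
using System.Text;

System.Text is needed for ASCIIEncoding. Application.StartupPath additionally requires a Windows Forms project (System.Windows.Forms).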

First, compression: the GZipStream constructor expects a stream, to which the compressed data will be written, as well as the compression mode (here CompressionMode.Compress).
As the stream we use a FileStream that points to the target file. GZipStream can only read and write bytes, so the string to be compressed first has to be converted to a byte array, which is done by the class ASCIIEncoding. Note that ASCIIEncoding replaces characters outside the ASCII range with '?', so this approach suits plain ASCII text only.
The resulting array is then written with Write(); the GZipStream compresses the bytes and writes them to the file via the FileStream.
The code looks as follows:

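private void CompressString(string uncompressedString)
{
    // The FileStream is the target; GZipStream compresses everything written to it.
    // Leaving the using block closes the GZipStream, which flushes the remaining data and closes the FileStream.
    using (GZipStream CompressStream = new GZipStream(new FileStream(Application.StartupPath + "\\CompressedString.gz", FileMode.Create), CompressionMode.Compress))
    {
        // GZipStream works on bytes, so the string is converted first.
        ASCIIEncoding Encoder = new ASCIIEncoding();
        byte[] UncompressedStringInBytes = Encoder.GetBytes(uncompressedString);
        CompressStream.Write(UncompressedStringInBytes, 0, UncompressedStringInBytes.Length);
    }
}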

The string passed to CompressString() is written to the file "CompressedString.gz" in the application folder; .gz is the file extension for gzip files.
The file can be decompressed with common archive tools (e.g. WinZip); opening the decompressed file in a text editor shows the original string.
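
That the output really is a regular gzip file can also be checked in code. The following is only a minimal sketch, not part of the methods above: the class GzipHeaderSketch and its methods are hypothetical names, and it compresses into a MemoryStream instead of a file so that it runs on its own. It relies on the gzip file format, in which every file starts with the bytes 0x1F 0x8B, followed by the compression method 8 for Deflate.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class, for illustration only.
static class GzipHeaderSketch
{
    // Compresses the text into an in-memory buffer instead of a file.
    public static byte[] CompressToBytes(string text)
    {
        using (MemoryStream target = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(target, CompressionMode.Compress))
            {
                byte[] raw = Encoding.ASCII.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the remaining compressed data

            // ToArray() still works after the MemoryStream has been closed.
            return target.ToArray();
        }
    }

    // True if the data starts with the gzip signature and the Deflate method byte.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data.Length >= 3 && data[0] == 0x1F && data[1] == 0x8B && data[2] == 8;
    }

    static void Main()
    {
        byte[] compressed = CompressToBytes("This is a test 123456.");
        Console.WriteLine(LooksLikeGzip(compressed)); // prints True
    }
}
```

The same check could be applied to the first three bytes of CompressedString.gz, for instance before trying to decompress a file of unknown origin.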

Now to the reverse case, decompressing the string from the file:
The GZipStream constructor again expects a stream, this time the one from which the compressed data is read, as well as the compression mode (here CompressionMode.Decompress).
Again we pass a FileStream. The file is read in a loop: in each iteration Read() fills a buffer with decompressed bytes and returns how many bytes it actually delivered, and these bytes are converted to a string via the class ASCIIEncoding. A return value of 0 means that the end of the data has been reached, and the loop terminates.
The code:

private string DecompressString(string compressedFile)
{
    byte[] Buffer = new byte[4096];
    ASCIIEncoding Decoder = new ASCIIEncoding();
    StringBuilder DecompressedString = new StringBuilder();

    // Open the file passed in by the caller; the using blocks close both streams even if an exception occurs.
    using (FileStream SourceFile = new FileStream(compressedFile, FileMode.Open))
    using (GZipStream DecompressStream = new GZipStream(SourceFile, CompressionMode.Decompress))
    {
        int BytesReadCount;
        // Read() may deliver fewer bytes than requested before the end is reached; only 0 signals the end of the stream.
        while ((BytesReadCount = DecompressStream.Read(Buffer, 0, Buffer.Length)) > 0)
        {
            DecompressedString.Append(Decoder.GetString(Buffer, 0, BytesReadCount));
        }
    }

    return DecompressedString.ToString();
}

The two methods could be called like this, for example:

CompressString("This is a test 123456.");
string DecompressedString = DecompressString(Application.StartupPath + "\\CompressedString.gz");
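
The methods above use ASCIIEncoding, which turns every character outside the ASCII range (umlauts, for example) into '?'. The following sketch shows one possible variant for such text, assuming UTF-8 is acceptable as the encoding. It is not the method from this post: the class GzipRoundTripSketch and its methods are hypothetical names, and it works on MemoryStreams so that it runs without a file.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class, for illustration only.
static class GzipRoundTripSketch
{
    // Encodes the text as UTF-8 and compresses it into a byte array.
    public static byte[] Compress(string text)
    {
        using (MemoryStream target = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(target, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the remaining compressed data
            return target.ToArray();
        }
    }

    // Decompresses the bytes and decodes them as UTF-8.
    public static string Decompress(byte[] compressed)
    {
        using (MemoryStream source = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(source, CompressionMode.Decompress))
        using (MemoryStream plain = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int count;
            // Only a return value of 0 marks the end of the stream.
            while ((count = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                plain.Write(buffer, 0, count);
            }
            // Decode once at the end, so a multi-byte character is never split between two chunks.
            return Encoding.UTF8.GetString(plain.ToArray());
        }
    }

    static void Main()
    {
        string original = "Test äöü 123456.";
        string restored = Decompress(Compress(original));
        Console.WriteLine(restored == original); // prints True
    }
}
```

Collecting all decompressed bytes before decoding matters for UTF-8, where one character can occupy several bytes; with ASCII, decoding each chunk separately is safe because every character is exactly one byte.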
