
COMPRESSION AND DECOMPRESSION USING HUFFMAN CONVENTION

SYNOPSIS

This project, entitled COMPRESSION AND DECOMPRESSION USING HUFFMAN CONVENTION, is developed in C++ and implements Huffman coding, an encoding method. Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code"; that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common characters using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to do this in linear time if the input probabilities (also known as weights) are sorted. For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm. Although Huffman coding is optimal for symbol-by-symbol coding (i.e. a stream of unrelated symbols) with a known input probability distribution, its optimality is sometimes overstated. For example, arithmetic coding and LZW coding often have better compression capability. Both of these methods can combine an arbitrary number of symbols for more efficient coding, and generally adapt to the actual input statistics, the latter of which is useful when input probabilities are not precisely known or vary significantly within the stream. In general, the improvements arise from input symbols being related (for example, "cat" is more common than "cta").

INTRODUCTION
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called "prefix-free codes") (that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common characters using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to do this in linear time if input probabilities (also known as weights) are sorted. For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm. Although Huffman's original algorithm is optimal for a symbol-by-symbol coding (i.e. a stream of unrelated symbols) with a known input probability distribution, it is not optimal when the symbol-by-symbol restriction is dropped, or when the probability mass functions are unknown, not identically distributed, or not independent (e.g., "cat" is more common than "cta"). 
Other methods such as arithmetic coding and LZW coding often have better compression capability: both of these methods can combine an arbitrary number of symbols for more efficient coding, and generally adapt to the actual input statistics, the latter of which is useful when input probabilities are not precisely known or vary significantly within the stream. However, the limitations of Huffman coding should not be overstated; it can be used adaptively, accommodating unknown, changing, or context-dependent probabilities. In the case of known independent and identically-distributed random variables, combining symbols together reduces inefficiency in a way that approaches optimality as the number of symbols combined increases.
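The prefix property described in the Introduction can be checked mechanically. The following is a minimal sketch, not part of the project code: the function name isPrefixFree and the representation of code words as strings of '0' and '1' characters are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true if no code word is a prefix of another.
// Code words are strings of '0' and '1'; duplicate code words count as a
// violation because each one is a prefix of the other.
bool isPrefixFree(const std::vector<std::string>& codes) {
    for (std::size_t i = 0; i < codes.size(); ++i) {
        for (std::size_t j = 0; j < codes.size(); ++j) {
            if (i == j) continue;
            // compare(0, n, s) == 0 means codes[j] starts with codes[i].
            if (codes[j].size() >= codes[i].size() &&
                codes[j].compare(0, codes[i].size(), codes[i]) == 0) {
                return false;
            }
        }
    }
    return true;
}
```

For instance, the set 0, 10, 110, 111 passes the check, while 0, 01, 11 fails because 0 is a prefix of 01. The quadratic comparison is chosen for clarity; it is adequate for the small alphabets used in examples.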

MODULE DESCRIPTION

The Huffman coding project consists of the following modules:

1. Input module
2. Compression module
3. Huffman tree generation
4. Decompression module
5. Output module

Input module
The input module accepts the file that is to be compressed. The file name is given as input and the file is searched for in the specified path. If the file is found, it is compressed; otherwise a "file not found" error is reported.

Compression module
The compression module compresses the file. A large file is reduced to a smaller size so that it occupies less space on disk and less capacity on the network transmission medium. The compression is applied in such a way that there is no loss of information in the file.

Huffman tree generation
When the Huffman algorithm is applied for compression, it takes as input the symbols and the frequency of each symbol. The Huffman tree is generated from these symbols and their frequencies.

Decompression module
The decompression module takes the compressed file as input and decompresses it in order to retrieve the original information. Here too the Huffman algorithm is applied so that there is no loss of information during decompression.

Output module
This module displays the output of the algorithm. It shows the contents of the decompressed file and also displays the Huffman tree in a graphical format so that the user can better understand the working of the algorithm.
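The lossless round trip performed by the Compression and Decompression modules can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the names CodeTable, encode and decode are hypothetical, the code table is supplied by the caller rather than generated from a Huffman tree, and the compressed data is kept as a string of '0' and '1' characters instead of packed bits.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical type: maps each symbol to its prefix-free code word.
using CodeTable = std::map<char, std::string>;

// Replaces every symbol of the text by its code word.
std::string encode(const std::string& text, const CodeTable& table) {
    std::string bits;
    for (char c : text) {
        auto it = table.find(c);
        if (it == table.end()) {
            throw std::invalid_argument("symbol has no code word");
        }
        bits += it->second;
    }
    return bits;
}

// Reads bits one at a time; because the code is prefix-free, the first
// match of the accumulated bits against a code word is the right symbol.
std::string decode(const std::string& bits, const CodeTable& table) {
    std::map<std::string, char> reverse;
    for (const auto& entry : table) reverse[entry.second] = entry.first;

    std::string text;
    std::string current;
    for (char b : bits) {
        current += b;
        auto it = reverse.find(current);
        if (it != reverse.end()) {
            text += it->second;
            current.clear();
        }
    }
    if (!current.empty()) {
        throw std::invalid_argument("trailing bits do not form a code word");
    }
    return text;
}
```

With the table a=0, b=10, c=110, d=111, the text "abcd" encodes to 010110111 and decodes back to "abcd". A real file format would also need to store the code table (or the symbol frequencies) alongside the bits so that the decoder can rebuild it.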

SYSTEM ANALYSIS

EXISTING SYSTEM
Compared to other methods, Shannon-Fano coding is easy to implement. In practice, however, Shannon-Fano coding is of little importance, mainly because its code efficiency is lower than that of Huffman coding. In Lempel-Ziv-77 (LZ77), the addressing must be limited to a certain maximum in order to keep runtime and buffering capacity within an acceptable range. Content beyond this range is not considered for coding and is not covered by the size of the addressing pointer.

DISADVANTAGES OF EXISTING SYSTEM
Run Length encoding is characterized by the following properties:

1. Simple implementation of each RLE algorithm
2. Compression efficiency restricted to a particular type of contents
3. Mainly utilized for encoding of monochrome graphic data
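The properties listed above can be seen in a very small run-length encoder. The sketch below is illustrative only: the function name rleEncode and the "count followed by symbol" output format are assumptions, and the input is assumed to contain no digit characters, since digits would make this simple format ambiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encodes each run of identical characters as
// <run length><character>, e.g. "WWWW" becomes "4W".
// Assumes the input contains no digit characters.
std::string rleEncode(const std::string& input) {
    std::string out;
    std::size_t i = 0;
    while (i < input.size()) {
        std::size_t run = 1;
        // Extend the run while the next character repeats the current one.
        while (i + run < input.size() && input[i + run] == input[i]) {
            ++run;
        }
        out += std::to_string(run);
        out += input[i];
        i += run;
    }
    return out;
}
```

Input with long runs, such as rows of a monochrome image, shrinks ("WWWWBBW" becomes "4W2B1W"), whereas input without repetition grows ("ABCD" becomes "1A1B1C1D"). This is the restriction to a particular type of contents noted in the list.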

PROPOSED SYSTEM
The algorithm as described by David Huffman assigns every symbol to a leaf node of a binary code tree. Each node is weighted by the number of occurrences of the corresponding symbol, called its frequency or cost. The branches of the tree represent the binary values 0 and 1 according to the rules for common prefix-free code trees. The path from the root of the tree to the corresponding leaf node defines the particular code word.

ADVANTAGES OF PROPOSED SYSTEM
The tree structure results from combining the nodes step by step until all of them are joined under a single root. In a bottom-up procedure, the algorithm always combines the two nodes with the lowest frequencies. Each new interior node receives the sum of the frequencies of its two child nodes. Huffman codes form prefix-free binary code trees, so all the considerations for such trees apply accordingly. Codes generated by the Huffman algorithm achieve the ideal code length up to the bit boundary: the average code length exceeds the ideal value (the entropy of the source) by less than 1 bit per symbol.
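The bottom-up construction described above can be sketched with a min-heap. This is a minimal sketch under stated assumptions rather than the project's actual code: the names Node and buildHuffmanCodes are hypothetical, nodes are stored in a vector and referenced by index, and the left branch is labelled 0 and the right branch 1 by convention.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node layout: leaves have left == right == -1.
struct Node {
    long weight;
    int left;
    int right;
    char symbol;
};

// Builds a code word for every symbol from its frequency.
std::map<char, std::string> buildHuffmanCodes(const std::map<char, long>& freq) {
    std::vector<Node> nodes;
    using Item = std::pair<long, int>;  // (weight, index into nodes)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    for (const auto& f : freq) {
        nodes.push_back({f.second, -1, -1, f.first});
        heap.push({f.second, static_cast<int>(nodes.size()) - 1});
    }

    std::map<char, std::string> codes;
    if (nodes.empty()) return codes;
    if (nodes.size() == 1) {  // a single symbol still needs one bit
        codes[nodes[0].symbol] = "0";
        return codes;
    }

    // Repeatedly combine the two nodes with the lowest weights.
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();
        Item b = heap.top(); heap.pop();
        nodes.push_back({a.first + b.first, a.second, b.second, '\0'});
        heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    // Walk from the root; the path to each leaf is its code word.
    std::vector<std::pair<int, std::string>> pending;
    pending.emplace_back(heap.top().second, std::string());
    while (!pending.empty()) {
        std::pair<int, std::string> item = pending.back();
        pending.pop_back();
        const Node n = nodes[static_cast<std::size_t>(item.first)];
        if (n.left < 0) {
            codes[n.symbol] = item.second;
        } else {
            pending.emplace_back(n.left, item.second + "0");
            pending.emplace_back(n.right, item.second + "1");
        }
    }
    return codes;
}
```

For the frequencies a=5, b=9, c=12, d=13, e=16, f=45, the most frequent symbol f receives a 1-bit code word, c, d and e receive 3 bits, and a and b receive 4 bits, for a total of 224 bits over the 100 symbols. When several nodes have equal weight, the individual code words may differ between implementations, but the total encoded length stays the same.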

SYSTEM SPECIFICATIONS

HARDWARE ENVIRONMENT
Processor        : INTEL
RAM              : 2GB
Hard Disk        : 360 GB
CD ROM           : 52 X
Monitor          : 17 (Samsung)
Keyboard         : 104 keys
Mouse            : Three Button
Operating System : Windows 7

SOFTWARE ENVIRONMENT
Operating System : Windows XP
Language Used    : C++

INTRODUCTION TO OPERATING SYSTEM

WINDOWS XP:
Windows XP Professional gives all the benefits of Windows XP Home Edition, plus additional remote access, security, performance, manageability and multilingual features that make the operating system suitable for businesses of all sizes and for users who demand the most from their computing experience.

FEATURES OF WINDOWS XP:
1. New user interface: makes it easy to find details according to user needs.
2. A reliable foundation: can be counted on to keep the computer up and running when the user needs it most.
3. Network Setup Wizard: easily connects and shares computers and devices.
4. Windows Messenger: a communication and collaboration tool with instant messaging, voice and video conferencing, and application sharing.
5. Windows Media Player for Windows XP: a single place for finding, playing, organizing, and storing digital media.
6. Help & Support Center: makes it easy to recover from problems and to get help and support when needed.
7. File and Folder Management: Windows XP provides several new ways to arrange and identify files when viewing them in folders such as My Documents.

C LANGUAGE DESCRIPTION
This project is done using the C language on the Windows platform. C is a powerful tool that handles a wide range of system-side programming tasks efficiently. The C programming language was devised in the early 1970s as a system implementation language for the nascent Unix operating system. Derived from the typeless language BCPL, it evolved a type structure; created on a tiny machine as a tool to improve a meager programming environment, it has become one of the dominant languages of today. C supports concepts such as pointers and linked lists, and it is also used for interrupt handling, multitasking and assembler-level programming. It has a wide range of header files supporting different types of functions. Moreover, C can be used to write both high-level and low-level programs. The main advantage of using C in our project is the ability to carry out bitwise operations such as AND and XOR (EXOR).
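The bitwise operations mentioned above are what allow code words to be stored as packed bits rather than as whole characters. The following is a minimal sketch, written in C++ for consistency with the language named in the synopsis. The function names packBits and testBit are hypothetical, and the bit order (most significant bit first, with the last byte padded with zeros) is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: packs a string of '0'/'1' characters into bytes,
// most significant bit first. Unused bits of the last byte stay 0.
std::vector<std::uint8_t> packBits(const std::string& bits) {
    std::vector<std::uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i] == '1') {
            // Shift a 1 into position and set it with bitwise OR.
            bytes[i / 8] |= static_cast<std::uint8_t>(1u << (7 - i % 8));
        }
    }
    return bytes;
}

// Hypothetical helper: reads bit i back using a shift and bitwise AND.
bool testBit(const std::vector<std::uint8_t>& bytes, std::size_t i) {
    return ((bytes[i / 8] >> (7 - i % 8)) & 1u) != 0;
}
```

The 9-bit string 010110111 packs into two bytes, 0x5B and 0x80. Because of the zero padding, a decoder also needs to know how many bits are valid, for example by storing the bit count or the original file length.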

ORGANIZATION PROFILE

RMS ELECTRICALS was started in the year 2002 at Periyanaickenpalayam, Coimbatore, to deal in all types of electronic equipment. The company was started with the motto of delivering quality electronic equipment on time at competitive prices.

The company deals exclusively in electronic equipment, such as sequence timers, industrial automation, and embedded application development. Today, RMS ELECTRICALS has 80 employees and is a leading electronic equipment company in India.

The company has six years of experience in the electronic equipment field and currently has a higher sales volume than in previous years. Its major domestic customers are SSM works, Lakshmi engg works, SAI Enterprises, and various others from all over India. The main reason behind the success of RMS Electricals has been its highly trained and motivated workforce, achieved through years of team building. These conscientious workers are key contributors to the consistent quality of the products produced.
