[Data Structure] File Compression Project - compression, Data Structure, file, project

Project name: File compression

Development environment: vs2010

Data structure used:

1, heap

2, huffmantree Huffman tree

3, Huffmancode Huffman coding

4, object-oriented C++ programming language

Thinking:

1. Use small heap to build Huffman tree

2. Use Huffman tree to generate Huffman code

3. Use Huffman code to compress files, Generate compressed files **.compress files, and **.config configuration files for easy decoding

4. Use the configuration file to get the number of occurrences of each character in the file

5. Use The configuration file uses a small heap to build the Huffman tree again

6. Use the Huffman tree established by the configuration file to decode and generate the decompressed file **.uncompress

Establish a large and small heap blog post connection Below

http://blog.csdn.net/shangguan_1234/article/details/52791719

The binary tree storage of the heap structure is
the largest heap: each parent node Are greater than the child nodes.
Minimal heap: each parent node is smaller than the child node.

Here is an example of a small pile.

Compare each child with the value of the parent node. If it is smaller than the parent node, the parent node will move down, and the smaller child will move up.

#pragma once #include  # include using namespace std;// small heap templatestruct Less{ bool operator() (const T& l, const T& r) {return l struct Greater{ bool operator() (const T& l, const T& r) {return l> r; }};template>class Heap{public: Heap() { } Heap(const T* a, size_t size) {for (size_t i = 0; i = 0; --i) {AdjustDown(i);}} void Push(const T& x) {_infosays.push_back(x); AdjustUp(_infosays.size()- 1);} void Pop() {assert(_infosays.size()> 0); swap(_infosays[0], _infosay s[_infosays.size()-1]); _infosays.pop_back(); AdjustDown(0);} const T& Top() {//assert(_infosays.size()> 0); if (!Empty()) {return _infosays[0];}} bool Empty() {return _infosays.empty();} int Size() {return _infosays.size();} void AdjustDown(int root) {size_t child = root * 2 + 1 ; Compare com; while (child <_infosays.size()) {if (child + 1<_infosays.size() && com(_infosays[child + 1], _infosays[child])) {++child;} if ( com(_infosays[child], _infosays[root])) {swap(_infosays[child], _infosays[root]); root = child; child = 2 * root + 1;} else {break;}}} void AdjustUp( int child) {int parent = (child-1) / 2; while (child> 0) {if (Compare()(_infosays[child], _infosays[parent])) {swap(_infosays[parent], _infosays[child ]); child = parent; parent = (child-1) / 2;} else {break;}}} void Print() {for (size_t i = 0; i <_infosays.size(); ++i) { cout << _infosays[i] << "";} c out << endl; }public: vector _infosays;};

Using heap to build Huffman tree

Huffman tree, also known as optimal binary tree, is weighted The binary tree with the shortest path length.
[Greedy Algorithm] means that when solving the problem, it always makes the best choice that seems to be the best at the moment. That is to say, what the greedy algorithm makes is not the overall optimal choice, but the local optimal solution in a sense. Greedy algorithms do not get the overall optimal solution to all problems.

Use greedy algorithm to build Huffman tree

#pragma once #include "Heap.h" #include using namespace std;templatestruct HuffmanTreeNode{ HuffmanTreeNode* _left; HuffmanTreeNode* _right; HuffmanTreeNode* _parent; T _weight; HuffmanTreeNode(const T& x) :_weight(x), _left(NULL), _right(NULL), _parent(NULL) {));templateclass HuffmanTree{ typedef HuffmanTreeNode  Node;public: HuffmanTree() :_root(NULL) {} ~HuffmanTree() {Destory(_root);} template  struct NodeCompare {bool operator()(Node *l, Node *r) { return l->_weight _weight;} }; void CreatTree(const T* a, size_t size, const T& invalid) {assert(a); Heap> minHeap; for (size_t i = 0; i  1) {Node* left = minHeap.Top(); minHeap.Pop( ); Node* right = minHeap.Top(); minHeap.Pop(); Node* parent = new Node(left->_weight + right->_weight); parent->_left = left; parent->_right = right; left->_parent = parent; right->_parent = parent; minHeap.Push(parent);} _root = minHeap.Top();} Node* GetRootNode() {return _root;} //void Destory(Node* root) //{ // if (root) // {// Destory(root->_left); // Destory(root->_right); // delete root; // root = NULL; //} //} void Destory(Node* root) {if (root==NULL) {return;} if(root->_left==NULL&&root->_right==NULL) {delete root; root=NULL;} else {Destory(root-> _left); Destory(root->_right);} }private: HuffmanTreeNode* _root;};

Using Huffman tree to generate Huffman code

Code Implementation

void _GenerateHuffmanCode(HuffmanTreeNode* root)//Create a Huffman code { if (root == NULL) {return;} _GenerateHuffmanCode(root->_left); _GenerateHuffmanCode(root->_right); if (root->_left == NULL && root->_right == NULL) {HuffmanTreeNode * cur = root; HuffmanTreeNode* parent = cur->_parent; string& code = _infos[cur->_weight._ch]._code; while (parent) {if (parent->_left == cur)//to Go left +0 {code += '0';} else if (parent->_right == cur)//Go right +1 {code += '1';} cur = parent; parent = cur->_parent ;} //The search code starts from the leaf node. reverse(code.begin(), code.end());}} //Recursively implement Huffman code void _GenerateHuffmanCode_R(HuffmanTreeNode* root,string code)//Create Huffman code {if(root= =NULL) return; _GenerateHuffmanCode_R(root->_left,code+'0'); _GenerateHuffmanCode_R(root->_right,code+'1'); if(root->_left==NULL&&root->_right==NULL) {_infos[ root->_weight._ch]._code=code;} }

Using Huffman encoding to compress the file to generate a compressed file **.compress file, and **.config configuration file for easy decoding

< /p>

bool Compress(const char* filename) {//1. Open File, count the number of occurrences of file characters Longtype Charcount = 0; assert(filename); FILE* fOut = fopen(filename, "rb"); Read the file in binary mode, where b is binary. "wb" means to write the file in binary mode assert(fOut); //The difference between opening in binary and text is: opening in text will convert 
 // to 
, which will not be like this in binary Conversion //char ch = fgetc(fOut); int ch = fgetc(fOut); while (ch != EOF) {_infos[(unsigned char)ch]._count++; ch = fgetc(fOut); Charcount++;} // 2. Generate the corresponding huffman code GenerateHuffmanCode(); //3. File compression string compressFile = filename; compressFile += ".compress"; FILE* fwCompress = fopen(compressFile.c_str(), "wb");// Binary write assert(fwCompress); fseek(fOut, 0, SEEK_SET); ch = fgetc(fOut); char inch = 0; int pos = 0; while (!feof(fOut)) {string& code = _infos[(unsigned char)ch]._code; for (size_t i = 0; i > 32, CountStr, 10); fputs(CountStr, fconfig); fputc('
', fconfig); _itoa(Charcount & 0xffffffff, CountStr, 10); //_itoa(Charcount & -1, CountStr, 10); fputs(CountStr, fconfig); fputc('
', fconfig); CharInfo invalid; for (int i = 0; i <256; i++) {if (_infos[i] != invalid) { /* fputc(_infos[i]._ch, fconfig); fputc(',', fconfig); fputc(_infos[i]._count + '0', fconfig); fputc('
', fconfig);* / infoStr=_infos[i]._ch; infoStr+=','; _itoa(_infos[i]._count, CountStr, 10); infoStr+=CountStr; infoStr+='
'; fputs(infoStr.c_str(),fconfig );}} fclose(fOut); fclose(fwCompress); fclose(fconfig); return true; }

Use the configuration file to get every character in the file Number of occurrences

string configfile = filename; configfile += ".config"; FILE* outConfig = fopen(configfile.c_str(), "rb"); assert(outConfig); char ch=0; /*char ch; */ Longtype Charcount = 0; string line = ReadLine(outConfig); Charcount = atoi(line.c_str()); Charcount <<= 32; line.clear(); line = ReadLine(outConfig); Charcount += atoi( line.c_str()); line.clear(); while (feof(outConfig)) //feof() encounters the end of the file, the function value is non-zero, otherwise it is 0. When the data is stored in binary form, the value of -1 may appear, //so the value of -1 (EOF) cannot be used as the eof() function to judge the end of the binary file at this time. {line = ReadLine(outConfig); if (!line.empty()) {ch = line[0]; _infos[(unsigned char)ch]._count += atoi(line.substr(2).c_str()) ; //_infos[(unsigned char)ch]._count += atoi(line.c_str()); line.clear();} else {line ='
';} }

< br>

Using the configuration file to build the Huffman tree again with a small heap

Using the Huffman tree established by the configuration file to decode and generate the decompressed file**.uncompress< /p>

HuffmanTree ht; CharInfo invalid(0); ht.CreatTree(_infos, 256, invalid);/ /Rebuild the tree HuffmanTreeNode* root = ht.GetRootNode(); string UnCompressFile = filename; UnCompressFile += ".uncompress"; FILE* fIn = fopen(UnCompressFile.c_str(), "wb"); string CompressFile = filename ; CompressFile += ".compress"; FILE* fOut = fopen(CompressFile.c_str(), "rb"); int pos = 8; HuffmanTreeNode* cur = root; ch=fgetc(fOut); while (( unsigned char)ch != EOF) //while(1) {--pos; if ((unsigned char)ch &(1 << pos)) { cur = cur->_right;} else {cur = cur->_left;} if (cur->_left == NULL && cur->_right == NULL) {fputc(cur->_weight._ch, fIn); cur = root; Charcount--;} if (pos == 0) {ch = fgetc(fOut); pos = 8;} if (Charcount==0) {break; }}

Uncompress

//Uncompress the file bool UnCompresss(const char* filename) {string configfile = filename; configfile += ".config"; FILE* outConfig = fopen(configfile.c_str(), "rb"); assert(outConfig); char ch=0; /*char ch;*/ Longtype Charcount = 0; string line = ReadLine(outConfig); Charcount = atoi(line.c_str()); Charcount <<= 32; line.clear(); line = ReadLine(outConfig); Charcount += atoi(line.c_str( )); line.clear(); while (feof(outConfig)) //feof() encounters the end of the file, the function value is non-zero, otherwise it is 0. When the data is stored in binary form, the value of -1 may appear, //so the value of -1 (EOF) cannot be used as the eof() function to judge the end of the binary file at this time. {line = ReadLine(outConfig); if (!line.empty()) {ch = line[0]; _infos[(unsigned char)ch]._count += atoi(line.substr(2).c_str()) ; //_infos[(unsigned char)ch]._count += atoi(line.c_str()); line.clear();} else {line ='
';}} HuffmanTree ht; CharInfo invalid (0); ht.CreatTree(_infos, 256, invalid);//Rebuild the tree HuffmanTreeNode* root = ht.GetRootNode(); string UnCompressFile = filename; UnCompressFile += ".uncompress"; FILE* fIn = fopen (UnCompressFile.c_str(), "wb"); string CompressFile = filename; CompressFile += ".compress"; FILE* fOut = fopen(CompressFile.c_str(), "rb"); int pos = 8; HuffmanTreeNode* cur = root; ch=fgetc(fOut); while ((unsigned char)ch != EOF) //while(1) {--pos; if ((unsigned char)ch &(1 << pos)) {cur = cur->_right;} else {cur = cur->_left;} if (cur->_left == NULL && cur->_right == NULL) {fputc(cur->_weight._ch, fIn); cur = root; Charcount--;} if (p os == 0) {ch = fgetc(fOut); pos = 8;} if (Charcount==0) {break;}} fclose(fIn); fclose(fOut); fclose(outConfig); return true; }< /pre> 
 
 
 
 文件压缩代码FileCompress.h
 
 #define _CRT_SECURE_NO_WARNINGS 1#pragma once#include"HuffmanTree.h" #include#include #include #include using namespace std;typedef long long Longtype;//In order to expand its scope, the range that the int type can handle can no longer be satisfied, so the Long Long type is defined to represent struct CharInfo{ unsigned char _ch;//Here must be unsigned, otherwise it will cause Truncated, so adjusted from -128~127 to 0~255. Longtype _count; string _code; CharInfo(int count = 0) :_ch(0), _count(count) ,_code("") {} CharInfo operator+(CharInfo& file )//Overload + {CharInfo tmp; tmp._count = _count + file._count; return tmp;} bool operator <(CharInfo& file) const//Overload< {return _count class FileCompress{public: FileCompress() {for (int i = 0; i <256; ++i)//initialization {_infos[i] ._ch = i;}} bool Compress(const char* filename) {//1. Open the file and count the number of occurrences of file characters Longtype Charcount = 0; assert(filename); FILE* fOut = fopen(filename, "rb" );//Before using "r", there was a problem //"rb" reads the file in binary mode, where b is binary. "wb" means to write the file in binary mode assert(fOut); //The difference between opening in binary and text is: opening in text will convert 
 // to 
, which will not be like this in binary Conversion //char ch = fgetc(fOut); int ch = fgetc(fOut); while (ch != EOF) {_infos[(unsigned char)ch]._count++; ch = fgetc(fOut); Charcount++;} // 2. Generate the corresponding huffman code GenerateHuffmanCode(); //3. File compression string compressFile = filename; compressFile += ".compress"; FILE* fwCompress = fopen(compressFile.c_str(), "wb");// Binary write assert(fwCompress); fseek(fOut, 0, SEEK_SET); ch = fgetc(fOut); char inch = 0; int pos = 0; while (!feof(fOut)) {string& code = _infos[(unsigned char)ch]._code; for (size_t i = 0; i > 32, CountStr, 10); fputs(CountStr, fconfig); fputc('
', fconfig); _itoa(Charcount & 0xffffffff, CountStr, 10); //_itoa(Charcount & -1, CountStr, 10); fputs(CountStr, fconfig); fputc('
', fconfig); CharInfo invalid; for (int i = 0; i <256; i++) {if (_infos[i] != invalid) { /* fputc(_infos[i]._ch, fconfig); fputc(',', fconfig); fputc(_infos[i]._count + '0', fconfig); fputc('
', fconfig);* / infoStr=_infos[i]._ch; infoStr+=','; _itoa(_infos[i]._count, CountStr, 10); infoStr+=CountStr; infoStr+='
'; fputs(infoStr.c_str(),fconfig );}} fclose(fOut); fclose(fwCompress); fclose(fconfig); return true;} //Uncompress the file bool UnCompresss(const char* filename) {string configfile = filename; configfile += ".config" ; FILE* outConfig = fopen(configfile.c_st r(), "rb"); assert(outConfig); char ch=0; /*char ch;*/ Longtype Charcount = 0; string line = ReadLine(outConfig); Charcount = atoi(line.c_str()); Charcount <<= 32; line.clear(); line = ReadLine(outConfig); Charcount += atoi(line.c_str()); line.clear(); while (feof(outConfig)) //feof() To the end of the file, the function value is non-zero, otherwise it is 0. When the data is stored in binary form, the value of -1 may appear, //so the value of -1 (EOF) cannot be used as the eof() function to judge the end of the binary file at this time. {line = ReadLine(outConfig); if (!line.empty()) {ch = line[0]; _infos[(unsigned char)ch]._count += atoi(line.substr(2).c_str()) ; //_infos[(unsigned char)ch]._count += atoi(line.c_str()); line.clear();} else {line ='
';}} HuffmanTree ht; CharInfo invalid (0); ht.CreatTree(_infos, 256, invalid);//Rebuild the tree HuffmanTreeNode* root = ht.GetRootNode(); string UnCompressFile = filename; UnCompressFile += ".uncompress"; FILE* fIn = fopen (UnCompressFile.c_str(), "wb"); string CompressFile = filename; CompressFile += ".compress"; FILE* fOut = fopen(CompressFile.c_str(), "rb"); int pos = 8; HuffmanTreeNode* cur = root; ch=fgetc(fOut); while ((unsigned char)ch != EOF) //while(1) {--pos; if ((unsigned char)ch &(1 << pos)) {cur = cur->_right;} else {cur = cur->_left;} if (cur->_left == NULL && cur->_right == NULL) {fputc(cur->_weight._ch, fIn); cur = root; Charcount--;} if (p os == 0) {ch = fgetc(fOut); pos = 8;} if (Charcount==0) {break;}} fclose(fIn); fclose(fOut); fclose(outConfig); return true; }protected : string ReadLine(FILE* fOut) {assert(fOut); char ch = fgetc(fOut); if (feof(fOut)) {return 0;} string line; while (ch !='
') {line + = ch; ch = fgetc(fOut); if (feof(fOut)) break;} return line;} void GenerateHuffmanCode() {HuffmanTree hft; CharInfo invalid; hft.CreatTree(_infos, 256, invalid); _GenerateHuffmanCode (hft.GetRootNode()); }protected: void _GenerateHuffmanCode(HuffmanTreeNode* root)//Create Huffman Code {if (root == NULL) {return;} _GenerateHuffmanCode(root->_left); _GenerateHuffmanCode( root->_right); if (root->_left == NULL && root->_right == NULL) {HuffmanTreeNode* cur = root; HuffmanTreeNode* parent = cur->_parent; string& code = _infos [cur->_weight._ch]._code; while (parent) {if (parent->_left == cur)//Go left +0 { code += '0';} else if (parent->_right == cur)//Go right +1 {code += '1';} cur = parent; parent = cur->_parent;} //Find The coding starts from the leaf node. reverse(code.begin(), code.end());}} //Recursively implement Huffman code void _GenerateHuffmanCode_R(HuffmanTreeNode* root,string code)//Create Huffman code {if(root= =NULL) return; _GenerateHuffmanCode_R(root->_left,code+'0'); _GenerateHuffmanCode_R(root->_right,code+'1'); if(root->_left==NULL&&root->_right==NULL) {_infos[ root->_weight._ch]._code=code;} }private: CharInfo _infos[256];};void TestFileCompress(){ cout<<"One compression"< fc; cout << " Compressing the Input.txt file...." << endl; cout << "Compression time: "; int begin1 = GetTickCount();//Record start time fc.Compress("Input.txt");// int end1 = GetTickCount();// Record end time cout << end1-begin1 << endl << endl;//Compression time cout << "The Input.txt file is being unzipped..." << endl;; cout < <"Decompression time: "; int begin2 = GetTickCount(); fc.UnCompresss("Input.txt"); int end2 = GetTickCount(); cout << end2-begin2 << endl << endl;//Decompression time FileCompress  fc1; cout << "Input.BIG file is compressing..." << endl; cout << "When compressing: "; int begin3 = GetTickCount(); fc1.Compress("Input.BIG") ;// int end3 = G etTickCount();// cout << end3-begin3 << endl << endl; cout << "Input.BIG file is being decompressed..." << endl; cout << "When decompressing: "; int begin4 = GetTickCount(); fc1.UnCompresss("Input.BIG"); int end4 = GetTickCount(); cout << (end4-begin4 )<< endl; FileCompress fc2; cout << "Kangxi dictionary.txt file compression In..." << endl; cout << "Compression time: "; int begin5 = GetTickCount();//Record start time fc2.Compress("Kangxi dictionary.txt");// int end5 = GetTickCount( );// Record end time cout << end5-begin5 << endl << endl;//Compression time cout << "Kangxi dictionary.txt file is being unzipped..." << endl;; cout << "Unzipped When used: "; int begin6 = GetTickCount(); fc2.UnCompresss("Kangxi dictionary.txt"); int end6 = GetTickCount(); cout << end6-begin6 << endl << endl; // Decompression time}void TestFileCompressAgain ()//Secondary compression{ cout<<"secondary compression"< fc; cout << "Input.txt.compress is compressing..." << endl; cout << "Compression time:"; int begin1 = GetTickCount();//Record start time fc.Compress("Input.txt.compress");// int end1 = GetTickCount();// Record end time cout << end1- begin1 << endl << endl;//Compression time cout << "The Input.txt.compress file is being decompressed..." << endl;; cout << "When decompressing: "; int begin2 = GetTickCount(); fc.UnCompresss("Input.txt.compress"); int end2 = GetTickCount(); cout << end2-begin2 << endl << endl;//When decompressing FileCompress fc1; cout << "Input.BIG.compress is compressing..." << endl; cout << "When compressing: "; int begin3 = GetTickCount();/ /Record start time fc1.Compress("Input.BIG.compress");// int end3 = GetTickCount();// Record end time cout << end3-begin3 << endl << endl;//Compression time cout < <"The Input.BIG.compress file is being decompressed..." << endl;; cout << "When decompressing: "; int begin4 = GetTickCount(); fc1.UnCompresss("Input.BIG.compress"); int end4 = GetTickCount(); cout << end4-begin4 << endl << endl;//When decompressing }void TestFileCompressThree(){ cout<<"Three compressions"< fc; cout << " Input.BIG.compress.compress file is compressing..." << endl; cout << "Compression time: "; int begin5 = GetTickCount();//Record start time fc.Compress("Input.BIG.compress .compress");// int end5 = GetTickCount();// Record end time cout << end5-begin5 << endl << endl;//Compression time cout << "Input.BIG.compress.compress file is being unzipped ...." << endl;; cout << "When decompressing: "; int begin6 = GetTickCount(); fc.UnCompresss("Input.BIG.compress.compress"); int end6 = GetTickCount(); cout << end6-begin6 << endl << endl ;//When decompressing}void TestFileCompressFour(){ cout<<"Four times compression"< fc; cout << "Input.BIG.compress.compress.compress file is being compressed...." << endl; cout << "Compression time: "; int begin5 = GetTickCount();//Record start time fc.Compress("Input.BIG.compress.compress.compress");// int end5 = GetTickCount() ;// Record end time cout << end5-begin5 << endl << endl;//Compression time cout << "Input.BIG.compress.compress.compress file is being decompressed..." << endl;; cout << "When decompressing: "; int begin6 = GetTickCount(); fc.UnCompresss("Input.BIG.compress.compress.compress"); int end6 = GetTickCount(); cout << end6-begin6 << endl << endl;//When decompressing}void TestFileCompressFive(){ cout<<"Five Compression"< fc; cout << "Input.BIG.compress.compress.compress.compress File is being compressed... .." << endl; cout << "Compression time: "; int begin5 = GetTickCount();//Record start time fc.Compress("Input.BIG.compress.compress.compress. compress");// int end5 = GetTickCount();// Record end time cout << end5-begin5 << endl << endl;//Compression time cout << "Input.BIG.compress.compress.compress.compress The file is being unzipped..." << endl;; cout << "When unzipping: "; int begin6 = GetTickCount(); fc.UnCompresss("Input.BIG.compress.compress.compress.compress"); int end6 = GetTickCount(); cout << end6-begin6 << endl << endl;//When decompressing}void TestFileCompressPhoto(){ cout<<"Picture Compression"< fc; cout << "166 Compressing .jpg file...." << endl; cout << "When compressing: "; int begin5 = GetTickCount();//Recording start time fc.Compress("166.jpg");// int end5 = GetTickCount();// Record end time cout << end5-begin5 << endl << endl;//Compression time cout << "The 166.jpg file is being unzipped..." << endl;; cout << "Decompression time:"; int begin6 = GetTickCount(); fc.UnCompresss("166.jpg"); int end6 = GetTickCount(); cout << end6-begin6 << endl << endl;//Decompression time }void TestFileCompressVadio(){ cout<<"Video Compression"< fc; cout << "Busan line_hd.mp4 file is being compressed..." << endl; cout << "Compression time: "; int begin5 = GetTickCount();//Record start time fc.Compr ess("Busan line_hd.mp4");// int end5 = GetTickCount();// Record end time cout << end5-begin5 << endl << endl;//Compression time cout << "Busan line_ The hd.mp4 file is being decompressed..." << endl;; cout << "When decompressing: "; int begin6 = GetTickCount(); fc.UnCompresss("Busan line_hd.mp4"); int end6 = GetTickCount (); cout << end6-begin6 << endl << endl;//decompression time}
main function

#include "FileCompress.h"using namespace std;int main(){ TestFileCompress(); TestFileCompressAgain(); TestFileCompressThree(); TestFileCompressFour(); TestFileCompressFive(); TestFileCompressPhoto(); TestFileCompressVadio(); system("pause"); return 0;}
Analysis of compression results
1. Compress common files multiple times
 Time analysis

 
 
Correctness analysis 
The source file and the decompressed result are exactly the same
 
Size analysis
The source file and the decompressed result are exactly the same
 < /p> 
 
Because the file is too small, the compression size has not changed in the next few times.
 
2. Decompress image files and video files
 p> 
Time analysis
The video file size is 238354ms.
 
Analysis of image correctness
Source file and unzipped The result is exactly the same 
 
Analysis size
The source file and the decompressed result are exactly the same
 

< p>

Leave a Comment Cancel reply