
The use of genetic programming for adaptive text compression

Published Online: pp 88-108

This paper applies a modified genetic programming (GP) approach to the data compression problem. The typical GP algorithm, in which a candidate solution is expressed as a tree rather than a bit string, fails on this problem because subtree exchange operations do not guarantee a one-to-one correspondence between a symbol and its codeword. The problem requires generating one, and only one, codeword for each symbol of the underlying text. The proposed scheme introduces three new operators: insertion, two-level mutation and modified crossover. The resulting modified version of GP is applied to different texts to validate the approach. The developed algorithm can provide optimum codes, since its final solution reaches the Huffman tree. Moreover, it uses GP not only to achieve an optimum compression ratio but also to provide an adaptive compression implementation: the codebook is selected according to the nature of the input text. The proposed compression scheme is written in C++ and has been run on different text types under various operational conditions, and its performance has been measured and evaluated.
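
As a point of reference for the optimality claim, the C++ sketch below builds a classical Huffman codebook for a text and computes the encoded length in bits, which is the kind of quantity a GP fitness function for this problem might minimise. It is not the authors' GP implementation and does not show their insertion, two-level mutation or modified crossover operators. The names `Codebook`, `buildHuffmanCodebook`, `encodedBits` and `isPrefixFree` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: exactly one codeword per symbol (map keys are unique).
using Codebook = std::map<char, std::string>;

struct Node {
    std::size_t weight;  // total frequency of the symbols below this node
    int left;            // index of the left child, or -1 for a leaf
    int right;           // index of the right child, or -1 for a leaf
    char symbol;         // meaningful only for leaves
};

// Classical Huffman construction: repeatedly merge the two lightest nodes.
Codebook buildHuffmanCodebook(const std::string& text) {
    std::map<char, std::size_t> freq;
    for (char c : text) ++freq[c];

    std::vector<Node> nodes;
    using Entry = std::pair<std::size_t, int>;  // (weight, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : freq) {
        nodes.push_back(Node{kv.second, -1, -1, kv.first});
        heap.emplace(kv.second, static_cast<int>(nodes.size()) - 1);
    }

    Codebook book;
    if (nodes.empty()) return book;
    if (nodes.size() == 1) {  // a single distinct symbol still needs one bit
        book[nodes[0].symbol] = "0";
        return book;
    }

    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();
        Entry b = heap.top(); heap.pop();
        nodes.push_back(Node{a.first + b.first, a.second, b.second, '\0'});
        heap.emplace(a.first + b.first, static_cast<int>(nodes.size()) - 1);
    }
    assert(heap.size() == 1);

    // Walk the tree from the root, appending '0' for left and '1' for right.
    std::vector<std::pair<int, std::string>> stack;
    stack.emplace_back(heap.top().second, std::string());
    while (!stack.empty()) {
        std::pair<int, std::string> item = stack.back();
        stack.pop_back();
        const Node n = nodes[item.first];
        if (n.left < 0) {
            book[n.symbol] = item.second;
            continue;
        }
        stack.emplace_back(n.left, item.second + "0");
        stack.emplace_back(n.right, item.second + "1");
    }
    return book;
}

// Encoded size in bits; throws std::out_of_range if a symbol has no codeword.
std::size_t encodedBits(const std::string& text, const Codebook& book) {
    std::size_t bits = 0;
    for (char c : text) bits += book.at(c).size();
    return bits;
}

// True if no codeword is a prefix of (or equal to) another symbol's codeword.
bool isPrefixFree(const Codebook& book) {
    for (const auto& a : book)
        for (const auto& b : book)
            if (a.first != b.first &&
                b.second.compare(0, a.second.size(), a.second) == 0)
                return false;
    return true;
}
```

A candidate codebook produced by any search procedure could be scored with `encodedBits` and checked with `isPrefixFree`; the value returned for the codebook from `buildHuffmanCodebook` is the minimum achievable by a symbol-by-symbol prefix code for that text.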


Keywords: Huffman code, genetic programming, adaptive text compression, data compression, lossless compression, alphabet, Arabic language