#1
[Solution] How to check whether data is compressed or ciphered
Some time ago, I asked myself whether I could distinguish compressed data from enciphered data. This can be useful in data and file analysis and in some other cases. Maybe you will find this information useful too.

First, how can we theoretically tell whether data is packed or crypted? Strictly speaking, packed data blocks will not use all possible code combinations (otherwise the data could not be unpacked), but the best compressors leave unused less than 0.01% of all possible code combinations. So only statistics over big amounts of data will help us.

Example:
compressed data entropy for 8-bit elements: 0.999833729
ciphered data entropy for 8-bit elements: 0.999999867

As you can see, the difference in entropy is almost zero, so entropy cannot be the right criterion to distinguish blocks of data. For some blocks, the entropy of compressed data will be almost the same as that of ciphered data.

The right approach is to calculate the chi-squared statistic for the block of data. Compare, for the same blocks of data:
compressed data chi-squared: 0.001830034
ciphered data chi-squared: 0.000001432

Yes, here the values differ by a factor of roughly 1300! But why? Because data ciphered with a good cipher will contain all possible code combinations, and compressed data will not. Chi-squared reveals these unused codes, and that makes such a difference.

Ok, how do we calculate 8-bit entropy and chi-squared? Imagine that the elements array holds the count of every byte value in the data block (for 8-bit entropy), i.e. the 0-th element holds the number of 0x00 bytes in the block, the 1-st the number of 0x01 bytes, etc., and that quantity is the total number of elements in the block. Here is pseudocode for calculating the entropy value:
Code:
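/* Assumes elements[] holds the count of each value and quantity is the total number of elements. */
long double GetEntropy(unsigned int bits)
{
    unsigned long i;
    long double result = 0.0L, temp;

    for (i = 0; i < (1UL << bits); i++)
    {
        if (elements[i] == 0)
            continue;                                  /* unused codes contribute nothing */
        temp = (long double)elements[i] / quantity;    /* relative frequency of value i */
        result += temp * log2l(temp);                  /* log2l comes from <math.h> */
    }
    /* Divide by the element width so the result lies between 0 and 1. */
    return -result / (long double)bits;
}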
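For reference, here is what GetEntropy computes, written as a formula. The symbols are my own shorthand and not from any particular textbook: c_i stands for elements[i], N for quantity, and b for bits.

```latex
H = -\frac{1}{b} \sum_{\substack{i = 0 \\ c_i > 0}}^{2^{b} - 1} \frac{c_i}{N} \log_2 \frac{c_i}{N}
```

With this normalization, H is 1 when all 2^b values occur equally often and 0 when the block consists of a single repeated value.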
And here is pseudocode for calculating the chi-squared value:
Code:
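/* Uses the same elements[] and quantity as GetEntropy. */
long double GetChiSquared(unsigned int bits)
{
    unsigned long i;
    long double result = 0.0L, expected, diff;

    /* Expected count of each value if all values were equally likely. */
    expected = (long double)quantity / (long double)(1UL << bits);
    for (i = 0; i < (1UL << bits); i++)
    {
        diff = (long double)elements[i] - expected;
        result += diff * diff / expected;
    }
    /* Return the chi-squared sum divided by the number of elements. */
    return result / quantity;
}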
Happy coding
#2
Here is another article about entropy. This guy's approach is different:
http://gynvael.coldwind.pl/?id=162
Perhaps you guys could collaborate to improve his tool so that it can tell compressed data from encrypted data.