#1
Tool to scan files for common byte sequences
I am looking for a tool that loads a set of files and will find common byte sequences between them. Does such a tool exist?
For example, if each file contains the sequence 0x01 0x02 0x03 0x04 0x05, then the tool will find this common string and print it. |
|
#2
Quote:
http://gnuwin32.sourceforge.net/packages/gsar.htm
https://wingrep.codeplex.com/
https://www.fileseek.ca/
Or you can just use Notepad++ and use the "Find in Files" menu option.
Last edited by Stingered; 01-22-2018 at 02:31. Reason: link change
|
#3
Those tools search for a given string.
I've pasted a screenshot of the prototype here: https://i.imgur.com/8IxxjE6.png. It shows that the string 0x00 0x04 0x00 0xE8 0x02 0x00 is common to 8 files out of the sample set. And here it is, viewed in a hex editor: https://i.imgur.com/I06WEu7.png.
|
#4
Quote:
Would something like this work? Code:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

#define BUF_SIZE 65536

// convert one hex digit to its value, or return -1 if it is not a hex digit
static int getnibble(char c)
{
    if (!isxdigit((unsigned char)c))
        return -1;
    c = (char)toupper((unsigned char)c);
    return (c > '9' ? c - 'A' + 10 : c - '0');
}

int main(int argc, char** argv)
{
    if (argc != 3)
    {
        printf(
            "Usage:\n"
            "%s <filename> <hex>\n",
            argv[0]
        );
        return 1;
    }
    char* filename = argv[1];
    char* hexchars = argv[2];
    size_t len = strlen(hexchars);
    if (len == 0 || len % 2)
    {
        printf("Error: need a non-empty, even number of hex chars\n");
        return 1;
    }
    len /= 2; // len = number of bytes in pattern
    if (len > BUF_SIZE / 2)
    {
        printf("Error: pattern too long\n");
        return 1;
    }
    // parse hexchars to real bytes
    unsigned char* pattern = (unsigned char*)malloc(len);
    if (!pattern)
        return 1;
    unsigned char* p = pattern;
    while (*hexchars)
    {
        int h = getnibble(*hexchars++);
        int l = getnibble(*hexchars++);
        if (h < 0 || l < 0)
        {
            printf("Error: invalid hex\n");
            free(pattern);
            return 1;
        }
        *p++ = (unsigned char)((h << 4) + l);
    }
    // Open the file
    FILE* f = fopen(filename, "rb");
    if (!f)
    {
        printf("Error: cannot open %s\n", filename);
        free(pattern);
        return 1;
    }
    unsigned char* buf = (unsigned char*)malloc(BUF_SIZE);
    if (!buf)
    {
        fclose(f);
        free(pattern);
        return 1;
    }
    // we carry the last len - 1 bytes of each block over to the next one to
    // avoid missing the needle when it's halfway across a boundary
    size_t keep = len - 1;
    size_t have = 0;               // valid bytes currently in buf
    unsigned long long offset = 0; // file offset of buf[0]
    size_t amtread;
    unsigned char* hit;            // search result
    while ((amtread = fread(buf + have, 1, BUF_SIZE - have, f)) != 0)
    {
        have += amtread;
        // search for the start byte, then compare the whole pattern
        size_t pos = 0; // how many bytes we've already searched in this block
        while (pos + len <= have &&
               (hit = (unsigned char*)memchr(buf + pos, *pattern, have - len + 1 - pos)) != NULL)
        {
            if (memcmp(hit, pattern, len) == 0)
            {
                printf("Found at %llx\n", offset + (unsigned long long)(hit - buf));
            }
            pos = (size_t)(hit - buf) + 1;
        }
        // copy the tail of the buffer over the head
        if (have > keep)
        {
            memmove(buf, buf + have - keep, keep);
            offset += have - keep;
            have = keep;
        }
    }
    free(buf);
    fclose(f);
    free(pattern);
    return 0;
}
|
|
#5
Quote:
|
|
#6
Last ditch effort:
http://www.vxsearch.com/search_files_by_binary_patterns.html Windows app, 30-day trial download: http://www.vxsearch.com/downloads.html |
|
#7
Yes, I made a prototype. But it turns out such a tool is of no use to anyone, so I will not continue to develop it.
I think the idea is quite simple, but it seems not many people understood it. Binary (or text) difference tools that compare a pair of files are not the same thing at all. The prototype I created can compare any number of files and find a string that is present in all of them, up to some desired length.
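For anyone who wants to experiment with the idea, here is a minimal brute-force sketch in C. It is not the prototype itself (its source was not posted), and the names `contains` and `longest_common` are made up for illustration. It assumes the files are already loaded into memory, and it looks for the longest run of bytes, capped at `max_len`, that occurs in every buffer:

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if needle occurs somewhere in hay, else 0. */
static int contains(const unsigned char *hay, size_t hlen,
                    const unsigned char *needle, size_t nlen)
{
    if (nlen == 0)
        return 1;
    while (hlen >= nlen) {
        /* jump to the next candidate start byte */
        const unsigned char *hit = memchr(hay, needle[0], hlen - nlen + 1);
        if (!hit)
            return 0;
        if (memcmp(hit, needle, nlen) == 0)
            return 1;
        hlen -= (size_t)(hit - hay) + 1;
        hay = hit + 1;
    }
    return 0;
}

/* Longest run of bytes (at most max_len) that is present in every buffer.
   Returns its length and stores its start offset within bufs[0] in *pos. */
size_t longest_common(const unsigned char **bufs, const size_t *lens,
                      size_t count, size_t max_len, size_t *pos)
{
    size_t best = 0;
    *pos = 0;
    if (count == 0)
        return 0;
    for (size_t i = 0; i < lens[0]; i++) {
        /* only lengths that would beat the current best are worth testing */
        for (size_t len = best + 1; len <= max_len && i + len <= lens[0]; len++) {
            size_t f;
            for (f = 1; f < count; f++)
                if (!contains(bufs[f], lens[f], bufs[0] + i, len))
                    break;
            if (f < count)
                break; /* a longer run starting at i cannot match either */
            best = len;
            *pos = i;
        }
    }
    return best;
}
```

With three buffers that each contain 0x01 0x02 0x03 0x04 0x05 somewhere, `longest_common(bufs, lens, 3, 16, &pos)` reports that run. This is deliberately naive: every candidate is re-searched in every file, so it is only practical for small sample sets.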
| The Following User Says Thank You to dila For This Useful Post: | ||
Stingered (01-22-2018) | ||
|
#8
The problem with this task (i.e. the common substrings problem) is its high complexity: it takes a lot of difficult heuristic tricks to get it below O(n^2), otherwise it is too slow or uses too much memory. I have not seen any tools that do this. It would work with pictures, video, or audio as well, to find matching image sections, video subclips, etc., so it really would be quite useful. I am quite certain we are talking about an NP-hard problem; please see:
Quote:
And there are proofs I believe that shortest common substring is NP-hard. See for example Quote:
|
|
#9
Quote:
|
|
#10
Quote:
|
|
#11
Grep just looks for regexes, so its complexity is that of regex pattern matching. What you are asking about is a very general and arbitrary common substring problem. The two are really not the same issue at all.
This would be very useful, but it has a really problematic size vs. speed tradeoff and would need some kind of limiting parameters, like the ones you are getting at. The NP-hard issue can be sidestepped through heuristics and a domain-specific approach. Nonetheless, I doubt you will find such a tool for the general case.
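To make the "limiting parameters" point concrete, here is a rough C sketch under two assumed limits: a fixed window length `k` and a minimum number of files `min_files`. The names `count_shared_windows`, `cmp_window`, and `g_k` are hypothetical, and the files are assumed to be in memory already. It sorts the k-byte windows of the first file, then looks up every window of the other files by binary search:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Window length seen by the comparator; qsort passes no context argument. */
static size_t g_k;

/* Compare two k-byte windows; both arguments point at a window pointer. */
static int cmp_window(const void *a, const void *b)
{
    const unsigned char *const *pa = a;
    const unsigned char *const *pb = b;
    return memcmp(*pa, *pb, g_k);
}

/* Count the distinct k-byte windows of files[0] that occur in at least
   min_files of the buffers (files[0] included), printing each one. */
size_t count_shared_windows(const unsigned char **files, const size_t *lens,
                            size_t nfiles, size_t k, size_t min_files)
{
    if (nfiles == 0 || k == 0 || lens[0] < k)
        return 0;

    size_t n = lens[0] - k + 1;
    const unsigned char **win = malloc(n * sizeof *win);
    size_t *count = malloc(n * sizeof *count);
    size_t *last = malloc(n * sizeof *last);
    if (!win || !count || !last) {
        free(win); free(count); free(last);
        return 0;
    }

    /* collect, sort and de-duplicate the windows of the first file */
    for (size_t i = 0; i < n; i++)
        win[i] = files[0] + i;
    g_k = k;
    qsort(win, n, sizeof *win, cmp_window);
    size_t m = 1;
    for (size_t i = 1; i < n; i++)
        if (memcmp(win[i], win[m - 1], k) != 0)
            win[m++] = win[i];
    for (size_t i = 0; i < m; i++) {
        count[i] = 1; /* present in files[0] */
        last[i] = 0;  /* index of the last file that was counted */
    }

    /* look up every window of every other file in the sorted table */
    for (size_t f = 1; f < nfiles; f++) {
        for (size_t i = 0; i + k <= lens[f]; i++) {
            const unsigned char *key = files[f] + i;
            const unsigned char **hit =
                bsearch(&key, win, m, sizeof *win, cmp_window);
            if (hit && last[hit - win] != f) {
                last[hit - win] = f; /* count each file at most once */
                count[hit - win]++;
            }
        }
    }

    size_t shared = 0;
    for (size_t i = 0; i < m; i++) {
        if (count[i] < min_files)
            continue;
        shared++;
        for (size_t j = 0; j < k; j++)
            printf("%02X ", win[i][j]);
        printf("(in %zu files)\n", count[i]);
    }
    free(win); free(count); free(last);
    return shared;
}
```

Fixing `k` keeps the work at roughly one sort of the first file plus one binary search per byte of the others. The price is that it only answers "which sequences of exactly this length are shared", not the general question.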
| The Following User Says Thank You to chants For This Useful Post: | ||
Stingered (01-23-2018) | ||
|
#12
Quote:
I believe I read that the metasploit framework included some heuristics for this kind of search, but I could find no specific tool. I too agree that a tool like this would be very useful. |
| The Following User Says Thank You to Stingered For This Useful Post: | ||
dila (01-23-2018) | ||
|
#13
Quote:
|
|
#14
Tool to scan files for common byte sequences
Archivers solve a similar problem.
What if you tried lifting the algorithm from some open-source archiver?
| The Following User Says Thank You to dosprog For This Useful Post: | ||
dila (02-16-2018) | ||
|
#15
Yes, I figured the problem was similar to building a dictionary of common sequences, which you'd then substitute with shorter codes corresponding to the dictionary entries.
As we discussed, it doesn't sound like a perfect solution is possible, but some heuristics would work. You mentioned compression, which would do exactly this kind of operation - pick your favourite algorithm. (I won't make the code available for my tool, since it was a rushed prototype and I don't think there is any chance of anyone getting all the necessary libs to compile it.)