Tool to scan files for common byte sequences
I am looking for a tool that loads a set of files and will find common byte sequences between them. Does such a tool exist?
For example, if each file contains the sequence 0x01 0x02 0x03 0x04 0x05, then the tool will find this common string and print it. |
Quote:
http://gnuwin32.sourceforge.net/packages/gsar.htm https://wingrep.codeplex.com/ https://www.fileseek.ca/ Or you can just use Notepad++ and use the "Find in Files" menu option. |
Those are for searching for a given string.
Here is a screenshot of the prototype: https://i.imgur.com/8IxxjE6.png. It shows that the string 0x00 0x04 0x00 0xE8 0x02 0x00 is common to 8 files out of the sample set. And here it is, viewed in a hex editor: https://i.imgur.com/I06WEu7.png. |
Quote:
Would something like this work? Code:
#include <stdio.h> |
Last ditch effort:
http://www.vxsearch.com/search_files_by_binary_patterns.html Windows app, 30-day trial download: http://www.vxsearch.com/downloads.html |
Yes, I made a prototype. But it turns out such a tool is of no use to anyone, so I will not continue to develop it.
I think the idea is quite simple, but it seems not many people understood it. Binary (or text) difference tools that compare a pair of files are not really the same thing at all. The prototype I created can compare any number of files and find a string that is present in all of them, up to some desired length. |
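To make the idea concrete, here is a minimal sketch in Python of one way to do it. This is not the prototype's code, which was never posted. It assumes the files fit in memory and that a fixed sequence length `n` is chosen up front. The names `ngrams` and `common_ngrams` are made up for illustration.

```python
from typing import Iterable, Optional, Set


def ngrams(data: bytes, n: int) -> Set[bytes]:
    """Return the set of all n-byte windows found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def common_ngrams(blobs: Iterable[bytes], n: int) -> Set[bytes]:
    """Return the n-byte sequences that occur in every blob."""
    common: Optional[Set[bytes]] = None
    for blob in blobs:
        grams = ngrams(blob, n)
        # Keep only the sequences that have appeared in every blob so far.
        common = grams if common is None else common & grams
        if not common:
            break  # nothing can survive further intersections
    return common or set()


# Stand-ins for file contents; real use would read each file in binary mode.
samples = [
    b"\xaa\x01\x02\x03\x04\x05\xbb",
    b"\x01\x02\x03\x04\x05\xcc\xdd",
    b"\xee\xff\x01\x02\x03\x04\x05",
]
print(common_ngrams(samples, 5))  # {b'\x01\x02\x03\x04\x05'}
```

Memory use grows with the number of distinct windows per file, so for large files this would need hashing or sampling rather than storing every slice.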
The problem with this task (the common substrings problem) is its high complexity: it takes a lot of difficult heuristic tricks to get it below O(n^2), otherwise it is too slow or uses too much memory. I have not seen any tools that do this. It would work with pictures, video or audio as well, to find matching image sections, video subclips, etc., so it would be quite useful. I am fairly sure that the general form of this is NP-hard; please see:
Quote:
And I believe there are proofs that the shortest common superstring problem is NP-hard. See for example Quote:
|
Grep just looks for regexes, so its complexity is that of regex pattern matching. You, on the other hand, are asking about a very general and arbitrary common substring problem. They are not the same issue at all.
This would be very useful, but it has a really problematic size vs. speed tradeoff and would need some kind of limiting parameters, like the ones you are getting at. The NP-hard issue can be sidestepped through heuristics and a domain-specific approach. Nonetheless, I doubt you will find such a tool for general cases. |
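As a rough illustration of what such limiting parameters might look like, the Python sketch below fixes the sequence length `n` and reports every sequence found in at least `min_files` of the inputs. Both parameters and the function name `shared_sequences` are assumptions made for this example, not taken from any existing tool.

```python
from collections import Counter
from typing import Dict, List


def shared_sequences(blobs: List[bytes], n: int, min_files: int) -> Dict[bytes, int]:
    """Map each n-byte sequence to the number of blobs containing it,
    keeping only sequences present in at least min_files blobs."""
    counts: Counter = Counter()
    for blob in blobs:
        # Use a set so each file votes at most once per sequence.
        counts.update({blob[i:i + n] for i in range(len(blob) - n + 1)})
    return {seq: c for seq, c in counts.items() if c >= min_files}


# Stand-ins for file contents; real use would read each file in binary mode.
blobs = [
    b"\x00\x04\x00\xe8\x02\x00\x11",
    b"\x22\x00\x04\x00\xe8\x02\x00",
    b"\x33\x44\x55\x66\x77\x88\x99",
]
hits = shared_sequences(blobs, 6, min_files=2)
for seq, count in hits.items():
    print(seq.hex(" "), "found in", count, "files")
```

Bounding `n` keeps the work proportional to the total input size times `n`, at the cost of missing matches shorter than `n` and reporting longer matches as several overlapping windows.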
Quote:
I believe I read that the Metasploit framework included some heuristics for this kind of search, but I could find no specific tool. I agree that a tool like this would be very useful. |
Archivers solve a similar problem.
What if you tried to lift the algorithm from some open-source archiver? |
Yes, I figured the problem was similar to building a dictionary of common sequences, which you'd then substitute with shorter codes corresponding to the dictionary entries.
As we discussed, it doesn't sound like a perfect solution is possible, but some heuristics would work. You mentioned compression, which does exactly this kind of operation, so pick your favourite algorithm. (I won't make the code for my tool available, since it was a rushed prototype and I don't think there is any chance of anyone getting all the necessary libs to compile it.) |