Tool to scan files for common byte sequences
I am looking for a tool that loads a set of files and will find common byte sequences between them. Does such a tool exist?
For example, if each file contains the sequence 0x01 0x02 0x03 0x04 0x05, then the tool will find this common string and print it. |
Quote:
http://gnuwin32.sourceforge.net/packages/gsar.htm https://wingrep.codeplex.com/ https://www.fileseek.ca/ Or you can just use Notepad++ and use the "Find in Files" menu option. |
Those are all for searching for a given string, one you already know in advance.
I paste a screenshot of the prototype here: https://i.imgur.com/8IxxjE6.png. It shows that the string 0x00 0x04 0x00 0xE8 0x02 0x00 is common to 8 files out of the sample set. And here it is, viewed in a hex editor: https://i.imgur.com/I06WEu7.png. |
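A minimal sketch of the kind of result the screenshot describes, assuming a fixed sequence length and a threshold on how many files must contain the sequence. The function name `sequences_shared_by` and the sample bytes are made up for illustration; this is not the prototype's code.

```python
from collections import defaultdict

def sequences_shared_by(blobs, length, min_files):
    """Map each `length`-byte window to the number of inputs containing it,
    keeping only windows found in at least `min_files` inputs."""
    seen_in = defaultdict(set)          # window -> indexes of inputs containing it
    for index, data in enumerate(blobs):
        for start in range(len(data) - length + 1):
            seen_in[data[start:start + length]].add(index)
    return {seq: len(files) for seq, files in seen_in.items()
            if len(files) >= min_files}

# Three in-memory stand-ins for files (hypothetical sample data);
# real use would read each file with open(path, "rb").read().
blobs = [
    b"\x10\x00\x04\x00\xe8\x02\x00\x11",
    b"\x22\x23\x00\x04\x00\xe8\x02\x00",
    b"\x00\x04\x00\xe8\x02\x00\x33\x34",
]
for seq, count in sequences_shared_by(blobs, 6, 3).items():
    print(seq.hex(" "), "found in", count, "files")
```

Every window of every file is stored, so memory grows with the total input size; that is acceptable for a handful of small files but not for large sets.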
Quote:
Would something like this work? Code:
#include <stdio.h> |
Last ditch effort:
http://www.vxsearch.com/search_files_by_binary_patterns.html Windows app, 30-day trial download: http://www.vxsearch.com/downloads.html |
Yes, I made a prototype. But it turns out such a tool is of no use to anyone, so I will not continue to develop it.
I think the idea is quite simple, but it seems not many people understood it. Binary (or text) difference tools that compare a pair of files are not really the same thing at all. The prototype I created can compare any number of files and find a string that is present in all of them, up to some desired length. |
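One way to sketch "present in all of them, up to some desired length", assuming the inputs fit in memory. If a sequence of length L is common to all files, so is every shorter piece of it, which allows a binary search over the length. The names `common_windows` and `longest_common` are illustrative and not taken from the prototype.

```python
def common_windows(blobs, length):
    """Return the set of `length`-byte sequences present in every blob."""
    common = None
    for data in blobs:
        windows = {data[i:i + length] for i in range(len(data) - length + 1)}
        common = windows if common is None else common & windows
        if not common:
            break                       # nothing left to intersect
    return common or set()

def longest_common(blobs, max_len):
    """Longest sequences shared by all blobs, capped at `max_len` bytes.
    Assumes `blobs` is a non-empty list of bytes objects."""
    lo, hi, best = 1, min(max_len, min(len(b) for b in blobs)), set()
    while lo <= hi:
        mid = (lo + hi) // 2
        found = common_windows(blobs, mid)
        if found:
            best, lo = found, mid + 1   # try longer sequences
        else:
            hi = mid - 1                # too long, try shorter ones
    return best

# Hypothetical sample data built around the sequence from the first post.
files = [
    b"AAA\x01\x02\x03\x04\x05BBB",
    b"CC\x01\x02\x03\x04\x05DDDD",
    b"\x01\x02\x03\x04\x05EEEEE",
]
for seq in longest_common(files, 16):
    print(seq.hex(" "))                 # prints "01 02 03 04 05"
```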
The problem with this task, i.e. the common substrings problem, is its high complexity: it takes a lot of difficult heuristic tricks to get it below O(n^2), otherwise it is too slow or uses too much memory. I have not seen any tools that do this. It would work with pictures, video or audio as well, to find matching image sections, video subclips, etc. But really, it would be quite useful. I am quite certain we are talking about an NP-hard problem; please see:
Quote:
And I believe there are proofs that shortest common superstring is NP-hard. See for example Quote:
|
Grep just looks for regexes, so its complexity is that of regex pattern matching. You are asking about a very general, arbitrary common substring problem. They are really not the same issue at all.
This would be very useful, but it has a really problematic size vs. speed tradeoff and would need some kind of limiting parameters, like the ones you are getting at. The NP-hard issue can be sidestepped through heuristics and a domain-specific approach. Nonetheless, I doubt you will find such a tool for general cases. |
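As an illustration of trading memory for a heuristic, here is a sketch that fingerprints fixed-length windows with a rolling hash (the Rabin-Karp idea), so only integers are kept per window instead of byte slices. The fixed `length` plays the role of a limiting parameter. The function names, hash base and modulus are assumptions chosen for the example.

```python
def window_hashes(data, length, base=257, mod=(1 << 61) - 1):
    """Yield (offset, hash) for every `length`-byte window of `data`."""
    if len(data) < length:
        return
    top = pow(base, length - 1, mod)    # weight of the byte leaving the window
    h = 0
    for byte in data[:length]:
        h = (h * base + byte) % mod
    yield 0, h
    for i in range(length, len(data)):
        # Drop the oldest byte, shift, and append the newest byte.
        h = ((h - data[i - length] * top) * base + data[i]) % mod
        yield i - length + 1, h

def common_by_fingerprint(blobs, length):
    """Sequences of `length` bytes present in every blob, found via hashes."""
    first, rest = blobs[0], blobs[1:]
    candidates = {}                     # hash -> one offset in the first blob
    for offset, h in window_hashes(first, length):
        candidates.setdefault(h, offset)
    for data in rest:
        present = {h for _, h in window_hashes(data, length)}
        candidates = {h: o for h, o in candidates.items() if h in present}
    # Hashes can collide, so confirm each survivor with a real byte search.
    # A collision inside the first blob could also hide a match; this sketch
    # accepts that small risk as part of the heuristic.
    result = set()
    for offset in candidates.values():
        seq = first[offset:offset + length]
        if all(seq in data for data in rest):
            result.add(seq)
    return result

# Hypothetical sample data.
samples = [
    b"AAA\x01\x02\x03\x04\x05BBB",
    b"CC\x01\x02\x03\x04\x05DDDD",
    b"\x01\x02\x03\x04\x05EEEEE",
]
print(common_by_fingerprint(samples, 5))
```

Each window costs one integer instead of a `length`-byte slice, at the price of a verification pass over the surviving candidates.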
Quote:
I believe I read that the Metasploit framework included some heuristics for this kind of search, but I could find no specific tool. I too agree that a tool like this would be very useful. |
Archivers solve a similar problem.
What if you tried to take the algorithm from some open-source archiver? |
Yes, I figured the problem was similar to building a dictionary of common sequences, which you'd then substitute with shorter codes corresponding to the dictionary entries.
As we discussed, it doesn't sound like a perfect solution is possible, but some heuristics would work. You mentioned compression, which would do exactly this kind of operation - pick your favourite algorithm. (I won't make the code available for my tool, since it was a rushed prototype and I don't think there is any chance of anyone getting all the necessary libs to compile it.) |
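A rough way to try the compression idea without writing a compressor, assuming Python's standard `zlib` module: prime DEFLATE with one file as a preset dictionary (`zdict`) and see how much smaller a second file compresses. A large saving suggests the two files share byte sequences, although it does not list them. The helper names and the sample data are invented for the example, and DEFLATE can only look back roughly 32 KiB, so only the tail of a large dictionary is used.

```python
import hashlib
import zlib

def compressed_size(data, dictionary=None):
    """Size of `data` after DEFLATE, optionally primed with a preset dictionary."""
    if dictionary:
        comp = zlib.compressobj(level=9, zdict=dictionary)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())

def shared_content_score(reference, data):
    """Fraction of compressed bytes saved when `reference` is the dictionary."""
    plain = compressed_size(data)
    primed = compressed_size(data, reference)
    return 1 - primed / plain

def pseudo_random(seed, blocks=16):
    """Deterministic, poorly compressible filler for the demo (hypothetical data)."""
    return b"".join(hashlib.sha256(seed + bytes([i])).digest() for i in range(blocks))

shared = pseudo_random(b"shared")
file_a = b"header-a" + shared
file_b = shared + b"trailer-b"
unrelated = pseudo_random(b"other")

print("a vs b:        ", round(shared_content_score(file_a, file_b), 2))
print("unrelated vs b:", round(shared_content_score(unrelated, file_b), 2))
```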
IMHO,
such a tool, if made, would not be very useful because it would find too many matches in the files: the results would be redundant. However, the very formulation of the problem is interesting. See an old tool with a similar purpose, although it only searches for duplicate 'strings' inside one specially prepared [text or binary] file: -> Dup_Str <-. This tool does not print the line numbers of the duplicated strings, only the duplicated strings themselves. It was made quick and dirty quite a long time ago, but it is still used occasionally without changes. Originally the tool was made to find duplicate lexemes in a dictionary. |
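For comparison, a small sketch of that narrower task, assuming the "specially prepared" file holds one string per line. This is only a guess at the input format and is not how Dup_Str itself works; `duplicated_strings` is a made-up name.

```python
from collections import Counter

def duplicated_strings(data):
    """Return each non-empty line that occurs more than once, in first-seen
    order. Line numbers are deliberately not reported."""
    lines = [line for line in data.splitlines() if line]
    counts = Counter(lines)             # keeps first-seen order of the keys
    return [line for line, n in counts.items() if n > 1]

# Hypothetical dictionary-style input, one lexeme per line.
sample = b"alpha\nbeta\nalpha\ngamma\nbeta\nalpha\n"
for item in duplicated_strings(sample):
    print(item.decode())
```

Reporting each duplicate once, rather than every occurrence, is also one simple way to cut down the redundancy of results mentioned above.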