#1
Yes, I made a prototype. But it turns out such a tool is of no use to anyone, so I will not continue to develop it.
I think the idea is quite simple, but it seems not many people understood it. Binary (or text) difference tools that compare a pair of files are not the same thing at all. The prototype I created can compare any number of files and find a string that is present in all of them, up to some desired length.
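A minimal sketch of the fixed-length case described above, not the actual prototype (its code was never posted). It assumes every file fits in memory and that an exact length `n` is given; the names `byte_ngrams` and `common_sequences` are made up for illustration.

```python
from pathlib import Path
from typing import Iterable, Set


def byte_ngrams(data: bytes, n: int) -> Set[bytes]:
    """Return the set of all n-byte slices of data (empty if data is shorter than n)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def common_sequences(blobs: Iterable[bytes], n: int) -> Set[bytes]:
    """Return every n-byte sequence that occurs in all of the given byte strings."""
    blobs = sorted(blobs, key=len)  # start from the smallest input to keep the set small
    if not blobs or n <= 0:
        return set()
    common = byte_ngrams(blobs[0], n)
    for blob in blobs[1:]:
        if not common:
            break  # nothing left to match, no need to scan the rest
        common &= byte_ngrams(blob, n)
    return common


def common_sequences_in_files(paths: Iterable[str], n: int) -> Set[bytes]:
    """Same as common_sequences, but reads the inputs from disk."""
    return common_sequences((Path(p).read_bytes() for p in paths), n)
```

Memory use grows with file size times `n`, so this is only practical for small inputs or short sequences.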
| The Following User Says Thank You to dila For This Useful Post: | ||
Stingered (01-22-2018) | ||
#2
The problem with this task (the common substring problem) is its high complexity: it takes a lot of difficult heuristic tricks to get it below O(n^2), otherwise it is too slow or uses too much memory. I have not seen any tools that do this. It would work with pictures, video, or audio as well, to find matching image sections, video subclips, etc. It really would be quite useful. I am quite certain we are talking about an NP-hard problem; please see:
Quote:
And I believe there are proofs that shortest common substring is NP-hard. See for example Quote:
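For reference, the quadratic baseline that the O(n^2) remark presumably refers to can be sketched for just two inputs with the textbook dynamic-programming table. This is an illustration only; the function name is hypothetical and it says nothing about the many-file case.

```python
def longest_common_substring_dp(a: bytes, b: bytes) -> bytes:
    """Longest contiguous run of bytes shared by a and b, in O(len(a) * len(b)) time."""
    best_len = 0   # length of the best match found so far
    best_end = 0   # index in a just past the end of that match
    prev = [0] * (len(b) + 1)  # match lengths for the previous row of the table
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the match that ended at a[i-2], b[j-2]
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]
```

Two 1 MB files already mean on the order of 10^12 cell updates here, which is why this table is too slow as-is.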
#3
Quote:
#4
Quote:
#5
Grep just searches for regexes, so its complexity is that of regex pattern matching. What you are asking about is a very general, arbitrary common substring problem. The two are really not the same issue at all.
This would be very useful, but it has a really problematic size vs. speed tradeoff and would need some kind of limiting parameters, like the ones you are getting at. The NP-hard issue can be sidestepped through heuristics and a domain-specific approach. Nonetheless, I doubt you will find such a tool for the general case.
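One way the limiting parameters could look, as a rough sketch rather than a recommendation: cap the match length with a hypothetical `max_len` argument and binary-search for the longest length at which some sequence still occurs in every input. It relies on the fact that if a common sequence of length L exists, one of every shorter length exists too. Function names are invented for illustration, and plain slices are used instead of rolling hashes, so it trades memory for simplicity.

```python
from typing import Iterable, List, Optional


def some_common_of_length(blobs: List[bytes], n: int) -> Optional[bytes]:
    """Return one n-byte sequence present in every blob, or None if there is none."""
    common = None
    for blob in blobs:
        grams = {blob[i:i + n] for i in range(len(blob) - n + 1)}
        common = grams if common is None else common & grams
        if not common:
            return None
    return min(common) if common else None  # min() just makes the pick deterministic


def longest_common_sequence(blobs: Iterable[bytes], max_len: Optional[int] = None) -> bytes:
    """Longest byte sequence shared by all blobs, optionally capped at max_len bytes."""
    blobs = list(blobs)
    if not blobs:
        return b""
    hi = min(len(b) for b in blobs)  # a match cannot be longer than the shortest input
    if max_len is not None:
        hi = min(hi, max_len)
    lo, best = 0, b""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = some_common_of_length(blobs, mid)
        if found is not None:
            lo, best = mid, found   # a match of this length exists, try longer
        else:
            hi = mid - 1            # no match this long, try shorter
    return best
```

A smaller `max_len` bounds both the number of search rounds and the size of each slice set, which is the size vs. speed knob in this sketch.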
| The Following User Says Thank You to chants For This Useful Post: | ||
Stingered (01-23-2018) | ||
#6
Quote:
I believe I read that the Metasploit Framework included some heuristics for this kind of search, but I could find no specific tool. I agree that a tool like this would be very useful.
| The Following User Says Thank You to Stingered For This Useful Post: | ||
dila (01-23-2018) | ||
#7
Quote:
#8
Tool to scan files for common byte sequences
Archivers solve a similar problem.
What if you tried to take the algorithm out of some open-source archiver?
| The Following User Says Thank You to dosprog For This Useful Post: | ||
dila (02-16-2018) | ||
#9
Yes, I figured the problem was similar to building a dictionary of common sequences, which you'd then substitute with shorter codes corresponding to the dictionary entries.
As we discussed, it doesn't sound like a perfect solution is possible, but some heuristics would work. You mentioned compression, which would do exactly this kind of operation - pick your favourite algorithm. (I won't make the code for my tool available, since it was a rushed prototype and I don't think there is any chance of anyone getting all the necessary libs to compile it.)
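As a rough illustration of the compression idea, and not anything from the prototype: Python's `zlib` accepts a preset dictionary (`zdict`), so one file can be compressed with another as its dictionary, and the drop in compressed size hints at how many bytes the two share. The helper names are hypothetical, and DEFLATE only looks back 32 KiB, so only the tail of the reference file is used here.

```python
import zlib

WINDOW = 32 * 1024  # DEFLATE's maximum look-back distance


def compressed_size(data: bytes, dictionary: bytes = b"") -> int:
    """Size of data after DEFLATE, optionally primed with a preset dictionary."""
    if dictionary:
        # Only the last 32 KiB of the dictionary can be referenced by matches.
        comp = zlib.compressobj(level=9, zdict=dictionary[-WINDOW:])
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


def shared_bytes_estimate(reference: bytes, target: bytes) -> int:
    """Bytes saved when target is compressed with reference as dictionary.

    A large positive value suggests target repeats sequences found in reference;
    a value near zero suggests little overlap. This is a heuristic signal, not
    a list of the shared sequences themselves.
    """
    return compressed_size(target) - compressed_size(target, reference)
```

To get the actual sequences you would still need something like the match finder inside the archiver, which is the part worth lifting from an open-source implementation.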