View Single Post
  #8  
Old 10-08-2003, 08:50
Satyric0n
 
Posts: n/a
This is an algorithm that you will find in compression code (LZW especially), used for building the dictionary in an archive. The pattern finding routine for compression algos typically search for patterns in one file, but it should be easy to modify the code to search for distinct patterns across two files. Try looking at the source for ZLib, such code could be found there.

Writing an algorithm to do this is extremely simple, especially if you aren't concerned about speed. If it were acceptable for the code to take several minutes to execute, it would only take a very small amount of code to identify patterns in the way you describe. If you want the code to run quickly (under 20 seconds, lets say), this process doesn't really become any more complex, it usually just involves constraining the algorithm to only search for patters with lengths within a defined range (instead of patterns of any length). But, then you potentially miss patterns outside the length range you specify, which would generally be a bad thing. You see this in compression algos as a "dictionary size" limit. Really, the larger your allowed dictionary size, the better compression you can theoretically get, but the slower the compression. Same with your situation.

Either way, just think about the process for a few minutes, and you should be able to write the algo yourself without using someone elses. It is really a simple process.

Last edited by Satyric0n; 10-08-2003 at 09:06.
Reply With Quote