What is the easiest way to compare two very large text files (10+ MB) and show the lines they have in common? The matches could be dumped to the screen or, even better, exported to a third text file.
I've searched and searched and can't find a way to do this. I've found an easy way to remove the duplicate lines and create one file of unique lines, but not a way to show only the duplicate lines.
Any help would be appreciated, and sorry if this is a newbie question. I've searched and banged my head against the wall trying to find a way to do this, and I know it has to be something very simple that I'm missing.
I thought about that, but if the two files are fairly different, you would have to compare row 1 of file1 against every row of file2, and so on. Right?
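Comparing every row against every row isn't actually necessary. A minimal sketch, assuming PHP 7.4+ and hypothetical function names `openOrFail()` and `writeCommonLines()`: load the lines of the first file into a PHP array used as a hash set (the lines are the keys), then stream the second file with fgets() and check each line with isset(). Each check is a hash lookup rather than a scan of the other file, so the work grows roughly with the combined size of the two files instead of their product. Only the first file is held in memory.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: open a file or fail loudly (fopen() returns false on error).
function openOrFail(string $path, string $mode)
{
    $h = fopen($path, $mode);
    if ($h === false) {
        throw new RuntimeException("Could not open $path");
    }
    return $h;
}

/**
 * Hypothetical helper: write every line that appears in both $fileA and
 * $fileB to $outFile, once each, in the order it occurs in $fileB.
 * $fileA is kept in memory as a hash set (array keys); $fileB is
 * streamed with fgets(). Returns the number of lines written.
 */
function writeCommonLines(string $fileA, string $fileB, string $outFile): int
{
    $seen = [];
    $a = openOrFail($fileA, 'r');
    while (($line = fgets($a)) !== false) {
        $seen[rtrim($line, "\r\n")] = true;   // strip the line ending, use line as key
    }
    fclose($a);

    $b   = openOrFail($fileB, 'r');
    $out = openOrFail($outFile, 'w');
    $written = 0;
    while (($line = fgets($b)) !== false) {
        $line = rtrim($line, "\r\n");
        if (isset($seen[$line])) {            // hash lookup, no inner loop over file A
            fwrite($out, $line . PHP_EOL);
            unset($seen[$line]);              // so each common line is written only once
            $written++;
        }
    }
    fclose($b);
    fclose($out);
    return $written;
}
```

Usage would look like `writeCommonLines('file1.txt', 'file2.txt', 'common.txt');` (the file names are placeholders). Lines are compared after stripping the trailing `\r`/`\n`, so Windows and Unix line endings don't cause false mismatches. Loading the smaller of the two files as `$fileA` keeps memory use down.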
Currently I combine both files and use this to remove the duplicate lines. It's very easy and fast, but it doesn't tell me what the dupes are, and I'd like to know that somehow.
This would be much easier if there was simply an opposite version of the array_unique() function. Say, array_duplicate().
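PHP has no built-in array_duplicate(), but one can be sketched in a few lines. The name is only the one wished for above, so treat the function below as hypothetical. It relies on the real array_count_values(), which counts how often each string or integer value occurs, and assumes PHP 7.4+ for the arrow function.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical array_duplicate(): the "opposite" of array_unique().
 * Returns each value that occurs more than once in $values, listed once,
 * in order of first appearance. This is NOT a built-in PHP function.
 * array_count_values() only accepts string and int values, which is
 * fine for lines read from a text file.
 */
function array_duplicate(array $values): array
{
    $counts = array_count_values($values);                // value => occurrences
    $dupes  = array_filter($counts, fn(int $n): bool => $n > 1);
    // Numeric-string keys such as "42" become ints in PHP arrays,
    // so convert the keys back to strings.
    return array_map('strval', array_keys($dupes));
}
```

For the combined file, something like `array_duplicate(file('combined.txt', FILE_IGNORE_NEW_LINES))` would list each repeated line once (the file name is a placeholder).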
Anyone have an idea? Let's scrap the two file format. Just one big text file, go through line by line, and export a list of lines that appear more than once within the same file. Any ideas how to do that?
domainguy wrote: This would be much easier if there was simply an opposite version of the array_unique() function. Say, array_duplicate().
You could always use array_unique() to find the unique ones, then use array_diff() to compare the array of unique entries to the original array... that'd give you any that aren't unique, i.e. the dupes.
onion2k wrote: You could always use array_unique() to find the unique ones, then use array_diff() to compare the array of unique entries to the original array... that'd give you any that aren't unique, i.e. the dupes.
Someone else mentioned doing it this way. Can you or someone else show me how to work this into my script? Sorry, I'm really new at this, so I'm sure it's something easy, but I'm lost.
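Here's a sketch of the array_unique() idea, assuming the combined file from the earlier post and a hypothetical helper name `duplicateLines()`. One catch: array_diff() compares values only, and every value in the original array is also present in the array_unique() result, so `array_diff($lines, array_unique($lines))` comes back empty. array_diff_assoc() compares keys as well; because array_unique() keeps the original key of each first occurrence, what's left over is exactly the second and later occurrences.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return each line of $path that occurs more than
 * once, listed once, in order of its first repeat.
 */
function duplicateLines(string $path): array
{
    // FILE_IGNORE_NEW_LINES strips line endings, so a last line without a
    // trailing newline still matches the same text elsewhere in the file.
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Could not read $path");
    }
    $unique  = array_unique($lines);               // first occurrences, original keys kept
    $repeats = array_diff_assoc($lines, $unique);  // entries whose key is not a first occurrence
    return array_values(array_unique($repeats));   // list each duplicate once, reindexed
}
```

If file1 and file2 are concatenated first, the result contains the lines that appear in both files, plus any line that is repeated within a single file. This version loads the whole file into memory at once.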
There will be a "little" problem with this code: it needs at least 3x the file size in memory. With the 10+ MB files mentioned in the original post, that means 30 MB+ of memory, so on shared hosting it will most likely not work.
A more memory-friendly (though more I/O-intensive) solution is to read the file one line at a time with fgets().
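A sketch of the fgets() approach for the single-file case, assuming PHP 7.4+ and a hypothetical function name `exportDuplicateLines()`. It reads one line at a time and keeps only a counter per distinct line, writing a line to the output file at the moment its second occurrence is read. It still holds one array entry per distinct line, so it reduces the memory needed rather than eliminating it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: stream $inFile with fgets() and write each line
 * that occurs more than once to $outFile, exactly once.
 * Returns the number of distinct duplicated lines.
 */
function exportDuplicateLines(string $inFile, string $outFile): int
{
    $in = fopen($inFile, 'r');
    if ($in === false) {
        throw new RuntimeException("Could not open $inFile");
    }
    $out = fopen($outFile, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Could not open $outFile");
    }

    $counts = [];   // line => times seen so far (no full copy of the file)
    $dupes  = 0;
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");     // ignore the line ending when comparing
        $counts[$line] = ($counts[$line] ?? 0) + 1;
        if ($counts[$line] === 2) {       // write on the second sighting only
            fwrite($out, $line . PHP_EOL);
            $dupes++;
        }
    }
    fclose($in);
    fclose($out);
    return $dupes;
}
```

Usage would be `exportDuplicateLines('combined.txt', 'dupes.txt');` with placeholder file names; the duplicates end up in the third file, in the order their second occurrence appears.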