Using PHP to sort through a file and echo matches

Posted: Wed Jul 20, 2011 9:13 am
by GoQuickly
Hello everyone,

This is the first time I have run into a PHP issue while building a script for my business. :roll:

Here is what I need the script to do.

I will download and overwrite the file every day when Snapnames releases their list of expiring .com domains:

https://www.snapnames.com/file_dl.sn?file=snpdlist.zip

I then want the script to open said txt file on the server and reference a keyword.txt list.

PHP should then go through each line of the file and echo every line that contains a keyword found in my keyword.txt list.

How hard is this to achieve?

Re: Using PHP to sort through a file and echo matches

Posted: Wed Jul 20, 2011 12:21 pm
by McInfo
Download the file with file_get_contents() or cURL. Uncompress it with zip functions.
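
A rough sketch of those two steps, assuming allow_url_fopen is on and the zip extension (ZipArchive) is installed. The function names and the paths in the usage comment are made up for illustration, and the sketch assumes the archive is small enough to hold in memory while it is being saved.

```php
<?php
// Hypothetical helper: copy a remote (or local) file to $dest.
// Assumes allow_url_fopen is enabled; cURL would work just as well.
function downloadFile($url, $dest)
{
    $data = file_get_contents($url);
    if ($data === false) {
        return false;
    }
    return file_put_contents($dest, $data) !== false;
}

// Hypothetical helper: extract every entry of a zip archive into $dir.
// Requires the zip extension (ZipArchive).
function extractZip($zipPath, $dir)
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        return false;
    }
    $ok = $zip->extractTo($dir);
    $zip->close();
    return $ok;
}

// Usage (paths are placeholders):
// if (downloadFile('https://www.snapnames.com/file_dl.sn?file=snpdlist.zip', '/tmp/snpdlist.zip')) {
//     extractZip('/tmp/snpdlist.zip', '/tmp/snapnames');
// }
```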

Are you trying to match whole domains, domains that begin with certain keywords, or domains that contain certain keywords anywhere? The first two would be relatively easy to implement with very little memory if your keywords are sorted. Depending on the size of your keyword list, the third may require a little more creativity since the domain list contains hundreds of thousands of domains.

Re: Using PHP to sort through a file and echo matches

Posted: Thu Jul 21, 2011 4:29 pm
by GoQuickly
What I was going to do to avoid the zip portion was just download the file daily and FTP the .txt that's included.

Ultimately, if I had a list like this from the .zip:

RentHouston.com
Rentersinsureance.com
parentingclassbook.com
Reno911.com
Garino200.com
Gareno200.com

and a keyword list of:
houston
reno

I would want the script to process the list and output only:

RentHouston.com
Reno911.com
Gareno200.com

Re: Using PHP to sort through a file and echo matches

Posted: Thu Jul 21, 2011 6:34 pm
by McInfo
The text file is 12 times bigger than the Zip archive. Whatever time you spend writing code to uncompress the file on the server will surely be earned back over the lifetime of the script.

I'm still not sure how many keywords you want to match. If it's not too many, you can put them all in an array (hint: file() -- be sure to ignore newlines) to be compared to each line in the domain list file. But remember, in the worst-case scenario, each of the keywords will be compared to each of the domains. If there are 20 keywords and 350 thousand domains, that is seven million comparisons. The script could take a while.
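
Loading the keywords could look something like this. The function name loadKeywords is hypothetical, and lowercasing each keyword is my own assumption (it makes case-insensitive matching simpler later). The two flags are standard file() options.

```php
<?php
// Hypothetical helper: read one keyword per line into an array.
function loadKeywords($path)
{
    // FILE_IGNORE_NEW_LINES strips the line ending from each element;
    // FILE_SKIP_EMPTY_LINES drops blank lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array();
    }
    $keywords = array();
    foreach ($lines as $line) {
        // trim() removes surrounding whitespace, including a stray "\r".
        $word = strtolower(trim($line));
        if ($word !== '') {
            $keywords[] = $word;
        }
    }
    return $keywords;
}
```

With 20 keywords and 350 thousand domains, the worst case is still 20 x 350,000 = 7,000,000 substring checks, so keeping the keyword array short matters more than how it is loaded.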

You will need to create a file resource with fopen() for reading the domain list file. (Don't try to put all the lines in an array. That would hog too much memory.) Write a loop that reads a line from the file. Shorten the line so only the domain part remains (hint: strstr() -- every domain in the file is followed by a space character). Search the domain for each of the keywords (hint: strpos()). If you find one keyword in the domain, you don't need to search for the other keywords. Finally, close the file.
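
The steps above can be sketched roughly as follows. The function name findMatches is hypothetical, and it assumes the keywords are already in an array of non-empty strings. One deliberate swap: it uses stripos() instead of strpos(), because the sample output earlier in the thread expects "RentHouston.com" to match the keyword "houston", which needs a case-insensitive search. The third argument to strstr() needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: return the domains that contain any keyword.
// $keywords is assumed to be an array of non-empty strings.
function findMatches($listPath, array $keywords)
{
    $matches = array();
    $handle = fopen($listPath, 'r');
    if ($handle === false) {
        return $matches;
    }
    // Read one line at a time so the whole list never sits in memory.
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        // Keep only the part before the first space (the domain).
        $domain = strstr($line, ' ', true);
        if ($domain === false) {
            $domain = $line; // no space on this line; use it as-is
        }
        if ($domain === '') {
            continue; // blank line or line starting with a space
        }
        foreach ($keywords as $keyword) {
            // stripos() is the case-insensitive variant of strpos().
            if (stripos($domain, $keyword) !== false) {
                $matches[] = $domain;
                break; // one hit is enough; skip the remaining keywords
            }
        }
    }
    fclose($handle);
    return $matches;
}

// Usage (file name is a placeholder):
// foreach (findMatches('snpdlist.txt', array('houston', 'reno')) as $domain) {
//     echo $domain, "\n";
// }
```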