Is this the best way to handle CSVs and process images?


Post by ChrisF79

Every night, I use PHP to download five zipped CSV files and unzip them into a /processing directory. Each row of a CSV is one ACTIVE (that’s important) real estate listing. So if you list your house for sale today, it shows up in tonight’s CSV. The problem is that if your house sells tomorrow, it simply isn’t in the next file. Somewhere, I have to compare the CSV against what’s in my database and mark the missing listings as inactive.

Each record also has a field with the URL of the first picture of the home and a field with the total number of pictures. The idea is that if the URL is blah.com/pic.jpg, the next one ends in _2.jpg, then _3.jpg, and so on. That makes it easy to download them.
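The naming scheme can be sketched as a small helper. This is only an illustration: the function name `listingImageUrls` is made up, and it assumes the `_N` suffix goes directly before the file extension of the first URL (so pic.jpg becomes pic_2.jpg), which the description suggests but does not spell out.

```php
<?php
/**
 * Build the image URLs for one listing.
 * Assumption: image N (N >= 2) is the first URL with "_N" inserted
 * directly before the file extension.
 */
function listingImageUrls(string $firstUrl, int $imageCount): array
{
    $urls = [];
    for ($i = 1; $i <= $imageCount; $i++) {
        // Image 1 is the URL exactly as it appears in the CSV.
        $urls[] = $i === 1
            ? $firstUrl
            : preg_replace('/(\.[A-Za-z0-9]+)$/', '_' . $i . '$1', $firstUrl);
    }
    return $urls;
}

$urls = listingImageUrls('blah.com/pic.jpg', 3);
// ['blah.com/pic.jpg', 'blah.com/pic_2.jpg', 'blah.com/pic_3.jpg']
```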

Here’s the challenge. I need to download every picture, resize it, and then upload it to Amazon S3, and that could take 20 seconds per listing. There are roughly 24k listings today.
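As a rough back-of-the-envelope check of those numbers, assuming listings are processed strictly one after another and every listing really takes the full 20 seconds:

```php
<?php
$listings          = 24000; // "roughly 24k listings"
$secondsPerListing = 20;    // "could take 20 seconds per listing"

$totalSeconds = $listings * $secondsPerListing; // 480000
$totalHours   = $totalSeconds / 3600;           // about 133.3
$totalDays    = $totalHours / 24;               // about 5.6

printf("%d s = %.1f h = %.1f days\n", $totalSeconds, $totalHours, $totalDays);
// prints "480000 s = 133.3 h = 5.6 days"
```

So a full pass over every listing would not fit into a single night if done serially.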

What I did was make a `listings` table that will hold every listing whose images have been downloaded, resized, and uploaded to S3. I also made a `listings_queue` table that I’ll load the CSVs into after downloading them, so I can work from that.
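Loading one unzipped CSV into `listings_queue` might look roughly like the sketch below. It assumes the file has a header row, and the column names `mls_id`, `first_image_url` and `image_count` are hypothetical stand-ins for whatever the real feed calls those fields.

```php
<?php
/**
 * Load one unzipped CSV file into listings_queue and return the row count.
 * Assumptions: the first row is a header, and mls_id, first_image_url and
 * image_count are hypothetical column names, not the real feed's.
 */
function loadCsvIntoQueue(PDO $pdo, string $csvPath): int
{
    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }

    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if ($header === false) {
        fclose($handle);
        return 0; // empty file
    }

    $insert = $pdo->prepare(
        'INSERT INTO listings_queue (mls_id, first_image_url, image_count)
         VALUES (:mls_id, :first_image_url, :image_count)'
    );

    $rows = 0;
    $pdo->beginTransaction(); // one commit for the whole file, not one per row
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $row = array_combine($header, $fields);
        $insert->execute([
            ':mls_id'          => $row['mls_id'],
            ':first_image_url' => $row['first_image_url'],
            ':image_count'     => (int) $row['image_count'],
        ]);
        $rows++;
    }
    $pdo->commit();
    fclose($handle);

    return $rows;
}
```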

Here’s my thought on how the process could work:

1. Download the zipped CSVs, unzip them, and load the records into the `listings_queue` table.
2. Run a query that marks any record in the `listings` table that isn’t in `listings_queue` as inactive (it must have sold).
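Step 2 could be a single `UPDATE` with a `NOT EXISTS` subquery, sketched here with PDO. The key column `mls_id` is a hypothetical name for whatever uniquely identifies a listing in both tables; `active` is the field mentioned in this post.

```php
<?php
/**
 * Mark every listing that is absent from listings_queue as inactive.
 * Returns the number of rows changed.
 * Assumption: mls_id is a hypothetical unique key shared by both tables.
 */
function markMissingListingsInactive(PDO $pdo): int
{
    $affected = $pdo->exec(
        'UPDATE listings
            SET active = 0
          WHERE active = 1
            AND NOT EXISTS (
                SELECT 1
                  FROM listings_queue q
                 WHERE q.mls_id = listings.mls_id
            )'
    );
    if ($affected === false) {
        // Only reachable when PDO is not in exception mode.
        throw new RuntimeException('Could not mark listings inactive');
    }
    return $affected;
}
```

Note that this only gives the intended result while `listings_queue` still holds the complete contents of tonight's files. If a download failed and the queue were empty, the same query would mark every listing inactive, so checking the queue's row count first may be worthwhile.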

A second process would then run:

1. Select the first record in `listings_queue` and download all of its images to a directory on the server.
2. Resize them as needed.
3. Upload them to Amazon S3 for permanent storage.
4. Move the record from `listings_queue` to `listings` and set the `listings.active` field to 1 (active).
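One pass of this second process might be sketched as follows. The image work (steps 1 to 3) is hidden behind a hypothetical `$processImages` callback, since the download, resize, and S3 code is not shown here, and the column names other than `active` are assumptions. Step 4 runs in a transaction so a record is never lost between the two tables.

```php
<?php
/**
 * Process one queued listing; returns false when the queue is empty.
 * $processImages is a hypothetical callback standing in for steps 1-3
 * (download, resize, upload to S3). It receives the queue row as an array
 * and is expected to throw on failure.
 * Assumes PDO::ERRMODE_EXCEPTION, the default since PHP 8.0.
 */
function processNextQueuedListing(PDO $pdo, callable $processImages): bool
{
    $row = $pdo->query(
        'SELECT mls_id, first_image_url, image_count
           FROM listings_queue
          ORDER BY mls_id
          LIMIT 1'
    )->fetch(PDO::FETCH_ASSOC);

    if ($row === false) {
        return false; // nothing left to do
    }

    // If this throws, the row stays in the queue for a later retry.
    $processImages($row);

    // Step 4: move the record atomically.
    $pdo->beginTransaction();
    try {
        // Drop any copy left from a previous night, then insert the fresh one.
        $pdo->prepare('DELETE FROM listings WHERE mls_id = ?')
            ->execute([$row['mls_id']]);
        $pdo->prepare(
            'INSERT INTO listings (mls_id, first_image_url, image_count, active)
             VALUES (?, ?, ?, 1)'
        )->execute([$row['mls_id'], $row['first_image_url'], $row['image_count']]);
        $pdo->prepare('DELETE FROM listings_queue WHERE mls_id = ?')
            ->execute([$row['mls_id']]);
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }

    return true;
}
```

A worker would then call it in a loop, for example `while (processNextQueuedListing($pdo, $processImages)) {}`. The delete-then-insert pair is used here only to stay database-neutral; a native upsert would do the same job.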

Can you foresee any problems with this approach, or can you think of a better way to do it?