Page 1 of 2
Background script (crontab) causing heavy load
Posted: Tue Jan 09, 2007 6:46 am
by Yaniv
Hi everybody,
I have a situation here. I have a script that runs as a crontab every half an hour that is meant to do some background website operations. The script runs fine and the crontab works as it should.
The problem is that the system admin reported to me that the script is taking quite a heavy load on the server (this is a shared host).
I tried "nice-ing" the script by lowering its priority but that doesn't seem to work. I have reports that the script takes about 60% of the server's processor which is really bad. These background operations must be made.
Is there another PHP way of making the script run slower? (other than making sleep commands manually). I do not mind the script running slower because these are background tasks.
Thanks a lot!
Posted: Tue Jan 09, 2007 8:25 am
by feyd
What sort of operations is it carrying out? Typically a 60% hit over a very short period of time isn't a problem, but if your script does this over say a second, it can cause issues with your hosting. What I'm getting at is that your script may benefit greatly from optimizations that not only help shorten the time requirement for it to run but also lighten the server load.
Have you considered dedicated hosting? I often view the need for crontab as a signal that one should probably move to a hosting solution where running heavy operations isn't an issue.
Posted: Tue Jan 09, 2007 8:35 am
by Yaniv
It's basically background statistics calculations that can't be made by a user triggering a page load because some calculations are user independent.
The problem is that no matter how efficient the script may get the more data there is on the database the longer it will take for the script to run. This is a behavior I am actually expecting. I am expecting the script to eventually get to a few minutes of operation. That is why this is an issue.
The script is not time critical that is why I would like to simply have the solution of slowing it down.
Dedicated hosting is not an option yet due to current low budget. In the future this switch will definitely be made.
Posted: Tue Jan 09, 2007 8:40 am
by feyd
You probably should seek dedicated hosting now, rather than later. Sleep() and usleep() are pretty much your only other solution until you can provide us with more information.
There may be alternate or more optimized solutions to these statistical operations. If you could tell us more about these statistics we may be able to offer help in fixing up the code.
Posted: Tue Jan 09, 2007 8:55 am
by Yaniv
It's kind of similar in concept to Amazon's "Other users who bought this book also bought these books" kind of script but not exactly the same. The script is also dependent on a certain profile each user has. So if the user updates the profile (the profile being something I can't say here, sorry) all objects the user is associated with must be updated as well. The profile is not just texts it's another table associated to other data.
So I can't simply update the statistics whenever a user does an operation on an object because if that user later decides to change his profile all associated objects statistics must be updated. That kind of operation will take too long for doing it on a page load.
I'm really sorry for not being able to say too much about the insides of the project because I am obligated for secrecy by contract. I will try to explain as much as I can.
Thanks
Posted: Tue Jan 09, 2007 9:18 am
by feyd
So I'm guessing it's the JOIN that's taking so long? Have you considered caching the information? For example, when an order is placed a record is created for each line item and the order id in this cache table. When you are pulling up an item you can join from this table to the order line item table to find other item id's using a distinct and order by rand() with a limit. You can also do the same with a self join on the cache table.
Posted: Tue Jan 09, 2007 9:53 am
by Yaniv
I think my previous explanation wasn't too great. Here's a better shot at it:
The update algorithm generally works like so:
Three main tables:
Objects
ObjectsUsers
Users
UsersChoices
Choices
Code: Select all
objects = (Retrieve list of Objects from DB)
foreach(objects)
{
users = (retrieve list of users who are linked to this object)
foreach(users)
{
profile = (retrieve list of user choices for this user)
(count each choice to a large list of choices statistics; each choice +1 if a user chose it)
}
}
(Send the choices statistics back to the database)
This is just one of the statistics the system must carry. But it's the main one.
Posted: Tue Jan 09, 2007 5:49 pm
by Yaniv
A buddy of mine gave me the idea of using sub-queries to get this done faster on the MySQL server than on PHP. I'm gonna try and figure out how to do this as SQL. My biggest concern is doing the profile counting but I assume it's possible.
Yaniv!
Posted: Tue Jan 09, 2007 9:50 pm
by ykarmi
Ma kore gever gever?
Me eigo ata?
Posted: Wed Jan 10, 2007 6:39 am
by Yaniv
I'm having some trouble working out the SQL for this.
The thing is I have to calculate the number of appearances of each choice for each object.
As in get each user's "UserChoices" and calculate that with "ObjectsUsers". ObjectsUsers also contains a numeric value that says how the user rated this object (+# or -#)
This is how the tables connect to each other.
Objects <-> ObjectsUsers <-> Users <-> UsersChoices <-> Choices
And I want to do all that in pure SQL if possible.
Can anyone help? Or at least turn me to some information about such complex queries? Thanks!
Posted: Wed Jan 10, 2007 8:17 am
by Grim...
I'm ready to help you out but I'm still not sure what you want.
How about mocking up some fake SQL tables for us to work with, but changing the text and any give-away field names.
Posted: Wed Jan 10, 2007 8:24 am
by Grim...
If I understand you correctly, you might need something like this:
Code: Select all
SELECT SUM(IF(choices.id!='',1,0)) AS choice_count FROM objects
LEFT JOIN objectsusers ON objectsusers.object_id = objects.id
LEFT JOIN users ON users.objectsuser_id = objectsusers.id
LEFT JOIN userschoices ON userschoices.user_id = users.id
LEFT JOIN choices ON choices.userschoice_id = choices.id
GROUP BY objects.id
Posted: Wed Jan 10, 2007 8:59 am
by Yaniv
This is generally how the tables are laid out:
Code: Select all
CREATE TABLE `object`
(
`id` INTEGER NOT NULL AUTO_INCREMENT COMMENT '',
PRIMARY KEY (`id`),
)Type=MyISAM;
CREATE TABLE `object_user`
(
`user_id` INTEGER NOT NULL COMMENT '',
`object_id` INTEGER NOT NULL COMMENT '',
`rating` INTEGER NOT NULL COMMENT 'Visit rating',
)Type=MyISAM;
CREATE TABLE `user`
(
`id` INTEGER NOT NULL AUTO_INCREMENT COMMENT '',
PRIMARY KEY (`id`),
)Type=MyISAM;
CREATE TABLE `user_choice`
(
`user_id` INTEGER NOT NULL COMMENT '',
`choice_id` INTEGER NOT NULL,
PRIMARY KEY (`user_id`,`choice_id`),
CONSTRAINT `user_choice_FK_1`
FOREIGN KEY (`user_id`)
REFERENCES `user` (`id`),
INDEX `user_choice_FI_2` (`choice_id`),
CONSTRAINT `user_choice_FK_2`
FOREIGN KEY (`choice_id`)
REFERENCES `choice` (`id`)
)Type=MyISAM;
CREATE TABLE `choice`
(
`id` INTEGER NOT NULL AUTO_INCREMENT COMMENT '',
PRIMARY KEY (`id`),
)Type=MyISAM;
Now I want to SUM the number of of choices each object has according to the `object_user` table, while considering the `rating` field in `object_user`.
Meaning that if I have this data:
Code: Select all
`object`
| id |
-----------
| 1 |
`object_user`
| user_id | object_id | rating |
--------------------------------
| 1 | 1 | 1 |
| 1 | 1 | 1 |
| 1 | 1 | -1 |
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 2 | 1 | 1 |
| 2 | 1 | 1 |
`user`
| id |
-----------
| 1 |
| 2 |
`user_choice`
| user_id | choice_id |
-----------------------
| 1 | 1 |
| 1 | 2 |
| 2 | 2 |
| 2 | 3 |
`choice`
| id |
-----------
| 1 |
| 2 |
| 3 |
The result for object 1 should be
Code: Select all
| choice_id | result |
------------------------
| 1 | 2 | (because only user 1 chose choice 1)
| 2 | 5 | (because both users chose choice 2)
| 3 | 3 | (because only user 2 chose choice 3)
I hope this helps.
(another thing)
It will be best to get the result for all tables together. Meaning something like this:
Code: Select all
|bubble_id | choice_id | result |
------------------------
| 1 | 1 | 2 |
| 1 | 2 | 5 |
| 1 | 3 | 3 |
| 2 | X | Y |
It will increase the code speed by alot. But I can't figure out how to do this.
Posted: Wed Jan 10, 2007 9:16 am
by Grim...
So it gets one 'point' for each user that chose it, plus whatever ratings it has?
Posted: Wed Jan 10, 2007 9:24 am
by programmingjeff
Have you considered using Stored Procedures? They are fairly new in MySQL...