Page 1 of 1

Selecting median information from a mysql db

Posted: Tue Mar 21, 2006 8:03 am
by oeb
Using PHP 4 and MySQL

Hello, I have a weird little issue and I was wondering what is the best way to go about this.

I am working with two tables in a database. One it a list of items (Few thousand rows) and the other a list of transactions (Of the various items). This is the logging system for an online games mall.
Each transaction represents a purchace of 1 or more items at price x (For example if you buy 32 of item x from me at $134 it will appear as one transaction (With a quantity of 32 and a price of 134) and if you purchase 1 of the item off me a single row is also generated. (The players can set whatever price they want for the items, so item x might sell for $100 from one player and $5000 from another.)

This table is quite large, recieving in excess of 50000 new transactions daily and presently contains about 7gb of data.

Now, what we are currently doing with this information is generating mean pricing data using the mysql sum() function. While this works fine, it does not give an acurate picture of the games economy, so what we really need is median data.

How I would normally go about this. (Presuming 1 item per row sold) (Psudo code)

Foreach item
select todays sales for the item from the db ordering by price
get number of results returned

if odd (results)
get middle price
else if even (results)
get mean price of middle 2
end if

end foreach

Simple enough, although it requires a few too many querys for my liking.

BUT since each row contains multiple purchase of the individual items what I think I would have to do is pull each item into a big array (Every element in the array representing a single item sold) sorting that array and splitting it up there to find the median entrys.

I can see this as being a problem as I cant imagine this would be very fast or very effeciant memory wise.

Can anyone else think of an alternate solution?

Please note, we have no control over how the information is stored, but we can install UDFs if need be.

Regards,
oeb

Posted: Tue Mar 21, 2006 10:24 am
by feyd
it would seem that using standard deviation would help you find the bulk of the price population.

STD()

Posted: Thu Mar 23, 2006 9:04 am
by oeb
I have been digging through the mysql manual and other google trying to find out how this works. What do I pass to the STD() function to gain the standard deviation? The whole thing seems very unclear.

Sorry for being a little thick, statistics was never my strong point.

Posted: Thu Mar 23, 2006 10:09 am
by feyd

Code: Select all

STD(`qty` * `price`), STD(`qty`), STD(`price`)
To get the range for the median, pull the AVG() (average) then add and subtract the deviation.

Code: Select all

AVG(`qty` * `price`) + STD(`qty` * `price`) as `Upper`, AVG(`qty` * `price`) - STD(`qty` * `price`) as `Lower`
for example.

Posted: Thu Mar 23, 2006 10:19 am
by oeb
Would extreme values through this off though?

Part of the problem with the mean based system we are using at the moment is that people purchase items priced at extreme values, purley to screw with the script (Or as an attempt to launder money in the ingame economy) whish skews the data quite a lot. For example if the mean price for an item is around 5000 and someone buys one for 5000000, the mean value is now almost worthless for someone attempting to study the economy.

Posted: Thu Mar 23, 2006 10:33 am
by feyd
Considering the way you're wanting to find the median, you'll need to find it in PHP as you've already postulated. (Or create a stored procedure/function)