Selecting median information from a mysql db
Posted: Tue Mar 21, 2006 8:03 am
Using PHP 4 and MySQL
Hello, I have a weird little issue and I was wondering what is the best way to go about this.
I am working with two tables in a database. One it a list of items (Few thousand rows) and the other a list of transactions (Of the various items). This is the logging system for an online games mall.
Each transaction represents a purchace of 1 or more items at price x (For example if you buy 32 of item x from me at $134 it will appear as one transaction (With a quantity of 32 and a price of 134) and if you purchase 1 of the item off me a single row is also generated. (The players can set whatever price they want for the items, so item x might sell for $100 from one player and $5000 from another.)
This table is quite large, recieving in excess of 50000 new transactions daily and presently contains about 7gb of data.
Now, what we are currently doing with this information is generating mean pricing data using the mysql sum() function. While this works fine, it does not give an acurate picture of the games economy, so what we really need is median data.
How I would normally go about this. (Presuming 1 item per row sold) (Psudo code)
Foreach item
select todays sales for the item from the db ordering by price
get number of results returned
if odd (results)
get middle price
else if even (results)
get mean price of middle 2
end if
end foreach
Simple enough, although it requires a few too many querys for my liking.
BUT since each row contains multiple purchase of the individual items what I think I would have to do is pull each item into a big array (Every element in the array representing a single item sold) sorting that array and splitting it up there to find the median entrys.
I can see this as being a problem as I cant imagine this would be very fast or very effeciant memory wise.
Can anyone else think of an alternate solution?
Please note, we have no control over how the information is stored, but we can install UDFs if need be.
Regards,
oeb
Hello, I have a weird little issue and I was wondering what is the best way to go about this.
I am working with two tables in a database. One it a list of items (Few thousand rows) and the other a list of transactions (Of the various items). This is the logging system for an online games mall.
Each transaction represents a purchace of 1 or more items at price x (For example if you buy 32 of item x from me at $134 it will appear as one transaction (With a quantity of 32 and a price of 134) and if you purchase 1 of the item off me a single row is also generated. (The players can set whatever price they want for the items, so item x might sell for $100 from one player and $5000 from another.)
This table is quite large, recieving in excess of 50000 new transactions daily and presently contains about 7gb of data.
Now, what we are currently doing with this information is generating mean pricing data using the mysql sum() function. While this works fine, it does not give an acurate picture of the games economy, so what we really need is median data.
How I would normally go about this. (Presuming 1 item per row sold) (Psudo code)
Foreach item
select todays sales for the item from the db ordering by price
get number of results returned
if odd (results)
get middle price
else if even (results)
get mean price of middle 2
end if
end foreach
Simple enough, although it requires a few too many querys for my liking.
BUT since each row contains multiple purchase of the individual items what I think I would have to do is pull each item into a big array (Every element in the array representing a single item sold) sorting that array and splitting it up there to find the median entrys.
I can see this as being a problem as I cant imagine this would be very fast or very effeciant memory wise.
Can anyone else think of an alternate solution?
Please note, we have no control over how the information is stored, but we can install UDFs if need be.
Regards,
oeb