Page 1 of 1

better database design

Posted: Sun Oct 14, 2007 4:41 am
by arjan.top
feyd | Please use

Code: Select all

,

Code: Select all

and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read:  [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url] to learn how to do it too.[/color]


I am writing page statistic script and I have samo questions:

Surrent mysql db design is:
visit_id (unique id for each visitor) | page_id | ip | browser | op_system | etc.. | pagevisit (1 on first page view) | bounce (1 if more than one page viewed) | timeonpage (time in seconds) | time (timestap of pageview)

Row is inserted for each pageview

Example:
1234 | page_1 | 123.123.123.123 | Firefox | etc... | 1 | 0 | 210 | 1187189415
1234 | page_1 | 123.123.123.123 | Firefox | etc... | 0 | 0 | 0 | 1187189430

Why I need inserts for each view?
Because i need url and title of pageview

Currently I create new tables based on timestap (so there are tables that are small enaugh):

timestap: 1187189415
so table is: stats_1187

From one table I get data with this query:

[syntax="sql"]SELECT 
	browser,
	SUM(pagevisit) AS visits,
	ROUND(COUNT(*) / SUM(pagevisit), 2) AS ppv,
	TRUNCATE((SUM(timeonpage) / SUM(pagevisit)), 0) AS time,
	ROUND(((SUM(pagevisit) - SUM(bounce)) / SUM(pagevisit)) * 100, 0) AS bounce
FROM estats_1180
GROUP BY browser
ORDER BY visits DESC
Of course this query can get very slow (14sec on athlon 64 3000 with 1 million entries)



My question is: Is there any better solution for this problem ? (and I am shure there is one )

The second one is how to get data from multiple tables? query for each one and then "merge" in php? (user can select any date range)

Yes there is a lot to read but I hope someone would know how to help.


feyd | Please use[/syntax]

Code: Select all

,

Code: Select all

and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read:  [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url] to learn how to do it too.[/color]

Posted: Sun Dec 02, 2007 10:46 am
by arjan.top
Now I tryed multiple tables

First table:

Code: Select all

CREATE TABLE `nstats_inno_1` (
  `id` int(11) NOT NULL auto_increment,
  `page_id` tinyint(3) NOT NULL,
  `ip` int(10) NOT NULL,
  `visit_number` tinyint(3) NOT NULL,
  `visit_unique` tinyint(1) NOT NULL,
  `visit_bounce` tinyint(1) NOT NULL,
  `timeonpage` smallint(6) NOT NULL,
  `timestap` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB
Second table:

Code: Select all

CREATE TABLE `nstats_inno_2` (
  `id` int(11) NOT NULL auto_increment,
  `browser_name` varchar(20) collate utf8_unicode_ci NOT NULL,
  `browser_version` varchar(20) collate utf8_unicode_ci NOT NULL,
  `os_name` varchar(20) collate utf8_unicode_ci NOT NULL,
  `os_version` varchar(20) collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `browser_name` (`browser_name`),
  KEY `browser_version` (`browser_version`)
) ENGINE=InnoDB
And the SQL to get data:

Code: Select all

SELECT SQL_NO_CACHE
	t2.browser_name,
	COUNT(t1.id) as visits,
	SUM(t1.visit_number) / COUNT(t1.id) as ppvm,
	SUM(timeonpage) / COUNT(t1.id) as timeonpage,
	SUM(t1.visit_bounce) / COUNT(t1.id) * 100 as bounce
FROM nstats_inno_1 as t1 LEFT JOIN nstats_inno_2 as t2 ON t1.id=t2.id
GROUP BY t2.browser_name
ORDER BY visits DESC
Now the query is faster (4s with 400.000 entries), but the problem is increased number of query-s to insert data into tables /these two tables are only the start, I will need more data in the db, so more tables to keep the tables small)

Still searching for the best solution :)

Posted: Sun Dec 02, 2007 11:09 am
by feyd
Normalized tables, by nature, require multiple insertions. However they help remove redundancies. The time required to do things is often a balance. The is a balance of time needed to create sufficient entries versus time needed to search for those entries. Since viewing often takes priority in our line of work, it's a necessary evil.

Posted: Tue Dec 04, 2007 6:39 pm
by dancaragea
I don't think that you solved anything by splitting the first table into 2 like you did.
If you wanted to normalize the db you should have created a separate table for browsers like:

Code: Select all

browser_id tinyint(2)
browser_name varchar(20)
and then the main table would hold the browser_id instead of browser_name for each row.
To select the browser_name then you would do something like

Code: Select all

select a.ip,b.browser_name from first_table a, browsers b where a.browser_id=b.browser_id
Do the same with the rest of repeating data (if any).
Also, the ip address should be stored as an int (not exactly clear from your example how you store it).
All this to minimize the memory eaten by the query because there's a temporary table involved due to that descending order and you want to have that table created in memory.

Then you could play with the indexes a little but post some results and your new tables and I'll see if I can help you more.

Posted: Wed Dec 05, 2007 4:09 am
by arjan.top
dancaragea wrote: I don't think that you solved anything by splitting the first table into 2 like you did.
I did something, query is faster hehe :)
dancaragea wrote: If you wanted to normalize the db you should have created a separate table for browsers like:

Code: Select all

browser_id tinyint(2)
browser_name varchar(20)
haven't thought of that, this would complicate efverything but speed is what I am looking for (don't we all :P )
dancaragea wrote: Also, the ip address should be stored as an int (not exactly clear from your example how you store it).
ip is stored as int as you can see (first table)
dancaragea wrote: Then you could play with the indexes a little but post some results and your new tables and I'll see if I can help you more.
will do :)

Posted: Sun Dec 23, 2007 11:18 am
by arjan.top
@dancaragea:

Tested yours suggestion but it is slower. I have primary key on id's and index on GROUP BY columns.
Will do some more testing when I have time ...

Posted: Tue Dec 25, 2007 6:50 am
by arjan.top
new tests, new results :)

Both with one million inserts:
old "design": 6-7 s
browsers in separate table: 4-5s

So I have a new db design to work on :)

Posted: Tue Dec 25, 2007 9:24 am
by arjan.top
me again :D

so i have this query:

Code: Select all

SELECT
	t3.browser_name,
	COUNT(t1.id) as visits,
	SUM(t1.visit_number) / COUNT(t1.id) as ppvm,
	SUM(timeonpage) / COUNT(t1.id) as timeonpage,
	SUM(t1.visit_bounce) / COUNT(t1.id) * 100 as bounce
FROM nstats_inno2_1 as t1 INNER JOIN (nstats_inno2_2 as t2, nstats_browser_name as t3)
ON (t1.id=t2.id AND t2.browser_id=t3.id)
GROUP BY t3.browser_name
ORDER BY visits DESC
As you can see I have groupby on t3 field browser_name ...
So the question is should i create index on browser_name in t3 (which is joined wit t2)?
in t3 there is just list of browser names and id's for join

EDIT: and should i make t3 myisam or innoDB?