I am keeping stats for Google Adwords and every click is related to an advertiser, campaign and adgroup. In the database, I already have all of these as tables including some details about them like specific landing URI's and such. Currently the clicks table looks like this:
id
source
created
keyword
advertiser
campaign
adgroup
I'm wondering if I should remove the advertiser, campaign and adgroup columns because they're redundant as I could use JOINs to get that data, although that might slow down the query(?). Would you keep the current structure or opt for removing the columns and why?
Duplicate data for added speed?
Moderator: General Moderators
Re: Duplicate data for added speed?
This is the classic "what is the entity?" question. Is the advertiser (campaign/adgroup) a property of the "click" or are they entities in their own right? Depending on the focus of your database, either could conceivably be true. If you will be recording other properties of any of those, they are entities and must be in their own tables.nutkenz wrote:I am keeping stats for Google Adwords and every click is related to an advertiser, campaign and adgroup. In the database, I already have all of these as tables including some details about them like specific landing URI's and such. Currently the clicks table looks like this:
id
source
created
keyword
advertiser
campaign
adgroup
I'm wondering if I should remove the advertiser, campaign and adgroup columns because they're redundant as I could use JOINs to get that data, although that might slow down the query(?). Would you keep the current structure or opt for removing the columns and why?
My bias is to always treat such things as entities (which they are in the real world) and place them in separate tables, regardless of whether I think I need to record their properties, because:
* Eliminates slightly misspelled entries, which will distort summaries;
* Reduces database size;
* Provides flexibility in case I later decide to record properties (advertiser contact data, e.g.).
Unless your server is flooded with thousands of simultaneous queries, you will notice no measurable performance penalty.
Re: Duplicate data for added speed?
They are entities and will remain entities. The thing is that they are very closely associated with the clicks, and I'd have to do a join virtually every time if I do not keep a column for them in the clicks table. I often search the clicks table for entries with a certain adgroup or campaign as well.califdon wrote:This is the classic "what is the entity?" question. Is the advertiser (campaign/adgroup) a property of the "click" or are they entities in their own right? Depending on the focus of your database, either could conceivably be true. If you will be recording other properties of any of those, they are entities and must be in their own tables.nutkenz wrote:I am keeping stats for Google Adwords and every click is related to an advertiser, campaign and adgroup. In the database, I already have all of these as tables including some details about them like specific landing URI's and such. Currently the clicks table looks like this:
id
source
created
keyword
advertiser
campaign
adgroup
I'm wondering if I should remove the advertiser, campaign and adgroup columns because they're redundant as I could use JOINs to get that data, although that might slow down the query(?). Would you keep the current structure or opt for removing the columns and why?
My bias is to always treat such things as entities (which they are in the real world) and place them in separate tables, regardless of whether I think I need to record their properties, because:
* Eliminates slightly misspelled entries, which will distort summaries;
* Reduces database size;
* Provides flexibility in case I later decide to record properties (advertiser contact data, e.g.).
Unless your server is flooded with thousands of simultaneous queries, you will notice no measurable performance penalty.
Re: Duplicate data for added speed?
There's your answer. Every entity must be in its own table, according to the inventor of relational databases, Dr. E. F. Codd. Sure, do a join every time. That's what relational database technology is all about. Unless your tables exceed hundreds of millions or rows, believe me, there will be no significant slowing of queries. The response time is mainly associated with how many rows are returned, regardless of whether they are joined or coming from a single table. In a MySQL user group meeting a week or so ago, one audience member asked the group what was the "largest" MySQL database they had ever worked with. The highest number mentioned was "about 2 billion rows in the largest table." I'd suggest that you don't worry about response time.nutkenz wrote:They are entities and will remain entities. The thing is that they are very closely associated with the clicks, and I'd have to do a join virtually every time if I do not keep a column for them in the clicks table. I often search the clicks table for entries with a certain adgroup or campaign as well.