Page 1 of 1

Tracking Database for Photographers [Design/Setup Questions]

Posted: Thu May 23, 2013 12:07 pm
by Skara
I have a lot of experience developing websites, but not much developing larger systems with many users. So I'd really like some advice in general database structure for a site consisting of many users and much more data (such as a forum). Currently I'm using MySQL via PHP's MySQLi extension.

:idea: The site I'm working on is a database for photographers/filmmakers to store/create information such as: projects, freelancers, equipment, invoices, expenses, etc. It's basically just a lot of tables with a lot of cross-referenced data. What makes it awesome is only that it very specifically relates to the needs of the user while remaining highly customizable.

I would say that my tables and queries are both pretty well structured, but I don't know much about what that means for a large-user system. I know forums like phpbb and such use their own database class, but I'm not familiar how this might (or might not) improve speed. I also know that forums tend to slow down after a few hundred thousand entries. Which might be an issue here.

To give a small example of the current setup, the system is more-or-less divided very simply as so:
companies: main account
users: each company will probably have ~1-10 users
expense-types: ~2-10 custom expense types per company with additional data such as a custom icon, description, etc
expenses: ~1000-10000 (?) expenses per company
project-types: ~2-20 custom project types
projects: ~10-1000 projects per year
project-expenses: ~10-500 expenses per project
project-[many other tables]: ~10-500 additional entries per project

The gist of which is that there will be a huge amount of data. Right now I have a handful of accounts with a fraction of the above data and everything seems to work fine. But I'm worried about extending this to a dozen accounts and, later, up to a couple hundred or more.

:?: So, what I'm asking is this: With this amount of data involved, what are the most important things I need to know? I would assume MySQL could handle it, structured properly, but am I wrong? What kind of basic optimization/structure do I need to watch out for past auto_increment IDs and cross reference tables?

Also, if anyone has any advice past database structure, that'd be helpful too. :)

I'd really appreciate some direction in this. Thanks!!

Re: Tracking Database for Photographers [Design/Setup Questi

Posted: Thu May 23, 2013 1:22 pm
by Eric!
I've found that mysql can often really slow a large site down (100k users and 15G db). It helps to learn the bottlenecks in your queries by SHOW PROFILE. Here's a starting point.

Re: Tracking Database for Photographers [Design/Setup Questi

Posted: Thu May 23, 2013 2:43 pm
by Christopher
Skara wrote:To give a small example of the current setup, the system is more-or-less divided very simply as so:
companies: main account
users: each company will probably have ~1-10 users
expense-types: ~2-10 custom expense types per company with additional data such as a custom icon, description, etc
expenses: ~1000-10000 (?) expenses per company
project-types: ~2-20 custom project types
projects: ~10-1000 projects per year
project-expenses: ~10-500 expenses per project
project-[many other tables]: ~10-500 additional entries per project
From a purely design perspective, all that matters is whether the relation is 1:1, 1:N or N:N. The actual numbers don't matter.
Skara wrote:The gist of which is that there will be a huge amount of data. Right now I have a handful of accounts with a fraction of the above data and everything seems to work fine. But I'm worried about extending this to a dozen accounts and, later, up to a couple hundred or more.

:?: So, what I'm asking is this: With this amount of data involved, what are the most important things I need to know? I would assume MySQL could handle it, structured properly, but am I wrong? What kind of basic optimization/structure do I need to watch out for past auto_increment IDs and cross reference tables?

Also, if anyone has any advice past database structure, that'd be helpful too. :)

I'd really appreciate some direction in this. Thanks!!
At the most basic, you want to keep row lengths as small as possible, index columns that you are selecting on so the database does not have to search the whole table, and fetch only the columns you need. Other than that, MySQL is excellent at tables with millions of records. MySQL also provides so tools to help optimize queries and log slow queries. After that you can scale vertically with database clusters or horizontally with replication. But remember that Premature Optimization can cause more problems that it helps. Keep the design solid and then deal with performance problems when you actually have them.

Re: Tracking Database for Photographers [Design/Setup Questi

Posted: Thu May 23, 2013 3:39 pm
by Skara
Awesome, great advice.

@Eric!
Didn't know about SHOW PROFILE. That's awesome.


@Christopher
I'm not quite sure what you mean by 1:1, 1:N, and N:N. I get the relationships, but how might this affect performance? Most of my data is 1:N with some 1:1, though I have a few cross reference tables so I can grab 1:N either way without dealing directly with N:N. ... If that makes sense.

Also, good advice with premature optimization, and I didn't even think about scaling.

Thanks again. If anyone else has advice, I'd love to hear it. :)

Re: Tracking Database for Photographers [Design/Setup Questi

Posted: Thu May 23, 2013 5:34 pm
by mecha_godzilla
Just to add to that good advice, watch out for what your joins are doing and be careful not to end up with too many indexes - as Christopher has already said, indexes are important for columns you're selecting and I think I'm right in saying that this also applies with joins as well, but the law of diminishing returns also means that too many indexes will cause slowdowns (at least, that's my experience) because of the need to cross-reference multiple indexes. You also have to watch out when you're using LIMIT and ORDER in your queries as well, although LIMIT is the main culprit. Keep your queries as simple as possible and try not to nest them if you can help it, because if you get this wrong the queries will take forever to process.

Also (just as a basic example) consider what performance gains you might get from the first query over the second if you just need to know the total of photographs stored in the database for a particular photographer:

Code: Select all

SELECT COUNT(photo_id) AS total FROM photos WHERE photographer = '1';

Code: Select all

SELECT photo_id FROM photos WHERE photographer = '1';
Finally, if you have a lot of lookup tables that are of a fixed size and unlikely to change very often, you could preload them into MEMORY tables - although MEMORY tables are not particularly suitable for most applications, this is one area where they can perform really well if structured correctly.

HTH,

M_G

Re: Tracking Database for Photographers [Design/Setup Questi

Posted: Thu May 23, 2013 6:55 pm
by Christopher
mecha_godzilla wrote:but the law of diminishing returns also means that too many indexes will cause slowdowns (at least, that's my experience) because of the need to cross-reference multiple indexes.
Agreed. I think indexes mainly cause slowdowns for INSERTs because the table needs to be re-indexed. More indexes means slower re-indexing.

Re: Tracking Database for Photographers [Design/Setup Questi

Posted: Fri May 24, 2013 5:03 pm
by pickle
Skara wrote:I know forums like phpbb and such use their own database class, but I'm not familiar how this might (or might not) improve speed. I also know that forums tend to slow down after a few hundred thousand entries. Which might be an issue here.
USE A DATABASE ABSTRACTION CLASS!!

There's a reason every application worth it's salt uses one, it makes your life/code much, much easier. Technically it will have an affect on speed, but that affect will be so low as to generally be negligible.

Re: Tracking Database for Photographers [Design/Setup Questi

Posted: Tue May 28, 2013 12:42 pm
by Skara
Awesome. This is all great clarification/advice.

I have a related question. In general, is it better to have fewer columns / shorter rows and SUM() (or whatever) rows from another table or the reverse? Err.. In other words... In a simple case, I have a list much like the following that I want to generate:
InvoiceID, Date, CompanyBilled, Total.
Would it be better to SUM() the total from an `invoice_rows` table on generating the list or to sum the data when saving the invoice to the `invoices` table as an additional column for easier (quicker?) access? Of course, in practice, many of my tables are much more complex, but I'd like to know the general rule to follow.

Thanks again!

Re: Tracking Database for Photographers [Design/Setup Questi

Posted: Tue May 28, 2013 12:50 pm
by pickle
I usually tie the most work to the action that happens the least - to reduce the clock cycles dedicated to the task.

In this case, I would sum the total and store it. The invoice will only be generated once, but the query could be run multiple times. So you really have a choice whether to do the summing work once on write, or once each read.