Tracking Database for Photographers [Design/Setup Questions]

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
Skara
Forum Regular
Posts: 703
Joined: Sat Mar 12, 2005 7:13 pm
Location: US

Tracking Database for Photographers [Design/Setup Questions]

Post by Skara »

I have a lot of experience developing websites, but not much developing larger systems with many users. So I'd really like some advice in general database structure for a site consisting of many users and much more data (such as a forum). Currently I'm using MySQL via PHP's MySQLi extension.

:idea: The site I'm working on is a database for photographers/filmmakers to store/create information such as: projects, freelancers, equipment, invoices, expenses, etc. It's basically just a lot of tables with a lot of cross-referenced data. What makes it awesome is only that it very specifically relates to the needs of the user while remaining highly customizable.

I would say that my tables and queries are both pretty well structured, but I don't know much about what that means for a large-user system. I know forums like phpbb and such use their own database class, but I'm not familiar how this might (or might not) improve speed. I also know that forums tend to slow down after a few hundred thousand entries. Which might be an issue here.

To give a small example of the current setup, the system is more-or-less divided very simply as so:
companies: main account
users: each company will probably have ~1-10 users
expense-types: ~2-10 custom expense types per company with additional data such as a custom icon, description, etc
expenses: ~1000-10000 (?) expenses per company
project-types: ~2-20 custom project types
projects: ~10-1000 projects per year
project-expenses: ~10-500 expenses per project
project-[many other tables]: ~10-500 additional entries per project

The gist of which is that there will be a huge amount of data. Right now I have a handful of accounts with a fraction of the above data and everything seems to work fine. But I'm worried about extending this to a dozen accounts and, later, up to a couple hundred or more.

:?: So, what I'm asking is this: With this amount of data involved, what are the most important things I need to know? I would assume MySQL could handle it, structured properly, but am I wrong? What kind of basic optimization/structure do I need to watch out for past auto_increment IDs and cross reference tables?

Also, if anyone has any advice past database structure, that'd be helpful too. :)

I'd really appreciate some direction in this. Thanks!!
Eric!
DevNet Resident
Posts: 1146
Joined: Sun Jun 14, 2009 3:13 pm

Re: Tracking Database for Photographers [Design/Setup Questi

Post by Eric! »

I've found that mysql can often really slow a large site down (100k users and 15G db). It helps to learn the bottlenecks in your queries by SHOW PROFILE. Here's a starting point.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Tracking Database for Photographers [Design/Setup Questi

Post by Christopher »

Skara wrote:To give a small example of the current setup, the system is more-or-less divided very simply as so:
companies: main account
users: each company will probably have ~1-10 users
expense-types: ~2-10 custom expense types per company with additional data such as a custom icon, description, etc
expenses: ~1000-10000 (?) expenses per company
project-types: ~2-20 custom project types
projects: ~10-1000 projects per year
project-expenses: ~10-500 expenses per project
project-[many other tables]: ~10-500 additional entries per project
From a purely design perspective, all that matters is whether the relation is 1:1, 1:N or N:N. The actual numbers don't matter.
Skara wrote:The gist of which is that there will be a huge amount of data. Right now I have a handful of accounts with a fraction of the above data and everything seems to work fine. But I'm worried about extending this to a dozen accounts and, later, up to a couple hundred or more.

:?: So, what I'm asking is this: With this amount of data involved, what are the most important things I need to know? I would assume MySQL could handle it, structured properly, but am I wrong? What kind of basic optimization/structure do I need to watch out for past auto_increment IDs and cross reference tables?

Also, if anyone has any advice past database structure, that'd be helpful too. :)

I'd really appreciate some direction in this. Thanks!!
At the most basic, you want to keep row lengths as small as possible, index columns that you are selecting on so the database does not have to search the whole table, and fetch only the columns you need. Other than that, MySQL is excellent at tables with millions of records. MySQL also provides so tools to help optimize queries and log slow queries. After that you can scale vertically with database clusters or horizontally with replication. But remember that Premature Optimization can cause more problems that it helps. Keep the design solid and then deal with performance problems when you actually have them.
(#10850)
User avatar
Skara
Forum Regular
Posts: 703
Joined: Sat Mar 12, 2005 7:13 pm
Location: US

Re: Tracking Database for Photographers [Design/Setup Questi

Post by Skara »

Awesome, great advice.

@Eric!
Didn't know about SHOW PROFILE. That's awesome.


@Christopher
I'm not quite sure what you mean by 1:1, 1:N, and N:N. I get the relationships, but how might this affect performance? Most of my data is 1:N with some 1:1, though I have a few cross reference tables so I can grab 1:N either way without dealing directly with N:N. ... If that makes sense.

Also, good advice with premature optimization, and I didn't even think about scaling.

Thanks again. If anyone else has advice, I'd love to hear it. :)
User avatar
mecha_godzilla
Forum Contributor
Posts: 375
Joined: Wed Apr 14, 2010 4:45 pm
Location: UK

Re: Tracking Database for Photographers [Design/Setup Questi

Post by mecha_godzilla »

Just to add to that good advice, watch out for what your joins are doing and be careful not to end up with too many indexes - as Christopher has already said, indexes are important for columns you're selecting and I think I'm right in saying that this also applies with joins as well, but the law of diminishing returns also means that too many indexes will cause slowdowns (at least, that's my experience) because of the need to cross-reference multiple indexes. You also have to watch out when you're using LIMIT and ORDER in your queries as well, although LIMIT is the main culprit. Keep your queries as simple as possible and try not to nest them if you can help it, because if you get this wrong the queries will take forever to process.

Also (just as a basic example) consider what performance gains you might get from the first query over the second if you just need to know the total of photographs stored in the database for a particular photographer:

Code: Select all

SELECT COUNT(photo_id) AS total FROM photos WHERE photographer = '1';

Code: Select all

SELECT photo_id FROM photos WHERE photographer = '1';
Finally, if you have a lot of lookup tables that are of a fixed size and unlikely to change very often, you could preload them into MEMORY tables - although MEMORY tables are not particularly suitable for most applications, this is one area where they can perform really well if structured correctly.

HTH,

M_G
Last edited by mecha_godzilla on Thu May 23, 2013 7:14 pm, edited 1 time in total.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Tracking Database for Photographers [Design/Setup Questi

Post by Christopher »

mecha_godzilla wrote:but the law of diminishing returns also means that too many indexes will cause slowdowns (at least, that's my experience) because of the need to cross-reference multiple indexes.
Agreed. I think indexes mainly cause slowdowns for INSERTs because the table needs to be re-indexed. More indexes means slower re-indexing.
(#10850)
User avatar
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W
Contact:

Re: Tracking Database for Photographers [Design/Setup Questi

Post by pickle »

Skara wrote:I know forums like phpbb and such use their own database class, but I'm not familiar how this might (or might not) improve speed. I also know that forums tend to slow down after a few hundred thousand entries. Which might be an issue here.
USE A DATABASE ABSTRACTION CLASS!!

There's a reason every application worth it's salt uses one, it makes your life/code much, much easier. Technically it will have an affect on speed, but that affect will be so low as to generally be negligible.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
User avatar
Skara
Forum Regular
Posts: 703
Joined: Sat Mar 12, 2005 7:13 pm
Location: US

Re: Tracking Database for Photographers [Design/Setup Questi

Post by Skara »

Awesome. This is all great clarification/advice.

I have a related question. In general, is it better to have fewer columns / shorter rows and SUM() (or whatever) rows from another table or the reverse? Err.. In other words... In a simple case, I have a list much like the following that I want to generate:
InvoiceID, Date, CompanyBilled, Total.
Would it be better to SUM() the total from an `invoice_rows` table on generating the list or to sum the data when saving the invoice to the `invoices` table as an additional column for easier (quicker?) access? Of course, in practice, many of my tables are much more complex, but I'd like to know the general rule to follow.

Thanks again!
User avatar
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W
Contact:

Re: Tracking Database for Photographers [Design/Setup Questi

Post by pickle »

I usually tie the most work to the action that happens the least - to reduce the clock cycles dedicated to the task.

In this case, I would sum the total and store it. The invoice will only be generated once, but the query could be run multiple times. So you really have a choice whether to do the summing work once on write, or once each read.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
Post Reply