Settings

This forum is not for 'how-to' coding questions but for PHP theory; it is here for those of us who wish to learn about the design aspects of programming with PHP.

Moderator: General Moderators

John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

Typically my way of handling application wide configuration goes along the lines of

Code: Select all

class applicationSettings
{
   // PHP 4 style constructor: warm the session cache once per session
   function applicationSettings() {
      if (!$this->isSettingsSet()) {
         $this->loadSettings();
      }
   }

   function loadSettings() {
      // load settings via file? db?
      // load into session configuration
   }

   function isSettingsSet() {
      return isset($_SESSION['config']);
   }

   function getSetting($setting) {
      return $_SESSION['config'][$setting];
   }
}
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

I don't understand: if you load it into the SESSION, people without sessions (cookies disabled) won't have the caching enabled, and you'll have multiple copies of the configuration floating around...
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

Ambush Commander wrote:I don't understand: if you load it into SESSION, people without sessions (cookie disabled) won't have the caching enabled
Point taken. The applications I write generally require cookies to be enabled for logins and such anyway.
What would you recommend for caching? Or are you saying to load it fresh from the database on each call?
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Files. Or better yet, dynamically generated PHP files.
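One way to read the "dynamically generated PHP files" suggestion is a sketch like the following (file names and function names are illustrative, not necessarily what Ambush Commander had in mind): regenerate the cache file whenever settings change, then simply include it on every request.

```php
<?php
// Rebuild the cache file whenever settings change (e.g. after an admin
// edit); every subsequent request just includes it.
function writeConfigCache($config, $cacheFile)
{
    // var_export() emits the array as valid PHP source, so the cache
    // file parses (and opcode-caches) like any other script.
    $php = "<?php\nreturn " . var_export($config, true) . ";\n";
    file_put_contents($cacheFile, $php);
}

function readConfigCache($cacheFile)
{
    // include returns the value of the file's return statement
    return include $cacheFile;
}

// Example round trip.
$settings = array('site_name' => 'Example', 'items_per_page' => 25);
writeConfigCache($settings, '/tmp/config.cache.php');
$loaded = readConfigCache('/tmp/config.cache.php');
```

Because the generated file is ordinary PHP, an opcode cache can keep it in memory, which is what makes this approach cheap per request.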
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

This is where polymorphism comes into play: why not do some preliminary checks on the cookie status to determine if caching is a viable option?
Caching is far better than re-opening files on a per-page basis or, as the case may be, querying a database.
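A sketch of that "preliminary check" idea: if the client sent the session cookie back, the $_SESSION cache is usable; otherwise fall back to loading the configuration directly. Here `loadConfigFromSource()` is a hypothetical stand-in for the real file or database read.

```php
<?php
// Hypothetical stand-in for the real file/db read.
function loadConfigFromSource()
{
    return array('site_name' => 'Example');
}

function loadConfig()
{
    // The client only has the session cookie if sessions worked on a
    // previous request, so its presence is a cheap viability check.
    if (isset($_COOKIE[session_name()])) {
        session_start();
        if (!isset($_SESSION['config'])) {
            $_SESSION['config'] = loadConfigFromSource();
        }
        return $_SESSION['config'];
    }
    // No cookie support detected: skip the session cache entirely.
    return loadConfigFromSource();
}

$config = loadConfig();
```

This keeps the caching decision in one place instead of scattering cookie checks through the application.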
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

You do realize that PHP's sessions use the filesystem...
By default, all data related to a particular session will be stored in a file in the directory specified by the session.save_path INI option. A file for each session (regardless of if any data is associated with that session) will be created. This is due to the fact that a session is opened (a file is created) but no data is even written to that file. Note that this behavior is a side-effect of the limitations of working with the file system and it is possible that a custom session handler (such as one which uses a database) does not keep track of sessions which store no data.
Unless you're using database-driven sessions, but the filesystem is faster than the database.
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Post by Roja »

Ambush Commander wrote:Unless you're using database driven sessions, but filesystem is faster than database.
This is a common thought ("FS > DB"), and I'm very glad to say it's just as often not true on hosting machines. Several factors come into play.

First and foremost is the fact that the vast majority of sites are running on shared hosts. Which in turn are running on an IDE drive. Which in turn are limited to a single request at a time. As a result, on a busy shared host, databases will often be faster, because they have a larger in-memory cache (which avoids file hits).

Second is the fact that databases are ideal for random access, while files are generally poor for it. Because you have to seek to the location of the data in the file, you waste time. Databases are designed for it. Since config values (done individually) are rather random-seek in nature, the advantage is usually on databases.

Third, and perhaps the most fun, is the fact that databases are generally optimized for website use, while hard drives rarely are, if for no other reason than size and scope. A phpBB install is ~3-5 MB. A hosting machine's OS can be multiple GIGS. That's a huge amount of seek time.

As a general statement, on your home PC, yes, a filesystem will often beat the DB. However, on a shared host, set up properly, the db will be extremely competitive, and win in many situations.

I ran a hosting company... we spent quite a bit of time making sure that was the case, which meant happy customers.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

Ambush Commander wrote:You do realize the PHP's sessions use the filesystem...
By default, all data related to a particular session will be stored in a file in the directory specified by the session.save_path INI option. A file for each session (regardless of if any data is associated with that session) will be created. This is due to the fact that a session is opened (a file is created) but no data is even written to that file. Note that this behavior is a side-effect of the limitations of working with the file system and it is possible that a custom session handler (such as one which uses a database) does not keep track of sessions which store no data.
Unless you're using database driven sessions, but filesystem is faster than database.
Roja wrote: Second is the fact that databases are ideal for random access, while files are generally poor for it. Because you have to seek to the location of the data in the file, you waste time. Databases are designed for it. Since config values (done individually) are rather random-seek in nature, the advantage is usually on databases.
That's sort of what I was getting at: using PHP's internal session file handling rather than managing the files yourself. I'm not quite sure whether PHP's session file handling is quicker or not, but I intend to investigate in the near future. Unless anyone knows?
Roja wrote:First and foremost is the fact that the vast majority of sites are running on shared hosts. Which in turn are running on an IDE drive. Which in turn are limited to a single request at a time. As a result, on a busy shared host, databases will often be faster, because they have a larger in-memory cache (which avoids file hits).
Good to know.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Roja wrote:This is a common thought ("FS > DB"), and I'm very glad to say its just as often not true on hosting machines. Several factors come into play.
I've actually never heard this before. Very interesting.
Roja wrote:First and foremost is the fact that the vast majority of sites are running on shared hosts. Which in turn are running on an IDE drive. Which in turn are limited to a single request at a time. As a result, on a busy shared host, databases will often be faster, because they have a larger in-memory cache (which avoids file hits).
That's a funny thing, because if file hits were so expensive, then why not load everything straight from the database? I find it hard to believe that this factor would be so great (versus a call to an external resource).
Roja wrote:Second is the fact that databases are ideal for random access, while files are generally poor for it. Because you have to seek to the location of the data in the file, you waste time. Databases are designed for it. Since config values (done individually) are rather random-seek in nature, the advantage is usually on databases.
A PHP file that contains all configuration probably is faster. PHP handles the parsing and sometimes even the opcode caching, and it goes straight to an in-memory representation. Granted, loading only the core configuration, and then loading other blocks of config as you need it probably should use a database, since you've segmented it, but IMHO it is a lot easier to load all configurations in one swoop.
Roja wrote:Third, and perhaps the most fun, is the fact that databases are generally optimized for website use, while harddrive rarely are - if for no other reason than size and scope. A phpbb install is ~3-5 mb. A hosting machine OS can be multiple GIGS. Thats a huge amount of seek time.
I won't believe it til I see it. ;) It probably comes down to your hosting environment: there is no catch all solution, it depends on your hardware and software. :)
John Cartwright wrote:Thats sort of what I was getting at, since using the internal file structure of php to handle the files rather than doing it yourself. I'm not quite sure if php session file handling is quicker or not, but I tend to investigate in the near futur. Unless anyone knows?
It's all dependent on your machine. Test it yourself.

Although... you haven't answered my other question: if each session gets its own copy of the configuration and you have, say, 1000 users, that's 1000 copies of the configuration? (Unless you're talking about user-specific configuration.)
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Post by Roja »

Ambush Commander wrote:That's a funny thing, because if file hits were so hard to read, then why not load everything straight from the database? I find it hard to believe that this factor would be so great (versus a call to an external resource).
Because file reads are better at large quantities of sequential data. That's what they do well. Databases, on the other hand, are ideal for small amounts of mostly random (yet related) data. So, if you have a large amount of data ("loading everything"), then you'd favor a file read. If you had a few mostly random items (say, a config file with ~300 variables), then you probably would do better on a db.
Ambush Commander wrote:A PHP file that contains all configuration probably is faster. PHP handles the parsing and sometimes even the opcode caching, and it goes straight to an in-memory representation. Granted, loading only the core configuration, and then loading other blocks of config as you need it probably should use a database, since you've segmented it, but IMHO it is a lot easier to load all configurations in one swoop.
It depends on the size, and the load of the server, but yes, you have the right idea. Smaller amounts of data = db win, larger amounts = file win.
Ambush Commander wrote:I won't believe it til I see it. ;) It probably comes down to your hosting environment: there is no catch all solution, it depends on your hardware and software. :)
Oh I agree totally on that. Many hosts are very poorly configured, have horrible db settings, have no filesystem optimization done, and so forth. But in my experience, if done well, a db will often give better results than a file pull for a pretty reasonable spectrum of data sizes.
Ambush Commander wrote:Although... you haven't answered my other question, and that's if each session gets its own copy of the configuration, and you have, say 1000 users, that's 1000 copies of the configuration? (unless you're talking user-specific configuration).
Memory-wise, it can be, yes.

Basically, PHP loads the variables into memory for each script. If you have 300 variables that need to be accessed, that's 300 variables loaded on each script * the number of users.

Thankfully, the actual memory used by loading even 1,000 variables into memory in PHP is rather minor. I've loaded a 100,000-element array into memory and stayed under the 8 MB limit. As a result, 300 variables is practically a non-issue (performance-wise), even at 1,000 users.
BDKR
DevNet Resident
Posts: 1207
Joined: Sat Jun 08, 2002 1:24 pm
Location: Florida
Contact:

Post by BDKR »

All of the above said, what are some of the deciding factors for choosing to load all config vars into memory up front versus a lazy-loading approach? It would seem that there are cases for both. I'm just curious to hear how some of you are thinking on this topic.

Cheers
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Post by Roja »

BDKR wrote:All of the above said, what are some of the deciding factors for choosing to load all config vars into memory up front verse a lazy loading approach? It would seem that there are cases for both. I'm just curious to hear how some of you are thinking on this topic.
I can offer an answer about how I am doing it, and what my reasons are.

In the upcoming version of BNT, we wanted language editing to be trivially easy. In a previous version, we had a PHP file with embedded HTML (!), embedded variables, and placeholder text. Then we moved to 100+ files, separated for each page in the game, to reduce memory impact and load time.

What we've ended up with is a mix between the two. We have a single ini file with no embedded variables, HTML, or other non-language items. The ini file has categories, which correlate to the correct file in the game. Thus, in "main.php", we load all items from the category [main]. It's fairly straightforward.

At the 'creation' phase of the game, when the admin sets everything up, we load that ini file, and store the language variables in the database. Some files (categories) are larger than others, and some variables are used in multiple files.

During the play phase of the game, each page presenting text to a user does a database call, for the specific language categories needed. To date, the majority of the files use on average 3-4 categories, and approximately 50 lines of text.
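The category-per-page ini layout described above can be sketched with `parse_ini_file()` and its section support (the file contents and category names here are invented for illustration, not taken from BNT):

```php
<?php
// A toy language file: one [section] per game page.
$ini = <<<INI
[main]
welcome = "Welcome back"

[bank]
deposit = "Deposit credits"
INI;
file_put_contents('/tmp/lang.ini', $ini);

// The second argument (true) preserves the [section] structure, so
// main.php can pull just the [main] category instead of every string
// in the game.
$lang = parse_ini_file('/tmp/lang.ini', true);
$mainStrings = $lang['main'];
```

In Roja's setup the parsed sections are loaded into the database at game creation time, so at play time each page issues one query for its categories rather than re-parsing the ini file.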

We measured performance against a number of alternatives, and found this was the best solution. The alternatives included:

- One php language file, loaded on every page: Tremendously wasteful in processing time and in memory use. 2,000 variables defined for a page that only uses 20-50 doesn't make any sense.

- One ini language file, loaded into memory: A slight advantage over the php file, but again, wasteful for pages that don't need it.

- Multiple php language files, loaded on specific pages: While the processing time isn't terrible, when you approach 100 simultaneous users on an IDE-driven server, you see load increase rapidly due to the uncached, queueing disk seeks. Plus, the impact on translators was significant.

When we benchmarked the final choice (db-driven, ini-fed), I was a little shocked to see that it was somewhat better at the low-user count level, but fantastically better at the hundred-user count level. We're talking better than a 50% decrease in load!

It is definitely unique to an online game. Most websites (portals, news, blogs) are going to have roughly the same number of language strings on a given page. For our specific game, definitely not: we have one page with over 200 lines of text and one page with only four. It's just a completely different problem set.

So that's why we went with ini-file-fed, db-driven languages. As a side note, we saw even *more* significant gains by using adodb's caching features. Because the language files literally change only when an admin makes the change, we can cache them until that change occurs. That leads nicely into the discussion of why we are working to add a built-in translator tool (which would notify adodb of the change), but that's another topic. :)
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Re: Settings

Post by AKA Panama Jack »

Ree wrote:Where do you store settings of your applications - file or database? Of course, I want to allow users to edit those via web interface.
Well, storing configuration data in a database may sound like a good thing, but it has some serious problems when it comes to speed and host limits (i.e. queries per hour and bytes transferred between the database server and client).

The game I am working on (link in my signature) used to use a database for storing and retrieving the configuration information, but we have changed this in the current version we are alpha testing. The game has over 300 different configuration items. I performed a number of CPU load and performance tests and found that pulling all of the data from the database was incredibly slow compared to loading a configuration file as an include; it was between 10 and 15 times SLOWER. Loading from an include file also caused far less CPU load than pulling the data from the database table.

If you have only a handful of configuration items, then storing them in a database table will not make as noticeable a difference in speed or load, but it will always be slower, even if you are pulling one element from the table, because of the overhead in connecting to the database and retrieving that single element, even with caching.
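A minimal harness in the spirit of that comparison: generate a 300-variable config include and time repeated includes of it. The database half is omitted here since it needs a live server; the file name and loop counts are arbitrary, and numbers will vary by machine.

```php
<?php
// Generate a 300-variable config include, like the one in the test.
$lines = array('<?php');
for ($i = 0; $i < 300; $i++) {
    $lines[] = "\$config['var$i'] = 'value$i';";
}
file_put_contents('/tmp/bench_config.php', implode("\n", $lines));

// Time pulling the config in repeatedly, the way a busy page would.
$start = microtime(true);
for ($n = 0; $n < 1000; $n++) {
    $config = array();
    include '/tmp/bench_config.php';
}
$elapsed = microtime(true) - $start;
// Run the same loop issuing a SELECT of 300 rows to reproduce the
// include-vs-database gap on your own hardware.
```

As several posters note, the only trustworthy answer comes from running both halves on the hardware you actually deploy to.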
Last edited by AKA Panama Jack on Mon Nov 28, 2005 2:13 pm, edited 1 time in total.
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

Roja wrote:First and foremost is the fact that the vast majority of sites are running on shared hosts. Which in turn are running on an IDE drive. Which in turn are limited to a single request at a time. As a result, on a busy shared host, databases will often be faster, because they have a larger in-memory cache (which avoids file hits).
This is not quite true if the database server is handling a number of databases. A 128 MB read cache on a database server that is hosting multiple databases will be flushing data from the cache on a constant basis, and loading a file will be faster because of the lower overhead involved.
Roja wrote:Second is the fact that databases are ideal for random access, while files are generally poor for it. Because you have to seek to the location of the data in the file, you waste time. Databases are designed for it. Since config values (done individually) are rather random-seek in nature, the advantage is usually on databases.
Actually, that is a misconception as well. Databases are basically flat-file systems with indexing; you can even duplicate this type of database using pure PHP code. All databases have to seek to a location in the database file in a similar manner, and you can perform random seeking inside a stored text file using PHP, even building a fully indexed database system in pure PHP if you want. The speed advantage when seeking for a database like MySQL is that it communicates directly with the filesystem, while doing something similar in PHP means going through the interpreter layer to get to the filesystem. But even at that, loading a file that contains configuration data will ALWAYS be faster than loading configuration data from a database, even if it is only one element.
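The "random seeking inside a stored text file" point can be sketched with `fseek()` and a hand-rolled byte-offset index, a toy version of what database engines maintain internally (file name and record data are made up):

```php
<?php
// Write several records and remember the byte offset of each,
// i.e. a hand-rolled index.
$records = array('alpha', 'bravo', 'charlie');
$index = array();

$fp = fopen('/tmp/flatfile.dat', 'w');
foreach ($records as $i => $rec) {
    $index[$i] = ftell($fp);      // byte offset where this record starts
    fwrite($fp, $rec . "\n");
}
fclose($fp);

// Random access: jump straight to record 2 without scanning the file.
$fp = fopen('/tmp/flatfile.dat', 'r');
fseek($fp, $index[2]);
$value = rtrim(fgets($fp));
fclose($fp);
```

Real engines keep the index sorted (B-trees) and cache hot pages in memory, which is where most of their random-access advantage actually comes from.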
Roja wrote:As a general statement, on your home PC, yes, a filesystem will often beat the DB. However, on a shared host, setup properly, the db will be extremely competitive, and win in many situations.
I will guarantee you that isn't true for what you are talking about, and we host a lot of websites, as well as one of the largest trucking networks in the world. When it comes to loading configuration data and similar items, using a database is one of the worst things you can use for a highly active site, shared or dedicated.
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

Roja wrote:- Multiple php language files, loaded on specific pages: While the processing time isn't terrible, when you approach 100 simultaneous users on an IDE-driven server, you see load increase rapidly due to the uncached, queueing disk seeks. Plus, the impact on translators was significant.

When we benchmarked the final choice (db-driven, ini-fed), I was a little shocked to see that it was somewhat better at the low-user count level, but fantastically better at the hundred-user count level. We're talking better than a 50% decrease in load!
Our results were completely opposite. We used Microsoft's Web Application Stress Tool to simulate 100 users accessing the same web page. We ran one test with a page pulling 300 data elements from a database and populating variables from those elements. The second test loaded an include file of 300 variables (the same variables and data that would be created from the database table). The test.php program is exactly the same for both tests, except one loads the 300-variable config from an include file and the other loads it from a database table. These tests were performed on my local network, so there was absolutely no latency involved.

Here are our results.

300 Variable Config from include file (adodb lite)

Code: Select all

Report name:                  11/28/2005 3:11:43 PM
Run on:                       11/28/2005 3:11:43 PM
Run length:                   00:00:55

Web Application Stress Tool Version:1.1.293.1

Number of test clients:       1

Number of hits:               1203
Requests per Second:          40.10

Socket Statistics
--------------------------------------------------------------------------------
Socket Connects:              1206
Total Bytes Sent (in KB):     300.32
Bytes Sent Rate (in KB/s):    10.01
Total Bytes Recv (in KB):     1213.03
Bytes Recv Rate (in KB/s):    40.43

Number of threads:            100
Number of users:              200
Hit Count:                    1203

300 Variable Config from Database Table (adodb lite)

Code: Select all

Report name:                  11/28/2005 3:12:23 PM
Run on:                       11/28/2005 3:12:23 PM
Run length:                   00:00:55

Web Application Stress Tool Version:1.1.293.1

Number of test clients:       1

Number of hits:               654
Requests per Second:          21.80

Socket Statistics
--------------------------------------------------------------------------------
Socket Connects:              653
Total Bytes Sent (in KB):     162.61
Bytes Sent Rate (in KB/s):    5.42
Total Bytes Recv (in KB):     654.75
Bytes Recv Rate (in KB/s):    21.82

Number of threads:            100
Number of users:              200
Hit Count:                    654
As you can see, the server was able to process 1.8 times the number of pages using an include file instead of the database for loading the configuration.

Both of the above tests load a database abstraction layer (adodb_lite) and connect to the database. This is to prove that it is not the abstraction layer or connecting to the database causing the slowdown. The test was also performed using ADOdb and it was even slower on both tests.

The following shows the same tests running eAccelerator...

300 Variable Config from include file (adodb lite)

Code: Select all

Report name:                  11/28/2005 2:37:43 PM
Run on:                       11/28/2005 2:37:43 PM
Run length:                   00:00:55

Web Application Stress Tool Version:1.1.293.1

Number of test clients:       1

Number of hits:               3869
Requests per Second:          128.96

Socket Statistics
--------------------------------------------------------------------------------
Socket Connects:              3852
Total Bytes Sent (in KB):     959.24
Bytes Sent Rate (in KB/s):    31.97
Total Bytes Recv (in KB):     3859.05
Bytes Recv Rate (in KB/s):    128.62

Number of threads:            100
Number of users:              200
Hit Count:                    3869

300 Variable Config from Database Table (adodb lite)

Code: Select all

Report name:                  11/28/2005 3:03:33 PM
Run on:                       11/28/2005 3:03:33 PM
Run length:                   00:00:55

Web Application Stress Tool Version:1.1.293.1

Number of test clients:       1

Number of hits:               1543
Requests per Second:          51.43

Socket Statistics
--------------------------------------------------------------------------------
Socket Connects:              1543
Total Bytes Sent (in KB):     384.24
Bytes Sent Rate (in KB/s):    12.81
Total Bytes Recv (in KB):     1520.92
Bytes Recv Rate (in KB/s):    50.69

Number of threads:            100
Number of users:              200
Hit Count:                    1543
As you can see, the server was able to process 2.5 times the number of pages using an include file instead of the database for loading the configuration when an accelerator is used. The reason for the increased ratio of pages processed is that the accelerator is caching the files in memory.

The MySQL database server is using 8 and 16 MB buffers and caches, and the only thing the database server was processing during these tests was the tests themselves.

As you can see, using a database table to store and retrieve configuration data is very slow in comparison to using an include file. The above tests were also made retrieving 50 records, and the SAME difference in speed was observed.

Also, the server load was about 45%-50% HIGHER when using the database to retrieve the configuration in these tests.