Large scale website - which concept/design to use?

y_itay
Forum Newbie
Posts: 2
Joined: Thu Apr 09, 2009 2:20 pm

Large scale website - which concept/design to use?

Post by y_itay »

I've been writing PHP for almost 3 years, and now I've decided to create a new site that is supposed to handle a lot of traffic.
The problem is that I'm not sure whether to use a framework like Yii Framework, which has really good performance, or to use Smarty + native PHP with the MVC concept.
What do you guys think?

Another question is about sessions. Since I've never had a lot of traffic, I never learned how to manage sessions for large-scale websites.
How exactly should I plan the session management? Should I save sessions in the database? What's the right concept for doing that?

My last question is about caching: how is cache management done for large websites like Facebook and MySpace? Do they use the database cache?

Thanks! 8)
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W

Re: Large scale website - which concept/design to use?

Post by pickle »

I've never used a framework, but I've stayed away from them because I don't think they're as efficient (as in the number of clock cycles to complete a particular task) as native code.

I wouldn't use Smarty as there are other, more efficient libraries out there. TemplateLite is a fork of Smarty which is much quicker. Lately I've gotten into Savant - which doesn't use a secondary template language, just native PHP code.

It depends on how long you want to keep session data. If it's just for as long as the user is currently on the site, then $_SESSION should work fine. Otherwise, cookies and a database are pretty much necessary.

Facebook has published information on particular aspects of their site - how they store images, etc. They use a modified version of memcache, so pages don't need to be interpreted for each request. Beyond that I'm not sure.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
y_itay
Forum Newbie
Posts: 2
Joined: Thu Apr 09, 2009 2:20 pm

Re: Large scale website - which concept/design to use?

Post by y_itay »

Thanks for your quick response.

I can't really use $_SESSION, because this site is going to be distributed over more than one server, and if I use DNS routing I can't manage the session on the server itself. Or is there a better way to create clusters for this site?

memcache sounds like a good solution. Does it cache everything in RAM, or does it use disk space as well?
kaisellgren
DevNet Resident
Posts: 1675
Joined: Sat Jan 07, 2006 5:52 am
Location: Lahti, Finland.

Re: Large scale website - which concept/design to use?

Post by kaisellgren »

As for templating languages, one could compile templates into native code in order to avoid any overhead.
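To make the idea of compiling templates concrete, here is a minimal sketch. It assumes a made-up `{$name}` placeholder syntax, and `compileTemplate()` and `renderCompiled()` are hypothetical names, not functions from Smarty or any other library. The template is translated to plain PHP once, and every later request just includes the compiled file.

```php
<?php
// Hypothetical compiler: turn "{$name}" placeholders into native PHP echo tags.
function compileTemplate(string $source): string
{
    return preg_replace(
        '/\{\$([A-Za-z_][A-Za-z0-9_]*)\}/',
        '<?php echo htmlspecialchars((string) ($vars[\'$1\'] ?? \'\'), ENT_QUOTES, \'UTF-8\'); ?>',
        $source
    );
}

// Render a compiled template file; $vars is visible to the included code.
function renderCompiled(string $compiledFile, array $vars): string
{
    ob_start();
    include $compiledFile; // plain PHP from here on, no template parsing per request
    return ob_get_clean();
}

// Compile once (normally done when the template changes), then render.
$file = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($file, compileTemplate('Hello, {$name}!'));
echo renderCompiled($file, ['name' => 'y_itay']); // prints "Hello, y_itay!"
unlink($file);
```

In a real setup the compiled file would be kept on disk and only rebuilt when the template source changes.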

Database caching is not worth it unless you have lots of memory reserved for your DBMS and query caching enabled. Usually file-based caching works more efficiently. I also recommend having a look at in-memory caches. APC, for instance.
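A rough sketch of file-based caching, assuming one serialized file per key with an expiry timestamp. `cacheSet()` and `cacheGet()` are hypothetical helpers written for illustration, and the temp directory is only a stand-in for a proper cache directory.

```php
<?php
// Hypothetical file cache: one serialized file per key, with an expiry time.
function cacheSet(string $dir, string $key, $value, int $ttl): void
{
    $payload = serialize(['expires' => time() + $ttl, 'value' => $value]);
    // LOCK_EX keeps two concurrent writers from interleaving their output.
    file_put_contents($dir . '/' . md5($key) . '.cache', $payload, LOCK_EX);
}

function cacheGet(string $dir, string $key)
{
    $file = $dir . '/' . md5($key) . '.cache';
    if (!is_file($file)) {
        return null; // cache miss
    }
    $entry = unserialize(file_get_contents($file));
    if (!is_array($entry) || $entry['expires'] < time()) {
        @unlink($file); // expired or unreadable: treat it as a miss
        return null;
    }
    return $entry['value'];
}

$dir = sys_get_temp_dir();
cacheSet($dir, 'front_page', '<h1>Hello</h1>', 60); // keep for 60 seconds
echo cacheGet($dir, 'front_page');
```

On a miss the caller regenerates the content and calls `cacheSet()` again, so the expensive work only happens once per expiry period.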

Database connections are pricey; make sure you open as few connections as possible (or even none, if possible). Make sure you are using views, and profile your queries as well as your whole application.

If you are using the built-in session system, it becomes very messy to make it work with multiple servers. Memcached is great in that respect. It allows you to spread data over multiple servers. If you are using just one server, you could even store session data directly in RAM. On my developer PC it is around 1100 times faster to fetch session data from RAM than from a database. The drawbacks of that scenario are pretty obvious: reboots erase session data, and too much session data may exhaust your allocated RAM.

I am not exactly sure about specific sites like MySpace or Facebook, but in very big websites the load can easily be balanced. For instance, when you try to load site.com/showimage.php?id=234234, instead of letting the "main" server send the image data to the client, you output HTTP redirect headers pointing to another server of yours that outputs the image data. You could even improvise and keep a log of current CPU usage on all servers, and then decide based on those details which server will handle the load.
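The redirect idea could look roughly like the sketch below. The host names and load figures are invented, and `pickLeastLoaded()` and `imageRedirectUrl()` are hypothetical helpers; how the CPU figures get collected is left out entirely.

```php
<?php
// Hypothetical: pick the image server that reports the lowest CPU load.
function pickLeastLoaded(array $cpuLoadByHost): string
{
    asort($cpuLoadByHost); // lowest load first, host names kept as keys
    return (string) array_key_first($cpuLoadByHost);
}

// Build the URL the "main" server would redirect the client to.
function imageRedirectUrl(array $cpuLoadByHost, int $imageId): string
{
    return 'http://' . pickLeastLoaded($cpuLoadByHost) . '/showimage.php?id=' . $imageId;
}

$load = ['img1.example.com' => 0.72, 'img2.example.com' => 0.31]; // made-up numbers
$url = imageRedirectUrl($load, 234234);

// On the "main" server one would then send:
//   header('Location: ' . $url, true, 302);
echo $url; // http://img2.example.com/showimage.php?id=234234
```

The main server only pays for a tiny redirect response, while the image bytes are served by whichever machine is least busy.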

Just my two cents.
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Large scale website - which concept/design to use?

Post by Christopher »

y_itay wrote:I can't really use $_SESSION, because this site is going to be distributed over more than one server, and if I use DNS routing I can't manage the session on the server itself. Or is there a better way to create clusters for this site?
You can easily use a central database or memcache server to hold all session data. There are articles about doing this all over the internet and examples in the manual. That is the standard way to distribute an application across multiple servers. Your application should not have to care about where the session data resides if you configure things properly.
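As one possible illustration of "configure things properly", the sketch below points PHP's session settings at a central memcached server. It assumes the PECL memcached extension is installed (it registers a `memcached` session save handler), and the host name is invented; `configureSharedSessions()` is a hypothetical wrapper, not a built-in function.

```php
<?php
// Hypothetical wrapper: store sessions on a central memcached server.
// Assumes the PECL "memcached" extension; returns false if it is missing.
function configureSharedSessions(string $host, int $port): bool
{
    if (!extension_loaded('memcached')) {
        return false; // fall back to the default file-based handler
    }
    // Must run before session_start() and before any output is sent.
    return ini_set('session.save_handler', 'memcached') !== false
        && ini_set('session.save_path', $host . ':' . $port) !== false;
}

// Usage on every web server in the cluster (host name is made up):
//   configureSharedSessions('sessions.example.internal', 11211);
//   session_start();
//   $_SESSION['user_id'] = 42; // application code stays the same
```

The same two settings can also go into php.ini, in which case the application code needs no changes at all.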
wei
Forum Contributor
Posts: 140
Joined: Wed Jul 12, 2006 12:18 am

Re: Large scale website - which concept/design to use?

Post by wei »

As already noted above, memcache is not persistent storage; it is indeed a cache. Session data can simply be stored in a distributed key-value store (a.k.a. a distributed hash table), such as http://project-voldemort.com/, which can handle about 10-20k requests per second per server; the main bottleneck is probably disk speed. If you need this scale of operations, you will face other problems beforehand.
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Large scale website - which concept/design to use?

Post by Christopher »

wei wrote:As already noted above, memcache is not persistent storage; it is indeed a cache.
How is a cache not persistent storage -- especially in the case of session data?
wei wrote:Session data can simply be stored in a distributed key-value store (a.k.a. a distributed hash table), such as http://project-voldemort.com/, which can handle about 10-20k requests per second per server; the main bottleneck is probably disk speed. If you need this scale of operations, you will face other problems beforehand.
memcache is memory-based, so disk speed is not a bottleneck -- that is the point of it.
kaisellgren
DevNet Resident
Posts: 1675
Joined: Sat Jan 07, 2006 5:52 am
Location: Lahti, Finland.

Re: Large scale website - which concept/design to use?

Post by kaisellgren »

Yup. The worst thing about Memcached's performance is the need for TCP/IP. Forget disk speeds, they are irrelevant...
wei
Forum Contributor
Posts: 140
Joined: Wed Jul 12, 2006 12:18 am

Re: Large scale website - which concept/design to use?

Post by wei »

memcache data is stored in memory. It uses a least recently used (LRU) scheme to remove blocks when its storage is full, so like most cache implementations it is designed as a volatile storage system. Depending on the application, session data is probably something that you do not want to disappear for some duration, i.e. you may want the session to persist for a long period. Also consider the case when a machine fails. The component failure rate is high when there are lots of servers. Replication strategies can be used to provide fault tolerance, something that memcache does not offer itself.
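To show why LRU eviction makes a cache volatile for session data, here is a toy LRU cache. It is only an illustration of the eviction behaviour, not how memcached is implemented internally, and `LruCache` is a made-up class.

```php
<?php
// Toy LRU cache: when full, the least recently used entry is dropped.
final class LruCache
{
    private array $items = [];
    private int $capacity;

    public function __construct(int $capacity)
    {
        $this->capacity = $capacity;
    }

    public function get(string $key)
    {
        if (!array_key_exists($key, $this->items)) {
            return null;
        }
        $value = $this->items[$key];
        unset($this->items[$key]);
        $this->items[$key] = $value; // re-insert to mark as most recently used
        return $value;
    }

    public function set(string $key, $value): void
    {
        unset($this->items[$key]);
        $this->items[$key] = $value;
        if (count($this->items) > $this->capacity) {
            // PHP arrays keep insertion order, so the first key is the oldest.
            unset($this->items[array_key_first($this->items)]);
        }
    }
}

$cache = new LruCache(2);
$cache->set('sess_a', 'alice');
$cache->set('sess_b', 'bob');
$cache->get('sess_a');           // touch sess_a
$cache->set('sess_c', 'carol');  // cache is full: sess_b is evicted
var_dump($cache->get('sess_b')); // NULL, that "session" is gone
```

A session that is still valid from the application's point of view can vanish this way as soon as the cache runs out of room.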

My comment regarding disk speed was about Project Voldemort, which provides persistence via disk storage.
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Large scale website - which concept/design to use?

Post by Christopher »

wei wrote:memcache data is stored in memory. It uses a least recently used (LRU) scheme to remove blocks when its storage is full, so like most cache implementations it is designed as a volatile storage system. Depending on the application, session data is probably something that you do not want to disappear for some duration, i.e. you may want the session to persist for a long period.
It really depends on how much memory the memcache server has as to how long it can go before being full. And since the removal of old session data can be controlled with PHP, you can tune how data is freed. I am assuming that for large-scale systems like this the price of memory is a small consideration.
wei wrote:Also consider the case when a machine fails. The component failure rate is high when there are lots of servers. Replication strategies can be used to provide fault tolerance, something that memcache does not offer itself.
If you are looking for fault tolerance in sessions, then storing them in a replicated database cluster may be an option. But there is also the possibility that session data could be lost if a database slave fails, given the short-lived nature of many sessions and the latency of replication. And you would pay a performance price for that fault tolerance. You could also have redundancy with memcache servers, so the system would continue to work even with a failure.
gregor171
Forum Newbie
Posts: 22
Joined: Thu Apr 16, 2009 5:09 pm
Location: Ljubljana, Slovenia

Re: Large scale website - which concept/design to use?

Post by gregor171 »

OK, I'm a newbie, but I have worked in a heavy-load environment.
First you have to determine what heavy traffic means for you and what your resources are (how many servers).
1. Using a database is nice and simple, but can be costly. As said, connections take time even if your queries are cached.
2. We used a file cache. It's a simple solution, but hard to control.
3. We used memcache for some parts of the system. It's a nice tool and can be put on a remote server, which is cool.

About performance issues in general there are many blogs and forums. I found a few cool ones:
Zend/PHP Conference 2007 had something about high-performance applications: http://devzone.zend.com/podcasts/zendconsessions
http://www.bostonphp.org had a session (OK, found it): Podcast - Designing, deploying and operating high-traffic PHP
http://www.bostonphp.org/content/view/114/9/

I tried Drupal the other day for my personal page, since the tests show that it's fast. There are some issues that I don't like, but it gave me fast results, like setting up a new site in a day. Using a reliable CMS is a good way to do rapid development, but it has its drawbacks.

Here is what I'm thinking of (correct me if I'm wrong). It would be useful if we didn't have memcache or enough RAM to work with.
If you use serialization and cache content as a class that extends a class Cacheable, which holds a private value $last_datetime, there would be only a small overhead on every request: checking whether the cache is obsolete and, if so, reloading new content.

These cacheable objects would be stored either on the filesystem or in a database (perhaps a dedicated one). But as you can see from those podcasts, there are many ways to achieve a high-performance website.
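One way to read the Cacheable idea above is sketched below. Only `$last_datetime` comes from the post; the class layout, `refreshIfStale()`, `load()` and the `NewsBox` example are assumptions made for illustration.

```php
<?php
// Hypothetical base class: remembers when its content was last loaded.
abstract class Cacheable
{
    private int $last_datetime = 0; // Unix time of the last load

    // Fetch fresh content (a DB query, a remote call, and so on).
    abstract protected function load(): void;

    // Reload only if the cached content is older than $maxAge seconds.
    public function refreshIfStale(int $maxAge): bool
    {
        if (time() - $this->last_datetime <= $maxAge) {
            return false; // still fresh, only the cheap check was paid
        }
        $this->load();
        $this->last_datetime = time();
        return true;
    }
}

final class NewsBox extends Cacheable
{
    public string $html = '';

    protected function load(): void
    {
        $this->html = '<ul><li>Generated at ' . date('H:i:s') . '</li></ul>';
    }
}

$box = new NewsBox();
$box->refreshIfStale(300);       // first call always loads
$blob = serialize($box);         // store this string in a file or a DB row

$restored = unserialize($blob);  // later request: read it back
var_dump($restored->refreshIfStale(300)); // bool(false), content is still fresh
```

The per-request overhead is the unserialize call plus one timestamp comparison; the expensive `load()` only runs when the content has expired.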
gregor171
Forum Newbie
Posts: 22
Joined: Thu Apr 16, 2009 5:09 pm
Location: Ljubljana, Slovenia

Re: Large scale website - which concept/design to use?

Post by gregor171 »

PS: for a heavy-traffic site, keep in mind that MySQL's official support covers up to 5,000 queries/sec, but I've seen it handle 50k q/s.
Heavy traffic is what we all want ;-)
wei
Forum Contributor
Posts: 140
Joined: Wed Jul 12, 2006 12:18 am

Re: Large scale website - which concept/design to use?

Post by wei »

arborint wrote:It really depends on how much memory the memcache server has as to how long it can go before being full. And since the removal of old session data can be controlled with PHP, you can tune how data is freed. I am assuming that for large-scale systems like this the price of memory is a small consideration.
That may become very messy considering multiple servers and configurations, probably not something to consider doing in php.
arborint wrote:You could also have redundancy with memcache servers so the system would continue to work even with a failure.
memcache is very basic, no replication (required for redundancy) by default.

Memcache is good for caching data, not for storage. There is also a memcache version that uses UDP instead of TCP, which reduces some overhead.

at 50k queries / sec, how many qps per node?

Also http://www.slideshare.net/mattetti/couc ... -pr0n-star
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Large scale website - which concept/design to use?

Post by Christopher »

wei wrote:That may become very messy considering multiple servers and configurations, probably not something to consider doing in php.
Or it may be a clean solution -- either is conjecture. The reality is that we know that sites like Facebook use memcache to scale.
wei wrote:Memcache is good for caching data, not for storage. There is also a memcache version that uses UDP instead of TCP, which reduces some overhead.

at 50k queries / sec, how many qps per node?

Also http://www.slideshare.net/mattetti/couc ... -pr0n-star
I honestly don't think that any disk/DB based solution will perform like memcache.
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Large scale website - which concept/design to use?

Post by Benjamin »

In my experience memcached works flawlessly, even for high traffic sites under a very heavy load.