unsure of a good title for this

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

unsure of a good title for this

Post by s.dot »

I've been debating this one for quite a while.

Should you store things in the database already 'configured', or let your PHP script do the configuring at display time? For instance, say you had an emoticon system. When processing form input, all instances of ; ) would be turned into <img src="emoticons/wink.gif" alt="wink">.

You could str_replace() this when processing it and store it in the database as an IMG tag. However, there are some cons to this. 1) It takes up more database space. 2) Say you wanted to change the alt="wink" to alt="big wink"... that would require you to edit every stored row in the database.

However, if you just store it as ; ) in the database and use your PHP script to str_replace() it (or whatever other method) on every page view, this will use more CPU, but it allows greater (and easier) manipulation of the input and takes less database space.
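A minimal sketch of the display-time option described above, assuming a hypothetical EMOTICONS map and the emoticons/ image path from the example. It swaps plain str_replace() for a split-then-escape approach, because escaping first would turn `"` into `&quot;`, and `&quot;)` would then be mistaken for a wink.

```php
<?php
// Hypothetical emoticon map: raw token => image name (illustrative only).
const EMOTICONS = [
    ';)' => 'wink',
    ':)' => 'smile',
];

// Convert raw text (as stored in the database) to HTML at display time.
function renderEmoticons(string $raw, array $map = EMOTICONS): string
{
    // Build one regex that matches any token; the capture group keeps
    // the matched tokens in the split result.
    $quoted = array_map(function ($token) {
        return preg_quote($token, '/');
    }, array_keys($map));
    $pattern = '/(' . implode('|', $quoted) . ')/';

    // Even indexes are plain text, odd indexes are matched tokens.
    $parts = preg_split($pattern, $raw, -1, PREG_SPLIT_DELIM_CAPTURE);

    $html = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $name = $map[$part];
            $html .= '<img src="emoticons/' . $name . '.gif" alt="' . $name . '">';
        } else {
            // Escape user text so it cannot inject markup.
            $html .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        }
    }
    return $html;
}

echo renderEmoticons('Nice one ;)'), "\n";
```

Changing alt="wink" to alt="big wink" here means editing the map, not the stored rows.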

The emoticon was a simple example, but when doing BB tags or other things, it can get quite complex.

Discuss?
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia

Post by shiznatix »

Wow, that's a good question. I never thought of that.

What I have always done is just str_replace() it and put that into the database instead of doing the str_replace() every time it's called... BUT! The other way you mentioned would probably be better. Unless you are dealing with 5000 hits an hour and an extremely large number of strings to be str_replaced on every page call, I can't imagine it would cost that much; str_replace() is not very resource hungry at all.

Just my thoughts.
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

CPU time is more precious than storage space, in my opinion, so I parse the data as it goes into the database. What's more, I store the original as it was entered into the form in one column, and the parsed "display" version in a second column. That way, if the user wants to edit it later, I don't have to 'unparse' it to put it into the form.
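A rough sketch of the two-column idea, assuming hypothetical column names body_raw and body_html and a hypothetical :wink: token. It only builds the row; the actual INSERT is left out.

```php
<?php
// Parse once, at write time. The :wink: token and image path are illustrative.
function parseForDisplay(string $raw): string
{
    // Escape user text first so it cannot inject markup.
    $safe = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
    return str_replace(':wink:', '<img src="emoticons/wink.gif" alt="wink">', $safe);
}

// Build the row to be saved: one column as typed, one ready for display.
// body_raw and body_html are hypothetical column names.
function buildPostRow(string $raw): array
{
    return [
        'body_raw'  => $raw,
        'body_html' => parseForDisplay($raw),
    ];
}

$row = buildPostRow('Good point :wink:');

// Display pages would echo body_html as-is.
echo $row['body_html'], "\n";
// The edit form would refill its textarea from body_raw (escaped on output).
echo htmlspecialchars($row['body_raw'], ENT_QUOTES, 'UTF-8'), "\n";
```

If the parser changes later, body_html can be regenerated from body_raw in a one-off batch job.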
Buddha443556
Forum Regular
Posts: 873
Joined: Fri Mar 19, 2004 1:51 pm

Post by Buddha443556 »

I parse it when it comes out. However, my bbcode/emoticon processing is simply a bunch of str_replace() calls with no regard for HTML standards, and therefore very fast. There's one more reason that hasn't been mentioned: my bbcode/emoticon output is tied to the theme/layout, so it can change with the layout. I'm not sure how you would handle a layout change if you pre-process (besides the CSS, I mean)?
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

As already mentioned, you end up (as almost always) with a dilemma between the following:

- use CPU and calculate stuff again and again
- use a datastore (memory, file, ...) and do it only once

Anyway, I would suggest the following: create an extra content table that contains the "manipulated" text. This way, for simple views you can fetch this. For manipulations you use the original (without phpBB replacements etc.).
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

For your example I'd probably just parse it as it comes out of the DB... that's not too intensive an example.

However, for things like BBCode that use tokenizing and recursion, I'd definitely do it as it goes into the DB to save my poor CPU/memory. Take my JavaScript beautifier for example: it takes a little while to process and makes the server work, so I convert once and store the markup.
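To give a feel for why nested BBCode costs more than a flat str_replace(), here is a small sketch. It is not a real tokenizer: it assumes only two hypothetical tags, [b] and [quote], and repeats regex passes from the innermost pair outward until nothing is left to replace.

```php
<?php
// Convert a tiny BBCode subset to HTML. Each pattern only matches a pair
// with no nested opening tag of the same kind inside it, so the innermost
// pairs are converted first and outer pairs on later passes.
function bbcodeToHtml(string $text): string
{
    // Escape user text first; square brackets are left untouched.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    $rules = [
        '#\[b\]((?:(?!\[b\]).)*?)\[/b\]#s'             => '<strong>$1</strong>',
        '#\[quote\]((?:(?!\[quote\]).)*?)\[/quote\]#s' => '<blockquote>$1</blockquote>',
    ];

    // Keep passing over the text until a pass replaces nothing.
    do {
        $text = preg_replace(array_keys($rules), array_values($rules), $text, -1, $count);
    } while ($count > 0);

    return $text;
}

echo bbcodeToHtml('[quote][quote]x[/quote] y[/quote]'), "\n";
```

Every extra nesting level adds another full pass over the post, which is the sort of work you may prefer to do once at write time.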
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Seeing as my site is starting to generate a lot of requests and my server load is getting hammered, I'm starting to switch it all to storing it in the database already formatted =) I was just curious what other people did.
GRemm
Forum Newbie
Posts: 17
Joined: Fri Jul 08, 2005 3:37 pm
Location: California

Post by GRemm »

The alternative isn't really that bad as far as cpu is concerned.

Use php's output buffering to grab the output of the whole page and pass the buffered page through some sort of intercepting filter (both the pattern and the general idea).
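A minimal sketch of that idea using ob_start() with a callback. The filter name, the :wink: token and the image path are assumptions for illustration.

```php
<?php
// Hypothetical filter: receives the whole buffered page and returns the
// version that is actually sent to the browser.
function emoticonOutputFilter(string $buffer): string
{
    return str_replace(':wink:', '<img src="emoticons/wink.gif" alt="wink">', $buffer);
}

// Start buffering; PHP calls the filter when the buffer is flushed.
ob_start('emoticonOutputFilter');

// The rest of the page is written as usual.
echo "<p>First post :wink:</p>\n";
echo "<p>Second post</p>\n";

// Flush: the callback runs here and the filtered page is output.
ob_end_flush();
```

One caveat: a whole-page filter also touches tokens inside attributes, scripts and form fields, so the tokens need to be distinctive.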

A more important concern is what happens when someone actually wants to use a character string that gets translated into a smiley. We have all seen forums where some poor kid's code ends up filled with sad faces or stupid img bbcode tags. Yuck.

Keep your tags more complicated and exclusive than the :) and ;=] sort of things. This is why bbcode seems to work so well: it has a specific tag structure designed not to interfere with the basics of filling in a form field.
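One way to protect posted code, sketched here under the assumption that code is wrapped in [code]...[/code] tags: split the text on those blocks and only replace emoticons in the parts outside them. The function name and the map are hypothetical.

```php
<?php
// Replace emoticon tokens everywhere except inside [code]...[/code] blocks.
function replaceOutsideCode(string $text, array $map): string
{
    // The capture group makes preg_split() keep the code blocks in the
    // result: even indexes are normal text, odd indexes are code blocks.
    $parts = preg_split('#(\[code\].*?\[/code\])#s', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // strtr() with an array replaces every token in a single pass.
            $parts[$i] = strtr($part, $map);
        }
    }
    return implode('', $parts);
}

$map = [':)' => '<img src="emoticons/smile.gif" alt="smile">'];
echo replaceOutsideCode('thanks :) [code]foo(bar:)[/code]', $map), "\n";
```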
dbevfat
Forum Contributor
Posts: 126
Joined: Tue Jun 28, 2005 2:47 pm
Location: Ljubljana, Slovenia

Post by dbevfat »

IMO converting only before storing to the DB has some drawbacks. Basically, because the data is not in its "raw" format anymore, you can't have:

1. "don't show emoticons" option (I use it everywhere),
2. changes of parser engine (old entries will be in old format),
3. theme switching (different paths for emoticon images).

I believe these drawbacks alone (there must be others) are too big to even consider this option. A solution would be (as suggested) to hold both the original and the parsed text, but this only really addresses point 2.
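To illustrate points 1 and 3, here is a sketch of a display-time renderer that takes the viewer's settings as arguments. The function name, the :wink: token and the theme paths are all made up for the example.

```php
<?php
// Render raw post text for one particular viewer.
// $showEmoticons and $themePath would come from that user's preferences.
function renderPost(string $raw, bool $showEmoticons, string $themePath): string
{
    // Escape user text so it cannot inject markup.
    $html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // Point 1: users who turned emoticons off just see the raw token.
    if (!$showEmoticons) {
        return $html;
    }

    // Point 3: the image path depends on the active theme.
    $img = '<img src="' . htmlspecialchars($themePath, ENT_QUOTES, 'UTF-8')
         . '/wink.gif" alt="wink">';
    return str_replace(':wink:', $img, $html);
}

echo renderPost('hi :wink:', true, 'themes/dark'), "\n";
echo renderPost('hi :wink:', false, 'themes/dark'), "\n";
```

None of this is possible if only the converted HTML was stored.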

Regards
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

dbevfat wrote: IMO converting only before storing to the DB has some drawbacks. Basically, because the data is not in its "raw" format anymore, you can't have:

1. "don't show emoticons" option (I use it everywhere),
2. changes of parser engine (old entries will be in old format),
3. theme switching (different paths for emoticon images).

I believe these drawbacks alone (there must be others) are too big to even consider this option. A solution would be (as suggested) to hold both the original and the parsed text, but this only really addresses point 2.

Regards
It does cause side-issues, yes. You just need to decide if those outweigh the bonuses. If you have the space, you could actually store both versions in case you ever need to change the data, or even just store the 'diff' between the two. Some of the phpBB mods have caused us issues in the past: after making updates, you get strange things like this all over:

Code:

class foo
{
    function foo()
    {
        $this->isBroken();
    }
}
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Post by Maugrim_The_Reaper »

Each strategy is going to have costs associated, so you either choose between flexibility and speed, or compromise on both to limit the cost. Since scrotaye's issue is primarily server processing load, store it after processing. If you really need flexibility (hard to discount with some applications), store the original raw form as well. I think that's the most adaptable method, assuming you don't also have an issue with database size! :)
GRemm
Forum Newbie
Posts: 17
Joined: Fri Jul 08, 2005 3:37 pm
Location: California

Post by GRemm »

What is going to take longer in the end?

Process form input -> store raw input -> parse out bbcode -> store changed input -> redirect to confirmation / whatever -> query db and display output.

Or..

Process form input -> store raw input -> redirect to confirmation / whatever -> query db -> parse bbcode and display output.


From my perspective, the parse-at-display-time option has one fewer storage query and one fewer step to render the output.

A good question to pose to the experts here: what takes more CPU load / resources / time, regex / str_replacing buffered output on every view, or regex / str_replacing and storing an entry twice at edit time?

In my limited tests, the output buffering takes more memory and causes only a small CPU spike on one process, versus a medium amount of memory and multiple threads spiking when the db gets hit twice.
The output buffering method is also faster in my tests, but only by a small amount.
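For anyone who wants to measure this on their own server, a rough timing sketch follows. The helper name, the sample text and the iteration count are assumptions, and it only times the replace step, not the database writes.

```php
<?php
// Run a callable many times and return the average seconds per call.
function timeCallable(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

// Hypothetical sample post: 50 sentences with one emoticon token each.
$raw = str_repeat('Some post text :wink: ', 50);

// Cost of one display-time replacement.
$perCall = timeCallable(function () use ($raw) {
    str_replace(':wink:', '<img src="emoticons/wink.gif" alt="wink">', $raw);
}, 10000);

printf("str_replace per render: %.2f microseconds\n", $perCall * 1e6);
printf("peak memory: %d bytes\n", memory_get_peak_usage());
```

The numbers depend heavily on the hardware, the PHP version and the size of the posts, so treat them as relative figures only.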