
benefits of using encapsulation characters?

Posted: Thu Nov 17, 2005 2:09 pm
by Swede78
Someone tried to persuade me to use encapsulation characters in addition to delimiting each variable in a string that is going to be parsed. I've never used encapsulation characters before, and would like to know why this would be beneficial. As far as I can tell, it would just cause more work. The idea, I suppose, is to make it less likely that you would separate the variables incorrectly. To me, adding a single-character encapsulation to a single-character-delimited string of variables is the same as having a three-character delimiter.

|variable1|,|variable2|,|variable3| // delimiter is the "comma" and the encaps char is "pipe"

What's the difference in the next example?

variable1|,|variable2|,|variable3 // delimiter is "pipe comma pipe"

In both instances, you have to separate them using the same three characters, and there's the same possibility that that combination of characters appears in one of the variables. So using encapsulation characters doesn't seem to be any safer. The only difference is that now you also have to remove the encapsulation characters on the ends.
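To make the comparison above concrete: if nothing is escaped, both formats are parsed by the same operation, splitting on the literal "|,|". A minimal PHP sketch (the two function names are hypothetical, chosen for illustration):

```php
<?php
// Naive parsing of the two formats from the post. Neither format escapes
// anything, so both come down to splitting on the literal "|,|".

function parseEncapsulated(string $s): array
{
    // Strip the leading and trailing pipe, then split on the remaining "|,|".
    return explode('|,|', substr($s, 1, -1));
}

function parseThreeCharDelimiter(string $s): array
{
    return explode('|,|', $s);
}

$a = parseEncapsulated('|variable1|,|variable2|,|variable3|');
$b = parseThreeCharDelimiter('variable1|,|variable2|,|variable3');
var_dump($a === $b); // bool(true)

// The two values "var|,|iable1" and "variable2", encapsulated naively,
// come back as three values:
$broken = parseEncapsulated('|var|,|iable1|,|variable2|');
echo count($broken), "\n"; // 3
```

Both parsers fail in exactly the same way when a value contains "|,|", so encapsulation on its own, without some rule for escaping, adds no safety.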

Just wondering if someone knows why using encapsulation characters would be beneficial. I can't get a solid explanation from this someone or anywhere on the net.


Thanks in advance for any information about this topic.

Posted: Thu Nov 17, 2005 6:19 pm
by RobertPaul
The only usefulness I see in that is if there's a possibility that the string data itself will have a comma in it... a problem solved by using said encapsulation characters.
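Encapsulation only pays off when it comes with a rule for the encapsulation character itself. The usual CSV convention (RFC 4180) wraps each field in double quotes and doubles any double quote inside a field, so a comma between quotes is always data and the parser never has to guess. A minimal sketch assuming that convention; encodeCsvField() is a hypothetical helper, while str_getcsv() is PHP's built-in CSV line parser:

```php
<?php
// CSV-style encapsulation: wrap each value in double quotes and double any
// double quote inside the value, so the enclosure is never ambiguous.
function encodeCsvField(string $value): string
{
    return '"' . str_replace('"', '""', $value) . '"';
}

$values = ['Smith, John', 'He said "hi"', 'plain'];
$line = implode(',', array_map('encodeCsvField', $values));
echo $line, "\n"; // "Smith, John","He said ""hi""","plain"

// str_getcsv() understands the doubled-quote convention. Passing an empty
// $escape (PHP 7.4+) disables PHP's non-standard backslash escaping.
$parsed = str_getcsv($line, ',', '"', '');
var_dump($parsed === $values); // bool(true)
```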

Posted: Thu Nov 17, 2005 9:23 pm
by josh
Maybe so you can explode on the delimiter instead of using a regex? Ask the person who suggested it to you, because there are multiple advantages to each method; use whichever works for you in your situation.

Posted: Thu Nov 17, 2005 9:29 pm
by Ambush Commander
serialize()

Personally, I think the way it's implemented is genius. Instead of muddling through escaping mumbo-jumbo, you just store the length of the following string, and that's it. A third solution. And a really fast one too.
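Here is what that looks like for values containing commas, pipes and quotes. A small sketch using PHP's built-in serialize() and unserialize():

```php
<?php
// serialize() prefixes every string with its byte length, so commas, pipes
// and quotes inside the values need no escaping at all.
$values = ['variable,1', 'var|iable2', 'say "hi"'];

$string = serialize($values);
echo $string, "\n";
// a:3:{i:0;s:10:"variable,1";i:1;s:10:"var|iable2";i:2;s:8:"say "hi"";}

// 'allowed_classes' => false (PHP 7.0+) stops unserialize() from creating
// objects, which matters if the string may come from another party.
$back = unserialize($string, ['allowed_classes' => false]);
var_dump($back === $values); // bool(true)
```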

Posted: Fri Nov 18, 2005 9:07 am
by josh
That is true, he could store his variables in an array and then serialize it, but if he wants the string to be human-editable (by humans other than the ones who know the inner workings of the serialize function), he will need a more "comma-separated value" approach.

Posted: Fri Nov 18, 2005 11:43 am
by Swede78
I have tried to ask this person... I don't think they really understand it themselves; I think they just heard it from someone else. While searching for this earlier, I found that it was recommended by a credit card processor. It had to do with sending data back and forth, and they recommend using encapsulation characters for that data, but they don't describe the reason why. That's why I came here. Obviously, there must be an important reason that they would recommend it. I like to be on the up-and-up on these things. If I can make my code more efficient, safe, and stable, then I want to know how... but I also want to know why. This question is all for curiosity's sake. I don't NEED to know this; I'm just trying to learn about something new.

The only advantage I can see is that if you don't have control over what characters are allowed in the variables, then using the encapsulation characters would help lower the chance that you'd separate the variables incorrectly. I can see that happening if you're sharing data with another party: you don't necessarily have control over what type of data/characters they send. So, that makes sense.

But, as someone stated... I'll use what works for me. And I don't think this would be useful unless I'm getting data that I don't have control over. If I am sure that these variables don't contain the pipe character, then I can use the pipe as my delimiter without worrying about it.
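If you rely on the delimiter never appearing in the data, it is cheap to check that assumption when building the string instead of trusting it. A minimal sketch; joinWithDelimiter() is a hypothetical helper, not a PHP built-in:

```php
<?php
// Only join with a bare delimiter after verifying that no value contains it.
function joinWithDelimiter(array $values, string $delimiter): string
{
    foreach ($values as $value) {
        if (strpos($value, $delimiter) !== false) {
            throw new InvalidArgumentException("Value contains the delimiter: $value");
        }
    }
    return implode($delimiter, $values);
}

echo joinWithDelimiter(['variable1', 'variable2'], '|'), "\n"; // variable1|variable2
```

Failing loudly at write time is usually preferable to silently producing a string that splits into the wrong number of values later.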

jshpro2, you mention multiple advantages. Are there any other advantages to using encapsulation characters?

Posted: Fri Nov 18, 2005 2:28 pm
by Ambush Commander
josh wrote:
That is true, he could store his variables in an array and then serialize it, but if he wants the string to be human-editable (by humans other than the ones who know the inner workings of the serialize function), he will need a more "comma-separated value" approach.

Serialize is quite human-editable, actually. Try serializing something and then echoing it.

OK, the main thing is ambiguity. Let's take this example for instance:

$array = array('array','to','be','transferred');
$string = implode(',',$array);
echo $string; //array,to,be,transferred
Now, this brings up obvious problems when a string that needs to be transferred requires a comma in it.

A quick fix would be to use a set of characters that nobody would ever think of using:

$array = array('array','to','be','transferred');
$string = implode('<>',$array);
echo $string; //array<>to<>be<>transferred
But there is always ambiguity: nothing guarantees that "<>" will never appear inside a value.

Then, you consider escaping characters. This requires a bit more code logic:

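$array = array('ar"ray','to','be','transferred');
$string = '';
foreach ($array as $key => $value) {
  if ($key) $string .= ','; // separator before every value except the first
  $value = str_replace('\\', '\\\\', $value); // escape backslashes first...
  $value = str_replace(',', '\\,', $value);   // ...then commas
  $string .= $value;
}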
Then, when we parse the value, we have to inch forward through the string slowly: whenever we find a comma, we have to make sure that it is preceded by an even number of backslashes (zero, two, ...), because an odd number means the comma itself is escaped. (For a related problem, see viewtopic.php?t=36790 for various implementations.)
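That character-by-character parse can be sketched as a single left-to-right scan. Instead of counting backslashes before each comma, this version consumes each backslash together with the character it escapes, which has the same effect. It assumes the backslash/comma escaping from the loop above; decodeEscaped() is a hypothetical name:

```php
<?php
// Decoder for the backslash-escaped format: "\\" is a literal backslash,
// "\," is a literal comma, and any other comma separates values.
function decodeEscaped(string $s): array
{
    $values = [];
    $current = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        if ($c === '\\' && $i + 1 < $len) {
            // Escaped character: keep the next byte literally and skip past it.
            $current .= $s[++$i];
        } elseif ($c === ',') {
            // Unescaped comma: the current value is complete.
            $values[] = $current;
            $current = '';
        } else {
            $current .= $c;
        }
    }
    $values[] = $current;
    return $values;
}

// The string below is ar\,ray,to,back\\slash once PHP unescapes the literal.
$parsed = decodeEscaped('ar\\,ray,to,back\\\\slash');
var_dump($parsed === ['ar,ray', 'to', 'back\\slash']); // bool(true)
```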

These are all fine and dandy for serializing the value, but a bit harder to parse.

The final step is to give the lengths of the strings and remove the dependency on escaping characters entirely. This makes the parser very fast and sacrifices only a little readability.
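Giving the lengths up front can be sketched with a simple hypothetical format, "<length>:<value>" repeated for each value. It is the same idea as the s:<length>:"..." entries in serialize() output, but not its actual format; encodeLengthPrefixed() and decodeLengthPrefixed() are illustrative names:

```php
<?php
// Hypothetical length-prefixed format: "<byte length>:<value>" per value.
function encodeLengthPrefixed(array $values): string
{
    $out = '';
    foreach ($values as $value) {
        $out .= strlen($value) . ':' . $value;
    }
    return $out;
}

function decodeLengthPrefixed(string $s): array
{
    $values = [];
    $pos = 0;
    $len = strlen($s);
    while ($pos < $len) {
        $colon = strpos($s, ':', $pos);
        if ($colon === false) {
            throw new UnexpectedValueException('Missing length prefix');
        }
        $n = (int) substr($s, $pos, $colon - $pos);
        if ($colon + 1 + $n > $len) {
            throw new UnexpectedValueException('Value shorter than its length prefix');
        }
        // The value is copied by length, so it may contain any bytes, even ':'.
        $values[] = substr($s, $colon + 1, $n);
        $pos = $colon + 1 + $n;
    }
    return $values;
}

$values = ['a,b', 'x|,|y', '3:abc'];
$encoded = encodeLengthPrefixed($values);
echo $encoded, "\n"; // 3:a,b5:x|,|y5:3:abc
var_dump(decodeLengthPrefixed($encoded) === $values); // bool(true)
```

Because the decoder jumps ahead by the stated length instead of searching for a terminator, no character inside a value is special and nothing needs escaping.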
Swede78 wrote:
The only advantage I can see is that if you don't have control over what characters are allowed in the variables, then using the encapsulation characters would help lower the chance that you'd separate the variables incorrectly. I can see that happening if you're sharing data with another party: you don't necessarily have control over what type of data/characters they send. So, that makes sense.

But, as someone stated... I'll use what works for me. And I don't think this would be useful unless I'm getting data that I don't have control over. If I am sure that these variables don't contain the pipe character, then I can use the pipe as my delimiter without worrying about it.
When you put data into a string, you've created a document format for it. This format can be as simple or as complex as you want: it's all about what you need.

Posted: Tue Nov 22, 2005 9:37 am
by Swede78
Thanks for the explanation! Enjoyed the link to your and Feyd's little coding competition. Very interesting.