Can I have some parsing help?

eysikal
Forum Newbie
Posts: 13
Joined: Thu May 11, 2006 2:13 pm

Can I have some parsing help?

Post by eysikal »

Weirdan | Please use the forum's code tags and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read "Posting Code in the Forums" (http://forums.devnetwork.net/viewtopic.php?t=21171) to learn how to do it too.


Hi all. I'm coming from a Java background and I'm having a little trouble with parsing.

I have an old data file that contains non-ASCII characters.  I need to parse these out and replace them with entities.

I also need to keep track of whether I am within a backslash or not.  I figure I can do this with a state variable that is either on or off.

My main problem is figuring out how to go through the file character by character.  I think I need to do it this way so that I can keep track of the backslashes.  

Here is what I have so far:

Code: Select all

<?php

$fp = fopen( "test.txt", "r+" );
if(!$fp)
{
    echo "Couldn't open the data file.";
    exit;
}
$contents = fread($fp, filesize("test.txt")); 
echo $contents;
fclose($fp);

?>
So I have the entire file in one string, now how can I go through it char by char?

Thanks


pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W

Post by pickle »

You could use substr(), but I'd probably just treat the string as an array:

Code: Select all

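<?php
$string = "this is the contents of the file blah blah blah...";
$length = strlen($string); // compute the length once, not on every iteration
for ($i = 0; $i < $length; ++$i)
{
  $curr_char = $string[$i]; // square brackets; the $string{$i} form was removed in PHP 8
  //do what you need to do here
}
?>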
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
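
For comparison, here is a small sketch of the three usual ways to walk a string one byte at a time: indexing, substr(), and str_split(). The variable names are made up for illustration, and all three work on bytes, not multi-byte characters.

```php
<?php
// Three ways to visit every byte of a string; all collect the same list.
$string = "abc";

// 1. Index with square brackets, computing the length once up front.
$viaIndex = [];
$length = strlen($string);
for ($i = 0; $i < $length; ++$i) {
    $viaIndex[] = $string[$i];
}

// 2. substr() with a length of 1.
$viaSubstr = [];
for ($i = 0; $i < $length; ++$i) {
    $viaSubstr[] = substr($string, $i, 1);
}

// 3. str_split() turns the string into an array of one-byte strings.
$viaSplit = str_split($string);

var_dump($viaIndex === $viaSubstr && $viaSubstr === $viaSplit); // bool(true)
```

Indexing is usually the least noisy of the three when you also need the position, for example to look at the previous or next byte.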
eysikal
Forum Newbie
Posts: 13
Joined: Thu May 11, 2006 2:13 pm

Post by eysikal »

Yeah, I was just using substr(), and it didn't seem to be the best way to go about it. I didn't realize I could treat it as an array like that. Thanks for the help.
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

pickle wrote:You could use substr(), but I'd probably just treat the string as an array:

Code: Select all

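<?php
$string = "this is the contents of the file blah blah blah...";
$length = strlen($string); // compute the length once, not on every iteration
for ($i = 0; $i < $length; ++$i)
{
  $curr_char = $string[$i]; // square brackets; the $string{$i} form was removed in PHP 8
  //do what you need to do here
}
?>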
That is one of the coolest things I've seen. I need to try that one of these days.
eysikal
Forum Newbie
Posts: 13
Joined: Thu May 11, 2006 2:13 pm

Post by eysikal »

Ok, so now I'm trying to parse out the stuff that is non-ASCII. How can I do this?

I was thinking I could use ord() and that it would just throw an exception or return -1 or something when it reads a non-ASCII character.

Any ideas?
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W

Post by pickle »

I'm not too knowledgeable about non-ASCII stuff - you mean umlauts & the sort?
eysikal
Forum Newbie
Posts: 13
Joined: Thu May 11, 2006 2:13 pm

Post by eysikal »

Yes, that's part of it. We're trying to convert some old plain text data over and it has some non-ASCII characters that need to be replaced.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

eysikal wrote:Yes, that's part of it. We're trying to convert some old plain text data over and it has some non-ASCII characters that need to be replaced.
I'd go with your idea of using ord() byte-for-byte and remove anything that doesn't work ;)
Everah wrote:That is one of the coolest things I've seen. I need to try that one of these days.
Don't forget that PHP is written in C. Most of the things you do just translate back to C stuff. In C, strings are just arrays of characters ("char foo[50];"), so this feels fairly natural anyway ;)
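
A quick sketch of what "ord() byte-for-byte" looks like in practice. ord() never fails on a non-ASCII byte; it simply returns a value from 0 to 255, so anything above 127 is outside 7-bit ASCII. The helper name isNonAsciiByte() and the sample string are hypothetical, and the sample assumes a single-byte encoding such as ISO-8859-1.

```php
<?php
// Hypothetical helper: true if the byte at offset $i is outside 7-bit ASCII.
function isNonAsciiByte($string, $i) {
    return ord($string[$i]) > 127;
}

// "\xE9" is a single byte with value 233 (e-acute in ISO-8859-1).
$sample = "caf\xE9";

var_dump(ord('A'));                   // int(65)
var_dump(ord($sample[3]));            // int(233)
var_dump(isNonAsciiByte($sample, 0)); // bool(false)
var_dump(isNonAsciiByte($sample, 3)); // bool(true)
```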
eysikal
Forum Newbie
Posts: 13
Joined: Thu May 11, 2006 2:13 pm

Post by eysikal »

d11wtq wrote:I'd go with your idea of using ord() byte-for-byte and remove anything that doesn't work ;)

How will I know if it doesn't work? How will ord() react when it hits a non-ASCII character?

Sorry, I'm not the most experienced of programmers.
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

d11wtq wrote:Don't forget that PHP is written in C. Most of the things you do just translate back to C stuff. In C, strings are just arrays of characters ("char foo[50];"), so this feels fairly natural anyway ;)
The last time I did anything in C was in a 1992 computer science class in college. I don't remember a thing from that time at all. No, not even one little bit. I do recall, however, wishing that everything was as easy to pick up as PHP. Maybe if Rasmus' little project had been widely used across universities, I might have been better at it.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

eysikal wrote: How will I know if it doesn't work? How will ord() react when it hits a non-ASCII character?
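ord() returns the ordinal number of the given character, which is its byte value from 0 to 255. It does not throw or return -1 for non-ASCII input, so you check for yourself whether the value is within the allowed range(s). Here's a simple example: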

Code: Select all

function isUppercaseASCII($string) {
   $length = strlen($string);
   for($i = 0; $i < $length; $i++) {
       $currChar = ord($string[$i]); // byte value, 0-255
       // ord('A') = 65
       // ord('Z') = 90
       if(($currChar < 65) || ($currChar > 90))
            return false;
   }
   return true;
}

var_dump(isUppercaseASCII('JHAGSJHGJH')); // true
var_dump(isUppercaseASCII('JHAG SJHGJH')); // false, there's a space
var_dump(isUppercaseASCII('JHaGSJHGJH')); // false, there's a lowercase letter
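
Putting the pieces of the thread together, the original task (replace non-ASCII characters with entities while tracking backslashes with a state variable) might be sketched roughly as below. This rests on two assumptions the thread does not confirm: that the file is ISO-8859-1, so each byte value equals its Unicode code point, and that a backslash escapes exactly one following character, which should be copied through unchanged. The function name encodeNonAscii() is made up for illustration.

```php
<?php
// Hypothetical helper; assumes the input is ISO-8859-1 and that a backslash
// escapes exactly one following character, which is copied through untouched.
function encodeNonAscii($input) {
    $output = '';
    $escaped = false; // state variable: was the previous character a backslash?
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($escaped) {
            // Character right after a backslash: copy as-is, leave escape state.
            $output .= $char;
            $escaped = false;
        } elseif ($char === '\\') {
            $output .= $char;
            $escaped = true;
        } elseif (ord($char) > 127) {
            // In ISO-8859-1 the byte value equals the Unicode code point.
            $output .= '&#' . ord($char) . ';';
        } else {
            $output .= $char;
        }
    }
    return $output;
}

echo encodeNonAscii("caf\xE9"), "\n"; // caf&#233;
```

If the data turns out to be in a different encoding, or the backslash rules are different, only the branches inside the loop need to change; the state variable and the byte-by-byte walk stay the same.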