Get text from a web site
Posted: Sun Aug 07, 2005 4:09 am
by spaceman33
Hi there.
I am trying to extract text from a web site, but it's not working out too well.
I have tried Google, and all the results seem to be about pulling text from an SQL database. Searching through here I found explode(), which doesn't do what I need. I've also tried substr(), but that just returns what I'm looking for.
In Excel Visual Basic, I would do something like this:
Code:
Position = InStr(4600, a, "You are on rank ", vbTextCompare)
position2 = InStr(4600, a, "of ", vbTextCompare)
position3 = InStr(4600, a, " (Page", vbTextCompare)
rank = Mid$(a, Position + 15, position2 - (Position + 15))
howmany = Mid$(a, (position2 + 3), position3 - (position2 + 3))
So far I have tried combinations of:
Code:
<body>
<?php
$page=file_get_contents('http://aaotracker.4players.de/usertracker.php?userid=61239');
echo $page;
$pos = strrpos($page, "Username:");
echo $pos;
$needle='Username:';
echo substr('Username:',6,5);
$pos=strlen($page) - (strpos(strrev("Username"), strrev($page)) + strlen($page));
echo $pos;
echo "HH";
echo count($page);
echo "JJ";
$pieces = explode("[hborea]", $page);
echo $pieces[1];
echo $pieces[2];
echo $pieces[3];
echo $pieces[4];
?>
</body>
But with limited success so far.
Basically I want to find some text on the web page, and then read the next few characters until a return carriage, or space is found.
If anybody could give me some pointers it would be much appreciated.
Cheers.
Posted: Sun Aug 07, 2005 6:09 am
by pilau
I'm not such a pro but try a While loop.
Posted: Sun Aug 07, 2005 7:33 am
by feyd
regex works great for this..
join(file()) is odd... why not use file_get_contents() ?
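As a rough illustration of the regex suggestion, here is a minimal sketch that pulls out the two values the Excel code above is after. The sample text is made up; only the markers "You are on rank ", "of " and " (Page" come from the original post, and the real page layout may differ.

```php
<?php
// Made-up sample standing in for the downloaded page (an assumption).
$page = "Stats for player. You are on rank 1234 of 56789 (Page 13)";

// Capture the text between "You are on rank " and "of ", and between
// "of " and " (Page". The /i flag mirrors vbTextCompare (case-insensitive).
$rank = null;
$howmany = null;
if (preg_match('/You are on rank\s+(\S+)\s+of\s+(\S+)\s+\(Page/i', $page, $m)) {
    $rank = $m[1];     // first capture group
    $howmany = $m[2];  // second capture group
}

echo "rank=$rank, total=$howmany\n";
```

Here `\S+` reads characters up to the next whitespace, which is close to the "read until a space or return is found" requirement.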
Posted: Sun Aug 07, 2005 9:00 am
by spaceman33
feyd wrote:regex works great for this..
join(file()) is odd... why not use file_get_contents() ?
Before posting here I tried Google etc. first, and this is one thing I found that seemed to do the job, so I've been using it and trying to get things to work around that command.
It loaded the web site into a variable which is what I thought I needed to do, so it looked good.
Posted: Sun Aug 07, 2005 9:08 am
by feyd
file_get_contents() will do the same thing as join(file()) however, leaving out some side-steps..
Posted: Sun Aug 07, 2005 10:29 am
by spaceman33
feyd wrote:file_get_contents() will do the same thing as join(file()) however, leaving out some side-steps..
Code amended as per your suggestion.
Thanks.
So I suppose what would get me going is a PHP version of MID$.
If I knew that command I could then look for a string, and then get the next few characters, as required.
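For reference, the closest PHP counterparts are probably substr() for Mid$ and strpos()/stripos() for InStr. The main difference is that PHP offsets are 0-based, while the VB functions are 1-based. A small sketch, using made-up sample text and a hypothetical marker string:

```php
<?php
// Made-up sample standing in for the downloaded page (an assumption).
$page = "Name: x\nYou are on rank 1234 of 56789 (Page 13)\n";
$marker = 'You are on rank ';

// stripos() is the nearest match to InStr(..., vbTextCompare):
// case-insensitive, 0-based offset, false when the marker is missing.
$pos = stripos($page, $marker);

$rank = null;
if ($pos !== false) {
    // Skip past the marker itself.
    $start = $pos + strlen($marker);
    // strcspn() counts characters up to the first space, tab, CR or LF.
    $length = strcspn($page, " \t\r\n", $start);
    // substr() plays the role of Mid$, with a 0-based start.
    $rank = substr($page, $start, $length);
}

echo $rank, "\n";
```

Comparing against false with `!==` matters, because a match at offset 0 would otherwise be treated as "not found".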
Posted: Mon Sep 12, 2005 12:45 am
by spaceman33
Not bumping this post, because there's no reason to, but I have one question.
Is there an alternative to file_get_contents?
It would be better for me if I could grab just what is seen on screen, rather than all the html as well (which file_get_contents does).
I keep getting colour codes (I think), and I guess the html code could change which would make my calculations for grabbing text a bit off.
Cheers.
Posted: Mon Sep 12, 2005 12:48 am
by feyd
Look into strip_tags(); however, be aware it's not a very smart function. It may not make a good "plain text" version of the page either. A smarter strip_tags is linked to from the Useful Posts thread (link in my signature).
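To show what "not very smart" can mean in practice, here is a small sketch with a made-up HTML fragment (the real page's markup is an assumption). strip_tags() removes the tags but keeps the text inside style and script elements, which may be where stray colour codes come from. The second half is one possible workaround, not the "smarter strip_tags" mentioned above.

```php
<?php
// Made-up HTML fragment (an assumption, not the real page).
$html = '<style>td { color: #ff0000; }</style>'
      . '<table><tr><td>Spaceman</td><td>44427</td></tr></table>';

// strip_tags() drops the tags only: the CSS text survives and
// neighbouring table cells run together.
$plain = strip_tags($html);
echo $plain, "\n";

// One possible workaround: remove whole <style>/<script> blocks first,
// then replace each remaining tag with a space and tidy the whitespace.
$clean = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', '', $html);
$clean = preg_replace('/<[^>]+>/', ' ', $clean);
$clean = trim(preg_replace('/\s+/', ' ', $clean));
echo $clean, "\n";
```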
Posted: Mon Sep 12, 2005 3:07 am
by phpdevuk
you could try looking at the snoopy class on sourceforge, that has a few methods for extracting different information from a webpage.
Posted: Mon Sep 12, 2005 8:09 am
by m3mn0n
Grabbing just what's seen by the browser is impossible, because the browser renders the source and displays it client-side, while PHP only works server-side. All you can do is grab the source and process it to suit your needs.
Posted: Mon Sep 12, 2005 1:24 pm
by spaceman33
strip_tags seems to work fine for what I want it for, thanks.
I'm using it in its native form and it does the job great, it seems.
Cheers.
Posted: Fri Sep 16, 2005 2:56 pm
by spaceman33
Quick question for ya
I am getting quite far with this little project of mine, but for some reason a little something isn't working. I am trying to extract some data, but I am finding that if there is more than one occurrence of a string, the last instance is being returned.
No good, I need the first instance to be returned to do my calculations.
My code is:
Code:
$page=file_get_contents('http://aaotracker.4players.de/clanprofile.php?clanid=7523');
$page=strip_tags($page);
//echo $page;
for ($i = 0; $i < strlen($page); $i++) {
if (substr($page, $i, 8) == 'Spaceman') {
$first = $i;
}
}
next;
//////////////
$enemykills = substr($page, $first + 16, 5);
This should return 44427, but returns , 06: or something.
Any further assistance appreciated.
Posted: Sat Sep 17, 2005 7:19 am
by raghavan20
spaceman33 wrote:but I am finding that if there is more than one occurrence of a string, the last instance is being returned.
No good, I need the first instance to be returned to do my calculations.
Can you be a little clearer?
It is always better if you give an example of what you want to happen.
Say your file contents are:
hi iam spaceman; spaceman is working with a php script; spaceman needs help
Now, what do you want to do with this file?
Posted: Sat Sep 17, 2005 1:02 pm
by spaceman33
I tried to be clear, but failed!
On a web page I load it into a variable, and run that strip command to remove all HTML tags. This works perfectly as it leaves only the data as seen on the web page - perfect.
I want to search for the word Spaceman and then grab some data a fixed number of characters along from that occurrence (5 characters of it). Again, I can do that no problem.
Thing is, with the code I am using it seems to use the last occurrence of Spaceman that is found, rather than the first one on the web page, which I need.
eg..
Data blah Spaceman blah blah
data
data
blah
Spaceman
blah
Spaceman
I want to return the position in the variable $page of the first Spaceman (shown in red in the example), but my code returns the position of the last one (shown in blue).
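A likely cause is that the for loop keeps running after the first match, so $first is overwritten on every later match and ends up holding the last one. A minimal sketch of the built-in alternative, using sample text shaped like the example above (the real page content is an assumption):

```php
<?php
// Sample text shaped like the example in the post (an assumption).
$page = "Data blah Spaceman blah blah\ndata\ndata\nblah\nSpaceman\nblah\nSpaceman";

// strpos() scans from the left and returns the 0-based offset of the
// FIRST match, or false if there is none, so no loop is needed.
$first = strpos($page, 'Spaceman');

// strrpos() is the counterpart that returns the LAST match, which is
// what a loop without "break" ends up with.
$last = strrpos($page, 'Spaceman');

if ($first !== false) {
    echo "first=$first, last=$last\n";
}
```

With $first set this way, the existing `substr($page, $first + 16, 5)` line can stay as it is. Adding `break;` inside the if block of the original loop would have the same effect.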
Posted: Sat Sep 17, 2005 1:35 pm
by feyd