Strip Tags from a Website
Posted: Tue Jan 03, 2012 10:31 pm
I'm currently creating a crawler that needs to process the sentences of the many websites it visits. To do this, I first need to remove the tags from each page, which is the part I'm stuck on getting right.
This is an example website I have to deal with:
http://pastie.org/3122620
I have used strip_tags() but that sometimes doesn't get rid of JavaScript and other things.
If you can remove all HTML, CSS and JavaScript from that webpage and show me how to do it, I would be very grateful. Thanks!
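
For context on the strip_tags() problem: it removes the tags themselves but keeps the text between them, so the bodies of <script> and <style> elements are left in the output. One possible approach, sketched here rather than offered as a definitive fix, is to parse the page with PHP's DOM extension, delete those elements first, and then collect the remaining text nodes. The function name html_to_text() is hypothetical, and the sketch assumes the DOM extension is enabled and that the input is a non-empty HTML string.

```php
<?php
// Hypothetical helper: returns the visible text of an HTML string.
function html_to_text(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is often malformed; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Remove elements whose contents are code rather than prose,
    // plus HTML comments. Copy the node list to an array first so
    // that removing nodes does not disturb the iteration.
    $unwanted = $xpath->query('//script | //style | //noscript | //comment()');
    foreach (iterator_to_array($unwanted) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Join the remaining text nodes with spaces so that adjacent
    // blocks such as <p>A</p><p>B</p> do not run together.
    $parts = [];
    foreach ($xpath->query('//text()') as $textNode) {
        $parts[] = $textNode->nodeValue;
    }

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
}

// Example usage with a small inline page.
$page = '<html><head><style>p{color:red}</style><script>alert(1)</script></head>'
      . '<body><p>Hello</p><p>World</p></body></html>';
echo html_to_text($page), "\n";
```

Two caveats with this sketch: joining text nodes with spaces also puts a space inside words that are split by inline tags (for example <b>Hel</b>lo), and loadHTML() assumes ISO-8859-1 unless the page declares its character set, so UTF-8 pages without a meta charset may need converting first.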