Get file encoding

Post by Ollie Saunders »

Is it possible using bash or PHP to get the details of the character encoding of a bunch of files?
What resources are there for getting encoding information out of a file in either bash or PHP?

Post by Chris Corbyn »

mb_detect_encoding() will work on the data inside the file.

Post by Ambush Commander »

It's a tricky problem, since there's something of a chicken-and-egg situation: if the encoding is specified inside the file, how do you know which encoding to use to read it in the first place?

If you're dealing with XML or HTML, try to detect the encoding from the <?xml declaration or the <meta> tag.
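As a rough shell sketch of that idea: the helper below (the name `declared_encoding` is made up for illustration) scans the first 1 KiB of a file for an `encoding="..."` attribute in an `<?xml` declaration or a `charset=` value in a `<meta` tag. It assumes the declaration itself is written in ASCII-compatible bytes, which sidesteps the chicken-and-egg problem for most encodings but not for UTF-16 or UTF-32, and it only matches lowercase `<?xml` and `<meta`.

```shell
# declared_encoding FILE
# Hypothetical helper: prints the encoding named in an XML declaration or an
# HTML <meta> tag, or nothing if neither is found near the top of the file.
declared_encoding() {
    # BRE patterns; \( \) capture the encoding name.
    local xml_re="s/.*<?xml[^>]*encoding=[\"']\([^\"']*\)[\"'].*/\1/p"
    local meta_re="s/.*<meta[^>]*charset=[\"']\{0,1\}\([A-Za-z0-9._:-]*\).*/\1/p"
    # Only the first 1 KiB is inspected; LC_ALL=C makes sed treat input as raw bytes.
    head -c 1024 "$1" | LC_ALL=C sed -n -e "$xml_re" -e "$meta_re" | head -n 1
}
```

Calling `declared_encoding page.html` prints only what the file claims about itself; the claim can still be wrong, so it is a hint and not proof.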

If you're dealing with plaintext, you're out of luck. You can try using mb_detect_encoding(), but 1) it isn't always right and 2) it's not always available (it needs the mbstring extension).
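Where mb_detect_encoding() isn't available, a similarly rough guess can be made from the shell. This is only a sketch: `guess_encoding` is a hypothetical helper, and it assumes `tr`, `wc` and `iconv` are installed. It can tell pure ASCII and valid UTF-8 apart from everything else, but it cannot distinguish one legacy 8-bit encoding from another.

```shell
# guess_encoding FILE
# Hypothetical helper: prints "ascii", "utf-8" or "unknown".
guess_encoding() {
    local file=$1
    local high
    # Delete every 7-bit ASCII byte and count what is left over.
    high=$(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c)
    if [ "$high" -eq 0 ]; then
        echo "ascii"
    # iconv exits non-zero when the input is not valid UTF-8.
    elif iconv -f UTF-8 -t UTF-8 < "$file" > /dev/null 2>&1; then
        echo "utf-8"
    else
        echo "unknown"
    fi
}
```

A result of "utf-8" only means the bytes form valid UTF-8; like mb_detect_encoding(), this is a guess and can be wrong.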

Post by Ollie Saunders »

d11wtq wrote: "mb_detect_encoding() will work on the data inside the file."
Thanks d11.

Ambush Commander wrote: "If you're dealing with plaintext, you're out of luck."
Yeah, I'm realising that now, not that it was particularly surprising.

Ambush Commander wrote: "You can try using mb_detect_encoding() but 1) it isn't always right and 2) it's not always available."
It seems to report ASCII for everything, but once you put a special character in the file it reports correctly; better than nothing.

Oh, does anyone have a recursive version of glob, by any chance?
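On the shell side, `find` already behaves like a recursive glob. A minimal sketch, where the wrapper name `rglob` is made up for illustration and file names are assumed not to contain newlines:

```shell
# rglob DIR PATTERN
# Hypothetical helper: lists regular files under DIR, at any depth, whose
# base name matches the shell pattern PATTERN. Output is sorted, one per line.
rglob() {
    find "$1" -type f -name "$2" | sort
}
```

For example, `rglob . '*.txt'` lists every .txt file below the current directory, which can then be fed to whatever encoding check you settle on.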

Post by Ambush Commander »

Well, you *should* be converting them to UTF-8, so if it reports ASCII, you can treat it as UTF-8 (every ASCII file is already valid UTF-8) and be done with it.
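For files that turn out not to be ASCII or UTF-8, the conversion itself could look something like the sketch below. The helper name `to_utf8` is hypothetical, and the source encoding has to be known or guessed beforehand, since `iconv` does not detect it for you.

```shell
# to_utf8 FROM_ENCODING FILE
# Hypothetical helper: writes FILE, re-encoded from FROM_ENCODING to UTF-8,
# to standard output. The original file is left untouched.
to_utf8() {
    iconv -f "$1" -t UTF-8 < "$2"
}
```

Something like `to_utf8 ISO-8859-1 old.txt > new.txt` converts a Latin-1 file. Running an ASCII-only file through it with UTF-8 as the source encoding changes nothing.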