I am in the process of cleaning up the names of all my MP3 files.
The format I want is artist»title, with » (character 187 in Windows-1252/ISO-8859-1, often loosely called "ASCII 187") as the delimiter.
When I view the file name in Windows it looks like Andrew Sisters»Alexander's Ragtime Band.mp3, which is exactly what I want.
In the Ubuntu GUI it is the same: /mnt/mirror2/music/mp3/Artists/AAAAA/Andrew Sisters/Andrew Sisters»Alexander's Ragtime Band.mp3
In the terminal I see Andrew Sisters┬╗Alexander's Ragtime Band.mp3
And scandir also reports Andrew Sisters»Alexander's Ragtime Band.mp3
Obviously, Ubuntu is storing the file name as a multibyte character string, and I know very little about those.
I've tried a couple of PHP routines to convert the strings but have been unsuccessful. Any suggestions would be appreciated.
Mac
scandir inserting character in filename??
Re: scandir inserting character in filename??
Please post your code
Re: scandir inserting character in filename??
Just a simple scandir wrapped in a function
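function showfiles($dir) {
    $files = scandir($dir);
    echo "<table id='files'>";
    foreach ($files as $value) {
        // Skip ".", ".." and hidden files
        if (substr($value, 0, 1) != ".") {
            // rawurlencode() stops the apostrophe in "Alexander's" from closing
            // the href attribute; htmlspecialchars() escapes the visible text
            $href = 'text.php?N=' . rawurlencode($value);
            $text = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
            echo "<tr>";
            echo "<td><a href='$href'>$text</a></td>";
            echo "</tr>";
        }
    }
    echo "</table>";
}
Re: scandir inserting character in filename??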
The problem you are seeing comes from the terminal not using the character encoding that the file names are stored in.
In the ISO-8859-1 character encoding, the single byte with decimal value 187 is the "»" character.
In the UTF-8 character encoding, "»" is stored as the two bytes with decimal values 194 and 187.
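To check these byte values yourself, a small PHP script like the following should work. It is a sketch that assumes the mbstring extension is enabled and that the script file itself is saved as UTF-8 (so the literal "»" in `$name` is the two UTF-8 bytes).

```php
<?php
// "»" is one byte in ISO-8859-1 but two bytes in UTF-8.
// Assumes the mbstring extension and a script saved as UTF-8.
$latin1 = "\xBB";                                              // ISO-8859-1 "»"
$utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1'); // UTF-8 "»"

echo implode(' ', array_map('ord', str_split($latin1))), "\n"; // 187
echo implode(' ', array_map('ord', str_split($utf8))), "\n";   // 194 187

// The extra byte also shows up in the file name: strlen() counts bytes,
// mb_strlen() counts characters.
$name = "Andrew Sisters»Alexander's Ragtime Band.mp3";
echo strlen($name), ' bytes, ', mb_strlen($name, 'UTF-8'), " characters\n"; // 44 bytes, 43 characters
```

The one-byte difference between `strlen()` and `mb_strlen()` is the "multibyte" part: the name is already valid UTF-8 and does not need converting for a UTF-8 display.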
In the character encoding your terminal probably uses, Microsoft Code page 437 (http://en.wikipedia.org/wiki/Code_page_437), the bytes with values 194 and 187 are the "┬" and "╗" characters.
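The terminal's "┬╗" can be reproduced by decoding those two UTF-8 bytes as Code page 437. This is a sketch that assumes PHP's iconv is built on a library that recognizes the name CP437 (glibc's iconv does; other builds may not).

```php
<?php
// Reinterpret the UTF-8 bytes of "»" as Code page 437, the way the
// terminal does, and convert the result to UTF-8 so it can be printed.
// Assumes an iconv build that knows CP437 (glibc's does).
$utf8Bytes = "\xC2\xBB";                      // UTF-8 encoding of "»"
$asCp437   = iconv('CP437', 'UTF-8', $utf8Bytes);

echo $asCp437, "\n";                          // ┬╗
```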
What all of this boils down to is that you need to make sure the "text" you view is encoded in a character set supported by the device (terminal, web browser, file explorer, etc.) you use to view it. For web pages you can declare the character set using the Content-Type HTTP header and the Content-Type meta element. If you don't declare anything, the web browser will typically assume ISO-8859-1 (in practice often its superset, Windows-1252).
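For the web-page case, a minimal sketch might look like this. `listPage()` is a hypothetical helper, not part of the code posted above: it sends the Content-Type header and a matching meta element, then prints the names from scandir() as they are, without converting them, on the assumption that they are already UTF-8.

```php
<?php
// Sketch: declare UTF-8 in the HTTP header and in a meta element, then list
// the names returned by scandir() unchanged (assumed to be UTF-8 already).
// listPage() is a hypothetical helper, not part of the original code.
function listPage(array $names): string
{
    // header() only works before any output has been sent;
    // the CLI SAPI does not emit HTTP headers at all.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }

    $html = "<!DOCTYPE html>\n<html><head>"
          . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
          . "<title>Files</title></head><body><ul>\n";

    foreach ($names as $name) {
        if (substr($name, 0, 1) === '.') {
            continue; // skip ".", ".." and hidden files, as in showfiles()
        }
        // Escape for HTML; the UTF-8 bytes of "»" pass through untouched.
        $html .= '<li>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }

    return $html . "</ul></body></html>\n";
}

// Usage (web context): echo listPage(scandir($dir));
```

Declaring the charset in both places is belt and braces: the HTTP header normally wins, and the meta element still helps if the page is saved and opened locally.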
I hope this at least helps you understand what's going on.