
scandir inserting character in filename??

Posted: Thu Sep 08, 2011 6:13 pm
by delpi767
I am in the process of cleaning up the names of all of my mp3 files.

The format I want is artist»title, with character code 187 (») as the delimiter.
When I read the filename in Windows it looks like Andrew Sisters»Alexander's Ragtime Band.mp3, which is exactly what I want.

In the Ubuntu GUI it looks the same: /mnt/mirror2/music/mp3/Artists/AAAAA/Andrew Sisters/Andrew Sisters»Alexander's Ragtime Band.mp3

In the terminal I see Andrew Sisters┬╗Alexander's Ragtime Band.mp3
And scandir also reports Andrew Sisters»Alexander's Ragtime Band.mp3

Obviously, Ubuntu is storing the filename as a multibyte character string, and I know very little about those.

I've tried a couple of PHP routines to convert the strings but have been unsuccessful. Any suggestions would be appreciated.

Mac

Re: scandir inserting character in filename??

Posted: Fri Sep 09, 2011 12:14 am
by ok
Please post your code

Re: scandir inserting character in filename??

Posted: Fri Sep 09, 2011 10:09 pm
by delpi767
Just a simple scandir wrapped in a function


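function showfiles($dir) {
    $files = scandir($dir);
    echo "<table id='files'>";

    foreach ($files as $value) {
        // Skip hidden entries, including "." and ".."
        if (substr($value, 0, 1) != ".") {
            // Encode the name for the query string so apostrophes and "»" cannot break the link
            $href = 'text.php?N=' . rawurlencode($value);
            // Escape the visible label for HTML
            $label = htmlspecialchars($value, ENT_QUOTES);
            echo "<tr>";
            echo "<td><a href='$href'>$label</a></td>";
            echo "</tr>";
        }
    }
    echo "</table>";
}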

Re: scandir inserting character in filename??

Posted: Sat Sep 10, 2011 4:34 am
by greip
The problem you are seeing comes from the terminal not supporting the character encoding used in the file names.

In the character encoding ISO-8859-1 a single byte with decimal value 187 is the "»" character.

In the character encoding UTF-8 the two bytes with decimal values 194 and 187 together encode the "»" character.

In the character encoding probably used by your terminal, Microsoft Code page 437 (http://en.wikipedia.org/wiki/Code_page_437), the bytes with values 194 and 187 are the "┬" and "╗" characters.
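To see those byte values for yourself, here is a minimal sketch. It assumes PHP 7 or later with the mbstring extension loaded; byte_values() is just a helper name I made up for the illustration.

```php
<?php
// "»" as a single ISO-8859-1 byte (decimal 187).
$latin1 = chr(187);

// Converted to UTF-8, the same character becomes two bytes: 194 and 187.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Hypothetical helper: list the decimal value of each byte in a string.
function byte_values(string $s): array {
    return array_values(unpack('C*', $s));
}

echo implode(' ', byte_values($latin1)), "\n"; // 187
echo implode(' ', byte_values($utf8)), "\n";   // 194 187

// Misreading the UTF-8 bytes as ISO-8859-1 turns one character into two.
// This is the kind of "inserted" character a viewer shows when it guesses wrong.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo byte_values($misread) === [195, 130, 194, 187] ? "two characters\n" : "unexpected\n";
```

The file name on disk never changes in any of this; only the interpretation of its bytes does.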

What all of this boils down to is that the text you view must be encoded in a character set supported by whatever you use to view it (terminal, web browser, file explorer, etc.). For web pages you can declare the character set using the Content-Type HTTP header and the Content-Type meta element. If you don't declare anything, the web browser falls back to a default, typically ISO-8859-1 or a close relative of it.
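For the web page case, a rough sketch of declaring UTF-8 from PHP could look like the following. It assumes the file names really are UTF-8 on disk, and content_type_header() is a hypothetical helper, not something from your script.

```php
<?php
// Hypothetical helper: build the Content-Type header line for a given charset.
function content_type_header(string $charset = 'UTF-8'): string {
    return "Content-Type: text/html; charset=$charset";
}

// The HTTP header must be sent before any output.
if (!headers_sent()) {
    header(content_type_header());
}

// Repeat the declaration inside the document as a meta element.
echo "<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>\n";
```

With both declarations in place, the browser should decode the bytes 194 and 187 as a single "»" instead of guessing.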

I hope this at least helps you understand what's going on.