Charset for Rich Text Format?

Chalks
Forum Contributor
Posts: 447
Joined: Thu Jul 12, 2007 7:55 am
Location: Indiana

Charset for Rich Text Format?

Post by Chalks »

A client of mine, for some godforsaken reason, writes articles in rich text format. She wants to copy this text directly onto her webpage. This is all well and good, and I've got a system in place for her to put it into a database. Once it's in the database, the "display.php?article=name_here" file displays the text. Unfortunately, various characters are not being interpreted correctly. I've tried UTF-8 and ISO-8859-1, and neither works fully. What charset should I be using?


Also, related to this, I have a way for her to edit text that she has already loaded into the database. My edit.php page uses XMLHttpRequest to pull the data out of the database, and uses "getElementById('edit').innerHTML = data;" to insert the data into a <textarea>. This works just fine for about half of the articles in the database. For the other half, only the first ~3000 characters are loaded, and the rest gets truncated. I cannot for the life of me figure out why, because I've checked the database, and the database DOES have all the data in it. Help?!

Thanks.
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Charset for Rich Text Format?

Post by Benjamin »

Chalks
Forum Contributor
Posts: 447
Joined: Thu Jul 12, 2007 7:55 am
Location: Indiana

Re: Charset for Rich Text Format?

Post by Chalks »

astions wrote: eeek! Have fun man: http://www.biblioscape.com/rtf15_spec.htm
D:

I read that document yesterday and was not happy about it. Surely _someone_ has had to deal with this before? I'd rather not tell my client that she needs to change an ingrained habit (saving in RTF). I've been googling all over the place and can't find anything very helpful.
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Charset for Rich Text Format?

Post by Eran »

You might want to try an RTE such as TinyMCE or FCKeditor; those can handle text pasted from Word and RTF documents.
Chalks wrote: The other half, only the first ~3000 characters are loaded, and the rest gets truncated. I can not for the life of me figure out why, because I've checked the database, and the database DOES have all the data in it. Help?!
My guess is that there is an HTML entity in the data that breaks up the markup. You should escape those using htmlentities or htmlspecialchars in the PHP script that fetches the data.
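
To illustrate what that kind of escaping does, here is a minimal sketch in JavaScript (used only to match the rest of the thread; the suggestion above is to do this in PHP). The helper name escapeMarkup is hypothetical, and the set of characters it handles is an assumption about what is worth escaping, not a copy of PHP's behavior.

```javascript
// Hypothetical helper, not part of the thread's code: replaces the five
// characters that are significant in XML/HTML markup with entity references.
function escapeMarkup(text) {
  var entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;"
  };
  // The g flag makes replace() process every match, not just the first one.
  return String(text).replace(/[&<>"']/g, function (ch) {
    return entities[ch];
  });
}
```

A stray "<" or "&" inside the article text would otherwise be read as the start of a tag or entity when the text is placed inside an XML element.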
Chalks
Forum Contributor
Posts: 447
Joined: Thu Jul 12, 2007 7:55 am
Location: Indiana

Re: Charset for Rich Text Format?

Post by Chalks »

pytrin wrote: You might want to try an RTE such as TinyMCE or FCKeditor, those can handle text pasted from word and rtf documents.
Thanks, I'll look into that.
pytrin wrote: My guess is that there is an html entity in the data that breaks up the markup. You should escape those using htmlentities or htmlspecialchars in your php script that fetches the data.
I'm testing with a big long article, no HTML entities or anything like that, just plain text. I _think_ the issue has something to do with XML child nodes being broken up into 4k chunks: http://www.webmasterworld.com/javascript/3388031.htm
I see pretty much that same behavior when I test in FF3. However, when I test in IE, the larger blocks of text won't even be inserted into the textarea (despite being loaded into a JavaScript variable). Here's the relevant bit of code... maybe I'm doing something wrong?

Code:

reqsend.onreadystatechange = function()
{
  if (reqsend.readyState == 4 && reqsend.status == 200)
  {
    title = reqsend.responseXML.getElementsByTagName("title");
    content = reqsend.responseXML.getElementsByTagName("content");
    day = reqsend.responseXML.getElementsByTagName("day");
    month = reqsend.responseXML.getElementsByTagName("month");
    year = reqsend.responseXML.getElementsByTagName("year");
    alert("first alert");
    i = month[0].childNodes[0].nodeValue - 1;
    alert("second alert");
    j = day[0].childNodes[0].nodeValue - 1;

    document.getElementById("editContent").innerHTML = content[0].childNodes[0].nodeValue;

    document.forms['edit'].title.value = title[0].childNodes[0].nodeValue;
    document.forms['edit'].oldtitle.value = title[0].childNodes[0].nodeValue;
    document.forms['edit'].month.options[i].selected = "selected";
    document.forms['edit'].day.options[j].selected = "selected";
    document.forms['edit'].year.value = year[0].childNodes[0].nodeValue;
  }
  else
    document.forms['edit'].title.value = "Loading...";
};
And the XML that reqsend requests:

Code:

header("Content-Type: text/xml");

echo "<article>";
echo "<title>$data[0]</title>";
echo "<content>$data[1]</content>";
echo "<day>$day</day>";
echo "<month>$month</month>";
echo "<year>$year</year>";
echo "</article>";
When I run this code in Firefox, 4k of the article is added to the textarea (the rest is truncated), and both alerts are displayed. When I run it in IE, only the first alert is displayed, then it gives the error "object required". I'm still brand new to XMLHttpRequest, and I really have no idea how to fix this beyond pinpointing where the error occurs.
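
One possible workaround for the truncation, sketched under the assumption that the ~4k cut-off comes from the browser splitting one long text node into several sibling nodes (the behavior described in the linked WebmasterWorld thread): read every child node of the element instead of only childNodes[0]. The helper name collectText is hypothetical. A textarea is also normally filled through its value property rather than innerHTML.

```javascript
// Hypothetical helper: joins the values of every text (nodeType 3) or CDATA
// (nodeType 4) child of an element, so content that the parser split across
// several sibling nodes is not cut short.
function collectText(element) {
  var parts = [];
  if (!element || !element.childNodes) {
    return ""; // element missing, e.g. the tag was not found in the response
  }
  for (var i = 0; i < element.childNodes.length; i++) {
    var child = element.childNodes[i];
    if (child.nodeType === 3 || child.nodeType === 4) {
      parts.push(child.nodeValue);
    }
  }
  return parts.join("");
}

// Possible use inside the onreadystatechange handler from the post:
//   var content = reqsend.responseXML.getElementsByTagName("content");
//   document.getElementById("editContent").value = collectText(content[0]);
```

The IE "object required" error at month[0] would be consistent with responseXML containing no elements because the XML failed to parse, but that is a guess and is not confirmed anywhere in the thread.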
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Charset for Rich Text Format?

Post by Eran »

How did you write your AJAX implementation? I highly recommend using a framework such as jQuery to handle those interactions in a cross-browser compatible way.
Chalks
Forum Contributor
Posts: 447
Joined: Thu Jul 12, 2007 7:55 am
Location: Indiana

Re: Charset for Rich Text Format?

Post by Chalks »

I will use jQuery soon, but I need to finish this project tonight, and I'm not going to go back and change everything to jQuery right now. I probably will in the future though. Anyway, I've figured out exactly what's causing the problems... these three characters:





Man, I hate those things. I can get them to display properly when I'm just outputting to a page, but when I try to use Ajax to pull them from a database and place them into a textarea, it chokes.

None of these solutions have worked on the client side:
stringVariable.replace(/“/, """);
stringVariable.replace(/\“/, """);

and none of these have worked on the server side:
str_replace(/“/, """, $stringVariable);
str_replace(/\“/, """, $stringVariable);


Neither PHP nor JavaScript recognizes that character, and so it doesn't get replaced in either location. I keep getting "�" in the textarea, which prevents the Ajax from working in IE and looks bad in FF.
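
A minimal sketch of the client-side replacement, assuming the troublesome characters are the curly quotes typically produced by word processors (the thread does not confirm which characters they are). Two details matter in JavaScript: replace() returns a new string rather than changing the original, and without the g flag it only touches the first match. Writing the characters as \u escapes keeps the script independent of the encoding of the file it is saved in. The function name straightenQuotes is hypothetical.

```javascript
// Hypothetical helper: maps curly double and single quotes to their plain
// ASCII equivalents. Assumes the input was decoded correctly, i.e. it really
// contains U+201C/U+201D/U+2018/U+2019 and not U+FFFD replacement characters.
function straightenQuotes(text) {
  return text
    .replace(/[\u201C\u201D]/g, '"')  // left and right double quotes
    .replace(/[\u2018\u2019]/g, "'"); // left and right single quotes
}

// The result has to be assigned, because strings are immutable:
//   stringVariable = straightenQuotes(stringVariable);
```

If the textarea already shows "�" (U+FFFD), the bytes were mis-decoded before the script ever saw them, so no replacement on the client can recover the original character. In that case the mismatch between the stored encoding and the declared charset of the response is the more likely thing to fix.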