Ticket #127 (closed defect: fixed)

Opened 8 years ago

Last modified 6 years ago

convert special chars to HTML entities

Reported by: anonymous
Owned by: gogo
Priority: high
Milestone:
Component: Xinha Core
Version: trunk
Severity: normal
Keywords: special chars entities euro corrupt file
Cc: pixelsoul7@…

Description

The special char button inserts the actual sign rather than the entity code, which is not good code: € instead of &euro;, etc.

Attachments

special_chars.txt (14.4 kB) - added by mharrisonline 8 years ago.
fix for ticket 127, special characters

Change History

Changed 8 years ago by niko

Actually, I think the browser converts it: go into HTML code view, enter &euro;, switch to WYSIWYG and back again -> &euro; is replaced by €.

..but why is this a problem?

Changed 8 years ago by gogo

  • status changed from new to closed
  • resolution set to wontfix

I agree, I don't think it's a problem. The only entities that are strictly required are those for & < > and "; everything else can be just plain old characters. If anybody really wants to, they can do post-processing to turn the "special" characters into numbered entities or such.

Closing as wont fix.
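
As an illustration of the post-processing suggested above, here is a minimal sketch that rewrites every character outside printable ASCII as a numbered entity. The function name `toNumericEntities` is hypothetical and not part of Xinha; it assumes the input is markup that has already been produced by the editor, so the characters & < > and " are left alone.

```javascript
// Hypothetical helper, not part of Xinha: turn every character outside
// printable ASCII into a numeric character reference such as &#8364;.
function toNumericEntities(html) {
  var out = "";
  for (var i = 0; i < html.length; i++) {
    var code = html.charCodeAt(i);
    // Combine a UTF-16 surrogate pair into a single code point.
    if (code >= 0xD800 && code <= 0xDBFF && i + 1 < html.length) {
      var low = html.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        code = (code - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
        i++; // skip the low surrogate, it is already accounted for
      }
    }
    out += code > 126 ? "&#" + code + ";" : String.fromCharCode(code);
  }
  return out;
}
```

The same loop translates directly to a server-side language if the conversion should happen after the form is submitted.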

Changed 8 years ago by niko

see #130

Changed 8 years ago by anonymous

It happens not only with € but also with things like ï, so it's annoying :)

Changed 8 years ago by anonymous

  • status changed from closed to reopened
  • resolution deleted

Also, the special chars button shows the correct entity format for each character; it just inserts the wrong thing.

It would be a lot more work to catch everything in PHP and replace the characters than to insert the correct format in the first place. Right now the same thing is done twice, and one of the two does not work.

Changed 8 years ago by anonymous

Forgot to add that you can't just use the characters since it will give a square as character...

Changed 8 years ago by gogo

  • status changed from reopened to closed
  • resolution set to wontfix

OK, here's the deal. It looks like Gecko, at least, converts the entity into the appropriate representation of the character in the character set of the document being edited (typically UTF-8). At least that's what my cursory look shows. It may be that we are inadvertently converting it when we get the HTML out, but I don't think that's the case.

Forgot to add that you can't just use the characters since it will give a square as character...

Only if you have an incorrect character set defined for the HTML or are using a font that doesn't have that character, and that is not our concern. There seems to be some confusion about character sets, so here is a wiki page about them.

So, to cut this short, closing as WONTFIX, however we probably should remove the entity display from the CharacterMap plugin.

If somebody wants to patch htmlarea.js to make it keep the entities then by all means (but it should be configurable, some prefer the raw characters) reopen.

Changed 8 years ago by mharrisonline

  • status changed from closed to reopened
  • resolution deleted

It is so easy to fix this once and for all if you do this. In the implementation of HTMLArea3 and then Xinha used in Jones Standard we saved the file htmlarea.js as UTF-8 and changed HTMLArea.htmlEncode to be the example below. Now, if it's a character shown below, even if you paste the actual character instead of inserting the entity, Xinha will convert it to the entity. Nothing else (hex codes, trying to catch in PHP, etc.) that we tried worked for more than the most common symbols. One of the biggest problems found in examples in the old htmlarea forum was people using case insensitive regular expressions.

This converts a lot more than what the insert character plugin for Xinha offers, and it works 100% of the time for the symbols below. To use this you must save htmlarea.js as UTF-8. To save filesize you could delete any symbols you don't expect that you will encounter, or use the compressed version at the bottom.

HTMLArea.htmlEncode = function(str) { 
// we don't need regexp for that, but.. so be it for now. 
str = str.replace(/&/g, "&"); 
str = str.replace(/</g, "<"); 
str = str.replace(/>/g, ">"); 
str = str.replace(/¡/g, "¡");
str = str.replace(/¢/g, "¢");
str = str.replace(/£/g, "£");
str = str.replace(/¤/g, "¤");
str = str.replace(/¥/g, "¥");
str = str.replace(/¦/g, "¦");
str = str.replace(/§/g, "§");
str = str.replace(/¨/g, "uml;");
str = str.replace(/©/g, "©");
str = str.replace(/ª/g, "ª");
str = str.replace(/«/g, "«");
str = str.replace(/¬/g, "¬");
str = str.replace(/®/g, "®");
str = str.replace(/¯/g, "¯");
str = str.replace(/°/g, "°");
str = str.replace(/±/g, "±");
str = str.replace(/²/g, "²");
str = str.replace(/³/g, "³");
str = str.replace(/´/g, "´");
str = str.replace(/µ/g, "µ");
str = str.replace(/¶/g, "¶");
str = str.replace(/·/g, "·");
str = str.replace(/¸/g, "¸");
str = str.replace(/¹/g, "¹");
str = str.replace(/º/g, "º");
str = str.replace(/»/g, "»");
str = str.replace(/¼/g, "¼");
str = str.replace(/½/g, "½");
str = str.replace(/¾/g, "¾");
str = str.replace(/¿/g, "¿");
str = str.replace(/À/g, "À");
str = str.replace(/Á/g, "Á");
str = str.replace(/Â/g, "Â");
str = str.replace(/Ã/g, "Ã");
str = str.replace(/Ä/g, "Ä");
str = str.replace(/Å/g, "Å");
str = str.replace(/Æ/g, "Æ");
str = str.replace(/Ç/g, "Ç");
str = str.replace(/È/g, "È");
str = str.replace(/É/g, "É");
str = str.replace(/Ê/g, "Ê");
str = str.replace(/Ë/g, "Ë");
str = str.replace(/Ì/g, "Ì");
str = str.replace(/Í/g, "Í");
str = str.replace(/Î/g, "Î");
str = str.replace(/Ï/g, "Ï");
str = str.replace(/Ð/g, "Ð");
str = str.replace(/Ñ/g, "Ñ");
str = str.replace(/Ò/g, "Ò");
str = str.replace(/Ó/g, "Ó");
str = str.replace(/Ô/g, "Ô");
str = str.replace(/Õ/g, "Õ");
str = str.replace(/Ö/g, "Ö");
str = str.replace(/×/g, "×");
str = str.replace(/Ø/g, "Ø");
str = str.replace(/Ù/g, "Ù");
str = str.replace(/Ú/g, "Ú");
str = str.replace(/Û/g, "Û");
str = str.replace(/Ü/g, "Ü");
str = str.replace(/Ý/g, "Ý");
str = str.replace(/Þ/g, "Þ");
str = str.replace(/ß/g, "ß");
str = str.replace(/à/g, "à");
str = str.replace(/á/g, "á");
str = str.replace(/â/g, "â");
str = str.replace(/ã/g, "ã");
str = str.replace(/ä/g, "ä");
str = str.replace(/å/g, "å");
str = str.replace(/æ/g, "æ");
str = str.replace(/ç/g, "ç");
str = str.replace(/è/g, "è");
str = str.replace(/é/g, "é");
str = str.replace(/ê/g, "ê");
str = str.replace(/ë/g, "ë");
str = str.replace(/ì/g, "ì");
str = str.replace(/í/g, "í");
str = str.replace(/î/g, "î");
str = str.replace(/ï/g, "ï");
str = str.replace(/ð/g, "ð");
str = str.replace(/ñ/g, "ñ");
str = str.replace(/ò/g, "ò");
str = str.replace(/ó/g, "ó");
str = str.replace(/ó/g, "ó");
str = str.replace(/ô/g, "ô");
str = str.replace(/õ/g, "õ");
str = str.replace(/ö/g, "ö");
str = str.replace(/÷/g, "÷");
str = str.replace(/ø/g, "ø");
str = str.replace(/ù/g, "ù");
str = str.replace(/ú/g, "ú");
str = str.replace(/û/g, "û");
str = str.replace(/ü/g, "ü");
str = str.replace(/ý/g, "ý");
str = str.replace(/þ/g, "þ");
str = str.replace(/ÿ/g, "ÿ");
str = str.replace(/ƒ/g, "ƒ");
str = str.replace(/Α/g, "Α");
str = str.replace(/Β/g, "Β");
str = str.replace(/Γ/g, "Γ");
str = str.replace(/Δ/g, "Δ");
str = str.replace(/Ε/g, "Ε");
str = str.replace(/Ζ/g, "Ζ");
str = str.replace(/Η/g, "Η");
str = str.replace(/Θ/g, "Θ");
str = str.replace(/Ι/g, "Ι");
str = str.replace(/Κ/g, "Κ");
str = str.replace(/Λ/g, "Λ");
str = str.replace(/Μ/g, "Μ");
str = str.replace(/Ν/g, "Ν");
str = str.replace(/Ξ/g, "Ξ");
str = str.replace(/Ο /g, "Ο");
str = str.replace(/Π/g, "Π");
str = str.replace(/Ρ/g, "Ρ");
str = str.replace(/Σ/g, "Σ");
str = str.replace(/Τ/g, "Τ");
str = str.replace(/Υ/g, "Υ");
str = str.replace(/Φ/g, "Φ");
str = str.replace(/Χ/g, "Χ");
str = str.replace(/Ψ/g, "Ψ");
str = str.replace(/Ω/g, "Ω");
str = str.replace(/α/g, "α");
str = str.replace(/β/g, "β");
str = str.replace(/γ/g, "γ");
str = str.replace(/δ/g, "δ");
str = str.replace(/ε/g, "ε");
str = str.replace(/ζ/g, "ζ");
str = str.replace(/η/g, "η");
str = str.replace(/θ/g, "θ");
str = str.replace(/ι/g, "ι");
str = str.replace(/κ/g, "κ");
str = str.replace(/λ/g, "λ");
str = str.replace(/μ/g, "μ");
str = str.replace(/ν/g, "ν");
str = str.replace(/ξ/g, "ξ");
str = str.replace(/ο/g, "ο");
str = str.replace(/π/g, "π");
str = str.replace(/ρ/g, "ρ");
str = str.replace(/ς/g, "ς");
str = str.replace(/σ/g, "σ");
str = str.replace(/τ/g, "τ");
str = str.replace(/υ/g, "υ");
str = str.replace(/φ/g, "φ");
str = str.replace(/ω/g, "ω");
str = str.replace(/•/g, "•");
str = str.replace(/…/g, "…");
str = str.replace(/′/g, "′");
str = str.replace(/″/g, "″");
str = str.replace(/‾/g, "‾");
str = str.replace(/⁄/g, "⁄");
str = str.replace(/™/g, "™");
str = str.replace(/←/g, "←");
str = str.replace(/↑/g, "↑");
str = str.replace(/→/g, "→");
str = str.replace(/↓/g, "↓");
str = str.replace(/↔/g, "↔");
str = str.replace(/⇒/g, "⇒");
str = str.replace(/∂/g, "∂");
str = str.replace(/∏/g, "∏");
str = str.replace(/∑/g, "∑");
str = str.replace(/−/g, "−");
str = str.replace(/√/g, "√");
str = str.replace(/∞/g, "∞");
str = str.replace(/∩/g, "∩");
str = str.replace(/∫/g, "∫");
str = str.replace(/≈/g, "≈");
str = str.replace(/≠/g, "≠");
str = str.replace(/≡/g, "≡");
str = str.replace(/≤/g, "≤");
str = str.replace(/≥/g, "≥");
str = str.replace(/◊/g, "◊");
str = str.replace(/♠/g, "♠");
str = str.replace(/♣/g, "♣");
str = str.replace(/♥/g, "♥");
str = str.replace(/♦/g, "♦");
str = str.replace(/Œ/g, "Œ");
str = str.replace(/œ/g, "œ");
str = str.replace(/Š/g, "Š");
str = str.replace(/š/g, "š");
str = str.replace(/Ÿ/g, "Ÿ");
str = str.replace(/ˆ/g, "ˆ");
str = str.replace(/˜/g, "˜");
str = str.replace(/–/g, "–");
str = str.replace(/—/g, "—");
str = str.replace(/‘/g, "‘");
str = str.replace(/’/g, "’");
str = str.replace(/‚/g, "‚");
str = str.replace(/“/g, "“");
str = str.replace(/”/g, "”");
str = str.replace(/„/g, "„");
str = str.replace(/†/g, "†");
str = str.replace(/‡/g, "‡");
str = str.replace(/‰/g, "‰");
str = str.replace(/‹/g, "‹");
str = str.replace(/›/g, "›");
str = str.replace(/€/g, "€");
	
	
	// \x22 means '"' -- we use hex representation so that we don't disturb
	// JS compressors (well, at least mine fails.. ;)
	
	str = str.replace(/\x22/ig, """);
	str = str.replace(/\xA0/gi," ");
	str = str.replace(String.fromCharCode(0x2264), "≤"); 
	str = str.replace(String.fromCharCode(0x2265), "≥");

return str;
};

Compressed version:

HTMLArea.htmlEncode=function(str){str=str.replace(/&/g,"&");str=str.replace(/</g,"<");str=str.replace(/>/g,">");str=str.replace(/¡/g,"¡");str=str.replace(/¢/g,"¢");str=str.replace(/£/g,"£");str=str.replace(/¤/g,"¤");str=str.replace(/¥/g,"¥");str=str.replace(/¦/g,"¦");str=str.replace(/§/g,"§");str=str.replace(/¨/g,"uml;");str=str.replace(/©/g,"©");str=str.replace(/ª/g,"ª");str=str.replace(/«/g,"«");str=str.replace(/¬/g,"¬");str=str.replace(/®/g,"®");str=str.replace(/¯/g,"¯");str=str.replace(/°/g,"°");str=str.replace(/±/g,"±");str=str.replace(/²/g,"²");str=str.replace(/³/g,"³");str=str.replace(/´/g,"´");str=str.replace(/µ/g,"µ");str=str.replace(/¶/g,"¶");str=str.replace(/·/g,"·");str=str.replace(/¸/g,"¸");str=str.replace(/¹/g,"¹");str=str.replace(/º/g,"º");str=str.replace(/»/g,"»");str=str.replace(/¼/g,"¼");str=str.replace(/½/g,"½");str=str.replace(/¾/g,"¾");str=str.replace(/¿/g,"¿");str=str.replace(/À/g,"À");str=str.replace(/Á/g,"Á");str=str.replace(/Â/g,"Â");str=str.replace(/Ã/g,"Ã");str=str.replace(/Ä/g,"Ä");str=str.replace(/Å/g,"Å");str=str.replace(/Æ/g,"Æ");str=str.replace(/Ç/g,"Ç");str=str.replace(/È/g,"È");str=str.replace(/É/g,"É");str=str.replace(/Ê/g,"Ê");str=str.replace(/Ë/g,"Ë");str=str.replace(/Ì/g,"Ì");str=str.replace(/Í/g,"Í");str=str.replace(/Î/g,"Î");str=str.replace(/Ï/g,"Ï");str=str.replace(/Ð/g,"Ð");str=str.replace(/Ñ/g,"Ñ");str=str.replace(/Ò/g,"Ò");str=str.replace(/Ó/g,"Ó");str=str.replace(/Ô/g,"Ô");str=str.replace(/Õ/g,"Õ");str=str.replace(/Ö/g,"Ö");str=str.replace(/×/g,"×");str=str.replace(/Ø/g,"Ø");str=str.replace(/Ù/g,"Ù");str=str.replace(/Ú/g,"Ú");str=str.replace(/Û/g,"Û");str=str.replace(/Ü/g,"Ü");str=str.replace(/Ý/g,"Ý");str=str.replace(/Þ/g,"Þ");str=str.replace(/ß/g,"ß");str=str.replace(/à/g,"à");str=str.replace(/á/g,"á");str=str.replace(/â/g,"â");str=str.replace(/ã/g,"ã");str=str.replace(/ä/g,"ä");str=str.replace(/å/g,"å");str=str.replace(/æ/g,"æ");str=str.replace(/ç/g,"ç");str=str.replace(/è/g,"è");str=str.replace(/é/g,"é");str=str.repla
ce(/ê/g,"ê");str=str.replace(/ë/g,"ë");str=str.replace(/ì/g,"ì");str=str.replace(/í/g,"í");str=str.replace(/î/g,"î");str=str.replace(/ï/g,"ï");str=str.replace(/ð/g,"ð");str=str.replace(/ñ/g,"ñ");str=str.replace(/ò/g,"ò");str=str.replace(/ó/g,"ó");str=str.replace(/ó/g,"ó");str=str.replace(/ô/g,"ô");str=str.replace(/õ/g,"õ");str=str.replace(/ö/g,"ö");str=str.replace(/÷/g,"÷");str=str.replace(/ø/g,"ø");str=str.replace(/ù/g,"ù");str=str.replace(/ú/g,"ú");str=str.replace(/û/g,"û");str=str.replace(/ü/g,"ü");str=str.replace(/ý/g,"ý");str=str.replace(/þ/g,"þ");str=str.replace(/ÿ/g,"ÿ");str=str.replace(/ƒ/g,"ƒ");str=str.replace(/Α/g,"Α");str=str.replace(/Β/g,"Β");str=str.replace(/Γ/g,"Γ");str=str.replace(/Δ/g,"Δ");str=str.replace(/Ε/g,"Ε");str=str.replace(/Ζ/g,"Ζ");str=str.replace(/Η/g,"Η");str=str.replace(/Θ/g,"Θ");str=str.replace(/Ι/g,"Ι");str=str.replace(/Κ/g,"Κ");str=str.replace(/Λ/g,"Λ");str=str.replace(/Μ/g,"Μ");str=str.replace(/Ν/g,"Ν");str=str.replace(/Ξ/g,"Ξ");str=str.replace(/Ο /g,"Ο");str=str.replace(/Π/g,"Π");str=str.replace(/Ρ/g,"Ρ");str=str.replace(/Σ/g,"Σ");str=str.replace(/Τ/g,"Τ");str=str.replace(/Υ/g,"Υ");str=str.replace(/Φ/g,"Φ");str=str.replace(/Χ/g,"Χ");str=str.replace(/Ψ/g,"Ψ");str=str.replace(/Ω/g,"Ω");str=str.replace(/α/g,"α");str=str.replace(/β/g,"β");str=str.replace(/γ/g,"γ");str=str.replace(/δ/g,"δ");str=str.replace(/ε/g,"ε");str=str.replace(/ζ/g,"ζ");str=str.replace(/η/g,"η");str=str.replace(/θ/g,"θ");str=str.replace(/ι/g,"ι");str=str.replace(/κ/g,"κ");str=str.replace(/λ/g,"λ");str=str.replace(/μ/g,"μ");str=str.replace(/ν/g,"ν");str=str.replace(/ξ/g,"ξ");str=str.replace(/ο/g,"ο");str=str.replace(/π/g,"π");str=str.replace(/ρ/g,"ρ");str=str.replace(/ς/g,"ς");str=str.replace(/σ/g,"σ");str=str.replace(/τ/g,"τ");str=str.replace(/υ/g,"υ");str=str.replace(/φ/g,"φ");str=str.replace(/ω/g,"ω");str=str.replace(/•/g,"•");str=str.replace(/…/g,"…");str=str.replace(/′/g,"′");str=str.replace(/″/g,"″");str=str.replace(/‾/g,"‾");str=str.replace(/⁄/g,"⁄");str=str.re
place(/™/g,"™");str=str.replace(/←/g,"←");str=str.replace(/↑/g,"↑");str=str.replace(/→/g,"→");str=str.replace(/↓/g,"↓");str=str.replace(/↔/g,"↔");str=str.replace(/⇒/g,"⇒");str=str.replace(/∂/g,"∂");str=str.replace(/∏/g,"∏");str=str.replace(/∑/g,"∑");str=str.replace(/−/g,"−");str=str.replace(/√/g,"√");str=str.replace(/∞/g,"∞");str=str.replace(/∩/g,"∩");str=str.replace(/∫/g,"∫");str=str.replace(/≈/g,"≈");str=str.replace(/≠/g,"≠");str=str.replace(/≡/g,"≡");str=str.replace(/≤/g,"≤");str=str.replace(/≥/g,"≥");str=str.replace(/◊/g,"◊");str=str.replace(/♠/g,"♠");str=str.replace(/♣/g,"♣");str=str.replace(/♥/g,"♥");str=str.replace(/♦/g,"♦");str=str.replace(/Œ/g,"Œ");str=str.replace(/œ/g,"œ");str=str.replace(/Š/g,"Š");str=str.replace(/š/g,"š");str=str.replace(/Ÿ/g,"Ÿ");str=str.replace(/ˆ/g,"ˆ");str=str.replace(/˜/g,"˜");str=str.replace(/–/g,"–");str=str.replace(/—/g,"—");str=str.replace(/‘/g,"‘");str=str.replace(/’/g,"’");str=str.replace(/‚/g,"‚");str=str.replace(/“/g,"“");str=str.replace(/”/g,"”");str=str.replace(/„/g,"„");str=str.replace(/†/g,"†");str=str.replace(/‡/g,"‡");str=str.replace(/‰/g,"‰");str=str.replace(/‹/g,"‹");str=str.replace(/›/g,"›");str=str.replace(/€/g,"€");str=str.replace(/\x22/ig,""");str=str.replace(/\xA0/gi," ");str=str.replace(String.fromCharCode(0x2264),"≤");str=str.replace(String.fromCharCode(0x2265),"≥");return str;};
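
The approach above can also be sketched with a lookup table and a single pass over the string. This is only an illustration under stated assumptions: `ENTITY_TABLE` and `encodeWithTable` are made-up names, the table is deliberately shortened, and, like the function above, the file must be saved as UTF-8 because the keys are literal characters.

```javascript
// Hypothetical, shortened table: extend it with the characters you need.
// The file must be saved as UTF-8 because the keys are literal characters.
var ENTITY_TABLE = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  "\"": "&quot;",
  "\u00A0": "&nbsp;", // non-breaking space, written as an escape to stay visible
  "É": "&Eacute;",
  "é": "&eacute;",
  "ï": "&iuml;",
  "€": "&euro;"
};

function encodeWithTable(str) {
  // Match the four markup characters plus anything outside ASCII.
  // There is no "i" flag, so É and é are looked up separately.
  return str.replace(/[&<>"]|[^\x00-\x7F]/g, function (ch) {
    return Object.prototype.hasOwnProperty.call(ENTITY_TABLE, ch)
      ? ENTITY_TABLE[ch]
      : ch; // characters missing from the table pass through unchanged
  });
}
```

The missing `i` flag matters: a case-insensitive pattern such as `/é/gi` also matches É, so `"É".replace(/é/gi, "&eacute;")` produces the lowercase entity for an uppercase letter. That is the kind of mistake the case-insensitive examples mentioned above run into.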

Changed 8 years ago by mharrisonline

fix for ticket 127, special characters

Changed 8 years ago by mharrisonline

I had to attach this in a text file; the code above is all wrong. After I submitted, the HTML entity in each line turned into the actual character. Looks like Xinha isn't the only thing with that problem. The second occurrence of the character in each line was originally the HTML entity.

Changed 8 years ago by mharrisonline

It seems that for at least a month the HTMLArea.htmlEncode function no longer works in Xinha, rendering this fix useless.

Changed 8 years ago by mharrisonline

Oops! Never mind, it must have been my PC; I can't get it to not work now. It still works fine, sorry.

Changed 8 years ago by gogo

  • status changed from reopened to closed
  • resolution set to wontfix

I don't want to introduce the code supplied above (htmlEncode) for two reasons...

1. It's a lot of code for an unnecessary purpose. There should be no reason to encode characters as HTML entities except for < > " and &.
2. I don't want to require that JavaScript files in Xinha be UTF-8 (which would be necessary to include htmlEncode), because of multiple-developer concerns. This can be worked around by using JavaScript Unicode escapes instead of the literal UTF-8 characters, but again, that is a lot of unnecessary code IMHO.

This is more suitable for a plugin to implement, it doesn't need to be in the core.
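
For illustration, the Unicode-escape workaround mentioned in point 2 could look like the sketch below. The function name `encodeEuroAndIuml` is made up for this example and only two characters are handled; the point is that the source file contains nothing but ASCII, so its encoding no longer matters.

```javascript
// Illustrative only: the function name is invented for this sketch.
// Every non-ASCII character is written as a \uXXXX escape, so this
// file can be saved in any ASCII-compatible encoding.
function encodeEuroAndIuml(str) {
  str = str.replace(/\u20AC/g, "&euro;"); // U+20AC EURO SIGN
  str = str.replace(/\u00EF/g, "&iuml;"); // U+00EF LATIN SMALL LETTER I WITH DIAERESIS
  return str;
}
```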

Changed 6 years ago by ray

  • keywords special chars entities euro added
  • status changed from closed to reopened
  • version changed from 2.0 to trunk
  • resolution deleted
  • summary changed from Special chars to convert special chars to HTML entities

Changed 6 years ago by ray

  • status changed from reopened to closed
  • resolution set to fixed

If you need the entities (e.g. to use the € in ISO-8859-1, a common case in Europe), use the HtmlEntities plugin (committed in Changeset [615]).

Changed 6 years ago by mharrisonline

Ray, the new plugin is great! That's one less thing I have to customize every time I upgrade to the latest Xinha version. Thanks!

Changed 6 years ago by znoob2@…

  • keywords corrupt file added
  • status changed from closed to reopened
  • resolution deleted

In Xinha version 0.92beta the Entities.js file is completely faulty. Something went wrong, definitely.

And furthermore, when the html contains a span that has at least a classname that contains AM (Equation plugin generates <span class="AM">, but class="GAME" would give the same result), this entire plugin (HtmlEntities) seems to be disabled. None of my characters in the html are converted to entities anymore. Maybe this is due to the mangled Entities.js file? Or is this purposely generated behaviour?

(Couldn't find Plugin_HtmlEntities as a component, so I am posting this under Xinha Core...)

Changed 6 years ago by ray

  • status changed from reopened to closed
  • resolution set to fixed

The corruption is caused by the JS compressor. rev [823]: changed the compression script to take care of such cases.

And furthermore, when the html contains a span that has at least a classname that contains AM (Equation plugin generates <span class="AM">, but class="GAME" would give the same result), this entire plugin (HtmlEntities) seems to be disabled. None of my characters in the html are converted to entities anymore. Maybe this is due to the mangled Entities.js file? Or is this purposely generated behaviour?

could not reproduce this
