Opened 14 years ago

Closed 14 years ago

#536 closed defect (fixed)

wordclean: remove lang-attribute and class="c1"

Reported by: niko Owned by: gogo
Priority: normal Milestone: Version 1.0
Component: Xinha Core Version: trunk
Severity: normal Keywords: wordclean lang class="c1"
Cc:

Description

When I paste from Word, I get this HTML code:

  <p class="MsoNormal"><span lang="DE-AT" class="c1"><o:p><br /></o:p></span></p>
  <p class="MsoNormal"><span lang="DE-AT" class="c1">Lorem Ipsum<o:p></o:p></span></p>
  <p class="MsoNormal"><span lang="DE-AT" class="c1">Lorem Ipsum<o:p></o:p></span></p>

Xinha's wordclean functions don't remove the lang attribute or class="c1".

Should the wordclean functions remove them? IMHO they aren't needed on typical sites, although they are valid HTML and someone may rely on them.

Should I commit this patch, or would it be better to write a dedicated plugin for this job?

Here is a small patch that would do it:

--- htmlarea.js (Revision 360)
+++ htmlarea.js (working copy)
@@ -2243,6 +2243,7 @@
       mso_class  : 0,
       mso_style  : 0,
       mso_xmlel  : 0,
+      mso_lang   : 0,
       orig_len   : this._doc.body.innerHTML.length,
       T          : (new Date()).getTime()
     },
@@ -2250,7 +2251,8 @@
       empty_tags : "Empty tags removed: ",
       mso_class  : "MSO class names removed: ",
       mso_style  : "MSO inline style removed: ",
-      mso_xmlel  : "MSO XML elements stripped: "
+      mso_xmlel  : "MSO XML elements stripped: ",
+      mso_lang   : "lang attributes removed: "
     };
   function showStats() {
     var txt = "HTMLArea word cleaner stats: \n\n";
@@ -2264,6 +2266,7 @@
   };
   function clearClass(node) {
     var newc = node.className.replace(/(^|\s)mso.*?(\s|$)/ig, ' ');
+    newc = newc.replace(/(^|\s)c[0-9]+(\s|$)/ig, ' ');
     if (newc != node.className) {
       node.className = newc;
       if (!/\S/.test(node.className)) {
@@ -2272,6 +2275,12 @@
       }
     }
   };
+  function clearLang(node) {
+    if (node.lang) {
+      node.removeAttribute("lang");
+      ++stats.mso_lang;
+    }
+  };
   function clearStyle(node) {
     var declarations = node.style.cssText.split(/\s*;\s*/);
     for (var i = declarations.length; --i >= 0;)
@@ -2306,6 +2315,7 @@
       return false;
     } else {
       clearClass(root);
+      clearLang(root);
       clearStyle(root);
       for (i = root.firstChild; i; i = next) {
         next = i.nextSibling;
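A note on the class regexes in `clearClass`: each `replace()` consumes the whitespace on both sides of a match. So when two matching classes are adjacent (for example `class="c1 c2"`, or `MsoNormal MsoBodyText`), the first match swallows the shared space and the second class survives. `"c1 c2".replace(/(^|\s)c[0-9]+(\s|$)/ig, ' ')` yields `" c2"`. A token-based filter avoids this.

The sketch below is a hypothetical standalone helper. `cleanWordClasses` is not a Xinha function. It keeps the patch's two rules: drop tokens starting with `mso`, and drop tokens that are `c` followed only by digits, both case-insensitive.

```javascript
// Hypothetical helper (not part of Xinha): strips Word-generated class names.
// Splitting into tokens means adjacent matches such as "c1 c2" are all removed,
// unlike a single global replace() that consumes the shared whitespace.
function cleanWordClasses(className) {
  return className
    .split(/\s+/)                       // tokenize on any whitespace run
    .filter(function (token) {
      return token !== "" &&            // drop empties from leading/trailing space
        !/^mso/i.test(token) &&         // rule from the original clearClass
        !/^c[0-9]+$/i.test(token);      // rule added by this patch: c1, c2, ...
    })
    .join(" ");
}
```

For example, `cleanWordClasses("MsoNormal c1 c2 keep")` returns `"keep"`. An empty result can be detected the same way `clearClass` does it, with its `/\S/` test.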

Change History (2)

comment:1 Changed 14 years ago by gocher

I think it's OK if only the pasted text gets cleaned. But if the lang attribute is removed from the whole text, it's not OK, because every lang attribute that was set deliberately (e.g. by the LangMarks plugin) gets lost!
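One way to act on this would be to strip `lang` only from the pasted fragment before it is inserted, so attributes elsewhere in the document are untouched. The sketch below assumes such a fragment root is available. The ticket does not say Xinha exposes one, and `stripLangInSubtree` is a hypothetical name. It walks the tree with `firstChild`/`nextSibling`, like the patch's own loop.

```javascript
// Hypothetical helper (not part of Xinha): removes lang attributes from `root`
// and its element descendants only, returning how many were removed.
// Relies only on standard DOM node members: nodeType, firstChild,
// nextSibling, hasAttribute and removeAttribute.
function stripLangInSubtree(root) {
  var removed = 0;
  // Only element nodes (nodeType 1) carry attributes; fragments and text
  // nodes are just traversed.
  if (root.nodeType === 1 && root.hasAttribute("lang")) {
    root.removeAttribute("lang");
    removed++;
  }
  for (var child = root.firstChild; child; child = child.nextSibling) {
    removed += stripLangInSubtree(child);
  }
  return removed;
}
```

In a current browser, the fragment could come from a `<template>` element. Assign the pasted HTML to its `innerHTML`, then pass its `content` to the function before inserting it into the editor.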

comment:2 Changed 14 years ago by niko

  • Resolution set to fixed
  • Status changed from new to closed

Wordclean always cleans the whole HTML, so I don't think it's a good idea.

In changeset:364 I added an option to SuperClean to remove those lang attributes.
