Opened 10 years ago

Closed 9 years ago

#961 closed defect (fixed)

baseHref not correctly treated when stripping

Reported by: gogo Owned by: gogo
Priority: normal Milestone: 0.96
Component: Xinha Core Version: trunk
Severity: major Keywords:
Cc:

Description

The <base> tag in html, and thus the baseHref config option are intended to take a full URL to the file. That is, something like http://www.example.com/example.html

However this is not correctly treated when stripping because the basename (filename) is not removed from the baseHref before the replacements are done, thus when using a full URL as a correct base href the stripping does not function.

I propose the following fix to fixRelativeLinks (XinhaCore?.js)

    if ( typeof this.config.baseHref != 'undefined' && this.config.baseHref !== null )
    {
      baseRe = new RegExp( "((href|src|background)=\")(" + this.config.baseHref.replace(/\/[^\/]*$/, '/').replace( Xinha.RE_Specials, '\\$1' ) + ")", 'g' );
    }

That is, we remove any basename (filename) portion of the baseHref, leaving the baseHref with a trailing slash.

If the baseHref already has a trailing slash, it wouldn't be touched, so one could set the baseHref to a directory in this manner (I think that some people do this, incorrectly).

Comments on this anybody?

Change History (11)

comment:1 Changed 10 years ago by ray

Doing things correctly is always best!

One problem I see though is that until rev [685] stripping the the server part from URLs only worked with config.baseHref set to the server name without the trailing slash. I suppose there are still a lot of installations that have it this way, which will break, wouldn't they? Of course this is only a matter of communication...

comment:2 Changed 10 years ago by htanaka

I agree according to a standard specification. It is good to avoid the confusion of the user in the future.

How about the following methods for a conventional user who is using the server name without trailing slash?

"http://www.will-be-removed.com".replace(/\/[^\/]*$/, '/');
// --> "http://"
"http://www.keep-alive.com".replace(/([^\/])\/[^\/]*$/, '$1/');
// --> "http://www.keep-alive.com"

comment:3 Changed 10 years ago by znoob2@…

I have a question about exactly that function.

The fourth line states: "var b=document.location.href;"

Why don't you first try this.config.baseHref? If present, it would always be more accurate, right? I changed my code to:

var b=this.config.baseHref||document.location.href;

but I would like feedback because I am unsure about global implications. I mean, it works way better in *my* situation, but maybe it is worse in others?

comment:4 Changed 10 years ago by guest

Could I urge people to be careful with the baseHref functionality? In real-world xhtml editing situations, this variable cannot easily be set to a full (file-based) URL, since the editor is likely to be used to edit a number of different documents. This is why most people, I suspect, would set it to a hostname.
Also, there is a subtle but in my case very important difference between having a trailing slash and not having one. If baseHref has a trailing slash, then links are stripped down to fully relative, so, assuming baseHref is set to the hostname http://www.mydomain.org/, then http://www.mydomain.org/mydir/myfile.html becomes mydir/myfile.html. These kinds of reference break most of the linking functions of Xinha if the URL for editing the above file is something like http://www.mydomain.org/cgi-bin/cm/mywiki.cgi?file=myfile&edit=true
However, if I omit the trailing slash in the above baseHref, then I get semi-absolute links relative to the root directory, so http://www.mydomain.org/mydir/myfile.html becomes /mydir/myfile.html. Hey presto, Xinha now produces links which will work no-matter where the edited file ends up but which would also be portable to another domain, if I were to change mydomain.org to mydomain.com (for example).
Thanks!
Geoffrey K

comment:5 Changed 9 years ago by gogo

changeset:928 committed the change I proposed.
Leaving open for the suggestion by znoob2 above.

comment:6 Changed 9 years ago by guest

This "fix" has broken my setup (as I warned above), and I suspect will break others' setups when it makes it into a release. I'm not sure why the change was made at all -- was anyone asking for or needing it? Who, in real-world content management, is going to set this variable differently for EACH page being edited in a content-management system? It makes much more sense to set it to a domain or a directory, and in any case, the function simply strips out anything that is not the domain or domain plus folder ending in /, so why bother at all? Why put information in only to throw it away immediately?

The new Regex is:

replace(/\/[^\/]*$/, '/')

Let's be clear about the inconsistent effects of this:

Case 1: baseHref = "http://www.mydomain.org"
Result: "http://" --> this is catastrophic!

Case 2: baseHref = "http://www.mydomain.org/mydir"
Result: "http://www.mydomain.org/" --> this *incorrectly* strips the directory information

Case 3: baseHref = "http://www.mydomain.org/myfile.html"
Result: "http://www.mydomain.org/" --> this works fine

Case 4: baseHref = "http://www.mydomain.org/mydir/"
Result: "http://www.mydomain.org/mydir/" --> this works fine

So, the outcome is that adding a final / to a domain or directory becomes obligatory or else really major errors and unexpected behaviour are introduced because information is incorrectly thrown away. Additionally, it no longer becomes possible to strip hyperlinks down to semi-absolute, i.e. <a href="/mydir/myfile.html">. Since the final / is now mandatory, we will always get <a href="mydir/myfile.html"> (when we don't get nonsense such as <a href="www.mydomain.org/mydir/">), which means that links are completely broken if our edit address is not the same as the file location (i.e. the way most content management systems work).

Can this "fix" be reverted, please? Geoffrey K

comment:7 Changed 9 years ago by guest

  • Severity changed from normal to major

comment:8 Changed 9 years ago by guest

Please see ticket:1130 for a proposed solution. Geoffrey K

comment:9 Changed 9 years ago by gogo

Your patch has been applied I see Geoff, I have not looked too hard at it but it seems... reasonable, however just to go back to your second case study above:

First: http://www.w3.org/TR/html401/struct/links.html#h-12.4

http://www.mydomain.org/mydir is very definately the *file* mydir as far as a browser is concerned, how would they know any different - http://www.mydomain.org/mydir/ is very definately the *dir* mydir, you have explicitly informed the browser of the fact.

Therefore http://www.mydomain.org/ is equivalent to http://www.mydomain.org/mydir (there is of course an implied auto-file (index.html etc) after the slash) and I fail to see why it would make any difference as far as the browser is concerned.

I'm not sure why the change was made at all -- was anyone asking for or needing it?

It had caused problems for me, I was both asking for and needing it, and as I (like to think I) still have some sway here, I ultimately committed it.

Who, in real-world content management, is going to set this variable differently for EACH page being edited in a content-management system?

Anybody who is doing the job properly. Strangely, that includes me.

comment:10 Changed 9 years ago by ray

  • Milestone changed from Version 1.0 to 0.96

comment:11 Changed 9 years ago by ray

  • Resolution set to fixed
  • Status changed from new to closed

With the patches committed, the topic seems settled (?)

Note: See TracTickets for help on using tickets.