Link checker in Hippo 7?


Link checker in Hippo 7?

Gerrit Berkouwer
Hi, did anyone ever make a link checker from within Hippo 7 CMS?

What we are thinking of:

- External link checker: every external link in our implementation is a Hippo document, which contains a URL and some metadata such as an optional title attribute. We re-use these documents all over the website.
- So we would want something that checks all these external URLs to see whether they respond with a 200 OK over the internet.
- And generates a list (for the editors, in their dashboard) of URLs that no longer work (e.g. those that return a 404, a 410, or some other error response).
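
The check described in the list above could be sketched roughly as follows. This is only an illustration in Python using the standard library, not anything that exists in Hippo: the function names (`is_working`, `fetch_status`, `broken_links`) are made up for the example, and it assumes that anything other than a final 200 counts as broken (redirects are followed by `urlopen`, so a redirect that ends in a 200 passes).

```python
import urllib.error
import urllib.request


def is_working(status):
    """Treat only 200 OK as working, as proposed in the list above."""
    return status == 200


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 or 410
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, timeout, refused connection, or a URL without a scheme.
        return None


def broken_links(urls, fetch=fetch_status):
    """Return (url, status) pairs for every URL that did not answer 200."""
    report = []
    for url in urls:
        status = fetch(url)
        if not is_working(status):
            report.append((url, status))
    return report
```

Some servers reject HEAD requests, so a real checker would probably fall back to GET before reporting a link as broken.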

Has anyone ever seen something like that, or does anyone have an opinion about this idea?
--
Greetz, Gerrit

Re: Link checker in Hippo 7?

minos
Hi Gerrit,

I'm happy to tell you that we recently added this functionality and we'll be releasing it at some point (please don't ask me when exactly, it's still work in progress!).
The functionality is a daemon-like process that scans your documents and extracts every link it can find. It then checks the links one by one and creates a report of all broken links found. It provides a convenient way to access the document (in order to fix it) and also many other useful features, such as reporting the date since which the link has been broken, an explanatory message on why the link is considered broken (status code, response from server etc.) and an excerpt of the text in which the link was found. We will probably be adding more functionality in the near future.

Thanks


View this message in context: http://hippo.2275632.n2.nabble.com/Link-checker-in-Hippo-7-tp6368953p6368953.html
Sent from the Hippo CMS 7 mailing list archive at Nabble.com.
_______________________________________________
Hippo-cms7-user mailing list and forums
http://www.onehippo.org/cms7/support/forums.html



--
With kind regards/Met vriendelijke groet,
Minos Chatzidakis

Hippo
Europe  •  Amsterdam  Oosteinde 11  •  1017 WT Amsterdam  •  +31 (0)20 522 4466
USA  • San Francisco  185 H Street Suite B  •  Petaluma CA 94952-5100 •  +1 (707) 773 4646
Canada    •   Montréal  5369 Boulevard St-Laurent #430 •  Montréal QC H2T 1S5  •  +1 (514) 316 8966
________________________________________________________________
This e-mail may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this e-mail or the information it contains by other than an intended recipient is unauthorized. If you received this e-mail in error, please advise me (by return e-mail or otherwise) immediately.




Re: Link checker in Hippo 7?

Ard
Hello Minos and Gerrit et al,

Indeed we are working on a broken links report. At the same time, I
think we should include the more specific usage Gerrit is pointing
out: it is not the most common way editors work with external links,
but it is the most elegant way to do it. Let me explain:

Gerrit's project apparently does not have xinha fields containing
links to external URLs. Instead, most likely, they always *link* to
some other Document which contains the actual external link.

This scenario is of course covered by the work currently done:
checking all documents for external links, and then checking the external
links.

However, if we could enhance the daemon process a tiny bit, to make it
configurable *which* documents it should actually check, the daemon
job checking all the documents would become much more efficient:
Obviously, for large repositories, it is expensive to check all
documents. This is not needed if you 'know' all your external links
reside in a document of type X.

@Minos: Could this be added to the configuration for the daemon thread?

Regards Ard



Re: Link checker in Hippo 7?

minos
Well, for the time being you can specify the content path under which the checker scans for documents. That way, if you have all your 'link' documents (as in Gerrit's case) inside a folder, you can restrict the scan to only the documents in that folder.
Of course I agree with you, Ard, that we should make it much more flexible, e.g. by providing a query to reduce the number of scanned documents or to target the checker at particular document types/locations. The query can of course be made configurable via the CMS console.
Another important issue is extending the checker to take into account other types of fields apart from hippo:html (Xinha). This can also be made configurable, and ideally we should be able to specify which fields to check depending on the document type of the document being scanned.
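
To make the two restrictions above concrete (a content path, plus a per-document-type list of fields to scan), here is a small sketch in Python. It is not the checker's actual configuration: the `Document` class, the `myproject:` type and field names, and the `fields_by_type` mapping are all hypothetical stand-ins for whatever the real configuration would look like.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    path: str       # e.g. "/content/documents/links/hippo"
    doctype: str    # e.g. "myproject:externallink" (hypothetical type name)
    fields: dict = field(default_factory=dict)


def in_scope(doc, content_path):
    """True if doc lives at or below the configured content path."""
    root = content_path.rstrip("/")
    return doc.path == root or doc.path.startswith(root + "/")


def values_to_check(doc, content_path, fields_by_type):
    """Return the field values that should be scanned for links in doc."""
    if not in_scope(doc, content_path):
        return []
    wanted = fields_by_type.get(doc.doctype, [])
    return [doc.fields[name] for name in wanted if name in doc.fields]
```

With a mapping such as `{"myproject:externallink": ["myproject:url"]}`, only the URL field of Gerrit-style link documents below the chosen folder would be handed to the link check.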

Regards,
Minos




Re: Link checker in Hippo 7?

Ard
On Tue, May 17, 2011 at 9:58 AM, Minos Chatzidakis
<[hidden email]> wrote:
> Well for the time being you can specify the content path under which the
> checker scans for documents. That way if you have all your 'link' documents
> (in the case of Gerrit) inside a folder, you can restrict the scan to only
> check the documents in this particular folder.
> Of course I agree with you Ard in that we should make it much more flexible,
> eg by providing a query to reduce the number of scanned documents or target
> the checker to particular document types/locations. The query can be made
> configurable of course via the cms console.

Ok. As long as you realize that querying is very, very delicate when
you have these potentially long-running jobs: be aware! Therefore, I'd
rather opt for 'filtering' after just traversing nodes. The problem
with querying and then processing the query result (checking for
every node whether it contains external links, and then checking these
links) is that it keeps the QueryResult in memory way too long: keeping a
QueryResult open prevents the backing Lucene indexes from being released. So,
too much detail perhaps, but rather not query.
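
The "traverse and filter" idea can be illustrated with a lazy walk over a tree. This sketch is in Python with an in-memory `Node` class standing in for repository nodes (an assumption for illustration only; it does not use the JCR API), and it shows only the shape of the approach: one node is handed out at a time, so nothing like an open query result has to be held while the slow link checks run.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    doctype: str
    children: list = field(default_factory=list)


def walk(node):
    """Depth-first traversal that yields one node at a time."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Reverse so that children are visited in their original order.
        stack.extend(reversed(current.children))


def nodes_of_type(root, doctype):
    """Filter during traversal instead of collecting a result set up front."""
    return (node for node in walk(root) if node.doctype == doctype)
```

Because `nodes_of_type` returns a generator, the caller can check the links of each matching node before the next node is even visited.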

Regards Ard


Re: Link checker in Hippo 7?

Gerrit Berkouwer
@ard and @minos,

good stuff. Of course configuration should be possible. Some questions/thoughts:

- how fast will this work? In our project we have >100,000 documents in the repository. How quickly would the daemon come back with a good list? For big repositories it is absolutely necessary to be able to choose which folders to check...
- it should be possible to schedule this link checker, e.g. 'run every night at 02:00'
- does the link checker 'behave' when checking links? :-) You want to prevent your spider from degrading the performance of your website while running...
- what happens if the daemon tries to check the links on the internet, but its internet connection is down? Will this result in a link report saying that ALL links are not working? There should be something in place so that if the daemon finds a certain number of broken links it stops searching... or better, a configurable number of broken links: 5, 10, 100, 200...
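
The last point, stopping once failures look systemic, might be sketched like this in Python. Note that it swaps in a slightly different rule from the one in the list: it counts failures in a row rather than the total number of broken links, on the assumption that a lost connection shows up as an unbroken streak. The names `check_all`, `TooManyFailures` and `max_consecutive_failures` are invented for the example.

```python
class TooManyFailures(Exception):
    """Raised when the run is aborted because failures look systemic."""


def check_all(urls, fetch, max_consecutive_failures=10):
    """Check urls in order; abort when too many checks in a row fail.

    fetch(url) returns an HTTP status code, or None when no response arrived.
    """
    broken = []
    streak = 0
    for url in urls:
        status = fetch(url)
        if status == 200:
            streak = 0  # a working link proves the connection is up
            continue
        broken.append((url, status))
        streak += 1
        if streak >= max_consecutive_failures:
            raise TooManyFailures(f"{streak} consecutive failures, last at {url}")
    return broken
```

An aborted run would then produce no report at all instead of a report that lists every link as broken.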
--
Greetz, Gerrit
Reply | Threaded
Open this post in threaded view
|

Re: Link checker in Hippo 7?

minos
Hi Gerrit,

> - how fast will this work? In our project we have >100.000 documents in the
> repository. How quick would the daemon come back with a good list? For big
> repositories it is absolutely necessary to be able to choose which folders
> to check...

Yes, we already provide batching based on the content path of the documents. We are investigating more ways content can be split and scanned in batches.

> - it should be possible to schedule this link-checker, e.g 'run every night
> at 02.00 am'

Of course, it already is possible.

> - does the link-checker 'behave' when checking links? :-) You want to
> prevent that your spider degrades the performance of your website while
> running...

It does not have any noticeable impact on CMS performance. Apart from that, it is multithreaded and can be configured to be 'light'.

> - what happens if the daemon tries to check the links on internet, but its
> connection to internet is not there? Will this result in a link-report
> saying that ALL links are not working? There should be something in place
> like if the daemon finds a certain number of broken links it should stop
> searching... or better a configurable number of broken links....5, 10, 100,
> 200...


And even more solutions (and issues) exist. For instance, the link checker could consider a link broken only after it has failed the check twice (or even 3 times), because apart from your connection, the target site may also be temporarily down. These are all things we are currently investigating.
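
The "broken only after two or three failed checks" idea amounts to keeping a failure counter per URL between runs. A minimal Python sketch, assuming the counts are stored somewhere between scheduled runs (here just a dict) and using an invented function name and `threshold` parameter:

```python
def update_failure_counts(counts, results, threshold=3):
    """Update per-URL failure counts and return the URLs to report as broken.

    counts maps URL -> number of consecutive failed runs so far (mutated in place).
    results maps URL -> True if the latest check succeeded, False otherwise.
    """
    reported = []
    for url, ok in results.items():
        if ok:
            counts.pop(url, None)  # a success resets the history
        else:
            counts[url] = counts.get(url, 0) + 1
            if counts[url] >= threshold:
                reported.append(url)
    return reported
```

A site that is down during one nightly run but back the next never reaches the threshold and so never appears in the report.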


Thanks a lot for your feedback/proposals and bringing up these issues.


Re: Link checker in Hippo 7?

Gerrit Berkouwer
Minos,

I would like to add:

- next to the resulting list of broken links the tool should also generate a list of 'identical' links.

Let me explain. These links are more or less identical:

www.domain.com
domain.com
http://www.domain.com
http://www.domain.com/

In the use case where a system has all external links as separate documents in a folder (like in our project), having such near-duplicates is not wise. Editors should be able to use this 'identical links' list to clean up their links...

What do you think?

Any insights on the progress of this? Any launch ideas already? :-)



--
Greetz, Gerrit

Re: Link checker in Hippo 7?

Jasha Joachimsthal
CONTENTS DELETED
The author has deleted this message.

Re: Link checker in Hippo 7?

minos
Hi Gerrit,

I completely agree with Jasha, we would have to be very careful with what we consider identical.
http://www.domain.com can point to a different location than http://domain.com, and one can be broken while the other is alive, just like the trailing slash example Jasha gave.

So, even though your proposal sounds like a very nice idea, the cases in which it can be applied will probably be reduced to very few once we decide what 'identical' means. It certainly needs more investigation.

Thanks



On Thu, Jun 16, 2011 at 1:03 PM, Jasha Joachimsthal <[hidden email]> wrote:
> Hi Gerrit,
>
> On 16 June 2011 11:49, Gerrit Berkouwer <[hidden email]> wrote:
>> Minos,
>>
>> I would like to add:
>>
>> - next to the resulting list of broken links the tool should also generate a
>> list of 'identical' links.
>>
>> Let me explain. These links are more or less identical:
>>
>> www.domain.com
>> domain.com
>
> These may look identical but they won't necessarily point to the same location.
> For the site root these may be identical. But http://www.example.com/myapp can return a 404 while http://www.example.com/myapp/ can return a valid page.
>
> So it's a bit tricky to treat links identical.
>
> Jasha


Re: Link checker in Hippo 7?

Gerrit Berkouwer
Totally agree. So let's decide what identical is ;-).
It gets simpler: identical means exactly identical URLs, e.g.:

http://www.domain.com/
http://www.domain.com/

https://www.domain.com/
https://www.domain.com/

http://www.domain.com/myapp
http://www.domain.com/myapp

www.domain.com
www.domain.com

domain.com
domain.com

BUT ALSO:

http://www.domain.com
www.domain.com


These 2 are identical, right?

That's about it I guess? No problem if this list turns out to be small, although in a big system you might be surprised what hundreds of editors come up with... ;-)

--
Greetz, Gerrit

Re: Link checker in Hippo 7?

b.vanderschans@onehippo.com
Hi Gerrit,

On Thu, Jun 16, 2011 at 3:00 PM, Gerrit Berkouwer
<[hidden email]> wrote:

> Totally agree. So lets decide what identical is ;-).
> It gets simpler: identical means: exactly identical URLs, e.g.:
>
> http://www.domain.com/
> http://www.domain.com/
>
> https://www.domain.com/
> https://www.domain.com/
>
> http://www.domain.com/myapp
> http://www.domain.com/myapp
>
> www.domain.com
> www.domain.com
>
> domain.com
> domain.com
>
> BUT ALSO:
>
> http://www.domain.com
> www.domain.com

I don't see how this last one is a "link". It's just a domain name.
You need to specify a protocol like http://, ftp://, https://, etc.

Regards,
Bart

Re: Link checker in Hippo 7?

Jasha Joachimsthal
CONTENTS DELETED
The author has deleted this message.

Re: Link checker in Hippo 7?

minos

> If the front end developer doesn't do magic for a link that starts with www., it will be rendered as a link relative to the visited page. That one will probably return a 404 by the HST while http://www.domain.com will lead to the external site.

Well, it's the Xinha editor that does a little bit of magic and will not allow the editor to omit the protocol. That is, if the user writes www.domain.com, Xinha will store http://www.domain.com.
Apart from this case, all the links Gerrit provided are indeed identical.

So I'm wondering, apart from cases where the links are lexicographically equal, is there any other example of links that look different but point to the same location?
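
For what it's worth, a report of lexicographically equal links is easy to sketch. The Python below groups documents by their stored URL after one single normalisation step, prepending `http://` when no scheme is present, which mimics the Xinha behaviour described above; it deliberately does not treat trailing slashes, `www.` prefixes or http/https as equivalent. The function names and the document-to-URL mapping are assumptions for the example, and it only makes sense for web links (not `mailto:` and the like).

```python
import re
from collections import defaultdict

# Matches a leading scheme such as "http://", "https://" or "ftp://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def with_scheme(url):
    """Prepend http:// when the URL has no scheme (assumed Xinha-like behaviour)."""
    url = url.strip()
    return url if _SCHEME.match(url) else "http://" + url


def duplicate_groups(urls_by_document):
    """Map each URL to the documents storing it; keep only URLs used more than once."""
    groups = defaultdict(list)
    for document, url in urls_by_document.items():
        groups[with_scheme(url)].append(document)
    return {url: docs for url, docs in groups.items() if len(docs) > 1}
```

Under these rules `http://www.domain.com` and `www.domain.com` end up in one group, while `http://www.domain.com/` and `https://www.domain.com/` each stay on their own.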




Re: Link checker in Hippo 7?

Jasha Joachimsthal
CONTENTS DELETED
The author has deleted this message.

Re: Link checker in Hippo 7?

minos
Yes, I assume that the links come from Xinha because that is the only field the link checker checks!
This may change in the future of course, but that's the case for now.

Thnx



On Thu, Jun 16, 2011 at 4:41 PM, Jasha Joachimsthal <[hidden email]> wrote:
> Now you assume all links come from xinha. It's also possible to create a text field for links (and add some magic in the JSP to render <a href="${document.link}">).
>
> Jasha



Re: Link checker in Hippo 7?

Gerrit Berkouwer
minos wrote:
> Yes, I assume that the links come from xinha cause that is the only field
> the link checker checks!
> This may change in the future of course but that's the case for now.
> Thnx

Minos, really? That is a big disappointment...

Are we the only Hippo 7 project that has links to external websites as separate documents? That has a lot of advantages if you want to do structured content management...

What is the main reason to check only links coming from Xinha fields? Would it also be possible to check Hippo documents in a folder? The code could simply look for URLs in the document fields... and return the list of broken URLs.

How far off is your 'future' where 'this may change'? :-) It looks like two roads will be followed for broken links checking: yours and ours. I would prefer to line them up and bring them together!

--
Greetz, Gerrit

Re: Link checker in Hippo 7?

minos
Hi Gerrit,

Well, as I've said before the decision for what the broken links checker supports is not mine to take. This project is still in a pilot-like phase, now getting properly integrated with the general reporting functionality added from 7.6 and onwards.

Checking the Xinha fields was our first priority, since this is the field most users use to add content and links. Extending the checker to support more fields is not such a difficult task, but honestly I cannot say when it will be planned and done. There are many improvements on the way, like support for clustered environments, a better scheduling mechanism, tighter link checking and many more. Supporting more fields naturally comes second to those issues.

By the way, the broken links checker (blc) already supports specifying which folder to monitor (the content path); this is the answer to "Would it also be possible to check hippo-documents in a folder?". So you specify a folder and blc will scan every document in it (but will only look at Xinha fields) and do the link checking. I hope I got your question right.

So, I would really like to make promises and speak about planning, but I'm afraid I can't.

Best regards,
Minos




Re: Link checker in Hippo 7?

Gerrit Berkouwer
Minos, no problem, thanks for all the info!
--
Greetz, Gerrit