Plugin for Sitemaps.org sitemap protocol

Gerrit Berkouwer
Hi all,

We want to use the Sitemaps.org protocol to tell search engines which URLs we have for them.

What we want in general:
- generate a number of sitemap.xml files every night from the repository (the protocol allows at most 50,000 URLs per sitemap file)
- generate a sitemap index file that ties these sitemap.xml files together
- the sitemap.xml files should together contain ALL URLs of our website
- if a document has multiple URLs, only the canonical URL should be used in the sitemap.xml
- the XML has these elements:
<urlset>
<url>
<loc>
<lastmod>
<changefreq>
<priority>
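
To make the target format concrete, here is a minimal Python sketch (not code from the Forge plugin) that writes the elements listed above. The function name build_urlset and the shape of the entries are made up for illustration; only the element names and the namespace come from the Sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Serialize entries to a <urlset> document.

    Each entry is a dict with a required 'loc' key and optional
    'lastmod', 'changefreq' and 'priority' keys (hypothetical shape).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit the child elements in the order used by the protocol examples.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")
```

ElementTree escapes characters such as & in the URL text, which the protocol requires.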

On the Hippo Forge I see the Sitemap Protocol Component, https://forge.onehippo.org/gf/project/sitemap/. I cannot find documentation about what exactly this plugin does.

Does it meet all the requirements I mentioned above?

--
Greetz, Gerrit

Re: Plugin for Sitemaps.org sitemap protocol

Adolfo Benedetti
2011/7/21 Gerrit Berkouwer <[hidden email]>:
> https://forge.onehippo.org/gf/project/sitemap/. I do not see documentation
>
http://sitemap.forge.onehippo.org/index.html
_______________________________________________
Hippo-cms7-user mailing list and forums
http://www.onehippo.org/cms7/support/forums.html

Re: Plugin for Sitemaps.org sitemap protocol

Gerrit Berkouwer
Yeah, I saw that, but there seems to be no functional documentation, i.e. what exactly the plugin does. Does it include all document URLs? How does it handle canonical URLs? How does it deal with the limit of 50,000 URLs per sitemap?
--
Greetz, Gerrit

Re: Plugin for Sitemaps.org sitemap protocol

Jeroen Reijn
Administrator
In reply to this post by Gerrit Berkouwer
I took a quick look at the source code since the plugin is quite old.

For each requirement below I'll state whether the plugin can handle it.

On Thu, Jul 21, 2011 at 1:45 PM, Gerrit Berkouwer
<[hidden email]> wrote:
> Hi all,
>
> we want to use the Sitemap.org protocol to let search systems know which
> URLs we have for them.
>
> What we want in general:
> - generate a x-number of sitemap.xml files every night from the repository
> (50.000 documents per sitemap is the maximum)

This is not handled by the plugin right now.

> - generate a sitemap indexfile to connect the x-number of sitemap.xml's

No.

> - the sitemap.xml should contain ALL URLs of our website

No, it does not do that right now. It only adds the menu items
configured in the HST sitemenu structure.

> - if a document has multiple URL's, only the canonical URL should be used in
> the sitemap.xml
> - the xml has these attributes:
> <urlset>

Yes

> <url>

Yes
> <loc>
Yes
> <lastmod>
Yes, but always set to today
> <changefreq>
Yes, hardcoded to daily
> <priority>

Yes, always set to 1.0

>
> On the Hippo Forge I see the Sitemap Protocol Component,
> https://forge.onehippo.org/gf/project/sitemap/. I do not see documentation
> about what this plugin exactly does.
>
> Does it have all requirements I mentioned above?

So I don't think it meets all your requirements.
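
For the first two requirements, which the plugin does not handle, the missing logic could be sketched roughly as follows in Python. This is an illustration only, not plugin code: the names chunk and build_index and the sitemap-N.xml file naming are assumptions, while the 50,000 limit and the <sitemapindex> element come from the Sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # limit per sitemap file set by the protocol


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(base_url, file_count):
    """Build a sitemap index pointing at sitemap-1.xml .. sitemap-N.xml.

    The file naming scheme is hypothetical; any stable scheme works.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, file_count + 1):
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/sitemap-{n}.xml"
    return ET.tostring(index, encoding="unicode")
```

With 120,001 URLs this would produce three sitemap files (50,000, 50,000 and 20,001 URLs) and an index with three entries.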


--
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

Re: Plugin for Sitemaps.org sitemap protocol

Gerrit Berkouwer
Thanks for the input, Jeroen. Would it be wiser to extend or modify the existing plugin, or to start with a new one?
--
Greetz, Gerrit

Re: Plugin for Sitemaps.org sitemap protocol

Jeroen Reijn
Administrator
I think part of the component can be reused. I have some ideas about
this. Maybe I can write a technical approach document that people
with HST knowledge can review. I think the hard part is working
with the canonicals. All the rest should be quite doable. Besides that,
how do you expect to set specific rules for the following two
elements?

<changefreq>
<priority>

Gr,

Jeroen


Re: Plugin for Sitemaps.org sitemap protocol

Gerrit Berkouwer
<changefreq> should offer the options described in the protocol:

always | hourly | daily | weekly | monthly | yearly | never

The same goes for <priority>: 0.0 - 1.0 (0.5 being the default).

Setting these values will differ per information type, I guess. The changefreq of a subject page is different from that of a person's curriculum vitae page. The same goes for priority.

So it could work like this:
- the plugin has a list of all information types (dynamically pulled in from the repo?)
- an editor can decide which changefreq each information type has
- the editor can decide which priority each information type has
- the generated XML uses these values for all pages of that information type

Maybe a more granular approach is needed for specific pages. Would a manual list of URLs with a specific changefreq and priority be helpful for this?
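
A rough Python sketch of how such rules could be looked up, just to illustrate the idea. The type names ("subjectpage", "cv"), the example URL and all function and class names are invented; the allowed changefreq values, the priority range and the 0.5 default are the ones from the protocol.

```python
from dataclasses import dataclass
from typing import Optional

VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}


@dataclass(frozen=True)
class Rule:
    changefreq: Optional[str] = None  # None: leave <changefreq> out
    priority: float = 0.5             # protocol default

    def __post_init__(self):
        # Reject values the protocol does not allow.
        if self.changefreq is not None and self.changefreq not in VALID_CHANGEFREQ:
            raise ValueError(f"invalid changefreq: {self.changefreq!r}")
        if not 0.0 <= self.priority <= 1.0:
            raise ValueError(f"priority out of range: {self.priority!r}")


# Hypothetical editor-maintained settings per information type.
TYPE_RULES = {
    "subjectpage": Rule("daily", 0.8),
    "cv": Rule("yearly", 0.3),
}

# Hypothetical manual list for specific pages; wins over the type rule.
URL_OVERRIDES = {
    "http://www.example.com/": Rule("hourly", 1.0),
}


def rule_for(url, doc_type):
    """Pick the per-URL override, else the type rule, else the defaults."""
    if url in URL_OVERRIDES:
        return URL_OVERRIDES[url]
    return TYPE_RULES.get(doc_type, Rule())
```

The per-URL list is checked first, so a single page can deviate from the setting of its information type.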

--
Greetz, Gerrit

Re: Plugin for Sitemaps.org sitemap protocol

Ard
In reply to this post by Jeroen Reijn
On Thu, Jul 21, 2011 at 3:12 PM, Jeroen Reijn <[hidden email]> wrote:
> I think a part of the component can be reused. I have some ideas about
> this. Maybe I can write some technical approach document that some
> people with HST knowledge can review. I think the hard part is to work
> with the canonicals.

The canonicals won't be hard, I think. I assume you do not take the
website as the starting point. If you did, then canonicals would be
hard.

As Gerrit points out, he wants only the canonical URLs in the sitemap.
So all you would need is something like:

1) Crawl all documents in the repository
2) Ask the HstLinkCreator for a URL (with canonical=true flag)

This would give you a sitemap. Creating links is pretty much instant. So,
crawl the documents in the repo (I do not advise searching: use a crawl),
create the sitemap and serve the result. Of course you need some caching,
for example by caching the generated result. And since you will want to
run in a cluster, you can store the result as a binary in the repository.

Anyway, there are many options, none of them too hard.
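
The flow described above, in rough outline. This is a Python sketch and not HST code: create_canonical_link is only a stand-in callable for the call to the HstLinkCreator, documents stands in for the result of the repository crawl, and the in-memory cache stands in for storing the result in the repository.

```python
import time


def generate_sitemap_urls(documents, create_canonical_link):
    """Return one canonical URL per document, without duplicates.

    `create_canonical_link` is a placeholder for the link creation
    step; it may return None for documents that have no URL.
    """
    seen = set()
    urls = []
    for doc in documents:
        url = create_canonical_link(doc)
        if url is None or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


class CachedSitemap:
    """Regenerate the sitemap only when the cached copy is too old."""

    def __init__(self, generate, max_age_seconds=24 * 60 * 60, clock=time.time):
        self._generate = generate
        self._max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._created = None  # time of the last generation, None if never

    def get(self):
        now = self._clock()
        if self._created is None or now - self._created > self._max_age:
            self._value = self._generate()
            self._created = now
        return self._value
```

The default maximum age of one day matches the nightly generation Gerrit asked for; in a cluster the cached value would have to live somewhere shared rather than in memory.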

I would not take the current Forge plugin as a starting point: it is geared
towards small, instantly generated sitemaps, such as one built only from
the sitemenu.

Let me know if you'd like to discuss it further.

Regards, Ard

