Scoping the Search in backend of the CMS

Gerrit Berkouwer
We have a repository of 12GB. Our editors use the Search function of the CMS to find their documents. We use a lot of linking from within documents to other documents.

The 'problem': finding the exact document with Search is difficult because of the number of documents, including the >8000 PDF files, which are also indexed by Lucene. The Search function simply returns too many results to choose from...

Use case: in our environment an editor almost always uses the exact title of a document he knows should be 'somewhere' in the CMS. He 'just' wants to link to this document from within the 'linkpicker'.

Our idea: we want an option within the Search function to ONLY search in the name field and the title field of all documents. I think this would be a great standard feature in the UI of the standard Hippo 7 CMS, because I think our use case is typical for a CMS user.

Ideally it would be great to be able to choose/configure which fields the Search function should use; other implementations and other use cases may need other choices.
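
To make the idea concrete, here is a minimal sketch of a search restricted to a configurable set of fields. It is plain Python, not Hippo or Lucene code: the document dictionaries, the field names `name`, `title` and `body`, and the function `search` are all assumptions made up for illustration.

```python
# Hypothetical sketch: restrict a keyword search to a configurable set of fields.
# Documents are plain dicts here; a real CMS would query its search index instead.

def search(documents, term, fields=("name", "title")):
    """Return documents whose selected fields contain `term` (case-insensitive)."""
    needle = term.lower()
    return [
        doc for doc in documents
        if any(needle in str(doc.get(field, "")).lower() for field in fields)
    ]


docs = [
    {"name": "parking-permit", "title": "Parking permit", "body": "How to apply."},
    {"name": "annual-report.pdf", "title": "Annual report",
     "body": "Parking permit revenue rose."},
]

# Scoped to name and title: only the document with the matching title is found.
scoped = search(docs, "parking permit")

# Scoped to all fields: the text inside the PDF adds a second hit.
everything = search(docs, "parking permit", fields=("name", "title", "body"))
```

With the default scope the PDF no longer gets in the way; passing a longer `fields` tuple gives back the current 'search in everything' behaviour.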

Any thoughts/ideas about this? Good idea? Bad idea? Suitable to be standard functionality within the system, or not? And why? :-)
--
Greetz, Gerrit

Re: Scoping the Search in backend of the CMS

Arje Cahn
Administrator
> Our idea: we want an option within the Search-function to ONLY search in the
> name-field and the title-field of all documents. I think this would be a
> great standard feature in the UI of standard-Hippo7-CMS, because I think our
> use-case is typical for a CMS user.
>
> Ideally it would be great to be able to choose/configure which fields the
> Search-function should use, maybe other implementations and other use-cases
> need other choices.
>
> Any thoughts/ideas about this? Good idea? Bad idea? Suitable to be standard
> functionality within the system, or not? And why? :-)
Good idea!
(you knew I was going to say that! :) )

I think you'd need some more room in the search bar in the foldertree,
so you can have extra elements in there. See my attached sketch.
Such a form would then have to be configurable (or extendible) so you
can add, for example, an option box for Title and Name. If made generic,
so you can configure which properties to search in, I'd be happy to pull
it into the core CMS!

Arje

_______________________________________________
Hippo-cms7-user mailing list and forums
http://www.onehippo.org/cms7/support/forums.html

Attachment: search-in.png (19K)

Re: Scoping the Search in backend of the CMS

marnixkok
In reply to this post by Gerrit Berkouwer
Hi,

We have created an issue related to the search function as well. For
some use cases the current search functionality has shortcomings.

I think an additional "Advanced search" view with more detailed search
results and search options would be a good next step in providing usable
and somewhat complete search functionality. I'd much prefer that over
cramming additional options into the search area.

Cheers,

Marnix Kok

On 07/29/2010 02:02 PM, Gerrit Berkouwer wrote:

> [ snip ]

Re: Scoping the Search in backend of the CMS

Arje Cahn
Administrator
> I think an additional "Advanced search" view with more detailed search
> results and search options would be a good next step in providing usable and
> somewhat complete search functionality. I'd much prefer that over cramming
> additional options into the search area.

Totally agreed - and probably only slightly more work.

Arje

Re: Scoping the Search in backend of the CMS

Gerrit Berkouwer
In reply to this post by marnixkok
If I combine the 2 ideas, I would want an Advanced Search option where I can also configure how Standard Search works, because Advanced Search is only interesting for the 'happy few'. The average user just wants to search on the title of a document, or on 1 or 2 words. He puts these words in the search box, hits enter and expects magic stuff to happen :-). From within the linkpicker!

So:

Standard Search:
- Standard Search uses AND in between words: here AND a AND few AND words
- Standard Search handles "here a few words" just like Google does

Advanced Search:
- here I can configure which fields Standard Search searches in
- I can extend the scope of the search results by removing the Standard Search scope switches
- I can choose whether or not the content of binary files is used for the search results.
  Search also indexes the content of binary files, I believe? E.g. the text within a PDF. It would be nice to be able to configure this, so that I can search either with or without the content of PDFs. The reason is that this functionality gives me so many hits, especially if I have a lot of PDFs in a system. This is not always nice; it can get in the way.
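
The two Standard Search behaviours listed above (AND between words, quoted text treated as one phrase) could be sketched roughly as follows. This is a stand-alone Python illustration, not how the CMS or Lucene parses queries; the helper names `normalise`, `parse_query` and `matches` are invented for the example.

```python
import re
import shlex


def normalise(text):
    """Lower-case the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def parse_query(query):
    """Split a query into required items; double-quoted text stays one phrase.

    Note: shlex also treats apostrophes as quotes, so a real parser would
    need something more forgiving than this.
    """
    return [normalise(item) for item in shlex.split(query)]


def matches(text, query):
    """True if every word or phrase of the query occurs in the text (AND)."""
    padded = f" {normalise(text)} "
    return all(f" {item} " in padded for item in parse_query(query) if item)


title = "Here a few words about permits"

any_order = matches(title, "words few here")          # all words present
exact_phrase = matches(title, '"here a few words"')   # phrase occurs literally
wrong_order = matches(title, '"few here words"')      # phrase does not occur
```

Without quotes the word order does not matter; with quotes the words have to appear next to each other in that order.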
--
Greetz, Gerrit

Re: Scoping the Search in backend of the CMS

b.vanderschans@onehippo.com
Personally I really like the Gmail way of searching. If you want to,
you can do everything from the (normal) search box. And when you use
the advanced search, it will show you what you could have typed yourself
directly in the search box. So the next time you want to do a similar
search, you don't have to do the extra click to the advanced search page.

It would be cool if you could do a search with something like "on+sale
title:car type:news year:2010" ;-)
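
A small sketch of how such a query string could be taken apart, and how an advanced search form could show the equivalent string back to the user. It is Python for illustration only; the function names and the rule that every `field:value` token is a filter are assumptions, not existing CMS behaviour.

```python
def parse_scoped_query(query):
    """Split a query into free-text terms and field:value filters."""
    terms, filters = [], {}
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            filters[field] = value   # e.g. title:car
        else:
            terms.append(token)      # plain search term
    return terms, filters


def format_scoped_query(terms, filters):
    """Build the string a user could have typed directly in the search box."""
    return " ".join(list(terms) + [f"{field}:{value}" for field, value in filters.items()])


terms, filters = parse_scoped_query("on+sale title:car type:news year:2010")
typed = format_scoped_query(terms, filters)
```

The second function is the Gmail part: whatever is selected in the advanced form is echoed as a query the user can reuse next time.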

Regards,
Bart

On Fri, Jul 30, 2010 at 8:11 AM, Gerrit Berkouwer wrote:

> [ snip ]

Re: Scoping the Search in backend of the CMS

Gerrit Berkouwer
Agreed. Two hesitations:

- the average user will never use or understand your query ;-)
- the use case of an editor searching for documents in the CMS is different from that of a user searching through his e-mails in Gmail
--
Greetz, Gerrit

Re: Scoping the Search in backend of the CMS

Gerrit Berkouwer
In reply to this post by Arje Cahn
Arje,

I think 1 option with a checkbox is sufficient:

- 1 checkbox: "Only search in name and title"
- this is the default
- If I want to search in everything I uncheck this box

I think that is clear enough in the interface. Your sketch has too many options; it is confusing to have the option to search in either Title OR Name. Both together is enough, I think.

--
Greetz, Gerrit

Re: Scoping the Search in backend of the CMS

Jeroen Reijn
Administrator
I personally think that title might not be the best to search for.
Title is most of the time a project specific property and not all
items have a title, but do have a name. I think

- name
- all information

would be better.

Jeroen

On Mon, Aug 2, 2010 at 12:55 PM, Gerrit Berkouwer wrote:

> [ snip ]

Re: Scoping the Search in backend of the CMS

Gerrit Berkouwer
@jeroen: correct.

Ideally you can add project-specific fields to the list; all of this should be configurable. But for phase 1 I would be very happy with your suggestion.

How difficult is this technically? It does not sound difficult, these are 2 views on the same index, are they not?
--
Greetz, Gerrit

Re: Scoping the Search in backend of the CMS

marijan milicevic
In reply to this post by Jeroen Reijn
That one (no title) is easy to solve by doing an incremental (dunno the
right name) search, e.g. if you don't find anything, you automatically
search on name.

What I would like to see is something like a search scope header,
something like we have now for the folders scope:


search: searchTerm
---------------------------------------
(3) name | (28) title | (127) all
---------------------------------------
- searchTerm result 1
- searchTerm result 2
- searchTerm result 3

Now, if there are no results within the "name" search, the title results
would be auto-selected, etc.
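
The scope header and the auto-selection could be sketched like this. It is an illustrative Python sketch over plain dictionaries; the scope labels, field names and helper functions are assumptions and not part of the CMS.

```python
def search_by_scope(documents, term, scopes):
    """Collect hits per scope; `scopes` maps a scope label to the fields it covers."""
    needle = term.lower()
    return {
        label: [
            doc for doc in documents
            if any(needle in str(doc.get(field, "")).lower() for field in fields)
        ]
        for label, fields in scopes.items()
    }


def scope_header(results):
    """Render the header line with a hit count per scope."""
    return " | ".join(f"({len(hits)}) {label}" for label, hits in results.items())


def auto_select(results):
    """Pick the first scope, in declared order, that has at least one hit."""
    for label, hits in results.items():
        if hits:
            return label
    return None


scopes = {"name": ("name",), "title": ("title",), "all": ("name", "title", "body")}
docs = [
    {"name": "faq-01", "title": "Parking permit", "body": "How to apply."},
    {"name": "report", "title": "Annual report", "body": "Parking permit revenue."},
]

results = search_by_scope(docs, "parking", scopes)
header = scope_header(results)    # "(0) name | (1) title | (2) all"
selected = auto_select(results)   # "title", because "name" has no hits
```

The order of the `scopes` dictionary decides the fallback order, so the narrowest scope comes first.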


cheers,
marijan

On 08/02/2010 01:09 PM, Jeroen Reijn wrote:

> [ snip ]

Re: Scoping the Search in backend of the CMS

Rolf van der Steen
In reply to this post by Gerrit Berkouwer
I agree with most of what is said here…

Yes, it would be nice to have text search within specific (user-selected) fields.
Yes, I think a separate view would be better than cramming it all into the top left area of the tree view.
Yes, it would be nice to enter a query right in the 'simple' search field (and yes, 99 out of 100 users will never do this).

@Gerrit (original post): Users know the exact title from a 12GB repository??? Respect! I think there will be (a lot of) use cases where users do NOT know exact data about the document they are searching for. Their search will be fuzzy.

Just an example: a user is writing a doc on fuel consumption of cars in general. A colleague wrote a doc on some new car about half a year ago. The user remembers this new car was extremely fuel efficient and wants to link that doc to the one he is writing now. But… the colleague is not in the office and the user doesn't know the (exact) title of the document. In fact he doesn't even know the brand or type of the car.

User's search would be something like:
- terms "fuel" and "efficient"
- category cars
- publication date roughly half a year ago

Anyway, just my way of saying it is not easy to design a good generic advanced search ;-)

I would like to spark up this discussion some more by sharing some of my designs on advanced search (advanced_search_designs.zip).

Note: these designs are all draft; I've created these a while ago, but we haven't gotten round to implementing them (yet). Nevertheless feel free to use them for inspiration.

Short explanation on the designs:

search2_06.png & search2_09.png
Mac users will recognize the UI, as it is inspired by the 'search rules' in the Mac's Finder.
Search is positioned in the (wide) doc list in this concept. It allows users to create (multiple) 'rules', thus narrowing the result set. A set of rules can be saved and reused.
- search2_06.png: shows creation of rules
- search2_09.png: shows saved search

search3_facets1.png & search3_facets2.png
The tree view contains a section called "facets". This section contains a 'simple' search box which can be combined with facets.
- search3_facets1.png: shows search without facets
- search3_facets2.png: shows search with some facets active

Love to hear your opinion and be involved in this discussion.

Keep the posts coming.

advanced_search_designs.zip

Re: Scoping the Search in backend of the CMS

Gerrit Berkouwer
@rolf you are right, a good generic advanced search is not easy!

So I am a big fan of 1-step-at-a-time :-). And also I am no big fan of advanced search... I think search should be simple, and we should help users on result-pages and not with advanced search forms... Your sketches with the business rules will probably scare away the users I know :-)...
The facets are an example of helping users on the result-page. I would love to see how that works out with 10000 documents or more.

Anyway, we have implemented the 'search-on-specific-fields'! I hope to see it in action next week.

To return to use-cases. Your use case describes editors writing, linking and searching 'from scratch'. Looks like a sane use-case.

My use-case is for users that build pages around 1 subject, re-using content already in their system:
- the editor has a 'generic' page on his website, already published
- he needs to link to that page
- he wants to find the shortest route to that page via the linkpicker
- so he wants to copy the whole title of that page
- paste this into the search field of the CMS
- hit enter and
- find the page to link to it, preferably it is the first hit in the searchresult-list.

One problem we face here is how to handle this copied title: should search handle these words with AND-operators in between or should search handle this sentence as an exact sentence by adding 'quotation marks' around it?
It's difficult, because this is not good behaviour if the user wants to 'try' finding stuff by just trying 2 or 3 words... (your use-case I guess...).

--
Greetz, Gerrit

Re: Scoping the Search in backend of the CMS

Frank van Lankvelt
Hi Gerrit,

[ snip ]

> To return to use-cases. Your use case describes editors writing, linking and
> searching 'from scratch'. Looks like a sane use-case.
>
> My use-case is for users that build pages around 1 subject, re-using content
> already in their system:
> - the editor has a 'generic' page on his website, already published
> - he needs to link to that page
> - he wants to find the shortest route to that page via the linkpicker
> - so he wants to copy the whole title of that page
> - paste this into the search field of the CMS
> - hit enter and
> - find the page to link to it, preferably it is the first hit in the
> searchresult-list.
>
> One problem we face here is how to handle this copied title: should search
> handle these words with AND-operators in between or should search handle
> this sentence as an exact sentence by adding 'quotation marks' around it?
> It's difficult, because this is not good behaviour if the user wants to
> 'try' finding stuff by just trying 2 or 3 words... (your use-case I
> guess...).
>
Are you sure then that free-text search is the answer?  If your users
are already working within the context of a subject, then perhaps the
navigation should be subject-oriented.

The "subjects" that correspond to the "generic pages" seem to
correspond to the tags in a folksonomy.  So perhaps one can think of
alternatives to a "scoped search" that would typically be used in
those cases.  E.g. what comes to mind is a tag-cloud where the size of
the tag is related to the number of documents that have it.
Translated to your use case, i.e. replacing tagging by linking, the
size of a document in the tag cloud would be related to the number of
documents that link to it.  Perhaps a search where the results are
ordered by "size" would already be sufficient?

Unfortunately, this type of ordering is technically not possible, if I
understand the search index correctly.  This is in contrast with the
pure tagging case, where the tags are indexed with the document.
There, a faceted search (or faceted navigation) could be used to do
the search efficiently.

If the mapping to a folksonomy is not appropriate and the subjects are
just regular documents, then it is harder for the system to suggest a
good link target, given a number of keywords.  This does not seem to
be your use-case, but it might be worth considering what we can do
when there is a large number of documents and a large number of links
between them.  Since this is a directed graph of vertices (documents)
and edges (links), an "importance" score can be associated with each
document using the PageRank algorithm.  This will be harder to
implement than the incoming links count, but it should give a good,
generic, search result ordering.
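
To illustrate the two orderings mentioned above, here is a small stand-alone Python sketch that computes the incoming links count and a basic PageRank score over a link graph. It works on an in-memory dictionary, outside any search index; the document names, the damping factor of 0.85 and the fixed number of iterations are assumptions chosen for the example.

```python
def incoming_link_counts(links):
    """`links` maps each document to the list of documents it links to."""
    counts = {doc: 0 for doc in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Basic power-iteration PageRank over the document link graph."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Pass this document's rank on to the documents it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Document without outgoing links: spread its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


links = {"faq": ["generic"], "news": ["generic", "faq"], "generic": []}
counts = incoming_link_counts(links)
scores = pagerank(links)
best = max(scores, key=scores.get)
```

In this tiny graph both measures put the 'generic' page on top; on a large repository the scores would have to be recomputed whenever links change, which is part of why this is harder than a plain count.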

Returning to your use-case; I don't know to what extent other metadata
exists on the documents that you would like to find in this manner.
If for instance these "generic pages" have specific document types,
then we should also consider enhancing the search in the picker such
that results are restricted to these types.  This might be a generic
alternative that is useful to others too.

cheers, Frank

Re: Scoping the Search in backend of the CMS

Gerrit Berkouwer
Frank van Lankvelt wrote:
> are you sure then that free-text search is the answer?  If your users
> are already working within the context of a subject, then perhaps the
> navigation should be subject-oriented.
>
> [ snip ]
>
> Returning to your use-case; I don't know to what extent other metadata
> exists on the documents that you would like to find in this manner.
> If for instance these "generic pages" have specific document types,
> then we should also consider enhancing the search in the picker such
> that results are restricted to these types.  This might be a generic
> alternative that is useful to others too.

Frank, we already scope to certain folders from within the linkpicker for the linking in this specific use case. These folders can then contain >2500 documents (and growing), which are all of the same document type... So we cannot do much with e.g. facets there; the user has to find 1 exact document.
--
Greetz, Gerrit

Re: Scoping the Search in backend of the CMS

Rolf van der Steen
In reply to this post by Gerrit Berkouwer
@Gerrit

First let me clear up a definition-thing. By 'advanced search' I mean a search functionality which allows more than just typing some words and hitting a search button. Whether that is an advanced search form upfront or guiding users with facets after an initial search.

And yes - search should always be simple… to use. Even the "advanced search" ;-)

I agree with your comment on the sketches. Actually the 'facets'-design evolved from the 'rules' design. So we're on the same line here.

I'd like to see whether the current 'folder scope' and the 'facets advanced search' can be combined within the same section (i.e. Documents, Images or Assets).

About the use case (thanks for explaining by the way, gives me a better understanding of the situation): can't it be just as simple as ranking an exact match of title (or incrementally of name) 'extremely' high? In other words give 'title' and 'name' a big part in the weight factor for the ranking?

Don't know if this is technically feasible.
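
A rough sketch of what such weighting could look like, in plain Python. This is not the HST Query API or Lucene scoring; the boost values, the factor of 10 for an exact match and the function names are arbitrary assumptions for illustration.

```python
DEFAULT_BOOSTS = {"title": 5.0, "name": 3.0, "body": 1.0}  # assumed weights


def score(doc, query, boosts=DEFAULT_BOOSTS):
    """Add up field boosts; an exact match of a whole field counts ten times."""
    needle = query.strip().lower()
    if not needle:
        return 0.0
    total = 0.0
    for field, boost in boosts.items():
        value = str(doc.get(field, "")).strip().lower()
        if value == needle:
            total += boost * 10   # exact match of the whole field
        elif needle in value:
            total += boost        # query occurs somewhere in the field
    return total


def ranked(documents, query):
    """Return matching documents, highest score first."""
    scored = [(score(doc, query), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for points, doc in scored if points > 0]


docs = [
    {"name": "annual-report", "title": "Annual report",
     "body": "The parking permit scheme grew."},
    {"name": "fees", "title": "Parking permit fees", "body": ""},
    {"name": "parking-permit", "title": "Parking permit", "body": "Apply here."},
]

order = [doc["name"] for doc in ranked(docs, "Parking permit")]
```

The document whose title equals the pasted text ends up first, a partial title match second, and a match in the body only comes last.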

Re: Scoping the Search in backend of the CMS

Ard
On Wed, Aug 11, 2010 at 12:59 PM, Rolf van der Steen wrote:
>
> About the use case (thanks for explaining by the way, gives me a better
> understanding of the situation): can't it be just as simple as ranking an
> exact match of title (or incrementally of name) 'extremely' high? In other
> words give 'title' and 'name' a big part in the weight factor for the
> ranking?

This is easy, also see 'Relevance scoring your search results with the
HST Query' at [1]

Regards Ard

[1] http://blogs.onehippo.org/ard/


Re: Scoping the Search in backend of the CMS

Ard
In reply to this post by Gerrit Berkouwer
On Sun, Aug 8, 2010 at 10:59 PM, Gerrit Berkouwer wrote:
>
> Frank, we already scope to certain folders from within the linkpicker for
> the linking in this specific use case. These folders can then contain >2500
> documents (and growing) which are all of the same document-type... So we
> cannot do much with e.g. facets there, the user has to find 1 exact
> document.

But I assume there is more metadata on the documents, right? The editor
might be quite sure that it was, for example, an article created in
Dec 2008. So date facets could help. Perhaps there are keywords on the
documents. These can be used as facets as well. The creator of the
document is a facet... anyway, lots of facets. And if it still isn't
sufficient, then I think there is a nice challenge for me to build
something smart.
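
As an illustration of facets on such metadata, here is a minimal Python sketch that counts values per facet and narrows a result list by one facet value. The field names `created`, `creator` and `keywords` and the sample documents are invented for the example; the CMS would get these counts from its own faceted navigation.

```python
from collections import Counter


def _values(doc, field):
    """Return the facet values of a field as a list (multi-valued fields allowed)."""
    value = doc.get(field)
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple, set)) else [value]


def facet_counts(documents, facet_fields):
    """Count how many documents carry each value, per facet field."""
    counts = {field: Counter() for field in facet_fields}
    for doc in documents:
        for field in facet_fields:
            counts[field].update(_values(doc, field))
    return counts


def narrow(documents, field, value):
    """Keep only the documents that have `value` for the given facet field."""
    return [doc for doc in documents if value in _values(doc, field)]


docs = [
    {"name": "a", "created": "2008-12", "creator": "ard", "keywords": ["cars", "fuel"]},
    {"name": "b", "created": "2008-12", "creator": "rolf", "keywords": ["cars"]},
    {"name": "c", "created": "2009-03", "creator": "ard", "keywords": []},
]

counts = facet_counts(docs, ["created", "creator", "keywords"])
december = [doc["name"] for doc in narrow(docs, "created", "2008-12")]
```

Each click on a facet value is then just another `narrow` call on the remaining documents.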

Regards Ard


Re: Scoping the Search in backend of the CMS

Gerrit Berkouwer
In reply to this post by Ard
Ard, good to hear about the ranking possibilities! I assume this also goes for front-end search? We need that, but I was not sure about it.
--
Greetz, Gerrit

Re: Scoping the Search in backend of the CMS

Gerrit Berkouwer
In reply to this post by Ard
@ard, no, the documents have no specific date; they are 'question & answer' combinations. They also have no keywords. The author names have no meaning in relation to the documents: the documents are imported into Hippo from another system and updated every night.

So maybe something smarter is needed! But first we will deploy the 'scoped search' to our production environment and see if that improves the search :-).
--
Greetz, Gerrit