synonyms in hippo-lucene

synonyms in hippo-lucene

Adolfo Benedetti
Hi everyone,

To use the synonym functionality in Lucene [1][2] for a non-English
language, can somebody point me to documentation on the Hippo 7.5
Lucene internals [3]?

Thank you in advance,
Cheers

Adolfo

///
 import java.io.FileInputStream;
 import java.util.Arrays;
 import org.apache.lucene.index.memory.SynonymMap;

 // Load the synonym data file and print the synonyms of each word.
 String[] words = { "hard", "woods", "forest", "wolfish", "xxxx" };
 SynonymMap map = new SynonymMap(new FileInputStream("samples/fulltext/wn_s.pl"));
 for (String word : words) {
     String[] synonyms = map.getSynonyms(word);
     System.out.println(word + ":" + Arrays.asList(synonyms));
 }

 ////
 Example output:
 hard:[arduous, backbreaking, difficult, fermented, firmly, grueling,
gruelling, heavily, heavy, intemperately, knockout, laborious,
punishing, severe, severely, strong, toilsome, tough]
 woods:[forest, wood]
 forest:[afforest, timber, timberland, wood, woodland, woods]
 wolfish:[edacious, esurient, rapacious, ravening, ravenous,
voracious, wolflike]
 xxxx:[]

[1]http://lucene.apache.org/java/2_9_2/api/contrib-memory/org/apache/lucene/index/memory/SynonymTokenFilter.html
[2]http://lucene.apache.org/java/2_9_2/api/contrib-memory/org/apache/lucene/index/memory/SynonymMap.html
[3]http://www.onehippo.org/search?q=lucene

--
Adolfo Benedetti
Mobile: +31 6 46436090
Sourcesense - making sense of Open Source: http://www.sourcesense.nl
Herengracht 124 - 128
1015 BT Amsterdam
T +31 20 5708949
F +31 20 5708989
_______________________________________________
Hippo-cms7-user mailing list and forums
http://www.onehippo.org/cms7/support/forums.html

Re: synonyms in hippo-lucene

Frank van Lankvelt
You should be able to provide synonyms by implementing the Jackrabbit
SynonymProvider interface, or by using the existing
PropertiesSynonymProvider. Adding the "synonymProviderClass"
parameter to the SearchIndex configuration should work.
AFAIK, the underlying Jackrabbit repository configuration [1] should
take care of this.

cheers, Frank


[1] http://wiki.apache.org/jackrabbit/Search
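
For readers wondering what a properties-backed synonym lookup amounts to, here is a minimal, self-contained Java sketch. It does not use Jackrabbit: SimpleSynonymSource is a hypothetical stand-in for the SynonymProvider interface mentioned above, and the "term=synonym" file layout, the Dutch word pairs, and the choice to register each pair in both directions are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Sketch of a synonym lookup backed by "term=synonym" properties text.
class PropertiesSynonymSketch implements SimpleSynonymSource {
    private final Map<String, List<String>> synonyms = new HashMap<>();

    PropertiesSynonymSketch(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String term : props.stringPropertyNames()) {
            String synonym = props.getProperty(term);
            // Assumption: a pair is a synonym in both directions.
            add(term, synonym);
            add(synonym, term);
        }
    }

    private void add(String term, String synonym) {
        synonyms.computeIfAbsent(term.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
                .add(synonym.toLowerCase(Locale.ROOT));
    }

    // Returns the known synonyms of the term, or an empty array if there are none.
    @Override
    public String[] getSynonyms(String term) {
        List<String> found = synonyms.get(term.toLowerCase(Locale.ROOT));
        return found == null ? new String[0] : found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        PropertiesSynonymSketch source =
                new PropertiesSynonymSketch("bos=woud\nfiets=rijwiel\n");
        for (String word : new String[] { "bos", "rijwiel", "xxxx" }) {
            System.out.println(word + ":" + Arrays.asList(source.getSynonyms(word)));
        }
    }
}

// Hypothetical interface for this sketch; NOT the Jackrabbit SynonymProvider API.
interface SimpleSynonymSource {
    String[] getSynonyms(String term);
}
```

Because a properties file holds one value per key, this layout allows only one synonym per key line; a real thesaurus for a non-English language would probably need a richer format.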

On Thu, Jun 9, 2011 at 10:39 AM, Adolfo Benedetti
<[hidden email]> wrote:

> Hi everyone,
>
> In order to make use of the synonyms functionality present in
> lucene[1][2] for a non-english language, can somebody point me to the
> documentation hippo7.5-lucene internals[3],

--
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

Re: synonyms in hippo-lucene

Adolfo Benedetti
Frank,

It works! As you said, it was just a matter of adding the
synonymProviderClass parameter and the path to the thesaurus to the
SearchIndex configuration, and prefixing search terms with a tilde (~).

<param name="synonymProviderClass"
value="org.apache.jackrabbit.core.query.lucene.PropertiesSynonymProvider"/>
<param name="synonymProviderConfigPath" value="synonyms.properties"/>

More info [1] for anyone interested in using the WordNet synonym provider.

Thank you,

Adolfo

[1]http://wiki.apache.org/jackrabbit/SynonymSearch
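
As a rough illustration of "using the prefix (~) during the search", the Java sketch below turns user input into a query string with a tilde in front of each word. The XPath shape with jcr:contains, the nt:base node type, the helper name toSynonymXPath, and applying the tilde to every word are all assumptions for this sketch; the thread only confirms that the tilde prefix triggers synonym search.

```java
// Sketch: build a JCR XPath full-text query that asks for synonym expansion.
class SynonymQuerySketch {

    // Prefixes every whitespace-separated word with '~' and doubles any
    // apostrophes so the words stay inside the XPath string literal.
    static String toSynonymXPath(String nodeType, String userInput) {
        StringBuilder terms = new StringBuilder();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields no terms
            }
            if (terms.length() > 0) {
                terms.append(' ');
            }
            terms.append('~').append(word.replace("'", "''"));
        }
        return "//element(*, " + nodeType + ")[jcr:contains(., '" + terms + "')]";
    }

    public static void main(String[] args) {
        // prints //element(*, nt:base)[jcr:contains(., '~hard ~woods')]
        System.out.println(toSynonymXPath("nt:base", "hard woods"));
    }
}
```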


2011/6/9 Frank van Lankvelt <[hidden email]>:

> you should be able to provide synonyms by implementing the JackRabbit
> SynonymProvider class, or by using the existing
> PropertiesSynonymProvider.  Adding the "synonymProviderClass"
> parameter to the SearchIndex configuration should work.
> AFAIK, the underlying jackrabbit repository configuration [1] should
> take care of this.

Re: synonyms in hippo-lucene

Gerrit Berkouwer
Looking good! :-)

Is there a maximum number of synonyms before performance gets sluggish? What happens if I have, e.g., 20 synonyms per word and my user enters 4 words in the search string ("word1 word2 word3 word4")? Could that be a problem?
--
Greetz, Gerrit

Re: synonyms in hippo-lucene

Ard
On Thu, Jun 9, 2011 at 7:03 PM, Gerrit Berkouwer
<[hidden email]> wrote:
> Looking good! :-)
>
> Is there a maximum of synonyms before the performance gets sluggish? What
> happens if I have e.g. 20 synonyms per word and my user uses 4 words in his
> search-string ("word1 word2 word3 word4")? Could that be a problem?

No, don't worry about that! Synonyms are handled by plain Lucene query
expansion. Compare it to good old Lucene range queries. Although they
are not blisteringly fast and are somewhat outdated for some cases
(such as numeric ranges), they work as follows: a range query between,
say, 'a' and 'c' collects all indexed terms from those starting with
'a' up to those starting with 'c', and runs an OR query over all of
these terms. Synonyms use basically the same expansion mechanism. Only
if you had something like 100,000 synonyms for a single word should you
start to worry :-)

Hope this makes your worries disappear.

Regards Ard
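
The expansion described above can be sketched in a few lines of plain Java. This is not Lucene or Jackrabbit code: the class and method names are hypothetical, the synonym map is sample data taken from the example output earlier in the thread, and how the per-word groups are combined with each other is left open because that depends on the query parser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: expand each query word into an OR group of the word and its synonyms.
class SynonymExpansionSketch {

    static List<List<String>> expand(String query, Map<String, List<String>> synonyms) {
        List<List<String>> groups = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            List<String> group = new ArrayList<>();
            group.add(word); // the original word always stays in its group
            group.addAll(synonyms.getOrDefault(word, Collections.emptyList()));
            groups.add(group);
        }
        return groups;
    }

    // Total number of terms the expanded query has to look up.
    static int countClauses(List<List<String>> groups) {
        int total = 0;
        for (List<String> group : groups) {
            total += group.size();
        }
        return total;
    }

    // Human-readable form, e.g. "(woods OR forest OR wood)".
    static String render(List<List<String>> groups) {
        List<String> parts = new ArrayList<>();
        for (List<String> group : groups) {
            parts.add("(" + String.join(" OR ", group) + ")");
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new LinkedHashMap<>();
        synonyms.put("hard", List.of("difficult", "tough"));
        synonyms.put("woods", List.of("forest", "wood"));
        List<List<String>> groups = expand("hard woods", synonyms);
        // prints (hard OR difficult OR tough) (woods OR forest OR wood)
        System.out.println(render(groups));
        // prints 6
        System.out.println(countClauses(groups));
    }
}
```

For the case asked about above, 4 words with 20 synonyms each, the same arithmetic gives 4 x 21 = 84 terms, which is far from the 100,000 range.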



Re: synonyms in hippo-lucene

Gerrit Berkouwer
@ard good to hear! :-) #webperformanceiskey

--
Greetz, Gerrit