Certified Developer 8

majoor-evident - 7/22/2013 5:33:19 AM

Adding synonyms to Smart Search subset analyzer

Hello,

I'm trying to implement a synonym analyzer, similar to the one described at http://www.codeproject.com/Articles/32201/Lucene-Net-Custom-Synonym-Analyzer.

I've created a custom analyzer for my smart search index and got it to work for normal search results, but I'd also like to be able to search for subsets.

This is where I'm stuck. My custom analyzer class inherits from the CMS.SiteProvider.SubSetAnalyzer class, and I've set the constructor parameters isSearch and startsWith to false. It seems that some additional filtering is required in the TokenStream method. What am I missing?

Hope you can point me in the right direction.

Regards,
Martijn

Kentico Support

kentico_zdenekc - 7/23/2013 3:50:37 AM

RE: Adding synonyms to Smart Search subset analyzer

Hello,

When isSearch is set to false and fieldName equals SearchHelper.CONTENT_FIELD, the TokenStream method first reads the original text using reader.ReadToEnd(), then appends the subsets to it and creates a new TextReader (StringReader) instance from the expanded text.
The TokenStream it returns is a new WhitespaceTokenizer created from that reader and wrapped in a LowerCaseFilter.
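
To make the first part of that description more concrete, here is a rough, library-free sketch of the "read everything, append subsets, wrap in a StringReader" step. It is not the actual SubSetAnalyzer source: the SubsetSketch class, the AddSubsets helper and the rule used to build the subsets (every substring of at least minimalLength characters, or only prefixes when startsWith is true) are assumptions made purely for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class SubsetSketch
{
    // Hypothetical helper: the real subset rule used by SubSetAnalyzer
    // is not shown in this thread, so the rule below is an assumption.
    public static TextReader AddSubsets(TextReader reader, bool startsWith, int minimalLength)
    {
        // 1. Get the original text.
        string original = reader.ReadToEnd();
        var expanded = new StringBuilder(original);

        // 2. Append the subsets of every whitespace-separated word.
        foreach (string word in original.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            // With startsWith only prefixes are generated (start index 0).
            int lastStart = startsWith ? 0 : word.Length - minimalLength;

            for (int start = 0; start <= lastStart; start++)
            {
                for (int length = minimalLength; start + length <= word.Length; length++)
                {
                    // The whole word is already part of the original text.
                    if (start == 0 && length == word.Length)
                    {
                        continue;
                    }

                    expanded.Append(' ').Append(word.Substring(start, length));
                }
            }
        }

        // 3. Create a new reader over the expanded text.
        return new StringReader(expanded.ToString());
    }
}
```

In the real analyzer, the reader produced this way would then be passed to the WhitespaceTokenizer and LowerCaseFilter mentioned above. A custom TokenStream override that builds its tokenizer directly from the incoming reader never goes through this step.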

Does this help you identify the missing part?
Can you share any further details about the results you currently get?
Thank you in advance for the information.

Regards,
Zdenek

Certified Developer 8

majoor-evident - 7/31/2013 2:46:27 AM

RE: Adding synonyms to Smart Search subset analyzer

Hello Zdenek,

Thank you for your reply. My custom analyzer currently looks like this:

using System;
using System.IO;
using CMS.SiteProvider;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

public class SearchSynonymAnalyzer : SubSetAnalyzer
{
    /// <summary>
    /// Token stream.
    /// </summary>
    /// <param name="fieldName">Field name</param>
    /// <param name="reader">Text reader</param>
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        TokenStream result = new WhitespaceTokenizer(reader);

        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, StopAnalyzer.ENGLISH_STOP_WORDS);

        // SynonymTokenizer is my own class that injects the synonyms.
        return new SynonymTokenizer(result);
    }

    /// <summary>
    /// Construct a new SearchSynonymAnalyzer.
    /// </summary>
    public SearchSynonymAnalyzer(Boolean isSearch, Boolean startsWith, int minimalLength)
        : base(isSearch, startsWith, minimalLength)
    {
    }
}

The SynonymTokenizer class is where I insert the synonyms.

For my document type, the search fields are defined as follows:
Title field: DocumentName
Content field: DocumentContent
Image field: (none)
Date field: DocumentCreatedWhen

Inserting synonyms works, but unfortunately at the cost of not being able to search for subsets anymore. Should I apply some additional filtering in the TokenStream method?

Regards,
Martijn

Kentico Support

kentico_zdenekc - 8/8/2013 9:23:12 AM

RE: Adding synonyms to Smart Search subset analyzer

Hello Martijn,

Some additional filtering in the TokenStream method might be the way to go. However, this is a more complex question, and the Lucene documentation or the developer community may be able to provide some hints on this matter.
Have you already tried any additional actions or filtering in the TokenStream method?
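
One direction that might be worth trying, given as an untested sketch only: the override posted above builds its own WhitespaceTokenizer straight from the reader, so the subset step of SubSetAnalyzer described earlier in this thread is never executed. Calling base.TokenStream first and wrapping its result could keep the subsets while still adding the synonyms. This assumes that SynonymTokenizer (the custom class from the post above) accepts any TokenStream, as it does in the posted code, and that the base stream is already lower-cased as described earlier. The StopFilter line is simply carried over from the posted analyzer.

```csharp
using System.IO;
using CMS.SiteProvider;
using Lucene.Net.Analysis;

public class SearchSynonymAnalyzer : SubSetAnalyzer
{
    public SearchSynonymAnalyzer(bool isSearch, bool startsWith, int minimalLength)
        : base(isSearch, startsWith, minimalLength)
    {
    }

    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // Let SubSetAnalyzer build its stream first, so that the subsets
        // (and the lower-casing) described earlier are still applied.
        TokenStream result = base.TokenStream(fieldName, reader);

        // Carried over from the posted analyzer; optional.
        result = new StopFilter(result, StopAnalyzer.ENGLISH_STOP_WORDS);

        // SynonymTokenizer is the custom class from the post above (not part of Lucene.Net).
        return new SynonymTokenizer(result);
    }
}
```

Please note that with this ordering the synonym lookup would also run for every generated subset token, which may or may not be the desired behavior.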

Thank you for your cooperation and patience.

Regards,
Zdenek