Smart Search, Custom Highlighted Results

Francis Carroll asked on October 22, 2021 14:20

Hi,

I need help returning custom results to the user when they search for a term using Kentico Smart Search.

For example, the user searches: Home

I would then like for the result to look like this:

Home

This is the home page…

This functionality would need to:

  1. Crawl through the page
  2. Parse the content
  3. Display the results, with the searched phrase highlighted

I have created the custom index from this tutorial: https://docs.xperience.io/configuring-xperience/setting-up-search-on-your-website/using-locally-stored-search-indexes/creating-local-search-indexes/creating-custom-smart-search-indexes

This has the desired effect, but it indexes text files rather than pages from the content tree. How would I go about implementing this? There are few resources online about this, and I don't have access to the source of the base Pages index to replicate and extend it with this custom functionality.

Smart Search Controller

This is the controller used for executing the search

public class SmartSearchController : Controller
{
    public static readonly string[] searchIndexes = new string[] { "Ownership.Index" };

    private const int PAGE_SIZE = 10;

    private readonly TypedSearchItemViewModelFactory searchItemViewModelFactory;

    public SmartSearchController(IPageUrlRetriever iPageRetriever)
    {
        this.searchItemViewModelFactory = new TypedSearchItemViewModelFactory(iPageRetriever);
    }

    //[ValidateInput(false)]
    public ActionResult Index(string searchText)
    {
        // Displays the search page without any search results if the query is empty
        if (String.IsNullOrWhiteSpace(searchText))
        {
            // Creates a model representing empty search results
            SmartSearchResultModel emptyModel = new SmartSearchResultModel
            {
                Items = new List<SmartSearchResultItemModel>()
            };

            return View(emptyModel);
        }

        // Searches the specified index and gets the matching results
        SearchParameters searchParameters = SearchParameters.PrepareForPages(searchText, searchIndexes, 1, PAGE_SIZE, MembershipContext.AuthenticatedUser, "en-us", true);
        SearchResult searchResult = SearchHelper.Search(searchParameters);

        var searchResultItemModels = searchResult.Items.Select(searchItemViewModelFactory.GetTypedSearchResultItemModel);

        // Creates a model with the search result items
        var model = new SmartSearchResultModel
        {
            Items = searchResultItemModels.ToList(),
            Query = searchText
        };

        return View(model);
    }
}
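For the third requirement (displaying results with the searched phrase highlighted), a minimal sketch could look like the following. This is an assumption, not part of the Xperience API: the `SearchHighlighter` class and `<mark>` tag choice are illustrative, and the excerpt is assumed to already be plain, HTML-encoded text.

```csharp
using System.Text.RegularExpressions;

public static class SearchHighlighter
{
    // Wraps each case-insensitive occurrence of the searched phrase in <mark> tags.
    // Regex.Escape prevents the user's input from being treated as a regex pattern.
    public static string Highlight(string excerpt, string searchText)
    {
        if (string.IsNullOrWhiteSpace(searchText))
        {
            return excerpt;
        }

        return Regex.Replace(
            excerpt,
            Regex.Escape(searchText),
            match => $"<mark>{match.Value}</mark>",
            RegexOptions.IgnoreCase);
    }
}
```

HTML-encode the excerpt before highlighting so indexed page content cannot inject markup, then render the highlighted result as raw HTML in the view.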

Custom Content Processor

This is my custom content processor, which excludes the header, footer, and mobile-only content

[assembly: RegisterImplementation(typeof(ISearchCrawlerContentProcessor), typeof(CustomContentProcessor))]

public class CustomContentProcessor : ISearchCrawlerContentProcessor
{
    public string Process(string htmlContent)
    {
        var parser = new HtmlParser();
        IHtmlDocument doc = parser.ParseDocument(htmlContent);
        IHtmlElement body = doc.Body;

        // Removes script tags
        foreach (var element in body.QuerySelectorAll("script"))
        {
            element.Remove();
        }

        // Excludes the header, which includes the navigation bar
        foreach (var element in body.QuerySelectorAll("header"))
        {
            element.Remove();
        }

        // Removes all footer elements
        foreach (var element in body.QuerySelectorAll("footer"))
        {
            element.Remove();
        }

        // Removes all elements marked with the custom mobile exclusion attribute
        foreach (var element in body.QuerySelectorAll("*[search-exclude-mobile]"))
        {
            element.Remove();
        }

        // Removes elements marked with the default Xperience exclusion attribute
        foreach (var element in body.QuerySelectorAll("*[data-ktc-search-exclude]"))
        {
            element.Remove();
        }

        // Gets the text content of the body element
        string textContent = body.TextContent;

        // Normalizes and trims whitespace characters
        textContent = HTMLHelper.RegexHtmlToTextWhiteSpace.Replace(textContent, " ");
        textContent = textContent.Trim();

        return textContent;
    }
}

Custom Index

This is my custom index that has the tutorial index implemented

public class TextFileIndex : ICustomSearchIndex
{
    public void Rebuild(SearchIndexInfo srchInfo)
    {
        // Checks whether the index info object is defined
        if (srchInfo != null)
        {
            // Gets an index writer object for the current index
            IIndexWriter iw = srchInfo.Provider.GetWriter(true);

            // Checks whether the writer is defined
            if (iw != null)
            {
                try
                {
                    // Gets an info object of the index settings
                    SearchIndexSettingsInfo sisi = srchInfo.IndexSettings.Items[SearchHelper.CUSTOM_INDEX_DATA];

                    // Gets the search path from the Index data field
                    string path = Convert.ToString(sisi.GetValue("CustomData"));

                    // Checks whether the path is defined
                    if (!String.IsNullOrEmpty(path))
                    {
                        // Gets all text files from the specified directory
                        string[] files = Directory.GetFiles(path, "*.txt");

                        // Loops through all files
                        foreach (string file in files)
                        {
                            // Gets the current file info
                            FileInfo fi = FileInfo.New(file);

                            // Gets the text content of the current file
                            string text = fi.OpenText().ReadToEnd();

                            // Checks that the file is not empty
                            if (!String.IsNullOrEmpty(text))
                            {
                                // Converts the text to lower case
                                text = text.ToLowerCSafe();

                                // Removes diacritics
                                text = TextHelper.RemoveDiacritics(text);

                                // Creates a new Lucene.Net search document for the current text file
                                SearchDocumentParameters documentParameters = new SearchDocumentParameters()
                                {
                                    Index = srchInfo,
                                    Type = SearchHelper.CUSTOM_SEARCH_INDEX,
                                    Id = Guid.NewGuid().ToString(),
                                    Created = fi.CreationTime
                                };
                                ILuceneSearchDocument doc = LuceneSearchDocumentHelper.ToLuceneSearchDocument(SearchHelper.CreateDocument(documentParameters));

                                // Adds a content field. This field is processed when the search looks for matching results.
                                doc.AddGeneralField(SearchFieldsConstants.CONTENT, text, SearchHelper.StoreContentField, true);

                                // Adds a title field. The value of this field is used for the search result title.
                                doc.AddGeneralField(SearchFieldsConstants.CUSTOM_TITLE, fi.Name, true, false);

                                // Adds a content field. The value of this field is used for the search result excerpt.
                                doc.AddGeneralField(SearchFieldsConstants.CUSTOM_CONTENT, TextHelper.LimitLength(text, 200), true, false);

                                // Adds a date field. The value of this field is used for the date in the search results.
                                doc.AddGeneralField(SearchFieldsConstants.CUSTOM_DATE, fi.CreationTime, true, false);

                                // Adds a url field. The value of this field is used for link urls in the search results.
                                doc.AddGeneralField(SearchFieldsConstants.CUSTOM_URL, file, true, false);

                                // Adds an image field. The value of this field is used for the images in the search results.
                                // Commented out, since the image file does not exist by default
                                // doc.AddGeneralField(SearchFieldsConstants.CUSTOM_IMAGEURL, "textfile.jpg", true, false);

                                // Adds the document to the index
                                iw.AddDocument(doc);
                            }
                        }

                        // Flushes the index buffer
                        iw.Flush();

                        // Optimizes the index
                        iw.Optimize();
                    }
                }
                // Logs any potential exceptions
                catch (Exception ex)
                {
                    EventLogProvider.LogException("CustomTextFileIndex", "Rebuild", ex);
                }
                // Always close the index writer
                finally
                {
                    iw.Close();
                }
            }
        }
    }
}

Recent Answers


David te Kloese answered on November 3, 2021 11:08

Well, you obviously have to update the part that loops through the .txt files.

    // Gets all text files from the specified directory
    string[] files = Directory.GetFiles(path, "*.txt");

    // Loops through all files
    foreach (string file in files)
    {

From there it depends on how you set up your content and what you would like to index.

You could use the API and make a query per page type to get all nodes. This is the most structured approach: it gives you full control over which items are indexed and which are not.
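The per-page-type API approach can be sketched roughly as follows, as a replacement for the text-file loop inside `Rebuild`. This is a sketch under assumptions, not a confirmed implementation: the page type code name (`My.HomePage`), site code name (`MySite`), and field name (`PageContent`) are placeholders for your own content model, and the surrounding `srchInfo`/`iw` variables come from the `Rebuild` method shown in the question.

```csharp
// Hypothetical page type, site, and field names; adjust to your content model.
var pages = DocumentHelper.GetDocuments("My.HomePage")
    .OnSite("MySite")
    .Culture("en-us")
    .Published()
    .ToList();

foreach (TreeNode page in pages)
{
    // Strips markup so only plain text is indexed
    string text = HTMLHelper.StripTags(page.GetStringValue("PageContent", String.Empty));

    if (String.IsNullOrEmpty(text))
    {
        continue;
    }

    text = TextHelper.RemoveDiacritics(text.ToLowerCSafe());

    var documentParameters = new SearchDocumentParameters
    {
        Index = srchInfo,
        Type = SearchHelper.CUSTOM_SEARCH_INDEX,
        Id = page.DocumentID.ToString(),
        Created = page.DocumentCreatedWhen
    };
    ILuceneSearchDocument doc = LuceneSearchDocumentHelper.ToLuceneSearchDocument(SearchHelper.CreateDocument(documentParameters));

    doc.AddGeneralField(SearchFieldsConstants.CONTENT, text, SearchHelper.StoreContentField, true);
    doc.AddGeneralField(SearchFieldsConstants.CUSTOM_TITLE, page.DocumentName, true, false);
    doc.AddGeneralField(SearchFieldsConstants.CUSTOM_CONTENT, TextHelper.LimitLength(text, 200), true, false);
    doc.AddGeneralField(SearchFieldsConstants.CUSTOM_DATE, page.DocumentCreatedWhen, true, false);
    doc.AddGeneralField(SearchFieldsConstants.CUSTOM_URL, DocumentURLProvider.GetUrl(page), true, false);

    iw.AddDocument(doc);
}
```

Using `page.DocumentID` as the document `Id` (instead of a random GUID) keeps the index entries stable across rebuilds.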

However, if you have pages on your site that are not linked one-to-one to a tree node, this approach might miss content or URLs.

Or you could build a manual crawler based on, for example, your sitemap: it makes HTTP calls and parses the results. The downside is that you might still miss pages, and the fetched HTML contains duplicated sections like your header and footer.
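The sitemap-based crawler can be sketched in two small pieces, assuming a standard sitemaps.org `<urlset>` document (the `SitemapCrawler` class name is illustrative). Each fetched page's HTML could then be passed through a processor like the `CustomContentProcessor` above to strip the duplicated header and footer before indexing.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class SitemapCrawler
{
    // Extracts page URLs from sitemap XML (the standard <urlset>/<url>/<loc> layout).
    public static List<string> GetUrls(string sitemapXml)
    {
        XNamespace ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
        return XDocument.Parse(sitemapXml)
            .Descendants(ns + "loc")
            .Select(loc => loc.Value.Trim())
            .ToList();
    }

    // Downloads a page so its HTML can be handed to an ISearchCrawlerContentProcessor.
    public static async Task<string> FetchAsync(HttpClient client, string url)
    {
        return await client.GetStringAsync(url);
    }
}
```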

You might end up with a mixture of the two approaches above; it depends on your situation and functional requirements.

