How to display content from index instead of content field

CSS Team asked on March 19, 2021 13:27

During our migration project from K11 to K13 Core MVC we are facing some issues with smart search. We have already managed to extract all the relevant data from the indexed content (much as described in this article) using a custom module and the DocumentEvents.GetContent.Execute handler. Without redundant content such as the main menu, the search index finally produces the expected results. Besides the title and a link, we now also want to show a content excerpt in each search result presented to the user.

Is there a way to use content from the index, or do we really have to use a page type field here, either edited by an author or filled automatically by our custom search module?

If that should be the case, this would be just another example of why our customers keep asking themselves why they should pay for a feature they already paid for in previous versions. Selling this product has become a real challenge, especially to existing Kentico customers.

Thanks and best regards.

Recent Answers


Dmitry Bastron answered on March 22, 2021 18:35

Hi,

There are probably two things to consider here. First, in Kentico 13 Refresh 1 the search implementation for widget content no longer requires the hack you mentioned. Please refer to this search documentation as well as the Refresh 1 release notes.

As for the second point, including some of the content from the search index: please consider using Azure Search instead. It already has highlight functionality; refer to the example in this article (search for "highlight" on the page).

CSS Team answered on March 25, 2021 16:31

As Azure Search was not an option, this is what we came up with. First, we implemented an ISearchCrawlerContentProcessor to get rid of any unwanted content.

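using System;
using AngleSharp.Html.Parser;
using CMS;
using CMS.Core;
using CMS.Helpers;
using CMS.Search;

// Registers the custom processor so the search crawler uses it
[assembly: RegisterImplementation(typeof(ISearchCrawlerContentProcessor), typeof(CustomContentProcessor))]

public class CustomContentProcessor : ISearchCrawlerContentProcessor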
{
    private readonly IEventLogService eventLogService;

    public CustomContentProcessor(IEventLogService eventLogService)
    {
        this.eventLogService = eventLogService;
    }

    public string Process(string htmlContent)
    {
        try
        {
            // Parses the HTML content using the API of the AngleSharp library
            var parser = new HtmlParser();
            var doc = parser.ParseDocument(htmlContent);

            // Removes elements marked with the default Xperience exclusion attribute
            foreach (var element in doc.QuerySelectorAll("*[data-ktc-search-exclude]"))
            {
                element.Remove();
            }

            htmlContent = doc.Body.InnerHtml;

            // Removes new line entities
            htmlContent = HTMLHelper.RegexHtmlToTextWhiteSpace.Replace(htmlContent, " ");

            // Removes JavaScript
            htmlContent = HTMLHelper.RegexHtmlToTextScript.Replace(htmlContent, " ");

            // Removes Styles
            htmlContent = HTMLHelper.RegexHtmlToTextStyle.Replace(htmlContent, " ");

            // Removes tags
            htmlContent = HTMLHelper.RegexHtmlToTextTags.Replace(htmlContent, " ");

            // Decodes HTML entities
            htmlContent = HTMLHelper.HTMLDecode(htmlContent);

            return htmlContent;
        }
        catch (Exception ex)
        {
            eventLogService.LogException("CustomContentProcessor", "PROCESS", ex);
            return string.Empty;
        }
    }
}
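
To try the tag-stripping steps outside of Xperience, something like the following rough sketch can be used. It relies only on the .NET base class library; the class name PlainTextSketch and the three regular expressions are our own stand-ins and are not the HTMLHelper expressions, so the output may differ from what the crawler processor produces.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class PlainTextSketch
{
    // Hypothetical stand-ins for the HTMLHelper regexes, not the Xperience implementations
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    private static readonly Regex Tag = new Regex(@"<[^>]+>");
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Drops script and style blocks together with their content
        var text = ScriptOrStyle.Replace(html, " ");

        // Drops the remaining tags but keeps their inner text
        text = Tag.Replace(text, " ");

        // Decodes HTML entities such as &amp;
        text = WebUtility.HtmlDecode(text);

        // Collapses runs of whitespace into single spaces
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

For example, PlainTextSketch.ToPlainText("<p>Hello <b>world</b></p><script>var x = 1;</script>") returns "Hello world".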

Next, we created a module with a document search event handler to store this content in a custom page field.

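using System;
using CMS;
using CMS.DataEngine;
using CMS.DocumentEngine;

// Registers the module; without this attribute OnInit is never called
[assembly: RegisterModule(typeof(CustomSmartSearchModule))]

public class CustomSmartSearchModule : Module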
{
    private const string PREVIEW_TEXT_COLUMN_NAME = "SearchResultPreviewText";
    private const int PREVIEW_TEXT_MAX_LENGTH = 280;

    public CustomSmartSearchModule() : base("CustomSmartSearch")
    {
    }

    protected override void OnInit()
    {
        base.OnInit();
        DocumentEvents.GetContent.Execute += GetContentOnExecute;
    }

    private void GetContentOnExecute(object sender, DocumentSearchEventArgs e)
    {
        var currentNode = e.Node;
        if (!currentNode.ContainsColumn(PREVIEW_TEXT_COLUMN_NAME) || string.IsNullOrWhiteSpace(e.Content))
        {
            return;
        }

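        // Substring throws when the content is shorter than the limit, so the length is clamped first
        var trimmedContent = e.Content.Trim();
        var previewContent = trimmedContent.Substring(0, Math.Min(trimmedContent.Length, PREVIEW_TEXT_MAX_LENGTH));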
        currentNode.SetValue(PREVIEW_TEXT_COLUMN_NAME, previewContent);
        currentNode.Update();
    }
}
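
The handler above cuts the preview at a fixed character count, which can split a word in the middle. A minimal sketch of a helper that backs up to the last space before the limit could look like this. The class name PreviewText and the method Build are hypothetical and not part of the Xperience API; the helper is plain C# and could be called from the handler in place of the direct cut.

```csharp
public static class PreviewText
{
    // Returns at most maxLength characters of the trimmed content,
    // cut at the last space before the limit when a word would be split.
    public static string Build(string content, int maxLength)
    {
        if (string.IsNullOrWhiteSpace(content) || maxLength <= 0)
        {
            return string.Empty;
        }

        var trimmed = content.Trim();
        if (trimmed.Length <= maxLength)
        {
            return trimmed;
        }

        var cut = trimmed.Substring(0, maxLength);

        // The limit falls exactly between two words, so nothing is split
        if (char.IsWhiteSpace(trimmed[maxLength]))
        {
            return cut.TrimEnd();
        }

        // Backs up to the last space so the final word is dropped rather than split
        var lastSpace = cut.LastIndexOf(' ');
        if (lastSpace > 0)
        {
            cut = cut.Substring(0, lastSpace);
        }

        return cut.TrimEnd();
    }
}
```

For example, PreviewText.Build("The quick brown fox", 12) returns "The quick", while content shorter than the limit is returned unchanged (apart from trimming).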

Finally, we used this custom field as the content source in the search configuration of every indexed page type (screenshot: Content Field Setting).
