Azure Search page crawler get crawled page content

Elmar Höfinghoff asked on February 16, 2021 10:29

I set up an Azure Search Index using a pages crawler.

When I retrieve the search data, I noticed that it includes only the fields marked as retrievable, searchable, etc. in the search field settings of the page type.

When I query the index, however, all content words in the page's HTML are found, so they must be stored somewhere in that index.

My question: How can I access the crawled content within the Azure Search index?

I would like to use this to show a preview of content in my search results. Much of the page's content is not in the same page node. It is stored in child nodes and repeated by list widgets. So I can't just use some page fields for all of the HTML content that is crawled by the page crawler.

Recent Answers


Dmitry Bastron answered on February 16, 2021 11:19

Hi Elmar,

The reason you are not getting the value back is likely that the field sys_content (basically, "Content") is not marked as "retrievable". You can double-check this in the Azure portal.

If I'm not mistaken, this should be customizable. Check out this article: you'd need to hook into the CreatingField.Before event and, if the field name is "sys_content", set its retrievable flag to true. After that, you'd need to rebuild the index.


Elmar Höfinghoff answered on February 17, 2021 17:14

Thank you, Dmitry.

Here's what I did and what worked for me:

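    using CMS;
    using CMS.DataEngine;
    using CMS.Search.Azure;

    // Registers the custom module so that the system loads it on application start
    [assembly: RegisterModule(typeof(CustomAzureSearchModule))]

    public class CustomAzureSearchModule : Module
    {
        // Module class constructor, the system registers the module under the name "CustomAzureSearch"
        public CustomAzureSearchModule() : base("CustomAzureSearch") { }

        // Contains initialization code that is executed when the application starts
        protected override void OnInit()
        {
            base.OnInit();

            // Add the custom event handler that runs before each Azure Search field is created
            DocumentFieldCreator.Instance.CreatingField.Before += AddSysContentRetrievable;
        }

        // Makes the content indexed by the pages crawler retrievable
        private void AddSysContentRetrievable(object sender, CreateFieldEventArgs args)
        {
            // Skip every field except the crawler content field
            // (the name that worked in this handler is "_content")
            if (!(args?.SearchField?.FieldName ?? "").Equals("_content"))
            {
                return;
            }

            args.SearchField.SetFlag("AzureRetrievable", true);
        }
    }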
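
Once the content field is retrievable and the index has been rebuilt, the crawled text can be used for the result preview mentioned in the question. The following is only a minimal sketch of that last step, assuming the field value has already been read from the search result as a plain-text string. The class name SearchPreview, the method Build, and the maxLength parameter are hypothetical helpers and not part of Kentico or the Azure SDK.

```csharp
using System;

public static class SearchPreview
{
    // Returns an excerpt of at most maxLength characters (plus ellipses),
    // centred on the first case-insensitive occurrence of the search term.
    // Falls back to the beginning of the content when the term is not found.
    public static string Build(string content, string term, int maxLength = 200)
    {
        if (string.IsNullOrWhiteSpace(content) || maxLength <= 0)
        {
            return string.Empty;
        }

        // Short content needs no trimming
        if (content.Length <= maxLength)
        {
            return content;
        }

        int hit = string.IsNullOrEmpty(term)
            ? -1
            : content.IndexOf(term, StringComparison.OrdinalIgnoreCase);

        // Position the window so that the hit sits roughly in the middle
        int start = hit < 0 ? 0 : Math.Max(0, hit - (maxLength - term.Length) / 2);
        start = Math.Min(start, content.Length - maxLength);

        string excerpt = content.Substring(start, maxLength).Trim();
        string prefix = start > 0 ? "..." : "";
        string suffix = start + maxLength < content.Length ? "..." : "";

        return prefix + excerpt + suffix;
    }
}
```

A call such as SearchPreview.Build(crawledContent, searchText, 160) would then produce the preview text for one result, where crawledContent stands for the value read from the content field.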
