How to exclude UNPUBLISHED documents from Azure Search Index

Ali Azhar asked on October 26, 2019 00:27

We have realized that Azure Search includes pages in its index regardless of whether they are published. Is there any way to exclude them? We don't see any option in Kentico's Smart Search module or in the Page/Page-type modules. We don't want to have to unpublish a document and also remember to check the "exclude from search" checkbox. We expect a page to be excluded automatically when it is not published.

Thanks!

Recent Answers


Roman Hutnyk answered on October 28, 2019 13:18

What do you mean by unpublished? If you have a published page and you edit and save it, only the latest version is unpublished. The previous version remains published, and that is what the end user sees and what gets indexed.

I believe you need to archive the page so that it is no longer indexed.

Kashif Shah answered on October 29, 2019 17:50

You can also explicitly set each page to be excluded from search: http://devnet.kentico.com/docs/12_0/api/html/P_CMS_DocumentEngine_TreeNode_DocumentSearchExcluded.htm

Ali Azhar answered on October 29, 2019 18:01

@Roman you can create a page so that it is published in the future: set "PublishFrom" to a future date, and the page is not published until that date is reached. We don't want these pages to show up in the index's search results until they are published.

@Kashif we thought of that as an alternative, but we don't want to impose this requirement on our clients. We are surprised Kentico doesn't filter out unpublished pages when pushing them to an Azure Search index.

Arjan van Hugten answered on February 17, 2020 22:26 (last edited on February 17, 2020 22:26)

I also had this problem. I fixed it by filtering on the 'documentpublishto' field. You need to make this field filterable first.

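using System;

using CMS;
using CMS.DataEngine;
using CMS.DocumentEngine;
using CMS.Search.Azure;

// Registers the custom module into the system.
// The type is fully qualified because the attribute sits outside the namespace.
[assembly: RegisterModule(typeof(Core.Modules.CustomAzureSearchModule))]

namespace Core.Modules
{
    public class CustomAzureSearchModule : Module
    {
        public CustomAzureSearchModule()
            : base(nameof(CustomAzureSearchModule))
        {
        }

        protected override void OnInit()
        {
            base.OnInit();

            // Runs after each Azure Search index field is created.
            DocumentFieldCreator.Instance.CreatingField.After += CreatingField_After;
        }

        private void CreatingField_After(object sender, CreateFieldEventArgs e)
        {
            var field = e.Field;

            // Make the 'documentpublishto' field usable in filter expressions.
            if (field.Name.Equals(nameof(TreeNode.DocumentPublishTo), StringComparison.OrdinalIgnoreCase))
            {
                field.IsFilterable = true;
            }
        }
    }
}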

Then you can use the following filter in your search query to check whether the page is still published, meaning its publish-to date is empty or in the future.

$"documentpublishto eq null or documentpublishto gt {DateTimeOffset.UtcNow.ToString("O", CultureInfo.InvariantCulture)}"
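
The filter above only looks at the publish-to date, while the original question was about pages whose publish-from date lies in the future. A minimal sketch of a filter that checks both ends of the publish window is shown below. It assumes the index also contains a `documentpublishfrom` field and that this field has been made filterable in the same way as `documentpublishto`; neither is confirmed in this thread. The `PublishedFilter` class name is hypothetical.

```csharp
using System;
using System.Globalization;

public static class PublishedFilter
{
    // Builds an OData filter expression that keeps only documents whose
    // publish window contains the given instant.
    // Assumption: "documentpublishfrom" and "documentpublishto" both exist
    // in the index as filterable date fields.
    public static string Build(DateTimeOffset now)
    {
        // The round-trip ("O") format produces an ISO 8601 timestamp with offset.
        string stamp = now.ToUniversalTime().ToString("O", CultureInfo.InvariantCulture);

        // An empty date means "no limit" on that side of the window.
        return $"(documentpublishfrom eq null or documentpublishfrom le {stamp})"
             + $" and (documentpublishto eq null or documentpublishto gt {stamp})";
    }
}
```

Calling `PublishedFilter.Build(DateTimeOffset.UtcNow)` would give a string you can pass as the filter of the search request, in place of the single-field expression.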
