Smart search on files in Content tree

Rakesh Pinnamaneni asked on February 21, 2020 17:48

I have a folder with some PDF files in the content tree, and I am using a smart search index to let my users search through that folder. However, I cannot read the file content to display as a search highlight in my transformation: the search only matches the file name and highlights words in the title. I would like to display teaser content from the file itself. Thank you, Rakesh

Correct Answer

Juraj Ondrus answered on February 25, 2020 08:42

I see what you mean now; you also submitted a support ticket, where this was clarified. I will post my colleague Eric Dugre's answer here for future reference, in case others have the same need. There are two options:

  • Using the field _content: This field of the index stores the searchable content of the page (as opposed to the "Content" field, which you see in the results). It contains both the page content and the attachment content, but by default you cannot access it. To access this field, you need to add the CMSSearchStoreContentField key to your web.config (see the Smart search settings section of the web.config key documentation, and please read the key description, as storing this field has some performance impact). Once the key is added, you need to rebuild the index.

Afterward, you can access the field like any other field, and highlight text in the attachment:

  {%SearchHighlight(GetSearchValue("_content"), "<span style='background-color: #FEFF8F'>", "</span>")%}  
  • Manually get attachment content: The attachment content is stored in the CMS_Attachment table, AttachmentSearchContent column. You can use the attachment's GUID from the FileAttachment field of your page, then load the attachment data. First, you need to go to Page types > CMS.File > Search fields tab and check "Searchable" for the FileAttachment field. After you save that, you need to rebuild the index.

Now, in your transformation you can get the attachment:

  Content: {%nodealiaspath = GetSearchValue("nodealiaspath");
  attachment = GetSearchValue("fileattachment");
  acontent = Documents[nodealiaspath].AllAttachments[attachment].AttachmentSearchContent;
  SearchHighlight(acontent, "<span style='background-color: #FEFF8F'>", "</span>")%}

Note that you also need to clear the attachment search cache, as described in the documentation on Enabling indexing for page attachments.

Recent Answers

Brenden Kehren answered on February 21, 2020 18:09

If you want the search index to cover the content of the files, you need to check the "Include attachment content" box in the index's indexed content settings. This lets the index search that content, assuming the content isn't an image.

As for displaying the attachment's content as a teaser, I'm unsure how to do that, but you could download the Luke tool and see where that content is stored in the index.

Rakesh Pinnamaneni answered on February 21, 2020 18:22

Hi Brenden, thank you for the response. "Include attachment content" is checked. My files are placed in the content tree by themselves in a folder, not as attachments. When I try to use SearchHighlight on Eval("Content") in the transformation, it is empty. I think my index is only reading the titles of my files and not their content. Can you advise me on how to show the PDF content in search results, like page content? Is it even possible for files? Thanks again!

Brenden Kehren answered on February 21, 2020 18:26

If you're uploading them as file page types, then they are considered attachments. As I said previously, I'm unsure how to get that attachment content out. You may have to download the Luke tool to inspect the actual index to find out where it is.

Rakesh Pinnamaneni answered on February 21, 2020 19:01

Thank you. Where can I find the actual index? I exported the index that I created, but it contains only the settings and machine names, nothing else.

Juraj Ondrus answered on February 23, 2020 14:09

The search index files are placed in the /App_Data/CMSModules/SmartSearch folder, and you can download the Luke tool to inspect them. Anyway, I would test the search setup by creating a test CMS.File page with a simple text file attached to it and searching for the content of the text file, to verify that search indexing works. If this works, then make sure your PDFs are "searchable". As described in the documentation on searching attachments, the search does NOT work for:

- Legacy MS Office formats: doc, xls, ppt
- Certain types of PDF files, including:
  - Encrypted files
  - Files using PDF version 1.5 or older
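
If you have many PDFs, a quick first check against the PDF limitations listed above is to read each file's header and look for an encryption dictionary. The following is a minimal Python sketch, not part of Kentico; the function names are hypothetical, the /Encrypt check is a rough byte search rather than a real PDF parse, and a /Version entry in the document catalog can override the header version, so treat the result as an approximation only.

```python
import re
import sys
from pathlib import Path
from typing import List, Optional, Tuple

# Every PDF should begin with a header line such as b"%PDF-1.7".
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the %PDF header, or None if none is found."""
    # Readers tolerate some leading junk, so search the first 1024 bytes
    # instead of requiring the header at offset 0.
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def looks_encrypted(data: bytes) -> bool:
    """Rough heuristic: encrypted PDFs reference an /Encrypt dictionary."""
    # A plain byte search can give false positives (e.g. the literal text
    # "/Encrypt" inside an uncompressed stream); it is only a first check.
    return b"/Encrypt" in data


def indexing_concerns(data: bytes) -> List[str]:
    """List reasons (per the limitations quoted in this thread) a PDF may not be indexed."""
    concerns = []
    version = pdf_header_version(data)
    if version is None:
        concerns.append("no %PDF header found")
    elif version <= (1, 5):
        concerns.append(f"PDF version {version[0]}.{version[1]} (1.5 or older)")
    if looks_encrypted(data):
        concerns.append("appears to be encrypted")
    return concerns


if __name__ == "__main__":
    # Usage: python check_pdfs.py file1.pdf file2.pdf ...
    for name in sys.argv[1:]:
        problems = indexing_concerns(Path(name).read_bytes())
        print(f"{name}: {'OK' if not problems else '; '.join(problems)}")
```

Files reported as OK can still fail to index for other reasons (for example, scanned PDFs that contain only images and no text), so the text-file test described above remains the more reliable check.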

Rakesh Pinnamaneni answered on February 24, 2020 18:17

Thank you, Juraj. I was able to find the files and inspect them using the Luke tool, and the content is being indexed and stored. I also uploaded a sample text file and searched for it, but Eval("Content") still returns empty. When I uploaded a sample page of the menu item type, the content in its web part does show up highlighted. I have attached the content tree and the Luke inspection of the Lucene index. Thank you, Rakesh
