Kentico CMS 6.0 Developer's Guide

Creating an index

Information about content is stored in indexes, which must be defined before any searches using the Smart search module can be performed.

 

There are seven types of indexes. See the topic for each type to learn how indexes of that type can be configured:

 

Documents index - stores information about the content of documents in the content tree.

Documents crawler index - stores information about the content of documents, similar to a Documents index. However, the Documents crawler indexes the HTML output of documents directly.

Forums index - stores information about the content of discussion forums in the system.

Custom table index - stores information about data stored in custom tables.

User index - stores information about site users.

General index - stores information about system objects of a specified type.

Custom index - allows you to use your own custom‑written search index, which can store any kind of information depending on its implementation.

 

The following example describes the general procedure of creating a search index and the options that are available. The procedure applies to all types of indexes; differences between them are noted in the text.

 

1. Go to Site Manager -> Administration -> Smart search and click the New index link. The New search index dialog will be displayed, asking you to enter the following details:

 

Display name - name of the index displayed in the administration interface.

Code name - name of the index used as a unique identifier, typically in web part properties or in website code (the fully qualified file name must be shorter than 260 characters and the directory name shorter than 241 characters).

Index type - sets the type of content to be indexed:

Custom index - indexes any kind of data depending on its implementation.

Custom tables - indexes records in custom tables.

Documents - indexes content of documents in the content tree.

Documents crawler - indexes the content of the HTML output generated by documents in the content tree.

Forums - indexes content of discussion forums.

General - indexes objects of a specified type. Objects of any type within the CMS can be searched this way.

Users - indexes details about system users (fields of the CMS_User system table).

Analyzer type - type of analyzer that will be used when indexing content; the following types are available:

Custom - allows a custom‑written analyzer to be specified. This gives you the option of performing tokenization according to your particular requirements. If selected, the names of the assembly and class that implement the custom analyzer must be entered into the Assembly name and Class name fields. An example can be found in the Using a custom analyzer topic.

Keyword - tokenizes the entire stream as a single token. This is useful for data like zip codes, IDs and some product names.

Simple - divides text at non-letter characters.

Standard - grammar-based analyzer that handles stop words, abbreviations and similar constructs. This option works very well for English, but may not produce satisfactory results with other languages.

Starts with - tokenizes all prefixes contained in words, which allows searching for words that start with the entered string. Text is divided at whitespace characters. For example, searching for test returns words such as test, tests, tester, etc.

Stop - uses a collection of stop words at which text is divided; the stop words themselves are omitted from indexing (see Stop words below).

Subset - tokenizes all substrings in words, which allows searching for words that contain the entered string. Text is divided at whitespace characters. For example, searching for net returns words such as net, Internet, network, etc.

White space - divides text at whitespace characters.

Stop words - dictionary of words that are omitted from indexing (e.g. 'and', 'or') when the Stop or Standard analyzer is used. The dictionaries are stored in ~\App_Data\CMSModules\SmartSearch\_StopWords.

Assign index to site <sitename> - if checked, the index will be assigned to the site whose name is displayed.

 

Click OK.
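To make the differences between the analyzer types more concrete, the following Python sketch approximates how several of them split text into tokens. It is a simplified illustration written for this guide, not Kentico's implementation: case normalization and the grammar rules of the Standard analyzer are left out, the base splitting rule used for the Stop analyzer is an assumption, and STOP_WORDS is a tiny hypothetical stand-in for the dictionaries stored in ~\App_Data\CMSModules\SmartSearch\_StopWords.

```python
import re

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"and", "or", "the"}


def keyword(text: str) -> list[str]:
    # Keyword: the whole input stream becomes one token.
    return [text]


def white_space(text: str) -> list[str]:
    # White space: divide text at whitespace characters only.
    return text.split()


def simple(text: str) -> list[str]:
    # Simple: divide text at every non-letter character.
    return re.findall(r"[^\W\d_]+", text)


def stop(text: str) -> list[str]:
    # Stop (approximation, assumed splitting rule): divide like Simple,
    # then drop the stop words so they never reach the index.
    return [t for t in simple(text) if t.lower() not in STOP_WORDS]


def starts_with(text: str) -> list[str]:
    # Starts with: every prefix of every whitespace-separated word.
    return [w[:i] for w in text.split() for i in range(1, len(w) + 1)]


def subset(text: str) -> set[str]:
    # Subset: every substring of every whitespace-separated word.
    return {w[i:j]
            for w in text.split()
            for i in range(len(w))
            for j in range(i + 1, len(w) + 1)}


if __name__ == "__main__":
    sample = "zip-code 12345 and IDs"
    print(keyword(sample))      # ['zip-code 12345 and IDs']
    print(white_space(sample))  # ['zip-code', '12345', 'and', 'IDs']
    print(simple(sample))       # ['zip', 'code', 'and', 'IDs']
    print(stop(sample))         # ['zip', 'code', 'IDs']
    print("test" in starts_with("tests tester"))  # True
    print("net" in subset("Internet network"))    # True
```

Because a search term matches when it equals one of the stored tokens, storing every prefix (Starts with) or every substring (Subset) is what lets "test" find "tester" and "net" find "Internet", at the cost of a considerably larger index.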

 

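The length limits on the Code name suggest that the code name becomes part of the file system path of the index. The Python sketch below shows one way to check a candidate code name against the two limits before creating an index. It assumes, hypothetically, that each index is stored in a folder named after its code name under a given root directory; the root path and the file names passed in are illustrative only, and the actual location and files used by your installation may differ.

```python
import ntpath

# Limits stated for the Code name field in this topic.
MAX_FILE_PATH = 260  # fully qualified file name must be shorter than this
MAX_DIR_PATH = 241   # directory name must be shorter than this


def check_code_name(index_root: str, code_name: str, file_names: list[str]) -> list[str]:
    """Return the length problems found; an empty list means both limits are met.

    index_root and file_names are hypothetical inputs used for illustration.
    """
    problems = []
    # Assumption: each index is stored in its own folder named after its code name.
    index_dir = ntpath.join(index_root, code_name)
    if len(index_dir) >= MAX_DIR_PATH:
        problems.append(f"directory has {len(index_dir)} characters (limit {MAX_DIR_PATH})")
    for name in file_names:
        full_path = ntpath.join(index_dir, name)
        if len(full_path) >= MAX_FILE_PATH:
            problems.append(f"{name} path has {len(full_path)} characters (limit {MAX_FILE_PATH})")
    return problems


# Usage with a hypothetical installation path and example file name.
root = r"C:\inetpub\wwwroot\CMS\App_Data\CMSModules\SmartSearch"
print(check_code_name(root, "ArticlesIndex", ["segments.gen"]))  # []
print(len(check_code_name(root, "A" * 250, ["segments.gen"])))   # 2
```

ntpath is used instead of os.path so that Windows-style paths are joined the same way on any operating system.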

 

2. You will be redirected to the index's editing interface. The General tab, displayed by default, allows you to edit the same properties that you entered when creating the index.

 

Additionally, you can set the Batch size property, which limits the number of records retrieved by a single query when the index is rebuilt (or created). This property helps optimize indexing performance: increasing its value reduces the number of required queries, which may improve performance, but it also increases memory consumption. The ideal value depends on the type of the indexed objects and on the available resources. When indexing large objects (e.g. documents), it is recommended to keep the value moderate.
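As a rough illustration of this trade-off, rebuilding an index over N records with a batch size of B needs ⌈N / B⌉ queries, and each query returns up to B records that have to be held in memory while they are processed. The Python sketch below shows this with a generic batched reader; fetch_batch, fake_fetch and the record counts are hypothetical stand-ins and are not part of the Kentico API.

```python
from typing import Callable, Iterator, List


def query_count(total_records: int, batch_size: int) -> int:
    # Queries needed to read every record when each query returns at most
    # batch_size rows (integer ceiling division).
    return (total_records + batch_size - 1) // batch_size


def read_in_batches(fetch_batch: Callable[[int, int], List[int]],
                    total_records: int, batch_size: int) -> Iterator[int]:
    # fetch_batch(offset, count) is a hypothetical data-access function that
    # issues one query and returns at most `count` records starting at `offset`.
    for offset in range(0, total_records, batch_size):
        # Only the current batch is loaded before its records are handed on.
        yield from fetch_batch(offset, batch_size)


# Usage with an in-memory stand-in for the database.
records = list(range(25_000))
queries = []


def fake_fetch(offset: int, count: int) -> List[int]:
    queries.append(offset)  # record each simulated query
    return records[offset:offset + count]


indexed = list(read_in_batches(fake_fetch, len(records), 1_000))
print(len(indexed), len(queries), query_count(25_000, 1_000))  # 25000 25 25
print(query_count(25_000, 100))  # 250
```

With 25,000 records, a batch size of 1,000 needs 25 queries, while a batch size of 100 needs 250 queries but holds only a tenth as many records in memory per query.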

 

Notice the Index info box on the right, which displays current information about the status and properties of the index.

 

The following two actions are also available on this tab, but there is no point in performing them at this stage, because the content of the new index has not been defined yet:

 

Rebuild - deletes the original index and indexes the specified content again. Clicking this action button does not guarantee that the index will be rebuilt immediately; the process may be delayed, e.g. if another index is already being rebuilt or if rebuilding tasks are configured to be handled by the scheduler. The index is automatically optimized after a successful rebuild.

Optimize - defragments the index, which results in better search performance, particularly for large indexes.

 


 

3. Switch to the Index tab. This is where you define which documents, forums, custom tables, users or other objects will be indexed. The content of this tab depends on the type of index that you are creating. Detailed information about defining index content is given in Defining document index content, Defining forums index content, Defining custom tables index content, Defining user index content, Defining general index content and Defining custom index content.

 

 

4. Now switch to the Sites tab and make sure that the index is assigned to the appropriate website. Optionally, you can also assign the index to other sites to get multi-site search results.

 

If the index is defined for global objects that are not site-specific, the selection made here will not affect the index's content. However, the index will still only be available for use (through smart search web parts) on the sites chosen on this tab.

 

 

5. Switch to the Cultures tab (only available for Documents and Documents crawler indexes). Here you can choose which cultural versions of documents will be indexed. At least one culture must be selected for the index to be functional. If you have a multi-site index, you can select the cultures separately for each site chosen in the Select site drop-down list.

 

 

6. Finally, go back to the General tab and Rebuild the index. This needs to be done only the first time; any further changes made to the specified content are indexed automatically. If you wish, you can quickly test the index on the Search preview tab.