Deep dive: Cache dependencies
Cache dependencies are the topic we discuss most often with our larger clients, and since their behavior may not be obvious at first, I decided to explain them in more detail for everyone ...
Hi there,
Before you start reading this post, you should have some knowledge of caching in Kentico CMS, and ideally also of the caching API and its current state in Kentico CMS 5.5. You should also be aware of the performance differences you can gain with caching. Those are the prerequisites.
I will cover all caching options together with cache dependencies, since they are closely related and the behavior you see may be the result of both settings, and I will also summarize the most important aspects.
Version
First of all, you definitely need a hotfixed version to get the behavior of the full page cache described later. If you want to apply any of this, I recommend using the latest hotfix (at least 5.5.7). With older versions, everything here still applies except for the default cache dependencies of the full page cache.
Types of caching
Several types of cache may lie on the path to the result that the site visitor gets. I list them in the order in which they are evaluated: if a particular cache is enabled, the request first attempts to get the data from it. If the data is available in the cache, it is returned to the higher level. If it is not, it is taken either from the database or from a lower level (depending on the situation and the control), and the result is cached for later use by other requests. The cache levels are the following:
- Full page cache (1)
- Partial cache (2)
- Content cache (3)
- Page info cache (4)
The image cache stands beside these (it isn't related to the output HTML), and I will cover it in another post.
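The lookup order described above can be pictured with a small stand-alone sketch. This is only a toy model, not how Kentico CMS implements its caches: the class `CacheLevel` and everything else in it are hypothetical names made up for the illustration. Each level answers from its own storage if it can, and otherwise asks the level below it and remembers the answer.

```csharp
using System;
using System.Collections.Generic;

// Toy model of one cache level (hypothetical, not the Kentico implementation).
public class CacheLevel
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();
    private readonly Func<string, string> _lowerLevel;

    // lowerLevel is asked whenever this level does not hold the key.
    public CacheLevel(Func<string, string> lowerLevel)
    {
        _lowerLevel = lowerLevel;
    }

    public string Get(string key)
    {
        string value;
        if (_items.TryGetValue(key, out value))
        {
            return value;         // hit: the lower levels are not asked at all
        }
        value = _lowerLevel(key); // miss: ask the lower level (or the database)
        _items[key] = value;      // cache the result for later requests
        return value;
    }
}

public static class Program
{
    public static void Main()
    {
        int databaseReads = 0;
        Func<string, string> database = key => { databaseReads++; return "data for " + key; };

        // Two levels chained together: a higher one on top of a lower one.
        var contentCache = new CacheLevel(database);
        var fullPageCache = new CacheLevel(contentCache.Get);

        fullPageCache.Get("/home");
        fullPageCache.Get("/home");
        Console.WriteLine(databaseReads); // prints 1
    }
}
```

The second request never reaches the database, because the highest level already holds the result.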
Cache dependencies
A cache dependency is a single control point that can be used to flush multiple cache items that are somehow relevant to it. Technically, it is a dummy cache item on which other (real) cache items depend. If we look at a sample cache space, we can see the cache items with data (blue) and the dummy keys (white). The arrows are the actual cache dependencies, meaning "the dummy item can influence the data item" or "the data item depends on the dummy item".
You may notice that the arrows always lead from a dummy item to a data item, never the opposite way or between items of the same type. That serves a purpose that you will learn right now.
If some change is made to the data, we need to flush everything that possibly contains that data. What happens is that we "touch" the dummy cache item. In terms of the cache, touching means throwing the dummy cache item away and recreating it. The effect is that all the depending cache items (in the direction of the arrows) are also thrown away, which is what we call "flushing the cache".
If you are not familiar with how the cache and its dependencies work in .NET, I may have a better explanation for you. Imagine that "touching" the dummy item is the same as infecting it with a disease, and that the disease can only spread in the direction of the arrows. So in the end, everything connected to the dummy cache item by a forward arrow gets infected. Unlike the dummy item, which can survive the infection and cure itself, everything else dies (disappears from the cache), while the dummy item stays there, uninfected.
So in the end, only the data items are erased, and upon the next request the new items connect to the same dummy keys in the same way.
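If you prefer code to pictures, here is a minimal sketch of the same idea. It is a toy model written just for this explanation, not the Kentico CMS or ASP.NET cache: `TinyCache` and its methods are hypothetical names. Data items are connected to dummy keys when they are added, and touching a dummy key removes every data item that depends on it.

```csharp
using System;
using System.Collections.Generic;

// Toy model of dummy keys (hypothetical, not the real cache implementation).
public class TinyCache
{
    // Data items: cache key -> cached value.
    private readonly Dictionary<string, object> _data = new Dictionary<string, object>();

    // Arrows: dummy key -> keys of the data items that depend on it.
    private readonly Dictionary<string, HashSet<string>> _dependents =
        new Dictionary<string, HashSet<string>>();

    // Stores a data item and connects it to the given dummy keys.
    public void Add(string key, object value, params string[] dummyKeys)
    {
        Remove(key); // drop arrows left over from an earlier version of the item
        _data[key] = value;
        foreach (string dummyKey in dummyKeys)
        {
            HashSet<string> items;
            if (!_dependents.TryGetValue(dummyKey, out items))
            {
                items = new HashSet<string>();
                _dependents[dummyKey] = items;
            }
            items.Add(key);
        }
    }

    public bool Contains(string key)
    {
        return _data.ContainsKey(key);
    }

    // "Touches" a dummy key: all data items depending on it are removed.
    // The dummy key itself stays available for items cached later.
    public void Touch(string dummyKey)
    {
        HashSet<string> items;
        if (!_dependents.TryGetValue(dummyKey, out items))
        {
            return;
        }
        // Iterate over a copy, because Remove modifies the sets.
        foreach (string key in new List<string>(items))
        {
            Remove(key);
        }
    }

    // Removes a data item together with all arrows pointing to it.
    private void Remove(string key)
    {
        _data.Remove(key);
        foreach (HashSet<string> items in _dependents.Values)
        {
            items.Remove(key);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var cache = new TinyCache();
        cache.Add("menu-data", "menu", "nodes|corporatesite|cms.menuitem|all");
        cache.Add("user-list", "users", "cms.user|all");

        cache.Touch("cms.user|all");
        Console.WriteLine(cache.Contains("user-list")); // prints False
        Console.WriteLine(cache.Contains("menu-data")); // prints True
    }
}
```

Only the item connected to the touched key disappears. The other item is not affected, because no arrow leads to it from that key.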
This was a simple example. In reality, the cache namespace has many items, and different CMS objects provide different dummy keys, but the concept is the same. When you think about a cache dependency, you only ever need to think of this simple picture: a single dummy key with the data items as its leaves. When you touch the key, the leaves get deleted. So if you want to make something dependent on something else, you just need to know which dummy keys to use, which leads us back to the dummy keys that are touched upon object change, listed in one of my previous posts. The overview, revised for 5.5:
Object, followed by its touched keys, each with an example:

Document (TreeNode)
- node|<sitename>|<aliaspath>|<culture> e.g. node|corporatesite|/home|en-us
- node|<sitename>|<aliaspath> e.g. node|corporatesite|/home
- nodeid|<nodeid> e.g. nodeid|12
- nodeid|<linkednodeid> e.g. nodeid|34
- documentid|<documentid> e.g. documentid|56
- documentid|<documentid>|attachments e.g. documentid|56|attachments
- nodeguid|<nodeguid> e.g. nodeguid|1ced44f3-f2fc- ...
- nodes|<sitename>|<classname>|all e.g. nodes|corporatesite|cms.menuitem|all
- plus, for every parent node: node|<sitename>|<aliaspath>|childnodes e.g. node|sitename|/|childnodes

Any object (except document)
- <classname>|all e.g. cms.user|all
- <classname>|byid|<id> e.g. cms.user|byid|53
- <classname>|byname|<codename> e.g. cms.user|byname|administrator
- <classname>|byguid|<guid> e.g. cms.user|byguid|1ced44f3-f2fc- ...

Metafile
- metafile|<guid> e.g. metafile|1ced44f3-f2fc- ...

Document attachment
- cms.attachment|all e.g. cms.attachment|all
- attachment|<guid> e.g. attachment|1ced44f3-f2fc- ...
- documentid|<documentid>|attachments e.g. documentid|56|attachments

Forum attachment
- forumattachment|<guid> e.g. forumattachment|1ced44f3-f2fc- ...

Avatar
- avatarfile|<guid> e.g. avatarfile|1ced44f3-f2fc- ...

Media file
- mediafile|<guid> e.g. mediafile|1ced44f3-f2fc- ...
- mediafile|preview|<guid> e.g. mediafile|preview|1ced44f3-f2fc- ...

Page template
- template|<id> e.g. template|12

CacheHelper.ClearFullPageCache
- fullpage e.g. fullpage

CacheHelper.ClearPartialCache
- partial e.g. partial
Note: The dependencies are always LOWERCASE.
Also note that for documents under workflow, the keys are touched upon any change to the document, since the document consists of both versioned and non-versioned data.
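To make the document row of the overview more tangible, here is a small sketch that only composes the key strings for one document, including the childnodes keys of its parents, and lowercases them as required by the note above. The helper `DocumentDummyKeys.Build` is a hypothetical name, not part of the Kentico API. It leaves out the nodeguid and linked node keys, and it assumes that the alias path starts with a slash.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: it only builds strings and does not talk to any cache.
public static class DocumentDummyKeys
{
    public static List<string> Build(string siteName, string aliasPath, string culture,
                                     int nodeId, int documentId, string className)
    {
        var keys = new List<string>
        {
            "node|" + siteName + "|" + aliasPath + "|" + culture,
            "node|" + siteName + "|" + aliasPath,
            "nodeid|" + nodeId,
            "documentid|" + documentId,
            "documentid|" + documentId + "|attachments",
            "nodes|" + siteName + "|" + className + "|all"
        };

        // One childnodes key for every parent node, up to the root "/".
        string path = aliasPath;
        while (path.Length > 1)
        {
            int slash = path.LastIndexOf('/');
            path = slash <= 0 ? "/" : path.Substring(0, slash);
            keys.Add("node|" + siteName + "|" + path + "|childnodes");
        }

        // The dependencies are always lowercase.
        for (int i = 0; i < keys.Count; i++)
        {
            keys[i] = keys[i].ToLowerInvariant();
        }
        return keys;
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string key in DocumentDummyKeys.Build(
            "CorporateSite", "/News/News-1", "en-US", 12, 56, "CMS.News"))
        {
            Console.WriteLine(key);
        }
    }
}
```

For the document /News/News-1, the parents are /News and the root, so two childnodes keys are added after the six keys of the document itself.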
Debug the cache
There are two ways to debug the cache. One of them is observing the cache items on the Cache items tab under Debug. This is a more static kind of debugging (you just look at the cache space without the arrows):
The other is to enable dynamic debugging of the cache actions using these web.config keys:
<add key="CMSDebugCache" value="true"/>
<add key="CMSDebugCacheLive" value="true"/>
<add key="CMSDebugAllCaches" value="true"/>
<add key="CMSDebugCacheLogLength" value="10"/>
<add key="CMSLogCache" value="true" />
Then you can see what happened with the cache on the Cache access tab in the same location, or on the live site, depending on the settings. More information is available here:
http://devnet.kentico.com/docs/devguide/index.html?cache_debugging.htm
A great thing about Kentico CMS 6.0, which is now under development, is that it also provides a list of the dependencies used with each ADD action to the cache, so debugging will be even easier.
A very important point for you is that most of the cache dependencies are covered by default, so most of the data is flushed correctly. As you will see in the following text, there are some exceptions that may require manual work in setting some additional properties if you want this behavior.
Full page cache and its dependencies (1)
As you already know, the Full page cache (or Output cache) caches the HTML output of the whole page. This effectively means that with the next request, the user just gets the whole output HTML with almost no processing on the server side. That is why you can see the performance spike in the performance report: there is almost no overhead. In fact, it is very similar to serving a static HTML file. You can configure full page caching in the document properties (you can set it up differently for each section):
You cannot control the cache item name, which by default has the format:
- pageinfo|<sitename>|<aliaspath>|<culture>|<urlpath>|<combinewithdefaultculture> e.g. pageinfo|corporatesite|/home|en-us||true
By default, the full page cache depends ONLY on that particular document. So if you change the document Home, only the cache for the URL Home.aspx or its aliases is flushed. Caching is always a balance between flushing everything upon a change (up-to-date view, but poor performance) and flushing nothing upon a change (best performance, but stale data). That is why it has to work like this by default. The only other possible default would be to flush everything, since in the end the output cache is just a string, with no context about the code that provided the parts of that string. By default, performance always has the higher priority for us (and it should be the same for you, unless you can justify the other option).
This means that if you expect other pages to be flushed upon a change to some specific page (e.g. if you want the menu on all pages to refresh upon a change to the name of any main-level document), full page caching may not be the best option for you. You have the following options:
- Do not use full page caching at all (not recommended)
- Provide custom cache dependencies specific to your setup (with the possibility of worse performance upon change)
- Live with the fact that some pages will not refresh right away and try to avoid such changes on the live web site (recommended). You can help yourself by either clearing the cache from Site manager after an important change, or by shortening the cache interval to get the changes sooner.
If you go with custom cache dependencies, you just need to call some additional code to make the page output dependent on other things than just a change to that particular page. You can put it in the page layout, for instance, or call it anywhere in your code:
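<script runat="server">
    protected override void OnPreRender(EventArgs e)
    {
        base.OnPreRender(e);

        // Keep the default dependencies of the full page cache
        CMSContext.AddDefaultOutputCacheDependencies();

        // Additionally flush the cached output upon any change to any user
        CacheHelper.AddOutputCacheDependencies(new string[] { "cms.user|all" });
    }
</script>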
This will refresh the page upon any change to any user. In 6.0, there will be a web part to provide this functionality.
The default cache dependencies of the full page cache are the following keys:
- fullpage
- fullpage|<siteid>|<aliaspath> e.g. fullpage|1|/home
- fullpage|<siteid>|<aliaspath>|<culture> e.g. fullpage|1|/home|en-us
- nodeid|<nodeid> e.g. nodeid|12
- documentid|<documentid> e.g. documentid|56
You can get these by calling the method PageInfo.GetResponseCacheDependencies().
You can use them to manually flush the output cache of individual pages.
Partial caching (2)
Partial caching (also called Output caching for controls) is similar to Full page caching, but applies only to a fragment of the page (a specific control). You can set it up in the web part properties, at the very end of the form:
You cannot control the cache item name, since this is built purely on the control Output cache provided by the .NET Framework.
There is no default dependency on any content (the value is just a string without any additional context), only on the web part settings (if you change the web part properties, the cache is flushed). The system dependency is:
- webpartinstance|<instanceguid>, where the instance GUID is the unique identifier of the web part on the template. You can see it only in the URL of the web part properties dialog, or get it through the web part control property. You may need to touch it manually only if you modify the structure of the page template programmatically (I will cover that on my blog some time later) and want to show the changes right away.
You can specify additional dependencies in the web part properties under the "Partial caching" category.
One important thing to note here is that this partial cache relates only to web parts in web part zones in portal engine mode. The partial cache properties are not supported when the web part is used as a standard user control in your own code. If you want to use partial caching in ASPX mode, you need to use the standard Output cache for controls:
http://msdn.microsoft.com/en-us/library/h30h475z.aspx
Content caching (3)
Content caching is caching of the data that controls and web parts use to populate their content. Typically, it is a DataSet or some other object, depending on the nature of the particular control. The default value can be set in the web site settings:
It can be overridden in the web part or control properties (if you leave it blank, the web site settings are used):
The cache item name can be controlled through a property of the web part. This is the only cache type where it may actually make sense to change it, see below.
The default dependencies (used only if checked in the web part properties under "System settings") are typically related to the source of the data (plus there is a dependency on the web part settings, just as with partial caching). It is always the smallest set of objects that could possibly be returned as a result. Some examples:
Query repeater: No default dependencies. Since a query can potentially cover anything, there is no relevant context.
Document repeater (CMSRepeater) or other document lists (data sources), including menus
Typically a dependency on the section covered by the path and on the document types covered by the document type setting. The dependencies of such a document viewer (data source) can be of two types.
If there are specific classes (document types) configured, then for each document type:
- nodes|<sitename>|<classname>|all e.g. nodes|corporatesite|cms.menuitem|all
If there are no specific document types defined, the dependencies are based purely on the path. In case of a list (path ending with /%), it is:
- node|<sitename>|<path>|childnodes e.g. node|corporatesite|/news|childnodes
In case of a single document (detail view):
- node|<sitename>|<path> e.g. node|corporatesite|/news/news-1
You can get these by calling the method TreeProvider.GetDocumentCacheDependencies(...).
As you can see, they are pretty general in order to cover all possible modifications, so they may occasionally cause "unwanted" flushes of unrelated data. If you want to replace them with something more accurate, you need to use custom ones.
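The three cases for document data sources can be written down as a few lines of code. This is only my simplified reading of the rules above, not the code behind TreeProvider.GetDocumentCacheDependencies(...), which may handle more situations. The helper `DocumentSourceDependencies.Build` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that mirrors the three cases described in the text.
public static class DocumentSourceDependencies
{
    public static List<string> Build(string siteName, string path, params string[] classNames)
    {
        var keys = new List<string>();

        if (classNames != null && classNames.Length > 0)
        {
            // Specific document types: one key per document type.
            foreach (string className in classNames)
            {
                keys.Add("nodes|" + siteName + "|" + className + "|all");
            }
        }
        else if (path.EndsWith("/%", StringComparison.Ordinal))
        {
            // List of documents: depend on the child nodes of the section.
            string parent = path.Substring(0, path.Length - 2);
            if (parent.Length == 0)
            {
                parent = "/";
            }
            keys.Add("node|" + siteName + "|" + parent + "|childnodes");
        }
        else
        {
            // Single document (detail view).
            keys.Add("node|" + siteName + "|" + path);
        }

        // The dependencies are always lowercase.
        for (int i = 0; i < keys.Count; i++)
        {
            keys[i] = keys[i].ToLowerInvariant();
        }
        return keys;
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string key in DocumentSourceDependencies.Build("CorporateSite", "/News/%"))
        {
            Console.WriteLine(key); // prints node|corporatesite|/news|childnodes
        }
    }
}
```

Note how the first case ignores the path completely. That is why a repeater limited to one document type is also flushed by changes to documents of that type in other sections of the site.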
Users viewer or other object lists (data sources)
A dependency on a change to any object of the given type, since any object can potentially be displayed in the list. Typically, it is:
- <classname>|all e.g. cms.user|all
You can specify additional dependencies in the web part properties under "System settings". If you leave the checkbox "Use default cache dependencies" checked, both the custom and the default ones will be used.
Page info caching (4)
Page info is used by the URL rewriter and to pull some basic information about the document, including the caching of editable content and of the template structure. Just like content caching, you can set it up in the settings of the site. It is shared by all pages of the site and cannot be overridden.
You cannot control the cache item name, which by default has the format:
- pageinfo|<sitename>|<aliaspath>|<culture>|<urlpath>|<combinewithdefaultculture> e.g. pageinfo|corporatesite|/home|en-us||true
By default, there are only dependencies on the specific document and its template (the page info carries only the data of the particular document and the template used):
- template|<templateid> e.g. template|12
- nodeid|<nodeid> e.g. nodeid|12
- nodeid|<linkednodeid> e.g. nodeid|34 (only for links)
- pageinfo (general key for all page infos)
You can get these by calling the method PageInfoProvider.GetDependencyCacheKeys(...).
There is currently no way to specify additional dependencies, since it typically doesn't make sense (the page info doesn't carry any other data).
Summary on cache types
So any time a page is displayed, the information is pulled from the highest cache in which the item is available. If the request reaches the Content or Page info level (these two are not connected or ordered in any way, they are independent) and the item is not there, it goes to the database. Some examples:
- CMSRepeater: uses caches (1), (2) and (3), in that order ((3) because the data is provided by the general API in various ways)
- Editable region: uses caches (1), (2) and (4), in that order ((4) because editable content is cached in PageInfo)
- Static text web part: uses caches (1), (2) and (4), in that order ((4) because the web part settings are cached in PageInfo as part of the template structure)
As you can see, anything in the portal engine uses (1) and (2), followed by (3) or (4).
And anything in an ASPX-based page uses (1) followed by (3) or (4), because partial caching is not supported there.
So any time you see stale data where you expect a change, the full page cache may be the source.
Basically, if there is more than one cache type on the path and something changes, all caches on the path need to be flushed. For (3) and (4), this is done automatically (except for the query repeater, which has no exact data context), while:
- In partial caching (2), everything must be done manually if you want to flush the content
- In full page caching (1), it must be done manually if you want to flush pages other than the current one
One important note: If you allow the system to cache an item WITHOUT the default dependencies, the item stays there until it expires, the cache is cleared in Site manager, or the application is restarted (the link that would flush it upon a change was never created).
If you struggle with the cache, disable all caches first, enable only the one you want to set up (fix), debug and configure it, and then re-enable the other caches.
Cache dependencies of the files (images)
Caching of files works on the same principle, except that the file cache is configured in a special setting:
The dependencies are based on the type of the file:
- A document attachment (including CMS.File) typically depends on that particular attachment item, and it also uses the Content cache (3) and its dependencies for the related information from the document (it is the only file cache working with two levels).
- A media library file depends on that single media file and also on the media library itself
- A CSS stylesheet, which is also considered a file, depends on the stylesheet itself
- An object attachment (meta file) usually depends on that particular meta file
- Other files also typically depend on that particular single file
It would take a whole new post to cover the caching of files in detail, so I will leave it for another time. The important thing here is that when you make a change, the files get flushed from the cache.
Cache item names
Default cache keys (cache item names) typically consist of the name of the data source (several web parts may use the same data source and share one cache item if they are configured the same way), all the properties that have an effect on the resulting data, and the context of the site and user (for security reasons, if applied). So a menu data source (any menu) has a key like this, unless you specify a different one:
cmsmenudatasource|[<username>]|<preferredculture>|<sitename>|<aliaspath>|<culture>|<combinewithdefaultculture>|<classnames>|<where>|<orderby>|<maxrelativelevel>|<onlypublished>|<columns>|<sitemap>
Since it is still a source of documents, it uses the default document dependencies as described above.
There is one important thing you need to know. You only need a custom cache key in the following situations:
- If you want to merge the same data items used by different controls, and the cache item names aren't the same by default. This may be useful to avoid keeping redundant copies of the data, but it is mostly covered since version 5.0.
- If you want to split the data items into independent ones, but the default cache keys are the same. This may be needed if you further manipulate the data in one of the locations and do not want the same change in the others.
Cache dependencies and API
You should always use the CacheHelper methods to work with the cache. Only that way can you get the full support of the Kentico CMS cache features, including the proper behavior of dummy keys, flushing of the cache, and propagation of the changes to the cache across web farms. The most important parts you should know are:
- CachedSection: the new caching API in 5.5, totally contention-proof, see http://devnet.kentico.com/Blogs/Martin-Hejtmanek/June-2010/New-in-5-5--Caching-API.aspx
- CacheHelper.ClearCache(null): clears all the cache items
- CacheHelper.TouchKey("somekey"): this is how you can manually touch a dummy key and flush the depending data
- CacheHelper.GetCacheDependency("somekey"): this is how you can get the cache dependency for your cache item. It basically ensures that the dummy keys with the given names exist and returns the dependency object that you pass to the cache.
So your code might look like this:
DataSet ds = null;
using (CachedSection<DataSet> cs = new CachedSection<DataSet>(ref ds, 1, true, null, "mykey", someValue))
{
    // LoadData is true only if the data is not available in the cache
    if (cs.LoadData)
    {
        // Get from database
        //ds = ...

        // Make the cached item depend on the dummy key "somekey"
        cs.CacheDependency = CacheHelper.GetCacheDependency("somekey");
        cs.Data = ds;
    }
}
Of course, if you want to make the item dependent on one of the default generic keys, just use that key as the parameter. Later, in some other part of your code, you would call the following line to flush your cache item:
CacheHelper.TouchKey("somekey");
What you shouldn't do (unless you are absolutely sure you will never use a web farm) is clear the data cache items directly. This operation is not supported across a web farm and is also not very efficient. You should use dependencies and flush the cache through them by touching the dummy keys.
Web farms and touch key
Before you explicitly touch any key (CacheHelper.TouchKey), note that TouchKey may only be called on dummy keys. If you call TouchKey on the key of a data item, it replaces the item with a dummy value, making the data inconsistent for the control.
If you need to flush the cache across the web farm, you need to touch some dummy key, because that is the only cache event that is propagated through the web farm. So you either touch some existing generic dummy key, or you add an additional cache dependency such as "mykey" to everything that should depend on it, and call CacheHelper.TouchKey("mykey") to flush it across the web farm.
To flush all menus on the current instance, you could call CacheHelper.ClearCache("cmsmenudatasource|"), but that event is not propagated through the web farm and would flush the menus of all sites. Also, it must lock the Cache object and iterate through all the keys, which makes the operation very expensive.
If you call CacheHelper.TouchKey("nodes|<sitename>|cms.menuitem|all"), it also flushes other data depending on the same key, and that event is propagated even through the web farm.
So that is the right thing to do in your code.
Summary
Cache dependencies and their propagation across a web farm are a very complex feature. Once you try it and fully understand it, you will also understand why it cannot simply be covered automatically in every setup. There are attempts to make it better and to propagate the dependencies to higher levels, but it is a long-term effort based on your feedback. If you want to assist us with that in any way, we will be more than happy.
But note that to make that happen, there must be some specific hard rule that says when it should apply. Things like "If the control is a document repeater, add these dependencies, if not, add those" won't work, because the control is really a web part wrapped around other controls and code that cannot simply be covered by some interface.
I hope I made cache dependencies and the cache clearer to everyone. If not, let me know what else needs clarifying so I can cover that, too.
See you next time!