Deep Dive - Kentico CMS Caching
As promised, this is my first "Deep-dive into Kentico CMS" post. This time, it is all about caching. Why is caching so important and why should you master it? Get ready and see for yourselves ...
Before I start, I must say this article requires you to have at least basic knowledge of the principles of caching in .NET and principles of caching in general. So if you don't have such knowledge, at least Google some caching tutorial or go through some book chapter about caching before you read further. The same applies to the Kentico CMS basics.
So now you know why we use caching: in many ways, it is the root of application performance. Let's see how caching is handled in Kentico CMS.
NOTE: All of this applies to version 4.0. Most of the information is also valid for previous versions, but the details may vary.
Levels of caching
Kentico CMS has several levels of caching. This is very important, because sometimes you need to cache some things and not others, so you need to be able to select which parts of the solution are static enough to be cached and which are too dynamic for the cache to be used.
The levels of caching we use are similar to the ASP.NET web site cache levels, because they are based on them. They are:
- PageInfo cache - PageInfo is the minimum information about the page that is needed to handle the structure of the pages in the URL rewriting and page processing engine (no matter if ASPX or Portal). This is why we cache it separately. In 99.9% of cases, you should cache that information, so I don't recommend turning it off. It is stored in the standard .NET cache using several types of keys built from the URL, NodeAliasPath, Culture, etc. with NotRemovable priority (see how important it is to keep this in the cache?). The configuration is in our standard Site manager -> Settings section, under the Web site category (Cache page info).
- Content cache - Content means all the data which must be processed by the content page itself. It can be document DataSets processed by repeaters, or any other DataSets or objects which are retrieved from the database, such as users, groups, etc. Basically everything that is on the page and not covered by the PageInfo is located in the content cache. The important thing here is that it caches only data, not the output HTML code and not the controls collection. So even if you cache content, the controls still have to be loaded, bind their data, and perform all the operations that they would perform normally. You just save the time needed to query the database. It is also stored in the standard .NET cache. The keys consist of various parts depending on the content, usually the URL of the page and the control ID or settings, and the items are stored with High priority (the performance of the page depends on them). General content caching can be configured in the same location as the PageInfo caching, in Site manager -> Settings section, under the Web site category (Cache content).
- Content cache for web parts and controls - Content cache can be configured separately for each web part. If you look at the web part properties, there are the cache settings and one of them is the Cache minutes property. If you set it to a specific value, the web part uses that setting. If not, the web part uses the web site settings. The same applies to the CacheMinutes property of the controls if you use the ASPX page templates.
- Output caching - Also known as "Full page caching", it is very useful if you know that the whole page is pretty static. Full page caching produces much better performance than any other type of caching, simply because the response is stored in the cache in the form of the output code which was sent to the client. When the page is requested the next time, it doesn't have to be processed at all and the same output is sent directly to the client as a string. This really saves a lot of CPU time. Actually, something is still done in the background, because something must know which page is being requested. That something is the URL rewriting engine, which processes the request and determines what exactly the client wanted. I will tell you something about this later. Output cache uses the standard .NET output cache principles and the key is the standard key + some context specific information provided by the VaryByCustom handler. The output caching can be configured separately for each section of the web site, and is configured on the Properties tab of the document.
Is that all? Not even close ...
If you ask me if that is all, the answer is "It depends ...". Yes, it is all for the content page processing of version 4.0. But a web site doesn't only process pages (meaning the HTML code of the page which is the output for the client request). It also processes files, right? So here is another one:
- Image (file) cache - This cache matters for files (mostly images) and serves to store the reusable content in the GetFile scripts of our solution. We have spent a lot of time thinking about how the file caching should be implemented. We obviously do not want to cache large files, but we want to cache the images and smaller files. It basically caches the information about the file and, if the file is small enough, its content. That's why the file cache comes with two settings. The first is the number of minutes, which has the same meaning as for the other types of cache; it is located in the Site manager -> Settings section, under the Web site category (Cache images). The second one is the maximum size of the file data that can be cached; it is located in the Site manager -> Settings section, under the Files section (Maximum file size to cache).
- Client cache - Our GetFile scripts allow the client caching in the browser and they support the revalidation requests. So when someone looks at the file, he only needs to revalidate it when he displays it again. This is automatic.
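The rule behind the two file cache settings can be sketched with a small toy model. This is only an illustration and not the GetFile implementation: the names `CachedFile`, `ToyFileCache` and `maxBytesToCache` are hypothetical, a plain dictionary stands in for the .NET cache, and the expiration in minutes is left out for brevity.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the file cache rule; NOT the Kentico GetFile code.
// CachedFile, ToyFileCache and maxBytesToCache are hypothetical names.
public class CachedFile
{
    public string Name;
    public long Size;
    public byte[] Content; // null when the file was too large to cache
}

public class ToyFileCache
{
    private readonly Dictionary<string, CachedFile> files = new Dictionary<string, CachedFile>();
    private readonly long maxBytesToCache;

    public ToyFileCache(long maxBytesToCache)
    {
        this.maxBytesToCache = maxBytesToCache;
    }

    // Always caches the file information; keeps the binary content
    // only when it does not exceed the configured maximum size.
    public CachedFile Store(string name, byte[] content)
    {
        CachedFile file = new CachedFile();
        file.Name = name;
        file.Size = content.Length;
        file.Content = content.Length <= maxBytesToCache ? content : null;
        files[name] = file;
        return file;
    }

    // Returns the cached file information, or null when nothing is cached.
    public CachedFile Get(string name)
    {
        CachedFile file;
        return files.TryGetValue(name, out file) ? file : null;
    }
}
```

With a limit of 4 bytes, a 3-byte file is cached together with its content, while for a 10-byte file only the name and size are cached and the content has to be loaded again on the next request.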
That is pretty much all about the caching levels you may set up as the end user in Kentico CMS 4.0. Do you wonder why I am saying 4.0? Because you obviously get more with the upcoming version 4.1 ...
What will be new in 4.1?
If you look at the levels of cache above, you must be thinking that something is missing. You are right. It is Partial caching. It was not available before (unless you wrote it directly into your controls code), but it will be available in the next version. You can expect to be able to set Partial caching as a property of the web parts or to place the controls into caching containers in ASPX templates. This will bring you another level of experience, where you will be able to fully cache only portions of the page such as headers, menus, footers etc. while the rest of the page is processed normally. For pages which cannot use the full page cache for some reason, we expect nothing less than bringing their performance to numbers between content caching and full page caching. Promising, isn't it?
And we are also optimizing the URL rewriting in 4.1 so the Full page caching can give even better performance.
What type of cache should you use?
I always say, cache everything you can. So you should decide how static your content is and cache everything that is static enough to be cached, or can refresh automatically (see below). See my other post to check out how caching affects web site performance. Here is a simple procedure for setting up caching correctly:
- Always cache PageInfo. Turning off the page info cache is good only for validating a suspicion that some error is caused by it.
- Cache images (stored in our content repository) if you use them on the master page or if you use them heavily.
- For every section of the web site where the pages do not change or don't change frequently, use full page caching.
- From version 4.1 on, use Partial caching on all components which do not change or don't change frequently.
- For other sections, use content cache everywhere the site visitor cannot influence the data displayed on the live site. If he can, set up the dependencies or check if the dependency can refresh itself when the visitor changes the data.
- Leave without caching only the pages or components which change frequently or whose cache items cannot be removed automatically.
- Cache your own code results, especially in transformations displayed in the list.
- Actively debug queries to see which data are not cached. See SQL queries debugging in our Developer's guide for details.
NOTE1: The cache is always used only on live site, your code should do the same.
NOTE2: Some components of the solution, such as some module web parts or controls do not use cache. This is usually by design. Such components do not have the cache minutes property. If you still think that some of these components should be cached, please let us know at firstname.lastname@example.org
Let's dive deep
I promised you a deep dive. So far, I was talking just about the configuration and the levels of caching. Now is the time to look into the caching to see how it works.
I told you that the caching is based on the standard .NET caching, so our CacheHelper is just a simple class on the top of the standard Cache.
You store the content in the cache by calling the method CacheHelper.Add, just like with the standard cache.
You get the content from the cache by calling the method CacheHelper.GetItem, similar to the standard cache.
What makes the difference and needs more explanation are the CacheDependencies and dropping the cache when something changes. Over the versions, our customers kept reporting that once they enabled caching, they did not see changes immediately. For this reason we built a pretty robust solution for making cached content dependent on other content. That is fully available from version 4.0.
NOTE: At this point, if you don't know how the cache dependency in the .NET cache works, learn it if you want to continue. Otherwise you may not know what I am talking about.
We have had the cache dependency model before (from version 3.1), but it was not mature enough, there were just basics, so what I am about to describe is for version 4.0.
For the depending cache items, we use the "dummy" keys which are used to make the cache dependencies. So the cache item with data never depends on another cache item with data, it always depends on the dummy cache item. Why is that? Simply because you never know if the other key is there or not and still want to cache the item. The dummy key is in the cache only in case some other cache item depends on it because the cache dependency model requires the other key to be present if you want to keep the item in the cache. This is ensured automatically by our CacheHelper.GetCacheDependency method which ensures that all the dependency keys (dummy keys) are in the cache and returns the CacheDependency for them.
When some object changes, it refreshes all its dummy keys present in the cache (the keys on which something depends), causing the dependent items to be removed from the cache. This is done by the CacheHelper.TouchKey method.
When you need to make something dependent, call CacheHelper.GetCacheDependency with the dummy key name(s). You can see an example in our CMSPages/GetFile.aspx.cs in the Load method.
When you need to remove from the cache everything that is dependent on the dummy key, call the CacheHelper.TouchKey with the key name.
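The dummy key mechanism described above can be illustrated with a small self-contained toy model. This is not the Kentico CacheHelper implementation: the class `ToyCache` and its members `Add`, `Get` and `TouchKey` are hypothetical names chosen for this sketch, the key names in the usage note are made up, and a plain dictionary stands in for the .NET cache and its CacheDependency objects.

```csharp
using System;
using System.Collections.Generic;

// Toy model of dummy-key dependencies; NOT the Kentico CacheHelper.
// All names here (ToyCache, Add, Get, TouchKey) are hypothetical.
public class ToyCache
{
    // Cached data items by key.
    private readonly Dictionary<string, object> items = new Dictionary<string, object>();

    // For each dummy key: the data keys that depend on it.
    // A dummy key exists only while something depends on it.
    private readonly Dictionary<string, HashSet<string>> dependents = new Dictionary<string, HashSet<string>>();

    // Stores a data item and registers it under each dummy key,
    // creating the dummy key if it is not present yet.
    public void Add(string key, object value, params string[] dummyKeys)
    {
        items[key] = value;
        foreach (string dummy in dummyKeys)
        {
            HashSet<string> keys;
            if (!dependents.TryGetValue(dummy, out keys))
            {
                keys = new HashSet<string>();
                dependents[dummy] = keys;
            }
            keys.Add(key);
        }
    }

    // Returns the cached item, or null when it is not in the cache.
    public object Get(string key)
    {
        object value;
        return items.TryGetValue(key, out value) ? value : null;
    }

    // "Touches" a dummy key: every item depending on it is dropped.
    // Touching a key that nothing depends on does nothing.
    public void TouchKey(string dummyKey)
    {
        HashSet<string> keys;
        if (!dependents.TryGetValue(dummyKey, out keys))
        {
            return;
        }
        foreach (string key in keys)
        {
            items.Remove(key);
        }
        dependents.Remove(dummyKey);
    }
}
```

For example, after `cache.Add("userList", data, "dummy.users")`, calling `cache.TouchKey("dummy.users")` makes `cache.Get("userList")` return null, while items registered under other dummy keys stay in the cache. Note that data items never depend on each other directly, only on dummy keys.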
What is "touched" by the system
Now that you know how it works internally and how to use it through the API, let's look at how you can use it as the end user and which system keys are touched by default.
Every web part (control) which has the cache minutes settings, has the cache dependencies settings. By default they are set to the update of any object that could have some influence on the displayed data. For instance, the cached data from UsersDataSource is removed from the cache if any user changes. Of course these dependencies can be customized to limit them only to some items that actually are part of the datasource.
Here is the list of the dummy keys that are touched by the system when something changes (for version 4.0):
+ for every parent node:
Even deeper look? No problem ... get ready for the ride!
NOTE: If you have no idea what generic programming or class (method) templates are, learn about them or hope you will understand the concept.
You know about the levels of caching and you know about the cache dependencies, so let's look at how you can simply make a cached variant of your method.
Simple, traditional, and a lot of additional code way:
DataSet result = null;
string useCacheItemName = "myCacheKey";
// Try to get the data from the cache first
// (assuming GetItem returns null on a miss, like the standard cache)
result = (DataSet)CacheHelper.GetItem(useCacheItemName);
if (result == null)
{
    // Not cached yet, get the data
    result = TreeHelper.SelectNodes("/%", false, "CMS.News");
    // Save the result to the cache
    CacheHelper.Add(useCacheItemName, result, null, DateTime.Now.AddMinutes(10), System.Web.Caching.Cache.NoSlidingExpiration);
}
Minimum code pretty much automatic version:
DataSet resultComplex = DataCacheHelper.GetCached3<DataSet, string, bool, string>("/%", false, "CMS.News", TreeHelper.SelectNodes, "myKeyComplex", 10, false, null, null);
Do you see what just happened? You don't have to write the code using CacheHelper and check if it is correct or not, you can write it with one line of code, you can even write this into your transformation without any additional methods definition. Great, isn't it? Let's see how it works ...
DataCacheHelper is a special templated class which offers caching envelope methods for methods with 0-10 parameters. You always use the variant whose number matches the number of parameters of the original method; in this case it is TreeHelper.SelectNodes, which has 3 parameters and here selects all the News documents on the site. In the method template you specify the output type and the parameter types of your method, and then you type in the parameters as you would for your original method ("/%", false, "CMS.News"). Next comes the name of the method which should be called (TreeHelper.SelectNodes), followed by the caching parameters: the cache key, the cache minutes, whether the data should also be stored in the request cache (HttpContext.Current.Items), and optionally some context used to get the dependencies for the object.
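To make the idea of a caching envelope more concrete, here is a minimal sketch of how such a generic method could be written. It is not the DataCacheHelper source: the class `ToyDataCache`, the simplified parameter list (only the cache key and cache minutes) and the dictionary-based store are assumptions made for illustration, and the request cache flag and the dependency context are left out.

```csharp
using System;
using System.Collections.Generic;

// Sketch of a generic caching envelope; NOT the Kentico DataCacheHelper.
// ToyDataCache and its simplified signature are hypothetical.
public static class ToyDataCache
{
    // Cached value together with its absolute expiration time.
    private class Entry
    {
        public object Value;
        public DateTime Expires;
    }

    private static readonly Dictionary<string, Entry> store = new Dictionary<string, Entry>();
    private static readonly object sync = new object();

    // Envelope for a method with 3 parameters: returns the cached result
    // for the key if it is still valid; otherwise calls the method with
    // the given parameters and caches what it returns.
    public static TResult GetCached3<TResult, T1, T2, T3>(
        T1 p1, T2 p2, T3 p3,
        Func<T1, T2, T3, TResult> method,
        string cacheKey, int cacheMinutes)
    {
        lock (sync)
        {
            Entry entry;
            if (store.TryGetValue(cacheKey, out entry) && entry.Expires > DateTime.Now)
            {
                return (TResult)entry.Value;
            }
        }

        // Cache miss: call the original method outside the lock.
        TResult result = method(p1, p2, p3);

        lock (sync)
        {
            store[cacheKey] = new Entry { Value = result, Expires = DateTime.Now.AddMinutes(cacheMinutes) };
        }
        return result;
    }
}
```

The call has the same shape as the one-line example above: the parameters of the original method come first, then the method itself, then the caching parameters. A second call with the same cache key within the cache minutes returns the stored result without calling the method again.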
Just note: I really like the concept of templated classes, because it is very powerful. I once wrote an image analysis and processing application in C++ as my Master Thesis which had about 80% of all code in the templated classes and which was fully configurable for the pixel type and operations with the image. So you may expect such solutions also for other things in our solution ;-)
It is now up to you to decide whether you want to use the standard coding or the complex coding on one line. You have the choice.
An exercise for you which can help you decide:
Write into your transformation the code which will display the e-mail of the document author in the list of blog posts and uses cache so each of the users on the list is retrieved from the database only once.
My code is:
<%# CMS.DataEngine.DataCacheHelper.GetCached1<CMS.SiteProvider.UserInfo, int>((int)Eval("NodeOwner"), CMS.SiteProvider.UserInfoProvider.GetUserInfo, "user" + Eval("NodeOwner"), 10, false, null, null).Email %>
Try your own code using the standard way.
Stop it! Already too much to memorize for one session ...
I think that is enough for now. You now have an overview of Kentico CMS caching, how it works and what you may use. If you have any questions or just want to express yourselves, just add some comments below. See you ...