Improvements under the hood – Document and ObjectQuery enumeration without DataSets
In this article, I will talk about how we enhanced page and object querying in Kentico Xperience 13 and the benefits for custom code.
The article is part of our short series presenting new technical features.
Database reader based querying
While the original querying relied heavily on the DataSet class and SqlDataAdapter filling its tables, the new implementation is based directly on SqlDataReader.
Those who are closely familiar with DataSets know the class can be rather heavy on memory consumption. Furthermore, the need to fill the whole set with data before the querying completes costs significant processing time on the server. As a result, executing the whole query takes longer and the memory footprint is larger.
Having said that, DataSets are easy to process as one basically works with a materialized collection supporting random access to its elements (rows).
Given the time and memory aspect associated with the original implementation, we decided to adopt a more performant way of querying – the DB data reader.
Going into details
A database reader represents a forward-only stream of results from the DB server. Result processing in the application can start as soon as the first row is returned from the DB server. No more waiting for all the data to be transmitted from the DB.
Moreover, since the underlying API does not materialize the result, the overall memory footprint of result processing is lower, which in turn improves web server performance and throughput.
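As a rough illustration outside of Xperience, the following sketch contrasts the two ADO.NET styles. The connection string and the column/table names are placeholders for illustration, not the actual queries the framework issues:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

// DataSet approach: the adapter materializes ALL rows in memory
// before any of them can be processed.
using var connection = new SqlConnection(connectionString);
var adapter = new SqlDataAdapter(
    "SELECT DocumentID, DocumentName FROM CMS_Document", connection);
var dataSet = new DataSet();
adapter.Fill(dataSet);  // blocks until the full result set is loaded
foreach (DataRow row in dataSet.Tables[0].Rows)
{
    // process a fully materialized row
}

// Data reader approach: rows are streamed one at a time; processing
// starts as soon as the first row arrives from the server.
connection.Open();
using var command = new SqlCommand(
    "SELECT DocumentID, DocumentName FROM CMS_Document", connection);
using var reader = command.ExecuteReader();
while (reader.Read())
{
    // only the current row is held in memory
}
```

The reader's forward-only nature is exactly why the new API surfaces as an enumerable rather than a random-access collection.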
In the ObjectQueryBase class, which forms the common API for both ObjectQuery and DocumentQuery, the original approach surfaced via the TypedResult property. The property returns a DataSet-based structure holding the strongly typed info objects (or pages).
var q = DocumentHelper.GetDocuments().TopN(ResultCount);
foreach (var treeNode in q.TypedResult)
{
// ...
}
The above could be simplified: since GetEnumerator was backed by TypedResult, it was equivalent to
foreach (var treeNode in q)
{
// ...
}
The new approach offers the GetEnumerableTypedResult method. Why a method? It hints to developers that there is some overhead associated with the execution and that the result is not cached for repeated calls.
foreach (var treeNode in q.GetEnumerableTypedResult())
{
// ...
}
That contrasts with the DataSet (and TypedResult) which is a materialized result allowing for repeated access.
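Since the article states the result of GetEnumerableTypedResult is not cached for repeated calls, code that needs to iterate the results more than once can materialize them itself. A minimal sketch using the standard LINQ ToList:

```csharp
using System.Linq;
using CMS.DocumentEngine;

var q = DocumentHelper.GetDocuments().TopN(ResultCount);

// The result of GetEnumerableTypedResult() is not cached, so
// materialize it once if repeated access is needed.
var pages = q.GetEnumerableTypedResult().ToList();

var first = pages.FirstOrDefault();  // no extra DB round trip
var count = pages.Count;             // still no extra DB round trip
```

This way you pay for the query execution once and regain the random, repeatable access a DataSet used to provide, but only for the rows you actually keep.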
Performance gains
I have mentioned two aspects of the DB reader – smaller memory requirements and less waiting as the result set does not need to be complete before rows are available to the application.
To verify that we actually contributed to better performance, we used BenchmarkDotNet to compare the original (TypedResult property) and the new (GetEnumerableTypedResult method) approach.
The following are results from a simple benchmark retrieving 1, 10 and 100 pages respectively using MultiDocumentQuery.
What we consider a typical scenario is the retrieval of 10 items to populate a listing of articles, for example on a home page, or retrieval of a single article to render the content on an article detail page.
As can be seen, the first scenario requires just 62% (158.84 KB) of the original memory and takes roughly 83% of the original time. The latter scenario requires only 37.5% (59.35 KB) of the original memory and takes about 68% of the original time.
We also benchmarked more complex scenarios – namely retrieval of the 2nd page of results in a paged query, retrieval of the latest version for versioned pages (top N), and retrieval with permission checks included (top N). Feel free to evaluate the results yourself.
General availability of the database reader approach
The article’s heading suggests the improvement affects enumeration of the query results. This is the case, as the internal implementation of the GetEnumerator method was changed to be based on GetEnumerableTypedResult instead of TypedResult.
foreach (var treeNode in q) // Now based on GetEnumerableTypedResult()
{
// ...
}
Inherently, all custom code enumerating query results either directly (e.g. using the foreach above) or indirectly (e.g. using LINQ extension methods) now benefits from the improved performance.
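For instance, LINQ operators call GetEnumerator under the hood, so a pipeline like the following streams its rows through the new reader-based implementation as well (the filter and projection are illustrative, not part of the original benchmarks):

```csharp
using System.Linq;
using CMS.DocumentEngine;

var q = DocumentHelper.GetDocuments().TopN(ResultCount);

// Where/Select/ToList enumerate the query via GetEnumerator,
// so they benefit from the reader-based implementation automatically.
var headlines = q
    .Where(page => page.DocumentName.StartsWith("A"))
    .Select(page => page.DocumentName)
    .ToList();
```

Note that such LINQ filtering happens in the application after the rows are streamed; filtering that should run on the DB server still belongs in the query itself (e.g. WhereStartsWith).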
Even more details
Should you be interested in performing your own benchmarks, the following code represents the stub of most of our measurements:
[Benchmark]
public void TypedResult()
{
var q = DocumentHelper.GetDocuments().TopN(ResultCount);
foreach (var treeNode in q.TypedResult)
{
GC.KeepAlive(treeNode);
}
}
[Benchmark]
public void EnumerableTypedResult()
{
var q = DocumentHelper.GetDocuments().TopN(ResultCount);
foreach (var treeNode in q.GetEnumerableTypedResult())
{
GC.KeepAlive(treeNode);
}
}
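The stub above references a ResultCount field. A minimal surrounding BenchmarkDotNet harness might look like this; the class name and parameter values are assumptions matching the 1, 10 and 100 page scenarios, not the original setup:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using CMS.DocumentEngine;

[MemoryDiagnoser]            // reports allocated memory next to timings
public class QueryBenchmarks
{
    [Params(1, 10, 100)]     // one run per page-count scenario
    public int ResultCount;

    [Benchmark(Baseline = true)]
    public void TypedResult()
    {
        // body as in the stub above
    }

    [Benchmark]
    public void EnumerableTypedResult()
    {
        // body as in the stub above
    }
}

// Entry point: BenchmarkRunner.Run<QueryBenchmarks>();
```

The MemoryDiagnoser attribute is what produces the allocation figures comparable to the percentages quoted earlier, and Baseline = true makes BenchmarkDotNet report the new approach relative to the old one.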
Summary
The CMS.DocumentEngine.(Multi)DocumentQuery and CMS.DataEngine.ObjectQuery are now based directly on the DB data reader, instead of relying on DataSet and SqlDataAdapter. This reduces the memory footprint by up to roughly 60% for single document queries and significantly lowers processing time.
All custom code that processes query results as an enumeration (e.g. foreach, LINQ extension methods) of pages or objects benefits from the increased performance.
var q = DocumentHelper.GetDocuments().TopN(ResultCount);
foreach (var treeNode in q) // GetEnumerator() now uses the new implementation
{
// ...
}