The
purpose of this document is to provide details on importance of aggregation,
how effectively we can design aggregations, performance monitoring and usage
based aggregation.
The
aggregations are the high level summaries of data. It is analogous to
calculated value by GROUP BY statements of T-SQL with some attributes. SQL
Server pre-aggregates the measures with respect to dimension attributes at some
extent instead of executing the query runtime. Pre-aggregation improve the
query performance.
Following are the reasons
to build the aggregation:
Performance
is the top priority of all the OLAP system and having aggregated data will
helpful to easily retrieve the data. E.g. if we aggregate the yearly data based
on days in a year. Cube has to aggregate a large no. of columns. In place of
days wise, if we have aggregated data up to months or quarter, we will get
yearly aggregated data more quickly.
Aggregations
improve query performance by limiting the amount of partition data that needs
to be scanned to answer a query. Instead of querying the fact level data of the
partition, SSAS will query the aggregation and because the amount of data in
the aggregation is smaller, the query can be answered more crisply
How to do logging and
Performance monitoring
To
enable the logging, go to analysis service instance where your cube exists,
Right click on the server èProperties
By
default, Analysis Services does not log queries into the query log table. To
log queries, you need to set Analysis Services properties. Change the below
properties to enable logging:
ü QueryLogConnectionstring
will contain the ConnectionString e.g. Server name & DB name to create log
table
Table
is created.
I
have browsed the cube for End Customer revenue fact and logging information
with fact name & query execution duration is properly logged in the
OlapQueryLog table.
By
using OLAPQueryLog table, we can get information about facts on which users are
mostly interested and frequently called facts
Check the profiler for the
logged information:
How to design aggregation
Based
on the above statistical data, if we feel that query is taking time and
performance of the query can be improved. In this case, we can go for the
designing aggregation for Fact EndCustomerRevenue.
For designing the aggregation, open the cube
solution and go to Aggregation tab è End Customer Revenue fact.
Here it is showing ‘unassigned aggregation design’, it means there is
explicitly aggregation designed for this fact.
Select
‘Design Aggregations’ to design aggregation. It will start the aggregation
design wizard
Click
next. It will open Review Aggregation Usage window. Here if we want to exclude
any dimension attributes from aggregation, we can remove it to avoid
unnecessary aggregation
Options are Default, Full, None, and Unrestricted. Full means that the
attribute must be included in every aggregation, while None means that the
attribute will not be in any aggregation. Default means that the SSAS engine
will examine the attribute and associated relationships and try to determine
whether or not to include that attribute in aggregations using a set of
internal rules. These rules are very restrictive and a lot of attributes get
excluded from consideration. Unrestricted means that the attribute will be
considered without regard to the internal set of rules. For the first creation
of aggregations, I just leave all attributes set to Default. We will keep analyze our cube and accordingly
will change the usage configuration of the aggregation. Click next
It will open the object count window, either we
can enter the estimated count value or wizard will automatically count the row.
Entering count values manually will be good in case of bulk amount of data
because wizard may take lot of time to count. While specifying the value
manually, make sure value should be nearer to exact count because the
Aggregation Design Wizard uses the object counts to estimate storage
requirements.
Click
on count to get the estimated count
Next
is Aggregation option. It will provide an option to design aggregation with
respect to storage space or performance point of view. In designing our cube,
you will have to balance the storage needs of the aggregation tables against
the speed and performance of the queries. The three approaches to achieving
this balance are as follows:
ü Set the storage size and let Microsoft® SQL Server
Analysis Services determine which aggregations to store. This approach works
well when you have limited storage space.
ü Set the percentage of performance gain and let the
necessary aggregation tables take as much storage space as they need.
ü Click Stop: Manually determine the best balance by
watching the progress of the Performance vs. Size graph.
I
am selecting Performance gain because for many applications, performance is
more important as compared to storage space and Cube processing time. Click
start to design aggregation
It
creates 274 aggregations with occupying 446.5MB disk space. Save the
aggregation name. We can either deploy the changes now or we can save only.
Click finish.
In
the Cube aggregation tab, we can see the designed aggregation:
Monitor the performance of
the cube after aggregation design
Deploy
the designed aggregation and check the query performance by using profiler:
Fact End Customer Revenue.
Fig
showing query execution duration before designing aggregation- 7952 Sec
Fig
showing query execution duration after designing aggregation- 4000 Sec
Net New Quota
Fig
showing the query execution duration before designing aggregation- 1484 sec
Fig
showing the query execution duration after designing aggregation- 1484 sec
Fig
showing the query execution duration before designing aggregation- 1344
Fig
showing the query execution duration after designing aggregation- 953
Usage based optimization
The
problem with the Aggregation Design Wizard is that it assumes all queries are equally
likely. Therefore, some of the aggregations it designs are likely to be useful,
but some will not be. More importantly, there will be aggregations you need
that are not created by the Aggregation design wizard. To create those
aggregations that are required but aggregation wizard does not created, we can
use usage based optimization.
Click on the advance view
under aggregation view:
It
will display the aggregation designed with respect to dimension attributes:
Based
on the logging information, if we found that we need some more aggregation then
we can check the box against the associated dimension attributes. If we feel some of the aggregations are not
required and wizard is creating unnecessary, in this case, we can uncheck it.
Removing unnecessary aggregations will helpful to manage disk space for the
partitions as well as it reduce the cube process time.
Disadvantages of more no. of
aggregations
Storage
space required for the aggregated table is more.
Processing
time for the cube is more.
Conclusion
By
using the above steps, we can understand how to design aggregation to improve
cube query performance and based on user’s action, we can do optimize the
aggregation to provide best performance and adequate disk storage to the cube.
No comments:
Post a Comment