How to optimize the Data Warehouse - 5 Point Plan
Overview
SCOM simultaneously writes performance data, events and monitor state changes to the Operations Manager database (Ops DB) and the Data Warehouse (DW). SquaredUp DS accesses data from both the Ops DB and the Data Warehouse:
- SquaredUp DS authenticates through the SCOM SDK using role-based access control (RBAC) to access core runtime data such as users/roles, entities, classes, monitors, alerts and health status.
- SquaredUp DS accesses the Data Warehouse directly using the published schema, to show performance data, state history (SLA tile) and event data (SQL tile).
So performance tuning the Data Warehouse is key to fast data retrieval in SquaredUp DS. How do you get the most out of your Data Warehouse performance? There isn't one configuration setting, but this article covers a five point plan for reviewing the performance of your Data Warehouse.
You may also like to watch the SquaredUp DS webinar 'Tuning the SCOM Data Warehouse':
1. Sizing
What sort of resources does my SQL server running the Data Warehouse server require? As much as you can get!
Before SquaredUp DS the Data Warehouse was being written to, but was probably largely unused.
- No fixed sizing rules, but the Data Warehouse is now a core component and will significantly influence user experience.
2. Configuration
The two resources you must follow:
- SQL 2012 and System Center 2012 R2 Database Tweaks
- System Center Operations Manager Field Experience e-book
Then monitor using Tao Yang’s Community resources:
3. Data In
SCOM is constantly writing data to the Data Warehouse. Check whether you do need all that data.
- Tune collection intervals - Do you need high frequency granular performance metrics? If not, perhaps tune down the collection intervals.
- Disable unnecessary rules - Disable any rules that are not useful to your organization. Share details with your organization about what SCOM is collecting, to see if they want that data at that frequency, or if anything is missing.
Here are several common sources of high load:
Flip-flopping monitors
Check for noisy monitors 'flip-flopping'. It's not just performance data, every health state change is written to the Ops DB and the Data Warehouse, so if there are monitors that are changing from green to red frequently, find those monitors and tune the thresholds, or disable the monitors, to save all those write actions.
Make use of Kevin Holman’s 'noisiest monitors' query: "Tuning tip: Do you have monitors constantly 'flip-flopping'?"
Rules and Events data
These queries can help you identify rules and events which might need tweaking. Run these queries on the Data Warehouse database in SQL Server Management Studio or in SquaredUp DS:
High frequency performance collection rules
SELECT TOP 50 r.RuleSystemName, Count(*) AS Count FROM perf.vPerfRaw AS p JOIN vPerformanceRuleInstance AS pri ON p.PerformanceRuleInstanceRowId = pri.PerformanceRuleInstanceRowId JOIN vRule AS r ON pri.RuleRowId = r.RuleRowId GROUP BY r.RuleSystemName ORDER BY Count DESC
Event data
SELECT TOP 50 r.RuleSystemName, Count(*) AS Count FROM Event.vEvent AS e JOIN Event.vEventRule AS er ON e.EventOriginId = er.EventOriginId JOIN vRule AS r ON er.RuleRowId = r.RuleRowId GROUP BY r.RuleSystemName ORDER BY Count DESC
4. Data Retention
The Data Warehouse stores data in three forms: Raw, Hourly and Daily.
SquaredUp DS, when using the auto resolution setting, uses the highest resolution data available (raw, hourly or daily), based on what data is available in the Data Warehouse for the whole graph timeframe. So SquaredUp DS will use raw data as long as it is available for the whole reporting timeframe, then it will use hourly, and when that is not available for the whole timeframe it will use daily data.
For Performance tiles the resolution is set in metric > resolution
Whether raw or hourly data is available for the whole timeframe will depend on your Data Warehouse data retention settings. Using the default SCOM data retention settings means that SquaredUp DS timeframes of 1 week or less will probably be using raw data, for timeframes between 30 days and 12 months SquaredUp DS will be probably be using hourly data. The SCOM data retention settings may well have been changed from the default.
Kevin Holman's blog shows you how to run a Microsoft tool called DWDATARP.EXE which shows the percentage of the Data Warehouse taken up by different types of data:
Understanding and modifying Data Warehouse retention and grooming
This shows that hourly performance data can become huge – typically 33% of the Data Warehouse size.
- Do you need all that hourly data? It's not just about the capacity of the Data Warehouse for storing the data, but more the performance, in terms of the SQL server's memory. If hourly data is available for a year, then users may well be querying it and pulling all that data back. Consider reducing the amount of hourly data retained.
- How long do you keep the data for? Consider reducing hourly retention to three to six months.
Take a look at the percentage for event data, as this is often little used, but can take up considerable space.
5. Data Out
When using the Performance tile (see How to use the Performance tile), carefully choose the resolution in the Metric panel.
Resolution refers to the raw, hourly or daily data stored in the Data Warehouse.
Using the autoresolution setting on a Performance tile automatically changes to a more suitable resolution when the page timeframe is changed by a user, in order to optimize the time to return the graphs.
The auto setting applies to visualizations that use time periods, such as line graphs, sparklines and reports. Visualizations that do not show a time period, such as Bar Top N, Heatmap and Scalar, use the latest data point. These visualizations do not change when a user changes the page timeframe. For these visualizations the auto resolution setting defaults to daily.
- Raw: slow
- Hourly: fast
- Daily: fastest
Recommendations:
- Performance Bar Top N should use hourly or daily where possible.
- For large groups (>100 objects) use hourly or daily where possible.
- 6 months graphs not showing hourly data?
Why is daily, rather than hourly, data shown for a 6 month graph?
When the retention period for hourly data is set to 180 days and a SquaredUp DS graph is set to display 6 months of data, this can actually be 182 or 183 days. SquaredUp DS will switch to daily data because hourly data is not available for the full 6 month period, i.e. the retention period is less than the date range. The solution is to modify the retention period to reflect 6 months i.e. an extra 5 days and set the retention period to 185 days, this should not affect your storage levels too greatly. After a few days this will resolve the problem because 6 months of hourly data will then be available, when SquaredUp DS checks the retention period against the requested time frame, the retention period would be higher and SquaredUp DS would show hourly data.
- 'No values found' message on a tile?
See No values found for the given metric for the last <time period> at the resolution '<raw/hourly/daily>'>
Timeouts:
- Timeout is 30 seconds
- It cannot be configured
- If queries are taking > 30 seconds, the query is not working as expected!
For more information see Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding