For those not already familiar with the Teradata Workload Analyzer (TWA), TWA is one of the products affiliated with TASM (Teradata Active System Management). It guides the database administrator in defining workloads for Teradata by analyzing log data collected over a period of time and by offering recommendations on workload definitions, classification criteria, and service level goals. It also provides a feature for migrating from Priority Scheduler.
Analysis of an individual workload can be initiated by selecting the ‘Analyze Workload’ right-click option in the workload report or by clicking the ‘Analyze’ tree node option. To further refine the initial set of workloads into one or more additional workloads, TWA uses DBQL data for workload analysis. Theoretically, a workload can be further sub-classified into multiple workloads through additional classification criteria. Encouraging the user to sub-classify on any and all possible classification criteria, however, would lead to confusion and many unnecessary workloads. Also, for operational performance reasons, there is a cap on the number of workloads for a database.
Workload Analyzer will guide the DBA towards appropriate classification criteria. At any given point in the analysis, the user can choose correlation and distribution parameters from the drop-down list, and then click ‘Analyze’ to analyze the associated usage patterns. The user can then drill deeper within a chosen cluster, or re-analyze by choosing different correlation and distribution parameters. Through trial-and-error and visualization, the user decides which parameters most effectively identify the request group to be isolated. This trial-and-error process is streamlined by showing the user the distinct counts and distribution ranges without having to ‘Analyze’. This can eliminate fruitless visualizations on a single user, or on a tight distribution, for example.
The overall flow of the GUI can be better understood by looking at the flow chart below.
This drill down is a recursive process for deeper analysis on correlation and distribution parameters. If not satisfied with the current analysis parameters, the DBA can select more appropriate parameters, as guided by reviewing distinct values and range for those other parameters.
For example, with respect to the distinct value counts, one particular workload could display the following characteristics:
- User Name (24)
- Applications (1)
- Account Name (1)
- Client addresses (2)
- Queryband (3)
- Function (3)
- Urgency (1)
- AggLevel (8)
- Estimated Processing Time (0-1000 secs)
- AMP Count (0-1)
In this example, the DBA can see that there is only 1 distinct application and 1 distinct account, and that all requests run at the same urgency, so trying to identify a correlation against different application, account or urgency values would be a wasted effort. However, the opportunity for correlation does exist with Users, Function and AggLevel. The DBA could pursue those correlation options. The same applies to the distribution parameter ranges: an estimated processing time range from 0 to 1000 seconds suggests that a large variety of requests is included in this workload. The opportunity for identifying clusters is higher than if the estimated processing time range were simply 0-1 second.
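The screening step described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how TWA implements it: the dictionaries simply restate the example list, and the function names and the `min_span` threshold are hypothetical.

```python
# Hypothetical summary of one workload, restating the example list above.
distinct_counts = {
    "User Name": 24,
    "Applications": 1,
    "Account Name": 1,
    "Client addresses": 2,
    "Queryband": 3,
    "Function": 3,
    "Urgency": 1,
    "AggLevel": 8,
}
distribution_ranges = {
    "Estimated Processing Time": (0, 1000),  # seconds
    "AMP Count": (0, 1),
}


def correlation_candidates(counts):
    """Keep parameters with more than one distinct value, most varied first."""
    varied = {name: n for name, n in counts.items() if n > 1}
    return sorted(varied, key=varied.get, reverse=True)


def distribution_candidates(ranges, min_span):
    """Keep parameters whose (max - min) span is at least min_span (assumed cutoff)."""
    return [name for name, (lo, hi) in ranges.items() if hi - lo >= min_span]


candidates = correlation_candidates(distinct_counts)
wide = distribution_candidates(distribution_ranges, min_span=10)
print(candidates)  # parameters worth correlating on, most distinct values first
print(wide)        # ['Estimated Processing Time']
```

Parameters with a single distinct value (Applications, Account Name, Urgency) drop out, which matches the observation that correlating on them would be wasted effort.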
The DBA can add clusters to the current workload for deeper analysis, or clusters can be split off into a new workload. The DBA can repeat this process until a good set of workloads is defined or all unassigned clusters are assigned to workloads.
TWA uses an assigned and unassigned cluster concept. Each cluster (for example, an Account, User, or QueryBand value) found during analysis is initially unassigned. Selected clusters become assigned when they are added to the current workload for deeper analysis or split out into a new workload. Unassigned clusters remain for subsequent action by the DBA if desired. TWA brings back all unassigned clusters if the same analysis parameter is clicked again, after displaying the informational message below.
If unassigned clusters are not acted on by the DBA, the associated requests will be relegated to a different workload once the ruleset changes are saved. For example, consider the following set of six workloads that were generated after the 1st level of analysis on Accounts, where Workload A is defined with classification Account=A:
The DBA decides he would like to analyze workload A, which consumes 35% of the CPU. Based on some criterion (e.g. client user), he determines that one element should be isolated and treated differently from the other elements. He has two choices for doing this: either split it off or add classification to the existing workload.
If the user splits on the particular element, the result is a new workload, A2, with classification Account=A and Client User = xyz. A2 automatically has a higher evaluation order than the original workload A to ensure that requests from client user xyz execute within A2, while all other client users execute within A. The CPU distribution is divided between the old (A) and new (A2) workloads as follows:
Alternatively, if the user instead chooses to add classification to the existing workload (so that the workload classification of A is now Account=A and Client User = xyz), un-chosen elements will be designated “unassigned”, as depicted in the following diagram. If not further acted upon, they will end up executing within WD-Default because no other workload exists that would capture requests with classification Account=A and NOT client user = xyz.
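The difference between the two choices can be illustrated with a simplified first-match model of evaluation order. This is a sketch under stated assumptions, not TASM's actual classification logic: the `classify` helper and the client user `abc` are hypothetical, and real classification criteria are richer than simple equality checks.

```python
WD_DEFAULT = "WD-Default"


def classify(request, workloads):
    """Return the first workload, in evaluation order, whose criteria all match."""
    for name, criteria in workloads:
        if all(request.get(key) == value for key, value in criteria.items()):
            return name
    return WD_DEFAULT


requests = [
    {"Account": "A", "Client User": "xyz"},
    {"Account": "A", "Client User": "abc"},  # hypothetical other client user
]

# Choice 1: split. A2 is evaluated before the original workload A.
split_rules = [
    ("A2", {"Account": "A", "Client User": "xyz"}),
    ("A", {"Account": "A"}),
]

# Choice 2: add classification to the existing workload A.
add_rules = [
    ("A", {"Account": "A", "Client User": "xyz"}),
]

split_result = [classify(r, split_rules) for r in requests]
add_result = [classify(r, add_rules) for r in requests]
print(split_result)  # ['A2', 'A']
print(add_result)    # ['A', 'WD-Default']
```

With the split, the second request still lands in A. With the added classification, nothing else captures it, so it falls through to WD-Default.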
It is suggested, to avoid accidental relegation of unassigned clusters to WD-Default or some other unexpected WD, that drill-down probes begin their first analysis step using the Split option. Additional refinements should then be done using the “add classification to” option against that new workload, so that unassigned requests will be relegated back to the original workload. (Example 2 below will demonstrate this technique.)
The DBA can select correlation parameters (“Who” and “Where”) and distribution parameters (“What” and “Exception”) at each depth of analysis. The DBA can also review the workload by viewing the classification list after each depth of analysis and perform an Undo if needed. The Undo operation can be used to undo any previous analysis performed. It deletes the assigned clusters from the workload classification and brings them back as unassigned clusters for new add/split operations.
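The assigned/unassigned bookkeeping and the effect of Undo can be pictured with a toy model. Everything here is hypothetical (class name, method names, cluster and workload names), and the sketch assumes Undo reverses the most recent add/split operation; the tool itself may allow undoing other steps.

```python
class ClusterAnalysis:
    """Toy model of assigned/unassigned clusters for one analysis parameter."""

    def __init__(self, clusters):
        self.unassigned = list(clusters)
        self.assigned = {}  # cluster -> workload name
        self.history = []   # one list of clusters per add/split operation

    def assign(self, clusters, workload):
        """Model an add or split: move clusters from unassigned to a workload."""
        moved = [c for c in clusters if c in self.unassigned]
        for c in moved:
            self.unassigned.remove(c)
            self.assigned[c] = workload
        self.history.append(moved)

    def undo(self):
        """Reverse the most recent operation, returning its clusters to unassigned."""
        if not self.history:
            return
        for c in self.history.pop():
            del self.assigned[c]
            self.unassigned.append(c)


analysis = ClusterAnalysis(["app1", "app2", "app3"])
analysis.assign(["app1"], "WD_Current")      # add for deeper analysis
analysis.assign(["app2", "app3"], "WD_New")  # split into a new workload
analysis.undo()                              # the split is reversed
print(analysis.unassigned)  # ['app2', 'app3']
print(analysis.assigned)    # {'app1': 'WD_Current'}
```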
A Drill-down Example:
Here is a detailed example exploring the mechanics of workload analysis functionality and all the information provided within.
Select “Analysis->New Workload Recommendations” option. Define the DBQL inputs and Account String category for initial level of analysis.
Select all unassigned request groups and assign them to a new workload, WD_ABC.
At this point, the CPU distribution of all workloads (1) is shown below:
Select the ‘Analyze Workload’ right-click option for cluster analysis. The ‘Analyze Workload’ tabbed window, shown below, is displayed so that data filters can be selected for analysis.
The ‘Current workload classification’ list will display a summarization of the classification:
The numbers adjacent to each current classification criteria type show the total number of elements for correlation parameters, and the min and max values for distribution parameters. For example:
- Account String (2) => A total of 2 accounts are classified.
- Estimated Processing Time (0 – 200 secs) => The workload is classified for Estimated Processing Time between 0 and 200 secs.
The DBA can view the detailed classification and exception definitions by clicking “View Classification” and “View Exception” buttons.
Select appropriate correlation and distribution parameters. Note that there are 10 distinct Applications found for the WD_ABC workload, making it a good candidate to analyze deeper. The following shows that analysis visualized:
Note that the ‘TWA’ Application has significantly higher CPU/IO and Avg EstimatedProcTime than the other applications. It is selected for deeper analysis using the “Add Application clusters for deeper analysis” option on the right-click menu. The report and graph are updated to show the remaining unassigned clusters, while ‘TWA’ is assigned to the current workload for deeper analysis.
The DBA can view the classification for the current workload by clicking the ‘Data Filter’ tab.
The CPU distribution of all workloads now looks like the following:
Please note that if unassigned clusters are not added to the current workload or split into a new workload before the rule set is saved to the database, all queries that arrive for those unassigned clusters will execute as part of the default workload (WD-Default).
If the DBA selects Application again, all 9 unassigned clusters are brought back for subsequent operations (1 is assigned to the current workload).
The user can then assign all remaining clusters using either the Add-to or Split options. Here, the user chooses to select all 9 unassigned clusters and split them into a new workload called WD-Others.
The CPU distribution now looks as follows, with no unassigned requests.
Finally, the DBA saves the rule set to the database for activation.
A second Drill-down Example:
This example uses analysis against multiple QueryBand parameters to help identify and isolate various request clusters, or provide additional granularity on request clusters. One initial workload consumed a vast majority of resources, and a more granular breakdown of that workload is desired. Also, long-running outliers were noted in the analysis, and a goal is to have these outliers classified into their own workload so that different workload management techniques can be applied.
Initial workloads are defined based on Account String using the auto-generate option.
The CPU distribution of all workloads looks as follows:
Workload WD-ADW-DS is selected for further analysis as it is consuming 92% of the total CPU.
Queryband names and values are loaded in the ‘Analyze Workload’ window after clicking the ‘Analyze’ tree node. This system found a total of 5 distinct queryband names. These queryband names, with their distinct value counts, are loaded automatically into the QueryBand Filter list for analysis. However, this list can be viewed only when QueryBand is selected in the correlation parameter list.
- QueryBand (5)
- AggLevel (7)
- Function (5)
- Region (7)
- TopTierApp (3)
- Urgency (3)
There are several potentially good analysis candidates here, as denoted by the distinct counts. The user selects QueryBand as the correlation parameter and the queryband name Function from the QueryBand Filter list, then clicks “Perform Analysis”.
Below are the correlation and distribution reports and graphs. A total of 5 queryband values are displayed for QueryBand Name = Function.
Notice a possible distinction with Function = MIN, which includes queries far lengthier than any other queries. Select the ‘MIN’ row, right-click, and split the ‘MIN’ queryband value into a new workload, ‘WD_ADW_Outliers’. Using Split for this first step is needed to ensure that unassigned clusters fall back into the original ‘WD-ADW-DS’ workload classification.
The QueryBand Function = MIN is split into a new workload called ADW_Outliers, and the remaining 4 functions are unassigned, falling back to ADW-DS if the DBA takes no other action on them. The CPU distribution now looks as follows:
The Correlation/Distribution reports and graph are refreshed with the remaining 4 unassigned queryband values for next add/split operation.
Further drill-down will be performed on the newly split workload ‘WD_ADW_Outliers’.
Select ‘Analyze’ from ‘WD_ADW_Outliers’ workload tree node for further analysis.
Select TopTierApp from Queryband filter list and click ‘Perform Analysis’ button.
The longest-running queries are all common not just to Function = MIN, but also to QueryBand Name TopTierApp = BODSS. Select the ‘BODSS’ TopTierApp queryband value and add it for deeper analysis on the currently analyzed workload, ‘WD_ADW_Outliers’.
The CPU distribution now looks as follows, with the 2 unassigned TopTierApp values being relegated to the original WD_ADW_DS workload rather than being part of the WD_ADW_Outliers workload:
Analyzing ADW_Outliers further, notice that the distinct count for every correlation parameter shows 1 (only 1 distinct value). This means that all requests in this workload come from the same combination of “Who” parameters (Account ADW_DS, Queryband Name Function = MIN and Queryband Name TopTierApp = BODSS). Now only distribution parameters can be used for further drill-down analysis. Estimated Processing Time can be used as the distribution parameter, since the Estimated Processing Time range for the current workload is wide (0.00 - 158.00 secs).
Select ‘None’ as correlation parameter and Estimated Processing Time as distribution parameter.
Click ‘Perform Analysis’ button to generate distribution graph.
Eight queries lie in the last bucket (bucket 10, range 142.20 – 158.00); these are the long-running queries. Another 8 queries, in the first bucket (range 0.00 – 15.80), are comparatively short-running, with a wide gap shown across buckets 2-9.
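The bucket boundaries quoted above are consistent with ten equal-width buckets over the 0.00 - 158.00 second range (width 15.80). The following sketch reproduces that arithmetic. It assumes equal-width bucketing, which the quoted ranges suggest but which is not confirmed here as TWA's method, and the sample times are made up for illustration.

```python
def bucket_edges(lo, hi, buckets=10):
    """Return (start, end) pairs for equal-width buckets, rounded to 2 places."""
    width = (hi - lo) / buckets
    return [
        (round(lo + i * width, 2), round(lo + (i + 1) * width, 2))
        for i in range(buckets)
    ]


def bucket_index(value, lo, hi, buckets=10):
    """Return the 1-based bucket number; the maximum value falls in the last bucket."""
    width = (hi - lo) / buckets
    return min(int((value - lo) / width), buckets - 1) + 1


edges = bucket_edges(0.0, 158.0)
print(edges[0])   # (0.0, 15.8)
print(edges[-1])  # (142.2, 158.0)

# Hypothetical estimated processing times, in seconds.
times = [0.5, 3.0, 12.0, 150.0, 158.0]
counts = {}
for t in times:
    b = bucket_index(t, 0.0, 158.0)
    counts[b] = counts.get(b, 0) + 1
print(counts)  # {1: 3, 10: 2}
```

A histogram with entries only in buckets 1 and 10, as in this example, is the pattern that signals two clearly separable groups of requests.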
Our goal from the start was to isolate the long-running requests found within the ADW-DS workload and apply different workload management to them. By adding the last bucket to the ADW_Outliers workload, we will achieve the necessary workload definition to then apply those different workload management techniques. After highlighting the last bucket in the distribution report, the user selects ‘Add Estimated Processing Time clusters for deeper analysis’ to ‘WD_ADW_Outliers’, overriding the min estimated processing time to 30 seconds and the max estimated processing time to 999999 (essentially unlimited).
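The effect of the overridden range can be sketched as follows. This is a simplified model built only from the criteria named in this example (Account ADW_DS, Function = MIN, TopTierApp = BODSS, estimated processing time 30 to 999999 seconds). The `workload_for` function and the request dictionaries are hypothetical, and real TASM classification involves more than this.

```python
# Criteria for WD_ADW_Outliers as built up in this example (simplified model).
OUTLIERS = {
    "who": {"Account": "ADW_DS", "Function": "MIN", "TopTierApp": "BODSS"},
    "est_time": (30, 999999),  # overridden min and max, in seconds
}


def workload_for(request):
    """Evaluate WD_ADW_Outliers first, then fall back to WD-ADW-DS."""
    lo, hi = OUTLIERS["est_time"]
    who_matches = all(request.get(k) == v for k, v in OUTLIERS["who"].items())
    if who_matches and lo <= request["est_time"] <= hi:
        return "WD_ADW_Outliers"
    if request.get("Account") == "ADW_DS":
        return "WD-ADW-DS"
    return "WD-Default"


base = {"Account": "ADW_DS", "Function": "MIN", "TopTierApp": "BODSS"}
long_running = workload_for({**base, "est_time": 150.0})
short_running = workload_for({**base, "est_time": 5.0})
mid_range = workload_for({**base, "est_time": 60.0})
print(long_running)   # WD_ADW_Outliers
print(short_running)  # WD-ADW-DS
print(mid_range)      # WD_ADW_Outliers
```

The third request shows why the override matters: an estimate of 60 seconds lies outside the highlighted bucket (142.20 - 158.00) but is still captured by the widened 30 to 999999 range.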
The CPU distribution now looks as follows, with the very short running requests relegated back to the original ADW_DS workload.
In summary, the workload classifications of interest within this example are now as follows:
WD-ADW-DS remains in its original form:
WD_ADW_Outliers is created with a higher evaluation order than WD-ADW-DS as follows: