In July 2015 Eric shared on http://developer.teradata.com/tools/reference/teradata-python-module insights on the Teradata Python Module which is supporting the devops approach. The article above is describing the main functions and is showing basic examples how to use them. In the meantime, the discussions clarified some questions and also revealed some minor bugs which had been fixed instantly.
When reading the article, it immediately got my interest but as so often the examples are not showing all the details and how the different aspects work together. Therefore, I decided to give it a try and started to develop a small application. Goal was to come up with a functionality currently not available in Teradata and which would be also not so easy to implement in pure SQL or Stored Procedures.
I choose the pivot challenge, e.g. convert rows to columns. The unpivot function is available as a table function but currently I am not aware of a pivot function. It is quite clear what need to be done in SQL for a specific pivot transformation but the challenge is to come up with a generic (e.g. parameter driven) solution in SQL.
The outcome of the PoC development confirmed my initial thoughts and I would hope that this approach is picked up at many customer sites to replace current BTEQ scripts. The approach would also allow to convert many stored procedures or macros into small generic programs. It might also allow a better process integration of complex transformations into existing ETL tools.
The solution I came up with is available on githup at https://github.com/d2k/TDPythonPivot. The solution uses the REST interface to communicate with the DB. Your can either download the zip file or clone the repository. The code is shared under the MIT license.
The process needs two config files (in fact 3 but the udaexec.ini I never touched).
- The first one contains the logon information. These are separated from the remaining parameters which is essential to maintain security. An example of a logon configuration file can be seen at dwl/demo.dwl.
- The second config file contains the application parameters. The parameters are described in the README.md file.
The application comes with 3 demo parameter settings:
contains the parameter to denormalize the day_of_week, day_of_calendar columns of the sys_calendar.calendar table for the last 12 full months.
The result is stored in a table.
will denormalize the day_of_week column for the last 12 full months. The result columns will contain the number how often this weekday exists in the specific month. E.g. the demoralization contains an aggregation.
The result is again stored in a table.
- appini/demo3.1.ini & appini/demo3.2.ini
will denormalize the columntype of the dbc.columns table. Again the columns contain the column count per columntype value.
demo3.1.ini will create a new table
demo3.2.ini will add new data to the existing table.
To avoid duplicate row errors and to allow linking process meta data to the data, the table contains two technical columns: the application and the run_id – which can be linked to the DBQL queryband information.
The devops approach really worked out well. I started with a pure SQL generation approach. After finishing that I added the create table functionality and finally added the Insert/Select function. All driven via parameter settings. Changes are tracked in git and commit history had been maintained (not all in github).
Refactoring can be done easily which improved the code over time.
The parameter driven approach worked well. Both required config files will be handled by the module and do not require different handling.
So again, I think this is the way to go 😉
Comments are welcome
To Do list: adding possibility to allow different null replace values in case of column lists are given to be denormalized.