Come Teradata 14 and we have series of new feature which was long Cherished to be wish in OLAP system & here we are with series of Teradata 14 new features which can change the way people do computing in database.

Teradata Regular Expression : Series 1

Teradata 14 : Regular expression Test Cases:

Case 1 :

`select` `regexp_replace(``'SatyakiDe'``,``'^(.*)([[:upper:]]{1,})(.*) \$'``,``'1 23'``) ``AS` `COL_VAL;`
`Column_1         COL_VAL`
`---------------- ----------------------------------------`
`Teradata        Terada ta`

Case 2 :
`select` `regexp_replace(``'919047242526'``,``'^([[:digit:]]{2})([[: digit:]]{10})'``,``'+1 2'``) COL_VAL;`
Column_1         COL_VAL
`------------ ---------------`
`919987529486 +91 9987529486`

## String Manipulation Using Teradata Regular Expression

1. How to Split Comma Separated Values in each rows

2. How to bind separate values in 1 row (Just opposite of Step 1)

2nd Options are very demanding as Cross platform database professional specially Oracle Developers looking for these kind of implementation as Oracle has directly built-in functions to do the same. Those functions are Listagg, wm_concat, group_concat.

Let’s check the solution –

Case 1,

Let’s create the table & prepare some data –

 ```1 2 3 4 5 6 7``` ```CREATE MULTISET TABLE ETL_DATA.PARSE_STR ( SEQ_NO INTEGER, SRC_STR VARCHAR(70) ); CREATE TABLE completed. 0 rows processed. Elapsed Time = 00:00:01.864 ```

Let’s insert some data –

 ```1 2 3``` ```INSERT INTO ETL_DATA.PARSE_STR VALUES(1,'RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY') ;INSERT INTO ETL_DATA.PARSE_STR VALUES(2,'TUNKAI,SAYAN,BABU,PAPU') ;INSERT INTO ETL_DATA.PARSE_STR VALUES(3,'IK,ATBIS,SAPMUNDA'); ```

Let’s check the value –

 ```1 2 3 4 5``` ```SEQ_NO SRC_STR ------ ---------------------------------- 1 RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY 2 TUNKAI,SAYAN,BABU,PAPU 3 IK,ATBIS,SAPMUNDA ```

Fine, Now our objective will be split these comma separated values in each lines.

 ```1 2 3 4 5 6``` ```SELECT b.SEQ_NO, regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR FROM sys_calendar.calendar , PARSE_STR b WHERE day_of_calendar BETWEEN 1 AND (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 ) ORDER BY 1,2; ```

And, let’s check the output –

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14``` ```SEQ_NO SRC_STR ----- ---------------------- 1 ANUPAM 1 BIRESWAR 1 RAM 1 SUJAY 1 TRIDIB 2 BABU 2 PAPU 2 SAYAN 2 TUNKAI 3 ATBIS 3 IK 3 SAPMUNDA ```

Gr8! I guess, result is coming as per my expectation. 🙂

Case 2(Subsitute Of Listagg, wm_concat, group_concat in Oracle),

This we’ve to do it in Two small Steps for better understanding & performance.

First, let us create another table –

 ```1 2 3 4 5 6 7``` ```CREATE MULTISET TABLE ETL_DATA.WM_CONCAT_TAB ( SEQ_NO INTEGER, SRC_STR VARCHAR(20) ); CREATE TABLE completed. 0 rows processed. Elapsed Time = 00:00:01.230 ```

Good. Now we’ll populate some data into this table. We’ll populate data from Step 1 as this will provide the exact data that we’re expecting as input test data for Case 2.

Let’s insert those data –

 ```1 2 3 4 5 6``` ```INSERT INTO ETL_DATA.WM_CONCAT_TAB SELECT b.SEQ_NO, regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR FROM sys_calendar.calendar , PARSE_STR b WHERE day_of_calendar BETWEEN 1 AND (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 ); ```

Let’s check the data –

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14``` ```SEQ_NO SRC_STR ------ -------------------- 1 ANUPAM 1 BIRESWAR 1 RAM 1 SUJAY 1 TRIDIB 2 BABU 2 PAPU 2 SAYAN 2 TUNKAI 3 ATBIS 3 IK 3 SAPMUNDA ```

As you know in TD we’ve significant restcriction regarding Hirarchical Queries & Recursive Queries. So, In this step we’ll build one relationship like employee & manager in popular employee table. So, if we have that kind of relation then we can easily establish & fit that in TD model.

Let’s create this intermediate table. In this case we’ll go for mapping between current rows with next rows. This is also very useful process. In Oracle, they have LEAD or LAG functions to achieve the same. But, here we’ve to work a little bit more to achive the same.

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20``` ```CREATE MULTISET VOLATILE TABLE VT_SRC_ARRNG AS ( SELECT SEQ_NO, SRC_STR, MAX(SRC_STR) OVER( PARTITION BY SEQ_NO ORDER BY SEQ_NO, SRC_STR ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING ) AS PREV_SRC_STR, COUNT(*) OVER( PARTITION BY SEQ_NO ) AS MAX_RECUR_CNT FROM WM_CONCAT_TAB ) WITH DATA ON COMMIT PRESERVE ROWS; CREATE TABLE completed. 0 rows processed. Elapsed Time = 00:00:01.102 ```

Let’s look the output –

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21``` ```SELECT * FROM VT_SRC_ARRNG ORDER BY 1,2; SEQ_NO SRC_STR PREV_SRC_STR MAX_RECUR_CNT ----- ------- --------------- --------------------- 1 ANUPAM BIRESWAR 5 1 BIRESWAR RAM 5 1 RAM SUJAY 5 1 SUJAY TRIDIB 5 1 TRIDIB ? 5 2 BABU PAPU 4 2 PAPU SAYAN 4 2 SAYAN TUNKAI 4 2 TUNKAI ? 4 3 ATBIS IK 3 3 IK SAPMUNDA 3 3 SAPMUNDA ? 3 ```

Fine. From the above VT we can see every Source String has one Previous Source String. Also, we’ve noted down that in each window of SEQ_NO how many levels are there by MAX_RECUR_CNT. We’ll use this column later.

Let’s move to the 2nd & final part –

Let’s aggregate the values based on SEQ_NO & club them with comma –

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43``` ```WITH RECURSIVE WM_CONCAT(SEQ_NO, SRC_STR, PREV_SRC_STR, MAX_RECUR_CNT, LVL, COMMA_SEP_STR) AS ( SELECT SEQ_NO, SRC_STR, PREV_SRC_STR, MAX_RECUR_CNT, 1 AS LVL, CAST( '' AS VARCHAR(100)) AS COMMA_SEP_STR FROM VT_SRC_ARRNG WHERE PREV_SRC_STR IS NULL UNION ALL SELECT b.SEQ_NO, b.SRC_STR, b.PREV_SRC_STR, b.MAX_RECUR_CNT, c.LVL+1 AS LVL, c.COMMA_SEP_STR||b.SRC_STR||',' AS COMMA_SEP_STR FROM VT_SRC_ARRNG b, WM_CONCAT c WHERE c.SRC_STR = b.PREV_SRC_STR ) SELECT k.SEQ_NO, k.AGGR_STR FROM ( SELECT SEQ_NO, SRC_STR, LVL, MAX_RECUR_CNT, MIN(CASE WHEN LVL = 1 THEN SRC_STR ELSE 'ZZZZZ' END ) OVER( PARTITION BY SEQ_NO ORDER BY LVL ASC ) ROOT_SRC_STR, COMMA_SEP_STR||ROOT_SRC_STR AS AGGR_STR FROM WM_CONCAT ) k WHERE k.LVL = k.MAX_RECUR_CNT ORDER BY 1,2; ```

Let’s check the output –

 ```1 2 3 4 5``` ```SEQ_NO AGGR_STR ------- --------------------------- 1 SUJAY,RAM,BIRESWAR,ANUPAM,TRIDIB 2 SAYAN,PAPU,BABU,TUNKAI 3 IK,ATBIS,SAPMUNDA ```

## Reverse String using Regexp Functions

Today, we’ll be checking one new area where we can implement regular expression to achieve the same without involving any kind of Macro, Stored-Proc.

Many occasion we may have to parse various kind of strings. Assume that, we need to parse the string in reverse order. Until TD 14.0, you don’t have any easy method to implement the same. Off course, this new method also has some limits. It can only able to reverse couple of characters only. I have checked with 9 characters.  It supports that.

So, if you have specific string lengths, then you may also try this solution if you are using TD 14.0. That will be handy.

Let’s check the Query & Output –

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21``` ```SELECT SEQ_NO, SRC_STR, regexp_replace(SRC_STR,'([[:alnum:]]{1,1})([[:alnum:]]{0,1})([[:alnum:]]{0,1})([[:alnum:]]{0,1})([[:alnum:]]{0,1})([[:alnum:]]{0,1})([[:alnum:]]{0,1})([[:alnum:]]{0,1})([[:alnum:]]{0,1})','987654321') AS REV_SRC_STR FROM WM_CONCAT_TAB ORDER BY 1; SEQ_NO SRC_STR REV_SRC_STR ----- ------- -------------- 1 BIRESWAR RAWSERIB 1 TRIDIB BIDIRT 1 SUJAY YAJUS 1 ANUPAM MAPUNA 1 RAM MAR 2 PAPU UPAP 2 SAYAN NAYAS 2 TUNKAI IAKNUT 2 BABU UBAB 3 ATBIS SIBTA 3 SAPMUNDA ADNUMPAS 3 IK KI ```

Courtsey : The Publication is part of satyakide Blog on Teradata 14 Regular expression !!