Teradata Interview Questions: Set 2
101) Explain about Skew Factor?
- Skew Factor measures how unevenly a table's rows are distributed across the AMPs. With a Non-Unique Primary Index, duplicate values produce the same row hash, so all rows with the same value land on the same AMP. This creates unequal distribution: one AMP stores much more data while the other AMPs store less.
- During a full-table scan, the AMP holding the most data takes the longest, and the other AMPs sit idle waiting for it, which wastes processing capacity. Such unequal distribution is what gives a table a high Skew Factor, and full-table scans should be avoided for tables like this.
Example: AMP0 holds 10,000 rows (10%) while AMP1 holds 90,000 rows (90%); in this situation the skew factor is very high.
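One common way to check skew from the data dictionary is to compare per-AMP space for the table in DBC.TableSize. This is a sketch; the database and table names are placeholders you would substitute:

```sql
-- Per-AMP permanent space for one table. A large spread between the
-- largest and smallest PermPerAMP values indicates high skew.
SELECT Vproc                                        -- the AMP number
     , CAST(SUM(CurrentPerm) AS DECIMAL(18,0)) AS PermPerAMP
FROM   DBC.TableSize
WHERE  DatabaseName = 'YOUR_DB'
AND    TableName    = 'YOUR_TABLE'
GROUP  BY Vproc
ORDER  BY PermPerAMP DESC;
```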
102) What is use of compress in teradata?Explain?
Compression is used to minimize table size. For example, when a table grows unexpectedly large, compression can be applied to reduce its storage footprint.
Conditions:
1. Compression must be declared at table-creation time.
2. Up to 256 column values (not columns) can be compressed.
3. Variable-length fields (e.g. VARCHAR) cannot be compressed.
Example for condition 2:
CREATE TABLE tab1 ( ... Order_Type CHAR(25) COMPRESS ('air','sea','road') ... );
Here Order_Type has three possible values, one of which is chosen for every order, so each value repeats throughout the table. Column values that repeat like this across the entire table are good candidates for COMPRESS, and Teradata supports up to 256 such values per column; NCR generally provides ready-made scripts for this type of compression. Note, however, that we can store only one value per column in this way, not all three (air, sea, road). The compressed value is stored in the table header and is used as the default for that column unless another value is present in the row.
The Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the taxpayers have Washington as the state code in their address. Instead of storing "Washington" in millions of rows, COMPRESS stores the value once in the table header. A row then assumes that value as its default unless a different value exists in the column.
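The Washington example might look like this as DDL (a sketch; the table and column names are illustrative):

```sql
-- 'Washington' is stored once in the table header; rows with that
-- state pay almost no storage cost for the column.
CREATE TABLE TaxPayer
( TaxPayerId INTEGER NOT NULL
, FullName   CHAR(60)
, State      CHAR(20) COMPRESS ('Washington')
) UNIQUE PRIMARY INDEX (TaxPayerId);
```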
103) What is the process to restart the multiload if it fails?
MultiLoad creates two error tables and one work table. When MultiLoad fails, we have to release the lock on the target table; unlike FastLoad, the error tables and work tables are not locked.
To release the lock after a failure in the acquisition phase:
RELEASE MLOAD <tablename>;
To release the lock after a failure in the application phase:
RELEASE MLOAD <tablename> IN APPLY;
104) Can we load a Multi set table using MLOAD?
We can load both SET and MULTISET tables using MultiLoad, but when loading a MULTISET table, MultiLoad does not reject duplicate rows; they must be handled before loading. FastLoad, by contrast, rejects duplicate rows automatically and will not load duplicates whether the table is SET or MULTISET.
105) Can I use “drop” statement in the utility “fload”?
Yes, but it must be declared outside the FastLoad block, i.e. it must not appear between .BEGIN LOADING and .END LOADING. FastLoad also supports DELETE, CREATE and DROP statements, again only outside the block; inside the FastLoad block only INSERT is allowed.
106) Is it possible that there are two primary key will be in a same table?
Primary Key:
1. A table can have only one primary key.
2. A primary key may consist of more than one column, up to 64 columns.
3. It cannot contain NULL (missing) values.
4. Its values must be unique (no duplicates).
Foreign Key:
1. A table may have zero or more foreign keys, up to 32.
2. A foreign key may consist of more than one column, up to 64 columns.
3. It can contain NULL values.
4. It can contain duplicate values.
5. A corresponding primary key must exist in the parent table to enforce referential integrity.
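The parent/child relationship above can be sketched as DDL (names are illustrative; in Teradata the PRIMARY KEY is implemented internally as a unique index):

```sql
-- Department's primary key is referenced by Employee's foreign key,
-- enforcing referential integrity between the two tables.
CREATE TABLE Department
( DeptNo   INTEGER NOT NULL PRIMARY KEY
, DeptName CHAR(30)
);

CREATE TABLE Employee
( EmpNo   INTEGER NOT NULL PRIMARY KEY
, EmpName CHAR(30)
, DeptNo  INTEGER
, FOREIGN KEY (DeptNo) REFERENCES Department (DeptNo)
);
```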
107) What are the types of tables in Teradata ?
1. set table
2. multiset table
3. derived table
4. global temporary table(temporary table)
5. volatile table
108) How a Referential integrity is handled in Teradata?
Referential integrity is handled in Teradata by use of the TPump utility.
Join Strategies. Suppose table A has 10 million records and table B has 100 million records, and we join the two. The EXPLAIN plan shows that Teradata takes table A and redistributes it. Question 1: is the optimizer doing the right job here? Justify. Question 2: suppose the optimizer instead takes table B (100 million records) and redistributes it; is the optimizer doing its best then, and how would you avoid this situation?
Teradata is smart enough to decide when to redistribute and when to copy.
It compares the tables: are they of similar size, or is one much bigger than the other? Based on that, it decides whether to redistribute the smaller table across all the AMPs or to duplicate (copy) it.
Copying means the small table is placed on every AMP in spool space. Remember that joins always take place in the AMPs' spool space. By redistributing (or duplicating) the smaller table, Teradata ensures the 100-million-row table can be joined AMP-locally. Whatever Teradata does, it does with space, performance and efficiency in mind.
If one table is small, it is copied to all the AMPs so the join is AMP-local.
Joins are always made AMP-local where possible; if they cannot be, there is a high chance of running out of spool space.
109) What is the difference between Star Schema and FSLDM?
FSLDM –> Financial Services Logical Data Model (Teradata FS-LDM 7.0), developed by Teradata for the financial sector (specifically for banks). It can be customised to the user's requirements.
Star Schema –> A relational database schema for representing multidimensional data. The data is stored in a central fact table, with one or more dimension tables holding information on each dimension.
110)What is the difference between Global temporary tables and Volatile temporary tables?
Global Temporary tables (GTT) –
1. When a GTT is created, its definition goes into the Data Dictionary.
2. When materialized, its data goes into temporary space.
3. The data is therefore active only until the session ends, while the definition remains until it is explicitly dropped with a DROP TABLE statement (to drop it from another session, use DROP TABLE ... ALL).
4. You can collect statistics on a GTT.
Volatile Temporary tables (VTT) –
1. The table definition is stored in the system cache.
2. Data is stored in spool space.
3. Both the data and the table definition therefore live only until the session ends.
4. You cannot collect statistics on a VTT. Also, when creating a volatile table you cannot specify default values at the column level.
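The two table types can be sketched side by side (illustrative names; ON COMMIT PRESERVE ROWS keeps the rows for the duration of the session):

```sql
-- Global temporary table: definition persists in the Data Dictionary,
-- data lives in temp space only for the session.
CREATE GLOBAL TEMPORARY TABLE gtt_orders
( order_id INTEGER
, amount   DECIMAL(10,2)
) ON COMMIT PRESERVE ROWS;

-- Volatile table: definition and data both vanish at session end.
CREATE VOLATILE TABLE vt_orders
( order_id INTEGER
, amount   DECIMAL(10,2)
) ON COMMIT PRESERVE ROWS;
```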
111) Teradata performance tuning and optimization
- Collecting statistics
- Explain Statements
- Avoid Product Joins when possible
- Select appropriate primary index to avoid skewness in storage
- Avoid Redistribution when possible
- Use sub-selects instead of big “IN” lists
- Use derived tables
- Use GROUP BY instead of DISTINCT ( GROUP BY sorts the data locally on the VPROC. DISTINCT sorts the data after it is redistributed)
- Use Compression on large tables
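A couple of the points above as concrete statements (a sketch; the object names are illustrative):

```sql
-- Collecting statistics on a join column helps the optimizer
-- choose a good join plan.
COLLECT STATISTICS ON Sales COLUMN (CustomerId);

-- Prefer GROUP BY over DISTINCT for de-duplication on large sets:
-- GROUP BY aggregates locally on each AMP before redistribution.
SELECT CustomerId
FROM   Sales
GROUP  BY CustomerId;
```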
112) How do you transfer large amount of data in Teradata?
Transferring large amounts of data can be done using the various Teradata application utilities which reside on the host computer (mainframe or workstation): BTEQ, FastLoad, MultiLoad, TPump and FastExport.
- BTEQ (Basic Teradata Query) supports all 4 DMLs: SELECT, INSERT, UPDATE and DELETE. BTEQ also support IMPORT/EXPORT protocols.
- Fastload, MultiLoad and Tpump transfer the data from Host to Teradata.
- FastExport is used to export data from Teradata to the Host.
113) How can you use HASH FUNCTIONS to view Data Distribution across all AMPs in Teradata?
Hash Functions can be used to view the data distribution of rows for a chosen primary index.
SELECT HASHAMP(HASHBUCKET(HASHROW(<primary-index-columns>))) AS "AMP#", COUNT(*) FROM <tablename> GROUP BY 1 ORDER BY 2 DESC;
HASHROW – returns the row hash value for a given value
HASHBUCKET – the grouping of a specific hash value
HASHAMP – the AMP that is associated with the hash bucket
By looking at the result set of the above query you can easily see the data distribution across all AMPs in your system and identify any uneven distribution.
114) How can you track Login Parameters of users in Teradata?
Answers: You can view all these parameters in this Data Dictionary Table DBC.LOGONOFF
SELECT LOGDATE, LOGTIME, USERNAME, EVENT FROM DBC.LOGONOFF;
115) How can you find the Teradata Release and Version information from Data Dictionary Table?
Answers: To find Release and Version information you can query this Data Dictionary table DBC.DBCINFO
SELECT * FROM DBC.DBCINFO;
116) How can you find the Table Space Size of your table across all AMPs?
Answers: You can find the Table Space Size of your table from this Data Dictionary Table DBC.TABLESIZE
SELECT DATABASENAME, TABLENAME, SUM(CURRENTPERM) FROM DBC.TABLESIZE WHERE DATABASENAME = '<databasename>' AND TABLENAME = '<tablename>' GROUP BY DATABASENAME, TABLENAME;
117) How can you determine I/O and CPU usage at a user level in Teradata?
Answers: You can find out I/O and CPU Usage from this Data Dictionary Table DBC.AMPUSAGE;
SELECT ACCOUNTNAME, USERNAME, SUM(CPUTIME) AS CPU, SUM(DISKIO) AS DISKIO FROM DBC.AMPUSAGE GROUP BY 1,2 ORDER BY 3 DESC;
118) Explain TPUMP (Teradata Parallel Data Pump) Utility in Teradata?
- TPUMP allows near real time updates from Transactional Systems into the Data Warehouse.
- It can perform Insert, Update and Delete operations or a combination from the same source.
- It can be used as an alternative to MLOAD for low volume batch maintenance of large databases.
- TPUMP allows target tables to have Secondary Indexes, Join Indexes, Hash Indexes, Referential Integrity, Populated or Empty Table, Multiset or Set Table or Triggers defined on the Tables.
- TPUMP can have many sessions, as it has no session limit.
- TPUMP uses row hash locks thus allowing concurrent updates on the same table.
119) Explain Ferret Utility in Teradata?
Ferret (File Reconfiguration Tool) is a utility used to display and set disk-space utilization parameters within the Teradata RDBMS. When you set Ferret utility parameters, it dynamically reconfigures the data on the disks. We can run this utility through Teradata Manager; to start it, type START FERRET in the Database Window.
Following commands can be used within Ferret Utility:
1. SHOWSPACE – Reports the amount of disk cylinder space in use and the amount available in the system. It covers Permanent Space cylinders, Spool Space cylinders, Temporary Space cylinders, Journaling cylinders, Bad cylinders and Free cylinders. For each of these categories it presents three parameters: average utilization per cylinder, % of total available cylinders, and number of cylinders.
2. SHOWBLOCKS – Helps you identify the data-block size and the number of rows per data block. This command displays disk-space information for a defined range of data blocks and cylinders.
120) Can you recover the password of a user in Teradata?
No, you can’t recover the password of a user in Teradata. Passwords are stored in this data dictionary table (DBC.DBASE) using a one-way encryption method. You can view the encrypted passwords using the following query
SELECT * FROM DBC.DBASE;
121) What is Reconfiguration Utility in Teradata and What it is used for?
- When the Primary Index value is fed to the hashing algorithm, it produces the Row Hash (a 32-bit number), which is used to make entries in the Hash Maps.
- Hash Maps are the mechanism for determining which AMP gets a given row.
- Each Hash Map is an array of 65,536 entries and its size is close to 128 KB.
- When Teradata is installed on a system, some scripts (the DIP scripts) must be executed; these create Hash Maps of 65,536 entries for the current configuration. But what if you want to add more AMPs to your system?
- Reconfiguration (Reconfig) is a technique for changing the configuration (i.e. changing the number of AMPs in a system) and is controlled by the Reconfiguration Hash Maps. System builds Reconfiguration Hash Maps by reassigning hash map entries to reflect new configuration of system.
- Let's understand this concept with an example: suppose you have a 4-AMP system, which holds 65,536 entries. Each AMP is responsible for holding 65,536/4 = 16,384 entries.
- Now you add 2 more AMPs to your current configuration, so you need to reconfigure your system. Each AMP is then responsible for holding 65,536/6 = 10,922 entries.
122) What is the difference between Sub-Query & Co-Related Sub-Query?
- When a query is written in a nested manner it is termed a sub-query. A sub-query is executed once for the parent statement, whereas a co-related sub-query is executed once for each row of the parent query.
- SELECT EmpName, DeptNo, Salary FROM Employee Emp WHERE Salary = (SELECT MAX(Salary) FROM Employee WHERE DeptNo = Emp.DeptNo) ORDER BY DeptNo;
123) What is the difference between Access Logging and Query Logging in Teradata?
1. Access Logging is concerned with security (i.e. who is doing what). With access logging you ask the database to log who is doing what on a given object. The information stored is based on the object, not on the SQL fired or the user who fired it.
2. Query Logging (DBQL) is used for debugging (i.e. what is happening around?). With DBQL the database keeps tracking various parameters: the SQL, resource usage, spool usage, steps and other things that help you understand what is going on; the information is useful for debugging a problem. DBQL is enabled on a user id rather than on an object such as a table.
124) What is FILLER command in Teradata?
While running Fastload or Multiload if you don’t want to load a particular field from the datafile to the target table then use the FILLER command to achieve this. Syntax for FILLER command would be as following:
.LAYOUT FILE_PRODUCT; /* It is input file layout name */
.FIELD Prod_No * char(11); /* To load data into Prod_No */
.FIELD Prod_Name * char(11); /* To load data into Prod_Name */
.FIELD Location * char(11); /* To load data into Location */
.FILLER Prod_Chars * char(20); /* To skip the next 20 characters of the input record */
125) What are TPUMP Utility Limitations?
Following are the limitations of Teradata TPUMP Utility:
- Use of SELECT statement is not allowed.
- Concatenation of Data Files is not supported.
- Exponential & Aggregate Operators are not allowed.
- Arithmetic functions are not supported.
126) What are the MultiLoad Utility limitations?
MultiLoad is a very powerful utility; it has the following limitations:
- MultiLoad does not support the SELECT statement.
- Concatenation of multiple input data files is not allowed.
- MultiLoad does not support arithmetic functions (e.g. ABS, LOG) in the MLoad script.
- MultiLoad does not support exponentiation or aggregate operators (e.g. AVG, SUM) in the MLoad script.
- MultiLoad does not support USIs (Unique Secondary Indexes), referential integrity, join indexes, hash indexes or triggers.
- Import tasks require use of the PI (Primary Index).
127) What are the functions of a Teradata DBA?
Following are the different functions which a DBA can perform:
- User Management – Creation and managing Users, Databases, Roles, Profiles and Accounts.
- Space Allocation – Assigning Permanent Space, Spool Space and Temporary Space.
- Access of Database Objects – Granting and Revoking Access Rights on different database objects.
- Security Control – Handling logon and logoff rules for Users.
- System Maintenance – Specification of system defaults, restart etc.
- System Performance – Use of Performance Monitor(PMON), Priority Scheduler and Job Scheduling.
- Resource Monitoring – Database Query Log(DBQL) and Access Logging.
- Data Archives, Restores and Recovery – ARC Utility and Permanent Journals.
128) What are the 5 phases in a MultiLoad Utility?
- Preliminary Phase – Basic Setup
- DML Phase – Get DML steps down on AMPs
- Acquisition Phase – Send the input data to the AMPs and sort it
- Application Phase – Apply the input data to the appropriate Target Tables
- End Phase – Basic Cleanup
129) How to eliminate Product Joins in a Teradata SQL query?
- Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.
- Make sure you are referencing the correct alias.
- Also, once you define an alias for a table, always reference the alias instead of the fully qualified table name.
- Sometimes product joins happen for a good reason: when joining a small table (100 rows) to a large table (1 million rows), a product join can make sense.
130) How does Hashing happen in Teradata?
- Hashing is the mechanism through which data is distributed to, and retrieved from, the AMPs.
- Primary Index (PI) value of a row is the input to the Hashing Algorithm.
- Row Hash (32-bit number) value is the output from this Algorithm.
- Table Id + Row Hash is used to locate Cylinder and Data block.
- Same Primary Index value and data type will always produce same hash value.
- Rows with the same hash value will go to the same AMP.
- So data distribution depends directly on Row Hash uniqueness; be careful while choosing indexes in Teradata.
131) How to recover the DBC logon password if it gets locked?
- Log on as root (on the gateway node).
- Enter cnsterm 6 on the UNIX command line. This displays the Database Window Supervisor screen.
- To start a console utility, type start tstsql. The display tells you which partition tstsql was started in (e.g., Window 1).
- Press the Delete key (Ctrl+D) and enter cnsterm 1. You will now see the contents of Window 1 (i.e., the TSTSQL program). Enter your logon or a TSTSQL command:
*** Logon successfully completed.
*** Time was 0 seconds.
TSTSQL - Enter DBC/SQL request or TSTSQL command:
> MODIFY USER dbc AS RELEASE PASSWORD LOCK;
*** Database/User has been modified.
*** Time was 0 seconds.
TSTSQL - Enter DBC/SQL request or TSTSQL command:
- Press the Delete key and enter cnsterm 6 to move back to the Supervisor display.
- Type stop 1 to terminate the utility that was running in Window 1.
132) How to insert a new line between two strings concatenated in Teradata answer sets?
It can be achieved by using the hexadecimal string '0D'XC (a carriage return):
SELECT 'ABC' || '0D'XC || 'DEF';
When you execute this in Queryman you will see the title on two lines, but the result appears on one line; copy and paste the result into Notepad or Textpad and it will appear on two lines.
133) Creating duplicate users in Teradata?
Teradata Administrator (WinDDI) gives us an option to clone users.
134) How to check the status of a backup/recovery when NetVault/NetBackup is not installed?
If NetVault is not installed but we have access to the Teradata dictionary, the script below helps find the status of a backup or restore of the TPA.
/* Script to report ARC status */
SELECT * FROM DBC.RCEVENT WHERE OPERATIONINPROCESS = ‘Y’ ;
This may report all failed events as well; to restrict it to recent events, an additional condition can be added to the WHERE clause:
WHERE OPERATIONINPROCESS = 'Y'
AND CREATEDATE > DATE - 1
135) Unix cnsterm utility?
The cnsterm tool is the PDE Console Subsystem Terminal. You can use it to display console utility output without starting the xdbw (Database Window) program. The cnsterm tool is executed from the UNIX command line, and you do not need to have X-Windows configured to use it, as you do with xdbw. The cnsterm tool displays a single window at a time. When using cnsterm the only command-line option available is the Database Window partition number. Partition numbers 1 through 4 are the Database Window console utility windows, partition 5 is the Database I/O window, and partition 6 is the Database Window Supervisor screen.
You will typically start cnsterm in window 6, and then move to that window to start any console utility programs.
You must be logged on as root to execute cnsterm. To use cnsterm from the UNIX command line:
- Log on as root.
- Enter cnsterm 6 on the UNIX command line. This will display the Database Window Supervisor screen.
- To start a console utility, type in the start command syntax (i.e., start qryconfig). The display will tell you which partition qryconfig was started in (i.e., Window 1).
- Press the Delete key and enter cnsterm 1. You will now see the contents of Window 1 (i.e., the qryconfig program) and you may now enter commands to execute the qryconfig utility.
- Note: To move between windows, use the Delete key to get a prompt, and enter cnsterm n, where n is the number of the window you want.
- Press the Delete key and enter cnsterm 6 to move back to the Supervisor display.
- Type stop 1 to terminate the utility that was running in Window 1 (i.e., qryconfig).
136) BTEQ script to fetch the DDLs of tables/views/macros?
/* This BTEQ script fetches the table DDL’s for the current DB */
/* Make sure you set the database before running the script */
/* Copy contents and paste it in a text file (eg getDDL.scp) */
/* in the prompt # bteq < getDDL.scp (to execute) */
.SET SIDETITLES OFF
.SET WIDTH 254
.SET QUIET ON
.os IF EXIST showDDL.tmp del showDDL.tmp
.EXPORT REPORT file=showDDL.tmp
SELECT '.SET TITLEDASHES OFF' (TITLE '');
/* Generate table DDLs */
SELECT 'SHOW TABLE '||TABLENAME||';' (TITLE '')
FROM DBC.TABLES
WHERE DATABASENAME = DATABASE AND TABLEKIND = 'T'
ORDER BY 1;
/* Generate view DDLs */
SELECT 'SHOW VIEW '||TABLENAME||';' (TITLE '')
FROM DBC.TABLES
WHERE DATABASENAME = DATABASE AND TABLEKIND = 'V'
ORDER BY 1;
/* Generate macro DDLs */
SELECT 'SHOW MACRO '||TABLENAME||';' (TITLE '')
FROM DBC.TABLES
WHERE DATABASENAME = DATABASE AND TABLEKIND = 'M'
ORDER BY 1;
/* Generate stored procedure DDLs */
SELECT 'SHOW PROCEDURE '||TABLENAME||';' (TITLE '')
FROM DBC.TABLES
WHERE DATABASENAME = DATABASE AND TABLEKIND = 'P'
ORDER BY 1;
SELECT '.DEFAULTS' (TITLE '');
SELECT '.EXPORT RESET' (TITLE '');
.OS IF EXIST DDLS.txt del DDLS.txt
.EXPORT REPORT file=DDLS.txt
.run file = showDDL.tmp
.OS IF EXIST showDDL.tmp del /q showDDL.tmp
137) How to verify whether two databases are in sync and have the same objects?
A data-dictionary script can test whether the tables in two databases are in sync.
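A minimal sketch of such a check, assuming the two databases are named DB_A and DB_B (placeholders): it lists objects that exist in one database but not the other.

```sql
-- Objects in DB_A missing from DB_B, and vice versa. Both databases
-- in sync would return an empty result set.
SELECT COALESCE(a.TableName, b.TableName) AS TableName
     , CASE WHEN a.TableName IS NULL THEN 'Only in DB_B'
            ELSE 'Only in DB_A' END      AS Status
FROM      (SELECT TableName FROM DBC.Tables
           WHERE DatabaseName = 'DB_A') a
FULL OUTER JOIN
          (SELECT TableName FROM DBC.Tables
           WHERE DatabaseName = 'DB_B') b
ON  a.TableName = b.TableName
WHERE a.TableName IS NULL OR b.TableName IS NULL;
```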
138) How to recover left-over spool space in Teradata?
Left-over spool can be identified by querying the data dictionary.
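The original SQL is not shown; a common sketch queries DBC.DiskSpace for databases or users still holding spool:

```sql
-- Databases/users currently holding spool space. A user whose session
-- has ended but who still shows spool here indicates left-over
-- (phantom) spool that may need to be released.
SELECT DatabaseName
     , SUM(CurrentSpool) AS SpoolInUse
FROM   DBC.DiskSpace
GROUP  BY DatabaseName
HAVING SUM(CurrentSpool) > 0
ORDER  BY SpoolInUse DESC;
```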
139) Define POSITION Function?
The POSITION function returns the position of a substring's first occurrence in a string. POSITION is the ANSI-standard function.
Teradata has an equivalent function called INDEX.
Both POSITION and INDEX return the position of the first occurrence of a substring within a string.
Examples for the POSITION function
- SELECT POSITION( ‘e’ IN ‘Read’); Displays Result as ‘2’
- SELECT POSITION( ‘ad’ IN ‘Read’); Displays Result as ‘3’
- SELECT POSITION( ‘s’ IN ‘Read’); Displays Result as ‘0’
Examples for the INDEX function.
- SELECT INDEX(‘Write’, ‘i’); Displays Result as ‘3’
- SELECT INDEX(‘Write’, ‘te’); Displays Result as ‘4’
140) Explain LOB?
It is possible to create functions which operate on large object (LOB) data types.
The following library functions exist to provide the ability to use LOBs as function input or output:
142) How to view recent queries executed in Teradata?
There are two simple ways to view recent queries in Teradata.
- The first is to enable history in the client tool you use to run SQL, e.g. Teradata SQL Assistant: go to View –> Show History. This lists all the SQL executed from that client. The disadvantage is that it shows only the queries executed from that machine, not every query executed on the Teradata server.
- The second way is to use DBC.QRYLOG:
- SEL * FROM DBC.QRYLOG;
- This displays all the queries executed on the Teradata server, along with the user who executed each query and the time of execution. Since this query can return a huge number of rows on a mature Teradata server, add filters to minimize the output.
143) How does the Teradata system distribute rows in the database?
- Teradata uses a hashing algorithm to distribute rows among the AMPs. This row-distribution process is unique to Teradata and is the core reason behind its parallel architecture.
- Teradata uses indexes to determine the distribution of rows: the hashing algorithm processes the index value and produces a HASH VALUE. Based on the HASH VALUE it consults the HASH MAP to determine the HASH BUCKET and hence the target AMP, and that AMP stores the record. Each record is thus stored on a specific AMP depending on its HASH VALUE, with all the AMPs receiving their share of records. This is why columns with many unique values, and columns used in joins, are preferable index columns. Since the work is distributed among the AMPs, Teradata is swift; put another way, Teradata is only as fast as its slowest AMP.
Teradata's parallel architecture makes its row distribution important to understand, as it helps in choosing better index columns. One must also check for AMP skewness: with a poor index-column choice, uneven distribution of rows can occur, which means more work for some AMPs while others sit idle, increasing turnaround time. AMP skewness can be checked using the simple HASH functions available in Teradata. To understand HASH functions and row distribution, the query below processes a record with an index-column value of 'NITIN':
SELECT HASHROW ('NITIN') AS "ROW HASH"
,HASHBUCKET (HASHROW ('NITIN')) AS "HASH BUCKET"
,HASHAMP (HASHBUCKET (HASHROW ('NITIN'))) AS "HASH AMP";
144) What is the difference between a primary index and a primary key?
One must not confuse Primary Key with Primary Index in Teradata. A Primary Key is a logical concept, whereas a Primary Index is physical. In Teradata, the Primary Index is used to find the best access path for data retrieval and insertion, while the Primary Key identifies each row uniquely, just as in other RDBMSs.
Now let us see a few scenarios showing how the two are handled in Teradata:
a) I have not defined PRIMARY INDEX or PRIMARY KEY on table what will happen now:
In this case, Teradata will check whether any column is defined as UNIQUE; if so, it will make that column the UNIQUE PRIMARY INDEX, otherwise the first column will be made the PRIMARY INDEX.
b) I have not defined PRIMARY INDEX however a column is defined as PRIMARY KEY:
In this case, Teradata will make the PRIMARY KEY column as UNIQUE PRIMARY INDEX of the table.
c) I have defined both PRIMARY KEY and PRIMARY INDEX on different column:
In this case, Teradata will make PRIMARY KEY column as UNIQUE SECONDARY INDEX i.e. UNIQUE INDEX on the table.
So one must understand the importance of PRIMARY INDEX in Teradata. Generally, PRIMARY KEY concept is taken care by UNIQUE PRIMARY INDEX in Teradata environment.
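Scenario (b) above can be sketched as DDL (the table and column names are illustrative):

```sql
-- No PRIMARY INDEX is specified, but EmpId is declared PRIMARY KEY,
-- so Teradata implements EmpId as the table's UNIQUE PRIMARY INDEX.
CREATE TABLE Employee
( EmpId   INTEGER NOT NULL PRIMARY KEY
, EmpName CHAR(30)
);
-- SHOW TABLE Employee should reveal: UNIQUE PRIMARY INDEX ( EmpId )
```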
145) What are the different types of temporary tables in teradata?
These are the three types of temporary tables in Teradata:
- Derived temporary tables
- Volatile temporary tables
- Global temporary tables
146) List the different Teradata versions.
- Teradata v2r5.
- Teradata v2r6
- Teradata 12
- Teradata 13
- Teradata 13.1
- Teradata 14.
147) Name a few HASH functions used in Teradata?
The HASH functions are HASHROW, HASHBUCKET, HASHAMP and HASHBAKAMP. Their SQL forms are:
- HASHROW (column(s))
- HASHBUCKET (hashrow)
- HASHAMP (hashbucket)
- HASHBAKAMP (hashbucket)
148) What is an error table? What are the uses of it?
An error table captures rows that fail to load, along with information about why. It contains:
- Rows which failed to be applied due to translation errors.
- Rows that contain duplicate values for UPIs.
- Data-conversion errors and other error conditions.
- Rows that contain duplicate values.
149) What is the common data source for the central enterprise data warehouse?
Answer : It is ODS ( Operational Data Source ).
150) How does indexing improve query performance?
Indexing is a technique to physically organize records so that frequently used queries run faster. Whenever rows are inserted, updated or deleted, the index sub-tables must be updated as well; this maintenance cost is what lets subsequent reads run faster. Indexes cannot be accessed directly by users; only the optimizer can use them.
151) What are the advantages and disadvantages of secondary indexes?
Advantages of secondary indexes-
- A secondary index may be created and dropped dynamically.
- It can be created on any column, be it unique or not unique.
- A table may consist of up to 32 secondary indexes.
Disadvantages of secondary indexes-
- They require additional I/Os to maintain their sub-tables.
- If the statistics are not collected properly then the optimizer goes for a full table scan.
- There is always an overhead of additional space, since sub-tables have to be created and maintained.
152) What are the basic rules which define how PI is defined, in context of Teradata?
Here are the basic rules for defining a PI:
- One primary index per table.
- The primary index may be unique or non-unique.
- A primary index value can be NULL, but the primary index of a populated table cannot be modified.
- It is limited to 64 columns.
153) What is optimization and performance tuning?
Optimization is the skill of selecting the least expensive, and therefore fastest, plan for the query to fetch results. Performance tuning is the process of improving a query so it performs faster with minimal usage of CPU and other resources.
154) What are examples of an OLTP environment?
Answer : There are 2 examples of OLTP environment, they are ATM and POS.
155) When should statistics be collected?
Some good guidelines for collecting statistics:
- All non-unique indexes
- Non-index join columns
- Primary index of small tables
- Secondary indexes mentioned in the tables
156) How to find the space utilized by a particular table or database in Teradata?
To find out how much space a table is using in a database, use the query below in Teradata:
SELECT
DATABASENAME as DatabaseName
,TABLENAME as TableName
,SUM(CURRENTPERM)/(1024*1024*1024) (DECIMAL(15,6)) as Current_Size
FROM DBC.TABLESIZE
WHERE DATABASENAME = 'YOUR_DATABASE_NAME' AND TABLENAME = 'YOUR_TABLE_NAME'
GROUP BY 1,2;
Here 'YOUR_DATABASE_NAME' is the name of the database and 'YOUR_TABLE_NAME' is the name of the table you want to analyse.
The query returns three columns: Database, Table and Current_Size (in GB), from which you can easily see how much space the table occupies in the database. If you want to see how much space all the tables in a database occupy, just remove the TABLENAME filter from the query; it will then list every table in the database with its permanent size.
157) What are the different types of locks in Teradata?
In Teradata, LOCKS are used to maintain the integrity of data present in tables. Teradata itself decides which kind of lock to apply, though in some cases users can also specify the lock a query should use. Locks can be applied at three levels: DATABASE, TABLE and ROW HASH. There are four types of locks in Teradata:
1) Exclusive: Such LOCKS are applied on a table when it is undergoing any STRUCTURAL change. All DDL commands apply this LOCK. If this lock is applied then no other locks can be applied on that table.
2) Write: Such LOCKS are applied when any tables data demography changes by either DELETE,INSERT or UPDATE operation on it. So whenever a DML operation is performed on the table , such locks are applied. Only ACCESS locks are permissible when WRITE locks are applied on the table. And ACCESS Locks may lead to STALE DATA READ from the table since data demography would be changed by DML operations holding WRITE LOCKS.
3)READ: Such LOCKS are applied when user tries to READ data from the table and don’t want to fetch STALE DATA. READ locks maintains the data integrity as the DATA in the tables cannot be modified while READ LOCK is applied on the tables. READ LOCK only allow READ or ACCESS LOCK on the table. READ Locks are the default lock used by Teradata while fetching data from any Table.
4)ACCESS: Such locks are applied when table needs to be accessed and the data demographics is not that important; even stale data is ok for the process. ACCESS LOCKS allows READ, WRITE or ACCESS Locks to be applied on the table. However EXCLUSIVE locks are not allowed.
It can be considered as a good practice to apply ACCESS locks on the source table when multiple SQL Queries are using same Source Table. Even if the TARGET table is different still it can lead to bad performance as by default TERADATA applied READ lock so other queries may have to wait to change the demography in SOURCE table. Understanding LOCKS and LOCKING table manually can be very useful in an environment where several tables are loaded parallely. Teradata automatically blocks the query trying to apply incompatible LOCK on the table already locked by some other query. You can use below command before your actual DML commands:
LOCKING SOURCE_TABLE FOR ACCESS
INSERT INTO TARGET_TABLE
SELECT * FROM SOURCE_TABLE;
The other good practice is to use the PRIMARY INDEX column while fetching data from the SOURCE table. In such cases, the query becomes a single-AMP operation and Teradata applies the lock at ROW HASH level rather than at table level.
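For instance, a single-row lookup through the primary index can be combined with an explicit row-level ACCESS lock (a sketch; EMP is an illustrative table with EMP_ID as its primary index):

```sql
LOCKING ROW FOR ACCESS
SELECT *
FROM   EMP
WHERE  EMP_ID = 1001;  -- PI access: single-AMP, lock held at row-hash level only
```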
158) How to create a table in Teradata ?
Answer : Creating a table explicitly gives you much more control over the table. The syntax for CREATE TABLE in Teradata can be:
CREATE [SET/MULTISET] TABLE DATABASENAME.TABLENAME,
[NO] FALLBACK,
[NO] BEFORE JOURNAL,
[NO] AFTER JOURNAL
(COLUMN1 datatype, COLUMN2 datatype, … COLUMNN datatype)
[UNIQUE] PRIMARY INDEX [INDEXNAME] (COLUMN1,COLUMN2…COLUMNZ)
[UNIQUE] INDEX [INDEXNAME] (COLUMN1,COLUMN2…COLUMNG);
Now let us see what each of these options actually means: SET tables reject duplicate rows while MULTISET tables allow them; FALLBACK keeps a copy of every row on another AMP for protection against AMP failure; BEFORE/AFTER JOURNAL control whether before- and after-images of changed rows are journaled; PRIMARY INDEX determines how rows are distributed across the AMPs; and the optional INDEX clause creates a secondary index.
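A minimal concrete example of this syntax (the database, table and column names are illustrative):

```sql
CREATE MULTISET TABLE HR_DB.EMPLOYEES,
  FALLBACK,
  NO BEFORE JOURNAL,
  NO AFTER JOURNAL
(
  EMP_ID   INTEGER NOT NULL,   -- distribution column
  EMP_NAME VARCHAR(100),
  EMP_CITY VARCHAR(50)
)
UNIQUE PRIMARY INDEX (EMP_ID);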
159) How to use analytical function as row number with qualify ?
We can use Teradata QUALIFY ROW_NUMBER as an alternative to the GROUP BY clause or DISTINCT. QUALIFY works on the final set of result rows, and with ROW_NUMBER we can keep a specific record or a specific number of records. To see why this is useful, consider a table with SCD2 implemented where we want the latest value of a column. E.g., an employee in table EMP changes location from one city to another within the same company. Under SCD2 we add a new record with the new location and the date from which it is effective. If the requirement is to select only the latest location for each employee, we can use QUALIFY ROW_NUMBER to obtain the desired result.
SQL Query will be somewhat like :
SELECT EMP_NAME,EMP_LOCATION FROM EMP QUALIFY ROW_NUMBER() OVER ( PARTITION BY EMP_NAME ORDER BY DATE_EFFECTIVE DESC) = 1;
- This query returns all employee names with their latest location. ROW_NUMBER fetches the first record, and since we order by date in descending order, the latest record is fetched for each employee.
- QUALIFY ROW_NUMBER is also very useful in complex queries with left and right joins involving several tables: we can implement all the required rules and, at the end, refine the result set using QUALIFY ROW_NUMBER. It can also be very helpful in removing Unique Primary Key Violation errors while loading data into a target table.
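For the loading case, QUALIFY ROW_NUMBER can deduplicate a source feed before it reaches the target, avoiding unique-primary-index violations. A sketch (table and column names are illustrative):

```sql
INSERT INTO TARGET_TABLE (CUST_ID, CUST_NAME, LOAD_DATE)
SELECT CUST_ID, CUST_NAME, LOAD_DATE
FROM   STAGING_TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY CUST_ID
                           ORDER BY LOAD_DATE DESC) = 1;
```

Only one row per CUST_ID (the most recent one) is inserted into the target table.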
160) How to find the number of AMPs in any system?
Answer : SELECT HASHAMP()+1; (We have to add 1 because AMP numbering starts at 0.)
161) Which one is better IN or BETWEEN?
Answer. If we have to search for a range of values, it is always advisable to use BETWEEN rather than a list of values in an IN clause. BETWEEN tells the optimizer that we are handling a range of values rather than individual values; hence, BETWEEN is much faster than IN.
162) What are disadvantages of Secondary Index?
Answer. Secondary Indexes need more I/O. They slow down inserts, updates and deletes because of sub-table maintenance, and they require more physical storage.
163) What is default session transaction mode in Teradata?
Answer. Teradata has two session transaction mode : BTET(Default) and ANSI. To change session mode we use following command before establishing session: .SET SESSION TRANSACTION (ANSI/BTET)
164) What is MSR (Multi Statement Request) in Teradata?
Answer. Sometimes we may have to insert data into a table by treating several INSERT blocks as one transaction. In that case, we use MSR. It is implemented by putting the semicolon ";" at the front of the next statement rather than at the end of the existing one. eg:
INSERT INTO EMPLOYEES(NAME,AGE) VALUES(‘WILL’,26)
;INSERT INTO EMPLOYEES(NAME,AGE) VALUES(‘SMITH’,28);
165) Which is better GROUP BY or DISTINCT to remove duplicates present in a table?
Answer. It depends on the table size. If there are more than half a million records, GROUP BY performs much better than DISTINCT. However, if there are very few records, DISTINCT performs better than GROUP BY.
166) How can you apply access lock on table rather than read lock?
Answer. We can override default read lock on any table by explicitly mentioning the ACCESS LOCK before using the table. Syntax is:
LOCK TABLE EMPLOYEES FOR ACCESS
SELECT * FROM EMPLOYEES;
167) Different method of releasing the FASTLOAD lock on the table ?
Answer. Execute the FASTLOAD script after removing the DML block, i.e. keep only BEGIN LOADING and END LOADING in the FASTLOAD script. Another method is to drop the table and create it again.
168) What does SLEEP command do ?
Answer. SLEEP command is used to specify the waiting time Teradata will take before re-trying for establishing the connection or session.
169) What does Tenacity command do ?
Answer. TENACITY command is used to specify the total time Teradata will continue trying to establish the connection, after which Teradata won't make any more attempts.
170) How to skip 5 rows while reading data from file using Teradata Utilities?
Answer. In BTEQ, we can use the SKIP command; in other utilities we can use the RECORD command to specify from which record Teradata should start reading data from the file.
177) How many Join Strategies are available in Teradata?
Answer. There are five Join Strategies available in Teradata : Merge, Nested, Hash, Product, Exclusion.
178) Different levels of COnfidence Optimizer may have while creating EXPLAIN Plan?
Answer. HIGH CONFIDENCE, LOW CONFIDENCE, NO CONFIDENCE.
179) Can we load duplicate data using FASTLOAD?
Answer. No, we cannot load duplicate data via FASTLOAD. FASTLOAD's property is to discard duplicate records. When we restart FASTLOAD, it resends the data from the last checkpoint and may therefore send some rows again; FASTLOAD identifies such records as duplicates and discards them.
180) Can we have several Primary Index on a table?
Answer. No, we cannot have multiple Primary Indexes on a table. However, we can create up to 32 Secondary Indexes on a table.
181) How to save a string with single quote (‘) as part of the value?
Answer. Double the single quote inside the string: INSERT INTO CUSTOMER(COMMENTS) VALUES('I''m Good');
182) How to select 5th row from a table?
Answer. SELECT * FROM TABLENAME QUALIFY ROW_NUMBER() OVER (ORDER BY COLUMN1) = 5;
183) Where is password stored in Teradata?
Answer. Users can view the password in the DBC.DBASE view; however, it will be in encrypted format.
184) How to find Teradata Release and Version details?
Answer. SELECT * FROM DBC.DBCINFO;
185) How many phases are in MultiLoad?
Answer. MultiLoad has 5 phases :
- Preliminary Phase
- DML Transaction Phase
- Acquisition Phase
- Application Phase
- Clean-Up Phase
186) How many populated tables are supported by MultiLoad?
Answer. MultiLoad can support up to 5 populated tables.
187) What are the DML operations possible in MultiLoad?
Answer. INSERT, UPDATE, DELETE and UPSERT (update-else-insert) operations are possible in MultiLoad.
188) Does Multiload use Transient Journaling ?
Answer. Multiload does not use Transient Journaling. It uses a LOG table to maintain the process status.
189) Can we do selective loading in Multiload?
Answer. Yes, we can do selective loading in MultiLoad by specifying a condition with the APPLY statement. eg: APPLY (DML_LABELNAME) WHERE FIELD_NAME = 'some value'
190) How can you specify the charset to be used for MULTILOAD?
Answer : mload -c [UTF8/ASCII…] scriptname
191) What is the difference between UNION and UNION ALL in Teradata SQL?
Teradata SQL UNION and UNION ALL are used to retrieve data from multiple SELECT blocks and merge them into one resultset. Below is the SQL UNION syntax:
SELECT EMP_NAME, EMP_CITY FROM EMPLOYEES
UNION
SELECT CUST_NAME, CUST_CITY FROM CUSTOMERS;
The above query will retrieve all the data from first SQL SELECT query and also retrieve the data from second SQL SELECT query and merge the data and display the resultset. SQL UNION will remove the duplicate entries and will display only one row if duplicate rows are present in final resultset.
Below is the SQL UNION ALL syntax:
SELECT EMP_NAME, EMP_CITY FROM EMPLOYEES
UNION ALL
SELECT CUST_NAME, CUST_CITY FROM CUSTOMERS;
SQL UNION ALL operates similarly to UNION and displays a result set combining the rows from multiple SELECT clauses. However, SQL UNION ALL does not remove duplicate entries, so the final result set may contain duplicate rows. Since there is no duplicate-row checking, SQL UNION ALL is faster than SQL UNION.
Important point to consider while writing SQL UNION & UNION ALL queries:
The SELECT statements should have an equal number of columns, and the column datatypes should match across all SELECT statements.
The UNION or UNION ALL operator takes the technical metadata (column names and widths) from the first SELECT statement for the complete result set. So you should make sure that the columns in the first SELECT clause have the highest column width, in order to avoid truncation of column values in the final result set.
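The width caveat can be handled with an explicit CAST in the first SELECT (a sketch; the names and widths are illustrative — suppose EMP_NAME is VARCHAR(30) and CUST_NAME is VARCHAR(100)):

```sql
SELECT CAST(EMP_NAME AS VARCHAR(100)) AS NAME,  -- widen to the largest source column
       EMP_CITY AS CITY
FROM   EMPLOYEES
UNION ALL
SELECT CUST_NAME, CUST_CITY
FROM   CUSTOMERS;
```

Without the CAST, customer names longer than 30 characters would be truncated, because the first SELECT defines the result-set metadata.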
192) What are various teradata datatype sizes ?
If you know the storage requirement of the different datatypes used in your tables, you can compute space requirements easily. The storage requirements for a few of the most common datatypes in Teradata are:
BYTEINT – 1 BYTE
SMALLINT – 2 BYTES
INTEGER – 4 BYTES
BIGINT – 8 BYTES
DECIMAL – 1 BYTE to 16 BYTES (depending on precision)
FLOAT – 8 BYTES
DATE – 4 BYTES
TIME – 6 BYTES
TIME WITH TIME ZONE – 8 BYTES
TIMESTAMP – 10 BYTES
TIMESTAMP WITH TIME ZONE – 12 BYTES
These are a few of the most common datatypes used in Teradata and their respective storage requirements. The range for a signed integer datatype can be derived from its size: -2^(bytes*8-1) to 2^(bytes*8-1)-1. For example, SMALLINT uses 2 bytes, so its range is -32768 to 32767.
193) How can we add new columns in an existing table after a specific column ?
Answer : You can't do it directly; ALTER TABLE always appends new columns at the end of the table.
If we need the new column in a specific position, such as the 1st column or in the middle, then you need to do the following:
1) RENAME EXISTING TABLE TO BACKUP TABLE
2) CREATE THE NEW TABLE WITH THE SPECIFIED COLUMN POSITION THAT YOU WANT
3) INSERT INTO NEW TABLE SEL OLD COLUMNS FROM BACKUP TABLE (you should use an explicit SEL column list, since the new column will not be in the backup table)
4) DROP THE BACKUP TABLE.
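The steps above can be sketched as follows (the database, table and column names are illustrative):

```sql
-- 1) Rename the existing table as a backup
RENAME TABLE MY_DB.EMP TO MY_DB.EMP_BKP;

-- 2) Re-create the table with the new column (NEW_COL) in the desired position
CREATE MULTISET TABLE MY_DB.EMP
( EMP_ID   INTEGER,
  NEW_COL  VARCHAR(20),    -- new column placed second
  EMP_NAME VARCHAR(100) )
UNIQUE PRIMARY INDEX (EMP_ID);

-- 3) Copy the data back, listing only the old columns
INSERT INTO MY_DB.EMP (EMP_ID, EMP_NAME)
SELECT EMP_ID, EMP_NAME FROM MY_DB.EMP_BKP;

-- 4) Drop the backup once the copy is verified
DROP TABLE MY_DB.EMP_BKP;
```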