Tuesday, October 12, 2010

ONLINE ARCHIVAL procedure and troubleshooting.

ONLINE ARCHIVE

Online archiving is a facility that allows concurrent DML operations to run while the archive process is archiving. ONLINE ARCHIVE means the databases or tables being archived are not locked against DML, so the archive can be taken during real-time business operation. How it works: when the archive starts, a logger is triggered that captures all DML changes happening during the archive process; once the archive finishes, the logger applies all of those changes, whether they end up as commits or rollbacks. Remember the logger is active only while the archive is running and is no longer active once the archive job finishes its archival process.

Code for ONLINE ARCHIVE

ARCHIVE DATA TABLES (DATABASE.TABLE),
RELEASE LOCK,
ONLINE,
FILE=;

How to find which tables or databases are currently being archived ONLINE:

select CreateTimeStamp,
DatabaseName (VARCHAR(30)),
TVMName (VARCHAR(30))
from DBC.ArchiveLoggingObjsV;

DBC.ArchiveLoggingObjsV is a view that provides information about the tables that have an active log for an online archive.

Notes: It is not recommended to archive view databases with the ONLINE option. I came across a bug (as of October 10th, 2010): when you archive view databases online, the archive job does not release the online logging active log, which prevents any DDL changes, such as refreshing a view, even after the archival process has completed. You need to issue an explicit OFF statement, shown below:

LOGGING ONLINE ARCHIVE OFF FOR (DATABASENAME), OVERRIDE;
---Use OVERRIDE when the user ID issuing this statement is different from the user ID that turned ON the online logging.
---Also note that this must be issued as an ARC statement, not as SQL; it is not like the SQL statement you would use to release HUT locks from a SQL tool.
Below is a simple script and how to invoke it.

step1:
create an ARC script with any name, for example release_online_arc_lock.arc
step2:
the code for the script should look like this:
.LOGON TDPID/dbc, ;

LOGGING ONLINE ARCHIVE OFF FOR (DATABASE), OVERRIDE;

.LOGOFF;

step3:
After saving the script, open a command prompt (a DOS command prompt, assuming the BAR server is a Windows box), type the statement below and press Enter:
arcmain release_online_arc_lock.arc
step4: You should see output like the following:

C:\Documents and Settings\vkillada>arcmain release_lock.arc
10/11/2010 13:19:42 Copyright 1989-2008, Teradata Corporation.
10/11/2010 13:19:43 All Rights Reserved.
10/11/2010 13:19:43
10/11/2010 13:19:43 *** **** ****
10/11/2010 13:19:43 * * * * * PROGRAM: ARCMAIN
10/11/2010 13:19:43 ***** **** * RELEASE: 13.00.00.05
10/11/2010 13:19:43 * * * * * BUILD: 080221eW (Nov 12 2009)
10/11/2010 13:19:43 * * * * ****
10/11/2010 13:19:43
10/11/2010 13:19:43 RESTARTLOG = ARCLOG101011_131943_130196.RLG
10/11/2010 13:19:43
10/11/2010 13:19:43 PARAMETERS IN USE:
10/11/2010 13:19:43
10/11/2010 13:19:43
10/11/2010 13:19:43
10/11/2010 13:19:43 CHARACTER SET IN USE: ASCII
10/11/2010 13:19:43 .LOGON TDPID/dbc,;
10/11/2010 13:19:45 LOGGED ON 3 SESSIONS
10/11/2010 13:19:45
10/11/2010 13:19:45 DBS LANGUAGE SUPPORT MODE Standard
10/11/2010 13:19:45 DBS RELEASE 12.00.03.08
10/11/2010 13:19:45 DBS VERSION 12.00.03.08
10/11/2010 13:19:45
10/11/2010 13:19:45 STATEMENT COMPLETED
10/11/2010 13:19:45
10/11/2010 13:19:45
10/11/2010 13:19:45 LOGGING ONLINE ARCHIVE OFF FOR (DATABASENAME),
OVERRIDE;
10/11/2010 13:19:45 UTILITY EVENT NUMBER - 326813
10/11/2010 13:19:46 STATEMENT COMPLETED
10/11/2010 13:19:46
10/11/2010 13:19:46
10/11/2010 13:19:46 .LOGOFF;
10/11/2010 13:19:47 LOGGED OFF 3 SESSIONS
10/11/2010 13:19:47 STATEMENT COMPLETED
10/11/2010 13:19:47
10/11/2010 13:19:47
10/11/2010 13:19:47 ARCMAIN TERMINATED WITH SEVERITY 0

Online logging will be triggered again the next time an online archive arcmain job is initiated.
Now if you run the query below again, you should no longer see the database whose log you previously could not release; the script above explicitly releases the online archive logging lock.


select CreateTimeStamp,
DatabaseName (VARCHAR(30)),
TVMName (VARCHAR(30))
from DBC.ArchiveLoggingObjsV;

If you have any questions related to archival and logging issues, please feel free to email me at vasudev.killada@gmail.com, or leave a comment so I can update this blog with more information.

______________________________________________________________

Wednesday, October 6, 2010

Rollback Priority in Teradata

Rolling back is a CPU-intensive job. It happens when a user aborts a query that has made DML changes to the database, and the database then has to undo all of those changes. This is done to maintain transaction integrity and is handled through the transient journal, also called the TJ. All incomplete transactions are stored in the transient journal; the system table is DBC.TransientJournal.
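To see how much work the transient journal is holding at a given moment, one quick check is its current size in the dictionary. The query below is only a sketch against the standard DBC.TableSizeV view (the MB conversion is just for readability); a figure that keeps growing usually means a large transaction or rollback is in flight.

SELECT DatabaseName,
       TableName,
       SUM(CurrentPerm)/1024/1024 AS CurrentPermMB
FROM   DBC.TableSizeV
WHERE  DatabaseName = 'DBC'
  AND  TableName    = 'TransientJournal'
GROUP  BY 1, 2;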

Imagine you have millions to billions of rows that need to be rolled back. The caveat is that rollbacks are executed in the RUSH performance group of the default partition when the DBS setting is FALSE, which is the default. The rollback job will then compete for CPU with other users and can hurt performance badly. In addition, the rollback holds locks on the tables involved, so users cannot access them. You can either wait until the rollback finishes or cancel the rollback job; I've written in this blog how to cancel a rollback job, see http://goterabytes.blogspot.com/2010/08/canceling-rollback-job.html

You can cancel the rollback to get the table available again. The RUSH priority is there for a reason: it lets rollbacks finish quickly so the table is available to the business. But if other users are being impacted, the rollback should not be run in the highest performance group of the default partition.


ROLLBACKPRIORITY SET TO TRUE:

If RollbackPriority is TRUE, rollbacks are executed within the aborted job's performance group and associated resource partition. The intent of this is to isolate the rollback processing to the job's performance group, while not affecting performance of the rest of the system. TRUE is better for smaller rollbacks to non-critical, less extensively used tables.

This setting only affects rollbacks from sessions aborted by the user; it has no effect on rollbacks caused by a DBS restart. The restart rollback priority can be configured with the rcvmanager utility.
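To check or change RollbackPriority you use the DBS Control (dbscontrol) utility on a TPA node. The snippet below is only a sketch: the General field number for RollbackPriority differs between releases, so read it from the DISPLAY GENERAL output before modifying anything.

dbscontrol
   display general                 <-- note the field number shown for RollbackPriority
   modify general <field #> = true <-- <field #> is the number taken from the display output
   write                           <-- saves the change to the DBS Control record
   quit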

Swapping in Teradata

Swapping in Teradata is bad and has a big impact on performance, and a few tweaks can make a visible difference.
One setting among many that I worked with recently is FSGCACHE%. The more the swapping, the bigger the impact on performance. There has to be a balance between Teradata resources and OS resources: the operating system needs room to run its own processes. We certainly do not want to starve the cache in favor of disk reads, but once we swap the whole layout and picture change, and all the AMPs end up throttled by a single swap drive. Everything outside of FSGCACHE is the room the O/S needs to run.

To eliminate swapping on performance-critical boxes, it is highly recommended to look into the FSGCACHE %. It is preconfigured when the system is shipped, but that value does not suit every business. If swapping is significantly bad, lowering the value will have a bigger impact.

Things to look at:
• Swap on system
• Skew between nodes
• LINUX run queues

How to look at swap on a LINUX system

This can be done using the sar utility in LINUX, the system activity reporter.
Samples are taken every five minutes throughout the day, and the impact of swap can be read and reviewed by running the following commands:

psh sar -W
This will output the entire day's samples whenever it is run. The "psh" part runs it across all the nodes. Do not expect to see anything on the Hot Standby (HSN) node, since Teradata is not actively running on the HSN.

Options of the sar command that are frequently used:

sar -u
%nice - this takes the place of %user that we used to use in MP-RAS
%sys - this is the time spent on system calls
%IO wait - This is all I/O on the system (disk, bynet, etc.).
%cpu idle - This is a good indication of CPU utilization
I like to see %nice to %sys at around 60-70% / 30-40%.
%IO wait can be highly variable since it records all I/O. I don't use it too much.
%cpu idle can certainly run down to 0%, but for extended periods, probably means you are near capacity.

sar -q
runq-sz - is the number of runnable Linux processes waiting to be serviced. This is a good indicator of application activity. Teradata primarily runs AWT processes. Since you have 63 usable AWTs, when the run queue exceeds that, processes wait. It is not bad to wait sometimes, and this queue gets drained
and refilled at a high rate. When I see periods of 100+ in the run queues for sustained periods of time, this will indicate a high degree of concurrent work. This can also be compared between nodes on a system to get a view of overall skew on the machine.

sar -W
pswpin and pswpout - I add these together to get an overall picture of swap. Swap in Teradata is bad: it throttles the AMPs from performing efficiently, and FSGCACHE is the knob used to tune it. Occasional bursts in the 0 - .5 blocks/sec range will not impair operations, but sustained activity in that range should be tuned. Ranges from .5 to 5 blocks/sec need tuning to eliminate them; these will impact performance. Ranges above 5 require emergency tuning, since their impact on Teradata operation is extreme. These figures are averages in blocks/sec, so within a 5-minute sample interval they can hide much higher peak values lasting many seconds at a time, which could severely impact <10 sec SLAs for high-performance queries.

For more information, the manual page for sar can be viewed on the system itself.

Peeking facility in Teradata

What is peeking?

Peeking means that the Optimizer checks all potential optimizations by looking at the USING values at query parsing time and uses them to pick an efficient path to satisfy the request.

What is a generic plan?
A generic plan is a single plan shared by all like queries; it is cached and reused to save parsing time (the CPU time spent parsing, generating a plan and sending it across to the Dispatcher). A cached plan, however, is not always the right plan.

What is a specific plan?

A plan that is generated specifically for a query and can differ for different USING values.

What determines whether the Optimizer uses a specific plan?

The DisablePeekUsing field in the DBS Control settings determines whether specific or generic plans are used. It ships as FALSE by default, meaning the Optimizer performance enhancements are enabled: the Optimizer can examine the actual USING values and pick an efficient path. Setting it to TRUE disables these enhancements.
When the setting is TRUE, parameterized queries are no longer eligible for specific plans and generic plans are used for all of them; this eliminates the parsing overhead once peeking is turned off.
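As an illustration (the table and parameter names here are made up), a parameterized request submitted with a USING request modifier looks like the one below. With peeking enabled, the Optimizer may look at the actual value bound to :txn_date while parsing and build a specific plan; with DisablePeekUsing set to TRUE it settles for the cached generic plan.

USING (txn_date DATE)
SELECT store_id, SUM(sale_amt)
FROM   sales_db.daily_sales
WHERE  sale_date = :txn_date
GROUP  BY store_id;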

Tuesday, August 31, 2010

Amp Worker Tasks behavior using STREAM or TPUMP operator.

TPump, now called Stream under the covers of TPT (the TD12 Teradata implementation for real-time trickle feeds), is a row-level operation that applies each DML change to the row hash determined by the hashing algorithm. There can be application-level contention when operations on a table with a NUPI hit the same rowhash ID. This can be controlled by proper configuration of the parameters and settings (SLEEP, TENACITY, SERIALIZE, SESSIONS, etc.); a sketch of a TPump script using these options follows below. The session blocking is only momentary, but it can sometimes lead to contention overhead and nuisance alerts to the DBAs. Having too many sessions is as harmful to performance as having as few as one, so careful analysis in choosing the session count will improve not only TPump throughput but also overall system performance.
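For illustration only, here is a minimal classic TPump script showing where those knobs live. Every logon, object name, file name and value is a placeholder, and the option values should come from your own analysis rather than from this sketch.

.LOGTABLE work_db.tpump_restart_log;
.LOGON tdpid/etl_user,etl_pwd;

.BEGIN LOAD
     SESSIONS 4           /* number of sessions - size with care            */
     PACK 20              /* statements packed into one request             */
     SERIALIZE ON         /* rows with the same key go to the same session  */
     TENACITY 4           /* hours to keep retrying for a logon             */
     SLEEP 6              /* minutes to wait between logon retries          */
     ERRORTABLE work_db.tpump_err;

.LAYOUT cust_layout;
.FIELD cust_id    * VARCHAR(10) KEY;   /* KEY is required when SERIALIZE ON */
.FIELD cust_name  * VARCHAR(30);

.DML LABEL ins_cust;
INSERT INTO prod_db.customer (cust_id, cust_name)
VALUES (:cust_id, :cust_name);

.IMPORT INFILE customer.dat
     FORMAT VARTEXT '|'
     LAYOUT cust_layout
     APPLY ins_cust;

.END LOAD;
.LOGOFF;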

System level design for TPUMP operation w/ table designed as fallback

When a DML operation is performed on a fallback table, a P2P (point-to-point) protocol message is spawned from the primary AMP to the fallback AMP, which then stores the fallback copy of the row. The entire operation needs two AWTs (AMP Worker Tasks), double the number used by a regular non-fallback DML operation.

For example, here is how it works when a user submits an insert on a table defined with fallback.
Steps:
1. An AMP Worker Task is assigned on the AMP that has the primary copy. Once the insert succeeds, a message is transmitted to the fallback AMP.
2. Once the fallback AMP receives the message, it tries to perform the insert on its side, which needs another AWT. The AWT on the primary is not relinquished until the operation succeeds on the fallback, so two AWTs are held until the fallback operation completes. If the fallback AMP does not get an AWT right away, the request waits in a queue and the primary AMP still does not relinquish its AWT. Call it a design flaw or a data-integrity safeguard, but that is how it works.

So if you are seeing delays on top of the deadlocks TPump can hit with multiple sessions, AWTs could also be a factor; the effect is normally minimal, but on a busy day with the system in flow control it worsens the whole performance picture.

Recommended actions/solutions:

1. I don't recommend this, but some sites do it: reserve some AWTs for TPump jobs. Reserving AWTs follows a 2n+1 rule, where n is the number of AWTs to be reserved, and the spawning of extra AWTs makes things worse when we reserve.
2. Use both object and workload throttles so that enough AWTs stay available in the pool and TPump jobs get good throughput. I recommend this approach; it has worked much better.

Canceling a rollback Job

A big rollback job can bring the system to its knees. If the DBS setting RollbackPriority is FALSE, the rollback takes top priority and will not honor any rules related to CPU assignment, TASM or anything else: it runs at RUSH priority and can bring your system down. Rollback (like the ALTER statement) never honors your CPU limitations.

I started canceling rollback jobs on TD12, though the capability has been available since the V2R5.1 release. Canceling may even require dropping the table on which the cancel rollback is applied, so there is a chance of losing data; if you have it in a backup, you can restore or copy the data back from the backups.

So what causes a bad rollback?
A user submits a DML operation that, after running a while, does not return results in the stipulated time or simply takes too long. The user can either wait until it finishes or abort it; if they choose to abort, a rollback starts on the table, and it can take even longer to relinquish the resources.

How to avoid it?
Avoid triggering the rollback in the first place, or if one has already started, consider canceling it. I suggest setting RollbackPriority to TRUE in the General fields of the DBS Control Record, which keeps a rollback from becoming a resource hog.
For small rollback jobs the time taken may be negligible, but for big jobs the rollback needs to finish as soon as possible so that the ETL team can continue loading, or start loading the same table again with the correct update/insert statements.

Notes:
Remember that most of the time you will want to drop the table and create a new one after canceling a rollback, as the table is likely to be left in an inconsistent state.
The rcvmanager utility is used to do the job, using the root password for the system, assuming it is a LINUX box.

Steps to cancel a rollback job (a console sketch follows these steps):
1) Log into the node using the root username and password.
2) Connect to the CNS by typing cnsterm 6 to find which window rcvmanager can run in; usually it is window 1. Press Ctrl+C to come out of it and type cnsterm 1.
3) Type start rcvmanager, and once you get the command prompt, type: list rollback tables
4) From the list of rollback tables, note the table ID of the table whose rollback you need to cancel.
5) The table ID is a hex value, so make sure you copy the values both before and after the colon (:).
6) Issue the cancel rollback statement: cancel rollback on table <table ID>. It prompts for the DBC password.
7) To verify the status, issue list cancel rollback tables to get information about the rollback; it will be in a pending state.
8) Once done, you can see the system performance and speed recover. At times during a rollback I have observed that no more logons are allowed by the database; once the rollback is cancelled, all jobs and resources return to normal behavior.
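To make the flow concrete, here is a rough console sketch of the same sequence; the window numbers and the table ID are placeholders and will differ on your system.

cnsterm 6                 (supervisor window: issue "start rcvmanager" here,
                           note the window it starts in, then Ctrl+C to exit)
start rcvmanager
cnsterm 1                 (attach to the window rcvmanager started in)

list rollback tables;
cancel rollback on table 000A:2F41;   (hypothetical table ID taken from the list;
                                       you are prompted for the DBC password)
list cancel rollback tables;
quit;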

Hope this article helps folks who find themselves in this situation.

Why do we need Query Banding?

By enabling the Database Query Log (DBQL) and installing the Performance Data Collection and Reporting (PDCR) database in all Teradata environments, the organization gains a wealth of log, statistical and audit information.
The DBQL and PDCR log and store historical data about queries including query duration, CPU consumption and other performance metrics. The data offers information to calculate suspect query indicators such as large-table scans, skewing and highly consumptive joins. In addition, the data provides diagnostic metrics to help tune applications, monitor performance, manage capacity and operate the Teradata system at peak efficiency.
To fully leverage DBQL and PDCR capabilities, it is imperative for the BI group to enable query banding in the universe. Query banding tags each BusinessObjects query submitted to Teradata with a set of attributes and values that describe the query in detail, which allows us to tune BusinessObjects applications effectively by identifying target reports and proactively preventing report performance problems.
Query banding is especially valuable for our BusinessObjects applications because we use generic accounts to access the Teradata database; the band enables us to identify the unique attributes of each request.
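As a hedged example of what you get back, once the band is in place the values appear in the DBQL QueryBand column and can be reported on. The query below is only a sketch against the standard DBC.QryLogV view; adjust it to your PDCR history tables if that is where you report from.

SELECT QueryID,
       UserName,
       QueryBand,
       AMPCPUTime,
       TotalIOCount
FROM   DBC.QryLogV
WHERE  QueryBand LIKE '%DOCNAME%'
ORDER  BY StartTime DESC;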

How to set Query Banding in universe parameter?

The Teradata database supports passing parameters before the SELECT statement. The BEGIN_SQL parameter allows us to prefix the generated SQL with the query banding values each time a SQL statement is generated. Below are the steps to set up query banding in the universe:

• Start the Universe Designer
• Open your Universe (for existing Universes, please import them first)
• Click on File menu -> Parameters…
• Click on Parameter tab
• In the "Parameter" list, click on BEGIN_SQL parameter
If it does not exist, add one.



• Enter value for BEGIN_SQL:
Copy the following name/value pairs into the Value box:

SET QUERY_BAND = 'USER = '@Variable('BOUSER')'; DOCNAME = '@Variable('DOCNAME')'; DPNAME = '@Variable('DPNAME')'; DPTYPE = '@Variable('DPTYPE')'; UNIVERSE = '@Variable('UNVNAME')'; UNIVERSEID = '@Variable('UNVID')';' FOR TRANSACTION;

Variable description:
a) BOUSER - user login
b) DOCNAME - the name of the document
c) DPNAME - the name of the Data Provider
d) DPTYPE - the type of the Data Provider
e) UNVNAME - the name of the universe
f) UNVID - the ID of the universe used

• Click Replace



• Click OK
• Save the Universe
• Export the Universe to CMS

Oracle to Teradata Migration Project tips

Introduction:
The decision to move from OLTP systems such as Oracle, with multiple data marts, to a Teradata Enterprise Data Warehouse is not complicated to start with, but it needs the right skills and planning. Done well, it gives the business the edge of a 3NF physical model, linear scalability and mixed-workload capacity in an active, challenging day-to-day business environment.

Birds Eye view agenda:
• Get the business requirements from the client: anything that is necessary for, or related to, the migration
• Have an approach for designing the physical database: tables, indexes, views and other database objects.
• Gather all business-related log and historic information relevant to the data migration.
• Incremental Load conversion


Detail Technical and Business artifact for Migration Implementation

Client related Infrastructure:
• Business-related rules and triggers embedded in the data
• Data model of the existing business and the model for the tables that need to be moved.
• All the software and hardware compatible with the release of Teradata that is in place.
• Make sure the requirements are at least frozen during the port operations on the sending side (Oracle/DB2).
• If multiple data marts are being ported, do one at a time, and then determine the names of the production database and view database.
• Design a strategy for creating profiles and users.
• As part of the data movement and migration, make sure we know what gets moved and how: are we landing the data on disk first, or doing a direct forklift?
• Determine the right PI when designing tables, talking to the ETL and front-end reporting folks.
• Design the initial load jobs with the assumption that the target tables are created.
• Determine the type of slowly changing dimensions and design accordingly.

Approach for Physical DB design and conversion:
• The maximum length of a Teradata database object name is 30 characters, so always keep the naming convention in mind: DB2 OS/390 has a maximum name length of 18, DB2 on Unix allows 128, and Oracle poses no problem since its limit is also 30 characters.
• Ask whether the client has a naming standard; if so, follow it, build a mapping sheet from the 120-character names down to 30 characters, and have the business sit with us to review the sheet and agree on valid names.
• Have all the DDL and indexes converted; for DB2:
o Use the DB2 system catalog as input to a conversion script
o Use a data modeling tool like ERWIN and do reverse engineering
o Use the DB2LOOK command and an editor.
o The same goes for the indexes: follow Teradata index creation criteria, and use the column and index catalog tables unloaded from DB2.
o Converting UDFs to Teradata needs thorough testing, whether the source is DB2 or any other database, because the language and features are different. The easy way to handle this is to convert UDFs into macros or stored procedures, and UDTs into data containers in stored procedures.


Log and Historic migration:

• Determine the migration approach: will an ETL tool do the pull and push, or will the source data be put into flat files that are then loaded into Teradata using the load utilities? Since the target tables are empty, FastLoad can be used for the loading.
• The SLA is another factor that determines how to load the data: either go from the legacy systems through the legacy ETL and then the Teradata ETL methodology into production, or use a scripting tool to generate the load and unload scripts to get the data in place; the second option is an automated process that loads and unloads at the same time (using NDM technology).
• Keep all the rules at hand, for instance the flat-file format rules for the transfer to Teradata: is it delimited, fixed-width, or both?


Incremental Load conversion:

• The requirement can be ported completely to Teradata before the loads run, or data may still be loaded into the DB2/Oracle database during the port operations; there are ways to handle this:
o Either stamp the data with a date so the migration can move it from a point in time, also called incremental load synchronization, or
o Do a refresh: the data is moved at the very end of the migration process, after the source system finishes its last run and before the first run of the new system. If this is not possible for business-continuity reasons, a custom solution needs to be implemented to capture the changes.

Process flow for a smooth data migration

• The migration design should be in place, with all validation defined, before anything is tested.
• Then the ETL and the specs are built, followed by the migration of data.
• Integration testing then comes into the picture, along with the synchronization effort with the parallel systems, if it is implemented that way.
• For huge migration projects automation is recommended, such as processes and macros that report status in real time for tracking purposes.

Conclusion:

For any project the planning phase is crucial: measure a thousand times before you cut. I took some time to write down a few things to share my knowledge and experience as a Lead Teradata DBA, DB Architect and GoldenGate specialist with a strong ETL architecture background, and I kept this document at a high level. I will be more than happy if it helps fill any gaps in your planning and analysis phase.

Fast Path Insert and Delete Statements.

Rules for Fast Path Insert and Delete Statements

Solution/source: Knowledge article

Tech Note

Unconstrained (Fast Path) DELETE Processing
When submitted by itself as a transaction, an unconstrained delete (DELETE...ALL) is processed very quickly. An unconstrained DELETE is one where no conditions are attached, that is, all rows are deleted. When an unconstrained delete is submitted as part of a multistatement transaction, the speed with which it is processed depends on its position in the transaction.
To enable fast path processing, which defers processing until a transaction ends and allows rollback processing to be almost instantaneous, positioning must be as follows, depending on the mode being used.

In Teradata Mode:
• Implicit transaction (such as a multistatement macro or BTEQ request): DELETE...ALL must be the last statement in the request.
• Explicit transaction: DELETE...ALL must be immediately followed by the END TRANSACTION statement that terminates the currently open transaction. This implementation is not available to a Teradata embedded SQL application because the DELETE and END TRANSACTION statements must fall into the same request packet.
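For instance, in Teradata mode a BTEQ request like the following keeps an explicit transaction on the fast path (a minimal sketch; the table name is made up). The leading semicolon on the last line is BTEQ's way of packing the DELETE and the ET (END TRANSACTION) into a single request, as the rule above requires.

BT;

DELETE FROM stage_db.big_stage ALL
;ET;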
======================================================================
Fast Path INSERT … SELECT Statements
Multistatement INSERT … SELECTs are optimized so that each statement except for the last in the series returns a ‘zero row inserted’ message as a response. The retrieved rows for each SELECT are sent to an empty target table. The last INSERT … SELECT statement returns the total number of rows inserted for the entire request and sorts and merges the spool table into the target table.
Columns defined with the COMPRESS option cannot participate in fast path optimizations, so if you perform an INSERT … SELECT operation on compressed columns, fast path optimization is not specified by the Optimizer when it creates an access plan.

Rules for Fast Path INSERT … SELECT Statements
There are several restrictions that must be observed in order for the high performance fast path optimization to succeed in a multistatement request. These rules are as follows:
• The target table must be empty.
• All INSERT statements in the multistatement request must have the same target table.
• Only INSERT statements can be included in the request.
If you insert other statement types into the multistatement request, the fast path optimization does not occur (only the first INSERT … SELECT in the series is optimized) and performance degrades accordingly.
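As a sketch of a qualifying request (database and table names are illustrative), the BTEQ multistatement request below sends several INSERT … SELECTs into the same empty target table; each statement except the last returns 'zero rows inserted', and the last one reports the total and merges the spool into the target.

INSERT INTO tgt_db.sales_all SELECT * FROM src_db.sales_2008
;INSERT INTO tgt_db.sales_all SELECT * FROM src_db.sales_2009
;INSERT INTO tgt_db.sales_all SELECT * FROM src_db.sales_2010;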

Tuesday, May 18, 2010

Goldengate

What is GoldenGate?
GoldenGate enables the exchange and manipulation of data at the transaction level among multiple, heterogeneous platforms across the enterprise. Its modular architecture gives you the flexibility to extract and replicate selected data records, transactional changes, and changes to DDL (data definition language) across a variety of topologies. With this flexibility, and the filtering, transformation, and custom processing features of GoldenGate, you can support numerous business requirements:
● Business continuance and high availability.
● Initial load and database migration.
● Data integration.
● Decision support and data warehousing.