Tuesday, October 12, 2010

ONLINE ARCHIVAL procedure and troubleshooting.

ONLINE ARCHIVE

Online archiving is a facility that allows concurrent DML operations to run while the archive process is working. ONLINE ARCHIVE means that the databases or tables being archived are not locked against DML operations, so the archive can run during real-time business operation. How it works: a logger is triggered that captures all the DML changes happening during the archive process; once the archive finishes, the logger applies all of the captured changes, whether they are commit or rollback statements. Remember that the logger is active only while the archive is running, and is no longer active once the archive job finishes its archival process.

Code for ONLINE ARCHIVE

ARCHIVE DATA TABLES (DATABASE.TABLE),
RELEASE LOCK,
ONLINE,
FILE=;

How to find which tables or databases are archived ONLINE currently:

select CreateTimeStamp,
DatabaseName (VARCHAR(30)),
TVMName (VARCHAR(30))
from DBC.ArchiveLoggingObjsV;

DBC.ArchiveLoggingObjsV is a view that provides information about the tables that have an active log for an online archive.

Notes: It is not recommended to archive view databases with the ONLINE option. I came across a bug, dated October 10th, 2010: when you archive view databases online, the archive job doesn't release the online logging active log, which prohibits any DDL changes (such as REFRESH VIEW) even after the archival process is completed. You need to issue an explicit OFF statement, shown below:

LOGGING ONLINE ARCHIVE OFF FOR (DATABASENAME), OVERRIDE;
---Use OVERRIDE in situations where the user ID logged in to issue the above statement is different from the user ID that turned on ONLINE LOGGING.
---One more note: this needs to be issued as an ARC statement, not an SQL statement. It is not like the SQL statement you would use to release HUT locks from an SQL tool.
Below is a simple script and how to invoke it.

step1:
Create an ARC script with any name, e.g. release_online_arc_lock.arc.
step2:
The code for the script should look like this:
.LOGON TDPID/dbc, ;

LOGGING ONLINE ARCHIVE OFF FOR (DATABASE), OVERRIDE;

.LOGOFF;

step3:
After saving the script, open a command prompt (a DOS prompt, assuming the BAR server is on a Windows box), type the statement below, and press Enter:
arcmain release_online_arc_lock.arc
step4: You should see output like the following:

C:\Documents and Settings\vkillada>arcmain release_lock.arc
10/11/2010 13:19:42 Copyright 1989-2008, Teradata Corporation.
10/11/2010 13:19:43 All Rights Reserved.
10/11/2010 13:19:43
10/11/2010 13:19:43 *** **** ****
10/11/2010 13:19:43 * * * * * PROGRAM: ARCMAIN
10/11/2010 13:19:43 ***** **** * RELEASE: 13.00.00.05
10/11/2010 13:19:43 * * * * * BUILD: 080221eW (Nov 12 2009)
10/11/2010 13:19:43 * * * * ****
10/11/2010 13:19:43
10/11/2010 13:19:43 RESTARTLOG = ARCLOG101011_131943_130196.RLG
10/11/2010 13:19:43
10/11/2010 13:19:43 PARAMETERS IN USE:
10/11/2010 13:19:43
10/11/2010 13:19:43
10/11/2010 13:19:43
10/11/2010 13:19:43 CHARACTER SET IN USE: ASCII
10/11/2010 13:19:43 .LOGON TDPID/dbc,;
10/11/2010 13:19:45 LOGGED ON 3 SESSIONS
10/11/2010 13:19:45
10/11/2010 13:19:45 DBS LANGUAGE SUPPORT MODE Standard
10/11/2010 13:19:45 DBS RELEASE 12.00.03.08
10/11/2010 13:19:45 DBS VERSION 12.00.03.08
10/11/2010 13:19:45
10/11/2010 13:19:45 STATEMENT COMPLETED
10/11/2010 13:19:45
10/11/2010 13:19:45
10/11/2010 13:19:45 LOGGING ONLINE ARCHIVE OFF FOR (DATABASENAME),
OVERRIDE;
10/11/2010 13:19:45 UTILITY EVENT NUMBER - 326813
10/11/2010 13:19:46 STATEMENT COMPLETED
10/11/2010 13:19:46
10/11/2010 13:19:46
10/11/2010 13:19:46 .LOGOFF;
10/11/2010 13:19:47 LOGGED OFF 3 SESSIONS
10/11/2010 13:19:47 STATEMENT COMPLETED
10/11/2010 13:19:47
10/11/2010 13:19:47
10/11/2010 13:19:47 ARCMAIN TERMINATED WITH SEVERITY 0

ONLINE logging will be triggered again if the online archive arcmain script is initiated again.
Now, if you run the query below again, you should no longer see the database whose log you could not get released before the arcmain script ran. The script above explicitly releases the logging lock.


select CreateTimeStamp,
DatabaseName (VARCHAR(30)),
TVMName (VARCHAR(30))
from DBC.ArchiveLoggingObjsV;

If you have any questions related to archival and logging issues, please feel free to email me at vasudev.killada@gmail.com, or at least leave a comment so I can update this blog with more information.

______________________________________________________________

Wednesday, October 6, 2010

Rollback Priority in Teradata

Rolling back is a CPU-intensive job. It happens when a user aborts a query that has made DML changes to the database, and the database then goes through a phase of revoking all those changes. This happens to maintain transaction integrity, which is managed through the transient journal, also called the TJ. All incomplete transactions are stored in the transient journal. The system table is DBC.TransientJournal.
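As a rough sketch of how to watch it (this assumes the DBC.TableSizeV view as shipped with Teradata 12/13; adjust the view name to your release), you can check how much space the transient journal is consuming while a large rollback runs:

```sql
SELECT DatabaseName,
       TableName,
       SUM(CurrentPerm)/1024/1024 (DECIMAL(18,2)) AS CurrentPermMB
FROM   DBC.TableSizeV
WHERE  DatabaseName = 'DBC'
AND    TableName    = 'TransientJournal'
GROUP  BY 1, 2;
```

A steadily growing CurrentPermMB during a rollback gives a feel for how much work remains.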

Imagine you have millions to billions of rows that need to be rolled back, and the caveat is that rollbacks are executed in the RUSH performance group of the default partition when the DBS setting is FALSE, which is the default. The rollback job will then compete for CPU with other users and will impact performance a lot. In addition, the rollback holds locks on the tables involved, which are then not accessible to users. You can wait until the rollback finishes, or you can cancel the rollback job. I've written in my blog how to cancel a rollback job, refer - http://goterabytes.blogspot.com/2010/08/canceling-rollback-job.html
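For reference, canceling is done from the rcvmanager console utility (started from the database Supervisor window, not from an SQL session). A minimal sketch follows; the table name is hypothetical, the exact prompts vary by release, and CANCEL ROLLBACK asks for the DBC password:

```
start rcvmanager

list rollback tables;
/* note the table currently in rollback, e.g. "SALES_DB"."BIG_TXN" (hypothetical) */

cancel rollback on table "SALES_DB"."BIG_TXN";
/* prompts for the DBC password; the table is left in an unusable state
   and typically must be dropped or restored afterwards */

quit;
```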

You can cancel a rollback to regain table availability. Running rollbacks at high priority is designed for their own benefit, so that they finish quickly and the table becomes available to the business. But if other users are impacted, the rollback should not run in the highest performance group of the default partition.


ROLLBACKPRIORITY SET TO TRUE:

If RollbackPriority is TRUE, rollbacks are executed within the aborted job's performance group and associated resource partition. The intent of this is to isolate the rollback processing to the job's performance group, while not affecting performance of the rest of the system. TRUE is better for smaller rollbacks to non-critical, less extensively used tables.

This setting only affects rollbacks from sessions aborted by the user; it has no effect on rollbacks caused by a DBS restart. The restart rollback priority can be configured with the rcvmanager utility.
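For reference, a sketch of flipping the setting with the dbscontrol utility on a node (the field number for RollbackPriority varies by release, so read it off the DISPLAY output rather than hard-coding it; it is shown here only as a placeholder):

```
dbscontrol

display general
/* locate the RollbackPriority field and note its field number */

modify general <field number> = true

write
/* WRITE commits the change to the DBS Control record */

quit
```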

Swapping in Teradata

Swapping in Teradata is bad and has a big impact on performance. We can see the difference with a few performance tweaks.
One of many that I witnessed recently was FSGCACHE%. The more the swapping, the bigger the impact on performance. There should be a balance between Teradata resources and OS resources: the operating system needs room to run its processes. We certainly do not want to starve the cache in favor of disk reads, but when we swap, we change the whole layout and picture, and then all the AMPs are throttled by a single swap drive. What lies outside of FSGCACHE is the room the OS needs to run.

To eliminate swapping on performance-critical boxes, it is highly recommended to look into the FSGCACHE percentage. It is preconfigured when the system ships, but that value does not hold true for every business. If swapping is significantly bad, lowering the value will have a bigger impact.

Things to look at:
• Swap on system
• Skew between nodes
• LINUX run queues

How to look at swap on a Linux system

This can be done using the sar (system activity report) utility in Linux.
Samples are taken every five minutes throughout the day; the impact of swap can be read and reviewed by running the following command:

psh sar -W
This outputs the entire day's samples whenever it is run. The "psh" prefix runs the command across all the nodes. Do not expect to see anything on the Hot Standby (HSN) node, since Teradata is not actively running there.

Options of the sar command that are frequently useful:

sar -u
%nice - this takes the place of %user that we used to use in MP-RAS
%sys - this is the time spent on system calls
%IO wait - This is all I/O on the system (disk, bynet, etc.).
%cpu idle - This is a good indication of CPU utilization
We like to see %nice to %sys at around 60-70% / 30-40%.
%IO wait can be highly variable since it records all I/O; I don't use it much.
%cpu idle can certainly run down to 0%, but for extended periods that probably means you are near capacity.

sar -q
runq-sz - the number of runnable Linux processes waiting to be serviced. This is a good indicator of application activity. Teradata primarily runs AWT processes; since you have 63 usable AWTs, when the run queue exceeds that, processes wait. It is not bad to wait sometimes, and this queue gets drained and refilled at a high rate. When I see run queues of 100+ for sustained periods of time, it indicates a high degree of concurrent work. This can also be compared between nodes to get a view of the overall skew on the machine.

sar -W
pswpin and pswpout - I add these together to get an overall picture of swap. Swap in Teradata is bad: it throttles the AMPs from performing efficiently. FSGCACHE is used to tune this. Occasional bursts in the 0 to 0.5 range will not impair operations, but sustained activity in that range should be tuned. Ranges from 0.5 to 5 blocks/sec need tuning to eliminate them; these will impact performance. Ranges above 5 require emergency tuning, since their impact on Teradata operation is extreme. These are averages in blocks/sec, so within a 5-minute interval (which is what we sample) they could hide much higher peak values for many seconds at a time, which could severely impact <10 sec SLAs for high-performance queries.
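These bands can be applied mechanically to the sar numbers. A small sketch; the sample file below is made-up data in a simplified three-column layout (time, pswpin/s, pswpout/s), not real sar -W output, whose column layout varies by sysstat version:

```shell
# Made-up samples: time, pswpin/s, pswpout/s (simplified layout, not real sar output).
cat <<'EOF' > /tmp/sar_w_sample.txt
13:05:01 0.10 0.20
13:10:01 0.00 0.05
13:15:01 3.20 4.10
EOF

# Sum pswpin/s + pswpout/s per sample and flag the tuning band
# using the ranges above (0-0.5 OK, 0.5-5 needs tuning, >5 emergency).
awk '{ total = $2 + $3
       if      (total > 5)   band = "EMERGENCY"
       else if (total > 0.5) band = "TUNE"
       else                  band = "OK"
       printf "%s total=%.2f %s\n", $1, total, band
     }' /tmp/sar_w_sample.txt
```

The third sample sums to 7.30 blocks/sec and is flagged EMERGENCY; the first two fall in the harmless 0-0.5 band.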

And for more information a manual page for sar can be viewed on the system itself.

Peeking facility in Teradata

What is peeking?

Peeking means checking all potential optimizations by looking at the USING values at query parsing time and using those values to pick an efficient path to satisfy the request.

What is a generic plan?
A generic plan is a single plan for all alike queries; it is cached and reused to save parsing time, i.e. the CPU time taken to parse the query, generate a plan, and send it across to the Dispatcher. But a cached plan is not always the right and best plan.

What is a specific plan?

A plan that is generated specifically for a query and differs for different USING values.

What determines whether the Optimizer uses a specific plan?

The DisablePeekUsing field in the DBS Control settings determines whether specific or generic plans are used. By default it ships as FALSE, meaning the Optimizer performance enhancements are enabled: the Optimizer can examine the various USING values and pick an efficient path. Setting it to TRUE disables these enhancements.
When the setting is TRUE, parameterized queries are no longer eligible for specific plans; generic plans are used for all parameterized queries, which eliminates the parsing overhead once peeking is turned off.
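As an illustration (the table and column names here are hypothetical), a parameterized request looks like this:

```sql
USING (custid INTEGER)
SELECT order_id, order_total
FROM   sales.orders
WHERE  customer_id = :custid;
```

With peeking enabled (DisablePeekUsing = FALSE), the Optimizer can look at the actual value bound to :custid at parse time and build a specific plan for it, for example choosing a different path for a heavily skewed customer; with DisablePeekUsing = TRUE, one cached generic plan is reused for every value.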