Friday, 20 November 2015

Goldengate Replication - Sybase to Sybase Database - Step By Step

This post covers Oracle GoldenGate data replication between two Sybase databases.

Installation is not covered in this post, as it is a simple process of just extracting the
.tar file.


#### Setting Environment Variables on Sybase Server ####



Go to the GoldenGate install home:

cd /GGS/home

$ vi .ggsprof   -- add the following two entries

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/GGS/home

export PATH=$PATH:/GGS/home

$ chmod 775 .ggsprof
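
After saving, the profile can be sourced so the variables take effect in the current session (a quick sketch, assuming .ggsprof is sourced manually or from the login profile):

$ . /GGS/home/.ggsprof
$ echo $LD_LIBRARY_PATH    -- should now end with /GGS/home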

################################################


#### Preparing Sybase Database for GG Capture ####


isql -U dbadmin -P frodo -- logging into isql

>sp_helpdb
>Go

-- will list all databases

>use rknet -- switching to database we want to use
>Go


-- setting the secondary truncation point in the Sybase transaction log

1> dbcc settrunc ('ltm','valid')
2> Go
 ltm_truncpage ltm_trunc_state db_rep_stat gen_id      dbid   dbname   lti_version
 ------------- --------------- ----------- ----------- ------ -------- -----------
       2746034               1           1           0      9 rknet
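
Note - if capture is ever decommissioned, the secondary truncation point should be released again; otherwise Sybase cannot truncate the log past it and the transaction log keeps growing. A sketch using the same dbcc command:

1> dbcc settrunc ('ltm','ignore')
2> Go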




#### Logging into the DB from the GGSCI prompt ####


$ cd /GGS/home
$ ggsci

GGSCI (kondorapp) 1> dblogin sourcedb rknet, USERID dbadmin
Password:

2015-11-20 12:41:30  INFO    OGG-04508  Oracle GoldenGate is searching the Sybase CS library version that is to be used. Please ignore any error that is reported by Sybase as a part of this search process.


The context allocation routine failed.

The following problem caused the failure:

Invalid context version.


The context allocation routine failed.

The following problem caused the failure:

Invalid context version.

2015-11-20 12:41:30  INFO    OGG-04509  Oracle GoldenGate is using the Sybase CS library version: [15001].

2015-11-20 12:41:30  INFO    OGG-04501  Charset information: Client charset: [iso_1], Database/Server charset: [iso_1], CHARSETCONVERT: [OFF].
Successfully logged into database. --- look for this line and ignore the errors above

GGSCI (kondorapp as dbadmin@KONDOR) 2>

#############################################


#### Adding Trandata on Table ####

NOTE - UNLIKE ORACLE, ADDING TRANDATA IN SYBASE CAN CAUSE DEADLOCKS, SO EXECUTE IT DURING LEAN HOURS.

GGSCI (kondorapp as dbadmin@KONDOR) 2> add trandata dbo.ScheduleLog

Transaction logging enabled for table dbo.ScheduleLog.

GGSCI (kondorapp as dbadmin@KONDOR) 3> info trandata dbo.ScheduleLog

Transaction logging enabled for table dbo.ScheduleLog.

#################################################


#### Configuring Extract ####


ggsci> edit params ext1
EXTRACT ext1
SOURCEDB rknet, userid dbadmin, password frodo
EXTTRAIL ./dirdat/ET
TABLE dbo.ScheduleLog;

ggsci> add extract ext1, tranlog, begin now

ggsci> add exttrail ./dirdat/ET, extract ext1, megabytes 1024

ggsci> start ext1
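
Once started, the extract can be verified from GGSCI (output omitted here for brevity):

ggsci> info ext1, detail
ggsci> stats ext1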

##################################################



###### SYBASE DATABASE OPERATIONS #######


$ isql -U dbadmin -P frodo

>use rknet -- switching to database we want to use
>Go


--taking backup of schedulelog table

1> select * into ScheduleLog_backup from ScheduleLog
2> Go
(2887526 rows affected)
1> commit
2> Go
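
Note - select * into is a minimally logged operation; if it errors out, the relevant database option may need enabling first (a sketch; in Sybase ASE, sp_dboption is run from master and followed by a checkpoint in the target database):

1> use master
2> Go
1> sp_dboption rknet, 'select into/bulkcopy/pllsort', true
2> Go
1> use rknet
2> Go
1> checkpoint
2> Go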


--creating empty table from schedulelog


1> select * into ScheduleLogRep from ScheduleLog where 1=2
2> Go
(0 rows affected)
1> select * from ScheduleLogRep
2> Go
 DealLogId    SettlementLeg  FlowDate    FlowAmount   FlowCustom   ZeroIfNetting  SettlCoeff   CLS
 -----------  -------------  ----------  -----------  -----------  -------------  -----------  ----

(0 rows affected)
##################################################



###### CONFIGURING REPLICAT ON SAME SERVER SAME DB #######


NOTE - THE REPLICAT WILL READ THE TRAIL FILE PRODUCED BY THE EXTRACT. SINCE BOTH EXTRACT AND REPLICAT ARE ON THE SAME
SERVER, THERE IS NO NEED FOR A DATA PUMP EXTRACT.

REPLICATION IS FROM DBO.SCHEDULELOG TO DBO.SCHEDULELOGREP IN THE SAME DATABASE RKNET


ggsci> edit params rep1

REPLICAT REP1
TARGETDB rknet, userid dbadmin, password frodo
HANDLECOLLISIONS
ASSUMETARGETDEFS
DISCARDFILE ./dirrpt/rep1.dsc, purge
MAP dbo.ScheduleLog, TARGET dbo.ScheduleLogRep;

ggsci> add replicat rep1, exttrail ./dirdat/ET

ggsci> start rep1
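
NOTE - HANDLECOLLISIONS IS ONLY MEANT FOR THE INITIAL SYNCHRONISATION WINDOW. Once source and target are level, it is normally switched off so that genuine collisions surface as errors instead of being silently resolved (a sketch):

ggsci> send replicat rep1, nohandlecollisions

Also remove HANDLECOLLISIONS from the rep1 parameter file so it stays off across restarts.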

##################################################




#### Populating Data and Checking Status of Replication ####


A SELF-INSERT INTO THE SOURCE TABLE IS CARRIED OUT TO GENERATE CHANGE DATA FOR THE EXTRACT TO CAPTURE.

1> insert into ScheduleLog select * from ScheduleLog
2> Go
(2887526 rows affected)
1> commit
2> Go
1> select count(1) from dbo.ScheduleLogRep
2> Go

 -----------
     2887526



#### Checking Status of Replicat Process ####

ggsci> stats rep1

Sending STATS request to REPLICAT REP1 ...

Start of Statistics at 2015-11-20 14:32:07.

Replicating from dbo.ScheduleLog to dbo.ScheduleLogRep:

*** Total statistics since 2015-11-20 14:17:08 ***
        Total inserts                                2887526.00
        Total updates                                      0.00
        Total deletes                                      0.00
        Total discards                                     0.00
        Total operations                             2887526.00



##################################################


Note -
There are a lot of limitations with the Sybase database, e.g. encrypted data types, LOBs, and DDL replication. Do check out the install guide here:
https://docs.oracle.com/cd/E18101_01/doc.1111/e17806.pdf


Thursday, 5 November 2015

Analyzing DB issues without AWR or ASH reports - The Magic Query

It's a typical DBA scenario....you land up at your desk and are bombarded by users complaining that the system experienced a hiccup of 2 minutes or so in the morning when no one was in the office. Transactions were dropped, etc....

So what do you do as a DBA....fire up the AWR and ASH reports and try to link the data to the problem cause. Sometimes you just wish you had been there to observe what the database was experiencing, in real time, rather than doing a post-apocalypse analysis.

Well, here is a query that will take you back in time and show you events as if you were viewing the system in real time. The result is that you can address problems much quicker and find the guilty SQL IDs in a snap.


Let's say the user gives you the exact time when the system went bonkers....

Step 1  - Find the SNAP IDs corresponding to this time frame.....

SELECT SNAP_ID,BEGIN_INTERVAL_TIME FROM DBA_HIST_SNAPSHOT ORDER BY 2;
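
On a database with long AWR retention this listing gets huge, so it helps to restrict it to the reported window (a sketch; the timestamps are placeholders for the user's reported time):

SELECT snap_id, begin_interval_time
  FROM dba_hist_snapshot
 WHERE begin_interval_time BETWEEN TIMESTAMP '2015-11-03 10:30:00'
                               AND TIMESTAMP '2015-11-03 11:30:00'
 ORDER BY 2;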


Step 2  - Find the events, SQL IDs and session IDs as they were during the issue window...

col event for a40
set lines 1000 pages 1000
SELECT SESSION_ID,SQL_ID,EVENT,SAMPLE_TIME FROM DBA_HIST_ACTIVE_SESS_HISTORY WHERE SNAP_ID BETWEEN 31565 AND 31566 ORDER
BY SAMPLE_TIME;


HERE IS A SAMPLE OUTPUT FROM A REAL WORLD SCENARIO......

        35               log file sync                            03-NOV-15 11.05.10.019 AM
        39               log file sync                            03-NOV-15 11.05.10.019 AM
       152               log file sync                            03-NOV-15 11.05.10.019 AM
       236               log file sync                            03-NOV-15 11.05.10.019 AM
       238               log file sync                            03-NOV-15 11.05.10.019 AM
       265               log file sync                            03-NOV-15 11.05.10.019 AM
       379               log file sync                            03-NOV-15 11.05.10.019 AM
       383               log file sync                            03-NOV-15 11.05.10.019 AM
       384               log file sync                            03-NOV-15 11.05.10.019 AM
       528               log file sync                            03-NOV-15 11.05.10.019 AM
       534               log file sync                            03-NOV-15 11.05.10.019 AM
       556               log file sync                            03-NOV-15 11.05.10.019 AM
         5 a232czr2w2ryy direct path read                         03-NOV-15 11.05.20.039 AM
       842 a232czr2w2ryy direct path read                         03-NOV-15 11.05.20.039 AM
       527               log file sync                            03-NOV-15 11.05.20.039 AM
       474               log file sync                            03-NOV-15 11.05.20.039 AM
       470               log file sync                            03-NOV-15 11.05.20.039 AM
       417               log file sync                            03-NOV-15 11.05.20.039 AM
       414               log file sync                            03-NOV-15 11.05.20.039 AM
       382               log file sync                            03-NOV-15 11.05.20.039 AM
       326               log file sync                            03-NOV-15 11.05.20.039 AM
       297               log file sync                            03-NOV-15 11.05.20.039 AM
       272               log file sync                            03-NOV-15 11.05.20.039 AM
       265               log file sync                            03-NOV-15 11.05.20.039 AM
       238               log file sync                            03-NOV-15 11.05.20.039 AM
       209               log file sync                            03-NOV-15 11.05.20.039 AM
       121               log file sync                            03-NOV-15 11.05.20.039 AM
       558                                                        03-NOV-15 11.05.20.039 AM
       320               db file async I/O submit                 03-NOV-15 11.05.20.039 AM
       291               db file async I/O submit                 03-NOV-15 11.05.20.039 AM
       707               log file sync                            03-NOV-15 11.05.20.039 AM
       701               log file sync                            03-NOV-15 11.05.20.039 AM
       532               log file sync                            03-NOV-15 11.05.20.039 AM
       670 4ztz048yfq32s direct path read                         03-NOV-15 11.05.20.039 AM
         6               log file sync                            03-NOV-15 11.05.20.039 AM


SO NOW YOU KNOW WHAT WAS HAPPENING EXACTLY AT 11.05 IN YOUR DATABASE WITHOUT LOOKING AT ANY AWR OR ASH REPORTS.....

NOTE -- YOU CAN ALSO INCLUDE P1,P2,P3 VALUES FROM DBA_HIST_ACTIVE_SESS_HISTORY TO FURTHER DRILL DOWN INTO THE EVENTS, AS SKETCHED BELOW....
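
For example, a drill-down variant of the same query (a sketch):

SELECT session_id, sql_id, event, p1, p2, p3, sample_time
  FROM dba_hist_active_sess_history
 WHERE snap_id BETWEEN 31565 AND 31566
 ORDER BY sample_time;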

Now Enjoy your AWR and ASH free troubleshooting journey........

Friday, 23 October 2015

Goldengate - Problems converting BIT data types from Sybase, MS SQL Server and MySQL databases to the Oracle NUMBER data type.

Recently I encountered an issue where the client was replicating data from MS SQL Server to an Oracle 11g database. Both source and target environments were running the latest GoldenGate version 12.


Columns with the BIT data type are not converted and sent to the Oracle database, i.e. the corresponding column in the Oracle table is left blank.


The GoldenGate processes (extracts and replicats) don't give any kind of error, and their status is shown as Running.


Symptoms -


When the BIT data type is used in a SQL Server source and the replicat side maps the BIT to a char or number column, the bit value will not show up on the target.


Cause -


Bit values used to be treated as ASCII prior to 11.1. Now they are treated as binary, so a pre-11.1 trail with a bit value of 0 applied by an 11.1 or later replicat will treat the ASCII value 0x30 as 1, since it is not 0x00.


NOTE - Although Oracle Support lists the problem against GoldenGate version 11, the issue was reproduced in GoldenGate version 12 as well.


Solution -


Apply the @NUMBIN column-conversion function while converting and writing data to the target column.


e.g.


MAP scott.emp,  target ora.emp,
COLMAP (usedefaults, target_bit_column_name = @NUMBIN (source_bit_column_name));










Friday, 9 October 2015

LMHB Process Trace Files in Oracle RAC - Potential Database Terminator.

During a routine performance analysis of an Exadata system, I came across trace files named dbname_lmhb_number.trc (e.g. testdb_lmhb_129875.trc) being generated in the trace file location, containing lines like the following:


LMD0 (ospid: 95477) has not moved for 74 sec (123453423.14444321115)
kjfmGCR_HBCheckAll: LMD0 (ospid: 95477) has status 2
: waiting for event 'ges remote message' for 0 secs with wait_id 123.

There are repeated messages like the above, with only the wait_id changing in each.

We can see from the kjfmGCR_HBCheckAll call above that this is related to the heartbeat check in RAC.

What Exactly is an LMHB Process?

LMHB is the Global Cache/Enqueue Service Heartbeat Monitor. It monitors the heartbeat of the LMON, LMD, and LMSn processes to ensure they are running normally without blocking or spinning.

Cause of these waits - identified as BUG 13718279 - affected versions - 11.2.0.3

Effects of this bug - DB instance terminated due to ORA-29770 in RAC.

Solution

Apply patch 13718279 or set the hidden init parameter _gc_defer_time=3.
Fixed in version 11.2.0.4
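
A sketch of how the workaround might be set (hidden underscore parameters should only be changed on Oracle Support's advice):

ALTER SYSTEM SET "_gc_defer_time"=3 SCOPE=SPFILE SID='*';
-- written to the spfile for all instances; takes effect at the next restart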

Reference - MOS DOC ID - 1440112.1


Goldengate Replicats abending with Warnings related to Trigger and no clear error

Situation 
GG replicats at client site abended with following errors in the report file


2015-10-08 16:03:40  WARNING OGG-02904  Replication of PARTIAL XML containing NCHAR/NVARCHAR/NCLOB data may cause divergence.

2015-10-09 11:18:34  WARNING OGG-00869  OCI Error ORA-04098: trigger 'UATPROD.TRG_VALIDATE' is invalid and failed re-validation (status = 4098), SQL <INSERT INTO "UATPROD.TESTTAB" 
 
2015-10-09 11:18:34  WARNING OGG-01004  Aborted grouped transaction on ' UATPROD.TESTTAB ', Database error 4098 (OCI Error ORA-04098: 
 


2015-10-09 11:18:34  WARNING OGG-01003  Repositioning to rba 28437291 in seqno 3.


Analysis

Although there is no clear OGG error, the warnings themselves reveal the issue, particularly OGG-00869  OCI Error ORA-04098: trigger ... is invalid.

In this case the entire schema had been copied to the target DB with impdp, and as a result its triggers were imported as well.

When a trigger fires at the source and modifies or checks data, we can rest assured that the data processed and committed as part of this trigger action will find its way into the GoldenGate trail.

Therefore there is no need to have these triggers active at the target site.

The only exception to this case is if you are planning to do switchovers between source and target and have them acting as a data source at different points in time.

Solution

The simple solution was to drop the triggers present in the schema on the target database and start the Replicat processes again.
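
Where dropping the triggers is not acceptable, an alternative on Oracle targets is to keep them but stop the replicat session from firing them, using the DBOPTIONS SUPPRESSTRIGGERS replicat parameter (a sketch; needs a supported target database version and suitable privileges for the replicat user, here a placeholder ggadmin):

REPLICAT rep1
USERID ggadmin, PASSWORD *****
DBOPTIONS SUPPRESSTRIGGERS
MAP UATPROD.*, TARGET UATPROD.*;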



Friday, 21 August 2015

How To Patch Oracle Goldengate

Patching Goldengate Binaries - Step by Step

1. Change ORACLE_HOME to point to the GoldenGate home directory

export ORACLE_HOME=/ggs/home

Verify the change: echo $ORACLE_HOME


2. Change the OPatch directory in the PATH variable to point at the OPatch directory in /ggs/home

export PATH=/oracle/ora11g/11gbase/dbhome/bin:/usr/ccs/bin:/usr/bin:/usr/ucb:/usr/bin:/usr/sbin:/ggs/home/OPatch

verify
ora11g@MINNAOPPRDDBS1:/ggs/home$ which opatch
/ggs/home/OPatch/opatch


3. opatch lsinventory should list the GoldenGate product information and give no errors.

ora11g@MINNAOPPRDDBS1:/ggs/home$ opatch lsinventory
/ggs/home/OPatch/opatch[103]: whereis: not found [No such file or directory]
Invoking OPatch 11.2.0.1.7

Oracle Interim Patch Installer version 11.2.0.1.7
Copyright (c) 2011, Oracle Corporation.  All rights reserved.


Oracle Home       : /ggs/home
Central Inventory : /oracle/ora11g/oraInventory
   from           : /var/opt/oracle/oraInst.loc  --- BACKUP .LOC FILE BEFORE INSTALLING PATCH ---
OPatch version    : 11.2.0.1.7
OUI version       : 11.2.0.3.0
Log file location : /ggs/home/cfgtoollogs/opatch/opatch2015-02-04_12-43-33PM.log

Lsinventory Output file location : /ggs/home/cfgtoollogs/opatch/lsinv/lsinventory2015-02-04_12-43-33PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle GoldenGate Core                                               12.1.2.1.0
There are 1 products installed in this Oracle Home.


There are no Interim patches installed in this Oracle Home.


--------------------------------------------------------------------------------

OPatch succeeded.


4. Navigate to patch_top_directory/patch_number_directory

cd /ggs/home

cd 20417912


5. Run opatch apply from this directory (after stopping all GoldenGate processes, as sketched below)
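
All processes running out of the GoldenGate home should be stopped from GGSCI before patching (manager last; the trailing '!' suppresses the confirmation prompt):

ggsci> stop er *
ggsci> stop mgr !

$ opatch apply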






################# De-Installation Steps ###################


1. Navigate to patch_top_directory/patch_number_directory

cd /ggs/home

cd 20417912



opatch rollback -id 20417912




#################### opatch apply output ####################

ora11g@MINNAOPPRDDBS1:/ggs/home/20417912$ opatch apply
/ggs/home/OPatch/opatch[103]: whereis: not found [No such file or directory]
Invoking OPatch 11.2.0.1.7

Oracle Interim Patch Installer version 11.2.0.1.7
Copyright (c) 2011, Oracle Corporation.  All rights reserved.


Oracle Home       : /ggs/home
Central Inventory : /oracle/ora11g/oraInventory
   from           : /var/opt/oracle/oraInst.loc
OPatch version    : 11.2.0.1.7
OUI version       : 11.2.0.3.0
Log file location : /ggs/home/cfgtoollogs/opatch/opatch2015-02-04_13-51-18PM.log

Applying interim patch '20417912' to OH '/ggs/home'
Verifying environment and performing prerequisite checks...

Do you want to proceed? [y|n]
y

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/ggs/home')

y
User Responded with: Y
All checks passed.

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/ggs/home')


Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...

Patching component oracle.oggcore.ora11g, 12.1.2.1.0...
Patch 20417912 successfully applied
Log file location: /ggs/home/cfgtoollogs/opatch/opatch2015-02-04_13-51-18PM.log

OPatch succeeded.

######################################################################

Thursday, 30 July 2015

IO Calibration - AUTO DOP & Excessive Parallelism in Exadata Systems.

The purpose of this post is to show the relationship between the three items above and the adverse effect they can have on database performance on Exadata machines.

Situation - The client brings in a new Exadata engineered system and, based on the POC conducted, expects a leap in batch processing performance. It's an X5-2 one-eighth rack with a 2-node RAC.

Reality - Batch processes go into an endless loop, taking more time than on the current production box.

How we came to such a performance dip on Exadata -

In the initial rounds of testing, while generating explain plans for certain queries, we found that the AUTO DOP feature was not being used by the optimizer.

How do you tell? You get the following line at the end of the explain plan output:

automatic DOP: skipped because of IO calibrate statistics are missing

On referring to Oracle Support, doc ID 1269321.1 highlights the steps to generate calibration statistics.

These were generated for the DATA diskgroup, after finding its underlying disk count through the v$asm_disk view.
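
The calibration itself is run through DBMS_RESOURCE_MANAGER.CALIBRATE_IO, roughly as below (a sketch; the disk count of 36 and the 10 ms latency are placeholders to be replaced with values for your storage):

SET SERVEROUTPUT ON
DECLARE
  l_iops    PLS_INTEGER;
  l_mbps    PLS_INTEGER;
  l_latency PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 36,   -- placeholder: disk count from v$asm_disk
    max_latency        => 10,   -- placeholder: max tolerable latency in ms
    max_iops           => l_iops,
    max_mbps           => l_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops = '       || l_iops);
  DBMS_OUTPUT.PUT_LINE('max_mbps = '       || l_mbps);
  DBMS_OUTPUT.PUT_LINE('actual_latency = ' || l_latency);
END;
/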

After this operation, the parameter PARALLEL_DEGREE_POLICY was set to AUTO.
refer - http://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams181.htm

Now the client was asked to run the batch process again and BOOM! Suddenly there were 800-plus parallel threads combined, running on both instances of the database.

1st Round of Performance Tuning -

We suspected the sequences and ITL waits to be causing the issue, so sequence cache values and table INITRANS were increased respectively.

2nd Round of Performance Tuning -
The only waits observed this time were parallel processing waits. Still the process was taking more time than on production.

Why was a 4-year-old box giving better performance than Exadata?

The answer was quite astonishing: it just had a cpu_count of 10.
Wow! So you ask how a 10-CPU machine can finish jobs faster than a current-generation Exadata?

The answer was that it simply could not afford to run queries in excessive parallel threads.

Let's simulate something similar on Exadata and see what happens

Step 1 - remove the IO calibration statistics. How?

delete from resource_io_calibrate$;
commit;

Step 2 - change the init parameters responsible for parallelism

parallel_max_servers set to 400 - original value 970
parallel_servers_target set to 200 - original value 576

parallel_degree_policy left at AUTO.
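
A sketch of how these might be changed on the fly across both RAC instances (both parameters are dynamic):

ALTER SYSTEM SET parallel_max_servers=400 SID='*';
ALTER SYSTEM SET parallel_servers_target=200 SID='*';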

The batch processing was run again, and here is a sample time comparison:

Process         Production Time    Exadata Time
pre-process1    2 hr 33 min        40 min
lp              5 hr 26 min        1 hr 14 min

Note - The stark performance improvement above was achieved without any kind of query tuning, indexing, etc.

So who is the main culprit? Exadata? Parallelism?

The answer is neither Exadata nor parallelism. It's the excessive parallelism caused by queries with a bare parallel hint (without any degree limit), fired against tables with parallelism enabled, combined with the AUTO DOP feature of Exadata and a large CPU count.

Think of it as Mentos popped in Cola bottles !