Goldengate Extract Abended – Recovery Record Is Missing [ID 1094767.1]

Modified 29-SEP-2011 Type PROBLEM Status PUBLISHED
In this Document
Symptoms
Cause
Solution
References

Applies to:
Oracle GoldenGate – Version: 10.4.0.19 to 11.1.1.1.0 – Release: 10.4.0 to 11.1.1
Information in this document applies to any platform.
Symptoms
Extract abended with:
2010-05-04 15:42:23 GGS ERROR 190 Recovery record is missing from log with seqno 14496 when extract has reached log with seqno 14497, block size 512, and next_checkpoint RBA at 149274704.

in later OGG version, the error code is like following:
OGG-01028 Recovery record is missing from log with seqno 1934 when extract has reached log with seqno 1935, block size 512, and next_checkpoint RBA at 83962348.

Archive logs 14497 & 14498 exist, and restarting extract failed with same error.
Cause
The extract was manually stopped and current/next checkpoint is set to a zero length record in redo. This is a very rare condition, and the chance of hitting it is also low. When this extract was restarted, the error occurred. Extract’s Current/Next checkpoint is set to a zero length record at the end of a log and on restart this zero length record is skipped as it has no useful data. Since this record is skipped the recovery code thinks this record was missed and generates an error.

This is bug 12693183.
Solution
This bug is fixed in v11.1.1.1 with patches for bugs 12693183.

The following workaround may also be used with no data loss.
Workaround: change the current checkpoint RBA to next block boundary, keep the recovery checkpoint the same

1. Get block size of your platform.
The block size is shown in the error message.
2010-05-04 15:42:23 GGS ERROR 190 Recovery record is missing from log with seqno 14496 when extract has reached log with seqno 14497, block size 512, and next_checkpoint RBA at 149274704.

For reference the following lists the block size of major platforms:
AIX, Linux, Sun, Windows & VMS: 512 Bytes
HP-UX, Tru64: 1024 Bytes
S390, MVS: 4096 Bytes.

2. Backup the checkpoint file in directory dirchk

3. info showch
Get current and recovery checkpoints, example:

EXTRACT EIDLD Last Started 2010-05-04 16:39 Status ABENDED
Checkpoint Lag 01:14:54 (updated 01:14:41 ago)
Log Read Checkpoint Oracle Redo Logs
2010-05-04 15:24:21 Seqno 14496, RBA 149274704
Current Checkpoint Detail:
Read Checkpoint #1
Oracle Redo Log
Startup Checkpoint (starting position in the data source):
Sequence #: 14496
RBA: 149273104
……………….
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Sequence #: 14496
RBA: 149273104
……………….
Current Checkpoint (position of last record read in the data source):
Sequence #: 14496
RBA: 149274704
………………

4. Alter both recovery checkpoint and current checkpoint to the start point of next block number
alter , extseqno , extrba

In the above example:
SQL> select ceil (149274704/512) * 512 from dual;
CEIL(149274704/512)*512
———————–
149275136

seq# remains the same as in the showch display.

ggsci> alter extract , extseqno 14496, extrba 149275136
If the new rba number is bigger than the current file size of the archivelog file, you have to alter both checkpoint to start from RBA 0 in next seqno file.
For example, if the size of archived log that has seqno 14496 is only 149275000, which is smaller than the new RBA calculated, 149275136, you have to issue the following command
ggsci> alter extract , extseqno 14497, extrba 0

5. Alter the extseqno and extrba of recovery checkpoint back to its original recovery checkpoint position
In this example,
ggsci> alter extract , ioextseqno 14496, ioextrba 149273104

6. info , showch
Confirm the change.

7. start

Note: Above example is for extract without threads option. For extract with threads option, “thread ” is needed for “alter extract ” command.
e.g., “ggsci> alter extract , extseqno 14497, extrba 0” will be “ggsci> alter extract , thread , extseqno 14497, extrba 0” (here, is the ogg thread number).
References
BUG:12693183 – EXTRACT ERRORS WITH: RECOVERY RECORD IS MISSING FROM LOG WITH SEQNO 26743
BUG:12693291 – EXTRACT ABENDED WITH RECOVERY RECORD IS MISSING FROM LOG

 

误操作listener.log的补救方法

oracle的/network/log/lisener.log文件减少其大小。,

在进行系统操作时一定不要图方便。下面贴出解决方法:

对日志文件进行截断处理。
listener的日志文件$ORACLE_HOME/network/log/listener.log记录了通过listener处理的网络请求
信息,它包含客户端请求的时间,连接方式(专用或共享),连接程序,网络协议,主机名,网络端口号等信息。
我们也需要周期地截断它,方法是先停止listener记日志的工作:
$lsnrctl set log_status off
然后进行文件处理( 把原来的日志保存到备份文件夹, 使原来的listener.log置空 )
$cp $ORACLE_HOME/network/log/listener.log $ORACLE_BACKUP/network/log/listener_1.log
$cp /dev/null $ORACLE_HOME/network/log/listener.log
文件操作完成后,打开listener记日志的工作:
$lsnrctl set log_status on
如果你会写简单的shell程序,可以把上面的步骤固化成一个脚本,定一个时间表, 让操作系统去做。
下面写的一个按天分割保存listener.log的文件auto_listener.sh
————————————————————————————-
rq=` date +”%d” `
cp $ORACLE_HOME/network/log/listener.log $ORACLE_BACKUP/network/log/listener_$rq.log
sulistener.log的备份方法 2010-11-05 09:03:02| 分类: 数据库Oracle,SQL | 标签: |字号大中小 订阅 .

误操作listener.log的补救方法

oracle的/network/log/lisener.log文件减少其大小。,

在进行系统操作时一定不要图方便。下面贴出解决方法:

对日志文件进行截断处理。
listener的日志文件$ORACLE_HOME/network/log/listener.log记录了通过listener处理的网络请求
信息,它包含客户端请求的时间,连接方式(专用或共享),连接程序,网络协议,主机名,网络端口号等信息。
我们也需要周期地截断它,方法是先停止listener记日志的工作:
$lsnrctl set log_status off
然后进行文件处理( 把原来的日志保存到备份文件夹, 使原来的listener.log置空 )
$cp $ORACLE_HOME/network/log/listener.log $ORACLE_BACKUP/network/log/listener_1.log
$cp /dev/null $ORACLE_HOME/network/log/listener.log
文件操作完成后,打开listener记日志的工作:
$lsnrctl set log_status on
如果你会写简单的shell程序,可以把上面的步骤固化成一个脚本,定一个时间表, 让操作系统去做。
下面写的一个按天分割保存listener.log的文件auto_listener.sh
————————————————————————————-
rq=` date +”%d” `
cp $ORACLE_HOME/network/log/listener.log $ORACLE_BACKUP/network/log/listener_$rq.log
su – oracle -c “lsnrctl set log_status off”
cp /dev/null $ORACLE_HOME/network/log/listener.log
su – oracle -c “lsnrctl set log_status on”
————————————————————————————-
你可以根据自己的情况定义环境变量ORACLE_HOME,ORACLE_BACKUP或者直接改成实际的目录就可以让
操作系统root用户23:59分运行这个shell脚本完成日志文件的分割处理。

– oracle -c “lsnrctl set log_status off”
cp /dev/null $ORACLE_HOME/network/log/listener.log
su – oracle -c “lsnrctl set log_status on”
————————————————————————————-
你可以根据自己的情况定义环境变量ORACLE_HOME,ORACLE_BACKUP或者直接改成实际的目录就可以让
操作系统root用户23:59分运行这个shell脚本完成日志文件的分割处理。

OGG Extract abends with OGG-01028 Record position is beyond end of recovery [ID 1335470.1]

Modified 07-OCT-2011 Type PROBLEM Status PUBLISHED

In this Document
Symptoms
Cause
Solution
References


 

 

Applies to:

Oracle GoldenGate – Version: 11.1.1.1.0 and later [Release: 11.1.1 and later ]
Information in this document applies to any platform.

Symptoms

Extract abends with below error:

ERROR OGG-01028 Record position (SeqNo: 336, RBA: 1040, SCN: 0.40418944 (40418944)) is beyond end of recovery (SeqNo: 335, RBA: 9016952, SCN: 0.40255602 (40255602), Timestamp: 2011-06-21 10:57:03.000000).

Cause

Extract may have hit bug which can occur while processing a 0-length record while the extract is enabled for Bounded Recovery.

Solution

As a work around , please use the below extract parameter and start the extract which will turn off the Bounded recovery. When BR is off, the Extract will do a normal recovery, so it needs all the archive logs file starting from recovery checkpoint of the extract.

BR BROFF

The only time BR is used is upon restart of extract, and only if there were any long running transactions that were persisted. If all of the transactions in the workload are of short duration (less than the BR interval default of 4 hrs), then even upon restart, standard recovery is active instead of bounded recovery.

And regarding transactions we are only concerned with ones that modify data. There may, of course, be batch jobs which run and take a long time, so these could fall into the long duration category thats mentioned above, but even in this case, as long as the redo logs/archive logs are available for standard recovery, the restart time should not be much longer with SR (standard recovery) than BR.

Once the extract has passed by the problematic RBA, this parameter can be removed.The permanent fix is planned for OGG v11.2.1.

undo management

Posted: 九月 18, 2011 in oracle

⊙ 事务回滚
⊙ 事务恢复:用于在实例恢复或介质恢复时的回滚。
⊙ 一致性读

从ORACLE 6版本到ORACLE 9I版本,ORACLE用数据库中的回滚段(Rollback Segment)来提供撤销数据(Undo Data),而从9I开始。ORACLE新提供了一种新的撤销数据(Undo Data)管理方式,就是我们都熟知的AUM

事务使用回滚段来记录变化前的数据或撤销信息,

1.检查empno=7788记录在Buffer Cache中是否存在,如果不存在则读取到Buffer Cache中。
2.在回滚表空间的相应回滚段事务表上分配事务槽,这个操作需要记录REOD信息。
3.从回滚段读入或在Buffer Cache中创建sal=3000(旧值)的BI,这需要产生REDO信息并记入REDO LOG BUFFER;
4.修改sal=4000,这是update的数据变更,需要记入REDO LOG BUFFER。
5.当用户提交时,会在REDO LOG BUFFER记录提交信息,并在回滚段标记该事务为非激活(INACTIVE)
UNDO的REDO
REDO的REDO
在以上事务处理过程中,注意REDO和UNDO是交替出现的,

在ORACLE的性能优化中,有一个性能指标称为平均事务回滚率(Rollback per Transaction),
用来衡量数据库的提交与回滚效率。 在StatsPack的报告中,可以从开头部分找到这个指标 Round(User rollbacks/(User commits + User rollbacks),4) *100% ,user commits和user rollbacks数据来自系统的统计信息,可以从V$SYSSTAT视图得到 ,这个指标应该接近于0,如果该指标过高,则说明数据库的回滚过多。 回滚过多不仅说明数据库经历了太多的无效操作,而且这些操作会极大影响数据库的性能



Modified
09-JAN-2011     Type PROBLEM     Status PUBLISHED

In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Oracle Server – Enterprise Edition – Version:
9.2.0.1 to 11.1.0.7 – Release: 9.2 to 11.1
Information in this document
applies to any platform.

Symptoms

For DBA and Support to diagnose
and resolve Streams Capture is stuck with Paused for Flow Control

Definition

This warning message typically
occurs when the target site is unable to keep up
with the rate of messages flowing from the source site.In 10g Flow control is
automatically enabled to reduce the rate at which events are being enqueued into
the staging area. But this message also indicate other problems which will be
covered in this note.

If “Streams AQ: enqueue blocked due to flow
control” occurs,it simply means that we cannot queue messages until the
components that had fallen behind catch up .As long as you see CAPTURE getting
in and out of FLOWCONTROl status,there should not be any
problem.

Implement Flow Control (9.2 source site
only):

When the threshold for memory of the buffer queue is
exceeded, Streams will write the messages to disk.This is sometimes referred to
as “spillover”. When this occurs, Streams can no longer take advantage of the
in-memory queue optimization. One technique to minimize this spillover is to
implement a form of flow control.
The manual scripts of Note 259609.1
Script to Prevent Excessive Spill of Message from the Streams Buffer Queue to
disk should only be used on Oracle 9.2 Streams source
databases (ie, databases running 9.2 streams capture processes).

In 9i
,implement a “heartbeat” table. To ensure that the applied_scn of the
DBA_CAPTURE view is updated periodically, implement a “heart beat” table.
Implementing a heartbeat table ensures that the metadata is updated frequently.
Additionally, the heartbeat table provides quick feedback as to the health of
the streams replication.
In Oracle10g source and target databases, Streams flow
control is automatic. However, if Streams is used in a 9iR2/10g combination with
9iR2 as the source site, it is imperative that the manual 9iR2 flow control
scripts be used at the source site.

//

Changes

Where and which view to
check the symptoms

a) The healthcheck script will show
capture status enabled and capture statistics as PAUSED FOR FLOW
CONTROL
. Example from SOURCE Healthcheck Script

Capture statistics:

C001 CAPUSR_CAP TURE 03:32:25 09/13/08
193 404 PAUSED FOR FLOW CONTROL

or from

SELECT CAPTURE_NAME,
STATE FROM V$STREAMS_CAPTURE;

CAPTURE_NAME
STATE
————– ————————
STREAM_CAPTURE PAUSED FOR
FLOW CONTROL

b) V$BUFFERED_SUBSCRIBERS may show TOO MANY
UNBROWSED MESSAGES
Example from SOURCE Healthcheck Script(HC)

++ BUFFERED PUBLISHERS ++

Queue Name:
STREAMS_QUEUE
SENDER_NAME: STREAM_CAPTURE
Current Number of Msgs in Queue:
15521
MEMORY_USAGE: 15
UNBROWSED_MSGS: 5001
PUBLISHER_STATE: IN FLOW
CONTROL: TOO MANY UNBROWSED MESSAGES

or from Flow Control Query:

COLUMN queue_schema
FORMAT A12
COLUMN queue_name FORMAT A20
SELECT queue_schema, queue_name,
unbrowsed_msgs, overspilled_msgs, memory_usage, publisher_state
FROM
V$BUFFERED_PUBLISHERS;

NOTE: The Flow Control query will
show:

* the queue owner and name
* the number of messages that have
been enqueued, but not browsed
* the number of messages that have been
spilled, but not browsed
* the percentage of the streams pool that is being
used (or 0 if there is no streams pool)
* the state of the publisher.
Possible values are:

– PUBLISHING MESSAGES (normal)
– IN FLOW CONTROL:
TOO MANY UNBROWSED MESSAGES
– IN FLOW CONTROL: OVERSPILLED MESSAGES
– IN
FLOW CONTROL: INSUFFICIENT MEMORY AND UNBROWSED MESSAGES

This last column
will really help to determine why flow control is being enabled for a system and
what can
be done to resolve the matter.

c) Propagation
schedule may throw Ora-25307 error
Example from SOURCE Healthcheck
Script(HC)

++ SCHEDULE FOR EACH PROPAGATION++
Propagation Name:
STREAM_PROPAGATE
Status : Enabled
Error Message: ORA-25307: Enqueue rate
too high, flow control enabled

Cause

The capture
process is unable to enqueue LCRs either because of low memory or because
propagation and apply process are consuming messages slower than the capture
process is creating them.

1) INIT.ORA parameter
problem

Inadequate Memory esp., Streams pool size is small or not
set

2) If AQ_TM_PROCESSES parameter is set too high
in 9i and 10g

For 10g/11g unset AQ_TM_PROCESSES  and QMON is automatically
configured (auto-tune).

For 9i and10g/11g if the parameter is explicitly
set to zero, then QMON processes are not created.  You should never disable the
Queue Monitor processes by setting aq_tm_processes=0

To know more on
this parameter See
Note:305662.1
– Queue Monitor Process: Architecture and Known
Issues

3) Capture parameter setting have
not been done properly .

4) Publisher has enqueued a lot more
messages than have been browsed.

(>1000 unbrowsed messages)

5) High
latency

High Latency can be due to long-running transactions,many dependent
transactions, or  slow Capture,Propagation, or Apply processes.In some
situations, propagation may become disabled (if the number of failures is (16).
Also check Propagation latency from
dba_queue_schedules;

6) Check Apply
latency

select
(apply_time-applied_message_create_time)*86400
"latency_in_seconds",
(sysdate-apply_time)*86400"latency_in_seconds",
to_char(applied_message_create_time,'hh24:mi:ss
mm/dd/yy') "event_creation",
to_char(apply_time,'hh24:mi:ss mm/dd/yy')
"apply_time"
from dba_apply_progress;

7)
Long running Transactions
Check that there is no occurrence
for long running transaction entries with the alert log file,
it will be
something like that C002: long running txn detected, xid:
0x0008.010.00084261

8) Some known bugs

a) Bug 7206332
PROPAGATION NOT HAPPENING–ASSOCIATED J00X PROCESSES STUCK (Suspended, Req’d
Info not Available)

b) Bug 6054573
ENQUEUE: MEMORY FLOW CONTROL ; CAPTURE PERFORMANCE DEGRADATION ON CDC
Closed
as a duplicate of Unpublished Bug 5093060
STREAMS: 5000 LCR LIMIT IS CAUSING UNNECESSARY FLOW CONTROL AT APPLY SITE

Fixed-Releases: A204 B106

c) Streams pool memory accounting can
underflow, leading to capture staying in ‘PAUSED FOR FLOW CONTROL’ state
indefinitely.
If you see capture in ‘PAUSED FOR FLOW CONTROL’ even in a low
workload environment, and x$knlasg has fields such as memory_allocated_knlasg
that are very large, you may have hit this issue. The information in x$knlasg is
recorded in the Streams healthcheck section entitled ++ Streams Pool memory
Information ++ as per Note 273674.1 for 10.2 onwards. The cause has been
identifed and fixed in Bug 7719668
STREAMS MEMORY ACCOUNTING INCORRECT, LEADING TO CAPTURE ‘PAUSED IN FLOW
CONTROL’  Fixed In Ver: 11.2

9) No proper
patches for specific database version installed
.

10) One off case: When the capture process was
recreated, one of the customer did not recreate the
associated queues
and queue tables, which retained the messages from the previous capture process.
This is the reason for CAPTURE to go into FLOW CONTROL.

11)
In a scenario with Multiple Destinations  we can get the ‘pause’
message when one destination is down.This is the correct behaviour. There are
two common reasons for source-queue growth:

Suppose a source queue is
propagating captured messages to multiple destination queues, and one     or
more destination databases acknowledge successful propagation of messages much
more slowly   than the other queues. In this case, the source queue can grow
because the slower destination databases create a backlog of messages that have
already been acknowledged by the faster       destination databases. In such an
environment, consider creating more than one capture process to      capture
changes at the source database. Doing so lets you use one source queue for the
slower
destination databases and another source queue for the faster
destination databases.

12) Capture process is stopped
on PAUSED FOR FLOW CONTROL and latency is increasing continuously.Capture
process is not enabled for Combined Capture Apply (CCA) (only available between
11G releases onwards).

Propagation job is dying as soon as it starts
without no error or a small log on alert.log file:

Tue Sep 07 12:55:35
2010
i_sched_prop: DEBUG : removed job
Tue Sep 07 12:55:55
2010
i_sched_prop: DEBUG : removed job

Running following
query several time, it is possible to observe this propagation job
behavior:

SELECT
p.propagation_name,
DECODE(s.SCHEDULE_DISABLED,
'Y', 'Disabled',
'N',
'Enabled') SCHEDULE_DISABLED,
s.PROCESS_NAME,
s.total_bytes,
s.FAILURES,
s.message_delivery_mode,
s.LAST_ERROR_MSG
FROM
DBA_QUEUE_SCHEDULES s, DBA_PROPAGATION p
WHERE p.DESTINATION_DBLINK
=
NVL(REGEXP_SUBSTR(s.destination, '[^@]+', 1, 2), s.destination)
AND
s.SCHEMA = p.SOURCE_QUEUE_OWNER
AND s.QNAME = p.SOURCE_QUEUE_NAME order by
message_delivery_mode, propagation_name;

Solution

Diagnostics and
Resolution

1) INIT.ora
changes

Increase the STREAMS_POOL_SIZE
on capture and apply site It is required to have a value of 200MB at least as
per recommendation For example:

CONNECT / AS SYSDBA
ALTER SYSTEM SET
STREAMS_POOL_SIZE=500M;

This will re-enable the capture from
the ‘PAUSED FOR FLOW CONTROL’ status.

2)
Unset AQ_TM_PROCESSES

In 10g
onwards this parameter should be unset on both the sites.

ALTER
SYSTEM RESET AQ_TM_PROCESSES SCOPE=SPFILE SID=’*’;

High AQ_TM_PROCESSES
value causes the acknowledgments not to receive from apply site.


3) Follow
Section 5 “Capture Process Configuration”of NOTE:
298877.1
– 10gR1 Streams Recommended Configuration

Example
at SOURCE site:

execute
dbms_capture_adm.set_parameter('capture_name','_CHECKPOINT_FREQUENCY','100');
execute
dbms_capture_adm.set_parameter('capture_name','_SGA_SIZE','50');

Restart(stop
and start) the apply then capture.
If you have a lot of memory to work you
could even set _SGA_SIZE’ to something like 200.

4) If Status is
IN FLOW CONTROL: TOO MANY UNBROWSED MESSAGES

This situation
could be a propagation problem

a) Stop capture that is in Flow Control
via DBMS_CAPTURE_ADM.STOP_CAPTURE

b) Determine the J process associated
with propagation via DBA_QUEUE_SCHEDULES, ie:

select schema, qname,
destination, schedule_disabled, process_name from
dba_queue_schedules;

c) Stop propagation via
DBMS_PROPAGATION_ADM.STOP_PROPAGATION (Set FORCE option to TRUE if
needed)

d) Check DBA_QUEUE_SCHEDULES / DBA_PROPAGATION to make sure the
propagation has been stopped

e) Wait few minutes and make sure that the J
Process associated to the Propagation that has been stopped is gone .If not use
use kill -9 to kill the process at OS.

f) Set job_queue_processes to 0,
wait few minutes, make sure the CJQ process and other J processes are gone (at
OS level), then set JOB_QUEUE_PROCESSES back to the original
value

g) Use DBMS_PROPAGATION_ADM.START_PROPAGATION to start
Propagation

h) Check DBA_QUEUE_SCHEDULES / DBA_PROPAGATION to make sure
the propagation has been started and it has a J process associated.

i)
Start CAPTURE via DBMS_CAPTURE_ADM.START_CAPTURE

j) Check
DBA_QUEUE_SCHEDULES columns TOTAL_NUMBER, TOTAL_BYTES …

If propagation is happening, the numbers would be
changing..

k) At SOURCE and TARGET databases, check
V$BUFFERED_QUEUES.

l) In case PROPAGATION once more seems hanging and not
propagating LCRs then find the associated  J process and upload to Support along
with HC scripts:

– errostacks with 3 minutes interval
– 10046 and 24040
propagation level 10 trace .

From sqlplus as any AS SYSDBA
user :

spool
&spoolfilename

set echo on
set pagesize 1000

column
destination format a25
column session_id format a10
column
destination_dblink format a25
column source_queue_owner format a20
column
source_queue_name format a25

select
schema,qname,destination,latency,process_name,session_id,instance,message_delivery_mode
from
dba_queue_schedules order by message_delivery_mode;
SELECT
source_queue_owner,source_queue_name,propagation_name,
destination_dblink,queue_to_queue,status
from dba_propagation order by propagation_name;

oradebug setospid
<ospid_of_hung_propagation_job>
oradebug unlimit

oradebug dump
errorstack 10
<< wait one minute >>
oradebug dump errorstack
10
<< wait one minute >>
oradebug dump errorstack
10

oradebug tracefile_name

oradebug Event 10046 trace name context
forever, level 12
oradebug Event 24040 trace name context forever, level
10

<< wait 10-15 minutes >>

oradebug dump errorstack
10

oradebug short_stack

-- repeat the short_stack command 10-15
times in rapid succession

oradebug dump errorstack 10
oradebug Event
24040 trace name context off
oradebug Event 10046 trace name context
off
exit;

If the state is IN FLOW
CONTROL: TOO MANY UNBROWSED MESSAGES, then you can run the following query to
determine which subscriber is not dequeuing messagesor lagging behind from the
queue:

SELECT subscriber_name,
cnum_msgs, total_dequeued_msg, total_spilled_msg
FROM
V$BUFFERED_SUBSCRIBERS;

A comparison of the total
number of messages enqueued for the subscriber (CNUM_MSG)
and the number of
messages dequeued by that subscriber (TOTAL_DEQUEUED_MSG) should indicate which
subscriber has fallen behind.

5) Resolve latency of
Propagation

If the propagation is disabled(broken), re-enable it
manually.
To reduce the latency of propagation between jobs, set the hidden
parameter_job_queue_interval = 1. This should be done as an init.ora parameter
or an spfile parameter.Because it is a hidden parameter, the database must be
restarted in order for the value to take effect. The default for
_job_queue_interval is 5 seconds.

In some situations, when propagation
becomes disabled (if the number of failures is 16),re-enable the propagation
manually.For example, if the queue name is STREAMS_QUEUE owned by STRMADMIN and
the destination database link is ORCL2.WORLD, syntax would be:

execute
dbms_aqadm.disable_propagation_schedule
('STRMADMIN.STREAMS_QUEUE','ORCL2.WORLD');
execute
dbms.aqadm.enable_propagation_schedule ('STRMADMIN.STREAMS_QUEUE','ORCL2.WORLD'
);
execute dbms_aqadm.alter_propagation_schedule
('STRMADMIN.STREAMS_QUEUE','ORCL2.WORLD',latency=>5);

Propagation
and the capture process will be resumed automatically when the target site is able to accept
more messages.

6) Check Apply parameters

Set the
appropriate parameters for apply process.For 10.2 set the following apply
parameters.
Disable the capture process and the apply process as well

execute DBMS_APPLY_ADM.Set_parameter(‘applyName’,’parallelism’,’4’)
execute DBMS_APPLY_ADM.Set_parameter(‘applyName’,’_dynamic_stmts’,’Y’)
execute DBMS_APPLY_ADM.Set_parameter(‘applyName’,’_hash_table_size’,’1000000’)
execute DBMS_APPLY_ADM.Set_parameter(‘applyName’,’disable_on_error’,’N’)
execute DBMS_APPLY_ADM.Set_parameter(‘applyName’,’_txn_buffer_size’,10*parallelism’);

NOTE:315666.1
– How To Set The _TXN_BUFFER_SIZE Parameter For a Streams Apply

SELECT
APPLY_NAME,PARAMETER,VALUE FROM
ALL_APPLY_PARAMETERS;

Enable the apply processes
followed by Capture process


7) Check for long running
Transactions:
Identify the transaction id for that long running
transaction using the following query and ignore the
transaction if it
causes high latency

prompt
prompt ++
Current Long Running Transactions ++
prompt Current transactions open for
more than 20 minutes
prompt

col runlength HEAD 'Txn Open|Minutes'
format 9999.99
col sid HEAD 'Session' format a13
col xid HEAD
'Transaction|ID' format a18
col terminal HEAD 'Terminal' format a10
col
program HEAD 'Program' format a27 wrap

select t.inst_id,
sid||','||serial# sid,xidusn||'.'||xidslot||'.'||xidsqn xid, (sysdate -
start_date )* 1440
runlength ,terminal,program from gv$transaction t,
gv$session s where t.addr=s.taddr
and (sysdate - start_date) * 1440 >
20;

For ignoring transaction at capture , use
the following steps on source site:

execute dbms_capture_adm.stop_capture('CAPTURE_NAME');
execute dbms_capture_adm.set_parameter('CAPTURE_NAME','_ignore_transaction','TXN_ID');
execute dbms_capture_adm.start_capture('CAPTURE_NAME');


8)

Solutions for some known bugs

    ORA-25307:
Enqueue rate too high, flow control enabled is reported after enqueueing
5000
user-enqueued buffered messages.

By default flow
control kicks for user-enqueued events when there are 5000 unbrowsed messages in
a
queue. The default behaviour can be changed in 10.2 by applying Patch 5093060 which allows
you to manipulate when
buffer publisher flow control is activated

The fix is event
based in 10.2.0.3 and it introduces two new events:
a) 10867 : This
event controls flow control threshold values for any buffered message
publisher.
The level of the event decides the threshold
value.
b) 10868 : This event controls flow control threshold values for
only capture publisher (the Capture    process).The level of the event decides
the threshold value.
When any of the above events are not set or reset
using “context off” command, the threshold values
falls  back to
default values which are:
1) Buffered publisher : 5000
2)
Capture publisher : 15000

The fix for unpublished bug
5093060 is present in 10.2.0.4 onwards and can be activated by setting the

following hidden parameters, i.e.,

alter system set
"_capture_publisher_flow_control_threshold"=10000 scope=memory;
alter system
set "_buffered_publisher_flow_control_threshold"=10000 scope=memory;

From 10.2.0.1 to 10.2.0.3 ,enable the flow control fix for
user enqueued messages.

connect / as
sysdba
--Setting the value dynamically
alter system set events '10867
trace name context forever, level 15000';
--Setting back to the default
5000
alter system set events '10867 trace name context off';
--Setting in
the spfile
alter system set event='10867 trace name context forever, level
15000' scope=spfile;
--Unsetting in the spfile
alter system reset event
scope=spfile sid='*';
Setting in a pfile

event=’10867 trace name
context forever, level 15000′

Setting any of these will require a bounce
of the database.

Download and apply Patch 7719668 if you are hitting the Bug 7719668 which is
fixed in  11.2
Backport can be made availalble on request for the latest
patchset

9) Apply recommended
patches

Finally make sure you have all the
patches applied for the database version as per Note:437838.1
“Recommended Patches for Streams”

10) The following should be the order
in which you can drop these components.
1  Drop apply.
2. Drop
capture
3. Drop propagation
4. Drop queue tables at APPLY and
CAPTURE sites using the following command

exec
dbms_aqadm.drop_queue_Table('<owner.queue_Table_name>',force=>TRUE);

5. Then recreate the components in the following. sequence

a) recreate the queue
table/queue at apply and capture sites
b) recreate APPLY.
c) recreate
capture
d) recreate propagation.

11)  Create Separate queues

The recommendation for sending LCRs to multiple destinations,and one
destination is constantly
slower than the others in applying LCRs
(or has intermittent network problems), then we  should
configure
a separate queue for that destination to avoid impacting other sites.

12) On Stream Health Check report generated using Doc Id. 273674.1,
following can be observed

a) That there is no propagation process
associated to your propagation although propagation is enabled. On Streams HC
see:
Statistics > Propagation > SCHEDULE FOR EACH
PROPAGATION

b)  That queue where propagation propagates from are not
recovered properly.
On Streams HC you can see a message like the
following:

Analysis > Notification > SYS Checks

+ ERROR: Queue
'<SCHEMA>.<QUEUE_NAME> has not been recovered normally
+ Force
recovery by altering the queue ownership to another instance.
+ Use the
DBMS_AQADM.ALTER_QUEUE_TABLE procedure to specify a different
instance.

This problem is not applicable to Combined Capture
Apply(CCA) propagation issues.

To solve the problem, it has been tried to
recover the queue manually :

On RAC, by altering queue table ownership
using DBMS_AQADM.ALTER_QUEUE_TABLE API as notification explains.

On both
(RAC and single instance), by :

* Stopping associated Capture and
Propagations
* Stopping and starting the queue using DBMS_AQADM
procedures
* Starting associated Capture and Propagations

But despite
of this queue table is still in a not recovered state. It is possible to check
this by running the query

select queue_name,
flags from x$buffered_queues where flags=1 and
queue_schema='<OWNER>';

If a your queue is returned then
the queue has not been recovered properly.
Recover the queue as
follows:

1. Stop all Streams processes (capture, propagation etc)
associated to the queue.
2. ALTER SYSTEM RESET aq_tm_processes SCOPE=SPFILE
SID=’*’ ;
3. Bounce the instance
4. Start all streams processes (capture,
propagation etc) associated to the queue.
5. Check if queues are recovered by
doing:
select queue_name, flags from x$buffered_queues where flags=1 and
queue_schema='<OWNER>’;

If a queue is returned then the queue
has not been recovered properl
Contact Oracle Support for further
diagnosis

//

References

NOTE:298877.1
– 10gR1 Streams Recommended Configuration
NOTE:315666.1
– How To Set The _TXN_BUFFER_SIZE Parameter For a Streams Apply Process
NOTE:335516.1
– Master Note for Streams Performance Recommendations
NOTE:437838.1
– Streams Specific Patches
NOTE:471713.1
– Different States of Capture & Apply Process
NOTE:729860.1
– Troubleshooting Queries in Streams
NOTE:730036.1
– Overview for Troubleshooting Streams Performance Issues
http://st-doc.us.oracle.com/9.0/920/server.920/a96571/prop.htm#46370
NOTE:1277775.1
– Streams Capture process is paused for flow control. Alert log shows ORA-4031
(“streams pool”,”unknown object”,”streams pool”,”sob_kgqmrec”)” followed by
ORA-600[4450

Show Related Information
Related


Products


  • Oracle Database Products > Oracle Database > Oracle Database >
    Oracle Server – Enterprise Edition

Keywords


V$BUFFERED_SUBSCRIBERS; HANGING; PROPAGATE; LATENCY;
AQ_TM_PROCESSES; FLOW CONTROL; ORA-25307

Errors


ORA-25307; 25307 ERROR

Back to topBack to
top

Copyright (c) 2007, 2010, Oracle. All rights reserved.
Legal Notices and Terms of Use | Privacy Statement
Article Rating
Rate this document
Excellent
Good
Poor
Did this document
help you?
Yes
No
Just browsing
How easy was it to
find this document?
Very easy
Somewhat easy
Not easy
Comments

Provide feedback for this article. Please use ‘Contact Us’ for other feedback.Important Note: this feedback may be
anonymously visible to other customers until processed by Oracle
Support.

Cancel

//  

<!–

746247.1
–>


Modified 14-DEC-2010 Type
BULLETIN Status PUBLISHED

In this Document

Purpose

Scope and Application

Different States of Capture & Apply Process

References


Applies to:

Oracle Server – Enterprise Edition – Version: 9.2.0.1
to 11.2.0.2.0 – Release: 9.2 to 11.2

Information in this document applies to any platform.

***Checked for relevance on 14-DEC-2010***

Purpose

The purpose of this is to provide a better
understanding of different states of the streams capture & apply process.

Scope and Application

Applicable for all platforms.

Different States of Capture & Apply Process

NOTE : There are some additional process states, which is not visible using the
specific views, because such states comes for fraction of second and are
internal.

Capture Process States :

The state of a capture process describes what the
capture process is doing currently. You can view the state of a capture
process by querying the STATE column in the V$STREAMS_CAPTURE dynamic
performance view.

The following capture process states are possible in
Oracle 9.2:

* INITIALIZING – Starting up.

* CAPTURING CHANGES – Scanning the redo log for
changes that evaluate to TRUE against the capture process rule sets.

* EVALUATING RULE – Evaluating a change against a
capture process rule set.

* CREATING LCR – Converting a change into an LCR.

* ENQUEUING MESSAGE – Enqueuing an LCR that
satisfies the capture process rule sets into the capture process queue.

* SHUTTING DOWN – Stopping.

The following capture process states are possible in
Oracle 10g and 11.1:

* In addition to the above Capture Process States
the below are the states introduced in oracle 10g and above version.

* WAITING FOR DICTIONARY REDO – Waiting for redo log
files containing the dictionary build related to the first SCN to be added to
the capture process session. A capture process cannot begin to scan the redo
log files until all of the log files containing the dictionary build have
been added.

* DICTIONARY INITIALIZATION – Processing a dictionary build.

* MINING (PROCESSED SCN = scn_value) – Mining a dictionary build at the SCN
scn_value.

* LOADING (step X of Y) – Processing information from a dictionary build and
currently at step X in a process that involves Y steps, where X and Y are
numbers.

* WAITING FOR REDO – Waiting for new redo log files to be added to the
capture process session. The capture process has finished processing all of
the redo log files added to its session. This state is possible if there is
no activity at a source database. For a downstream capture process, this
state is possible if the capture process is waiting for new log files to be
added to its session.

* PAUSED FOR FLOW CONTROL – Introduced in Oracle 10g. Prior to Oracle 9i we
need to configure Flow Control explicitly. It Unable to enqueue LCRs either
because of low memory or because propagations and apply processes are
consuming messages slower than the capture process is creating them. This
state indicates flow control that is used to reduce spilling of captured
messages when propagation or apply has fallen behind or is unavailable.

The following capture process states are possible in
Oracle 11gR2.

In addition to the above Capture Process States the
below are the states introduced in oracle 11gR2 .

* WAITING FOR A SUBSCRIBER TO BE ADDED – Waiting for
a subscriber to the capture process’s queue to be added. A subscriber can be
a propagation or an apply process.

* WAITING FOR THE BUFFERED QUEUE TO SHRINK – Waiting for the buffered queue
to change to a smaller size. The buffered queue shrinks when there is a
memory limitation or when an administrator reduces its size.

* WAITING FOR n SUBSCRIBER(S) INITIALIZING – Waiting
for apply processes that receive LCRs from the capture process to start,
where n is the number of apply processes.

* WAITING FOR TRANSACTION – Waiting for LogMiner to provide more
transactions.

* WAITING FOR INACTIVE DEQUEUERS – Waiting for capture process’s queue
subscribers to start. The capture process stops enqueuing LCRs if there are
no active subscribers to the queue.

* SUSPENDED FOR AUTO SPLIT/MERGE – Waiting for a merge operation to complete.

* SHUTTING DOWN – Stopping.

* ABORTING – Aborting

Apply Server States :

The state of an apply server describes what the apply server is doing
currently. You can view the state of each apply server for an apply process
by querying the V$STREAMS_APPLY_SERVER dynamic performance view. The
following apply server states are possible:

The following apply process states are possible in
Oracle 9.2:

* IDLE – Performing no work.

* RECORD LOW-WATERMARK – Performing an
administrative action that maintains information about the apply progress,
which is used in the ALL_APPLY_PROGRESS and DBA_APPLY_PROGRESS data
dictionary views.

*ADD PARTITION – Performing an administrative action
that adds a partition that is used for recording information about
in-progress transactions.

* DROP PARTITION – Performing an administrative
action that drops a partition that was used to record information about
in-progress transactions.

* EXECUTE TRANSACTION – Applying a transaction.

* WAIT COMMIT – Waiting to commit a transaction
until all other transactions with a lower commit SCN are applied. This state
is possible only if the COMMIT_SERIALIZATION apply process parameter is set
to a value other than none and the PARALELLISM apply process parameter is set
to a value greater than 1.

* WAIT DEPENDENCY – Waiting to apply an LCR in a
transaction until another transaction, on which it has a dependency, is
applied. This state is possible only if the PARALELLISM apply process
parameter is set to a value greater than 1.

* WAIT FOR NEXT CHUNK – Waiting for the next set of
LCRs for a large transaction.

The following apply process states are possible in
Oracle 10g and above:

* In addition to the above Capture Process States the below are the states
introduced in oracle 10g and above version.

* TRANSACTION CLEANUP – Cleaning up an applied
transaction, which includes removing LCRs from the apply process queue.

* INITIALIZING – Starting up.

The following apply process states are possible in
Oracle 11gR2:

ROLLBACK TRANSACTION – Rolling back a transaction.

Apply Reader States :

The state of a reader server describes what the reader server is doing currently.
You can view the state of the reader server for an apply process by querying
the V$STREAMS_APPLY_READER dynamic performance view. The following reader
server states are possible:

The following apply process states are possible in
Oracle 9.2:

* IDLE – Performing no work

* DEQUEUE MESSAGES – Dequeuing messages from the
apply process queue

* SCHEDULE MESSAGES – Computing dependencies between
messages and assembling messages into transactions

The following apply process states are possible in
Oracle 10g and above:

* In addition to the above Capture Process States the below are the states
introduced in oracle 10g and above version.

* INITIALIZING – Starting up

* SPILLING – Spilling unapplied messages from memory
to hard disk

* PAUSED – Waiting for a DDL LCR to be applied

Apply Coordinator States :

The state of a coordinator process describes what the coordinator process is
doing currently. You can view the state of a coordinator process by querying
the V$STREAMS_APPLY_COORDINATOR dynamic performance view. The following
coordinator process states are possible:

The following apply process states are possible in
Oracle 9.2 and above:

* INITIALIZING – Starting up

* APPLYING – Passing transactions to apply servers

* SHUTTING DOWN CLEANLY – Stopping without an error

* ABORTING – Stopping because of an apply error

Reference :

Oracle� Streams Concepts and
Administration

11g Release 1 (11.1) Part Number B28321-02

10g Release 2 (10.2) Part Number B14229-04

10g Release 1 (10.1) Part Number B10727-01

Oracle9i Streams

Release 2 (9.2) Part Number A96571-02

ORA-600/ORA-7445 Error Look-up Tool [ID 153788.1]