shutdown immediate taking ages (two days) to complete – or – ‘No, I will NOT enlarge the UNDO tablespace’ – or – patience …

Third party asked for a restart of one of our testing databases that was having issues. Although I’m always a bit skeptical on simple restart requests I think in this case it actually turned out to shed some light; one huge transaction of a ‘runaway process’ turned out to virtually block all other operations …

The shutdown immediate took almost two days to complete. I was able to perfectly monitor the progress of the shutdown operation; in the alertlog there was a redo log file application every 50 minutes.
This way I could see that the shutdown immediate was not hanging – oracle was actually pretty busy in rolling back this large transaction …

I ruled out the known issue that can sometimes arise with active client (‘LOCAL=no’) processes that sometimes block a shutdown immediate to progress.

I’m glad I had advised the project team against simply enlarging the UNDO tablespace (larger than it’s production equivalent) prior to these issues in this UAT acceptance database …

===
All,

This morning (Saturday 22 September 2012) at 8:35 AM the shutdown immediate command for the ******** database ******** (********) completed. (see alertlog below).

The shutdown took almost 48 hours to complete.

Thu Sep 20 12:33:24 2012
Shutting down instance: further logons disabled
...
Sat Sep 22 08:35:30 2012
Completed: ALTER DATABASE CLOSE NORMAL

I noticed some Logwriter tracefiles this morning at around the same time of the ‘close normal’ and also the database (instance) seemed in a unstable state – I was not able to spawn an internal sysdba connection to the database.

While having my fingers crossed I decided to go for a shutdown abort (we had discussed this and also I had no other options actually). After the ‘shutdown abort’ I was able to start the ******** database normal again.

  • ******** database is running FINE now.
  • Please note that applications are still down.
  • I have just checked and alterered the AWR Configuration it will take a snaphost every 15 minutes.

Short analysis:

From the alert log we can see that Oracle had to rollback about 50 redo logfiles. One redolog file is 250M; this equals about 12G undo. (see alertlog entries log sequences starting at 358 and ending at sequence 412).

12 G is the size of the undo tablespace… This is not a exact 1 on 1 relation but in this case I think it’s good indication of the amount of data that Oracle had to rollback.

Please note that had we simply enlarged the UNDO tablespace to 50G we had to wait a few more days for our shutdown to complete …

Memory of ******** is 500M … It’s production equivalent has 3G memory.

Hope this helps.

===

Excerpt alertlog :

..
Thu Sep 20 12:33:24 2012
Shutting down instance: further logons disabled
Thu Sep 20 12:33:29 2012
Stopping background process QMNC
Thu Sep 20 12:33:29 2012
Stopping background process CJQ0
Thu Sep 20 12:33:31 2012
Stopping background process MMNL
Thu Sep 20 12:33:32 2012
Stopping background process MMON
Thu Sep 20 12:33:33 2012
Shutting down instance (immediate)
License high water mark = 36
Thu Sep 20 12:33:33 2012
Stopping Job queue slave processes
Thu Sep 20 12:33:33 2012
Job queue slave processes stopped
Thu Sep 20 12:33:34 2012
ALTER DATABASE CLOSE NORMAL
Thu Sep 20 12:38:39 2012
Waiting for smon to disable tx recovery.
Thu Sep 20 12:53:28 2012
MMNL absent for 1213 secs; Foregrounds taking over
Thu Sep 20 13:19:46 2012
Thread 1 advanced to log sequence 358
  Current log# 1 seq# 358 mem# 0: /m001/oradata/********/********redo11.rdl
  Current log# 1 seq# 358 mem# 1: /m003/oradata/********/********redo12.rdl
Thu Sep 20 13:52:46 2012
Thread 1 advanced to log sequence 359
  Current log# 2 seq# 359 mem# 0: /m002/oradata/********/********redo21.rdl
  Current log# 2 seq# 359 mem# 1: /m004/oradata/********/********redo22.rdl
Thu Sep 20 14:28:47 2012
Thread 1 advanced to log sequence 360
  Current log# 4 seq# 360 mem# 0: /m004/oradata/********/********redo41.rdl
  Current log# 4 seq# 360 mem# 1: /m002/oradata/********/********redo42.rdl
..
..
..
Sat Sep 22 07:02:27 2012
Thread 1 advanced to log sequence 411
  Current log# 2 seq# 411 mem# 0: /m002/oradata/********/********redo21.rdl
  Current log# 2 seq# 411 mem# 1: /m004/oradata/********/********redo22.rdl
Sat Sep 22 07:54:07 2012
Thread 1 advanced to log sequence 412
  Current log# 4 seq# 412 mem# 0: /m004/oradata/********/********redo41.rdl
  Current log# 4 seq# 412 mem# 1: /m002/oradata/********/********redo42.rdl
Sat Sep 22 08:35:15 2012
SMON: disabling tx recovery
SMON: disabling cache recovery
Sat Sep 22 08:35:18 2012
Shutting down archive processes
Archiving is disabled
Sat Sep 22 08:35:23 2012
ARCH shutting down
ARC1: Archival stopped
Sat Sep 22 08:35:28 2012
ARCH shutting down
ARC0: Archival stopped
Sat Sep 22 08:35:29 2012
Thread 1 closed at log sequence 412
Successful close of redo thread 1
Sat Sep 22 08:35:30 2012
Completed: ALTER DATABASE CLOSE NORMAL
..
.. 
..
Sat Sep 22 11:50:01 2012
Completed: ALTER DATABASE OPEN


Further reading:

Leave a Reply

Helpful? - leave your note below so I can brag

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>