Exadata, Understanding Metrics
Category: Engineer System | Author: Fernando Simon (Board Member)


Exadata metrics give you a way to see deeply into, and understand, what is happening in the Exadata Storage Server and the Exadata software. Understanding them is fundamental to identifying and solving problems that can be hidden (or even invisible) from the database side. In this post, I will explain details about these metrics and what you can do with them.
My last article about Exadata Storage Server metrics showed one example of how to use them to identify problems that do not appear on the database side. In that post, I showed how I used the metric DB_FC_IO_BY_SEC to identify bad queries.
The point about Exadata (that I made in that article) is that, most of the time, Exadata is so powerful that bad statements are handled without a problem because of the features that exist (flash cache, Smart I/O, and others). But another point is that Exadata is usually a highly consolidated environment, where you “consolidate” a lot of databases, and it is normal that some of them have different workloads and needs. Using metrics can help you fine-tune your environment, but beyond that, it gives you a way to check and control everything that is happening.
In this post, I will not explain each metric one by one, but rather guide you through understanding metrics and some interesting and important details about them.

 

Understanding metrics

 

Exadata metrics are values extracted directly from the hardware or from the storage server software. Values like “IOPS from each disk” or “MB/s read by Smart I/O” are examples of what you can discover. Directly from the docs:
“Metrics are recorded observations of important run-time properties or internal instrumentation values of the storage cell and its components, such as cell disks or grid disks. Metrics are a series of measurements that are computed and retained in memory for an interval of time, and stored on a disk for a more permanent history.”
To check the definition of Exadata metrics, and all the metrics available, the best place is the official Exadata User Guide, chapter 6. You can see the definition of every metric and other important information. I really recommend that you read it to be aware of what you can extract from the metrics.
When reading metrics, you can read the current values (from the last minute) or the historical view. The history keeps one value per minute for the last 7 days. So, with metrics, you have 24×7 coverage, minute by minute, for the last 7 days: a good source of information to help you. And most important, they are individual and read from each storage server.

 

Reading metrics

 

To read metrics, you can connect directly to the storage server and, with CellCLI, use the “list metriccurrent” or “list metrichistory” commands:

 

[root@exacellsrvx-01 ~]# cellcli

CellCLI: Release 18.1.9.0.0 - Production on Sun Dec 08 15:01:42 BRST 2019

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.

CellCLI> list metriccurrent
         CD_BY_FC_DIRTY                          CD_00_exacellsrvx-01                            0.000 MB
         ...
         SIO_IO_WR_RQ_FC_SEC                     SMARTIO                                         0.000 IO/sec
         SIO_IO_WR_RQ_HD                         SMARTIO                                         2,768,097 IO requests
         SIO_IO_WR_RQ_HD_SEC                     SMARTIO                                         0.000 IO/sec

 

Since it is based on the LIST command, you can show the detailed output, restrict it with WHERE, or change the attributes to display:

 

CellCLI> list metriccurrent where name = 'FL_IO_DB_BY_W_SEC' detail

         name:                   FL_IO_DB_BY_W_SEC

         alertState:             normal

         collectionTime:         2019-12-08T15:10:14-02:00

         metricObjectName:       FLASHLOG

         metricType:             Instantaneous

         metricValue:            0.189 MB/sec

         objectType:             FLASHLOG




CellCLI> list metriccurrent where name = 'FL_IO_DB_BY_W_SEC' attributes name, metricvalue, collectionTime

         FL_IO_DB_BY_W_SEC       0.133 MB/sec    2019-12-08T15:11:14-02:00




CellCLI>

 

You can query metrics by any of the attributes, for example all metrics for IORM or all metrics that have FC in the name. If you want to query values from the past, you need to use LIST METRICHISTORY:

 

CellCLI> list metrichistory where name = 'FL_IO_DB_BY_W_SEC' and collectionTime = '2019-12-08T15:21:14-02:00' attributes name, metricvalue, collectionTime

         FL_IO_DB_BY_W_SEC       0.196 MB/sec    2019-12-08T15:21:14-02:00




CellCLI>
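For example, to list all flash cache throughput metrics at once, you can filter the name with a pattern (a sketch; as far as I know, the LIKE operator in CellCLI accepts regular-expression patterns):

CellCLI> list metriccurrent where name like 'FC_IO_BY_.*' attributes name, metricValue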

 

Metric types

There are three types of metrics: Instantaneous (the value reflects the moment when it was read), Rate (values computed over a period of time), and Cumulative (values accumulated since the storage server was last started).
Rate metrics are usually expressed per second. This means that Exadata counts the values over the last minute and divides them by 60. So, the *_SEC metrics are averages over the last minute: if a cell served 6,000 I/O requests in the last minute, the corresponding *_SEC metric reports 100 IO/sec.
One important detail is that some metrics have “small” and “large” variants. If a request from the database is larger than 128 KB, it is counted in the large (*LG*) metrics; otherwise, in the small (*SM*) ones.

 

Using metrics

 

To understand Exadata metrics it is important to know the limits of your hardware, and for Exadata the good (and quick) source is the datasheet. Using the X8M datasheet as an example, we can see that the maximum throughput for each storage server is around 1.78 GB/s (25 GB/s for a full rack divided by 14 storage servers). Understanding these numbers is important; I recommend that you read the datasheet and get familiar with them.
Since every storage server computes its metrics separately, you need to query each one to have the big picture. But that does not mean every metric should be analyzed per server; some are better analyzed globally. I usually divide Exadata metrics in two groups: Isolated and Per Database.
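To query all cells at once, you can use dcli from a database node. A minimal sketch, assuming a cell_group file that lists your storage servers and SSH equivalence to them:

[root@exadb01 ~]# dcli -g ~/cell_group -l root "cellcli -e list metriccurrent DB_IO_BY_SEC"

You then sum (or average) the values returned by each cell to build the global view.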

 

Isolated Metrics

 

I consider isolated metrics the ones that are important to check on each server individually, because they express values that only make sense per storage server. Some metrics that I like to check in isolation:
  • CL_CPUT: The cell CPU utilization.
  • CL_MEMUT: The percentage of total physical memory used.
  • N_HCA_MB_RCV_SEC: The number of megabytes received by the InfiniBand interfaces per second.
  • N_HCA_MB_TRANS_SEC: The number of megabytes transmitted by the InfiniBand interfaces per second.
  • CD_IO_BY_R_LG_SEC: The rate which is the number of megabytes read in large blocks per second from a cell disk.
  • CD_IO_BY_R_SM_SEC: The rate which is the number of megabytes read in small blocks per second from a cell disk.
  • CD_IO_BY_W_LG_SEC: The rate which is the number of megabytes written in large blocks per second on a cell disk.
  • CD_IO_BY_W_SM_SEC: The rate which is the number of megabytes written in small blocks per second on a cell disk.
  • CD_IO_RQ_R_LG_SEC: The rate which is the number of requests to read large blocks per second from a cell disk.
  • CD_IO_RQ_R_SM_SEC: The rate which is the number of requests to read small blocks per second from a cell disk.
  • CD_IO_RQ_W_LG_SEC: The rate which is the number of requests to write large blocks per second to a cell disk.
  • CD_IO_RQ_W_SM_SEC: The rate which is the number of requests to write small blocks per second to a cell disk.
  • CD_IO_TM_R_LG_RQ: The rate which is the average latency of reading large blocks per request to a cell disk.
  • CD_IO_TM_R_SM_RQ: The rate which is the average latency of reading small blocks per request from a cell disk.
  • CD_IO_TM_W_LG_RQ: The rate which is the average latency of writing large blocks per request to a cell disk.
  • CD_IO_TM_W_SM_RQ: The rate which is the average latency of writing small blocks per request to a cell disk.
  • CD_IO_UTIL: The percentage of device utilization for the cell disk.
  • FC_BY_ALLOCATED: The number of megabytes allocated in flash cache.
  • FC_BY_USED: The number of megabytes used in flash cache.
  • FC_BY_DIRTY: The number of megabytes in flash cache that are not synchronized to the grid disks.
  • FC_IO_BY_R_SEC: The number of megabytes read per second from flash cache.
  • FC_IO_BY_R_SKIP_SEC: The number of megabytes read from disks per second for I/O requests that bypass flash cache. Read I/O requests that bypass flash cache go directly to disks. These requests do not populate flash cache after reading the requested data.
  • FC_IO_BY_R_MISS_SEC: The number of megabytes read from disks per second because not all requested data was in flash cache.
  • FC_IO_BY_W_SEC: The number of megabytes per second written to flash cache.
  • FC_IO_BY_W_SKIP_SEC: The number of megabytes written to disk per second for I/O requests that bypass flash cache.
  • FC_IO_RQ_R_SEC: The number of read I/O requests satisfied per second from flash cache.
  • FC_IO_RQ_W_SEC: The number of I/O requests per second which resulted in flash cache being populated with data.
  • FC_IO_RQ_R_SKIP_SEC: The number of read I/O requests per second that bypass flash cache. Read I/O requests that bypass flash cache go directly to disks.
  • FC_IO_RQ_W_SKIP_SEC: The number of write I/O requests per second that bypass flash cache.
  • FL_IO_DB_BY_W_SEC: The number of megabytes written per second to hard disk by Exadata Smart Flash Log.
  • FL_IO_FL_BY_W_SEC: The number of megabytes written per second to flash by Exadata Smart Flash Log.
  • FL_IO_TM_W_RQ: Average redo log write latency. It includes write I/O latency only.
  • FL_RQ_TM_W_RQ: Average redo log write request latency.
  • FL_IO_W_SKIP_BUSY: The number of redo writes during the last minute that could not be serviced by Exadata Smart Flash Log.
  • N_MB_RECEIVED_SEC: The rate which is the number of megabytes received per second from a particular host.
  • N_MB_SENT_SEC: The rate which is the number of megabytes transmitted per second to a particular host.
  • SIO_IO_EL_OF_SEC: The number of megabytes per second eligible for offload by smart I/O.
  • SIO_IO_OF_RE_SEC: The number of interconnect megabytes per second returned by smart I/O.
  • SIO_IO_RD_FC_SEC: The number of megabytes per second read from flash cache by smart I/O.
  • SIO_IO_RD_HD_SEC: The number of megabytes per second read from hard disk by smart I/O.
  • SIO_IO_RD_RQ_FC_SEC: The number of read I/O requests per second from flash cache by smart I/O.
  • SIO_IO_RD_RQ_HD_SEC: The number of read I/O requests per second from hard disk by smart I/O.
  • SIO_IO_WR_FC_SEC: The number of megabytes per second of flash cache population writes by smart I/O.
  • SIO_IO_SI_SV_SEC: The number of megabytes per second saved by the storage index.
With these metrics you can discover, for example, on each server:
  • The percentage of CPU and memory utilization.
  • The GB/s sent by Smart I/O to one database server. And if you compare it with what was read to serve the database, you can see the percentage saved by Smart I/O.
  • The number of flash cache reads that were redirected to disk because the data was not there. If you see this value trending up, maybe you have data pinned (or simply a lot of data) in flash cache and your queries are demanding other tables.
  • For the celldisk (CD*) metrics, it is important to split them by the metricObjectName attribute to separate the reads from disks and from flash. There are no dedicated FD* metrics; flash disks are present (at the storage level) as cell disks, but they have very different throughput values. This is true for EF (Extreme Flash) too, which has only flash (see the sketch below).
  • For flash cache directly, you can check the allocated values (usually 100%), but also the “dirty” usage, when data is not yet synchronized between flash and disks. This can mean a lot of writes from your database, bad query design, or high pressure between consolidated databases (maybe you can disable flash cache for one category/database through IORM).
  • For Smart Scan, you can check the MB/s read from flash cache or disk to offload your query, or even the MB/s saved by the storage index.
So, as you can see, there is a lot of information that you can extract from the storage servers. I prefer to read these metrics separately (per storage server) because if I consider them globally (Smart I/O or flash cache, as an example), I do not get a correct representation of what is happening under the hood. A good value from one storage server can hide a bad one from another when I calculate the averages.
The idea of these metrics is to provide an overview of the Exadata storage software: the amount of data being read from the hardware (CD_* metrics) and how it is used by the features. You can see how much was saved by the storage index or by Smart Scan, as an example, or whether the flash cache is correctly populated (and not dirty). They can even help to identify queries that may be bypassing the flash cache or not using Smart I/O.
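As a quick sketch of the celldisk split mentioned above, the metricObjectName attribute tells you whether a value belongs to a hard disk cell disk (CD_* objects) or a flash cell disk (FD_* objects):

CellCLI> list metriccurrent CD_IO_BY_R_LG_SEC attributes metricObjectName, metricValue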
 

Database Metrics

 

The concept of global metrics does not exist directly in Exadata; you still need to read them separately from each storage server. But I recommend that you check these globally, summing the values from each storage server before analyzing them. One example is the IOPS (or MB/s) for a database: you usually want to know the value for the entire Exadata and not for each server.
In the list below I include only the database metrics, but the same ones exist for PDBs, consumer groups, and categories. Remember that for IORM the hierarchy is categories first and databases after, when creating the plans.
  • DB_FC_BY_ALLOCATED: The number of megabytes allocated in flash cache for this database.
  • DB_FC_IO_BY_SEC: The number of megabytes of I/O per second for this database to flash cache.
  • DB_FC_IO_RQ_LG_SEC: The number of large I/O requests issued by a database to flash cache per second.
  • DB_FC_IO_RQ_SM_SEC: The number of small I/O requests issued by a database to flash cache per second.
  • DB_FL_IO_BY_SEC: The number of megabytes written per second to Exadata Smart Flash Log.
  • DB_FL_IO_RQ_SEC: The number of I/O requests per second issued to Exadata Smart Flash Log.
  • DB_IO_BY_SEC: The number of megabytes of I/O per second for this database to hard disks.
  • DB_IO_LOAD: The average I/O load from this database for hard disks. For a description of I/O load, see CD_IO_LOAD.
  • DB_IO_RQ_LG_SEC: The rate of large I/O requests issued to hard disks by a database per second over the past minute.
  • DB_IO_RQ_SM_SEC: The rate of small I/O requests issued to hard disks by a database per second over the past minute.
  • DB_IO_TM_LG_RQ: The rate which is the average latency of reading or writing large blocks per request by a database from hard disks.
  • DB_IO_TM_SM_RQ: The rate which is the average latency of reading or writing small blocks per request by a database from hard disks.
  • DB_IO_UTIL_LG: The percentage of hard disk resources utilized by large requests from this database.
  • DB_IO_UTIL_SM: The percentage of hard disk resources utilized by small requests from this database.
  • DB_IO_WT_LG_RQ: The average IORM wait time per request for large I/O requests issued to hard disks by a database.
  • DB_IO_WT_SM_RQ: The average IORM wait time per request for small I/O requests issued to hard disks by a database.
With these metrics you can see how each database/PDB/consumer group/category is using the Exadata. As an example, you can check the MB/s (or IOPS) read from flash cache per second, and if you compare it with the CD_* metrics, you can discover which database is using it the most. The same can be done with flash log.
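For instance, to see per database how much is being read from flash cache on one cell (a sketch; for the DB_* metrics, metricObjectName is the database unique name):

CellCLI> list metriccurrent DB_FC_IO_BY_SEC attributes metricObjectName, metricValue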

 

What you can discover

 

With metrics you can discover a lot of things that are hidden when you just look from the database side (AWR, as an example). This is even more critical when you have a consolidated environment on your Exadata. You can compare values from different databases and get insights into something that is not correct (as in my example from the previous post).
Here I will show you another example. Again, all the values were collected using my script, which you can check in my post at OTN. This script connects to each storage server and retrieves the values from the last 10 minutes. After storing them, I computed the average of the minutes reported. This means that every point plotted below is the average value computed over 10 minutes.

 

 

Look above. This graph represents the values of the DB_IO_BY_SEC metric. As you can see, the database reached around 1.6 to 1.7 GB/s of disk usage. And if you remember the Exadata datasheet, this means that this database used almost all the disk throughput capacity of the Exadata storage. This can be confirmed below:
As you can see, the average was around 85 to 90 percent of I/O utilization for this database. But if I look at the values for each minute, I can see peaks of usage around 95%.
And, as you can imagine, other databases started to see wait time at the same moment:
If you compare the three graphs over the same period, you can see that when one database started to use a lot of the storage servers (around the maximum throughput of the hardware), another database started to see more wait time for each request. The value below is expressed in ms/r (milliseconds per request).
This is one example of what you can discover using metrics. In this specific case, the analysis started because one application DBA team reported that they had started to notice some slow queries during the morning. But the application had changed nothing in the previous 4 months, and AWR and tuning their queries helped nothing. Using the metrics, it was discovered that another database had deployed a new version with bad SQL. Looking just at that database, everything appeared normal.

 

How to read

 

If you want to read metrics, the best way is through CellCLI, but sometimes you do not need it or cannot use it. You can use Enterprise Manager/Cloud Control to check some information; you do not have all the metrics available there, but you can use the Exadata plugin to check IOPS and MB/s.
Still in EM/CC, you can check the KPIs for the storage servers and write some queries directly against the EM/CC repository database. Oracle has a good reference for this procedure in this PDF: https://www.oracle.com/technetwork/database/availability/exadata-storage-server-kpis-2855362.pdf
You can also use Oracle Management Cloud for Exadata to help you identify bottlenecks in your environment. You can use some AI and Oracle expertise to identify trends more easily and be more proactive than reactive: https://cloudcustomerconnect.oracle.com/files/32e7361eed/OMC_For_EXADATA_whitepaper_v28.pdf or this blog post from Oracle.
Another way is to write (or reuse) a script and integrate it with your monitoring tool. You can use my script as a guide and adapt it to your needs. The script can be checked in my post at Oracle OTN (Portuguese Version, Spanish Version; unfortunately, there is no English version. It was sent to be published, but never was, I do not know why). The published version reads all metrics and outputs them in one line. You can insert the output into a database or save it to a file, as you desire.
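A minimal sketch of such an integration, assuming a cell_group file with one storage server per line and SSH equivalence to the cells (names and paths here are illustrative):

#!/bin/bash
# Append one timestamped line per metric, per cell, to a file that a
# monitoring tool can parse later.
OUT=/tmp/exadata_metrics.log
NOW=$(date +%Y-%m-%dT%H:%M:%S)
for CELL in $(cat ~/cell_group); do
  ssh root@"${CELL}" 'cellcli -e "list metriccurrent attributes name, metricObjectName, metricValue"' |
  while read -r NAME OBJ VALUE; do
    echo "${NOW};${CELL};${NAME};${OBJ};${VALUE}" >> "${OUT}"
  done
done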

 

Insights

 

The idea of metric analysis is to check them and get insights about what can be wrong. Sometimes it is easy to identify (like the example before), but other times you need to dig into a lot of metrics to find the problem (Smart I/O, flash cache, dirty cache, etc.).
Unfortunately, to understand the metrics you need to truly understand Exadata. It is more than simple knowledge of the hardware or of how to patch Exadata. You need to understand the limits of your machine (the datasheet can help), database tuning (to recognize a bad query plan), and where to search (which metric to use). You need to put all of this together to get the insights, and unfortunately this comes only with day-to-day usage and practice.
The idea of this post was to provide you a guide, or even a shortcut, to understand Exadata metrics and do more. I did not cover every metric, or every detail of each metric, but tried to show you how to read, analyze, and use them to identify problems.
 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”

 


Using the feature recover standby database from service Dataguard on 19c
Category: Database | Author: Andre Luiz Dutra Ontalba (Board Member)


Oracle Data Guard, part of MAA (Maximum Availability Architecture), is a product vastly used all over the globe by several companies willing to keep their data safe, preventing loss of data in case of human errors, hardware failures, and other kinds of disasters. The main function of this product is to provide a copy of your primary database, synchronized by archivelogs, usually maintained in a different geographical location. With the Active Data Guard introduction in Oracle 11g, a licensed option took this product to another level, even better in my humble opinion. This feature allows users to run reports on the standby database in read-only-with-apply state, which can deviate a lot of heavy workload from the primary database and save a lot of resources on it.
With Oracle Database 19c, Oracle introduced a feature that allows some DML executed on the standby database to be redirected to the primary database and then shipped back, by archivelogs, to the standby. Note that this is not recommended for intense workloads because of performance concerns (it uses more CPU, network, etc.). I just mention these features so you can follow the improvements in the product in each version. Today we will talk about one more enhancement that makes the DBA's life easier from time to time.
Sometimes, because of hardware failures, human failures, bugs, or even some reconfiguration, we need to recreate the standby database. On Oracle 18c and onwards, Oracle introduced a single RMAN command to perform the whole resynchronization of a standby database over the network. I do not need to tell you that it is not suitable to perform a multi-terabyte database restoration over the network most of the time (although you can have a dedicated network to play with).
 
 
Benefits
 
  • Simplicity to create or recreate the standby database
  • Better control of the process
 
Scenario
 
  • Primary database: orclcdb
  • Standby database: orclstb
 
Both the primary database and the standby are managed by the Data Guard broker (DGMGRL). I like to use this command-line utility to manage Data Guard environments: it helps to perform all Data Guard administration, like switchover, configuration, and many more tasks. If you want more information about this incredible tool, please refer to the official documentation at https://docs.oracle.com/en/database/oracle/oracle-database/19/dgbkr/index.html
 
 
Verification
 
As I mentioned before, using the command-line interface of the Data Guard broker, we can use the command show configuration to display the configuration we made previously. In our case, this Data Guard configuration has the primary database orclcdb and the standby database orclstb (we could have multiple standby databases and even a ZDLRA registered in the broker). We will cause some damage on the standby database, just to show how to solve it in this article, by removing a datafile, let's say system01.dbf, just for fun.
We can see in the image below that there are no problems there, like a gap or any other kind of issue. In this situation, if we lose the primary database, the standby will be able to become the primary with just a manual switchover. It is also possible to automate this task by configuring Fast-Start Failover.
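For reference, the check itself is a one-liner on the broker interface (a sketch, using the connection credentials of this lab):

[oracle@ora19c ~]$ dgmgrl sys/oracle@orclcdb
DGMGRL> show configuration;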

Causing some damage on standby

 

In the following image, we can see all the datafiles belonging to our standby database, where we will cause some damage and then recover it just using recover standby database from service. The list below shows all datafiles of our standby (orclstb).
Now, we will remove the main datafile on the standby, /u01/app/oracle/oradata/ORCLSTB/system01.dbf:
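A sketch of the damage, run on the standby host (do not do this outside of a lab):

[oracle@ora19c ~]$ rm /u01/app/oracle/oradata/ORCLSTB/system01.dbf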
As we expected, after removing one of the datafiles on purpose, our environment crashed, look:

Disabling the apply on standby database

 

Before initiating the recovery process on this standby database, we need to stop the archivelog synchronization. To do that, we can use the commands below to edit the state of the database within the broker. If you do not do that, we will face this error:

 

starting media recovery

media recovery failed

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of recover command at 01/19/2020 11:16:26

RMAN-03015: error occurred in stored script Memory Script

RMAN-11003: failure during parse/execution of SQL statement: alter database recover

 if needed standby start

ORA-01153: an incompatible media recovery is active


As we can see, once the command to stop the apply was issued on DGMGRL, no more errors will happen.
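For reference, a sketch of that broker command, using the database names of this environment:

DGMGRL> edit database orclstb set state='APPLY-OFF';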
After stopping the recovery, we just need to log in to RMAN and issue the command recover standby database from service, as we can follow in the upcoming lines of this article:

 

[oracle@ora19c ~]$ rman target=sys/oracle@orclstb




Recovery Manager: Release 19.0.0.0.0 - Production on Sun Jan 19 11:27:59 2020

Version 19.3.0.0.0




Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.




connected to target database: ORCLCDB (DBID=2780785463, not open)




RMAN> recover standby database from service orclcdb;




Starting recover at 19-JAN-20

using target database control file instead of recovery catalog

Oracle instance started




Total System Global Area    1895823376 bytes




Fixed Size                     9136144 bytes

Variable Size                436207616 bytes

Database Buffers            1442840576 bytes

Redo Buffers                   7639040 bytes




contents of Memory Script:

{

   restore standby controlfile from service  'orclcdb';

   alter database mount standby database;

}

executing Memory Script




Starting restore at 19-JAN-20

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=39 device type=DISK




channel ORA_DISK_1: starting datafile backup set restore

channel ORA_DISK_1: using network backup set from service orclcdb

channel ORA_DISK_1: restoring control file

channel ORA_DISK_1: restore complete, elapsed time: 00:00:02

output file name=/u01/app/oracle/oradata/ORCLSTB/control01.ctl

output file name=/u01/app/oracle/fast_recovery_area/ORCLSTB/control02.ctl

Finished restore at 19-JAN-20




released channel: ORA_DISK_1

Statement processed




contents of Memory Script:

{

set newname for datafile  1 to

 "/u01/app/oracle/oradata/ORCLSTB/system01.dbf";

   restore from service  'orclcdb' datafile

    1;

   catalog datafilecopy  "/u01/app/oracle/oradata/ORCLSTB/system01.dbf";

   switch datafile all;

}

executing Memory Script




executing command: SET NEWNAME




Starting restore at 19-JAN-20

Starting implicit crosscheck backup at 19-JAN-20

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=49 device type=DISK

Crosschecked 3 objects

Finished implicit crosscheck backup at 19-JAN-20




Starting implicit crosscheck copy at 19-JAN-20

using channel ORA_DISK_1

Crosschecked 2 objects

Finished implicit crosscheck copy at 19-JAN-20




searching for all files in the recovery area

cataloging files...

cataloging done




List of Cataloged Files

=======================

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_25_gyq4g4hz_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_26_gyq5km3v_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_27_gyq64bom_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_28_gyq64n7f_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_29_gyq64zy8_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_30_gyq6vxg5_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_32_gyq6vzrv_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_31_gyq6vzs6_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_33_gyq84b59_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_34_gyq88dll_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_35_gyq9rvx9_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_36_gyqcf807_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_37_gyqcfsk1_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_38_gyqcg4l3_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_39_gyqckh4o_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_41_gyqckkfk_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_40_gyqckkg6_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_42_gyqcpsko_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_07/o1_mf_1_43_gyqdchj7_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_08/o1_mf_1_44_gyso1jwz_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_08/o1_mf_1_45_gytopnjo_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2019_12_08/o1_mf_1_46_gytowb7h_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2020_01_09/o1_mf_1_47_h1gstntb_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2020_01_09/o1_mf_1_48_h1gtcgd0_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2020_01_09/o1_mf_1_49_h1gv3wqn_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2020_01_09/o1_mf_1_50_h1gv8rbs_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2020_01_09/o1_mf_1_51_h1gvo2x8_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2020_01_16/o1_mf_1_52_h21lqfd2_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2020_01_16/o1_mf_1_53_h21m7fwt_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/archivelog/2020_01_19/o1_mf_1_54_h28w8y78_.arc

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/autobackup/2019_12_07/o1_mf_s_1026372241_gyq6h263_.bkp

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/autobackup/2019_12_07/o1_mf_s_1026373077_gyq7r09k_.bkp

File Name: /u01/app/oracle/fast_recovery_area/ORCLSTB/autobackup/2019_12_07/o1_mf_s_1026378166_gyqd9fxy_.bkp




using channel ORA_DISK_1




channel ORA_DISK_1: starting datafile backup set restore

channel ORA_DISK_1: using network backup set from service orclcdb

channel ORA_DISK_1: specifying datafile(s) to restore from backup set

channel ORA_DISK_1: restoring datafile 00001 to /u01/app/oracle/oradata/ORCLSTB/system01.dbf

channel ORA_DISK_1: restore complete, elapsed time: 00:00:07

Finished restore at 19-JAN-20




cataloged datafile copy

datafile copy file name=/u01/app/oracle/oradata/ORCLSTB/system01.dbf RECID=5 STAMP=1030102136




datafile 1 switched to datafile copy

input datafile copy RECID=5 STAMP=1030102136 file name=/u01/app/oracle/oradata/ORCLSTB/system01.dbf




contents of Memory Script:

{

  recover database from service  'orclcdb';

}

executing Memory Script




Starting recover at 19-JAN-20

using channel ORA_DISK_1

skipping datafile 1; already restored to SCN 4595092

skipping datafile 3; already restored to SCN 4594570

skipping datafile 5; already restored to SCN 2163739

skipping datafile 6; already restored to SCN 2163739

skipping datafile 7; already restored to SCN 4594577

skipping datafile 8; already restored to SCN 2163739

skipping datafile 9; already restored to SCN 4594580

skipping datafile 10; already restored to SCN 4594582

skipping datafile 12; already restored to SCN 4594588

skipping datafile 13; already restored to SCN 4594593

skipping datafile 14; already restored to SCN 4594596

skipping datafile 15; already restored to SCN 4594598

skipping datafile 19; already restored to SCN 4594600

skipping datafile 20; already restored to SCN 4594604

skipping datafile 21; already restored to SCN 4594611




starting media recovery




media recovery complete, elapsed time: 00:00:00

Finished recover at 19-JAN-20

Finished recover at 19-JAN-20




RMAN>


 

Thus, we are able to see, with just one simple command line on RMAN, how easy it can be to recover/rebuild the entire standby database and be ready for the next disaster recovery (we hope not to face one, but problems happen). As a final result, our standby database was restored and can be fully synchronized again with the production database.
 
So, via the DGMGRL interface, we can enable the synchronization again, as we can see in the image:
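A sketch of the command, mirroring the one used to stop the apply:

DGMGRL> edit database orclstb set state='APPLY-ON';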
 
 
For validation purposes, we will issue a switchover to change the roles of our databases, making the primary database become the standby and the standby become the primary:
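On the broker, that is a single command (a sketch, with the names of this environment):

DGMGRL> switchover to orclstb;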
 
Now our database orclstb, originally the standby database, is the primary, and the database orclcdb was changed to the physical standby role:
All are open and ready to use; the instance with db_unique_name = orclstb now has the PRIMARY role.
Conclusion: Oracle is always innovating and making our lives easier. With every version launched, new features are added. We can agree that this functionality saves a lot of work, but be aware that the restore fully uses the network; if your database is huge, you may kill the performance of your entire network.

 

 

André Ontalba / Rodrigo Mufalani / Y V RaviKumar 

 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”

 


Exadata, Using metrics to help you
Category: Engineer System | Author: Fernando Simon (Board Member)


It is well known that Exadata delivers a lot of power for databases and, besides that, has a lot of features that can be combined to reach the desired goals. But you need to understand how to use Exadata; it is not just knowing the internal hardware pieces, putting in some SQL hints, or using Smart Scan that makes a better DBA (or DMA).
Think about the “traditional” environment (DB + storage) and how you check for performance problems there. Basically, you just have/receive the number of IOPS from the LUNs, the throughput in MB/s, and the latency from the storage side. But Exadata provides a lot of metrics that go beyond that and can be used to really understand what is happening between the database and the access to the data blocks.
For me, one of the most underrated (and not even well explained on the web) features of Exadata is the metrics, because they can help you really understand Exadata deeply. As an example, from metrics you can check the MB/s read from flash cache, disks (per type), flash log writes, reads that bypassed flash cache and went to disk, and Smart I/O per database, PDB, or consumer group. Explaining all the metrics is not the goal of this post (that will be another one), but you can read more in Chapter 6 of the Exadata User Guide.
In this post, I will show you one example of how to use a metric to identify and solve database problems. Sometimes it can be a hide-and-seek game, but I will try to show you how to use metrics and how they can help you in your daily work.

 

DB_FC_IO_BY_SEC

 

Some weeks ago, I was extending a script that I created 5 years ago to read metrics from all Exadata storage cells and saw an interesting behavior for one database that makes a good example. The script can be checked in my post at Oracle OTN (Portuguese Version, Spanish Version; unfortunately, there is no English version. It was sent to be published, but never was, I do not know why). The published version reads all metrics and outputs them in one line. You can insert the output into a database or save it to a file, as you desire.
The behavior that I saw was related to the values reported by DB_FC_IO_BY_SEC, which represents “the number of megabytes of I/O per second for this database to flash cache”, reported per database. The image below shows the GB/s read from DB_FC_IO_BY_SEC for just one database on two consecutive days; each point in the graph represents the average value over the last 10 minutes (read minute by minute).
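The raw values behind the graph come from each cell through a query like this sketch (for the DB_* metrics, metricObjectName is the database unique name; MYDB is a placeholder):

CellCLI> list metrichistory where name = 'DB_FC_IO_BY_SEC' and metricObjectName = 'MYDB' attributes metricValue, collectionTime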

 

As you can see, there are peaks in the values, but in the end, as you can imagine, it does not appear to be correct. This does not look like a normal trend/pattern, and intuitively it got my attention. And as you can see below, the activity report from EM/CC (for one of the same days as above) for this database does not show the same behavior, or even a hint of it:
Understanding, or knowing, Exadata is more than putting a database to run there and forgetting it. Even a normal/expected report from the default monitoring tools (EM/CC) on the database side can hide bad things. This can be worse on Exadata because you have a lot of power available. Using metrics, you can find these gaps and deliver a better result for database tuning, or discover and solve problems.
Remember the metric values for DB_FC_IO_BY_SEC above: they have a peak (in the image) at 25 GB/s. And if you check the Exadata X8 Data Sheet, the maximum flash read throughput per server (not EF) is around 21 GB/s. Since the values that I showed before are the sum over all cells (a Half Rack here), they are really below the maximum that Exadata can reach. The database was not struggling on Exadata (I imagine it would be in a traditional environment), but the power available on Exadata (and the flash cache feature) was hiding the problem.
Returning to the example: since the report hinted at some bad usage, I researched the running SQLs and hit two notes:
  • Slow Performance Of DBA_AUDIT_SESSION Query From “Failed logins” Metric in Enterprise Manager (Doc ID 1178736.1).
  • EM 12c, 11g: High Cpu Utilization from Enterprise Manger Agent perl Process Running failedLogin.pl for audit_failed_logins metric (Doc ID 1265699.1).
Basically, the EM/CC collection for failed login attempts was running and consuming a lot of resources. And since this EM/CC audit was not needed, it could be disabled, and after that the DB_FC_IO_BY_SEC changed to:

 

As you can see, a much more normal behavior for database usage.

 

The Point

The point of this article is clear: Exadata can be simple to use, you just drop the database there and the available features will help. But unfortunately, the extra power that you have there can hide the bad things (as shown above). If you compare with the traditional environment, the only details from the storage side that you can discover are IOPS and throughput. But with Exadata, you can read a lot of other points to tune the database or to show you problems.
Understanding Exadata is more than just learning about the network and the hardware, or putting in some hints to get better performance. Remember that the hardware changes every release, and the database optimizer changes every version too. But Exadata metrics have been there since the beginning, and they just expand to provide a complete view of what happens between the database and the hardware itself. Knowing how to read and interpret them is the key to being a good DMA.
I really recommend reading Chapter 6 of the Exadata User Guide. But it is also important to have a good base on what the Exadata hardware can deliver and how the features work. The Exadata X8 Data Sheet, the Technical White Paper, and the MAA OOW presentations provide a good source of information.
Remember, it is more than learning about hardware or using some hints for SQL. It is about understanding how hardware and software are integrated and what you can extract from them. And with Exadata metrics, you can really discover what is happening.
 
 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”


ZDLRA, Multi-site protection – ZERO RPO for Primary and Standby
Category: Engineer System | Author: Fernando Simon (Board Member)


 
ZDLRA can be used from a small single-database environment to big environments where you need protection in more than one site at the same time. At every level, you can use different features of ZDLRA to provide the desired protection. Here I will show how to reach zero RPO for both primary and standby databases. All the steps, docs, and tech parts are covered.
You can check the examples and the reference for every scenario in these two papers from the Oracle MAA team: MAA Overview On-Premises and Oracle MAA Reference Architectures. They provide good information on how to prepare to reduce RPO and improve RTO. In summary, the focus is the same: reduce the downtime and the data loss in case of a catastrophe (zero RPO and a minimal RTO).

Multi-site protection

If you looked at both papers above, you saw that to provide good protection it is desirable to have an additional site to, at least, send the backups to. And if you go higher, for GOLD and PLATINUM environments, you start to have multiple sites synced with Data Guard. These critical/mission-critical environments need to be protected against every kind of catastrophic failure, from a disk up to a complete site outage (some need to follow specific legal requirements, banks as an example).
And the focus of this post is these big environments. I will show you how to use ZDLRA to protect both sites, reaching zero RPO even for standby databases. Doing that, you can survive a catastrophic outage (like an entire datacenter failure) and still have zero RPO. Going further, you can even have zero RPO if you completely lose one site, when using real-time redo for ZDLRA, and this is not written in the docs, by the way.

 

I have already written about some of these features in previous posts: ZDLRA features, how to enroll a database, and how to reach zero RPO for database protection. All of these I will use here; to protect the bigger environments we use the incremental-forever strategy for backups and real-time redo, protecting primary and standby databases to reach zero RPO.
The MAA team has already written about how to do that in Deploying the Zero Data Loss Recovery Appliance in a Data Guard Configuration but, frankly, it is a hard doc to follow. In this post, I will try to provide a better example of how to reach zero RPO for both sites.

 

Multiple ZDLRAs

The way that we will use ZDLRA in this case is different from the replication feature that exists for ZDLRA. Here, we have two ZDLRAs, one for each site. The environment in this case is:
  • ORCL19: RAC primary database.
  • ORCL19S: RAC standby database.
  • ZDLRAS1: ZDLRA that protects the primary site.
  • ZDLRAS2: ZDLRA that protects the standby site.
And the target will be:
As you can see above, each ZDLRA protects its own site, and the replication between sites is done by Data Guard. The DG configuration is not part of this post, but the output (and order) of all commands I used to create the RAC + RAC DG can be checked here – Steps-RAC-RAC-DG-Creation. If you have doubts about it, I can explain if needed.

ZDLRA Configuration – Protecting and Enrolling Databases

VPC

If you already have a VPC user created for your ZDLRA and want to use an existing one, you can skip this step. Just remember that the same user needs to exist in both ZDLRAs. This is more critical if you intend to use real-time redo and reach zero RPO.
So, if needed, create the same VPC user in both ZDLRAs:
At ZDLRAS1:
[root@zdlras1n1 ~]# /opt/oracle.RecoveryAppliance/bin/racli add vpc_user --user_name=vpczdlra

[vpczdlra] New Password:

Sat Nov  2 19:43:59 2019: Start: Add vpc user vpczdlra.

Sat Nov  2 19:44:00 2019:        Add vpc user vpczdlra successfully.

Sat Nov  2 19:44:00 2019: End:   Add vpc user vpczdlra.

[root@zdlras1n1 ~]#
At ZDLRAS2:
[root@zdlras2n1 ~]# /opt/oracle.RecoveryAppliance/bin/racli add vpc_user --user_name=vpczdlra

[vpczdlra] New Password:

Sat Nov  2 19:43:41 2019: Start: Add vpc user vpczdlra.

Sat Nov  2 19:43:42 2019:        Add vpc user vpczdlra successfully.

Sat Nov  2 19:43:42 2019: End:   Add vpc user vpczdlra.

[root@zdlras2n1 ~]#

Backup Policy

It is not required to have the same policy in each ZDLRA, and it is possible to enroll the database using different policies, since the ZDLRAs work separately. Here I used the same policy on both sites (but with different recovery window goals).
 
At ZDLRAS1:
[oracle@zdlras1n1 ~]$ sqlplus rasys/change^Me2

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 2 22:21:18 2019

Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sat Nov 02 2019 11:20:06 +01:00

Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.3.0.0.0

SQL> BEGIN

  2  DBMS_RA.CREATE_PROTECTION_POLICY(

  3      protection_policy_name => 'ZDLRA'

  4      , description => 'Policy ZDLRA S1'

  5      , storage_location_name => 'DELTA'

  6      , recovery_window_goal => INTERVAL '5' DAY

  7      , max_retention_window => INTERVAL '10' DAY

  8      , guaranteed_copy => 'NO'

  9      , allow_backup_deletion => 'YES'

 10  );

 11  END;

 12  /

PL/SQL procedure successfully completed.

SQL>
At ZDLRAS2:

 

[oracle@zdlras2n1 ~]$ sqlplus rasys/change^Me2

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 2 22:22:13 2019

Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sat Nov 02 2019 11:21:04 +01:00

Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.3.0.0.0

SQL> BEGIN

  2  DBMS_RA.CREATE_PROTECTION_POLICY(

  3      protection_policy_name => 'ZDLRA'

  4      , description => 'Policy ZDLRA S2'

  5      , storage_location_name => 'DELTA'

  6      , recovery_window_goal => INTERVAL '7' DAY

  7      , max_retention_window => INTERVAL '14' DAY

  8      , guaranteed_copy => 'NO'

  9      , allow_backup_deletion => 'YES'

 10  );

 11  END;

 12  /

PL/SQL procedure successfully completed.

SQL>

Grant DB at ZDLRA

Here we have the first key point, and it is related to the way that the database is registered inside ZDLRA. It follows the traditional way, using ADD_DB and GRANT_DB_ACCESS, with the DB_UNIQUE_NAME as identification.
The point here is that for Data Guard the DB_UNIQUE_NAME is different for each database, but one detail is that ZDLRA internally links the DBID of the database with the unique name. So, it is required that, on the ZDLRA that will protect the standby site, the add and registration use the DB_UNIQUE_NAME of the primary. If you do not do that, ZDLRA will understand that the standby database is a completely different database (and will report RMAN-03009: failure of resync command on default channel, ORA-00001: unique constraint (RASYS.ODB_P) violated, and ORA-04088: error during execution of trigger ‘RASYS.NODE_INSERT_UPDATE’).
Another common error, if you try to register the standby database (instead of the primary database), is RMAN-01005: Mounted control file type must be CURRENT to register the database. This occurs because some registrations and checks done by RMAN require the database control file to be a CURRENT one.
At ZDLRAS1:
 
[oracle@zdlras1n1 ~]$ sqlplus rasys/change^Me2

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Nov 3 14:11:37 2019

Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sun Nov 03 2019 11:16:12 +01:00

Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.3.0.0.0

SQL> BEGIN

  2  DBMS_RA.ADD_DB(

  3        db_unique_name => 'ORCL19'

  4        , protection_policy_name => 'ZDLRA'

  5        , reserved_space => '5G'

  6  );

  7  END;

  8  /

PL/SQL procedure successfully completed.

SQL>

SQL> BEGIN

  2  DBMS_RA.GRANT_DB_ACCESS (

  3        db_unique_name => 'ORCL19'

  4        , username => 'VPCZDLRA'

  5  );

  6  END;

  7  /

PL/SQL procedure successfully completed.

SQL>
At ZDLRAS2:
[oracle@zdlras2n1 ~]$ sqlplus rasys/change^Me2

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Nov 3 16:14:06 2019

Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sun Nov 03 2019 11:16:55 +01:00

Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.3.0.0.0

SQL> BEGIN

  2  DBMS_RA.ADD_DB(

  3        db_unique_name => 'ORCL19'

  4        , protection_policy_name => 'ZDLRA'

  5        , reserved_space => '5G'

  6  );

  7  END;

  8  /

PL/SQL procedure successfully completed.

SQL> BEGIN

  2  DBMS_RA.GRANT_DB_ACCESS (

  3        db_unique_name => 'ORCL19'

  4        , username => 'VPCZDLRA'

  5  );

  6  END;

  7  /

PL/SQL procedure successfully completed.

SQL>
Note above that ADD_DB and GRANT_DB_ACCESS at ZDLRAS2 used ORCL19 as the DB_UNIQUE_NAME (which is the name of the primary).
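If you want to double-check what each appliance knows, a sketch of a query on the catalog (I assume here the RA_DATABASE catalog view, connected as rasys):

SQL> select db_unique_name, dbid from ra_database;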

Registering the database

At Primary – ORCL19

At this point, it is possible to register and back up the primary database with ZDLRAS1. The steps are the same as the ones I described in my post on how to enroll a database at ZDLRA. The steps are, in order:
  1. Install ZDLRA library
  2. Create Wallet
  3. Configure/Test tns entry to ZDLRA
  4. Adjust the sqlnet.ora file (this is needed because of the real-time redo)
[oracle@orcl19p ~]$ cd /tmp/

[oracle@orcl19p tmp]$ unzip ra_linux64.zip

Archive:  ra_linux64.zip

  inflating: libra.so

  inflating: metadata.xml

[oracle@orcl19p tmp]$

[oracle@orcl19p tmp]$ cp ./libra.so /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libra.so

[oracle@orcl19p tmp]$

[oracle@orcl19p tmp]$ cd $ORACLE_HOME/dbs

[oracle@orcl19p dbs]$

[oracle@orcl19p dbs]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -createALO

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

[oracle@orcl19p dbs]$

[oracle@orcl19p dbs]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -createCredential zdlras1-scan:1521/zdlras1:VPCZDLRA vpczdlra oracle

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

[oracle@orcl19p dbs]$

[oracle@orcl19p dbs]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -listCredential

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

List credential (index: connect_string username)

1: zdlras1-scan:1521/zdlras1:VPCZDLRA vpczdlra

[oracle@orcl19p dbs]$

[oracle@orcl19p dbs]$ vi $ORACLE_HOME/dbs/raORCL191.ora

[oracle@orcl19p dbs]$ cat $ORACLE_HOME/dbs/raORCL191.ora

RA_WALLET='LOCATION=file:/u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet CREDENTIAL_ALIAS=zdlras1-scan:1521/zdlras1:VPCZDLRA'

[oracle@orcl19p dbs]$

[oracle@orcl19p dbs]$ tnsping zdlras1

TNS Ping Utility for Linux: Version 19.0.0.0.0 - Production on 03-NOV-2019 15:34:09

Copyright (c) 1997, 2019, Oracle.  All rights reserved.

Used parameter files:

Used TNSNAMES adapter to resolve the alias

Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = zdlras1-scan)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = zdlras1)))

OK (20 msec)

[oracle@orcl19p dbs]$

[oracle@orcl19p dbs]$ cat $ORACLE_HOME/network/admin/sqlnet.ora

cat: /u01/app/oracle/product/19.0.0.0/dbhome_1/network/admin/sqlnet.ora: No such file or directory

[oracle@orcl19p dbs]$

[oracle@orcl19p dbs]$ vi $ORACLE_HOME/network/admin/sqlnet.ora

[oracle@orcl19p dbs]$

[oracle@orcl19p dbs]$ cat $ORACLE_HOME/network/admin/sqlnet.ora

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)

SQLNET.WALLET_OVERRIDE = true

WALLET_LOCATION =

(

   SOURCE =

      (METHOD = FILE)

      (METHOD_DATA =

         (DIRECTORY = /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet)

      )

)

[oracle@orcl19p dbs]$

At Standby – ORCL19S

The same procedure as above is done here:
  1. Install ZDLRA library
  2. Create Wallet
  3. Configure/Test tns entry to ZDLRA
  4. Adjust the sqlnet.ora file (this is needed because of the real-time redo)
[oracle@orcl19s ~]$ cd /tmp/

[oracle@orcl19s tmp]$ unzip ra_linux64.zip

Archive:  ra_linux64.zip

  inflating: libra.so

  inflating: metadata.xml

[oracle@orcl19s tmp]$

[oracle@orcl19s tmp]$ cp ./libra.so /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libra.so

[oracle@orcl19s tmp]$

[oracle@orcl19s tmp]$ cd $ORACLE_HOME/dbs

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -createALO

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -createCredential zdlras2-scan:1521/zdlras2:VPCZDLRA vpczdlra oracle

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -listCredential

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

List credential (index: connect_string username)

1: zdlras2-scan:1521/zdlras2:VPCZDLRA vpczdlra

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ vi $ORACLE_HOME/dbs/raORCL19S1.ora

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ cat $ORACLE_HOME/dbs/raORCL19S1.ora

RA_WALLET='LOCATION=file:/u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet CREDENTIAL_ALIAS=zdlras2-scan:1521/zdlras2:VPCZDLRA'

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ tnsping zdlras2

TNS Ping Utility for Linux: Version 19.0.0.0.0 - Production on 03-NOV-2019 16:32:57

Copyright (c) 1997, 2019, Oracle.  All rights reserved.

Used parameter files:

Used TNSNAMES adapter to resolve the alias

Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = zdlras2-scan)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = zdlras2)))

OK (0 msec)

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ cat $ORACLE_HOME/network/admin/sqlnet.ora

cat: /u01/app/oracle/product/19.0.0.0/dbhome_1/network/admin/sqlnet.ora: No such file or directory

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ vi $ORACLE_HOME/network/admin/sqlnet.ora

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ cat $ORACLE_HOME/network/admin/sqlnet.ora

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)

SQLNET.WALLET_OVERRIDE = true

WALLET_LOCATION =

(

   SOURCE =

      (METHOD = FILE)

      (METHOD_DATA =

         (DIRECTORY = /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet)

      )

)

[oracle@orcl19s dbs]$

Registering the Primary Database – Both ZDLRAs
After that, we can register ORCL19 at ZDLRAS1. It is critical to register the primary database first at the ZDLRA that will protect the primary site.
[oracle@orcl19p dbs]$ rman target=/ catalog=vpczdlra/oracle@zdlras1

Recovery Manager: Release 19.0.0.0.0 - Production on Sun Nov 3 15:40:37 2019

Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL19 (DBID=324042554)

connected to recovery catalog database

PL/SQL package VPCZDLRA.DBMS_RCVCAT version 19.03.00.00. in RCVCAT database is not current

PL/SQL package VPCZDLRA.DBMS_RCVMAN version 19.03.00.00 in RCVCAT database is not current

RMAN> register database;

database registered in recovery catalog

starting full resync of recovery catalog

full resync complete

RMAN>
If you want, you can also back up the database at this point (it is optional). I did not show it here, but it is a simple backup database level 0 filesperset 1, and you can see the output here – Output-Backup-Primary-ORCL19-at-ZDLRAS1.
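A minimal sketch of that optional backup, assuming the SBT channel for the ZDLRA is already configured (the channel definition with libra.so appears in the show all output later in this post):

RMAN> backup device type sbt incremental level 0 filesperset 1 database;

The ZDLRA library (libra.so) referenced by the channel is what routes the backup pieces to the appliance.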
After registering the primary database ORCL19 at the ZDLRA that protects the primary site (ZDLRAS1), we can register the primary ORCL19 at the ZDLRA that protects the standby site (ZDLRAS2).
Note that I also needed to add the TNS entry for ZDLRAS2 at the primary server:
[oracle@orcl19p dbs]$ tnsping zdlras2

TNS Ping Utility for Linux: Version 19.0.0.0.0 - Production on 03-NOV-2019 16:37:38

Copyright (c) 1997, 2019, Oracle.  All rights reserved.

Used parameter files:

/u01/app/oracle/product/19.0.0.0/dbhome_1/network/admin/sqlnet.ora

Used TNSNAMES adapter to resolve the alias

Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = zdlras2-scan)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = zdlras2)))

OK (0 msec)

[oracle@orcl19p dbs]$

[oracle@orcl19p dbs]$ rman target=/ catalog=vpczdlra/oracle@zdlras2

Recovery Manager: Release 19.0.0.0.0 - Production on Sun Nov 3 17:30:06 2019

Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL19 (DBID=324042554)

connected to recovery catalog database

PL/SQL package VPCZDLRA.DBMS_RCVCAT version 19.03.00.00. in RCVCAT database is not current

PL/SQL package VPCZDLRA.DBMS_RCVMAN version 19.03.00.00 in RCVCAT database is not current

RMAN> register database;

database registered in recovery catalog

starting full resync of recovery catalog

full resync complete

RMAN> exit

Recovery Manager complete.

[oracle@orcl19p dbs]$
Since the registration was done at the primary, the standby database controlfile needs to be synchronized with the RMAN catalog, and ZDLRAS2 (which protects the standby site) needs to be informed which database is the standby. This is done with a simple resync catalog from the standby, connected to ZDLRAS2.
[oracle@orcl19s dbs]$ rman target=/ catalog=vpczdlra/oracle@zdlras2

Recovery Manager: Release 19.0.0.0.0 - Production on Sun Nov 3 20:12:32 2019

Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL19 (DBID=324042554, not open)

connected to recovery catalog database

PL/SQL package VPCZDLRA.DBMS_RCVCAT version 19.03.00.00. in RCVCAT database is not current

PL/SQL package VPCZDLRA.DBMS_RCVMAN version 19.03.00.00 in RCVCAT database is not current

RMAN> show all;

RMAN configuration parameters for database with db_unique_name ORCL19S are:

CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default

CONFIGURE BACKUP OPTIMIZATION OFF; # default

CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default

CONFIGURE CONTROLFILE AUTOBACKUP ON; # default

CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default

CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE SBT_TAPE TO '%F'; # default

CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default

CONFIGURE DEVICE TYPE SBT_TAPE PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default

CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default

CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE SBT_TAPE TO 1; # default

CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default

CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE SBT_TAPE TO 1; # default

CONFIGURE CHANNEL 1 DEVICE TYPE 'SBT_TAPE' FORMAT   '%d_%U' PARMS  "SBT_LIBRARY=/u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libra.so, ENV=(RA_WALLET='location=file:/u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet credential_alias=zdlras1-scan:1521/zdlras1:VPCZDLRA')" CONNECT '*';

CONFIGURE MAXSETSIZE TO UNLIMITED; # default

CONFIGURE ENCRYPTION FOR DATABASE OFF; # default

CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default

CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default

CONFIGURE RMAN OUTPUT TO KEEP FOR 7 DAYS; # default

CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/snapcf_orcl19s1.f'; # default

RMAN> resync catalog;

starting resync of recovery catalog

resync complete

RMAN> list db_unique_name of database orcl19;

List of Databases

DB Key  DB Name  DB ID            Database Role    Db_unique_name

------- ------- ----------------- ---------------  ------------------

835     ORCL19   324042554        PRIMARY          ORCL19

835     ORCL19   324042554        STANDBY          ORCL19S

RMAN>
Look at some points above:
  • show all read the information already registered in the rman catalog (look at the channel that was configured)
  • After the resync catalog, list db_unique_name reported both databases, primary (ORCL19) and standby (ORCL19S)
After that, you can configure the correct channel for ORCL19S to point to ZDLRAS2 (a sketch follows below) and take the backup. The output can be checked here – Output-Backup-Primary-ORCL19S-at-ZDLRAS2.
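A sketch of that channel change, assuming it simply mirrors the zdlras1 channel from the show all output with the credential alias swapped to ZDLRAS2:

RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%U' PARMS "SBT_LIBRARY=/u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libra.so, ENV=(RA_WALLET='location=file:/u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet credential_alias=zdlras2-scan:1521/zdlras2:VPCZDLRA')" CONNECT '*';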

Environment protected

At this point, you have both sites protected, each one with a dedicated local backup at its ZDLRA. The primary ORCL19 database backs up to ZDLRAS1, and the standby database ORCL19S backs up to ZDLRAS2.
But this does not protect you in case of failure of one site. In that scenario, the RPO will not be zero because we have nowhere to send the archivelogs.
Imagine that you lose your primary site, and after the switchover to the standby that machine fails too. What will your RPO be? The last backup of the standby, because your standby (the new primary) is not sending archivelogs to the old primary (it is down because of the outage), and you do not have real-time redo configured for ZDLRA. So, if you then have an outage at the standby, you lose everything after your last backup. This is not made clear in the docs MAA Overview On-Premises and Oracle MAA Reference Architectures.

 

ZERO RPO

To reach zero RPO we just need to enable real-time redo for both sites. We can do it manually or using the broker. If you want to see in detail how to configure it, you can check my two posts: ZDLRA, Real-Time Redo and ZDLRA, Real-Time Redo and Zero RPO; here I will show the summarized steps.

 

Be careful with the order of the steps below. The order matters because we are changing the user (redo_transport_user) that will be configured in both databases. If you change the standby first, you will face issues, because the user is created at the primary first and its creation is shipped automatically through the redo. By the way, this user needs to be the same VPC user that you used to configure/add/grant database access on the ZDLRA. And it needs to be the same for every ZDLRA, because both databases use the same user. This is the second key point of this config.

Manual Configuration

The manual way differs because here we set the log_archive_dest parameters by hand. The other steps need to be done regardless of the method we choose. And the manual config is the only option for 11g databases.

 

Basic config

For real-time redo we need to configure some parameters; basically, I made these configs in this order:
  1. log_archive_config: Note that for the primary it includes ZDLRAS1, and for the standby ZDLRAS2
  2. log_archive_dest_3: Again, each site points to its own ZDLRA. Look at the type defined in VALID_FOR
  3. Create the user for redo_transport_user inside the database
  4. redo_transport_user: The same for both databases
At Primary ORCL19:

 

[oracle@orcl19p ~]$ sqlplus / as sysdba




SQL*Plus: Release 19.0.0.0.0 - Production on Sun Nov 3 22:34:28 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle.  All rights reserved.







Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0




SQL> show parameter log_archive_config;




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

log_archive_config                   string      DG_CONFIG=(orcl19,orcl19s)

SQL> ALTER SYSTEM SET log_archive_config='DG_CONFIG=(orcl19,orcl19s,zdlras1)' SCOPE=SPFILE SID='*';




System altered.




SQL> ALTER SYSTEM SET log_archive_dest_3='SERVICE="zdlras1-scan:1521/zdlras1:VPCZDLRA" ASYNC NOAFFIRM DB_UNIQUE_NAME=zdlras1 VALID_FOR=(ONLINE_LOGFILE,ALL_ROLES)' scope = spfile sid = '*';




System altered.




SQL>

SQL> alter system set log_archive_dest_state_3=DEFER scope = spfile sid = '*';




System altered.




SQL>

SQL> col username format a20

SQL> select USERNAME,SYSDBA,SYSOPER,SYSASM,SYSBACKUP,SYSDG,SYSKM from v$pwfile_users;




USERNAME             SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM

-------------------- ----- ----- ----- ----- ----- -----

SYS                  TRUE  TRUE  FALSE FALSE FALSE FALSE




SQL> show parameter redo_transport_user;




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

redo_transport_user                  string

SQL> alter system set redo_transport_user='VPCZDLRA' scope = spfile SID='*';




System altered.




SQL>

SQL> alter session set "_ORACLE_SCRIPT"=true;




Session altered.




SQL> create user VPCZDLRA identified by oracle;




User created.




SQL> grant sysoper to VPCZDLRA;




Grant succeeded.




SQL> alter session set "_ORACLE_SCRIPT"=false;




Session altered.




SQL> select USERNAME,SYSDBA,SYSOPER,SYSASM,SYSBACKUP,SYSDG,SYSKM from v$pwfile_users;




USERNAME             SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM

-------------------- ----- ----- ----- ----- ----- -----

SYS                  TRUE  TRUE  FALSE FALSE FALSE FALSE

VPCZDLRA             FALSE TRUE  FALSE FALSE FALSE FALSE




SQL> exit

Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0

[oracle@orcl19p ~]$

 

At Standby ORCL19S:

 

[oracle@orcl19s dbs]$ sqlplus / as sysdba




SQL*Plus: Release 19.0.0.0.0 - Production on Sun Nov 3 22:53:05 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle.  All rights reserved.







Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0




SQL> show parameter log_archive_config;




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

log_archive_config                   string      DG_CONFIG=(orcl19,orcl19s)

SQL> ALTER SYSTEM SET log_archive_config='DG_CONFIG=(orcl19,orcl19s,zdlras2)' SCOPE=SPFILE SID='*';




System altered.




SQL> ALTER SYSTEM SET log_archive_dest_3='SERVICE="zdlras2-scan:1521/zdlras2:VPCZDLRA" ASYNC NOAFFIRM DB_UNIQUE_NAME=zdlras2 VALID_FOR=(ALL_LOGFILES,ALL_ROLES)' scope = spfile sid = '*';




System altered.




SQL> alter system set redo_transport_user='VPCZDLRA' scope = spfile SID='*';




System altered.




SQL>

SQL> exit

Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0

[oracle@orcl19s dbs]$

[oracle@orcl19s dbs]$ srvctl stop database -d orcl19s -o immediate

[oracle@orcl19s dbs]$

 

As you can see here, the user is not created at the standby because its creation will arrive through the redo. And as you can see at the end, I stopped the standby database.

Password file

Since the user was created at just one site, we need to copy the password file from the primary to the standby.
At Primary ORCL19:

 

[root@orcl19p ~]# su - grid

Last login: Sun Nov  3 23:03:09 CET 2019

[grid@orcl19p ~]$ asmcmd

ASMCMD> cd +DATA/ORCL19/

ASMCMD> ls -l

Type             Redund  Striped  Time             Sys  Name

                                                   Y    95D7C2B2568F0A70E0533205A00AC96B/

                                                   Y    95D86C40430C3E85E0533205A00A0EBA/

                                                   Y    CONTROLFILE/

                                                   Y    DATAFILE/

                                                   Y    DATAGUARDCONFIG/

                                                   Y    ONLINELOG/

                                                   Y    PARAMETERFILE/

                                                   Y    PASSWORD/

                                                   Y    TEMPFILE/

DATAGUARDCONFIG  UNPROT  COARSE   NOV 03 23:00:00  N    dr1orcl19.dat => +DATA/ORCL19/DATAGUARDCONFIG/ORCL19.275.1023297961

ASMCMD> ls -l PASSWORD/

Type      Redund  Striped  Time             Sys  Name

PASSWORD  UNPROT  COARSE   OCT 26 23:00:00  Y    pwdorcl19.256.1022714087

ASMCMD> ls -l

Type             Redund  Striped  Time             Sys  Name

                                                   Y    95D7C2B2568F0A70E0533205A00AC96B/

                                                   Y    95D86C40430C3E85E0533205A00A0EBA/

                                                   Y    CONTROLFILE/

                                                   Y    DATAFILE/

                                                   Y    DATAGUARDCONFIG/

                                                   Y    ONLINELOG/

                                                   Y    PARAMETERFILE/

                                                   Y    PASSWORD/

                                                   Y    TEMPFILE/

DATAGUARDCONFIG  UNPROT  COARSE   NOV 03 23:00:00  N    dr1orcl19.dat => +DATA/ORCL19/DATAGUARDCONFIG/ORCL19.275.1023297961

ASMCMD>

ASMCMD> cp PASSWORD/pwdorcl19.256.1022714087 /tmp

copying +DATA/ORCL19/PASSWORD/pwdorcl19.256.1022714087 -> /tmp/pwdorcl19.256.1022714087

ASMCMD> exit

[grid@orcl19p ~]$

 

As you saw, I saved the password file in the /tmp folder.
At Standby ORCL19S:

 

[root@orcl19s ~]# su - grid

Last login: Sun Nov  3 23:03:00 CET 2019

[grid@orcl19s ~]$

[grid@orcl19s ~]$

[grid@orcl19s ~]$ scp orcl19p:/tmp/pwdorcl19.256.1022714087 /tmp/pwdorcl19.256.1022714087

grid@orcl19p's password:

pwdorcl19.256.1022714087                                                                                                                                                                                   100% 2560     4.3MB/s   00:00

[grid@orcl19s ~]$

[grid@orcl19s ~]$

[grid@orcl19s ~]$

[grid@orcl19s ~]$ asmcmd

ASMCMD> ls -l +DATA/ORCL19S/

Type             Redund  Striped  Time             Sys  Name

                                                   Y    95D7C2B2568F0A70E0533205A00AC96B/

                                                   Y    95D86C40430C3E85E0533205A00A0EBA/

                                                   Y    CONTROLFILE/

                                                   Y    DATAFILE/

                                                   Y    DATAGUARDCONFIG/

                                                   Y    ONLINELOG/

                                                   Y    PARAMETERFILE/

                                                   Y    PASSWORD/

DATAGUARDCONFIG  UNPROT  COARSE   NOV 03 19:00:00  N    dr1orcl19s.dat => +DATA/ORCL19S/DATAGUARDCONFIG/orcl19s.275.1023298021

PASSWORD         UNPROT  COARSE   NOV 02 15:00:00  N    orapworcl19s => +DATA/ORCL19S/PASSWORD/pwdorcl19s.268.1023290373

PARAMETERFILE    UNPROT  COARSE   NOV 03 22:00:00  N    spfile.ora => +DATA/ORCL19S/PARAMETERFILE/spfile.272.1023290247

ASMCMD>

ASMCMD> pwcopy --dbuniquename orcl19s '/tmp/pwdorcl19.256.1022714087' '+DATA/ORCL19S/orapworcl19s' -f

ASMCMD-9453: failed to register password file as a CRS resource

copying /tmp/pwdorcl19.256.1022714087 -> +DATA/ORCL19S/orapworcl19s

ASMCMD-9453: failed to register password file as a CRS resource

ASMCMD>

ASMCMD>

ASMCMD> ls -l +DATA/ORCL19S/

Type             Redund  Striped  Time             Sys  Name

                                                   Y    95D7C2B2568F0A70E0533205A00AC96B/

                                                   Y    95D86C40430C3E85E0533205A00A0EBA/

                                                   Y    CONTROLFILE/

                                                   Y    DATAFILE/

                                                   Y    DATAGUARDCONFIG/

                                                   Y    ONLINELOG/

                                                   Y    PARAMETERFILE/

                                                   Y    PASSWORD/

DATAGUARDCONFIG  UNPROT  COARSE   NOV 03 19:00:00  N    dr1orcl19s.dat => +DATA/ORCL19S/DATAGUARDCONFIG/orcl19s.275.1023298021

PASSWORD         UNPROT  COARSE   NOV 03 23:00:00  N    orapworcl19s => +DATA/ORCL19S/PASSWORD/pwdorcl19s.268.1023405041

PARAMETERFILE    UNPROT  COARSE   NOV 03 22:00:00  N    spfile.ora => +DATA/ORCL19S/PARAMETERFILE/spfile.272.1023290247

ASMCMD>

ASMCMD> exit

[grid@orcl19s ~]$

 

As you saw above, the file from the primary was copied over and overwrote the old one.

Bounce everything

After doing this, it is time to bounce the primary database to reload the parameters, and to start the standby database so it reads the new password file and parameters.
At Primary ORCL19:

 

[oracle@orcl19p ~]$ srvctl stop database -d orcl19 -o immediate

[oracle@orcl19p ~]$ srvctl start database -d orcl19

[oracle@orcl19p ~]$
 
At Standby ORCL19S
 
 


[oracle@orcl19s ~]$ srvctl start database -d orcl19s

[oracle@orcl19s ~]$

[oracle@orcl19s ~]$ sqlplus / as sysdba




SQL*Plus: Release 19.0.0.0.0 - Production on Sun Nov 3 23:13:22 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle.  All rights reserved.







Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0




SQL> col username format a20

SQL> select USERNAME,SYSDBA,SYSOPER,SYSASM,SYSBACKUP,SYSDG,SYSKM from v$pwfile_users;




USERNAME             SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM

-------------------- ----- ----- ----- ----- ----- -----

SYS                  TRUE  TRUE  FALSE FALSE FALSE FALSE

VPCZDLRA             FALSE TRUE  FALSE FALSE FALSE FALSE


SQL>

 

Checking ZERO RPO

To check that the configuration is OK, you can simply archive the current log at the primary and verify that everything went fine on both sites.
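Before checking the backups, you can also sanity-check the redo destination itself from SQL*Plus; a minimal sketch, assuming the ZDLRA destination was configured as log_archive_dest_3 as above:

SQL> select dest_id, status, error, db_unique_name from v$archive_dest_status where dest_id = 3;

A status of VALID with an empty error column indicates that the real-time redo transport to the ZDLRA is working.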
At Primary ORCL19:

 

 

[oracle@orcl19p ~]$ sqlplus / as sysdba




SQL*Plus: Release 19.0.0.0.0 - Production on Mon Nov 4 00:46:46 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle.  All rights reserved.







Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0




SQL> archive log list;

Database log mode              Archive Mode

Automatic archival             Enabled

Archive destination            USE_DB_RECOVERY_FILE_DEST

Oldest online log sequence     117

Next log sequence to archive   119

Current log sequence           119

SQL> alter system archive log current;




System altered.




SQL> exit

Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0

[oracle@orcl19p ~]$ rman target=/ catalog=vpczdlra/oracle@zdlras1




Recovery Manager: Release 19.0.0.0.0 - Production on Mon Nov 4 00:47:04 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.




connected to target database: ORCL19 (DBID=324042554)

connected to recovery catalog database

PL/SQL package VPCZDLRA.DBMS_RCVCAT version 19.03.00.00. in RCVCAT database is not current

PL/SQL package VPCZDLRA.DBMS_RCVMAN version 19.03.00.00 in RCVCAT database is not current




RMAN> list backup of archivelog sequence 118;







List of Backup Sets

===================







BS Key  Size       Device Type Elapsed Time Completion Time

------- ---------- ----------- ------------ -------------------

1965    8.00K      SBT_TAPE    00:00:01     04/11/2019 00:45:49

        BP Key: 1966   Status: AVAILABLE  Compressed: YES  Tag: TAG20191104T004548

        Handle: $RSCN_1_RTIM_1022714106_THRD_1_SEQ_118_CTKEY_1946_BACKUP   Media:




  List of Archived Logs in backup set 1965

  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time

  ---- ------- ---------- ------------------- ---------- ---------

  1    118     2137116    04/11/2019 00:44:35 2137128    04/11/2019 00:44:39




RMAN> exit







Recovery Manager complete.

[oracle@orcl19p ~]$

At Standby ORCL19S

[oracle@orcl19s ~]$ rman target=/ catalog=vpczdlra/oracle@zdlras2




Recovery Manager: Release 19.0.0.0.0 - Production on Mon Nov 4 00:48:39 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.




connected to target database: ORCL19 (DBID=324042554, not open)

connected to recovery catalog database

PL/SQL package VPCZDLRA.DBMS_RCVCAT version 19.03.00.00. in RCVCAT database is not current

PL/SQL package VPCZDLRA.DBMS_RCVMAN version 19.03.00.00 in RCVCAT database is not current




RMAN> list backup of archivelog sequence 118;







List of Backup Sets

===================







BS Key  Size       Device Type Elapsed Time Completion Time

------- ---------- ----------- ------------ -------------------

2240    8.00K      SBT_TAPE    00:00:00     04/11/2019 00:45:35

        BP Key: 2241   Status: AVAILABLE  Compressed: YES  Tag: TAG20191104T004535

        Handle: $RSCN_1_RTIM_1022714106_THRD_1_SEQ_118_CTKEY_2228_BACKUP   Media:




  List of Archived Logs in backup set 2240

  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time

  ---- ------- ---------- ------------------- ---------- ---------

  1    118     2137116    04/11/2019 00:44:35 2137128    04/11/2019 00:44:39




RMAN> exit







Recovery Manager complete.

[oracle@orcl19s ~]$

 

So, as you can see above, ZDLRAS1 received the online log (sequence 118) from ORCL19, and ZDLRAS2 received a copy of it (through the standby redo log) from ORCL19S. This means that you now have zero RPO for both primary and standby.

 

Broker config

From 12.1 onward, you can have the recovery_appliance configured directly from the broker when using DG. The broker executes the configuration of the log_archive_dest parameters automatically when you add the ZDLRA.

 

Basic config

Pretty much the same config that was made for the manual setup before.
At Primary ORCL19:

 

[oracle@orcl19p ~]$ sqlplus / as sysdba




SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 9 16:24:27 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle.  All rights reserved.







Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0




SQL>

SQL> col username format a20

SQL> select USERNAME,SYSDBA,SYSOPER,SYSASM,SYSBACKUP,SYSDG,SYSKM from v$pwfile_users;




USERNAME             SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM

-------------------- ----- ----- ----- ----- ----- -----

SYS                  TRUE  TRUE  FALSE FALSE FALSE FALSE




SQL> show parameter redo_transport_user;




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

redo_transport_user                  string

SQL> alter system set redo_transport_user='VPCZDLRA' scope = spfile SID='*';




System altered.




SQL>

SQL> alter session set "_ORACLE_SCRIPT"=true;




Session altered.




SQL> create user VPCZDLRA identified by oracle;




User created.




SQL> grant sysoper to VPCZDLRA;




Grant succeeded.




SQL> alter session set "_ORACLE_SCRIPT"=false;




Session altered.




SQL> select USERNAME,SYSDBA,SYSOPER,SYSASM,SYSBACKUP,SYSDG,SYSKM from v$pwfile_users;




USERNAME             SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM

-------------------- ----- ----- ----- ----- ----- -----

SYS                  TRUE  TRUE  FALSE FALSE FALSE FALSE

VPCZDLRA             FALSE TRUE  FALSE FALSE FALSE FALSE




SQL> exit

Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0

[oracle@orcl19p ~]$
 
At Standby ORCL19S

 

[oracle@orcl19s ~]$ sqlplus / as sysdba




SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 9 16:25:09 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle.  All rights reserved.







Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0




SQL>

SQL> alter system set redo_transport_user='VPCZDLRA' scope = spfile SID='*';




System altered.




SQL>

SQL> exit

Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0

[oracle@orcl19s dbs]$

 

Password file

At Primary ORCL19:

 

[grid@orcl19p ~]$ asmcmd

ASMCMD> cd +DATA/ORCL19/

ASMCMD> ls -l

Type             Redund  Striped  Time             Sys  Name

                                                   Y    95D7C2B2568F0A70E0533205A00AC96B/

                                                   Y    95D86C40430C3E85E0533205A00A0EBA/

                                                   Y    CONTROLFILE/

                                                   Y    DATAFILE/

                                                   Y    DATAGUARDCONFIG/

                                                   Y    ONLINELOG/

                                                   Y    PARAMETERFILE/

                                                   Y    PASSWORD/

                                                   Y    TEMPFILE/

DATAGUARDCONFIG  UNPROT  COARSE   NOV 03 23:00:00  N    dr1orcl19.dat => +DATA/ORCL19/DATAGUARDCONFIG/ORCL19.275.1023297961

ASMCMD> ls -l PASSWORD/

Type      Redund  Striped  Time             Sys  Name

PASSWORD  UNPROT  COARSE   OCT 26 23:00:00  Y    pwdorcl19.256.1022714087

ASMCMD> ls -l

Type             Redund  Striped  Time             Sys  Name

                                                   Y    95D7C2B2568F0A70E0533205A00AC96B/

                                                   Y    95D86C40430C3E85E0533205A00A0EBA/

                                                   Y    CONTROLFILE/

                                                   Y    DATAFILE/

                                                   Y    DATAGUARDCONFIG/

                                                   Y    ONLINELOG/

                                                   Y    PARAMETERFILE/

                                                   Y    PASSWORD/

                                                   Y    TEMPFILE/

DATAGUARDCONFIG  UNPROT  COARSE   NOV 03 23:00:00  N    dr1orcl19.dat => +DATA/ORCL19/DATAGUARDCONFIG/ORCL19.275.1023297961

ASMCMD>

ASMCMD> cp PASSWORD/pwdorcl19.256.1022714087 /tmp

copying +DATA/ORCL19/PASSWORD/pwdorcl19.256.1022714087 -> /tmp/pwdorcl19.256.1022714087

ASMCMD> exit

[grid@orcl19p ~]$

At Standby ORCL19S

The transfer at the standby is the same one shown in the manual configuration: the password file saved in /tmp on the primary is copied with scp to the standby host and written over the standby password file with pwcopy. A sketch, using the same file names shown earlier:

[grid@orcl19s ~]$ scp orcl19p:/tmp/pwdorcl19.256.1022714087 /tmp/pwdorcl19.256.1022714087

[grid@orcl19s ~]$ asmcmd

ASMCMD> pwcopy --dbuniquename orcl19s '/tmp/pwdorcl19.256.1022714087' '+DATA/ORCL19S/orapworcl19s' -f

ASMCMD> exit

[grid@orcl19s ~]$

 

Wallet Configuration

 

When using the broker configuration, it is recommended to add both credential entries, for the primary's ZDLRA and the standby's ZDLRA, in the wallets of both databases.
 
At Primary ORCL19:

 

[oracle@orcl19p ~]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -createCredential zdlras2-scan:1521/zdlras2:VPCZDLRA vpczdlra oracle

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.




[oracle@orcl19p ~]$

[oracle@orcl19p ~]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -listCredential

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.




List credential (index: connect_string username)

2: zdlras2-scan:1521/zdlras2:VPCZDLRA vpczdlra

1: zdlras1-scan:1521/zdlras1:VPCZDLRA vpczdlra

[oracle@orcl19p ~]$

 

At Standby ORCL19S:

 

[oracle@orcl19s ~]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -createCredential zdlras1-scan:1521/zdlras1:VPCZDLRA vpczdlra oracle

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.




[oracle@orcl19s ~]$ mkstore -wrl /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/ra_wallet -listCredential

Oracle Secret Store Tool Release 19.0.0.0.0 - Production

Version 19.4.0.0.0

Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.




List credential (index: connect_string username)

2: zdlras1-scan:1521/zdlras1:VPCZDLRA vpczdlra

1: zdlras2-scan:1521/zdlras2:VPCZDLRA vpczdlra

[oracle@orcl19s ~]$
 

Bounce everything

At Standby ORCL19S:

 

[oracle@orcl19s ~]$ srvctl stop database -d orcl19s -o immediate

[oracle@orcl19s ~]$

At Primary ORCL19

[oracle@orcl19p ~]$ srvctl stop database -d orcl19 -o immediate

[oracle@orcl19p ~]$ srvctl start database -d orcl19

[oracle@orcl19p ~]$

At Standby ORCL19S

[oracle@orcl19s ~]$ srvctl start database -d orcl19s

[oracle@orcl19s ~]$

 

Broker config

Now I can add the ZDLRAs to the configuration. First, look at the current config:

 

 

[oracle@orcl19p ~]$ dgmgrl sys/oracle@orcl19

DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sat Nov 9 16:26:26 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.




Welcome to DGMGRL, type "help" for information.

Connected to "ORCL19"

Connected as SYSDBA.

DGMGRL>

DGMGRL> show configuration verbose;




Configuration - orcl19




  Protection Mode: MaxAvailability

  Members:

  orcl19  - Primary database

    orcl19s - Physical standby database




  Properties:

    FastStartFailoverThreshold      = '30'

    OperationTimeout                = '30'

    TraceLevel                      = 'USER'

    FastStartFailoverLagLimit       = '0'

    CommunicationTimeout            = '180'

    ObserverReconnect               = '0'

    FastStartFailoverAutoReinstate  = 'TRUE'

    FastStartFailoverPmyShutdown    = 'TRUE'

    BystandersFollowRoleChange      = 'ALL'

    ObserverOverride                = 'FALSE'

    ExternalDestination1            = ''

    ExternalDestination2            = ''

    PrimaryLostWriteAction          = 'CONTINUE'

    ConfigurationWideServiceName    = 'orcl19_CFG'




Fast-Start Failover:  Disabled




Configuration Status:

SUCCESS




DGMGRL>



Now, add both ZDLRA entries. Note that the connect identifier is the same as the credential name in the wallet. This is a key point too.

 

DGMGRL> add recovery_appliance zdlras1 as connect identifier is 'zdlras1-scan:1521/zdlras1:VPCZDLRA';

Recovery Appliance "zdlras1" added

DGMGRL> add recovery_appliance zdlras2 as connect identifier is 'zdlras2-scan:1521/zdlras2:VPCZDLRA';

Recovery Appliance "zdlras2" added

DGMGRL> show configuration verbose;




Configuration - orcl19




  Protection Mode: MaxAvailability

  Members:

  orcl19  - Primary database

    orcl19s - Physical standby database

      zdlras2 - Recovery appliance (disabled)

        ORA-16905: The member was not enabled yet.




    zdlras1 - Recovery appliance (disabled)

      ORA-16905: The member was not enabled yet.




  Properties:

    FastStartFailoverThreshold      = '30'

    OperationTimeout                = '30'

    TraceLevel                      = 'USER'

    FastStartFailoverLagLimit       = '0'

    CommunicationTimeout            = '180'

    ObserverReconnect               = '0'

    FastStartFailoverAutoReinstate  = 'TRUE'

    FastStartFailoverPmyShutdown    = 'TRUE'

    BystandersFollowRoleChange      = 'ALL'

    ObserverOverride                = 'FALSE'

    ExternalDestination1            = ''

    ExternalDestination2            = ''

    PrimaryLostWriteAction          = 'CONTINUE'

    ConfigurationWideServiceName    = 'orcl19_CFG'




Fast-Start Failover:  Disabled




Configuration Status:

SUCCESS




DGMGRL>


Note that now I have both ZDLRAs in the config, but they are not enabled yet. Before enabling them, I need to configure the RedoRoutes. If I do not do that, both ZDLRAs will be enabled as log_archive_dest for the primary site. This is a key point too.

 

DGMGRL> edit database orcl19 set property RedoRoutes = '(orcl19 : orcl19s sync, zdlras1 async)( orcl19s : zdlras2 async )';

Property "redoroutes" updated

DGMGRL> edit database orcl19s set property RedoRoutes = '(orcl19s : orcl19 sync, zdlras2 async)( orcl19 : zdlras1 async )';

Property "redoroutes" updated

DGMGRL>

 

Just to explain what this means:
  • RedoRoutes for ORCL19: When ORCL19 is the primary database, it sends the redo to ORCL19S in sync mode and to ZDLRAS1 in async mode. At the same time, ORCL19S sends its redo (standby redo, since it is the standby) to ZDLRAS2 in async mode.
  • RedoRoutes for ORCL19S: When ORCL19S is the primary database, it sends the redo to ORCL19 in sync mode and to ZDLRAS2 in async mode. At the same time, ORCL19 sends its redo (standby redo, since it is the standby) to ZDLRAS1 in async mode.
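Before enabling anything, you can ask the broker to echo the property back; a minimal sketch, assuming the same DGMGRL session (the exact output format can vary between versions):

DGMGRL> show database orcl19 RedoRoutes;
  RedoRoutes = '(orcl19 : orcl19s sync, zdlras1 async)( orcl19s : zdlras2 async )'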
Now I need to enable the ZDLRAs. I will do it one by one:

 

DGMGRL> show configuration;




Configuration - orcl19




  Protection Mode: MaxAvailability

  Members:

  orcl19  - Primary database

    orcl19s - Physical standby database

      zdlras2 - Recovery appliance (disabled)

        ORA-16905: The member was not enabled yet.




    zdlras1 - Recovery appliance (disabled)

      ORA-16905: The member was not enabled yet.




Fast-Start Failover:  Disabled




Configuration Status:

SUCCESS   (status updated 35 seconds ago)




DGMGRL>

DGMGRL> enable recovery_appliance zdlras1;

Enabled.

DGMGRL>


And the primary (ORCL19) alert log shows (look at the parameters being set):

 

2019-11-09T21:36:24.189975+01:00

Data Guard Broker executes SQL [alter system set log_archive_config='dg_config=(ORCL19,orcl19s,zdlras1)']

2019-11-09T21:36:24.205453+01:00

ALTER SYSTEM SET log_archive_config='dg_config=(ORCL19,orcl19s,zdlras1)' SCOPE=BOTH;

2019-11-09T21:36:36.625928+01:00

ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;

2019-11-09T21:36:36.638226+01:00

RSM0 (PID:9918): Using STANDBY_ARCHIVE_DEST parameter default value as USE_DB_RECOVERY_FILE_DEST [krsd.c:18157]

2019-11-09T21:36:36.650443+01:00

ALTER SYSTEM SET log_archive_dest_3='service="zdlras1-scan:1521/zdlras1:VPCZDLRA"','ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 reopen=300 db_unique_name="zdlras1" net_timeout=30','valid_for=(online_logfile,all_roles)' SCOPE=BOTH;

2019-11-09T21:36:36.670412+01:00

ALTER SYSTEM SET log_archive_dest_state_3='ENABLE' SCOPE=BOTH;

2019-11-09T21:36:36.670547+01:00

ALTER SYSTEM ARCHIVE LOG

2019-11-09T21:36:36.683698+01:00

Thread 1 cannot allocate new log, sequence 242

Checkpoint not complete

  Current log# 3 seq# 241 mem# 0: +DATA/ORCL19/ONLINELOG/group_3.260.1022714131

  Current log# 3 seq# 241 mem# 1: +RECO/ORCL19/ONLINELOG/group_3.259.1022714135

2019-11-09T21:36:38.922616+01:00

LGWR (PID:7900): SRL selected to archive T-1.S-242

LGWR (PID:7900): SRL selected for T-1.S-242 for LAD:2

2019-11-09T21:36:38.934895+01:00

Thread 1 advanced to log sequence 242 (LGWR switch)

  Current log# 1 seq# 242 mem# 0: +DATA/ORCL19/ONLINELOG/group_1.258.1022714111

  Current log# 1 seq# 242 mem# 1: +RECO/ORCL19/ONLINELOG/group_1.257.1022714115

2019-11-09T21:36:38.974260+01:00

RSM0 (PID:9918): Archived Log entry 588 added for T-1.S-241 ID 0x135c2429 LAD:1




And in the alert log from the standby (ORCL19S):

2019-11-09T21:36:23.036628+01:00

Data Guard Broker executes SQL [alter system set log_archive_config='dg_config=(orcl19s,orcl19,zdlras1)']

2019-11-09T21:36:23.051110+01:00

ALTER SYSTEM SET log_archive_config='dg_config=(orcl19s,orcl19,zdlras1)' SCOPE=BOTH;

2019-11-09T21:36:38.755320+01:00

 rfs (PID:22533): Standby controlfile consistent with primary

2019-11-09T21:36:38.770320+01:00

 rfs (PID:22533): Selected LNO:4 for T-1.S-242 dbid 324042554 branch 1022714106

2019-11-09T21:36:38.773001+01:00

MRP0 (PID:11871): Media Recovery Waiting for T-1.S-242 (in transit)

2019-11-09T21:36:38.776175+01:00

ARC0 (PID:9009): Archived Log entry 304 added for T-1.S-241 ID 0x135c2429 LAD:1

2019-11-09T21:36:39.836191+01:00

Recovery of Online Redo Log: Thread 1 Group 4 Seq 242 Reading mem 0

  Mem# 0: +DATA/ORCL19S/ONLINELOG/group_4.279.1023289067

  Mem# 1: +RECO/ORCL19S/ONLINELOG/group_4.269.1023289069


And now, the second ZDLRA:

 

DGMGRL> enable recovery_appliance zdlras2;

Enabled.

DGMGRL>

Again, in the primary (ORCL19) alert log:

2019-11-09T21:37:32.341087+01:00

Data Guard Broker executes SQL [alter system set log_archive_config='dg_config=(ORCL19,orcl19s,zdlras1,zdlras2)']

2019-11-09T21:37:32.353276+01:00

ALTER SYSTEM SET log_archive_config='dg_config=(ORCL19,orcl19s,zdlras1,zdlras2)' SCOPE=BOTH;

2019-11-09T21:37:40.827841+01:00

ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';

2019-11-09T21:37:41.943026+01:00

LGWR (PID:7900): SRL selected to archive T-1.S-243

LGWR (PID:7900): SRL selected for T-1.S-243 for LAD:2

2019-11-09T21:37:41.949207+01:00

Thread 1 advanced to log sequence 243 (LGWR switch)

  Current log# 2 seq# 243 mem# 0: +DATA/ORCL19/ONLINELOG/group_2.259.1022714121

  Current log# 2 seq# 243 mem# 1: +RECO/ORCL19/ONLINELOG/group_2.258.1022714125

2019-11-09T21:37:41.981177+01:00

ARC1 (PID:10098): Archived Log entry 591 added for T-1.S-242 ID 0x135c2429 LAD:1


And in the alert log from the standby (ORCL19S):

 

2019-11-09T21:37:31.175857+01:00

Data Guard Broker executes SQL [alter system set log_archive_config='dg_config=(orcl19s,orcl19,zdlras1,zdlras2)']

2019-11-09T21:37:31.193280+01:00

ALTER SYSTEM SET log_archive_config='dg_config=(orcl19s,orcl19,zdlras1,zdlras2)' SCOPE=BOTH;

2019-11-09T21:37:37.424268+01:00

RSM0 (PID:10393): Using STANDBY_ARCHIVE_DEST parameter default value as USE_DB_RECOVERY_FILE_DEST [krsd.c:18157]

2019-11-09T21:37:37.438735+01:00

ALTER SYSTEM SET log_archive_dest_2='service="zdlras2-scan:1521/zdlras2:VPCZDLRA"','ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 reopen=300 db_unique_name="zdlras2" net_timeout=30','valid_for=(standby_logfile,all_roles)' SCOPE=BOTH;

2019-11-09T21:37:37.476545+01:00

ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;

2019-11-09T21:37:41.778391+01:00

MRP0 (PID:11871): Media Recovery Waiting for T-1.S-243 (in transit)

2019-11-09T21:37:41.791532+01:00

ARC0 (PID:9009): Archived Log entry 312 added for T-1.S-242 ID 0x135c2429 LAD:1

2019-11-09T21:37:41.793625+01:00

 rfs (PID:22533): Standby controlfile consistent with primary

2019-11-09T21:37:41.800377+01:00

 rfs (PID:22533): Selected LNO:5 for T-1.S-243 dbid 324042554 branch 1022714106

2019-11-09T21:37:42.826630+01:00

Recovery of Online Redo Log: Thread 1 Group 5 Seq 243 Reading mem 0

  Mem# 0: +DATA/ORCL19S/ONLINELOG/group_5.278.1023289071

  Mem# 1: +RECO/ORCL19S/ONLINELOG/group_5.268.1023289073


And now the configuration reports:

 

DGMGRL> show configuration;




Configuration - orcl19




  Protection Mode: MaxAvailability

  Members:

  orcl19  - Primary database

    orcl19s - Physical standby database

      zdlras2 - Recovery appliance (receiving current redo)

    zdlras1 - Recovery appliance




Fast-Start Failover:  Disabled




Configuration Status:

SUCCESS   (status updated 54 seconds ago)




DGMGRL>

 
As you can see, the primary is cascading the redo to ORCL19S and ZDLRAS1, and ORCL19S is cascading the redo (standby redo) to ZDLRAS2.

 

Checking ZERO RPO

 

To check that everything is fine and that we have zero RPO on both sites, I archived the current log at the primary site and checked the cascade of events.
At Primary ORCL19:

 

[oracle@orcl19p ~]$ rman target=/ catalog=vpczdlra/oracle@zdlras1




Recovery Manager: Release 19.0.0.0.0 - Production on Sat Nov 9 21:43:25 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.




connected to target database: ORCL19 (DBID=324042554)

connected to recovery catalog database

PL/SQL package VPCZDLRA.DBMS_RCVCAT version 19.03.00.00. in RCVCAT database is not current

PL/SQL package VPCZDLRA.DBMS_RCVMAN version 19.03.00.00 in RCVCAT database is not current




RMAN> alter system archive log current;




Statement processed




RMAN> list copy of archivelog all completed after "sysdate - 5/1440";




List of Archived Log Copies for database with db_unique_name ORCL19

=====================================================================




Key     Thrd Seq     S Low Time

------- ---- ------- - -------------------

3296    1    243     A 09/11/2019 21:37:41

        Name: +RECO/ORCL19/ARCHIVELOG/2019_11_09/thread_1_seq_243.389.1023918423







RMAN> list backup of archivelog all completed after "sysdate - 5/1440";







List of Backup Sets

===================







BS Key  Size       Device Type Elapsed Time Completion Time

------- ---------- ----------- ------------ -------------------

3304    322.00K    SBT_TAPE    00:00:02     09/11/2019 21:48:02

        BP Key: 3305   Status: AVAILABLE  Compressed: YES  Tag: TAG20191109T214800

        Handle: $RSCN_1_RTIM_1022714106_THRD_1_SEQ_243_CTKEY_3206_BACKUP   Media:




  List of Archived Logs in backup set 3304

  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time

  ---- ------- ---------- ------------------- ---------- ---------

  1    243     3082409    09/11/2019 21:37:41 3083705    09/11/2019 21:47:03




RMAN>

At Standby ORCL19S

 

[oracle@orcl19s ~]$ rman target=/ catalog=vpczdlra/oracle@zdlras2




Recovery Manager: Release 19.0.0.0.0 - Production on Sat Nov 9 21:44:58 2019

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.




connected to target database: ORCL19 (DBID=324042554, not open)

connected to recovery catalog database

PL/SQL package VPCZDLRA.DBMS_RCVCAT version 19.03.00.00. in RCVCAT database is not current

PL/SQL package VPCZDLRA.DBMS_RCVMAN version 19.03.00.00 in RCVCAT database is not current




RMAN> list copy of archivelog all completed after "sysdate - 5/1440";




List of Archived Log Copies for database with db_unique_name ORCL19S

=====================================================================




Key     Thrd Seq     S Low Time

------- ---- ------- - -------------------

3428    1    243     A 09/11/2019 21:37:41

        Name: +RECO/ORCL19S/ARCHIVELOG/2019_11_09/thread_1_seq_243.371.1023918423







RMAN> list backup of archivelog all completed after "sysdate - 5/1440";







List of Backup Sets

===================







BS Key  Size       Device Type Elapsed Time Completion Time

------- ---------- ----------- ------------ -------------------

3419    322.00K    SBT_TAPE    00:00:00     09/11/2019 21:47:10

        BP Key: 3420   Status: AVAILABLE  Compressed: YES  Tag: TAG20191109T214710

        Handle: $RSCN_1_RTIM_1022714106_THRD_1_SEQ_243_CTKEY_3306_BACKUP   Media:




  List of Archived Logs in backup set 3419

  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time

  ---- ------- ---------- ------------------- ---------- ---------

  1    243     3082409    09/11/2019 21:37:41 3083705    09/11/2019 21:47:03




RMAN>
As you can see here, sequence 243 was created at the primary and sent by DG to ORCL19S. ZDLRAS1 created the backup of the archivelog automatically, and ZDLRAS2 did the same.

 

Multiple levels of protection

The idea of this post was to show how to protect a big environment with ZDLRA, reaching zero RPO for both sites and going beyond the protection defined in the Oracle MAA docs. If you look closely, the environment shown here has zero RPO and zero RTO. You will lose data only if you have a catastrophic failure at both sites (and you can protect yourself even further by using tape backup offloads from ZDLRA).
As said before, these critical/extremely critical environments need to follow strict rules for data protection. Adding ZDLRA to them provides an additional level of protection and allows you to use features like the incremental forever backup strategy.

 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer’s positions, strategies or opinions. The information here was edited to be useful for general purposes; specific data and identifications were removed to make it suitable for a wider audience and useful for the community.”

 

