OBSERVERS, MORE THAN ONE

Recently I made a post about a little issue that I hit with Oracle Data Guard. In that scenario, because of an outage in the standby datacenter, a healthy primary database shut down with the error “ORA-16830: primary isolated…”. Just to recall: the database was running in Maximum Availability mode with Fast-Start Failover enabled and (the most important detail) the observer was running in the standby datacenter too.
The point of that post (which you can read here) was to show that no single document gives full details about the pros and cons of where to put your observer. Whichever place you choose, the primary datacenter or the standby, there are small details to check. Even the best (ideal) scenario, with a third datacenter, can be tough to sustain.
Here I will try to show one option that can help you improve the reliability of your environment. At least, you will have more options to decide how to protect your database. Below I show some details about how to configure and use multiple observers, but if you want to see a little concern about this you can go directly to the end of the post.

 

 

More than one

Basically, you can add more than one observer to protect your DG environment. It is simple to configure, it is available since 12.2, and you can have up to three of them. To configure it you just need to do the following (in the simplest way):
  1. Install the default Oracle Client infrastructure.
  2. Add TNS entries for both sides (primary and standby).
  3. Open DGMGRL.
  4. Run the “start observer” command.
Check how easy it is:

 

[oracle@dbobss ~]$ dgmgrl sys/oracle@orcls
DGMGRL for Linux: Release 12.2.0.1.0 - Production on Sun May 5 16:30:58 2019

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "orcls"
Connected as SYSDBA.
DGMGRL> start observer
[W000 05/05 16:31:40.34] FSFO target standby is orcls
[W000 05/05 16:31:42.53] Observer trace level is set to USER
[W000 05/05 16:31:42.53] Try to connect to the primary.
[W000 05/05 16:31:42.53] Try to connect to the primary orcl.
[W000 05/05 16:31:42.54] The standby orcls is ready to be a FSFO target
[W000 05/05 16:31:42.54] Reconnect interval expired, create new connection to primary database.
[W000 05/05 16:31:42.54] Try to connect to the primary.
[W000 05/05 16:31:43.68] Connection to the primary restored!
[W000 05/05 16:31:44.68] Disconnecting from database orcl.

 

With multiple observers you can have up to three observers running at the same time. There is only one master observer, and it is the one responsible for fast-start failover and for protecting the system. If you lose the master observer, the broker, the primary, and the standby decide which one will be the next master observer. Up to version 19c, the observers do not work as a quorum (or anything similar that uses a voting system to decide the role switch) to protect the DG.
The interesting part about multiple observers is that they give you another way to customize your environment. Remember that in my first post I described how complex it is (based on the pros and cons) to choose the best place to put the observer. Now, with multiple observers, you can put one in each datacenter and switch between them when you want to protect one side or the other.
Now, my example environment has two databases and three observers:

 

Note that I have one observer in each datacenter and one in an external location. Inside the broker you can see:

 

DGMGRL> show observer

Configuration - dgconfig

  Primary:            orcl
  Target:             orcls

Observer "dbobss" - Master

  Host Name:                    dbobss
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          1 second ago

Observer "dbobsp" - Backup

  Host Name:                    dbobsp
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          0 seconds ago

Observer "dbobst" - Backup

  Host Name:                    dbobst
  Last Ping to Primary:         2 seconds ago
  Last Ping to Target:          2 seconds ago
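
If you want to consume this output from a script, a small parser is enough. The sketch below is only an illustration: it assumes the text layout is exactly the one printed above (which can change between versions), and the names parse_show_observer and SAMPLE are my own, not part of any Oracle tool.

```python
import re

# Text copied from the SHOW OBSERVER output shown in this post.
SAMPLE = """\
Configuration - dgconfig
  Primary:            orcl
  Target:             orcls
Observer "dbobss" - Master
  Host Name:                    dbobss
Observer "dbobsp" - Backup
  Host Name:                    dbobsp
Observer "dbobst" - Backup
  Host Name:                    dbobst
"""

OBSERVER_RE = re.compile(r'^Observer "([^"]+)" - (\w+)$')
FIELD_RE = re.compile(r'^(Primary|Target):\s+(\S+)$')


def parse_show_observer(text):
    """Return primary, target, master observer and backup observers."""
    info = {"primary": None, "target": None, "master": None, "backups": []}
    for raw in text.splitlines():
        line = raw.strip()
        field = FIELD_RE.match(line)
        if field:
            # "Primary:" / "Target:" lines of the configuration header.
            info[field.group(1).lower()] = field.group(2)
            continue
        observer = OBSERVER_RE.match(line)
        if observer:
            name, role = observer.groups()
            if role == "Master":
                info["master"] = name
            else:
                info["backups"].append(name)
    return info


if __name__ == "__main__":
    print(parse_show_observer(SAMPLE))
```

In a real script you would feed it the text captured from your own DGMGRL session instead of SAMPLE.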


 

In case of failure, the broker, the primary, and the standby decide which observer will be the next master. That decision is made after 30 seconds, and it needs to be coordinated and agreed by both the primary and the standby. Unfortunately, there is no way to reduce this 30-second check.
In my environment, I shut down the machine running the master observer (dbobss), and this is the broker log (on the primary):

 

05/05/2019 17:15:34
FSFP: FSFO SetState(st=43 "SET SWOB INPRG", fl=0x0 "", ob=0x0, tgt=0, v=0)
Data Guard Broker initiated a master observer switch since the current master observer cannot reach the primary database
FSFP: FSFO SetState(st=12 "SET OBID", fl=0x0 "", ob=0x32cc2ad6, tgt=0, v=0)
Succeeded in switching master observer from observer 'dbobss' to 'dbobsp'
FSFP: FSFO SetState(st=44 "CLR SWOB INPRG", fl=0x0 "", ob=0x0, tgt=0, v=0)
FSFP: FSFO SetState(st=16 "UNOBSERVED", fl=0x0 "", ob=0x0, tgt=0, v=0)
Master observer begins pinging this instance
Fore: FSFO SetState(st=15 "OBSERVED", fl=0x0 "", ob=0x0, tgt=0, v=0)

 

And this is the broker log on the standby:

 

05/05/2019 17:15:34
drcx: FSFO SetState(st=16 "UNOBSERVED", fl=0x0 "", ob=0x0, tgt=0, v=0)
drcx: FSFO SetState(st=43 "SET SWOB INPRG", fl=0x0 "", ob=0x0, tgt=0, v=0)
05/05/2019 17:15:37
drcx: FSFO SetState(st=15 "OBSERVED", fl=0x0 "", ob=0x0, tgt=0, v=0)
drcx: FSFO SetState(st=44 "CLR SWOB INPRG", fl=0x0 "", ob=0x0, tgt=0, v=0)
drcx: FSFO SetState(st=12 "SET OBID", fl=0x0 "", ob=0x32cc2ad6, tgt=0, v=0)
05/05/2019 17:15:39
Master observer begins pinging this instance
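
The SetState lines in both logs have a regular shape, so it is easy to pull out the sequence of states when you compare the primary and standby sides. The sketch below is a rough helper, assuming the line format is exactly the one in the logs above. The function name fsfo_states is mine, and the sample is the primary log from this post.

```python
import re

# Matches lines such as:
#   FSFP: FSFO SetState(st=43 "SET SWOB INPRG", fl=0x0 "", ob=0x0, tgt=0, v=0)
SETSTATE_RE = re.compile(r'^(\w+): FSFO SetState\(st=(\d+) "([^"]*)"')

PRIMARY_LOG = """\
05/05/2019 17:15:34
FSFP: FSFO SetState(st=43 "SET SWOB INPRG", fl=0x0 "", ob=0x0, tgt=0, v=0)
Data Guard Broker initiated a master observer switch since the current master observer cannot reach the primary database
FSFP: FSFO SetState(st=12 "SET OBID", fl=0x0 "", ob=0x32cc2ad6, tgt=0, v=0)
Succeeded in switching master observer from observer 'dbobss' to 'dbobsp'
FSFP: FSFO SetState(st=44 "CLR SWOB INPRG", fl=0x0 "", ob=0x0, tgt=0, v=0)
FSFP: FSFO SetState(st=16 "UNOBSERVED", fl=0x0 "", ob=0x0, tgt=0, v=0)
Master observer begins pinging this instance
Fore: FSFO SetState(st=15 "OBSERVED", fl=0x0 "", ob=0x0, tgt=0, v=0)
"""


def fsfo_states(log_text):
    """Return (process, state number, state name) for each SetState line."""
    states = []
    for raw in log_text.splitlines():
        match = SETSTATE_RE.match(raw.strip())
        if match:
            states.append((match.group(1), int(match.group(2)), match.group(3)))
    return states


if __name__ == "__main__":
    for process, number, name in fsfo_states(PRIMARY_LOG):
        print(f"{process:5} st={number:<3} {name}")
```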

 

Note in the logs that both sides (primary and standby) agreed on the change. After the failure you see the events SET SWOB INPRG (switch observer in progress), SET OBID (set observer ID), and CLR SWOB INPRG (clear switch observer in progress), together with the UNOBSERVED state that was detected. This is the output you get when the broker trace level is set to support. An interesting note is that, inside the broker, the faulty observer does not disappear after the failure:

 

DGMGRL> show observer

Configuration - dgconfig

  Primary:            orcl
  Target:             orcls

Observer "dbobsp" - Master

  Host Name:                    dbobsp
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          3 seconds ago

Observer "dbobss" - Backup

  Host Name:                    dbobss
  Last Ping to Primary:         256 seconds ago
  Last Ping to Target:          221 seconds ago

Observer "dbobst" - Backup

  Host Name:                    dbobst
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          5 seconds ago

DGMGRL>
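
Since the failed observer stays in the list, the only hint that something is wrong is the growing "Last Ping" value. A monitoring script could flag that. The sketch below assumes the output layout shown above. The 60-second limit is an arbitrary value that I picked for the example, and stale_observers is a hypothetical helper name.

```python
import re

OBSERVER_RE = re.compile(r'^Observer "([^"]+)" - (\w+)$')
PING_RE = re.compile(r'^Last Ping to (Primary|Target):\s+(\d+) seconds? ago$')

# Text copied from the SHOW OBSERVER output after the failure.
SAMPLE = """\
Observer "dbobsp" - Master
  Host Name:                    dbobsp
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          3 seconds ago
Observer "dbobss" - Backup
  Host Name:                    dbobss
  Last Ping to Primary:         256 seconds ago
  Last Ping to Target:          221 seconds ago
Observer "dbobst" - Backup
  Host Name:                    dbobst
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          5 seconds ago
"""


def stale_observers(text, max_age_seconds=60):
    """Return observers whose last ping is older than max_age_seconds."""
    stale = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        observer = OBSERVER_RE.match(line)
        if observer:
            current = observer.group(1)
            continue
        ping = PING_RE.match(line)
        if ping and current is not None:
            if int(ping.group(2)) > max_age_seconds and current not in stale:
                stale.append(current)
    return stale


if __name__ == "__main__":
    print(stale_observers(SAMPLE))
```

With the output above and the assumed 60-second limit, only dbobss is reported.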

 

After you bring your observer back, you can simply set the master observer to the desired one:

 

DGMGRL> set masterobserver to dbobss;
Sent the proposed master observer to the data guard broker configuration. Please run SHOW OBSERVER to see if master observer switch actually happens.
DGMGRL>

 

Hierarchy

When you use multiple observers you have more control over how to protect your DG: you can have one observer in each site and choose the side that you want to protect. You can write a script that checks the database role on the observer side and changes the master observer to protect the desired database role.
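
The decision part of such a script can be very small. The sketch below only builds the DGMGRL command to run. It assumes you already know the current primary and the current master observer (for example from SHOW OBSERVER), and that the policy is "keep the master observer in the same site as the primary". The mapping between databases and observers is hypothetical: I assume here that dbobsp sits with orcl and dbobss sits with orcls.

```python
# Hypothetical site mapping: the observer that lives in the same datacenter
# as each database. Adjust it to your own environment.
OBSERVER_BY_DATABASE = {
    "orcl": "dbobsp",
    "orcls": "dbobss",
}


def masterobserver_command(primary_db, current_master, observer_by_db):
    """Return the DGMGRL command needed to move the master observer next to
    the current primary, or None when nothing has to change."""
    wanted = observer_by_db.get(primary_db)
    if wanted is None or wanted == current_master:
        # Unknown database, or the master observer is already in place.
        return None
    return f"set masterobserver to {wanted};"


if __name__ == "__main__":
    # orcl is primary but the master observer is on the standby side.
    print(masterobserver_command("orcl", "dbobss", OBSERVER_BY_DATABASE))
```

To protect the standby side instead, you would invert the mapping. The command string is the same one used earlier in this post, and you still need to run SHOW OBSERVER afterwards to confirm that the switch actually happened.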
Remember my previous post. If you choose to protect the primary (with the observer in the same datacenter) and your entire datacenter fails, FSFO does not occur because the standby does not decide alone. If you choose to protect the standby (with the observer in the same datacenter), a datacenter or network failure on the standby side can lead to a complete shutdown of a healthy primary database because it becomes “isolated”.
Since multiple observers still use a hierarchical decision, the decision rests with only one observer. Even if you have multiple observers (three, for example, with one in each site), if you put the master observer in the same site as the standby and they become isolated, they still decide alone, and because of FSFO the primary still shuts down because it thinks that it is isolated, even if it continues to receive connections from the other two observers.
Because of the current design, even if you set “FastStartFailoverThreshold” to 240, the automatic switch of the master observer does not occur because the standby side cannot be reached to confirm the change. Maybe in future versions (20, 21…) we will see a change in this design, so that when you use multiple observers a voting/quorum method is used to decide the role change for FSFO. Of course, even a quorum approach can lead to problems if you put two observers in the same datacenter, but it can mitigate problems in some cases.
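
To make the difference concrete, here is a toy model of the two decision rules. It is not Oracle code and it ignores timing (thresholds, reconnect intervals). The first function is my simplified reading of the current hierarchical behavior described above, and the second is a purely hypothetical quorum rule that does not exist in the broker today.

```python
def keeps_running_master_only(reaches_standby, reaches_master_observer):
    """Simplified current behavior: the primary considers itself isolated
    when it loses both the standby and the master observer. Backup
    observers do not count."""
    return reaches_standby or reaches_master_observer


def keeps_running_quorum(reaches_standby, observers_reached, total_observers):
    """Hypothetical quorum rule (NOT implemented by Oracle): the primary
    stays up if it still reaches a majority of the observers."""
    return reaches_standby or observers_reached * 2 > total_observers


if __name__ == "__main__":
    # Standby site isolated, master observer in that same site,
    # the two backup observers still reach the primary.
    print("master only:", keeps_running_master_only(False, False))
    print("quorum     :", keeps_running_quorum(False, 2, 3))
```

In this scenario the master-only rule takes the primary down, while the hypothetical quorum rule would keep it open. If two of the three observers lived in the isolated site, the quorum rule would not help either.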
In my next post I will dig deeper into this, with some examples and analysis of logs and traces. You will see some details of what happens when the standby is isolated and you use multiple observers.

 

 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”