Oracle Voting Disk

What is a Voting Disk ?
  • A voting disk is a file in which all the nodes of a cluster register their heartbeat information.
  • It also contains the list of the active nodes.
  • Voting disks (or voting files) are like attendance registers in which nodes mark their attendance (heartbeats) to confirm they are alive.
  • If a node does not update the disk within a short timeout period, the node is considered unhealthy and may be rebooted to protect the database.
  • If you do not configure voting disks on Oracle ASM, then for high availability, Oracle recommends that you have a minimum of three voting disks on physically separate storage. This avoids having a single point of failure. If you configure a single voting disk, then you must use external mirroring to provide redundancy.
  • The number of voting disks depends on the type of redundancy. From 11.2.0.x onwards, OCR and voting files can be placed in an ASM disk group.

  • External redundancy = 1 voting disk
  • Normal redundancy = 3 voting disks
  • High redundancy = 5 voting disks

    You can have up to 32 voting disks in your cluster.
    Oracle recommends that you configure multiple voting disks during Oracle Clusterware installation to improve availability. If you choose to put the voting disks into an Oracle ASM disk group, then Oracle ASM ensures the configuration of multiple voting disks if you use a normal or high redundancy disk group.
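
The redundancy-to-count mapping listed above can be written down as a small shell helper. This is only an illustrative sketch: the function name `voting_files_for_redundancy` is hypothetical and is not part of any Oracle tool.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle command): map an ASM disk group
# redundancy level to the number of voting disks listed above.
voting_files_for_redundancy() {
    case "$1" in
        external) echo 1 ;;
        normal)   echo 3 ;;
        high)     echo 5 ;;
        *)        echo "unknown redundancy: $1" >&2; return 1 ;;
    esac
}

# Example: a normal redundancy disk group
voting_files_for_redundancy normal
```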

    Example :
  • A three-node cluster (Node1, Node2, Node3) is managed by Oracle Clusterware.
  • The database, voting disk, and OCR disk reside on shared storage.
  • All three nodes register their heartbeats in the voting disk.

    To identify the status and voting disk location :
        [oracle@rac1 ~]$ crsctl query css votedisk
    
        ## STATE File Universal Id File Name Disk group
        -- ----- ----------------- --------- ---------
        1. ONLINE b4a7f383bb414f7ebf6aaae7c3873401 (/dev/oracleasm/disks/ASMDISK1) [DATA]
        Located 1 voting disk(s).
    
    
    Replace a voting disk :
        [oracle@rac1 ~]$ crsctl replace votedisk +DATA1
        Successful addition of voting disk 9789b4bf42214f8bbf14fda587ba331a.
        Successful deletion of voting disk b4a7f383bb414f7ebf6aaae7c3873401.
        Successfully replaced voting disk group with +DATA1.
        CRS-4266: Voting file(s) successfully replaced
    
    
    Check the status and verify voting disk location :
        [oracle@rac1 ~]$ crsctl query css votedisk
    
        ## STATE File Universal Id File Name Disk group
        -- ----- ----------------- --------- ---------
        1. ONLINE 9789b4bf42214f8bbf14fda587ba331a (/dev/oracleasm/disks/ASMDISK2) [DATA1]
        Located 1 voting disk(s).
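
When this check is scripted, the number of ONLINE voting disks can be counted from the command output. The following is a minimal sketch that assumes the output format shown in the listings here; the function name `count_online_votedisks` is hypothetical, and the sample text is copied from the listing above so that the example runs without a cluster.

```shell
#!/bin/sh
# Count ONLINE voting disks in `crsctl query css votedisk` output read from stdin.
# Assumption: data rows start with an index such as "1." followed by the state.
count_online_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Sample output taken from the listing above.
sample='## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 9789b4bf42214f8bbf14fda587ba331a (/dev/oracleasm/disks/ASMDISK2) [DATA1]
Located 1 voting disk(s).'

printf '%s\n' "$sample" | count_online_votedisks
```

On a live cluster the same function could be fed directly: `crsctl query css votedisk | count_online_votedisks`.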
    
    
    Why should we have an odd number of voting disks ?
    A node must be able to access more than half of the voting disks at any time.

    Scenario :
    Consider a 2-node cluster with an even number of voting disks, say 2.
  • Suppose node 1 can access only voting disk 1.
  • Node 2 can access only voting disk 2.
  • In this situation there is no common file in which the clusterware can check the heartbeat of both nodes.
  • With 3 voting disks, if both nodes can access more than half of them (i.e., 2 voting disks), there is at least one disk that both nodes can access. The clusterware can use this disk to check the heartbeat of the nodes.
  • A node that cannot access more than half of the voting disks is evicted from the cluster by another node that can, to maintain the integrity of the cluster.
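
The "more than half" rule can be worked through with a little arithmetic. The sketch below is purely illustrative (the function names are hypothetical): for each disk count it computes how many voting disks a node must reach and how many disk failures can be tolerated.

```shell
#!/bin/sh
# Minimum number of voting disks a node must access: strictly more than half.
required_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Number of voting disks that can be lost while a majority is still reachable.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "disks=$n required=$(required_votes "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Two disks tolerate no more failures than one, and four tolerate no more than three, so an even count adds storage without adding fault tolerance.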

    Recover a corrupted voting disk :
    First identify the disk that backs the DATA1 disk group :
        ASMCMD> lsdsk -G DATA1
        Path
        /dev/oracleasm/disks/ASMDISK2
    
    
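    As the root user, overwrite the disk with zeros to simulate corruption (this destroys its contents, so do it only on a test system) :
        # dd if=/dev/zero of=/dev/oracleasm/disks/ASMDISK2 bs=4096 count=1000000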
    
    
    After rebooting both nodes, check the clusterware status :
        [oracle@rac1 ~]$ crsctl check crs
        CRS-4638: Oracle High Availability Services is online
        CRS-4535: Cannot communicate with Cluster Ready Services
        CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
        CRS-4534: Cannot communicate with Event Manager

    The voting disk cannot simply be restored to the DATA1 disk group, because the disk in DATA1 has been corrupted. The disk group must be recreated first.
    
    
    Stop the CRS forcefully on both nodes :
        [root@rac1 bin]# ./crsctl stop crs -f
        CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
        CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rac1'
        CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
        CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
        CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
        CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
        CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
        CRS-2677: Stop of 'ora.cssdmonitor' on 'rac1' succeeded
        CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
        CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
        CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
        CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
        CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
        CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
        CRS-4133: Oracle High Availability Services has been stopped.


    Start the CRS in exclusive mode on any one node :
        [root@rac1 bin]# ./crsctl start crs -excl
        CRS-4123: Oracle High Availability Services has been started.
        CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
        CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
        CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
        CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
        CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
        CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
        CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
        CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
        CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
        CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
        CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
        CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
        CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
        CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
        CRS-2672: Attempting to start 'ora.crf' on 'rac1'
        CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
        CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
        CRS-2676: Start of 'ora.crf' on 'rac1' succeeded
        CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
        CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
        CRS-2679: Attempting to clean 'ora.asm' on 'rac1'
        CRS-2681: Clean of 'ora.asm' on 'rac1' succeeded
        CRS-2672: Attempting to start 'ora.asm' on 'rac1'
        CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
        CRS-2672: Attempting to start 'ora.storage' on 'rac1'
        CRS-2676: Start of 'ora.storage' on 'rac1' succeeded
        CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
        CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
    
    
    After the exclusive-mode startup, check the clusterware status :
        [root@rac1 bin]# ./crsctl check crs
        CRS-4638: Oracle High Availability Services is online
        CRS-4692: Cluster Ready Services is online in exclusive mode
        CRS-4529: Cluster Synchronization Services is online
        CRS-4533: Event Manager is online
    
    
    Using ASMCA, recreate the ASM disk group ‘DATA1’ in which the voting disk was previously placed, then list the disk groups :
        ASMCMD> lsdg
        State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
        MOUNTED EXTERN N 512 4096 1048576 30718 20165 0 20165 0 N DATA/
        MOUNTED EXTERN N 512 4096 1048576 10236 10183 0 10183 0 N DATA1/
    
    
    Check the voting disk location :
        [oracle@rac1 ~]$ crsctl query css votedisk
        Located 0 voting disk(s).
    
    
    Replace the voting disk :
        [oracle@rac1 ~]$ crsctl replace votedisk +DATA1
        Successful addition of voting disk 5a1ef50fe3354f35bfa7f86a6ccb8990.
        Successfully replaced voting disk group with +DATA1.
        CRS-4266: Voting file(s) successfully replaced
    
        [oracle@rac1 ~]$ crsctl query css votedisk
        ## STATE File Universal Id File Name Disk group
        -- ----- ----------------- --------- ---------
        1. ONLINE 5a1ef50fe3354f35bfa7f86a6ccb8990 (/dev/oracleasm/disks/ASMDISK2) [DATA1]
        Located 1 voting disk(s).
    
    
    Stop the CRS running in exclusive mode :
        # crsctl stop crs
    
    
    Start the CRS (clusterware) on all nodes :
        # crsctl start crs
    
    
    Check the clusterware status of both nodes :
        [root@rac1 bin]# ./crsctl check cluster -all
        **************************************************************
        rac1:
        CRS-4537: Cluster Ready Services is online
        CRS-4529: Cluster Synchronization Services is online
        CRS-4533: Event Manager is online
        **************************************************************
        rac2:
        CRS-4537: Cluster Ready Services is online
        CRS-4529: Cluster Synchronization Services is online
        CRS-4533: Event Manager is online
        **************************************************************
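
A final status check like this can be scripted by scanning the output for the failure messages seen earlier in this walkthrough. The sketch below is an assumption-laden example rather than an Oracle-provided check: the function name `clusterware_healthy` is hypothetical, and it only matches the message wording that appears in the listings above.

```shell
#!/bin/sh
# Read `crsctl check crs` or `crsctl check cluster -all` output from stdin.
# Succeeds (exit 0) only if at least one service reports "is online" and
# no line reports a communication problem.
clusterware_healthy() {
    awk '
        /Cannot communicate|Communications failure/ { bad++ }
        /is online/ { ok++ }
        END { exit !(ok > 0 && bad == 0) }
    '
}

# Example using two lines from the failed check shown earlier.
printf '%s\n' \
    'CRS-4638: Oracle High Availability Services is online' \
    'CRS-4535: Cannot communicate with Cluster Ready Services' |
    clusterware_healthy || echo "clusterware is NOT fully online"
```

On a live cluster the output could be piped in directly, for example `crsctl check cluster -all | clusterware_healthy && echo "all services online"`.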
    

