Oracle RAC - Architecture

  • Oracle Real Application Clusters (RAC) allows multiple instances, running on separate nodes, to access a single database.
  • In a standard Oracle configuration a database can be mounted by only one instance, but in a RAC environment many instances can mount and open the same database (see the gv$instance query below).
  • RAC is an option for the Oracle Database that provides High Availability (HA) and Scalability without requiring any application changes.
  • Oracle RAC is heavily dependent on an efficient, highly reliable, high-speed private network called the interconnect; when designing a RAC system, make sure you get the best interconnect you can afford.
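
    As a quick illustration (a minimal sketch, assuming you are connected with SQL*Plus to any one instance of the cluster and have access to the GV$ views), you can list all instances that currently have the database open:

        -- One row per running instance of the cluster database
        SELECT inst_id,
               instance_number,
               instance_name,
               host_name,
               status
        FROM   gv$instance
        ORDER  BY instance_number;

    On a two-node cluster this would typically return two rows, one per node, all belonging to the same database.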


    [Diagram: RAC]
    Benefits :
  • Your application is more scalable; if you need more power, just add a new node.
  • You can also reduce the total cost of ownership of the infrastructure by building a scalable system from low-cost commodity hardware.
  • In case of a problem, you have the ability to fail over from one node to another.
  • You can increase throughput on demand for cluster-aware applications; once again, you increase cluster resources simply by adding servers to the cluster.
  • You can increase throughput for cluster-aware applications by enabling them to run on all of the nodes in a cluster or only on a subset of the nodes.
  • You can easily program the startup of applications in a planned order, ensuring that dependent processes are started in the correct sequence.
  • Processes can be monitored and restarted automatically if they stop.
  • With the RAC architecture, you eliminate the single point of failure (SPOF) and the unplanned downtime caused by hardware or software malfunctions.

  • How a RAC environment differs from a standard (single-instance) Oracle database:
    
        Component                       RAC Environment
        -----------------------         ---------------------------------------------------------------------------
        SGA                             Each instance has its own SGA.

        Background processes            Each instance has its own set of background processes.

        Datafiles                       Shared by all instances (shared storage).

        Control files                   Shared by all instances (shared storage).

        Online redo logfiles            Each instance has its own redo thread; only the owning instance writes to it,
                                        but other instances can read it during recovery and archiving. If an instance
                                        is shut down, log switches by other instances can force the idle instance's
                                        redo logs to be archived.

        Archived redo logfiles          Private to the instance, but other instances need access to all required
                                        archive logs during media recovery.

        Flash Recovery Area             Shared by all instances (shared storage).

        Alert log and trace files       Private to each instance; other instances never read or write to these files.

        ORACLE_HOME                     Same as single instance, plus it can be placed on a shared file system,
                                        allowing a common ORACLE_HOME for all instances in a RAC environment.
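
    As an example of the per-instance redo threads mentioned above (a sketch, assuming the standard V$ views), you can see from any instance that each thread has its own set of online redo log groups:

        -- Redo log groups are organized into one thread per instance;
        -- v$log, read from any node, shows the groups of every thread.
        SELECT thread#, group#, members, bytes/1024/1024 AS size_mb, status
        FROM   v$log
        ORDER  BY thread#, group#;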
    

    RAC Components :
    The major components of an Oracle RAC system are:
  • Shared disk system
  • Oracle Clusterware
  • Cluster Interconnects
  • Oracle Kernel Components

  • The diagram below describes the basic architecture of an Oracle RAC environment:
    [Diagram: RAC architecture]

    Disk System :
    With today's SAN and NAS disk storage systems, sharing storage is fairly easy, and shared storage is required for a RAC environment. You can use any of the below storage setups:
  • SAN (Storage Area Network) - generally using fibre channel to connect to the SAN
  • NAS (Network Attached Storage) - generally using an IP network to connect to the NAS via NFS or iSCSI
  • JBOD - direct-attached storage; the old traditional way, still used by many companies as a cheap option

  • Oracle Clusterware :
    Oracle Clusterware is the software that allows Oracle to run in cluster mode; it can support up to 64 nodes and can even be used together with a vendor cluster such as Sun Cluster.
    The Clusterware software allows the nodes to communicate with each other and forms the cluster that makes the nodes work as a single logical server.
    The software is run by the Cluster Ready Services (CRS) using the Oracle Cluster Registry (OCR), which records and maintains the cluster and node membership information, and the voting disk, which acts as a tiebreaker during communication failures. While the cluster is running, consistent heartbeat information travels across the interconnect to the voting disk.
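
    A quick way to verify the Clusterware stack (a sketch using the standard Clusterware command-line tools; the exact output format varies by version) is:

        # Check the health of the CRS, CSS and EVM daemons
        crsctl check crs

        # List the nodes that are members of the cluster, with their node numbers
        olsnodes -n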

    The CRS stack has four main daemons (summarized in the table and the OS-level check below)
  • OPROCd : Process Monitor Daemon, provides basic cluster integrity services
  • CRSd : Cluster Ready Services daemon, manages cluster resources (monitoring, failover and node recovery)
  • OCSSd : Oracle Cluster Synchronization Services Daemon, manages cluster node membership; failure of this daemon results in a node reboot to avoid data corruption
  • EVMd : Event Manager Daemon, spawns the event logger and generates callouts

        CRS Process                          Functionality                               Failure of the Process
        --------------------------------     -------------------------------------      ---------------------------------------
        OPROCd - Process Monitor             Provides basic cluster                      Node restart (runs as root).
                                             integrity services.
        EVMd - Event Management              Spawns a child process event logger         Daemon automatically restarted,
                                             and generates callouts.                     no node restart (runs as oracle).
        OCSSd - Cluster Synchronization      Basic node membership, group                Node restart (runs as oracle).
                Services                     services, basic locking.
        CRSd - Cluster Ready Services        Resource monitoring, failover               Daemon automatically restarted,
                                             and node recovery.                          no node restart (runs as root).
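
    On a running node you can also see these daemons at the operating-system level; a minimal sketch, assuming a Unix-like platform:

        # The CRS stack daemons appear as ordinary OS processes
        ps -ef | egrep 'crsd|evmd|ocssd|oprocd' | grep -v grep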
    
    

    Oracle Cluster Ready Services (CRS) :
  • The Oracle Cluster Ready Services (CRS) uses a registry to keep the cluster configuration; it must reside on shared storage and be accessible to all nodes within the cluster.
  • This shared registry is known as the Oracle Cluster Registry (OCR) and is a major part of the cluster; it is automatically backed up by the daemons (every 4 hours), and you can also back it up manually (see the example below).
  • The OCSSd uses the OCR extensively and writes the changes to the registry.
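
    For example (a sketch using the standard ocrconfig utility, run as root; the -manualbackup option exists from 11g, in 10g you would use ocrconfig -export instead), you can list the automatic backups and take an on-demand one:

        # Show the automatic OCR backups kept by the clusterware and the node that took them
        ocrconfig -showbackup

        # Take an on-demand backup of the OCR (11g onwards)
        ocrconfig -manualbackup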


  • Oracle Cluster Registry (OCR) :
  • The OCR keeps details of all resources and services; it stores name-value pairs of information, such as the resource definitions that the CRS stack uses to manage those resources.
  • Resources within the CRS stack are components that are managed by CRS and carry information about their good/bad state and their callout scripts.
  • The OCR is also used to supply bootstrap information such as ports and nodes; it is a binary file.
  • The OCR is loaded as a cache on each node; each node updates its cache, but only one node, called the master, is allowed to write the cache back to the OCR file.
  • Enterprise Manager also uses the OCR cache. The OCR should be at least 100MB in size (the ocrcheck utility shown below reports the current size and usage). The CRS daemon updates the OCR with the status of the nodes in the cluster during reconfigurations and failures.
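
    A quick check of the OCR location, total size and usage (a sketch; run with sufficient privileges) is the ocrcheck utility:

        # Reports the OCR version, total/used/free space, the OCR device or file name
        # and the result of an integrity check
        ocrcheck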


  • Voting Disk :
  • The voting disk (or quorum disk) is shared by all nodes within the cluster; information about the cluster is constantly being written to the disk, which is known as the heartbeat.
  • If for any reason a node cannot access the voting disk, it is immediately evicted from the cluster. This protects the cluster from split-brain scenarios (the Instance Membership Recovery (IMR) algorithm is used to detect and resolve split-brains), as the voting disk decides which part is really the cluster.
  • The voting disk manages the cluster membership and arbitrates cluster ownership during communication failures between nodes.
  • The voting disk has to reside on shared storage; it is a small file (about 20MB) that can be accessed by all nodes in the cluster (see the crsctl query below).
  • In Oracle 10g R1 you can have only one voting disk, but in R2 you can have up to 32 voting disks, allowing you to eliminate any SPOFs.
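
    You can list the voting disks that are currently configured with the crsctl utility (a sketch; the syntax shown is the 10g/11g form):

        # List the voting disks configured for the cluster
        crsctl query css votedisk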


  • Oracle Kernel Components :
  • The kernel components relate to the background processes, the buffer cache and the shared pool; managing these resources without conflicts and corruption requires special handling.
  • In RAC, because more than one instance accesses the same resources, the instances require closer coordination at the resource management level.
  • Each node has its own set of buffers but is able to request and receive data blocks currently held in another instance's cache. This management of data sharing and exchange is done by the Global Cache Services (GCS); the statistics query below shows the resulting block traffic.
  • All the resources in the cluster group form a central repository called the Global Resource Directory (GRD), which is distributed. Each instance masters some set of resources and together all instances form the GRD.
  • The resources are equally distributed among the nodes based on their weight. The GRD is managed by two services, the Global Cache Service (GCS) and the Global Enqueue Service (GES); together they form and manage the GRD.
  • When a node leaves the cluster, the GRD portion of that instance needs to be redistributed to the surviving nodes; a similar action is performed when a new node joins.
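
    The amount of block shipping that the GCS is doing can be observed through the standard instance statistics; a minimal sketch, assuming the usual 10g/11g statistic names:

        -- Cache Fusion activity per instance: blocks received from and served to other instances
        SELECT inst_id, name, value
        FROM   gv$sysstat
        WHERE  name IN ('gc cr blocks received',
                        'gc current blocks received',
                        'gc cr blocks served',
                        'gc current blocks served')
        ORDER  BY inst_id, name;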


  • Background Processes :
    Oracle RAC is composed of two or more database instances. Each instance is composed of memory structures and background processes, the same as a single-instance database.
    In addition, Oracle RAC instances run the following background processes:
  • ACMS : Atomic Control file to Memory Service
  • GTX0-j : Global Transaction Process
  • LMON : Global Enqueue Service Monitor
  • LMD : Global Enqueue Service Daemon
  • LMS : Global Cache Service Process
  • LCK0 : Instance Enqueue Process
  • DIAG : Diagnosability Daemon
  • RMSn : RAC Management Processes
  • RSMN : Remote Slave Monitor
  • DBRM : Database Resource Manager (from 11g R2)
  • PING : Response Time Agent (from 11g R2)
  • LMHB : Global Cache/Enqueue Service Heartbeat Monitor
  • RCBG : Result Cache BackGround Process

  • These processes are spawned to support multi-instance coordination; you can check which of them are running in your instance as shown below.
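
    A sketch of how to see which of these background processes are actually running in the local instance (the paddr filter is the usual idiom for "currently started"):

        -- RAC-related background processes currently running in this instance
        SELECT name, description
        FROM   v$bgprocess
        WHERE  paddr <> '00'
        AND    (name LIKE 'LM%' OR name LIKE 'RMS%' OR name LIKE 'GTX%'
                OR name IN ('ACMS', 'DIAG', 'LCK0', 'RCBG', 'DBRM', 'PING'))
        ORDER  BY name;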

    ACMS (from Oracle 11g) Atomic Control file to Memory Service :
    In an Oracle RAC environment, the ACMS background process is an agent that ensures distributed SGA memory updates are either globally committed on success or globally aborted in the event of a failure.

    GTX0-j (from Oracle 11g) Global Transaction Process :
    This process provides transparent support for XA global transactions in a RAC environment. The database automatically tunes the number of these processes based on the workload of XA global transactions.
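
    The initial number of GTXn processes is controlled by the GLOBAL_TXN_PROCESSES initialization parameter (introduced in 11g); a quick sketch of checking it from SQL*Plus:

        -- How many GTXn processes the instance starts with; Oracle can
        -- adjust the actual number automatically as the XA workload changes
        SHOW PARAMETER global_txn_processes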

    LMON Global Enqueue Service Monitor (Lock Monitor) :
    LMON monitors the entire cluster to manage global enqueues and resources and performs global enqueue recovery operations. LMON manages instance and process failures and the associated recovery for the Global Cache Service (GCS) and Global Enqueue Service (GES).
    In particular, LMON handles the part of recovery associated with global resources. The services provided by LMON are also known as Cluster Group Services (CGS). The lock monitor manages global locks and resources.
    It handles the redistribution of instance locks whenever instances are started or shut down. The lock monitor also recovers instance lock information prior to the instance recovery process, and coordinates with the Process Monitor (PMON) to recover dead processes that hold instance locks.

    LMDx Global Enqueue Service Daemon :
    The LMD is the lock agent process that manages enqueue manager service requests for Global Cache Service enqueues, controlling access to global enqueues and resources. It manages incoming remote resource requests (requests originating from another instance) within each instance.
    The LMD processes also handle deadlock detection and remote enqueue requests, and manage the instance locks that are used to share resources between instances.

    LMSx Global Cache Service Processes :
    The LMSx are the processes that handle remote Global Cache Service (GCS) messages. Real Application Clusters software provides for up to 10 Global Cache Service Processes. The number of LMSx varies depending on the amount of messaging traffic among nodes in the cluster.

    This process maintains the status of datafiles and of each cached block by recording information in the Global Resource Directory (GRD). It also controls the flow of messages to remote instances, manages global data block access, and transmits block images between the buffer caches of different instances; this processing is part of the Cache Fusion feature.

    The LMSx handles acquisition interrupt and blocking interrupt requests from remote instances for Global Cache Service resources. For cross-instance consistent read requests, the LMSx creates a consistent read version of the block and sends it to the requesting instance.
    The LMSn processes handle the blocking interrupts from the remote instance for the Global Cache Service resources by:
  • Managing the resource requests and cross-instance call operations for the shared resources.
  • Building a list of invalid lock elements and validating the lock elements during recovery.
  • Handling global lock deadlock detection and monitoring for lock conversion timeouts.
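
    The number of LMSn processes an instance starts is exposed through the GCS_SERVER_PROCESSES initialization parameter (its default is typically derived from the CPU count); a sketch of how to check it and see the LMS processes themselves:

        -- Number of LMS (Global Cache Service) processes for this instance
        SHOW PARAMETER gcs_server_processes

        -- The corresponding background processes currently running
        SELECT name, description
        FROM   v$bgprocess
        WHERE  name LIKE 'LMS%'
        AND    paddr <> '00';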


  • LCKx Instance Enqueue Process :
    This process manages global enqueue requests and cross-instance broadcasts, and handles non-Cache Fusion resource requests such as library cache and row cache requests. The instance locks that are used to share resources between instances are held by the lock processes. Workload is automatically shared and balanced when there are multiple Global Cache Service Processes (LMSx).

    DIAG Diagnosability Daemon :
    Monitors the health of the instance and captures the data for instance process failures.

    RMSn RAC Management Service :
    These processes, also called the Oracle RAC Management Processes, perform manageability tasks for Oracle RAC. Tasks include creating the resources related to Oracle RAC when new instances are added to the cluster.

    RSMN Remote Slave Monitor :
    This background process manages the creation of, and communication with, background slave processes on remote instances; those slave processes perform tasks on behalf of a coordinating process running in another instance.

    LMHB Global Cache/Enqueue Service Heartbeat Monitor :
    LMHB monitors the heartbeat of the LMON, LMD, and LMSn processes to ensure they are running normally without blocking or spinning. The LMHB trace also reports low memory, swap space problems and the system load average.

    Oracle RAC instances use two services, the Global Enqueue Service (GES) and the Global Cache Service (GCS), to enable Cache Fusion. The GES and GCS maintain records of the status of each datafile and each cached block using the Global Resource Directory (GRD). This mechanism is referred to as Cache Fusion and helps ensure data integrity.

    Oracle RAC is composed of two or more instances.
    When a block of data has been read from a datafile by one instance within the cluster and another instance needs the same block, it is faster to get the block image from the instance that already holds the block in its SGA than to read it from disk.
    To enable this inter-instance communication, Oracle RAC makes use of the interconnect. The Global Enqueue Service (GES) monitors, and the instance enqueue process manages, the Cache Fusion traffic.

    (Oracle RAC - Configuring Oracle VirtualBox for RAC)