[Case Study] Notes on Forcibly Removing and Re-adding a Crashed Node in Oracle 10g RAC

Date: 2016-10-23 10:09   Source: Oracle研究中心   Author: HTZ

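Oracle研究中心 case analysis: an operations DBA asked how to forcibly remove and then re-add a node after a simulated node crash on a Solaris RAC platform. These notes summarize the procedure from a worked example.
The test grew out of a chat with a friend about removing and re-adding a crashed node in 10g RAC. I had not touched 10g RAC for a long time, could not find my old operation records, and, sadly, my earlier notes were all lost.
During the test I ran into a problem I had never met before.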

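Here a crash of node 1 is simulated by deleting node 1 by hand, after which node 1's information is cleaned up from node 2. The official document is:
Steps to Remove Node from Cluster When the Node Crashes Due to OS/Hardware Failure and cannot boot up (Doc ID 466975.1)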

1. Current state of the cluster resources
oracleplus.net$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    sol1
ora....L1.lsnr application    ONLINE    ONLINE    sol1
ora.sol1.gsd   application    ONLINE    ONLINE    sol1
ora.sol1.ons   application    ONLINE    ONLINE    sol1
ora.sol1.vip   application    ONLINE    ONLINE    sol1
ora.sol10g.db  application    ONLINE    ONLINE    sol2
ora....g1.inst application    ONLINE    ONLINE    sol1
ora....g2.inst application    ONLINE    ONLINE    sol2
ora....SM2.asm application    ONLINE    ONLINE    sol2
ora....L2.lsnr application    ONLINE    ONLINE    sol2
ora.sol2.gsd   application    ONLINE    ONLINE    sol2
ora.sol2.ons   application    ONLINE    ONLINE    sol2
ora.sol2.vip   application    ONLINE    ONLINE    sol2

2. Remove the database and CRS information on node 1
oracleplus.net # /oracle/app/oracle/product/10.2.0/db_1/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

oracleplus.net$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    OFFLINE
ora....L1.lsnr application    ONLINE    OFFLINE
ora.sol1.gsd   application    ONLINE    OFFLINE
ora.sol1.ons   application    ONLINE    OFFLINE
ora.sol1.vip   application    ONLINE    ONLINE    sol2
ora.sol10g.db  application    ONLINE    ONLINE    sol2
ora....g1.inst application    ONLINE    OFFLINE
ora....g2.inst application    ONLINE    ONLINE    sol2
ora....SM2.asm application    ONLINE    ONLINE    sol2
ora....L2.lsnr application    ONLINE    ONLINE    sol2
ora.sol2.gsd   application    ONLINE    ONLINE    sol2
ora.sol2.ons   application    ONLINE    ONLINE    sol2
ora.sol2.vip   application    ONLINE    ONLINE    sol2
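
After CRS is stopped on node 1, the listing above shows node 1's ASM, listener, GSD, ONS and instance resources OFFLINE, while ora.sol1.vip has failed over to sol2. In a longer listing a small filter makes this easier to spot. The following is only a sketch: the function name crs_not_online is made up for illustration, and it assumes the five-column layout of crs_stat -t shown above.

```shell
# Hypothetical helper (not part of Oracle's tooling): print the names of
# resources whose State column is not ONLINE.
# Input: the output of `crs_stat -t` on stdin.
crs_not_online() {
    # Data rows have "application" in column 2; this test also skips the
    # header line and the dashed separator line.
    awk '$2 == "application" && $4 != "ONLINE" { print $1 }'
}

# Usage (on a cluster node):
#   crs_stat -t | crs_not_online
```

Because crs_stat -t abbreviates long resource names, the output contains names such as ora....SM1.asm; plain crs_stat prints the full names.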


Remove the files related to node 1
oracleplus.net # rm /etc/init.d/init.cssd
oracleplus.net # rm /etc/init.d/init.crs
oracleplus.net # rm /etc/init.d/init.crsd
oracleplus.net # rm /etc/init.d/init.evmd
oracleplus.net # rm /etc/rc3.d/K96init.crs
/etc/rc3.d/K96init.crs: No such file or directory
oracleplus.net # rm /etc/rc3.d/S96init.crs
oracleplus.net # rm -Rf /var/opt/oracle/scls_scr
oracleplus.net # rm -Rf /var/opt/oracle/oprocd
oracleplus.net # rm /etc/inittab.crs
oracleplus.net # cp /etc/inittab.orig /etc/inittab

oracleplus.net # rm -rf /var/tmp/.oracle/*
oracleplus.net #
oracleplus.net # rm -rf /tmp/.oracle/*
oracleplus.net # rm -rf /oracle/app
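
Before and after a cleanup like the one above, it can help to see which CRS startup files are still present. The sketch below only lists files and deletes nothing. The path list is copied from the commands in this article (Solaris, CRS 10.2); the function name and the optional prefix argument are assumptions added for illustration.

```shell
# Paths copied from the cleanup commands in this article (Solaris, CRS 10.2).
CRS_LEFTOVERS="
/etc/init.d/init.cssd
/etc/init.d/init.crs
/etc/init.d/init.crsd
/etc/init.d/init.evmd
/etc/rc3.d/K96init.crs
/etc/rc3.d/S96init.crs
/var/opt/oracle/scls_scr
/var/opt/oracle/oprocd
/etc/inittab.crs
"

# Hypothetical helper: print which of those paths still exist.
# $1 is an optional path prefix (an assumption, handy for checking a copy
# of the filesystem); leave it empty to check the real paths.
list_crs_leftovers() {
    prefix=${1:-}
    for p in $CRS_LEFTOVERS; do
        if [ -e "$prefix$p" ]; then
            echo "$prefix$p"
        fi
    done
}

# Usage (as root on the node being cleaned):
#   list_crs_leftovers
```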

3. On node 2, remove node 1's information from CRS
oracleplus.net # pwd
/oracle/app/oracle/product/10.2.0/crs_1/bin
oracleplus.net # ./oifcfg getif
e1000g0 192.168.111.0 global public
e1000g1 192.168.112.0 global cluster_interconnect

Remove node 1's information from oifcfg
oracleplus.net # ./oifcfg delif -node sol1
PROC-4: The cluster registry key to be operated on does not exist.
PRIF-11: cluster registry error

Remove node 1's information from ONS
oracleplus.net # cat $CRS_HOME/opmn/conf/ons.config
localport=6100
remoteport=6200
loglevel=3
useocr=on


oracleplus.net # $CRS_HOME/bin/racgons remove_config sol1:6200
racgons: Existing key value on sol1 = 6200.
racgons: sol1:6200 removed from OCR.


oracleplus.net # $CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    OFFLINE
ora....L1.lsnr application    ONLINE    OFFLINE
ora.sol1.gsd   application    ONLINE    OFFLINE
ora.sol1.ons   application    ONLINE    OFFLINE
ora.sol1.vip   application    ONLINE    ONLINE    sol2
ora.sol10g.db  application    ONLINE    ONLINE    sol2
ora....g1.inst application    ONLINE    OFFLINE
ora....g2.inst application    ONLINE    ONLINE    sol2
ora....SM2.asm application    ONLINE    ONLINE    sol2
ora....L2.lsnr application    ONLINE    ONLINE    sol2
ora.sol2.gsd   application    ONLINE    ONLINE    sol2
ora.sol2.ons   application    ONLINE    ONLINE    sol2
ora.sol2.vip   application    ONLINE    ONLINE    sol2


Remove node 1's resources from CRS
oracleplus.net # $CRS_HOME/bin/srvctl remove instance -d sol10g -i sol10g1
Remove instance sol10g1 from the database sol10g (y/[n]) y

oracleplus.net # $CRS_HOME/bin/srvctl remove asm -n sol1


oracleplus.net # $CRS_HOME/bin/srvctl remove nodeapps -n sol1
Please confirm that you intend to remove the node-level applications on node sol1 (y/[n]) y
PRKO-2108 : Node applications are still running on node: sol1

oracleplus.net # $CRS_HOME/bin/srvctl remove nodeapps -n sol1
Please confirm that you intend to remove the node-level applications on node sol1 (y/[n]) y
PRKO-2108 : Node applications are still running on node: sol1
# $CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....L1.lsnr application    ONLINE    OFFLINE
ora.sol1.gsd   application    ONLINE    OFFLINE
ora.sol1.ons   application    ONLINE    OFFLINE
ora.sol1.vip   application    ONLINE    ONLINE    sol2
ora.sol10g.db  application    ONLINE    ONLINE    sol2
ora....g2.inst application    ONLINE    ONLINE    sol2
ora....SM2.asm application    ONLINE    ONLINE    sol2
ora....L2.lsnr application    ONLINE    ONLINE    sol2
ora.sol2.gsd   application    ONLINE    ONLINE    sol2
ora.sol2.ons   application    ONLINE    ONLINE    sol2
ora.sol2.vip   application    ONLINE    ONLINE    sol2
oracleplus.net # $CRS_HOME/bin/crs_stop -f ora.sol1.vip
Target set to OFFLINE for `ora.sol1.LISTENER_SOL1.lsnr`
Attempting to stop `ora.sol1.vip` on member `sol2`
Stop of `ora.sol1.vip` on member `sol2` succeeded.
oracleplus.net # $CRS_HOME/bin/srvctl remove nodeapps -n sol1
Please confirm that you intend to remove the node-level applications on node sol1 (y/[n]) y
PRKO-2112 : Some or all node applications are not removed successfully on node: sol1
# $CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....L1.lsnr application    OFFLINE   OFFLINE
ora.sol1.vip   application    OFFLINE   OFFLINE
ora.sol10g.db  application    ONLINE    ONLINE    sol2
ora....g2.inst application    ONLINE    ONLINE    sol2
ora....SM2.asm application    ONLINE    ONLINE    sol2
ora....L2.lsnr application    ONLINE    ONLINE    sol2
ora.sol2.gsd   application    ONLINE    ONLINE    sol2
ora.sol2.ons   application    ONLINE    ONLINE    sol2
ora.sol2.vip   application    ONLINE    ONLINE    sol2
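
The first srvctl remove nodeapps attempts failed with PRKO-2108 while ora.sol1.vip was still ONLINE (on sol2, after failover); the resource was then stopped with crs_stop -f. A check like the one below could be run before srvctl remove nodeapps to see whether anything belonging to the node is still running. It is a sketch only: the function name is hypothetical, and it relies on the crs_stat -t column layout seen in this article.

```shell
# Hypothetical helper: list resources named ora.<node>.* that are still
# ONLINE anywhere in the cluster, together with the host they run on.
# $1 = node name; stdin = output of `crs_stat -t`.
# Caveat: `crs_stat -t` abbreviates long names (e.g. ora....L1.lsnr), and
# abbreviated rows cannot be matched by node name here.
node_res_online() {
    awk -v node="$1" '
        $2 == "application" && $4 == "ONLINE" &&
        index($1, "ora." node ".") == 1 { print $1, $5 }'
}

# Usage (on a surviving node):
#   crs_stat -t | node_res_online sol1
```

An empty result suggests that no fully named resource of that node is still running.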


oracleplus.net # $CRS_HOME/bin/crs_stat|grep lsn
NAME=ora.sol1.LISTENER_SOL1.lsnr
NAME=ora.sol2.LISTENER_SOL2.lsnr
oracleplus.net # $CRS_HOME/bin/crs_unregister ora.sol1.LISTENER_SOL1.lsnr

oracleplus.net # $CRS_HOME/bin/olsnodes -n
sol1 1
sol2 2


Delete node 1 from the cluster configuration
oracleplus.net # ./rootdeletenode.sh sol1,1
CRS-0210: Could not find resource ‘ora.sol1.LISTENER_SOL1.lsnr’.
CRS-0210: Could not find resource ‘ora.sol1.ons’.
CRS-0210: Could not find resource ‘ora.sol1.vip’.
CRS-0210: Could not find resource ‘ora.sol1.gsd’.
CRS-0210: Could not find resource ora.sol1.vip.
CRS nodeapps are deleted successfully
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully deleted 14 values from OCR.
Key SYSTEM.css.interfaces.nodesol1 marked for deletion is not there. Ignoring.
Successfully deleted 5 keys from OCR.
Node deletion operation successful.
‘sol1,1’ deleted successfully
oracleplus.net # $CRS_HOME/bin/olsnodes -n
sol2 2
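
In the transcript above, rootdeletenode.sh is given the node as name,number (sol1,1), which are the two columns printed by olsnodes -n. A small sketch of building that argument from the olsnodes output follows; the function name node_delete_arg is an assumption made for illustration.

```shell
# Hypothetical helper: turn one line of `olsnodes -n` output into the
# "name,number" form that rootdeletenode.sh was given above.
# $1 = node name; stdin = output of `olsnodes -n`.
node_delete_arg() {
    awk -v n="$1" '$1 == n { print $1 "," $2 }'
}

# Usage (as root, from the CRS home used in this article):
#   arg=$($CRS_HOME/bin/olsnodes -n | node_delete_arg sol1)
#   echo "$arg"    # inspect it before running rootdeletenode.sh by hand
```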

Update the inventory
oracleplus.net$$ORA_CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORA_CRS_HOME "CLUSTER_NODES=sol2" CRS=TRUE
Starting Oracle Universal Installer…

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /var/opt/oracle/oraInst.loc
The inventory is located at /oracle/app/oracle/oraInventory
‘UpdateNodeList’ was successful.


oracleplus.net$$ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=sol2"
Starting Oracle Universal Installer…

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /var/opt/oracle/oraInst.loc
The inventory is located at /oracle/app/oracle/oraInventory
‘UpdateNodeList’ was successful.

Contents after the update:

oracleplus.net$cat inventory.xml
<?xml version="1.0" standalone="yes" ?>
10.2.0.4.0
2.1.0.6.0







4. Add the node back
oracleplus.net$pwd
/oracle/app/oracle/product/10.2.0/crs_1/oui/bin
oracleplus.net$ls
addLangs.sh addNode.sh attachHome.sh detachHome.sh lsnodes ouica.bat ouica.sh resource runConfig.sh runInstaller runInstaller.sh
oracleplus.net$./addNode.sh
Starting Oracle Universal Installer…

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
Oracle Universal Installer, Version 10.2.0.4.0 Production
Copyright (C) 1999, 2008, Oracle. All rights reserved

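Clicking Next here fails immediately. The following can be found in the error output (for the default OUI log path, see the log file locations listed further down):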
INFO: Username:oracle

INFO: Install area Control created with access level 1

INFO: Oracle Universal Installer version is 10.2.0.4.0

INFO: Setting variable ‘ORACLE_HOME’ to ‘/oracle/app/oracle/product/10.2.0/crs_1’. Received the value from the command line.
INFO: Setting variable ‘PREREQ_CONFIG_LOCATION’ to ”. Received the value from variable association.
INFO: Setting variable ‘FROM_LOCATION’ to ‘/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml’. Received the value from a code block.
INFO: Setting variable ‘ROOTSH_LOCATION’ to ‘/oracle/app/oracle/product/10.2.0/crs_1/root.sh’. Received the value from a code block.
INFO: Setting variable ‘ROOTSH_STATUS’ to ‘3’. Received the value from a code block.
INFO: Setting variable ‘ORACLE_HOME’ to ‘/oracle/app/oracle/product/10.2.0/crs_1’. Received the value from the command line.
INFO: Setting variable ‘PREREQ_CONFIG_LOCATION’ to ”. Received the value from variable association.
INFO: Setting variable ‘FROM_LOCATION’ to ‘/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml’. Received the value from a code block.
INFO: Setting variable ‘ROOTSH_LOCATION’ to ‘/oracle/app/oracle/product/10.2.0/crs_1/root.sh’. Received the value from a code block.
INFO: Setting variable ‘ROOTSH_STATUS’ to ‘3’. Received the value from a code block.
INFO:
*** Welcome Page***
INFO: Unable to read inventory information for the home: /oracle/app/oracle/product/10.2.0/crs_1.
INFO: Unable to read inventory information for the home: /oracle/app/oracle/product/10.2.0/crs_1.

The part below was produced after clicking Next:
/************************************************************************************
INFO: Setting variable ‘ORACLE_HOME_NAME’ to ‘OraCrs10g_home’. Received the value from a code block.
INFO: Unable to read inventory information for the home: /oracle/app/oracle/product/10.2.0/crs_1.
SEVERE: Abnormal program termination. An internal error has occured. Please provide the following files to Oracle Support :

“/oracle/app/oracle/oraInventory/logs/addNodeActions2014-05-08_06-05-16PM.log”
“/oracle/app/oracle/oraInventory/logs/oraInstall2014-05-08_06-05-16PM.err”
“/oracle/app/oracle/oraInventory/logs/oraInstall2014-05-08_06-05-16PM.out”
***************************************************************************************/

In /oracle/app/oracle/oraInventory/logs/oraInstall2014-05-08_06-05-16PM.err we find the following error:

org.xml.sax.SAXParseException: : XML-20210: (Fatal Error) Unexpected EOF.
at oracle.xml.parser.v2.XMLError.flushErrorHandler(XMLError.java:415)
at oracle.xml.parser.v2.XMLError.flushErrors1(XMLError.java:284)
at oracle.xml.parser.v2.XMLReader.popXMLReader(XMLReader.java:540)
at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1339)
at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:326)
at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:293)
at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:209)
at oracle.sysman.oii.oiii.OiiiInstallXMLReader.readComps(OiiiInstallXMLReader.java:271)
at oracle.sysman.oii.oiii.OiiiInstallInventory.getCompOHListElement(OiiiInstallInventory.java:1663)
at oracle.sysman.oii.oiii.OiiiAreaInventory.getAllCompsVect(OiiiAreaInventory.java:1052)
at oracle.sysman.oii.oiii.OiiiAreaInventory.getTopLevelComps(OiiiAreaInventory.java:1872)
at oracle.sysman.oii.oiii.OiiiInstallInventory.setOHProperties(OiiiInstallInventory.java:6064)
at oracle.sysman.oii.oiif.oiifp.OiifpContentsTabPanel.addHomes(OiifpContentsTabPanel.java:777)
at oracle.sysman.oii.oiif.oiifp.OiifpContentsTabPanel.fillInventoryTree(OiifpContentsTabPanel.java:691)
at oracle.sysman.oii.oiif.oiifp.OiifpContentsTabPanel.refreshTree(OiifpContentsTabPanel.java:1508)
at oracle.sysman.oii.oiif.oiifp.OiifpContentsTabPanel.prepareInvTree(OiifpContentsTabPanel.java:2253)
at oracle.sysman.oii.oiif.oiifd.OiifdInventoryDialog.doModal(OiifdInventoryDialog.java:457)
at oracle.sysman.oii.oiif.oiifw.OiifwWizDialog.onViewPrivate(OiifwWizDialog.java:863)
at oracle.sysman.oii.oiif.oiifw.OiifwWizDialog.access$000(OiifwWizDialog.java:330)
at oracle.sysman.oii.oiif.oiifw.OiifwWizDialog$PrepareInventoryTree.run(OiifwWizDialog.java:1778)
at java.lang.Thread.run(Thread.java:534)
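The error is an XML parse failure: unexpected end of file. At first I assumed the inventory was misconfigured, but opatch -lsinventory returned normal output, so I suspected that one of the XML files was corrupt.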

Use truss to see which XML files runInstaller accesses:

oracleplus.net$truss -aefo /tmp/123.log ./addNode.sh
Starting Oracle Universal Installer…

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
Oracle Universal Installer, Version 10.2.0.4.0 Production
Copyright (C) 1999, 2008, Oracle. All rights reserved.


The trace shows that the following XML-related files were opened:
115 1728 23646/1: open(“/oracle/app/oracle/product/10.2.0/crs_1/oui/jlib/xmlparserv2.jar”, O_RDONLY|O_LARGEFILE) = 6
119 1776 23646/1: open(“/oracle/app/oracle/product/10.2.0/crs_1/oui/jlib/xml.jar”, O_RDONLY|O_LARGEFILE) = 6
1382 15600 23646/1: open(“/oracle/app/oracle/oraInventory/ContentsXML/inventory.xml”, O_RDONLY|O_LARGEFILE) = 15
1385 15915 23646/1: open(“/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/oraclehomeproperties.xml”, O_RDONLY|O_LARGEFILE) = 15
1387 15974 23646/1: open(“/oracle/app/oracle/product/10.2.0/db_1/inventory/ContentsXML/oraclehomeproperties.xml”, O_RDONLY|O_LARGEFILE) = 15
1401 18121 23646/15: open(“/oracle/app/oracle/oraInventory/ContentsXML/comps.xml”, O_RDONLY|O_LARGEFILE) = 19
1403 18217 23646/15: open(“/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml”, O_RDONLY|O_LARGEFILE) = 19
1421 19159 23646/15: open(“/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml”, O_RDONLY|O_LARGEFILE) = 20
1444 22099 23646/1: open(“/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml”, O_RDONLY|O_LARGEFILE) = 21
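
The trace file written by truss -o is large, so the list above was presumably picked out of it by searching for XML paths. One way to do that is sketched below. The function name is hypothetical, and the pattern assumes open() lines of the form shown above, with ordinary double quotes around the path as they appear in a raw truss log.

```shell
# Hypothetical helper: print each distinct *.xml path passed to open().
# Input: a truss log on stdin (e.g. the /tmp/123.log written above).
truss_xml_files() {
    sed -n 's/.*open("\([^"]*\.xml\)".*/\1/p' | sort -u
}

# Usage:
#   truss_xml_files < /tmp/123.log
```

Each path is printed once, so a file that was opened several times, such as the CRS home's comps.xml, appears on a single line.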

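In the end I found that comps.xml had only 889 lines.

I downloaded an XML editor, and it could not open comps.xml for editing, which confirmed that the file was damaged.
After copying a comps.xml over from another environment and comparing the two, I found that the broken file was missing a large amount of content.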
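
An "Unexpected EOF" from the parser usually means the file stops before its root element is closed. A rough first check that needs no XML tooling is to look at the last non-blank line of the file. The sketch below is a heuristic under that assumption, not an XML parser, and the function name is made up for illustration.

```shell
# Hypothetical heuristic: succeed (exit 0) when the file does NOT end in a
# closing tag, which is what a truncated XML file typically looks like.
# $1 = path to the XML file.
xml_looks_truncated() {
    # Last non-blank line, with trailing whitespace removed.
    last=$(awk 'NF { sub(/[ \t\r]+$/, ""); l = $0 } END { print l }' "$1")
    case "$last" in
        *"</"*">") return 1 ;;  # ends in a closing tag: probably complete
        *)         return 0 ;;  # anything else: suspicious
    esac
}

# Usage:
#   if xml_looks_truncated comps.xml; then echo "comps.xml looks cut off"; fi
```

A file can pass this check and still be malformed, so a real parser remains the authoritative test.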

The structure of a normal comps.xml file is as follows:
oracleplus.net$cat comps.xml
<?xml version="1.0" standalone="yes" ?>
oracleplus.net$pwd
/oracle/app/oracle/oraInventory/ContentsXML

In fact, OPatch can also be used to verify whether an XML file is well-formed:

oracleplus.net$$ORA_CRS_HOME/OPatch/opatch util LoadXML -xmlInput /oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml.back
Invoking OPatch 10.2.0.4.2

Oracle Interim Patch Installer version 10.2.0.4.2
Copyright (c) 2007, Oracle Corporation. All rights reserved.

UTIL session

Oracle Home : /oracle/app/oracle/product/10.2.0/db_1
Central Inventory : /oracle/app/oracle/oraInventory
from : /var/opt/oracle/oraInst.loc
OPatch version : 10.2.0.4.2
OUI version : 10.2.0.4.0
OUI location : /oracle/app/oracle/product/10.2.0/db_1/oui
Log file location : /oracle/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch2014-05-08_20-39-18PM.log

Invoking utility “loadxml”
UtilSession failed: Unable to parse the xml file.

OPatch failed with error code 73

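Because this is a test environment, I simply moved the file aside with mv. In production, I recommend copying a good file over from another equivalent environment instead.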

oracleplus.net # mv /oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml /oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml.back

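Running addNode.sh again finally brought up the installer screen (screenshot clip_image001 in the original post).
Clicking Next all the way through works without problems until the dialog in screenshot clip_image002 of the original post appears.
Choose Yes there: the OCR steps are run by root, so the log files are owned by root, and this does not affect the addNode.sh operation.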

Running rootaddnode.sh on the host where addNode.sh was executed fails with the following error:

oracleplus.net # /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Attempting to add 1 new nodes to the configuration
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 3: sol1 sol1-priv sol1
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
/oracle/app/oracle/product/10.2.0/crs_1/bin/srvctl add nodeapps -n sol1 -A %s_nodevips%/255.255.255.0/e1000g0 -o /oracle/app/oracle/product/10.2.0/crs_1
PRKO-2109 : Invalid address string: %s_nodevips%/255.255.255.0/e1000g0

oracleplus.net # /oracle/app/oracle/product/10.2.0/crs_1/bin/srvctl config nodeapps -n sol2 -a
VIP exists.: /sol2-vip/192.168.111.49/255.255.255.0/e1000g0


oracleplus.net # grep "s_nodevips|CRS_NEW_NODEVIPS" /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh
CRS_NEW_NODEVIPS=%s_nodevips%


oracleplus.net # grep "CRS_NEW_NODEVIPS" /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh
CRS_NEW_NODEVIPS=%s_nodevips%
NODE_VIP=`$ECHO $CRS_NEW_NODEVIPS | $CUT -d',' -f$Ni`
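
The PRKO-2109 error comes from the literal template token %s_nodevips% that was left in rootaddnode.sh and passed straight to srvctl. A generic scan for tokens of that shape can reveal whether other values were also left unfilled. This is a sketch: the function name is hypothetical, and the %name% pattern is inferred only from the single token seen in this article.

```shell
# Hypothetical helper: show lines (with line numbers) that still contain an
# unsubstituted %name% template token.
# $1 = script to scan, e.g. rootaddnode.sh.
find_placeholders() {
    grep -n '%[A-Za-z_][A-Za-z0-9_]*%' "$1"
}

# Usage:
#   find_placeholders /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh
```

grep exits with a non-zero status when nothing matches, so the function can also be used directly in an if test.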

Manually edit the rootaddnode.sh script:
/oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh

Ni=1
for i in `$ECHO $NODES_LIST`
do
NODE_NAME=$i
NODE_VIP=`$ECHO $CRS_NEW_NODEVIPS | $CUT -d',' -f$Ni`
NODEVIP=$NODE_VIP/$NETMASK/$NETIFs

$ECHO $CH/bin/srvctl add nodeapps -n $NODE_NAME -A $NODEVIP -o $CH
$CH/bin/srvctl add nodeapps -n $NODE_NAME -A $NODEVIP -o $CH
Ni=`expr $Ni + 1`
done


After the change:
Ni=1
for i in `$ECHO $NODES_LIST`
do
NODE_NAME=$i
#NODE_VIP=`$ECHO $CRS_NEW_NODEVIPS | $CUT -d',' -f$Ni`
NODE_VIP=192.168.111.48
NODEVIP=$NODE_VIP/$NETMASK/$NETIFs

$ECHO $CH/bin/srvctl add nodeapps -n $NODE_NAME -A $NODEVIP -o $CH
$CH/bin/srvctl add nodeapps -n $NODE_NAME -A $NODEVIP -o $CH
Ni=`expr $Ni + 1`
done
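
The loop in rootaddnode.sh takes the Ni-th comma-separated field of CRS_NEW_NODEVIPS as the VIP of the Ni-th new node and joins it with the netmask and interface into the value passed to srvctl add nodeapps -A. Hard-coding NODE_VIP as above is sufficient for a single node. The sketch below only mirrors that selection and assembly logic so it can be tried outside the script. Both function names are assumptions, and the second address in the usage example is used purely as sample data.

```shell
# Hypothetical helper mirroring the script's cut logic:
# $1 = comma-separated VIP list, $2 = 1-based index of the new node.
nth_vip() {
    echo "$1" | cut -d',' -f"$2"
}

# Hypothetical helper: build the address/netmask/interface string and
# refuse an empty value or a leftover %...% template token.
# $1 = VIP address or name, $2 = netmask, $3 = interface.
build_vip_arg() {
    case "$1" in
        ""|*%*) echo "unresolved VIP value: '$1'" >&2; return 1 ;;
    esac
    echo "$1/$2/$3"
}

# Usage:
#   vip=$(nth_vip "192.168.111.48,192.168.111.49" 1)
#   build_vip_arg "$vip" 255.255.255.0 e1000g0
```

Rejecting values that contain % would have caught the %s_nodevips% token before srvctl reported PRKO-2109.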


oracleplus.net # /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Node sol1 is already assigned nodenum 3.
Aborting: No configuration data has been changed.
clscfg -add -nn nameA,numA,nameB,numB,… -pn privA,numA,privB,numB,…
[-hn hostA,numA,hostB,numB,…] [-t p1,p2,p3,p4]
-nn specifies nodenames in the same fashion as -nn in -install mode
-pn specifies private interconnect names as -pn in -install mode
-hn specifies hostnames in the same fashion as -hn in -install mode
-t specifies port numbers to be used by CRS daemons on the new node(s)
default ports: 49895,49896,49897,49898
WARNING: Using this tool may corrupt your cluster configuration. Do not
use unless you positively know what you are doing.

/oracle/app/oracle/product/10.2.0/crs_1/bin/srvctl add nodeapps -n sol1 -A 192.168.111.48/255.255.255.0/e1000g0 -o /oracle/app/oracle/product/10.2.0/crs_1


The following is run on node 1:
oracleplus.net # /oracle/app/oracle/product/10.2.0/crs_1/root.sh
WARNING: directory ‘/oracle/app/oracle/product/10.2.0’ is not owned by root
WARNING: directory ‘/oracle/app/oracle/product’ is not owned by root
WARNING: directory ‘/oracle/app/oracle’ is not owned by root
WARNING: directory ‘/oracle/app’ is not owned by root
WARNING: directory ‘/oracle’ is not owned by root
Checking to see if Oracle CRS stack is already configured
OCR LOCATIONS = /dev/rdsk/c2t0d0s0
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory ‘/oracle/app/oracle/product/10.2.0’ is not owned by root
WARNING: directory ‘/oracle/app/oracle/product’ is not owned by root
WARNING: directory ‘/oracle/app/oracle’ is not owned by root
WARNING: directory ‘/oracle/app’ is not owned by root
WARNING: directory ‘/oracle’ is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: sol1 sol1-priv sol1
node 2: sol2 sol2-priv sol2
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
sol2
sol1
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps

Creating VIP application resource on (0) nodes.
Creating GSD application resource on (0) nodes.
Creating ONS application resource on (0) nodes.
Starting VIP application resource on (2) nodes1:CRS-1002: Resource ‘ora.sol1.vip’ is already running on member ‘sol1’
CRS-0223: Resource ‘ora.sol1.vip’ has placement error.
Check the log file “/oracle/app/oracle/product/10.2.0/crs_1/log/sol1/racg/ora.sol1.vip.log” for more details

Starting GSD application resource on (2) nodes1:CRS-0233: Resource or relatives are currently involved with another operation.
Check the log file “/oracle/app/oracle/product/10.2.0/crs_1/log/sol1/racg/ora.sol1.gsd.log” for more details

Starting ONS application resource on (2) nodes1:CRS-0233: Resource or relatives are currently involved with another operation.
Check the log file “/oracle/app/oracle/product/10.2.0/crs_1/log/sol1/racg/ora.sol1.ons.log” for more details



Done.

Note that quite a few errors are reported here, but they do not affect the result.

Here we can see that the node applications on both nodes are running normally:
oracleplus.net$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.sol1.gsd   application    ONLINE    ONLINE    sol1
ora.sol1.ons   application    ONLINE    ONLINE    sol1
ora.sol1.vip   application    ONLINE    ONLINE    sol1
ora.sol10g.db  application    ONLINE    ONLINE    sol2
ora....g2.inst application    ONLINE    ONLINE    sol2
ora....SM2.asm application    ONLINE    ONLINE    sol2
ora....L2.lsnr application    ONLINE    ONLINE    sol2
ora.sol2.gsd   application    ONLINE    ONLINE    sol2
ora.sol2.ons   application    ONLINE    ONLINE    sol2
ora.sol2.vip   application    ONLINE    ONLINE    sol2

oracleplus.net # ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843 mtu 1500 index 2
inet 192.168.111.46 netmask ffffff00 broadcast 192.168.111.255
ether 0:c:29:5a:e5:7a
e1000g0:1: flags=1040843 mtu 1500 index 2
inet 192.168.111.48 netmask ffffff00 broadcast 192.168.111.255
e1000g1: flags=1000843 mtu 1500 index 3
inet 192.168.112.46 netmask ffffff00 broadcast 192.168.112.255
ether 0:c:29:5a:e5:84


Next, add the node as the oracle user:
oracleplus.net$./addNode.sh
This went smoothly, with no errors at all.

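Configure the listener. This can be done by hand; because this is a test environment, I ran netca on the healthy node instead. Note that the listener service is interrupted during the configuration.

The listener has to be stopped partway through: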
oracleplus.net$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....L1.lsnr application    ONLINE    ONLINE    sol1
ora.sol1.gsd   application    ONLINE    ONLINE    sol1
ora.sol1.ons   application    ONLINE    ONLINE    sol1
ora.sol1.vip   application    ONLINE    ONLINE    sol1
ora.sol10g.db  application    ONLINE    ONLINE    sol2
ora....g2.inst application    ONLINE    ONLINE    sol2
ora....SM2.asm application    ONLINE    ONLINE    sol2
ora....L2.lsnr application    ONLINE    ONLINE    sol2
ora.sol2.gsd   application    ONLINE    ONLINE    sol2
ora.sol2.ons   application    ONLINE    ONLINE    sol2
ora.sol2.vip   application    ONLINE    ONLINE    sol2

Run dbca on the healthy node to add the instance.
dbca works without problems and adds the ASM instance information automatically.

oracleplus.net$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM3.asm application    ONLINE    ONLINE    sol1
ora....L1.lsnr application    ONLINE    ONLINE    sol1
ora.sol1.gsd   application    ONLINE    ONLINE    sol1
ora.sol1.ons   application    ONLINE    ONLINE    sol1
ora.sol1.vip   application    ONLINE    ONLINE    sol1
ora.sol10g.db  application    ONLINE    ONLINE    sol2
ora....g1.inst application    ONLINE    ONLINE    sol1
ora....g2.inst application    ONLINE    ONLINE    sol2
ora....SM2.asm application    ONLINE    ONLINE    sol2
ora....L2.lsnr application    ONLINE    ONLINE    sol2
ora.sol2.gsd   application    ONLINE    ONLINE    sol2
ora.sol2.ons   application    ONLINE    ONLINE    sol2
ora.sol2.vip   application    ONLINE    ONLINE    sol2

oracleplus.net$crs_stat|grep asm
NAME=ora.sol1.ASM3.asm
NAME=ora.sol2.ASM2.asm
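
The full resource names printed by crs_stat carry both the node and the ASM instance name (ora.sol1.ASM3.asm), which is how the unexpected instance name on sol1 shows up. A sketch for listing that mapping follows. The function name is hypothetical, and prefixing the instance name with "+" is an assumption based on the +ASM1 and +ASM2 names used elsewhere in this article.

```shell
# Hypothetical helper: print "<node> +<ASM instance>" for every ASM
# resource found in `crs_stat` output (NAME=ora.<node>.<inst>.asm lines).
# Input: output of `crs_stat` on stdin.
asm_instances() {
    # With "=" and "." as separators, NAME=ora.sol1.ASM3.asm splits into
    # NAME, ora, sol1, ASM3, asm.
    awk -F'[=.]' '/^NAME=ora\..*\.asm$/ { print $3, "+" $4 }'
}

# Usage:
#   crs_stat | asm_instances
```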

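Note that the ASM instance on sol1 is now named ASM3, because it was added automatically. This can be fixed by removing it and adding the ASM instance again by hand: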

oracleplus.net$srvctl stop instance -d sol10g -i sol10g1
oracleplus.net$srvctl stop asm -n sol1

oracleplus.net$crs_unregister ora.sol10g.sol10g1.inst
oracleplus.net$crs_unregister ora.sol1.ASM3.asm


oracleplus.net$cat /oracle/app/oracle/admin/+ASM/pfile/init.ora
##############################################################################
# Copyright (c) 1991, 2001, 2002 by Oracle Corporation
##############################################################################

###########################################
# Cluster Database
###########################################
asm_diskgroups='DATA'
background_dump_dest=/oracle/app/oracle/admin/+ASM/bdump
cluster_database=TRUE
core_dump_dest=/oracle/app/oracle/admin/+ASM/cdump
instance_type=asm
large_pool_size=12582912
remote_login_passwordfile=EXCLUSIVE
user_dump_dest=/oracle/app/oracle/admin/+ASM/udump

+ASM2.instance_number=2
+ASM1.instance_number=1


oracleplus.net$export ORACLE_SID=+ASM1
oracleplus.net$sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 – Production on Thu May 8 23:14:50 2014

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to an idle instance.

SQL> create spfile from pfile;

File created.
oracleplus.net$srvctl remove asm -n sol1
oracleplus.net$srvctl add asm -n sol1 -i +ASM1 -o $ORACLE_HOME -p $ORACLE_HOME/dbs/spfile+ASM1.ora
oracleplus.net$srvctl start asm -n sol1

oracleplus.net$srvctl start instance -d sol10g -i sol10g1

oracleplus.net$srvctl start instance -d sol10g -i sol10g1
Adjust the dependency so that the database instance depends on +ASM1:
oracleplus.net$srvctl modify instance -d sol10g -i sol10g1 -s +ASM1
oracleplus.net$srvctl stop asm -n sol1
oracleplus.net$srvctl start instance -d sol10g -i sol10g1

Everything is back to normal:
oracleplus.net$crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    sol1
ora....L1.lsnr application    ONLINE    ONLINE    sol1
ora.sol1.gsd   application    ONLINE    ONLINE    sol1
ora.sol1.ons   application    ONLINE    ONLINE    sol1
ora.sol1.vip   application    ONLINE    ONLINE    sol1
ora.sol10g.db  application    ONLINE    ONLINE    sol2
ora....g1.inst application    ONLINE    ONLINE    sol1
ora....g2.inst application    ONLINE    ONLINE    sol2
ora....SM2.asm application    ONLINE    ONLINE    sol2
ora....L2.lsnr application    ONLINE    ONLINE    sol2
ora.sol2.gsd   application    ONLINE    ONLINE    sol2
ora.sol2.ons   application    ONLINE    ONLINE    sol2
ora.sol2.vip   application    ONLINE    ONLINE    sol2
That completes the node addition. A few small problems came up along the way.

Permalink: http://www.htz.pw/2014/05/11/solaris-rac%e5%b9%b3%e5%8f%b0%e6%a8%a1%e6%8b%9f%e8%8a%82%e7%82%b9crash%e8%8a%82%e7%82%b9%e7%9a%84%e5%bc%ba%e5%88%b6%e5%88%a0%e9%99%a4%e4%b8%8e%e5%a2%9e%e5%8a%a0.html | 认真就输

Originally shared by 惜分飞; URL: http://www.oracleplus.net/arch/1041.html