I have a customer trying to add a new node to a cluster using Fleet Patching and Provisioning.
The error in the command output is not very friendly:
[grid@fpps ~]$ rhpctl addnode gihome -workingcopy WC_gi19110_FPPC3 \
    -newnodes fppc3:fppc3-vip -cred fppc-cred
fpps: Audit ID: 269
PRCT-1003 : failed to run "rhphelper" on node "fppc2"
PRCT-1014 : Internal error: RHPHELP_preNodeAddVal-05null
The “RHPHELP_preNodeAddVal” might already give an idea of the cause: something related to the “cluvfy stage -pre nodeadd” evaluation that we normally do when adding a node by hand. FPP does not really run cluvfy, but it calls the same primitives cluvfy is based on.
In FPP, when the error does not give any useful information, this is the flow to follow:
- use “rhpctl query audit” to get the date and time of the failing operation
- open the “rhpserver.log.0” and look for the operation log in that time frame
- get the UID of the operation; for example, in the following line it is “1556344143”:
[UID:-1556344143] [RMI TCP Connection(153)-192.168.1.151] [ 2021-07-27 00:25:20.741 KST ] [ServerCommon.processParameters:485] before parsing: params = {-methodName=addnodesWorkingCopy, -userName=grid, -version=19.0.0.0.0, -auditId=-1556344143, -auditCli=rhpctl addnode gihome -workingcopy WC_gi19110_FPPC3 -newnodes fppc3:fppc3-vip -cred cred_fppc, -plsnrPort=31605, -noun=gihome, -isSingleNodeProv=FALSE, -nls_lang=AMERICAN_AMERICA.AL32UTF8, -clusterName=fpps-cluster, -plsnrHost=fpps, -SA11204ClusterName=null, -lang=en_US, -clientNode=fpps, -verb=addnode, -ghopuid=-1556344143}
- Isolate the log for the operation:
OP_UID=1556344143   # not UID: that name is a read-only variable in bash
grep -e "$OP_UID" rhpserver.log.0 > "$OP_UID.log"
- Locate the trace file of the rhphelper remote execution:
[UID:-1556344143] [RMI TCP Connection(153)-192.168.1.151] [ 2021-07-27 00:26:07.031 KST ] [RHPHELPERUtil.getTraceEnvs:4386] TraceFileLocEnv is :RHPHELPER_TRACEFILE=/u01/app/grid/crsdata/fppc2/rhp/rhphelp_20210727002603.trc
- Find the root cause in the rhphelper trace:
[main] [ 2021-07-27 00:27:02.600 KST ] [reflect.GeneratedMethodAccessor1.invoke:-1] PRVG-11406 : API with node roles argument must be called for Flex Cluster
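The log-digging steps above can be wrapped in a few shell functions. This is only a minimal sketch: the function names are hypothetical, and the patterns assume the log formats shown in the excerpts above (an operation UID on each server log line, a "RHPHELPER_TRACEFILE=" entry, and messages shaped like "PRVG-11406 :"). Since rhphelper ran on fppc2 here, I would expect the trace file to be on that node, so the last function would be run there.

```shell
# Sketch only: helper names are hypothetical, patterns follow the excerpts above.

# Print every line of the server log that belongs to one operation UID.
fpp_op_log() {
  local op_uid="$1" server_log="$2"
  # -F: match the UID as a fixed string; -e: a leading "-" is not read as an option
  grep -F -e "$op_uid" "$server_log"
}

# Print the rhphelper trace file path(s) recorded for that operation.
fpp_trace_file() {
  local op_uid="$1" server_log="$2"
  fpp_op_log "$op_uid" "$server_log" \
    | sed -n 's/.*RHPHELPER_TRACEFILE=\([^[:space:]]*\).*/\1/p' \
    | sort -u
}

# Print the lines of a trace file that carry a message code such as "PRVG-11406 :".
# The pattern is a heuristic based on the codes seen in this post.
fpp_trace_errors() {
  local trace_file="$1"
  grep -E 'PR[A-Z]{2}-[0-9]+ :' "$trace_file"
}
```

For the case in this post, that would be "fpp_trace_file 1556344143 rhpserver.log.0" on the FPP server, then "fpp_trace_errors" on the path it prints.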
In this case, the target cluster is a Flex Cluster, so the command must be run specifying the node_role.
The documentation is not clear (we will fix it soon):
rhpctl addnode gihome {-workingcopy workingcopy_name | -client cluster_name} -newnodes node_name:node_vip[:node_role][,node_name:node_vip[:node_role]...]
node_role must be specified for Flex Clusters, and it must be either HUB or LEAF.
With the corrected command line, the operation succeeded:
rhpctl addnode gihome -workingcopy WC_gi19110_FPPC3 \
    -newnodes fppc3:fppc3-vip:HUB -cred fppc-cred
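To catch a missing role before rhpctl gets as far as the remote validation, the -newnodes value can be checked up front. The function below is a hypothetical helper, not part of FPP. It follows the syntax quoted above and assumes a Flex Cluster target and upper-case role names (I have not verified whether rhpctl accepts lower case).

```shell
# Hypothetical pre-check, not part of FPP: succeed only if every entry of a
# -newnodes value looks like node_name:node_vip:node_role, with role HUB or LEAF.
check_newnodes_flex() {
  local rest="$1" entry
  [ -n "$rest" ] || return 1
  while [ -n "$rest" ]; do
    entry="${rest%%,*}"                    # first comma-separated entry
    case "$rest" in
      *,*) rest="${rest#*,}" ;;            # drop it and keep the remainder
      *)   rest="" ;;
    esac
    case "$entry" in
      *:*:*:*) return 1 ;;                 # too many fields
      ?*:?*:HUB|?*:?*:LEAF) ;;             # name, VIP and a valid role
      *) return 1 ;;                       # role missing or not HUB/LEAF
    esac
  done
}
```

With this, "check_newnodes_flex fppc3:fppc3-vip" fails, while "check_newnodes_flex fppc3:fppc3-vip:HUB" succeeds.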
HTH
—
Ludovico