Nov 5, 2017 7:01:00 AM by Syed Jaffar Hussain
A brand new Exadata X5-2L eighth rack (I knew the latest is X7 now, but it was for a POC, so no worries) has been deployed recently at a customer for Oracle EBS Exadata migration POC purpose. Its wasn't an easy walk in the park as I initially presumed. There were some challenges (network, configuration) thrown during the migration, but, happily overcome and had it installed and EBS database migration completion.
So, I am going to share yet another Exadata bare metal deployment story explaining the challenges I have faced, and how they are fixed.
After successful execution of the elasticConfig, all the Exadata factory IP addresses have been set to client IPs. Though the management network was accessible from the outside, client network was not accessible. When verified with the network team about enabling the ports on the corporate switch, they confirmed that the ports are enabled, however, the connection is showing as not active and asked to us investigate the network cables connected to the DB nodes. When we verified the network cables ports, we didn't find any lights flashing and after an extensive investigation (Switch ports, SFP on Exadata and Corporate switch, checking the cables status), it was found that the cables pin direction was not properly connected. Also, found that the network bonding interfaces (eth1 and eth2) were not up, confirmed from ethtool eth1 command. After fixing the cables, and bringing up the interfaces (ifup eth1 & eith2), we could see that cables are connected properly and we can also see the lights on the ports.
$ethtool eth1 (shows the interfaces were not connected)
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 100baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes: 100baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Duplex: Unknown! (255)
Port: Twisted Pair
Supports Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: no
After fixing the cable issues, we then continued with onecommand execution. During the validation it failed because of different netmask for client network (under the cluster information section). The customer unfortunately made a mistake in the client network netmask selection for cluster settings, so there was a difference in the client netmask value for client and cluster. This was fixed by modifying the netmask value in the ifcfg-boneth0 file (/etc/sysconfig/network-scripts), restart the network services.
Since the system was delivered somewhere during end of August 2015, no one actually knows exactly the disk size and rack model. The BOQ (Bill of quantity) for the order only shows X5-2 HC storage. So, there was wrong selection in the OEDA for Exadata rack and disk size. Instead of 4TB disk size, it was selected as 8TB disk size and instead of Elastic configuration, fixed Eighth rack was selected. This was fixed by rerunning the OEDA with the correct options.
There was another obstacle faced while doing the cell connectivity (part of onecommand). Cell server IPs were not modified by the elasticConfig. Fortunately, I found my friend blog on this topic and quickly fixed the issue. This is why I like to blog all the technical issues, who knows, this could solve someone pains.
Cluster validation failed due to inconsistent values for scan name. During the investigation of various issues, private, public & scan IPs are put in the /etc/hosts file. So, while configuring LISTENER_SCAN2 and SCAN3, this issue happened. This was fairly understandable. Due to 3 entries of scan values in the /etc/hosts file this happened. Upon a quick google about the issue, the following blog helped me to fix the issue
Finally, I have managed to deploy the Exadata successfully and perform the Oracle EBS database migration . No doubt, this experience really made me strong in network and other areas. So, every challenges comes with opportunity to learn.
I thank those individuals who really write blogs and share their experience to help Oracle community.
There is still one open issues which is yet to be resolved. A slow sqlplus and db startup. I presume this is due to heavy resource utilization over the server. Yet to resolve the mystery. Stay tuned for more updates.
Written by Syed Jaffar Hussain
An Oracle Database Expert for over 15 years from his 20 years of Information Technology (IT) career. Over the past 15 years of Oracle journey, he involved with several local and large scaled international banks where he implemented and managed highly complex cluster and non-cluster environments with over 100’s of business critical databases. Recognizing his efforts and contribution towards the Oracle community, Oracle awarded him the prestigious ‘Best DBA of the year, 2011’ and Oracle ACE Director status. He also acquired industry best Oracle credentials, Oracle Certified Master (OCM), Oracle RAC Expert, OCP DBA 8i,9i,10g & 11g in addition to ITIL Expertise.
Syed is an active Oracle speaker, regularly presents technical sessions and webinars on various Oracle database technologies at many Oracle events. You can visit his technical blog, http://jaffardba.blogspot.com where he discuss and writes the workaround/solution about the issues confronted from his day-to-day activities. Apart from being the part of the core Technical Review committee member for a few Oracle technology oriented books, he also co-authored an Oracle 11g R1/R2 Real Application Cluster Essentials and Oracle Expert RAC books. His blog can be found at http://jaffardba.blogspot.com/