In this series of articles we will cover building a Cloudera Hadoop cluster from start to finish, following vendor- and industry-recommended best practices.
Part 2: Setting up Hadoop prerequisites and security hardening
Part 3: How to install and configure Cloudera Manager on CentOS/RHEL 7
Part 4: How to install CDH and configure service placement on CentOS/RHEL 7
Part 5: How to set up high availability for the NameNode
Part 6: How to set up high availability for the Resource Manager
Part 7: How to install and configure Hive with high availability
Part 8: How to install and configure Sentry (authorization tool)
Part 9: How to install Kerberos (Kerberizing the cluster) for Hadoop authentication
Part 10: Setting up the cluster(s) on CentOS/RHEL 7
Installing the operating system and completing the OS-level prerequisites are the first steps in building a Hadoop cluster. Hadoop can run on various Linux platforms: CentOS, RedHat, Ubuntu, Debian, SUSE, and others. In real-world deployments most Hadoop clusters are built on RHEL/CentOS, so we will use CentOS 7 for demonstration in this series.
Within an organization, OS installation can be automated with Kickstart. For a cluster of 3-4 nodes, manual installation is feasible, but for a large cluster of more than 10 nodes, installing the operating system on each node one by one is tedious. This is where Kickstart comes in: it allows a mass, unattended installation.
Good performance in a Hadoop environment depends on provisioning the right hardware and software. Building a production Hadoop cluster therefore requires careful consideration of both.
In this article we go through the main considerations for OS installation and some best practices for deploying Cloudera Hadoop cluster servers on CentOS/RHEL 7.
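As a rough illustration of what a Kickstart-driven installation looks like, the sketch below prints a minimal Kickstart file for CentOS 7. It is not taken from this series: the function name `write_kickstart`, the password, the partition sizes and the choice of `sda` as the OS disk are all placeholder assumptions and must be adapted to the real hardware.

```shell
#!/usr/bin/env bash
# Sketch: print a minimal Kickstart file for an unattended CentOS 7 install.
# All values (password, sizes, disk name) are placeholders, not recommendations.

write_kickstart() {
  cat <<'EOF'
# Hypothetical Kickstart file for a Hadoop node (CentOS 7)
install
text
lang en_US.UTF-8
keyboard us
timezone UTC
# Placeholder password; prefer a hashed one with "rootpw --iscrypted"
rootpw --plaintext ChangeMe123
# Install the OS on sda only and leave the data disks untouched
ignoredisk --only-use=sda
zerombr
clearpart --all --initlabel --drives=sda
bootloader --location=mbr --boot-drive=sda
part /boot --fstype=xfs --size=1024 --ondisk=sda
part swap --size=8192 --ondisk=sda
part / --fstype=xfs --size=1 --grow --ondisk=sda
reboot

%packages
@^minimal
%end
EOF
}

# Print the file to stdout; redirect it wherever the installer can fetch it.
write_kickstart
```

Redirecting the output to a file (for example `write_kickstart > ks.cfg`) and pointing the installer at it with the `inst.ks=` boot option is one common way to use such a file. How the file is served to the nodes (HTTP, NFS, PXE) depends on the environment.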
Important considerations and best practices when deploying the Hadoop server
Below are best practices for deploying Cloudera Hadoop cluster servers on CentOS/RHEL 7.
- Hadoop servers do not require enterprise-grade servers to build a cluster; commodity hardware is sufficient.
- A production cluster should have 8 to 12 data disks per node, with the exact number chosen according to the nature of the workload. If the cluster is designed for compute-intensive applications, 4 to 6 drives are preferable to avoid I/O problems.
- Each data disk should be partitioned and mounted individually, for example as /data01 through /data10.
- RAID is not recommended for worker nodes, since Hadoop itself provides fault tolerance by replicating each data block three times by default. JBOD is therefore the most suitable configuration for worker nodes.
- For master servers, RAID 1 is the best approach.
- The default file system on CentOS/RHEL 7.x is XFS. Hadoop supports XFS, ext3 and ext4. We recommend the ext3 file system because it has been tested for good performance.
- All servers should run the same OS version, or at least the same minor release.
- Homogeneous hardware is best: all worker nodes should have the same hardware characteristics (RAM, disk space, CPU cores, etc.).
- Depending on the cluster workload (balanced, compute-intensive, or I/O-intensive) and its size, the resource planning (RAM, CPU) will differ for each server.
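The "same minor release" recommendation from the list above is easy to check mechanically. The following is a minimal sketch, assuming the contents of `/etc/centos-release` have already been collected from each node. The function name `check_same_release` and the input format (one `<hostname> <release string>` line per node) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that every node reports the same OS major.minor release.
# Input on stdin: one line per node, "<hostname> <contents of /etc/centos-release>".

check_same_release() {
  local host release version reference="" status=0
  while read -r host release; do
    [ -n "$host" ] || continue
    # Pull the first "major.minor" pair out of the release string.
    version=$(printf '%s\n' "$release" | grep -oE '[0-9]+\.[0-9]+' | head -n 1)
    if [ -z "$reference" ]; then
      reference="$version"   # the first node sets the reference release
    elif [ "$version" != "$reference" ]; then
      echo "MISMATCH: $host runs $version, expected $reference"
      status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "OK: all nodes run $reference"
  return "$status"
}
```

One way to feed it is a loop such as `for h in node1 node2; do echo "$h $(ssh "$h" cat /etc/centos-release)"; done | check_same_release`, where `node1` and `node2` stand in for the real host names.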
Here is an example of partitioning hard disks on servers with a storage capacity of 24 TB.
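One way to picture the data-disk part of such a layout is as `/etc/fstab` entries, one per JBOD disk, mounted at /data01, /data02 and so on. The sketch below is illustrative only: the function name `print_data_fstab`, the device names and the `noatime` mount option are assumptions, not values from a real server.

```shell
#!/usr/bin/env bash
# Sketch: print /etc/fstab entries for JBOD data disks mounted at /data01../dataNN.
# Device names and mount options are illustrative assumptions.

print_data_fstab() {
  local fstype="$1"; shift
  local index=1 device mount_point
  for device in "$@"; do
    # Zero-padded mount point: /data01, /data02, ...
    mount_point=$(printf '/data%02d' "$index")
    printf '%s %s %s defaults,noatime 0 0\n' "$device" "$mount_point" "$fstype"
    index=$((index + 1))
  done
}

# Example: three hypothetical data disks formatted as ext3.
print_data_fstab ext3 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

The function only prints the lines. Review them before appending anything to `/etc/fstab`, and create the mount points first.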
Installing CentOS 7 for Hadoop Server Deployment
Things to know before installing CentOS 7 on a Hadoop server:
- A minimal installation is sufficient for Hadoop servers (worker nodes). In some cases a GUI may be installed, but only on master or management servers, where browsers are used to access the web-based management tools.
- Networking, hostname and other OS-related settings can be configured after the OS installation.
- In real-world environments, server vendors provide their own console for interacting with and managing the server. For example, Dell servers have iDRAC, a device embedded in the server. Through the iDRAC interface we can install the OS from an OS image stored on our local system.
For this article we installed the OS (CentOS 7) in a VMware virtual machine, so there are no multiple disks to partition. CentOS is functionally equivalent to RHEL, so we walk through the installation steps on CentOS.
1. Start by downloading the CentOS 7.x ISO image to your local Windows system and select it when starting up the virtual machine. Select Install CentOS 7 as shown in the figure.
CentOS 7 Install Startup Menu
2. Select the language (the default is English) and click Continue.
CentOS 7 Select language
3. Software Selection: choose "Minimal Install" and click Done.
CentOS 7 Minimal Install
4. Set the root password when the installer prompts for it.
Set a root password
5. Installation Destination: this is an important step that needs attention. You must select the disk on which the operating system is to be installed, and a dedicated disk should be used for the OS. Click Installation Destination and select the disk. In a real environment there will be multiple disks, and we should select the intended one, preferably sda.
CentOS Installation Destination
6. Other Storage Options: choose the second option ("I will configure partitioning") to configure OS-related partitions such as /var, /var/log, /home, /tmp, /opt and swap.
CentOS Manual Partitioning
7. When done, begin the installation.
CentOS 7 Installation
8. Once the installation is complete, restart the server.
CentOS 7 Installation Complete
9. Connect to the server and set the hostname.
# hostnamectl status
# hostnamectl set-hostname tecmint
# hostnamectl status
Set Hostname in CentOS
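When the same hostname step is repeated across many nodes, it can help to validate the name before passing it to `hostnamectl`. The following is a minimal sketch: the helper names `is_valid_hostname` and `set_node_hostname` are hypothetical, and the check applies the usual hostname label rules (letters, digits and hyphens, labels of at most 63 characters with no leading or trailing hyphen).

```shell
#!/usr/bin/env bash
# Sketch: validate a hostname before handing it to hostnamectl.
# Helper names are hypothetical; nothing is changed until set_node_hostname is called.

is_valid_hostname() {
  local name="$1"
  # One label: starts and ends with a letter or digit, at most 63 characters.
  local label='[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?'
  local pattern="^${label}(\\.${label})*\$"
  # The whole name must not exceed 253 characters.
  [ "${#name}" -le 253 ] || return 1
  printf '%s\n' "$name" | grep -Eq "$pattern"
}

set_node_hostname() {
  local name="$1"
  if ! is_valid_hostname "$name"; then
    echo "Refusing to set invalid hostname: $name" >&2
    return 1
  fi
  # Same commands as in the step above; requires root privileges.
  hostnamectl set-hostname "$name" && hostnamectl status
}
```

Calling `set_node_hostname tecmint` as root would then behave like the manual commands shown above, while a name such as `bad_name` is rejected before anything is changed.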
In this article we have covered the OS installation steps and best practices for filesystem partitioning. These are general guidelines and may need to be adjusted, depending on the nature of the workload, to get the best performance from the cluster. Cluster planning is an art for the Hadoop administrator. In the next article, we will look at the OS-level prerequisites and security hardening.