This post introduces the MapReduce framework, which lets you write applications that process vast amounts of data in parallel on large clusters of commodity hardware, in a reliable and fault-tolerant manner. It also describes the architectural components of MapReduce and lists its benefits. MapReduce: It is a software framework […]
Archives for September 2018
CCA131 – Configure NameNode HA
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. HDFS High Availability Overview: A single NameNode is a single point of failure in a Hadoop cluster. You can experience HDFS downtime from an unexpected NameNode crash or from planned NameNode maintenance. Having a NameNode high availability setup avoids these single points […]
CCA131 – Configuring HDFS Snapshot Policy
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. What is HDFS Snapshot Policy? You can create snapshot policies using Cloudera Manager to take automated snapshots of snapshottable paths on HDFS. The snapshot policies run at the time specified by the user (hourly, daily, weekly, etc.). Before we can create […]
How To Disable MD5-based HMAC Algorithms for SSH
This is a short post on how to disable MD5-based HMAC algorithms for SSH on Linux. 1. Make sure you have updated the openssh package to the latest available version. 2. Changing the ciphers and MD5-based MACs in use requires modifying the sshd_config file; you can append the Ciphers and MACs options as per the man page. For example: # […]
How to increase swap space on Linux
The Ask: The user wants to increase the swap space on their Linux machine (CentOS/RHEL). The existing swap space has been configured as an LVM logical volume. The Solution: The following solution first adds a new physical volume (PV) to the volume group in use, then extends the swap logical volume. In […]
CCA 131 – Create/restore a snapshot of an HDFS directory (Using Cloudera Manager)
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. HDFS Snapshot: Directories in HDFS can be snapshotted, which means creating one or more point-in-time images, or snapshots, of the directory. Snapshots include subdirectories and can even cover the entire filesystem (be careful with this, for obvious reasons). Snapshots can be used […]
CCA131 – Create an HDFS user’s home directory
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. In the exam, you may be asked to create a home directory on HDFS for an existing local user. You may further be asked to set specific ownership or permissions on the home directory. The process basically involves: Create a local […]
CCA 131 – Configure HDFS ACLs
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. Hadoop Access Control Lists are based on POSIX ACLs, which are available on the Linux filesystem. These ACLs allow you to link a set of permissions to a file or directory that is not limited to just one user and a group who […]