Before taking the HDPCA exam, you can get a feel for it by using the HDPCA practice exam on the AWS cloud. The practice exam is very similar to the actual exam. You can perform 6 practice tasks on this machine. The recommended instance configuration in AWS is m3.2xlarge, which has 30 GB of […]
HDPCA
HDPCA Exam Objective – Configure HiveServer2 HA ( Part 2 – Configure HA )
Note: This post is part of the HDPCA exam objective series. Hive first started with HiveServer1. However, this version of the Hive server was not very stable. It sometimes suspended or blocked client connections silently. Since version 0.11, Hive has included a new Hive server called HiveServer2 alongside HiveServer1. HiveServer2 is an […]
How to configure Capacity Scheduler Queues Using YARN Queue Manager
Note: This post is part of the HDPCA exam objective series. Capacity Scheduler is mainly designed for multitenancy, where multiple organizations collectively fund the cluster based on their computing needs. An added benefit is that an organization can access any excess capacity not being used by others. This provides elasticity for the organizations […]
How to Create HDFS policies in Ranger
Note: This post is part of the HDPCA exam objective series. Apache Ranger is an application that enables data architects to implement security policies on a big data ecosystem. The goal of this project is to provide a unified way for all Hadoop applications to adhere to the security guidelines that are defined. Here […]
How to Configure Hive Authorization Using Apache Ranger
Note: This post is part of the HDPCA exam objective series. Apache Ranger is a framework for enabling, monitoring, and managing comprehensive data security across the Hadoop platform. Ranger helps a Hadoop admin with various security management tasks. It provides a mechanism to manage security from a single pane for various […]
HDPCA Exam Objective – Configure HiveServer2 HA ( Part 1 – Installing HiveServer )
Note: This post is part of the HDPCA exam objective series. It is important to configure high availability in production so that if one of the HiveServer2 instances fails, the others can respond to client requests. This can be achieved by using the ZooKeeper discovery mechanism to point the clients to the active Hive servers. […]
HDPCA Exam Objective – View an application’s log file (Troubleshoot a failed job)
Note: This post is part of the HDPCA exam objective series. Troubleshooting running or failed jobs is an integral part of Hadoop administration. To troubleshoot a running or failed job, we must view the application’s log file. This post focuses on the HDPCA exam objective “View an application’s log file”. We will […]
HDPCA Exam Objective – Configure and manage alerts
Note: This post is part of the HDPCA exam objective series. Monitoring the health of a Hadoop cluster is an important aspect of Hadoop administration. Ambari provides centralized management of health alerts and checks for the services in your cluster. You can set thresholds and enable or disable alerts using the Ambari UI. You […]
HDPCA Exam Objective – Install and configure Knox
Note: This post is part of the HDPCA exam objective series. Knox Basics: Knox Gateway is another Apache project that addresses the concern of secured access to the Hadoop cluster from corporate networks. Knox Gateway provides a single point of authentication and access for Apache Hadoop services in a cluster. Knox runs as a […]
HDPCA Exam Objective – Configure the Capacity Scheduler
Note: This post is part of the HDPCA exam objective series. YARN Schedulers: The Hadoop YARN scheduler is responsible for assigning resources to the applications submitted by users. There are 3 types of schedulers in YARN: First In First Out (FIFO) (Hadoop 1.x), Fair scheduler, and Capacity scheduler. First In First Out (FIFO): By default, […]