It is an integral part of Hadoop administration to troubleshoot running or failed jobs. In order to troubleshoot a running or failed job, we must view the application’s log file. This post focuses on the HDPCA exam objective “View an application’s log file”. We will run a sample MapReduce program and view the status of the program using the command line and the ResourceManager UI.
Running an Example job
The HDP installation comes with a few example MapReduce jobs which we can run to test YARN functionality. You can check the available examples by running the below command:
[hdfs@nn1 ~]$ hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
As you can see, there are several example jobs that can be run, and each is listed with a description of what it does.
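For instance, to try the “pi” example (a quick sketch – the map count of 10 and sample count of 100 are arbitrary values chosen for illustration):
$ hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100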
Word Count
1. Let us run the famous “word count” job and see if it runs properly. Let’s copy a file with some sample data from the local filesystem to HDFS.
[hdfs@nn1 ~]$ cat /home/hdfs/test_file
This is a test file.
[hdfs@nn1 ~]$ hdfs dfs -put /home/hdfs/test_file /user/test/test_file
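Note that the -put command assumes the target directory already exists in HDFS. If it does not, create it first (a minimal example, assuming the /user/test path used above):
[hdfs@nn1 ~]$ hdfs dfs -mkdir -p /user/test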
2. We can now run the “word count” example on the test_file we just uploaded to HDFS. The syntax to run the job is:
$ hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount [input directory] [output directory]
Here,
input directory – This will contain the input file(s)
output directory – This will contain the output file(s). It must not already exist; the job creates it and will fail if it does.
Let’s run the job now:
$ hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/test /user/test/output
You can verify the output from the output directory.
$ hdfs dfs -ls /user/test/output
Found 2 items
-rw-r--r--   3 hdfs hdfs          0 2018-07-28 10:11 /user/test/output/_SUCCESS
-rw-r--r--   3 hdfs hdfs         31 2018-07-28 10:11 /user/test/output/part-r-00000
$ hdfs dfs -cat /user/test/output/part-r-00000
This    1
a       1
file.   1
is      1
test    1
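If you want to re-run the job, remove the output directory first, since the job will not overwrite an existing output directory (an example using the path from above):
$ hdfs dfs -rm -r /user/test/output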
Viewing an application’s log file
There are two ways to view the log file of an application:
1. Using command line
2. Using ResourceManager UI
Using command line
You can view the currently running jobs with the “yarn application” command. To list all the available options, run the “yarn application” command without any arguments. The command needs to be executed as the “yarn” user.
[yarn@nn1 ~]$ yarn application
You can list the running jobs using the below command:
$ yarn application -list
18/07/28 11:19:06 INFO client.RMProxy: Connecting to ResourceManager at dn3.localdomain/192.168.1.5:8050
18/07/28 11:19:08 INFO client.AHSProxy: Connecting to Application History server at dn3.localdomain/192.168.1.5:10200
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                Application-Id    Application-Name    Application-Type        User      Queue      State   Final-State   Progress                  Tracking-URL
application_1532705907874_0207     QuasiMonteCarlo           MAPREDUCE   ambari-qa    default    RUNNING     UNDEFINED         5%  http://nn2.localdomain:38307
Here, each job has a unique application ID. You can also filter jobs by their status (RUNNING, ACCEPTED, NEW, FINISHED, etc.) using the -appStates option.
$ yarn application -list -appStates FINISHED
18/07/28 11:22:16 INFO client.RMProxy: Connecting to ResourceManager at dn3.localdomain/192.168.1.5:8050
18/07/28 11:22:17 INFO client.AHSProxy: Connecting to Application History server at dn3.localdomain/192.168.1.5:10200
Total number of applications (application-types: [] and states: [FINISHED]):13
                Application-Id    Application-Name    Application-Type        User      Queue       State   Final-State   Progress                                                         Tracking-URL
application_1531843386282_0002          word count           MAPREDUCE   ambari-qa    default    FINISHED     SUCCEEDED       100%  http://dn2.localdomain:19888/jobhistory/job/job_1531843386282_0002
You can also kill a running job using its application ID:
$ yarn application -kill application_1532705907874_0212
18/07/28 11:24:25 INFO client.RMProxy: Connecting to ResourceManager at dn3.localdomain/192.168.1.5:8050
18/07/28 11:24:26 INFO client.AHSProxy: Connecting to Application History server at dn3.localdomain/192.168.1.5:10200
Killing application application_1532705907874_0212
18/07/28 11:24:27 INFO impl.YarnClientImpl: Killed application application_1532705907874_0212
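You can also check the status of a single application using its ID (a quick example, reusing one of the application IDs shown above):
$ yarn application -status application_1531843386282_0002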
Now to view an application’s log, use the below command:
$ yarn logs -applicationId application_1531843386282_0001
18/07/28 11:30:05 INFO client.RMProxy: Connecting to ResourceManager at dn3.localdomain/192.168.1.5:8050
18/07/28 11:30:06 INFO client.AHSProxy: Connecting to Application History server at dn3.localdomain/192.168.1.5:10200
18/07/28 11:30:09 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
18/07/28 11:30:09 INFO compress.CodecPool: Got brand-new decompressor [.deflate]

Container: container_1531843386282_0001_01_000001 on nn1.localdomain_45454
LogAggregationType: AGGREGATED
==========================================================================
LogType:launch_container.sh
LogLastModifiedTime:Tue Jul 17 21:34:48 +0530 2018
LogLength:4381
LogContents:
#!/bin/bash

set -o pipefail -e
export PRELAUNCH_OUT="/hadoop/yarn/log/application_1531843386282_0001/container_1531843386282_0001_01_000001/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/hadoop/yarn/log/application_1531843386282_0001/container_1531843386282_0001_01_000001/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
....
....
This will be a very long log file. You can view this log file to troubleshoot any failures during the application execution.
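Because the aggregated log can be huge, it is often convenient to redirect it to a local file and search it there. A minimal sketch, assuming the application ID from above (the /tmp path is just an arbitrary choice):
$ yarn logs -applicationId application_1531843386282_0001 > /tmp/application_1531843386282_0001.log
$ grep -iE "error|exception" /tmp/application_1531843386282_0001.log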
Viewing logs using ResourceManager UI
Another way to view the application logs is to use the ResourceManager UI. To access the ResourceManager Web UI, point your web browser to:
http://[resource manager hostname or IP]:8088
Alternatively, you can navigate to the ResourceManager UI from Ambari.
In the ResourceManager UI, you can view the applications filtered by their state – RUNNING, ACCEPTED, NEW, etc. In my case, the job had already finished, so I looked under the FINISHED jobs to find the application. You can click the application ID to find more details about the job.
From this page, you can also view the logs of the application.
Below are sample logs of a finished job:
Viewing log files in the Hadoop ecosystem
It is also important to know the log file locations of each of the Hadoop ecosystem components such as HDFS, ResourceManager, NodeManager, etc. There are 3 types of log files for each component/daemon in the Hadoop ecosystem:
- .log
- .out
- .log.[date]
.log extension log files
The log files with the .log extension show the log messages for the running daemons. If there are any errors encountered while the daemon is running, the stack trace of the error is logged in these files. The example below is a .log extension log file for the NodeManager.
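For example, to follow the NodeManager .log file on a node, you can tail it (a sketch – the path assumes the common HDP default log directory /var/log/hadoop-yarn/yarn and the nn1.localdomain hostname used in this post; adjust to your environment):
$ tail -f /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-nn1.localdomain.log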
.out extension log files
The log files with the .out extension are created and written to during start-up of the daemons. It is very rare that these files get populated, but they can be helpful when trying to determine why the ResourceManager, NodeManager, or JobHistory Server daemons are not starting up. The example shown below is the .out extension log file for the NodeManager daemon.
.log.[date]
The log files with extension .log.[date] are created when the log files are rotated. These files are useful when an issue has occurred multiple times, and comparing these older log files with the most recent log file can help uncover patterns of occurrence.
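To see which rotated copies exist and when they were written, you can simply list the log directory sorted by time (again assuming the default HDP log location mentioned above):
$ ls -lt /var/log/hadoop-yarn/yarn/ | head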
Finding the location of log files from Ambari
The location of these log files can easily be found from Ambari. For example, to find the YARN log file locations, go to Services > YARN > Configs and search for the property “YARN_LOGS_DIR” in the search box.
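Once you know the log directory, a quick way to hunt for recent problems is to grep across the daemon logs (a hedged example, assuming the /var/log/hadoop-yarn/yarn directory from the examples above):
$ grep -i "ERROR\|FATAL" /var/log/hadoop-yarn/yarn/*.log | tail -20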