Monday, February 11, 2019

Configuring NameNode Heap Size

Okay, so my Hadoop cluster has now passed 6 million files, and the current setting is no longer suitable for running the NameNode. The total Java heap needs to be increased to about 6 GB.

For details, see this article: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/configuring-namenode-heap-size.html

NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. The following table provides recommendations for NameNode heap size configuration. These settings should work for typical Hadoop clusters in which the number of blocks is very close to the number of files (generally, the average ratio of number of blocks per file in a system is 1.1 to 1.2).
Some clusters might require further tweaking of the following settings. Also, it is generally better to set the total Java heap to a higher value.
Table 1.11. Recommended NameNode Heap Size Settings

Number of Files (millions)   Total Java Heap (Xmx and Xms)   Young Generation Size (-XX:NewSize -XX:MaxNewSize)
< 1                          1126m                           128m
1-5                          3379m                           512m
5-10                         5913m                           768m
10-20                        10982m                          1280m
20-30                        16332m                          2048m
30-40                        21401m                          2560m
40-50                        26752m                          3072m
50-70                        36889m                          4352m
70-100                       52659m                          6144m
100-125                      65612m                          7680m
125-150                      78566m                          8960m
150-200                      104473m                         8960m
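To make the table easier to apply, here is a minimal sketch of a lookup helper in bash. The function name recommend_namenode_heap is hypothetical and not part of Hadoop. The table's ranges share their boundary values (5 million appears in both 1-5 and 5-10), so this sketch assumes each upper bound belongs to the smaller range.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Hadoop tool): map a file count to the
# recommended sizes from the table above.
# Prints "<total heap> <young generation size>".
# Assumption: the upper bound of each range is treated as inclusive.
recommend_namenode_heap() {
  local files=$1      # total number of files (a plain count, not millions)
  local m=1000000
  if   [ "$files" -lt $((1 * m)) ];   then echo "1126m 128m"
  elif [ "$files" -le $((5 * m)) ];   then echo "3379m 512m"
  elif [ "$files" -le $((10 * m)) ];  then echo "5913m 768m"
  elif [ "$files" -le $((20 * m)) ];  then echo "10982m 1280m"
  elif [ "$files" -le $((30 * m)) ];  then echo "16332m 2048m"
  elif [ "$files" -le $((40 * m)) ];  then echo "21401m 2560m"
  elif [ "$files" -le $((50 * m)) ];  then echo "26752m 3072m"
  elif [ "$files" -le $((70 * m)) ];  then echo "36889m 4352m"
  elif [ "$files" -le $((100 * m)) ]; then echo "52659m 6144m"
  elif [ "$files" -le $((125 * m)) ]; then echo "65612m 7680m"
  elif [ "$files" -le $((150 * m)) ]; then echo "78566m 8960m"
  elif [ "$files" -le $((200 * m)) ]; then echo "104473m 8960m"
  else
    # The table stops at 200 million files.
    echo "no recommendation above 200 million files" >&2
    return 1
  fi
}
```

For my cluster, recommend_namenode_heap 6000000 lands in the 5-10 million row, which is where the roughly 6 GB figure comes from.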


Note: Hortonworks recommends a maximum of 300 million files on the NameNode.

You should also set -XX:PermSize to 128m and -XX:MaxPermSize to 256m.
Following are the recommended settings for HADOOP_NAMENODE_OPTS in the hadoop-env.sh file (replacing the ##### placeholder for -XX:NewSize, -XX:MaxNewSize, -Xms, and -Xmx with the recommended values from the table):
-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=##### -XX:MaxNewSize=##### -Xms##### -Xmx##### -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}
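As an illustration, this is roughly how the line could look in hadoop-env.sh for a cluster in the 5-10 million files range, with the ##### placeholders filled in from the table. The variables NN_HEAP and NN_YOUNG are names I made up for readability; they are not Hadoop settings. Adjust the values for your own file count.

```shell
# Sketch for hadoop-env.sh, assuming the 5-10 million files row of the table.
# NN_HEAP and NN_YOUNG are hypothetical helper variables, not Hadoop settings.
NN_HEAP=5913m     # total Java heap, used for both -Xms and -Xmx
NN_YOUNG=768m     # young generation, used for -XX:NewSize and -XX:MaxNewSize

export HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC \
 -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log \
 -XX:NewSize=${NN_YOUNG} -XX:MaxNewSize=${NN_YOUNG} \
 -Xms${NN_HEAP} -Xmx${NN_HEAP} \
 -XX:PermSize=128m -XX:MaxPermSize=256m \
 -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` \
 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps \
 -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT \
 ${HADOOP_NAMENODE_OPTS}"
```

Setting -Xms and -Xmx to the same value follows the table, which gives a single figure for both. The NameNode has to be restarted before the new heap size takes effect.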
If the cluster uses a secondary NameNode, you should also set HADOOP_SECONDARYNAMENODE_OPTS to HADOOP_NAMENODE_OPTS in the hadoop-env.sh file:
HADOOP_SECONDARYNAMENODE_OPTS=$HADOOP_NAMENODE_OPTS
Another useful HADOOP_NAMENODE_OPTS setting is -XX:+HeapDumpOnOutOfMemoryError. This option specifies that a heap dump should be executed when an out-of-memory error occurs. You should also use -XX:HeapDumpPath to specify the location for the heap dump file:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./etc/heapdump.hprof
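One possible way to add these two options without retyping the whole line is to append them to the existing value in hadoop-env.sh. This is only a sketch; the dump path is the one from the documentation above, and since it is relative you may prefer a location with enough free space for a dump as large as the heap.

```shell
# Sketch: append the heap dump options to whatever HADOOP_NAMENODE_OPTS
# already contains. The path is copied from the documentation example.
HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./etc/heapdump.hprof"
export HADOOP_NAMENODE_OPTS
```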