Hadoop——伪分布式搭建

伪分布式实际上就是假分布式,即只用一台机器来模拟了整个分布式的过程,所以伪分布式下Hadoop就是在一台机器上配置了所有Hadoop的节点。

Posted by caosw on August 24, 2019

Hadoop——伪分布式搭建


1、安装jdk

1.1、下载jdk源码包,上传至服务器的/opt路径下并解压,这里使用了jdk-8u65-linux-x64.tar.gz
[root@nna /opt]# tar zxvf jdk-8u65-linux-x64.tar.gz
1.2、创建软连接(个人习惯)
[root@nna /opt]# ln -s /opt/jdk1.8.0_65/ /usr/local/java
1.3、配置JAVA_HOME环境变量
[root@nna /opt]# vi /etc/profile
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
[root@nna /opt]# source /etc/profile

2、安装hadoop

2.1、下载hadoop安装包,上传至服务器的/opt路径下并解压,这里使用的hadoop版本为2.7.7
[root@nna /opt]# tar zxvf hadoop-2.7.7.tar.gz
2.2、创建软连接
[root@nna /opt]# ln -s /opt/hadoop-2.7.7 /usr/local/hadoop
2.3、配置HADOOP_HOME环境变量
[root@nna /opt]# vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@nna /opt]# source /etc/profile

3、配置hadoop

3.1、进入$HADOOP_HOME/etc/hadoop目录
[root@nna /opt]# cd $HADOOP_HOME/etc/hadoop
3.2、编辑core-site.xml文件
[root@nna /usr/local/hadoop/etc/hadoop]# vim core-site.xml
<configuration>
    <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost/</value>
    </property>
</configuration>
3.3、编辑hdfs-site.xml文件
[root@nna /usr/local/hadoop/etc/hadoop]# vi hdfs-site.xml
<configuration>
    <property>
            <name>dfs.replication</name>
            <value>1</value>
    </property>
</configuration>
3.4、编辑mapred-site.xml文件,需要先通过mapred-site.xml.template样本文件复制一份出来
[root@nna /usr/local/hadoop/etc/hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@nna /usr/local/hadoop/etc/hadoop]# vi mapred-site.xml
<configuration>
    <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
    </property>
</configuration>
3.5、编辑yarn-site.xml文件
[root@nna /usr/local/hadoop/etc/hadoop]# vi yarn-site.xml
<configuration>
    <property>
            <name>yarn.resourcemanager.hostname</name>
            <!-- 配置主机名 -->
            <value>nna</value>
    </property>
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
    </property>
</configuration>

3、配置SSH

3.1、查看是否启动了sshd进程
[root@nna /root]# ps -ef | grep sshd
root      4993     1  0 Aug23 ?        00:00:00 /usr/sbin/sshd -D
root      5509  4993  0 Aug23 ?        00:00:13 sshd: root@pts/0
root      7269  4993  0 Aug24 ?        00:00:00 sshd: root@pts/1
root     14509  7274  0 00:39 pts/1    00:00:00 grep --color=auto sshd
3.2、生成公私秘钥对,会生成~/.ssh文件,有id_rsa(私钥)和id_rsa.pub(公钥)
[root@nna /root]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:aU46oAu1Ut8pF6yGU734IC4tVI7j5jVMC9CwPLjIPBE root@nna
The key's randomart image is:
+---[RSA 2048]----+
|.E               |
|o+.              |
|++.              |
|=.o. o   .       |
|o=*.o + S        |
| *=B.= O         |
|=.B=B B .        |
|o*o=.= .         |
|o+o   .          |
+----[SHA256]-----+
[root@nna /root]# cd .ssh/
[root@nna /root/.ssh]# ll
total 8
-rw-------. 1 root root 1675 Aug 25 00:42 id_rsa
-rw-r--r--. 1 root root  390 Aug 25 00:42 id_rsa.pub
3.3、追加公钥至~/.ssh/authorized_keys文件中
[root@nna /root/.ssh]# cat id_rsa.pub >> authorized_keys
3.4、修改authorized_keys的权限为644
[root@nna /root/.ssh]# chmod 644 authorized_keys
3.5、测试
[root@nna /root/.ssh]# ssh nna
Last login: Sun Aug 25 00:44:55 2019 from nna
[root@nna /root]#

4、伪分布式测试

4.1、修改hadoop环境变量配置文件,手动指定JAVA_HOME环境变量
[root@nna /root]# cd $HADOOP_HOME/etc/hadoop/
[root@nna /usr/local/hadoop/etc/hadoop]# vi hadoop-env.sh
...
export JAVA_HOME=/usr/local/java
...
4.2、对hdfs进行格式化
[root@nna /root]# hadoop namenode -format
4.3、启动hadoop的所有进程,这里图方便直接启动直接使用start-all.sh,建议分开启动start-dfs.sh和start-yarn.sh
[root@nna /root]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-namenode-nna.out
localhost: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-nna.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-nna.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-resourcemanager-nna.out
localhost: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-nna.out
4.4、启动完成后,通过jps查看启动了的进程
[root@nna /root]# jps
16226 NameNode
16899 NodeManager
16516 SecondaryNameNode
16676 ResourceManager
17095 Jps
4.5、查看hdfs的文件系统(这里由于未创建目录显示为空),创建目录
[root@nna /root]# hdfs dfs -ls /
[root@nna /root]# hdfs dfs -mkdir -p /caosw/test
[root@nna /root]# hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2019-08-25 00:55 /caosw
4.6、通过浏览器查看hadoop的文件系统

4.7、停止hadoop所有进程
[root@nna /root]# stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: no datanode to stop
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop