Setting up a Beowulf Cluster: WEEK #5 & WEEK #6
Last time I posted, we were confused about which operating system to install. We were more inclined to use PelicanHPC, but it didn't work the way we wanted it to: we could only boot PelicanHPC in Live mode, which is not that great. PelicanHPC would otherwise have been the better choice, because we would have created our cluster in no time; according to Mayank Sharma, it would have taken less than 3 minutes.[1]
After failing with both PelicanHPC and RedHat, we chose Ubuntu 6.04. Yes, we had to install the old version, because our processors would not support the newer stable versions.
Installing Required Packages and Compilers
After finally deciding which operating system to use, we had to install all the necessary packages and compilers. But then we faced another minor problem.
Problem: After support for an old Ubuntu release is dropped, its repositories are removed from the main repository servers, archive.ubuntu.com and security.ubuntu.com. So when you run sudo apt-get update, you are met with a list of errors telling you that the configured repositories were not found.
Solution: To get back on the repository train, we edited /etc/apt/sources.list by following the guide at http://www.warpconduit.net/2011/07/31/apt-repository-for-old-ubuntu-releases and replaced all instances of archive.ubuntu.com and security.ubuntu.com with the very fitting old-releases.ubuntu.com. After that, running sudo apt-get update fetched the repository indexes without errors. The replacement can be done in one go, as sketched below.
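A minimal sketch of that edit using sed (this assumes the stock GNU sed that ships with Ubuntu; back up sources.list first in case something goes wrong):
$ sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
$ sudo sed -i -e 's/archive.ubuntu.com/old-releases.ubuntu.com/g' -e 's/security.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list
$ sudo apt-get update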
Finally, GCC and G++ were installed with the following commands:
· sudo apt-get install gcc-4.0
· sudo apt-get install g++-4.0
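To confirm the compilers are in place, you can ask each one for its version (just a sanity check; the exact version string will vary):
$ gcc-4.0 --version
$ g++-4.0 --version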
Defining a User for MPI Jobs
We referred to the following site: http://byobu.info/article/Building_a_simple_Beowulf_cluster_with_Ubuntu/
It states that if we create the same user on all the computers, then it becomes a lot easier to mount and run the MPI jobs on the nodes. So, following this site, we created the same user with the same user ID on all the computers, i.e., both the slaves and the master.
$ sudo adduser mpiuser --uid 999
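For the mounting in the next step to work, the master must export /home/mpiuser over NFS. The guide above covers this; a minimal sketch of the master-side setup might look like the following (assuming the nfs-kernel-server package; the export line allows any host, which is fine on an isolated cluster network but should be restricted otherwise):
$ sudo apt-get install nfs-kernel-server
$ echo '/home/mpiuser *(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
$ sudo exportfs -ra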
Now we mount the master's home directory, /home/mpiuser on Prithvi-desktop, on the slave nodes with the following command:
vayu01:~$ sudo mount prithvi-desktop:/home/mpiuser /home/mpiuser
We can check whether the mount was successful by creating a folder or file in the master's /home/mpiuser and then checking on the node vayu01 whether the same folder or file shows up, as shown below. Don't freak out if the mount is not successful on the first try; reboot the system and try again, it may take a while.
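Spelled out, the check looks like this (mount_test is just a hypothetical file name; run the first command on the master and the second on the slave):
prithvi-desktop:~$ touch /home/mpiuser/mount_test
vayu01:~$ ls /home/mpiuser/mount_test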
Installation of Open MPI
We followed these steps after downloading the OpenMPI source tarball:
Step 1: Untar and unzip the downloaded file.
· tar zxvf openmpi-1.4.4.tar.gz
Step 2: Then navigate into the unzipped directory and type the following, where /usr/local/openmpi is the location in which you would like to install OpenMPI (root or administrator access may be required; type "sudo make install"):
· ./configure --prefix=/usr/local/openmpi
· make
· make install
Step 3: Setting up Environment Variables
Add the following environment variable to your ~/.bashrc (if you are using bash), where /usr/local/openmpi is the installation directory:
export MPI_DIR=/usr/local/openmpi
Step 4: Adding OpenMPI to Your Path
You will also need to add OpenMPI to your path. To view your PATH, type the following:
echo $PATH
This will probably look something like PATH=/usr/bin:/bin:/usr/local/bin, which is a list of ':'-separated directories where commands can be executed from without typing the full path. To add OpenMPI to your path, add the following to your ~/.bashrc file (where /usr/local/openmpi is the path where your MPI implementation was installed):
· export PATH=/usr/local/openmpi/bin:$PATH
· export LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
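After saving ~/.bashrc, reload it and make sure the shell now finds the OpenMPI binaries (a quick sanity check; ompi_info should report the version you built, 1.4.4 here):
$ source ~/.bashrc
$ which mpicc mpirun
$ ompi_info | head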
To let Open MPI know which machines to run your programs on, you can list them in a file. I will call this file /home/mpiuser/.mpi_hostfile, and it could contain the following:
# The hostfile for Open MPI
# The master node; 'slots=1' is used because it is a single-processor machine.
localhost slots=1
# The following slave nodes are single-processor machines:
vayu01
jal02
agni03
dharti04
jeevan05
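With the hostfile in place, a program can be launched across the whole cluster like this (a sketch: hello.c stands in for any MPI program of your own; -np 6 asks for six processes, one per machine listed above):
$ mpicc hello.c -o hello
$ mpirun -np 6 --hostfile /home/mpiuser/.mpi_hostfile ./hello
Since /home/mpiuser is NFS-mounted on every slave, the compiled binary is automatically visible on all the nodes.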
References: