Getting Started With Spark and Scala
What is Spark?
Spark is a distributed cluster-computing framework developed by the Apache Software Foundation. It extends the Hadoop MapReduce model and is designed for fast computation. One of Spark's main features is in-memory cluster computing, which increases the processing speed of an application.
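One way to see this in action (a minimal sketch; the dataset and numbers below are purely illustrative): once Spark is installed, you can cache a dataset in memory from spark-shell so that repeated computations skip recomputing it from scratch.

scala> val nums = sc.parallelize(1 to 1000000)
scala> nums.cache()   // mark the RDD to be kept in memory
scala> nums.sum()     // first action computes the RDD and caches it
scala> nums.sum()     // later actions reuse the in-memory copy and run faster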
Before you install Spark, please make sure you have Java and Hadoop installed on your system. These instructions target Ubuntu 14.04 LTS.
Java Installation
Check whether your system has Java installed or not. Please type the following in your command line:
$ java -version
You should get something like below:
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
If Java is not installed please refer to this link.
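Depending on how Java was installed, you may also need the JAVA_HOME environment variable to point at your JDK; Hadoop in particular expects it. If so, add a line like the following to your ~/.bashrc (the path is only an example; adjust it to wherever your JDK actually lives):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # example path only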
Hadoop Installation
Verify Hadoop installation with the following command:
$ hadoop version
Output should be something like this:
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
If Hadoop is not installed, please refer to this link.
Scala Installation
Verify Scala version using the following command:
$ scala -version
Output should be something like this:
Scala code runner version 2.9.2 -- Copyright 2002-2011, LAMP/EPFL
If Scala needs to be installed, please refer to this link. Note that spark-shell ships with its own Scala (2.11.8 for Spark 2.1.1), so an older system Scala such as 2.9.2 will not interfere with running Spark.
Spark Installation
Download Spark from the Apache Spark downloads page (https://spark.apache.org/downloads.html).
Choose the package prebuilt for your Hadoop version (here, the Hadoop 2.7 build, matching the Hadoop 2.7.3 installation verified above) and extract the tgz file as follows:
$ tar xvf spark-2.1.1-bin-hadoop2.7.tgz
Move the Spark files to the /usr/local/spark directory as follows:
$ sudo mv spark-2.1.1-bin-hadoop2.7 /usr/local/spark
Set up the environment for Spark by adding the following line to the ~/.bashrc file (note there must be no spaces around the = sign):
export PATH=$PATH:/usr/local/spark/bin
Source the ~/.bashrc file with the following command:
$ source ~/.bashrc
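Optionally, you can also set SPARK_HOME, which some Spark scripts and third-party tools use to locate the installation. If you want it, add this to ~/.bashrc alongside the PATH entry above:

export SPARK_HOME=/usr/local/spark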
Verify Spark Installation using the following command:
$ spark-shell
Output:
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/09/18 11:09:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/09/18 11:09:37 WARN util.Utils: Your hostname, dwnpcpu190 resolves to a loopback address: 127.0.1.1; using 192.168.2.176 instead (on interface eth0)
17/09/18 11:09:37 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/09/18 11:09:59 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/09/18 11:09:59 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
17/09/18 11:10:04 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.2.176:4040
Spark context available as 'sc' (master = local[*], app id = local-1505712279237).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
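As a quick sanity check (a minimal example with illustrative values), run a small job at the scala> prompt using the pre-created Spark context sc:

scala> val data = sc.parallelize(1 to 100)
scala> data.filter(_ % 2 == 0).count()

The count should come back as 50 (the number of even values between 1 and 100), confirming that Spark can schedule and run jobs locally. Type :quit to leave the shell.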