Getting Started
Introduction
Let's dive into Ryba and bootstrap our first Hadoop cluster. The simplest way is to download the ryba-cluster package. It comes with a ready-to-use configuration and serves as a starting example.
The package deploys a cluster composed of 6 nodes. We use Vagrant to manage our 6 virtual machines. Running 6 virtual machines on the same developer machine requires 16 GB of RAM. If you don't have enough resources on your machine, you can use physical nodes or virtual machines at your disposal. In that case, update the configuration to reflect the hostnames and IP addresses of those servers.
These instructions assume that your host computer is connected to the Internet. Instructions for working offline are available in the documentation.
Environment
Ryba should work on any operating system. The instructions given here are written for UNIX-based operating systems. We only work on Linux and OS X, but the tools we use are all available on Windows.
Install Git
You can either install it as a package or download the source and compile it yourself.
On Linux systems, install it from your package manager by typing yum install git on Red Hat based systems, or apt-get install git on Debian-based systems.
On OS X or Windows, download the Git installer available for your operating system.
Install Node.js
Ryba is written to run on the Node.js platform. Dependencies are managed with NPM, the Node.js Package Manager. The recommended way to install Node.js is to use n, a Node.js version manager. If you are not familiar with Node.js, it is easier to simply download the Node.js installer available for your operating system.
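Before going further, you may want to check that the tools installed above are reachable from your shell. The Bash function below is a minimal sketch and is not part of Ryba: check_prereqs is a hypothetical name, and it only checks that each command is on your PATH, not which version is installed.

```shell
#!/usr/bin/env bash
# Minimal sketch: report which of the given commands are missing from PATH.
# check_prereqs is a hypothetical helper, not part of Ryba or ryba-cluster.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd"
      missing=$((missing + 1))
    fi
  done
  # The exit status is 0 only when every command was found.
  [ "$missing" -eq 0 ]
}
```

For example, run check_prereqs git node npm: it prints one line per command and returns a non-zero status if any of them is missing.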
Download the ryba-cluster starting package
As mentioned above, it comes with a pre-configured cluster, and Ryba is started from this package. To get ryba-cluster, we recommend cloning the repository directly with git clone and then installing the dependencies with npm install. Ryba is a good Node.js citizen: the more familiar you are with Node.js, the faster you will understand how Ryba works internally.
You can open the package.json file to check the dependencies.
Run the following in a terminal:
git clone https://github.com/ryba-io/ryba-cluster.git
cd ryba-cluster
Get Familiar with the package
The ryba-cluster package has been prepared as a reference for running Ryba. The project layout contains the following files and folders:
- "bin" From this folder you can run Vagrant and Ryba, and manage your local YUM repositories.
- "conf" This folder stores the configuration files. They are modules that Ryba merges when it is launched.
- "node_modules" This folder is managed by NPM and used by Node.js to find its dependencies.
- "package.json" A Node.js specific file which describes your project and its dependencies.
Set up and start your cluster
This step bootstraps your cluster with Vagrant. You can read about Vagrant if you are not familiar with it: it is an easy-to-use tool for managing virtual machines. You describe the VMs' properties (memory, processors, host addresses...) in Vagrant's configuration file; Vagrant then reads it, copies the files it needs and starts your VMs.
The configuration file we provide uses Vagrant to bootstrap a cluster of 6 nodes on a private network. You'll need 16GB of memory. It also registers the server names and IP addresses inside your "/etc/hosts" file. You can skip this step if you already have physical or virtual nodes at your disposal: just modify the "conf/server.coffee" file to reflect your network topology.
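If you are unsure whether your machine has the memory required for the 6 virtual machines, the Bash sketch below prints the total physical memory in MiB. It assumes Linux, which reports MemTotal in kB in /proc/meminfo, or OS X, which reports hw.memsize in bytes through sysctl; total_mem_mib and has_enough_memory are hypothetical helper names, not part of Ryba.

```shell
#!/usr/bin/env bash
# Minimal sketch: print total physical memory in MiB.
# total_mem_mib is a hypothetical helper, not part of Ryba or ryba-cluster.
total_mem_mib() {
  if [ -r /proc/meminfo ]; then
    # Linux: MemTotal is expressed in kB.
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
  else
    # OS X: hw.memsize is expressed in bytes.
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 ))
  fi
}

# Succeed when total memory is at least the given number of MiB.
# has_enough_memory is also a hypothetical helper.
has_enough_memory() {
  [ "$(total_mem_mib)" -ge "$1" ]
}
```

On Linux, MemTotal excludes memory reserved by the kernel, so a machine with 16GB of RAM reports slightly less than 16384 MiB. Leave some margin when choosing the threshold, for example has_enough_memory 15000.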
Install Ryba
This step downloads all your dependencies using the Node.js tooling.
When you run npm install, NPM reads the names and versions of your dependencies from the package.json file. It downloads and installs them inside the node_modules directory.
npm install
Run Ryba
Wait for your cluster and its configuration to be ready. Then, installing, starting and checking your components with Ryba is as simple as executing:
bin/ryba install
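One way to know that the virtual machines are up before running bin/ryba install is to wait until they accept SSH connections. Treating an open SSH port as "ready" is an assumption on our part. The hypothetical wait_for_port Bash function below only checks that a TCP port accepts connections, using Bash's /dev/tcp redirection, and retries every 5 seconds. A connection attempt to an address that does not answer at all may block until the operating system's connect timeout.

```shell
#!/usr/bin/env bash
# Minimal sketch: wait until HOST accepts TCP connections on PORT.
# wait_for_port is a hypothetical helper; it relies on Bash's /dev/tcp redirection.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-60} i
  for ((i = 1; i <= tries; i++)); do
    # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connection;
    # the connection is closed when the subshell exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$tries" ]; then
      sleep 5
    fi
  done
  return 1
}
```

For example, wait_for_port 10.10.10.11 22 returns once master1 accepts SSH connections, or fails after 60 attempts (about 5 minutes); repeat it for each node.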
Configure your host machine
On your host, you need to declare the names and IP addresses of your cluster nodes (if using Vagrant). You'll also need to import the Kerberos client configuration file.
sudo tee -a /etc/hosts << RYBA
10.10.10.11 master1.ryba
10.10.10.12 master2.ryba
10.10.10.13 master3.ryba
10.10.10.14 front1.ryba
10.10.10.16 worker1.ryba
10.10.10.17 worker2.ryba
10.10.10.18 worker3.ryba
RYBA
# Enter "vagrant" when prompted for the password
# Be careful, this will overwrite your local krb5.conf file
# sudo is needed because /etc/krb5.conf is usually owned by root
sudo scp vagrant@master1.ryba:/etc/krb5.conf /etc/krb5.conf
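Appending with tee -a duplicates the host entries every time the command is run again. The Bash sketch below, a hypothetical missing_ryba_hosts function that is not part of ryba-cluster, prints only the entries whose hostname is not already declared on a non-comment line of the given file, so its output can be appended safely.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the Ryba host entries that are not yet declared in a
# hosts file, so the /etc/hosts step can be re-run without duplicating lines.
# missing_ryba_hosts is a hypothetical helper, not part of ryba-cluster.
missing_ryba_hosts() {
  local file=${1:-/etc/hosts} ip name
  while read -r ip name; do
    # Look for the hostname as an exact field on a non-comment line.
    if ! awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
      echo "$ip $name"
    fi
  done <<'EOF'
10.10.10.11 master1.ryba
10.10.10.12 master2.ryba
10.10.10.13 master3.ryba
10.10.10.14 front1.ryba
10.10.10.16 worker1.ryba
10.10.10.17 worker2.ryba
10.10.10.18 worker3.ryba
EOF
}
```

For example: missing_ryba_hosts /etc/hosts | sudo tee -a /etc/hosts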
Access the Hadoop Cluster web interfaces
You can read about Kerberos if you are not familiar with it. Your host machine is now configured with Kerberos. From the command line, you should be able to get a new ticket:
echo hdfs123 | kinit hdfs@HADOOP.RYBA
klist
Most of the web applications started by Hadoop use SPNEGO to provide Kerberos authentication. SPNEGO isn't limited to Kerberos and is already supported by your favorite web browser. However, most browsers (with the exception of Safari) need some specific configuration. Refer to your browser's documentation to configure it, or use curl:
curl -k --negotiate -u: https://master1.ryba:50470
You should now be familiar with Ryba. Join us and participate in this project on GitHub.