Sunday, 10 January 2016

Hadoop & SSH

There are many excellent resources online that explain how to set up a single-node, pseudo-distributed Hadoop installation, like the one here. However, many of these instructions target older versions of Hadoop, and there have been some minor changes since then. In this post I explain how I installed Hadoop 2.6.2 on Ubuntu 15.10, along with some cool stuff about SSH.

Install Java
Hadoop 2.6.x requires a working Java installation (Java 6 or later). Get it here.

Add a new user
We will use a dedicated Hadoop user account for running Hadoop. While that’s not required, it is recommended because it keeps your original user account clean and keeps the Hadoop user account secure.

Create the user "hduser" in a new group "hadoop" on your local machine. If you would like to give hduser super-user privileges, add hduser to the sudoers list, i.e. the sudo user group.
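
The user setup described above could look like the sketch below on Ubuntu. The `run` helper and the `setup_hduser` function are my own illustrative names; `run` only prints each command unless you set `APPLY=1`, so nothing is changed on your system by accident.

```shell
# Prints each command unless APPLY=1 is set, so the steps can be reviewed first.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "would run: $*"; fi
}

setup_hduser() {
  run sudo addgroup hadoop                    # create the group
  run sudo adduser --ingroup hadoop hduser    # create the user inside that group
  run sudo adduser hduser sudo                # optional: super-user privileges
}

setup_hduser
```

Running the script with `APPLY=1` in the environment executes the three commands for real.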

Stuff about SSH

To communicate securely we need to create an asymmetric key pair consisting of a private key and a public key. The private key stays on the computer you log in from, so the key pair is created on that machine; this avoids having to move the private key to another location later over (possibly) insecure channels.

A pass-phrase can be added while creating the key pair. Its job is to protect the private key: the key is stored encrypted on disk and has to be unlocked with the pass-phrase before it can be used. An SSH key pass-phrase is thus a secondary form of security that buys you a little time to change the keys if your original keys are stolen. It can be left blank if there is a high number of transactions between the hosts, since otherwise the user has to enter the pass-phrase to unlock the private key for each transaction. This applies to hduser, as you don’t want to enter the pass-phrase every time Hadoop interacts with its nodes.

The public key is added to the .ssh/authorized_keys file on all the computers you want to log in to.
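
As a minimal sketch of the key handling described above, the snippet below creates a pass-phrase-less RSA key pair and authorises its public half. It works in a scratch directory (the `KEY_DIR` variable is my own name) so that it never touches a real ~/.ssh; for the actual setup you would run the same commands as hduser with `$HOME/.ssh` in place of `$KEY_DIR`.

```shell
# Scratch directory so the sketch never overwrites real keys.
KEY_DIR="$(mktemp -d)"

if command -v ssh-keygen >/dev/null 2>&1; then
  # -P "" sets an empty pass-phrase; -f names the private key file.
  ssh-keygen -q -t rsa -P "" -f "$KEY_DIR/id_rsa"
  # Authorising a key means appending its public half to authorized_keys.
  cat "$KEY_DIR/id_rsa.pub" >> "$KEY_DIR/authorized_keys"
  chmod 600 "$KEY_DIR/authorized_keys"
else
  echo "ssh-keygen not found; install the OpenSSH client first" >&2
fi
```

With the key in place under the real ~/.ssh, `ssh localhost` should log hduser in without asking for a password.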

It is a good idea to disable the PasswordAuthentication option when configuring sshd if you don't need it. Many people running SSH servers use weak passwords, and online attackers routinely look for SSH servers and then start guessing passwords. If PasswordAuthentication is turned on, anyone can try to brute-force the password and gain access to the system running the SSH server. Disabling it ensures that only systems holding an approved key can log in. After disabling PasswordAuthentication, you will need to add any newly created public key to the .ssh/authorized_keys file of the remote host manually. Needless to say, you most probably won't be able to copy the public key with ssh-copy-id, as it relies on password authentication being enabled.
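
One way to flip the option is a small sed edit. The sketch below runs against a scratch copy with made-up contents (the `CONF` variable and the three sample lines are assumptions for illustration); the real file is usually /etc/ssh/sshd_config and needs root to edit.

```shell
# Scratch copy standing in for /etc/ssh/sshd_config.
CONF="$(mktemp)"
printf '%s\n' 'Port 22' '#PasswordAuthentication yes' 'PubkeyAuthentication yes' > "$CONF"

# Replace the (possibly commented-out) directive with an explicit "no".
sed -E 's/^#?[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication no/' \
  "$CONF" > "$CONF.new"
mv "$CONF.new" "$CONF"

# Show the resulting directive.
grep '^PasswordAuthentication' "$CONF"
```

Make sure key-based login already works before applying this to a remote machine, otherwise you can lock yourself out.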

After configuring sshd you will have to restart it for the change to take effect.

If, during the creation of the keys, you named your key something other than id_rsa (or another standard name), say "OfficeKey", then you will have to pass it to ssh with the -i option. This prevents errors like "Permission denied (publickey)." from popping up.
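
On Ubuntu the restart and the -i option could look roughly like this. The service name `ssh`, the key path and the target `hduser@localhost` are assumptions for illustration, and the `run` helper (my own name) only prints the commands unless `APPLY=1` is set.

```shell
# Prints each command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "would run: $*"; fi
}

KEY_FILE="$HOME/.ssh/OfficeKey"   # hypothetical non-standard key name
REMOTE="hduser@localhost"         # hypothetical login target

run sudo service ssh restart        # reload sshd with the new configuration
run ssh -i "$KEY_FILE" "$REMOTE"    # tell ssh which private key to offer
```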

Installation

Download Hadoop and then extract it to /usr/local/.
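
A rough sketch of the extraction step, assuming the release tarball has already been downloaded into the current directory. Renaming the directory to /usr/local/hadoop and handing it over to hduser are my assumptions, not requirements.

```shell
HADOOP_VERSION=2.6.2
TARBALL="hadoop-$HADOOP_VERSION.tar.gz"
INSTALL_DIR=/usr/local/hadoop      # assumed final location

if [ -f "$TARBALL" ]; then
  sudo tar -xzf "$TARBALL" -C /usr/local                        # unpack under /usr/local
  sudo mv "/usr/local/hadoop-$HADOOP_VERSION" "$INSTALL_DIR"    # drop the version suffix
  sudo chown -R hduser:hadoop "$INSTALL_DIR"                    # let hduser own the tree
else
  echo "Fetch $TARBALL from an Apache mirror into this directory first." >&2
fi
```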

Configuration

Step 1: Set the Hadoop environment variables in the ~/.bashrc file of user hduser (if you are using bash, that is).
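
The exact lines depend on where things live on your machine. As a sketch, assuming Hadoop sits in /usr/local/hadoop and Java in the Oracle Java 8 directory used later in this post, the additions to ~/.bashrc could be:

```shell
# Location of the Hadoop installation (assumed path).
export HADOOP_HOME=/usr/local/hadoop
# Location of the JDK (the path used on my system).
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
# Make the hadoop/hdfs commands and the start/stop scripts available.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

Open a new shell or run `source ~/.bashrc` for the variables to take effect.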

Step 2:
There is no need to disable IPv6 manually: Hadoop 2.6.2 already does so in the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh.

In $HADOOP_HOME/etc/hadoop/hadoop-env.sh change the value of JAVA_HOME to the directory where your Java has been installed. For my system I had to change it from:
    export JAVA_HOME=${JAVA_HOME}
to:
    export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Step 3:
Hadoop’s default configurations use hadoop.tmp.dir as the base temporary directory both for the local file system and HDFS. So we need to assign it a directory with the correct permissions.
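
A minimal sketch of preparing such a directory. The location under the home directory of hduser is an assumption (any path that hduser owns will do), and mode 750 keeps other users out.

```shell
# Base temporary directory for Hadoop (assumed location, override if you like).
HADOOP_TMP_DIR="${HADOOP_TMP_DIR:-$HOME/hadoop_tmp}"

mkdir -p "$HADOOP_TMP_DIR"     # create it, including missing parents
chmod 750 "$HADOOP_TMP_DIR"    # owner: full access, group: read/enter, others: none
ls -ld "$HADOOP_TMP_DIR"       # check owner and permissions
```

If you prefer a system-wide path instead, create it with sudo and chown it to hduser:hadoop.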


Update $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following lines within the configuration tags.
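
A minimal core-site.xml for a single-node setup could look like the sketch below. The temporary directory path and the port 9000 are assumptions; the file is written to a scratch directory (`CONF_DIR` is my own name) so that an existing configuration is not overwritten, and the two property elements are what you would copy into the real file.

```shell
CONF_DIR="$(mktemp -d)"   # scratch stand-in for $HADOOP_HOME/etc/hadoop

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/hadoop_tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

cat "$CONF_DIR/core-site.xml"
```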


First copy mapred-site.xml.template into mapred-site.xml and then update $HADOOP_HOME/etc/hadoop/mapred-site.xml.
Add the following lines within the configuration tags.
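
For a YARN-based setup the property that matters here is mapreduce.framework.name. The sketch below copies a template and injects that property just before the closing tag; it uses a scratch directory and a tiny stand-in template (both assumptions for illustration) rather than the real files.

```shell
CONF_DIR="$(mktemp -d)"   # scratch stand-in for $HADOOP_HOME/etc/hadoop

# Minimal stand-in for the template that ships with Hadoop.
printf '%s\n' '<?xml version="1.0"?>' '<configuration>' '</configuration>' \
  > "$CONF_DIR/mapred-site.xml.template"

# Copy the template while inserting one property before </configuration>.
awk '/<\/configuration>/ {
       print "  <property>"
       print "    <name>mapreduce.framework.name</name>"
       print "    <value>yarn</value>"
       print "  </property>"
     }
     { print }' "$CONF_DIR/mapred-site.xml.template" > "$CONF_DIR/mapred-site.xml"

cat "$CONF_DIR/mapred-site.xml"
```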


Add these within the configuration tags in yarn-site.xml:
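
For MapReduce jobs to run on YARN, the node manager needs the shuffle auxiliary service. A minimal sketch, again written to a scratch directory (an assumption for illustration) instead of the real configuration directory:

```shell
CONF_DIR="$(mktemp -d)"   # scratch stand-in for $HADOOP_HOME/etc/hadoop

cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

cat "$CONF_DIR/yarn-site.xml"
```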


Update $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following lines within the configuration tags.
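
On a single node there is only one DataNode, so a replication factor of 1 is the usual choice. A minimal sketch, assuming nothing else needs to change in hdfs-site.xml and writing to a scratch directory rather than the real one:

```shell
CONF_DIR="$(mktemp -d)"   # scratch stand-in for $HADOOP_HOME/etc/hadoop

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

cat "$CONF_DIR/hdfs-site.xml"
```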

Formatting HDFS filesystem
(via namenode)

This has to be done only the first time when you set up a new Hadoop cluster.

Start daemons

To start the HDFS, YARN, and MapReduce daemons, run their start scripts from $HADOOP_HOME/sbin.

A tool named jps lists the running Java processes, which lets you check that the Hadoop daemons are up.

Stop daemons

To stop the MapReduce, YARN, and HDFS daemons, run the matching stop scripts in that order.
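
Putting the format, start and stop steps together, the sequence could be sketched as below. The function names and the `run` helper are my own; `run` only prints the commands unless `APPLY=1` is set, and the scripts are assumed to be on the PATH via $HADOOP_HOME/bin and $HADOOP_HOME/sbin.

```shell
# Prints each command unless APPLY=1 is set, so the sequence can be reviewed first.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "would run: $*"; fi
}

# Only for a brand-new cluster: formatting wipes existing HDFS metadata.
first_time_format() {
  run hdfs namenode -format
}

start_daemons() {
  run start-dfs.sh                                   # NameNode, DataNode, SecondaryNameNode
  run start-yarn.sh                                  # ResourceManager, NodeManager
  run mr-jobhistory-daemon.sh start historyserver    # MapReduce job history server
  run jps                                            # list the running Java processes
}

stop_daemons() {
  run mr-jobhistory-daemon.sh stop historyserver
  run stop-yarn.sh
  run stop-dfs.sh
}

start_daemons
stop_daemons
```

Call `first_time_format` once before the very first `start_daemons`, and never again on a cluster whose data you want to keep.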
 

Tuesday, 25 August 2015

GSoC 2015 Wrap Up Report

So the Google Summer of Code '15 is coming to an end. It has been a very memorable summer, and this certainly qualifies as the biggest project I have ever undertaken.

My work over the summer was to port as much of the Amarok code-base to Qt5/KF5 as possible, since porting the entire code-base within the GSoC time-frame was not feasible. I have ported a considerable portion of it, and I will now continue the project along with the community to see it through to the end :)

I started my work by porting the CMake files and, as soon as that was complete, I moved on to porting the C++ part. The current aim is to make Amarok compile while it still depends on KF5::KDELibs4Support. I faced many problems along the way, mainly because there are still parts of KF5 that need better documentation. I have described these problems in my previous posts.

Right now I am focusing on adding more information to the Amarok porting wiki page. Myriam helped me create a table for each target that is built during compilation. These tables contain all the relevant information about the files that are built for that target. The most important information is about the TODOs and FIXMEs in a file. I have had to disable parts of the code-base, and I will be documenting each of these on the porting wiki page so that they are not forgotten.

Also, I had been pushing all my commits to a repository containing a clone of Amarok. I won't be pushing future commits there; instead, I have now pushed all the commits to the kf5 branch of the Amarok code-base, which can be accessed here.

I would like to thank Mark Kretschmann, Myriam Schweingruber, the Amarok community and the KDE community in general for giving me this awesome opportunity and for helping me with the project along the way.

Cheers.

Thursday, 6 August 2015

GSoC 2015 Week #7-10 with Amarok

I haven't posted here for quite some time and a lot has happened over the last few weeks. Blame my habit of procrastination for not posting more frequently ;)

  • Ported from KAction to QAction, KMenu to QMenu with the help of the porting scripts.
  • Added KF5::GlobalAccel, KF5::KIO components, Qt5::Sql, Qt5::Quick, Qt5::ScriptTools, KF5::PlasmaQuick, KF5::NotifyConfig and KF5::Archive components.
  • KGlobal::mainComponent().aboutData() is replaced with KAboutData::applicationData() which contains information such as authors, license, etc.
  • KGlobalAccel::setGlobalShortcut is used instead of KAction::setGlobalShortcut to set global shortcuts.
  • QApplication::type() no longer exists in Qt5, so the qApp macro is used with a cast instead in src/PluginManager.cpp and other files.
  • In TrayIcon.cpp a QMap now maps each QString name to its corresponding QAction, a job that KActionCollection used to do, so the calls to actionCollection()->action() are replaced by calls to a function defined in the class itself.
  • KGlobalSettings::CompletionPopup is replaced with KCompletion::CompletionPopup.

After all this, I was getting errors during the linking of amaroklib, so I decided that it was high time to port the code in src/context. While porting it, however, I realized that it is going to take a LOT of time to complete. We may have to move to QML, as Plasma 5 only supports QML for widgets, so Mark suggested leaving it aside for the moment. To get around it, I tried to disable the compilation of src/context and of the files that depend on it. I should have guessed that this wouldn't work (the effect was like falling dominoes: for each file that was disabled, I had to disable one or more files that depended on it), and in the end I had to comment out the code that was causing the linking errors. I have marked each commented-out piece of code (with "FIXME: disabled temporarily for KF5 porting") so that it can be re-enabled later and we won't have to deal with obscure bugs due to its absence.

Apart from this, I had to remove the second argument of the KPluginInfo constructor: KPluginInfo(const QString &filename, const char *resource = 0) has changed to KPluginInfo(const QString &filename). I am still unsure whether this change is correct.

Sadly, over the last week I messed up my system and had to repeat quite a bit of the work, but that didn't take much of my time. The port seems to be proceeding nicely, and I am now reading about QNetworkAccessManager to replace QHttp, which is no longer present in the Qt5 API.

Cheers !!!

Wednesday, 8 July 2015

GSoC 2015 Week #5, 6 with Amarok

The midterm evaluations are over and here is another report on the progress of the Amarok port so far.

Here are some of the major changes to the code-base that I have made in the last two weeks:

    * I removed the setCodecForCStrings call that set UTF-8 encoding for QString. In Qt5, QString converts C strings with fromUtf8 instead of fromAscii, so, as Mark pointed out, we no longer need to call setCodecForCStrings.
    * I have removed the calls to KDialog::makeStandardCaption, as the application name is automatically added to the title bar on desktop platforms.
    * I have added a ViewPrivate class to the ContextView class, along with a few functions and signals, to restore the functionality it had when it derived from Plasma::View. Just as a reminder, I had changed the inheritance structure to derive it from QGraphicsView in a previous commit.
    * CodeCompletionModelControllerInterface3 is now contained in CodeCompletionModelControllerInterface and so the inheritance structure of the AmarokScriptCodeCompletionModel class has been changed to reflect this. More details can be seen here.
    * KIO::upUrl is now used instead of KUrl::upUrl to move up a directory.
    * Also KTextEditor::SmartInterface is no longer present in KF5 and KTextEditor::Editor::instance() is used instead of KTextEditor::EditorChooser::editor().
    * I have ported the code that depends on ThreadWeaver: the classes that inherit from ThreadWeaver::Job now also inherit from QObject and have gained defaultBegin and defaultEnd functions. I repeated the other changes made in the previous week's commit (ID: 914b8cc) for the rest of the code-base.
    * The calls to Weaver::instance() have been changed to Queue::instance() to access the application's global queue.
    * The pure virtual function ThreadWeaver::Job::run() prototype has changed and hence new formal parameters have been added in the derived classes' re-implementations.
 
Apart from all this, I was wondering why Amarok is named so. What's the relation between a media player and a mythical beast/wolf god from Inuit mythology? I am sure this question must have crossed the minds of many others too.
 
Cheers !!!

Tuesday, 23 June 2015

GSoC 2015 Week #4 with Amarok

So the fourth week is over and the mid term evaluations are upon us and I have to say, I didn't even realize how quickly the last four weeks have gone by :)

I had to take the third week off but it didn't affect my project much as I had started working before 25 May.
So some of the major changes to the code-base that I have made over the last few days are as follows:
  • I added the KF5::KCMUtils and KF5::NewStuff components.
  • Most of the KDialog code has been moved to QDialog, which included changes like replacing setButtons with QDialogButtonBox. Although KDialog is only deprecated and this conversion seems unnecessary at this stage of porting, I encountered dependencies on classes in KF5 that now inherit from QDialog instead of KDialog. Because of this, some KDialog code in Amarok had to be ported now rather than later.
  • I made a few classes in Amarok inherit from KPageDialog instead of KDialog so that we can use the buttonBox() function of the former, which is pretty useful.
  • I removed the definition of the slotButtonClicked() slot in deviceconfiguredialog.cpp, as no signals were connected to it. The rest of the slotButtonClicked() calls were replaced with QDialog::accept() or QDialog::reject().
  • I discovered a function QDir::toNativeSeparators which is really useful. On moving from KUrl to QUrl, I had to replace the calls to KUrl::addPath(const QString& txt) with u.setPath(u.path() + '/' + txt) [Copied straight from the docs ;)]. Now you can see that we have a '/' in setPath which may or may not be the native separator of the platform. So here the static function QDir::toNativeSeparators comes into use and it will make the separators appropriate (if needed) for the underlying operating system.

Cheers !!!

Monday, 8 June 2015

GSoC 2015 Week #2 with Amarok

The second week of GSoC coding period has come to an end.

In case you haven't read my previous blog posts then I should start by saying that I am working on porting Amarok to Qt5/KF5 as part of the GSoC 2015 program under Mark Kretschmann (markey) and Myriam Schweingruber (mamarok).

So this week some of the changes that I made are as follows:
  • Changed KIcon to QIcon. If anyone is using the porting script in kde-dev-scripts for this, then be careful that the script doesn't affect the return types of the functions which previously returned an instance of KIcon. This means that if the return type of a function is KIcon then you have to change it to QIcon by yourself. Not a major inconvenience, if you ask me.
  • Added include directory of phonon (I added PHONON_INCLUDES instead of PHONON_INCLUDE_DIR) to the cmake path in which header files are searched. Adding this wasn't important before but now with Phonon4Qt5 it is important. More information can be found here.
  • I will be moving Amarok to KF5::Plasma after it has been ported to use the other KF5 components, because I think that porting the rest of the code demands more attention.
  • I have marked some code with "#TODO KF5" so that the work there won't be forgotten, and especially to mark temporary solutions.

Although KDialog is available in KDELibs4Support, the classes that previously inherited from it have changed their base class. Since using KDialog functions through these derived classes now causes errors, I am currently porting from KDialog to QDialog. I have to review the (many) changes made by the porting script, for which I first have to understand the API. So I believe this is going to take some time.

I will be pushing the commits that concern the above changes soon.

Cheers!!

P.S Happy Birthday Mamarok!

Monday, 1 June 2015

GSoC 2015 Week #1 with Amarok

Well it has been a week since the GSoC coding period started.
I have pushed commits to the personal repository here. A KDE community page was created by Myriam Schweingruber which can be found here and it consists of the details about the porting of Amarok. I will also be making changes to that page over time.

The major changes that I introduced in the Amarok code-base till now are as follows:

  • I have changed the CMake files and added ECM (with a minimum version of 1.7.0) to them. The KF5 libraries have been remodelled into a set of independent modules, enabling developers to use only the specific parts they need and avoid pulling in unwanted dependencies. For this reason, I have for now added only a few key components of KF5 and Qt5 for CMake to find. I will be adding more components as and when required. I have added the list of the new dependencies here.
  • A major development included a change in MySQLAmarok.cmake module. As feature_summary has been used with FATAL_ON_MISSING_REQUIRED_PACKAGES, MySQLAmarok was turning out to be missing as MYSQLAMAROK_FOUND wasn't yet defined. So I have defined MYSQLAMAROK_FOUND as TRUE when both mysql and its embedded libraries have been found. More details on this have been given in the commit and in the cmake module itself.
  • The porting scripts in kde-dev-scripts have been a lot of help to me (kudos to Laurent Montel, Kevin Funk, David Faure and all other contributors). I have used them, and they really eased the conversion of KUrl to QUrl and many other changes (like the next one).
  • The icons have been renamed according to the new pattern and the ecm_install_icons calls have been changed accordingly; the old names and calls created a lot of warnings.
  • The CMake version was increased to 2.8.12, as it's the minimum required to use FindKF5.cmake.
  • After porting to CMAKE_AUTOMOC, the moc include files were no longer needed (except in a few places), and they have been removed as they too created a lot of warnings.
  • QT_NO_URL_CAST_FROM_STRING has been added to make conversions from QString to QUrl explicit.
I will be adding the definitions to disable deprecation warnings in upcoming commits.

One of the major problems that I faced this week was in porting to KF5::ThreadWeaver. The documentation of the new API was surprisingly lacking in very important details. I had trouble using the QObjectDecorator class, into which some signals have been moved. I have thought of two solutions for this, both of which let the files compile and which I am planning to test in the future. For this I spent some time learning the fine details of how the signals and slots system works.

It has been a very nice first week, and I have learnt the details of quite a few things, like the aforementioned signals and slots system. I also spent some time learning about CMake's AUTOMOC system.

Cheers !!!