Confused Coders is a place where we share lessons and thoughts with you. Feel free to fire your doubts straight at us and we will do our best to come back to you with clarifications. We also have a few PDFs that might be helpful for your interview preparation.

     Book shelf: Feel free to download and share. Cheers \m/

Have fun!

SQL on Cassandra: Querying Cassandra via Apache Drill

Cassandra and Drill are buddies now. In this crisp post I will be talking about Drill's Cassandra storage plugin, which enables us to query Cassandra via Apache Drill. That also means we can issue ANSI SQL queries against Cassandra, which Cassandra does not support natively.

All the code: https://github.com/yssharma/drill/tree/cassandra-storage
Review Board: https://reviews.apache.org/r/29816/

There are a couple of steps needed to set up the Cassandra storage plugin before we can start playing with Cassandra and Drill.

1. Get Drill: let's get the Drill source

    $> git clone https://github.com/apache/drill.git

2. Get the Cassandra storage patch: download the patch file from https://reviews.apache.org/r/29816/diff/raw/

3. Apply the patch on top of Drill

    $> […]
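Once the plugin is built and registered, Cassandra tables can be queried with plain ANSI SQL from Drill. A minimal sketch, assuming the storage plugin is registered under the name cassandra and that a keyspace logs with a table events already exists (all three names are hypothetical):

    -- hypothetical plugin name, keyspace and table
    SELECT event_id, user_name
    FROM cassandra.logs.events
    WHERE user_name = 'yash'
    LIMIT 10;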

Installing Solr on Ubuntu

Here is a quick and dirty post on installing Solr on your box. Hope it's helpful.

Download Solr
Get a fresh Solr copy. I got mine from https://lucene.apache.org/solr/downloads.html. Download the version you are interested in, preferably the latest.
– extract solr
– copy the /examples contents to /opt/solr
– check for another solr dir inside the examples dir and rename it to solr_home (not required, but it avoids confusion with the parent dir)
– add an alias (for ease):

    SOLR_HOME=/opt/solr/solr_home
    alias startsolr="cd /opt/solr; java -Dsolr.solr.home=$SOLR_HOME -jar start.jar"

Start Solr
Use the commands:

    $> cd /opt/solr
    $> java -Dsolr.solr.home=$SOLR_HOME -jar start.jar

or the alias directly:

    $> startsolr

Solr Admin
Check the Solr Admin web interface at localhost:8983/solr […]
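Once it is up, a quick smoke test from another terminal; this assumes the stock example core collection1 that ships with these older Solr releases:

    $> curl "http://localhost:8983/solr/collection1/select?q=*:*&rows=1&wt=json"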

How to run Pig Latin scripts on Apache Drill

This is initial work on supporting Pig scripts on Drill. It extends the PigServer to parse the Pig Latin script and get a Pig logical plan corresponding to the script, then converts the Pig logical plan to a Drill logical plan. The code is not complete and supports a limited number of Pig operators like LOAD, STORE, FILTER, UNION, JOIN, DISTINCT, LIMIT etc. It serves as a starting point for the concept.

Architecture diagram:
Code: https://github.com/yssharma/pig-on-drill
Review Board: https://reviews.apache.org/r/26769/
Operators supported: LOAD, STORE, FILTER, UNION, JOIN, DISTINCT, LIMIT.
Future work: FOREACH and GROUP are not supported yet.
Test cases: org.apache.drill.exec.pigparser.TestPigLatinOperators.

Pig scripts can be tested on Drill's web interface as well (localhost:8047/query). […]
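A tiny Pig Latin script that stays within the supported operators, to show the kind of thing that should already translate; the input path and schema are made up for illustration:

    -- hypothetical input file and schema
    users  = LOAD '/data/users.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int);
    adults = FILTER users BY age >= 18;
    top10  = LIMIT adults 10;
    STORE top10 INTO '/data/out/adults';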

Mahout usage: IncompatibleClassChangeError exception

The error pops up while using Mahout collaborative filtering on Hadoop 2.

    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
        at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
        at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:73)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:168)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:335)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Fix: build Mahout against the Hadoop 2 version explicitly:

    $> mvn clean install -DskipTests=true -Dhadoop2.version=2.0.0
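The error typically appears as soon as the job launches. After rebuilding against Hadoop 2, a recommender invocation along these lines should get past it (the jar version and the input/output paths are placeholders):

    $> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
         --input /user/me/prefs.csv --output /user/me/recommendations \
         --similarityClassname SIMILARITY_COOCCURRENCE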

Hive hangs unexpectedly and ends up with: Error in acquireLock…

Hive hangs and fails with:

    FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired.

Instant patchy workaround, disable concurrency support:

    SET hive.support.concurrency=false;

Unlock the table:

    unlock table my_table;

Some other tricks that work, as suggested on the Cloudera forums: Hue sometimes leaves locks on tables.

    HUE_SER=`ls -alrt /var/run/cloudera-scm-agent/process | grep HUE | tail -1 | awk '{print $9}'`
    HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/${HUE_SER}
    ls $HUE_CONF_DIR
    /opt/cloudera/parcels/CDH/share/hue/build/env/bin/hue close_queries > tmp.log 2>> tmp.log

The search is still on!!
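Before forcing anything, it can help to see what is actually holding the table; with the lock manager enabled, Hive can list the current locks (the table name is a placeholder):

    SHOW LOCKS my_table;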

How to convert a MongoDB JSON object to a CSV file

Quickly scribbled a function to get a plain CSV out of a MongoDB JSON object. Use the script as you would call any shell script:

    sh mongo_command.sh > social_data_tmp.csv

mongo_command.sh wraps all the required mongo code in a heredoc, something like:

    mongo << EOF
    function printUserDetails(user){
        if (user == undefined){
            return;
        }
        print(user._id + ',' +
              user.email + ',' +
              user.birthday + ',' +
              ((user.homeTown == undefined) ? '' : user.homeTown._id) + ',' +
              cleanString((user.homeTown == undefined) ? '' : user.homeTown.name) + ',' +
              ((user.location == undefined) ? '' : user.location._id) + ',' +
              cleanString((user.location == undefined) ? '' : user.location.name) + ',' +
              getNames(user.likes));
    }
    db.facebookUserData.find().forEach(function(user){
        printUserDetails(user);
    });
    EOF
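The script calls two helpers, cleanString and getNames, that are not shown in the excerpt. Here is a guess at what they would do: strip CSV-breaking characters, and flatten a likes list into names. Both bodies, including the assumption that likes has a data array, are hypothetical:

    // hypothetical helpers, not from the original post
    function cleanString(s){
        if (s == undefined) return '';
        return String(s).replace(/[,"\n]/g, ' ');  // strip chars that would break the CSV
    }
    function getNames(likes){
        if (likes == undefined || likes.data == undefined) return '';
        return likes.data.map(function(l){ return cleanString(l.name); }).join(';');
    }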

Apache Drill – REST Support

This came as a pleasant surprise to me today: Apache Drill now also has an embedded Jetty/Jersey-based REST service interface exposed for tracking the status of the Drillbit, along with the status of submitted queries. The interface can be checked out here once the Drillbit is running: http://localhost:8047/status
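It can also be hit from the command line instead of the browser, assuming a local Drillbit on the default port:

    $> curl http://localhost:8047/status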

Contributing to Apache Drill – Part 2 : Freemarker Code gen implementation

Implement Drill trigonometric functions using Freemarker code generation. This post is a follow-up to the last post, Contributing to Apache Drill – Math Functions. Many Drill function implementations are moving towards Freemarker code generation, since there is a need to generate a lot of function definitions with different argument datatypes etc. Freemarker allows us to define templates which are used to bake Java code ready to be used in Apache Drill. So here are the steps to start contributing to Drill:

Freemarker textual data description (TDD): Some data is prepared for the trigonometric functions, which the Freemarker templates can use to generate Java code. Here […]
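To give a flavour of the code-gen pattern, here is a made-up miniature: a small data file listing the functions, and a template fragment that expands into one Java method per entry. The names and file layout are illustrative only, not Drill's actual TDD/template files:

    /* hypothetical TrigFuncs.tdd: the list the template iterates over */
    { "trigFuncs": [ {"name": "sin"}, {"name": "cos"}, {"name": "tan"} ] }

and the template fragment that consumes it:

    <#list trigFuncs as func>
    // generated: ${func.name} over a double input
    public static double ${func.name}(double in) {
      return java.lang.Math.${func.name}(in);
    }
    </#list>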

Cross-platform encryption/decryption using Java/C#

Cross Platform Encryption Decryption
Encryption and decryption are very important modules for any enterprise application. Whether it is a file on our system or data travelling over the wire, everything is encrypted. Encryption is needed to ensure that only the intended user gains access to the information and all other malicious users/programs are blocked. In this post we will be talking about an encryption/decryption mechanism in a cross-platform scenario. We consider Java and C# as our programming languages to discuss cross-platform compatibility, and we discuss the Rijndael and AES implementations supported by C# and Java respectively.

Algorithm
Rijndael is a class of cipher developed […]
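For the Java side, here is a minimal sketch of AES encryption with the standard javax.crypto API. The hard-coded key and IV are for illustration only (in practice, generate and exchange them securely); a C# counterpart using RijndaelManaged/Aes with the same key, IV, CBC mode and PKCS7 padding can decrypt the output:

    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.util.Base64;

    public class AesDemo {
        public static void main(String[] args) throws Exception {
            // 16-byte demo key and IV (AES-128); never hard-code these in real code
            byte[] key = "0123456789abcdef".getBytes("UTF-8");
            byte[] iv  = "fedcba9876543210".getBytes("UTF-8");

            // CBC + PKCS5 padding lines up with C#'s default Rijndael/Aes settings
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

            byte[] encrypted = cipher.doFinal("hello cross platform".getBytes("UTF-8"));
            System.out.println(Base64.getEncoder().encodeToString(encrypted));
        }
    }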

Use Hive SerDe for fixed-length (index-based) strings

The Hive fixed-length SerDe can be used in scenarios where we do not have any delimiters in our data file. Using RegexSerDe for fixed-length strings is pretty straightforward:

    CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})(.{10})(.{2})")
    LOCATION 'path/to/data';

The above query expects exactly 32 characters in a line of text (10+10+10+2). The query can be customized to ignore any characters at the end after the useful data is read:

    CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})(.{10})(.{2}).*")
    LOCATION 'path/to/data';

That's all. Have […]
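To make the column boundaries concrete, here is a made-up 32-character input line and the columns the regex groups produce:

    u000000001fb00000042tw00000007ok

    userid     = 'u000000001'   (chars 1-10)
    fb_id      = 'fb00000042'   (chars 11-20)
    twitter_id = 'tw00000007'   (chars 21-30)
    status     = 'ok'           (chars 31-32)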