Confused Coders is a place where we share our learnings with you. Feel free to fire your doubts straight at us and we will try our best to come back to you with clarifications. We also have a few PDFs which might be helpful for your interview preparation.

     Book shelf: Feel free to download and share. Cheers \m/


Have fun!

Installing Solr on Ubuntu

Here is a quick and dirty post on installing Solr on your box. Hope it's helpful.

Download Solr
Get a fresh copy of Solr. I got my copy from … Download the version you are interested in, preferably the latest one.
 – Extract Solr.
 – Copy the contents of /examples to /opt/solr.
 – Check for another solr dir inside the examples dir and rename it to solr_home (not required, but it avoids confusion with the parent dir).
 – Add an alias (for ease):
     SOLR_HOME=/opt/solr/solr_home
     alias startsolr="cd /opt/solr; java -Dsolr.solr.home=$SOLR_HOME -jar start.jar"

Start Solr
Use the commands:
     $> cd /opt/solr
     $> java -Dsolr.solr.home=$SOLR_HOME -jar start.jar
or use the alias directly:
     $> startsolr

Solr Admin
Check the Solr Admin web interface at localhost:8983/solr […]
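If you want to script the start-up steps above, here is a minimal Python sketch. It assumes the layout described in the post (/opt/solr, with solr_home inside it) and that java is on PATH. The helper names build_start_command, solr_is_up and start_solr are hypothetical and are not part of Solr. The sketch builds the same command as the startsolr alias, starts it from /opt/solr, and polls the admin URL until Solr answers.

```python
import subprocess
import time
import urllib.error
import urllib.request

SOLR_DIR = "/opt/solr"               # install dir used in the post
SOLR_HOME = "/opt/solr/solr_home"    # the renamed inner solr dir from the post


def build_start_command(solr_home):
    """Return the argv equivalent to the startsolr alias."""
    return ["java", "-Dsolr.solr.home=" + solr_home, "-jar", "start.jar"]


def solr_is_up(url="http://localhost:8983/solr", timeout=1.0):
    """Return True if the Solr admin URL answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or an HTTP error status.
        return False


def start_solr(wait_seconds=30):
    """Launch Solr from SOLR_DIR (like 'cd /opt/solr') and wait for the admin UI."""
    proc = subprocess.Popen(build_start_command(SOLR_HOME), cwd=SOLR_DIR)
    deadline = time.time() + wait_seconds
    while time.time() < deadline:
        if solr_is_up():
            return proc
        time.sleep(1)
    proc.terminate()
    raise RuntimeError("Solr did not come up within %d seconds" % wait_seconds)
```

Calling start_solr() from your own script should behave much like typing startsolr and then opening localhost:8983/solr in a browser.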

How to run Pig Latin scripts on Apache Drill

This is initial work on supporting Pig scripts on Drill. It extends PigServer to parse a Pig Latin script and obtain the corresponding Pig logical plan, which it then converts into a Drill logical plan. The code is not complete and supports only a limited number of Pig operators, such as LOAD, STORE, FILTER, UNION, JOIN, DISTINCT and LIMIT. It serves as a proof of concept.
Architecture Diagram:
Code:
Review Board:
Operators supported: LOAD, STORE, FILTER, UNION, JOIN, DISTINCT, LIMIT.
Future work: FOREACH and GROUP are not supported yet.
Test cases: org.apache.drill.exec.pigparser.TestPigLatinOperators.
Pig scripts can also be tested on Drill's web interface (localhost:8047/query). […]
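The following hypothetical Python pre-check illustrates the operator coverage listed above. It scans a Pig Latin script for the operators the converter handles and flags FOREACH and GROUP, which are not yet supported. It is not part of the Drill patch, which works on the Pig logical plan rather than on raw text. The keyword scan here is a deliberate simplification: a word inside a string literal or comment would also be reported.

```python
import re

# Operators the initial Pig-to-Drill converter supports, per the post.
SUPPORTED = {"LOAD", "STORE", "FILTER", "UNION", "JOIN", "DISTINCT", "LIMIT"}
# Operators listed as future work.
UNSUPPORTED = {"FOREACH", "GROUP"}

# Whole words only, so COGROUP is not mistaken for GROUP.
_WORD = re.compile(r"\b([A-Za-z]+)\b")


def _words(script):
    """Upper-cased set of all alphabetic words in the script."""
    return {w.upper() for w in _WORD.findall(script)}


def operators_used(script):
    """Sorted list of known Pig operators (supported or not) found in script."""
    return sorted(_words(script) & (SUPPORTED | UNSUPPORTED))


def unsupported_operators(script):
    """Sorted list of operators the converter cannot handle yet."""
    return sorted(_words(script) & UNSUPPORTED)
```

A script that returns a non-empty list from unsupported_operators would be expected to fail in the current converter.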

Mahout Usage: IncompatibleClassChangeError Exception

The error pops up while using Mahout collaborative filtering on Hadoop 2:

    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.mahout.common.HadoopUtil.getCustomJobName(…)
        at org.apache.mahout.common.AbstractJob.prepareJob(…)
        at …
        at …
        at …
        at …
        at …
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(…)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(…)
        at java.lang.reflect.Method.invoke(…)
        at org.apache.hadoop.util.RunJar.main(…)

Cause: org.apache.hadoop.mapreduce.JobContext is a class in Hadoop 1.x but an interface in Hadoop 2, so Mahout code compiled against Hadoop 1 fails at runtime on Hadoop 2.

Fix: Specify the Hadoop 2 version explicitly when building Mahout:
    mvn clean install -DskipTests=true -Dhadoop2.version=2.0.0

Hive hangs unexpectedly and ends up with: Error in acquireLock…

Error in acquireLock…
FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired.

Instant, patchy workaround: SET;
Unlock the table: unlock table my_table;

Some other tricks that work, as suggested in the Cloudera forums: Hue sometimes leaves locks on tables, so close Hue's open queries:
    HUE_SER=`ls -alrt /var/run/cloudera-scm-agent/process | grep HUE | tail -1 | awk '{print $9}'`
    export HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/${HUE_SER}
    ls $HUE_CONF_DIR
    /opt/cloudera/parcels/CDH/share/hue/build/env/bin/hue close_queries > tmp.log 2>&1

The search for a proper fix is still on!

How to convert a MongoDB JSON object to a CSV file

Quickly scribbled a function to get a plain CSV out of a MongoDB JSON object. Run the script as you would any shell script:
    sh … > social_data_tmp.csv
The script … has all the required mongo code, something like:
    mongo << EOF
    function printUserDetails(user){
      if (user == undefined){
        return;
      }
      print(user._id+','+ … +','+ user.birthday+','+
        ((user.homeTown == undefined) ? '' : user.homeTown._id)+','+
        cleanString((user.homeTown == undefined) ? '' : …)+','+
        ((user.location == undefined) ? '' : user.location._id)+','+
        cleanString((user.location == undefined) ? '' : …)+','+
        getNames(user.likes));
    }
    db.facebookUserData.find().forEach(function(user){
      printUserDetails(user);
    });
    EOF
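The same flattening can be done outside the mongo shell. The sketch below uses Python's csv module, which quotes fields containing commas, quotes or newlines, so it plays a role similar to the cleanString helper. The schema is hypothetical and is modelled only on the fields visible above: _id, birthday, homeTown._id, location._id, and a likes list whose entries are assumed to carry a "name" key. With pymongo, passing the cursor from something like db.facebookUserData.find() to users_to_csv should work, though that has not been tested here.

```python
import csv
import io


def user_row(user):
    """Flatten one user document into a list of CSV fields.

    Missing sub-documents become empty fields, like the '== undefined'
    checks in the mongo script. Like names are joined with '|'.
    """
    home = user.get("homeTown") or {}
    location = user.get("location") or {}
    likes = user.get("likes") or []
    return [
        user.get("_id", ""),
        user.get("birthday", ""),
        home.get("_id", ""),
        location.get("_id", ""),
        "|".join(like.get("name", "") for like in likes),
    ]


def users_to_csv(users):
    """Return CSV text for an iterable of user documents, skipping None entries."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default dialect: quote only when needed, '\r\n' line ends
    for user in users:
        if user is None:
            continue
        writer.writerow(user_row(user))
    return buf.getvalue()
```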

Apache Drill – REST Support

This came as a pleasant surprise to me today: Apache Drill now has an embedded Jetty/Jersey-based REST interface that exposes the status of the Drillbit along with the status of submitted queries. Once the Drillbit is running, the interface can be checked at: http://localhost:8047/status
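The post does not describe the format of the response. This minimal sketch therefore just fetches the status page mentioned above and returns its body as text, or None when no Drillbit is listening. The helper name drillbit_status is hypothetical.

```python
import urllib.error
import urllib.request

DRILL_STATUS_URL = "http://localhost:8047/status"  # URL from the post


def drillbit_status(url=DRILL_STATUS_URL, timeout=2.0):
    """Return the status page body as text, or None if the Drillbit is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or an HTTP error status.
        return None
```

For a quick check from a script, something like print(drillbit_status() or "Drillbit not reachable") is enough.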

Contributing to Apache Drill – Part 2: Freemarker Code Gen Implementation

Implementing Drill trigonometric functions using Freemarker code generation. This post is a follow-up to the previous post, Contributing to Apache Drill – Math Functions. A lot of Drill function implementations are moving to Freemarker code generation, because many function definitions have to be generated for different argument data types. Freemarker lets us define templates that are used to generate Java code ready to be used in Apache Drill. Here are the steps to start contributing to Drill: Freemarker Textual Data Description (TDD): some data is prepared for the trigonometric functions, which the Freemarker templates use to generate Java code. Here […]
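Freemarker is a Java template engine, so the following is only an analogy in Python. It uses string.Template in place of Freemarker and a small dict in place of the TDD data, then expands one template per (function, argument type) pair into Java source text. The class layout and names (GeneratedTrigFunctions, SinInt, and so on) are hypothetical and do not reproduce Drill's actual templates or function interfaces. The point is the shape of the workflow: data plus template gives many near-identical Java classes.

```python
from string import Template

# Stand-in for the TDD data: which functions and argument types to generate.
TRIG_DATA = {
    "functions": [
        {"name": "sin", "java_call": "Math.sin"},
        {"name": "cos", "java_call": "Math.cos"},
        {"name": "tan", "java_call": "Math.tan"},
    ],
    "types": ["int", "long", "float", "double"],
}

# Stand-in for a template: one Java class per (function, argument type).
CLASS_TEMPLATE = Template(
    "  public static class ${cls} {\n"
    "    public static double eval(${type} in) {\n"
    "      return ${call}(in);\n"
    "    }\n"
    "  }\n"
)


def class_name(func, java_type):
    """Build a class name such as 'SinInt' from 'sin' and 'int'."""
    return func.capitalize() + java_type.capitalize()


def generate(data):
    """Expand the template for every function/type combination."""
    parts = ["public class GeneratedTrigFunctions {\n"]
    for func in data["functions"]:
        for java_type in data["types"]:
            parts.append(CLASS_TEMPLATE.substitute(
                cls=class_name(func["name"], java_type),
                type=java_type,
                call=func["java_call"],
            ))
    parts.append("}\n")
    return "".join(parts)
```

Adding a new type to the data list generates three more classes without touching the template, which is the same motivation the post gives for moving Drill functions to code generation.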

Cross-platform encryption/decryption using Java/C#

Encryption and decryption are very important modules in any enterprise application. Whether it is a file on our system or data travelling over the wire, it should be encrypted. Encryption ensures that only the intended user gains access to the information, while all other malicious users/programs are blocked. In this post we talk about encryption/decryption in a cross-platform scenario, using Java and C# as the programming languages for discussing cross-platform compatibility. We discuss the Rijndael and AES implementations supported by C# and Java respectively. Algorithm: Rijndael is a class of cipher developed […]
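Padding is a common stumbling block in Java/C# interop. For AES, Java's "AES/CBC/PKCS5Padding" and C#'s PaddingMode.PKCS7 produce identical bytes, because Java applies the scheme to the 16-byte AES block, which is PKCS#7 padding. Both sides must also agree on key, IV and mode, and on the C# side Rijndael must use a 128-bit block size, since only then is it AES. The standard-library Python sketch below shows the padding scheme alone. It is not an encryption implementation.

```python
BLOCK_SIZE = 16  # AES block size in bytes (Rijndael with a 128-bit block)


def pkcs7_pad(data, block_size=BLOCK_SIZE):
    """Append N bytes of value N so that len(result) is a multiple of block_size.

    A full block of padding is added when data is already aligned, so the
    padding can always be removed unambiguously.
    """
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n


def pkcs7_unpad(data, block_size=BLOCK_SIZE):
    """Strip and validate PKCS#7 padding; raise ValueError if it is malformed."""
    if not data or len(data) % block_size != 0:
        raise ValueError("data length is not a positive multiple of the block size")
    n = data[-1]
    if n < 1 or n > block_size or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]
```

For example, the 5-byte message b"hello" gets eleven 0x0b bytes appended. A mismatch here (for instance, PaddingMode.Zeros on the C# side) is a typical cause of "bad padding" errors when decrypting across platforms.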

Use Hive SerDe for Fixed-Length (Index-Based) Strings

A Hive SerDe can be used for fixed-length strings in scenarios where our data file has no delimiters. Using RegexSerDe for fixed-length strings is pretty straightforward:

    CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})(.{10})(.{2})")
    LOCATION 'path/to/data';

The above table expects exactly 32 characters (10+10+10+2) in each line of text. The query can be customized to ignore any trailing characters after the useful data has been read:

    CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})(.{10})(.{2}).*")
    LOCATION 'path/to/data';

That's all. Have […]
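RegexSerDe matches each line with Java's Matcher.matches(), which has to consume the whole line. That is why the first pattern accepts only 32-character lines, while the trailing .* in the second pattern tolerates extra characters. The Python sketch below mimics this behaviour with re.fullmatch; Python's regex syntax agrees with Java's for these simple patterns. parse_line is a hypothetical helper that returns None for a line the pattern does not match.

```python
import re

STRICT = re.compile(r"(.{10})(.{10})(.{10})(.{2})")     # exactly 32 chars
LENIENT = re.compile(r"(.{10})(.{10})(.{10})(.{2}).*")  # 32 chars, then anything

COLUMNS = ("userid", "fb_id", "twitter_id", "status")


def parse_line(line, pattern):
    """Split a fixed-width line into named columns, or return None if it does not match.

    fullmatch mirrors Java's Matcher.matches(): the whole line must match.
    """
    m = pattern.fullmatch(line)
    if m is None:
        return None
    return dict(zip(COLUMNS, m.groups()))
```

This makes it easy to try a regex against a few sample lines before creating the Hive table.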

Querying MongoDB via Apache Pig

This crisp post is on querying MongoDB for HDFS data transfer via Pig. The steps involved are:
1. Install MongoDB on the box
   a. Download the MongoDB binaries.
   b. Extract MongoDB and add its bin path to $PATH.
   c. Create a db dir for MongoDB.
   d. Start MongoDB with the command: mongod --dbpath <db dir>
   INFO: Mongo listens on port 27017 by default.
2. Start the Mongo shell
   a. Go to the Mongo installation dir in a new terminal.
   b. $> ./bin/mongo
   c. Type exit to exit the Mongo shell.
3. Load data into Mongo
   a. Create a JSON data file for importing data into Mongo.
   b. Call mongoimport to import the data file into Mongo: […]
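For step 3a, mongoimport by default reads one JSON document per line; a single JSON array needs the --jsonArray flag instead. Here is a minimal Python sketch that produces such a file. The file name and sample records in the usage below are made up for illustration.

```python
import json


def write_json_lines(records, path):
    """Write one JSON document per line, the default input format of mongoimport."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_json_lines(path):
    """Read the file back, e.g. to sanity-check it before importing."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, write_json_lines([{"name": "alice", "age": 30}], "users.json") followed by something like mongoimport --db test --collection users --file users.json should load the records. Here test and users are placeholder names.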