Confused Coders is a place where we share our learnings with you. Feel free to fire your doubts straight at us and we will try our best to come back to you with clarifications. We also have a few PDFs which might be helpful for your interview preparations.

     Book shelf: Feel free to download and share. Cheers \m/


Have Fun !

Apache Drill – REST Support

This came as a pleasant surprise to me today: Apache Drill now also has an embedded Jetty/Jersey-based REST interface exposed for tracking the status of the Drillbit, along with the status of submitted queries. The interface can be checked out here once the Drillbit is running: http://localhost:8047/status

Contributing to Apache Drill – Part 2 : Freemarker Code gen implementation

Implement Drill trigonometric functions using FreeMarker code generation. This post is a follow-up to the last post, Contributing to Apache Drill – Math Functions. A lot of Drill function implementations are moving towards FreeMarker code generation, since there is a need to generate many function definitions for different argument datatypes. FreeMarker lets us define templates which are used to bake Java code ready to be used in Apache Drill. So here are the steps to start contributing to Drill. FreeMarker textual data description (TDD): some data is prepared for the trigonometric functions, which can be used in the FreeMarker templates to generate Java code. Here […]
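To illustrate the idea without pulling in the FreeMarker library itself, here is a simplified stand-in: String.format plays the role of the template engine, and one Drill-style function holder is emitted per (function, datatype) pair. The class and type names are hypothetical, not Drill's actual generated code.

```java
import java.util.List;

public class TrigCodeGen {
    // A tiny "template": the repeated boilerplate that a real .ftl template would hold.
    static final String TEMPLATE =
        "@FunctionTemplate(name = \"%1$s\")%n" +
        "public static class %2$s%3$s implements DrillSimpleFunc {%n" +
        "    // ... setup() and eval() for a %3$s input would go here%n" +
        "}%n";

    // Fill the template for one function name and one argument datatype.
    public static String generate(String func, String type) {
        String cls = func.substring(0, 1).toUpperCase() + func.substring(1);
        return String.format(TEMPLATE, func, cls, type);
    }

    public static void main(String[] args) {
        // One generated class per (function, datatype) pair -- exactly the
        // repetition the FreeMarker templates are meant to eliminate.
        for (String func : List.of("sin", "cos", "tan")) {
            for (String type : List.of("Float8", "Int")) {
                System.out.print(generate(func, type));
            }
        }
    }
}
```

In the real Drill build, the TDD data file supplies the list of functions and types, and FreeMarker expands the template at build time.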

Cross platform encryption decryption using Java/C#

Cross Platform Encryption Decryption. Encryption and decryption are important modules for any enterprise application. Whether it is a file on our system or data travelling over the wire, everything is encrypted. Encryption is needed to ensure that only the intended user gains access to the information and all other malicious users/programs are blocked. In this post we talk about an encryption/decryption mechanism in a cross-platform scenario. We consider Java and C# as our programming languages to discuss cross-platform compatibility, looking at the Rijndael and AES implementations supported by C# and Java respectively. Algorithm: Rijndael is a class of cipher developed […]
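A minimal sketch of the Java side, using the standard javax.crypto API. AES/CBC/PKCS5Padding is one common choice that the C# side (e.g. RijndaelManaged configured for AES's 128-bit block) can match, provided both sides agree on key, IV, mode and padding. The hard-coded key and IV here are for illustration only.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CrossPlatformAes {
    // 16-byte key and IV; in real use these come from a key store / random source.
    static final byte[] KEY = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
    static final byte[] IV  = "fedcba9876543210".getBytes(StandardCharsets.UTF_8);

    static byte[] run(int mode, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(mode, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
        return c.doFinal(data);
    }

    // Encrypt a string and Base64-encode it so it travels safely as text.
    public static String encrypt(String plain) throws Exception {
        return Base64.getEncoder().encodeToString(
            run(Cipher.ENCRYPT_MODE, plain.getBytes(StandardCharsets.UTF_8)));
    }

    // Reverse: Base64-decode, then decrypt back to the original string.
    public static String decrypt(String b64) throws Exception {
        return new String(run(Cipher.DECRYPT_MODE, Base64.getDecoder().decode(b64)),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String ct = encrypt("hello cross platform");
        System.out.println(ct);
        System.out.println(decrypt(ct));
    }
}
```

As long as the C# code uses the same key, IV, mode and padding (PKCS7 in .NET terms, which is byte-compatible with PKCS5 for AES), ciphertext produced on one platform decrypts on the other.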

Use Hive Serde for Fixed Length (index based) strings

Hive's fixed-length serde can be used in scenarios where we do not have any delimiters in our data file. Using RegexSerDe for fixed-length strings is pretty straightforward:

CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})(.{10})(.{2})") LOCATION 'path/to/data';

The above query expects exactly 32 characters in a line of text (10+10+10+2). The query can be customized to ignore any characters left at the end after the useful data is read:

CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})(.{10})(.{2}).*") LOCATION 'path/to/data';

That's all. Have […]
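The serde above is essentially applying a fixed-width regex per line. A quick way to sanity-check the pattern before wiring it into Hive is to run it in plain Java against a sample line; the 10/10/10/2 layout below mirrors the hypothetical customers table from the queries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedWidthCheck {
    // Same pattern as the second query: three 10-char fields, one 2-char field,
    // and .* to ignore anything trailing.
    static final Pattern ROW = Pattern.compile("(.{10})(.{10})(.{10})(.{2}).*");

    // Returns the four fields, or null if the line is shorter than 32 chars.
    public static String[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // 10-char userid, 10-char fb_id, 10-char twitter_id, 2-char status,
        // followed by trailing characters the .* swallows.
        String[] f = parse("user000001fb00000001tw00000001OK--ignored--");
        System.out.println(String.join("|", f));
    }
}
```

If parse returns null for your real data lines, the widths in the regex do not match the file and the Hive table would come up with NULL columns.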

Querying MongoDB via Apache Pig

This crisp post is on querying MongoDB for HDFS data transfer, via Pig. Below are the steps involved:

1. Install MongoDB on the box
   a. Download the MongoDB binaries
   b. Extract MongoDB and export the bin path to $PATH
   c. Create a db dir for MongoDB
   d. Start MongoDB with the command: mongod --dbpath
   INFO: Mongo listens on port 27017 by default.
2. Start the Mongo shell
   a. Go to the Mongo installation dir in a new terminal
   b. $> ./bin/mongo
   c. Type exit to exit the Mongo shell
3. Load data into Mongo
   a. Create a JSON data file for importing data into Mongo
   b. Call mongoimport to import the data file into Mongo: […]
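For step 3a, mongoimport accepts a file with one JSON document per line. A small sketch that produces such a file (the field names and values are made up for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MongoImportFile {
    // Build one JSON document as a single line. Minimal hand-rolled JSON;
    // a real program should use a JSON library to handle escaping.
    public static String toJsonLine(String name, int age) {
        return String.format("{\"name\": \"%s\", \"age\": %d}", name, age);
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = List.of(toJsonLine("alice", 30), toJsonLine("bob", 25));
        Path out = Files.createTempFile("users", ".json");
        Files.write(out, lines);
        // The file can then be loaded with something like:
        //   mongoimport --db test --collection users --file <path to file>
        System.out.println(Files.readString(out));
    }
}
```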

HBase create statement org.apache.hadoop.hbase.PleaseHoldException

Quick post on an HBase exception: org.apache.hadoop.hbase.PleaseHoldException on a create statement.

hbase(main):002:0> create 'temptable', 'fam1', 'fam2'
ERROR: org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(
    at org.apache.hadoop.hbase.master.HMaster.createTable(
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$
    at org.apache.hadoop.hbase.ipc.HBaseServer$

ISSUE: The region servers or the master may be down. Use:

$ sudo service hbase-master start
$ sudo service hbase-regionserver start

For CDH users: we can directly visit Cloudera Manager > Services > HBase and look for the action dropdown where we can start the services. Cheers \m/

Get email id of user using GitHub userid

A lot of the time you need to connect with people personally but do not actually know their email id. GitHub is a social coding platform where you can get the email ids of active programmers who have submitted their code to repos. This information is actually publicly available because, when a user commits, the email id gets passed to the repo along with the code. I hope this technique is used in a positive way rather than being misused. GitHub API: Replace yssharma with the user id of the person whose email id you want to extract. You would receive a JSON with the public activity of the user. […]
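The public events feed is at https://api.github.com/users/&lt;userid&gt;/events/public, and push events in it carry an author email. A real client would fetch that URL; the sketch below scans a canned fragment of such a payload with a regex, so the extraction step itself runs offline. The sample JSON is hypothetical, not a real API response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GitHubEmailGrep {
    // Match  "email": "<value>"  anywhere in the JSON text.
    static final Pattern EMAIL = Pattern.compile("\"email\"\\s*:\\s*\"([^\"]+)\"");

    public static List<String> extractEmails(String json) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(json);
        while (m.find()) found.add(m.group(1));
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical fragment of a push-event payload:
        String sample = "{\"commits\":[{\"author\":{\"email\":\"dev@example.com\",\"name\":\"dev\"}}]}";
        System.out.println(extractEmails(sample));
    }
}
```

A regex is fine for a quick scrape; for anything serious, parse the JSON properly with a JSON library instead.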

BigData for Barbers — food for thought

On a lighter note... think of the ideal hairstyle, where our barber could just scan our face/QR code and get: 1. our previous hairstyles 2. our hairstyle preferences 3. our reactions/feedback on the haircut 4. a way to browse through our previous hairstyles in case we are interested again. And then it would also recommend your hairstyles to people with similar hair traits and features as yours, and vice versa. Update: Your hairstyle is being followed by XX other friends.

Hive – Selected data import/query – Files and folders (mapred.input.dir.recursive)

Data import in Hive by default expects a directory name in its query, specified by the LOCATION keyword. By default Hive picks up all the files from the dir and imports them. If the directory does not contain files, but rather consists of sub-directories, Hive blows up with the exception: Not a file: /path/to/data/* This quick post is about two customized ways of importing data into Hive: 1. Importing data from a directory which contains sub-directories: set the mapred recursive parameter to true. Syntax: SET mapred.input.dir.recursive=true The above parameter enables Hive to recursively scan all sub-directories and fetch the data from all of them. In case you stumble […]

Awesome post by Timothy Chen – Lifetime of a Query in Drill Alpha Release

Here is an awesome post by Tim on the whole lifespan of a Drill query. A nice read, highly recommended for all the fresh Drill'ers. Cheers