Confused Coders is a place where we share lessons and thoughts with you. Feel free to fire your doubts straight at us and we will try our best to come back to you with clarifications. We also have a few PDFs which might be helpful for your interview preparations.

     Book shelf: Feel free to download and share. Cheers \m/


Have fun!

Indexing CSV data in Solr via Python – PySolr

Here is a crisp post on indexing data in Solr using Python.

1. Install pre-requisites:
   – pip
   – PySolr

2. Python script:

    #!/usr/bin/python
    import sys, getopt
    import pysolr
    import csv, json

    # SOLR_URL=
    def main(args):
        solrurl = ''
        inputfile = ''
        try:
            opts, args = getopt.getopt(args, 'hi:u:')
        except getopt.GetoptError:
            print 'usage: -i <inputfile> -u <solrurl>'
            sys.exit(2)
        for opt, arg in opts:
            if opt == '-h':
                print 'usage: -i <inputfile> -u <solrurl>'
                sys.exit()
            elif opt in ('-i'):
                inputfile = arg
            elif opt in ('-u'):
                solrurl = arg
        # create a connection to a solr server
        s = pysolr.Solr(solrurl, timeout=10)
        keys = ('rank', 'pogid', 'cat', 'subcat', 'question_bucketid', 'brand',
                'discount', 'age_grp', 'gender', 'inventory', 'last_updated')
        record_count = 0
        for line in open(inputfile, 'r').readlines():
            splits = line.split(',')
            […]
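The loop above trails off, but the idea from here is straightforward: pair each CSV column with the field names in keys and push the batch to Solr. A rough sketch of how the rest could look (this completion is mine, not from the original script – pysolr's add() does the actual indexing):

    # Hypothetical completion – zip each row onto the field names and batch-index
    docs = []
    for line in open(inputfile, 'r').readlines():
        splits = line.strip().split(',')
        docs.append(dict(zip(keys, splits)))
        record_count += 1
    # note: your schema's uniqueKey (e.g. id) must be present in each doc too
    s.add(docs)   # send the whole batch to Solr's /update handler
    print 'Indexed %d records' % record_count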

How to get Pig Logical plan (Execution DAG) from Pig Latin script

TL;DR: A Pig logical plan is the plan DAG that is used to execute the chain of jobs on Hadoop. Here is the code snippet for obtaining a Pig Latin logical plan DAG from a Pig script –
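The snippet itself is cut off in this excerpt. If you just want to see the plan without writing Java against Pig's internals, Pig's own explain command dumps the logical (plus physical and MapReduce) plans for a script – a quick sketch, with myscript.pig as a placeholder path:

    import subprocess

    # Print Pig's logical/physical/MapReduce plans for a script.
    # -x local runs without a cluster; 'myscript.pig' is a placeholder.
    subprocess.call(['pig', '-x', 'local', '-e', 'explain -script myscript.pig'])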

PySolr: How to boost a field for a Solr document

Adding a quick note – PySolr: how to boost a field for a Solr document.

Index time boosting:

    conn.add(docs, boost={'author': '2.0'})

Query time boosting:

    qf=title^5 content^2 comments^0.5
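To actually send a boosted query from Python: pysolr's search() forwards extra keyword parameters straight through to Solr, so the qf line above can be passed as-is. A minimal sketch, assuming your request handler accepts edismax (the query string here is made up):

    # Query-time boosting – qf weights title matches highest
    results = conn.search('laptop', **{
        'defType': 'edismax',
        'qf': 'title^5 content^2 comments^0.5',
    })
    print 'Found %d docs' % len(results)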

SolrJ Exception – Exception in thread "main" org.apache.solr.common.SolrException: Bad Request

Exception in thread "main" org.apache.solr.common.SolrException: Bad Request

Solution: Check the Solr logs.

    INFO – 2014-11-07 07:04:42.985; org.apache.solr.update.processor.LogUpdateProcessor; [feeddata] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 1
    ERROR – 2014-11-07 07:04:42.985; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

Here it is: Document is missing mandatory uniqueKey field: id

Another instance:

    INFO – 2014-11-07 07:13:21.684; org.apache.solr.update.processor.LogUpdateProcessor; [feeddata] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 1
    ERROR – 2014-11-07 07:13:21.685; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: ERROR: [doc=0] unknown field 'win_hour'

Takeaway: Logs are very helpful. Do have a look before searching elsewhere.
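Both errors are fixed on the client side. A pysolr sketch of the fix (the core URL is a guess from the [feeddata] core in the logs, and the fields are illustrative): every document must carry the schema's uniqueKey – id here – and only fields declared in schema.xml.

    import pysolr

    # Core URL is an assumption based on the 'feeddata' core in the logs above
    solr = pysolr.Solr('http://localhost:8983/solr/feeddata', timeout=10)

    # 'id' satisfies the mandatory uniqueKey; skip fields (like 'win_hour')
    # that are not declared in schema.xml
    solr.add([{'id': '0', 'rank': 1}])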

Indexing a CSV data file in Solr – using annotated Java POJOs

1. Java POJO: Add the Java POJO with the required fields –

    import org.apache.solr.client.solrj.beans.Field;

    /**
     * Created by yash on 18/11/14.
     */
    public class ProductBean {

        @Field
        private int id;

        @Field("rank")
        private int rank;

        @Field("prodid")
        private long prodid;

        @Field("cat")
        private int cat;

        @Field("subcat")
        private int subcat;

        public ProductBean() {} // Required by Solr to initialize the bean.

        public ProductBean(int id, int rank, long prodid, int cat, int subcat) {
            this.id = id;
            this.rank = rank;
            this.prodid = prodid;
            this.cat = cat;
            this.subcat = subcat;
        }

        public int getRank() { return rank; }

        public void setRank(int rank) { this.rank = rank; }

        public long getprodid() { return prodid; }

        public void setprodid(long prodid) { […]

Mahout Exception : java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver

Another annoying Mahout error on running the Mahout jobs. Well, this is caused by the reason already discussed: Mahout is not built for Hadoop 2 out of the box. So all it needs is a small rebuild of Mahout:

    mvn clean install -Dhadoop2 -Dhadoop2.version=2.2.0 -DskipTests=true

That's it. Mahout should now work just fine. Drop a note in case you get stuck anywhere.

Tunnel all cluster ports to a local port – via browser

A couple of steps for tunneling the Hadoop box's ports to local ports –

1. SSH to any box in the cluster, opening a SOCKS proxy on any local port:

    ssh -D 9999 dk2567@<cluster-host>

2. Add proxy settings in a secondary browser (Firefox here): Edit > Preferences > Advanced > Connections > Settings > Manual proxy settings > add the localhost IP and port (9999) in the SOCKS settings.

3. Visit the application in the browser at http://<destination-ip>:<destination-port>/ – with the destination IP and destination port.

Local Cassandra cluster via Cassandra CCM – Cluster Manager

Came across this cool little utility to launch a local Cassandra cluster and test your apps around Cassandra. Check it out.

1. Link
2. Install instructions.

Bring up a 5-node Cassandra cluster with Cassandra version 2.1.3. Minimal and simplistic –

    cd work/git/ccm
    ccm status
    ccm create test -v 2.1.3 -n 5 -s
    ccm start
    ccm status

3. Other commands

Bring a selected node down:

    ccm node4 stop

Use cqlsh on a node:

    ccm node1 cqlsh
    USE demo;
    SELECT * FROM testcf;

Minimal Hadoop and Yarn installation

Best new tutorial around. Keeping a note of it here. Check it out, you might love it too.

Minimal Spark hello world

1. Build sbt

Create a build.sbt file. This manages all the dependencies and stuff that would have been in your pom file –

    import AssemblyKeys._
    import sbtassembly.Plugin._

    name := "FeedSystem"

    version := "1.0"

    scalaVersion := "2.10.5"

    organization := "com.snapdeal"

    resolvers += "Typesafe Repo" at "http://repo.typesafe.com/typesafe/releases/"

    libraryDependencies ++= Seq(
      "org.apache.spark" % "spark-core_2.10" % "1.3.1" % "provided",
      "org.apache.spark" % "spark-mllib_2.10" % "1.3.1" % "provided",
      "com.amazonaws" % "aws-java-sdk" % "1.9.27",
      "org.scalatest" % "scalatest_2.10" % "2.2.5" % "test")

    scalacOptions += "-deprecation"

    scalacOptions += "-feature"

    // This statement includes the assembly plugin capabilities
    assemblySettings

    // Configure the jar name used with the assembly plug-in
    jarName in assembly := "testspark-assembly.jar"

    // A special option to exclude Scala itself from our […]
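The application code itself is truncated above. Just for the flavour of a minimal Spark hello world, here is the equivalent in PySpark rather than Scala (not from the original post – the app name and input path are placeholders):

    from pyspark import SparkConf, SparkContext

    # Classic Spark hello world: word count
    conf = SparkConf().setAppName('HelloSpark')
    sc = SparkContext(conf=conf)

    counts = (sc.textFile('input.txt')              # placeholder input path
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))

    for word, count in counts.collect():
        print word, count

    sc.stop()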