This is the final blog in the Kafka Message Ordering series. I would highly recommend reading the previous blogs to ensure that you can follow the discussion here. Here is what Read More
Kafka Message Ordering Part – 3
This is the third blog in the four-part series on developing an understanding of how to maintain the ordering of messages while architecting a solution. I would highly recommend you Read More
Kafka Message Ordering Part – 2
Welcome back to the second blog in the message ordering series. If you haven’t read the first blog, Kafka Message Ordering Part – 1, I would recommend you to go Read More
Kafka Message Ordering Part – 1
When working with messaging systems, it is good practice to architect systems that are asynchronous, idempotent, and independent of message sequence (see Saga Pattern). But for some specific use Read More
Sqoop Errors
Every big data engineer at some point in time has to work with Apache Sqoop as one of the bridges for data connectivity between the RDBMS and the Hadoop environment. Read More
Import BLOB and CLOB columns from Oracle into HDFS using Sqoop
Importing data into HDFS (Hadoop Distributed File System) from various supported RDBMS (relational database management systems) using Sqoop is one of the very first steps the tech community tried as Read More
Node and Disk Balancer in Hadoop
The node and disk balancer in Hadoop is an important concept used by cluster admins to ensure that all nodes and the volumes (disks in those nodes) are in an equilibrium Read More
How to delete a Topic from Kafka: Topic marked for deletion issue
While working with Kafka topics we mostly create, drop, and recreate them, but sometimes Kafka topics behave in an unintended way. For example, after executing the drop command when we Read More
Find command in Hadoop: How to find files of a specific size in HDFS
Most often I see developers struggling to mimic the Linux find command for Hadoop files, especially based on size or size range. No wonder all this pain is because there was Read More
Database To AD Group Mapping In Apache Sentry
This post is for Big Data cluster admins who want to take stock of which Active Directory group has access to which Hive/Impala database in the cluster. In Hadoop Read More