At some point, every big data engineer has to work with Apache Sqoop, one of the bridges for data connectivity between an RDBMS and the Hadoop environment. Read More
Import BLOB and CLOB columns from Oracle into HDFS using Sqoop
Importing data into HDFS (Hadoop Distributed File System) from the various supported RDBMSs (relational database management systems) using Sqoop is one of the very first steps the tech community tried as Read More
Node and Disk Balancer in hadoop
Node and disk balancer in Hadoop is an important concept used by cluster admins to ensure that all nodes and their volumes (the disks in those nodes) are in an equilibrium Read More
How to delete Topic from Kafka : Topic marked for deletion issue
While working with Kafka topics, we often create, drop, and recreate them, but sometimes Kafka topics behave in an unintended way. For example, after executing the drop command when we Read More
Find command in hadoop : How to find files of specific size in hdfs
I often see developers struggling to mimic the Linux find command for Hadoop files, especially when searching by size or size range. No wonder all this pain is because there was Read More
Database To AD Group Mapping In Apache Sentry
This post is for Big Data cluster admins who want to take stock of which Active Directory group has access to which Hive/Impala database in the cluster. In Hadoop Read More