Hadoop Data Ingestion Tool

By powering parallel processing across large clusters of commodity server hardware, Hadoop provides a cost-effective means of analyzing "big data," along with a robust ecosystem of open-source analytics software. The business case for deploying a data lake is so compelling that for most enterprises the question is no longer whether to do it but how to do it. Key implementation questions include which of the several Hadoop software distributions to use, whether to run Hadoop on-premises or in the cloud, and which Hadoop data ingestion tool to use to load data from your source systems into Hadoop. On that last question, a growing number of businesses have found that Qlik Replicate® provides features and benefits that set it apart from competing tools.

The Most Versatile and Agile Hadoop Data Ingestion Tool

Qlik's innovative solutions for data migration and integration are used by thousands of businesses, from high-tech start-ups to Fortune 500 enterprises. While businesses count on Qlik technology to solve a wide range of modern data management challenges, an increasingly popular use case is employing Qlik Replicate as a Hadoop data ingestion tool.

For organizations wanting to move quickly to capture the benefits of Hadoop-powered data lake analytics, Qlik Replicate offers key advantages as a Hadoop data ingestion tool. Qlik Replicate:

  • Works seamlessly with any major Hadoop distribution (including Hortonworks, MapR, and Cloudera), as well as cloud-based Hadoop services such as Amazon Elastic MapReduce.
  • Works seamlessly with any type of source system such as relational databases (including Oracle, SQL Server, MySQL, and many others), enterprise data warehouse systems (including Exadata, Teradata, Netezza, and others), mainframes (IMS/DB, VSAM, and more), and on-premises or cloud-based enterprise applications (such as SAP and Salesforce).
  • Supports a range of data transfer modes including high-performance bulk loading, real-time change data capture (CDC), and Kafka-based streaming workflows that publish change data as messages through Apache Kafka to multiple big data target systems concurrently (see the consumer sketch after this list).
  • Saves money and accelerates time-to-value by allowing non-programmers to configure and execute a wide range of Hadoop data ingestion jobs through a drag-and-drop UI, without needing deep technical knowledge.
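
To illustrate the Kafka streaming mode, here is a minimal Python sketch of a downstream consumer reading change events from a Kafka topic. The topic name, broker address, and JSON envelope fields are assumptions made for illustration, not Qlik Replicate's actual output schema.

    # Minimal sketch: consume CDC-style change events from Kafka.
    # Topic name, broker address, and message fields are illustrative
    # assumptions, not a documented Qlik Replicate schema.
    import json
    from kafka import KafkaConsumer  # pip install kafka-python

    consumer = KafkaConsumer(
        "orders.cdc",                      # hypothetical CDC topic
        bootstrap_servers="broker:9092",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        auto_offset_reset="earliest",
        group_id="hadoop-ingest-demo",
    )

    for record in consumer:
        event = record.value
        op = event.get("operation")        # e.g. INSERT / UPDATE / DELETE
        row = event.get("data", {})
        print(f"{op}: {row}")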

A Hadoop Data Ingestion Tool and More

Unlike typical, narrowly focused Hadoop data ingestion tools, Qlik Replicate delivers business value that extends well beyond loading data into your Hadoop cluster. For example, a common Hadoop workflow entails moving processed data (the output of Hadoop MapReduce jobs) out of the data lake and into another system where it can be preserved and accessed by applications and users. Qlik Replicate supports these workflows as well, letting you easily create and execute migrations of processed data from Hadoop to destination systems such as databases, conventional data warehouses and data marts, or cloud-based targets such as Amazon S3.
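
As a rough illustration of this outbound workflow (and not of Qlik Replicate's internal implementation), the Python sketch below copies a MapReduce output file from HDFS to Amazon S3 using the open-source hdfs (WebHDFS) and boto3 client libraries; the host, paths, and bucket name are placeholders.

    # Sketch of the "move processed data out of the lake" step: copy a
    # MapReduce output file from HDFS to S3. Host, paths, and bucket
    # names are placeholders.
    import boto3
    from hdfs import InsecureClient  # pip install hdfs (WebHDFS client)

    hdfs_client = InsecureClient("http://namenode:9870")  # WebHDFS endpoint
    s3 = boto3.client("s3")

    source_path = "/data/output/part-r-00000"  # MapReduce job output
    with hdfs_client.read(source_path) as reader:
        # Stream the file straight from HDFS into the target bucket.
        s3.upload_fileobj(reader, "analytics-archive", "processed/part-r-00000")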

More broadly, while excelling as a Hadoop data ingestion tool, Qlik Replicate can serve all your organization's needs for moving and integrating big data. With a single unified data integration platform, you can not only meet your data warehouse and Hadoop operational needs, but also manage tasks such as database migrations, replicating data across multiple data centers, or replicating data from on-premises systems to the cloud.

Qlik Compose® for Hive: Rapid, Automated Creation of Analytics-Ready Data Stores

Once data is ingested and landed in Hadoop, IT often still struggles to create usable analytics data stores. Traditional methods require Hadoop-savvy ETL programmers to manually code each step, including data transformation, the creation of Hive SQL structures, and the reconciliation of data insertions, updates, and deletions to avoid locking tables and disrupting users. The administrative burden of ensuring data is accurate and consistent can delay and even kill analytics projects.

Qlik Compose for Hive automates the creation, loading, and transformation of enterprise data into Hadoop Hive structures. Our solution fully automates the pipeline of BI-ready data into Hive, enabling you to automatically create both Operational Data Stores (ODS) and Historical Data Stores (HDS). And we leverage the latest innovations in Hadoop, such as the ACID Merge SQL capability available in Apache Hive (part of the Hortonworks Data Platform 2.6 distribution), to automatically and efficiently process data insertions, updates, and deletions.
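
To show the kind of reconciliation logic being automated, here is a hand-written Hive ACID MERGE submitted through PyHive, applying a batch of staged change records to an ODS table. The database, table, and column names and the op flag are illustrative assumptions; Qlik Compose for Hive generates equivalent statements automatically.

    # Illustrative Hive ACID MERGE reconciling staged change records into
    # an ODS table. Names and the op flag are assumptions; Qlik Compose
    # for Hive generates equivalent logic automatically.
    from pyhive import hive  # pip install pyhive

    MERGE_SQL = """
    MERGE INTO ods.customers AS t
    USING staging.customer_changes AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET name = s.name, email = s.email
    WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.name, s.email)
    """

    conn = hive.connect(host="hiveserver2", port=10000, database="default")
    cursor = conn.cursor()
    cursor.execute(MERGE_SQL)  # target table must be transactional (ACID)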

Qlik Replicate integrates with Qlik Compose for Hive to simplify and accelerate data ingestion, data landing, SQL schema creation, data transformation, and ODS and HDS creation and updates.
