Big Data Ingestion

Big data analytics have become an integral part of the competitive arsenal of today's enterprises. With the Hadoop data warehouse now the de facto industry standard technology for powering big data analytics, the key differentiators among competing enterprises are the content of each competitor's big data inventory and the analytics applications that they create on top of Hadoop. Almost as important a differentiator in the big data and Hadoop competitive landscape is how effectively an organization tackles the challenge of big data ingestion into the Hadoop environment. Organizations need efficient big data ingestion processes that enable agile analytics programs. With these in place, they can meet changing business requirements and yield timely business intelligence based on the freshest possible data.

Qlik Replicate®: The Easy to Use Solution for Universal Big Data Ingestion

Leading organizations in a variety of industries have gained a leg up on the competition by using Qlik Replicate to meet the challenges of Hadoop data ingestion. Used by more than 2000 organizations worldwide, Qlik Replicate is a next-generation data integration platform that offers two critical advantages for implementing big data ingestion flows.

The first is that Qlik Replicate is a unified solution for migrating data from nearly any type of source system to any major Hadoop distribution, on premises or in the cloud. With a single solution you can create and execute big data ingestion flows from source systems including relational databases (Oracle, SQL Server, DB2, MySQL, Informix, and more), conventional data warehouses (Exadata, Teradata, and others), mainframe/legacy systems, SAS applications, and file systems. Through a single unified interface you can run big data ingestion jobs on a schedule or on demand, and monitor the progress of those jobs as they move data into your target Hadoop platform.

The second major advantage of Qlik Replicate is that big data ingestion flows can be created through a graphical interface with no need for manual coding or expertise in the source or target system APIs. With a centralized big data ingestion solution that puts control in the hands of data architects and data scientists with little or no reliance on developers, your organization can get big data initiatives up and running quickly and add new data sources easily when business requirements change.

Qlik Replicate for Real-Time Big Data Ingestion and Analytics

To deliver maximum business value, big data analytics programs need to be agile. They also need to be based on the freshest possible data, so that business decisions can be responsive to today's challenges and opportunities rather than yesterday's or last week's. Qlik Replicate powers real-time analytics by delivering real-time data ingestion from source database systems into your big data analytics environment. Leveraging log-based change data capture technology, the Qlik solution for big data ingestion keeps the data in your Hadoop cluster in sync with the latest operational data, without straining the source operational systems. For big data architectures that utilize Apache Kafka message brokering technology, Qlik Replicate can feed Kafka with message-encoded real-time data streams that Kafka can in turn feed into one or more big data targets including HBase, Cassandra, Couchbase, and MongoDB.
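To make the idea of change data capture more concrete, here is a minimal Python sketch of how a target copy can be kept in sync by replaying an ordered stream of change events. It is an illustration only, not Qlik Replicate's implementation: the ChangeEvent record, its field names, and the apply_changes function are hypothetical, and a plain in-memory dictionary stands in for the target table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; not Qlik Replicate's actual message format."""
    op: str                                # "insert", "update" or "delete"
    key: int                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # full row image; None for deletes


def apply_changes(
    target: Dict[int, Dict[str, Any]],
    events: Iterable[ChangeEvent],
) -> Dict[int, Dict[str, Any]]:
    """Replay change events, in order, against an in-memory target keyed by primary key."""
    for ev in events:
        if ev.op in ("insert", "update"):
            # Inserts and updates both carry the full row, so they are applied as an upsert.
            target[ev.key] = dict(ev.row or {})
        elif ev.op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            target.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op!r}")
    return target


# A made-up change log, as it might be read from a source transaction log.
log = [
    ChangeEvent("insert", 1, {"name": "Ada", "city": "London"}),
    ChangeEvent("insert", 2, {"name": "Grace", "city": "New York"}),
    ChangeEvent("update", 1, {"name": "Ada", "city": "Paris"}),
    ChangeEvent("delete", 2),
]
replica = apply_changes({}, log)
```

Because only the changes are shipped, the source system is not re-queried for full table extracts. In this sketch the target ends up holding just the latest state of each surviving row.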

Qlik Compose® for Hive: Streamline and Automate Your Data Lake Pipeline

Once the data is ingested and landed in Hadoop, IT often still struggles to create usable analytics data stores. Traditional methods require Hadoop-savvy ETL programmers to manually code the various steps – including data transformation, the creation of Hive SQL structures, and reconciliation of data insertions, updates and deletions to avoid locking and disrupting users. The administrative burden of ensuring data is accurate and consistent can delay and even kill analytics projects.

Qlik Compose for Hive automates the creation, loading and transformation of enterprise data into Hadoop Hive structures. Our solution fully automates the pipeline of BI-ready data into Hive, enabling you to automatically create both Operational Data Stores (ODS) and Historical Data Stores (HDS). And we leverage the latest innovations in Hadoop, such as the new ACID Merge SQL capabilities available today in Apache Hive (part of the Hortonworks 2.6 distribution), to automatically and efficiently process data insertions, updates and deletions.
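The difference between an ODS and an HDS can be illustrated with a short Python sketch. This is a conceptual example under stated assumptions, not how Qlik Compose for Hive or Hive's MERGE statement works internally: the record_change and current_rows functions, the valid_from/valid_to field names, and the integer timestamps are all hypothetical. The history list plays the role of an HDS, keeping every version of a row with its validity range, while the ODS is the set of versions that are still open.

```python
from typing import Any, Dict, List, Optional

Version = Dict[str, Any]


def record_change(
    history: List[Version],
    op: str,
    key: int,
    row: Optional[Dict[str, Any]],
    ts: int,
) -> None:
    """Apply one insert, update or delete to a history table (hypothetical layout)."""
    if op not in ("insert", "update", "delete"):
        raise ValueError(f"unknown operation: {op!r}")
    # Close the currently open version of this key, if there is one.
    for version in history:
        if version["key"] == key and version["valid_to"] is None:
            version["valid_to"] = ts
    # Inserts and updates open a new version; deletes only close the old one.
    if op != "delete":
        history.append(
            {"key": key, "row": dict(row or {}), "valid_from": ts, "valid_to": None}
        )


def current_rows(history: List[Version]) -> Dict[int, Dict[str, Any]]:
    """Derive the ODS view: only the versions that are still open."""
    return {v["key"]: v["row"] for v in history if v["valid_to"] is None}


hds: List[Version] = []
record_change(hds, "insert", 1, {"status": "new"}, ts=1)
record_change(hds, "insert", 2, {"status": "new"}, ts=3)
record_change(hds, "update", 1, {"status": "shipped"}, ts=5)
record_change(hds, "delete", 1, None, ts=9)
ods = current_rows(hds)
```

After these four changes the history retains both versions of key 1, each with a closed validity range, while the ODS view contains only key 2.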

Qlik Replicate integrates with Qlik Compose for Hive to simplify and accelerate data ingestion, data landing, SQL schema creation, data transformation and ODS and HDS creation/updates.
