
Everybody Can Enjoy the Benefits of Cloud-Based Data Ingestion. Yes, Even You…

September 23, 2016 | By Steve Thompson

Big Data isn’t the next big thing. It’s the NOW big thing.

The people who would argue that point are a rapidly dwindling group. But there’s a vast difference between recognizing the benefits of Big Data and actually realizing them.

The Virtual Fence

Unfortunately, many companies, particularly small and mid-size ones, find themselves on the outside looking in. Like a kid who can’t afford a ticket to the ballgame, they peek through the cracks of a virtual fence to see what they’re missing.

That virtual fence represents the cost of tapping into the benefits of Big Data. It’s a fence that has kept many companies out of the game.

That fence is about to come down.

The V Test

You’ve probably heard some of the statistics [1] about the astounding speed with which data is generated nowadays:

  • More data has been generated in just the past two years than in all of previous human history
  • It’s projected that 1.7 megabytes of new data will be generated every second for every person on the planet by 2020
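
To put the second figure in perspective, the per-second rate can be converted into a daily volume. The snippet below is only a back-of-the-envelope sketch: the 1.7 MB figure comes from the projection cited above, while the function names and the decimal convention (1 GB = 1,000 MB) are assumptions made for illustration.

```python
# Back-of-the-envelope conversion of the cited projection
# (1.7 MB of new data per second, per person) into a daily volume.
# Assumption: decimal units, i.e. 1 GB = 1,000 MB.

MB_PER_SECOND_PER_PERSON = 1.7      # figure quoted in the article
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds


def daily_megabytes_per_person(rate_mb_per_s=MB_PER_SECOND_PER_PERSON):
    """Megabytes generated per person over one full day."""
    return rate_mb_per_s * SECONDS_PER_DAY


def daily_gigabytes_per_person(rate_mb_per_s=MB_PER_SECOND_PER_PERSON):
    """Same volume expressed in decimal gigabytes."""
    return daily_megabytes_per_person(rate_mb_per_s) / 1000


if __name__ == "__main__":
    print(f"{daily_gigabytes_per_person():.1f} GB per person per day")
```

At that rate, each person would account for roughly 147 GB of new data every day.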

Not all of that data is useful, of course. Part of the challenge of making the most of Big Data involves sorting the wheat from the chaff. If data doesn’t conform to the following five Vs, it doesn’t really do anything for your business use case:

  1. Volume: Data is flooding in at unprecedented rates and volumes, and it has to be stored effectively. Databases of hundreds of terabytes and tens of petabytes are now common.
  2. Velocity: Data is coming in fast, necessitating real-time and near-real-time data ingestion.
  3. Variety: The data is no longer just structured. It now arrives in unstructured, semi-structured, and structured forms, and it must be ingested and stored in all of them.
  4. Veracity: The data should be accurate, which requires data management, data governance, metadata management, and data lineage.
  5. Value: The data should provide value to the business use cases.

The Cloud’s Silver Lining

Selectively capturing Big Data that conforms to the five Vs, and using the data to build a Hadoop data lake – that’s how many companies have found success with Big Data.

A data lake enables a single source of truth. It enables data governance and data management. It supports predictive analytics and business intelligence.

Many companies, though, have been unable to benefit from Big Data because of the prohibitive costs involved. Multi-million dollar investments in hardware and software have been required to play in the Big Data sandbox.

But that is changing, thanks to the cloud.

A Disruptive Technology

Quite simply, cloud-based data ingestion is a game changer. It’s disruptive technology in the sense that it’s going to change the way things are done and change the way people do business.

Cloud-based data ingestion eliminates the need to invest millions of dollars in hardware and software. Instead, you just use a cloud-based Hadoop cluster and data ingestion engine as you need it.

Whether you run on Amazon Web Services (EC2), on Microsoft Azure, on a Cloudera cluster, or through Hortonworks, you can now use a pay-per-use strategy for data ingestion. Ingestion can be performed on demand, on a schedule, or in response to events, and you pay for it only when you use it. Clusters can grow dynamically as needed and then be shut down when they are idle. The data can be stored in the cloud and archived as needed (e.g., in EBS and S3 storage).
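
The effect of shutting clusters down when they are idle can be illustrated with some simple arithmetic. The sketch below is not a pricing model for any real provider: the node count, the hourly rate, and the daily job length are all hypothetical numbers chosen only to show how the comparison works.

```python
# Illustrative comparison of a cluster that runs around the clock
# with one that is started only for ingestion jobs.
# All rates and sizes are hypothetical, not real provider prices.

HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def always_on_cost(nodes, hourly_rate_per_node, hours=HOURS_PER_MONTH):
    """Monthly cost if the cluster is never shut down."""
    return nodes * hourly_rate_per_node * hours


def pay_per_use_cost(nodes, hourly_rate_per_node, job_hours_per_day, days=30):
    """Monthly cost if the cluster runs only while jobs are active."""
    return nodes * hourly_rate_per_node * job_hours_per_day * days


if __name__ == "__main__":
    # Hypothetical workload: 10 nodes at $0.50 per node-hour,
    # with a nightly ingestion window of 2 hours.
    print("always on:  ", always_on_cost(10, 0.50))
    print("pay per use:", pay_per_use_cost(10, 0.50, 2))
```

With these made-up inputs, the on-demand cluster costs one twelfth of the always-on cluster, simply because it runs 2 hours a day instead of 24.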

Quite simply, cloud-based data ingestion makes the benefits of Big Data available to virtually every company, at a fraction of the cost and on a pay-per-use basis.

The RCG Cloud Edge

New services and frameworks such as RCG|enable™ Data simplify cloud-based data ingestion.

RCG|enable™ Data is a service that delivers a data ingestion platform that’s compatible with many different open source products and technologies. It runs on the gateway on the edge node and enables data ingestion, which is why we call it RCG|enable™ Data.

With RCG|enable™ Data you can connect to a wide range of different technologies and data formats, such as:

  • Relational databases (Oracle, Microsoft SQL Server, etc.)
  • Generic data formats
  • Hadoop data formats
  • Industry-specific EDI formats
  • Hive and Impala data types
  • CSV files

You can use RCG|enable™ Data with structured, semi-structured, and unstructured data. And you can perform tasks with a drag-and-drop user interface that enables quick and easy management across all these different technologies. The native MPP (massively parallel processing) capabilities of the cluster are then used to run the ingestion jobs, whether through MapReduce, YARN, or Spark.

The cloud now makes the potential of Big Data available and affordable to businesses of all sizes. And tools such as RCG|enable™ Data provide a means of harnessing that potential.

Early Adoption is Costly

Early adopters of technology typically pay a pretty penny to play in a new sandbox. Early-stage technology is expensive; that’s the way it has always been.

Not many businesses could afford to invest millions in a room-sized UNIVAC computer in the 1950s, for example. Those that were able to afford the revolutionary technology could benefit greatly. Others simply had to wait for the day when the new technology became more affordable.

But that day always comes. New technology always becomes more affordable, while also becoming more capable.

And thanks to the advent of cloud-based data ingestion and tools such as RCG|enable™ Data, that day has now come for the age of Big Data.

Works Cited

1. Forbes. (2015, September 30). "Big Data: 20 Mind-Boggling Facts Everyone Must Read." Retrieved from https://www.forbes.com/sites/bernardmarr/2015/09/30/big-data-20-mind-boggling-facts-everyone-must-read