Redshift & Data Lake
In this article, I cover the Redshift features that help integrate a data warehouse with a data lake. We will discuss Redshift Spectrum and the UNLOAD command, along with sample implementations of each.
Overview
Amazon Redshift is one of the most popular cloud-based data warehousing systems today. It's easy to onboard, easy to scale, and easy to manage. But that's not all. It's just as easy to integrate its data with a data lake.
As we already know, the data lake keeps raw data, whereas the data warehouse keeps highly structured, quality-checked data. Many organizations follow a model where both systems coexist.
A subset of the data is moved to the data warehouse, where it is refined and stored. Business teams use this refined data for traditional analytics, which is difficult to achieve directly on the data lake, and they use the data lake for advanced analytics like machine learning. Once the data is refined in the data warehouse, it can also be brought back to the data lake so that it can be included in advanced analytics.
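To make the last step of that round trip concrete, here is a minimal sketch of how refined data might be sent back to the data lake with Redshift's UNLOAD command. It only builds the SQL text and does not connect to a cluster. The helper `build_unload_sql`, the table `sales_refined`, the bucket path, and the IAM role ARN are all hypothetical names chosen for illustration.

```python
def build_unload_sql(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return an UNLOAD statement that writes a query result to S3 as Parquet."""
    # UNLOAD takes the query as a quoted string, so any single quote
    # inside the query has to be doubled.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical table, bucket, and role, for illustration only.
statement = build_unload_sql(
    query="SELECT * FROM sales_refined WHERE region = 'EU'",
    s3_prefix="s3://example-data-lake/refined/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The generated statement would then be run against the cluster with whatever SQL client you already use. Parquet is chosen here as an assumption, since columnar files are a common fit for data lake query engines.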