Building a Lake House on AWS
In this post I'll discuss how we can build the lake house architecture on AWS. Whether you are a data scientist, a business analyst, or an application developer, understanding the lake house architecture will help you derive insights from your data more efficiently.
This content is inspired by a post on the AWS blog, which you can access here.
Background
To understand the lake house architecture, we must first understand its key component: the data lake. Let's start with our very first question:
Why do we need a data lake?
Data from various sources is moved into the data lake, which serves as a single repository for analysis across those sources. The data lake ingests data in all its forms: structured, semi-structured, and unstructured.
A data lake is best suited to storing all types of data at scale, and it supports a range of analytics use cases, including big data projects, machine learning models, and real-time analytics.
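To make the "single repository for all forms of data" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: a temporary local directory stands in for the data lake (on AWS the storage layer is typically Amazon S3), and the `ingest` and `list_lake` helpers, the `raw/<source>/<name>` layout, and the source names are all hypothetical, not part of any AWS API.

```python
import json
import tempfile
from pathlib import Path


def ingest(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store a payload unchanged under <lake_root>/raw/<source>/<name>."""
    target = lake_root / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def list_lake(lake_root: Path) -> list:
    """Return every stored object as a sorted list of keys relative to the lake root."""
    return sorted(
        p.relative_to(lake_root).as_posix()
        for p in lake_root.rglob("*")
        if p.is_file()
    )


# Assumption: a temporary local directory plays the role of the data lake.
lake = Path(tempfile.mkdtemp(prefix="demo_lake_"))

# Structured data: tabular rows exported as CSV.
ingest(lake, "orders_db", "orders.csv", b"order_id,amount\n1,19.99\n2,5.00\n")

# Semi-structured data: a JSON event with a flexible schema.
event = {"user": "u1", "action": "view"}
ingest(lake, "clickstream", "event.json", json.dumps(event).encode("utf-8"))

# Unstructured data: opaque bytes, such as an audio recording.
ingest(lake, "support", "call_recording.bin", bytes([0, 1, 2, 3]))

keys = list_lake(lake)
print(keys)
```

All three payloads land in the same repository without being transformed first, which is the property that lets later analysis span multiple sources.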
Even though the data lake has become the single source for data analytics in many organizations, it's not able to replace the…