Member-only story

AWS Redshift — Automated Table Design

The article talks about the automated table design feature of AWS Redshift which takes care of data distribution for efficient storage, retrieval and fast processing

agentred
6 min readDec 2, 2021
Photo by Marcin Jozwiak on Unsplash

Data Distribution in Redshift

A typical redshift cluster has multiple nodes and slices where the data is distributed. When the query is executed, the query execution plan determines what data blocks need to be located to best run the query.

Then the data is physically moved, or redistributed. And it all happens in parallel to the execution of the query. Redistribution might involve either sending specific rows to nodes for joining or broadcasting an entire table to all of the nodes.

When you load the data into Redshift, it distributes the data rows to the compute nodes and its slices. The distribution of data depends on the distribution style we choose while creating the table. The data distribution has two goals

  • Distribute the workload uniformly among the nodes in the cluster.
  • To minimize data movement as a query runs.

If either of this is skewed, our query performance can go wrong. By choosing the best distribution style including the…

--

--

agentred
agentred

Written by agentred

Curiosity Crafted, Knowledge Unveiled !

No responses yet