AWS Redshift Best Practices

Here are some best practices for various aspects of AWS Redshift, based on my experience.

Data Loading:

The best way to load data into AWS Redshift is with the COPY command.

  • The COPY command can load data from a variety of sources, including an S3 bucket, an EMR cluster, any host that can be accessed over SSH, and a DynamoDB table.
  • COPY loads data in parallel across the cluster and stores it more efficiently than row-by-row INSERT statements (see the example below).
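
As an illustration, a minimal COPY from S3 could look like the following; the table name, bucket path, and IAM role ARN are placeholders, not values from this post:

    -- Placeholder table, bucket, and IAM role: substitute your own values
    COPY sales
    FROM 's3://my-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    FORMAT AS CSV;

Pointing COPY at a prefix containing multiple files lets Redshift split the load across slices, which is where the parallelism comes from.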
Keys:

Make sure to define the appropriate keys (sort key, distribution key) on your Redshift tables.

  • Sort Key: Amazon Redshift stores your data on disk in sorted order according to the sort key. The Amazon Redshift query optimizer uses the sort order when it determines optimal query plans.
  • Distribution Key: When you execute a query, the query optimizer redistributes the rows to the compute nodes as needed to perform any joins and aggregations. The goal in selecting a table distribution style is to minimize the impact of the redistribution step by locating the data where it needs to be before the query is executed (see the sketch after this list).
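
As a sketch of how both keys are declared, here is a hypothetical table definition; the table and column names are illustrative, not from this post:

    -- Illustrative schema: distribute on the common join column,
    -- sort on the common filter column
    CREATE TABLE sales (
        sale_id     BIGINT,
        customer_id BIGINT,
        sale_date   DATE,
        amount      DECIMAL(10,2)
    )
    DISTKEY (customer_id)
    SORTKEY (sale_date);

Distributing on customer_id co-locates rows that join on that column, while sorting on sale_date lets range-restricted queries skip blocks.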
