Here are some best practices for AWS Redshift, based on my experience.
Data Loading :
The best way to load data into AWS Redshift is the Redshift COPY command.
- The COPY command can load data from a variety of sources, including an AWS S3 bucket, an EMR cluster, or any host that can be accessed over SSH. It can load data from DynamoDB as well.
- The COPY command loads data in parallel from the source (for example, from multiple files in S3), and the data is imported and stored more efficiently than with INSERT statements.
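As a rough illustration of the points above, here is a minimal Python sketch that assembles a COPY statement for an S3 source. The helper name `build_copy_statement`, the table name `sales`, the bucket path, and the IAM role ARN are all placeholders made up for this example; the sketch only builds the SQL text, and you would still run it through whatever SQL client you use.

```python
def build_copy_statement(table: str, s3_path: str, iam_role_arn: str,
                         file_format: str = "CSV") -> str:
    """Return a Redshift COPY statement that loads `table` from an S3 prefix.

    Pointing COPY at a prefix (rather than a single file) lets Redshift
    read the files under that prefix in parallel.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format};"
    )


# All values below are hypothetical placeholders, not real resources.
statement = build_copy_statement(
    table="sales",
    s3_path="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

Splitting the input into several files under one prefix is what gives COPY something to parallelize; a single large file leaves less room for that.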
Keys :
Make sure to define keys (a sort key and a distribution key) on each Redshift table.
- Sort Key : Amazon Redshift stores your data on disk in sorted order according to the sort key. The Amazon Redshift query optimizer uses sort order when it determines optimal query plans.
- Distribution Key : When you execute a query, the query optimizer redistributes rows to the compute nodes as needed to perform any joins and aggregations. The goal in selecting a table distribution style is to minimize the impact of this redistribution step by locating the data where it needs to be before the query is executed.
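To make the two kinds of keys concrete, here is a small Python sketch that generates a CREATE TABLE statement with a distribution key and a sort key. The helper `build_create_table` and the `sales` table with its columns are invented for illustration; which columns make good keys depends on your own join and filter patterns, so treat the choices here as assumptions.

```python
def build_create_table(table: str, columns: dict[str, str],
                       dist_key: str, sort_keys: list[str]) -> str:
    """Return Redshift DDL for `table` with a DISTKEY and a SORTKEY."""
    # Guard against typos: every key must be one of the declared columns.
    for key in [dist_key, *sort_keys]:
        if key not in columns:
            raise ValueError(f"unknown key column: {key}")

    column_lines = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in columns.items()
    )
    return (
        f"CREATE TABLE {table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({', '.join(sort_keys)});"
    )


# Hypothetical table: distribute on the join column, sort on the filter column.
ddl = build_create_table(
    table="sales",
    columns={"sale_id": "BIGINT", "customer_id": "BIGINT", "sale_date": "DATE"},
    dist_key="customer_id",
    sort_keys=["sale_date"],
)
print(ddl)
```

In this sketch the distribution key is the column assumed to be used in joins, so matching rows would already sit on the same node, and the sort key is the column assumed to be used in date-range filters.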