The Data Samurai

Showing posts from July, 2016

AWS Redshift Best Practices

Here are the best practices for various aspects of AWS Redshift, based on my experience.

Data Loading: The best way to load data into AWS Redshift is the Redshift COPY command. COPY can load data from a variety of sources, including an AWS S3 bucket, an EMR cluster, or any remote host that can be accessed using SSH. It can load data from DynamoDB as well. COPY loads data from the source in parallel, and the data is imported and stored more efficiently than with the INSERT command.

Keys: Make sure to add keys (a sort key and a distribution key) to each Redshift table.

Sort Key: Amazon Redshift stores your data on disk in sorted order according to the sort key. The Amazon Redshift query optimizer uses sort order when it determines optimal query plans.

Distribution Key: When you execute a query, the query optimizer redistributes the rows to the compute nodes as needed to perform any joins and aggregations. The goal in selecting a tabl
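To make the loading and key advice concrete, here is a minimal sketch that only builds the SQL text (it does not connect to a cluster). The table `sales`, its columns, the bucket `s3://example-bucket/sales/part_`, and the IAM role ARN are all hypothetical placeholders. The DDL declares a distribution key and a sort key as table attributes, and the COPY statement points at an S3 key prefix so that Redshift can load every matching file in parallel, assuming the input has been split into several gzipped CSV files.

```python
# Sketch: build Redshift DDL with keys, plus a COPY statement for bulk loading.
# All table, column, bucket, and role names below are hypothetical.


def create_table_ddl(table, columns, dist_key, sort_keys):
    """Return a CREATE TABLE statement with a distribution key and sort key."""
    if dist_key not in columns:
        raise ValueError(f"distribution key {dist_key!r} is not a column")
    missing = [k for k in sort_keys if k not in columns]
    if missing:
        raise ValueError(f"sort key columns not defined: {missing}")
    col_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {table} (\n    {col_defs}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({', '.join(sort_keys)});"
    )


def copy_statement(table, s3_prefix, iam_role_arn):
    """Return a COPY statement that loads all gzipped CSV files under s3_prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"GZIP;"
    )


ddl = create_table_ddl(
    "sales",
    {
        "sale_id": "BIGINT",
        "customer_id": "INTEGER",
        "sale_date": "DATE",
        "amount": "DECIMAL(12,2)",
    },
    dist_key="customer_id",   # rows with the same customer land on the same slice
    sort_keys=["sale_date"],  # date-range filters can skip blocks outside the range
)
copy_sql = copy_statement(
    "sales",
    "s3://example-bucket/sales/part_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)

print(ddl)
print(copy_sql)
```

The key choices here are illustrative: a column that is frequently joined on is a common candidate for the distribution key, and a column that is frequently filtered by range (such as a date) is a common candidate for the sort key.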

AWS Aurora Performance Review

AWS Aurora is the only PaaS offering for a relational DBMS based on the MySQL platform. In many ways, Aurora is a game changer for many companies. Having worked my way through its different aspects, here are my observations:

Good:
- Almost full MySQL compatibility.
- Scaling up is painless.
- A cheaper and better alternative to other RDBMSs.
- All the benefits of any PaaS offering.
- Read speed can be increased by adding read replicas across multiple Availability Zones (a multi-AZ configuration).

Bad:
- Write speed is poor compared to read speed.
- The lack of bulk import functionality makes data ingestion painful.

Overall: It is a very good alternative to other RDS instances (SQL Server / Oracle), and a cost-effective, better alternative to DynamoDB if you have structured data that a traditional DBMS can handle.
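Getting the read-speed benefit in practice means sending reads to the replicas. An Aurora cluster exposes a writer (cluster) endpoint and a reader endpoint that spreads connections across the replicas. The sketch below, a simple routing rule under assumed conditions rather than a production-ready proxy, picks an endpoint per statement. The hostnames are hypothetical, and the keyword-based classification is an assumption that a real application would likely replace with explicit read/write connection pools.

```python
import re

# Hypothetical endpoint hostnames; substitute your cluster's writer and reader endpoints.
WRITER_ENDPOINT = "mycluster.cluster-example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-example.us-east-1.rds.amazonaws.com"

# Statements that start with these keywords are treated as read-only.
_READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
# Locking reads must run on the writer even though they start with SELECT.
_LOCKING = re.compile(r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE)


def endpoint_for(sql: str, in_transaction: bool = False) -> str:
    """Route plain reads to the reader endpoint and everything else to the writer."""
    if in_transaction:
        # Keep a transaction on one connection so it sees its own writes.
        return WRITER_ENDPOINT
    if _READ_ONLY.match(sql) and not _LOCKING.search(sql):
        return READER_ENDPOINT
    return WRITER_ENDPOINT


print(endpoint_for("SELECT * FROM orders WHERE id = 7"))
print(endpoint_for("INSERT INTO orders (id) VALUES (8)"))
```

Replicas apply changes asynchronously, so a read routed to the reader endpoint immediately after a write may not see that write yet. Routing everything inside a transaction to the writer is one conservative way to avoid that within a single unit of work.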