About Me

Rakesh Ghodasara

I am a database professional with a passion for solving data challenges. I have experience working on end-to-end data implementations, including but not limited to OLTP, OLAP, Big Data and cloud migration.

I currently work in the Greater Denver Area as a Principal Big Data Engineering Architect at an industry leader in the IIoT (Industrial IoT) space.

I live in a southwestern suburb of Denver with my wife and two lovely daughters.

Thanks for visiting my site. Please post your feedback, suggestions and questions in the comments below, and I would be more than happy to check them out.

https://www.linkedin.com/in/rakeshghodasara


Creating a UUID function in Redshift

Data hotspots hurt performance in any distributed data processing environment or engine. This holds true for Hadoop, MPP columnar databases and other systems. One way to avoid hotspots is to use UUIDs to generate unique IDs. As Wikipedia defines it, "A UUID is simply a 128-bit value. The meaning of each bit is defined by any of several variants."

By default there is no UUID function in AWS Redshift. However, with a Python UDF you can easily create one.

If you want a random (version 4) UUID:

CREATE OR REPLACE FUNCTION public.fn_uuid()
RETURNS character varying AS '
  import uuid
  return uuid.uuid4().__str__()
' LANGUAGE plpythonu VOLATILE;

If you want a sequential (time-based, version 1) UUID:

CREATE OR REPLACE FUNCTION public.fn_uuid()
RETURNS character varying AS '
  import uuid
  return uuid.uuid1().__str__()
' LANGUAGE plpythonu VOLATILE;

Both definitions use the same name and signature, so running the second one replaces the first; create only the variant you need.
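To check what the two UDF bodies return without a Redshift cluster, the Python sketch below runs the same uuid calls locally and reports their version numbers. It also gives a rough picture of why random keys avoid hotspots. slice_counts is a hypothetical helper that assigns keys to slices using MD5 modulo the slice count; this is only a stand-in assumption, not Redshift's actual distribution hash.

```python
import hashlib
import uuid
from collections import Counter


def fn_uuid_random() -> str:
    # Same expression as the random UDF body: a version-4 (random) UUID as text.
    return uuid.uuid4().__str__()


def fn_uuid_time_based() -> str:
    # Same expression as the "sequential" UDF body: a version-1 UUID built
    # from a timestamp, a clock sequence and a node (host) identifier.
    return uuid.uuid1().__str__()


def slice_counts(keys, num_slices=4):
    # Hypothetical stand-in for hash distribution: MD5 of the key modulo the
    # number of slices. Redshift's real distribution hash is different.
    counts = Counter()
    for key in keys:
        digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
        counts[digest % num_slices] += 1
    return counts


if __name__ == "__main__":
    r = fn_uuid_random()
    t = fn_uuid_time_based()
    print(r, uuid.UUID(r).version)  # version 4
    print(t, uuid.UUID(t).version)  # version 1

    # A skewed key: 9,000 of 10,000 rows share one value, so one slice
    # receives at least 9,000 rows.
    skewed = ["customer-1"] * 9000 + [f"customer-{i}" for i in range(2, 1002)]
    print("skewed key:", sorted(slice_counts(skewed).values()))

    # Random UUIDs spread the same number of rows roughly evenly.
    random_ids = [fn_uuid_random() for _ in range(10000)]
    print("uuid4 key: ", sorted(slice_counts(random_ids).values()))
```

With the skewed key, one slice holds at least 9,000 of the 10,000 rows, while the random UUID keys land close to 2,500 rows per slice.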

Create Strip, LStrip, RStrip Functions in Redshift

Suppose a numeric value is stored as a string with leading zeros, such as '0000123', and you want to store, operate on, aggregate or join it as a number. Redshift's built-in LTRIM, RTRIM and BTRIM functions accept an optional set of characters to remove; as an alternative, you can create the following strip UDFs, which wrap Python's string strip methods, and use them instead.

lstrip: strips leading instances of a character from a string.

CREATE OR REPLACE FUNCTION public.fn_lstrip(str_in character varying, a character)
RETURNS character varying AS '
  try:
    return str_in.lstrip(a)
  except:
    return None
' LANGUAGE plpythonu VOLATILE;

For example, select public.fn_lstrip('00001234','0') returns '1234'.

rstrip: strips trailing instances of a character from a string.

CREATE OR REPLACE FUNCTION public.fn_rstrip(str_in character varying, a character)
RETURNS character varying AS '
  try:
    return str_in.rstrip(a)
  except:
    return None...
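The sketch below reproduces the lstrip and rstrip UDF bodies as plain Python functions so their behaviour can be checked locally, including how a SQL NULL (passed to Python as None) comes back as None. It uses a narrower except clause than the UDFs' bare except. The zero_padded_to_int helper is hypothetical, added only to illustrate the final step of turning '0000123' into a number.

```python
def fn_lstrip(str_in, a):
    # Same logic as the lstrip UDF body: drop leading occurrences of the
    # character(s) in a. A SQL NULL reaches Python as None, and None.lstrip
    # raises AttributeError, which is turned back into None (NULL).
    try:
        return str_in.lstrip(a)
    except (AttributeError, TypeError):
        return None


def fn_rstrip(str_in, a):
    # Same idea as the rstrip UDF body, for trailing characters.
    try:
        return str_in.rstrip(a)
    except (AttributeError, TypeError):
        return None


def zero_padded_to_int(value):
    # Hypothetical helper: strip leading zeros, then convert to int.
    stripped = fn_lstrip(value, "0")
    if stripped is None:
        return None
    # '0000' strips to '', which int() rejects, so treat it as zero.
    return int(stripped) if stripped else 0


if __name__ == "__main__":
    print(fn_lstrip("00001234", "0"))     # 1234
    print(fn_rstrip("1234000", "0"))      # 1234
    print(fn_lstrip(None, "0"))           # None
    print(zero_padded_to_int("0000123"))  # 123
    print(zero_padded_to_int("0000"))     # 0
```

One design choice to watch: an all-zero string such as '0000' strips to an empty string, so the helper maps that case to 0 rather than failing on int('').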

AWS Aurora Bulk Load Performance Issues - Resolved

We had performance issues when bulk loading data into AWS Aurora. The bulk load performance was so bad that pushing around 2 million rows into Aurora was nearly worthless: we were inserting about 1,000 records per second. This was much worse than other MySQL-compatible databases such as MySQL and MariaDB. However, a few parameter tweaks resolved most of the performance issues we faced in the bulk load.

The solution is to add two parameters to the Aurora JDBC connection used for the bulk load:

useServerPrepStmts=false
rewriteBatchedStatements=true

useServerPrepStmts=false makes the driver prepare statements on the client side instead of on the server. With rewriteBatchedStatements=true, MySQL Connector/J rewrites a batch of single-row INSERT statements into multi-row INSERT statements when executeBatch() is called, so far fewer round trips reach the server.

Your full JDBC connection string should look like "jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", passed along with "username" and "password".

Once we changed these parameters, the performance was blazing fast. We were able to load the 2 million rows in a flat 3 minutes. The Aurora server used in the benchmark was r3....
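To make the effect of rewriteBatchedStatements=true more concrete, the Python sketch below shows the idea of collapsing a batch of single-row INSERTs into one multi-row INSERT, and counts how many statements a 2 million row load sends at different batch sizes. rewrite_batch, sql_literal and statements_needed are hypothetical helpers; this is not how Connector/J is implemented, and the quoting is deliberately simplified.

```python
def sql_literal(value):
    # Simplified quoting for illustration only: NULL, bare numbers, and
    # single-quoted strings with embedded quotes doubled. It does NOT handle
    # backslash escapes, so never use it to build real SQL.
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def rewrite_batch(table, columns, rows):
    # Collapse many single-row INSERTs into one multi-row INSERT, which is the
    # idea behind rewriteBatchedStatements=true (hypothetical helper, not the
    # driver's code).
    column_list = ", ".join(columns)
    value_groups = [
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    ]
    return f"INSERT INTO {table} ({column_list}) VALUES " + ", ".join(value_groups)


def statements_needed(total_rows, batch_size):
    # Ceiling division: number of INSERT statements sent to the server when
    # each statement carries up to batch_size rows.
    return -(-total_rows // batch_size)


if __name__ == "__main__":
    rows = [(1, "alpha"), (2, "beta"), (3, None)]
    print(rewrite_batch("events", ["id", "label"], rows))
    # INSERT INTO events (id, label) VALUES (1, 'alpha'), (2, 'beta'), (3, NULL)

    # One statement per row versus 1,000 rows per rewritten statement.
    for batch_size in (1, 1000):
        print(batch_size, statements_needed(2_000_000, batch_size))
```

For reference, 2 million rows in 3 minutes works out to roughly 11,000 rows per second, compared with about 1,000 rows per second before the change.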

The Data Samurai
