Creating a UUID function in Redshift

Data hotspots hurt performance in any distributed data processing environment or engine. This holds true for Hadoop, MPP columnar databases, and other distributed databases. One way to avoid hotspots is to use UUIDs to generate unique IDs.

As defined by Wikipedia: "A UUID is simply a 128-bit value. The meaning of each bit is defined by any of several variants."
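To make the 128-bit definition concrete, here is a small Python sketch (Python being the language the UDFs in this post are written in) that uses the standard uuid module to inspect one value. The variable names are only for illustration.

```python
import uuid

# Generate one random (version 4) UUID to inspect.
u = uuid.uuid4()

# The canonical text form is 32 hex digits plus 4 hyphens.
hex_digits = str(u).replace("-", "")
total_bits = len(hex_digits) * 4   # each hex digit carries 4 bits

print(total_bits)      # 128
print(len(u.bytes))    # 16 bytes, the same 128 bits
print(u.version)       # 4 for uuid4(); uuid1() would report 1
print(u.variant == uuid.RFC_4122)  # the variant says how to read the remaining bits
```

The version and variant fields are the part of the value that tells a reader how the other bits were produced.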

By default, there is no UUID function in AWS Redshift. However, with a Python UDF you can easily create one.

If you want a random UUID:

CREATE OR REPLACE FUNCTION public.fn_uuid()
RETURNS character varying AS
$$
    # uuid4() builds the UUID from random bits
    import uuid
    return str(uuid.uuid4())
$$
LANGUAGE plpythonu VOLATILE;
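Because the UDF body is ordinary Python, it can be tried locally before the function is created in Redshift. The sketch below mirrors the body in a plain function and checks the shape of the returned string. The local fn_uuid wrapper and the regular expression are illustrative only and are not part of Redshift.

```python
import re
import uuid

def fn_uuid():
    """Local stand-in for the UDF body: return a random UUID as text."""
    return str(uuid.uuid4())

# Canonical layout of a version 4 UUID: 8-4-4-4-12 lowercase hex digits.
UUID_TEXT = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)

value = fn_uuid()
print(value)                         # a different value on every run
print(len(value))                    # 36
print(bool(UUID_TEXT.match(value)))  # True
```

Since the text form is always 36 characters long, a 36-character varchar column is wide enough to hold the result.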

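If you want a sequential (time-based) UUID, use uuid1() instead. This definition has the same name, so it replaces the random version: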

CREATE OR REPLACE FUNCTION public.fn_uuid()
RETURNS character varying AS
$$
    # uuid1() builds the UUID from the host ID and the current time
    import uuid
    return str(uuid.uuid1())
$$
LANGUAGE plpythonu VOLATILE;
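One caveat about the word "sequential": a version 1 UUID embeds a 60-bit timestamp, but its text form starts with the low 32 bits of that timestamp, so sorting the strings does not necessarily sort by creation time. The sketch below builds two version 1 UUIDs by hand to show this. The helper uuid1_from_timestamp and the node value are made up for the demonstration and are not part of the uuid module.

```python
import uuid

def uuid1_from_timestamp(ts, node=0x123456789ABC):
    """Hypothetical helper: build a version 1 UUID from a 60-bit timestamp."""
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # version 1 marker
    clock_seq_hi_variant = 0x80                        # RFC 4122 variant bits
    clock_seq_low = 0x00
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

earlier = uuid1_from_timestamp(0x1FFFFFFFF)  # low 32 bits are all ones
later = uuid1_from_timestamp(0x200000000)    # one tick later, low bits wrap to zero

print(str(earlier))  # ffffffff-0001-1000-8000-123456789abc
print(str(later))    # 00000000-0002-1000-8000-123456789abc
print(earlier.time < later.time)   # True: the timestamps are in order
print(str(earlier) < str(later))   # False: the strings sort the other way
```

If chronological order matters, it is probably safer to sort on a separate timestamp column than on the UUID text.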

Comments

Anonymous, August 24, 2016 at 11:34 AM
Phenomenal, thanks for coding this up! How would you alter a table to add this as a new column?

Rakesh G, August 24, 2016 at 12:21 PM
You cannot default the new column to a UUID (Redshift doesn't allow a UDF as a column default). The workaround is to add a new column to the table (alter table add column datatype) and then update its value with the UUID function ( update table set column=fn_uuid() ).

Unknown, December 16, 2016 at 3:03 PM
Was looking for this, didn't even think it was possible. Thanks.

MPSC, September 26, 2017 at 6:44 AM
Thanks for this code

The Data Samurai
