The Data Samurai

Creating a UUID function in Redshift

We all know the data hotspots have negative impact on the performance in any distributed data processing environment and engine. This holds true for hadoop / MPP columnar and other databases. One way to avoid hotspots is to use the UUID to generate unique Ids. As defined by wikipedia a UUID is " A UUID is simply a 128-bit value. The meaning of each bit is defined by any of several variants. " By default there is no UUID function in AWS Redshift. However with the python UDF you can easily create a UUID function in Redshift. If you want random UUID CREATE OR REPLACE FUNCTION public.fn_uuid() RETURNS character varying AS ' import uuid return uuid.uuid4().__str__() ' LANGUAGE plpythonu VOLATILE; If you want sequential UUID CREATE OR REPLACE FUNCTION public.fn_uuid() RETURNS character varying AS ' import uuid return uuid.uuid1().__str__() ' LANGUAGE plpythonu VOLATILE;

The Data Samurai

Search This Blog

Posts

Creating a UUID function in Redshift