Persist the List of Files in an External Stage in Snowflake
How to easily get the list of files in an external location such as S3 and save it in a table for querying.
AUTOMATIONSNOWFLAKE
8/1/20221 min read
In Snowflake, you can access files external to the Snowflake instance using something called a STAGE.
A STAGE is a named object that points to a location such as AWS S3, Azure Storage Account or Google Cloud Storage Buckets, and contains the required authentication credentials to access the files in that storage location.
To programmatically view what files are available in the stage, you can use the LIST command.
The problem is that the LIST command output cannot natively be persisted into a table for later viewing. It can only be executed and viewed via ad-hoc queries triggered by a user session.
Honestly, this is probably a simple thing for Snowflake to implement as a native feature, but here we are, yet again, trying to figure out workarounds to do the simplest thing in a cloud service. When will this madness ever end…
The Workaround
Snowflake provides a workaround for this in the form of utilising the RESULT_SCAN function. In the linked article, they explain the various pieces needed to create the solution below. I have simply taken those pieces and combined them in an easy-to-use set of two stored procedures.
The gist of it is
Run a JavaScript stored procedure that will execute the list command against the stage. This will return the list of files metadata as a CSV string.
The second stored procedure will call the first procedure, the use the RESULT_SCAN function to extract the results of the first procedure. It will then deconstruct the CSV list and save the records to a table.
Prerequisites
The stage will need to exist before you are able to run these stored procedures. Your user will also need to have permission to use the stage. Please contact your system administrator to arrange these permissions.
Procedure 1 - PROCEDURE_LIST_FILES_IN_STAGE
Procedure 2 - PROCEDURE_PERSIST_FILES_IN_STAGE
Use the procedures
To use the code, execute the second procedure as follows:
CALL "PROCEDURE_PERSIST_FILES_IN_STAGE(AWS_S3_STAGE)
Thank you for reading my ramblings, if you want to you can buy me a coffee here:
Social
All content on this website is my own. I create the posts here to help the community as best I can. It doesn't mean I am always correct, or the methods I show here are the best. We all change and learn as we grow, so if you see something you think I could have done better, please reach out! Lets share the knowledge and be kind to each other!