Persist the List of Files in an External Stage in Snowflake

How to easily get the list of files in an external location such as S3 and save it in a table for querying.

AUTOMATIONSNOWFLAKE

8/1/20221 min read

In Snowflake, you can access files external to the Snowflake instance using something called a STAGE.

A STAGE is a named object that points to a location such as AWS S3, Azure Storage Account or Google Cloud Storage Buckets, and contains the required authentication credentials to access the files in that storage location.

To programmatically view what files are available in the stage, you can use the LIST command.

The problem is that the LIST command output cannot natively be persisted into a table for later viewing. It can only be executed and viewed via ad-hoc queries triggered by a user session.

Honestly, this is probably a simple thing for Snowflake to implement as a native feature, but here we are, yet again, trying to figure out workarounds to do the simplest thing in a cloud service. When will this madness ever end…

The Workaround

Snowflake provides a workaround for this in the form of utilising the RESULT_SCAN function. In the linked article, they explain the various pieces needed to create the solution below. I have simply taken those pieces and combined them in an easy-to-use set of two stored procedures.

The gist of it is

  • Run a JavaScript stored procedure that will execute the list command against the stage. This will return the list of files metadata as a CSV string.

  • The second stored procedure will call the first procedure, the use the RESULT_SCAN function to extract the results of the first procedure. It will then deconstruct the CSV list and save the records to a table.

Prerequisites

The stage will need to exist before you are able to run these stored procedures. Your user will also need to have permission to use the stage. Please contact your system administrator to arrange these permissions.

Procedure 1 - PROCEDURE_LIST_FILES_IN_STAGE

Procedure 2 - PROCEDURE_PERSIST_FILES_IN_STAGE

Use the procedures

To use the code, execute the second procedure as follows:

CALL "PROCEDURE_PERSIST_FILES_IN_STAGE(AWS_S3_STAGE)

Thank you for reading my ramblings, if you want to you can buy me a coffee here:

Support Dian Germishuizen on Ko-fi! ❤️