
In Snowflake, you can access files external to the Snowflake instance using something called a STAGE.
A STAGE is a named object that points to a location such as AWS S3, Azure Storage Account or Google Cloud Storage Buckets, and contains the required authentication credentials to access the files in that storage location.
To programmatically view what files are available in the stage, you can use the LIST command.
The problem is that the LIST command output cannot natively be persisted into a table for later viewing. It can only be executed and viewed via ad-hoc queries triggered by a user session.
Honestly, this is probably a simple thing for Snowflake to implement as a native feature, but here we are, yet again, trying to figure out workarounds to do the simplest thing in a cloud service. When will this madness ever end…
The Workaround
Snowflake provides a workaround for this in the form of utilizing the RESULT_SCAN function. In the linked article, they explain the various pieces needed to create the solution below. I have simply taken those pieces and combined them in an easy-to-use set of two stored procedures.
The gist of it is
- Run a JavaScript stored procedure that will execute the list command against the stage. This will return the list of files metadata as a CSV string.
- The second stored procedure will call the first procedure, the use the RESULT_SCAN function to extract the results of the first procedure. It will then deconstruct the CSV list and save the records to a table.
Procedure 1 - PROCEDURE_LIST_FILES_IN_STAGE
CREATE OR REPLACE PROCEDURE "PROCEDURE_LIST_FILES_IN_STAGE"
(
STAGE_NAME VARCHAR
)
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
/*====================================================================
Author: Dian Germishuizen
Description: List the files available in an external or internal stage.
Return the results as a CSV delimited list, each file's data seperated by a new line character
Original Source: https://docs.snowflake.com/en/sql-reference/stored-procedures-javascript.html#using-result-scan-to-retrieve-the-result-from-a-stored-procedure
------------------------------------------------------------------------------------
Change History
-------------
Date Author Description
---------- ------------------- -------------------
2022/08/19 Dian Germishuizen Created
------------------------------------------------------------------------------------
==================================================================================*/
/* Variable to return at the end */
var return_value = "";
try {
/* Ensure the current session is in the correct database */
var usedb_stmt = snowflake.createStatement({
sqlText: 'USE SCHEMA "MY_SCHEMA_NAME";'
}).execute();
/* Make a template command that will be concatenated with the stage name passed in as parameter.
This can be combined into a single command if you want to.
Final string should look like "LIST @STAGE_NAME" */
var sqlText_template = 'list @';
var sqlText_dynamic = sqlText_template.concat("", STAGE_NAME)
/* Create an SQL Statement with the LIST command prepared */
var stmt = snowflake.createStatement({
sqlText: sqlText_dynamic
});
/* Execute the statement prepared */
var result = stmt.execute();
/* Take the results of the statement and construct the CSV string with each file from the stage on a new line */
if (result.next()) {
return_value += result.getColumnValue(1);
return_value += ", " + result.getColumnValue(2);
return_value += ", " + result.getColumnValue(3);
return_value += ", " + result.getColumnValue(4);
}
while (result.next()) {
return_value += "\n";
return_value += result.getColumnValue(1);
return_value += ", " + result.getColumnValue(2);
return_value += ", " + result.getColumnValue(3);
return_value += ", " + result.getColumnValue(4);
}
}
/* Catch errors elegantly */
catch (err) {
result = "Failed: Code: " + err.code + "\n State: " + err.state;
result += "\nMessage: " + err.message;
result += "\nStack Trace: \n" + err.stackTraceTxt;
}
/* Return the string */
return return_value;
$$;
Procedure 2 - PROCEDURE_PERSIST_FILES_IN_STAGE
CREATE OR REPLACE PROCEDURE "PROCEDURE_PERSIST_FILES_IN_STAGE"
(
STAGE_NAME VARCHAR
)
RETURNS VARCHAR
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$BEGIN
/*====================================================================
Author: Dian Germishuizen
Description: Get the files in an external stage using procedure "PROCEDURE_LIST_FILES_IN_STAGE". Then persist the output to a table.
Original Source: https://docs.snowflake.com/en/sql-reference/stored-procedures-javascript.html#using-result-scan-to-retrieve-the-result-from-a-stored-procedure
------------------------------------------------------------------------------------
Change History
-------------
Date Author Description
---------- ------------------- -------------------
2022/08/19 Dian Germishuizen Created
------------------------------------------------------------------------------------
==================================================================================*/
/*Create the output table if it doesn't exist. */
CREATE TABLE IF NOT EXISTS "FILES_AVAILABLE_IN_EXTERNAL_STAGE"
(
FILE_NAME VARCHAR(255)
, SIZE VARCHAR(255)
, MD5_HASH VARCHAR(255)
, LAST_MODIFIED_DATE_TIME VARCHAR(255)
, STAGE_NAME VARCHAR(255)
, INSERT_DATE_TIME DATETIME
);
/*Run the procedure which will return the list of files as a comma seperated value string , each file on a new line*/
CALL "PROCEDURE_LIST_FILES_IN_STAGE"(:STAGE_NAME);
/*Insert the data, splitting the string into columns and rows*/
INSERT INTO "FILES_AVAILABLE_IN_EXTERNAL_STAGE"
(
FILE_NAME
, SIZE
, MD5_HASH
, LAST_MODIFIED_DATE_TIME
, INSERT_DATE_TIME
)
WITH ONE_STRING_CTE (string_col)
AS
( /*Get the results from the previous stored procedure call */
SELECT *
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
)
, THREE_STRINGS_CTE (one_row)
AS
( /*Take the string and split it into multiple rows based on the delimiter*/
SELECT VALUE
FROM ONE_STRING_CTE, LATERAL SPLIT_TO_TABLE(ONE_STRING_CTE.string_col, '\n')
)
SELECT
/*Use STROK to extract sections of a string based on a delimiter. Extract each column value between commas. */
STRTOK(one_row, ',', 1) AS "FILE_NAME"
, STRTOK(one_row, ',', 2) AS "SIZE"
, STRTOK(one_row, ',', 3) AS "MD5_HASH"
, STRTOK(one_row, ',', 4) AS "LAST_MODIFIED_DATE_TIME"
, CURRENT_TIMESTAMP AS "INSERT_DATE_TIME"
FROM THREE_STRINGS_CTE;
END$$;
Use the procedures
To use the code, execute the second procedure as follows:
CALL ” PROCEDURE_PERSIST_FILES_IN_STAGE “(‘AWS_S3_STAGE’);
Prerequisites
The stage will need to exist before you are able to run these stored procedures. Your user will also need to have permission to use the stage. Please contact your system administrator to arrange these permissions.
If you like what I do please consider supporting me on Ko-Fi