Adding a Snowflake Connection

Prerequisites

  • In order to establish a connection with Snowflake, a user with sufficient permissions is required.
  • A route between the Zeenea scanner and the database must be open to allow traffic between the two.
note

A configuration template can be downloaded here: Snowflake.conf

Supported Versions

The Snowflake connector has been tested with the cloud version of this service.

Installing the Plugin

Since version 54 of the scanner, the Snowflake connector has been provided as a plugin.

It can be downloaded here and requires scanner version 64 or later: Zeenea Connector Downloads.

For more information on how to install a plugin, please refer to the following article: Installing and Configuring Connectors as a Plugin.

note

For users of Java 17 and later: the Snowflake driver uses an old private Java API, so a special option must be added to the scanner command line to grant access to this API.

-J--add-opens=java.base/java.nio=ALL-UNNAMED

You can pass it directly to zeenea-scanner. On Linux, it can also be added to the conf/application.ini file; create the file if it doesn't already exist.
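
For instance, a minimal conf/application.ini could contain just that option (a sketch, assuming the file simply lists one scanner option per line):

-J--add-opens=java.base/java.nio=ALL-UNNAMED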

Declaring the Connection

Creating and configuring connectors is done through a dedicated configuration file located in the /connections folder of the relevant scanner.

Read more: Managing Connections

In order to establish a connection with Snowflake, the following parameters must be specified in the dedicated file:

  • name: The name displayed to catalog users for this connection.
  • code: Unique identifier of the connection on the Zeenea platform. Once registered on the platform, this code must not be modified, or the connection will be considered as new and the old one removed from the scanner.
  • connector_id: The type of connector to use for the connection. Here, the value must be Snowflake and must not be modified.
  • connection.url: Database address. The db and warehouse parameters are mandatory to define the connection entry point.
    Example: jdbc:snowflake://kn12345.eu-west-1.snowflakecomputing.com/?db=SNOWFLAKE_DATA&warehouse=SMALL_COMPUTE_WH
  • connection.username: Username.
  • connection.password: User password. (Optional when using key pair authentication.)
  • connection.privateKeyPath: Full path to the PKCS#8 private key file (.p8) used for key pair authentication.
  • connection.passphrase: Private key passphrase.
  • dataClassification.enabled: Enables retrieval of the "Privacy Category" and "Semantic Category" metadata produced by the Snowflake data classification feature.
  • proxy.scheme: http or https, depending on the proxy.
  • proxy.hostname: Proxy address.
  • proxy.port: Proxy port.
  • proxy.username: Proxy username.
  • proxy.password: Proxy account password.
  • cache.enabled: Enables the cache. When the cache is active, the schema update performs four queries per database instead of four per imported table, which is more efficient.
  • cache.folder: Folder where caches are stored. The same folder can be used by several connections. The size of the cache file depends on the number of tables in the database (not on the number of tables imported into Zeenea). If no folder is specified, the cache is kept in memory.
  • cache.ttl: Cache validity period (default: 12h). As long as the cache is valid, the queries that fill it are not executed.
  • lineage.view.enabled: Enables the view lineage feature. Default value: false.
  • lineage.pipe.enabled: Enables the Snowpipe lineage feature. Default value: false.
  • lineage.history.enabled: Enables the execution history lineage feature. Default value: false.
  • lineage.history.warehouse: Name of the warehouse where the lineage queries are executed.
  • lineage.history.period: Number of days of query history to analyze, counted back from the start of the extraction.
  • filter: Filters datasets during the inventory. See Rich Filters.
  • multi_catalog.enabled: Activates the multi-database connection configuration. Default value: false. Multi catalog mode makes the connector incompatible with the lineage features available from other connectors such as dbt, Tableau, PowerBI SaaS, etc.
  • multi_account.enabled: (Plugin v73 and higher.) Activates multi account support (true | false). Default: false. When multi account is activated, multi catalog mode is activated as well, overriding the multi_catalog.enabled parameter to true. Warning: this parameter should not be changed after datasets have been imported, because it changes the way they are identified.
  • multi_account.list: (Plugin v73 and higher. Optional.) List of accounts to be inventoried: Snowflake account identifiers separated by spaces, in the form <organization>-<account> (see the Snowflake documentation). If unset, the organization's account list is retrieved and used.
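
For reference, a minimal connection file combining the mandatory parameters above might look like this (a sketch with illustrative values, reusing the example URL; adapt the name, code, and credentials to your environment):

name = "Snowflake"
code = "snowflake"
connector_id = "Snowflake"
connection {
  url = "jdbc:snowflake://kn12345.eu-west-1.snowflakecomputing.com/?db=SNOWFLAKE_DATA&warehouse=SMALL_COMPUTE_WH"
  username = "zeenea_scanner"
  password = "********"
}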

note

A template of the configuration file is available in this repository.

User Permissions

In order to collect metadata, the running user must have read-only access to the databases that need cataloging.

If the data profiling feature is enabled, the user must also have read access to the tables concerned. Otherwise, this permission is not necessary.
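
As an illustration, a typical read-only grant set for a database to catalog could look like the following (a sketch assuming a hypothetical MY_DB database and the SCANNER role used in the statements below):

GRANT USAGE ON DATABASE MY_DB TO ROLE SCANNER
GRANT USAGE ON ALL SCHEMAS IN DATABASE MY_DB TO ROLE SCANNER
GRANT SELECT ON ALL TABLES IN DATABASE MY_DB TO ROLE SCANNER
GRANT SELECT ON ALL VIEWS IN DATABASE MY_DB TO ROLE SCANNER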

If a lineage feature is enabled, the user must have the following rights:

GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE SCANNER

If the Snowpipe lineage feature is enabled, the user must have MONITOR rights on each Snowpipe.

GRANT MONITOR ON PIPE [Snowpipe name] TO ROLE SCANNER

Rich Filters

Since version 47 of the scanner, the Snowflake connector supports rich filters in its configuration.

Read more: Filters

Multi Account Support

The Snowflake connector supports a multi-account mode.

When activated, the list of accounts to scan can be determined in two ways:

  • The list can be provided through the multi_account.list parameter, as Snowflake account identifiers separated by spaces. Account identifiers must be in the form <organization>-<account> (see the Snowflake documentation).

    Example:

    multi_account.list = "abcdefg-kn67972 abcdefg-kh90823"

  • If no list is provided, the connector executes the following SQL query to get the accounts available in the organization:

    SELECT ACCOUNT_LOCATOR, REGION
    FROM SNOWFLAKE.ORGANIZATION_USAGE.ACCOUNTS

Authentication

If all accounts can be accessed with the same username and password, you can define the credentials the same way as for a single account connection.

Otherwise, it is necessary to configure them in a secret manager so that the Snowflake connector can authenticate on each account.

The parameters are the same as those used in the connection file; they just need to be prefixed with the account identifier:

  • "<account identifier>.url"
  • "<account identifier>.username"
  • "<account identifier>.password"
  • "<account identifier>.privateKeyPath"
  • "<account identifier>.passphrase"

If the key is not found in the secret for a given account, the connector uses the value defined in the Snowflake configuration file.

Example with Password

For example, let's consider a secret being defined this way:

my-secret {
  "abcdefg-kn67972.username": "username1",
  "abcdefg-kn67972.password": "password1",
  "abcdefg-kn67972.url": "jdbc:snowflake://abcdefg-kn67972.snowflakecomputing.com/?db=ZEENEA_PS_TEST&role=CNTY&warehouse=SMALL_COMPUTE_WH",
  "abcdefg-kh90823.username": "username2",
  "abcdefg-kh90823.password": "password2",
  "abcdefg-kh90823.url": "jdbc:snowflake://abcdefg-kh90823.snowflakecomputing.com/?db=ZEENEA_PS_TEST&role=CNTY&warehouse=SMALL_COMPUTE_WH"
}

and a Snowflake configuration file like this one:

name = "snowflake"
code = "snowflake"
connector_id = "Snowflake"
enabled = true
secret_manager {
  enabled = true
  key = "my-secret"
}
connection {
  url = ${secret_manager.abcdefg-kn67972.url}
  username = ${secret_manager.abcdefg-kn67972.username}
  password = ${secret_manager.abcdefg-kn67972.password}
}
multi_account.enabled = true

In such a case, the Snowflake connector uses the connection information provided here to execute the SQL query listing all the accounts, then fetches the values associated with each account from the Secret Manager to query Snowflake and collect metadata.

Example with PKCS#8 Key

Here is another example of a secret using PKCS#8 keys:

my-secret {
  "abcdefg-kn67972.username": "kn67972_user",
  "abcdefg-kn67972.privateKeyPath": "/etc/zeenea/keys/zeenea-kn67972.p8",
  "abcdefg-kn67972.passphrase": "kn67972_key_passphrase",
  "abcdefg-kn67972.url": "jdbc:snowflake://abcdefg-kn67972.snowflakecomputing.com/?db=ZEENEA_PS_TEST&role=CNTY&warehouse=SMALL_COMPUTE_WH",
  "abcdefg-kh90823.username": "kh90823_user",
  "abcdefg-kh90823.privateKeyPath": "/etc/zeenea/keys/zeenea-kh90823.p8",
  "abcdefg-kh90823.passphrase": "kh90823_key_passphrase",
  "abcdefg-kh90823.url": "jdbc:snowflake://abcdefg-kh90823.snowflakecomputing.com/?db=ZEENEA_PS_TEST&role=CNTY&warehouse=SMALL_COMPUTE_WH"
}

With its configuration file:

# ...
secret_manager {
  enabled = true
  key = "my-secret"
}
connection {
  url = ${secret_manager.abcdefg-kn67972.url}
}
multi_account {
  enabled = true
  list = "abcdefg-kn67972 abcdefg-kh90823"
}

In that case, the connector walks through the list of accounts and fetches the credentials from the secret manager. However, setting a connection URL in the connection file is still mandatory.

Data Extraction

To extract information, the connector runs queries against the information_schema.tables view. The connector also uses the results of the following commands:

  • show columns in account
  • show primary keys in account
  • show imported keys in account

Collected Metadata

Inventory

The inventory collects all tables and views accessible by the user.

Lineage

Starting with version 44 of the plugin, the Snowflake connector includes lineage functionality to identify and represent the origin of the data. It can be activated through the configuration settings and covers several use cases:

  • Views: The creation of views in Snowflake can be represented automatically in Zeenea. A data process object is created for each view, with the tables used for its construction as input and the target view as output.
  • Snowpipe: Snowpipes are modeled in Zeenea when data is ingested from external sources. Datasets from other types of systems must be imported through the corresponding connectors; the Snowflake connector identifies them and makes the necessary link through a data process created in the catalog for each Snowpipe of the platform.
  • (BETA) Execution history: The connector analyzes query executions to identify data insertions from other tables of the same Snowflake instance. Each such query is represented in the catalog as a data process.
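
For reference, here is a sketch of the corresponding settings from the parameter table above (illustrative warehouse name and analysis period):

lineage.view.enabled = true
lineage.pipe.enabled = true
lineage.history.enabled = true
lineage.history.warehouse = "SMALL_COMPUTE_WH"
lineage.history.period = 7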

Dataset

Here, a dataset can be a table or a view.

  • Name
  • Source Description
  • Technical Data:
    • Catalog: Database catalog
    • Last Alter Time: Last time the table was updated
    • Schema: Database schema
    • Table: Table name
    • Type:
      • base table
      • temporary table
      • view

Field

Dataset field.

  • Name
  • Source Description
  • Type
  • Can be null: Depending on the field settings
  • Multivalued: Not supported. Default value FALSE.
  • Primary Key: Depending on the field's "Primary Key" attribute
  • Technical Data:
    • Technical Name: field technical name
    • Native type: field native type
    • Privacy Category
    • Semantic Category

Data Process

A data process can represent a Snowpipe, a view construction query, or a data insertion.

  • Name
  • Source Description
  • Snowpipe Technical data:
    • Is Auto-ingest Enabled
    • Definition
    • Last Forwarded File Path
    • Notification Channel Name

Data Profiling

IMPORTANT

The Data Profiling feature, which can be enabled on this connection, allows your Explorers to get a better grasp of the type of data stored in each field. The feature, activated in the Scanner, runs by default on a weekly basis, every Saturday. However, depending on the number of fields you've activated it for, the computation can quickly become costly. Please make sure the estimated impact of this feature is acceptable and the default frequency appropriate before enabling it.

The statistical profiles feature, also named "data profiling", is available for this connector. The impact of this feature must be evaluated before activating it on any of your connections. You can find more information about the resulting statistics here: Data Profiling.

Read access to the targeted tables is mandatory to activate the feature. For Snowflake, the connector executes the following queries to build a data sample:

SELECT COUNT(*) AS result FROM tableName

The query above returns the number of rows in the table tableName.

SELECT field1, field2 FROM tableName TABLESAMPLE (linesPercentage)

The query above collects a data sample for each field where the feature is activated through the Studio (field1, field2). The sample is limited to 10,000 rows: the linesPercentage parameter is deduced from the row count returned by the previous query (for example, roughly 1, i.e. a 1% sample, for a table of 1,000,000 rows).

These queries are executed either manually, upon user action directly in the admin portal, or periodically, according to the collect-fingerprint parameter of the application.conf file, as described here: Zeenea Scanner Setup.

Data Classification

Since version 43 of the scanner, the connector is compatible with the data classification functionality offered by Snowflake. To benefit from the Semantic Category and Privacy Category metadata, the user account configured in the scanner for the connection must have read access to the snowflake.account_usage.tag_references view.
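
Access to the snowflake.account_usage views is typically obtained through imported privileges on the shared SNOWFLAKE database, as in the User Permissions section above (a sketch, assuming the SCANNER role from the earlier examples):

GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE SCANNER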

You can find the details of how this feature works directly in the Snowflake documentation.

Object Identification Keys

An identification key is associated with each object in the catalog. When an object is created by a connector, the connector builds this key.

More information about how it works can be found here: Identification Keys.

  • Dataset: code/schema/dataset name
    • code: Unique identifier of the connection noted in the configuration file
    • schema: Dataset schema
    • dataset name
  • Field: code/schema/dataset name/field name
    • code: Unique identifier of the connection noted in the configuration file
    • schema: Dataset schema
    • dataset name
    • field name
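
For example, with a connection whose code is snowflake (as in the configuration examples above), a hypothetical CUSTOMERS table in the PUBLIC schema would be identified as snowflake/PUBLIC/CUSTOMERS, and its EMAIL field as snowflake/PUBLIC/CUSTOMERS/EMAIL.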