Adding an Elasticsearch Connection
Prerequisites
- In case the Elasticsearch security has been enabled, the connector must reference a user with sufficient permissions.
- Zeenea traffic flows towards the base must be opened.
A link to the configuration template can be found here: Zeenea Connector Downloads.
Supported Versions
The Elasticsearch module has been tested with versions 5, 6 and 7.
Its behavior will differ based on the version, as to adapt itself to the changes in the Elasticsearch modeling (namely, the removal of mapping types). For instance, when cataloging on a version 5 instance, the dataset is the mapping type; whereas for versions 6 and 7, the dataset is the index. However, version 6 still displays the type, a unique value but whose name might have been customized.
Installing the Plugin
Since scanner version 26.9, the Elasticsearch plugin can be downloaded here: Zeenea Connector Downloads.
For more information on how to install a plugin, please refer to the following article: Installing and Configuring Connectors as a Plugin.
Declaring the Connection
Creating and configuring connectors is done through a dedicated configuration file located in the /connections
folder of the relevant scanner.
Read more: Managing Connections
In order to establish a connection with Elasticsearch, specifying the following parameters in the dedicated file is required:
Parameter | Expected value |
---|---|
name | The name that will be displayed to catalog users for this connection. |
code | The unique identifier of the connection on the Zeenea platform. Once registered on the platform, this code must not be modified or the connection will be considered as new and the old one removed from the scanner. |
connector_id | The type of connector to be used for the connection. Here, the value must be Elasticsearch and this value must not be modified. |
connection.protocol | Connection protocol to the cluster. Possible values are https or http . |
connection.nodes | A list of cluster connection nodes separated with spaces. Each node is shown using the Examples
This list does not have to be complete, but it is recommended to have multiple nodes. |
tls.trust_store.type | Trust Store type. Possible values are pkcs12 or jks . |
tls.trust_store.path | This file must be provided in case TLS encryption is activated (protocol https) and when certificates of Elasticsearch servers are delivered by a specific authority. It must contain the certification chain. See https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-tls.html#node-certificates. |
tls.trust_store.password | Trust Store password |
connection.username | Name of the Elasticsearch user used to connect to Zeenea. The user must have sufficient permissions. |
connection.password | User password |
inventory.index_partition.pattern | The index partition pattern. Refer to Partitioned Virtual Indexes for more information. |
Partitioned Virtual Indexes
The connector is able to manage virtual indexes that have been partitioned on multiple real indexes.
A Partitioned Virtual Index is a collection of indexes containing the same dataset and whose use the following naming convention:
- Shared first part, used to identify the Partitioned Index
- Specific second part, showing the contents of the partition.
For instance, in the case of the Elasticsearch indexation of the Cogip application logs, best practice is to split the indexes by time frames (for example, on a daily basis). The indexes names will then be:
log-cogip-2020-06-02
log-cogip-2020-06-03
log-cogip-2020-06-04
The share part is log-cogip
, whereas the part representing the partition can be described with the rational expression:
\d{4}-\d{2}-\d{2}$
Declaring the rational expression above will be enough for Zeenea to take into account the partitioned virtual indexes.
During inventory, Zeenea will replace the first occurrence of this model with a star. All indexes with the same name structure will be considered as being a part of the same partitioned virtual index.
In our example above, all three partitions will be read as a unique dataset named log-cogip-*
.
If multiple models are used, all corresponding rational expressions must be declared and separated with spaces. The first of these expressions that returns a result will be used.
For instance, if the following indexes are used:
log-cogip-2020-05-30
log-cogip-2020-06-01
log-acme-2020-06-03
log-acme-2020-06-04
x-files-01
x-files-02
By defining the setting Index Partition Pattern with the value: \d{4}-\d{2}-\d{2}$ \d{2}$
, Zeenea will display the following datasets:
log-cogip-*
log-acme-*
x-files-*
However, be mindful of the pattern order: if above expression is replaced with \d{2}$ \d{4}-\d{2}-\d{2}$
, the first option, replacing the last two digits with a star, will be used, and the datasets displayed by Zeenea will be the following:
log-cogip-2020-05-*
log-cogip-2020-06-*
log-acme-2020-06-*
x-files-*
The pattern does not have to be at the end of a string; when using the rational expression \d{4}-\d{2}-\d{2}
(note the absence of the dollar sign) on the following indexes:
cogip-2020-06-02-log
cogip-2020-06-03-log
will return the unique dataset copgip-$-log
.
On the other hand, an index named log-cogip-2020-06-01_2020-06-07
will be identified as log-cogip-*_2020-06-07
when using an Index Partition Pattern value of \d{4}-\d{2}-\d{2}
, and log-cogip-2020-06-01
with a value of \d{4}-\d{2}-\d{2}$
.
It is then strongly recommended to use a rational expression as specific as possible to avoid unexpected results.
User Permissions
The Elasticsearch user being used for the connection must have sufficient permissions to extract metadata and collect data statistics.
The following requests are necessary:
GET /
: Returns the search engine versionGET /_cat/indices
: Returns the list of indexesGET /{index}/_mappings
orGET /{index}/_mappings/{esType}
for versions before v7. Returns the index mapping.GET /{index}/_stats
: Returns index statsGET /{index}/_search
: Returns data profiles
Collected Metadata
Inventory
For Elasticsearch version 5, the inventory collects the names of indexes and their types. The dataset corresponds to the type.
For Elasticsearch version 6, the inventory collects the names of indexes and their unique type.
From version 7 onwards, the inventory collects the names of indexes.
Indexes are unpartitioned using the procedure described in Partitioned Virtual Indexes.
Dataset
Based on the version of Elasticsearch, it is one dataset per index (v6 onwards) or one dataset per type (v5).
- Name:
- The index name for v6 and onwards
- An
Index/Type
template for v6
- Source Description: not managed
- Technical Data:
- Index: Index name
- Type: type name (mapping)
- Dataset size: in octets
- Number of records: in the table
Field
- Name
- Source Description: Not managed
- Type
- Can be null:
true
- Multivalued:
true
- Primary Key: Not managed
- Technical Data:
- Technical Name
- Native type
Object Identification Keys
An identification key is associated with each object in the catalog. In the case of the object being created by a connector, the connector builds it.
Read more: Identification Keys
Object | Identification Key | Description |
---|---|---|
Dataset | code/keyspace/dataset name |
|
Field | code/keyspace/dataset name/field name |
|