Skip to main content

Adding an Elasticsearch Connection

Prerequisites

  • In case the Elasticsearch security has been enabled, the connector must reference a user with sufficient permissions.
  • Zeenea traffic flows towards the base must be opened.
note

A link to the configuration template can be found here: Zeenea Connector Downloads.

Supported Versions

The Elasticsearch module has been tested with versions 5, 6 and 7.

Its behavior will differ based on the version, as to adapt itself to the changes in the Elasticsearch modeling (namely, the removal of mapping types). For instance, when cataloging on a version 5 instance, the dataset is the mapping type; whereas for versions 6 and 7, the dataset is the index. However, version 6 still displays the type, a unique value but whose name might have been customized.

Installing the Plugin

Since scanner version 26.9, the Elasticsearch plugin can be downloaded here: Zeenea Connector Downloads.

For more information on how to install a plugin, please refer to the following article: Installing and Configuring Connectors as a Plugin.

Declaring the Connection

Creating and configuring connectors is done through a dedicated configuration file located in the /connections folder of the relevant scanner.

Read more: Managing Connections

In order to establish a connection with Elasticsearch, specifying the following parameters in the dedicated file is required:

ParameterExpected value
nameThe name that will be displayed to catalog users for this connection.
codeThe unique identifier of the connection on the Zeenea platform. Once registered on the platform, this code must not be modified or the connection will be considered as new and the old one removed from the scanner.
connector_idThe type of connector to be used for the connection. Here, the value must be Elasticsearch and this value must not be modified.
connection.protocolConnection protocol to the cluster. Possible values are https or http.
connection.nodes

A list of cluster connection nodes separated with spaces.

Each node is shown using the host_name[:port]. The port value is optional, and 9200 will be used as a default.

Examples

  • es1.cogip.com:9200
  • es1.cogip.com:9200

This list does not have to be complete, but it is recommended to have multiple nodes.

tls.trust_store.typeTrust Store type. Possible values are pkcs12 or jks.
tls.trust_store.pathThis file must be provided in case TLS encryption is activated (protocol https) and when certificates of Elasticsearch servers are delivered by a specific authority. It must contain the certification chain.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-tls.html#node-certificates.
tls.trust_store.passwordTrust Store password
connection.usernameName of the Elasticsearch user used to connect to Zeenea. The user must have sufficient permissions.
connection.passwordUser password
inventory.index_partition.patternThe index partition pattern. Refer to Partitioned Virtual Indexes for more information.

Partitioned Virtual Indexes

The connector is able to manage virtual indexes that have been partitioned on multiple real indexes.

A Partitioned Virtual Index is a collection of indexes containing the same dataset and whose use the following naming convention:

  • Shared first part, used to identify the Partitioned Index
  • Specific second part, showing the contents of the partition.

For instance, in the case of the Elasticsearch indexation of the Cogip application logs, best practice is to split the indexes by time frames (for example, on a daily basis). The indexes names will then be:

  • log-cogip-2020-06-02
  • log-cogip-2020-06-03
  • log-cogip-2020-06-04

The share part is log-cogip, whereas the part representing the partition can be described with the rational expression:

\d{4}-\d{2}-\d{2}$

Declaring the rational expression above will be enough for Zeenea to take into account the partitioned virtual indexes.

During inventory, Zeenea will replace the first occurrence of this model with a star. All indexes with the same name structure will be considered as being a part of the same partitioned virtual index.

In our example above, all three partitions will be read as a unique dataset named log-cogip-*.

If multiple models are used, all corresponding rational expressions must be declared and separated with spaces. The first of these expressions that returns a result will be used.

For instance, if the following indexes are used:

  • log-cogip-2020-05-30
  • log-cogip-2020-06-01
  • log-acme-2020-06-03
  • log-acme-2020-06-04
  • x-files-01
  • x-files-02

By defining the setting Index Partition Pattern with the value: \d{4}-\d{2}-\d{2}$ \d{2}$, Zeenea will display the following datasets:

  • log-cogip-*
  • log-acme-*
  • x-files-*

However, be mindful of the pattern order: if above expression is replaced with \d{2}$ \d{4}-\d{2}-\d{2}$, the first option, replacing the last two digits with a star, will be used, and the datasets displayed by Zeenea will be the following:

  • log-cogip-2020-05-*
  • log-cogip-2020-06-*
  • log-acme-2020-06-*
  • x-files-*

The pattern does not have to be at the end of a string; when using the rational expression \d{4}-\d{2}-\d{2} (note the absence of the dollar sign) on the following indexes:

  • cogip-2020-06-02-log
  • cogip-2020-06-03-log

will return the unique dataset copgip-$-log.

On the other hand, an index named log-cogip-2020-06-01_2020-06-07 will be identified as log-cogip-*_2020-06-07 when using an Index Partition Pattern value of \d{4}-\d{2}-\d{2}, and log-cogip-2020-06-01 with a value of \d{4}-\d{2}-\d{2}$.

It is then strongly recommended to use a rational expression as specific as possible to avoid unexpected results.

User Permissions

The Elasticsearch user being used for the connection must have sufficient permissions to extract metadata and collect data statistics.

The following requests are necessary:

  • GET /: Returns the search engine version
  • GET /_cat/indices: Returns the list of indexes
  • GET /{index}/_mappings or GET /{index}/_mappings/{esType} for versions before v7. Returns the index mapping.
  • GET /{index}/_stats: Returns index stats
  • GET /{index}/_search: Returns data profiles

Collected Metadata

Inventory

For Elasticsearch version 5, the inventory collects the names of indexes and their types. The dataset corresponds to the type.

For Elasticsearch version 6, the inventory collects the names of indexes and their unique type.

From version 7 onwards, the inventory collects the names of indexes.

Indexes are unpartitioned using the procedure described in Partitioned Virtual Indexes.

Dataset

Based on the version of Elasticsearch, it is one dataset per index (v6 onwards) or one dataset per type (v5).

  • Name:
    • The index name for v6 and onwards
    • An Index/Type template for v6
  • Source Description: not managed
  • Technical Data:
    • Index: Index name
    • Type: type name (mapping)
    • Dataset size: in octets
    • Number of records: in the table

Field

  • Name
  • Source Description: Not managed
  • Type
  • Can be null: true
  • Multivalued: true
  • Primary Key: Not managed
  • Technical Data:
    • Technical Name
    • Native type

Object Identification Keys

An identification key is associated with each object in the catalog. In the case of the object being created by a connector, the connector builds it.

Read more: Identification Keys

ObjectIdentification KeyDescription
Datasetcode/keyspace/dataset name
  • code: Unique identifier of the connection noted in the configuration file
  • schema: Dataset keyspace
  • dataset name
Fieldcode/keyspace/dataset name/field name
  • code: Unique identifier of the connection noted in the configuration file
  • schema: Dataset keyspace
  • dataset name
  • field name