Configuration

Configuration is statically defined in a YAML file. The application reads anomdec.yml from the $ANOMDEC_HOME path. The following sections describe the structure of this file.
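As an illustration, resolving the configuration path might be sketched as follows. The helper name and the fallback to the current directory are assumptions for the example, not part of anomdec's actual API:

```python
import os

def config_path(filename="anomdec.yml"):
    """Resolve the configuration file inside $ANOMDEC_HOME.

    Hypothetical helper: falls back to the current directory
    when ANOMDEC_HOME is not set.
    """
    home = os.environ.get("ANOMDEC_HOME", ".")
    return os.path.join(home, filename)
```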

streams

streams is the main section of the file. It is a list of named entries, each defining how a signal is processed.

source / engine

Each stream has two required sections, source and engine. This is the minimal configuration needed to start processing signals, but at this point the results are not persisted.

version: 1

streams:
  - name: my_kafka_source_one
    source:
      type: kafka
      params:
        broker_servers: localhost:9092
        input_topic: test1
    engine:
      type: robust
      params:
        window: 30
        threshold: 0.9999
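To make the required shape concrete, a hypothetical validation step for this minimal configuration could look like the sketch below. The function and its error messages are illustrative only, not anomdec's real code:

```python
def validate_stream(stream: dict) -> None:
    # Every stream entry needs a name plus source and engine sections.
    for key in ("name", "source", "engine"):
        if key not in stream:
            raise ValueError(f"stream is missing required section: {key}")
    # source and engine each declare a type (kafka, robust, ...).
    for section in ("source", "engine"):
        if "type" not in stream[section]:
            raise ValueError(f"{section} needs a type")
```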

sink

To persist the results we need to add a sink section. This can be a list of sinks.

version: 1

streams:
  - name: my_kafka_source_one
    source:
      type: kafka
      params:
        broker_servers: localhost:9092
        input_topic: test1
    engine:
      type: robust
      params:
        window: 30
        threshold: 0.9999
    sink:
      - name: sqlite
        type: repository
        repository:
          type: sqlite
          params:
            database: /tmp/my_kafka_source_one.sqlite
      - name: kafka
        type: stream
        stream:
          type: kafka
          params:
            broker_servers: localhost:9092
            output_topic: test2
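Since sink is a list, each entry is handled according to its type field (repository or stream). A sketch of that dispatch, with the builder function and the returned tuples assumed purely for illustration:

```python
def build_sinks(sink_cfgs: list) -> list:
    # Map each sink entry to a backend according to its 'type' field.
    sinks = []
    for cfg in sink_cfgs:
        if cfg["type"] == "repository":
            sinks.append(("repository", cfg["repository"]["type"]))
        elif cfg["type"] == "stream":
            sinks.append(("stream", cfg["stream"]["type"]))
        else:
            raise ValueError(f"unknown sink type: {cfg['type']}")
    return sinks
```

With the two sinks from the example above, this would yield one sqlite repository backend and one kafka stream backend.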

warmup

The warmup section has two roles. The first is to warm up the engine before it starts making predictions; the second is to make the data accessible from the dashboard for visualization. Below we define a warmup section with one repository, the same one used in sink.

version: 1

streams:
  - name: my_kafka_source_one
    source:
      type: kafka
      params:
        broker_servers: localhost:9092
        input_topic: test1
    engine:
      type: robust
      params:
        window: 30
        threshold: 0.9999
    sink:
      - name: sqlite
        type: repository
        repository:
          type: sqlite
          params:
            database: /tmp/my_kafka_source_one.sqlite
      - name: kafka
        type: stream
        stream:
          type: kafka
          params:
            broker_servers: localhost:9092
            output_topic: test2
    warmup:
      - name: sqlite
        repository:
          type: sqlite
          params:
            database: /tmp/my_kafka_source_one.sqlite
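Because only the first element of the warmup list is used to warm up the model, selecting the warmup repository could be sketched like this (the helper is hypothetical, not anomdec's real code):

```python
def warmup_repository(stream_cfg: dict):
    # Only the first element of the warmup list is used to warm up the model.
    warmup = stream_cfg.get("warmup") or []
    if not warmup:
        return None
    return warmup[0]["repository"]
```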

repository

A repository section can appear in both the sink and the warmup sections. It defines a storage backend, handled by the BaseSink implementation RepositorySink and by ObservableRepository, respectively.

The repository used as a sink can also be used as warmup, for models that require previous data in order to evaluate new data. Although warmup is defined as a list, only its first element will be used to warm up the model.

stream:
  type: kafka
  params:
    broker_servers: localhost:9092

websocket

There is a websocket section that is used to send output ticks to the dashboard. This allows dashboard plots to be updated in real time.

websocket: ws://localhost:5000/ws/

Example configuration file

anomdec.yml

A full example configuration. It reflects a complete message flow: reading from a Kafka broker and processing with the robust detector, warmed up from the same repository that persists the output.

version: 1

websocket: ws://localhost:5000/ws/

streams:
  - name: my_kafka_source_one
    source:
      type: kafka
      params:
        broker_servers: localhost:9092
        input_topic: test1
    engine:
      type: robust
      params:
        window: 30
        threshold: 0.9999
    sink:
      - name: sqlite
        type: repository
        repository:
          type: sqlite
          params:
            database: /tmp/my_kafka_source_one.sqlite
      - name: kafka
        type: stream
        stream:
          type: kafka
          params:
            broker_servers: localhost:9092
            output_topic: test2
    warmup:
      - name: sqlite
        repository:
          type: sqlite
          params:
            database: /tmp/my_kafka_source_one.sqlite

diagram

Here is a diagram representing the full configuration file. The output of the engine can be sunk both to a repository and to a streaming system, to visualize and react to anomalies; the repository is also used to warm up the engine in case of restart or failure.

_images/config-example.svg