Commit Graph

25 Commits

Author SHA1 Message Date
Vincent Bernat
297e04b95c common: clickHouse → clickhouse
Let's say that we use "ClickHouse" and "clickhouse".
2024-06-09 14:59:09 +02:00
Vincent Bernat
28783ff4f3 orchestrator/clickhouse: add support for distributed/replicated tables
Fix #605

All MergeTree tables are now replicated.

For some tables, a `_local` variant is added and the non-`_local`
variant is now distributed. The distributed tables are the `flows`
table, the `flows_DDDD` tables (where `DDDD` is a duration), as well as
the `flows_raw_errors` table. The `exporters` table is not distributed
and stays local.

The data flows as follows:

- data is coming from `flows_HHHH_raw` table, using the Kafka engine

- the `flows_HHHH_raw_consumer` reads data from `flows_HHHH_raw` (local)
  and sends it to `flows` (distributed) when there is no error

- the `flows_raw_errors_consumer` reads data from
  `flows_HHHH_raw` (local) and sends it to
  `flows_raw_errors` (distributed)

- the `flows_DDDD_consumer` reads data from `flows_local` (local) and
  sends it to `flows_DDDD_local` (local)

- the `exporters_consumer` reads data from `flows` (distributed) and
  sends it to `exporters` (local)

The reason for `flows_HHHH_raw_consumer` to send data to the distributed
`flows` table, and not the local one, is to ensure flows are
balanced (for example, if there are not enough Kafka partitions). But
sending it to `flows_local` would have been possible.

On the other hand, it is important for `flows_DDDD_consumer` to read
from the local table to avoid duplication. It could have sent its output
to the distributed table, but the data is already balanced at this point,
so we send it to the local one for better performance.

The `exporters_consumer` is allowed to read from the distributed `flows`
table because it writes the result to the local `exporters` table.
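
As a rough illustration only, the topology described above can be written
down as data. The `consumer` type, the `scope` constants, and `pipeline()`
are hypothetical names for this sketch and are not taken from the code base;
`HHHH` and `DDDD` are kept as the placeholders used in this message.

```go
package main

import "fmt"

// scope tells whether a table is per-node (local) or distributed.
type scope string

const (
	local       scope = "local"
	distributed scope = "distributed"
)

// consumer is an illustrative description of one consumer view:
// where it reads from and where it writes to.
type consumer struct {
	name      string
	from      string
	fromScope scope
	to        string
	toScope   scope
}

// pipeline lists the consumers in the order given in the commit message.
func pipeline() []consumer {
	return []consumer{
		{"flows_HHHH_raw_consumer", "flows_HHHH_raw", local, "flows", distributed},
		{"flows_raw_errors_consumer", "flows_HHHH_raw", local, "flows_raw_errors", distributed},
		{"flows_DDDD_consumer", "flows_local", local, "flows_DDDD_local", local},
		{"exporters_consumer", "flows", distributed, "exporters", local},
	}
}

func main() {
	for _, c := range pipeline() {
		fmt.Printf("%-26s %s (%s) -> %s (%s)\n",
			c.name, c.from, c.fromScope, c.to, c.toScope)
	}
}
```

Only `flows_DDDD_consumer` stays local on both sides, and only
`exporters_consumer` reads from a distributed table.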
2024-04-04 22:03:12 +02:00
Vincent Bernat
318c6de17c orchestrator: move geoip as a top-level component
It is easier if we have a flat model for components.
2024-03-13 11:43:22 +01:00
Vincent Bernat
5feebdf79b orchestrator/clickhouse: reduce tests verbosity for skipped tests
Instantiate ClickHouse component earlier to reduce verbosity of a test
when skipped. Maybe there is a related test change in Go 1.22 as I don't
remember this behavior.
2024-03-11 15:44:57 +01:00
Francois Espinet
87a57bf82e Do geoip enrich in clickhouse instead of inlet
One solution to https://github.com/akvorado/akvorado/issues/62
2024-03-11 15:29:09 +01:00
Vincent Bernat
b8eeabc73e docker: ensure ClickHouse init script is always executed
Recent versions of ClickHouse do not execute the provided entrypoint
script when the database exists. Work around this by using our own
entrypoint in place of the official one.

See https://github.com/ClickHouse/ClickHouse/pull/50724.
2023-11-12 23:21:29 +01:00
Marvin Gaube
e6effd1335 feat: add custom dictionaries for additional, customized flow hydration 2023-08-25 22:10:30 +02:00
Vincent Bernat
62521e629d common/http: rename to common/httpserver
This prepares for the introduction of an httpclient common package. It
also makes it easier to use http from the standard library.
2023-05-28 09:08:29 +02:00
Vincent Bernat
85561c44f7 orchestrator/clickhouse: add method and headers for source HTTP request
See discussion #640. This makes it possible to use POST or to specify
authentication tokens.
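
A minimal sketch of the idea, using only `net/http` from the standard
library. The helper name `newSourceRequest`, the URL, and the token are
placeholders for illustration, not the actual configuration keys or code.

```go
package main

import (
	"fmt"
	"net/http"
)

// newSourceRequest is a hypothetical helper: it builds the request for a
// remote source with a configurable method and extra headers.
func newSourceRequest(url, method string, headers map[string]string) (*http.Request, error) {
	if method == "" {
		method = http.MethodGet // assumed default when nothing is configured
	}
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	for name, value := range headers {
		req.Header.Set(name, value)
	}
	return req, nil
}

func main() {
	req, err := newSourceRequest(
		"https://example.com/networks", // placeholder URL
		http.MethodPost,
		map[string]string{"Authorization": "Bearer placeholder-token"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Authorization"))
}
```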
2023-04-23 12:53:41 +02:00
Vincent Bernat
a912da7fa1 build: use gofumpt
Undecided if we need to use it. I think it's nice.
2023-02-11 10:03:45 +01:00
Vincent Bernat
c6a9319b57 common/schema: turn into a component
This is a first step to make it accept configuration. Most of the
changes are quite trivial, but I also ran into some difficulties with
query columns and filters. They need the schema for parsing, but parsing
happens before dependencies are instantiated (and even if it was not the
case, parsing is stateless). Therefore, I have added a `Validate()`
method that must be called after instantiation. Various bits `panic()`
if not validated to ensure we catch all cases.

The alternative, making the component manage a global state, would have
been simpler, but it would break once we add the ability to add or
disable columns.
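
The validate-after-instantiation pattern could look roughly like the sketch
below. `Schema` and `QueryColumn` here are simplified stand-ins written for
this example; only the `Validate()` name and the `panic()` behavior come
from the description above.

```go
package main

import "fmt"

// Schema is a stand-in for the schema component; it only knows which
// column names exist.
type Schema struct {
	columns map[string]bool
}

// QueryColumn is parsed from user input before the schema is available,
// so parsing alone cannot tell whether the column exists.
type QueryColumn struct {
	name      string
	validated bool
}

// Validate checks the column against the schema. It must be called once
// the schema component has been instantiated.
func (qc *QueryColumn) Validate(s *Schema) error {
	if !s.columns[qc.name] {
		return fmt.Errorf("unknown column %q", qc.name)
	}
	qc.validated = true
	return nil
}

// String panics when the column was never validated, so a missing
// Validate() call is caught early instead of going unnoticed.
func (qc *QueryColumn) String() string {
	if !qc.validated {
		panic("query column not validated")
	}
	return qc.name
}

func main() {
	schema := &Schema{columns: map[string]bool{"SrcAS": true}}
	column := &QueryColumn{name: "SrcAS"}
	if err := column.Validate(schema); err != nil {
		panic(err)
	}
	fmt.Println(column.String())
}
```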
2023-01-18 12:22:10 +01:00
Vincent Bernat
e352202631 inlet: make use of schema for inlet
This is a huge change to make the various subcomponents of the inlet use
the schema to generate the protobuf. For it to make sense, we also
modify the way we parse flows to directly serialize non-essential fields
to Protobuf.
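
To give an idea of what serializing a field directly to Protobuf means, here
is a minimal sketch that appends a varint field to a buffer without any
generated structure or reflection. The helper `appendVarintField` is
hypothetical and relies only on the standard library; it is not the code
used by the inlet.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wireVarint is the Protobuf wire type for varint-encoded integers.
const wireVarint = 0

// appendVarintField appends one varint field to buf: first the tag
// (field number and wire type), then the value. Go's unsigned varint
// encoding is the same base-128 encoding as the one used by Protobuf.
func appendVarintField(buf []byte, fieldNumber int, value uint64) []byte {
	buf = binary.AppendUvarint(buf, uint64(fieldNumber)<<3|wireVarint)
	return binary.AppendUvarint(buf, value)
}

func main() {
	var msg []byte
	msg = appendVarintField(msg, 1, 150)
	fmt.Printf("% x\n", msg) // prints "08 96 01"
}
```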

The performance is mostly on par with the previous commit. We are a bit
less efficient because we don't have a fixed structure, but we avoid
losing too much performance by not relying on reflection and keeping
the production of messages as code. We use less of Goflow2: raw flow
parsing is still done by Goflow2, but we don't use the producer part
anymore. This helps a bit with the performance as we parse less.
Overall, we are 20% than the previous commit and twice as fast as
1.6.4!

```
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
BenchmarkDecodeEncodeNetflow
BenchmarkDecodeEncodeNetflow/with_encoding
BenchmarkDecodeEncodeNetflow/with_encoding-12             151484              7789 ns/op            8272 B/op        143 allocs/op
BenchmarkDecodeEncodeNetflow/without_encoding
BenchmarkDecodeEncodeNetflow/without_encoding-12          162550              7133 ns/op            8272 B/op        143 allocs/op
BenchmarkDecodeEncodeSflow
BenchmarkDecodeEncodeSflow/with_encoding
BenchmarkDecodeEncodeSflow/with_encoding-12                94844             13193 ns/op            9816 B/op        295 allocs/op
BenchmarkDecodeEncodeSflow/without_encoding
BenchmarkDecodeEncodeSflow/without_encoding-12             92569             12456 ns/op            9816 B/op        295 allocs/op
```

There was an attempt to parse sFlow packets with gopacket, but the
ad hoc parser used here is more performant.
2023-01-17 20:53:00 +01:00
Vincent Bernat
06a867616c orchestrator/clickhouse: set TTL for system logs using configuration 2023-01-09 11:47:04 +01:00
Vincent Bernat
95482c9201 orchestrator/clickhouse: ability to fetch network attributes with HTTP 2022-10-14 19:50:04 +02:00
Vincent Bernat
3e3bcbdada http: use a method to get local address
And limit its export to testing.
2022-08-21 08:20:14 +02:00
Vincent Bernat
5691b13050 orchestrator/clickhouse: use SubnetMap for parsing networks 2022-08-01 09:03:48 +02:00
Vincent Bernat
02dc9401e2 orchestrator: add more attributes to classify networks
Like for exporters, we add role, site, region, and tenant. This time,
this is done in ClickHouse.
2022-07-18 11:34:56 +02:00
Vincent Bernat
8be1bca4fd license: AGPL-3.0-only
```
git ls-files \*.js \*.go \
  | xargs sed -i '1i // SPDX-FileCopyrightText: 2022 Free Mobile\n// SPDX-License-Identifier: AGPL-3.0-only\n'
git ls-files \*.vue \
  | xargs sed -i '1i <!-- SPDX-FileCopyrightText: 2022 Free Mobile -->\n<!-- SPDX-License-Identifier: AGPL-3.0-only -->\n'
```
2022-06-29 11:42:28 +02:00
Vincent Bernat
510d78a927 orchestrator/clickhouse: add ability to override AS mappings 2022-06-25 20:10:34 +02:00
Vincent Bernat
c76f4e406d orchestrator/clickhouse: implement network names 2022-06-03 15:32:41 +02:00
Vincent Bernat
2209a24cae orchestrator/clickhouse: fix (again) test around the CSV file 2022-05-22 20:17:07 +02:00
Vincent Bernat
ef2d237db0 orchestrator/clickhouse: make test on ASN map file not rely on its content 2022-05-20 21:06:08 +02:00
Vincent Bernat
f73b6f3b73 build: delegate to asn2org for ASN list 2022-05-10 09:30:33 +02:00
Vincent Bernat
cea2ab3497 orchestrator/clickhouse: fix test for embedded ASN list 2022-04-22 22:43:33 +02:00
Vincent Bernat
93da599adf cmd: take configuration as a mandatory argument (+ other changes)
The other changes are:
 - rename configure service to orchestrator service
 - turn DefaultConfiguration variables into functions
2022-04-10 15:14:39 +02:00