Commit Graph

19 Commits

Author SHA1 Message Date
Vincent Bernat
e2f1df9add tests: replace godebug by go-cmp for structure diffs
go-cmp is stricter and catches more problems. Moreover, the
output is a bit nicer.
2025-08-23 16:03:09 +02:00
Vincent Bernat
18beb310ee chore: replace interface{} with any 2025-07-29 07:42:49 +02:00
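Since Go 1.18, `any` is a predeclared alias for `interface{}`, so this kind of replacement does not change behavior. A minimal sketch; the `describe` function is purely illustrative and not taken from the repository:

```go
package main

import "fmt"

// describe is a hypothetical helper accepting a value of any type.
// Before Go 1.18, its parameter would have been written `v interface{}`.
func describe(v any) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return fmt.Sprintf("string %q", x)
	default:
		return fmt.Sprintf("other %T", x)
	}
}

func main() {
	// Both spellings denote the same type, so no conversion is needed.
	var old interface{} = 42
	var cur any = old
	fmt.Println(describe(cur))
	fmt.Println(describe("flow"))
}
```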
Vincent Bernat
ac68c5970e inlet: split inlet into new inlet and outlet
This change splits the inlet component into a simpler inlet and a new
outlet component. The new inlet component receives flows and puts them in
Kafka, unparsed. The outlet component takes them from Kafka, resumes
the processing from there (flow parsing, enrichment), and puts them in
ClickHouse.

The main goal is to ensure the inlet does minimal work so that it does
not fall behind when processing packets (and restarts faster). It also
brings some simplification as the number of knobs to tune is reduced: for
the inlet, we only need to tune the queue size for UDP, the number of
workers and a few Kafka parameters; for the outlet, we need to tune a few
Kafka parameters, the number of workers and a few ClickHouse parameters.

The outlet component features a simple Kafka input component. The core
component becomes just a callback function. There is also a new
ClickHouse component to push data to ClickHouse using the low-level
ch-go library with batch inserts.

This change has an impact on the internal representation of a
FlowMessage. Previously, it was tailored to dynamically build the
protobuf message to be put in Kafka. Now, it builds the batch request to
be sent to ClickHouse. The FlowMessage structure therefore hides the
content of the next batch request and should be reused. This also
changes the way flows are decoded: decoders no longer output a
FlowMessage, they reuse the one provided to each worker.

The ClickHouse tables are slightly updated: the Null engine is used
instead of the Kafka engine.

Fix #1122
2025-07-27 21:44:28 +02:00
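The reuse pattern described in this commit can be sketched as follows. This is only an illustration under assumptions, not the actual implementation: the `message` type, its two columns, and the `decode`, `finalize`, `flush` and `runWorker` names are all hypothetical, and the real code sends batches to ClickHouse with ch-go rather than to a callback.

```go
package main

import "fmt"

// message is a hypothetical stand-in for a reusable, per-worker flow
// message. It holds the row being decoded and the columns of the next batch.
type message struct {
	curBytes, curPackets uint64   // row currently being decoded
	bytes, packets       []uint64 // column-wise content of the next batch
}

// decode fills the current row of m instead of returning a new message.
// The raw input format (two numbers) is an assumption for this sketch.
func decode(raw [2]uint64, m *message) {
	m.curBytes, m.curPackets = raw[0], raw[1]
}

// finalize appends the current row to the batch and clears it.
func (m *message) finalize() {
	m.bytes = append(m.bytes, m.curBytes)
	m.packets = append(m.packets, m.curPackets)
	m.curBytes, m.curPackets = 0, 0
}

// flush hands the batch to send, then truncates the columns so that their
// backing arrays are reused. send must not retain the slices after returning.
func (m *message) flush(send func(bytes, packets []uint64)) {
	send(m.bytes, m.packets)
	m.bytes = m.bytes[:0]
	m.packets = m.packets[:0]
}

// runWorker decodes all inputs with a single reused message and reports
// how many batches and rows were sent.
func runWorker(input [][2]uint64, batchSize int) (batches, rows int) {
	m := &message{}
	send := func(b, p []uint64) {
		batches++
		rows += len(b)
	}
	for _, raw := range input {
		decode(raw, m)
		m.finalize()
		if len(m.bytes) >= batchSize {
			m.flush(send)
		}
	}
	if len(m.bytes) > 0 {
		m.flush(send) // send the last, partial batch
	}
	return batches, rows
}

func main() {
	batches, rows := runWorker([][2]uint64{{1500, 1}, {3000, 2}, {64, 1}}, 2)
	fmt.Println(batches, rows)
}
```

Because each worker owns one message, no allocation is needed per flow once the columns have grown to the batch size.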
Itah
705e56cac4 inlet/metadata/static: allow exporters configuration refresh from HTTP data source using common/remotedatasourcefetcher, refactored with orchestrator/clickhouse network-sources. 2024-01-17 11:11:34 +01:00
Vincent Bernat
85561c44f7 orchestrator/clickhouse: add method and headers for source HTTP request
See discussion #640. This makes it possible to use POST or to specify
authentication tokens.
2023-04-23 12:53:41 +02:00
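What a configurable method and headers mean for the source HTTP request can be sketched with the standard library alone. The `fetch` and `demo` functions, the `Authorization` value, and the echoing test server are assumptions made for this illustration; they are not the project's configuration keys or code.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch issues a request with a caller-chosen method and extra headers and
// returns the response body. It is a hypothetical helper for this sketch.
func fetch(ctx context.Context, method, url string, headers map[string]string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, method, url, nil)
	if err != nil {
		return "", err
	}
	for name, value := range headers {
		req.Header.Set(name, value)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// demo starts a local test server that echoes the method and the
// Authorization header it received, then queries it with POST and a token.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s %s", r.Method, r.Header.Get("Authorization"))
	}))
	defer srv.Close()
	headers := map[string]string{"Authorization": "Bearer example-token"}
	return fetch(context.Background(), http.MethodPost, srv.URL, headers)
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```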
Vincent Bernat
95482c9201 orchestrator/clickhouse: ability to fetch network attributes with HTTP 2022-10-14 19:50:04 +02:00
Vincent Bernat
a9bef1c3fc tests: extend Diff() helper to accept new options 2022-08-31 14:30:08 +02:00
Vincent Bernat
c41fa8cb55 common/helpers: opt-in for custom formatters for diff
Many types have a `String()` method that would hide details.
2022-08-16 21:21:22 +02:00
Vincent Bernat
f9b507ff35 common/helpers: add a helper to test configuration decoding
For each case, we test from a native map and from YAML. This should
capture all the cases we are interested in.

Also, simplify pretty diff by using stringer for everything. I don't
remember why this wasn't the case. Maybe IP addresses? It's possible
to opt out by overriding formatters.
2022-08-16 21:15:23 +02:00
Vincent Bernat
985e678e42 chore: replace map[string]interface{} by gin.H 2022-08-16 19:43:28 +02:00
Vincent Bernat
5691b13050 orchestrator/clickhouse: use SubnetMap for parsing networks 2022-08-01 09:03:48 +02:00
Vincent Bernat
8ee2750012 inlet/geoip: rename country-database to geo-database
This is a first step toward accepting other kinds of GeoIP databases
(like City). This also introduces the way we want to deprecate settings:
by transforming the map structure.
2022-07-29 15:55:39 +02:00
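Deprecating a setting by transforming the map structure can be pictured as rewriting the raw configuration map before it is decoded. A minimal sketch, assuming the configuration is available as a `map[string]any`; the `renameKey` function and the file name are hypothetical, and the project does this through mapstructure hooks rather than a standalone function:

```go
package main

import "fmt"

// renameKey moves the value stored under a deprecated key to its new name.
// It refuses to guess when both the old and the new key are present.
func renameKey(cfg map[string]any, oldKey, newKey string) error {
	oldValue, hasOld := cfg[oldKey]
	if !hasOld {
		return nil // nothing to migrate
	}
	if _, hasNew := cfg[newKey]; hasNew {
		return fmt.Errorf("cannot have both %q and %q", oldKey, newKey)
	}
	cfg[newKey] = oldValue
	delete(cfg, oldKey)
	return nil
}

func main() {
	cfg := map[string]any{"country-database": "country.mmdb"}
	if err := renameKey(cfg, "country-database", "geo-database"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg)
}
```

The rest of the code then only needs to know about the new key, while old configuration files keep working.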
Vincent Bernat
085d4e7946 cmd: add a registration mechanism for mapstructure hooks 2022-07-21 17:46:01 +02:00
Vincent Bernat
02dc9401e2 orchestrator: add more attributes to classify networks
As with exporters, we add role, site, region, and tenant. This time,
this is done in ClickHouse.
2022-07-18 11:34:56 +02:00
Vincent Bernat
6121aaea15 config: use a validator for better configuration validation 2022-06-30 01:23:29 +02:00
Vincent Bernat
8be1bca4fd license: AGPL-3.0-only
```
git ls-files \*.js \*.go \
  | xargs sed -i '1i // SPDX-FileCopyrightText: 2022 Free Mobile\n// SPDX-License-Identifier: AGPL-3.0-only\n'
git ls-files \*.vue \
  | xargs sed -i '1i <!-- SPDX-FileCopyrightText: 2022 Free Mobile -->\n<!-- SPDX-License-Identifier: AGPL-3.0-only -->\n'
```
2022-06-29 11:42:28 +02:00
Vincent Bernat
c76f4e406d orchestrator/clickhouse: implement network names 2022-06-03 15:32:41 +02:00
Vincent Bernat
89713dcba0 orchestrator/clickhouse: give up on sampling
This is not efficient. We can halve the number of columns with a 1-hour
sampling, but we also halve the compression ratio. This makes it quite
inefficient. Moreover, on a 5m resolution, ClickHouse executes
requests as fast as with the default 1s resolution.

```
┌─table────────┬─count()─┬───────────────size─┬──────────────ratio─┐
│ flows        │     565 │  7.613838967867196 │  22.40685541786694 │
│ flows_10s    │     585 │ 10.035293261520565 │ 12.433305037652858 │
│ flows_1h0m0s │     577 │  4.217916643247008 │  9.856020485544366 │
│ flows_5m0s   │     584 │  5.837094695307314 │ 10.835867553690852 │
│ flows_1m0s   │     582 │  7.629598970524967 │ 11.400152339478653 │
└──────────────┴─────────┴────────────────────┴────────────────────┘
```

Instead, start from a clean sheet by rebuilding the flow table and
dropping the Date column, using smaller partitions and ordering a bit
more data to help compression.
2022-04-24 11:14:19 +02:00
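The trade-off stated in this commit can be checked against the table with a little arithmetic. The sketch below takes the size and ratio values of the flows and flows_1h0m0s rows verbatim from the table (the unit of the size column is not stated in the source); the `relative` function is only illustrative.

```go
package main

import "fmt"

// relative expresses a value of the sampled table as a fraction of the
// corresponding value of the unsampled table.
func relative(sampled, full float64) float64 {
	return sampled / full
}

func main() {
	// Values copied from the flows and flows_1h0m0s rows of the table.
	sizeRel := relative(4.217916643247008, 7.613838967867196)
	ratioRel := relative(9.856020485544366, 22.40685541786694)
	fmt.Printf("size: %.2f of the unsampled table\n", sizeRel)
	fmt.Printf("compression ratio: %.2f of the unsampled table\n", ratioRel)
}
```

With a 1-hour sampling, the size is roughly 0.55 of the original while the compression ratio drops to roughly 0.44 of the original, which matches the "halve" wording of the message.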
Vincent Bernat
73b3d36008 orchestrator/clickhouse: attempt to downsample flows 2022-04-24 10:56:21 +02:00
This is an attempt to downsample flows. However, we can only group by
a prefix of the primary key. Therefore, all downsampling intervals
would have to be encoded in the ORDER BY clause, which is not what we want.
2022-04-24 10:56:21 +02:00