
59 Commits

Author SHA1 Message Date
Vincent Bernat
f974d5591a orchestrator/clickhouse: run some tests without a ClickHouse database
Some tests don't rely on the ClickHouse database at all. Allow them to
run without it.
2025-08-17 10:42:10 +02:00
Vincent Bernat
03b947e3c5 chore: fix many staticcheck warnings
The most important ones were fixed in the two previous commits.
2025-08-02 20:54:49 +02:00
Vincent Bernat
f5ae97e30d orchestrator/clickhouse: guess IP by connecting to port 80
It seems macOS does not like connecting to port 0 (even though this is
not really a connection).
2025-07-30 08:36:12 +02:00
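The trick in f5ae97e30d can be sketched in Go, assuming it is the usual "connect a UDP socket and read its local address" technique. The function name `guessLocalIP` is hypothetical. In practice the target would be a remote host rather than loopback, which is used here only to keep the example runnable anywhere.

```go
package main

import (
	"fmt"
	"net"
)

// guessLocalIP returns the source address the kernel selects to reach target.
// Dialing UDP only creates a socket and picks a route: no packet is sent
// until something is written to the connection.
func guessLocalIP(target string) (net.IP, error) {
	conn, err := net.Dial("udp", target)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	addr, ok := conn.LocalAddr().(*net.UDPAddr)
	if !ok {
		return nil, fmt.Errorf("unexpected local address type %T", conn.LocalAddr())
	}
	return addr.IP, nil
}

func main() {
	// Port 80 instead of port 0, which macOS refuses per the commit above.
	ip, err := guessLocalIP("127.0.0.1:80")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ip)
}
```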
Vincent Bernat
a70029a4cd orchestrator/clickhouse: also guess the port when guessing HTTP URL 2025-07-30 08:11:28 +02:00
Vincent Bernat
5e669db4b3 chore: use errors.New() instead of fmt.Errorf() 2025-07-29 07:42:49 +02:00
Vincent Bernat
18beb310ee chore: replace interface{} with any 2025-07-29 07:42:49 +02:00
Vincent Bernat
ac68c5970e inlet: split inlet into new inlet and outlet
This change splits the inlet component into a simpler inlet and a new
outlet component. The new inlet component receives flows and puts them
in Kafka, unparsed. The outlet component takes them from Kafka, resumes
the processing from there (flow parsing, enrichment), and puts them in
ClickHouse.

The main goal is to ensure the inlet does minimal work so it does not
fall behind when processing packets (and restarts faster). It also
brings some simplification, as the number of knobs to tune is reduced:
for the inlet, we only need to tune the queue size for UDP, the number
of workers, and a few Kafka parameters; for the outlet, we need to tune
a few Kafka parameters, the number of workers, and a few ClickHouse
parameters.

The outlet component features a simple Kafka input component. The core
component becomes just a callback function. There is also a new
ClickHouse component to push data to ClickHouse using the low-level
ch-go library with batch inserts.

This change has an impact on the internal representation of a
FlowMessage. Previously, it was tailored to dynamically build the
protobuf message to be put in Kafka. Now, it builds the batch request to
be sent to ClickHouse. This makes the FlowMessage structure hide the
content of the next batch request; therefore, it should be reused.
This also changes the way we decode flows: decoders don't output a
FlowMessage anymore, they reuse one that is provided to each worker.

The ClickHouse tables are slightly updated. Instead of the Kafka
engine, the Null engine is used.

Fix #1122
2025-07-27 21:44:28 +02:00
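The outlet side of ac68c5970e, with one batch reused per worker, can be sketched in Go under several assumptions. A buffered channel stands in for Kafka. A placeholder `decode` parses "exporter,bytes" instead of real flow protocols. A hypothetical `flowBatch` (not the real FlowMessage) holds one slice per column. Flushing only records the batch size, while the real outlet sends batches through ch-go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// flowBatch is a hypothetical stand-in for the per-worker structure that
// accumulates the next batch insert, one slice per column.
type flowBatch struct {
	exporters []string
	bytes     []uint64
}

// reset empties the batch but keeps the backing arrays, so the next batch
// reuses their capacity instead of allocating again.
func (b *flowBatch) reset() {
	b.exporters = b.exporters[:0]
	b.bytes = b.bytes[:0]
}

// decode is a placeholder for flow parsing: it appends a row to the batch
// it is given instead of returning a new value. Malformed input is skipped.
func decode(raw string, b *flowBatch) {
	exporter, size, ok := strings.Cut(raw, ",")
	if !ok {
		return
	}
	n, err := strconv.ParseUint(size, 10, 64)
	if err != nil {
		return
	}
	b.exporters = append(b.exporters, exporter)
	b.bytes = append(b.bytes, n)
}

// outletWorker consumes raw payloads and flushes the batch every batchSize
// rows, then once more for any remainder when the input is closed.
func outletWorker(in <-chan string, batchSize int, flush func(*flowBatch)) {
	var b flowBatch // one batch per worker, reused across flushes
	for raw := range in {
		decode(raw, &b)
		if len(b.bytes) >= batchSize {
			flush(&b)
			b.reset()
		}
	}
	if len(b.bytes) > 0 {
		flush(&b)
		b.reset()
	}
}

// run plays the inlet (enqueue unparsed payloads), then the outlet, and
// returns the size of each flushed batch.
func run(payloads []string, batchSize int) []int {
	queue := make(chan string, len(payloads))
	for _, p := range payloads {
		queue <- p
	}
	close(queue)
	var sizes []int
	outletWorker(queue, batchSize, func(b *flowBatch) {
		sizes = append(sizes, len(b.bytes))
	})
	return sizes
}

func main() {
	payloads := []string{"192.0.2.1,100", "192.0.2.1,200", "bogus", "192.0.2.2,300"}
	fmt.Println(run(payloads, 2))
}
```

Keeping the inlet to "receive and enqueue" and moving all parsing behind the queue is what lets it restart quickly. The decoding cost then lands on the outlet workers.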
Vincent Bernat
d60a714b8c orchestrator/clickhouse: do not embed clickhouse database settings
Instead, properly use them from the clickhousedb component. Also provide
some automatic migration.
2025-07-08 09:06:31 +02:00
netixx
f0d85ebb9e Fix system reload request to include db name 2024-12-17 18:23:00 +01:00
Vincent Bernat
aa9e5d1d67 orchestrator/clickhouse: escape user-provided strings
Notably username and password may contain quotes or backslashes.
2024-10-27 08:43:19 +01:00
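Escaping for aa9e5d1d67 can be sketched as follows. ClickHouse string literals treat `'` and `\` specially, and both can be escaped with a backslash. The helper `quoteString` is hypothetical and only shows the escaping rule, not the orchestrator's code.

```go
package main

import (
	"fmt"
	"strings"
)

// literalEscaper escapes the two characters that are special inside a
// ClickHouse single-quoted string literal. strings.Replacer substitutes in a
// single pass, so the backslash added before a quote is not escaped again.
var literalEscaper = strings.NewReplacer(`\`, `\\`, `'`, `\'`)

// quoteString returns s as a ClickHouse string literal.
func quoteString(s string) string {
	return "'" + literalEscaper.Replace(s) + "'"
}

func main() {
	password := `pa'ss\word`
	fmt.Println("PASSWORD " + quoteString(password))
}
```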
Paul Galceran
43c169677a Resolve L4 ports protocol names (#1257)
* fix: generation of protocols.csv file

* feat: generation of ports-tcp.csv and ports-udp.csv files

* build: add rules for creating udp and tcp csv files

* feat: create dictionary tcp and udp

* refactor: add replaceRegexpOne

* test: transform src port and dest port columns in SQL

* test: add TCP and UDP dictionaries for migration testing
2024-06-14 21:52:56 +02:00
Vincent Bernat
297e04b95c common: clickHouse → clickhouse
Let's say that we use "ClickHouse" and "clickhouse".
2024-06-09 14:59:09 +02:00
Vincent Bernat
8d96aa070a orchestrator/clickhouse: simplify use of QueryRow()
No need to check for errors, this is also done when invoking Scan().
2024-04-05 22:00:06 +02:00
Vincent Bernat
032d28561c orchestrator/clickhouse: add support for replication only
If there is only one shard, do not create distributed tables.
2024-04-05 21:54:14 +02:00
Vincent Bernat
28783ff4f3 orchestrator/clickhouse: add support for distributed/replicated tables
Fix #605

All MergeTree tables are now replicated.

For some tables, a `_local` variant is added and the non-`_local`
variant is now distributed. The distributed tables are the `flows`
table, the `flows_DDDD` tables (where `DDDD` is a duration), as well as
the `flows_raw_errors` table. The `exporters` table is not distributed
and stays local.

The data follows this schema:

- data comes from the `flows_HHHH_raw` table, using the Kafka engine

- the `flows_HHHH_raw_consumer` reads data from `flows_HHHH_raw` (local)
  and sends it to `flows` (distributed) when there is no error

- the `flows_raw_errors_consumer` reads data from
  `flows_HHHH_raw` (local) and sends it to
  `flows_raw_errors` (distributed)

- the `flows_DDDD_consumer` reads data from `flows_local` (local) and
  sends it to `flows_DDDD_local` (local)

- the `exporters_consumer` reads data from `flows` (distributed) and
  sends it to `exporters` (local)

The reason for `flows_HHHH_raw_consumer` to send data to the distributed
`flows` table, and not the local one, is to ensure flows are
balanced (for example, if there are not enough Kafka partitions). But
sending it to `flows_local` would have been possible.

On the other hand, it is important for `flows_DDDD_consumer` to read
from the local table to avoid duplication. It could have sent data to
the distributed table, but the data is already balanced correctly at
this point, so we just send it to the local table for better
performance.

The `exporters_consumer` is allowed to read from the distributed `flows`
table because it writes the result to the local `exporters` table.
2024-04-04 22:03:12 +02:00
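The `_local`/distributed layout of 28783ff4f3 can be sketched as a Go helper that emits DDL. It also applies the single-shard rule of 032d28561c. This is not Akvorado's actual DDL. The helper name, the ZooKeeper path, `ORDER BY tuple()`, the `rand()` sharding key and keeping the plain table name when there is a single shard are all assumptions of this sketch. The `Distributed(cluster, database, table, sharding_key)` argument order follows the ClickHouse engine signature.

```go
package main

import "fmt"

// tableDDL is a hypothetical helper illustrating the layout described above.
// With several shards, rows are stored in name_local on each node and name is
// a Distributed table over it. With a single shard, no Distributed table is
// created; keeping the plain name in that case is an assumption.
func tableDDL(cluster, name, columns string, shards int) []string {
	replicated := func(table string) string {
		return fmt.Sprintf(
			"CREATE TABLE %s ON CLUSTER %s (%s) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/%s', '{replica}') ORDER BY tuple()",
			table, cluster, columns, table)
	}
	if shards <= 1 {
		return []string{replicated(name)}
	}
	local := name + "_local"
	return []string{
		replicated(local),
		// rand() spreads inserted rows over shards; the sharding key is an assumption.
		fmt.Sprintf(
			"CREATE TABLE %s ON CLUSTER %s AS %s ENGINE = Distributed(%s, currentDatabase(), %s, rand())",
			name, cluster, local, cluster, local),
	}
}

func main() {
	for _, stmt := range tableDDL("mycluster", "flows", "TimeReceived DateTime, Bytes UInt64", 2) {
		fmt.Println(stmt)
	}
}
```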
Vincent Bernat
e910160c17 orchestrator/clickhouse: don't use TO syntax for exporters table
Create a table, then use a consumer view.
2024-03-28 11:42:17 +01:00
Vincent Bernat
cc24077491 orchestrator/clickhouse: don't use TO syntax for flows_raw_errors table 2024-03-27 18:47:44 +01:00
Francois Espinet
87a57bf82e Do geoip enrich in clickhouse instead of inlet
One solution to https://github.com/akvorado/akvorado/issues/62
2024-03-11 15:29:09 +01:00
Vincent Bernat
e3c8f13562 build: enable loopvar experiment
This is enabled by default once we switch to Go 1.22 (in go.mod). See
https://tip.golang.org/wiki/LoopvarExperiment
2024-02-18 09:46:49 +01:00
Vincent Bernat
1c6599e879 orchestrator: standardize how we capture variables in for loops 2024-01-22 20:34:26 +01:00
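The variable-capture issue behind the loopvar experiment (e3c8f13562) and the loop-capture cleanup (1c6599e879) can be shown with a short Go sketch. It uses the explicit `v := v` copy, one common idiom. This log does not say which convention 1c6599e879 settled on. `collect` is a hypothetical name.

```go
package main

import "fmt"

// collect returns one closure per value. The explicit copy gives each closure
// its own variable. Without Go 1.22 loop semantics (or GOEXPERIMENT=loopvar),
// the closures would otherwise share a single v and all return the last value.
func collect(values []int) []func() int {
	var funcs []func() int
	for _, v := range values {
		v := v // per-iteration copy; redundant, but harmless, with Go 1.22 semantics
		funcs = append(funcs, func() int { return v })
	}
	return funcs
}

func main() {
	for _, f := range collect([]int{1, 2, 3}) {
		fmt.Println(f())
	}
}
```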
Vincent Bernat
cec8661387 chore: capitalize comments 2024-01-22 20:34:08 +01:00
Marvin Gaube
e6effd1335 feat: add custom dictionaries for additional, customized flow hydration 2023-08-25 22:10:30 +02:00
Vincent Bernat
87be0ed374 orchestrator/clickhouse: add a version check to avoid buggy version 2023-07-17 08:24:25 +02:00
Vincent Bernat
0e1b5a3351 common/schema: introduce ICMPv4/ICMPv6 virtual columns 2023-06-03 18:57:19 +02:00
Vincent Bernat
002a93b036 orchestrator/clickhouse: add an end message for migration process 2023-02-12 15:01:04 +01:00
Vincent Bernat
4343f32acd orchestrator/clickhouse: fix panic when migrations are not successful 2023-01-09 10:42:48 +01:00
Vincent Bernat
3912b8bbb8 orchestrator/clickhouse: stop meddling with TTL of system tables
This does not seem to survive a restart. There is no indication in the
documentation that this is the right way. One should modify settings
directly. I need to investigate how to do this properly with Docker.
2023-01-09 08:50:12 +01:00
Vincent Bernat
4dcde85523 orchestrator/clickhouse: make migrations test more reliable
Wait longer while migrations are running, fail fast on errors.
2023-01-03 16:14:25 +01:00
Vincent Bernat
874d52f05f orchestrator/clickhouse: set TTL for system logs tables 2023-01-03 14:26:58 +01:00
Vincent Bernat
7d1ba478a1 orchestrator/clickhouse: rework migrations to use an abstract schema
We introduce a leaky abstraction for the flows schema and use it for
migrations as a first step.

For views and dictionaries, we stop relying on a hash to know if they
need to be recreated; instead, we compare their select statements with
our target statements. This is a bit fragile, but strictly better than
the hash.

For data tables, we add the missing columns.

We give up on the abstraction of a migration step and just rely on
helper functions to get the same result. The migration code is now
shorter and we don't need to update it when adding new columns.

This is preparatory work for #211 to allow a user to specify
additional fields to collect.
2023-01-02 23:42:05 +01:00
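The "add the missing columns" step of 7d1ba478a1 can be sketched by diffing a target schema against the existing columns. The `column` type and `addMissingColumns` are hypothetical. Reading existing columns from `system.columns` and placing each new column with `AFTER`/`FIRST` are assumptions of this sketch. Columns whose type differs are left alone, since the commit only mentions adding missing ones.

```go
package main

import "fmt"

// column is a hypothetical, minimal entry of the abstract flows schema.
type column struct {
	Name string
	Type string
}

// addMissingColumns returns the ALTER TABLE statements needed to bring an
// existing table up to the target schema. existing maps column names to their
// types, as one could read them from system.columns. Each new column is placed
// after its predecessor in the target schema to keep the declared order.
func addMissingColumns(table string, target []column, existing map[string]string) []string {
	var stmts []string
	for i, c := range target {
		if _, ok := existing[c.Name]; ok {
			continue
		}
		stmt := fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, c.Name, c.Type)
		if i > 0 {
			stmt += " AFTER " + target[i-1].Name
		} else {
			stmt += " FIRST"
		}
		stmts = append(stmts, stmt)
	}
	return stmts
}

func main() {
	target := []column{
		{"TimeReceived", "DateTime"},
		{"SrcAS", "UInt32"},
		{"DstAS", "UInt32"},
		{"SrcCountry", "LowCardinality(String)"},
	}
	existing := map[string]string{"TimeReceived": "DateTime", "SrcAS": "UInt32"}
	for _, stmt := range addMissingColumns("flows", target, existing) {
		fmt.Println(stmt)
	}
}
```

Because only the schema definition is consulted, a new column in the schema needs no new migration code. That matches the motivation stated in the commit.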
Vincent Bernat
689497aa13 console: add SrcNetPrefix and DstNetPrefix as dimensions
This is not added to filtering as I fail to see how it would be useful.
One can still filter on SrcAddr and DstAddr.

Fix #218
2022-11-26 15:49:26 +01:00
Vincent Bernat
5b88b5f30a orchestrator/clickhouse: export netmask to ClickHouse 2022-11-26 14:24:54 +01:00
Vincent Bernat
e2e94e7a3c orchestrator/clickhouse: tell explicitly when no migration is needed
Otherwise, logs are confusing.
2022-11-03 20:54:52 +01:00
Vincent Bernat
25d2e6efd7 orchestrator/clickhouse: collect Kafka errors 2022-09-30 09:53:52 +02:00
Vincent Bernat
dc91670fb3 orchestrator/clickhouse: ingest large communities 2022-09-28 14:31:34 +02:00
Vincent Bernat
e2672503b8 orchestrator/clickhouse: ingest DstCommunities
It's only available in the main flow table.
2022-09-27 00:34:41 +02:00
Vincent Bernat
7e876902d7 orchestrator/clickhouse: prefix ASPath/1stAS/2ndAS/3rdAS with Dst 2022-09-27 00:34:41 +02:00
Vincent Bernat
714340997a orchestrator/clickhouse: ingest ASPath
The complete AS path is only available in the `flows` table. The
consolidated tables are only left with 1st, 2nd and 3rd AS numbers.
2022-09-27 00:34:41 +02:00
Vincent Bernat
3e3bcbdada http: use a method to get local address
And limit its export to testing.
2022-08-21 08:20:14 +02:00
Vincent Bernat
d2595dfef5 orchestrator/clickhouse: fix SrcCountry/DstCountry columns
In aggregated tables, these columns were missing from the ORDER BY
clause. This means they were set to some random values. It is not
possible to fix this after their creation (see #60 for an attempt);
therefore, we have to drop and recreate the columns. This only affects
aggregated tables, not the main table, but nonetheless, unless you
look at the last hour, the data is lost.
2022-07-29 18:52:51 +02:00
Vincent Bernat
02dc9401e2 orchestrator: add more attributes to classify networks
Like for exporters, we add role, site, region, and tenant. This time,
this is done in ClickHouse.
2022-07-18 11:34:56 +02:00
Vincent Bernat
1e147704c7 inlet: classify exporters to group, role, site, region, and tenant
Previously, this was done only for groups. Encoding everything into
groups is a bit restrictive. The same should be done for IP networks.
2022-07-18 11:01:30 +02:00
Vincent Bernat
f285114278 orchestrator/clickhouse: cap the number of consumers
ClickHouse does not allow more consumers than the number of physical
CPUs. Unless configured otherwise, the number of threads matches the
number of physical CPUs. We bound the number of consumers to this
number.

Fix #13
2022-07-17 00:49:05 +02:00
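The cap from f285114278 amounts to a bounded minimum. In the ClickHouse Kafka table engine, the relevant setting is `kafka_num_consumers`. How the server-side limit is obtained is an assumption left outside this sketch (for example, `SELECT getSetting('max_threads')`). `capConsumers` is a hypothetical name.

```go
package main

import "fmt"

// capConsumers bounds the requested number of Kafka engine consumers by the
// number of threads available on the ClickHouse server. A non-positive limit
// means "unknown" and leaves the request unchanged.
func capConsumers(requested, serverThreads int) int {
	if serverThreads > 0 && requested > serverThreads {
		return serverThreads
	}
	return requested
}

func main() {
	// For example, 16 configured consumers on a server with 8 threads.
	fmt.Printf("kafka_num_consumers = %d\n", capConsumers(16, 8))
}
```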
Vincent Bernat
2cee0e80f8 orchestrator/clickhouse: reload dictionaries when starting 2022-07-05 22:25:14 +02:00
Vincent Bernat
8be1bca4fd license: AGPL-3.0-only
```
git ls-files \*.js \*.go \
  | xargs sed -i '1i // SPDX-FileCopyrightText: 2022 Free Mobile\n// SPDX-License-Identifier: AGPL-3.0-only\n'
git ls-files \*.vue \
  | xargs sed -i '1i <!-- SPDX-FileCopyrightText: 2022 Free Mobile -->\n<!-- SPDX-License-Identifier: AGPL-3.0-only -->\n'
```
2022-06-29 11:42:28 +02:00
Vincent Bernat
c76f4e406d orchestrator/clickhouse: implement network names 2022-06-03 15:32:41 +02:00
Vincent Bernat
9d3b74d305 console: add PacketSizeBucket dimensions 2022-05-22 23:48:40 +02:00
Vincent Bernat
db0b2dfb50 orchestrator/clickhouse: second tentative for consolidated tables
This time, just forget about IP addresses after the predefined time.
2022-05-10 09:30:28 +02:00
Vincent Bernat
73b3d36008 orchestrator/clickhouse: tentative to downsample flows
This is an attempt to downsample flows. However, we can only group by
a prefix of the primary key. Therefore, all downsampling intervals
should be encoded in the order by clause, which is not what we want.
2022-04-24 10:56:21 +02:00
Vincent Bernat
42b12d9db5 orchestrator/clickhouse: create consolidated tables for flows
The idea is to not query the flows table unless absolutely necessary.
It would have been nice to not have this Date field, but rebuilding
the table is costly; we'll do that later when the table is smaller. We
will also need to use a small PARTITION BY.

Also remove some migrations not needed anymore.
2022-04-23 20:20:01 +02:00