Commit Graph

14 Commits

Author SHA1 Message Date
Vincent Bernat
11f878ca21 inlet/kafka: make Kafka load-balancing algorithm configurable
Some checks failed
CI / 🤖 Check dependabot status (push) Has been cancelled
CI / 🐧 Test on Linux (${{ github.ref_type == 'tag' }}, misc) (push) Has been cancelled
CI / 🐧 Test on Linux (coverage) (push) Has been cancelled
CI / 🐧 Test on Linux (regular) (push) Has been cancelled
CI / ❄️ Build on Nix (push) Has been cancelled
CI / 🍏 Build and test on macOS (push) Has been cancelled
CI / 🧪 End-to-end testing (push) Has been cancelled
CI / 🔍 Upload code coverage (push) Has been cancelled
CI / 🔬 Test only Go (push) Has been cancelled
CI / 🔬 Test only JS (${{ needs.dependabot.outputs.package-ecosystem }}, 20) (push) Has been cancelled
CI / 🔬 Test only JS (${{ needs.dependabot.outputs.package-ecosystem }}, 22) (push) Has been cancelled
CI / 🔬 Test only JS (${{ needs.dependabot.outputs.package-ecosystem }}, 24) (push) Has been cancelled
CI / ⚖️ Check licenses (push) Has been cancelled
CI / 🐋 Build Docker images (push) Has been cancelled
CI / 🐋 Tag Docker images (push) Has been cancelled
CI / 🚀 Publish release (push) Has been cancelled
Update Nix dependency hashes / Update dependency hashes (push) Has been cancelled
And use random by default. This scales better. And even when not using
multiple outlets, there is little drawback to pin an exporter to a
partition.
2025-11-25 22:42:33 +01:00
Vincent Bernat
39cc372826 inlet/kafka: increase default QueueSize to 4096
And reverts the configuration change in commit 3575e7994b.
2025-10-29 05:02:33 +01:00
Vincent Bernat
03b947e3c5 chore: fix many staticcheck warnings
The most important ones were fixed in the two previous commit.
2025-08-02 20:54:49 +02:00
Vincent Bernat
756e4a8fbd */kafka: switch to franz-go
The concurrency of this library is easier to handle than Sarama.
Notably, it is more compatible with the new model of "almost share
nothing" we use for the inlet and the outlet. The lock for workers in
outlet is removed. We can now use sync.Pool to allocate slice of bytes
in inlet.

It may also be more performant.

In the future, we may want to commit only when pushing data to
ClickHouse. However, this does not seem easy when there is a rebalance.
In case of rebalance, we need to do something when a partition is
revoked to avoid duplicating data. For example, we could flush the
current batch to ClickHouse. Have a look at the
`example/mark_offsets/main.go` file in franz-go repository for a
possible approach. In the meantime, we rely on autocommit.

Another contender could be https://github.com/segmentio/kafka-go. Also
see https://github.com/twmb/franz-go/pull/1064.
2025-07-27 21:44:28 +02:00
Vincent Bernat
ac68c5970e inlet: split inlet into new inlet and outlet
This change split the inlet component into a simpler inlet and a new
outlet component. The new inlet component receive flows and put them in
Kafka, unparsed. The outlet component takes them from Kafka and resume
the processing from here (flow parsing, enrichment) and puts them in
ClickHouse.

The main goal is to ensure the inlet does a minimal work to not be late
when processing packets (and restart faster). It also brings some
simplification as the number of knobs to tune everything is reduced: for
inlet, we only need to tune the queue size for UDP, the number of
workers and a few Kafka parameters; for outlet, we need to tune a few
Kafka parameters, the number of workers and a few ClickHouse parameters.

The outlet component features a simple Kafka input component. The core
component becomes just a callback function. There is also a new
ClickHouse component to push data to ClickHouse using the low-level
ch-go library with batch inserts.

This processing has an impact on the internal representation of a
FlowMessage. Previously, it was tailored to dynamically build the
protobuf message to be put in Kafka. Now, it builds the batch request to
be sent to ClickHouse. This makes the FlowMessage structure hides the
content of the next batch request and therefore, it should be reused.
This also changes the way we decode flows as they don't output
FlowMessage anymore, they reuse one that is provided to each worker.

The ClickHouse tables are slightly updated. Instead of using Kafka
engine, the Null engine is used instead.

Fix #1122
2025-07-27 21:44:28 +02:00
Vincent Bernat
513e1dceb6 inlet/kafka: slightly alter some defaults 2024-06-09 17:18:57 +02:00
Vincent Bernat
d2ebd76a5d common/kafka: switch to github.com/IBM/sarama 2023-07-18 08:02:49 +02:00
Vincent Bernat
8a6b6614d4 build: update Shopify/sarama
This should reduce the memory usage when using zstd.
2023-01-02 23:42:05 +01:00
Vincent Bernat
6121aaea15 config: use a validator for better configuration validation 2022-06-30 01:23:29 +02:00
Vincent Bernat
8be1bca4fd license: AGPL-3.0-only
```
git ls-files \*.js \*.go \
  | xargs sed -i '1i // SPDX-FileCopyrightText: 2022 Free Mobile\n// SPDX-License-Identifier: AGPL-3.0-only\n'
git ls-files \*.vue \
  | xargs sed -i '1i <!-- SPDX-FileCopyrightText: 2022 Free Mobile -->\n<!-- SPDX-License-Identifier: AGPL-3.0-only -->\n'
```
2022-06-29 11:42:28 +02:00
Vincent Bernat
2fed10c8d2 orchestrator/broker: implement the broker component
Currently, it only exposes the configuration to other components. In
the future, it should be able to interact with them somehow.
2022-04-10 17:43:15 +02:00
Vincent Bernat
93da599adf cmd: take configuration as a mandatory argument (+ other changes)
The other changes are:
 - rename configure service to orchestrator service
 - turn DefaultConfiguration variables into functions
2022-04-10 15:14:39 +02:00
Vincent Bernat
cf77be32e0 kafka: inline configuration
Otherwise, this is a bit odd to have this special "connect" substructure.
2022-04-09 13:03:39 +02:00
Vincent Bernat
1dc253764d global: split Akvorado into 3 services 2022-04-01 20:21:53 +02:00