299 Commits

Author SHA1 Message Date
Vincent Bernat
6118bb7aac common/helpers: convert SubnetMap to github.com/gaissmai/bart
I did not benchmark it myself, but it was benchmarked here:
 https://github.com/osrg/gobgp/issues/1414#issuecomment-3067255941

Of course, no guarantee that this benchmark matches our use cases.
Moreover, SubnetMap has been optimized to avoid parsing keys all
the time.

Also, the interface is a bit nicer and it uses netip.Prefix directly.

The next step is to convert outlet/routing/provider/bmp.
2025-08-16 09:38:44 +02:00
Vincent Bernat
3e68a41f57 docker: for dev, separate standalone ClickHouse setup from cluster
This way, there is no need to start a whole cluster just to work on a
single ClickHouse. Also add some hints in CONTRIBUTING.md.
2025-08-08 08:55:29 +02:00
Vincent Bernat
98eb1bdba5 chore: make a run of gofumpt 2025-08-05 06:21:34 +02:00
Vincent Bernat
a248997454 chore: more staticcheck fixes 2025-08-02 21:10:06 +02:00
Vincent Bernat
03b947e3c5 chore: fix many staticcheck warnings
The most important ones were fixed in the two previous commits.
2025-08-02 20:54:49 +02:00
Vincent Bernat
75b2d4821a common/reporter: avoid allocating on the stack with sync.Pool
Always return a pointer.
2025-08-02 20:18:12 +02:00
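A minimal sketch of the "always return a pointer" pattern, assuming the concern is the extra allocation that occurs when a non-pointer value is converted to an interface for `sync.Pool`. The names `bufPool` and `render` are hypothetical and do not come from common/reporter.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool is a hypothetical pool. New always returns a pointer, so Get and
// Put only move a pointer through the interface value.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render formats a message using a pooled buffer and gives the buffer back
// to the pool once the result has been copied into a string.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		bufPool.Put(buf)
	}()
	fmt.Fprintf(buf, "hello %s", name)
	return buf.String()
}

func main() {
	fmt.Println(render("akvorado"))
}
```

Storing `bytes.Buffer` values directly instead of `*bytes.Buffer` would still work, but each `Put` would copy the value into a freshly allocated interface box, which defeats the purpose of pooling.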
Vincent Bernat
3d01a68bcb common/helpers: cache skip decision when requiring external services 2025-07-30 08:19:00 +02:00
Vincent Bernat
a70029a4cd orchestrator/clickhouse: also guess the port when guessing HTTP URL 2025-07-30 08:11:28 +02:00
Vincent Bernat
8c85d54b3b common/remotedatasource: ensure we have at least one goroutine
Otherwise, Stop() will block.
2025-07-29 09:24:32 +02:00
Vincent Bernat
0aef1503a8 common/remotedatasource: disable the regular ticker on failure 2025-07-29 08:37:50 +02:00
Vincent Bernat
19d07d350c common/remotedatasource: add a Stop() method
This is cleaner this way. We can't use it for the static provider as we
cannot stop a provider.
2025-07-29 08:36:16 +02:00
Vincent Bernat
1a160c83b5 common/remotedatasource: move errors higher in the file
Otherwise, I am always confused about where the New() function is.
2025-07-29 08:35:47 +02:00
Vincent Bernat
aeb102c748 outlet/metadata: do not start fetcher for static until first query
We don't want initialization to spawn goroutines, all the more so
since we don't stop them.
2025-07-29 08:29:57 +02:00
Vincent Bernat
239bf33f3a common/remotedatasource: make the test a bit more robust
We may have a 404 if the test is too slow.
2025-07-29 08:00:28 +02:00
Vincent Bernat
5e669db4b3 chore: use errors.New() instead of fmt.Errorf() 2025-07-29 07:42:49 +02:00
Vincent Bernat
18beb310ee chore: replace interface{} with any 2025-07-29 07:42:49 +02:00
Vincent Bernat
fa7e4745b8 common/remotedatasource: be stricter on results from remote sources
Also:
 - don't return partial results (not used)
 - fix tests
 - add more tests
2025-07-29 07:25:42 +02:00
Vincent Bernat
cce61cb0d6 common/remotedatasource: rename from remotedatasourcefetcher
Also rename RemoteDataSource to Source.
2025-07-28 18:41:50 +02:00
Vincent Bernat
e20645c92e outlet/metadata: synchronous fetching of metadata
As we are not constrained by time that much in the outlet, we can
simplify the fetching of metadata by doing it synchronously. We still
keep the breaker design to avoid continuously polling a source that is
not responsive, so we can still lose some data if we are not able to
poll metadata. We also keep the background cache refresh. We also
introduce a grace time of 1 minute to avoid losing data during start.

For the static provider, we wait for the remote data sources to be
ready. For the gNMI provider, there are target windows of availability
during which the cached data can be polled. The SNMP provider loses
its ability to coalesce requests.
2025-07-27 21:44:28 +02:00
Vincent Bernat
4c0b15e1cd inlet/outlet: rename a few metrics
For example:

```
 17:35 ❱ curl -s 127.0.0.1:8080/api/v0/outlet/metrics | promtool check metrics
akvorado_outlet_core_classifier_exporter_cache_size_items counter metrics should have "_total" suffix
akvorado_outlet_core_classifier_interface_cache_size_items counter metrics should have "_total" suffix
akvorado_outlet_flow_decoder_netflow_flowset_records_sum counter metrics should have "_total" suffix
akvorado_outlet_flow_decoder_netflow_flowset_records_sum non-histogram and non-summary metrics should not have "_sum" suffix
akvorado_outlet_flow_decoder_netflow_flowset_sum counter metrics should have "_total" suffix
akvorado_outlet_flow_decoder_netflow_flowset_sum non-histogram and non-summary metrics should not have "_sum" suffix
akvorado_outlet_kafka_buffered_fetch_records_total non-counter metrics should not have "_total" suffix
akvorado_outlet_kafka_buffered_produce_records_total non-counter metrics should not have "_total" suffix
akvorado_outlet_metadata_cache_refreshs counter metrics should have "_total" suffix
akvorado_outlet_routing_provider_bmp_peers_total non-counter metrics should not have "_total" suffix
akvorado_outlet_routing_provider_bmp_routes_total non-counter metrics should not have "_total" suffix
```

Also ensure metrics using errors as labels do not have too high a
cardinality by using constants for the error messages.
2025-07-27 21:44:28 +02:00
Vincent Bernat
76151bea66 common/helpers: make some mapstructure hooks work with embedded structs
When using `mapstructure:",squash"`, most structure-specific hooks don't
dive into the structure as they are provided with the parent structure.
Add a helper to make them work on the embedded structure as well and
use it for the generic "deprecated fields" hook, but also for the hook
for the common Kafka configuration.

This is a bit brittle. There are other use cases, but they may not need
this change.
2025-07-27 21:44:28 +02:00
Vincent Bernat
756e4a8fbd */kafka: switch to franz-go
The concurrency of this library is easier to handle than Sarama.
Notably, it is more compatible with the new model of "almost share
nothing" we use for the inlet and the outlet. The lock for workers in
the outlet is removed. We can now use sync.Pool to allocate byte slices
in the inlet.

It may also be more performant.

In the future, we may want to commit only when pushing data to
ClickHouse. However, this does not seem easy when there is a rebalance.
In case of rebalance, we need to do something when a partition is
revoked to avoid duplicating data. For example, we could flush the
current batch to ClickHouse. Have a look at the
`example/mark_offsets/main.go` file in franz-go repository for a
possible approach. In the meantime, we rely on autocommit.

Another contender could be https://github.com/segmentio/kafka-go. Also
see https://github.com/twmb/franz-go/pull/1064.
2025-07-27 21:44:28 +02:00
Vincent Bernat
85226d0326 docker: create a database "test" for ClickHouse
Keep using the default one for the migration tests, but for the small
tests, use the "test" one.
2025-07-27 21:44:28 +02:00
Vincent Bernat
ac68c5970e inlet: split inlet into new inlet and outlet
This change splits the inlet component into a simpler inlet and a new
outlet component. The new inlet component receives flows and puts them in
Kafka, unparsed. The outlet component takes them from Kafka, resumes
the processing from there (flow parsing, enrichment), and puts them in
ClickHouse.

The main goal is to ensure the inlet does minimal work so that it does not
fall behind when processing packets (and restarts faster). It also brings some
simplification as the number of knobs to tune everything is reduced: for
inlet, we only need to tune the queue size for UDP, the number of
workers and a few Kafka parameters; for outlet, we need to tune a few
Kafka parameters, the number of workers and a few ClickHouse parameters.

The outlet component features a simple Kafka input component. The core
component becomes just a callback function. There is also a new
ClickHouse component to push data to ClickHouse using the low-level
ch-go library with batch inserts.

This processing has an impact on the internal representation of a
FlowMessage. Previously, it was tailored to dynamically build the
protobuf message to be put in Kafka. Now, it builds the batch request to
be sent to ClickHouse. This makes the FlowMessage structure hide the
content of the next batch request and, therefore, it should be reused.
This also changes the way we decode flows: decoders don't output a
FlowMessage anymore, they reuse one that is provided to each worker.

The ClickHouse tables are slightly updated. The Null engine is used
instead of the Kafka engine.

Fix #1122
2025-07-27 21:44:28 +02:00
Vincent Bernat
ad59598831 inlet/kafka: move metric handling into common/kafka
This will be used for consumer as well.
2025-07-27 21:44:28 +02:00
Vincent Bernat
5a9a6e6f0a common/helpers: add a hook to deprecate some fields
And apply it to SystemLogTTL and PrometheusEndpoint. It would be nice to
log a warning, but we don't have access to a logger here.
2025-07-27 21:44:28 +02:00
Vincent Bernat
8ae23f9ae3 common/httpserver: log healthcheck and metrics endpoint at debug level 2025-07-27 11:38:20 +02:00
Vincent Bernat
d60a714b8c orchestrator/clickhouse: do not embed clickhouse database settings
Instead, properly use them from the clickhousedb component. Also provide
some automatic migration.
2025-07-08 09:06:31 +02:00
Vincent Bernat
1c42211219 common/helpers: fix mapstructure tests 2025-06-19 06:57:37 +02:00
Vincent Bernat
c073040673 common/clickhousedb: don't run cluster test if no cluster configured
2025-06-15 14:04:11 +02:00
Vincent Bernat
fb3f5f976b common: use slices from standard library instead of x/exp/slices 2025-06-15 13:58:30 +02:00
Vincent Bernat
322ddbe2ab common/helpers: add a useless test for how Diff() works with []byte 2025-06-11 22:47:27 +02:00
Vincent Bernat
3ee5aea894 tests: use b.Loop() instead of range b.N for benchmarks
See https://go.dev/blog/testing-b-loop
2025-05-25 15:16:23 +02:00
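A minimal sketch of the `b.Loop()` form, assuming Go 1.24 or later. The benchmark body (`strings.Repeat`) and the name `benchmarkRepeat` are hypothetical stand-ins. The benchmark is driven through `testing.Benchmark` so that the example runs as a plain program, whereas in the repository such functions would live in `_test.go` files.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// benchmarkRepeat uses b.Loop() instead of "for range b.N". Setup placed
// before the loop runs once and is excluded from the timing.
func benchmarkRepeat(b *testing.B) {
	input := "ab" // setup, not measured
	for b.Loop() {
		_ = strings.Repeat(input, 64)
	}
}

func main() {
	res := testing.Benchmark(benchmarkRepeat)
	fmt.Println("iterations:", res.N)
}
```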
Vincent Bernat
a70744429a common/kafka: ability to specify OAuth scopes 2025-05-02 06:55:47 +02:00
Vincent Bernat
55b74a1954 common/kafka: rely on mechanism to enable or disable SASL
Instead of username. This should be the same, but the code is more
correct this way.
2025-05-01 20:16:03 +02:00
Vincent Bernat
f672ac98d9 common/kafka: add support for OAuth2
The support is still pretty basic. Notably, scopes are not
configurable (waiting for someone to request them) and maybe the
client ID and secret should not be provided as username/password.

Fix #1714
2025-05-01 19:37:06 +02:00
Vincent Bernat
edf37390d4 common/helpers: remove nonexistent fields from TLS validation 2025-05-01 19:37:06 +02:00
Vincent Bernat
113df1995f common/kafka: put SASL parameters in their own section 2025-05-01 19:37:06 +02:00
Vincent Bernat
e1672c7f32 common/helpers: fix decoding of strings as slice
We use the previous version of the function from upstream.
2025-02-15 14:51:17 +01:00
Vincent Bernat
e08331a286 common/helpers: switch to a blessed fork of mapstructure 2025-02-15 14:51:17 +01:00
Vincent Bernat
88087809dd inlet/flow: decode destination BGP communities in sFlow packets 2025-01-18 19:29:55 +01:00
Vincent Bernat
4a9430e74b inlet/metadata: merge SNMP communities and USM into credentials
This unifies both structures and allows a user to define exceptions for
some specific subnets. See #1606.
2025-01-18 17:05:43 +01:00
Vincent Bernat
faf58ba5bb common/helpers: be stricter when trying to look for subnets
Otherwise, if the map contains "cafe", we may think this is a subnet
while it is obviously not. But we want to catch user errors like
"2o01:db8::/64" to provide a better error message.
2025-01-18 13:50:35 +01:00
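A hypothetical sketch of such a stricter check, not the actual code in common/helpers. The function name `parseSubnetKey` and the heuristic are assumptions: a key containing `/` or `:` is treated as an intended subnet, so a typo is reported as an error instead of being silently treated as a regular key.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseSubnetKey (hypothetical) classifies a map key. It returns the parsed
// prefix when the key is a valid subnet or address, an error when the key
// looks like a subnet but cannot be parsed, and isSubnet == false otherwise.
func parseSubnetKey(key string) (prefix netip.Prefix, isSubnet bool, err error) {
	if p, perr := netip.ParsePrefix(key); perr == nil {
		return p.Masked(), true, nil
	}
	if a, aerr := netip.ParseAddr(key); aerr == nil {
		// A bare address is treated as a host prefix.
		return netip.PrefixFrom(a, a.BitLen()), true, nil
	}
	// Assumed heuristic: these characters signal that a subnet was intended.
	if strings.ContainsAny(key, "/:") {
		return netip.Prefix{}, true, fmt.Errorf("%q looks like a subnet but cannot be parsed", key)
	}
	return netip.Prefix{}, false, nil
}

func main() {
	for _, key := range []string{"cafe", "2001:db8::/64", "2o01:db8::/64"} {
		prefix, isSubnet, err := parseSubnetKey(key)
		fmt.Println(key, prefix, isSubnet, err)
	}
}
```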
Vincent Bernat
4332750edb common/kafka: use enumer for SASLMechanism 2024-11-23 23:48:02 +01:00
Vincent Bernat
a6ce62dda0 chore: fix inline comments in exported const blocks
They should not be inline, as this is not supported by godoc. This is
reported by revive 1.5.0.
2024-11-11 07:03:52 +01:00
Vincent Bernat
82b53b7792 common/helper: let Go deduce generic type for DefaultValuesUnmarshallerHook 2024-08-21 19:19:38 +02:00
Vincent Bernat
c948b9779e common/helper: add a helper to rename a configuration setting 2024-08-21 19:19:38 +02:00
Vincent Bernat
a449736a62 build: use Go 1.22 range over ints
Done with:

```
git grep -l 'for.*:= 0.*++' \
  | xargs sed -i -E 's/for (.*) := 0; \1 < (.*); \1\+\+/for \1 := range \2/'
```

And a few manual fixes due to unused variables. There is something fishy
in the BMP RIB test. Add a comment about that. The change is not
equivalent there (with range, random is evaluated once, while in the
original loop, it is evaluated at each iteration). I believe the intent
was to behave like with range.
2024-08-14 10:11:35 +02:00
Vincent Bernat
51cae28c23 common/helpers: make readPcap a test helper 2024-07-21 16:22:18 +02:00
Vincent Bernat
102cd7fe9c common/kafka: set Kafka ClientID to akvorado instead of sarama 2024-07-21 16:18:01 +02:00