617 Commits

Author SHA1 Message Date
Vincent Bernat
45ab047c80 config: also listen to 4739 for IPFIX
This is the port defined in RFC 7011.
2025-08-29 08:12:30 +02:00
Vincent Bernat
45c684dc67 docker: add Loki to the observability stack
Currently, only Akvorado logs are parsed.
2025-08-28 06:54:13 +02:00
Vincent Bernat
528b0399c1 docs: explain dependency versioning 2025-08-27 08:44:23 +02:00
Vincent Bernat
866658bc70 outlet/kafka: fix crash when scaling down and up the workers
The same metrics cannot be registered twice. Introduce a new method in
reporter to unregister a previously registered collector.

Fix #1908
2025-08-27 08:28:14 +02:00
Vincent Bernat
fa11e7de6d common/reporter: simplify interface for collecting metrics
Remove unused methods and always collect scoped metrics. As a
side-effect, BioRIS gRPC metrics are now correctly scoped.
2025-08-27 07:37:38 +02:00
Vincent Bernat
3affecc309 Revert "docs: prepare for 2.0.0 release and squash the changelog"
This reverts commit 3bff625a56. We are not
ready yet to do a final release.
2025-08-26 23:59:57 +02:00
Vincent Bernat
8eb7cd63b1 docker: make Alloy configuration use only Docker labels
This is a bit like Traefik. We set metrics.port on each container we
want to scrape metrics from (and optionally metrics.path).
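
As a rough sketch of the label convention, the function below derives a scrape URL from a container's labels. The `scrapeTarget` helper and the `/metrics` default path are assumptions made for illustration. Only the `metrics.port` and `metrics.path` label names come from the message above, and the real logic lives in the Alloy configuration, not in Go.

```go
package main

import (
	"fmt"
	"net"
)

// scrapeTarget builds a scrape URL from container labels (hypothetical
// helper). A container without a metrics.port label is not scraped.
func scrapeTarget(host string, labels map[string]string) (string, bool) {
	port := labels["metrics.port"]
	if port == "" {
		return "", false
	}
	path := labels["metrics.path"]
	if path == "" {
		path = "/metrics" // assumed default, not stated in the commit message
	}
	return "http://" + net.JoinHostPort(host, port) + path, true
}

func main() {
	// "example-container" is a placeholder host name.
	labels := map[string]string{"metrics.port": "8080"}
	url, ok := scrapeTarget("example-container", labels)
	fmt.Println(url, ok)

	// A container without the label is skipped.
	_, ok = scrapeTarget("example-container", map[string]string{})
	fmt.Println(ok)
}
```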

Semi-related, but we also rely on exposed port for Traefik and we override
it for all containers to be sure we select the right one. This is less
error prone as we need at least one exposed port and some containers may
or may not have one. Just always set an exposed port if we have metrics
or traefik rules.
2025-08-26 23:22:09 +02:00
Vincent Bernat
291386b057 docs: fix unittest requiring an exact match on documentation 2025-08-26 08:39:48 +02:00
Vincent Bernat
06e3f334fd docs: proofread the whole documentation
Notably, more active voice and simplify a bit.
2025-08-26 08:25:57 +02:00
Vincent Bernat
2b3c463729 docker: switch from prometheus to alloy for scraping metrics
The idea is that alloy can also be used for more. For example, we could
introduce Loki (with a `docker-compose-loki.yml`) and it would use alloy
too. Alloy configuration needs to be split into several parts and both
`docker-compose-prometheus.yml` and `docker-compose-loki.yml` would
define it but with an additional volume for their specific part of the
configuration (using the `extend` mechanism).

However, we don't use the bundled Node Exporter, nor the bundled
cAdvisor. It is better to have individual components to reduce the
amount of code with elevated privileges (both Node Exporter and cAdvisor
need specific privileges). Also, we keep Prometheus instead of switching
to the full Grafana stack with Mimir as it is a more common setup and
providing something universally scalable is not a goal.

Also, Prometheus is now behind the private endpoint as it is possible to
send metrics to it.
2025-08-26 06:28:56 +02:00
Vincent Bernat
ccc2474dc0 docs: let's consider outlet a new feature! 2025-08-25 09:02:54 +02:00
Vincent Bernat
3bff625a56 docs: prepare for 2.0.0 release and squash the changelog 2025-08-25 07:23:54 +02:00
尤理衡 (Li-Heng Yu)
6ab52f2687 docs: Added BMP for Arista 2025-08-24 13:35:47 +02:00
Vincent Bernat
fe42529bbc build: switch to Go 1.25
But Go 1.24 is still supported.
2025-08-24 09:12:21 +02:00
Vincent Bernat
883e19922e build: add end-to-end testing 2025-08-20 13:41:54 +02:00
Vincent Bernat
f8161d9375 docker: set bridge name
Docker can easily break the firewall rules such that masquerading
happens internally.

```
ip saddr 247.16.12.0/24 oifname != "br-65eaa81ed142" counter packets 812 bytes 132030 masquerade
ip saddr 247.16.12.0/24 oifname != "br-fa3db0ecc1de" counter packets 0 bytes 0 masquerade
ip saddr 247.16.12.0/24 oifname != "br-c7a7788478c5" counter packets 0 bytes 0 masquerade
```

When the "current" bridge is the second one, inter-container
communication gets masqueraded. I didn't find an associated issue.
2025-08-20 06:27:17 +02:00
Vincent Bernat
67da65827f docker: don't start demo exporters by default
And provide a `.env.demo` file to be used on demo.akvorado.net.
2025-08-19 21:20:21 +02:00
Vincent Bernat
ed7e1ee67f docker: update Prometheus and node-exporter
Also fix configuration of node-exporter to really monitor the host. And
fix Prometheus configuration which was broken since we tried to monitor
Traefik (in 8f73f70050).
2025-08-19 00:14:25 +02:00
Vincent Bernat
83d8556d6d docker: add cAdvisor for container monitoring
This seems quite invasive. Not sure I want to keep it...
2025-08-19 00:14:25 +02:00
Vincent Bernat
b75a1825a2 docs: highlight how the configuration is distributed
I think it is a common error to only restart the service whose
configuration changed.
2025-08-18 20:35:45 +02:00
Vincent Bernat
f03c97e2da docs: explain the consequences of the updated Kafka volume path
When the final version is released, this will be collapsed a bit as
we will keep only one changelog entry for all versions.
2025-08-18 20:28:19 +02:00
Vincent Bernat
faf985c738 docs: run tcpdump inside inlet container for SNMP 2025-08-18 20:25:43 +02:00
François HORTA
e682215b2e docker: update kafka data volume mount path
The apache image defines a volume under /var/lib/kafka/data, which is
created as an anonymous volume by docker unless docker compose properly
mounts to the right path.

This is unfortunately a breaking change.
2025-08-18 20:25:36 +02:00
Vincent Bernat
bf2ce871ea docs: last beta! 2025-08-18 08:36:27 +02:00
Vincent Bernat
5f7de0a16c docs: document the metric about buffer size 2025-08-17 16:16:20 +02:00
Vincent Bernat
736c4da8a0 outlet/routing: add an option to tune TCP receive buffer for BMP
The default value is quite low. This is a bit of a stopgap. The
alternative would be to maintain a circular buffer of the same size
inside the outlet for each connection and ensure there is no lock in the
path. But doing it in the kernel means almost no code, even if it is a
bit complex for the user.

Fix #1461
2025-08-17 15:13:49 +02:00
Vincent Bernat
255ab47898 docs: add documentation for ipfixprobe as well
It means there are two solutions available for Linux to get flows into
Akvorado.

Fix #156.
2025-08-17 12:04:34 +02:00
Vincent Bernat
2c3834c6fc docs: expand a bit on how to use pmacctd with Akvorado
An alternative would be ipfixprobe. See #156.
2025-08-17 09:06:42 +02:00
Vincent Bernat
b672c08c62 docs: tune allow_suspicious_low_cardinality_types 2025-08-16 21:48:29 +02:00
Vincent Bernat
2332bac5b7 docs: harmonize some categories in changelog 2025-08-16 21:38:19 +02:00
Vincent Bernat
5f1e9a49c7 docs: document schema update for installations before 1.10.0
Fix #1223
2025-08-16 21:36:14 +02:00
Vincent Bernat
68ac0f8bc6 docs: add a note about BMP status 2025-08-16 17:59:26 +02:00
Vincent Bernat
92ee2e05b7 outlet/routing: use gaissmai/bart instead of kentik/patricia
For each prefix, the list of routes is stored in a map (like what is done
with kentik/patricia). The benchmark shows an improvement, both in
insertion time and in memory.

Before:

```
goos: linux
goarch: amd64
pkg: akvorado/outlet/routing/provider/bmp
cpu: AMD Ryzen 5 5600X 6-Core Processor
BenchmarkRandomRealWorldRoutes4-12         10000               106.4 ns/route
BenchmarkRIBInsertion/1000_routes,_1_peers-12               2504               100.0 %ins              361.4 bytes/route               515.6 ns/route
BenchmarkRIBInsertion/1000_routes,_2_peers-12               1234               100.0 %ins              337.5 bytes/route               545.7 ns/route
BenchmarkRIBInsertion/1000_routes,_5_peers-12                412               100.0 %ins              377.6 bytes/route               646.9 ns/route
BenchmarkRIBInsertion/10000_routes,_1_peers-12               170                99.98 %ins             373.0 bytes/route               780.7 ns/route
BenchmarkRIBInsertion/10000_routes,_2_peers-12                52                99.99 %ins             373.2 bytes/route              1136 ns/route
BenchmarkRIBInsertion/10000_routes,_5_peers-12                13                99.99 %ins             299.8 bytes/route              1877 ns/route
BenchmarkRIBInsertion/100000_routes,_1_peers-12                4                99.84 %ins             300.0 bytes/route              2918 ns/route
BenchmarkRIBInsertion/100000_routes,_2_peers-12                2                99.83 %ins             300.2 bytes/route              5220 ns/route
BenchmarkRIBInsertion/100000_routes,_5_peers-12                1                99.81 %ins             340.4 bytes/route             22259 ns/route
BenchmarkRIBLookup/1000_routes,_1_peers-12                 56382               214.2 ns/op
BenchmarkRIBLookup/1000_routes,_2_peers-12                 52376               227.3 ns/op
BenchmarkRIBLookup/1000_routes,_5_peers-12                 46570               257.8 ns/op
BenchmarkRIBLookup/10000_routes,_1_peers-12                 4084               277.2 ns/op
BenchmarkRIBLookup/10000_routes,_2_peers-12                 3552               295.2 ns/op
BenchmarkRIBLookup/10000_routes,_5_peers-12                 3586               340.0 ns/op
BenchmarkRIBLookup/100000_routes,_1_peers-12                 300               382.2 ns/op
BenchmarkRIBLookup/100000_routes,_2_peers-12                 240               474.1 ns/op
BenchmarkRIBLookup/100000_routes,_5_peers-12                 156               752.9 ns/op
BenchmarkRIBFlush/1000_routes,_1_peers-12                   8642                 0.1422 ms/op
BenchmarkRIBFlush/1000_routes,_2_peers-12                   4234                 0.2829 ms/op
BenchmarkRIBFlush/1000_routes,_5_peers-12                   1995                 0.5927 ms/op
BenchmarkRIBFlush/10000_routes,_1_peers-12                   807                 1.411 ms/op
BenchmarkRIBFlush/10000_routes,_2_peers-12                   360                 3.341 ms/op
BenchmarkRIBFlush/10000_routes,_5_peers-12                   166                 7.186 ms/op
BenchmarkRIBFlush/100000_routes,_1_peers-12                   58                20.85 ms/op
BenchmarkRIBFlush/100000_routes,_2_peers-12                   22                51.13 ms/op
BenchmarkRIBFlush/100000_routes,_5_peers-12                    8               135.5 ms/op
```

After:

```
goos: linux
goarch: amd64
pkg: akvorado/outlet/routing/provider/bmp
cpu: AMD Ryzen 5 5600X 6-Core Processor
BenchmarkRandomRealWorldRoutes4-12         10000               110.2 ns/route
BenchmarkRIBInsertion/1000_routes,_1_peers-12               2299               100.0 %ins              348.7 bytes/route               578.4 ns/route
BenchmarkRIBInsertion/1000_routes,_2_peers-12               1112               100.0 %ins              328.7 bytes/route               579.0 ns/route
BenchmarkRIBInsertion/1000_routes,_5_peers-12                432               100.0 %ins              279.7 bytes/route               615.6 ns/route
BenchmarkRIBInsertion/10000_routes,_1_peers-12               182                99.98 %ins             278.1 bytes/route               722.5 ns/route
BenchmarkRIBInsertion/10000_routes,_2_peers-12                61                99.99 %ins             273.0 bytes/route              1013 ns/route
BenchmarkRIBInsertion/10000_routes,_5_peers-12                14                99.99 %ins             232.4 bytes/route              1717 ns/route
BenchmarkRIBInsertion/100000_routes,_1_peers-12                4                99.84 %ins             228.3 bytes/route              2857 ns/route
BenchmarkRIBInsertion/100000_routes,_2_peers-12                2                99.83 %ins             214.3 bytes/route              4944 ns/route
BenchmarkRIBInsertion/100000_routes,_5_peers-12                1                99.81 %ins             265.4 bytes/route             22098 ns/route
BenchmarkRIBLookup/1000_routes,_1_peers-12                 61369               190.1 ns/op
BenchmarkRIBLookup/1000_routes,_2_peers-12                 64584               186.5 ns/op
BenchmarkRIBLookup/1000_routes,_5_peers-12                 63253               190.2 ns/op
BenchmarkRIBLookup/10000_routes,_1_peers-12                 5934               188.7 ns/op
BenchmarkRIBLookup/10000_routes,_2_peers-12                 5386               207.7 ns/op
BenchmarkRIBLookup/10000_routes,_5_peers-12                 5348               220.3 ns/op
BenchmarkRIBLookup/100000_routes,_1_peers-12                 516               227.1 ns/op
BenchmarkRIBLookup/100000_routes,_2_peers-12                 477               241.7 ns/op
BenchmarkRIBLookup/100000_routes,_5_peers-12                 428               264.2 ns/op
BenchmarkRIBFlush/1000_routes,_1_peers-12                   5246                 0.2294 ms/op
BenchmarkRIBFlush/1000_routes,_2_peers-12                   2984                 0.3965 ms/op
BenchmarkRIBFlush/1000_routes,_5_peers-12                   1406                 0.8498 ms/op
BenchmarkRIBFlush/10000_routes,_1_peers-12                   578                 2.084 ms/op
BenchmarkRIBFlush/10000_routes,_2_peers-12                   295                 3.988 ms/op
BenchmarkRIBFlush/10000_routes,_5_peers-12                   100                10.15 ms/op
BenchmarkRIBFlush/100000_routes,_1_peers-12                   33                30.82 ms/op
BenchmarkRIBFlush/100000_routes,_2_peers-12                   18                61.41 ms/op
BenchmarkRIBFlush/100000_routes,_5_peers-12                    7               158.4 ms/op
```

This is a 20% improvement on insertion, 30% on lookups, but a 36%
degradation for flushing.

Fix #253

Next steps:
- test lockless updates (with *Persist functions)
2025-08-16 17:06:36 +02:00
Vincent Bernat
b6eca2d721 docs: add a link to ClickHouse documentation to run with less memory 2025-08-13 22:33:13 +02:00
Vincent Bernat
34db6a9f2c docs: remove vague assertions around upgrades 2025-08-11 08:25:31 +02:00
Vincent Bernat
2d091617d3 docs: prepare for another beta 2025-08-11 07:35:37 +02:00
Vincent Bernat
a423ec44d6 docker: move TLS configuration into its own docker-compose file
This makes it easier to use.
2025-08-10 23:01:18 +02:00
Vincent Bernat
1a27bb1bc2 docker: add examples to enable authentication and TLS 2025-08-10 22:33:04 +02:00
Vincent Bernat
84b6f4584e docker: explain how to not expose Kafka-UI and Traefik dashboard 2025-08-10 15:58:37 +02:00
Vincent Bernat
dbadbf3adf docker: expose Traefik dashboard on the public endpoint
It is also read-only.
2025-08-10 15:55:04 +02:00
Vincent Bernat
1070e5b4f0 docker: document how to properly bind on port 80
Add more documentation around merging in Docker. The previous
documentation was incorrect.
2025-08-10 15:43:10 +02:00
Vincent Bernat
f976d66bd4 outlet/flow: decode IPFIX ingress/egressPhysicalInterface
Also, don't decode IPv4/IPv6 addresses when they are 0 (some templates
will include both). Also decode dot1qVlanId and postDot1qVlanId but
prefer vlanId and postVlanId if they are present.

Fix #1621
2025-08-10 10:14:48 +02:00
Vincent Bernat
09a5a32375 docs: make the minimum configuration more prominent 2025-08-09 16:59:01 +02:00
Vincent Bernat
e5a625aecf outlet: make the number of Kafka workers dynamic
Inserting into ClickHouse should be done in large batches to minimize
the number of parts created. This would require the user to tune the
number of Kafka workers to match a target of around 50k-100k rows. Instead,
we dynamically tune the number of workers depending on the load to reach
this target.
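
One way to picture such a control loop is the sketch below. `nextWorkers` and its one-step adjustment are assumptions for illustration and not the algorithm used by the outlet. Only the 50k-100k target comes from the message. The sketch relies on the idea that fewer workers means more rows per worker and therefore larger batches.

```go
package main

import "fmt"

const (
	minRows = 50_000  // lower bound of the target batch size
	maxRows = 100_000 // upper bound of the target batch size
)

// nextWorkers returns the worker count to use after observing the size of
// the last batch. Small batches trigger a scale down, large ones a scale up,
// within the [minWorkers, maxWorkers] range (hypothetical helper).
func nextWorkers(current, batchRows, minWorkers, maxWorkers int) int {
	switch {
	case batchRows < minRows && current > minWorkers:
		return current - 1
	case batchRows > maxRows && current < maxWorkers:
		return current + 1
	default:
		return current
	}
}

func main() {
	workers := 4
	// Simulated batch sizes observed over successive flushes.
	for _, rows := range []int{10_000, 20_000, 70_000, 150_000} {
		workers = nextWorkers(workers, rows, 1, 8)
		fmt.Printf("batch=%d rows -> %d workers\n", rows, workers)
	}
}
```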

We keep using async inserts when the number of flows is too low.

It is still possible to do better by consolidating batches from various
workers, but that's something I wanted to avoid.

Also, increase the maximum wait time to 5 seconds. It should be good
enough for most people.

Fix #1885
2025-08-09 15:58:25 +02:00
Vincent Bernat
e8bce09aec docs: ensure people don't run docker system prune -a
Maybe it would even be better to remove it?
2025-08-09 15:26:01 +02:00
Vincent Bernat
a74a41a6a0 docker: enable IPv6 networking, use a specific IPv4 subnet
And also add documentation on how to use IPv6. The proposed setup relies
on NAT66, which is not good, but it works on any host with IPv6
connectivity. The documentation explains how to configure routed IPv6.

By using an IPv4 subnet in class E, we ensure that it is very unlikely
users will have overlap between their Docker setup and their production
network. This way, there is no need to change the Docker daemon configuration.
2025-08-08 12:08:00 +02:00
Vincent Bernat
fd9dc0dbf3 docs: add more tips for incorrect metadata 2025-08-08 08:21:31 +02:00
Vincent Bernat
0bbe62b1d4 docs: remove advice on the active parts
The advice was not true. An active part is not one that should be
actively merged, it's one that is used (and not to be deleted).
ClickHouse is good with more than 10k parts.
2025-08-06 19:05:35 +02:00
Vincent Bernat
a862f302f2 docs: also mention tuning maximum-wait-time for ClickHouse 2025-08-06 07:44:03 +02:00
Vincent Bernat
abb9125502 outlet/clickhouse: use async insert when flow count is too low
This should help to resolve the issue behind #1885.
2025-08-06 06:58:12 +02:00