Commit Graph

39 Commits

Author SHA1 Message Date
Vincent Bernat
abca5e983d chore: modernize some code 2025-11-14 23:22:02 +01:00
Vincent Bernat
ac68c5970e inlet: split inlet into new inlet and outlet
This change splits the inlet component into a simpler inlet and a new
outlet component. The new inlet component receives flows and puts them in
Kafka, unparsed. The outlet component takes them from Kafka, resumes
the processing from there (flow parsing, enrichment), and puts them in
ClickHouse.

The main goal is to ensure the inlet does minimal work so that it does
not fall behind when processing packets (and restarts faster). It also
brings some simplification as the number of knobs to tune is reduced: for
the inlet, we only need to tune the queue size for UDP, the number of
workers and a few Kafka parameters; for the outlet, we need to tune a few
Kafka parameters, the number of workers and a few ClickHouse parameters.

The outlet component features a simple Kafka input component. The core
component becomes just a callback function. There is also a new
ClickHouse component to push data to ClickHouse using the low-level
ch-go library with batch inserts.

This processing has an impact on the internal representation of a
FlowMessage. Previously, it was tailored to dynamically build the
protobuf message to be put in Kafka. Now, it builds the batch request to
be sent to ClickHouse. As a result, the FlowMessage structure hides the
content of the next batch request and should therefore be reused.
This also changes the way flows are decoded: decoders no longer output a
FlowMessage, they reuse one that is provided to each worker.
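
As an illustration of the reuse described above, here is a minimal sketch
in Go. It is not the code from this commit: the flowMessage type, its
fields, and the decode, finalize and flush helpers are all hypothetical
names chosen for the example. It only shows the general idea of a
per-worker structure that accumulates rows for the next batch and keeps
its allocated memory after a flush.

```go
package main

import "fmt"

// flowMessage is a hypothetical stand-in for FlowMessage: it holds the
// columns of the next batch, plus scratch fields for the current flow.
type flowMessage struct {
	srcPorts []uint16 // one entry per row of the pending batch
	bytes    []uint64 // one entry per row of the pending batch
	srcPort  uint16   // scratch value for the flow being decoded
	nbytes   uint64   // scratch value for the flow being decoded
}

// finalize appends the current flow to the batch and resets the scratch fields.
func (m *flowMessage) finalize() {
	m.srcPorts = append(m.srcPorts, m.srcPort)
	m.bytes = append(m.bytes, m.nbytes)
	m.srcPort, m.nbytes = 0, 0
}

// flush hands the pending batch to send, then truncates the columns while
// keeping their capacity so the next batch does not reallocate.
func (m *flowMessage) flush(send func(ports []uint16, bytes []uint64)) {
	send(m.srcPorts, m.bytes)
	m.srcPorts = m.srcPorts[:0]
	m.bytes = m.bytes[:0]
}

// decode is a hypothetical decoder: it fills the message provided by the
// worker instead of returning a new one.
func decode(raw [2]uint64, m *flowMessage) {
	m.srcPort = uint16(raw[0])
	m.nbytes = raw[1]
	m.finalize()
}

func main() {
	m := &flowMessage{} // one instance per worker, reused for every flow
	for _, raw := range [][2]uint64{{443, 1500}, {53, 80}} {
		decode(raw, m)
	}
	m.flush(func(ports []uint16, bytes []uint64) {
		fmt.Println(len(ports), "rows:", ports, bytes)
	})
}
```

The slices passed to send are only valid until the next decode, which is
the usual trade-off of this kind of reuse.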

The ClickHouse tables are slightly updated: the Null engine is used
instead of the Kafka engine.

Fix #1122
2025-07-27 21:44:28 +02:00
Vincent Bernat
fb3f5f976b common: use slices from standard library instead of x/exp/slices 2025-06-15 13:58:30 +02:00
Paul Galceran
43c169677a Resolve L4 ports protocol names (#1257)
* fix: generation of protocols.csv file

* feat: generation of ports-tcp.csv and ports-udp.csv files

* build: add rules for creating udp and tcp csv files

* feat: create dictionary tcp and udp

* refactor: add replaceRegexpOne

* test: transform src port and dest port columns in SQL

* test: add TCP and UDP dictionaries for migration testing
2024-06-14 21:52:56 +02:00
Vincent Bernat
297e04b95c common: clickHouse → clickhouse
Let's say that we use "ClickHouse" and "clickhouse".
2024-06-09 14:59:09 +02:00
Vincent Bernat
dcdbf208d1 orchestrator/clickhouse: optimize dictionary lookup for networks 2024-03-13 20:32:20 +01:00
Francois Espinet
87a57bf82e Do geoip enrich in clickhouse instead of inlet
One solution to https://github.com/akvorado/akvorado/issues/62
2024-03-11 15:29:09 +01:00
Vincent Bernat
f321e8fa64 common/helpers: add a way to test Marshal/Unmarshal for bimaps 2024-01-22 21:53:26 +01:00
netixx
374a1fce55 Refactor to use common structs where possible 2024-01-22 20:50:13 +01:00
netixx
3188be5d23 Support providing exporter and iface metadata through metadata instead of classifiers
Sometimes the exporter name and interface description do not carry
all the information required for classification and metadata extraction.
Supporting a way to provide the data through the metadata component (only
static seems to make sense at this point) enables more use cases.
2024-01-22 20:50:13 +01:00
Vincent Bernat
4a7a779237 common/schema: add MPLS4thLabel
MPLS labels often come in pairs. It makes sense to access the 4th one easily.
2023-11-28 19:47:03 +01:00
Vincent Bernat
82051b552f inlet: decode MPLS labels
They are stored in an array and there are some aliases to get the 1st,
2nd and 3rd labels. Support for sFlow would need a test to ensure it works
as expected.

Fix #960
2023-11-25 20:34:45 +01:00
Vincent Bernat
6dc0b512c6 console/filter: add filtering support for custom columns
Some of the code is based on #870.
2023-09-16 17:19:12 +02:00
Marvin Gaube
5efa368e79 feat: add option for materialized types & improve filter performance for materialized Prefixes 2023-09-08 20:54:27 +02:00
Vincent Bernat
d1cef41849 common/schema: don't store number of dynamic columns as state
Instead, we compute the maximum value for `Key` among the current set of columns.
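
A minimal sketch of this idea in Go, assuming a simplified column type.
The names column, columnKey, maxKey and nextDynamicKey are hypothetical
and not taken from the actual schema package.

```go
package main

import "fmt"

// columnKey and column are simplified, hypothetical stand-ins for the
// schema types.
type columnKey int

type column struct {
	Key  columnKey
	Name string
}

// maxKey returns the highest Key among cols, or 0 when cols is empty.
func maxKey(cols []column) columnKey {
	var highest columnKey
	for _, c := range cols {
		if c.Key > highest {
			highest = c.Key
		}
	}
	return highest
}

// nextDynamicKey derives the key for a new column from the current set of
// columns, so no separate counter has to be kept in sync.
func nextDynamicKey(cols []column) columnKey {
	return maxKey(cols) + 1
}

func main() {
	cols := []column{{Key: 1, Name: "SrcAddr"}, {Key: 7, Name: "Custom1"}, {Key: 3, Name: "DstAddr"}}
	fmt.Println(nextDynamicKey(cols))
}
```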
2023-08-25 22:20:52 +02:00
Marvin Gaube
e6effd1335 feat: add custom dictionaries for additional, customized flow hydration 2023-08-25 22:10:30 +02:00
Vincent Bernat
bebfa28b5d common/schema: use LowCardinality for NextHop
The cardinality should be about an order of magnitude higher than the
number of exporters. For example, if you get ~10 peers per exporter and
100 exporters, you get 1000 possible nexthops.
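
The estimate above can be written as a small Go sketch. The function
names are hypothetical, and the 10,000 threshold is an assumption for the
example: it reflects a commonly cited rule of thumb for LowCardinality,
not a value stated in this commit.

```go
package main

import "fmt"

// lowCardinalityGuideline is an assumed rule-of-thumb threshold below which
// a LowCardinality column is usually considered worthwhile.
const lowCardinalityGuideline = 10_000

// estimateNextHops returns a rough upper bound on distinct nexthops,
// assuming each exporter sees about peersPerExporter distinct peers.
func estimateNextHops(exporters, peersPerExporter int) int {
	return exporters * peersPerExporter
}

// fitsLowCardinality reports whether the estimate stays under the guideline.
func fitsLowCardinality(distinct int) bool {
	return distinct < lowCardinalityGuideline
}

func main() {
	n := estimateNextHops(100, 10)
	fmt.Printf("%d possible nexthops, LowCardinality reasonable: %t\n", n, fitsLowCardinality(n))
}
```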

Also, make it disabled by default. Most new types should be opt-in as
they mean more space used in the database.
2023-08-25 21:59:53 +02:00
Marvin Gaube
fa0ac2388a feat: expose nexthop as dimension 2023-08-25 21:57:26 +02:00
Vincent Bernat
0e1b5a3351 common/schema: introduce ICMPv4/ICMPv6 virtual columns 2023-06-03 18:57:19 +02:00
Vincent Bernat
5067072c12 common/schema: use separate fields for ICMP v4 and ICMP v6
They have different values. Use ICMPv4 and not ICMP, because it is
IPv4-specific (all fields currently follow this convention, good).
2023-05-31 09:18:03 +02:00
Vincent Bernat
9ce245d236 common/schema: add TTL, fragments, ToS, TCP flags and ICMP type/code
Remaining tasks:

- [ ] use a dictionary for ICMP type/code and add completion
- [ ] add tests for ICMP (sFlow and Netflow)
- [ ] handle binary operators for TCP flags (optional, lot of work)

Fix #729
2023-05-31 09:08:16 +02:00
Vincent Bernat
63267f0f5b console: enable SrcAddr/DstAddr truncation to a fixed length 2023-02-22 20:55:21 +01:00
Vincent Bernat
a912da7fa1 build: use gofumpt
Undecided if we need to use it. I think it's nice.
2023-02-11 10:03:45 +01:00
Vincent Bernat
930c2daa4c common/schema: reduce storage for Src/DstAddr, Bytes, Packets
Also teach orchestrator to change compression codecs.
2023-02-06 22:56:54 +01:00
Vincent Bernat
65e3e1783a common/schema: check for dependencies between columns 2023-01-30 06:48:16 +01:00
Vincent Bernat
51988449d2 common/schema: fix GenerateFrom for DstNet* 2023-01-27 22:49:17 +01:00
Vincent Bernat
bd8928c9f1 common/schema: fix ClickHouse type for Src/DstMAC 2023-01-24 11:41:48 +01:00
Vincent Bernat
9c51b22845 common/schema: group some columns to skip them quickly when not enabled 2023-01-21 11:19:13 +01:00
Vincent Bernat
9eee46cade common/schema: add SrcMAC and DstMAC 2023-01-19 23:12:17 +01:00
Vincent Bernat
78caf8e07b common/schema: add Src/DstAddrNAT, Src/DstPortNAT
Also parse them for IPFIX.

Fix #211
2023-01-19 19:38:27 +01:00
Vincent Bernat
72d51d0512 common/schema: make enabled/disabled columns configurable 2023-01-19 18:53:21 +01:00
Vincent Bernat
a8e05548a4 common/schema: add disabled columns
We introduce SrcVlan and DstVlan for that. In the next commit, a user
will be able to enable/disable columns. Adding columns will still
require code.
2023-01-19 17:13:50 +01:00
Vincent Bernat
eba3af5183 schema/common: rename MainOnly to ClickHouseMainOnly 2023-01-18 18:09:45 +01:00
Vincent Bernat
c5aa1e7bfa common/schema: generate bimap for column names 2023-01-18 16:36:11 +01:00
Vincent Bernat
c6a9319b57 common/schema: turn into a component
This is a first step to make it accept configuration. Most of the
changes are quite trivial, but I also ran into some difficulties with
query columns and filters. They need the schema for parsing, but parsing
happens before dependencies are instantiated (and even if it was not the
case, parsing is stateless). Therefore, I have added a `Validate()`
method that must be called after instantiation. Various bits `panic()`
if not validated to ensure we catch all cases.
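
The pattern can be sketched as follows in Go. This is an illustration
only: the schema, filter and parseFilter names are hypothetical and much
simpler than the real types. Parsing stays stateless, validation needs
the schema, and accessors panic when validation was skipped.

```go
package main

import "fmt"

// schema is a hypothetical, simplified schema component.
type schema struct {
	columns map[string]bool
}

func (s *schema) has(name string) bool { return s.columns[name] }

// filter is parsed without access to the schema, so it starts unvalidated.
type filter struct {
	column    string
	validated bool
}

// parseFilter is stateless: it cannot check the column against a schema.
func parseFilter(input string) filter {
	return filter{column: input}
}

// Validate must be called once the schema component is instantiated.
func (f *filter) Validate(s *schema) error {
	if !s.has(f.column) {
		return fmt.Errorf("unknown column %q", f.column)
	}
	f.validated = true
	return nil
}

// Column panics when the filter was not validated, to catch forgotten calls.
func (f *filter) Column() string {
	if !f.validated {
		panic("filter used before Validate()")
	}
	return f.column
}

func main() {
	s := &schema{columns: map[string]bool{"SrcAddr": true}}
	f := parseFilter("SrcAddr")
	if err := f.Validate(s); err != nil {
		fmt.Println("invalid filter:", err)
		return
	}
	fmt.Println("filtering on", f.Column())
}
```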

The alternative, making the component manage a global state, would have
been simpler, but it would break once we add the ability to add or
disable columns.
2023-01-18 12:22:10 +01:00
Vincent Bernat
1ae890cd7d common/schema: make SrcPort/DstPort 16-bit to reduce ClickHouse storage
We would need to do it for EType, Proto and ForwardingStatus but as they
are primary keys, this is something difficult to change right now.
2023-01-17 20:53:00 +01:00
Vincent Bernat
e352202631 inlet: make use of schema for inlet
This is a huge change to make the various subcomponents of the inlet use
the schema to generate the protobuf. For it to make sense, we also
modify the way we parse flows to directly serialize non-essential fields
to Protobuf.

The performance is mostly on par with the previous commit. We are a bit
less efficient because we don't have a fixed structure, but we avoid
losing too much performance by not relying on reflection and keeping
the production of messages as code. We use less of Goflow2: raw flow
parsing is still done by Goflow2, but we don't use the producer part
anymore. This helps a bit with the performance as we parse less.
Overall, we are 20% faster than the previous commit and twice as fast as
1.6.4!

```
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
BenchmarkDecodeEncodeNetflow
BenchmarkDecodeEncodeNetflow/with_encoding
BenchmarkDecodeEncodeNetflow/with_encoding-12             151484              7789 ns/op            8272 B/op        143 allocs/op
BenchmarkDecodeEncodeNetflow/without_encoding
BenchmarkDecodeEncodeNetflow/without_encoding-12          162550              7133 ns/op            8272 B/op        143 allocs/op
BenchmarkDecodeEncodeSflow
BenchmarkDecodeEncodeSflow/with_encoding
BenchmarkDecodeEncodeSflow/with_encoding-12                94844             13193 ns/op            9816 B/op        295 allocs/op
BenchmarkDecodeEncodeSflow/without_encoding
BenchmarkDecodeEncodeSflow/without_encoding-12             92569             12456 ns/op            9816 B/op        295 allocs/op
```

There was an attempt to parse sFlow packets with gopacket, but the
ad hoc parser used here is more performant.
2023-01-17 20:53:00 +01:00
Vincent Bernat
8a779fb905 common/schema: make schema fields private
This will be useful later to bundle cached fields and ensure they stay
up to date.
2023-01-17 20:53:00 +01:00
Vincent Bernat
727807b937 common/schema: use a symbol to identify columns 2023-01-17 20:53:00 +01:00