This change splits the inlet component into a simpler inlet and a new
outlet component. The new inlet component receives flows and puts them in
Kafka, unparsed. The outlet component takes them from Kafka, resumes
the processing from there (flow parsing, enrichment), and puts them in
ClickHouse.
The main goal is to ensure the inlet does minimal work so it does not
fall behind when processing packets (and restarts faster). It also
brings some simplification, as the number of knobs to tune is reduced:
for the inlet, we only need to tune the UDP queue size, the number of
workers, and a few Kafka parameters; for the outlet, we need to tune a
few Kafka parameters, the number of workers, and a few ClickHouse
parameters.
The outlet component features a simple Kafka input component. The core
component becomes just a callback function. There is also a new
ClickHouse component that pushes data to ClickHouse using the low-level
ch-go library with batch inserts.
This processing has an impact on the internal representation of a
FlowMessage. Previously, it was tailored to dynamically build the
protobuf message to be put in Kafka. Now, it builds the batch request to
be sent to ClickHouse. The FlowMessage structure therefore hides the
content of the next batch request and should be reused.
This also changes the way we decode flows: decoders no longer output a
FlowMessage; they reuse one that is provided to each worker.
The ClickHouse tables are slightly updated: the Null engine is used
instead of the Kafka engine.
Fix #1122
Done with:
```
git grep -l 'for.*:= 0.*++' \
| xargs sed -i -E 's/for (.*) := 0; \1 < (.*); \1\+\+/for \1 := range \2/'
```
And a few manual fixes due to unused variables. There is something fishy
in the BMP RIB test. Add a comment about that. The two forms are not
equivalent: with range, the bound is evaluated once, while in the
original loop, it is evaluated at each iteration. I believe the intent
was to behave like range.
* fix: generation of protocols.csv file
* feat: generation of ports-tcp.csv and ports-udp.csv files
* build: add rules for creating udp and tcp csv files
* feat: create dictionary tcp and udp
* refactor: add replaceRegexpOne
* test: transform src port and dest port columns in SQL
* test: add TCP and UDP dictionaries for migration testing
Sometimes, the exporter name and interface description do not carry all
the information required for classification and metadata extraction.
Supporting a way to provide this data through the metadata component
(only static providers seem to make sense at this point) enables more
use cases.
They are stored in an array, and there are aliases to get the first,
second, and third labels. Support for sFlow would need a test to ensure
it works as expected.
Fix #960
It should be about an order of magnitude higher than the number of
exporters. For example, with ~10 peers per exporter and 100 exporters,
you get 1000 possible nexthops.
Also, make it disabled by default. Most new types should be opt-in, as
they mean more space in the database.
Remaining tasks:
- [ ] use a dictionary for ICMP type/code and add completion
- [ ] add tests for ICMP (sFlow and Netflow)
- [ ] handle binary operators for TCP flags (optional, lot of work)
Fix#729
revive's default configuration changed in 1.3.0. Some rules are a bit
silly (like the one about empty blocks), but I find it easier to follow
them than to try to tweak the configuration.