The lookup benchmark was incorrect. When looking up a large number of
prefixes on each loop iteration, b.Loop() calibrates on a larger and
less precise value than if it were measuring a single lookup, where it
would iterate more to get precise timing.
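For illustration, a minimal sketch of the corrected pattern, with a toy
lookup function standing in for the real routing table: each b.Loop()
iteration measures a single lookup, so calibration works on a small and
stable unit of work. Setup before the loop is excluded from timing.

    package routing_test

    import (
    	"net/netip"
    	"testing"
    )

    // lookup is a stand-in for the real routing table lookup; only the
    // looping pattern below reflects the fix described above.
    func lookup(prefixes []netip.Prefix, addr netip.Addr) (netip.Prefix, bool) {
    	for _, p := range prefixes {
    		if p.Contains(addr) {
    			return p, true
    		}
    	}
    	return netip.Prefix{}, false
    }

    func BenchmarkLookup(b *testing.B) {
    	prefixes := []netip.Prefix{
    		netip.MustParsePrefix("192.0.2.0/24"),
    		netip.MustParsePrefix("198.51.100.0/24"),
    		netip.MustParsePrefix("203.0.113.0/24"),
    	}
    	addrs := make([]netip.Addr, 1024)
    	for i := range addrs {
    		addrs[i] = netip.AddrFrom4([4]byte{192, 0, 2, byte(i)})
    	}
    	// b.Loop() starts the timer on its first call and calibrates on
    	// the cost of one lookup per iteration.
    	i := 0
    	for b.Loop() {
    		lookup(prefixes, addrs[i%len(addrs)])
    		i++
    	}
    }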
The problem may also exist for the insertion benchmark, but it's
difficult to do only one insertion per loop iteration: after many
iterations, there is nothing left to insert. I suppose BART's author
does not try to benchmark insertions because of this.
See https://github.com/akvorado/akvorado/pull/2040 and
https://github.com/gaissmai/bart/issues/351#issuecomment-3428806758.
Some of the files were quite big:
- asns.csv ~ 3 MB
- index.js ~ 1.5 MB
- *.svg ~ 2 MB
Bundle them all into a ZIP archive and embed it. This reduces the
binary size from 89 MB to 82 MB. 🤯
This also pulls in some code modernization (use of http.ServeFileFS).
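A minimal sketch of the approach, assuming the assets are bundled into
a data.zip at build time (file names and paths here are illustrative):
a *zip.Reader implements fs.FS, so it can be served directly with
http.ServeFileFS (Go 1.22+).

    package main

    import (
    	"archive/zip"
    	"bytes"
    	_ "embed"
    	"log"
    	"net/http"
    )

    //go:embed data.zip
    var assetsZip []byte

    func main() {
    	// Open the embedded archive as a read-only filesystem.
    	fsys, err := zip.NewReader(bytes.NewReader(assetsZip), int64(len(assetsZip)))
    	if err != nil {
    		log.Fatal(err)
    	}
    	http.HandleFunc("/data/asns.csv", func(w http.ResponseWriter, r *http.Request) {
    		// ServeFileFS handles Content-Type, ranges, etc.
    		http.ServeFileFS(w, r, fsys, "asns.csv")
    	})
    	log.Fatal(http.ListenAndServe(":8080", nil))
    }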
I did not benchmark it myself, but it was benchmarked here:
https://github.com/osrg/gobgp/issues/1414#issuecomment-3067255941
Of course, there is no guarantee that this benchmark matches our use
cases.
Moreover, SubnetMap has been optimized to avoid parsing keys all the
time.
Also, the interface is a bit nicer and uses netip.Prefix directly.
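For illustration only, a toy SubnetMap-like type keyed by netip.Prefix.
The names and the linear scan are hypothetical and do not reflect the
actual optimized implementation; the point is the interface shape.

    package main

    import (
    	"fmt"
    	"net/netip"
    )

    // SubnetMap maps prefixes to values; the most specific matching
    // prefix wins. The linear scan keeps the sketch short.
    type SubnetMap[V any] struct {
    	entries map[netip.Prefix]V
    }

    func (m *SubnetMap[V]) Update(p netip.Prefix, v V) {
    	if m.entries == nil {
    		m.entries = map[netip.Prefix]V{}
    	}
    	m.entries[p.Masked()] = v
    }

    // Lookup returns the value of the most specific prefix containing addr.
    func (m *SubnetMap[V]) Lookup(addr netip.Addr) (best V, ok bool) {
    	bestBits := -1
    	for p, v := range m.entries {
    		if p.Contains(addr) && p.Bits() > bestBits {
    			best, bestBits, ok = v, p.Bits(), true
    		}
    	}
    	return
    }

    func main() {
    	var m SubnetMap[string]
    	m.Update(netip.MustParsePrefix("192.0.2.0/24"), "customers")
    	m.Update(netip.MustParsePrefix("192.0.2.128/25"), "servers")
    	fmt.Println(m.Lookup(netip.MustParseAddr("192.0.2.200"))) // servers true
    }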
The next step is to convert outlet/routing/provider/bmp.
This change splits the inlet component into a simpler inlet and a new
outlet component. The new inlet component receives flows and puts them
into Kafka, unparsed. The outlet component takes them from Kafka,
resumes the processing from there (flow parsing, enrichment), and puts
them into ClickHouse.
The main goal is to ensure the inlet does minimal work so it does not
fall behind when processing packets (and restarts faster). It also
brings some simplification, as the number of knobs to tune is reduced:
for the inlet, we only need to tune the UDP queue size, the number of
workers, and a few Kafka parameters; for the outlet, we need to tune a
few Kafka parameters, the number of workers, and a few ClickHouse
parameters.
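A rough sketch of the inlet's hot path under this split; the produce
callback stands in for the real Kafka producer and the names are
illustrative.

    package inlet

    import "net"

    // inletWorker reads raw datagrams and forwards them unparsed; all
    // parsing and enrichment now happens in the outlet, after Kafka.
    func inletWorker(conn *net.UDPConn, produce func(payload []byte)) error {
    	buf := make([]byte, 9000) // one reusable receive buffer per worker
    	for {
    		n, _, err := conn.ReadFromUDP(buf)
    		if err != nil {
    			return err
    		}
    		payload := make([]byte, n)
    		copy(payload, buf[:n]) // copy out, buf is reused on the next read
    		produce(payload)       // hand the unparsed flow packet to Kafka
    	}
    }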
The outlet component features a simple Kafka input component. The core
component becomes just a callback function. There is also a new
ClickHouse component to push data to ClickHouse using the low-level
ch-go library with batch inserts.
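As a sketch of what a batch insert looks like with ch-go, assuming a
flows table with just two columns; the real schema has many more and
the column names here are illustrative.

    package main

    import (
    	"context"
    	"time"

    	"github.com/ClickHouse/ch-go"
    	"github.com/ClickHouse/ch-go/proto"
    )

    func main() {
    	ctx := context.Background()
    	conn, err := ch.Dial(ctx, ch.Options{Address: "127.0.0.1:9000"})
    	if err != nil {
    		panic(err)
    	}
    	defer conn.Close()

    	// Columns are appended to in memory, then shipped as one batch.
    	var (
    		timeReceived proto.ColDateTime
    		bytes        proto.ColUInt64
    	)
    	input := proto.Input{
    		{Name: "TimeReceived", Data: &timeReceived},
    		{Name: "Bytes", Data: &bytes},
    	}
    	for i := 0; i < 1000; i++ {
    		timeReceived.Append(time.Now())
    		bytes.Append(1500)
    	}
    	if err := conn.Do(ctx, ch.Query{
    		Body:  input.Into("flows"), // INSERT INTO flows (...) VALUES
    		Input: input,
    	}); err != nil {
    		panic(err)
    	}
    }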
This processing has an impact on the internal representation of a
FlowMessage. Previously, it was tailored to dynamically build the
protobuf message to be put into Kafka. Now, it builds the batch request
to be sent to ClickHouse. This means the FlowMessage structure hides
the content of the next batch request and should therefore be reused.
This also changes the way we decode flows: decoders no longer output a
FlowMessage; they reuse one provided to each worker.
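Schematically, and with hypothetical names rather than the actual
types, the decoding contract now looks like this: the decoder fills the
FlowMessage owned by the worker instead of allocating a new one per
packet.

    package outlet

    import "net/netip"

    // FlowMessage accumulates the columns of the next ClickHouse batch
    // request; the fields below are a small illustrative subset.
    type FlowMessage struct {
    	TimeReceived uint64
    	SrcAddr      netip.Addr
    	DstAddr      netip.Addr
    	Bytes        uint64
    }

    // Decoder fills the provided FlowMessage instead of returning a new one.
    type Decoder interface {
    	Decode(payload []byte, into *FlowMessage) error
    }

    func workerLoop(d Decoder, packets <-chan []byte, flush func(*FlowMessage)) {
    	var fm FlowMessage // one FlowMessage per worker, reused across packets
    	for payload := range packets {
    		fm = FlowMessage{} // reset before reuse
    		if err := d.Decode(payload, &fm); err != nil {
    			continue // skip undecodable packets
    		}
    		flush(&fm) // append to the pending ClickHouse batch
    	}
    }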
The ClickHouse tables are slightly updated: the Null engine is used
instead of the Kafka engine.
Fix #1122
Instead, just map configuration files inside the container. As we don't
have to push the schema anymore, pushing some arbitrary configuration
does not seem to be our job.
Prefer use of time.NewTimer() when there is a risk of accumulating
timers in a loop. This enables the use of t.Stop() to avoid leaking too
many timers.
For tests, we don't need to do that. For places where the alternative to
the timer is just the app dying, we don't need to do that either.
In Go 1.23, it won't make a difference anymore, as unstopped timers
become eligible for garbage collection.
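A sketch of the pattern; the event channel and handler are
placeholders. Before Go 1.23, each time.After() call in a loop keeps
its timer alive until it fires, while time.NewTimer() plus Stop()
releases it as soon as another case wins.

    package example

    import (
    	"context"
    	"time"
    )

    // consume drains events but gives up after a minute of inactivity.
    // Stopping the timer on every iteration prevents timers from piling
    // up while events keep arriving.
    func consume(ctx context.Context, events <-chan string, handle func(string)) {
    	for {
    		t := time.NewTimer(time.Minute)
    		select {
    		case e := <-events:
    			t.Stop() // release the timer immediately
    			handle(e)
    		case <-t.C:
    			return // idle timeout
    		case <-ctx.Done():
    			t.Stop()
    			return
    		}
    	}
    }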
I don't want to keep disabling the experimental analyzer forever. The
version check could be turned into disabling the experimental analyzer,
but it is better to push people to update their versions.
To be pushed only when 24.3 (LTS) and 24.4 get the fix.
* fix: generation of protocols.csv file
* feat: generation of ports-tcp.csv and ports-udp.csv files
* build: add rules for creating UDP and TCP CSV files
* feat: create TCP and UDP dictionaries
* refactor: add replaceRegexpOne
* test: transform src port and dest port columns in SQL
* test: add TCP and UDP dictionaries for migration testing
This was introduced with #1059, but I think this was a mistake. Notably,
it enables erasing the tenants provided by the user.
It also opens the question of whether network sources or static
sources should override more specific entries or not. This is
currently not the case, but then, if a more specific GeoIP entry
appears, it may require adding a more specific entry if overriding is
needed.
This could also be configurable.