* fix: generation of protocols.csv file
* feat: generation of ports-tcp.csv and ports-udp.csv files
* build: add rules for creating udp and tcp csv files
* feat: create tcp and udp dictionaries
* refactor: add replaceRegexpOne
* test: transform src port and dest port columns in SQL
* test: add TCP and UDP dictionaries for migration testing
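As a rough sketch of how these dictionaries could be used from a query (dictionary, attribute, and column names below are illustrative, not necessarily the ones shipped with the CSV files), a lookup mapping destination port numbers to service names might look like this; source ports would work the same way:

```go
package main

import (
	"context"
	"fmt"

	"github.com/ClickHouse/clickhouse-go/v2"
)

// Hypothetical query: map numeric TCP ports to service names through a
// dictionary built from ports-tcp.csv. Dictionary and column names are
// illustrative only.
const topPortsQuery = `
SELECT
    dictGetOrDefault('tcp', 'name', toUInt64(DstPort), toString(DstPort)) AS DstPortName,
    sum(Bytes) AS Bytes
FROM flows
WHERE Proto = 6
GROUP BY DstPortName
ORDER BY Bytes DESC
LIMIT 10
`

func main() {
	conn, err := clickhouse.Open(&clickhouse.Options{Addr: []string{"127.0.0.1:9000"}})
	if err != nil {
		panic(err)
	}
	rows, err := conn.Query(context.Background(), topPortsQuery)
	if err != nil {
		panic(err)
	}
	defer rows.Close()
	for rows.Next() {
		var name string
		var bytes uint64
		if err := rows.Scan(&name, &bytes); err != nil {
			panic(err)
		}
		fmt.Println(name, bytes)
	}
}
```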
The separate implementation was only there to add the ability to downgrade
the lock. However, the number of times the RW lock is taken has been
greatly reduced, and it does not make sense to maintain another
implementation just for that.
By default, the ClickHouse client only does failover (ConnOpenInOrder):
the first instance that replies is used. This enables round-robin so that
all servers can be used in parallel.
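For reference, with the Go ClickHouse client (clickhouse-go v2) the change boils down to the connection-open strategy. A minimal sketch with placeholder server addresses:

```go
package main

import (
	"context"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	conn, err := clickhouse.Open(&clickhouse.Options{
		// Placeholder addresses: list every ClickHouse server here.
		Addr: []string{"clickhouse-1:9000", "clickhouse-2:9000", "clickhouse-3:9000"},
		// Default is clickhouse.ConnOpenInOrder: always try the first
		// server and only fall back to the next ones on failure.
		// Round-robin spreads new connections across all servers.
		ConnOpenStrategy: clickhouse.ConnOpenRoundRobin,
	})
	if err != nil {
		panic(err)
	}
	if err := conn.Ping(context.Background()); err != nil {
		panic(err)
	}
}
```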
Gob decoding is quite liberal and accepts anything that does not conflict
as long as at least one field matches. That's not what we want. To check
that we are decoding the right thing, use a string representation of the
zero values.
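A minimal sketch of one way to apply this idea (the type and helper names are hypothetical, not the actual cache code): store the `%+v` representation of the zero value next to the payload, and refuse to load the data when that signature no longer matches.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
)

// cachedEntry is a hypothetical payload type; the real struct differs.
type cachedEntry struct {
	ExporterName string
	IfIndex      uint
}

// save writes a "schema signature" (the %+v representation of the zero
// value) before the payload, so load() can detect layout changes that
// gob alone would silently accept.
func save(buf *bytes.Buffer, entries []cachedEntry) error {
	enc := gob.NewEncoder(buf)
	var zero cachedEntry
	if err := enc.Encode(fmt.Sprintf("%+v", zero)); err != nil {
		return err
	}
	return enc.Encode(entries)
}

// load rejects data written with an incompatible structure before
// decoding the payload itself.
func load(buf *bytes.Buffer) ([]cachedEntry, error) {
	dec := gob.NewDecoder(buf)
	var zero cachedEntry
	var signature string
	if err := dec.Decode(&signature); err != nil {
		return nil, err
	}
	if signature != fmt.Sprintf("%+v", zero) {
		return nil, errors.New("data was written with an incompatible structure")
	}
	var entries []cachedEntry
	if err := dec.Decode(&entries); err != nil {
		return nil, err
	}
	return entries, nil
}

func main() {
	var buf bytes.Buffer
	if err := save(&buf, []cachedEntry{{ExporterName: "router1", IfIndex: 10}}); err != nil {
		panic(err)
	}
	entries, err := load(&buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(entries)
}
```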
To be more space-efficient and faster, change the primary key to replace
TimeReceived by toStartOfFiveMinutes(TimeReceived). This is only
effective for new installations.
Fix #475
Fix #605
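A simplified sketch of the resulting ordering key (the real flows table has many more columns and settings; the table and column names here are illustrative):

```go
package main

import (
	"context"

	"github.com/ClickHouse/clickhouse-go/v2"
)

// Illustrative DDL only: the point is the ORDER BY clause, which now
// starts with toStartOfFiveMinutes(TimeReceived) instead of TimeReceived.
const createFlowsDDL = `
CREATE TABLE IF NOT EXISTS flows_example (
    TimeReceived DateTime,
    ExporterAddress IPv6,
    Bytes UInt64
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(TimeReceived)
ORDER BY (toStartOfFiveMinutes(TimeReceived), ExporterAddress)
`

func main() {
	conn, err := clickhouse.Open(&clickhouse.Options{Addr: []string{"127.0.0.1:9000"}})
	if err != nil {
		panic(err)
	}
	if err := conn.Exec(context.Background(), createFlowsDDL); err != nil {
		panic(err)
	}
}
```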
All MergeTree tables are now replicated.
For some tables, a `_local` variant is added and the non-`_local`
variant is now distributed. The distributed tables are the `flows`
table, the `flows_DDDD` tables (where `DDDD` is a duration), as well as
the `flows_raw_errors` table. The `exporters` table is not distributed
and stays local.
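A rough sketch of the local/distributed pattern, with simplified columns and placeholder cluster, database, and ZooKeeper path names (the real DDL is generated by the orchestrator):

```go
// Illustrative DDL only: the real tables have many more columns and
// settings, and cluster, database, and ZooKeeper paths are placeholders.
package schema

const (
	// Local, replicated part of the flows table.
	flowsLocalDDL = `
CREATE TABLE flows_local ON CLUSTER akvorado (
    TimeReceived DateTime,
    ExporterAddress IPv6,
    Bytes UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/flows_local', '{replica}')
PARTITION BY toYYYYMMDD(TimeReceived)
ORDER BY (toStartOfFiveMinutes(TimeReceived), ExporterAddress)
`

	// Distributed front-end over the local table: inserts and queries go
	// through this table and are spread across the shards.
	flowsDistributedDDL = `
CREATE TABLE flows ON CLUSTER akvorado AS flows_local
ENGINE = Distributed(akvorado, default, flows_local, rand())
`
)
```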
The data follows this path (see the DDL sketch after the list):
- data comes from the `flows_HHHH_raw` table, using the Kafka engine
- the `flows_HHHH_raw_consumer` reads data from `flows_HHHH_raw` (local)
and sends it to `flows` (distributed) when there is no error
- the `flows_raw_errors_consumer` reads data from
`flows_HHHH_raw` (local) and sends it to
`flows_raw_errors` (distributed)
- the `flows_DDDD_consumer` reads data from `flows_local` (local) and
sends it to `flows_DDDD_local` (local)
- the `exporters_consumer` reads data from `flows` (distributed) and
sends it to `exporters` (local)
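The consumers are materialized views. A simplified sketch of the raw table and its two consumers, with placeholder Kafka settings and columns (`HHHH` stands for the hash embedded in the real table names):

```go
// Illustrative DDL only: columns, Kafka settings, and the error table
// layout are placeholders.
package schema

const (
	// Kafka engine table: each ClickHouse server reads its share of the
	// topic partitions.
	flowsRawDDL = `
CREATE TABLE flows_HHHH_raw (
    TimeReceived DateTime,
    ExporterAddress IPv6,
    Bytes UInt64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'flows',
         kafka_group_name = 'clickhouse',
         kafka_format = 'Protobuf',
         kafka_schema = 'flow.proto:FlowMessage',
         kafka_handle_error_mode = 'stream'
`

	// Consumer: pushes correctly decoded rows to the distributed flows
	// table so that they get balanced across shards.
	flowsRawConsumerDDL = `
CREATE MATERIALIZED VIEW flows_HHHH_raw_consumer TO flows
AS SELECT * FROM flows_HHHH_raw WHERE length(_error) = 0
`

	// Error consumer: keeps the raw message and the decoding error.
	flowsRawErrorsConsumerDDL = `
CREATE MATERIALIZED VIEW flows_raw_errors_consumer TO flows_raw_errors
AS SELECT now() AS TimeReceived, _topic, _partition, _offset, _raw_message, _error
FROM flows_HHHH_raw WHERE length(_error) > 0
`
)
```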
The reason for `flows_HHHH_raw_consumer` to send data to the distributed
`flows` table, and not the local one, is to ensure flows are
balanced (for example, if there are not enough Kafka partitions). But
sending them to `flows_local` would have been possible.
On the other hand, it is important for `flows_DDDD_consumer` to read
from the local table to avoid duplication. It could have written to the
distributed table, but the data is already balanced correctly, so we
write to the local one instead for better performance.
The `exporters_consumer` is allowed to read from the distributed `flows`
table because it writes the result to the local `exporters` table.
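A simplified sketch of these two consumers, with placeholder columns and an arbitrary downsampling interval (the real SELECT clauses are more involved):

```go
// Illustrative DDL only: DDDD stands for a duration and the SELECT
// clauses are much simpler than the real ones.
package schema

const (
	// Downsampling consumer: reads from the local table to avoid
	// processing the same rows on every server, writes to the local
	// downsampled table.
	flowsDDDDConsumerDDL = `
CREATE MATERIALIZED VIEW flows_DDDD_consumer TO flows_DDDD_local
AS SELECT toStartOfInterval(TimeReceived, INTERVAL 60 second) AS TimeReceived,
          ExporterAddress,
          sum(Bytes) AS Bytes
FROM flows_local
GROUP BY TimeReceived, ExporterAddress
`

	// Exporters consumer: reading from the distributed flows table is
	// fine here since the result only feeds the local exporters table.
	exportersConsumerDDL = `
CREATE MATERIALIZED VIEW exporters_consumer TO exporters
AS SELECT DISTINCT TimeReceived, ExporterAddress, ExporterName
FROM flows
`
)
```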
Sometimes the exporter name and interface description do not carry
all the information required for classification and metadata extraction.
Supporting a way to provide this data through the metadata component (only
the static provider seems to make sense at this point) enables more
use cases.
The labels are stored in an array and there are aliases to get the first,
second, and third label. Support for sFlow would need a test to ensure it
works as expected.
Fix #960
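A minimal sketch of the array-plus-aliases pattern on the ClickHouse side, with hypothetical column names (the actual columns added for this feature may differ):

```go
// Illustrative DDL only: column names are hypothetical, not the actual
// schema columns added for this feature.
package schema

const addInterfaceLabelsDDL = `
ALTER TABLE flows_local
    ADD COLUMN InIfLabels Array(String),
    ADD COLUMN InIfLabel1 String ALIAS InIfLabels[1],
    ADD COLUMN InIfLabel2 String ALIAS InIfLabels[2],
    ADD COLUMN InIfLabel3 String ALIAS InIfLabels[3]
`
```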