Operations
While Akvorado itself does not require much memory and disk space, Kafka and ClickHouse have heavier needs. To get started, do not run the complete setup with less than 16 GB of RAM (32 GB or more is recommended) and with less than 50 GB of disk (100 GB or more is recommended). Use at least 8 vCPUs.
demo.akvorado.net currently runs in a VM with 4 vCPUs, 100 GB of disk, and 8
GB of RAM, but it uses 4 GB of swap.
Router configuration
Each router should be configured to send flows to the Akvorado inlet service and accept SNMP requests. For routers not listed below, see the configuration snippets from Kentik.
It is better to sample on ingress only. This requires enabling sampling on both external and internal interfaces, but it prevents flows from being counted twice when they enter through one external port and exit through another.
Exporter Address
By default, the exporter address is taken from a field inside the flow message (also called the agent ID) and is used, for example, for SNMP requests. If this address is wrong, you can use the source IP of the flow packet instead by setting use-src-addr-for-exporter-addr: true in the flow configuration.
Please note that with this configuration, your deployment must not change the source IP. This might happen with Docker or Kubernetes networking.
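As a sketch, assuming the same inlet configuration layout used elsewhere in this document, the option would go into akvorado.yaml like this:

```yaml
inlet:
  flow:
    # Trust the source IP of the received packet rather than the
    # exporter address embedded in the flow message.
    use-src-addr-for-exporter-addr: true
```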
Cisco IOS-XE
You can enable NetFlow with the following configuration:
flow record Akvorado
match ipv4 tos
match ipv4 protocol
match ipv4 source address
match ipv4 destination address
match transport source-port
match transport destination-port
collect routing source as 4-octet
collect routing destination as 4-octet
collect routing next-hop address ipv4
collect transport tcp flags
collect interface output
collect interface input
collect counter bytes
collect counter packets
collect timestamp sys-uptime first
collect timestamp sys-uptime last
!
flow record Akvorado-IPV6
match ipv6 protocol
match ipv6 source address
match ipv6 destination address
match transport source-port
match transport destination-port
collect routing source as 4-octet
collect routing destination as 4-octet
collect routing next-hop address ipv4
collect transport tcp flags
collect interface output
collect interface input
collect counter bytes
collect counter packets
collect timestamp sys-uptime first
collect timestamp sys-uptime last
!
sampler random1in100
mode random 1 out-of 100
!
flow exporter AkvoradoExport
destination <akvorado-ip> vrf monitoring
source Loopback20
transport udp 2055
version 9
option sampler-table timeout 10
!
flow monitor AkvoradoMonitor
exporter AkvoradoExport
cache timeout inactive 10
cache timeout active 60
record Akvorado
!
flow monitor AkvoradoMonitor-IPV6
exporter AkvoradoExport
cache timeout inactive 10
cache timeout active 60
record Akvorado-IPV6
!
To enable NetFlow on an interface, use this snippet:
interface GigabitEthernet0/0/3
ip flow monitor AkvoradoMonitor sampler random1in100 input
ip flow monitor AkvoradoMonitor sampler random1in100 output
ipv6 flow monitor AkvoradoMonitor-IPV6 sampler random1in100 input
ipv6 flow monitor AkvoradoMonitor-IPV6 sampler random1in100 output
!
According to issue #89, the
sampling rate is not reported correctly on this platform. The solution is to set
a default sampling rate in akvorado.yaml. See the
documentation for more details.
inlet:
core:
default-sampling-rate: 100
NCS 5500 and ASR 9000
On each router, you can enable NetFlow with the following configuration. It is important to use a power of two for the sampling rate (at least on NCS).
sampler-map sampler1
random 1 out-of 32768
!
flow exporter-map akvorado
version v9
options sampler-table timeout 10
template options timeout 10
!
transport udp 2055
source Loopback20
destination <akvorado-ip> vrf private
!
flow monitor-map monitor1
record ipv4
exporter akvorado
cache entries 100000
cache timeout active 15
cache timeout inactive 2
cache timeout rate-limit 2000
!
flow monitor-map monitor2
record ipv6
exporter akvorado
cache entries 100000
cache timeout active 15
cache timeout inactive 2
cache timeout rate-limit 2000
!
Optionally, you can push the AS path to the forwarding database, and the source and destination AS will be present in NetFlow packets:
router bgp <asn>
address-family ipv4 unicast
bgp attribute-download
!
address-family ipv6 unicast
bgp attribute-download
To enable NetFlow on an interface, use this snippet:
interface Bundle-Ether4000
flow ipv4 monitor monitor1 sampler sampler1 ingress
flow ipv6 monitor monitor2 sampler sampler1 ingress
!
Also, see the troubleshooting section on how to scale NetFlow on the NCS 5500.
Then, you need to enable SNMP:
snmp-server community <community> RO IPv4
snmp-server ifindex persist
control-plane
management-plane
inband
interface all
allow SNMP peer
address ipv4 <akvorado-ip>
To configure BMP, adapt this snippet:
bmp server 1
host <akvorado-ip> port 10179
flapping-delay 60
bmp server all
route-monitoring policy post inbound
router bgp 65400
vrf public
neighbor 192.0.2.100
bmp-activate server 1
Juniper
NetFlow
For MX and SRX devices, you can use NetFlow v9 to export flows.
groups {
sampling {
interfaces {
<*> {
unit <*> {
family inet {
sampling {
input;
}
}
family inet6 {
sampling {
input;
}
}
}
}
}
}
}
forwarding-options {
sampling {
instance {
sample-ins {
input {
rate 1024;
max-packets-per-second 65535;
}
family inet {
output {
flow-server 192.0.2.1 {
port 2055;
autonomous-system-type origin;
source-address 203.0.113.2;
version9 {
template {
ipv4;
}
}
}
inline-jflow {
source-address 203.0.113.2;
}
}
}
family inet6 {
output {
flow-server 192.0.2.1 {
port 2055;
autonomous-system-type origin;
source-address 203.0.113.2;
version9 {
template {
ipv6;
}
}
}
inline-jflow {
source-address 203.0.113.2;
}
}
}
}
}
}
}
chassis {
fpc 0 {
sampling-instance sample-ins;
inline-services {
flex-flow-sizing;
}
}
}
services {
flow-monitoring {
version9 {
template ipv4 {
nexthop-learning enable;
flow-active-timeout 10;
flow-inactive-timeout 10;
template-refresh-rate {
packets 30;
seconds 30;
}
option-refresh-rate {
packets 30;
seconds 30;
}
ipv4-template;
}
template ipv6 {
nexthop-learning enable;
flow-active-timeout 10;
flow-inactive-timeout 10;
template-refresh-rate {
packets 30;
seconds 30;
}
option-refresh-rate {
packets 30;
seconds 30;
}
ipv6-template;
}
}
}
}
Then, apply the sampling group to each interface on which you want to enable flow export:
interfaces {
xe-0/0/0.0 {
description "Transit: Cogent AS179 [3-10109101]";
apply-groups [ sampling ];
}
}
If inet.0 is not enough to reach Akvorado, you need to add a specific route:
routing-options {
static {
route 192.0.2.1/32 next-table internet.inet.0;
}
}
Another option is IPFIX (replace version9 with version-ipfix). However,
Juniper only includes total counters for bytes and packets, instead of delta
counters. Akvorado does not support these counters.
sFlow
For QFX devices, you can use sFlow.
protocols {
sflow {
agent-id 203.0.113.4;
polling-interval 5;
sample-rate ingress 8192;
source-ip 203.0.113.4;
collector 192.0.2.1 {
udp-port 6343;
}
interfaces et-0/0/13.0;
}
}
SNMP
Then, configure SNMP:
snmp {
location "Equinix PA1, FR";
community blipblop {
authorization read-only;
routing-instance internet;
}
routing-instance-access;
}
BMP
If needed, you can configure BMP on one router to send all AdjRIB-in to Akvorado.
routing-options {
bmp {
connection-mode active;
station-address 203.0.113.1;
station-port 10179;
station collector;
hold-down 30 flaps 10 period 30;
route-monitoring post-policy;
monitor enable;
}
}
See Juniper's documentation for more details.
Arista
sFlow
For Arista devices, you can use sFlow.
sflow sample 1024
sflow sample output subinterface
sflow sample input subinterface
sflow vrf VRF-MANAGEMENT destination 192.0.2.1
sflow vrf VRF-MANAGEMENT source-interface Management1
sflow interface egress enable default
sflow run
SNMP
Then, configure SNMP:
snmp-server community <community> ro
snmp-server vrf VRF-MANAGEMENT
BMP
If needed, you can also configure BMP.
router bgp 65001
bgp monitoring
monitoring station COLLECTOR
update-source Management1
connection address 10.122.4.51
connection mode active port 10179
export-policy received routes post-policy
export-policy bgp rib bestpaths
Nokia SR OS
The syntax below is for the model-driven command line interface (MD-CLI). The full context is provided to make it easier to adapt to the classic CLI.
Flows
sFlow is not well supported on devices running SR OS. It is best to use IPFIX.
/configure cflowd admin-state enable
/configure cflowd cache-size 250000
/configure cflowd template-retransmit 60
/configure cflowd active-flow-timeout 15
/configure cflowd inactive-flow-timeout 15
/configure cflowd sample-profile 1 sample-rate 2000
/configure cflowd collector 192.0.2.1 port 2055 admin-state enable
/configure cflowd collector 192.0.2.1 port 2055 description "akvorado.example.net"
/configure cflowd collector 192.0.2.1 port 2055 router-instance "Base"
/configure cflowd collector 192.0.2.1 port 2055 version 10
Either configure sampling on the individual interfaces:
/configure service ies "internet" interface "if1/1/c1/1:0" cflowd-parameters sampling unicast type interface
/configure service ies "internet" interface "if1/1/c1/1:0" cflowd-parameters sampling unicast direction ingress-only
/configure service ies "internet" interface "if1/1/c1/1:0" cflowd-parameters sampling unicast sample-profile 1
Or, add it to apply-groups that are probably already in place:
/configure groups group "peering" service ies "internet" interface "<i.*>" cflowd-parameters sampling unicast type interface
/configure groups group "peering" service ies "internet" interface "<i.*>" cflowd-parameters sampling unicast direction ingress-only
/configure groups group "peering" service ies "internet" interface "<i.*>" cflowd-parameters sampling unicast sample-profile 1
/configure service ies "internet" interface "if1/1/c1/1:0" apply-groups ["peering"]
SNMP
Nokia routers running SR OS use a different interface index in their flow
records than the SNMP interface index that is usually used by other devices. To
fix this, you need to use cflowd use-vrtr-if-index. You can find more
information in Nokia's
documentation.
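A sketch of the corresponding MD-CLI command (the exact leaf location and value syntax may vary by SR OS release, so verify against Nokia's documentation):

```
/configure cflowd use-vrtr-if-index true
```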
GNMI
Instead of SNMP, you can use gNMI. The interface index challenge (see SNMP
above) also applies. See this
discussion for more
details and possible workarounds.
In the example below, unencrypted connections are used. Check the documentation to enable TLS for a more secure setup.
/configure system grpc admin-state enable
/configure system grpc allow-unsecure-connection
/configure system security user-params local-user user "akvorado" access grpc true
/configure system security user-params local-user user "akvorado" console member ["grpc_ro"]
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnmi-get permit
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnmi-set deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnmi-subscribe permit
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-get deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-transfertoremote deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-put deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-stat deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-remove deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization md-cli-session deny
BMP
/configure bmp admin-state enable
/configure bmp station "akvorado" admin-state enable
/configure bmp station "akvorado" description "akvorado.example.net"
/configure bmp station "akvorado" stats-report-interval 300
/configure bmp station "akvorado" connection local-address 192.0.2.42
/configure bmp station "akvorado" connection station-address ip-address 192.0.2.1
/configure bmp station "akvorado" connection station-address port 10179
/configure bmp station "akvorado" family ipv4 true
/configure bmp station "akvorado" family ipv6 true
/configure router "Base" bgp monitor admin-state enable
/configure router "Base" bgp monitor route-monitoring post-policy true
/configure router "Base" bgp monitor station "akvorado" { }
GNU/Linux
pmacct
pmacct is a set of multi-purpose passive network monitoring tools, including an sFlow exporter.
Put the following configuration in /etc/pmacctd/config.conf. Replace
akvorado-inlet-receiver with the correct IP:
daemonize: false
plugins: sfprobe[any]
sfprobe_receiver: akvorado-inlet-receiver:6343
aggregate: src_host,dst_host,in_iface,out_iface,src_port,dst_port,proto
pcap_ifindex: map
pcap_interfaces_map: /etc/pmacctd/interfaces.map
pcap_interface_wait: true
sfprobe_agentsubid: 1402
sampling_rate: 1000
snaplen: 128
In /etc/pmacctd/interfaces.map, adapt this snippet to your setup:
ifindex=1 ifname=lo direction=in
ifindex=1 ifname=lo direction=out
ifindex=3 ifname=eth0 direction=in
ifindex=3 ifname=eth0 direction=out
ifindex=4 ifname=eth1 direction=in
ifindex=4 ifname=eth1 direction=out
We set the interface indexes manually based on the interface names to avoid running an SNMP daemon. Use the static metadata provider to match the exporter and provide interface names and descriptions to Akvorado:
inlet:
providers:
- type: static
exporters:
2001:db8:1::1:
name: exporter1
ifindexes:
3:
name: eth0
description: PNI Google
speed: 10000
4:
name: eth1
description: PNI Netflix
speed: 10000
ipfixprobe
ipfixprobe is a modular IPFIX flow exporter. It
supports a pcap plugin for low bandwidth use (less than 1 Gbps) and a
dpdk plugin for 100 Gbps or more.
Here is an example of how to invoke the pcap plugin:
ipfixprobe \
-i "pcap;ifc=eth0;snaplen=128" \
-s "cache;active=5;inactive=5" \
-o "ipfix;host=akvorado-inlet-receiver;port=2055;udp;id=1;dir=1"
You need to run one ipfixprobe instance for each interface. Each interface
should have its own id and dir. As with pmacct, use the static metadata
provider to provide interface names and descriptions to Akvorado.
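For example, the per-interface invocations could be generated with a loop like this (a sketch: the interface list is illustrative, and the echo makes it a dry run — remove it to actually start the exporters):

```shell
#!/bin/sh
# Generate one ipfixprobe invocation per interface, giving each
# instance its own id and dir value.
id=1
for ifc in eth0 eth1; do
    echo ipfixprobe \
        -i "pcap;ifc=$ifc;snaplen=128" \
        -s "cache;active=5;inactive=5" \
        -o "ipfix;host=akvorado-inlet-receiver;port=2055;udp;id=$id;dir=$id"
    id=$((id + 1))
done
```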
Warning
Until Akvorado supports bidirectional flows (RFC 5103), only incoming flows are correctly counted. The split option for the cache plugin would help count both directions, but the input interface would be incorrect for outgoing flows.
Kafka
When you use docker compose, a Kafka UI runs at
http://127.0.0.1:8080/kafka-ui/. It provides various operational
metrics that you can check, such as the space used by each topic.
ClickHouse
While ClickHouse works well out-of-the-box, we still recommend that you read its documentation. Altinity also provides a knowledge base with other tips.
Tip
To connect to the ClickHouse database in the Docker Compose setup, use
docker compose exec clickhouse clickhouse-client.
System tables
ClickHouse is configured to log various events in MergeTree tables. By default, these tables are unbounded. Unless you configure it otherwise, the orchestrator sets a TTL of 30 days. You can also customize these tables in the configuration files or disable them completely. See the ClickHouse documentation for more details.
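For example, to shorten the query log retention yourself, you could drop a snippet like this into ClickHouse's configuration directory (a sketch; the 7-day TTL is an arbitrary example to adapt):

```xml
<clickhouse>
  <query_log>
    <!-- Keep one week of query log entries instead of the default. -->
    <ttl>event_date + INTERVAL 7 DAY DELETE</ttl>
  </query_log>
</clickhouse>
```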
This query shows how much space each table uses:
SELECT database, name, formatReadableSize(total_bytes)
FROM system.tables
WHERE total_bytes > 0
ORDER BY total_bytes DESC
If you see tables with the suffix _0 or _1, you can delete them. They are created when ClickHouse is upgraded and hold the data of these tables from before the upgrade.
Memory usage
The networks dictionary can use a lot of memory. You can check its usage with this query:
SELECT name, status, type, formatReadableSize(bytes_allocated)
FROM system.dictionaries
Moreover, ClickHouse is tuned for 32 GB of RAM or more. The ClickHouse documentation has some tips to run with 16 GB or less.
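The kinds of server overrides involved look like this (a sketch with illustrative values; adapt them to your host following the ClickHouse documentation):

```xml
<clickhouse>
  <!-- Hard cap on the memory the server may use (bytes, ~12 GiB). -->
  <max_server_memory_usage>12884901888</max_server_memory_usage>
  <!-- Shrink caches sized by default for larger hosts (bytes, 1 GiB each). -->
  <mark_cache_size>1073741824</mark_cache_size>
  <uncompressed_cache_size>1073741824</uncompressed_cache_size>
</clickhouse>
```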
Space usage
To get the space used by ClickHouse, use this query:
SELECT formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
You can get an idea of how much space is used by each table with this query:
SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size, MIN(partition_id) AS oldest
FROM system.parts
WHERE table LIKE 'flow%'
GROUP by table
This query shows how much space is used by each column for the flows
table and how much they are compressed. This can be helpful if you find that this
table uses too much space.
SELECT
database,
table,
column,
type,
sum(rows) AS rows,
sum(column_data_compressed_bytes) AS compressed_bytes,
formatReadableSize(compressed_bytes) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
sum(column_data_uncompressed_bytes) / compressed_bytes AS ratio,
any(compression_codec) AS codec
FROM system.parts_columns AS pc
LEFT JOIN system.columns AS c ON (pc.database = c.database) AND (c.table = pc.table) AND (c.name = pc.column)
WHERE table = 'flows' AND active
GROUP BY
database,
table,
column,
type
ORDER BY
database ASC,
table ASC,
sum(column_data_compressed_bytes) DESC
You can also look at the system tables:
SELECT * EXCEPT size, formatReadableSize(size) AS size FROM (
SELECT database, table, sum(bytes_on_disk) AS size, MIN(partition_id) AS oldest
FROM system.parts
GROUP by database, table
ORDER by size DESC
)
All the system tables with the suffix _0 or _1 are tables from an older version of
ClickHouse. You can drop them by using this SQL query and copying and pasting the
result:
SELECT concat('DROP TABLE IF EXISTS system.', name, ';')
FROM system.tables
WHERE (database = 'system') AND match(name, '_[0-9]+$')
FORMAT TSVRaw
CPU usage
If ClickHouse has high CPU usage, you can find slow queries with:
SELECT formatReadableTimeDelta(query_duration_ms/1000) AS duration, query
FROM system.query_log
WHERE query_kind = 'Select'
ORDER BY query_duration_ms DESC
LIMIT 10
FORMAT Vertical
Also, check for slow inserts:
SELECT formatReadableTimeDelta(query_duration_ms/1000) AS duration, query
FROM system.query_log
WHERE query_kind = 'Insert'
ORDER BY query_duration_ms DESC
LIMIT 10
FORMAT Vertical
Altinity's knowledge base contains other useful queries.
Old tables
Tables that are not used anymore may still exist. Check with SHOW TABLES. You can
drop these tables:
flows_raw_errors and flows_raw_errors_consumer
flows_XXXXXXX_raw_errors, for any XXXXXXX
flows_XXXXXXX_raw and flows_XXXXXXX_raw_consumer, when XXXXXXX does not end with vN where N is a number
flows_XXXXXvN_raw and flows_XXXXXvN_raw_consumer, when another table exists with a higher N value
These tables do not contain data. If you make a mistake, you can restart the orchestrator to recreate them.
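For instance, removing the first pair of tables could look like this (a sketch; substitute the names that SHOW TABLES actually reports, dropping each consumer before its table):

```sql
DROP TABLE IF EXISTS flows_raw_errors_consumer;
DROP TABLE IF EXISTS flows_raw_errors;
```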
Update the database schema
In version 1.10.0, the primary key of the flows table was changed to improve
performance. This update is not automatically applied to existing installations
because it requires copying data. You can check if your schema needs to be
updated with this SQL command:
SELECT primary_key
FROM system.tables
WHERE (name = 'flows') AND (database = currentDatabase())
If the primary key starts with TimeReceived instead of
toStartOfFiveMinutes(TimeReceived), you are using the old schema. You may get
better performance by switching to the new one.
The idea is to create a new table and transfer the data from the old table, partition by partition. Run this query and make sure you have enough space to store the largest partition:
SELECT
partition,
formatReadableSize(sum(bytes_on_disk)) AS size,
count() AS count
FROM system.parts
WHERE (database = currentDatabase()) AND (`table` = 'flows') AND active
GROUP BY partition
ORDER BY partition ASC
Important
There is a risk of data loss if something goes wrong. Back up your data if it is important to you. This guide only covers the non-clustered scenario.
Preparation
You need to stop the outlet service to make sure that nothing is writing to
ClickHouse while the migration is in progress. Get the current parameters for
the flows table:
SELECT engine_full
FROM system.tables
WHERE (database = currentDatabase()) AND (`table` = 'flows')
FORMAT TSVRaw
You need to change the ORDER BY directive to replace TimeReceived with
toStartOfFiveMinutes(TimeReceived). You should get something like this:
MergeTree PARTITION BY toYYYYMMDDhhmmss(toStartOfInterval(TimeReceived, toIntervalSecond(25920))) ORDER BY (toStartOfFiveMinutes(TimeReceived), ExporterAddress, InIfName, OutIfName) TTL TimeReceived + toIntervalSecond(1296000) SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
Also, check the current number of flows stored in ClickHouse:
SELECT count(*)
FROM flows
Rename the old table
Rename the current flows table to flows_old:
RENAME TABLE flows TO flows_old
Create the new table
Allow suspicious low cardinality types:
SET allow_suspicious_low_cardinality_types = true
Create the new flows table with the updated ORDER BY directive. After
ENGINE = , copy and paste the engine definition that you prepared earlier:
CREATE TABLE flows AS flows_old
ENGINE =
Create an intermediate table
Create an intermediate table to copy data to. This is needed to avoid duplicating data in the aggregated tables. Use the same engine definition as before:
CREATE TABLE flows_temp AS flows_old
ENGINE =
Generate the migration statements
Use this SQL query to create the migration:
SELECT
    concat('insert into flows_temp select * from flows_old where _partition_id = \'', partition_id, '\';\n',
           'alter table flows_old drop partition id \'', partition_id, '\';\n',
           'alter table flows attach partition id \'', partition_id, '\' from flows_temp;') AS cmd
FROM system.parts
WHERE (database = currentDatabase()) AND (`table` = 'flows_old')
GROUP BY
    database,
    `table`,
    partition_id
ORDER BY partition_id ASC
FORMAT TSVRaw
Execute the migration statements
You can execute them one by one. You can check that you still have all the flows
after each attach partition directive:
SELECT (
SELECT count(*)
FROM flows
) + (
SELECT count(*)
FROM flows_old
)
Drop the old table
The last step is to remove the empty flows_old table and the intermediate
table:
DROP TABLE flows_old;
DROP TABLE flows_temp;
Then, you can restart the outlet service.
Docker
The default Docker Compose setup is meant to help you get started quickly. However, you can also use it for a production setup.
The .env file selects the Docker Compose files that are assembled for a
complete setup. Look at the comments for guidance. You should avoid modifying
any existing files, except for docker/docker-compose-local.yml, which should
contain your local setup.
This file can override parts of the configuration. The merge rules are a bit complex. The general rule is that scalars are replaced, while lists and mappings are merged. However, there are exceptions.
Tip
Always check if the final configuration matches your expectations with
docker compose config.
You can disable some services by using profiles:
services:
akvorado-inlet:
profiles: [ disabled ]
It is possible to remove a value with the !reset tag:
services:
akvorado-outlet:
environment:
AKVORADO_CFG_OUTLET_METADATA_CACHEPERSISTFILE: !reset null
With Docker Compose v2.24.4 or later, you can override a value:
services:
traefik:
ports: !override
- 127.0.0.1:8080:8080/tcp
- 80:8081/tcp
The docker/docker-compose-local.yml file contains more examples that you can
adapt to your needs. You can also enable TLS by uncommenting the appropriate
section in .env.
Networking
The default setup has both IPv4 and IPv6 enabled, using the NAT setup.
For IPv6 to work correctly, you need either Docker Engine v27 or to set
ip6tables to true in /etc/docker/daemon.json.
If you prefer to keep the default Docker configuration, you can add this snippet
to docker/docker-compose-local.yml:
networks: !reset {}