See: - https://github.com/akvorado/akvorado/issues/1703#issuecomment-2900537048 - https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/nexthop-learning-edit-services-flow-monitoring.html
18 KiB
Operations
While Akvorado itself does not require much memory and disk space, both Kafka and ClickHouse have heavier needs. To get started, do not try to run the complete setup with less than 16 GB of RAM (32 GB or more is advised) and with less than 50 GB of disk (100 GB or more is advised). Use at least 8 vCPU.
Router configuration
Each router should be configured to send flows to Akvorado inlet service and accepts SNMP requests. For routers not listed below, have a look at the configuration snippets from Kentik.
It is better to sample on ingress only. This requires to sample on both external and internal interfaces, but this prevents flow to be accounted twice when they enter and exit through external ports.
Exporter Address
The exporter address is set from the field inside the flow message by default,
and used e.g. for SNMP requests. However, if for some reasons the set flow
address (also called agent id) is wrong, you can use the source IP of the flow
packet instead by setting use-src-addr-for-exporter-addr: true for the flow
configuration.
Please note that with this configuration, your deployment must not touch the source IP! This might occur with Docker or Kubernetes networking.
Cisco IOS-XE
Netflow can be enabled with the following configuration:
flow record Akvorado
match ipv4 tos
match ipv4 protocol
match ipv4 source address
match ipv4 destination address
match transport source-port
match transport destination-port
collect routing source as 4-octet
collect routing destination as 4-octet
collect routing next-hop address ipv4
collect transport tcp flags
collect interface output
collect interface input
collect counter bytes
collect counter packets
collect timestamp sys-uptime first
collect timestamp sys-uptime last
!
flow record Akvorado-IPV6
match ipv6 protocol
match ipv6 source address
match ipv6 destination address
match transport source-port
match transport destination-port
collect routing source as 4-octet
collect routing destination as 4-octet
collect routing next-hop address ipv4
collect transport tcp flags
collect interface output
collect interface input
collect counter bytes
collect counter packets
collect timestamp sys-uptime first
collect timestamp sys-uptime last
!
sampler random1in100
mode random 1 out-of 100
!
flow exporter AkvoradoExport
destination <akvorado-ip> vrf monitoring
source Loopback20
transport udp 2055
version 9
option sampler-table timeout 10
!
flow monitor AkvoradoMonitor
exporter AkvoradoExport
cache timeout inactive 10
cache timeout active 60
record Akvorado
!
flow monitor AkvoradoMonitor-IPV6
exporter AkvoradoExport
cache timeout inactive 10
cache timeout active 60
record Akvorado-IPV6
!
To enable Netflow on an interface, use the following snippet:
interface GigabitEthernet0/0/3
ip flow monitor AkvoradoMonitor sampler random1in100 input
ip flow monitor AkvoradoMonitor sampler random1in100 output
ipv6 flow monitor AkvoradoMonitor-IPV6 sampler random1in100 input
ipv6 flow monitor AkvoradoMonitor-IPV6 sampler random1in100 output
!
As per issue #89, the sample
rate is not reported correctly on this platform. The solution is to set a
default sample rate in akvorado.yaml. Check the
documentation for more details.
inlet:
core:
default-sampling-rate: 100
NCS 5500 and ASR 9000
On each router, Netflow can be enabled with the following configuration. It is important to use a power of two for the sampling rate (at least on NCS).
sampler-map sampler1
random 1 out-of 32768
!
flow exporter-map akvorado
version v9
options sampler-table timeout 10
template options timeout 10
!
transport udp 2055
source Loopback20
destination <akvorado-ip> vrf private
!
flow monitor-map monitor1
record ipv4
exporter akvorado
cache entries 100000
cache timeout active 15
cache timeout inactive 2
cache timeout rate-limit 2000
!
flow monitor-map monitor2
record ipv6
exporter akvorado
cache entries 100000
cache timeout active 15
cache timeout inactive 2
cache timeout rate-limit 2000
!
Optionally, AS path can be pushed to the forwarding database and the source and destination AS will be present in Netflow packets:
router bgp <asn>
address-family ipv4 unicast
bgp attribute-download
!
address-family ipv6 unicast
bgp attribute-download
To enable Netflow on an interface, use the following snippet:
interface Bundle-Ether4000
flow ipv4 monitor monitor1 sampler sampler1 ingress
flow ipv6 monitor monitor2 sampler sampler1 ingress
!
Also check the troubleshooting section on how to scale Netflow on the NCS 5500.
Then, SNMP needs to be enabled:
snmp-server community <community> RO IPv4
snmp-server ifindex persist
control-plane
management-plane
inband
interface all
allow SNMP peer
address ipv4 <akvorado-ip>
To configure BMP, adapt the following snippet:
bmp server 1
host <akvorado-ip> port 10179
flapping-delay 60
bmp server all
route-monitoring policy post inbound
router bgp 65400
vrf public
neighbor 192.0.2.100
bmp-activate server 1
Juniper
Netflow
For MX and SRX devices, you can use Netflow v9 to export flows.
groups {
sampling {
interfaces {
<*> {
unit <*> {
family inet {
sampling {
input;
}
}
family inet6 {
sampling {
input;
}
}
}
}
}
}
}
forwarding-options {
sampling {
instance {
sample-ins {
input {
rate 1024;
max-packets-per-second 65535;
}
family inet {
output {
flow-server 192.0.2.1 {
port 2055;
autonomous-system-type origin;
source-address 203.0.113.2;
version9 {
template {
ipv4;
}
}
}
inline-jflow {
source-address 203.0.113.2;
}
}
}
family inet6 {
output {
flow-server 192.0.2.1 {
port 2055;
autonomous-system-type origin;
source-address 203.0.113.2;
version9 {
template {
ipv6;
}
}
}
inline-jflow {
source-address 203.0.113.2;
}
}
}
}
}
}
}
chassis {
fpc 0 {
sampling-instance sample-ins;
inline-services {
flex-flow-sizing;
}
}
}
services {
flow-monitoring {
version9 {
template ipv4 {
nexthop-learning enable;
flow-active-timeout 10;
flow-inactive-timeout 10;
template-refresh-rate {
packets 30;
seconds 30;
}
option-refresh-rate {
packets 30;
seconds 30;
}
ipv4-template;
}
template ipv6 {
nexthop-learning enable;
flow-active-timeout 10;
flow-inactive-timeout 10;
template-refresh-rate {
packets 30;
seconds 30;
}
option-refresh-rate {
packets 30;
seconds 30;
}
ipv6-template;
}
}
}
}
Then, for each interface you want to enable IPFIX on, use:
interfaces {
xe-0/0/0.0 {
description "Transit: Cogent AS179 [3-10109101]";
apply-groups [ sampling ];
}
}
If inet.0 is not enough to join Akvorado, you need to add a specific route:
routing-options {
static {
route 192.0.2.1/32 next-table internet.inet.0;
}
}
Another option would be IPFIX (replace version9 by version-ipfix).
However, Juniper includes only total counters for bytes and packets
rather than using delta counters. Akvorado does not support such
counters.
sFlow
For QFX devices, you can use sFlow.
protocols {
sflow {
agent-id 203.0.113.4;
polling-interval 5;
sample-rate ingress 8192;
source-ip 203.0.113.4;
collector 192.0.2.1 {
udp-port 6343;
}
interfaces et-0/0/13.0;
}
}
SNMP
Then, configure SNMP:
snmp {
location "Equinix PA1, FR";
community blipblop {
authorization read-only;
routing-instance internet;
}
routing-instance-access;
}
BMP
If needed, you can configure BMP on one router to send all AdjRIB-in to Akvorado.
routing-options {
bmp {
connection-mode active;
station-address 203.0.113.1;
station-port 10179;
station collector;
hold-down 30 flaps 10 period 30;
route-monitoring post-policy;
monitor enable;
}
}
See Juniper's documentation for more details.
Arista
sFlow
For Arista devices, you can use sFlow.
sflow sample 1024
sflow sample output subinterface
sflow sample input subinterface
sflow vrf VRF-MANAGEMENT destination 192.0.2.1
sflow vrf VRF-MANAGEMENT source-interface Management1
sflow interface egress enable default
sflow run
SNMP
Then, configure SNMP:
snmp-server community <community> ro
snmp-server vrf VRF-MANAGEMENT
Nokia SROS
Model-driven command line interface (MD-CLI) syntax is used below. The full-context is provided as this is probably easier to adapt to classic CLI.
Flows
sFlow is currently merely supported on devices running SROS, one mostly has to stick to IPFIX
/configure cflowd admin-state enable
/configure cflowd cache-size 250000
/configure cflowd template-retransmit 60
/configure cflowd active-flow-timeout 15
/configure cflowd inactive-flow-timeout 15
/configure cflowd sample-profile 1 sample-rate 2000
/configure cflowd collector 192.0.2.1 port 2055 admin-state enable
/configure cflowd collector 192.0.2.1 port 2055 description "akvorado.example.net"
/configure cflowd collector 192.0.2.1 port 2055 router-instance "Base"
/configure cflowd collector 192.0.2.1 port 2055 version 10
Either configure sampling on the individual interfaces
/configure service ies "internet" interface "if1/1/c1/1:0" cflowd-parameters sampling unicast type interface
/configure service ies "internet" interface "if1/1/c1/1:0" cflowd-parameters sampling unicast direction ingress-only
/configure service ies "internet" interface "if1/1/c1/1:0" cflowd-parameters sampling unicast sample-profile 1
or add it to apply groups which are probably already in place
/configure groups group "peering" service ies "internet" interface "<i.*>" cflowd-parameters sampling unicast type interface
/configure groups group "peering" service ies "internet" interface "<i.*>" cflowd-parameters sampling unicast direction ingress-only
/configure groups group "peering" service ies "internet" interface "<i.*>" cflowd-parameters sampling unicast sample-profile 1
/configure service ies "internet" interface "if1/1/c1/1:0" apply-groups ["peering"]
SNMP
Nokia routers running SROS use a different interface index in their flow records
as the SNMP interface index usually used by other devices. To fix this issue,
you need to use cflowd use-vrtr-if-index. More information can be found in
Nokia's
documentation
GNMI
Instead of SNMP GNMI can be used. The interface index challenge (see SNMP above) also applies. See this discussion for further details and possible workarounds.
Unencrypted connections are used in this example (TLS encyption is out of scope here), do not use in production (or at least ensure the user has RO only permissions)
/configure system grpc admin-state enable
/configure system grpc allow-unsecure-connection
Akvorado only needs Read-Only access
/configure system security user-params local-user user "akvorado" access grpc true
/configure system security user-params local-user user "akvorado" console member ["grpc_ro"]
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnmi-get permit
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnmi-set deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnmi-subscribe permit
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-get deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-transfertoremote deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-put deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-stat deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization gnoi-file-remove deny
/configure system security aaa local-profiles profile "grpc_ro" grpc rpc-authorization md-cli-session deny
BMP
/configure bmp admin-state enable
/configure bmp station "akvorado" admin-state enable
/configure bmp station "akvorado" description "akvorado.example.net"
/configure bmp station "akvorado" stats-report-interval 300
/configure bmp station "akvorado" connection local-address 192.0.2.42
/configure bmp station "akvorado" connection station-address ip-address 192.0.2.1
/configure bmp station "akvorado" connection station-address port 10179
/configure bmp station "akvorado" family ipv4 true
/configure bmp station "akvorado" family ipv6 true
/configure router "Base" bgp monitor admin-state enable
/configure router "Base" bgp monitor route-monitoring post-policy true
/configure router "Base" bgp monitor station "akvorado" { }
GNU/Linux
pmacctd
Configure pmacctd with sFlow receiver:
/etc/pmacctd/config.conf: |
daemonize: false
plugins: sfprobe[any]
sfprobe_receiver: akvorado-inlet-receiver-replace-me:6343
aggregate: src_host,dst_host,in_iface,out_iface,src_port,dst_port,proto
pcap_ifindex: map
pcap_interfaces_map: /etc/pmacctd/interfaces.map
pcap_interface_wait: true
sfprobe_agentsubid: 1402
sampling_rate: 1000
snaplen: 128
/etc/pmacctd/interfaces.map: |
ifindex=1 ifname=lo direction=in
ifindex=1 ifname=lo direction=out
ifindex=3 ifname=eth0 direction=in
ifindex=3 ifname=eth0 direction=out
ifindex=4 ifname=eth1 direction=in
ifindex=4 ifname=eth1 direction=out
Here we set the interface indexes manually entirely based on the interface
names and completely ignoring the kernel ifIndex for the flows. pmacctd can
be run inside containers where SNMPd does not return description for the
interfaces, which is a required field for the flow. With this setup, you can
make use of the static metadata provider to match the exporter and accept the
flow for further classification.
Kafka
When using docker compose, there is a Kafka UI running at
http://127.0.0.1:8080/kafka-ui/. It provides various operational
metrics you can check, notably the space used by each topic.
ClickHouse
While ClickHouse works pretty good out-of-the-box, it is still encouraged to read its documentation. Altinity also provides a knowledge base with various other tips.
System tables
ClickHouse is configured to log various events into MergeTree tables. By default, these tables are unbounded. Unless configured otherwise, the orchestrator sets a TTL of 30 days. These tables can also be customized in the configuration files or disabled completly. See ClickHouse documentation for more details.
The following request is useful to see how much space is used for each table:
SELECT database, name, formatReadableSize(total_bytes)
FROM system.tables
WHERE total_bytes > 0
ORDER BY total_bytes DESC
If you see tables suffixed by _0 or _1, they can be deleted: they are
created when ClickHouse is updated with the data from the tables before the
upgrade.
Memory usage
The networks dictionary can take a bit of memory. You can check with the following queries:
SELECT name, status, type, formatReadableSize(bytes_allocated)
FROM system.dictionaries
Space usage
You can get an idea on how much space is used by each table with the following query:
SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size, MIN(partition_id) AS oldest
FROM system.parts
WHERE table LIKE 'flow%'
GROUP by table
The following query shows how much space is eaten by each column for the flows
table and how much they are compressed. This can be helpful if you find too much
space is used by this table.
SELECT
database,
table,
column,
type,
sum(rows) AS rows,
sum(column_data_compressed_bytes) AS compressed_bytes,
formatReadableSize(compressed_bytes) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
sum(column_data_uncompressed_bytes) / compressed_bytes AS ratio,
any(compression_codec) AS codec
FROM system.parts_columns AS pc
LEFT JOIN system.columns AS c ON (pc.database = c.database) AND (c.table = pc.table) AND (c.name = pc.column)
WHERE table = 'flows' AND active
GROUP BY
database,
table,
column,
type
ORDER BY
database ASC,
table ASC,
sum(column_data_compressed_bytes) DESC
Slow queries
You can extract slow queries with:
SELECT formatReadableTimeDelta(query_duration_ms/1000) AS duration, query
FROM system.query_log
WHERE query_kind = 'Select'
ORDER BY query_duration_ms DESC
LIMIT 10
FORMAT Vertical
Altinity's knowledge base contains some other useful queries.