Files
akvorado/console/data/docs/04-operations.md
2022-12-07 08:10:06 +01:00

9.7 KiB

Operations

Router configuration

Each router should be configured to send flows to Akvorado inlet service and accepts SNMP requests. For routers not listed below, have a look at the configuration snippets from Kentik.

Cisco IOS-XE

Netflow can be enabled with the following configuration:

flow record Akvorado
    match ipv4 tos
    match ipv4 protocol
    match ipv4 source address
    match ipv4 destination address
    match transport source-port
    match transport destination-port
    collect routing source as 4-octet
    collect routing destination as 4-octet
    collect routing next-hop address ipv4
    collect transport tcp flags
    collect interface output
    collect interface input
    collect counter bytes
    collect counter packets
    collect timestamp sys-uptime first
    collect timestamp sys-uptime last
!
sampler random1in100
    mode random 1 out-of 100
!
flow exporter AkvoradoExport
    destination <akvorado-ip> vrf monitoring
    source Loopback20
    transport udp 2055
    version 9
    option sampler-table timeout 10
!
flow monitor AkvoradoMonitor
    exporter AkvoradoExport
    cache timeout inactive 10
    cache timeout active 60
    record Akvorado
! 

To enable Netflow on an interface, use the following snippet:

interface GigabitEthernet0/0/3
 ip flow monitor AkvoradoMonitor sampler random1in100 input
 ip flow monitor AkvoradoMonitor sampler random1in100 output
!

As per issue #89, the sample rate is not reported correctly on this platform. The solution is to set a default sample rate in akvorado.yaml. Check the documentation for more details.

inlet:
  core:
    default-sampling-rate: 100

NCS 5500 and ASR 9000

On each router, Netflow can be enabled with the following configuration:

sampler-map sampler1
 random 1 out-of 30000
!
flow exporter-map akvorado
 version v9
  options sampler-table timeout 10
  template options timeout 10
 !
 transport udp 2055
 source Loopback20
 destination <akvorado-ip> vrf private
!
flow monitor-map monitor1
 record ipv4
 exporter akvorado
 cache entries 100000
 cache timeout active 15
 cache timeout inactive 2
 cache timeout rate-limit 2000
!
flow monitor-map monitor2
 record ipv6
 exporter akvorado
 cache entries 100000
 cache timeout active 15
 cache timeout inactive 2
 cache timeout rate-limit 2000
!

Optionally, AS path can be pushed to the forwarding database and the source and destination AS will be present in Netflow packets:

router bgp <asn>
 address-family ipv4 unicast
  bgp attribute-download
!
 address-family ipv6 unicast
  bgp attribute-download

To enable Netflow on an interface, use the following snippet:

interface Bundle-Ether4000
 flow ipv4 monitor monitor1 sampler sampler1 ingress
 flow ipv6 monitor monitor2 sampler sampler1 ingress
!

Also check the troubleshooting section on how to scale Netflow on the NCS 5500.

Also, SNMP needs to be enabled:

snmp-server community <community> RO IPv4
snmp-server ifindex persist
control-plane
 management-plane
  inband
   interface all
    allow SNMP peer
     address ipv4 <akvorado-ip>

To configure BMP, adapt the following snippet:

bmp server 1
 host <akvorado-ip> port 10179
 flapping-delay 60
bmp server all
 route-monitoring policy post inbound
router bgp 65400
 vrf public
  neighbor 192.0.2.100
   bmp-activate server 1

Juniper

Netflow

For MX and SRX devices, you can use Netflow v9 to export flows.

groups {
  sampling {
    interfaces {
      <*> {
        unit <*> {
          family inet {
            sampling {
              input;
            }
          }
          family inet6 {
            sampling {
              input;
            }
          }
        }
      }
    }
  }
}
forwarding-options {
  sampling {
    instance {
      sample-ins {
        input {
          rate 1024;
          max-packets-per-second 65535;
        }
        family inet {
          output {
            flow-server 192.0.2.1 {
              port 2055;
              autonomous-system-type origin;
              source-address 203.0.113.2;
              version9 {
                template {
                  ipv4;
                }
              }
            }
            inline-jflow {
              source-address 203.0.113.2;
            }
          }
        }
        family inet6 {
          output {
            flow-server 192.0.2.1 {
              port 2055;
              autonomous-system-type origin;
              source-address 203.0.113.2;
              version9 {
                template {
                  ipv6;
                }
              }
            }
            inline-jflow {
              source-address 203.0.113.2;
            }
          }
        }
      }
    }
  }
}
chassis {
  fpc 0 {
    sampling-instance sample-ins;
    inline-services {
      flex-flow-sizing;
    }
  }
}
services {
  flow-monitoring {
    version9 {
      template ipv4 {
        flow-active-timeout 10;
        flow-inactive-timeout 10;
        template-refresh-rate {
          packets 30;
          seconds 30;
        }
        option-refresh-rate {
          packets 30;
          seconds 30;
        }
        ipv4-template;
      }
      template ipv6 {
        flow-active-timeout 10;
        flow-inactive-timeout 10;
        template-refresh-rate {
          packets 30;
          seconds 30;
        }
        option-refresh-rate {
          packets 30;
          seconds 30;
        }
        ipv6-template;
      }
    }
  }
}

Then, for each interface you want to enable IPFIX on, use:

interfaces {
  xe-0/0/0.0 {
    description "Transit: Cogent AS179 [3-10109101]";
    apply-groups [ sampling ];
  }
}

If inet.0 is not enough to join Akvorado, you need to add a specific route:

routing-options {
  static {
    route 192.0.2.1/32 next-table internet.inet.0;
  }
}

Another option would be IPFIX (replace version9 by version-ipfix). However, Juniper includes only total counters for bytes and packets rather than using delta counters. Akvorado does not support such counters.

sFlow

For QFX devices, you can use sFlow.

protocols {
    sflow {
        agent-id 203.0.113.4;
        polling-interval 5;
        sample-rate ingress 8192;
        source-ip 203.0.113.4;
        collector 192.0.2.1 {
            udp-port 6343;
        }
        interfaces et-0/0/13.0;
    }
}

SNMP

Then, configure SNMP:

snmp {
  location "Equinix PA1, FR";
  community blipblop {
    authorization read-only;
    routing-instance internet;
  }
  routing-instance-access;
}

BMP

If needed, you can configure BMP on one router to send all AdjRIB-in to Akvorado.

routing-options {
    bmp {
        connection-mode active;
        station-address 203.0.113.1;
        station-port 10179;
        station collector;
        hold-down 30 flaps 10 period 30;
        route-monitoring post-policy;
        monitor enable;
    }
}

See Juniper's documentation for more details.

Arista

sFlow

For Arista devices, you can use sFlow.

sflow sample 1024
sflow vrf VRF-MANAGEMENT destination 192.0.2.1
sflow vrf VRF-MANAGEMENT source-interface Management1
sflow interface egress enable default
sflow run

SNMP

Then, configure SNMP:

snmp-server community <community> ro
snmp-server vrf VRF-MANAGEMENT

Kafka

When using docker-compose, there is a Kafka UI running at http://127.0.0.1:8080/kafka-ui/. It provides various operational metrics you can check, notably the space used by each topic.

ClickHouse

While ClickHouse works pretty good out-of-the-box, it is still encouraged to read its documentation. Altinity also provides a knowledge base with various other tips.

System tables

ClickHouse is configured to log various events into MergeTree tables. By default, these tables are unbounded. You should set a TTL to avoid them to grow indefinitely:

ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 30 day DELETE ;
ALTER TABLE system.query_thread_log MODIFY TTL event_date + INTERVAL 30 day DELETE ;
ALTER TABLE system.query_log MODIFY TTL event_date + INTERVAL 30 day DELETE ;
ALTER TABLE system.asynchronous_metric_log MODIFY TTL event_date + INTERVAL 30 day DELETE ;
ALTER TABLE system.metric_log MODIFY TTL event_date + INTERVAL 30 day DELETE ;
ALTER TABLE system.part_log MODIFY TTL event_date + INTERVAL 30 day DELETE ;
ALTER TABLE system.session_log MODIFY TTL event_date + INTERVAL 30 day DELETE

These tables can also be customized in the configuration files or disabled completly. See ClickHouse documentation for more details.

The following request is useful to see how much space is used for each table:

SELECT database, name, formatReadableSize(total_bytes)
FROM system.tables
WHERE total_bytes > 0

Space usage

You can get an idea on how much space is used by each table with the following query:

SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size, MIN(partition_id) AS oldest
FROM system.parts
WHERE table LIKE 'flow%'
GROUP by table

Slow queries

You can extract slow queries with:

SELECT formatReadableTimeDelta(query_duration_ms/1000) AS duration, query
FROM system.query_log
WHERE query_kind = 'Select'
ORDER BY query_duration_ms DESC
LIMIT 10
FORMAT Vertical

Altinity's knowledge base contains some other useful queries.