Internal design
Akvorado is written in Go. Each service has its code in a distinct
directory (inlet/, orchestrator/ and console/). The common/
directory contains components common to several services. The cmd/
directory contains the main entry points.
Each service is split into several components. This is heavily inspired by the Component framework in Clojure. A component is a piece of software with its configuration, its state and its dependencies on other components.
Each component features the following pieces of code:
- A Component structure containing its state.
- A Configuration structure containing the configuration of the component. It maps to a section of Akvorado configuration file.
- A DefaultConfiguration function with the default values for the configuration.
- A New() function instantiating the component. This method takes the configuration and the dependencies. It is inert.
- Optionally, a Start() method to start the routines associated to the component.
- Optionally, a Stop() method to stop the component.
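The list above can be sketched as follows. This is a minimal illustration using only the standard library, not code taken from Akvorado: the "ticker" behaviour, the Dependencies structure, the Interval setting and the Ticks() accessor are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Configuration holds the settings of this hypothetical "ticker" component.
type Configuration struct {
	Interval time.Duration
}

// DefaultConfiguration returns the default values for the configuration.
func DefaultConfiguration() Configuration {
	return Configuration{Interval: 10 * time.Millisecond}
}

// Dependencies lists the other components this one relies on. It is empty
// here; a real component would reference other components instead.
type Dependencies struct{}

// Component holds the state of the component.
type Component struct {
	config Configuration
	deps   Dependencies

	mu    sync.Mutex
	ticks int
	done  chan struct{}
	wg    sync.WaitGroup
}

// New instantiates the component. It is inert: no goroutine is started.
func New(config Configuration, deps Dependencies) (*Component, error) {
	if config.Interval <= 0 {
		return nil, fmt.Errorf("interval must be positive, got %s", config.Interval)
	}
	return &Component{config: config, deps: deps, done: make(chan struct{})}, nil
}

// Start launches the goroutine associated with the component.
func (c *Component) Start() error {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		ticker := time.NewTicker(c.config.Interval)
		defer ticker.Stop()
		for {
			select {
			case <-c.done:
				return
			case <-ticker.C:
				c.mu.Lock()
				c.ticks++
				c.mu.Unlock()
			}
		}
	}()
	return nil
}

// Stop asks the goroutine to terminate and waits for it to exit.
func (c *Component) Stop() error {
	close(c.done)
	c.wg.Wait()
	return nil
}

// Ticks returns the number of ticks observed so far.
func (c *Component) Ticks() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.ticks
}

func main() {
	// Dependencies are wired manually: build them first, then pass them to New.
	c, err := New(DefaultConfiguration(), Dependencies{})
	if err != nil {
		panic(err)
	}
	if err := c.Start(); err != nil {
		panic(err)
	}
	time.Sleep(50 * time.Millisecond)
	if err := c.Stop(); err != nil {
		panic(err)
	}
	fmt.Println("ticks observed:", c.Ticks())
}
```

Keeping New() inert means a whole graph of components can be built and validated before any goroutine runs.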
Each component is tested independently. If a component is complex, a
NewMock() function can create a component with a compatible
interface to be used in place of the real component. In this case, it
takes a testing.T struct as first argument and starts the component
immediately. It could return the real component or a mocked version.
For example, the Kafka component returns a component using a mocked
Kafka producer.
Dependencies are handled manually, unlike more complex component-based solutions like Uber Fx.
Reporter
The reporter is a special component handling logs and metrics for all the other components. In the future, this could also be the place to handle crash reports.
For logs, it is mostly a façade to github.com/rs/zerolog with some additional code to append the module name to the logs.
For metrics, it is a façade to the Prometheus instrumentation library. It provides a registry which automatically prefixes metric names with the module name.
It also exposes a simple way to report healthchecks from various
components. While it could be used to kill the application
proactively, currently, it is only exposed through HTTP. Not all
components have healthchecks. For example, for the flow component,
it is difficult to read from UDP while watching for a check. For the
http component, the healthcheck would be too trivial (not in the
routine handling the heavy work). For kafka, the hard work is hidden
by the underlying library and we wouldn't want to be declared
unhealthy because of a transient problem by checking broker states
manually. The daemon component tracks the important goroutines, so it
is not vital.
The general idea is to give good visibility to an operator. Everything that moves should get a counter. Errors should either be fatal, or be rate-limited and accounted for in a metric.
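The "rate-limited and accounted for" rule can be illustrated with a small sketch. It uses only the standard library and is not the reporter's actual API: ErrorReporter, manualClock, errSocket and the one-second logInterval are hypothetical names and values chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// logInterval is a hypothetical value: at most one log line per second.
const logInterval = time.Second

// errSocket is a sample error used by the demonstration.
var errSocket = errors.New("cannot read from socket")

// manualClock is a controllable clock, so the demo does not need to sleep.
type manualClock struct{ t time.Time }

func (c *manualClock) Now() time.Time          { return c.t }
func (c *manualClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

// ErrorReporter counts every error but logs at most one per interval.
type ErrorReporter struct {
	mu        sync.Mutex
	interval  time.Duration
	now       func() time.Time
	lastLog   time.Time
	hasLogged bool
	Total     uint64 // every error is accounted for here
	Logged    uint64 // only these errors reached the logs
}

// NewErrorReporter creates a reporter using the provided clock function.
func NewErrorReporter(interval time.Duration, now func() time.Time) *ErrorReporter {
	return &ErrorReporter{interval: interval, now: now}
}

// Report accounts for the error and returns true if it was also logged.
func (r *ErrorReporter) Report(err error) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.Total++
	t := r.now()
	if r.hasLogged && t.Sub(r.lastLog) < r.interval {
		return false // suppressed, but still counted
	}
	r.hasLogged = true
	r.lastLog = t
	r.Logged++
	fmt.Println("error:", err)
	return true
}

func main() {
	clock := &manualClock{}
	r := NewErrorReporter(logInterval, clock.Now)
	for i := 0; i < 5; i++ {
		r.Report(errSocket) // only the first one is logged
	}
	clock.Advance(logInterval)
	r.Report(errSocket) // logged again after the interval
	fmt.Printf("total=%d logged=%d\n", r.Total, r.Logged)
}
```

In this run the last line reports total=6 logged=2: the operator sees that errors occur and how many, without the logs being flooded.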
CLI
The CLI (not a component) is handled by Cobra. The configuration file is handled by mapstructure.
Flow decoding
Decoding is handled by GoFlow2. The network code to receive flows is heavily inspired by it but was not reused. While logging is often abstracted, this is not the case for metrics. Moreover, the design to scale is a bit different as Akvorado will create a socket for each worker instead of distributing incoming flows using message passing.
Only Netflow v9 and IPFIX are currently handled. However, as GoFlow2 also supports sFlow, support can be added later.
The design of this component is modular as it is possible to "plug" new decoders and new inputs easily. Most buffering is expected to be done at this level by the input modules that need it. However, some buffering also happens in the Kafka module. When the input is the network, this does not really matter as we cannot really block without losing messages. But with file-backed modules, it may be more reliable to not have buffers elsewhere as they can be lost during shutdown.
GeoIP
The component is mostly boring, with the exception of having a goroutine watching for the modification of the databases to update them.
Kafka
The Kafka component relies on
Sarama. It is tested using the
mock interface provided by this package. Sarama uses go-metrics to
store metrics. We convert them to Prometheus to keep them.
If a real broker is available under the DNS name kafka or at
localhost on port 9092, it will be used for a quick functional test.
ClickHouse
Migrations are done with a simple loop checking if a step is needed using a custom query and executing it with Go code. Database migration systems exist in Go, notably migrate, but as the tables we need to create depend on user configuration, it is more flexible to use code to check if the existing tables are up-to-date and to update them. For example, we may want to check if the Kafka settings of a table or the source URL of a dictionary are current.
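The check-then-apply loop described above could look like the following sketch. It is an assumption-laden illustration, not Akvorado's migration code: fakeDB stands in for a ClickHouse connection, and the flows_raw table name, the step structure and the stepsFor() and migrate() helpers are hypothetical.

```go
package main

import "fmt"

// fakeDB stands in for a ClickHouse connection. It only remembers, for each
// table, the Kafka topic it was created with.
type fakeDB struct {
	tables map[string]string
}

// step is one migration: a check telling whether the step is needed and the
// Go code applying the change.
type step struct {
	Description string
	Needed      func(db *fakeDB) (bool, error)
	Apply       func(db *fakeDB) error
}

// stepsFor builds the steps from the user configuration (here, a Kafka topic).
func stepsFor(topic string) []step {
	return []step{{
		Description: "create or update flows_raw",
		Needed: func(db *fakeDB) (bool, error) {
			current, ok := db.tables["flows_raw"]
			return !ok || current != topic, nil
		},
		Apply: func(db *fakeDB) error {
			db.tables["flows_raw"] = topic
			return nil
		},
	}}
}

// migrate runs each step that reports it is needed and returns how many
// steps were applied.
func migrate(db *fakeDB, steps []step) (int, error) {
	applied := 0
	for _, s := range steps {
		needed, err := s.Needed(db)
		if err != nil {
			return applied, fmt.Errorf("checking %q: %w", s.Description, err)
		}
		if !needed {
			continue
		}
		if err := s.Apply(db); err != nil {
			return applied, fmt.Errorf("applying %q: %w", s.Description, err)
		}
		applied++
	}
	return applied, nil
}

func main() {
	db := &fakeDB{tables: map[string]string{}}
	// The second run is a no-op; the third one reacts to a configuration change.
	for _, topic := range []string{"flows", "flows", "flows-v2"} {
		applied, err := migrate(db, stepsFor(topic))
		if err != nil {
			panic(err)
		}
		fmt.Printf("topic=%s applied=%d\n", topic, applied)
	}
}
```

Because each step inspects the current state instead of a version number, a change in the user configuration is enough to trigger the relevant step again.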
Functional tests are run when a ClickHouse server is available under
the name clickhouse or on localhost.
SNMP
SNMP polling is done with GoSNMP. The cache layer is tailored specifically for our needs. Information contained in it expires if not accessed and is refreshed periodically otherwise. Some coalescing of requests is done when they pile up. This adds some code complexity and may not have been worth it. If an exporter fails to answer too frequently, it is blacklisted for a minute to ensure it does not eat up all the workers' capacity.
Testing is done by another implementation of an SNMP agent.
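The expire-if-unused, refresh-otherwise behaviour of the cache can be sketched as below. This is a simplified model built on the standard library, not the actual cache: the Cache type, its method names, the manualClock and the refreshAfter and expireAfter durations are assumptions for the example.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// Hypothetical durations, chosen only for the demonstration.
const (
	refreshAfter = 30 * time.Minute
	expireAfter  = 2 * time.Hour
)

// manualClock is a controllable clock, so the demo does not need to sleep.
type manualClock struct{ t time.Time }

func (c *manualClock) Now() time.Time          { return c.t }
func (c *manualClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

type entry struct {
	value        string
	lastAccessed time.Time
	lastUpdated  time.Time
}

// Cache keeps track of when each entry was last read and last refreshed.
type Cache struct {
	mu      sync.Mutex
	now     func() time.Time
	entries map[string]*entry
}

func NewCache(now func() time.Time) *Cache {
	return &Cache{now: now, entries: map[string]*entry{}}
}

// Put stores or refreshes a value. A refresh does not count as an access.
func (c *Cache) Put(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	t := c.now()
	if e, ok := c.entries[key]; ok {
		e.value, e.lastUpdated = value, t
		return
	}
	c.entries[key] = &entry{value: value, lastAccessed: t, lastUpdated: t}
}

// Lookup returns the value and records the access.
func (c *Cache) Lookup(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok {
		return "", false
	}
	e.lastAccessed = c.now()
	return e.value, true
}

// Expire removes entries not accessed for at least d and returns the count.
func (c *Cache) Expire(d time.Duration) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	t, removed := c.now(), 0
	for k, e := range c.entries {
		if t.Sub(e.lastAccessed) >= d {
			delete(c.entries, k)
			removed++
		}
	}
	return removed
}

// NeedRefresh returns, sorted, the keys whose value is at least d old.
func (c *Cache) NeedRefresh(d time.Duration) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	t := c.now()
	var keys []string
	for k, e := range c.entries {
		if t.Sub(e.lastUpdated) >= d {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	clock := &manualClock{}
	cache := NewCache(clock.Now)
	cache.Put("192.0.2.1/10", "Gi0/0/1")
	clock.Advance(refreshAfter)
	fmt.Println("to refresh:", cache.NeedRefresh(refreshAfter))
	clock.Advance(expireAfter)
	fmt.Println("expired:", cache.Expire(expireAfter))
}
```

A periodic routine would call Expire() first, then poll again the keys returned by NeedRefresh(), so that only entries still in use generate SNMP requests.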
Web console
The web console is built as a REST API with a single page application on top of it.
REST API
The REST API is mostly built using the Gin framework which removes some boilerplate compared to using pure Go. Also, it uses the validator package which implements value validations based on tags. The validation options are quite rich.
Single page application
The SPA is built using mostly the following components:
- Yarn as a package manager,
- Vite as a builder,
- Vue as the reactive JavaScript framework,
- TailwindCSS for styling pages directly inside HTML,
- Headless UI for some unstyled UI components,
- ECharts to plot charts,
- CodeMirror to edit filter expressions.
There is no full-blown component library despite the existence of many candidates:
- Vuetify is only compatible with Vue 2.
- BootstrapVue is only compatible with Vue 2.
- PrimeVue is quite heavyweight and many parts are not open source.
- VueTailwind would be the perfect match but it is not compatible with Vue 3.
- Naive UI may be a future option, but its styling does not use TailwindCSS, which is annoying for responsive layouts, although we could simply stay away from the proposed layout.
So, currently, components are mostly taken from Flowbite, copy/pasted or from Headless UI and styled like Flowbite.
Use of TailwindCSS is also a strong choice. Their documentation explains this choice. It makes sense but this is sometimes a burden. Many components are scattered around the web and when there is no need for JS, it is just a matter of copy/pasting and customizing.
Other components
The core component is the main one. It takes the others as dependencies but there is nothing exciting about it.
The HTTP component exposes a web server. Its main role is to manage the lifecycle of the HTTP server and to provide a method to add handlers. The web component provides the web interface of Akvorado. Currently, this is only the documentation. Other components may expose various endpoints. They are documented in the usage section.
The daemon component handles the lifecycle of the whole application. It watches the various goroutines (through tombs, see below) spawned by the other components and waits for signals to terminate. If Akvorado had a systemd integration, it would take place here too.
Other interesting dependencies
- gopkg.in/tomb.v2 handles clean goroutine tracking and termination. Like contexts, it allows signaling the termination of a group of goroutines. Unlike contexts, it also enables us to catch errors in goroutines and react to them (most of the time by dying).
- github.com/benbjohnson/clock is used in place of the time package when we want to be able to mock the clock. This is used for example to test the cache of the SNMP poller.
- github.com/cenkalti/backoff/v4 provides an exponential backoff algorithm for retries.
- github.com/eapache/go-resiliency implements several resiliency patterns, including the breaker pattern.
- github.com/go-playground/validator implements struct validation using tags. We use it to add better validation on configuration structures.
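To illustrate the idea behind goroutine tracking with error propagation, here is a standard-library sketch. It deliberately does not use or reproduce the API of gopkg.in/tomb.v2: the group type, its Go(), Kill() and Wait() methods and the errWorker value are hypothetical and only mimic the concept.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errWorker is a sample error used by the demonstration.
var errWorker = errors.New("worker failed")

// group tracks goroutines, lets them be asked to terminate, and remembers
// the first error reported by any of them.
type group struct {
	dying chan struct{}
	once  sync.Once
	wg    sync.WaitGroup
	mu    sync.Mutex
	err   error
}

func newGroup() *group {
	return &group{dying: make(chan struct{})}
}

// Go runs f in a tracked goroutine. If f returns an error, the whole group
// is asked to terminate.
func (g *group) Go(f func(dying <-chan struct{}) error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if err := f(g.dying); err != nil {
			g.Kill(err)
		}
	}()
}

// Kill records the first reason and signals termination to all goroutines.
func (g *group) Kill(reason error) {
	g.mu.Lock()
	if g.err == nil {
		g.err = reason
	}
	g.mu.Unlock()
	g.once.Do(func() { close(g.dying) })
}

// Wait blocks until all goroutines have returned and reports the first error.
func (g *group) Wait() error {
	g.wg.Wait()
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.err
}

func main() {
	g := newGroup()
	// A well-behaved worker: it runs until asked to terminate.
	g.Go(func(dying <-chan struct{}) error {
		<-dying
		return nil
	})
	// A failing worker: its error takes the whole group down.
	g.Go(func(dying <-chan struct{}) error {
		return errWorker
	})
	fmt.Println("group terminated:", g.Wait())
}
```

A supervising component can call Wait() and decide how to react to the returned error, which a plain context cancellation would not carry.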
Future plans
In the future, we may:
- Buffer messages to disk instead of blocking (when sending to Kafka) or dropping (when querying the SNMP poller). We could probably just have a system service running tcpdump to dump packets to a directory and use that as input. This would allow Akvorado to block end-to-end instead of trying to be real-time.
- Collect routes by integrating GoBGP. This is low priority if we consider information from Maxmind good enough for our use. However, this would also allow us to get AS paths.
- Add a DDoS service to detect and mitigate DDoS attacks (with Flowspec).
- Dynamic configuration with something like go-archaius or Harvester.