orchestrator/clickhouse: rework migrations to use an abstract schema

We introduce a leaky abstraction for the flows schema and use it for
migrations as a first step.

For views and dictionaries, we no longer rely on a hash to decide whether
they need to be recreated. Instead, we compare their select statements
with our target statement. This is a bit fragile, but strictly better
than the hash.
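
To illustrate the comparison described above, here is a minimal sketch. The helper names `normalizeSQL` and `needsRecreate` are hypothetical and are not taken from this commit; the sketch assumes that whitespace is the only cosmetic difference worth ignoring, which is probably too optimistic in practice and is one reason such a comparison stays fragile.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeSQL collapses every run of whitespace into a single space so
// that purely cosmetic differences do not count as a change.
// Hypothetical helper, not part of the commit.
func normalizeSQL(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// needsRecreate reports whether the select statement currently attached
// to a view or dictionary differs from the target statement.
func needsRecreate(current, target string) bool {
	return normalizeSQL(current) != normalizeSQL(target)
}

func main() {
	current := "SELECT TimeReceived,\n       SrcAS\nFROM flows"
	target := "SELECT TimeReceived, SrcAS FROM flows"

	fmt.Println(needsRecreate(current, target))                      // false
	fmt.Println(needsRecreate(current, target+" WHERE SrcAS != 0")) // true
}
```

Unlike a hash, a mismatch here can be inspected directly by printing both statements side by side.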

For data tables, we add the missing columns.
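
Adding missing columns can be sketched as follows. This is an illustration under assumptions, not the code from this commit: `missingColumnStatements` is a hypothetical helper, the `Column` type is a trimmed-down stand-in for the schema column, and the list of existing column names is assumed to have been fetched beforehand (for example from `system.columns`).

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a trimmed-down stand-in for the schema column type.
type Column struct {
	Name  string
	Type  string
	Codec string
}

// missingColumnStatements returns one ALTER TABLE statement for each
// wanted column that is absent from the existing column names. Each new
// column is placed after the wanted column preceding it, or first when
// there is none. Hypothetical helper, not part of the commit.
func missingColumnStatements(table string, existing []string, wanted []Column) []string {
	present := make(map[string]bool, len(existing))
	for _, name := range existing {
		present[name] = true
	}
	var stmts []string
	previous := ""
	for _, col := range wanted {
		if !present[col.Name] {
			var b strings.Builder
			fmt.Fprintf(&b, "ALTER TABLE %s ADD COLUMN %s %s", table, col.Name, col.Type)
			if col.Codec != "" {
				fmt.Fprintf(&b, " CODEC(%s)", col.Codec)
			}
			if previous == "" {
				b.WriteString(" FIRST")
			} else {
				fmt.Fprintf(&b, " AFTER %s", previous)
			}
			stmts = append(stmts, b.String())
		}
		previous = col.Name
	}
	return stmts
}

func main() {
	existing := []string{"TimeReceived", "SrcAS"}
	wanted := []Column{
		{Name: "TimeReceived", Type: "DateTime", Codec: "DoubleDelta, LZ4"},
		{Name: "SrcAS", Type: "UInt32"},
		{Name: "DstAS", Type: "UInt32"},
	}
	for _, stmt := range missingColumnStatements("flows", existing, wanted) {
		fmt.Println(stmt) // ALTER TABLE flows ADD COLUMN DstAS UInt32 AFTER SrcAS
	}
}
```

Because the statements are derived from the list of wanted columns, adding a column to that list is enough; no dedicated migration step has to be written.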

We give up on the abstraction of a migration step and just rely on
helper functions to get the same result. The migration code is now
shorter and we don't need to update it when adding new columns.

This is preparatory work for #211, to allow a user to specify
additional fields to collect.
Vincent Bernat
2023-01-02 23:35:01 +01:00
parent 86810beb6e
commit 7d1ba478a1
Notes: Vincent Bernat 2023-01-02 23:51:06 +01:00
Hashing was not that fragile, as we were only hashing column names, types
and positions, so it is unclear whether the new way is strictly better.

The next steps are to use the schema abstraction in the other places where
the schema is hard-coded: column names for the console
(`console/query_consts.go`), the protobuf file, and flow decoding.
13 changed files with 929 additions and 977 deletions

common/schema/root.go Normal file

@@ -0,0 +1,36 @@
// SPDX-FileCopyrightText: 2022 Free Mobile
// SPDX-License-Identifier: AGPL-3.0-only

// Package schema is an abstraction of the data schema used by Akvorado. It is a
// leaky abstraction as there are multiple parts dependent on the subsystem that
// will use it.
package schema

// Schema is the data schema.
type Schema struct {
	Columns []Column

	// For ClickHouse. This is the set of primary keys (order is important and
	// may not follow column order).
	PrimaryKeys []string
}

// Column represents a column of data.
type Column struct {
	Name     string
	MainOnly bool

	// For ClickHouse. `NotSortingKey' is for columns generated from other
	// columns. It is only useful if not MainOnly and not Alias. `GenerateFrom'
	// is for a column that's generated from an SQL expression instead of being
	// retrieved from the protobuf. `TransformFrom' and `TransformTo' work in
	// pairs. The first one is the set of columns in the raw table while the
	// second one is how to transform it for the main table.
	Type          string
	Codec         string
	Alias         string
	NotSortingKey bool
	GenerateFrom  string
	TransformFrom []Column
	TransformTo   string
}
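
As a rough illustration of how such a schema could drive table creation, the sketch below builds ClickHouse column definitions from a small schema value. Everything in it is an assumption made for the example: the types are reduced copies of the ones above, `columnDefinitions` is a hypothetical helper, the sample columns are invented, and the reading of `MainOnly` as "present only in the main table" is an interpretation of the field name rather than something stated in this file.

```go
package main

import (
	"fmt"
	"strings"
)

// Schema and Column are reduced copies of the types above, kept here so
// the example is self-contained.
type Schema struct {
	Columns     []Column
	PrimaryKeys []string
}

type Column struct {
	Name     string
	MainOnly bool
	Type     string
	Codec    string
	Alias    string
}

// columnDefinitions returns one "name type ..." definition per column.
// Columns flagged MainOnly are skipped unless mainTable is true (assumed
// semantics). Alias columns get an ALIAS clause, other columns get their
// codec when one is set. Hypothetical helper, not part of the commit.
func columnDefinitions(s Schema, mainTable bool) []string {
	var defs []string
	for _, col := range s.Columns {
		if col.MainOnly && !mainTable {
			continue
		}
		def := fmt.Sprintf("%s %s", col.Name, col.Type)
		switch {
		case col.Alias != "":
			def += " ALIAS " + col.Alias
		case col.Codec != "":
			def += fmt.Sprintf(" CODEC(%s)", col.Codec)
		}
		defs = append(defs, def)
	}
	return defs
}

func main() {
	// Invented sample schema, for illustration only.
	s := Schema{
		PrimaryKeys: []string{"TimeReceived", "ExporterAddress"},
		Columns: []Column{
			{Name: "TimeReceived", Type: "DateTime", Codec: "DoubleDelta, LZ4"},
			{Name: "ExporterAddress", Type: "LowCardinality(IPv6)"},
			{Name: "SrcAddr", Type: "IPv6", MainOnly: true},
			{Name: "Bytes", Type: "UInt64"},
			{Name: "Packets", Type: "UInt64"},
			{Name: "PacketSize", Type: "UInt64", Alias: "intDiv(Bytes, Packets)"},
		},
	}
	// Definitions for a table other than the main one: SrcAddr is skipped.
	fmt.Println(strings.Join(columnDefinitions(s, false), ",\n"))
}
```

Keeping the column list as data means that helpers like this one can be shared by the different consumers of the schema instead of each hard-coding column names.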