Notably, this happens when the main table is required, but also, on rare
occasions, when another table would be selected because of the interval
selection. This is not perfect: sometimes, we won't have the data.
Fix #605
All MergeTree tables are now replicated.
For some tables, a `_local` variant is added and the non-`_local`
variant is now distributed. The distributed tables are the `flows`
table, the `flows_DDDD` tables (where `DDDD` is a duration), as well as
the `flows_raw_errors` table. The `exporters` table is not distributed
and stays local.
The data follows this schema:
- data comes from the `flows_HHHH_raw` table, using the Kafka engine
- the `flows_HHHH_raw_consumer` reads data from `flows_HHHH_raw` (local)
and sends it to `flows` (distributed) when there is no error
- the `flows_raw_errors_consumer` reads data from
`flows_HHHH_raw` (local) and sends it to
`flows_raw_errors` (distributed)
- the `flows_DDDD_consumer` reads data from `flows_local` (local) and
sends it to `flows_DDDD_local` (local)
- the `exporters_consumer` reads data from `flows` (distributed) and
sends it to `exporters` (local)
The reason for `flows_HHHH_raw_consumer` to send data to the distributed
`flows` table, and not the local one, is to ensure flows are
balanced (for example, if there are not enough Kafka partitions).
Sending to `flows_local` would also have been possible.
On the other hand, it is important for `flows_DDDD_consumer` to read
from the local table to avoid duplication. It could have written to the
distributed table, but since the data is already balanced correctly at
this point, we write to the local table instead for better performance.
The `exporters_consumer` is allowed to read from the distributed `flows`
table because it writes the result to the local `exporters` table.
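The pipeline above can be summarized as a small routing table. A minimal
sketch (table names are taken from the text; `HHHH` and `DDDD` stand for
the placeholders used above; the structure itself is illustrative, not
code from this change):

```python
# Each consumer reads from one table and writes to another. The "local"
# and "distributed" annotations record which variant is used, per the
# description above.
PIPELINE = {
    # consumer: (source, source kind, destination, destination kind)
    "flows_HHHH_raw_consumer": ("flows_HHHH_raw", "local", "flows", "distributed"),
    "flows_raw_errors_consumer": ("flows_HHHH_raw", "local", "flows_raw_errors", "distributed"),
    "flows_DDDD_consumer": ("flows_local", "local", "flows_DDDD_local", "local"),
    "exporters_consumer": ("flows", "distributed", "exporters", "local"),
}

def destination(consumer: str) -> str:
    """Return the name of the table a given consumer writes to."""
    _, _, dest, _ = PIPELINE[consumer]
    return dest
```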
When requesting too small a resolution for data too far in the past,
prefer to select a table that has the data over one that does not.
Previously, the resolution was a hard requirement.
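The selection rule can be sketched as follows. `Table`, its fields, and
`best_table` are hypothetical names for illustration, not the actual
implementation:

```python
from dataclasses import dataclass

@dataclass
class Table:
    name: str
    resolution: int  # seconds between consolidated rows
    oldest: int      # Unix timestamp of the oldest data kept

def best_table(tables: list[Table], start: int) -> Table:
    """Prefer a table that still holds data at `start`, even when its
    resolution is coarser than requested; among those, pick the finest
    resolution. If no table reaches that far back, fall back to the one
    with the most history."""
    with_data = [t for t in tables if t.oldest <= start]
    if with_data:
        return min(with_data, key=lambda t: t.resolution)
    return min(tables, key=lambda t: t.oldest)
```

With a fine-grained table keeping 1 day of data and a 5-minute table
keeping 90 days, a query starting 30 days ago now selects the 5-minute
table instead of failing to find data in the fine-grained one.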
This displays a line for the previous period on stacked graphs.
The previous period depends on the current period: it can be an hour, a
day, a week, a month, or a year.
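A sketch of how the previous-period offset could be derived from the
displayed time span (the exact thresholds and approximations are
assumptions, not the code from this change):

```python
from datetime import timedelta

def previous_period_shift(span: timedelta) -> timedelta:
    """Map the displayed time span to the offset used to fetch the
    previous period: an hour, a day, a week, a month, or a year."""
    if span <= timedelta(hours=1):
        return timedelta(hours=1)
    if span <= timedelta(days=1):
        return timedelta(days=1)
    if span <= timedelta(weeks=1):
        return timedelta(weeks=1)
    if span <= timedelta(days=31):
        return timedelta(days=31)   # a "month", approximated
    return timedelta(days=365)      # a "year", approximated
```

For example, a graph covering the last 6 hours is compared against the
same 6-hour window one day earlier.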
This is needed if we want to be able to mix the use of several tables
inside a single query (for example, `flows_1m0s` for one part of the
query and `flows_5m0s` for another part to overlay historical data).
Also, the way we handle time buckets is now cleaner. The previous
approach had two stages of rounding and was incorrect: we were
discarding the first and last values for this reason. The new approach
has only one stage of rounding and is correct. It tries hard to align
the buckets to the specified start time, so we no longer need to discard
these values. We still discard the last one because it could be
incomplete (when the end is "now").
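The single rounding stage can be sketched like this: buckets are
anchored so that the first one starts exactly at the requested start
time (a hypothetical helper, not the actual code):

```python
def bucket(timestamp: int, start: int, step: int) -> int:
    """Align `timestamp` (Unix seconds) to a bucket of `step` seconds
    anchored at `start`, with a single rounding stage."""
    return start + (timestamp - start) // step * step

# Every bucket lines up with the requested start time:
assert bucket(1005, start=1000, step=60) == 1000
assert bucket(1061, start=1000, step=60) == 1060
```

With two rounding stages (rounding `start` to the step, then rounding
each timestamp independently), the first and last buckets could be
misaligned with the requested range, which is why they had to be
discarded before.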