sync: implement --list-cutoff to allow on disk sorting for reduced memory use

Before this change, rclone had to load an entire directory into RAM in
order to sort it so it could be synced.

For directories with millions of entries, this used too much memory.

This fixes the problem by using an on-disk sort when there are more
than --list-cutoff entries in a directory.

Fixes #7974
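
To make the idea concrete, here is a minimal Go sketch of the general technique described above, not the code from this commit: below the cutoff the entries are sorted in memory, above it sorted chunks are spilled to temporary files and merged back, so memory use is bounded by the cutoff rather than by the directory size. The function name `sortEntries` and the chunk/merge layout are invented for illustration; rclone's real implementation is in the files changed by this commit.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
)

// sortEntries sorts directory entry names. Up to `cutoff` entries it
// sorts in memory; above that it spills sorted chunks to temporary
// files and merges them, keeping memory use roughly proportional to
// the cutoff rather than to the directory size.
func sortEntries(entries []string, cutoff int) ([]string, error) {
	if len(entries) <= cutoff {
		sort.Strings(entries)
		return entries, nil
	}

	// Write each chunk of up to `cutoff` entries, sorted, to its own
	// temporary file.
	var chunks []string
	for start := 0; start < len(entries); start += cutoff {
		end := start + cutoff
		if end > len(entries) {
			end = len(entries)
		}
		chunk := append([]string(nil), entries[start:end]...)
		sort.Strings(chunk)
		f, err := os.CreateTemp("", "chunk-*.txt")
		if err != nil {
			return nil, err
		}
		w := bufio.NewWriter(f)
		for _, e := range chunk {
			fmt.Fprintln(w, e)
		}
		w.Flush()
		f.Close()
		chunks = append(chunks, f.Name())
	}
	defer func() {
		for _, name := range chunks {
			os.Remove(name)
		}
	}()

	// K-way merge: repeatedly emit the smallest head line across all
	// chunk files. (A real implementation would use a heap and stream
	// the result; a slice keeps the sketch short.)
	scanners := make([]*bufio.Scanner, len(chunks))
	heads := make([]string, len(chunks))
	alive := make([]bool, len(chunks))
	for i, name := range chunks {
		f, err := os.Open(name)
		if err != nil {
			return nil, err
		}
		defer f.Close()
		scanners[i] = bufio.NewScanner(f)
		if scanners[i].Scan() {
			heads[i], alive[i] = scanners[i].Text(), true
		}
	}
	var out []string
	for {
		best := -1
		for i := range heads {
			if alive[i] && (best == -1 || heads[i] < heads[best]) {
				best = i
			}
		}
		if best == -1 {
			return out, nil
		}
		out = append(out, heads[best])
		if scanners[best].Scan() {
			heads[best] = scanners[best].Text()
		} else {
			alive[best] = false
		}
	}
}

func main() {
	names := []string{"banana", "apple", "cherry", "elderberry", "date"}
	sorted, err := sortEntries(names, 2) // tiny cutoff to force the on-disk path
	if err != nil {
		panic(err)
	}
	fmt.Println(sorted)
}
```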
Nick Craig-Wood
2024-12-09 11:30:34 +00:00
parent 0148bd4668
commit 385465bfa9
9 changed files with 493 additions and 18 deletions


@@ -233,12 +233,18 @@ value, say `export GOGC=20`. This will make the garbage collector
 work harder, reducing memory size at the expense of CPU usage.
 The most common cause of rclone using lots of memory is a single
-directory with millions of files in. Rclone has to load this entirely
-into memory as rclone objects. Each rclone object takes 0.5k-1k of
-memory. There is
+directory with millions of files in.
+Before rclone v1.70, rclone had to load this entirely into memory as
+rclone objects. Each rclone object takes 0.5k-1k of memory. There is
 [a workaround for this](https://github.com/rclone/rclone/wiki/Big-syncs-with-millions-of-files)
 which involves a bit of scripting.
+However with rclone v1.70 and later rclone will automatically save
+directory entries to disk when a directory with more than
+[`--list-cutoff`](/docs/#list-cutoff) (1,000,000 by default) entries
+is detected.
 From v1.70 rclone also has the [--max-buffer-memory](/docs/#max-buffer-memory)
 flag which helps particularly when multi-thread transfers are using
 too much memory.
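
As a usage sketch (the flag names come from this change, the paths and values are made up): a sync of a directory with tens of millions of entries could lower the threshold so the on-disk path is taken sooner, for example `rclone sync /data/huge remote:bucket --list-cutoff 100000`, while `--max-buffer-memory` separately caps the RAM used for multi-thread transfer buffers.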