cmd/dedupe: make largest directory primary to minimize data moved (#3648)

This change makes dedupe recursively count the entries in same-named directories
and make the largest one primary. This minimizes the amount of data moved
(or at least the number of API calls) when dedupe merges them.
It also adds a new optional fs.Object interface `ParentIDer` with a single function
`ParentID` and implements it for the drive and opendrive backends. The function
returns the parent directory ID for objects on filesystems that allow same-named
directories; dedupe uses it to correctly count the sizes of same-named directories.
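
As an illustration only, here is a minimal Go sketch of the idea, not the actual
rclone implementation; the types and helper names (`object`, `obj`, `largestParent`)
are hypothetical, and only the `ParentID()` method mirrors the interface added by
this commit:

```go
package main

import "fmt"

// parentIDer mirrors the optional ParentIDer interface added by this commit.
type parentIDer interface {
	ParentID() string
}

// object is a hypothetical stand-in for fs.Object in this sketch.
type object interface {
	Remote() string
}

// obj is a toy object that knows the ID of the directory containing it.
type obj struct{ remote, parent string }

func (o obj) Remote() string   { return o.remote }
func (o obj) ParentID() string { return o.parent }

// largestParent counts objects per parent directory ID and returns the ID
// with the most entries; keeping that directory as primary means dedupe has
// to move the fewest objects when it merges the duplicates.
func largestParent(objs []object) (best string, max int) {
	counts := make(map[string]int)
	for _, o := range objs {
		// ParentIDer is optional, so check for it with a type assertion.
		if p, ok := o.(parentIDer); ok && p.ParentID() != "" {
			counts[p.ParentID()]++
		}
	}
	for id, n := range counts {
		if n > max {
			best, max = id, n
		}
	}
	return best, max
}

func main() {
	objs := []object{
		obj{"Photos/a.jpg", "dir-1"},
		obj{"Photos/b.jpg", "dir-1"},
		obj{"Photos/c.jpg", "dir-2"},
	}
	id, n := largestParent(objs)
	fmt.Printf("primary directory: %s (%d entries)\n", id, n) // dir-1 (2 entries)
}
```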

Fixes #2568

Co-authored-by: Ivan Andreev <ivandeex@gmail.com>
Author: Saksham Khanna
Date: 2021-03-11 23:10:29 +05:30
Committed by: GitHub
Parent: 6a9ae32012
Commit: 4d8ef7bca7
5 changed files with 187 additions and 67 deletions


@@ -396,6 +396,12 @@ type IDer interface {
 	ID() string
 }
 
+// ParentIDer is an optional interface for Object
+type ParentIDer interface {
+	// ParentID returns the ID of the parent directory if known or "" if not
+	ParentID() string
+}
+
 // ObjectUnWrapper is an optional interface for Object
 type ObjectUnWrapper interface {
 	// UnWrap returns the Object that this Object is wrapping or
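
A backend that tracks the parent folder ID of each object (as the drive and
opendrive backends do in this commit) can satisfy the interface with a
one-method implementation. The following is a hedged sketch with hypothetical
package, struct, and field names, not the actual backend code:

```go
package backend // hypothetical package name

// Object is a hypothetical backend object; the real drive/opendrive
// structs and field names differ.
type Object struct {
	id       string // ID of this object on the remote
	parentID string // ID of the directory containing this object
}

// ParentID returns the ID of the parent directory, or "" if it is not
// known, satisfying the optional ParentIDer interface above.
func (o *Object) ParentID() string {
	return o.parentID
}
```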