fs: implement optional Metadata interface for Objects #111

This implements integration tests for the feature also.
This commit is contained in:
Nick Craig-Wood
2022-05-24 11:00:00 +01:00
parent 461d041c4d
commit 6a0e021dac
8 changed files with 363 additions and 54 deletions

View File

@@ -425,6 +425,126 @@ This can be used when scripting to make aged backups efficiently, e.g.
rclone sync -i remote:current-backup remote:previous-backup
rclone sync -i /path/to/files remote:current-backup
## Metadata support {#metadata}
Metadata is data about a file which isn't the contents of the file.
Normally rclone only preserves the modification time and the content
(MIME) type where possible.
Rclone supports preserving all the available metadata on files (not
directories) when using the `--metadata` or `-M` flag.
Exactly what metadata is supported and what that support means depends
on the backend. Backends that support metadata have a metadata section
in their docs and are listed in the [features table](/overview/#features)
(Eg [local](/local/#metadata), [s3](/s3/#metadata))
Rclone only supports a one-time sync of metadata. This means that
metadata will be synced from the source object to the destination
object only when the source object has changed and needs to be
re-uploaded. If the metadata subsequently changes on the source object
without changing the object itself then it won't be synced to the
destination object. This is in line with the way rclone syncs
`Content-Type` without the `--metadata` flag.
Using `--metadata` when syncing from local to local will preserve file
attributes such as file mode, owner, extended attributes (not
Windows).
Note that arbitrary metadata may be added to objects using the
`--upload-metadata key=value` flag when the object is first uploaded.
This flag can be repeated as many times as necessary.
### Types of metadata
Metadata is divided into two type. System metadata and User metadata.
Metadata which the backend uses itself is called system metadata. For
example on the local backend the system metadata `uid` will store the
user ID of the file when used on a unix based platform.
Arbitrary metadata is called user metadata and this can be set however
is desired.
When objects are copied from backend to backend, they will attempt to
interpret system metadata if it is supplied. Metadata may change from
being user metadata to system metadata as objects are copied between
different backends. For example copying an object from s3 sets the
`content-type` metadata. In a backend which understands this (like
`azureblob`) this will become the Content-Type of the object. In a
backend which doesn't understand this (like the `local` backend) this
will become user metadata. However should the local object be copied
back to s3, the Content-Type will be set correctly.
### Metadata framework
Rclone implements a metadata framework which can read metadata from an
object and write it to the object when (and only when) it is being
uploaded.
This metadata is stored as a dictionary with string keys and string
values.
There are some limits on the names of the keys (these may be clarified
further in the future).
- must be lower case
- may be `a-z` `0-9` containing `.` `-` or `_`
- length is backend dependent
Each backend can provide system metadata that it understands. Some
backends can also store arbitrary user metadata.
Where possible the key names are standardized, so, for example, it is
possible to copy object metadata from s3 to azureblob for example and
metadata will be translated apropriately.
Some backends have limits on the size of the metadata and rclone will
give errors on upload if they are exceeded.
### Metadata preservation
The goal of the implementation is to
1. Preserve metadata if at all possible
2. Interpret metadata if at all possible
The consequences of 1 is that you can copy an S3 object to a local
disk then back to S3 losslessly. Likewise you can copy a local file
with file attributes and xattrs from local disk to s3 and back again
losslessly.
The consequence of 2 is that you can copy an S3 object with metadata
to Azureblob (say) and have the metadata appear on the Azureblob
object also.
### Standard system metadata
Here is a table of standard system metadata which, if appropriate, a
backend may implement.
| key | description | example |
|---------------------|-------------|---------|
| mode | File type and mode: octal, unix style | 0100664 |
| uid | User ID of owner: decimal number | 500 |
| gid | Group ID of owner: decimal number | 500 |
| rdev | Device ID (if special file) => hexadecimal | 0 |
| atime | Time of last access: RFC 3339 | 2006-01-02T15:04:05.999999999Z07:00 |
| mtime | Time of last modification: RFC 3339 | 2006-01-02T15:04:05.999999999Z07:00 |
| btime | Time of file creation (birth): RFC 3339 | 2006-01-02T15:04:05.999999999Z07:00 |
| cache-control | Cache-Control header | no-cache |
| content-disposition | Content-Disposition header | inline |
| content-encoding | Content-Encoding header | gzip |
| content-language | Content-Language header | en-US |
| content-type | Content-Type header | text/plain |
The metadata keys `mtime` and `content-type` will take precedence if
supplied in the metadata over reading the `Content-Type` or
modification time of the source object.
Hashes are not included in system metadata as there is a well defined
way of reading those already.
Options
-------
@@ -1206,6 +1326,12 @@ When the limit is reached all transfers will stop immediately.
Rclone will exit with exit code 8 if the transfer limit is reached.
## --metadata / -M
Setting this flag enables rclone to copy the metadata from the source
to the destination. For local backends this is ownership, permissions,
xattr etc. See the [#metadata](metadata section) for more info.
### --cutoff-mode=hard|soft|cautious ###
This modifies the behavior of `--max-transfer`

View File

@@ -14,48 +14,48 @@ show through.
Here is an overview of the major features of each cloud storage system.
| Name | Hash | ModTime | Case Insensitive | Duplicate Files | MIME Type |
| ---------------------------- |:----------------:|:-------:|:----------------:|:---------------:|:---------:|
| 1Fichier | Whirlpool | - | No | Yes | R |
| Akamai Netstorage | MD5, SHA256 | R/W | No | No | R |
| Amazon Drive | MD5 | - | Yes | No | R |
| Amazon S3 (or S3 compatible) | MD5 | R/W | No | No | R/W |
| Backblaze B2 | SHA1 | R/W | No | No | R/W |
| Box | SHA1 | R/W | Yes | No | - |
| Citrix ShareFile | MD5 | R/W | Yes | No | - |
| Dropbox | DBHASH ¹ | R | Yes | No | - |
| Enterprise File Fabric | - | R/W | Yes | No | R/W |
| FTP | - | R/W ¹⁰ | No | No | - |
| Google Cloud Storage | MD5 | R/W | No | No | R/W |
| Google Drive | MD5 | R/W | No | Yes | R/W |
| Google Photos | - | - | No | Yes | R |
| HDFS | - | R/W | No | No | - |
| HTTP | - | R | No | No | R |
| Hubic | MD5 | R/W | No | No | R/W |
| Internet Archive | MD5, SHA1, CRC32 | R/W ¹¹ | No | No | - |
| Jottacloud | MD5 | R/W | Yes | No | R |
| Koofr | MD5 | - | Yes | No | - |
| Mail.ru Cloud | Mailru ⁶ | R/W | Yes | No | - |
| Mega | - | - | No | Yes | - |
| Memory | MD5 | R/W | No | No | - |
| Microsoft Azure Blob Storage | MD5 | R/W | No | No | R/W |
| Microsoft OneDrive | SHA1 ⁵ | R/W | Yes | No | R |
| OpenDrive | MD5 | R/W | Yes | Partial ⁸ | - |
| OpenStack Swift | MD5 | R/W | No | No | R/W |
| pCloud | MD5, SHA1 ⁷ | R | No | No | W |
| premiumize.me | - | - | Yes | No | R |
| put.io | CRC-32 | R/W | No | Yes | R |
| QingStor | MD5 | - ⁹ | No | No | R/W |
| Seafile | - | - | No | No | - |
| SFTP | MD5, SHA1 ² | R/W | Depends | No | - |
| Sia | - | - | No | No | - |
| SugarSync | - | - | No | No | - |
| Storj | - | R | No | No | - |
| Uptobox | - | - | No | Yes | - |
| WebDAV | MD5, SHA1 ³ | R ⁴ | Depends | No | - |
| Yandex Disk | MD5 | R/W | No | No | R |
| Zoho WorkDrive | - | - | No | No | - |
| The local filesystem | All | R/W | Depends | No | - |
| Name | Hash | ModTime | Case Insensitive | Duplicate Files | MIME Type | Metadata |
| ---------------------------- |:----------------:|:-------:|:----------------:|:---------------:|:---------:|:--------:|
| 1Fichier | Whirlpool | - | No | Yes | R | - |
| Akamai Netstorage | MD5, SHA256 | R/W | No | No | R | - |
| Amazon Drive | MD5 | - | Yes | No | R | - |
| Amazon S3 (or S3 compatible) | MD5 | R/W | No | No | R/W | RWU |
| Backblaze B2 | SHA1 | R/W | No | No | R/W | - |
| Box | SHA1 | R/W | Yes | No | - | - |
| Citrix ShareFile | MD5 | R/W | Yes | No | - | - |
| Dropbox | DBHASH ¹ | R | Yes | No | - | - |
| Enterprise File Fabric | - | R/W | Yes | No | R/W | - |
| FTP | - | R/W ¹⁰ | No | No | - | - |
| Google Cloud Storage | MD5 | R/W | No | No | R/W | - |
| Google Drive | MD5 | R/W | No | Yes | R/W | - |
| Google Photos | - | - | No | Yes | R | - |
| HDFS | - | R/W | No | No | - | - |
| HTTP | - | R | No | No | R | - |
| Hubic | MD5 | R/W | No | No | R/W | - |
| Internet Archive | MD5, SHA1, CRC32 | R/W ¹¹ | No | No | - | - |
| Jottacloud | MD5 | R/W | Yes | No | R | - |
| Koofr | MD5 | - | Yes | No | - | - |
| Mail.ru Cloud | Mailru ⁶ | R/W | Yes | No | - | - |
| Mega | - | - | No | Yes | - | - |
| Memory | MD5 | R/W | No | No | - | - |
| Microsoft Azure Blob Storage | MD5 | R/W | No | No | R/W | - |
| Microsoft OneDrive | SHA1 ⁵ | R/W | Yes | No | R | - |
| OpenDrive | MD5 | R/W | Yes | Partial ⁸ | - | - |
| OpenStack Swift | MD5 | R/W | No | No | R/W | - |
| pCloud | MD5, SHA1 ⁷ | R | No | No | W | - |
| premiumize.me | - | - | Yes | No | R | - |
| put.io | CRC-32 | R/W | No | Yes | R | - |
| QingStor | MD5 | - ⁹ | No | No | R/W | - |
| Seafile | - | - | No | No | - | - |
| SFTP | MD5, SHA1 ² | R/W | Depends | No | - | - |
| Sia | - | - | No | No | - | - |
| SugarSync | - | - | No | No | - | - |
| Storj | - | R | No | No | - | - |
| Uptobox | - | - | No | Yes | - | - |
| WebDAV | MD5, SHA1 ³ | R ⁴ | Depends | No | - | - |
| Yandex Disk | MD5 | R/W | No | No | R | - |
| Zoho WorkDrive | - | - | No | No | - | - |
| The local filesystem | All | R/W | Depends | No | - | RWU |
### Notes
@@ -438,6 +438,22 @@ remote which supports writing (`W`) then rclone will preserve the MIME
types. Otherwise they will be guessed from the extension, or the
remote itself may assign the MIME type.
### Metadata
Backends may or may support reading or writing metadata. They may
support reading and writing system metadata (metadata intrinsic to
that backend) and/or user metadata (general purpose metadata).
The levels of metadata support are
| Key | Explanation |
|-----|-------------|
| `R` | Read only System Metadata |
| `RW` | Read and write System Metadata |
| `RWU` | Read and write System Metadata and read and write User Metadata |
See [the metadata docs](/docs/#metadata) for more info.
## Optional Features ##
All rclone remotes support a base command set. Other features depend