photoprism

mirror of https://github.com/photoprism/photoprism.git synced 2025-12-12 00:34:13 +01:00

README.md

PhotoPrism — Vision Package

Last Updated: December 10, 2025

Overview

internal/ai/vision provides the shared model registry, request builders, and parsers that power PhotoPrism’s caption, label, face, NSFW, and future generate workflows. It reads vision.yml, normalizes models, and dispatches calls to one of three engines:

TensorFlow (built‑in) — default Nasnet / NSFW / Facenet models, no remote service required.
Ollama — local or proxied multimodal LLMs. See ollama/README.md for tuning and schema details. The engine defaults to ${OLLAMA_BASE_URL:-http://ollama:11434}/api/generate, trimming any trailing slash on the base URL; set OLLAMA_BASE_URL=https://ollama.com to opt into cloud defaults.
OpenAI — cloud Responses API. See openai/README.md for prompts, schema variants, and header requirements.
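The Ollama default endpoint described above can be sketched in a few lines of Go. This is an illustration only: `ollamaGenerateURL` and `defaultOllamaBase` are hypothetical names rather than the package's actual identifiers, and the fallback is assumed to follow shell-style `${VAR:-default}` semantics (used when the variable is unset or empty).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultOllamaBase is the fallback base URL documented above.
const defaultOllamaBase = "http://ollama:11434"

// ollamaGenerateURL is a hypothetical helper (not PhotoPrism's API). It reads
// OLLAMA_BASE_URL through the supplied lookup function, falls back to the
// default when the value is unset or empty, trims trailing slashes, and
// appends the /api/generate path.
func ollamaGenerateURL(getenv func(string) string) string {
	base := strings.TrimSpace(getenv("OLLAMA_BASE_URL"))
	if base == "" {
		base = defaultOllamaBase
	}
	return strings.TrimRight(base, "/") + "/api/generate"
}

func main() {
	// Uses the real process environment; prints the resolved endpoint.
	fmt.Println(ollamaGenerateURL(os.Getenv))
}
```

Passing the lookup function as a parameter keeps the helper testable without mutating the process environment.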

Configuration

Models

The vision.yml file is usually kept in the storage/config directory (override with PHOTOPRISM_VISION_YAML). It defines a list of models under Models:. Key fields are captured below. If a type is omitted entirely, PhotoPrism will auto-append the built-in defaults (labels, nsfw, face, caption) so you no longer need placeholder stanzas. The Thresholds block is optional; missing or out-of-range values fall back to defaults.
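As a rough sketch of the auto-append behavior, assuming a simplified `Model` struct and a hypothetical `addMissingDefaults` helper (neither is the package's real type or function), defaults are added only for types that are missing from the configured list entirely:

```go
package main

import "fmt"

// Model is a pared-down, hypothetical stand-in for a vision.yml model entry.
type Model struct {
	Type    string
	Name    string
	Default bool
}

// builtinTypes lists the types that receive built-in defaults per the text above.
var builtinTypes = []string{"labels", "nsfw", "face", "caption"}

// addMissingDefaults appends a default entry for every built-in type that
// does not appear in the configured list at all. Configured entries are
// left untouched.
func addMissingDefaults(models []Model) []Model {
	seen := make(map[string]bool, len(models))
	for _, m := range models {
		seen[m.Type] = true
	}
	for _, t := range builtinTypes {
		if !seen[t] {
			models = append(models, Model{Type: t, Default: true})
		}
	}
	return models
}

func main() {
	// Only a custom labels model is configured; the other types are filled in.
	cfg := []Model{{Type: "labels", Name: "gemma3"}}
	for _, m := range addMissingDefaults(cfg) {
		fmt.Printf("%s default=%v\n", m.Type, m.Default)
	}
}
```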

Field	Default	Notes
`Type` (required)	—	`labels`, `caption`, `face`, `nsfw`, `generate`. Drives routing & scheduling.
`Name`	derived from type/version	Display name; lower-cased by helpers.
`Model`	`""`	Raw identifier override; precedence: `Service.Model` → `Model` → `Name`.
`Version`	`latest` (non-OpenAI)	OpenAI payloads omit version.
`Engine`	inferred from service/alias	Aliases set formats, file scheme, resolution. Explicit `Service` values still win.
`Run`	`auto`	See Run modes table below.
`Default`	`false`	Keep one per type for TensorFlow fallbacks.
`Disabled`	`false`	Registered but inactive.
`Resolution`	224 (TensorFlow) / 720 (Ollama/OpenAI)	Thumbnail edge in px; TensorFlow models default to 224 unless you override.
`System` / `Prompt`	engine defaults	Override prompts per model.
`Format`	`""`	Response hint (`json`, `text`, `markdown`).
`Schema` / `SchemaFile`	engine defaults / empty	Inline vs file JSON schema (labels).
`TensorFlow`	nil	Local TF model info (paths, tags).
`Options`	nil	Sampling/settings merged with engine defaults.
`Service`	nil	Remote endpoint config (see below).

Run Modes

Value	When it runs	Recommended use
`auto`	TensorFlow defaults during index; external via metadata/schedule	Leave as-is for most setups.
`manual`	Only when explicitly invoked (CLI/API)	Experiments and diagnostics.
`on-index`	During indexing + manual	Fast built-in models only.
`newly-indexed`	Metadata worker after indexing + manual	External/Ollama/OpenAI without slowing import.
`on-demand`	Manual, metadata worker, and scheduled jobs	Broad coverage without index path.
`on-schedule`	Scheduled jobs + manual	Nightly/cron-style runs.
`always`	Indexing, metadata, scheduled, manual	High-priority models; watch resource use.
`never`	Never executes	Keep definition without running it.

Note: For performance reasons, on-index is only supported for the built-in TensorFlow models.
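The run-mode table can be read as a small decision function. The sketch below is an interpretation of the table, not the package's implementation: the `Context` type and `shouldRun` are hypothetical names, and it assumes that `auto` also permits manual runs, which the table does not state explicitly.

```go
package main

import "fmt"

// Context identifies where a model call originates (illustrative names).
type Context int

const (
	Manual   Context = iota // explicit CLI/API invocation
	Index                   // during indexing
	Metadata                // metadata worker after indexing
	Schedule                // scheduled jobs
)

// shouldRun reports whether a model with the given run mode may execute in
// the given context. builtin marks the built-in TensorFlow models.
func shouldRun(mode string, ctx Context, builtin bool) bool {
	switch mode {
	case "never":
		return false
	case "manual":
		return ctx == Manual
	case "on-index":
		// Per the note above, on-index is limited to built-in models.
		return ctx == Manual || (ctx == Index && builtin)
	case "newly-indexed":
		return ctx == Manual || ctx == Metadata
	case "on-demand":
		return ctx != Index
	case "on-schedule":
		return ctx == Manual || ctx == Schedule
	case "always":
		return true
	case "auto", "":
		if ctx == Manual {
			return true // assumption: manual runs are allowed in auto mode
		}
		if builtin {
			return ctx == Index
		}
		return ctx == Metadata || ctx == Schedule
	}
	return false // unknown modes never run in this sketch
}

func main() {
	fmt.Println(shouldRun("newly-indexed", Metadata, false)) // external model after indexing
	fmt.Println(shouldRun("on-index", Index, false))         // external model during indexing
}
```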

Model Options

The model Options adjust model parameters such as temperature, top-p, and schema constraints when using Ollama or OpenAI. Rows are ordered exactly as defined in vision/model_options.go.

Option	Engines	Default	Description
`Temperature`	Ollama, OpenAI	engine default	Controls randomness with a value between `0.01` and `2.0`; not used for OpenAI's GPT-5.
`TopK`	Ollama	engine default	Limits sampling to the top K tokens to reduce rare or noisy outputs.
`TopP`	Ollama, OpenAI	engine default	Nucleus sampling; keeps the smallest token set whose cumulative probability ≥ `p`.
`MinP`	Ollama	engine default	Drops tokens whose probability mass is below `p`, trimming the long tail.
`TypicalP`	Ollama	engine default	Keeps tokens with typicality under the threshold; can be combined with `TopP`/`MinP`.
`TfsZ`	Ollama	engine default	Tail free sampling parameter; lower values reduce repetition.
`Seed`	Ollama	random per run	Fix for reproducible outputs; unset for more variety between runs.
`NumKeep`	Ollama	engine default	How many tokens to keep from the prompt before sampling starts.
`RepeatLastN`	Ollama	engine default	Number of recent tokens considered for repetition penalties.
`RepeatPenalty`	Ollama	engine default	Multiplier >1 discourages repeating the same tokens or phrases.
`PresencePenalty`	OpenAI	engine default	Increases the likelihood of introducing new tokens by penalizing existing ones.
`FrequencyPenalty`	OpenAI	engine default	Penalizes tokens in proportion to their frequency so far.
`PenalizeNewline`	Ollama	engine default	Whether to apply repetition penalties to newline tokens.
`Stop`	Ollama, OpenAI	engine default	Array of stop sequences (e.g., `["\\n\\n"]`).
`Mirostat`	Ollama	engine default	Enables Mirostat sampling (`0` = off, `1` = Mirostat, `2` = Mirostat 2.0).
`MirostatTau`	Ollama	engine default	Controls surprise target for Mirostat sampling.
`MirostatEta`	Ollama	engine default	Learning rate for Mirostat adaptation.
`NumPredict`	Ollama	engine default	Ollama-specific max output tokens; synonymous intent with `MaxOutputTokens`.
`MaxOutputTokens`	Ollama, OpenAI	engine default	Upper bound on generated tokens; adapters raise low values to defaults.
`ForceJson`	Ollama, OpenAI	engine default	Forces structured output when enabled.
`SchemaVersion`	Ollama, OpenAI	derived from schema	Override when coordinating schema migrations.
`CombineOutputs`	OpenAI	engine default	Controls whether multi-output models combine results automatically.
`Detail`	OpenAI	engine default	Controls OpenAI vision detail level (`low`, `high`, `auto`).
`NumCtx`	Ollama, OpenAI	engine default	Context window length (tokens).
`NumThread`	Ollama	runtime auto	Caps CPU threads for local engines.
`NumBatch`	Ollama	engine default	Batch size for prompt processing.
`NumGpu`	Ollama	engine default	Number of GPUs to distribute work across.
`MainGpu`	Ollama	engine default	Primary GPU index when multiple GPUs are present.
`LowVram`	Ollama	engine default	Enable VRAM-saving mode; may reduce performance.
`VocabOnly`	Ollama	engine default	Load vocabulary only for quick metadata inspection.
`UseMmap`	Ollama	engine default	Memory map model weights instead of fully loading them.
`UseMlock`	Ollama	engine default	Lock model weights in RAM to reduce paging.
`Numa`	Ollama	engine default	Enable NUMA-aware allocations when available.
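One way to picture how explicit options combine with engine defaults is a merge over optional fields. The following is a minimal sketch under stated assumptions: `Options` is a hypothetical three-field subset of the table, `mergeOptions` and `ptr` are illustrative helpers, and `maxTemperature` is assumed to equal the documented upper bound of `2.0`.

```go
package main

import "fmt"

// Options is a hypothetical subset of the table above. Pointer fields
// distinguish "unset" from an explicit zero value.
type Options struct {
	Temperature *float64
	TopP        *float64
	Seed        *int
}

// maxTemperature mirrors the documented upper bound (assumed value).
const maxTemperature = 2.0

// mergeOptions fills unset fields from the engine defaults; explicit values
// always win. The resulting temperature is capped at maxTemperature.
func mergeOptions(explicit, defaults Options) Options {
	out := explicit
	if out.Temperature == nil {
		out.Temperature = defaults.Temperature
	}
	if out.TopP == nil {
		out.TopP = defaults.TopP
	}
	if out.Seed == nil {
		out.Seed = defaults.Seed
	}
	if out.Temperature != nil && *out.Temperature > maxTemperature {
		capped := maxTemperature
		out.Temperature = &capped
	}
	return out
}

// ptr returns a pointer to v, which keeps option literals compact.
func ptr[T any](v T) *T { return &v }

func main() {
	defaults := Options{Temperature: ptr(0.1), TopP: ptr(0.9)}
	merged := mergeOptions(Options{Temperature: ptr(3.5)}, defaults)
	fmt.Println(*merged.Temperature, *merged.TopP, merged.Seed == nil)
}
```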

Model Service

Configures the endpoint URL, method, format, and authentication for Ollama, OpenAI, and other engines that perform remote HTTP requests:

Field	Default	Notes
`Uri`	required for remote	Endpoint base. Empty keeps model local (TensorFlow). Ollama alias fills `${OLLAMA_BASE_URL}/api/generate`, defaulting to `http://ollama:11434`.
`Method`	`POST`	Override verb if provider needs it.
`Key`	`""`	Bearer token; prefer env expansion (OpenAI: `OPENAI_API_KEY`, Ollama: `OLLAMA_API_KEY`).
`Username` / `Password`	`""`	Injected as basic auth when URI lacks userinfo.
`Model`	`""`	Endpoint-specific override; wins over model/name.
`Org` / `Project`	`""`	OpenAI headers (organization and project IDs).
`RequestFormat` / `ResponseFormat`	set by engine alias	Explicit values win over alias defaults.
`FileScheme`	set by engine alias (`data` or `base64`)	Controls image transport.
`Disabled`	`false`	Disable the endpoint without removing the model.

Authentication: All credentials and identifiers support ${ENV_VAR} expansion. Service.Key sets Authorization: Bearer <token>; Username/Password injects HTTP basic authentication into the service URI when it is not already present. When Service.Key is empty, PhotoPrism defaults to OPENAI_API_KEY (OpenAI engine) or OLLAMA_API_KEY (Ollama engine), also honoring their _FILE counterparts.
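The authentication rules can be illustrated with Go's standard `net/http` and `net/url` packages. This is a sketch, not the package's request builder: the `Service` struct is a hypothetical subset of the fields above, `newRequest` is an illustrative helper, and env expansion and `_FILE` handling are left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// Service is a hypothetical subset of the Service fields described above.
type Service struct {
	Uri      string
	Method   string
	Key      string
	Username string
	Password string
}

// newRequest builds an HTTP request for the service. Username/Password are
// injected into the URI only when it carries no userinfo already, and a
// non-empty Key becomes a bearer token.
func newRequest(s Service, body string) (*http.Request, error) {
	u, err := url.Parse(s.Uri)
	if err != nil {
		return nil, err
	}
	if u.User == nil && s.Username != "" {
		u.User = url.UserPassword(s.Username, s.Password)
	}
	method := s.Method
	if method == "" {
		method = http.MethodPost // documented default
	}
	req, err := http.NewRequest(method, u.String(), strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	if s.Key != "" {
		req.Header.Set("Authorization", "Bearer "+s.Key)
	}
	return req, nil
}

func main() {
	req, err := newRequest(Service{Uri: "http://ollama:11434/api/generate", Username: "alice", Password: "example"}, "{}")
	if err != nil {
		panic(err)
	}
	// Redacted hides the password when printing the URL.
	fmt.Println(req.Method, req.URL.Redacted())
}
```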

Field Behavior & Precedence

Model identifier resolution order: Service.Model → Model → Name. Model.GetModel() returns (id, name, version) where Ollama receives name:version and other engines receive name plus a separate Version.
Env expansion runs for all Service credentials and Model overrides; empty or disabled models return empty identifiers.
Options merging: engine defaults fill missing fields; explicit values always win. Temperature is capped at MaxTemperature.
Authentication: Service.Key sets Authorization: Bearer <token>; Username/Password inject HTTP basic auth into the service URI when not already present.
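The identifier precedence above can be sketched as follows. This is not `Model.GetModel()` itself: the flattened `Model` struct, `firstNonEmpty`, and `resolve` are hypothetical, and the handling of identifiers that already carry a tag (such as `gemma3:4b`) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Model is a hypothetical, flattened view of a vision.yml entry.
// ServiceModel stands in for Service.Model.
type Model struct {
	Engine       string
	Name         string
	Model        string
	Version      string
	ServiceModel string
}

// firstNonEmpty returns the first value that is not blank.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if s := strings.TrimSpace(v); s != "" {
			return s
		}
	}
	return ""
}

// resolve applies the precedence Service.Model -> Model -> Name. Ollama gets
// a combined "name:version" identifier, OpenAI omits the version, and other
// engines keep the version separate (defaulting to "latest").
func resolve(m Model) (id, version string) {
	id = firstNonEmpty(m.ServiceModel, m.Model, m.Name)
	if id == "" {
		return "", ""
	}
	version = strings.TrimSpace(m.Version)
	switch m.Engine {
	case "openai":
		return id, ""
	case "ollama":
		// Assumption: an identifier that already has a tag is used as-is.
		if _, tag, ok := strings.Cut(id, ":"); ok {
			return id, tag
		}
		if version == "" {
			version = "latest"
		}
		return id + ":" + version, version
	}
	if version == "" {
		version = "latest"
	}
	return id, version
}

func main() {
	id, version := resolve(Model{Engine: "ollama", Name: "gemma3"})
	fmt.Println(id, version)
}
```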

Minimal Examples

TensorFlow (built‑in defaults)

Models:
  - Type: labels
    Default: true
    Run: auto

  - Type: nsfw
    Default: true
    Run: auto

  - Type: face
    Default: true
    Run: auto

Ollama Labels

Models:
  - Type: labels
    Model: gemma3:latest
    Engine: ollama
    Run: newly-indexed
    Service:
      Uri: ${OLLAMA_BASE_URL}/api/generate

More Ollama guidance: internal/ai/vision/ollama/README.md.

OpenAI Captions

Models:
  - Type: caption
    Model: gpt-5-mini
    Engine: openai
    Run: newly-indexed
    Service:
      Uri: https://api.openai.com/v1/responses
      Org: ${OPENAI_ORG}
      Project: ${OPENAI_PROJECT}
      Key: ${OPENAI_API_KEY}

More OpenAI guidance: internal/ai/vision/openai/README.md.

Custom TensorFlow Labels (SavedModel)

Models:
  - Type: labels
    Name: transformer
    Engine: tensorflow
    Path: transformer   # resolved under assets/models
    Resolution: 224     # keep standard TF input size unless your model differs
    TensorFlow:
      Output:
        Logits: true    # set true for most TF2 SavedModel classifiers

Custom TensorFlow Models — What’s Supported

Scope: Classification tasks only (labels). TensorFlow models cannot generate captions today; use Ollama or OpenAI for captions.
Location & paths: If Path is empty, the model is loaded from assets/models/<name> (lowercased, underscores). If Path is set, it is still searched under assets/models; absolute paths are not supported.
Expected files: saved_model.pb, a variables/ directory, and a labels.txt alongside the model; use TF2 SavedModel classifiers.
Resolution: Stays at 224px unless your model requires a different input size; adjust Resolution and the TensorFlow.Input block if needed.
Sources: Labels produced by TensorFlow models are recorded with source image; overriding the source isn’t supported yet.
Config file: vision.yml is the conventional name; in the latest version, .yaml is also supported by the loader.

CLI Quick Reference

List models: photoprism vision ls (shows resolved IDs, engines, options, run mode, disabled flag).
Run a model: photoprism vision run -m labels --count 5 (use --force to bypass Run rules).
Validate config: photoprism vision ls --json to confirm env-expanded values without triggering calls.

When to Choose Each Engine

TensorFlow: fast, offline defaults for core features (labels, faces, NSFW). Zero external deps.
Ollama: private, GPU/CPU-hosted multimodal LLMs; best for richer captions/labels without cloud traffic.
OpenAI: highest quality reasoning and multimodal support; requires API key and network access.

Related Docs

Ollama specifics: internal/ai/vision/ollama/README.md
OpenAI specifics: internal/ai/vision/openai/README.md
REST API reference: https://docs.photoprism.dev/
Developer guide (Vision): https://docs.photoprism.app/developer-guide/api/
