GitFlowPersistenceProvider stores flow contents under a Git directory.
In contrast to FileSystemFlowPersistenceProvider, this provider uses human-friendly Bucket and Flow names so that those files can be accessed by external tools. However, modifying stored files outside of NiFi Registry is NOT supported. Persisted files are only read when NiFi Registry starts up.
Buckets are represented as directories, and Flow contents are stored as files in the Bucket directory they belong to. Flow snapshot histories are managed as Git commits, meaning only the latest version of each Bucket and Flow exists in the Git directory. Old versions are retrieved from the Git commit history.
Example persisted files
Flow Storage Directory/
├── .git/
├── Bucket_A/
│   ├── bucket.yml
│   ├── Flow_1.snapshot
│   └── Flow_2.snapshot
└── Bucket_B/
    ├── bucket.yml
    └── Flow_4.snapshot
Each Bucket directory contains a YAML file named bucket.yml. The file manages links from NiFi Registry Bucket and Flow IDs to actual directory and file names. When NiFi Registry starts, this provider reads through the Git commit history and looks up these bucket.yml files to restore Buckets and Flows for each snapshot version.
Example bucket.yml
layoutVer: 1
bucketId: d1beba88-32e9-45d1-bfe9-057cc41f7ce8
flows:
  219cf539-427f-43be-9294-0644fb07ca63: {ver: 7, file: Flow_1.snapshot}
  22cccb6c-3011-4493-a996-611f8f112969: {ver: 3, file: Flow_2.snapshot}
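As a rough illustration of how the ID-to-file links in bucket.yml can be read, the following Python sketch parses the example above. It is not how NiFi Registry itself reads the file: the function name read_bucket_yml and the regex-based parsing are assumptions made for this example, and it only handles the layoutVer 1 shape shown here. A full YAML parser would be more robust for real use.

```python
import re

# Matches one flow entry in the layout shown above, e.g.
#   219cf539-...: {ver: 7, file: Flow_1.snapshot}
FLOW_LINE = re.compile(
    r"^\s*(?P<flow_id>[0-9a-fA-F-]{36}):\s*"
    r"\{\s*ver:\s*(?P<ver>\d+)\s*,\s*file:\s*(?P<file>[^}]+?)\s*\}\s*$"
)


def read_bucket_yml(text):
    """Return (bucket_id, {flow_id: (version, file_name)}) for layoutVer 1 text.

    Hypothetical helper for illustration only; not part of NiFi Registry.
    """
    bucket_id = None
    flows = {}
    for line in text.splitlines():
        if line.startswith("bucketId:"):
            bucket_id = line.split(":", 1)[1].strip()
            continue
        match = FLOW_LINE.match(line)
        if match:
            flows[match["flow_id"]] = (int(match["ver"]), match["file"])
    return bucket_id, flows


EXAMPLE = """\
layoutVer: 1
bucketId: d1beba88-32e9-45d1-bfe9-057cc41f7ce8
flows:
  219cf539-427f-43be-9294-0644fb07ca63: {ver: 7, file: Flow_1.snapshot}
  22cccb6c-3011-4493-a996-611f8f112969: {ver: 3, file: Flow_2.snapshot}
"""

bucket_id, flows = read_bucket_yml(EXAMPLE)
for flow_id, (version, file_name) in flows.items():
    print(flow_id, version, file_name)
```

Because the files must not be modified outside of NiFi Registry, a script like this should only ever read them.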
Qualified class name: org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider
| Property | Description |
| --- | --- |
| Flow Storage Directory | REQUIRED: File system path for the directory where flow content files are persisted. The directory must exist when NiFi Registry starts, and it must be initialized as a Git directory. |
| Remote To Push | When a new flow snapshot is created, this persistence provider updates files in the specified Git directory, then creates a commit in the local repository. If Remote To Push is defined, it also pushes to the specified remote repository (e.g. origin). To define a more detailed remote spec, such as branch names, use a Refspec (see https://git-scm.com/book/en/v2/Git-Internals-The-Refspec). |
| Remote Access User | The username used to make push requests to the remote repository when Remote To Push is enabled and the remote repository is accessed over HTTP. If SSH is used, user authentication is done with SSH keys. |
| Remote Access Password | The password for the Remote Access User. |
| Remote Clone Repository | Remote repository URI to clone into Flow Storage Directory if no local repository is present there. If left empty, the Git directory needs to be configured. If a URI is provided, Remote Access User and Remote Access Password should also be set. Currently, the default branch of the remote is cloned. |
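The two requirements on Flow Storage Directory (it exists, and it is initialized as a Git directory) can be checked before NiFi Registry starts. The following Python sketch is one way to do that, assuming Remote Clone Repository is left empty so that the local repository must already be in place. check_flow_storage_dir is a hypothetical helper, not part of NiFi Registry, and it only looks for a .git entry rather than validating the repository.

```python
import tempfile
from pathlib import Path


def check_flow_storage_dir(path):
    """Return a list of problems; an empty list means both preconditions look met."""
    directory = Path(path)
    if not directory.is_dir():
        return [f"{directory} does not exist or is not a directory"]
    # A .git entry is used as a simple stand-in for "initialized as a Git directory".
    if not (directory / ".git").exists():
        return [f"{directory} is not initialized as a Git directory (no .git entry)"]
    return []


# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    missing = check_flow_storage_dir(Path(tmp) / "flow_storage")  # path not created
    plain = check_flow_storage_dir(tmp)                           # exists, no .git
    (Path(tmp) / ".git").mkdir()                                  # stand-in for `git init`
    ready = check_flow_storage_dir(tmp)

for label, problems in [("missing", missing), ("plain", plain), ("ready", ready)]:
    print(label, problems)
```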
Git user authentication
By default, this persistence provider only creates commits in the local repository. No user authentication is needed to do so. However, if 'Remote To Push' is enabled, user authentication to the remote Git repository is required.
If the remote repository is accessed over HTTP, the username and password for authentication can be configured in the providers XML configuration file.
When SSH is used, SSH keys identify the Git user. To pick the right key for a remote server, the SSH configuration file ${USER_HOME}/.ssh/config is used. The SSH configuration file can contain multiple Host entries, each specifying a key file used to log in to a remote Git server. The Host must match the hostname of the target remote Git server.
Example SSH config file
Host git.example.com
    HostName git.example.com
    IdentityFile ~/.ssh/id_rsa
Host github.com
    HostName github.com
    IdentityFile ~/.ssh/key-for-github
Host bitbucket.org
    HostName bitbucket.org
    IdentityFile ~/.ssh/key-for-bitbucket
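To make the Host matching concrete, the following Python sketch looks up which key file the example configuration above selects for a given remote hostname. It is a simplified illustration, not how the SSH client resolves its configuration: identity_file_for is a hypothetical helper that only does exact matches on Host names and ignores wildcard patterns and other SSH config features.

```python
EXAMPLE_CONFIG = """\
Host git.example.com
    HostName git.example.com
    IdentityFile ~/.ssh/id_rsa
Host github.com
    HostName github.com
    IdentityFile ~/.ssh/key-for-github
Host bitbucket.org
    HostName bitbucket.org
    IdentityFile ~/.ssh/key-for-bitbucket
"""


def identity_file_for(config_text, hostname):
    """Return the IdentityFile of the first Host entry naming `hostname`, or None."""
    current_hosts = []
    for raw_line in config_text.splitlines():
        parts = raw_line.strip().split(None, 1)
        # Skip blank lines, comments, and keywords without a value.
        if len(parts) != 2 or parts[0].startswith("#"):
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "host":
            current_hosts = value.split()  # a Host line may list several names
        elif keyword == "identityfile" and hostname in current_hosts:
            return value
    return None


github_key = identity_file_for(EXAMPLE_CONFIG, "github.com")
unknown_key = identity_file_for(EXAMPLE_CONFIG, "gitlab.example.org")
print(github_key)
print(unknown_key)
```

A hostname with no matching Host entry yields no key here, which mirrors why the Host value needs to match the remote Git server's hostname.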