Skip to content

feat(roxctl): central migrate-to-operator subcommand#20119

Draft
porridge wants to merge 4 commits intomasterfrom
porridge/migrate-to-operator
Draft

feat(roxctl): central migrate-to-operator subcommand#20119
porridge wants to merge 4 commits intomasterfrom
porridge/migrate-to-operator

Conversation

@porridge
Copy link
Copy Markdown
Contributor

@porridge porridge commented Apr 21, 2026

Description

This is the scaffolding and very basic partial implementation for https://redhat.atlassian.net/browse/ROX-33122 and https://redhat.atlassian.net/browse/ROX-33127
Follow-up PR's will provide support for more tunables, and to support secured-cluster (https://redhat.atlassian.net/browse/ROX-33121) in a similar spririt.

My initial plan was to only provide in 4.11 timeframe a simple description for humans to read and manually prepare a central custom resource for migrating from legacy methods. I had claude draft such document in https://github.com/stackrox/stackrox/blob/porridge/migraion/MIGRATION/options-master-list.md However, going through such instructions would be quite tedious and maintaining this documentation without making without a way of making sure it actually works would also be difficult and labour-intensice. Providing this as code seems like a better approach.

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

change me!

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 21, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@porridge
Copy link
Copy Markdown
Contributor Author

/test all

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The source interface comment says CentralDBDeployment may return nil when not found, but both dirSource and clusterSource implementations return a non-nil error instead; consider updating the comment or behavior so they are consistent.
  • The splitYAMLDocuments helper relies on a simple strings.Split on "\n---", which may break in edge cases (e.g., documents starting with ---, Windows line endings, or --- in content); consider using yaml.Decoder to iterate documents more robustly.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `source` interface comment says `CentralDBDeployment` may return nil when not found, but both `dirSource` and `clusterSource` implementations return a non-nil error instead; consider updating the comment or behavior so they are consistent.
- The `splitYAMLDocuments` helper relies on a simple `strings.Split` on `"\n---"`, which may break in edge cases (e.g., documents starting with `---`, Windows line endings, or `---` in content); consider using `yaml.Decoder` to iterate documents more robustly.

## Individual Comments

### Comment 1
<location path="roxctl/central/migratetooperator/source.go" line_range="10" />
<code_context>
+// source provides access to Kubernetes resources from either a directory
+// of YAML manifests or a live cluster.
+type source interface {
+	// CentralDBDeployment returns the central-db Deployment, or nil if not found.
+	CentralDBDeployment() (*appsv1.Deployment, error)
+}
</code_context>
<issue_to_address>
**issue (bug_risk):** Align the `CentralDBDeployment` contract with the implementations that never return nil without an error

The interface comment implies `CentralDBDeployment` may return `nil` without an error, but current implementations always return either a deployment or a non-nil error, and callers (e.g. `detectStorage`) assume a non-nil deployment on success. Please either update the comment to state that "not found" is returned as an error, or update implementations and callers to correctly support a `nil`-without-error result.
</issue_to_address>

### Comment 2
<location path="roxctl/central/migratetooperator/dir_source_test.go" line_range="12-21" />
<code_context>
+	"github.com/stretchr/testify/require"
+)
+
+func TestDirSource_PVC(t *testing.T) {
+	dir := t.TempDir()
+	centralDir := filepath.Join(dir, "central")
+	require.NoError(t, os.MkdirAll(centralDir, 0755))
+
+	deploymentYAML := `apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: central-db
+spec:
+  template:
+    spec:
+      volumes:
+      - name: disk
+        persistentVolumeClaim:
+          claimName: my-pvc
+`
+	require.NoError(t, os.WriteFile(filepath.Join(centralDir, "01-central-12-central-db.yaml"), []byte(deploymentYAML), 0644))
+
+	src, err := newDirSource(dir)
+	require.NoError(t, err)
+
+	dep, err := src.CentralDBDeployment()
+	require.NoError(t, err)
+	assert.Equal(t, "central-db", dep.Name)
+
+	require.Len(t, dep.Spec.Template.Spec.Volumes, 1)
+	require.NotNil(t, dep.Spec.Template.Spec.Volumes[0].PersistentVolumeClaim)
+	assert.Equal(t, "my-pvc", dep.Spec.Template.Spec.Volumes[0].PersistentVolumeClaim.ClaimName)
+}
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for `newDirSource` error cases (nonexistent path and non-directory path)

The existing tests only cover the happy path for `dirSource`. Please also add tests that assert `newDirSource` fails as expected when the path does not exist/unreadable and when it points to a non-directory, including checking the returned error messages (e.g., "accessing directory" vs. "is not a directory").

Suggested implementation:

```golang
import (
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestDirSource_PVC(t *testing.T) {
	dir := t.TempDir()
	centralDir := filepath.Join(dir, "central")
	require.NoError(t, os.MkdirAll(centralDir, 0o755))

	deploymentYAML := `apiVersion: apps/v1
kind: Deployment
metadata:
  name: central-db
spec:
  template:
    spec:
      volumes:
      - name: disk
        persistentVolumeClaim:
          claimName: my-pvc
`
	require.NoError(t, os.WriteFile(filepath.Join(centralDir, "01-central-12-central-db.yaml"), []byte(deploymentYAML), 0o644))

	src, err := newDirSource(dir)
	require.NoError(t, err)

	dep, err := src.CentralDBDeployment()
	require.NoError(t, err)
	assert.Equal(t, "central-db", dep.Name)

	require.Len(t, dep.Spec.Template.Spec.Volumes, 1)
	require.NotNil(t, dep.Spec.Template.Spec.Volumes[0].PersistentVolumeClaim)
	assert.Equal(t, "my-pvc", dep.Spec.Template.Spec.Volumes[0].PersistentVolumeClaim.ClaimName)
}

func TestNewDirSource_NonexistentPath(t *testing.T) {
	// Use a subpath of TempDir that we do not create, ensuring it does not exist.
	base := t.TempDir()
	nonexistentPath := filepath.Join(base, "does-not-exist")

	src, err := newDirSource(nonexistentPath)
	require.Error(t, err)
	assert.Nil(t, src)
	// newDirSource should surface that it failed while accessing the directory.
	require.ErrorContains(t, err, "accessing directory")
}

func TestNewDirSource_NonDirectoryPath(t *testing.T) {
	dir := t.TempDir()
	filePath := filepath.Join(dir, "not-a-directory")

	require.NoError(t, os.WriteFile(filePath, []byte("not a directory"), 0o644))

	src, err := newDirSource(filePath)
	require.Error(t, err)
	assert.Nil(t, src)
	// newDirSource should clearly indicate that the path is not a directory.
	require.ErrorContains(t, err, "is not a directory")
}

```

These changes assume that:
1. `newDirSource` returns a `nil` source on error and wraps filesystem errors with messages containing "accessing directory" when the path doesn't exist/unreadable, and "is not a directory" when the path is a non-directory.
2. No other tests in this file depend on the duplicated import block that was consolidated.

If the actual error messages differ slightly, update the `require.ErrorContains` substrings to match the real messages used in `newDirSource`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread roxctl/central/migratetooperator/source.go Outdated
Comment thread roxctl/central/migratetooperator/dir_source_test.go Outdated
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 21, 2026

📝 Walkthrough

Walkthrough

This PR introduces a new roxctl central migrate-to-operator command that detects Central deployment configuration (storage type, node selectors) from either a running Kubernetes cluster or manifest directory, then generates a custom resource YAML compatible with operator-based deployment.

Changes

Cohort / File(s) Summary
Command Registration
roxctl/central/central.go
Registered the new migratetooperator command subtree in the central command tree.
Core Command Implementation
roxctl/central/migratetooperator/command.go
Implemented the main migrateto-operator Cobra command with --from-dir, --namespace, and --output flag handling; executes detect → generate → marshal → write flow.
Source Abstractions
roxctl/central/migratetooperator/source.go, cluster_source.go, dir_source.go
Defined source interface and two implementations: clusterSource loads from running cluster via kubeconfig; dirSource scans directory manifests to locate central-db Deployment.
Storage Detection
roxctl/central/migratetooperator/detect.go
Inspects central-db Deployment volumes to detect PersistentVolumeClaim or hostPath storage configuration and preserves pod nodeSelector when applicable.
Resource Generation
roxctl/central/migratetooperator/generate.go
Constructs a platform.stackrox.io/v1alpha1 Central CR from detected storage config; marshals to YAML output excluding operator status fields.
Unit Tests
roxctl/central/migratetooperator/*_test.go
Comprehensive test coverage for detect, dir_source, and generate functions with PVC, hostPath, and multi-document YAML scenarios.
Integration Tests
tests/roxctl/bats-tests/local/central-migrate-to-operator.bats
End-to-end Bats tests validating migrateto-operator output for PVC/hostPath storage modes, node selectors, error cases, and CR metadata.
Import Validation
tools/roxvet/analyzers/validateimports/analyzer.go
Extended allowlist to permit operator/api imports in roxctl packages, matching existing pkg-level permissions.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant Cmd as Command Handler
    participant Src as Source<br/>(Cluster or Dir)
    participant Det as Storage Detector
    participant Gen as CR Generator
    participant Out as Output Writer

    User->>Cmd: execute with flags<br/>(--from-dir or --namespace)
    Cmd->>Cmd: createSource()
    Cmd->>Src: (instantiate ClusterSource<br/>or DirSource)
    Cmd->>Det: detect(source)
    Det->>Src: CentralDBDeployment()
    Src-->>Det: *Deployment
    Det->>Det: inspect volumes &<br/>nodeSelector
    Det-->>Cmd: *detectedConfig
    Cmd->>Gen: generateCR(config)
    Gen->>Gen: build Central CR<br/>with storage spec
    Gen-->>Cmd: *platform.Central
    Cmd->>Cmd: marshalCR(cr)
    Cmd->>Cmd: marshal to YAML
    Cmd->>Out: write output
    Out-->>User: Central CR YAML
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The PR description fails to adequately explain the changes. While it provides context about referenced issues and co-authorship, critical sections remain incomplete with placeholder text. Complete the 'How I validated my change' section by describing validation activities performed. Replace placeholder 'change me!' text. Check relevant checkboxes (CHANGELOG, documentation, testing, validation) to indicate what was completed or explain why items are not applicable.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and concisely describes the main change: adding a 'central migrate-to-operator' subcommand to roxctl, which aligns with the changeset scope.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch porridge/migrate-to-operator

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
tests/roxctl/bats-tests/local/central-migrate-to-operator.bats (1)

14-21: mktemp -u creates a TOCTOU-style race and leaves cleanup brittle.

mktemp -d -u / mktemp -u only generate names without creating the file/dir, so two parallel tests could theoretically collide, and if a test fails before the path is created, rm -rf "" is fine but obscures intent. Prefer out_dir="$(mktemp -d)" and then pass --output-dir to a subdirectory, or just let roxctl create it and use real mktemp -d for the parent. Low-impact but standard bats hygiene.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/roxctl/bats-tests/local/central-migrate-to-operator.bats` around lines
14 - 21, The setup/teardown use mktemp -u which is racy and brittle; change
setup() to create real temp paths with mktemp -d for out_dir and mktemp for
cr_out (i.e., create the directory/file rather than only generating names), or
let roxctl create the output and use mktemp -d to make a real parent dir and
pass that as --output-dir; update teardown() to remove those actual created
paths; modify references to out_dir and cr_out in the test to match the created
paths.
roxctl/central/migratetooperator/detect.go (1)

34-49: Consider matching the central-db volume by more than just name "disk".

The lookup relies on a hardcoded volume name "disk". If roxctl central generate ever renames this volume (or a user-edited manifest uses a different name), detection silently fails with a misleading error. Since you already have the Deployment, consider also falling back to detecting the volume via the central-db container's volumeMounts (find the mount at the DB data path and resolve its Name), which is more robust against naming drift in generate.

Not blocking for this PR, but worth a follow-up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@roxctl/central/migratetooperator/detect.go` around lines 34 - 49, The current
detectStorage function only matches a volume named "disk" and should fall back
to resolving the volume by the central DB container's volumeMounts to avoid
brittle name assumptions: after calling src.CentralDBDeployment() and failing to
find a Volume with Name == "disk" set diskVolume to nil, iterate
dep.Spec.Template.Spec.Containers to locate the central DB container (by
container.Name or other identifying property), then scan that container's
VolumeMounts for the mount whose MountPath equals the DB data path (the path
where central stores DB files) and use that VolumeMount's Name to locate and set
diskVolume from dep.Spec.Template.Spec.Volumes; return the same error only if
neither the named volume nor the mount-based volume can be resolved. Ensure you
update detectStorage to reference dep.Spec.Template.Spec.Containers, the
container's VolumeMounts, and the volume name resolution logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@roxctl/central/migratetooperator/cluster_source.go`:
- Around line 40-44: The CentralDBDeployment method currently calls
s.client.AppsV1().Deployments(...).Get with context.Background(), which can
hang; change it to use a bounded context (e.g., create a ctx, cancel :=
context.WithTimeout(context.Background(), <reasonableDuration>) and defer
cancel()) and pass that ctx to the Get call in CentralDBDeployment (or use an
existing timeout field on the clusterSource if one exists), ensuring the cancel
is deferred to avoid leaks and that the Get call uses the new ctx instead of
context.Background().

In `@roxctl/central/migratetooperator/command.go`:
- Around line 59-70: The code uses os.Stdout directly which bypasses the
injected environment.Environment and defers f.Close() losing close/flush errors;
change the default writer to use the injected environment's writer (use
env.Writer() or the equivalent accessor from environment.Environment) instead of
os.Stdout, and when cmd.output != "" open the file and set w to that file but do
not defer Close — after writing out (the Write call that currently uses
w.Write(out)) explicitly close the file and check/return any error from Close so
close/flush failures are surfaced; reference variables/functions: w, cmd.output,
and environment.Environment.

In `@roxctl/central/migratetooperator/detect.go`:
- Around line 58-67: The nodeSelector propagation is currently applied only
inside the hostPath branch in detect.go (when hp := diskVolume.HostPath), which
drops nodeSelector for PVC-backed volumes; update the logic in the function that
builds the storageConfig (the block creating cfg with Type storageHostPath and
HostPath) to extract and set cfg.NodeSelector from
dep.Spec.Template.Spec.NodeSelector outside/after the hostPath-specific branch
so it runs for both hostPath and PVC cases (keep using the existing
storageConfig struct and its NodeSelector field and let the CR generator decide
emission).

In `@tests/roxctl/bats-tests/local/central-migrate-to-operator.bats`:
- Around line 60-72: Tests currently assert that user-provided
--db-storage-class and --db-size are dropped during migration; instead preserve
those values in the generated CR and update the tests to assert them. Modify the
migration logic in migrate-to-operator (the code exercised by
generate_and_migrate) so that flags map into
spec.central.db.persistence.persistentVolumeClaim.storageClassName and .size on
the resulting CR (cr_out), then change the two tests to expect "fast-ssd" for
storageClassName and "200" (or the appropriate unitized value) for size;
alternatively, if dropping is intentional, add a clear code comment in
migrate-to-operator and emit a stderr warning during migration and update the
tests to assert that warning instead.
- Around line 8-12: The setup_file() uses skip in a way that breaks Bats
1.7.0–1.8.1; add a version guard by calling bats_require_minimum_version 1.8.2
at the top of this test file (or ensure CI installs Bats <1.7.0 or ≥1.8.2), and
update any other test files that use skip in setup_file() (e.g., other roxctl
tests) to include the same bats_require_minimum_version 1.8.2 guard to avoid the
regression; target the change around the setup_file() and its skip usage so
future runs won’t fail under 1.7.0–1.8.1.

---

Nitpick comments:
In `@roxctl/central/migratetooperator/detect.go`:
- Around line 34-49: The current detectStorage function only matches a volume
named "disk" and should fall back to resolving the volume by the central DB
container's volumeMounts to avoid brittle name assumptions: after calling
src.CentralDBDeployment() and failing to find a Volume with Name == "disk" set
diskVolume to nil, iterate dep.Spec.Template.Spec.Containers to locate the
central DB container (by container.Name or other identifying property), then
scan that container's VolumeMounts for the mount whose MountPath equals the DB
data path (the path where central stores DB files) and use that VolumeMount's
Name to locate and set diskVolume from dep.Spec.Template.Spec.Volumes; return
the same error only if neither the named volume nor the mount-based volume can
be resolved. Ensure you update detectStorage to reference
dep.Spec.Template.Spec.Containers, the container's VolumeMounts, and the volume
name resolution logic.

In `@tests/roxctl/bats-tests/local/central-migrate-to-operator.bats`:
- Around line 14-21: The setup/teardown use mktemp -u which is racy and brittle;
change setup() to create real temp paths with mktemp -d for out_dir and mktemp
for cr_out (i.e., create the directory/file rather than only generating names),
or let roxctl create the output and use mktemp -d to make a real parent dir and
pass that as --output-dir; update teardown() to remove those actual created
paths; modify references to out_dir and cr_out in the test to match the created
paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: a1894a54-59cb-4c3a-99f5-587e4bc6bbed

📥 Commits

Reviewing files that changed from the base of the PR and between ac35f0b and 295d323.

📒 Files selected for processing (12)
  • roxctl/central/central.go
  • roxctl/central/migratetooperator/cluster_source.go
  • roxctl/central/migratetooperator/command.go
  • roxctl/central/migratetooperator/detect.go
  • roxctl/central/migratetooperator/detect_test.go
  • roxctl/central/migratetooperator/dir_source.go
  • roxctl/central/migratetooperator/dir_source_test.go
  • roxctl/central/migratetooperator/generate.go
  • roxctl/central/migratetooperator/generate_test.go
  • roxctl/central/migratetooperator/source.go
  • tests/roxctl/bats-tests/local/central-migrate-to-operator.bats
  • tools/roxvet/analyzers/validateimports/analyzer.go

Comment thread roxctl/central/migratetooperator/cluster_source.go
Comment thread roxctl/central/migratetooperator/command.go Outdated
Comment thread roxctl/central/migratetooperator/detect.go Outdated
Comment thread tests/roxctl/bats-tests/local/central-migrate-to-operator.bats
Comment thread tests/roxctl/bats-tests/local/central-migrate-to-operator.bats
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 21, 2026

🚀 Build Images Ready

Images are ready for commit 9843b90. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-702-g9843b90c14

porridge and others added 3 commits April 21, 2026 11:26
- Fix source interface comment: not-found is returned as error, not nil
- Add tests for newDirSource error cases (nonexistent path, non-directory)
- Add 30s context.WithTimeout to cluster source Kubernetes API call
- Route default output through env.InputOutput().Out() instead of os.Stdout
- Handle file close errors explicitly instead of deferring
- Propagate nodeSelector for both PVC and hostPath storage types
- Add comment explaining why size/storageClassName are omitted for BYO PVCs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use 'yq e' (eval subcommand) instead of bare 'yq' for compatibility
  with older yq v4 versions that require an explicit subcommand.
- Wrap errors from interface methods and external packages to satisfy
  the wrapcheck linter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant