feat(roxctl): central migrate-to-operator subcommand#20119
feat(roxctl): central migrate-to-operator subcommand#20119
Conversation
|
Skipping CI for Draft Pull Request. |
|
/test all |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The
sourceinterface comment saysCentralDBDeploymentmay return nil when not found, but bothdirSourceandclusterSourceimplementations return a non-nil error instead; consider updating the comment or behavior so they are consistent. - The
splitYAMLDocumentshelper relies on a simplestrings.Spliton"\n---", which may break in edge cases (e.g., documents starting with---, Windows line endings, or---in content); consider usingyaml.Decoderto iterate documents more robustly.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `source` interface comment says `CentralDBDeployment` may return nil when not found, but both `dirSource` and `clusterSource` implementations return a non-nil error instead; consider updating the comment or behavior so they are consistent.
- The `splitYAMLDocuments` helper relies on a simple `strings.Split` on `"\n---"`, which may break in edge cases (e.g., documents starting with `---`, Windows line endings, or `---` in content); consider using `yaml.Decoder` to iterate documents more robustly.
## Individual Comments
### Comment 1
<location path="roxctl/central/migratetooperator/source.go" line_range="10" />
<code_context>
+// source provides access to Kubernetes resources from either a directory
+// of YAML manifests or a live cluster.
+type source interface {
+ // CentralDBDeployment returns the central-db Deployment, or nil if not found.
+ CentralDBDeployment() (*appsv1.Deployment, error)
+}
</code_context>
<issue_to_address>
**issue (bug_risk):** Align the `CentralDBDeployment` contract with the implementations that never return nil without an error
The interface comment implies `CentralDBDeployment` may return `nil` without an error, but current implementations always return either a deployment or a non-nil error, and callers (e.g. `detectStorage`) assume a non-nil deployment on success. Please either update the comment to state that "not found" is returned as an error, or update implementations and callers to correctly support a `nil`-without-error result.
</issue_to_address>
### Comment 2
<location path="roxctl/central/migratetooperator/dir_source_test.go" line_range="12-21" />
<code_context>
+ "github.com/stretchr/testify/require"
+)
+
+func TestDirSource_PVC(t *testing.T) {
+ dir := t.TempDir()
+ centralDir := filepath.Join(dir, "central")
+ require.NoError(t, os.MkdirAll(centralDir, 0755))
+
+ deploymentYAML := `apiVersion: apps/v1
+kind: Deployment
+metadata:
+ name: central-db
+spec:
+ template:
+ spec:
+ volumes:
+ - name: disk
+ persistentVolumeClaim:
+ claimName: my-pvc
+`
+ require.NoError(t, os.WriteFile(filepath.Join(centralDir, "01-central-12-central-db.yaml"), []byte(deploymentYAML), 0644))
+
+ src, err := newDirSource(dir)
+ require.NoError(t, err)
+
+ dep, err := src.CentralDBDeployment()
+ require.NoError(t, err)
+ assert.Equal(t, "central-db", dep.Name)
+
+ require.Len(t, dep.Spec.Template.Spec.Volumes, 1)
+ require.NotNil(t, dep.Spec.Template.Spec.Volumes[0].PersistentVolumeClaim)
+ assert.Equal(t, "my-pvc", dep.Spec.Template.Spec.Volumes[0].PersistentVolumeClaim.ClaimName)
+}
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for `newDirSource` error cases (nonexistent path and non-directory path)
The existing tests only cover the happy path for `dirSource`. Please also add tests that assert `newDirSource` fails as expected when the path does not exist/unreadable and when it points to a non-directory, including checking the returned error messages (e.g., "accessing directory" vs. "is not a directory").
Suggested implementation:
```golang
import (
"os"
"path/filepath"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestDirSource_PVC(t *testing.T) {
dir := t.TempDir()
centralDir := filepath.Join(dir, "central")
require.NoError(t, os.MkdirAll(centralDir, 0o755))
deploymentYAML := `apiVersion: apps/v1
kind: Deployment
metadata:
name: central-db
spec:
template:
spec:
volumes:
- name: disk
persistentVolumeClaim:
claimName: my-pvc
`
require.NoError(t, os.WriteFile(filepath.Join(centralDir, "01-central-12-central-db.yaml"), []byte(deploymentYAML), 0o644))
src, err := newDirSource(dir)
require.NoError(t, err)
dep, err := src.CentralDBDeployment()
require.NoError(t, err)
assert.Equal(t, "central-db", dep.Name)
require.Len(t, dep.Spec.Template.Spec.Volumes, 1)
require.NotNil(t, dep.Spec.Template.Spec.Volumes[0].PersistentVolumeClaim)
assert.Equal(t, "my-pvc", dep.Spec.Template.Spec.Volumes[0].PersistentVolumeClaim.ClaimName)
}
func TestNewDirSource_NonexistentPath(t *testing.T) {
// Use a subpath of TempDir that we do not create, ensuring it does not exist.
base := t.TempDir()
nonexistentPath := filepath.Join(base, "does-not-exist")
src, err := newDirSource(nonexistentPath)
require.Error(t, err)
assert.Nil(t, src)
// newDirSource should surface that it failed while accessing the directory.
require.ErrorContains(t, err, "accessing directory")
}
func TestNewDirSource_NonDirectoryPath(t *testing.T) {
dir := t.TempDir()
filePath := filepath.Join(dir, "not-a-directory")
require.NoError(t, os.WriteFile(filePath, []byte("not a directory"), 0o644))
src, err := newDirSource(filePath)
require.Error(t, err)
assert.Nil(t, src)
// newDirSource should clearly indicate that the path is not a directory.
require.ErrorContains(t, err, "is not a directory")
}
```
These changes assume that:
1. `newDirSource` returns a `nil` source on error and wraps filesystem errors with messages containing "accessing directory" when the path doesn't exist/unreadable, and "is not a directory" when the path is a non-directory.
2. No other tests in this file depend on the duplicated import block that was consolidated.
If the actual error messages differ slightly, update the `require.ErrorContains` substrings to match the real messages used in `newDirSource`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
📝 WalkthroughWalkthroughThis PR introduces a new Changes
Sequence Diagram(s)sequenceDiagram
participant User as User/CLI
participant Cmd as Command Handler
participant Src as Source<br/>(Cluster or Dir)
participant Det as Storage Detector
participant Gen as CR Generator
participant Out as Output Writer
User->>Cmd: execute with flags<br/>(--from-dir or --namespace)
Cmd->>Cmd: createSource()
Cmd->>Src: (instantiate ClusterSource<br/>or DirSource)
Cmd->>Det: detect(source)
Det->>Src: CentralDBDeployment()
Src-->>Det: *Deployment
Det->>Det: inspect volumes &<br/>nodeSelector
Det-->>Cmd: *detectedConfig
Cmd->>Gen: generateCR(config)
Gen->>Gen: build Central CR<br/>with storage spec
Gen-->>Cmd: *platform.Central
Cmd->>Cmd: marshalCR(cr)
Cmd->>Cmd: marshal to YAML
Cmd->>Out: write output
Out-->>User: Central CR YAML
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
tests/roxctl/bats-tests/local/central-migrate-to-operator.bats (1)
14-21:mktemp -ucreates a TOCTOU-style race and leaves cleanup brittle.
mktemp -d -u/mktemp -uonly generate names without creating the file/dir, so two parallel tests could theoretically collide, and if a test fails before the path is created,rm -rf ""is fine but obscures intent. Preferout_dir="$(mktemp -d)"and then pass--output-dirto a subdirectory, or just letroxctlcreate it and use realmktemp -dfor the parent. Low-impact but standard bats hygiene.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/roxctl/bats-tests/local/central-migrate-to-operator.bats` around lines 14 - 21, The setup/teardown use mktemp -u which is racy and brittle; change setup() to create real temp paths with mktemp -d for out_dir and mktemp for cr_out (i.e., create the directory/file rather than only generating names), or let roxctl create the output and use mktemp -d to make a real parent dir and pass that as --output-dir; update teardown() to remove those actual created paths; modify references to out_dir and cr_out in the test to match the created paths.roxctl/central/migratetooperator/detect.go (1)
34-49: Consider matching thecentral-dbvolume by more than just name"disk".The lookup relies on a hardcoded volume name
"disk". Ifroxctl central generateever renames this volume (or a user-edited manifest uses a different name), detection silently fails with a misleading error. Since you already have the Deployment, consider also falling back to detecting the volume via thecentral-dbcontainer'svolumeMounts(find the mount at the DB data path and resolve itsName), which is more robust against naming drift ingenerate.Not blocking for this PR, but worth a follow-up.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@roxctl/central/migratetooperator/detect.go` around lines 34 - 49, The current detectStorage function only matches a volume named "disk" and should fall back to resolving the volume by the central DB container's volumeMounts to avoid brittle name assumptions: after calling src.CentralDBDeployment() and failing to find a Volume with Name == "disk" set diskVolume to nil, iterate dep.Spec.Template.Spec.Containers to locate the central DB container (by container.Name or other identifying property), then scan that container's VolumeMounts for the mount whose MountPath equals the DB data path (the path where central stores DB files) and use that VolumeMount's Name to locate and set diskVolume from dep.Spec.Template.Spec.Volumes; return the same error only if neither the named volume nor the mount-based volume can be resolved. Ensure you update detectStorage to reference dep.Spec.Template.Spec.Containers, the container's VolumeMounts, and the volume name resolution logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@roxctl/central/migratetooperator/cluster_source.go`:
- Around line 40-44: The CentralDBDeployment method currently calls
s.client.AppsV1().Deployments(...).Get with context.Background(), which can
hang; change it to use a bounded context (e.g., create a ctx, cancel :=
context.WithTimeout(context.Background(), <reasonableDuration>) and defer
cancel()) and pass that ctx to the Get call in CentralDBDeployment (or use an
existing timeout field on the clusterSource if one exists), ensuring the cancel
is deferred to avoid leaks and that the Get call uses the new ctx instead of
context.Background().
In `@roxctl/central/migratetooperator/command.go`:
- Around line 59-70: The code uses os.Stdout directly which bypasses the
injected environment.Environment and defers f.Close() losing close/flush errors;
change the default writer to use the injected environment's writer (use
env.Writer() or the equivalent accessor from environment.Environment) instead of
os.Stdout, and when cmd.output != "" open the file and set w to that file but do
not defer Close — after writing out (the Write call that currently uses
w.Write(out)) explicitly close the file and check/return any error from Close so
close/flush failures are surfaced; reference variables/functions: w, cmd.output,
and environment.Environment.
In `@roxctl/central/migratetooperator/detect.go`:
- Around line 58-67: The nodeSelector propagation is currently applied only
inside the hostPath branch in detect.go (when hp := diskVolume.HostPath), which
drops nodeSelector for PVC-backed volumes; update the logic in the function that
builds the storageConfig (the block creating cfg with Type storageHostPath and
HostPath) to extract and set cfg.NodeSelector from
dep.Spec.Template.Spec.NodeSelector outside/after the hostPath-specific branch
so it runs for both hostPath and PVC cases (keep using the existing
storageConfig struct and its NodeSelector field and let the CR generator decide
emission).
In `@tests/roxctl/bats-tests/local/central-migrate-to-operator.bats`:
- Around line 60-72: Tests currently assert that user-provided
--db-storage-class and --db-size are dropped during migration; instead preserve
those values in the generated CR and update the tests to assert them. Modify the
migration logic in migrate-to-operator (the code exercised by
generate_and_migrate) so that flags map into
spec.central.db.persistence.persistentVolumeClaim.storageClassName and .size on
the resulting CR (cr_out), then change the two tests to expect "fast-ssd" for
storageClassName and "200" (or the appropriate unitized value) for size;
alternatively, if dropping is intentional, add a clear code comment in
migrate-to-operator and emit a stderr warning during migration and update the
tests to assert that warning instead.
- Around line 8-12: The setup_file() uses skip in a way that breaks Bats
1.7.0–1.8.1; add a version guard by calling bats_require_minimum_version 1.8.2
at the top of this test file (or ensure CI installs Bats <1.7.0 or ≥1.8.2), and
update any other test files that use skip in setup_file() (e.g., other roxctl
tests) to include the same bats_require_minimum_version 1.8.2 guard to avoid the
regression; target the change around the setup_file() and its skip usage so
future runs won’t fail under 1.7.0–1.8.1.
---
Nitpick comments:
In `@roxctl/central/migratetooperator/detect.go`:
- Around line 34-49: The current detectStorage function only matches a volume
named "disk" and should fall back to resolving the volume by the central DB
container's volumeMounts to avoid brittle name assumptions: after calling
src.CentralDBDeployment() and failing to find a Volume with Name == "disk" set
diskVolume to nil, iterate dep.Spec.Template.Spec.Containers to locate the
central DB container (by container.Name or other identifying property), then
scan that container's VolumeMounts for the mount whose MountPath equals the DB
data path (the path where central stores DB files) and use that VolumeMount's
Name to locate and set diskVolume from dep.Spec.Template.Spec.Volumes; return
the same error only if neither the named volume nor the mount-based volume can
be resolved. Ensure you update detectStorage to reference
dep.Spec.Template.Spec.Containers, the container's VolumeMounts, and the volume
name resolution logic.
In `@tests/roxctl/bats-tests/local/central-migrate-to-operator.bats`:
- Around line 14-21: The setup/teardown use mktemp -u which is racy and brittle;
change setup() to create real temp paths with mktemp -d for out_dir and mktemp
for cr_out (i.e., create the directory/file rather than only generating names),
or let roxctl create the output and use mktemp -d to make a real parent dir and
pass that as --output-dir; update teardown() to remove those actual created
paths; modify references to out_dir and cr_out in the test to match the created
paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: a1894a54-59cb-4c3a-99f5-587e4bc6bbed
📒 Files selected for processing (12)
roxctl/central/central.goroxctl/central/migratetooperator/cluster_source.goroxctl/central/migratetooperator/command.goroxctl/central/migratetooperator/detect.goroxctl/central/migratetooperator/detect_test.goroxctl/central/migratetooperator/dir_source.goroxctl/central/migratetooperator/dir_source_test.goroxctl/central/migratetooperator/generate.goroxctl/central/migratetooperator/generate_test.goroxctl/central/migratetooperator/source.gotests/roxctl/bats-tests/local/central-migrate-to-operator.batstools/roxvet/analyzers/validateimports/analyzer.go
🚀 Build Images ReadyImages are ready for commit 9843b90. To use with deploy scripts: export MAIN_IMAGE_TAG=4.11.x-702-g9843b90c14 |
- Fix source interface comment: not-found is returned as error, not nil - Add tests for newDirSource error cases (nonexistent path, non-directory) - Add 30s context.WithTimeout to cluster source Kubernetes API call - Route default output through env.InputOutput().Out() instead of os.Stdout - Handle file close errors explicitly instead of deferring - Propagate nodeSelector for both PVC and hostPath storage types - Add comment explaining why size/storageClassName are omitted for BYO PVCs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use 'yq e' (eval subcommand) instead of bare 'yq' for compatibility with older yq v4 versions that require an explicit subcommand. - Wrap errors from interface methods and external packages to satisfy the wrapcheck linter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
This is the scaffolding and very basic partial implementation for https://redhat.atlassian.net/browse/ROX-33122 and https://redhat.atlassian.net/browse/ROX-33127
Follow-up PR's will provide support for more tunables, and to support secured-cluster (https://redhat.atlassian.net/browse/ROX-33121) in a similar spririt.
My initial plan was to only provide in 4.11 timeframe a simple description for humans to read and manually prepare a central custom resource for migrating from legacy methods. I had claude draft such document in https://github.com/stackrox/stackrox/blob/porridge/migraion/MIGRATION/options-master-list.md However, going through such instructions would be quite tedious and maintaining this documentation without making without a way of making sure it actually works would also be difficult and labour-intensice. Providing this as code seems like a better approach.
Co-authored-by: Claude Opus 4.6 noreply@anthropic.com
User-facing documentation
Testing and quality
Automated testing
How I validated my change
change me!