Skip to content

Conversation

@stelfrag
Copy link
Collaborator

@stelfrag stelfrag commented Jan 12, 2026

Summary
  • Implement asynchronous logging

Summary by cubic

Add asynchronous logging with a dedicated background thread to offload I/O from hot paths and reduce contention. Adds safe flush and shutdown to prevent log loss during reopen and process exit.

  • New Features

    • Introduced nd_log_queue: command-driven ring buffer (4096 entries by default) processed by a "LOGGER" thread, with drop-on-full and basic stats.
    • Routes non-critical logs (below ALERT) for FILE/STDOUT/STDERR/SYSLOG through the async path; ALERT and above remain synchronous. Skips async when the FILE* is a fallback to avoid message loss.
    • Reopen runs inside the logger thread via a REOPEN command to avoid FILE* races.
    • Captures syslog init state at enqueue time to avoid races; falls back to stderr when syslog isn’t ready.
    • Uses a 512B inline buffer for small messages and allocates up to 16KB for large ones.
    • CMake updated to build nd_log-queue.c/.h.
  • Migration

    • Call nd_log_shutdown() during application shutdown to flush pending logs and stop the thread.
    • No config changes needed; nd_log_reopen_log_files() now reopens via the logger thread and waits until complete.

Written for commit b1b9ab6. Summary will update on new commits.

@github-actions github-actions bot added the area/build Build system (autotools and cmake). label Jan 12, 2026
@stelfrag
Copy link
Collaborator Author

@cubic-dev-ai review this PR

@cubic-dev-ai
Copy link
Contributor

cubic-dev-ai bot commented Jan 20, 2026

@cubic-dev-ai review this PR

@stelfrag I have started the AI code review. It will take a few minutes to complete.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 9 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/libnetdata/log/nd_log-queue.c">

<violation number="1" location="src/libnetdata/log/nd_log-queue.c:362">
P1: `nd_log_queue_shutdown()` can hang indefinitely when the command queue is full because no shutdown signal is sent, but the thread is still joined. Ensure a shutdown request is issued even when the queue is full to avoid deadlock.</violation>
</file>

<file name="src/libnetdata/log/nd_log.c">

<violation number="1" location="src/libnetdata/log/nd_log.c:312">
P2: Async logging can drop messages when select_output remaps a non-file method (e.g., journald not initialized) to stderr. The async path ignores the fallback FILE* and later writes via nd_log.sources[source].fp, which is NULL for NDLM_JOURNAL/ETW/WEL sources, so the message is lost. Skip async when the FILE* doesn’t match the source’s fp (or persist the fallback fp in the queue entry).</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@stelfrag
Copy link
Collaborator Author

@cubic-dev-ai review this PR

@cubic-dev-ai
Copy link
Contributor

cubic-dev-ai bot commented Jan 20, 2026

@cubic-dev-ai review this PR

@stelfrag I have started the AI code review. It will take a few minutes to complete.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 9 files

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements asynchronous logging with a dedicated background thread to improve performance by offloading I/O operations from hot paths.

Changes:

  • Introduces a ring buffer-based command queue (4096 entries) processed by a "LOGGER" thread
  • Routes non-critical logs (below ALERT priority) through async path while keeping ALERT+ synchronous for crash safety
  • Implements safe log rotation via REOPEN command handled entirely in the logger thread to avoid FILE* races

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/libnetdata/log/nd_log-queue.h Defines queue structures, opcodes, and public API for async logging
src/libnetdata/log/nd_log-queue.c Implements ring buffer, logger thread event loop, and queue management
src/libnetdata/log/nd_log.c Integrates async logging path with sync fallback and routing logic
src/libnetdata/log/nd_log.h Adds nd_log_shutdown() to public API
src/libnetdata/log/nd_log-init.c Initializes async queue on startup and handles log file reopening
src/libnetdata/log/nd_log-internals.h Adds nd_logger_journal_format() for journal formatting
src/libnetdata/log/nd_log-to-systemd-journal.c Refactors journal formatting into reusable function
src/daemon/daemon-shutdown.c Calls nd_log_shutdown() to flush logs before exit
CMakeLists.txt Adds nd_log-queue.c and nd_log-queue.h to build

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@stelfrag
Copy link
Collaborator Author

@cubic-dev-ai review this PR

@cubic-dev-ai
Copy link
Contributor

cubic-dev-ai bot commented Jan 21, 2026

@cubic-dev-ai review this PR

@stelfrag I have started the AI code review. It will take a few minutes to complete.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 9 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/libnetdata/log/nd_log-queue.h">

<violation number="1" location="src/libnetdata/log/nd_log-queue.h:76">
P3: The comment claims a boolean return value, but nd_log_queue_init returns void. Update the comment (or change the signature) to avoid misleading callers about failure handling.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

- Introduce `nd_log_queue` for handling log entries asynchronously via background threads.
- Implement queue initialization, enqueue, and shutdown functions.
- Add thread-safe mechanisms for processing and flushing log entries.
- Integrate async logging with the existing logging infrastructure, falling back to synchronous if the queue is unavailable.
- Update `CMakeLists.txt` to include new files: `nd_log-queue.c` and `nd_log-queue.h`.
- Introduce a `shutdown_acknowledged` flag with supporting mutex and condition variable for safe thread shutdown.
- Ensure logger thread signals shutdown acknowledgment before exiting.
- Update cleanup process to properly destroy new synchronization primitives.
… async logging

- Store `journal_direct`, `journal_libsystemd`, and `syslog` initialization states during enqueue to ensure consistent behavior during logging.
- Add fallback mechanisms for logging to `stderr` when target destinations are not initialized.
- Introduce a flush operation before log destination reinitialization to avoid losing pending log messages.
- Refactor logging queue thread signature and improve error handling for asynchronous journal and syslog writes.
…lized send logic

- Introduce `nd_logger_journal_format` for consistent journal native protocol (KEY=VALUE\n) formatting.
- Update to use `journal_direct_send` for improved handling and fallback mechanisms.
- Simplify `nd_logger_journal_direct` by separating formatting and sending responsibilities.
- Replace the legacy `nd_log_queue` structure with the `nd_log_event_loop` and `nd_log_cmd_pool` architecture for cleaner and more efficient log entry processing.
- Transition from a queue-based logging system to a command-driven event loop using uv_async callbacks.
- Introduce reusable command pool with support for opcodes (e.g., `ND_LOG_OP_ENTRY`, `ND_LOG_OP_FLUSH`, `ND_LOG_OP_SHUTDOWN`) to enable more flexible event-driven log operations.
- Simplify logging thread to focus on processing commands in an event loop, improving modularity and reducing synchronization complexity.
- Update public API to reflect the new architecture, including changes to queue management, shutdown logic, and entry enqueueing.
- Remove unused initialization fields (`journal_direct`, `journal_libsystemd`) from log queue entries to reduce enqueue-time complexity.
- Replace raw buffer handling with `CLEAN_BUFFER` to streamline memory management and auto-free resources on return.
- Adapt async file logging to look up `FILE*` at write time, enabling proper log rotation handling.
- Deprecate journal async logging support due to sync path optimization.
- Introduce timed shutdown wait with fallback to prevent indefinite hangs during logger thread termination.
- Prevent async logging for fallback `FILE*` mismatches to avoid message loss.
- Add atomic flag signaling for immediate shutdown when the queue is full.
…R` remapping to `NDLM_FILE` during `select_output`.

- Add atomic queue depth decrement during entry processing.
- Prevent memory leaks by freeing allocated buffers when enqueueing is skipped.
…ocated log messages.

- Add source index bounds check to prevent out-of-bounds access during log entry processing.
- Optimize command pool cleanup by iterating between active indices instead of full range.
- Refactor initialization function to ensure thread-safe single execution using `FUNCTION_RUN_ONCE`.
- Enhance shutdown logic to safely handle completion destruction after thread termination.
- Use CAS loop for accurate high-water mark updates in concurrent scenarios.
- Adjust async signaling to prevent race conditions during shutdown.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/build Build system (autotools and cmake). area/daemon

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant