feat(statsd): support for user-defined tags #21495

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Draft

ilyam8 wants to merge 1 commit into netdata:master from ilyam8:feat-statsd-tags

Member

ilyam8 commented Dec 20, 2025 •

edited by cubic-dev-ai bot

Loading

Summary

Extends the StatsD collector to support arbitrary user-defined tags on metrics, in addition to the existing reserved tags (units, name, family). User tags are automatically applied as chart labels.

Usage Examples

# Basic gauge with tags
echo "cpu.usage:45.2|g|#host:server1,env:prod" | nc -u localhost 8125

# Counter with reserved and user tags
echo "requests:1|c|#units:requests/s,env:prod,service:api" | nc -u localhost 8125

# Histogram with multiple tags
echo "response.time:234|h|#endpoint:/api/users,method:GET,status:200" | nc -u localhost 8125

Test Plan

Additional Information

For users: How does this change affect me?

Summary by cubic

Adds support for user-defined StatsD tags. Metrics are now keyed by name+tags, and tags are added as chart labels.

New Features
- Parse reserved tags (units, name, family) and arbitrary tags from "#key:value".
- Generate stable metric keys from name+tags with a small cache for faster lookups.
- Keep the original metric name and append a sanitized tag suffix to chart IDs; apply tags as chart labels.

^{Written for commit 4aa1fa9. Summary will update automatically on new commits.}


          feat(statsd): support for user-defined tags

4aa1fa9

github-actions bot added area/collectors collectors/statsd labels

ilyam8 requested a review from Copilot

December 20, 2025 11:06

Copilot started reviewing on behalf of ilyam8

December 20, 2025 11:06

Copilot AI reviewed

View reviewed changes

Contributor

Copilot AI left a comment

Pull request overview

This PR extends the StatsD collector to support arbitrary user-defined tags on metrics, enabling richer metric context and better organization. Tags are incorporated into metric identification and exposed as chart labels for enhanced filtering and grouping capabilities.

Key Changes

Tag parsing infrastructure that separates reserved tags (units, name, family) from user-defined tags
Metric key generation system that creates unique identifiers by combining metric names with sorted, sanitized tag values
Tag-based chart labeling that automatically applies user tags to both private and app charts

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/collectors/statsd.plugin/statsd.c

Comment on lines +589 to +602

    
                          if(isalnum(*src) || *src == '-') {

                              *ptr++ = *src;

                          } else {

                              *ptr++ = '_';

                          }

                          src++;

                      }

                      *ptr++ = '_';

                      // Sanitize tag value

                      src = pt->values[i];

                      while(*src) {

                          if(isalnum(*src) || *src == '-') {

Copilot AI Dec 20, 2025

The use of isalnum() without considering locale could cause issues. In some locales, isalnum() might accept characters beyond ASCII alphanumerics. For consistent and predictable behavior with metric names and tags, consider using an explicit ASCII check like: ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')).

Copilot uses AI. Check for mistakes.

src/collectors/statsd.plugin/statsd.c

Comment on lines +533 to +535

    
                                          "STATSD: duplicate tag key '%s' found in metric '%s', both values will be kept in metric name but label will use last value",

                                          tagkey,

                                          name);

Copilot AI Dec 20, 2025

The error message references a variable 'name' that is not available in the current scope. The function 'statsd_parse_tags' only receives the 'tags' parameter, not the metric name. This will result in a compilation error or undefined behavior.

Suggested change

      
                                        "STATSD: duplicate tag key '%s' found in metric '%s', both values will be kept in metric name but label will use last value",
          
                                        tagkey,
          
                                        name);
          
                                        "STATSD: duplicate tag key '%s' found in tags '%s', both values will be kept in metric name but label will use last value",
          
                                        tagkey,
          
                                        tags);

Copilot uses AI. Check for mistakes.

src/collectors/statsd.plugin/statsd.c

Comment on lines +654 to +684

    
                  uint32_t hash = statsd_compute_metric_hash(name, pt);

                  // Use hash as string key for dictionary

                  char hash_key[32];

                  snprintfz(hash_key, sizeof(hash_key), "%u", hash);

              #ifdef STATSD_MULTITHREADED

                  // Try to get from cache first (read-only, no lock needed for existing entries)

                  const char *cached = dictionary_get(statsd.metric_key_cache, hash_key);

                  if(cached)

                      return cached;

                  // Not in cache, generate and insert

                  char *generated_key = statsd_generate_metric_key_internal(name, pt);

                  const char *result = dictionary_set(statsd.metric_key_cache, hash_key, generated_key, strlen(generated_key) + 1);

                  freez(generated_key);

                  return result;

              #else

                  // Single-threaded: simpler logic

                  const char *cached = dictionary_get(statsd.metric_key_cache, hash_key);

                  if(cached)

                      return cached;

                  char *generated_key = statsd_generate_metric_key_internal(name, pt);

                  dictionary_set(statsd.metric_key_cache, hash_key, generated_key, strlen(generated_key) + 1);

                  freez(generated_key);

                  // Get the cached value (dictionary owns it now)

                  return dictionary_get(statsd.metric_key_cache, hash_key);

              #endif

              }

Copilot AI Dec 20, 2025

Hash collisions could cause incorrect metric key retrieval. When two different metric name+tag combinations produce the same hash (converted to string as the dictionary key), the second one will either reuse the first one's generated key (if already cached) or overwrite it. This means metrics with different names/tags could be incorrectly combined. The cache should validate that the retrieved key actually matches the expected name+tags combination, or use a collision-resistant key (e.g., include a portion of the actual metric name in the hash_key).

Copilot uses AI. Check for mistakes.

src/collectors/statsd.plugin/statsd.c

Comment on lines +550 to +551

    
              // Compute hash from name + pre-computed tag value hashes

              // Note: Assumes tag order is consistent across collections for same metric

Copilot AI Dec 20, 2025

The comment states "Assumes tag order is consistent across collections for same metric" but the code actually sorts tags before computing the hash (line 652). This ensures tag order consistency regardless of the order in which tags are sent. The comment should be updated to reflect that tag sorting ensures consistent ordering.

Copilot uses AI. Check for mistakes.

src/collectors/statsd.plugin/statsd.c

Comment on lines +579 to +580

    
                  strcpy(ptr, name);

                  ptr += strlen(name);

Copilot AI Dec 20, 2025

Using strcpy and then strlen to advance the pointer is inefficient. Consider using a single loop or stpcpy (which returns a pointer to the null terminator) to avoid traversing the string twice.

Copilot uses AI. Check for mistakes.

src/collectors/statsd.plugin/statsd.c

Comment on lines +529 to +544

    
                              // Check for duplicate keys and warn

                              for (size_t i = 0; i < pt->count; i++) {

                                  if (strcmp(pt->keys[i], tagkey) == 0) {

                                      netdata_log_error(

                                          "STATSD: duplicate tag key '%s' found in metric '%s', both values will be kept in metric name but label will use last value",

                                          tagkey,

                                          name);

                                      // Don't break - keep both in the arrays

                                  }

                              }

                              pt->keys[pt->count] = strdupz(tagkey);

                              pt->values[pt->count] = strdupz(tagvalue);

                              pt->key_hashes[pt->count] = simple_hash(tagkey);

                              pt->value_hashes[pt->count] = simple_hash(tagvalue);

                              pt->count++;

Copilot AI Dec 20, 2025

The duplicate tag detection only logs a warning but still adds the duplicate tag to the arrays. This contradicts the warning message which states "both values will be kept in metric name but label will use last value". However, duplicate tags in the metric name can cause confusion and incorrect metric identification. Consider either rejecting duplicates entirely or updating only the value for existing keys.

Suggested change

      
                            // Check for duplicate keys and warn
          
                            for (size_t i = 0; i < pt->count; i++) {
          
                                if (strcmp(pt->keys[i], tagkey) == 0) {
          
                                    netdata_log_error(
          
                                        "STATSD: duplicate tag key '%s' found in metric '%s', both values will be kept in metric name but label will use last value",
          
                                        tagkey,
          
                                        name);
          
                                    // Don't break - keep both in the arrays
          
                                }
          
                            }
          
                            pt->keys[pt->count] = strdupz(tagkey);
          
                            pt->values[pt->count] = strdupz(tagvalue);
          
                            pt->key_hashes[pt->count] = simple_hash(tagkey);
          
                            pt->value_hashes[pt->count] = simple_hash(tagvalue);
          
                            pt->count++;
          
                            // Check for duplicate keys and, if found, overwrite the existing value
          
                            ssize_t duplicate_index = -1;
          
                            for (size_t i = 0; i < pt->count; i++) {
          
                                if (strcmp(pt->keys[i], tagkey) == 0) {
          
                                    netdata_log_error(
          
                                        "STATSD: duplicate tag key '%s' found in metric '%s', previous value will be replaced with last value",
          
                                        tagkey,
          
                                        name);
          
                                    duplicate_index = (ssize_t)i;
          
                                    break;
          
                                }
          
                            }
          
                            if (duplicate_index >= 0) {
          
                                // Overwrite the existing value and its hash; do not add a new tag entry
          
                                freez(pt->values[duplicate_index]);
          
                                pt->values[duplicate_index] = strdupz(tagvalue);
          
                                pt->value_hashes[duplicate_index] = simple_hash(tagvalue);
          
                            }
          
                            else {
          
                                pt->keys[pt->count] = strdupz(tagkey);
          
                                pt->values[pt->count] = strdupz(tagvalue);
          
                                pt->key_hashes[pt->count] = simple_hash(tagkey);
          
                                pt->value_hashes[pt->count] = simple_hash(tagvalue);
          
                                pt->count++;
          
                            }

Copilot uses AI. Check for mistakes.

src/collectors/statsd.plugin/statsd.c

    
                  // Calculate required buffer size

                  size_t total_len = strlen(name) + 1;

                  for(size_t i = 0; i < pt->count; i++) {

                      total_len += 1 + strlen(pt->keys[i]) + 1 + strlen(pt->values[i]); // _key_value

Copilot AI Dec 20, 2025

The buffer size calculation doesn't account for characters that will be sanitized to underscores. When a tag key or value contains non-alphanumeric characters (except '-'), they're replaced with '_', but the buffer size is calculated based on the original string length. This works correctly since sanitization doesn't increase the string length, but the comment format suggests key:value pairs when the actual format is key_value (without colon). Consider updating the comment to match the actual format.

Copilot uses AI. Check for mistakes.

src/collectors/statsd.plugin/statsd.c

Comment on lines +2550 to +2555

    
                      // Add labels from first linked metric (they should all have same tags since they're on same chart)

                      STATSD_APP_CHART_DIM *dim;

                      for (dim = chart->dimensions; dim; dim = dim->next) {

                          if (dim->source_metric && dim->source_metric->tags) {

                              statsd_add_metric_tags_to_chart(chart->st, dim->source_metric);

                              break; // Only need to add once

Copilot AI Dec 20, 2025

The comment states "they should all have same tags since they're on same chart" but this assumption may not hold true. If an app chart aggregates multiple metrics with different tag values (e.g., metrics from different hosts or environments), only the first metric's tags will be applied to the chart. This could lead to misleading or incomplete labels. Consider either validating that all source metrics have consistent tags, or merging tags from all dimensions, or documenting this limitation clearly.

Suggested change

      
                    // Add labels from first linked metric (they should all have same tags since they're on same chart)
          
                    STATSD_APP_CHART_DIM *dim;
          
                    for (dim = chart->dimensions; dim; dim = dim->next) {
          
                        if (dim->source_metric && dim->source_metric->tags) {
          
                            statsd_add_metric_tags_to_chart(chart->st, dim->source_metric);
          
                            break; // Only need to add once
          
                    // Add labels from all linked metrics; if multiple metrics define the same label,
          
                    // the underlying label API will resolve any conflicts (typically last writer wins).
          
                    STATSD_APP_CHART_DIM *dim;
          
                    for (dim = chart->dimensions; dim; dim = dim->next) {
          
                        if (dim->source_metric && dim->source_metric->tags) {
          
                            statsd_add_metric_tags_to_chart(chart->st, dim->source_metric);

Copilot uses AI. Check for mistakes.

src/collectors/statsd.plugin/statsd.c Show resolved Hide resolved

src/collectors/statsd.plugin/statsd.c

    
              }

              // Parse tags and separate reserved vs user tags (assumes consistent tag order)

Copilot AI Dec 20, 2025

The comment states "assumes consistent tag order" but this is misleading. While the parsing preserves the order of tags as received, the tags are later sorted before hash computation and key generation (in sort_parsed_tags at line 652). The parsing itself doesn't assume or require consistent ordering - that's handled later in the pipeline.

Suggested change

      
            // Parse tags and separate reserved vs user tags (assumes consistent tag order)
          
            // Parse tags and separate reserved vs user tags; preserves input order (sorting/normalization is done later)

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/collectors collectors/statsd