[linux] migrate Linux metrics data streams to TSDB #17379
AndersonQ wants to merge 8 commits into elastic:main
Pull request overview
Migrates several Linux integration metrics data streams to Elasticsearch TSDB / time_series data streams by enabling index_mode: "time_series" and annotating fields with metric_type/dimension so metrics can be stored and queried as time series efficiently.
Changes:
- Enable TSDB (`elasticsearch.index_mode: "time_series"`) for conntrack, entropy, iostat, ksm, memory, pageinfo, raid, and service data streams.
- Mark common identifying fields (for example, agent/cloud/container/host) as `dimension: true` and add stream-specific dimensions (for example, device/service/raid name).
- Annotate numeric metric fields with `metric_type` (gauge/counter).
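As a rough sketch of the first change, a data stream `manifest.yml` with TSDB enabled might look like the fragment below. The `title` value is an illustrative assumption; only the `elasticsearch.index_mode` setting is taken from this PR.

```yaml
# packages/linux/data_stream/iostat/manifest.yml (sketch, not the actual file)
title: Linux iostat metrics      # hypothetical title, for illustration only
type: metrics
elasticsearch:
  index_mode: "time_series"      # the setting this PR adds to each migrated data stream
```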
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/linux/data_stream/service/manifest.yml | Enables TSDB index mode for the service metrics data stream. |
| packages/linux/data_stream/service/fields/fields.yml | Adds dimension for service name and metric_type for service resource metrics. |
| packages/linux/data_stream/service/fields/ecs.yml | Marks host.name as a TSDB dimension for service metrics. |
| packages/linux/data_stream/service/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container, etc.) for service metrics. |
| packages/linux/data_stream/raid/manifest.yml | Enables TSDB index mode for the raid metrics data stream. |
| packages/linux/data_stream/raid/fields/fields.yml | Marks raid name as a dimension and annotates numeric fields with metric_type. |
| packages/linux/data_stream/raid/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for raid metrics. |
| packages/linux/data_stream/pageinfo/manifest.yml | Enables TSDB index mode for the pageinfo metrics data stream. |
| packages/linux/data_stream/pageinfo/fields/fields.yml | Annotates buddyinfo numeric fields with metric_type: gauge for TSDB. |
| packages/linux/data_stream/pageinfo/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for pageinfo metrics. |
| packages/linux/data_stream/memory/manifest.yml | Enables TSDB index mode for the memory metrics data stream. |
| packages/linux/data_stream/memory/fields/fields.yml | Adds metric_type annotations across paging/swap/hugepages metrics for TSDB. |
| packages/linux/data_stream/memory/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for memory metrics. |
| packages/linux/data_stream/ksm/manifest.yml | Enables TSDB index mode for the ksm metrics data stream. |
| packages/linux/data_stream/ksm/fields/fields.yml | Annotates KSM numeric fields with metric_type for TSDB. |
| packages/linux/data_stream/ksm/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for ksm metrics. |
| packages/linux/data_stream/iostat/manifest.yml | Enables TSDB index mode for the iostat metrics data stream. |
| packages/linux/data_stream/iostat/fields/fields.yml | Marks disk device name as a dimension and annotates iostat numeric fields with metric_type. |
| packages/linux/data_stream/iostat/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for iostat metrics. |
| packages/linux/data_stream/entropy/manifest.yml | Enables TSDB index mode for the entropy metrics data stream. |
| packages/linux/data_stream/entropy/fields/fields.yml | Annotates entropy numeric fields with metric_type: gauge for TSDB. |
| packages/linux/data_stream/entropy/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for entropy metrics. |
| packages/linux/data_stream/conntrack/manifest.yml | Enables TSDB index mode for the conntrack metrics data stream. |
| packages/linux/data_stream/conntrack/fields/fields.yml | Annotates conntrack numeric fields with metric_type for TSDB. |
| packages/linux/data_stream/conntrack/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for conntrack metrics. |
```yaml
description: bytes in
- name: in.packets
  type: long
  format: bytes
```
system.service.resources.network.in.packets is a packet count but is still declared with format: bytes, which will cause incorrect formatting/units in Kibana and exported field docs. Remove the bytes format (or switch to a numeric format appropriate for counts).
Suggested change: remove the `format: bytes` line.
Vale Linting Results
Summary: 1 warning, 4 suggestions found
| File | Line | Rule | Message |
|---|---|---|---|
| packages/linux/docs/README.md | 306 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'for example' instead of 'e.g'. |
💡 Suggestions (4)
| File | Line | Rule | Message |
|---|---|---|---|
| packages/linux/docs/README.md | 100 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| packages/linux/docs/README.md | 214 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| packages/linux/docs/README.md | 281 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| packages/linux/docs/README.md | 331 | Elastic.Wordiness | Consider using 'all' instead of 'all of '. |
The Vale linter checks documentation changes against the Elastic Docs style guide.
To use Vale locally or report issues, refer to Elastic style guide for Vale.
Enable time series data streams (TSDB) for 8 of 11 data streams in the Linux integration: conntrack, entropy, iostat, ksm, memory, pageinfo, raid, and service.

For each data stream:
- Add `elasticsearch.index_mode: "time_series"` to manifest.yml
- Annotate numeric fields with appropriate metric_type (gauge/counter)
- Mark dimension fields to uniquely identify each time series

Common dimensions (all 8 data streams):
- agent.id
- agent.name
- cloud.account.id
- cloud.availability_zone
- cloud.instance.id
- cloud.provider
- cloud.region
- container.id
- host.name

Integration-specific dimensions:
- iostat: linux.iostat.name (disk device)
- raid: system.raid.name (RAID array)
- service: system.service.name (systemd service)

Excluded data streams:
- socket: transient entities with no persistent time series
- users: transient sessions with no numeric metrics
- network_summary: fields use object wildcard mappings that cannot carry metric_type annotations, limiting TSDB benefits

Assisted by Cursor
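A minimal sketch of what the field annotations described in the commit message could look like for the iostat data stream. The nesting, field types, descriptions, and the `gauge` choice for `busy` are assumptions for illustration; only the field names, `dimension: true` on `linux.iostat.name`, and the use of `metric_type` come from this PR.

```yaml
# packages/linux/data_stream/iostat/fields/fields.yml (illustrative excerpt, not the actual file)
- name: linux.iostat
  type: group
  fields:
    - name: name
      type: keyword              # assumed type
      dimension: true            # identifies the disk device for each time series
      description: Device name   # assumed wording
    - name: busy
      type: float                # assumed type
      metric_type: gauge         # assumed; the value can go up or down between samples
      description: Percentage of time the device was busy   # assumed wording
```

Fields that only ever increase would carry `metric_type: counter` instead.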
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 12 comments.
```yaml
- name: agent.id
  external: ecs
  dimension: true
- name: agent.name
  external: ecs
  dimension: true
```
agent.name is marked as a TSDB dimension. In ECS this field is a user-supplied, potentially empty/mutable label, so using it as a time series dimension can cause unnecessary time series churn and higher cardinality. Other TSDB-enabled metric integrations typically use agent.id as the stable agent dimension and do not dimension agent.name (for example packages/system/data_stream/core/fields/ecs.yml). Consider keeping agent.name mapped from ECS but removing dimension: true.
agent.name is often used in aggregations and filters. It is indeed not necessary for uniqueness, but if we want to optimise for frequent queries, it's good to have it.
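To make the two positions in this thread concrete, here is a hedged sketch of both variants of the agent fields. Neither is presented as the final decision; the first reproduces the snippet under review, and the second is one reading of the bot's suggestion.

```yaml
# Variant A (as in this PR): agent.name is a dimension
- name: agent.id
  external: ecs
  dimension: true
- name: agent.name
  external: ecs
  dimension: true
---
# Variant B (review suggestion): agent.name stays mapped from ECS but is not a dimension
- name: agent.id
  external: ecs
  dimension: true
- name: agent.name
  external: ecs
```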
```yaml
- name: agent.id
  external: ecs
  dimension: true
- name: agent.name
  external: ecs
  dimension: true
```
agent.name is marked as a TSDB dimension. In ECS this field is a user-supplied, potentially empty/mutable label, so using it as a time series dimension can cause unnecessary time series churn and higher cardinality. Other TSDB-enabled metric integrations typically use agent.id as the stable agent dimension and do not dimension agent.name (for example packages/system/data_stream/core/fields/ecs.yml). Consider keeping agent.name mapped from ECS but removing dimension: true.
```yaml
- name: agent.id
  external: ecs
  dimension: true
- name: agent.name
  external: ecs
  dimension: true
```
agent.name is marked as a TSDB dimension. In ECS this field is a user-supplied, potentially empty/mutable label, so using it as a time series dimension can cause unnecessary time series churn and higher cardinality. Other TSDB-enabled metric integrations typically use agent.id as the stable agent dimension and do not dimension agent.name (for example packages/system/data_stream/core/fields/ecs.yml). Consider keeping agent.name mapped from ECS but removing dimension: true.
```yaml
- name: agent.id
  external: ecs
  dimension: true
- name: agent.name
  external: ecs
  dimension: true
```
agent.name is marked as a TSDB dimension. In ECS this field is a user-supplied, potentially empty/mutable label, so using it as a time series dimension can cause unnecessary time series churn and higher cardinality. Other TSDB-enabled metric integrations typically use agent.id as the stable agent dimension and do not dimension agent.name (for example packages/system/data_stream/core/fields/ecs.yml). Consider keeping agent.name mapped from ECS but removing dimension: true.
same as the other ones
```yaml
- name: agent.id
  external: ecs
  dimension: true
- name: agent.name
  external: ecs
  dimension: true
```
agent.name is marked as a TSDB dimension. In ECS this field is a user-supplied, potentially empty/mutable label, so using it as a time series dimension can cause unnecessary time series churn and higher cardinality. Other TSDB-enabled metric integrations typically use agent.id as the stable agent dimension and do not dimension agent.name (for example packages/system/data_stream/core/fields/ecs.yml). Consider keeping agent.name mapped from ECS but removing dimension: true.
same as the other ones
```yaml
  metric_type: counter
  description: packets out
- name: out.bytes
  type: long
```
system.service.resources.network.out.bytes is missing format: bytes (while in.bytes has it). Adding the bytes format keeps field formatting consistent in UI and aligns with the field semantics.
Suggested change:
```diff
  type: long
+ format: bytes
```
```yaml
type: long
format: percent
```
linux.memory.hugepages.used.pct is declared as type: long with format: percent, while other percent fields in this data stream (for example linux.memory.swap.used.pct) use scaled_float with unit: percent. If the hugepages percentage is non-integer, the current mapping will truncate/round; consider switching this field to scaled_float and adding unit: percent for consistency.
Suggested change:
```diff
- type: long
+ type: scaled_float
  format: percent
+ unit: percent
```
```yaml
- name: agent.id
  external: ecs
  dimension: true
- name: agent.name
  external: ecs
  dimension: true
```
agent.name is marked as a TSDB dimension. In ECS this field is a user-supplied, potentially empty/mutable label, so using it as a time series dimension can cause unnecessary time series churn and higher cardinality. Other TSDB-enabled metric integrations typically use agent.id as the stable agent dimension and do not dimension agent.name (for example packages/system/data_stream/core/fields/ecs.yml). Consider keeping agent.name mapped from ECS but removing dimension: true.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
💚 Build Succeeded
cc @AndersonQ
Why did we pick these specific dimensions? The number one thing we want to do is minimize the storage size of these metrics. Is this optimal for that? The Also,
Why are the dimensions here different from what is in system, which is used in the exact same way?
I see system has host.name as a dimension. Maybe it's necessary when you are on k8s or cloud, where there is minimal other data to differentiate on.
The absolute minimum is agent ID and any metric-specific identifier. The others are identifiers that identify the host without relying on agent.id. If we go for the absolute minimum, they don't need to be dimensions. I'm just not 100% sure about removing container ID. I know it's possible to have two instances of the same container at the same time; it happens during a CrashLoopBackOff. But I'm not sure whether in this scenario we could have two agents running, one in each container. That should not be the case: it should not be possible to run two agents with the same home. Also, if it's a CrashLoopBackOff, something is really wrong.
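For illustration only, the "absolute minimum" variant described above might look like the sketch below for the service data stream. This is not a decision reached in the thread; the `keyword` type for `system.service.name` and the choice of which file each field lives in are assumptions.

```yaml
# fields/agent.yml (sketch): only agent.id remains a dimension
- name: agent.id
  external: ecs
  dimension: true
- name: host.name
  external: ecs          # still mapped, but no longer a dimension in this variant
---
# fields/fields.yml (sketch): the metric-specific identifier stays a dimension
- name: system.service.name
  type: keyword          # assumed type
  dimension: true
```

Whether `container.id` can also be dropped is the open question raised in the comment.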
The new one is
My feeling is that the dimensions in this package should be the same as the ones in system, since this is often installed as a companion to the system integration, unless there is a data source specific additional dimension that we need. This could mean we also need to update the system package dimensions if something is missing there.
Proposed commit message

Migrated data streams

Excluded Data Streams

Common Dimensions (all 8 data streams)
Added to `agent.yml` (or `ecs.yml` where the field was already mapped there):
- `agent.id`
- `agent.name`
- `cloud.account.id`
- `cloud.availability_zone`
- `cloud.instance.id`
- `cloud.provider`
- `cloud.region`
- `container.id`
- `host.name`

Per-Data-Stream Changes

conntrack
`metric_type`:
- `linux.conntrack.summary.entries`
- `linux.conntrack.summary.drop`
- `linux.conntrack.summary.early_drop`
- `linux.conntrack.summary.found`
- `linux.conntrack.summary.ignore`
- `linux.conntrack.summary.insert_failed`
- `linux.conntrack.summary.invalid`
- `linux.conntrack.summary.search_restart`

entropy
`metric_type`:
- `system.entropy.available_bits`
- `system.entropy.pct`

iostat
Dimension: `linux.iostat.name`
`metric_type`:
- `linux.iostat.read.request.merges_per_sec`
- `linux.iostat.write.request.merges_per_sec`
- `linux.iostat.read.request.per_sec`
- `linux.iostat.write.request.per_sec`
- `linux.iostat.read.per_sec.bytes`
- `linux.iostat.read.await`
- `linux.iostat.write.per_sec.bytes`
- `linux.iostat.write.await`
- `linux.iostat.request.avg_size`
- `linux.iostat.queue.avg_size`
- `linux.iostat.await`
- `linux.iostat.service_time`
- `linux.iostat.busy`

ksm
`metric_type`:
- `linux.ksm.stats.pages_shared`
- `linux.ksm.stats.pages_sharing`
- `linux.ksm.stats.pages_unshared`
- `linux.ksm.stats.pages_volatile`
- `linux.ksm.stats.full_scans`
- `linux.ksm.stats.stable_node_chains`
- `linux.ksm.stats.stable_node_dups`

memory
`metric_type`:
- `linux.memory.page_stats.pgscan_kswapd.pages`
- `linux.memory.page_stats.pgscan_direct.pages`
- `linux.memory.page_stats.pgfree.pages`
- `linux.memory.page_stats.pgsteal_kswapd.pages`
- `linux.memory.page_stats.pgsteal_direct.pages`
- `linux.memory.page_stats.direct_efficiency.pct`
- `linux.memory.page_stats.kswapd_efficiency.pct`
- `linux.memory.swap.readahead.cached`
- `linux.memory.hugepages.total`
- `linux.memory.hugepages.used.bytes`
- `linux.memory.hugepages.used.pct`
- `linux.memory.hugepages.free`
- `linux.memory.hugepages.reserved`
- `linux.memory.hugepages.surplus`
- `linux.memory.hugepages.default_size`
- `linux.memory.hugepages.swap.out.fallback`
- `linux.memory.hugepages.swap.out.pages`

pageinfo
`metric_type`:
- `linux.pageinfo.buddy_info.DMA.{0..10}`
- `linux.pageinfo.buddy_info.DMA32.{0..10}`
- `linux.pageinfo.buddy_info.Normal.{0..10}`

`linux.pageinfo.nodes.*` (object) left as-is.

raid
Dimension: `system.raid.name`
`metric_type`:
- `system.raid.disks.active`
- `system.raid.disks.total`
- `system.raid.disks.spare`
- `system.raid.disks.failed`
- `system.raid.blocks.total`
- `system.raid.blocks.synced`

`system.raid.disks.states.*` (object) left as-is.

service
Dimension: `system.service.name`
`metric_type`:
- `system.service.resources.cpu.usage.ns`
- `system.service.resources.memory.usage.bytes`
- `system.service.resources.tasks.count`
- `system.service.resources.network.in.bytes`
- `system.service.resources.network.in.packets`
- `system.service.resources.network.out.packets`
- `system.service.resources.network.out.bytes`

Files Changed per Data Stream
- `manifest.yml`: `elasticsearch.index_mode: "time_series"`
- `fields/agent.yml`: `agent.id` + `agent.name` as new fields with `dimension: true`; added `dimension: true` to 7 existing fields
- `fields/ecs.yml`: `dimension: true` on `host.name` (service only, where `host.name` is mapped in ecs.yml instead of agent.yml)
- `fields/fields.yml`: `metric_type` and `dimension` annotations as detailed above
- `fields/base-fields.yml`: `constant_keyword` does not support `time_series_dimension`

Tests with TSDB-migration-test-kit
- `metrics-linux.conntrack-default`
- `metrics-linux.entropy-default`
- `metrics-linux.iostat-default`
- `metrics-linux.ksm-default`
- `metrics-linux.memory-default`
- `metrics-linux.pageinfo-default`
- `metrics-linux.raid-default`
- `metrics-linux.service-default`

Checklist
- [ ] I have reviewed tips for building integrations and this pull request is aligned with them.
- `changelog.yml` file.
- [ ] I have verified that Kibana version constraints are current according to guidelines.
- [ ] I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally
- conntrack, entropy, iostat, ksm, memory, pageinfo, raid, and service
- `elastic-package build -v && elastic-package install -v`

Related issues