Import

How data enters AVM

AVM uses a staged import model so inventory data can be validated, previewed, and preserved with its original context. Importing software is the start of the workflow, not the end of it.

Import is not the end of the process

AVM imports inventory through staged workflows. Asset data and software data are first validated into staging records before being committed into the main operational tables. This helps keep import behavior visible and reviewable.

Software import preserves raw values such as vendor, publisher, product, and version information. Those raw values are important because canonical resolution and vulnerability matching happen after import, not instead of import.

Key point: successful import does not mean the software is already canonically resolved or fully ready for matching.

Import flow

Upload data
Validate into staging
Preview results
Commit import
Review and backfill

Import staging exists to make inventory ingestion inspectable, not to hide it inside one opaque upload action.

Assets first, then software

AVM expects software rows to be tied to assets. In practice, this means asset records should exist before software is imported against them.

The asset acts as the host anchor for later software review and alert generation, so software import depends on asset identity being available.

Asset import

Creates the host-level records that software rows will attach to.

Software import

Adds observed software rows linked to known asset identity.

How assets and software are linked

In AVM import, software rows are not linked to assets by internal IDs. Instead, both asset and software JSON use the same external_key to establish the relationship.

This means that the asset and its software must share the same stable external identifier during import.

Minimal asset example

{
  "external_key": "example-uuid-0001",
  "name": "server-01"
}

Minimal software example

{
  "external_key": "example-uuid-0001",
  "product": "Docker Desktop",
  "vendor": "Docker Inc.",
  "version": "4.60.1"
}

During import, AVM resolves this shared external_key to an internal asset_id. The software row is then stored as a record linked to that asset.

The external_key should remain stable across imports. If it changes, AVM will treat the asset as a different system.

Key idea: external_key is the import-side identity that ties asset and software together. AVM converts it into internal relationships after validation.
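As a sketch, the external_key to asset_id resolution can be pictured like this. The function name and record shapes are illustrative assumptions, not AVM internals:

```python
# Illustrative sketch of external_key resolution; the data shapes and
# function name are assumptions, not AVM's actual implementation.

def link_software_to_assets(assets, software_rows):
    """Resolve each software row's external_key to an internal asset_id."""
    # Build a lookup from the import-side identity to the internal id.
    assets_by_external_key = {a["external_key"]: a["asset_id"] for a in assets}

    linked, unlinked = [], []
    for row in software_rows:
        asset_id = assets_by_external_key.get(row["external_key"])
        if asset_id is None:
            # No matching asset: the row cannot be attached.
            unlinked.append(row)
            continue
        linked.append({**row, "asset_id": asset_id})
    return linked, unlinked

assets = [{"external_key": "example-uuid-0001", "asset_id": 42, "name": "server-01"}]
software = [{"external_key": "example-uuid-0001", "product": "Docker Desktop",
             "vendor": "Docker Inc.", "version": "4.60.1"}]

linked, unlinked = link_software_to_assets(assets, software)
# linked[0] now carries asset_id 42; unlinked is empty
```

Rows that fail to resolve are exactly the cases where a changed or unstable external_key would silently split one system into two.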

Example osquery queries

The following examples show how similar data can be collected using osquery. In practice, these results are transformed into AVM JSON format before import.

Asset-side example

osqueryi --json "
SELECT
  uuid AS external_key,
  hostname AS name
FROM system_info;
"

The system_info table provides stable host identity such as uuid and hostname, which can be mapped to AVM asset fields.

Software-side example (Windows)

osqueryi --json "
SELECT
  (SELECT uuid FROM system_info LIMIT 1) AS external_key,
  name AS product,
  publisher AS vendor,
  version
FROM programs;
"

The programs table provides installed software inventory. The external_key is typically injected during transformation to link software rows to the corresponding asset.
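A minimal transformation that injects the host's external_key might look like this. The function and output field names are illustrative, mirroring the minimal software example above; the input strings mirror `osqueryi --json` output:

```python
import json

# Illustrative transform: inject the host's uuid as external_key into each
# software row. Input shapes mirror `osqueryi --json` output; the output
# field names follow the minimal software example, not a fixed AVM schema.

def transform_programs(system_info_json, programs_json):
    host = json.loads(system_info_json)[0]   # system_info returns one row
    external_key = host["uuid"]              # stable host identity
    rows = []
    for p in json.loads(programs_json):
        rows.append({
            "external_key": external_key,    # injected link to the asset
            "product": p["name"],
            "vendor": p.get("publisher", ""),
            "version": p.get("version", ""),
        })
    return rows

system_info = '[{"uuid": "example-uuid-0001", "hostname": "server-01"}]'
programs = '[{"name": "Docker Desktop", "publisher": "Docker Inc.", "version": "4.60.1"}]'
print(transform_programs(system_info, programs))
```

This is the same effect as the subquery in the Windows example above, moved out of SQL and into the transform step.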

Using osquery as a data source

AVM does not require a specific data source, but tools like osquery are commonly used to collect asset and software inventory.

osquery exposes system information as SQL tables. These tables can be queried and transformed into AVM-compatible JSON for import.

Asset-side data

Tables such as system_info and os_version provide host identity, OS details, hardware, and CPU information that map to AVM asset fields.

Software-side data

On Windows, the programs table provides installed software inventory, including name, version, publisher, and install location.

Key idea: osquery data is typically transformed into AVM JSON rather than imported directly. The transformation step maps osquery fields to AVM fields and assigns a stable external_key.

Learn more about osquery: Official documentation

What AVM can store for assets and software

Before thinking about import-source fields, it helps to understand what AVM is actually designed to preserve. The point of richer import is not to collect every possible field from the source, but to populate the fields AVM can actually store.

In other words, richer source data only matters when it maps to actual AVM fields.

Asset record in AVM

The assets table stores host identity and context: ownership, platform, OS, hardware, CPU, and system metadata.

Host identity: external_key, name, hostname, computer_name, local_hostname, system_uuid, serial_number
Ownership and context: asset_type, owner, note, source, last_seen_at
Platform and OS: platform, os_name, os_version, os_build, os_major, os_minor, os_patch, arch
Hardware: hardware_vendor, hardware_model, hardware_version, board_vendor, board_model, board_version, board_serial
CPU and memory: cpu_brand, cpu_physical_cores, cpu_logical_cores, cpu_sockets, physical_memory

Software record in AVM

The software_installs table stores observed software with both raw evidence and canonical linkage fields.

Linkage and provenance: asset_id, type, source, source_type, import_run_id, last_seen_at
Raw evidence: vendor_raw, product_raw, version_raw, publisher
Normalized and canonical: vendor, product, version, version_norm, normalized_vendor, normalized_product, cpe_name, cpe_vendor_id, cpe_product_id, canonical_link_disabled
Installation context: install_location, installed_at, package_identifier, install_source, package_manager
Product-shape metadata: bundle_id, edition, channel, release_label, purl

Key idea: import should populate fields that AVM can actually preserve and use later for linking, review, and matching.

Richer asset example

AVM assets can preserve significantly more host context than the minimal external_key plus name shape. This becomes useful when your source already knows operating system, hardware, CPU, and host identity details and you want that context to remain available inside AVM.

{
  "arch": "64-bit",
  "asset_type": "endpoint",
  "computer_name": "example-host-01",
  "cpu_brand": "Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz",
  "cpu_logical_cores": "2",
  "cpu_physical_cores": "2",
  "cpu_sockets": "1",
  "external_key": "example-uuid-0001",
  "hardware_model": "VirtualBox",
  "hardware_vendor": "innotek GmbH",
  "hardware_version": "-1",
  "hostname": "example-host-01",
  "local_hostname": "example-host-01",
  "name": "example-host-01",
  "os_build": "26200",
  "os_major": "10",
  "os_minor": "0",
  "os_name": "Microsoft Windows 11 Pro",
  "os_version": "10.0.26200",
  "owner": "example-team",
  "physical_memory": "-1",
  "platform": "windows",
  "serial_number": "example-serial-1234",
  "source": "OSQUERY",
  "system_uuid": "example-uuid-0001"
}

This maps naturally to AVM asset-side fields in the assets table: host identity (external_key, name, hostname, computer_name, local_hostname, system_uuid, serial_number), platform and OS (platform, os_name, os_version, os_build, os_major, os_minor, os_patch, arch), hardware (hardware_vendor, hardware_model, hardware_version, board_vendor, board_model, board_version, board_serial), and CPU or memory context (cpu_brand, cpu_physical_cores, cpu_logical_cores, cpu_sockets, physical_memory).

Richer software example

AVM software rows can also preserve much richer observed evidence than a minimal vendor / product / version payload.

{
  "external_key": "example-uuid-0001",
  "install_location": "C:\\Program Files\\Docker\\Docker",
  "last_seen_at": "2026-03-19 00:05:43",
  "product_raw": "Docker Desktop",
  "publisher": "Docker Inc.",
  "source": "osquery",
  "source_type": "osquery",
  "type": "application",
  "vendor_raw": "Docker Inc.",
  "version_raw": "4.60.1"
}

This maps to AVM software-side fields in the software_installs table. AVM can preserve raw evidence such as vendor_raw, product_raw, version_raw, and publisher, provenance such as source, source_type, and import_run_id, installation context such as install_location, installed_at, package_identifier, install_source, package_manager, and product-shape metadata such as type, arch, edition, channel, release_label, bundle_id, and purl.

AVM also has normalized and canonical fields such as vendor, product, version, version_norm, normalized_vendor, normalized_product, cpe_name, cpe_vendor_id, and cpe_product_id, but those should be understood as later operational fields rather than something a source system must always provide directly.
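To make that distinction concrete, here is a generic normalization sketch. This is not AVM's actual rule; it only illustrates why fields like normalized_vendor are derived later rather than supplied by the source:

```python
import re

# Generic name-normalization sketch. This is NOT AVM's actual rule; it only
# shows why normalized fields are derived from raw evidence after import.

def normalize_name(raw: str) -> str:
    """Lowercase, drop punctuation and corporate suffixes, collapse spaces."""
    s = raw.lower()
    s = re.sub(r"[.,]", " ", s)
    s = re.sub(r"\b(inc|ltd|llc|corp|corporation)\b", " ", s)
    return re.sub(r"\s+", " ", s).strip()

print(normalize_name("Docker Inc."))     # docker
print(normalize_name("Docker Desktop"))  # docker desktop
```

Because the raw values stay stored alongside the derived ones, a rule like this can be improved later and re-applied without re-importing anything.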

How osquery data maps into AVM records

osquery is usually not exported into AVM one-to-one. Instead, a transformation step reads osquery tables and builds AVM-shaped asset and software JSON that matches the fields AVM can actually store.

Asset-side mapping

Host-oriented osquery tables can be transformed into one AVM asset record per system. Typical target fields are external_key, name, hostname, computer_name, system_uuid, platform, os_name, os_version, os_build, arch, hardware_vendor, hardware_model, cpu_brand, and related host metadata.

Software-side mapping

Installed-software osquery tables can be transformed into AVM software rows tied to that asset through the same stable external_key. Typical target fields are type, source, source_type, vendor_raw, product_raw, version_raw, publisher, install_location, installed_at, install_source, and package_identifier.

Why this matters

The goal is not just to ingest source data. The goal is to populate AVM's own operational model in a way that preserves evidence and supports later review, linking, and matching.

Typical osquery sources

When AVM import JSON is built from osquery, the final payload is usually assembled from multiple osquery tables rather than copied from a single query result as-is.

A common pattern is to produce one asset object per host from host-side tables, and many software objects for that same host from installed-software tables. Both sides are then tied together through the same external_key.

Asset-side tables

Tables such as system_info, os_version, and cpu_info are useful for building AVM asset records.

In practice, these tables can provide host identity, platform, OS version, hardware, CPU, and related host metadata that map naturally into the assets table.

Software-side tables

On Windows, programs is typically the most direct source for installed software inventory.

Fields such as software name, version, publisher, install location, install source, install date, and identifying number can then be mapped into AVM software-side fields.

Transformation step

The final AVM import files are usually transformed outputs, not raw osquery tables. That transform step is where source-side field names are mapped into AVM's asset and software model.

Example field mapping from osquery to AVM

Asset-side examples

osquery field → AVM field

system_info.hostname → hostname or name
system_info.uuid → system_uuid or stable external_key
system_info.cpu_brand → cpu_brand
system_info.cpu_physical_cores → cpu_physical_cores
system_info.cpu_logical_cores → cpu_logical_cores
system_info.cpu_sockets → cpu_sockets
system_info.physical_memory → physical_memory
system_info.hardware_vendor → hardware_vendor
system_info.hardware_model → hardware_model
os_version.name → os_name
os_version.version → os_version
os_version.major → os_major
os_version.minor → os_minor
os_version.patch → os_patch
os_version.build → os_build
os_version.platform → platform
os_version.arch → arch

Software-side examples

osquery field → AVM field

programs.name → product_raw
programs.version → version_raw
programs.publisher → publisher; optionally vendor_raw
programs.install_location → install_location
programs.install_source → install_source
programs.install_date → installed_at
programs.identifying_number → package_identifier

Exact transformation rules are up to your import pipeline. AVM does not require osquery-specific field names, but it works best when stable asset identity and raw software evidence are preserved.
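As an illustration, mapping tables like the ones above can be applied as plain dictionaries in a transform script. The dictionary names and the subset of fields shown are arbitrary choices for this sketch:

```python
# Applying field-mapping tables as plain dictionaries. The names and the
# subset of fields chosen here are illustrative, not required by AVM.

SOFTWARE_FIELD_MAP = {
    "name": "product_raw",
    "version": "version_raw",
    "publisher": "publisher",
    "install_location": "install_location",
}

def apply_map(row, field_map):
    """Rename source-side keys into AVM-side keys, dropping unmapped fields."""
    return {avm: row[src] for src, avm in field_map.items() if src in row}

program = {"name": "Docker Desktop", "version": "4.60.1",
           "publisher": "Docker Inc.", "install_date": "20260301"}
print(apply_map(program, SOFTWARE_FIELD_MAP))
# {'product_raw': 'Docker Desktop', 'version_raw': '4.60.1', 'publisher': 'Docker Inc.'}
```

Keeping the mapping as data rather than code makes it easy to review and extend as new source fields appear.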

Accepted field styles

AVM accepts common JSON naming patterns used by inventory and integration tooling. This reduces the need for users to reshape data aggressively before import.

Common style

Snake case such as external_key, product_raw, and version_raw.

Also supported

Camel case forms used by some integrations, especially where import-side keys need to map to the same operational field.

The goal is not to encourage inconsistent naming. It is to make import practical for real inventory sources.
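One way an import pipeline can accept both styles is to convert camelCase keys to snake_case before validation. This is a generic sketch; AVM's actual acceptance rules may differ:

```python
import re

# Generic key-style normalization sketch: convert camelCase keys to
# snake_case so both naming styles land on the same operational field.
# AVM's actual acceptance rules may differ.

def to_snake_case(key: str) -> str:
    # Insert "_" before each uppercase letter (except at the start), then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()

def normalize_keys(row: dict) -> dict:
    return {to_snake_case(k): v for k, v in row.items()}

print(normalize_keys({"externalKey": "example-uuid-0001", "productRaw": "Docker Desktop"}))
# {'external_key': 'example-uuid-0001', 'product_raw': 'Docker Desktop'}
```

Keys that are already snake_case pass through unchanged, so mixed-style payloads converge on one field name.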

Why raw values are preserved

AVM stores imported software as observed inventory, not as if it had already been normalized perfectly. Raw vendor, publisher, product, and version values may still be needed for review, canonical linking, alias creation, and auditability.

This is important because inventory data is rarely perfectly clean. Preserving raw evidence makes it possible to improve resolution later without losing the original source-side view.

Design principle: import should preserve evidence, not erase it.

Import staging

AVM uses explicit staging entities for asset and software import. This means uploads can be validated and previewed before they affect the main inventory records.

ImportRun

Tracks a specific import execution and ties together staged rows, status, and later review context.

ImportStagingAsset

Holds staged asset rows before import commit.

ImportStagingSoftware

Holds staged software rows before import commit.

Why this matters

Operators can inspect what is valid, what is invalid, and what will be imported before the final step.
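The staged-commit idea can be sketched with hypothetical shapes for these entities; the real AVM models will differ:

```python
from dataclasses import dataclass, field

# Hypothetical shapes for the staging entities named above. The real AVM
# models will differ; this only illustrates the staged-commit idea.

@dataclass
class ImportStagingAsset:
    external_key: str
    name: str
    valid: bool = True
    errors: list = field(default_factory=list)

@dataclass
class ImportRun:
    run_id: int
    status: str = "staged"          # staged -> previewed -> committed
    staged_assets: list = field(default_factory=list)

    def preview(self):
        """Split staged rows into what will and will not be imported."""
        ok = [a for a in self.staged_assets if a.valid]
        bad = [a for a in self.staged_assets if not a.valid]
        return ok, bad

run = ImportRun(run_id=1, staged_assets=[
    ImportStagingAsset("example-uuid-0001", "server-01"),
    ImportStagingAsset("", "server-02", valid=False, errors=["missing external_key"]),
])
ok, bad = run.preview()
# one valid row and one invalid row, both visible before any commit
```

The point is that invalid rows stay inspectable on the run instead of being silently dropped during a one-step upload.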

Import does not guarantee canonical resolution

A software row can be imported successfully and still remain unresolved from a canonical perspective. This is expected.

Import answers the question “what was observed and accepted into the system?” Canonical resolution answers the different question “what reference identity does this software correspond to?”

Import success means

The row passed import validation and became part of the operational inventory.

Import success does not mean

The row is already fully normalized, canonically linked, or ready for perfect vulnerability matching.

Choosing Replace or Append during software import

AVM allows two import modes depending on how you want to manage software inventory over time.

Append rows

Use this when you want to add newly observed software without removing existing data.

Typical use:

  • A new application was installed on an asset
  • You are collecting data from multiple sources
  • You want to accumulate observations over time

Behavior:

  • Existing software records remain unchanged
  • New rows are added
  • Alerts for existing software are not affected

Replace asset software

Use this when the import represents the current full state of each asset.

Typical use:

  • Scheduled inventory updates (e.g. daily scan)
  • Synchronizing with a source of truth (CMDB, full scan)

Behavior:

  • Existing software is replaced by the imported set
  • Software not present in the new import is removed
  • Alerts linked to removed software are automatically closed on the next recalculation

Important: alert updates are not performed during import. After importing software (especially when using Replace), you should run Generate Alerts to synchronize alert state with the current software inventory.
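Replace-mode semantics, and why a separate Generate Alerts step is needed afterwards, can be sketched as follows. The function and row shapes are illustrative:

```python
# Illustrative Replace-mode semantics. The function and row shapes are
# assumptions; they only show why Generate Alerts must run afterwards.

def replace_asset_software(existing, imported):
    """Replace an asset's software with the imported set; report removals."""
    imported_keys = {(r["product"], r["version"]) for r in imported}
    removed = [r for r in existing
               if (r["product"], r["version"]) not in imported_keys]
    return list(imported), removed

existing = [
    {"product": "Docker Desktop", "version": "4.59.0"},
    {"product": "7-Zip", "version": "24.09"},
]
imported = [{"product": "Docker Desktop", "version": "4.60.1"}]

current, removed = replace_asset_software(existing, imported)
# Both old rows end up in `removed`. Alerts tied to them are not touched
# here; they are only closed later, when Generate Alerts recalculates.
```

Append mode, by contrast, would simply extend `existing` with `imported` and remove nothing.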

Recommended operational flow

  • Import software
      Append → incremental update
      Replace → full state refresh
  • (Optional) Review linking / unresolved mappings
  • Run: Generate Alerts

Key idea: Append grows the dataset (observation-oriented), while Replace reflects the current state (state-oriented). Generate Alerts synchronizes security results with that state.

What happens after import

After software is imported, AVM continues with canonical resolution and review workflows. Depending on the software row, that may include dictionary resolution, alias or synonym help, unresolved mapping review, canonical backfill, and later alert recalculation.

Imported software
Canonical linking
Unresolved review if needed
Backfill + matching

When unresolved entries are expected

Unresolved rows are normal when software naming is noisy, package metadata is incomplete, vendor names differ from the canonical dictionary, or product strings are too broad to map safely on first pass.

AVM keeps those rows visible because they are part of the real import result, not an exceptional corner case to hide.

Why version information helps

Product identity is not the only thing that matters after import. Version information can strongly affect whether a vulnerability applies. Providing version values at import time makes later matching more useful and reduces unnecessary ambiguity.

With version data

AVM can perform stronger version-aware evaluation in later matching steps.

Without version data

Canonical identity may still be useful, but downstream applicability decisions may be less precise.
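A simplified version-aware check shows the difference. The function below is illustrative only; real matching needs proper version semantics beyond dotted-numeric comparison:

```python
# Illustrative version-aware applicability check. Real matching needs
# proper version semantics; tuple comparison here only handles simple
# dotted numeric versions.

def parse_version(v):
    return tuple(int(p) for p in v.split("."))

def vulnerable(installed, fixed_in):
    """Decide applicability given an (optionally missing) installed version."""
    if installed is None:
        return "unknown"            # identity matched, but version missing
    if parse_version(installed) < parse_version(fixed_in):
        return "affected"
    return "not affected"

print(vulnerable("4.59.0", fixed_in="4.60.0"))  # affected
print(vulnerable("4.60.1", fixed_in="4.60.0"))  # not affected
print(vulnerable(None, fixed_in="4.60.0"))      # unknown
```

The "unknown" branch is the imprecision the section above describes: without a version, applicability cannot be decided either way.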

Import sources and provenance

AVM keeps track of import-side context such as source and source type. This helps preserve how a row entered the system and supports later review, troubleshooting, and data-quality work.

In practice, import data may come from manual preparation, inventory scripts, platform exports, or tooling such as osquery. AVM benefits from that context instead of discarding it.

Example

A software upload may contain a product name that is valid enough to import but not specific enough to resolve to a canonical product immediately. AVM should still preserve the row, attach it to the correct asset, and make it available for unresolved review.

Later, after alias or synonym improvements and canonical backfill, that same software row may become matchable against vulnerability conditions without needing to re-import the raw evidence.

What AVM is trying to avoid

One-step opaque ingestion

Import should remain visible and reviewable, not disappear behind a single all-or-nothing action.

Discarding raw source evidence

Source-side values may still be important after import.

Equating import with normalization

Getting data into the system is different from resolving its canonical identity.

Hiding incomplete coverage

Unresolved rows should remain visible so the system can be improved iteratively.

Summary

Import in AVM is a staged process that preserves observed inventory, validates it visibly, and commits it into the system without pretending that canonical identity and vulnerability applicability are already solved.

That makes import a reliable operational starting point. Assets and software enter the platform with their original context intact, and the system can then improve coverage through canonical linking, unresolved review, backfill, and matching.