Software & Data Integrity Failures (OWASP A08) Guide

A08 is the OWASP category that gets the least attention in developer training and the most attention in incident retrospectives. Software and data integrity failures covers insecure deserialization, software supply chain compromise, and CI/CD pipeline tampering — three threat classes that share one root cause: code, data, or artifacts get loaded into a trust zone with no cryptographic verification. The 2021 reorganization of the OWASP Top 10 merged Insecure Deserialization with supply-chain concerns, and the 2025 list keeps the merged A08 because implicit trust without cryptographic verification produces the highest-impact, hardest-to-detect incidents in modern software. SolarWinds, CodeCov, Kaseya, 3CX, and xz-utils are instances of A08 at different points in the supply chain.

What A08 Actually Covers

The 2021 revision merged Insecure Deserialization (A8 in 2017) with supply-chain integrity concerns that had no prior category. Both share the same flaw: code or data loaded into a process under the assumption it is trustworthy, with the assumption enforced by where the bytes came from rather than by a cryptographic check. The 2025 list keeps A08 because the supply-chain dimension has only grown.

The category covers three overlapping threat classes. Insecure deserialization: the application accepts a serialized object — Java ObjectInputStream, Python pickle, .NET BinaryFormatter, PHP unserialize, Ruby Marshal, unsafe-loader YAML — and the deserializer instantiates objects whose hooks execute attacker-controlled code. Software supply chain compromise: a dependency, build tool, base image, or vendor product gets backdoored upstream and the application ships it unverified. CI/CD pipeline tampering: the build system itself — runners, caches, secrets, deployment credentials — is compromised, injecting malicious changes into clean application code at build or deploy time.

What unites the three is the absence of a verifiable integrity boundary. The deserialization path trusts the bytes because they came over the wire. The supply-chain path trusts the dependency because npm returned it. The CI/CD path trusts the build because the team's own pipeline ran it. None of those is a cryptographic source of trust. A08 asks, at every boundary, whether trust is enforced by signatures and attestations — or by the assumption that the upstream is well-behaved.

The Three Faces of A08

The three threat classes differ in where the integrity failure happens but converge on one defensive pattern: every artifact crossing into the trust zone is cryptographically verified, with verification gating the trust decision rather than running as a logging-only check.

Figure: A08 has three faces (runtime deserialization, build-time supply chain compromise, deploy-time pipeline tampering), all collapsing into one root cause: implicit trust at a boundary that was never enforced by a cryptographic check.

Insecure deserialization is the runtime face. The application receives serialized data — network, queue, cookie, upload, session store — and the deserialization library converts bytes into an in-memory object graph. The conversion is not passive parsing; serializers execute constructors, type hooks, or magic methods, and an attacker controlling the bytes chooses which classes get instantiated and which methods get invoked. The worst case is RCE from a single call. The fix is structural — never deserialize untrusted data with a format that supports arbitrary type instantiation.

Software supply chain compromise is the build-time face. The source is clean, the developers are trustworthy, the review worked. The compromise happens upstream: a transitive npm dependency, a backdoored base image, a build plugin that exfiltrates secrets, a vendor product shipping malicious updates. The threat model never considered the upstream because it was implicitly trusted. This dimension overlaps heavily with our software composition analysis (SCA) guide.

CI/CD pipeline tampering is the deploy-time face. The source is clean, the dependencies are clean, the deployed artifacts are not. The compromise happens inside the pipeline — a runner with a vulnerable plugin, a cached layer with an injected backdoor, a deployment credential exfiltrated from a misconfigured workflow, a branch protection bypass. The artifacts the pipeline produces are not what the source describes. This is the dimension DevSecOps teams own most directly.

Insecure Deserialization Deep-Dive

Every major language ecosystem ships at least one serialization format that conflates data with code. The format was designed when the threat model was a single trusted application reading its own state from disk; a decade later it gets used for cross-network communication, and the original threat model no longer holds. The pattern is the same in Java, Python, .NET, PHP, and Ruby; only the API names change.

Java ObjectInputStream is the most documented case. A call to readObject() on attacker-controlled bytes can instantiate any Serializable class on the classpath and invoke its readObject, readResolve, or finalize methods. The Apache Commons Collections gadget chain demonstrated that side effects across a graph of innocuous-looking objects compose into arbitrary code execution:

// Vulnerable
ObjectInputStream ois = new ObjectInputStream(request.getInputStream());
SessionState state = (SessionState) ois.readObject();

// Safer: JSON into a fixed schema
ObjectMapper mapper = new ObjectMapper();
mapper.deactivateDefaultTyping();
SessionState state = mapper.readValue(
    request.getInputStream(), SessionState.class);

Jackson with default typing disabled, Gson, or Protobuf produce a value of a fixed declared type. The deserializer never instantiates a type the developer did not declare; the gadget-chain surface disappears.

Python pickle is the canonical misuse. The pickle protocol is a stack-based VM that executes during unpickling; an attacker who controls the bytes can construct a pickle that calls os.system, subprocess.Popen, or eval:

# Vulnerable
import pickle
session = pickle.loads(request.body)

# Safer: JSON + schema validation
import json
from pydantic import BaseModel
class Session(BaseModel):
    user_id: str
    role: str
parsed = Session.model_validate(json.loads(request.body))

If pickle cannot be replaced — some scientific pipelines depend on it — use a custom Unpickler with a strict find_class allowlist of expected types. The allowlist is a structural defense; filtering pickle bytes for "dangerous" content is not.
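A minimal sketch of that allowlist approach, following the "restricting globals" pattern from the Python standard library documentation. The contents of ALLOWED_CLASSES and the Exploit stand-in are assumptions chosen for illustration; a real allowlist would name only the types a given pipeline actually exchanges.

```python
import io
import pickle

# Hypothetical allowlist: the only (module, name) pairs this app expects.
ALLOWED_CLASSES = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("builtins", "set"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not on the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_CLASSES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Stand-in for pickle.loads with the allowlist applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    """Harmless stand-in for an attacker payload.

    A real payload would reference os.system or similar; this one
    references eval so the demo has no side effects if it ever ran.
    """

    def __reduce__(self):
        return (eval, ("1 + 1",))


# Plain containers of strings load without ever reaching find_class.
print(restricted_loads(pickle.dumps({"user_id": "u1", "role": "admin"})))

# The payload needs builtins.eval, which is not on the allowlist.
try:
    restricted_loads(pickle.dumps(Exploit()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

The decision is made on the type being requested, not on the shape of the bytes, which is why it holds up against payloads that a byte-level filter would miss.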

.NET BinaryFormatter has been deprecated by Microsoft as unsafe for any input not fully controlled by the application. Migrate to System.Text.Json, or to DataContractJsonSerializer/XmlSerializer with a fixed type. NetDataContractSerializer and SoapFormatter share the same problem.

PHP unserialize exposes object instantiation via magic methods (__wakeup, __destruct). Attacker-controlled serialized PHP can invoke any class on the autoload path, producing the same gadget-chain class as Java. Mitigate by passing ['allowed_classes' => false] or switching to json_decode.

Ruby Marshal.load mirrors pickle and Java. YAML.load with the legacy unsafe loader is the more common variant in practice. Use YAML.safe_load or JSON.parse across trust boundaries.

YAML's two loaders. PyYAML's yaml.load without a loader argument was an RCE primitive for years; PyYAML 5.1 deprecated the bare call and 6.0 made the Loader argument mandatory, but yaml.load with an unsafe loader remains dangerous. Cross-language rule: never call a function named "load" on YAML input without verifying the loader is the safe one, and prefer JSON when YAML's expressiveness is not needed.

JSON is safer but not immune. JSON parsers do not instantiate arbitrary types, eliminating the gadget-chain class. They still admit logical attacks — prototype pollution, type confusion without schema validation, DoS via deeply nested documents. Strict schema validation (Zod, Joi, ajv, Pydantic, Jackson with no default typing) is the structural defense.
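Where a schema library is not available, the same idea can be sketched with the standard library alone. The example below assumes the two-field session shape used in the pickle example earlier; MAX_BYTES and the function name parse_session are hypothetical choices, not part of any framework.

```python
import json

MAX_BYTES = 64 * 1024  # assumed cap; size it to the largest legitimate payload
SESSION_SCHEMA = {"user_id": str, "role": str}  # exact fields and exact types


def parse_session(raw: bytes) -> dict:
    """Parse a JSON session document, rejecting anything off-schema."""
    if len(raw) > MAX_BYTES:
        raise ValueError("payload too large")
    try:
        doc = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError, RecursionError) as exc:
        # RecursionError covers pathologically nested documents.
        raise ValueError("not a valid JSON document") from exc
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")
    if set(doc) != set(SESSION_SCHEMA):
        raise ValueError("unexpected or missing fields")
    for key, expected_type in SESSION_SCHEMA.items():
        # type(...) is, not isinstance, so subclasses cannot sneak through.
        if type(doc[key]) is not expected_type:
            raise ValueError(f"field {key!r} has the wrong type")
    return doc


print(parse_session(b'{"user_id": "u1", "role": "admin"}'))
```

Rejecting unknown keys outright, rather than ignoring them, is the design choice that closes off extra-field tricks such as a smuggled "__proto__" key reaching a downstream JavaScript consumer.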

Real-World Deserialization RCEs

The deserialization category is not theoretical. The track record of named CVEs is long enough that the recurring patterns are worth reviewing.

Apache Commons Collections gadget chain. Frohoff and Lawrence's 2015 disclosure demonstrated that any Java application with Commons Collections on the classpath was vulnerable to RCE via readObject on an untrusted stream. The gadget-chain surface is the union of every library on the classpath; JVM default deserialization is unsafe by construction with any non-trivial dependency graph.

ysoserial. Frohoff's tool packaged gadget chains for Spring, Hibernate, BeanUtils, Groovy, and Click, collapsing the gap between "theoretically dangerous" and "trivially exploitable in any production Java stack." Equivalents followed in Python (pickle-payload) and .NET (ysoserial.net).

Log4Shell as JNDI deserialization. Log4j (CVE-2021-44228) is sometimes classified as injection because the vector was a log message with a ${jndi:ldap://...} lookup, but the underlying primitive was loading untrusted objects: the JNDI lookup retrieved an attacker-supplied object reference, and the JVM loaded the referenced remote class or deserialized the returned bytes into executing code. The deserialization surface includes any code path resolving remote references, fetching code from URLs, or processing untrusted data through a library that deserializes internally.

Liferay, WebLogic, and JBoss. The app-server tier produced a string of high-severity deserialization CVEs through the late 2010s; WebLogic alone had multiple unauthenticated RCE flaws across multiple years. Pattern: an admin or RPC interface accepted serialized objects, the server deserialized without auth, a gadget chain produced RCE. CISA's KEV catalog lists many deserialization RCEs among its known-exploited vulnerabilities.

The deserialization surface is not a niche legacy concern. It is a structural property of every format that conflates data with code; the only durable fix is to stop using those formats across trust boundaries.

Software Supply Chain Compromise

The supply-chain dimension has grown most visibly since 2021. The six named incidents below cover the main patterns.

SolarWinds (December 2020). Attackers gained access to the SolarWinds build environment and injected malicious source code into SolarWinds.Orion.Core.BusinessLayer.dll immediately before compilation. The repository stayed clean; the build produced a backdoored binary signed with SolarWinds' code-signing certificate and distributed to ~18,000 customers as a normal update. Lesson: code signing alone is not integrity verification. If the key signs whatever the build produces, a compromised build produces a validly signed wrong artifact.

PHP git.php.net Backdoor (March 2021). Two malicious commits, forged under the names of well-known PHP maintainers, were pushed directly to the official PHP source repository, planting a hidden RCE backdoor in the zlib extension that evaluated code from a crafted "User-Agentt" request header. The compromise of the language's own Git server is the canonical example of why commit attribution alone is not commit integrity: a name or sign-off on a commit covers contents an attacker controls, not the trust path that put those contents on the server. Walk the full chain (push, code review failure, and detection) in our PHP git backdoor incident walkthrough.

CodeCov (April 2021). The Codecov bash uploader, used in tens of thousands of CI pipelines, was modified by an attacker who extracted a credential exposed through a flaw in Codecov's Docker image creation process. The modified script exfiltrated environment variables from every CI run that used it, undetected for about two months. Lesson: tooling running inside CI has access to its secrets, and any compromise cascades into every downstream organization.

Kaseya VSA (July 2021). Kaseya's remote management product was used to deploy ransomware to MSPs and their customers by chaining an authentication bypass with a malicious update pushed through the product's own agent channel. Lesson: the trust relationships that make managed services efficient also make them a single point of catastrophic failure when the manager is compromised.

3CX (March 2023). The 3CX desktop softphone was compromised through a chain that started with a different supply-chain attack — an employee installed a Trojanized X_TRADER, which gave attackers access to 3CX's build environment, which let them ship a signed-backdoor installer. Defenders have to consider the build integrity of every tool their developers run on their workstations.

xz-utils (March 2024). A long-running social engineering campaign placed a backdoor in the xz compression library — reachable by sshd through libsystemd — via a maintainer relationship cultivated over years. An unrelated developer caught it after noticing sshd performance regressions, before it reached most stable distributions. Lesson: open-source maintainership pressure produces social engineering vulnerabilities no automated scanning catches.

The patterns converge: implicit trust at a build, distribution, or maintenance boundary; a high-leverage position in the dependency graph; and long detection delays because code review, vulnerability scanning, and EDR were never designed to catch supply-chain attacks.

CI/CD Pipeline as Attack Surface

Most organizations think of their CI/CD pipeline as a build tool. Attackers see the most privileged execution environment in the company — running with deployment credentials, signing keys, cloud admin access, and the authority to push to production with no human in the loop. The shift from "build tool" to "production-equivalent attack surface" is the precondition for taking pipeline integrity seriously.

Build-step injection. A pipeline that builds a PR branch runs whatever code that branch contains. If the same pipeline runs for trusted internal branches and untrusted external PRs, an attacker submits a PR with a malicious postinstall script, Makefile target, or Dockerfile step. The build runs the script; the script exfiltrates secrets or modifies the artifact. Separate the pipeline that runs untrusted PR code from the one with deployment secrets — GitHub Actions' pull_request vs pull_request_target, GitLab's protected variables, equivalents elsewhere.

Secrets in pipelines. CI credentials are accessible to every step, including third-party actions and reusable workflows. A compromise of any one of them, including transitive dependencies of CI tooling, exposes the credentials. Use short-lived OIDC-issued credentials and workload-identity federation, and remove long-lived static secrets entirely.

Runner compromise. Self-hosted runners building public-repo code are a documented attack vector. If a runner is reused across jobs or networked to internal resources, one malicious build pivots into broader access. Use ephemeral runners with no persistent state, enforce strict egress segmentation, and avoid self-hosted runners for public-PR workloads.

Artifact tampering. Between build and deploy, anyone with access to the registry, network path, or storage backend can substitute a different artifact. Sign at build time, verify at deploy time — the attestation chain runs from source commit through build to deploy, and any substitution fails verification.

Branch protection bypass. Branch protection has historical bypass paths — admin override, force-push from privileged accounts, GitHub Apps with elevated permissions, misconfigured CODEOWNERS — exploitable by an attacker with a developer account foothold. Audit rules regularly, require signed commits on protected branches, alert on every administrative override.

The Integrity Toolchain

The A08 toolchain has matured rapidly since 2021. SLSA, in-toto, and Sigstore have moved from research projects to production-ready primitives organizations can adopt incrementally.

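SLSA framework levels. Supply-chain Levels for Software Artifacts originally (v0.1) defined four maturity levels, from Level 1 (documented build with provenance metadata) up to Level 4 (hermetic, reproducible, two-person-reviewed builds). The v1.0 specification, whose provenance format appears in the attestation example in this section, restructured these into a Build track running from Level 1 to Level 3. Most adopting teams target Level 2 or 3. The framework's value is a shared vocabulary for "how trusted is this build" rather than a binary yes/no.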

in-toto attestations. A signed JSON document that captures what was built, what produced it, and what verified it — the de-facto interchange format for SLSA provenance:

{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "ghcr.io/example/app",
    "digest": { "sha256": "a3f4..." }
  }],
  "predicateType": "https://slsa.dev/provenance/v1",
  "predicate": {
    "buildDefinition": {
      "buildType": "https://github.com/actions/runner",
      "externalParameters": {
        "workflow": ".github/workflows/release.yml",
        "ref": "refs/tags/v1.4.0"
      }
    },
    "runDetails": {
      "builder": { "id": "https://github.com/actions/runner" },
      "metadata": {
        "invocationId": "https://github.com/example/app/actions/runs/12345"
      }
    }
  }
}

The attestation is signed by the build platform's identity, attached to the artifact, and verified at deploy time against the expected build configuration. "Built by GitHub Actions, from example/app, at a tagged release, by a workflow matching the expected hash" is a stronger claim than a code signature alone.
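The deploy-time comparison against "the expected build configuration" can be sketched as a plain policy check over the statement shown above. This is an illustration under stated assumptions: the EXPECTED values and the function name check_provenance are hypothetical, and signature verification of the attestation envelope is assumed to have already succeeded (for example via Cosign) before this code sees the statement.

```python
import hashlib

# Hypothetical deploy policy: values a team would pin for one service.
EXPECTED = {
    "builder_id": "https://github.com/actions/runner",
    "workflow": ".github/workflows/release.yml",
    "ref_prefix": "refs/tags/v",
}


def check_provenance(statement: dict, artifact: bytes) -> None:
    """Raise ValueError unless the statement describes this artifact and
    matches the pinned builder, workflow, and ref.

    Assumes the statement's signature was verified by the caller.
    """
    digest = hashlib.sha256(artifact).hexdigest()
    subjects = statement.get("subject", [])
    if not any(s.get("digest", {}).get("sha256") == digest for s in subjects):
        raise ValueError("artifact digest is not a subject of the statement")

    if statement.get("predicateType") != "https://slsa.dev/provenance/v1":
        raise ValueError("unexpected predicate type")

    predicate = statement.get("predicate", {})
    params = predicate.get("buildDefinition", {}).get("externalParameters", {})
    builder = predicate.get("runDetails", {}).get("builder", {})

    if builder.get("id") != EXPECTED["builder_id"]:
        raise ValueError("built by an unexpected builder")
    if params.get("workflow") != EXPECTED["workflow"]:
        raise ValueError("built by an unexpected workflow")
    if not str(params.get("ref", "")).startswith(EXPECTED["ref_prefix"]):
        raise ValueError("not built from a release tag")


# Demo with a stand-in artifact; a real check would hash the image or package.
artifact = b"demo artifact bytes"
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "ghcr.io/example/app",
                 "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        "buildDefinition": {"externalParameters": {
            "workflow": ".github/workflows/release.yml",
            "ref": "refs/tags/v1.4.0"}},
        "runDetails": {"builder": {"id": "https://github.com/actions/runner"}},
    },
}
check_provenance(statement, artifact)  # returns silently when policy holds
print("provenance accepted")
```

Every branch raises rather than returning a flag, so a caller that forgets to inspect a result still fails closed.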

Sigstore and Cosign. Sigstore is the umbrella project for keyless signing; Cosign is the primary CLI. Keyless signing uses short-lived certificates issued from an OIDC identity, with each signing event recorded in the public Rekor transparency log. The model eliminates the long-lived key as an attack target. A signing workflow:

name: build-and-sign
on:
  push:
    tags: ['v*']
permissions:
  id-token: write   # OIDC token for keyless signing
  contents: read
  packages: write
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: build
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.ref_name }}
      - uses: sigstore/cosign-installer@v3
      - name: Sign image (keyless)
        env:
          DIGEST: ${{ steps.build.outputs.digest }}
        run: |
          cosign sign --yes \
            ghcr.io/${{ github.repository }}@${DIGEST}

The signing certificate is bound to the workflow's OIDC identity and expires in minutes. A deploy-time policy verifies signature, certificate, and Rekor inclusion proof together:

cosign verify \
  --certificate-identity-regexp \
    "https://github.com/example/.+/.github/workflows/release.yml@.+" \
  --certificate-oidc-issuer \
    "https://token.actions.githubusercontent.com" \
  ghcr.io/example/app:v1.4.0

Verification fails if the artifact was signed by a different workflow, a different identity, or not at all. The policy is enforceable at Kubernetes admission, CI pull time, or runtime.

SBOM with attestation. An attested SBOM — produced at build time, signed by the build platform, verified at deploy time — has materially different security properties than a post-hoc-scanned SBOM. The attested form cannot be modified after the build without invalidating the signature.

Package registry trust. npm, PyPI, Maven Central, and crates.io have rolled out signed publish attestations. npm's --provenance flag publishes a Sigstore-signed SLSA attestation; PyPI Trusted Publishers binds uploads to a specific GitHub Actions workflow via OIDC. The pattern: bind the publish event to a verifiable build identity, record it in a transparency log, verify at install time.

Auto-Update and Distribution Integrity

Automatic update mechanisms collapse three trust decisions — what code to fetch, that it is authentic, that it should run with the application's privileges — into one channel. SolarWinds and 3CX both succeeded because the auto-update channel was trusted to deliver authentic vendor code, and the vendor's build pipeline was the compromised link.

Signed updates with verified provenance. The minimum bar: the update artifact is signed, the signing identity is pinned in the application, verification fails closed on mismatch. The stronger bar — and the SolarWinds lesson — is that the signing identity attests not just that the artifact came from the vendor but that it was built by a specific verified pipeline from a specific source commit. A certificate that signs whatever the build produces is not enough.

Code-signing certificate hijack. Historical attacks steal a vendor's code-signing certificate or sign with a certificate the OS accepts. D-Link, Bit9, and MSI all involved compromised keys. Keyless and managed signing services (Sigstore, Microsoft Trusted Signing) reduce this surface by eliminating long-lived keys held by the vendor, and notarization schemes such as Apple's add a server-side check on top of the developer's certificate, but none of these addresses the trust-the-pipeline problem SolarWinds exposed.

App-store vs side-load trust models. Apple Notarization plus App Store Review, Google Play Protect, and Microsoft Store certification apply review-and-signing layers between developer and user. The model concentrates trust in the store operator — a reasonable tradeoff against unsigned side-loaded binaries — and fails when review is subverted (counterfeit apps, compromised developer accounts pushing trojanized updates).

Transparency logs. The signing event is recorded in a public append-only log (Rekor for Sigstore, Certificate Transparency for TLS, Apple's notarization ticket records). Anyone can inspect the log to surface unauthorized signing events the vendor might have missed. Several recent compromises were detected by external researchers reading transparency logs, not by vendors' internal monitoring.
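The tamper-evidence property can be illustrated with a deliberately simplified model. The sketch below uses a linear hash chain, which is an assumption made for brevity: Rekor and Certificate Transparency use Merkle trees with inclusion and consistency proofs, not this structure. All function names are hypothetical.

```python
import hashlib

GENESIS = "0" * 64  # fixed starting value for the chain


def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash an entry together with the hash of the entry before it."""
    return hashlib.sha256(f"{prev_hash}\n{payload}".encode()).hexdigest()


def append(log: list, payload: str) -> None:
    """Append a payload, chaining it to the current head of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


def extends_head(log: list, remembered_head: str) -> bool:
    """What an outside monitor checks: the log is internally valid and
    still contains the head hash the monitor saw earlier."""
    return verify(log) and any(e["hash"] == remembered_head for e in log)


log = []
for event in ("sign app v1.0", "sign app v1.1", "sign app v1.2"):
    append(log, event)
head_seen_by_monitor = log[-1]["hash"]

append(log, "sign app v1.3")
print(extends_head(log, head_seen_by_monitor))  # honest growth

# An operator who rewrites history must rebuild the chain, changing the head.
rewritten = []
for event in ("sign app v1.0", "sign app v1.1-backdoored", "sign app v1.2"):
    append(rewritten, event)
print(verify(rewritten), extends_head(rewritten, head_seen_by_monitor))
```

The rewritten log is internally consistent, so only a party that remembered an earlier head can detect the substitution. That is the role external monitors play for public logs.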

Detection — How A08 Findings Surface

A08 findings are unusual in the OWASP catalog because the controls and findings often live in pipelines and registries rather than application code. Detection patterns differ from the SAST/DAST workflows that catch most other categories.

CI/CD audit trails. The build platform's audit log (every workflow run, commit, secret access, artifact) is the primary detection surface for pipeline tampering. Reviewing it for unexpected workflow runs (especially scheduled or manually triggered from unusual accounts), new secrets, and changes to workflow files outside normal PR flow catches early indicators of a supply-chain attack. SIEM ingestion of build logs alongside production logs is increasingly standard.

Build reproducibility checks. A reproducible build produces the same output bytes from the same source on the same configuration. Tampering produces a different artifact than the source predicts. Full reproducibility is hard, but a partial check that runs the build in two independent environments and diffs the outputs catches a meaningful class of build-environment compromises. Debian and NixOS have invested heavily here.
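The partial check described above (build twice, diff the outputs) can be sketched as a comparison of per-file digests across two output directories. The function names and the temporary-directory demo are illustrative assumptions; a real check would point at the output trees of two independent builders.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_builds(build_a: Path, build_b: Path) -> list:
    """Return paths that are missing from one build or differ in content."""
    a, b = tree_digests(build_a), tree_digests(build_b)
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))


# Demo: two fake build outputs, one of which contains an extra file.
with tempfile.TemporaryDirectory() as tmp:
    out_a, out_b = Path(tmp, "a"), Path(tmp, "b")
    for out in (out_a, out_b):
        out.mkdir()
        (out / "app.bin").write_bytes(b"same bytes")
    (out_b / "lib.so").write_bytes(b"only in build b")
    mismatches = diff_builds(out_a, out_b)

print(mismatches)
```

An empty list is the only passing result. In practice the first runs tend to surface benign nondeterminism (timestamps, file ordering) that has to be eliminated before the check becomes a useful tamper signal.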

Dependency provenance verification. Tools that verify provenance at install or deploy time (npm audit signatures, Sigstore policy controllers, Cosign verification in deploy gates) give a check that artifacts were built by the expected pipelines. Fail-closed: missing or mismatched provenance blocks the install. This overlaps with our SCA guide.

Signed-commit enforcement. Requiring every commit on protected branches to be signed by a verified developer key produces an audit trail of authorship independent of which account pushed it. Without signed commits, branch protection is bypassable via account compromise; with them, the attacker needs both the account and the signing key.

Deserialization-specific detection. SAST flags static patterns — ObjectInputStream.readObject, pickle.loads, BinaryFormatter.Deserialize, unserialize, Marshal.load, unsafe-loader YAML.load — regardless of whether the input is actually attacker-controlled. Findings need triage but are a reliable starting point. RASP-style instrumentation can catch attacker-controlled deserialization at runtime, but the ecosystem is less mature than SAST.
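To make the static-pattern idea concrete, here is a toy scanner for Python sources built on the standard ast module. It is a sketch, not a substitute for a SAST product: the RISKY_CALLS set is an assumed starting list, and the matcher only sees fully dotted call names, so aliased imports such as "from pickle import loads" slip past it.

```python
import ast

# Assumed starting list; extend per codebase.
RISKY_CALLS = {
    "pickle.load",
    "pickle.loads",
    "marshal.loads",
    "yaml.load",
    "yaml.unsafe_load",
}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_risky_calls(source: str) -> list:
    """Return sorted (line, call_name) pairs for every flagged call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in RISKY_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


sample = (
    "import json, pickle\n"
    "session = pickle.loads(body)\n"
    "config = json.loads(body)\n"
)
for line, name in find_risky_calls(sample):
    print(f"line {line}: {name} needs review")
```

As with commercial tools, the scanner flags the call site without knowing whether the argument is attacker-controlled, which is why its output is a triage queue and not a verdict.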

The Mitigation Playbook

The defenses converge on a small set of practices that, applied consistently across deserialization, supply chain, and pipeline boundaries, eliminate the bulk of A08 risk.

Avoid deserialization of untrusted data. Cross-trust communication uses a structured format with a fixed schema — JSON, Protobuf, MessagePack — and parsing produces a value of a declared type. Native formats are reserved for intra-process state. Exceptions get strict deserializer allowlists, not input filtering. Any call to readObject, pickle.loads, BinaryFormatter, unserialize, Marshal.load, or unsafe-loader yaml.load is a code-review finding that needs explicit justification.

Sign everything, attest provenance. Every artifact crossing a trust boundary — container images, packages, binary releases, IaC modules, ML models — is signed at production time, with the signing identity bound to the build pipeline. A code signature says "this came from us"; a provenance attestation says "built from this commit, by this pipeline, with these inputs" — the latter defends against SolarWinds-class build compromise. SLSA Level 2 or 3 attestations signed by the build platform are the production form. Sigstore + OIDC is operationally simplest.

Gate deploys on attestation verification. Logging-only verification adds nothing. The deploy-time check is fail-closed: artifacts without valid provenance do not deploy. Kubernetes admission controllers (Kyverno, Connaisseur, Gatekeeper) enforce at cluster admission; pipeline checks enforce at promotion gates.

Lockfile and integrity-hash discipline. Every manifest has a lockfile recording integrity hashes; every install verifies them — npm ci, pip install --require-hashes, Cargo by default, Go modules with GOFLAGS=-mod=readonly. A registry-compromise replacement fails on hash mismatch. Lockfile diffs need PR review the way source does.
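The install-time check reduces to recomputing a digest and comparing it with the value recorded in the lockfile. The sketch below uses the "algorithm-base64digest" string shape that npm lockfiles store in their integrity field; the function names are hypothetical and the byte strings stand in for a downloaded package tarball.

```python
import base64
import hashlib
import hmac

SUPPORTED = {"sha256", "sha384", "sha512"}


def integrity_string(data: bytes, algorithm: str = "sha512") -> str:
    """Compute an 'algorithm-base64digest' string for the given bytes."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode()}"


def verify_integrity(data: bytes, expected: str) -> None:
    """Raise ValueError unless data hashes to the locked integrity value."""
    algorithm, _, _ = expected.partition("-")
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    actual = integrity_string(data, algorithm)
    # Constant-time comparison; overkill for public hashes, but harmless.
    if not hmac.compare_digest(actual.encode(), expected.encode()):
        raise ValueError("integrity mismatch: refusing to install")


locked = integrity_string(b"package tarball bytes")  # value a lockfile would record
verify_integrity(b"package tarball bytes", locked)   # passes silently

try:
    verify_integrity(b"tampered tarball bytes", locked)
except ValueError as exc:
    print(exc)
```

The hash only proves the bytes match what was locked, which is why the lockfile diff itself deserves the same review as source: a malicious change to the recorded value verifies perfectly.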

Runner hardening. CI runners building untrusted code run on ephemeral VMs with no persistent state, no long-lived secrets, and tightly scoped egress. The runner is a production-equivalent attack surface — patched, monitored, isolated.

Training that connects the three faces. Most developer training treats deserialization, supply chain, and pipeline integrity as unrelated topics. The unifying frame — every byte crossing into the trust zone needs cryptographic verification — produces the recognition skill that catches the next variant. Teams already invested in cryptographic failures (A02) mitigation find the same primitives apply almost verbatim to A08.

Closing: Trust, Made Verifiable

A08 does not look like a vulnerability when you read the code. The Java readObject call looks fine if the developer has not internalized that deserialization is a code-execution primitive. The npm install in the Dockerfile looks fine if no one has thought about what it means to trust the registry. The signed release looks fine if no one has audited what the signing pipeline actually attests to. The category is hard because the failures are at boundaries where the assumption was "this is trustworthy because of where it came from," and the assumption was never actually checked.

The defensive pattern across the three faces of A08 is the same: replace implicit trust with verifiable trust. Replace "this came from our pipeline" with "built by this workflow, from this commit, signed by this identity, recorded in this transparency log." Replace "we don't deserialize untrusted data" with "we have no code path that deserializes outside a fixed schema." None of those replacements is exotic in 2026 — the infrastructure exists. The institutional commitment to deploy it, review it in code, and gate production on it is what separates teams that have closed A08 from teams whose next major incident is sitting in their dependency graph.

The 2021 merge of Insecure Deserialization with supply-chain integrity recognized that the deserialization bug, the npm package compromise, the Jenkins runner takeover, and SolarWinds-style build tampering are the same bug — implicit trust without cryptographic verification — at different points in the pipeline. The category has moved from "deserialization is dangerous" to "your entire build is your attack surface," and the developer skill set that closes it is the one this guide, and our broader curriculum on related categories like injection, is built to teach.