Technical Architecture

How EASM Works

A technical breakdown of the five-stage pipeline that powers external attack surface management, from seed data to remediation.

External attack surface management is not a point-in-time scan. It is a continuous, autonomous pipeline that starts with a handful of seed inputs (a primary domain, a company name, a known IP range) and systematically expands outward until it has mapped every internet-facing asset your organization exposes. The pipeline then classifies, risk-scores, and feeds those findings into your security workflows.

This loop never stops. New cloud instances spin up, acquisitions bring unknown infrastructure, developers publish staging servers without telling anyone. A mature EASM platform re-runs its entire discovery and analysis cycle on a continuous cadence, correlating data from dozens of passive and active intelligence sources to keep your asset inventory current and your risk picture accurate.

Stage 01

Asset Discovery

Discovery is the foundation of the entire pipeline and typically the most technically complex stage. The goal is to enumerate every internet-reachable asset that can be attributed to your organization, including assets no one inside the company knows about.

Seed Data

Every EASM engagement begins with minimal seed data. At a minimum this is your primary domain (e.g. example.com), but most platforms also accept known IP ranges, autonomous system numbers (ASNs), organization names, and subsidiary brand names. The platform uses these seeds as starting points and expands the scope automatically from there.

Passive Intelligence Sources

Passive reconnaissance gathers data without sending a single packet to your infrastructure. This is low-risk, high-yield, and forms the bulk of initial discovery.

  • DNS records: Enumerating A, AAAA, CNAME, MX, NS, TXT, and SRV records reveals subdomains, mail infrastructure, third-party delegations, and service provider relationships.
  • Certificate Transparency logs: Every publicly trusted TLS certificate is logged in append-only CT logs. Querying these logs surfaces subdomains and hostnames that may not appear in DNS enumeration, including pre-production and internal-facing names that were accidentally issued public certificates.
  • WHOIS / RDAP: Registration records for domains and IP blocks reveal ownership, registrant organizations, registration dates, and name-server configurations that help attribute assets to your organization or its subsidiaries.
  • Passive DNS databases: Historical DNS resolution data (collected by providers like Farsight DNSDB and SecurityTrails) shows which domains resolved to which IPs over time, uncovering assets that have since been decommissioned or moved.
  • Search engine caches: Indexed pages, cached snapshots, and search-operator queries (site:, inurl:, intitle:) can surface web applications, login portals, and exposed directories that the organization may not track internally.
  • BGP routing data: Border Gateway Protocol route announcements identify IP prefixes owned or announced by your ASN, revealing network blocks that may host undocumented services.
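The Certificate Transparency technique above can be sketched in a few lines: given the JSON entries returned by a CT log search service such as crt.sh, collect every hostname that falls under your apex domain. The entry shape here (a `name_value` field holding newline-separated SAN entries) follows crt.sh's output; the sample data is illustrative.

```python
def extract_hostnames(ct_entries, apex):
    """Collect unique hostnames under an apex domain from CT log entries."""
    hosts = set()
    for entry in ct_entries:
        # name_value may hold several SAN entries separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.lstrip("*.").lower()   # normalize wildcard certs
            if name == apex or name.endswith("." + apex):
                hosts.add(name)
    return sorted(hosts)

sample = [
    {"name_value": "www.example.com\nstaging.example.com"},
    {"name_value": "*.internal.example.com"},
    {"name_value": "evil.example.org"},   # different apex: ignored
]
print(extract_hostnames(sample, "example.com"))
# -> ['internal.example.com', 'staging.example.com', 'www.example.com']
```

Note how the wildcard certificate surfaces `internal.example.com`, a name that plain DNS enumeration might never guess.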

Active Scanning

Where passive intelligence observes, active scanning interacts. After passive methods have built an initial target list, the platform probes those assets directly to gather richer technical detail.

  • Port scanning: SYN or full-connect scans across common and extended port ranges identify listening services. Open ports are the entry points attackers probe first.
  • Service fingerprinting: Once a port is open, protocol handshakes and response analysis determine what software is running (e.g., Apache 2.4.51, OpenSSH 8.9, MySQL 8.0).
  • Banner grabbing: Many services return version strings, product names, or configuration details in their initial connection banner. These banners feed directly into vulnerability matching.
  • Web crawling: HTTP/HTTPS endpoints are crawled to discover linked pages, JavaScript includes, API endpoints, forms, and embedded resources that reveal technology choices and potential misconfigurations.
  • HTTP header analysis: Response headers (Server, X-Powered-By, Content-Security-Policy, Strict-Transport-Security) leak technology stack details and security posture signals.
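Banner and header fingerprinting ultimately reduces to pattern matching. A minimal sketch follows; the three regexes are illustrative, and a real platform maintains thousands of such signatures.

```python
import re

# Map raw banner/header strings to (product, version). Illustrative only.
BANNER_PATTERNS = [
    (re.compile(r"SSH-2\.0-OpenSSH_([\d.p]+)"), "OpenSSH"),
    (re.compile(r"Server:\s*Apache/([\d.]+)", re.I), "Apache httpd"),
    (re.compile(r"Server:\s*nginx/([\d.]+)", re.I), "nginx"),
]

def fingerprint(banner: str):
    """Return (product, version) for a recognized banner, else None."""
    for pattern, product in BANNER_PATTERNS:
        m = pattern.search(banner)
        if m:
            return product, m.group(1)
    return None

print(fingerprint("SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6"))
# -> ('OpenSSH', '8.9p1')
```

The extracted product/version pair is exactly what feeds the CVE matching described in Stage 3.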

Internet-Scale Scanning

Traditional vulnerability scanners work from an IP list you give them. Advanced EASM platforms flip this model: they scan the entire IPv4 address space (all 4.3 billion addresses) and then attribute discovered assets back to organizations. This outside-in perspective is the key differentiator.

Platforms like RedHunt Labs and Censys maintain continuously updated indexes of every reachable host on the internet. Instead of scanning only known ranges, they match organizational fingerprints (domain ownership, certificate subjects, WHOIS registrants) against this global dataset to surface assets that targeted scanning would miss entirely: forgotten servers, shadow cloud instances, and infrastructure from acquisitions that never made it into the CMDB.

Recursive Expansion

Discovery is not a single pass. Every newly discovered asset becomes a seed for the next iteration. A subdomain reveals an IP address. That IP sits inside a /24 network block. Scanning the /24 reveals more hosts. Reverse DNS on those hosts reveals additional domains. Certificate Transparency logs for those domains reveal more subdomains. This recursive expansion continues until the platform reaches a stable state where no new assets are found.
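That recursive expansion is a plain worklist algorithm run to a fixed point: every newly discovered asset goes back on the frontier until a full pass yields nothing new. Here `discover` stands in for the real lookups (DNS, reverse DNS, CT queries, subnet scans), and the toy relationship graph is hypothetical.

```python
def expand(seeds, discover):
    """Expand from seed assets until no discovery step finds anything new."""
    known, frontier = set(seeds), list(seeds)
    while frontier:
        asset = frontier.pop()
        for found in discover(asset):   # e.g. DNS, rDNS, CT, subnet scan
            if found not in known:
                known.add(found)
                frontier.append(found)  # every new asset becomes a seed
    return known

# Toy relationship graph standing in for real lookups:
graph = {
    "example.com": ["198.51.100.10"],
    "198.51.100.10": ["198.51.100.0/24"],
    "198.51.100.0/24": ["mail.example.com", "198.51.100.10"],
    "mail.example.com": [],
}
print(sorted(expand(["example.com"], lambda a: graph.get(a, []))))
```

A single seed domain pulls in an IP, the IP pulls in its /24, and the /24 pulls in a mail host: the same chain the paragraph above describes, terminating when the `known` set stops growing.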

Stage 02

Asset Inventory & Classification

Raw discovery data is noisy. Stage 2 transforms a list of IPs and hostnames into a structured, queryable asset inventory where every entry is classified, attributed, and mapped to its relationships.

Technology Fingerprinting

Each asset is fingerprinted to determine its technology stack. Platforms combine HTTP response signatures, JavaScript library detection, CSS framework markers, favicon hashes, HTML meta tags, and known URI patterns to identify CMS platforms, web frameworks, server software, CDN providers, and third-party analytics or marketing scripts. This fingerprint is the foundation for vulnerability matching in Stage 3.
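A stripped-down fingerprinting pass over a single HTTP response might combine the `Server` and `X-Powered-By` headers with the HTML generator meta tag. Real platforms add favicon hashes, JavaScript library detection, and URI patterns on top; this sketch shows only the header-and-markup slice.

```python
import re

def fingerprint_asset(headers: dict, body: str) -> list:
    """Derive a coarse technology fingerprint from one HTTP response."""
    tech = set()
    for header in ("Server", "X-Powered-By"):
        if header in headers:
            tech.add(headers[header].split("/")[0])   # "nginx/1.25.3" -> "nginx"
    m = re.search(r'<meta\s+name="generator"\s+content="([^"]+)"', body, re.I)
    if m:
        tech.add(m.group(1).split()[0])               # "WordPress 6.4" -> "WordPress"
    return sorted(tech)

print(fingerprint_asset(
    {"Server": "nginx/1.25.3", "X-Powered-By": "PHP/8.2.7"},
    '<html><head><meta name="generator" content="WordPress 6.4"></head></html>',
))
# -> ['PHP', 'WordPress', 'nginx']
```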

Asset Categorization

  • Domains & subdomains: primary, secondary, wildcard, parked, expired
  • IP addresses & ranges: owned, leased, cloud-allocated, CDN edge nodes
  • Cloud resources: storage buckets, serverless functions, container registries, managed databases
  • Web applications: marketing sites, SaaS portals, admin dashboards, APIs, GraphQL endpoints
  • Mobile assets: published apps, associated API backends, deep-link configurations

Ownership Attribution

Discovered assets are attributed to business units, subsidiaries, or third-party providers using a combination of WHOIS registrant data, hosting provider metadata, certificate organization fields, and internal context provided during onboarding. Accurate attribution is critical; without it, findings are unactionable because no one knows who is responsible for remediation.

Relationship Mapping

Assets do not exist in isolation. The platform builds a graph of relationships: which domains share the same IP, which IPs belong to the same subnet, which certificates cover multiple hostnames, which JavaScript resources are loaded across multiple sites. This asset graph enables blast-radius analysis: if one host is compromised, the graph shows what else is reachable.
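The asset graph and the blast-radius query can be modeled directly: each observed relationship becomes an undirected edge, and blast radius is the connected component around a compromised node. The observation triples below are illustrative.

```python
from collections import defaultdict

def build_graph(observations):
    """observations: (asset_a, relation, asset_b) triples from discovery."""
    adj = defaultdict(set)
    for a, _rel, b in observations:
        adj[a].add(b)
        adj[b].add(a)   # treat relationships as bidirectional reachability
    return adj

def blast_radius(graph, start):
    """Everything transitively connected to a compromised asset."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {start}

obs = [
    ("www.example.com", "resolves_to", "198.51.100.10"),
    ("api.example.com", "resolves_to", "198.51.100.10"),
    ("api.example.com", "cert_san", "admin.example.com"),
]
print(sorted(blast_radius(build_graph(obs), "www.example.com")))
# -> ['198.51.100.10', 'admin.example.com', 'api.example.com']
```

A shared IP and a shared certificate are enough to link a public marketing site to an admin host two hops away, which is exactly the insight blast-radius analysis provides.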

Technology Stack Enumeration

Beyond fingerprinting individual assets, mature platforms aggregate stack data across the entire inventory: how many assets run Apache vs. Nginx, which frameworks are most common, which TLS library versions are deployed. This aggregated view helps security teams assess systemic risk (e.g., “we have 340 assets running Log4j-vulnerable Java versions”).
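The aggregation itself is a simple fold over the inventory. A sketch with a hypothetical three-asset inventory:

```python
from collections import Counter

inventory = [
    {"host": "www",  "stack": ["nginx 1.25", "PHP 8.2"]},
    {"host": "api",  "stack": ["nginx 1.25", "Java 17", "Log4j 2.14"]},
    {"host": "mail", "stack": ["Postfix 3.8"]},
]

# Count deployments per product across the whole inventory
by_product = Counter(t.split()[0] for a in inventory for t in a["stack"])
print(by_product.most_common(1))   # -> [('nginx', 2)]

# Systemic-risk query: which hosts carry a specific risky component?
log4j_hosts = [a["host"] for a in inventory
               if any(t.startswith("Log4j") for t in a["stack"])]
print(log4j_hosts)                 # -> ['api']
```

The same query, run against a real inventory, is what produces the "340 assets running Log4j-vulnerable Java versions" style answer.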

Stage 03

Risk Analysis & Exposure Detection

With a classified inventory in place, the platform analyzes each asset for vulnerabilities, misconfigurations, and data exposures. This stage answers the question every security team cares about: where are we exposed?

Vulnerability Matching

Fingerprinted technology stacks are cross-referenced against CVE databases (NVD, vendor advisories, Exploit-DB) to identify known vulnerabilities. If the platform detects Apache 2.4.49 on a host, it flags CVE-2021-41773 (path traversal / RCE). This matching is continuous: when a new CVE drops, every asset in the inventory is re-evaluated immediately.
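Version-range matching is the core mechanic. A sketch against a hand-rolled two-entry feed follows; real platforms consume NVD's CPE match data, which handles far messier version schemes than the plain dotted integers assumed here.

```python
def parse_version(v):
    """Assumes plain dotted-integer versions like '2.4.50'."""
    return tuple(int(p) for p in v.split("."))

# Hypothetical slice of a CVE feed: product -> [(cve_id, first_affected, last_affected)]
CVE_DB = {
    "apache_httpd": [
        ("CVE-2021-41773", "2.4.49", "2.4.49"),   # path traversal / RCE
        ("CVE-2021-42013", "2.4.49", "2.4.50"),   # incomplete fix of 41773
    ],
}

def match_cves(product, version):
    v = parse_version(version)
    return [cve for cve, lo, hi in CVE_DB.get(product, [])
            if parse_version(lo) <= v <= parse_version(hi)]

print(match_cves("apache_httpd", "2.4.50"))   # -> ['CVE-2021-42013']
```

Re-evaluation on a new CVE is just re-running `match_cves` over the inventory with an updated feed, which is why a classified, fingerprinted inventory makes the response to a fresh disclosure near-instant.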

Misconfiguration Detection

  • Open ports: databases (3306, 5432, 27017) or management interfaces (22, 3389) exposed directly to the internet
  • Default credentials: login pages for routers, admin panels, or IoT devices still using factory defaults
  • Exposed admin panels: /wp-admin, /phpmyadmin, /admin, /console, /graphql playground accessible without authentication
  • Debug endpoints: stack traces, /debug, /status, /health, /env, or /actuator endpoints leaking internal configuration
  • Open storage buckets: publicly readable S3 buckets, Azure Blob containers, or GCS buckets exposing sensitive data
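Several of the checks above reduce to lookups against a policy table: a list of open ports and reachable paths in, a list of findings out. A minimal sketch (the port and path tables are illustrative, not complete):

```python
RISKY_PORTS = {
    3306: "MySQL", 5432: "PostgreSQL", 27017: "MongoDB",
    22: "SSH", 3389: "RDP",
}
ADMIN_PATHS = {"/wp-admin", "/phpmyadmin", "/admin", "/console"}

def flag_exposures(open_ports, reachable_paths):
    """Flag directly exposed services and unauthenticated admin panels."""
    findings = [f"{RISKY_PORTS[p]} (port {p}) exposed to the internet"
                for p in open_ports if p in RISKY_PORTS]
    findings += [f"admin panel reachable without auth: {path}"
                 for path in reachable_paths if path in ADMIN_PATHS]
    return findings

print(flag_exposures([443, 3306], ["/", "/wp-admin"]))
```

Port 443 and `/` pass silently; the exposed database port and the reachable admin panel each produce a finding.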

SSL/TLS Analysis

Every TLS-enabled endpoint is evaluated for certificate validity (expiration, chain completeness, hostname mismatch), cipher suite strength (flagging deprecated protocols like TLS 1.0/1.1 and weak ciphers like RC4 or 3DES), HSTS configuration, and certificate transparency compliance. Expired or misconfigured certificates are one of the most common causes of both outages and man-in-the-middle exposure.
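Once the handshake metadata is in hand (Python's `ssl` module can supply it), the certificate-side checks are straightforward date and set arithmetic. The sketch below works on already-extracted fields; the 30-day expiry warning threshold is an assumption, not a standard.

```python
from datetime import datetime, timedelta, timezone

WEAK_PROTOCOLS = {"SSLv3", "TLSv1", "TLSv1.1"}

def tls_findings(cert_not_after, protocol, cert_hostnames, requested_host, now=None):
    """Evaluate one TLS endpoint's handshake metadata for common issues."""
    now = now or datetime.now(timezone.utc)
    findings = []
    if cert_not_after < now:
        findings.append("certificate expired")
    elif cert_not_after - now < timedelta(days=30):
        findings.append("certificate expires within 30 days")
    if protocol in WEAK_PROTOCOLS:
        findings.append(f"deprecated protocol negotiated: {protocol}")
    if requested_host not in cert_hostnames:
        findings.append("hostname mismatch")
    return findings

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(tls_findings(datetime(2025, 6, 20, tzinfo=timezone.utc),
                   "TLSv1.1", {"www.example.com"}, "api.example.com", now))
```

One endpoint, three findings: an expiring certificate, a deprecated protocol, and a hostname mismatch.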

Credential Exposure

Platforms monitor multiple channels for leaked credentials tied to your organization's domains:

  • Breach databases: aggregated collections from known data breaches (e.g., Have I Been Pwned datasets)
  • Dark web forums: credentials, session tokens, and access listings traded on underground marketplaces
  • Paste sites: Pastebin, GitHub Gists, and similar services where credentials are frequently dumped
  • Stealer logs: info-stealer malware output containing saved browser passwords, cookies, and session tokens harvested from infected endpoints
  • Code repositories: API keys, database connection strings, and secrets accidentally committed to public GitHub, GitLab, or Bitbucket repositories

AI Exposure Analysis

A growing category of risk: exposed ML model endpoints, publicly accessible Jupyter notebooks, vector database instances without authentication, LLM prompt injection surfaces, and data leaking into third-party AI training pipelines through misconfigured SaaS integrations. Advanced platforms now include these checks as part of their standard analysis suite.

Stage 04

Risk Prioritization

A typical EASM deployment surfaces hundreds or thousands of findings. Prioritization separates signal from noise. Raw CVSS scores alone are insufficient: a CVSS 9.8 vulnerability on an isolated dev server behind a VPN is less urgent than a CVSS 7.5 on a customer-facing payment endpoint.

Prioritization Factors

  • Exploitability: Is there a public proof-of-concept or weaponized exploit? Is it listed in CISA KEV? Vulnerabilities under active exploitation in the wild jump to the top regardless of CVSS score.
  • Business context: Customer-facing production systems, payment infrastructure, and authentication services carry higher weight than internal development or staging environments.
  • Threat intelligence: Are threat actors actively scanning for or exploiting this specific vulnerability? Have campaigns been observed targeting this technology stack?
  • Asset criticality: Business-defined importance tiers that reflect data sensitivity, revenue impact, and regulatory scope (e.g., PCI DSS cardholder data environments).
  • Exposure duration: How long has this vulnerability been exposed to the internet? Longer exposure windows increase the probability of exploitation.

Mature EASM platforms combine these signals into a composite risk score that reflects real-world exploitability and business impact, not just theoretical severity. Some integrate directly with threat intelligence feeds (GreyNoise, Shodan, Recorded Future) to overlay active exploitation data onto findings in near-real-time.
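One way such a composite score can be blended is sketched below. The weights are loudly illustrative and match no specific vendor; the point is the structure, in which exploitation evidence and business context can outweigh raw CVSS.

```python
def composite_score(cvss, kev, public_exploit, internet_facing, asset_tier, days_exposed):
    """Blend CVSS with real-world exploitability and business context.
    All weights are illustrative, not taken from any vendor's model."""
    score = cvss                               # baseline severity, 0-10
    if kev:
        score += 3.0                           # CISA KEV: known exploited
    elif public_exploit:
        score += 1.5                           # PoC exists but no KEV entry
    if internet_facing:
        score += 1.0
    score += {"critical": 2.0, "high": 1.0}.get(asset_tier, 0.0)
    score += min(days_exposed / 90, 1.0)       # capped exposure-duration bonus
    return round(min(score, 10.0) * 10)        # normalize to 0-100

# CVSS 7.5 on an exploited, customer-facing payment endpoint outranks
# a CVSS 9.8 on an isolated, non-internet-facing dev box:
print(composite_score(7.5, True, True, True, "critical", 45))      # -> 100
print(composite_score(9.8, False, False, False, "standard", 10))   # -> 99
```

The example reproduces the trade-off from the paragraph above: the lower-CVSS finding wins once exploitation evidence and business context are factored in.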

Stage 05

Remediation & Integration

Discovery without remediation is just expensive awareness. The final stage turns findings into actionable work items that integrate directly into your existing security operations workflows.

  • Actionable findings: Each issue includes specific remediation guidance: what to patch, what to reconfigure, what to decommission, and why.
  • Ticketing integration: Findings are pushed directly into Jira, ServiceNow, or other ITSM platforms with pre-populated severity, owner, and remediation steps so that nothing falls through the cracks.
  • SIEM / SOAR feeds: High-priority findings are ingested into Splunk, Sentinel, Chronicle, or SOAR platforms (Cortex XSOAR, Tines, Swimlane) to trigger automated response playbooks.
  • API access: REST or GraphQL APIs enable custom integrations with internal dashboards, data lakes, GRC platforms, or homegrown security tooling.
  • Automated playbooks: Predefined response workflows for common finding types: auto-create a firewall rule request for an exposed database, auto-notify the certificate team about an expiring cert, auto-open a decommission ticket for a forgotten staging server.
  • Verification scanning: After a remediation is marked complete, the platform re-scans the affected asset to confirm the vulnerability is actually resolved, closing the loop with evidence.
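The ticketing hand-off in the list above is mostly payload translation. A sketch producing a Jira-style create-issue body follows; the field names reflect the common Jira REST shape, but the mapping (and the finding record itself) is hypothetical and would be adapted to your tracker.

```python
import json

def to_ticket(finding):
    """Translate one EASM finding into an ITSM create-issue payload."""
    severity_map = {"critical": "Highest", "high": "High"}   # illustrative mapping
    return {
        "fields": {
            "summary": f"[EASM] {finding['title']} on {finding['asset']}",
            "description": finding["remediation"],
            "priority": {"name": severity_map.get(finding["severity"], "Medium")},
            "labels": ["easm", finding["category"]],
        }
    }

finding = {
    "title": "CVE-2021-41773 (Apache path traversal)",
    "asset": "198.51.100.10",
    "severity": "critical",
    "category": "vulnerability",
    "remediation": "Upgrade Apache httpd to 2.4.51 or later.",
}
print(json.dumps(to_ticket(finding), indent=2))
```

Pre-populating severity, owner-relevant labels, and remediation text in the payload is what keeps findings from stalling in a generic triage queue.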

The Continuous Loop

These five stages are not a waterfall that runs once and delivers a report. They form a continuous loop. The moment Stage 5 completes a verification scan, Stage 1 is already running its next discovery cycle. New assets appear daily: a developer spins up an EC2 instance, marketing launches a campaign microsite, an acquisition closes and brings 200 unknown domains into scope. The loop catches these changes within hours, not quarters.

This continuous cadence is what separates EASM from traditional penetration testing and periodic vulnerability assessments. Your attack surface changes every day. Your monitoring should too.

The pipeline never stops.

A mature EASM deployment runs its full discovery-to-remediation cycle continuously, not weekly, not monthly, and certainly not quarterly. Every new asset, every DNS change, every certificate issuance triggers re-evaluation. Organizations that treat EASM as a one-time project miss the entire point: your attack surface is a living system, and your monitoring must match its pace.

Data Sources EASM Platforms Use

A comprehensive EASM platform correlates intelligence from across these sources to build and maintain its asset inventory:

DNS records, Certificate Transparency, WHOIS / RDAP, Passive DNS, BGP data, web crawling, port scanning, banner grabbing, search engines, code repositories, breach databases, dark web forums, paste sites, stealer logs, app stores, Docker Hub, cloud APIs, SSL/TLS certificates.

See Which Vendors Deliver Continuous EASM

Not all platforms run truly continuous discovery. Compare the vendors that scan 24/7 against those that run periodic assessments.