Heuristics Reference — Detection Rules | Malicious Document Analyzer

CVE Confidence

CVE exact The heuristic matches a pattern that is very likely exploitation of a specific vulnerability.

CVE likely The heuristic matches on the known vulnerable primitive or delivery pattern, but does not confirm a full exploit chain.

CVE related The heuristic identifies presence of the vulnerable component or attack surface, but is unable to confirm an exploit.

Shellcode 30

CreateRemoteThread API reference critical SC_STR_CREATEREMOTETHREAD

String 'CreateRemoteThread' found in file bytes.

CreateRemoteThread starts execution in another process. In documents or embedded payloads this is a strong process-injection indicator.

Metasploit bind_tcp critical SC_MSF_BIND

Byte signature matching Metasploit Framework bind_tcp shellcode.

This byte sequence matches Metasploit's bind TCP shell, which opens a listening port on the victim's machine for the attacker to connect to.

Metasploit reverse_tcp critical SC_MSF_REVERSE

Byte signature matching Metasploit Framework reverse_tcp shellcode.

This byte sequence is the preamble of the reverse TCP shell payload from Metasploit, one of the most widely used exploitation frameworks. Its presence is consistent with a weaponised file.

URLDownloadToFile API reference critical SC_STR_URLDOWNLOAD

String 'URLDownloadToFile' found in file bytes.

URLDownloadToFile downloads a file from the internet to disk. This is one of the most common APIs used by shellcode to fetch and save second-stage malware. Its presence is a high-signal indicator of malicious intent.

WriteProcessMemory API reference critical SC_STR_WRITEPROCESSMEMORY

String 'WriteProcessMemory' found in file bytes.

WriteProcessMemory writes bytes into another process. Malware uses it with VirtualAllocEx and CreateRemoteThread for process injection.

XOR-encoded Windows strings critical SC_XOR_ENCODED

Windows DLL or API names found XOR-encoded with a single-byte key.

Shellcode frequently XOR-encodes strings like 'kernel32.dll' or 'LoadLibraryA' to evade signature-based detection. At runtime, the shellcode decodes them with the same key before calling the APIs. Finding known library or API names encoded under a single-byte XOR key is a high-signal indicator of obfuscated shellcode. The analyzer brute-forces all 255 non-zero single-byte keys against common Windows DLL and API names to detect this technique.
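
The brute-force step can be illustrated with a minimal sketch. The keyword list and the function name `find_xor_encoded` are assumptions made for this example; the analyzer's real keyword list and matching logic are not documented here.

```python
# Hypothetical keyword list; the analyzer's actual list is not documented.
KEYWORDS = [b"kernel32.dll", b"LoadLibraryA", b"GetProcAddress"]

def find_xor_encoded(data: bytes, keywords=KEYWORDS):
    """Return (key, keyword, offset) for keywords found under a single-byte XOR key."""
    hits = []
    for key in range(1, 256):  # key 0 would leave the string in cleartext
        for word in keywords:
            encoded = bytes(b ^ key for b in word)
            offset = data.find(encoded)
            if offset != -1:
                hits.append((key, word.decode(), offset))
    return hits
```

Encoding each keyword under every key and searching for the result avoids decoding the whole file 255 times.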

CreateProcess API reference high SC_STR_CREATEPROCESS

String 'CreateProcess' found in file bytes.

CreateProcess starts a new process. Its presence in raw document bytes suggests embedded code intended to execute programs on the system.

Egg-hunter shellcode high SC_EGG_HUNTER

Egg-hunter pattern that searches process memory for a marker ('egg').

An egg-hunter is a small piece of shellcode used when an exploit has limited buffer space: it scans memory for a larger payload marked with a specific tag. This is a well-known exploitation technique.

GetProcAddress API reference high SC_STR_GETPROCADDRESS

String 'GetProcAddress' found in file bytes.

GetProcAddress resolves an exported function inside a loaded DLL. Together with LoadLibrary it is the foundational API resolution pair used by nearly all in-document shellcode. Ordinary documents rarely contain this string.

Heap-spray pattern high SC_HEAP_SPRAY

Repeated byte pattern typical of heap-spray payloads.

Heap spraying fills large areas of memory with repeated data (often NOP-sled + shellcode) so that a corrupted pointer is likely to land in attacker-controlled memory. Seeing long repeated byte patterns in a document is a strong exploit indicator.

LoadLibrary API reference high SC_STR_LOADLIBRARY

String 'LoadLibrary' (or LoadLibraryA/W/Ex) found in file bytes.

LoadLibrary maps a DLL into the current process and returns its base address; combined with GetProcAddress it is the standard primitive for shellcode to resolve Win32 API functions without using the import table. Documents do not normally embed this string in their data.

NOP sled high SC_NOP_SLED

Long run of 0x90 (NOP) bytes detected in the file.

A NOP sled is a sequence of no-operation instructions used by attackers to pad shellcode so that execution 'slides' into the payload regardless of the exact jump address. Long NOP runs are unusual in normal documents.
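
As a rough illustration, a run-length check for 0x90 bytes might look like the sketch below. The 32-byte minimum is an assumed threshold for this example, not the analyzer's documented value.

```python
import re

def find_nop_sleds(data: bytes, min_len: int = 32):
    """Return (offset, length) for each run of 0x90 bytes at least min_len long."""
    # min_len is a hypothetical threshold chosen for illustration.
    pattern = re.compile(rb"\x90{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(data)]
```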

PEB API-hash resolver high SC_API_HASH_RESOLVER

PEB access combined with nearby ROR13-style API hashing.

Windows shellcode often walks the PEB to find loaded DLLs, then hashes export names to resolve APIs without storing cleartext imports. ROR13 hash loops are a common resolver primitive; seeing them near PEB access is a strong position-independent shellcode indicator.
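
For reference, the rotate-right-13 additive hash commonly seen in such resolvers can be sketched as follows. This is one common variant, stated as an assumption: real shellcode differs in details such as whether the terminating NUL is hashed or whether module names are upper-cased first.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name: bytes) -> int:
    """Rotate the accumulator right by 13, then add the next byte, for each byte."""
    h = 0
    for byte in name:
        h = ror32(h, 13)
        h = (h + byte) & 0xFFFFFFFF
    return h
```

The resolver compares the hash of each export name against a hard-coded 32-bit constant, so no API name appears in cleartext.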

PEB access (x64) high SC_PEB_ACCESS_X64

Access to the Process Environment Block via GS:[0x60].

This is the 64-bit equivalent of the x86 FS:[0x30] access. Shellcode reads GS:[0x60] to find the PEB on 64-bit Windows, then walks the DLL lists to resolve API functions.

PEB access (x86) high SC_PEB_ACCESS

Access to the Process Environment Block via FS:[0x30].

Windows shellcode accesses the PEB to find loaded DLLs and resolve API addresses without using imports. Reading FS:[0x30] on x86 is the standard way to reach the PEB — a hallmark of position-independent shellcode.
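
A minimal sketch of a byte-pattern search for the two PEB reads (FS:[0x30] on x86, GS:[0x60] on x64) is shown below. Only one encoding per architecture is included, as an illustrative assumption; compilers and hand-written shellcode use other registers and addressing forms that a real rule would also need to cover.

```python
# One representative encoding per architecture (illustrative, not exhaustive).
PEB_PATTERNS = {
    "x86 FS:[0x30]": bytes.fromhex("64a130000000"),        # mov eax, fs:[0x30]
    "x64 GS:[0x60]": bytes.fromhex("65488b042560000000"),  # mov rax, gs:[0x60]
}

def find_peb_access(data: bytes):
    """Return (label, offset) for each occurrence of a known PEB-read encoding."""
    hits = []
    for label, pattern in PEB_PATTERNS.items():
        offset = data.find(pattern)
        while offset != -1:
            hits.append((label, offset))
            offset = data.find(pattern, offset + 1)
    return hits
```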

PowerShell reference high SC_STR_POWERSHELL

String 'powershell' found in file bytes.

PowerShell is a powerful scripting environment frequently abused by attackers to download payloads, execute scripts in memory, and evade detection.

ShellExecute API reference high SC_STR_SHELLEXEC

String 'ShellExecute' found in file bytes.

ShellExecute can open files, URLs, or run programs. In shellcode it is often used to launch payloads or open second-stage URLs.

WinExec API reference high SC_STR_WINEXEC

String 'WinExec' found in file bytes.

WinExec is a Windows API that runs a command. Shellcode resolves and calls WinExec to launch malicious commands. Normal documents should not contain this raw API name in their binary data.

Windows Script Host reference high SC_STR_WSCRIPT

String 'wscript' or 'cscript' found in file bytes.

Windows Script Host (wscript/cscript) executes VBScript and JScript. In malicious documents it is often used to run downloader or installer scripts.

XOR decoder loop high SC_XOR_DECODER

XOR-based decoder stub that decrypts shellcode at runtime.

Shellcode is often XOR-encoded to evade signature detection. A decoder stub at the start decrypts the real payload in memory. Finding a decoder stub strongly suggests the file carries encrypted shellcode.

bitsadmin reference high SC_STR_BITSADMIN

String 'bitsadmin' found in file bytes.

bitsadmin manages Background Intelligent Transfer Service jobs. Attackers abuse it to download files stealthily in the background.

certutil reference high SC_STR_CERTUTIL

String 'certutil' found in file bytes.

certutil is a legitimate Windows tool that attackers misuse to download files (-urlcache) or decode Base64 payloads (-decode). It is a commonly abused LOLBin.

cmd.exe reference high SC_STR_CMD

String 'cmd.exe' followed by an execution switch (/c, /k, or /r) — i.e. an actual invocation, not just a bare reference.

The rule matches 'cmd.exe' immediately followed by /c, /k, or /r, which is the shape of a real command invocation by shellcode or a macro launching a payload. Plain documentation mentions of 'cmd.exe' (e.g. in user manuals or embedded paths) do not fire this rule.
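
The distinction between an invocation and a bare mention can be sketched with a regular expression. The exact pattern below, including its tolerance for whitespace before the switch, is an assumption for illustration rather than the analyzer's actual rule.

```python
import re

# 'cmd.exe' followed by /c, /k, or /r as a complete switch (hypothetical pattern).
CMD_INVOCATION = re.compile(rb"cmd\.exe\s*/[ckr]\b", re.IGNORECASE)

def has_cmd_invocation(data: bytes) -> bool:
    """True when the bytes contain cmd.exe with an execution switch."""
    return CMD_INVOCATION.search(data) is not None
```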

mshta.exe reference high SC_STR_MSHTA

String 'mshta' found in file bytes.

mshta.exe runs HTML Applications (.hta files) which can contain VBScript or JScript. It is a well-known 'living-off-the-land' binary (LOLBin) abused to execute malicious scripts.

x86 GetPC stub (CALL $+5) high SC_GETPC_CALL

x86 CALL $+5 instruction sequence that obtains the current instruction pointer.

Shellcode needs to know its own memory address to locate encoded payloads. CALL $+5 followed by POP is a common way to get the program counter (PC). This pattern is unusual in normal documents.
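
In byte terms, CALL $+5 is `E8 00 00 00 00` (a call with zero displacement), and POP into a 32-bit register is a single opcode in the range 0x58 to 0x5F. A minimal sketch of a matcher, with a hypothetical function name:

```python
import re

# E8 00 00 00 00 = CALL $+5; 58..5F = POP r32.
GETPC_CALL_POP = re.compile(rb"\xe8\x00\x00\x00\x00[\x58-\x5f]")

def find_getpc_call_pop(data: bytes):
    """Return offsets where CALL $+5 is immediately followed by POP reg."""
    return [m.start() for m in GETPC_CALL_POP.finditer(data)]
```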

x86 GetPC stub (FSTENV) high SC_GETPC_FSTENV

x86 FSTENV-based instruction sequence to obtain the instruction pointer.

An alternative GetPC technique that uses floating-point environment save (FSTENV) to leak the instruction pointer. This is a strong shellcode indicator because this instruction pattern is uncommon in ordinary document data.

NOP-equivalent sled medium SC_NOP_EQUIV_SLED

Long run of NOP-equivalent instructions (e.g. INC, DEC, POPA).

Some shellcode replaces 0x90 NOPs with other single-byte instructions that have no meaningful side-effect (like INC ECX) to evade simple NOP-sled detection.

VirtualAlloc API reference medium SC_STR_VIRTUALALLOC

String 'VirtualAlloc' found in file bytes.

VirtualAlloc reserves or commits memory pages with caller-chosen protection flags. Shellcode uses it to create a writable and executable memory region where decoded payloads can run.

VirtualProtect API reference medium SC_STR_VIRTUALPROTECT

String 'VirtualProtect' found in file bytes.

VirtualProtect changes memory page permissions. Shellcode uses it to make memory regions executable (bypassing DEP). Not expected in documents.

x86 push-string-call medium SC_PUSH_STRING

Two or more consecutive PUSH imm32 instructions whose decoded bytes spell a Windows API or shell-keyword string.

Shellcode frequently constructs strings (like 'cmd.exe' or 'WinExec') on the stack by pushing 4-byte immediates with the 0x68 opcode. The rule matches a run of ≥2 PUSH imm32 instructions and only fires when the decoded bytes contain a known execution, network, or Windows API keyword — so generic numeric pushes do not trigger it.
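
The decoding step can be sketched as follows. Because the stack grows downward, the last pushed immediate sits at the lowest address, so the chunks are reversed before joining. The keyword tuple and function name are assumptions for this example.

```python
# Hypothetical keyword list; the analyzer's real list is not documented.
KEYWORDS = (b"cmd", b"winexec", b"powershell", b"http")

def find_push_strings(code: bytes, min_pushes: int = 2):
    """Return (offset, text) for runs of PUSH imm32 that spell a keyword."""
    hits = []
    pos = 0
    while pos < len(code):
        chunks = []
        cursor = pos
        # Collect consecutive 5-byte PUSH imm32 instructions (opcode 0x68).
        while cursor + 5 <= len(code) and code[cursor] == 0x68:
            chunks.append(code[cursor + 1:cursor + 5])
            cursor += 5
        if len(chunks) >= min_pushes:
            text = b"".join(reversed(chunks))  # last push is lowest in memory
            if any(k in text.lower() for k in KEYWORDS):
                hits.append((pos, text))
            pos = cursor
        else:
            pos += 1
    return hits
```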

PDF 194

/Launch /P parameter is a javascript: URL critical PDF_LAUNCH_JS_PROTOCOL

PDF /Launch action passes a `javascript:` URL as the /P parameter.

When mshta receives `javascript:...` as its argument, it can execute the script inside its scripting host. This converts the PDF /Launch action into an inline-script execution primitive. Combined with PDF_LAUNCH_MSHTA, this is T1218.005 + T1059.007.

/Launch action target critical PDF_LAUNCH_COMMAND

PDF /Launch action specifies an executable target (and optionally parameters).

The /Launch action can run an external program when activated, or on open if paired with an open trigger. This rule captures the launched command for display and elevates to CRITICAL when the target references a known-dangerous executable (cmd, PowerShell, etc.).

/Launch action targets mshta.exe (LOLBIN) critical PDF_LAUNCH_MSHTA

PDF /Launch action whose /F parameter explicitly names mshta.

Mshta is the Microsoft HTML Application host and a documented LOLBIN. Modern PDF launcher campaigns prefer it because mshta accepts a `javascript:` URL as its command-line argument (supplied through the /Launch /P parameter), which executes inline JScript without requiring a dropper file on disk. The PDF /Launch carrier plus an mshta target is the unambiguous shape of MITRE ATT&CK T1218.005.

Adobe Reader JavaScript exploit kit (URL-keyed loader) critical PDF_JS_URL_KEYED_READER_EXPLOIT_KIT

PDF OpenAction JS uses a this.URL-keyed cipher + hex-decode + eval — a known anti-analysis Adobe Reader exploit kit.

Fires when the PDF OpenAction JavaScript contains the fingerprint of a specific anti-analysis Adobe Reader exploit kit: a substitution cipher keyed on this.URL (the delivery filename, extracted via indexOf('rtl')), a decoy fallback key when the URL contains ':' (so sandboxes/static analysis decode to garbage by design), and a parseInt(pair,16)->fromCharCode hex decoder feeding eval. The kit bundles the classic Adobe Reader JavaScript CVEs (CVE-2007-5659, CVE-2008-2992, CVE-2009-0927, CVE-2009-4324, CVE-2010-0188) and selects one at runtime by viewer version. Because the decode key is the runtime delivery URL, the specific CVE is not statically recoverable, so the loader fingerprint attributes the Reader-exploit family at related confidence (which removes it from the unknown-exploit/0-day hunt queue).

Adobe Reader U3D auto-activated 3D annotation — CVE-2009-3459 critical CVE_2009_3459_U3D_AUTOACTIVATE

PDF embeds a U3D stream behind a /3D annotation set to auto-activate on page view.

CVE-2009-3459 is a heap buffer overflow in Adobe Reader / Acrobat's U3D (Universal 3D, ECMA-363) CLODProgressiveMeshDeclaration parser, patched in APSB09-15 (Reader 9.2 / 8.1.7 / 7.1.4). The exploitable document shape is a /Subtype /3D annotation whose /3DA activation dictionary binds /A /PV with /AIS /I — that combination makes the U3D parser run on page view with no click required. Real-world samples pair this with a 0x0c0c0c0c heap-spray JavaScript that lays a urlmon-based download shellcode at the corrupted allocation. Legitimate 3D PDFs almost never use the auto-activate + JS combination.

Annotation subject hex-decoded eval stager critical PDF_ANNOT_SUBJECT_HEX_EVAL_STAGER

PDF JavaScript decodes dash-delimited hex from annotation subjects and evals the result.

Old PDF exploit kits often hide second-stage JavaScript in annotation /Subject fields. The rule requires the full staging shape: OpenAction JavaScript that enumerates annotations, converts dash-delimited hex bytes with String.fromCharCode(), and evals the recovered stage. This avoids flagging ordinary annotations or benign hexadecimal text.
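
The decoding that such a stager performs can be reproduced statically. The sketch below mirrors the String.fromCharCode() conversion for dash-delimited hex; the function name is hypothetical.

```python
def decode_dash_hex(subject: str) -> str:
    """Turn '61-6c-65' style hex tokens into the characters they encode."""
    return "".join(chr(int(token, 16)) for token in subject.split("-") if token)
```

For example, `decode_dash_hex("61-6c-65-72-74")` yields the string `alert`.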

Annotation subject percent-decoding eval stager critical PDF_ANNOT_SUBJECT_MARKER_EVAL_STAGER

OpenAction JavaScript reads an annotation /Subject payload, rewrites marker bytes into percent escapes, unescapes the result, and dispatches it through eval.

This rule is a high-confidence exploit-kit transport pattern, not a CVE attribution by itself. It requires an /OpenAction launcher, an annotation /Subj payload, syncAnnotScan/getAnnots annotation enumeration, marker-to-% rewriting, unescape(), and direct or indirect eval dispatch. Plain getAnnots({nPage:0}) is not enough; CVE-2009-1492 is only assigned by the separate CVE rule when getAnnots() carries crafted integer-overflow or long string arguments.

Base64-encoded Windows executable payload in PDF critical PDF_BASE64_PE_PAYLOAD

PDF text contains a long base64 blob that decodes to a verified MZ/PE executable payload.

Malicious PDFs may hide a Windows executable as base64 in comments, after %%EOF, or in plain object text rather than as a declared attachment or stream. Decoding to a verified PE header is a strong payload-smuggling indicator.
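
One way to approximate the "verified MZ/PE" check is to decode each long base64 run and follow the DOS header's e_lfanew field (offset 0x3C) to the PE signature. The 64-character minimum and the function names are assumptions for this sketch.

```python
import base64
import re
import struct

# Runs of at least 64 base64 characters (hypothetical minimum length).
B64_BLOB = re.compile(rb"[A-Za-z0-9+/]{64,}={0,2}")

def looks_like_pe(blob: bytes) -> bool:
    """Check the MZ magic, then follow e_lfanew to the 'PE\\0\\0' signature."""
    if len(blob) < 0x40 or blob[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", blob, 0x3C)
    return blob[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

def find_base64_pe(text: bytes):
    """Return offsets of base64 blobs that decode to a PE image."""
    hits = []
    for match in B64_BLOB.finditer(text):
        try:
            decoded = base64.b64decode(match.group(), validate=True)
        except ValueError:  # bad padding or length; not a clean base64 blob
            continue
        if looks_like_pe(decoded):
            hits.append(match.start())
    return hits
```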

Embedded Windows executable payload in PDF stream critical PDF_EMBEDDED_PE_PAYLOAD

PDF stream bytes contain an embedded MZ/PE executable payload.

Exploit chains sometimes hide droppers inside ordinary PDF stream bytes rather than as declared /EmbeddedFile attachments. A verified PE header inside a PDF stream is strong staged-payload evidence.

Embedded export-and-launch chain — CVE-2010-1240 likely critical CVE_2010_1240_EMBEDDED_EXPORT_LAUNCH

PDF combines /Launch, EmbeddedFiles/EF, and exportDataObject with nLaunch:0.

This rule covers CVE-2010-1240-style documents where the attached payload is not a clean PE executable but the Adobe Reader drop-and-launch mechanism is explicit. The rule requires /Launch plus an embedded-file name tree and exportDataObject(... nLaunch:0), which is not a benign attachment workflow.

FORCEDENTRY fake-GIF PDF/JBIG2 exploit shape critical CVE_2021_30860

File starts as a GIF, contains a secondary PDF body, and the carved PDF has JBIG2 stream anomalies consistent with FORCEDENTRY-style CoreGraphics exploitation.

Project Zero's FORCEDENTRY analysis describes CVE-2021-30860 as a fake GIF that routes into the CoreGraphics PDF parser and then exercises a malicious JBIG2 stream. The analyzer emits this CVE only when the wrapper and the malicious JBIG2 child indicators are both present, which keeps plain scanned-PDF JBIG2 usage out of the CVE bucket.

Fake 'free download' SEO-poisoning PDF critical PDF_SEO_FAKE_DOWNLOAD

ML-flagged PDF that also carries a download/call-to-action lure and an off-domain downloadN.php?file=document gateway link.

The mass-generated 'free PDF download' / fake-document family ranks in search results for a lure query, then funnels the victim through an off-domain server-side download gateway (e.g. /download3.php?q=<name>.pdf) to malware, scareware, or ad-fraud redirects. The pages pad themselves with benign decoy links to dilute classifier scores, so the ML hit alone lands only in the suspicious band. This rule fires only on the conjunction of the ML hit, a visual download lure, and the gateway link — a combination benign PDFs essentially never carry — and promotes the verdict to malicious.

Flash ActionScript-3 exploit loader in PDF critical PDF_FLASH_AS3_EXPLOIT_LOADER

Embedded SWF's ActionScript-3 bytecode loads and executes an inner SWF from raw bytes (allowLoadBytesCodeExecution) and/or Vector heap-spray groomers.

The embedded Flash object's ActionScript-3 constant pool (recovered by the SWF/ABC parser) references exploit-loader primitives: LoaderContext.allowLoadBytesCodeExecution (to execute a second-stage SWF decompressed into a ByteArray) and/or Vector.<uint>/Vector.<Number> heap/JIT spray groomers with raw byte writes. Benign Flash content (sound players, scrollable text, Flex widgets) never loads and runs code from raw bytes; this is a staged Flash exploit delivered through the document. The specific Flash CVE lives in the inner second-stage SWF, so attribution stays family-level.

Hidden ZIP with executable payloads in PDF stream critical PDF_HIDDEN_ZIP_EXECUTABLE_PAYLOAD

PDF stream contains a hidden ZIP archive with executable entries.

PDFs can legitimately carry attachments, but normal attachments are declared through /EmbeddedFile, /EmbeddedFiles, or /EF metadata so the viewer and user can treat them as attachments. This rule looks for the different pattern of raw ZIP local-file headers hidden inside ordinary PDF stream bytes, then only fires when ZIP entry names end in executable payload extensions such as .dll, .exe, .scr, .ps1, .hta, or .lnk. Legitimate reasons for DLLs inside a PDF are very rare; a software manual or PDF portfolio should use explicit attachment metadata, not a concealed stream archive.
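
A minimal sketch of the local-file-header scan: each ZIP entry begins with the signature `PK\x03\x04`, the file-name length is a little-endian 16-bit value at offset 26, and the name itself starts at offset 30. The function name is hypothetical, and the extension list simply repeats the examples given above.

```python
import struct

EXEC_EXTS = (".dll", ".exe", ".scr", ".ps1", ".hta", ".lnk")
LOCAL_HEADER = b"PK\x03\x04"

def executable_zip_entries(stream: bytes):
    """Return entry names with executable extensions from raw ZIP local headers."""
    names = []
    pos = stream.find(LOCAL_HEADER)
    while pos != -1:
        if pos + 30 <= len(stream):
            (name_len,) = struct.unpack_from("<H", stream, pos + 26)
            raw_name = stream[pos + 30:pos + 30 + name_len]
            name = raw_name.decode("utf-8", "replace")
            if name.lower().endswith(EXEC_EXTS):
                names.append(name)
        pos = stream.find(LOCAL_HEADER, pos + 4)
    return names
```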

JBIG2Decode generic heap-spray exploit — CVE-2009-0658 likely critical CVE_2009_0658_GENERIC_SPRAY

PDF combines JBIG2Decode image streams with JavaScript heap-spray or decoder scaffolding.

The exact CVE-2009-0658 rule requires stronger Reader-version or decoded-shellcode fingerprints. This likely rule requires JBIG2Decode plus exploit-preparation JavaScript such as unescape heap-spray builders, large arrays, fromCharCode decoders, or eval dispatch, which is the static shape of the older Adobe Reader JBIG2 exploit family.

JavaScript heap-spray padding critical PDF_JS_HEAP_SPRAY_PADDING

A deflated /JS stream inflates into a large blob that is almost entirely whitespace wrapped around a small code core.

Classic JavaScript heap-spray shape: a tiny deflated /JS object expands to megabytes of tabs/spaces/newlines — the spray buffer that positions shellcode at a predictable address before a PDF parser CVE is triggered. Benign PDF JavaScript is never megabytes of whitespace, so this is treated as malicious on its own, even when the inner exploit stage cannot be decoded to an exact CVE.
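
The shape described above can be approximated by inflating the stream and measuring how much of it is whitespace. The size and ratio thresholds below are assumptions for illustration; the analyzer's actual cut-offs are not stated.

```python
import zlib

def is_spray_padding(deflated: bytes,
                     min_size: int = 1_000_000,
                     min_ratio: float = 0.95) -> bool:
    """True when the inflated stream is large and almost entirely whitespace."""
    # A production scanner would cap the output size while inflating.
    inflated = zlib.decompress(deflated)
    if len(inflated) < min_size:
        return False
    whitespace = sum(inflated.count(c) for c in b" \t\r\n")
    return whitespace / len(inflated) >= min_ratio
```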

Known malicious redirector link critical PDF_MALICIOUS_REDIRECTOR_LINK

PDF links to redirector infrastructure used by a known malicious PDF campaign.

This rule matches clickable PDF URIs pointing to known redirector infrastructure from the SEO/adware PDF campaign that used ttraff and related domains. The PDF itself is normally a redirect carrier: user interaction sends the browser through the redirector chain, which can end in unwanted software or malware delivery. This is not evidence of a PDF parser CVE.

Launch VBS dropper command chain — CVE-2010-1240 likely critical CVE likely CVE_2010_1240_LAUNCH_VBS_DROPPER

PDF /Launch invokes cmd.exe to build a VBS ADODB.Stream/XMLHTTP/FileSystemObject dropper.

CVE-2010-1240 covers Adobe Reader/Acrobat Launch File dialog abuse. This variant does not rely on PDF EmbeddedFiles; instead the Launch command constructs VBS that either reopens the PDF itself and extracts an appended byte range, or downloads a payload with XMLHTTP, saves it via ADODB.Stream/FileSystemObject, and runs it. The rule requires cmd.exe from /Launch plus VBS dropper APIs to avoid tagging ordinary Launch actions.

Launch action critical PDF_LAUNCH

PDF contains a /Launch action to start an external application.

A Launch action can start an external application when the action is activated, or on open if it is wired to an open trigger. This is a high-risk PDF feature and is useful evidence when reviewing a document.

Launch/export embedded executable chain — CVE-2010-1240 likely critical CVE_2010_1240_EMBEDDED_PE_EXPORT

PDF combines /Launch, EmbeddedFiles/EF, exportDataObject, and embedded executable bytes.

This conservative variant of the CVE-2010-1240 detector covers samples where the Launch dictionary does not expose the strict cmd.exe /Win shape. Requiring all four surfaces keeps benign attachments out while attributing the same drop-and-launch abuse chain.

Obfuscated multi-stage PDF JavaScript heap-spray exploit critical PDF_JS_OBFUSCATED_MULTISTAGE_HEAPSPRAY

PDF JS behind nested filters / a custom rolling-XOR decoder de-obfuscates to a heap-spray / ROP chain.

The PDF JavaScript is hidden behind nested stream filter chains (e.g. ASCIIHexDecode/FlateDecode/ASCIIHexDecode) and/or a custom in-JS decoder (a rolling-XOR 'ffts' stager that XORs each byte with a feedback key and evals the result). After the analyzer unwinds those layers, the recovered stage contains a heap-spray / ROP chain (repeated 0c0c/9090/4141 landing words plus shellcode). A spray that only appears after de-obfuscation is never benign — this is an obfuscated multi-stage Adobe Reader JavaScript exploit. ClamAV often labels these by the dropped Windows payload (Win.Trojan.Agent), which is the second stage, not the delivery; the family is attributed at related confidence because the exact Reader CVE trigger may sit in an even deeper layer.

Pidief-style multi-CVE JavaScript dispatcher critical PDF_PIDIEF_MULTI_CVE_DISPATCH

Single PDF JavaScript body branches on viewerVersion and invokes multiple Reader CVE sinks.

The 2009-2010 Pidief.J template carries three Reader exploits in one PDF: CVE-2007-5659 (Collab.collectEmailInfo), CVE-2008-2992 (util.printf with a field-width %f format string), and CVE-2009-0927 (Collab.getIcon). A small dispatcher reads app.viewerVersion and fires the matching sink. The rule requires both a viewerVersion switch and two or more distinct CVE sinks in the same JavaScript body, so it doesn't fire on benign code that mentions one of them.

PowerShell download cradle in PDF critical PDF_PS_DOWNLOAD_CRADLE

PDF action body contains a PowerShell download-and-execute cradle.

Patterns matched include `Invoke-Expression(Invoke-RestMethod ...)`, `IEX(IRM ...)`, `(New-Object Net.WebClient).DownloadString`, `[Net.WebClient]`, `[Net.ServicePointManager]::SecurityProtocol`, and `powershell -ep Bypass -enc <base64>`. These strings are rare in benign PDFs; their presence is strong evidence of the payload-staging stage of an attack chain (MITRE T1059.001 + T1105).
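
A few of the listed patterns can be expressed as regular expressions, as in the sketch below. These expressions are illustrative assumptions, not the analyzer's actual signatures, and they cover only part of the list.

```python
import re

# Hypothetical expressions for a subset of the cradle shapes listed above.
CRADLE_PATTERNS = [
    re.compile(r"Invoke-Expression\s*\(\s*Invoke-RestMethod", re.I),
    re.compile(r"\bIEX\s*\(\s*IRM\b", re.I),
    re.compile(r"New-Object\s+Net\.WebClient\s*\)\s*\.DownloadString", re.I),
    re.compile(r"powershell\s+-ep\s+Bypass\s+-enc\s+[A-Za-z0-9+/=]+", re.I),
]

def matches_cradle(action_text: str) -> bool:
    """True when the action body matches any download-cradle pattern."""
    return any(p.search(action_text) for p in CRADLE_PATTERNS)
```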

Repeated invisible payload link critical PDF_REPEATED_PAYLOAD_LINK_LURE

PDF uses invisible/repeated links to deliver a direct payload file.

Repeated invisible link annotations pointing to an archive or executable match malware-delivery PDFs where the visible page is just a lure and the actual payload is downloaded from the linked URL. Lure-like filenames such as document/unlock/verify archives further increase confidence.

RichMedia Flash exploit — CVE-2011-0611 likely critical CVE likely CVE_2011_0611_FLASH_RICHMEDIA

PDF combines RichMedia Flash activation, an AS3 ByteArray/loadBytes SWF, and shellcode staging.

CVE-2011-0611 affects Adobe Flash Player and Adobe Reader's Authplay Flash handling. This rule requires a RichMedia Flash annotation, an embedded SWF with AS3 ByteArray/loadBytes loader logic, and either PDF-side or SWF-internal shellcode/heap-spray staging. Those gates separate exploit-delivery PDFs from ordinary RichMedia content.

SEO/link-farm PDF carrier critical PDF_SEO_LINK_FARM

Small PDF contains many clickable external PDF links clustered on one host.

Generated malicious PDF campaigns often use scraped text and many clickable PDF URLs to make documents appear in search results and route users through attacker-controlled or compromised link farms. The rule is gated on a small file with many external .pdf URI actions clustered on one host to avoid treating ordinary references as malware. This is social-engineering and redirect-chain evidence, not a PDF parser CVE fingerprint.

Shell.Application.ShellExecute COM pivot critical PDF_SHELL_APPLICATION_PIVOT

PDF (or its embedded JavaScript stub) instantiates Shell.Application and calls ShellExecute.

Some PDF readers prompt the user before honouring a /Launch action. Attackers sidestep that prompt by having a JScript stub (typically loaded via mshta as a `javascript:` URL) instantiate the Shell.Application COM object and call ShellExecute to spawn the next-stage process — the reader's /Launch warning may not fire because the spawn happens inside the mshta host, not inside the PDF reader.

Time-locked SHA-1 XOR JavaScript loader critical PDF_JS_TIME_LOCKED_SHA1_LOADER

PDF JS embeds a SHA-1 routine keyed on the victim's wall-clock minute to XOR-decrypt and eval a payload.

The PDF JavaScript embeds a full inline SHA-1 implementation and derives an XOR key from the victim's wall-clock minute (getHours()+''+getMinutes()); the digest decrypts a small ciphertext array that is then eval'd, with a catch{} block retrying the previous minute for clock-rollover tolerance. The wall-clock keying is a deliberate anti-static / anti-sandbox measure — the decryption key only exists at the victim's open time, so the encrypted stage cannot be recovered by static analysis (~1440 possible minute keys, no validation oracle). The loader construction is nonetheless conclusive: no legitimate PDF hashes the clock to eval code. This is a known pre-2011 PDF exploit-kit loader whose encrypted second stage is an Adobe Reader JavaScript exploit; attributed at related confidence as an exploit-kit family because the exact CVE trigger is time-locked out of view.

U3D parser exploit with JavaScript heap spray — CVE-2011-2462 likely critical CVE likely CVE_2011_2462_U3D_HEAPSPRAY

PDF combines U3D/3D annotation content with JavaScript heap-spray shellcode.

Public CVE-2011-2462 exploit chains use a crafted U3D stream and JavaScript heap spray to control memory during Adobe Reader's U3D parser memory corruption. The rule requires both U3D/3D content and a heap-spray JavaScript shape, avoiding attribution for ordinary 3D PDFs.

U3D/RichMedia activation — CVE-2011-2462 likely critical CVE_2011_2462_RICHMEDIA_U3D

PDF combines U3D stream markers with RichMedia and JavaScript/XFA activation surfaces.

This rule covers U3D exploit documents where the U3D marker is present in stream data but not exposed through the canonical /Subtype /U3D dictionary. Requiring RichMedia plus active JavaScript/XFA surfaces keeps the rule focused on weaponized CVE-2011-2462-style delivery documents rather than benign 3D assets.

VBScript decimal byte array PE payload in PDF critical PDF_VBS_DECIMAL_ARRAY_PE_PAYLOAD

PDF comment text contains a decimal byte array that decodes to a verified MZ/PE executable payload.

Some malicious PDFs hide a Windows executable in commented VB/VBScript-style source lines such as Array(c(077),c(090),...). The detector only fires when that concealed decimal array decodes to a valid MZ/PE header, which keeps the rule focused on staged payloads rather than ordinary numeric arrays.
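
Decoding such an array is straightforward, as the sketch below shows. It assumes the `c(NNN)` element form quoted above and stops at the MZ magic; a fuller check would also follow e_lfanew to the PE signature. Function names are hypothetical.

```python
import re

# Matches elements such as c(077) or c( 90 ).
ELEMENT = re.compile(rb"c\(\s*(\d{1,3})\s*\)")

def decode_decimal_array(text: bytes) -> bytes:
    """Decode c(NNN) elements into bytes; return b'' if any value exceeds 255."""
    values = [int(m.group(1)) for m in ELEMENT.finditer(text)]
    if any(v > 255 for v in values):
        return b""
    return bytes(values)

def array_starts_with_mz(text: bytes) -> bool:
    """True when the decoded array begins with the DOS 'MZ' magic."""
    return decode_decimal_array(text).startswith(b"MZ")
```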

exportDataObject + nLaunch — embedded-file dropper critical PDF_JS_EXPORT_LAUNCH_DROPPER

PDF JavaScript calls exportDataObject() with nLaunch set, extracting and launching the document's embedded file on open.

exportDataObject({cName:..., nLaunch:2}) writes the PDF's embedded file to a temp folder and opens it in its default handler — a launch-on-open dropper. The embedded file is the real payload (commonly a VelvetSweatshop-encrypted Office document wrapping an Equation Editor exploit, or a script/executable). No benign PDF workflow auto-launches an extracted attachment, so this is a high-confidence malicious-delivery indicator.

/OpenAction targets an object not reachable from /Root high PDF_OPENACTION_HIDDEN_OBJECT

PDF defines an /OpenAction whose target object cannot be reached by walking indirect references from the document /Root catalog tree.

When a PDF is opened, the /OpenAction fires regardless of whether the target object is reachable from the catalog. Many static analysers and indexers enumerate the document via the /Root tree and never see hidden objects — yet the action still runs. This shape is associated with evasive samples that hide JavaScript or launch actions outside the normal catalog descent.

Adobe Reader APSB08-13 patch-range version gate (CVE-2007-5659) high PDF_JS_ADOBE_APSB08_13_PATCH_GATE

PDF JavaScript gates the payload on the Reader 7.0.x / 8.0–8.1.1 window.

A version gate of (>= 8 && < 8.11) OR (< 7.1) is the exact Reader release window patched by Adobe APSB08-13 for CVE-2007-5659 (Collab.collectEmailInfo buffer overflow). Pidief-family PDFs use this gate to fire the collectEmailInfo trigger only on vulnerable Readers and stay quiet on patched ones.

Adobe Reader APSB09-15 patch-range version gate (CVE-2009-3459) high PDF_JS_ADOBE_APSB09_15_PATCH_GATE

PDF JavaScript gates the payload on the exact Adobe APSB09-15 patch boundary.

A single JS body that simultaneously checks Reader version against 9.2, 8.17 (=8.1.7), and 7.14 (=7.1.4) is fingerprinting the APSB09-15 patch range, which covered CVE-2009-2990 and CVE-2009-3459 (Adobe Reader U3D parser bugs). No benign script tests all three of those Reader version points together; this is exploit-kit dispatcher logic.

Adobe viewer lure links off-domain high PDF_ADOBE_VIEWER_LURE

PDF uses Adobe secure-document/viewer lure wording and links to a non-Adobe host.

This catches fake 'View on Adobe', 'OnlineAdobe', or secured-PDF viewer carriers. The rule requires a clickable HTTP(S) destination outside Adobe-owned domains, so ordinary documents that merely mention Adobe do not match. Image-heavy or sparse-text PDFs using this wording are common credential-phishing carriers.

Annotation subject callee-key hex JavaScript stager high PDF_ANNOT_SUBJECT_CALLEE_HEX_STAGER

PDF JavaScript decodes an annotation /Subject payload with marker replacement and a callee.toString-derived key.

Agent-359xx/361xx style PDFs use syncAnnotScan()/getAnnots() only as a staging primitive: JavaScript reads an indirect annotation /Subject stream, rewrites marker bytes such as F/A/E or z to percent signs, or splits short delimiter-prefixed hex bytes such as mz/xyz. The recovered second-stage decoder then derives a small key from arguments.callee.toString() or an embedded numeric table to decode the final exploit JavaScript. The rule is emitted only after static decoding recovers exploit-like JavaScript; the exact CVE is then assigned by scanning the recovered stage for the real vulnerable API.

Base-N pair JavaScript stager high PDF_BASE_N_PAIR_JS_STAGER

PDF JavaScript rebuilds an exploit stage from base-N character pairs.

Some PDF exploit kits store the real payload as a long string of two-character tokens, decode each pair with parseInt(radix), turn the bytes into JavaScript with String.fromCharCode, and eval the result. The rule is bounded to long pair tables in JavaScript streams and only fires when the recovered stage contains concrete exploit markers.
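
The pair decoding mirrors parseInt(pair, radix) followed by String.fromCharCode. A minimal sketch, with a hypothetical function name:

```python
def decode_pairs(table: str, radix: int) -> str:
    """Decode a string of two-character base-N tokens into text."""
    # A trailing unpaired character, if any, is ignored.
    pairs = [table[i:i + 2] for i in range(0, len(table) - 1, 2)]
    return "".join(chr(int(pair, radix)) for pair in pairs)
```

For example, `decode_pairs("6576616c", 16)` yields `eval`.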

Brand impersonation link high PDF_BRAND_IMPERSONATION_LINK

PDF links to a Microsoft-login impersonation domain.

The URI host imitates Microsoft login or Microsoft Online branding but is not a Microsoft-owned domain. This is a credential-phishing signal and does not require active PDF code to be dangerous.

CCITTFaxDecode + active content — LibTIFF CVE-family indicator high PDF_CCITT_CVE_2010_0188_RELATED

PDF uses CCITTFaxDecode alongside active-content indicators.

CVE-2010-0188 was widely exploited as a LibTIFF integer overflow primitive delivered via PDF. The rule matches /CCITTFaxDecode plus JavaScript, XFA, or RichMedia evidence; it does not validate malformed TIFF/CCITT data.

CFF CharString excessive subroutine calls high PDF_CFF_CHARSTRING_SUBR_STORM

CFF CharStrings contain an unusually high number of subroutine calls.

Dense callsubr/callgsubr usage stresses call-stack, bias, and bounds logic in CFF interpreters.

CFF CharString operand stack underflow high PDF_CFF_CHARSTRING_STACK_UNDERFLOW

Type 2 CharString bytecode invokes an operator without enough operands.

Underflow forces different font interpreters to reject, pad, or continue from corrupted state, which is the kind of parser divergence used by font-engine exploits.

CFF INDEX has an invalid offSize high PDF_CFF_OFFSIZE_INVALID

CFF INDEX or header declares an offSize outside the spec-allowed 1..4 range.

Implementations that accept the invalid value as the offset-array stride read or write misaligned data, a shape associated with multiple Acrobat font-engine CVEs.

CFF INDEX offset array is not monotonically non-decreasing high PDF_CFF_INDEX_NOT_MONOTONIC

CFF INDEX's offset array contains entries that decrease, so successive elements appear in unexpected order.

Renderers that compute element sizes via subtraction (offset[i+1] - offset[i]) read negative or implausibly large values when offsets are non-monotonic — a known bug class in CFF parsers.

CFF INDEX offsets extend past stream end high PDF_CFF_INDEX_OFFSET_OVERFLOW

CFF INDEX offset array or data section is declared to extend beyond the available font bytes.

Renderers that follow the declared offsets read attacker-influenced bytes from adjacent memory. This is the structural shape behind several Acrobat CFF-parser CVEs.
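
The three INDEX checks above (offSize, monotonic offsets, offsets past the end) can be sketched together. The layout follows the CFF INDEX structure: a 2-byte count, a 1-byte offSize, count+1 big-endian offsets, then the data. The function name and finding labels are illustrative assumptions, not the analyzer's identifiers.

```python
def check_cff_index(buf: bytes, pos: int = 0) -> list[str]:
    """Return anomaly labels for the CFF INDEX that starts at buf[pos]."""
    if pos + 2 > len(buf):
        return ["truncated"]
    count = int.from_bytes(buf[pos:pos + 2], "big")
    if count == 0:
        return []  # an empty INDEX consists of the two-byte count only
    if pos + 3 > len(buf):
        return ["truncated"]
    off_size = buf[pos + 2]
    if not 1 <= off_size <= 4:
        return ["offsize_invalid"]
    array_start = pos + 3
    array_end = array_start + (count + 1) * off_size
    if array_end > len(buf):
        return ["offset_overflow"]
    offsets = [
        int.from_bytes(buf[i:i + off_size], "big")
        for i in range(array_start, array_end, off_size)
    ]
    findings = []
    if any(later < earlier for earlier, later in zip(offsets, offsets[1:])):
        findings.append("not_monotonic")
    # Offsets are 1-based, relative to the byte just before the data section.
    data_start = array_end - 1
    if data_start + max(offsets) > len(buf):
        findings.append("offset_overflow")
    return findings
```

The sketch stops at the first header-level problem because the offset array cannot be interpreted reliably once offSize is invalid.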

CFF Private DICT offset points outside font high PDF_CFF_PRIVATE_DICT_OUT_OF_RANGE

CFF Top DICT points the Private DICT outside the embedded font stream.

The Private DICT controls subroutine and hint metadata. Out-of-range Private offsets create parser divergence and are a useful font-engine exploit primitive.

CFF2 BLEND operand-stack growth high PDF_CFF2_BLEND_STACK_OVERFLOW

CFF2 blend bytecode grows the operand stack beyond expected bounds.

Large operand-stack growth in CFF2 CharStrings is a font-engine parser exploit surface, especially when BLEND operators are present.

CFF2 BLEND operand-stack underflow high PDF_CFF2_BLEND_STACK_UNDERFLOW

CFF2 blend/stack bytecode consumes operands that are not available.

Operand-stack desynchronisation around BLEND and arithmetic operators is the key static shape from Project Zero's Adobe CoolType BLEND analysis.

CFF2 CharString repeated BLEND operators high PDF_CFF2_BLEND_STORM

Embedded CFF2 font bytecode contains repeated BLEND operators.

CFF2 blend operators drive variable-font CharString stack handling. Repeated BLEND usage in embedded PDF fonts is a strong parser-stress signal related to the Adobe CoolType BLEND bug class described by Project Zero.

Character-table JavaScript eval stager high PDF_JS_CHAR_TABLE_EVAL_STAGER

PDF JavaScript rebuilds an exploit stage through character-table indexes and eval.

Older PDF exploit kits hide the real Adobe Reader exploit APIs by keeping a small alphabet string and appending hundreds of single-character substr/charAt lookups into an array before join()+eval. This rule is emitted only after the bounded static decoder reconstructs an exploit-like stage, making it a low-cost fallback when exact CVE signatures are unavailable or were hidden from the first scan pass.

Clickable URI hides destination behind an obfuscated IP high PDF_URI_OBFUSCATED_IP_HOST

PDF clickable URI hides its real host as an obfuscated IP literal or behind a brand-looking user@ userinfo.

Legitimate links never encode their host as hex/octal/dword integer octets, and never hide a real IP destination behind a brand-shaped 'name@' user-info prefix that the browser discards. Both forms exist only to make the rendered link read like a trusted site while routing to disposable IP infrastructure — a hallmark of phishing and malware redirectors.
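
A rough sketch of both checks, using only the standard library. The patterns, function name, and reason labels are assumptions made for illustration; the analyzer's real matching logic may differ.

```python
import re
from urllib.parse import urlsplit

_DWORD_HOST = re.compile(r"^\d{8,10}$")             # e.g. 3232235786
_HEX_HOST = re.compile(r"^0x[0-9a-f]+$", re.I)      # e.g. 0xC0A8010A
_ENCODED_OCTET = re.compile(r"^(0[0-7]+|0x[0-9a-f]{1,2})$", re.I)
_NUMERIC_LABEL = re.compile(r"^(0x[0-9a-f]+|\d+)$", re.I)
_DOTTED_QUAD = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def obfuscated_ip_reasons(uri: str) -> list[str]:
    """Return reasons why the URI host looks like a disguised IP address."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    reasons = []
    if _DWORD_HOST.match(host) or _HEX_HOST.match(host):
        reasons.append("integer_host")
    labels = host.split(".")
    if (
        len(labels) == 4
        and all(_NUMERIC_LABEL.match(label) for label in labels)
        and any(_ENCODED_OCTET.match(label) for label in labels)
    ):
        reasons.append("encoded_octets")
    # A user-info prefix in front of an IP host is discarded by the browser.
    if parts.username and (_DOTTED_QUAD.match(host) or reasons):
        reasons.append("userinfo_hides_ip")
    return reasons
```

For example, `obfuscated_ip_reasons("http://paypal.com@192.168.1.10/login")` reports the user-info case, while an ordinary `https://example.com/` link produces no reasons.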

Compressed object stream hides active PDF content high PDF_OBJSTM_ACTIVE_CONTENT

A PDF /ObjStm stream contains active-content keys such as /JavaScript or /OpenAction.

Object streams are valid PDF, but hiding executable objects inside compressed object streams is a common way to bypass simple static scanners that do not expand /ObjStm content.
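
Expanding object streams before scanning can be sketched as below. It assumes a single /FlateDecode filter with no predictor and uses a regex instead of a real PDF parser, so it is a simplification; the key list and function name are illustrative.

```python
import re
import zlib

# Keys named in the rule description; a fuller list is an assumption left out here.
ACTIVE_KEYS = (b"/JavaScript", b"/OpenAction")
_STREAM = re.compile(rb"<<(.*?)>>\s*stream\r?\n(.*?)\nendstream", re.S)


def objstm_active_keys(pdf: bytes) -> set:
    """Return active-content keys found inside Flate-compressed /ObjStm streams."""
    found = set()
    for dictionary, body in _STREAM.findall(pdf):
        if b"/ObjStm" not in dictionary or b"/FlateDecode" not in dictionary:
            continue
        try:
            # decompressobj tolerates stray bytes after the zlib stream ends.
            expanded = zlib.decompressobj().decompress(body)
        except zlib.error:
            continue
        found.update(key for key in ACTIVE_KEYS if key in expanded)
    return found
```

A scanner that only searches the raw file bytes would miss these keys, which is the gap the rule is meant to cover.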

Direct payload download link high PDF_DIRECT_PAYLOAD_LINK

PDF clickable URI points directly to an executable, script, shortcut, disk image, or archive.

A document that links straight to a runnable payload or archive is a delivery risk. Legitimate manuals can link to installers, so this rule is strongest when combined with invisible link annotations, repeated links, or social-engineering wording.

DocuSign download lure links off-domain high PDF_DOCUSIGN_DOWNLOAD_LURE

PDF contains DocuSign-themed download/signing lure text and links to a non-DocuSign host.

Credential-phishing PDFs often imitate DocuSign with a fake document/download/signing prompt. The high-signal part is not the brand word alone; it is DocuSign lure text plus a clickable HTTP(S) action whose destination is outside DocuSign-owned domains. Legitimate DocuSign workflows keep the action on DocuSign infrastructure.

Document-phishing landing link high PDF_DOCUMENT_PHISHING_LINK

PDF links to a non-reputable host using a document-phishing landing path.

Some secure-document phishing PDFs rasterize the visible DocuSign/Adobe lure text or encode it through custom fonts, so in-process text extraction cannot always recover the brand wording. This rule catches the destination side instead: small linked PDFs whose clickable URL uses known document-phishing landing shapes such as wp-admin/wp-admin-like staging, project/folder/rfq index.php routes, repeated slm2 paths, or heavily percent-encoded symbol slugs on non-reputable hosts.

Embedded JS stream high PDF_JS

PDF references a /JS stream with inline JavaScript code.

An inline JavaScript stream can contain obfuscated exploit code that triggers when the PDF is opened. It is a red flag unless the PDF is a known interactive form.

Embedded script payload in PDF stream high PDF_EMBEDDED_SCRIPT_PAYLOAD

PDF stream bytes contain Windows or HTML script execution markers.

ActiveXObject/CreateObject, WScript.Shell, PowerShell, ADODB.Stream, and HTML <script> markers inside ordinary PDF streams indicate a hidden second-stage script payload rather than normal PDF JavaScript.

Encrypted PDF carrying executable triggers high PDF_ENCRYPTED_WITH_JS

PDF declares /Encrypt and also contains /JavaScript, /JS, /OpenAction, /AA, or /Launch — payload is hidden from static analysis.

Document encryption hides the JavaScript body and stream contents from static scanners, so its combination with executable triggers is unusual and worth review. Real-world droppers may use empty user passwords so the reader decrypts and runs the payload without prompting for a password.

Escaped URI image lure high PDF_ESCAPED_URI_IMAGE_LURE

PDF image lure hides its clickable HTTP(S) URI with PDF octal string escapes.

PDF literal strings may legally encode characters as octal escapes, but phishing carriers often encode URL punctuation this way so simple URL extractors miss the destination. Combined with an image-heavy, low-text document, this is a strong screenshot-lure signal.
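
As a small illustration, octal escapes in a literal string can be normalised before URL extraction. This sketch handles only the \ddd form (one to three octal digits) and ignores the other PDF escapes such as \n or an escaped backslash; the function name is hypothetical.

```python
import re

_OCTAL_ESCAPE = re.compile(rb"\\([0-7]{1,3})")


def decode_pdf_octal_escapes(literal: bytes) -> bytes:
    """Replace \\ddd octal escapes in a PDF literal string with raw bytes."""
    def _replace(match):
        # High-order overflow is ignored, so keep only the low eight bits.
        return bytes([int(match.group(1).decode("ascii"), 8) & 0xFF])

    return _OCTAL_ESCAPE.sub(_replace, literal)
```

Running a URL extractor on the decoded bytes rather than the raw string recovers destinations such as `\150\164\164\160\072\057\057...`, which decodes to `http://...`.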

Free-generator / game-hack redirector lure high PDF_GAME_HACK_REDIRECT_LURE

PDF's clickable action targets a /app/<id>/<slug>-game-hack redirector.

This rule targets landing-page PDFs from a large SEO 'free spins / generator / game hack' lure family (Coin Master, Robux, Roblox, TikTok, etc.). The single clickable action is a redirector of the shape https://<rotating-host>/app/<numeric-id>/<slug>-game-hack that funnels victims through disposable hosts to a malware or scam payload. The host rotates, so the rule anchors on the highly specific URL path; the multi-link variants also trip ML/link-farm rules, while this catches the single-link variants that otherwise score clean.

Hidden HTML iframe in PDF high PDF_HIDDEN_HTML_IFRAME

PDF bytes contain a zero-size external HTML iframe.

A hidden iframe pointing to an external URL is a browser exploit-kit or redirect/dropper pattern. It is not normal PDF structure, so this rule is high-signal while remaining cheap to evaluate.

ICC tag offset+size lies outside the profile high PDF_ICC_TAG_OUT_OF_RANGE

ICC tag entry points at byte ranges outside the embedded profile (or inside the tag-table region).

Colour-management stacks that follow the offset blindly read attacker-influenced bytes from adjacent memory. This shape has driven multiple ICC-parser CVEs and remains a regular finding in font/colour fuzzing campaigns.
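
The range check can be sketched against the ICC layout: a 128-byte header, a 4-byte big-endian tag count, then 12-byte tag entries (signature, offset, size). The function name and finding labels are illustrative assumptions.

```python
import struct


def icc_tag_findings(profile: bytes) -> list[tuple[str, str]]:
    """Return (tag signature, problem) pairs for an embedded ICC profile."""
    if len(profile) < 132:
        return [("", "truncated")]
    tag_count = struct.unpack_from(">I", profile, 128)[0]
    table_end = 132 + 12 * tag_count
    if table_end > len(profile):
        return [("", "tag_table_truncated")]
    findings = []
    for i in range(tag_count):
        sig, offset, size = struct.unpack_from(">4sII", profile, 132 + 12 * i)
        name = sig.decode("latin-1")
        if offset < table_end:
            # Tag data must not start inside the header or the tag table.
            findings.append((name, "inside_tag_table"))
        elif offset + size > len(profile):
            findings.append((name, "out_of_range"))
    return findings
```

Checking the table length before iterating keeps the loop bounded even when the declared tag count is implausibly large.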

Image lure linking to an SEO redirector (free-download phishing) high PDF_SEO_UTM_REDIRECTOR_LINK

PDF image lure with a clickable multi-word utm_term / FeedBurner-proxied SEO redirector link — the 'free ebook/manual download' phishing family.

The 'free ebook / solution-manual / document download' SEO-phishing family ships a tiny image-only (or image + filler-text) PDF whose single clickable /URI is a search-keyword gateway — a multi-word utm_term/keyword redirector (the natural-language phrase the page ranks for) or a FeedBurner-proxied feedproxy.google.com/~r/.../uplcv hop abusing a trusted Google host. The PDF carries no exploit; the payload lives on the linked destination. The broader PDF_SEO_DISPOSABLE_LINK_FARM rule needs many links, so single-link variants slip through, and ClamAV/ML miss the ones padded with a few text pages. This rule pairs the redirector with an image lure to flag HIGH structurally — independent of any ClamAV/ML signature and regardless of text-page count. The redirector alone (no image lure) is surfaced at LOW as an IOC only.

Image lure with local builder path and remote links high PDF_IMAGE_LURE_LOCAL_FILE_AND_REMOTE_URI

Image-only PDF contains both remote HTTP(S) links and a local file:/// builder path.

A scanned or photo PDF may legitimately be image-only, and a normal document may contain external links. The suspicious combination here is narrower: a click-action image lure exposes a local file:/// path such as a user desktop, appdata, temp, or generator work directory while also linking to remote web infrastructure. That points to generated clickbait/phishing carriers rather than a normal document workflow.

Image-heavy PDF with invisible suspicious link high PDF_SUSPICIOUS_LINK_LURE

PDF uses invisible link annotations over image-heavy content to send users to a suspicious URI.

This rule combines three conditions: a small image-heavy PDF, invisible link annotations (border [0 0 0]), and a URI on a high-risk TLD with account/login/verify/security/support-style wording or a staged path such as /step1. That combination is stronger than a generic external link because it matches credential-phishing PDFs where the visible document is a screenshot-like prompt and the actual collection flow is hosted on the linked site.

Image-only PDF links to deceptive (typosquat) host high PDF_IMAGE_LURE_DECEPTIVE_HOST_LINK

Image-only PDF's clickable action targets a host impersonating a service/brand word with a leetspeak digit substitution (serv1ce, upd4te, …).

Screenshot-like phishing/fake-update PDFs render an image and a single clickable action whose destination host impersonates a security, service, or brand word using a leetspeak digit in place of a letter (e.g. 'serv1ce', 'l0gin', 'm1crosoft', 'payp4l'). The rule fires only when the document is image-only with little real text AND the destination is not a known-good (Tranco/allowlisted) domain, so a legitimate flyer linking to its real site is unaffected. The digit-for-letter substitution is the deception tell.
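
The digit-for-letter test alone might look like the sketch below. The substitution map and the word list are assumptions chosen to cover the examples above, and the image-only and allowlist (Tranco) conditions are deliberately left out.

```python
import re

# Assumed digit-to-letter map: 0->o, 1->i, 3->e, 4->a, 5->s, 7->t.
_LEET = str.maketrans("013457", "oieast")
# Hypothetical lure vocabulary; the analyzer's real list is not documented here.
LURE_WORDS = frozenset({"service", "login", "update", "microsoft", "paypal", "secure"})


def leetspeak_lure_tokens(host: str) -> list[str]:
    """Return host tokens that spell a lure word once digits are mapped back."""
    hits = []
    for token in re.split(r"[.\-_]", host.lower()):
        has_digit = any(ch.isdigit() for ch in token)
        if has_digit and token.translate(_LEET) in LURE_WORDS:
            hits.append(token)
    return hits
```

Requiring at least one digit in the token means the genuine word (for example `service.example.com`) never matches; only the substituted form does. A single map cannot cover every variant (`1` can also stand for `l`), so a fuller version would try several mappings.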

Image-only PDF links to scheme-label deceptive host high PDF_IMAGE_LURE_SCHEME_HOST_LINK

Image-only PDF's clickable action targets a host beginning with a literal 'http.'/'https.' label.

Screenshot-like phishing PDFs render a brand image plus an open/download button whose destination host starts with a literal 'http'/'https' DNS label, dressing a throwaway domain up as a secure file-transfer link. Combined with the image-only lure shape this is a high-confidence credential-phishing carrier.

Image/button lure to file-hosting download high PDF_FILE_HOSTING_DOWNLOAD_LURE

PDF screenshot/button lure links to a public file-hosting download endpoint.

This rule combines multiple signals: the PDF is image-only or nearly image-only, contains a clickable PDF action/button, and the target URL points to a public file-hosting download endpoint such as Pixeldrain, Gofile, Filemail, file.io, transfer.sh, Catbox, MediaFire, or Workupload. That combination is much stronger than a generic URI or image-only PDF because it matches malware-delivery lures where the visible page is just a fake document/download prompt and the actual payload is retrieved from external hosting.

Invisible CAPTCHA web lure high PDF_CAPTCHA_LINK_LURE

PDF uses invisible links to a CAPTCHA/capcha-themed web path.

CAPTCHA-themed landing pages are commonly used in phishing and ClickFix chains to move the user out of the document and into a fake verification flow. Invisible PDF link annotations make this stronger than an ordinary visible web reference.

Invisible OAuth redirect link high PDF_OAUTH_REDIRECT_LINK_LURE

PDF uses invisible link annotations that point to an OAuth authorization URL with a redirector chain.

OAuth authorization URLs with client_id, response_type=code, and redirect_uri parameters are legitimate in web apps, but are unusual as repeated invisible PDF link targets. When the redirect_uri leads through safelink/photo-link redirect infrastructure, the shape matches credential-phishing PDFs that hide the actual collection site behind trusted-looking redirects.

JBIG2 segment refers forward to a later segment high PDF_JBIG2_FORWARD_REFERENCE

JBIG2 segment refers to one or more later segments by number.

Spec-conformant JBIG2 streams only refer backwards. Forward references are the structural shape that drove the FORCEDENTRY family of JBIG2 0-days (CVE-2021-30860 and relatives) — they confuse refcount tracking when the renderer dereferences a segment that has not been parsed yet.

JBIG2 segment refers to an undefined segment high PDF_JBIG2_REFERRED_OUT_OF_RANGE

JBIG2 segment refers to a segment number that has not been declared earlier in the stream.

This is the JBIG2 equivalent of a dangling pointer. Renderers that treat the missing segment as null follow a different code path than renderers that abort or read uninitialised state — a known parser-confusion primitive.

JBIG2 segment refers to itself high PDF_JBIG2_SELF_REFERENCE

JBIG2 segment lists its own segment number in its referred-to list.

The JBIG2 spec does not define what happens when a segment refers to itself, and renderers that follow refcount paths through self-references are a textbook exploit shape (recursive resolution, double-free, use-after-free).
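
The three JBIG2 reference checks above can be sketched over already-parsed segment headers. Parsing the headers themselves is out of scope here; the input shape (segment number plus referred-to list, in stream order) and the labels are assumptions, and this sketch treats "declared later" as a forward reference and "never declared" as undefined.

```python
def jbig2_reference_findings(segments):
    """segments: list of (segment_number, [referred_numbers]) in stream order.

    Returns (segment, target, label) tuples for suspicious references.
    """
    all_numbers = {number for number, _ in segments}
    seen = set()
    findings = []
    for number, referred in segments:
        for target in referred:
            if target == number:
                findings.append((number, target, "self_reference"))
            elif target in seen:
                continue  # ordinary backward reference
            elif target in all_numbers:
                findings.append((number, target, "forward_reference"))
            else:
                findings.append((number, target, "undefined"))
        seen.add(number)
    return findings
```

Keeping the three labels separate lets each map onto its own rule while sharing a single pass over the segment list.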

JBIG2 unknown-length form on non-generic-region segment high PDF_JBIG2_DATA_LENGTH_OVERFLOW

JBIG2 segment uses the 0xFFFFFFFF 'unknown length' form on a segment type other than generic region.

The spec restricts the unknown-length form to generic-region segments (types 36, 38, 39). Using it on other segment types lets the parser continue past the intended segment boundary and read attacker-controlled bytes from later segments.

JPEG2000 COD declares too many decomposition levels high PDF_JPX_COD_DECOMP_LEVELS_HIGH

COD marker declares more than the spec-maximum 32 wavelet decomposition levels.

Buffers and lookup tables sized from the decomposition-level count overflow on real implementations once the count exceeds 32.

JPEG2000 PCLR palette declares too many entries high PDF_JPX_PCLR_OVERSIZE

PCLR (palette) sub-box declares more than the spec-maximum 1024 entries.

Palette allocators sized from this 16-bit field overflow when the declared entry count exceeds 1024 — a known-vulnerable shape across multiple JPEG2000 implementations.

JPEG2000 SIZ marker has anomalous parameters high PDF_JPX_SIZ_ANOMALY

JPEG2000 SIZ marker declares image dimensions, image offsets, or component counts outside plausible ranges.

SIZ overflows have triggered several past JPEG2000 parser bugs because downstream allocators multiply width × height × components × bit-depth. Zero dimensions, image offsets at or past dimensions, and absurd component counts are all known-bad shapes.
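
A sketch of the SIZ checks, assuming the codestream starts with SOC (FF 4F) immediately followed by SIZ (FF 51), as the codestream syntax requires. The field order is Lsiz, Rsiz, Xsiz, Ysiz, XOsiz, YOsiz, XTsiz, YTsiz, XTOsiz, YTOsiz, Csiz. The component ceiling and the labels are illustrative; an analyzer may well use a much lower "plausible" threshold than the format maximum.

```python
import struct

MAX_COMPONENTS = 16384  # format maximum for Csiz; a stricter limit is a tuning choice


def siz_anomalies(codestream: bytes, max_components: int = MAX_COMPONENTS) -> list[str]:
    """Return anomaly labels for the SIZ marker segment of a JPEG2000 codestream."""
    if codestream[:4] != b"\xff\x4f\xff\x51" or len(codestream) < 42:
        return ["siz_missing_or_truncated"]
    (_lsiz, _rsiz, xsiz, ysiz, xosiz, yosiz,
     _xtsiz, _ytsiz, _xtosiz, _ytosiz, csiz) = struct.unpack_from(">HH8IH", codestream, 4)
    findings = []
    if xsiz == 0 or ysiz == 0:
        findings.append("zero_dimension")
    if xosiz >= xsiz or yosiz >= ysiz:
        findings.append("offset_past_dimension")
    if csiz == 0 or csiz > max_components:
        findings.append("component_count")
    return findings
```

A zero dimension also trips the offset check, since an offset of 0 is already at the reference-grid edge; both labels are reported in that case.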

JPEG2000 box declares an impossibly small size high PDF_JPX_BOX_TOO_SMALL

JP2 box header declares a total length less than 8 bytes (the minimum for the size+type header alone).

Spec-conformant boxes are at least 8 bytes; smaller declarations cause the box walker to either loop or read box headers from the middle of the previous box's body, a known evasion shape.

JPEG2000 box extends past stream end high PDF_JPX_BOX_TRUNCATED

JP2 box declares a length that runs past the available stream bytes.

Different readers handle the broken box differently — some clamp to stream end and continue, some abort, some resync to the next plausible box header — leading to divergent interpretation of image geometry.
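
The too-small and truncated box checks can be sketched with one top-level box walker. In the JP2 box header, LBox = 1 means a 64-bit XLBox length follows the type, and LBox = 0 means the box runs to the end of the data, so only the values 2 to 7 are treated as too small here. Names and labels are illustrative.

```python
def walk_jp2_boxes(data: bytes) -> list[tuple[int, bytes, str]]:
    """Walk top-level JP2 boxes and return (offset, type, problem) tuples."""
    findings = []
    pos = 0
    while pos + 8 <= len(data):
        lbox = int.from_bytes(data[pos:pos + 4], "big")
        tbox = data[pos + 4:pos + 8]
        if lbox == 0:
            break  # final box: extends to the end of the data
        if lbox == 1:
            if pos + 16 > len(data):
                findings.append((pos, tbox, "truncated"))
                break
            length = int.from_bytes(data[pos + 8:pos + 16], "big")
            if length < 16:
                findings.append((pos, tbox, "too_small"))
                break
        elif lbox < 8:
            findings.append((pos, tbox, "too_small"))
            break
        else:
            length = lbox
        if pos + length > len(data):
            findings.append((pos, tbox, "truncated"))
            break
        pos += length
    return findings
```

The walker stops at the first structural problem because every later box boundary depends on the broken length.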

JPEG2000 codestream missing SOC start marker high PDF_JPX_NO_SOC

jp2c codestream box does not begin with the required Start Of Codestream (FF 4F) marker.

Renderers that tolerate the missing marker and those that abort see different content. Some implementations search forward for the next FF xx marker, which lets attacker-supplied bytes between the box header and the SOC be treated as codestream content.

JPEG2000 jp2h missing required ihdr sub-box high PDF_JPX_JP2H_MISSING_IHDR

jp2h header box does not begin with the mandatory ihdr (image header) sub-box.

Strict readers reject the file; lenient readers parse downstream sub-boxes anyway and infer image dimensions from defaults or from later codestream markers — leading to differing interpretations of image geometry between scanner and viewer.

JPEG2000 top-level boxes overlap high PDF_JPX_BOX_OVERLAP

Two top-level JP2 boxes claim overlapping byte ranges.

The spec requires non-overlapping box concatenation. Overlap lets a reader interpret the same bytes as two different boxes — a deliberate way to hide content from one parser while presenting it to another.

JPXDecode + active content — JPEG2000 CVE-family indicator high PDF_JPX_CVE_2018_4990_RELATED

PDF uses JPXDecode/JPEG2000 alongside active/exploit-delivery indicators.

CVE-2018-4990 is a double-free vulnerability in Adobe Reader's JPEG2000 handling. The rule matches /JPXDecode plus JavaScript, XFA, or RichMedia evidence; it does not validate malformed JPEG2000 codestream data.

JavaScript action high PDF_JAVASCRIPT

PDF contains a /JavaScript action.

JavaScript embedded in a PDF can interact with the viewer, exploit vulnerabilities, or download external content. Most legitimate PDFs do not need JavaScript, and it is one of the most common PDF exploit vectors.

JavaScript heap-spray launcher high PDF_JS_HEAPSPRAY

PDF JS schedules a callback with a multi-kilobyte string (heap-spray primitive).

app.setTimeOut / app.setInterval with a multi-kilobyte string argument is the common PDF heap-spray primitive: it fills the renderer's heap with attacker-controlled bytes so a corrupted pointer lands in the spray.
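
A regex-level sketch of this check, assuming the sprayed string appears as a literal first argument. The 2048-character threshold and the function name are assumptions; sprays built in a variable and passed by name would need data-flow handling that this sketch does not attempt.

```python
import re

# Matches app.setTimeOut("...") or app.setInterval('...') and captures the literal body.
_TIMER_CALL = re.compile(
    r"""app\.set(?:TimeOut|Interval)\s*\(\s*(["'])((?:\\.|(?!\1).)*)\1""",
    re.S,
)


def heap_spray_lengths(js: str, min_len: int = 2048) -> list[int]:
    """Return the lengths of timer-callback string literals of at least min_len chars."""
    return [
        len(match.group(2))
        for match in _TIMER_CALL.finditer(js)
        if len(match.group(2)) >= min_len
    ]
```

Ordinary form scripts pass short expressions to these timers, so a length threshold separates them from multi-kilobyte payloads.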

Large character-table JavaScript eval stager high PDF_JS_LARGE_CHAR_TABLE_EVAL_STAGER

PDF JavaScript uses a large numeric index table and indirect eval to rebuild a hidden stage.

Older PDF exploit kits sometimes keep the first recovered stage encrypted or otherwise encoded, so a static decoder cannot always validate a final CVE API. This rule catches the high-confidence launcher shape itself: a large ar[] numeric table, a short cc character table, anti-analysis exception scaffolding such as loadXML({}), and an indirect eval sink.

Malformed PDF with no object graph high PDF_MALFORMED_NO_OBJECT_GRAPH

File has a PDF header but no indirect objects, xref table/stream, or startxref pointer.

Normal PDFs need an object graph and cross-reference structure. A large PDF-header blob with no objects is not renderable content; it is more consistent with parser fuzzing, evasion, corruption, or an exploit test case than a benign document.

Obfuscated JavaScript getURL redirector high PDF_JS_OBFUSCATED_GETURL_REDIRECTOR

PDF document JavaScript opens an obfuscated redirector URL with getURL().

The rule is constrained to document-level JavaScript that calls getURL() with a percent-escaped HTTP(S) URL and a redirector-style endpoint such as /in.cgi or /go.php. This catches redirect-carrier PDFs where the outbound URL is hidden in JavaScript instead of a normal /URI action, while avoiding broad matches on benign visible getURL links. This is malicious routing behavior, not a PDF parser CVE fingerprint.

Obfuscated Pidief-style JavaScript loader (stage not decoded) high PDF_PIDIEF_OBFUSCATED_VERSION_GATED_LOADER

PDF JavaScript carries a large opaque encoded stage built to be eval'd, but the encoding resisted full static decoding so no exact CVE could be attributed.

Structural fallback for the Pidief / multi-CVE exploit-kit loader family. The document has JavaScript plus a large opaque encoded stage (a z<hex> custom alphabet, or a numeric character-code array the loader sums/XORs and evals) that no exact-CVE, multi-CVE-kit or heap-spray rule could attribute because the scheme could not be fully decoded. A version-gated obfuscated JavaScript stage has no benign use; it is flagged suspicious on its own and raised to malicious when an ML/AV signal or a recovered heap-spray is also present.

Obfuscated multi-stage PDF JavaScript dropper high PDF_JS_OBFUSCATED_DROPPER

Composite signal of pre-2011 Adobe Reader exploit-kit dropper shape.

Fires when the PDF JavaScript shows three or more independent signals of exploit-kit-style multi-stage obfuscation: annotation-subject payload staging (reading pr[N].subject after getAnnots), String.fromCharCode hex decoder loops, long -hh-hh-hh hex-dashed payloads, incremental construction of a method name starting from 'ev' (to hide an eval call), and three or more app.plugIns.length anti-analysis gates. The actual CVE is hidden in the final decoded layer and is not visible via static analysis, but the template is strongly consistent with exploit-kit style payload staging.

Object defined twice with divergent /Filter chains high PDF_DUPLICATE_OBJ_DIVERGENT

Same indirect object (N G) is defined more than once in the file, and the definitions declare different /Filter chains.

Readers that take the first definition decode different bytes than readers that take the last definition (PDF spec is ambiguous on which wins; Acrobat takes last). Divergent filter chains across the duplicates are a deliberate parser-divergence pattern: benign content is shown to scanners while malicious content is shown to the actual reader.
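
A byte-level sketch of the duplicate check. It uses regexes rather than a PDF parser, looks only at the first /Filter entry of each object body, and ignores objects packed in object streams; the function name is hypothetical.

```python
import re
from collections import defaultdict

_OBJ = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.S)
_FILTER = re.compile(rb"/Filter\s*(\[[^\]]*\]|/\w+)")


def divergent_duplicates(pdf: bytes) -> list[tuple[int, int]]:
    """Return (object number, generation) pairs defined with differing /Filter chains."""
    chains = defaultdict(set)
    for num, gen, body in _OBJ.findall(pdf):
        match = _FILTER.search(body)
        # Normalise a single name and an array of names to the same tuple form.
        chain = tuple(re.findall(rb"/\w+", match.group(1))) if match else ()
        chains[(int(num), int(gen))].add(chain)
    return sorted(key for key, seen in chains.items() if len(seen) > 1)
```

Objects redefined with an identical filter chain, as ordinary incremental updates often do, are not reported because their chain set keeps a single entry.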

OpenAction trigger high PDF_OPENACTION

PDF has an /OpenAction that performs an action when the file is opened.

OpenAction specifies an action or destination to perform when the document is opened. It can execute JavaScript when paired with a JavaScript action, but OpenAction is not always code by itself.

OpenType EBSC max-range record with bitmap tables high PDF_OPENTYPE_EBSC_WITH_SBIT

Malformed EBSC max-range record appears alongside EBLC/EBDT bitmap tables.

EBSC is used for embedded bitmap scaling metadata. A max-range EBSC record paired with bitmap glyph tables is an Adobe libCoolType-specific parser differential related to CVE-2023-26369.

OpenType EBSC table declares max offset and length high PDF_OPENTYPE_EBSC_MAX_RANGE

sfnt EBSC table record declares offset=0xffffffff and length=0xffffffff.

Project Zero noted the CVE-2023-26369 proof-of-concept font used this malformed EBSC table record to prevent the font from loading in many font parsing libraries while Adobe libCoolType still processed the bitmap tables.

OpenType VariationStore itemVariationDataCount is huge high PDF_OPENTYPE_VARSTORE_COUNT_HUGE

Embedded OpenType variable font declares an implausibly large itemVariationDataCount.

Adobe Acrobat Reader has had memory-safety bugs in VariationStore itemVariationData allocation and indexing paths. Huge counts are malformed structural inputs to that parser path.

OpenType VariationStore offset out of range high PDF_OPENTYPE_VARSTORE_OFFSET_OUT_OF_RANGE

VariationStore or itemVariationData offsets point outside the containing table.

VariationStore offset arrays are pointer-like fields in OpenType variable font tables. Out-of-range entries are the structural class described in Talos' Acrobat Reader VariationStore analysis.

OpenType cmap subtable offset out of range high PDF_OPENTYPE_CMAP_OFFSET_OUT_OF_RANGE

A cmap encoding record points outside the cmap table.

Out-of-range cmap offsets are a common structural parser bug shape.

OpenType embedded-bitmap component placement exceeds bitmap buffer high PDF_OPENTYPE_SBIT_COMPONENT_OOB

EBLC/EBDT compound bitmap glyph metadata positions a component beyond the computed bitmap buffer.

CVE-2023-26369 exploited missing bounds checks in Adobe libCoolType's sfac_GetSbitBitmap when merging embedded bitmap glyph components. The scanner computes the bitmap buffer size from glyph metrics and flags component offsets whose merge index exceeds that buffer.

OpenType glyph offset outside glyf table high PDF_OPENTYPE_GLYF_OFFSET_OUT_OF_RANGE

A loca entry points beyond the glyf table.

Out-of-range glyph offsets are a direct font parser memory-safety primitive.

OpenType head table is truncated high PDF_OPENTYPE_HEAD_TRUNCATED

The head table is too short to carry indexToLocFormat.

Without a valid indexToLocFormat, loca offsets can be interpreted with the wrong width.

OpenType invalid loca format high PDF_OPENTYPE_LOCA_FORMAT_INVALID

head.indexToLocFormat is outside the valid 0/1 range.

Invalid loca format values create parser divergence in glyph offset decoding.

OpenType itemVariationData malformed high PDF_OPENTYPE_VARSTORE_ITEMDATA_MALFORMED

itemVariationData subtable has impossible item or region counts.

Malformed itemVariationData metadata can desynchronise Adobe's variable-font VariationStore parser and is worth surfacing as font exploit-surface evidence.

OpenType loca offsets decrease high PDF_OPENTYPE_LOCA_NOT_MONOTONIC

loca glyph offsets are not monotonically non-decreasing.

Decreasing glyph offsets imply overlapping or negative-size glyf records.

OpenType loca table too short high PDF_OPENTYPE_LOCA_TRUNCATED

loca cannot hold numGlyphs+1 offsets.

A too-short loca table can make glyph lookup read past the embedded table.
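
The loca checks described in this group (invalid format, truncated table, decreasing offsets, offsets beyond glyf) can be sketched in one function. It assumes the caller has already read indexToLocFormat from head, numGlyphs from maxp, and the glyf table length; names and labels are illustrative.

```python
import struct


def loca_findings(loca: bytes, index_to_loc_format: int,
                  num_glyphs: int, glyf_length: int) -> list[str]:
    """Return anomaly labels for a loca table."""
    if index_to_loc_format not in (0, 1):
        return ["loca_format_invalid"]
    # Short format stores offset/2 in uint16; long format stores uint32 offsets.
    width, code, scale = (2, "H", 2) if index_to_loc_format == 0 else (4, "I", 1)
    needed = num_glyphs + 1
    if len(loca) < needed * width:
        return ["loca_truncated"]
    offsets = [value * scale for value in struct.unpack_from(f">{needed}{code}", loca)]
    findings = []
    if any(later < earlier for earlier, later in zip(offsets, offsets[1:])):
        findings.append("loca_not_monotonic")
    if max(offsets) > glyf_length:
        findings.append("glyf_offset_out_of_range")
    return findings
```

Equal consecutive offsets are accepted because they denote an empty glyph; only a decrease is reported.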

OpenType maxp table is truncated high PDF_OPENTYPE_MAXP_TRUNCATED

The maxp table is too short to declare numGlyphs.

Glyph table validation depends on maxp.numGlyphs. A truncated maxp table can make readers disagree over glyph bounds.

OpenType table record points outside the font high PDF_OPENTYPE_TABLE_OUT_OF_RANGE

sfnt table-record entry's offset+length lies beyond the embedded font bytes.

Renderers that follow the offset blindly read attacker-influenced bytes from adjacent memory. The OpenType table directory is the first structure parsed in any sfnt font, so this affects every font-engine code path.
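
A sketch of the table-directory walk, which also covers the EBSC max-range record described earlier in this section. The sfnt layout is a 12-byte header (numTables at offset 4) followed by 16-byte records of tag, checksum, offset, and length. Function name and labels are assumptions.

```python
import struct


def sfnt_table_findings(font: bytes) -> list[tuple[str, str]]:
    """Return (table tag, problem) pairs for an embedded sfnt font."""
    if len(font) < 12:
        return [("", "truncated_header")]
    num_tables = struct.unpack_from(">H", font, 4)[0]
    findings = []
    for i in range(num_tables):
        record = 12 + 16 * i
        if record + 16 > len(font):
            findings.append(("", "truncated_directory"))
            break
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", font, record)
        if tag == b"EBSC" and offset == length == 0xFFFFFFFF:
            # Reported separately from the generic out-of-range case.
            findings.append(("EBSC", "max_range"))
        elif offset + length > len(font):
            findings.append((tag.decode("latin-1"), "out_of_range"))
    return findings
```

Python integers do not wrap, so `offset + length` is compared without the 32-bit overflow that can defeat the same check in C.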

PDF JavaScript ActiveX downloader high PDF_JS_ACTIVEX_DOWNLOADER

Decoded PDF JavaScript downloads, writes, and executes a Windows payload through ActiveX.

The detector requires co-occurring ActiveXObject, XMLHTTP/WinHTTP, ADODB.Stream or ResponseBody file-write behavior, and WScript/rundll32-style execution in a recovered PDF JavaScript stage. This is a precise commodity downloader signature, not a specific Acrobat parser CVE.

PDF JavaScript WScript downloader high PDF_JS_WSCRIPT_DOWNLOADER

Decoded PDF JavaScript reconstructs a Windows Script Host downloader.

The detector requires WScript.CreateObject/WScript.Shell together with XMLHTTP or WinHTTP download behavior, ADODB.Stream/SaveToFile style file writing, and Run/cmd execution markers in a recovered PDF JavaScript stage. This is a precise commodity downloader signature, not a specific Acrobat parser CVE.

PDF JavaScript embeds a Windows Script Host payload high PDF_JS_WSCRIPT_PAYLOAD

PDF JavaScript contains Windows Script Host/JScript payload behavior.

This rule is broader than the WScript downloader signature: it requires WScript.CreateObject plus WScript.Shell and multiple payload-behavior indicators such as environment access, run/exec, sleep, registry access, XMLHTTP, ADODB.Stream, PowerShell, or cmd.exe. It surfaces embedded Windows-script payloads without assigning an Acrobat CVE when the actual vulnerability trigger is not present.

PDF JavaScript object lifetime reuse pattern high PDF_JS_LIFETIME_REUSE_PATTERN

PDF JavaScript acquires, releases, delays, and then reuses a viewer-managed object.

Several Adobe Reader exploit chains abuse stale JavaScript wrappers for viewer-managed objects such as dataObjects, form fields, annotations, or media players. This rule looks for the behavioral sequence instead of one exact CVE: acquire an object, delete/remove/null it, use a timer or GC/heap pressure step, then access the same object family again.

PDF JavaScript shellcode contains an embedded download URL high PDF_JS_SHELLCODE_DOWNLOAD_URL

A URL was recovered from a %uXXXX shellcode run inside decoded PDF JavaScript.

Reader exploit shellcode stores its second-stage fetch URL as a run of little-endian %uXXXX Unicode escapes and downloads-and-executes it with a urlmon/URLDownloadToFile-style call. Recovering an http(s) URL from that byte stream is a concrete download/C2 indicator on its own, independent of which version-gated Acrobat CVE the surrounding script triggers. This is commodity downloader behaviour, so no specific CVE is asserted.
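
The recovery step can be sketched as follows: each %uXXXX escape is one 16-bit value stored little-endian, so the low byte comes first in memory. The minimum run length, the URL pattern, and the function name are assumptions made for this illustration.

```python
import re

_ESCAPE_RUN = re.compile(r"(?:%u[0-9a-fA-F]{4}){4,}")
_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")
_URL = re.compile(rb"https?://[\x21-\x7e]+")


def urls_from_unicode_escapes(js: str) -> list[str]:
    """Recover http(s) URLs from runs of %uXXXX escapes in decoded JavaScript."""
    urls = []
    for run in _ESCAPE_RUN.findall(js):
        # %u7468 becomes the bytes 68 74, i.e. "ht".
        raw = b"".join(int(h, 16).to_bytes(2, "little") for h in _ESCAPE.findall(run))
        urls.extend(found.decode("ascii") for found in _URL.findall(raw))
    return urls
```

The URL pattern stops at the first non-printable byte, so the NUL terminator that shellcode usually places after the URL ends the match.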

PDF URI command path high PDF_DANGEROUS_URI_COMMAND

PDF /URI action references a command interpreter or script host path.

Normal PDF URI actions point to web, mail, or document links. A URI using path traversal and command interpreter names such as cmd.exe is a legacy dropper/execution lure pattern and should not appear in benign documents.

PDF metadata JavaScript eval stager high PDF_METADATA_EVAL_STAGER

PDF JavaScript decodes document metadata fields and evals the recovered stage.

Some PDF exploit kits hide JavaScript or shellcode in metadata fields such as Title, Subject, Producer, or Keywords, then use parseInt/String.fromCharCode-style decoding and eval. The rule requires metadata field access, a decoder, and an eval sink, plus either multiple metadata fields or a large encoded base-N payload.

PDF metadata arithmetic JavaScript stager high PDF_INFO_ARITHMETIC_JS_STAGER

PDF Info metadata rebuilds an exploit stage through arithmetic char-code tokens.

Some PDF exploit kits hide the real JavaScript in Info metadata fields as comma-separated arithmetic tokens such as t9.5*w or n*7.375. A small launcher reads metadata like this.producer or this.title, rebuilds the JavaScript with String.fromCharCode, and evals it. The rule fires only after a bounded decoder recovers exploit-like JavaScript from that metadata/launcher shape.

PDF modified after it was signed high PDF_SIGNATURE_POST_SIGN_MODIFICATION

Bytes were appended after the signed/certified byte range.

An incremental update was appended AFTER a signature's ByteRange. Bare appends are normal (re-signing, form-fill), so this is only raised when the appended region introduces active content (/OpenAction, /JavaScript, /Launch, a new /Catalog, …) — the documented PDF 'shadow attack', where the viewer still shows the original signature as valid while rendering/executing content the signer never approved — or when a DocMDP no-changes certification is violated.
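
The append check can be sketched from the /ByteRange array [a b c d], where the signed bytes end at c + d. This sketch takes the furthest signed end across all signatures and searches whatever follows for active keys; it does not verify the signature or DocMDP permissions, and the key list simply mirrors the examples above.

```python
import re

_BYTERANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")
ACTIVE_KEYS = (b"/OpenAction", b"/JavaScript", b"/Launch", b"/Catalog")


def post_sign_active_content(pdf: bytes) -> list[bytes]:
    """Return active-content keys found after the last signed byte."""
    ends = [int(m.group(3)) + int(m.group(4)) for m in _BYTERANGE.finditer(pdf)]
    if not ends:
        return []  # unsigned document: nothing to compare against
    appended = pdf[max(ends):]
    return [key for key in ACTIVE_KEYS if key in appended]
```

An empty result for a signed file means either nothing was appended or the appended bytes carry none of the listed keys, which matches the rule's intent of ignoring bare appends.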

PDF parsers disagree on structural counts high PDF_PARSER_DIVERGENCE

Two independent PDF parsers produced significantly different counts of streams or pages on the same bytes.

Exploitation samples routinely rely on parser confusion: one reader processes the file one way, a vulnerable target reads it another. When the in-process byte-level scanner and a second parser (pdfminer.six) disagree on basic structural counts, the file is almost always either corrupt or deliberately crafted to be ambiguous — both are higher-risk than a clean parse.

PRC stream missing 'PRC' magic high PDF_PRC_HEADER_INVALID

PRC stream does not begin with the ASCII bytes 'PRC' at offset 0.

Adobe's Product Representation Compact format begins with the 'PRC' magic. A missing or wrong magic on a closed-format parser surface that is rarely exercised is an interesting unknown-exploit signal — fewer eyes have looked at PRC parsing than at any of the open 3D formats.

Page-word XOR JavaScript eval stager high PDF_PAGE_WORD_XOR_EVAL_STAGER

PDF JavaScript rebuilds and evals a hidden stage from rendered page words.

Older PDF exploit kits hide byte values in visible page text, then use getPageNumWords()/getPageNthWord() to enumerate words, take byte-like fragments, XOR-decode them, and eval the recovered JavaScript. The rule requires the page-word APIs, char-code decoding, an eval sink, and XOR logic, keeping it narrower than a generic JavaScript/OpenAction match.

Prototype-pollution JavaScript pattern high PDF_JS_PROTOTYPE_POLLUTION

PDF JavaScript mutates prototypes and references privileged PDF APIs.

Prototype pollution is a modern JavaScript exploitation technique. This rule matches __proto__, constructor.prototype, or Object.prototype mutation alongside privileged APIs such as trustedFunction, launchURL, submitForm, getField, or readFileIntoStream. It deliberately tracks the technique without assigning an unverified CVE number.
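
The co-occurrence requirement might look like the sketch below. The regular expressions are assumptions chosen for illustration: they treat an assignment through `__proto__`, `constructor.prototype`, or `Object.prototype` as a mutation, and they ignore comparisons such as `==`.

```python
import re

# Assignment through a prototype path, e.g. Object.prototype.x = ... or a.__proto__['k'] = ...
_PROTO_MUTATION = re.compile(
    r"(?:__proto__|constructor\s*\.\s*prototype|Object\s*\.\s*prototype)"
    r"(?:\s*(?:\.\s*\w+|\[[^\]]+\]))*\s*=(?!=)"
)
# Privileged APIs listed in the rule description.
_PRIVILEGED_API = re.compile(
    r"\b(?:trustedFunction|launchURL|submitForm|getField|readFileIntoStream)\b"
)


def looks_like_prototype_pollution(js: str) -> bool:
    """True only when a prototype mutation and a privileged API both appear."""
    return bool(_PROTO_MUTATION.search(js)) and bool(_PRIVILEGED_API.search(js))
```

Requiring both halves keeps library code that merely reads `Object.prototype` from matching.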

QR-code business verification phishing lure high PDF_QR_PHISHING_LURE

PDF combines a QR-like image with scan/verification/business-process lure text.

QR-code phishing can hide the target URL inside image pixels so static URL rules never see a destination. This rule does not need to decode the QR payload: it requires a QR-like square image plus visible text instructing the recipient to scan or use the QR code for verification, HR, payroll, policy, email, signature, or similar business-process activity. The co-occurrence keeps normal QR codes in brochures or invoices from being flagged on image shape alone.

RichMedia (Flash) high PDF_RICHMEDIA

PDF contains /RichMedia (Adobe Flash content).

Flash has a long history of critical vulnerabilities. Embedded Flash in PDFs was a major exploit vector. Flash is now end-of-life, making any Flash content in a PDF highly suspicious.

TrueType bitmap font + active content — CVE-2023-26369 related high PDF_CVE_2023_26369_RELATED

PDF embeds TrueType font data with bitmap tables and active-content indicators.

Embedded TrueType bitmap tables (EBDT, EBLC, sbix, or CBDT) paired with JavaScript, XFA, or RichMedia match the broad delivery surface used by CVE-2023-26369-style Adobe Reader font exploits. The exact CVE detector is CVE_2023_26369; this broader rule does not validate the malformed EBLC/EBDT component-placement primitive.

Type 1 CharString callOtherSubr stack-pivot sequence high PDF_TYPE1_CALLOTHERSUBR_STACK_PIVOT

Decrypted Type 1 CharString contains repeated get/callOtherSubr bytecode sequences.

Adobe Type 1 CharStrings are stack-machine bytecode. Repeated get/callOtherSubr sequences are unusual in normal glyph programs and match the primitive used by CVE-2021-21086-style CoolType operand-stack manipulation.

Type 1 CharString operand stack grows beyond spec high PDF_TYPE1_CHARSTRING_STACK_OVERFLOW

Decrypted Type 1 CharString bytecode pushes more operands than expected.

Type 1 CharStrings are stack bytecode. Operand-stack overflow is a font-engine exploit primitive.

Type 1 CharString operand stack underflow high PDF_TYPE1_CHARSTRING_STACK_UNDERFLOW

Decrypted Type 1 CharString bytecode consumes operands that are unavailable.

Stack underflow in Type 1 bytecode can lead to interpreter state corruption or parser divergence.

U3D block declares an implausibly large size high PDF_U3D_HUGE_BLOCK_SIZE

U3D block declares a data or metadata section size beyond any plausible legitimate value.

Allocators sized from these 32-bit fields are a recurring exploit primitive in 3D engines. Real-world U3D blocks rarely exceed a few megabytes; tens of megabytes is consistent with adversarial construction.

U3D block extends past stream end high PDF_U3D_BLOCK_TRUNCATED

U3D block declares a total size that runs past the available stream bytes.

Renderers that resync to the next plausible block header read attacker-controlled bytes from the gap between the truncated block and the next header. Real-world U3D files produced by 3D tooling are byte-aligned and never declare a size larger than the file.

U3D stream missing or wrong File Header block high PDF_U3D_HEADER_MISSING

An embedded U3D (3D-model) stream does not start with the required File Header block — a malformed/parser-divergence shape on a rarely-inspected attack surface.

ECMA-363 requires the first U3D block to be the File Header. A different first block is a parser-divergence shape on a low-traffic attack surface that few static analysers cover deeply. Truncated streams that fail to contain even the 30-byte minimum header are reported under this same rule.

U3D/3D content in PDF — Adobe Reader 3D parser CVE-family indicator high PDF_U3D_CVE_RELATED

PDF contains U3D/3D annotation content or U3D signatures.

CVE-2011-2462 (CVSS 9.3) and CVE-2009-3953 (CVSS 9.3) are critical vulnerabilities in Adobe Reader's U3D processing library. CVE-2011-2462 is a use-after-free triggered by a crafted U3D image inside a PDF, exploited as a zero-day in December 2011. CVE-2009-3953 is a heap overflow in the U3D parser. U3D content in PDFs is extremely rare in normal business documents. The rule matches /Subtype /U3D, /3D annotation content, or U3D signatures; it does not validate the malformed U3D structures needed to prove a specific CVE.

XFA form contains executable script high PDF_XFA_SCRIPT

PDF embeds an XFA dataset with a <script> or <xfa:script> block.

XFA scripting has been the exploit primitive for several Adobe Reader RCEs (CVE-2010-0188 family, CVE-2018-4901). Plain XFA without scripts is far less risky.

XFA numeric JavaScript stager high PDF_XFA_NUMERIC_JS_STAGER

PDF XFA script rebuilds hidden JavaScript from numeric field data or a character table.

Some XFA exploit kits store the real JavaScript as numeric values in form fields or as indexes into a short character table, then rebuild and eval it during an initialize event. The rule is bounded to XFA script packets and only fires after static decoding recovers exploit-like JavaScript or shellcode markers.

XFA numeric character-table eval stager high PDF_XFA_NUMERIC_EVAL_STAGER

XFA initialize script maps numeric rawValue data through a character table and evals it.

This is a bounded fallback for XFA exploit-kit launchers whose final recovered stage remains encoded. It requires an XFA script, rawValue numeric staging, a short cc character table, a long reconstruction loop, and an eval-like sink.

app.launchURL with file/cmd/UNC target high PDF_FOXIT_LAUNCHURL

PDF JavaScript launches a URL with a file://, cmd:, or UNC scheme.

Foxit and Adobe handle these schemes inconsistently — they have been used for code execution and NTLM credential theft (the latter via UNC paths).

eval() call high PDF_EVAL

JavaScript eval() function found in PDF.

eval() executes a string as code. In malicious PDFs it is often used for dynamically constructed or decoded exploit code, making static analysis harder.

getAnnots heap-spray JavaScript stager high PDF_JS_GETANNOTS_HEAPSPRAY_STAGER

PDF JavaScript pairs getAnnots with heap-spray shellcode and an embedded payload.

The document calls getAnnots() in a context containing classic Adobe Reader heap-spray markers and an embedded payload. This is CVE-2009-1492-related evidence, but is not exact CVE attribution unless the getAnnots argument has the distinctive overflow/long-string trigger shape.

unescape() call high PDF_UNESCAPE

JavaScript unescape() function found in PDF.

unescape() decodes percent-encoded strings. In PDF exploits, it is commonly used to convert encoded shellcode back to raw bytes before triggering a vulnerability.

xref table points away from the real object high PDF_XREF_OFFSET_MISMATCH

PDF cross-reference table claims object N is at byte offset O, but the bytes at O do not begin with the expected 'N G obj' header.

Readers that trust the xref will resolve the indirect reference to one set of bytes; readers that scan the file linearly will resolve it to another. This parser-divergence shape is a recurring evasion technique in targeted PDF exploits.
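
A simplified version of the comparison, assuming a classic (non-stream) cross-reference table with standard 20-byte entries. The function name is hypothetical, and cross-reference streams and hybrid files are out of scope for this sketch.

```python
import re

_SUBSECTION = re.compile(rb"(\d+) (\d+)[ \t]*\r?\n")
_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])\s{1,2}")


def xref_mismatches(pdf: bytes) -> list:
    """Return (object number, offset) pairs whose target lacks an 'N G obj' header."""
    mismatches = []
    # The lookbehind skips the 'xref' inside 'startxref'.
    for section in re.finditer(rb"(?<!start)xref[ \t]*\r?\n", pdf):
        pos = section.end()
        while True:
            head = _SUBSECTION.match(pdf, pos)
            if head is None:
                break
            first, count = int(head.group(1)), int(head.group(2))
            pos = head.end()
            for number in range(first, first + count):
                entry = _ENTRY.match(pdf, pos)
                if entry is None:
                    break
                pos = entry.end()
                if entry.group(3) != b"n":
                    continue  # free entries carry no byte offset to verify
                offset, generation = int(entry.group(1)), int(entry.group(2))
                header = re.compile(rb"%d\s+%d\s+obj\b" % (number, generation))
                if header.match(pdf, offset) is None:
                    mismatches.append((number, offset))
    return mismatches
```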

ASCII85Decode filter (with exploit indicators) medium PDF_FILTER_85

PDF uses ASCII85Decode stream filter alongside active scripting content.

ASCII85 is a relatively uncommon encoding. Like ASCIIHexDecode, it has legitimate uses, so we only flag it when it co-occurs with active scripting content (/JavaScript, /JS, /XFA, or /RichMedia).

ASCIIHexDecode filter (with exploit indicators) medium PDF_FILTER_HEX

PDF uses ASCIIHexDecode stream filter alongside active scripting content.

ASCIIHexDecode is legitimately used by some scanned-document and PostScript-derived PDFs, so on its own it's noise. This rule only fires when the filter co-occurs with active scripting content (/JavaScript, /JS, /XFA, or /RichMedia) — the shape associated with payload obfuscation.

Ad redirect link medium PDF_AD_REDIRECT_LINK

Small PDF routes a clickable link through an ad/tracking redirector.

Tiny PDFs that contain little content beyond a redirector URL are common in malvertising and phishing delivery. The embedded destination can change or be selectively served, so this is stronger than a generic external URI.

Additional Actions dictionary medium PDF_AA

PDF defines /AA (Additional Actions) triggers.

Additional Actions can fire on events like page open, print, or close. They are often used to trigger script or external actions during viewing.

CFF CharString is unusually large medium PDF_CFF_CHARSTRING_HUGE

A single CFF Type 2 glyph program is far larger than expected.

Normal glyph programs are small. Very large CharStrings often indicate bytecode-as-payload, malformed subroutine graphs, or exploit grooming.

CFF INDEX declares an implausibly large entry count medium PDF_CFF_INDEX_COUNT_HUGE

CFF INDEX (Name / Top DICT / String / Subrs / CharStrings) declares thousands of entries.

Real fonts have at most hundreds of entries in any single INDEX. Allocators sized from this 16-bit field have been a recurring exploit primitive in font rasterisers.

CFF INDEX first offset != 1 medium PDF_CFF_INDEX_FIRST_OFFSET_WRONG

CFF INDEX's first offset entry is not 1 (the spec-mandated value).

Some implementations validate; others trust the value and read from the wrong byte. Real-world fonts produced by standard tooling always have first offset = 1.
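
For orientation, a CFF INDEX starts with a 16-bit big-endian count, a one-byte offset size, and then count + 1 offsets. The sketch below reads that header and reports the first offset; the function name and the returned dictionary keys are illustrative only.

```python
import struct


def parse_cff_index(data: bytes, pos: int = 0) -> dict:
    """Read a CFF INDEX header at pos and report its count and first offset."""
    (count,) = struct.unpack_from(">H", data, pos)
    if count == 0:
        # An empty INDEX is just the two-byte count.
        return {"count": 0, "first_offset": None, "first_offset_ok": True, "end": pos + 2}
    off_size = data[pos + 2]
    if not 1 <= off_size <= 4:
        raise ValueError("offSize must be between 1 and 4")
    table = pos + 3
    table_len = off_size * (count + 1)
    if table + table_len > len(data):
        raise ValueError("offset array runs past the available bytes")
    offsets = [
        int.from_bytes(data[table + i * off_size: table + (i + 1) * off_size], "big")
        for i in range(count + 1)
    ]
    return {
        "count": count,
        "first_offset": offsets[0],
        "first_offset_ok": offsets[0] == 1,
        # Offsets are relative to the byte preceding the object data.
        "end": table + table_len - 1 + offsets[-1],
    }
```

The same `count` value is what PDF_CFF_INDEX_COUNT_HUGE would compare against its threshold.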

CFF font header is truncated or malformed medium PDF_CFF_HEADER_TRUNCATED

CFF font header is structurally invalid: header size out of range, or header runs past the stream length.

Different font rasterisers handle truncated CFF headers differently — some abort, some pad with zeros and continue — leading to divergent glyph rasterisation between scanner and viewer.

Cracked-software link-farm lure medium PDF_CRACKED_SOFTWARE_LURE

PDF links advertise cracked/pirated software (crack, keygen, serial key, warez).

These PDFs are SEO-spam carriers: they pack many clickable links whose targets use software-piracy vocabulary so the document ranks for '<app> crack download' searches, then route users to fake crack pages serving potentially-unwanted programs, adware, or droppers. The rule fires on several distinct links carrying piracy-specific tokens in the URL itself, so ordinary documents that merely mention the word are not affected. The PDF carries no exploit of its own; the risk lies in the linked destinations, so the finding is capped at suspicious.

Escaped URL shortener medium PDF_ESCAPED_SHORTENER_URI

PDF hides a clickable URL-shortener destination with PDF string escapes.

PDF literal strings can legally encode punctuation with octal escapes, but pairing that obfuscation with a URL shortener is a stronger phishing signal than a normal visible link. Attackers use this to defeat simple URL extractors while hiding the final landing page behind a redirector.

High stream count medium PDF_MANY_STREAMS

PDF contains 500+ stream objects.

An abnormally high stream count may indicate heap spraying (filling memory with repeated data) or heavy obfuscation of the PDF structure. The threshold is 500 to avoid flagging legitimate technical and textbook PDFs.

ICC profile contains a duplicate tag signature medium PDF_ICC_DUPLICATE_TAG_SIG

Same ICC tag signature appears more than once in the tag table.

Per spec each signature must be unique. Implementations that take the first vs. the last copy produce different colour transforms — a shape used to hide an attacker-chosen mAB./mBA. pipeline behind a benign-looking earlier entry.

ICC profile declares an implausibly large tag count medium PDF_ICC_TAG_COUNT_HUGE

ICC profile declares more than ~256 tag entries; real-world profiles have at most a few dozen.

Implausibly large tag counts are consistent with hand-crafted profiles designed to exhaust allocator state or trigger integer overflows in the tag-table indexing arithmetic.

ICC profile size disagrees with embedded length medium PDF_ICC_SIZE_MISMATCH

ICC profile header field 'profile size' does not match the actual length of the embedded profile bytes (or the tag table extends past the bytes available).

Colour-management stacks that trust the header field and those that trust the container length read different bytes. The CVE-2018-4990 family of ICC-parser bugs lived in exactly this disagreement.

ICC tag has size=0 with non-zero offset medium PDF_ICC_TAG_ZERO_SIZE_NONZERO_OFFSET

ICC tag declares zero data length but a non-zero offset.

Implementations that prefetch the offset before checking size read out-of-bounds bytes. Real-world profiles either use zero/zero or non-zero/non-zero — the mismatched form is consistent with adversarial construction.
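
The four ICC rules above all read the same two structures: the 128-byte profile header (profile size at offset 0) and the tag table that follows it (a 32-bit count, then 12-byte entries of signature, offset, size). A combined sketch is shown below; the function name is hypothetical, the 256-tag default mirrors the approximate figure quoted above, and the real analyzer may order or bound these checks differently.

```python
import struct
from collections import Counter


def icc_anomalies(profile: bytes, max_tags: int = 256) -> set:
    """Return the IDs of the ICC tag-table rules this profile would trip."""
    if len(profile) < 132:
        raise ValueError("too short to hold an ICC header and tag count")
    findings = set()
    (declared_size,) = struct.unpack_from(">I", profile, 0)
    if declared_size != len(profile):
        findings.add("PDF_ICC_SIZE_MISMATCH")
    (tag_count,) = struct.unpack_from(">I", profile, 128)
    if tag_count > max_tags:
        findings.add("PDF_ICC_TAG_COUNT_HUGE")
    if 132 + 12 * tag_count > len(profile):
        # Tag table extends past the bytes available; only walk what exists.
        findings.add("PDF_ICC_SIZE_MISMATCH")
        tag_count = (len(profile) - 132) // 12
    seen = Counter()
    for i in range(tag_count):
        signature, offset, size = struct.unpack_from(">4sII", profile, 132 + 12 * i)
        seen[signature] += 1
        if size == 0 and offset != 0:
            findings.add("PDF_ICC_TAG_ZERO_SIZE_NONZERO_OFFSET")
    if any(n > 1 for n in seen.values()):
        findings.add("PDF_ICC_DUPLICATE_TAG_SIG")
    return findings
```

Clamping the loop to the bytes actually present keeps the walk bounded even when the declared count is adversarial.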

Image-only lure with single non-reputable link medium PDF_IMAGE_LURE_NONREPUTABLE_LINK

Image-heavy PDF whose sole clickable action links to a non-reputable host carrying a random throwaway subdomain.

The canonical malspam-carrier shape: a screenshot-like 'click to view document' page whose only purpose is to funnel the victim to one redirect/landing URL on a compromised or throwaway domain. Flagged suspicious rather than malicious because the image lure plus a single non-reputable link with a random subdomain is the only corroborator; ML/AV signals push it to malicious.

JBIG2 segment-header walk aborted medium PDF_JBIG2_HEADER_TRUNCATED

A JBIG2 image segment header could not be parsed cleanly (e.g. a length field points past the end of the data) — a renderer-divergence shape.

A JBIG2 segment header could not be parsed cleanly: a length field claimed more bytes than the stream contains, a referred-to-segment count was truncated, or a reserved code was used. Renderers that fail open on broken headers parse different segment data than renderers that abort.

JBIG2 stream contains an implausibly large number of segments medium PDF_JBIG2_HUGE_SEGMENT_COUNT

JBIG2 stream declares thousands of segments where real-world scanned-document JBIG2 typically contains tens to a few hundred.

Hand-crafted JBIG2 streams designed to stress the segment-graph allocator or refcounter are a recurring exploit primitive. On its own this is a weak signal but it contributes when paired with other JBIG2 anomaly rules.

JBIG2Decode filter medium PDF_JBIG2

PDF uses JBIG2Decode image compression.

JBIG2 is a complex image codec. Vulnerabilities in JBIG2 decoders have been exploited in high-profile zero-click attacks (e.g. NSO Group's FORCEDENTRY, CVE-2021-30860).

OpenType / sfnt directory declares too many tables medium PDF_OPENTYPE_NUMTABLES_HUGE

sfnt offset table declares more than ~64 tables, well beyond any realistic font.

Allocators sized from the numTables field are an exploit primitive. Real OpenType / TrueType fonts contain on the order of 10–25 tables.

OpenType cmap declares too many subtables medium PDF_OPENTYPE_CMAP_SUBTABLES_HUGE

cmap declares an implausibly large number of subtables.

Very large subtable counts stress parser loops and allocations.

OpenType cmap table is truncated medium PDF_OPENTYPE_CMAP_TRUNCATED

cmap header or encoding records extend past the table.

Malformed character-map tables can make different text/glyph paths disagree.

OpenType directory contains a duplicate table tag medium PDF_OPENTYPE_DUPLICATE_TABLE

sfnt directory contains the same 4-byte table tag more than once.

Renderers that take the first record vs. the last record produce different glyph rasterisation. Duplicated table tags are not produced by any standard font tooling.
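
Both this rule and PDF_OPENTYPE_NUMTABLES_HUGE read the sfnt table directory: a 12-byte offset table (version, numTables, three search fields) followed by 16-byte records of tag, checksum, offset, and length. A minimal sketch of the two checks, with a hypothetical function name and the 64-table limit taken from the approximate figure quoted above:

```python
import struct
from collections import Counter


def sfnt_directory_anomalies(font: bytes, max_tables: int = 64) -> set:
    """Flag a huge numTables field and duplicated table tags in an sfnt directory."""
    if len(font) < 12:
        raise ValueError("too short to hold an sfnt offset table")
    _version, num_tables = struct.unpack_from(">4sH", font, 0)
    findings = set()
    if num_tables > max_tables:
        findings.add("PDF_OPENTYPE_NUMTABLES_HUGE")
    # Never walk more records than the bytes can actually hold.
    readable = min(num_tables, (len(font) - 12) // 16)
    tags = Counter(
        struct.unpack_from(">4s", font, 12 + 16 * i)[0] for i in range(readable)
    )
    if any(n > 1 for n in tags.values()):
        findings.add("PDF_OPENTYPE_DUPLICATE_TABLE")
    return findings
```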

OpenType maxp declares implausibly many glyphs medium PDF_OPENTYPE_MAXP_GLYPHS_HUGE

maxp.numGlyphs is far beyond typical embedded PDF fonts.

Huge glyph counts stress loca/glyf allocation and iteration paths.

OpenType name string offset out of range medium PDF_OPENTYPE_NAME_STRING_OUT_OF_RANGE

A name record points outside name table string storage.

Out-of-range name strings are a low-level font table consistency violation.

OpenType name table declares too many records medium PDF_OPENTYPE_NAME_RECORDS_HUGE

The name table record count is implausibly large.

Huge name record counts stress table-walking logic.

OpenType name table is truncated medium PDF_OPENTYPE_NAME_TRUNCATED

name records or string storage point outside the name table.

Malformed name tables are useful parser-divergence evidence when paired with other font anomalies.

PDF digital signature is cryptographically invalid medium PDF_SIGNATURE_INVALID

A PDF signature's CMS failed verification.

Either the signer's signature over the signed attributes is invalid, or the signed content digest does not match the bytes the ByteRange covers. The signature was tampered with, forged, or the signed content was altered after signing.

PDF embedded file could not be fully decoded medium PDF_EMBEDDED_FILE_UNDECODED

A declared PDF /EmbeddedFile stream could not be decoded through its filter chain.

The detector only applies to standard attachment streams referenced by /EmbeddedFile or /EF. When their filter chain fails, the raw stream is carved so unsupported or malformed attachment encodings do not hide payload bytes from artifact triage.

PDF paints image(s) but contains no text operators medium PDF_IMAGE_ONLY_LURE

PDF has at least one image XObject and zero text-emitting operators in raw or decompressed content streams.

Phishing PDFs are often built by exporting a screenshot to PDF: a single page with one or more image XObjects and no text. The carrier evades text-based scanners (no keywords to match) and delivers its call-to-action purely through rendered pixels — a phone number to call, a QR code to scan, or a link the user is told to type. Distinct from PDF_IMAGE_LURE, which requires a small file and an in-PDF click-action; this rule has neither constraint and looks inside compressed content streams to avoid false positives on real text-bearing PDFs.

PDF signature ByteRange does not start at the file head medium PDF_SIGNATURE_BYTERANGE_EVASION

A signature's ByteRange starts past byte 0, leaving content uncovered.

A legitimate PDF signature covers from byte 0. A ByteRange that starts past the file head leaves real content outside the signed region while still presenting as 'signed' — the Universal Signature Forgery evasion.

PHP-gateway SEO-spam PDF link farm medium PDF_SEO_PHP_GATEWAY_LINK_FARM

PDF carries >=4 links to .php gateways with a multi-word search-phrase document slug (pharma / binary-options / SEO spam).

Legitimate PHP-served documents use a filename or numeric id, not a multi-word search-query phrase. Four or more links of the shape 'index.php?.../binary+options+trading+nz.pdf' or 'pdf.php/cialis-dosage-side-effects.pdf' — whether spread across hosts or clustered — are a generated SEO link farm that ranks for queries and routes users into payload/redirect chains. The PDF is an inert link carrier; the risk lives in the linked destinations.

Raw-IP clickable URI medium PDF_URI_IP_LITERAL

PDF clickable URI points to a literal IPv4 address.

Legitimate PDFs normally link to named domains. Clickable HTTP(S) links to raw IP addresses are common in disposable phishing and malware-delivery infrastructure, especially when paired with link annotations or screenshot lures.
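
The host test itself is small. A sketch using only the Python standard library, with a hypothetical function name and the HTTP(S)-only restriction taken from the description above:

```python
import ipaddress
from urllib.parse import urlsplit


def uri_is_ipv4_literal(uri: str) -> bool:
    """True when an http(s) URI's host is a literal IPv4 address."""
    try:
        parts = urlsplit(uri)
        host = parts.hostname
    except ValueError:
        return False  # malformed authority, e.g. an unbalanced IPv6 bracket
    if parts.scheme.lower() not in ("http", "https") or not host:
        return False
    try:
        ipaddress.IPv4Address(host)
    except ValueError:
        return False
    return True
```

Parsing the host rather than pattern-matching the whole URI avoids false hits on dotted numbers that appear in a path or query string.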

Remote GoTo action medium PDF_GOTO_REMOTE

PDF references a remote or embedded document via GoToR/GoToE.

GoToR/GoToE can open another PDF or trigger loading of a remote resource, potentially bypassing security controls by chaining documents.

SEO doc-farm redirector links medium PDF_SEO_DOC_REDIRECTOR_LINK_FARM

PDF carries /pdf/<domain> + /doc/<domain> SEO doc-farm redirector links.

The generated 'free document/template' SEO phishing family links through redirectors whose path ends in a bare website domain behind a /pdf/ or /doc/ segment (e.g. 'host/Document-Title-Slug/pdf/target-site.tld'). Firing requires the family's signature /pdf/ + /doc/ variant pair on the same host and slug, or one such redirector alongside multiple links into WordPress form-plugin upload storage. The PDF is an inert link carrier; the risk lives in the linked destinations.

Scheme-label deceptive host link medium PDF_DECEPTIVE_SCHEME_HOST_LINK

PDF clickable link host starts with a literal 'http.'/'https.' DNS label.

A destination hostname whose first DNS label is literally 'http' or 'https' (e.g. 'https.file-transfers.example.com') makes the rendered link read 'https://https.…', disguising the real registered domain as a secure service. No legitimate operator names a host this way; it is a URL-deception tell used by phishing carriers.
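
A sketch of the label test. The function name is hypothetical, and the requirement of at least three labels (so a bare two-label domain is not flagged) is an assumption of this example rather than something the rule description states.

```python
from urllib.parse import urlsplit


def has_scheme_label_host(uri: str) -> bool:
    """True when the host's first DNS label is literally 'http' or 'https'."""
    try:
        host = urlsplit(uri).hostname or ""
    except ValueError:
        return False
    labels = host.split(".")
    # urlsplit lowercases the hostname, so a plain comparison is enough.
    return len(labels) >= 3 and labels[0] in ("http", "https")
```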

Stream /Length disagrees with actual byte count medium PDF_LENGTH_MISMATCH

PDF stream object declares a /Length that does not match the actual bytes between 'stream' and 'endstream'.

Different PDF readers resolve stream length either from the declared /Length value or from the 'endstream' framing markers. When the two disagree, the same file renders as different content in different readers — a known evasion shape used to hide payload from static scanners that trust one source while the actual reader trusts the other.
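
The comparison can be approximated as below. This sketch makes several simplifying assumptions: it handles only a direct integer /Length (indirect references are skipped), it does not cope with nested dictionaries after /Length, it takes the first 'endstream' as the end of the body, and it tolerates writers that omit the end-of-line before 'endstream'. The function name is illustrative.

```python
import re

# Direct integer /Length followed by the end of the dictionary and the stream keyword.
_STREAM = re.compile(
    rb"/Length\s+(\d+)\b(?!\s+\d+\s+R\b)[^>]*>>\s*stream(?:\r\n|\n)"
)


def length_mismatches(pdf: bytes) -> list:
    """Return (body offset, declared length, framed length) for disagreeing streams."""
    mismatches = []
    for match in _STREAM.finditer(pdf):
        declared = int(match.group(1))
        start = match.end()
        end = pdf.find(b"endstream", start)
        if end == -1:
            mismatches.append((start, declared, None))
            continue
        body = pdf[start:end]
        framed = len(body)
        # The end-of-line before 'endstream' is not part of the stream data.
        if body.endswith(b"\r\n"):
            framed -= 2
        elif body.endswith((b"\n", b"\r")):
            framed -= 1
        if declared not in (framed, len(body)):
            mismatches.append((start, declared, framed))
    return mismatches
```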

Stream advertises a filter that cannot decode the body medium PDF_FILTER_CHAIN_UNDECODABLE

PDF stream declares /Filter /FlateDecode but the raw stream bytes are rejected by zlib in both wrapped and raw modes.

A renderer that aborts on the broken stream and one that fails-open see different document content. Targeted samples sometimes deliberately break the filter chain so that lighter-weight scanners skip the stream while heavier renderers still extract a payload from the partial data.
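
The two-mode probe can be sketched with Python's `zlib` module. This version is deliberately lenient, which is an assumption of the example: a body counts as decodable if either mode accepts the bytes without raising, even when the stream is truncated.

```python
import zlib


def flate_body_decodes(raw: bytes) -> bool:
    """True if zlib accepts the bytes in wrapped (zlib header) or raw deflate mode."""
    for wbits in (zlib.MAX_WBITS, -zlib.MAX_WBITS):
        try:
            # Positive wbits expects a zlib header; negative wbits means raw deflate.
            zlib.decompressobj(wbits).decompress(raw)
            return True
        except zlib.error:
            continue
    return False
```

A stream declaring /FlateDecode for which this returns False matches the shape the rule describes.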

String.fromCharCode medium PDF_FROMCHARCODE

String.fromCharCode found in PDF JavaScript.

fromCharCode constructs strings from numeric character codes. Exploit authors use it to build payloads character by character to evade string-based detection.

SubmitForm action medium PDF_SUBMITFORM

PDF has a /SubmitForm action that can POST data to an external URL.

SubmitForm actions send form field data to a URL when triggered — potentially exfiltrating credentials, file paths, or system information to an attacker-controlled server. Caveat: legitimate PDF forms (government applications, grant submissions, enterprise surveys) do use /SubmitForm to post data to known servers. Check the target URL context before escalating this finding.

U3D stream contains an implausibly large number of blocks medium PDF_U3D_HUGE_BLOCK_COUNT

U3D stream contains thousands of blocks where real-world files typically contain at most a few hundred.

Hand-crafted U3D streams designed to stress the modifier-chain allocator or refcounter are a recurring exploit primitive on the U3D parser surface. On its own this is a weak signal, but it contributes when paired with other 3D-content anomaly rules.

URL shortener link medium PDF_URL_SHORTENER_URI

PDF clickable URI points to a URL shortener.

Clickable URL-shortener links hide the final landing page from static review and are common in phishing redirect PDFs. This is stronger than a generic external URI because the visible destination is an intermediate redirect service rather than the actual site.

AcroForm button with action trigger low PDF_ACROFORM_BUTTON

PDF contains a /Btn form field paired with a SubmitForm/URI/Launch/JS trigger.

Large interactive form buttons are the common 'fake download button' in phishing PDFs. Attackers overlay a /Btn field on a screenshot of a legitimate document to create the illusion of a clickable download link. /Btn appears in essentially every fillable PDF form, so the rule only fires when paired with a remote-action trigger — the actual phishing-button shape.

Embedded file low PDF_EMBEDDED

PDF embeds a file attachment.

Embedded files can carry executables, scripts, or other malware. While some legitimate PDFs include attachments, this warrants inspection.

Image-only document (screenshot lure) low PDF_IMAGE_LURE

PDF contains many images but very few text blocks — possible screenshot lure.

A common phishing technique renders a screenshot of a legitimate document (e.g. a locked Word file, a DocuSign request) as a full-page image with no real text content, then overlays a form button or URI action on the image. Caveat: this heuristic has a high false-positive rate. Scanned documents (contracts, invoices, IDs), image-heavy brochures, and photo PDFs all trigger it legitimately. Use this finding only as supporting context alongside higher-severity indicators.

Indirect reference to undefined object low PDF_DANGLING_INDIRECT

PDF body contains an indirect reference (N G R) to an object number that is never defined in the file.

Lenient readers silently skip dangling references; strict readers may treat the slot as null and follow a different code path. On its own this is often just a corruption artefact — older scanner output and damaged PDF/A files commonly trip it — but it contributes weakly when paired with stronger anomaly signals.

Optional Content Group with action trigger low PDF_OPTIONAL_CONTENT

PDF uses Optional Content Groups (OCG) and contains an action trigger.

Optional Content Groups allow parts of a PDF to be shown or hidden. Attackers abuse this to show lure content on first open then hide it (defeating sandbox screenshots) while the action trigger still fires. OCGs alone are standard in CAD/technical, multilingual, and layered PDFs, so this rule only fires when an action trigger is also present.

XFA form low PDF_XFA

PDF uses XML Forms Architecture (XFA).

XFA forms can contain JavaScript and complex logic. Vulnerabilities in XFA parsers have been exploited in the past.

syncAnnotScan annotation-staging primitive low PDF_FOXIT_SYNCANNOTSCAN

PDF JavaScript calls syncAnnotScan() — an exploit-kit staging primitive used to force annotation enumeration before reading payload bytes from /Subject fields.

syncAnnotScan() is a legitimate no-argument Acrobat / Foxit JavaScript API that ensures all annotation objects are populated before getAnnots() is called. It is not a vulnerable sink and has no associated CVE. However, exploit-kit JavaScript routinely calls it as a staging step in the pattern 'z.syncAnnotScan(); var p = y.getAnnots({nPage:0}); var s = p[0].subject; ... eval(s)' — where the encoded payload was hidden in annotation /Subject fields. A bare call rarely appears in legitimate PDFs, so it is a low-severity exploit-kit-shape indicator on its own; combined with getAnnots() + subject reads + eval, the related rule PDF_JS_OBFUSCATED_DROPPER fires the high-severity composite finding.

Body-only duplicate object in PDF info PDF_DUPLICATE_OBJ_BODY_INCREMENTAL

Same indirect object (N G) is defined more than once with different body bytes.

Body-only duplicate objects are common in benign incremental updates and PDF editor save chains. The analyzer records the structure for explainability, but it is not treated as an unknown-exploit signal unless a duplicate body carries active content or divergent filters.

CFF CharString operand stack grows beyond spec info PDF_CFF_CHARSTRING_STACK_OVERFLOW

Type 2 CharString bytecode pushes more operands than the interpreter stack should hold.

CFF/Type 2 CharStrings are stack bytecode. Operand-stack overflow is a recurring font parser bug class and a strong structural exploit signal.

Encrypted document info PDF_ENCRYPTED

PDF declares /Encrypt — strings and stream contents are encrypted.

PDF document encryption applies the standard security handler's cipher (RC4 or AES) to all strings and stream contents. The keys (/JavaScript, /Filter, etc.) remain visible but their values do not. Many legitimate documents are encrypted (signed contracts, billing statements, rights-managed material); on its own this is informational, but it limits what the static scanner can see.

External URI info PDF_URI

PDF contains an external URL action.

The PDF links to an external website. While common in legitimate PDFs, malicious PDFs use URLs to redirect to phishing sites or malware downloads.

Object defined twice with different bodies info PDF_DUPLICATE_OBJ_DIVERGENT_BODY

Same indirect object (N G) is defined more than once with different body bytes.

Duplicate object bodies create first-wins versus last-wins parser divergence even when the /Filter chains look identical.

PDF carries a digital signature info PDF_SIGNED

The document is digitally signed or certified.

A PDF can be signed with a detached PKCS#7/CMS signature whose ByteRange covers the whole file except the signature hole. The signer certificate, issuer, certification level (DocMDP) and the signature's cryptographic validity are shown for context. Unlike a VBA-project signature, PDF JavaScript is never signed on its own — only as part of the signed byte range — so presence of a signature is NOT a benign indicator and does not affect the risk score.

PDF differential parser failed info PDF_DIFFERENTIAL_PARSE_FAILED

The cross-check parser (pdfminer.six) raised an error on this file.

The analyzer cross-checks its in-process byte-level PDF scan against an independent pdfminer.six pass to surface parser-divergence exploit shapes. A pdfminer error here is itself a signal — malformed PDFs (corrupted xref, divergent duplicate objects, broken object streams) often deliberately defeat one parser while remaining renderable in another, which is the basis of several real-world PDF exploitation primitives. The static byte-level heuristics still ran on the file and their findings above are valid; only the differential cross-check signal is missing.

PDF signed with a self-signed certificate info PDF_SIGNATURE_SELF_SIGNED

The PDF signing certificate is self-signed (issuer == subject).

The document is signed with a self-signed certificate that no certificate authority vouches for. Unlike self-signed VBA macro signing (a real attacker trick), self-signed PDF sealing is a mainstream-legitimate pattern — invoicing and document services seal PDFs with self-signed certs for integrity, not CA-backed identity. So this is informational context (surfaced as the signature box's 'Self-signed' status) and does not affect the risk score; tampering surfaces as PDF_SIGNATURE_INVALID instead.

Office 164

Dangerous XLM formula APIs critical OOXML_XLM_DANGEROUS_FN

Excel 4.0 macro sheet uses formula APIs that call directly into Win32.

=CALL, =EXEC, =REGISTER, =FORMULA, and =FOPEN are the primitives that XLM-based droppers use to download payloads, write files, and start processes without invoking VBA.

Embedded Adobe Flash (SWF) in Office document critical OFFICE_EMBEDDED_SWF

Office document contains an embedded SWF (Flash) object.

Vulnerabilities such as CVE-2018-4878 and CVE-2018-15982 involved Flash objects embedded in Office files. Adobe Flash has been end-of-life since December 2020.

Embedded Office document static findings critical EMBEDDED_OFFICE_CHILD_STATIC_TRIAGE

A carved embedded OLE Office document matched exploit or payload heuristics.

Some exploit samples are wrapped by a PE, binder, or other outer container while preserving a complete CFB/OLE Office document at a later offset. The engine carves that secondary Office body, runs the normal Office static rules on it, and promotes concrete CVE or high-risk child findings onto the parent sample.

Embedded Office object carries macros critical OFFICE_EMBEDDED_MACRO_OBJECT

An embedded OLE/OOXML object is itself an Office file that contains a VBA macro project or an Excel 4.0 (XLM) macro sheet.

Hiding a macro-bearing workbook or document inside another document — often under an obfuscated, non-standard part name — is a macro-smuggling technique that defeats scanners which only inspect the outer document's macro storage. No benign authoring workflow stages a hidden macro project this way, so an embedded Office file with its own macros is a strong delivery-vehicle indicator.

Embedded PE executable critical OLE_EMBEDDED_EXE

MZ/PE header found inside the document.

A Windows executable (PE file) is embedded inside the document. This is high-risk — the document is carrying an executable payload.

Embedded PE reassembled from base64 VBA string fragments critical OLE_VBA_EMBEDDED_PE_DROPPER

VBA carries a base64-encoded EXE split across many string variables and rebuilds it at run time (ADODB.Stream drop-and-run).

The VBA macro stores a base64-encoded Windows executable split across many string variables and reassembles it at run time — typically writing it to %TEMP% with ADODB.Stream (often via embedded JScript glue using MSXML2.DOMDocument.nodeTypedValue) and executing it. The payload is embedded in the document, not downloaded, and never appears as a contiguous executable on disk, so URL recovery and the raw embedded-EXE scan both miss it. The analyzer concatenates the macro's base64 runs, decodes at each base64 phase alignment, and confirms a valid PE (MZ + DOS stub + PE header, or process-injection imports). A benign macro does not carry an executable across its string literals, so the match is near-zero-FP; the reassembled PE is carved for full extracted-file (ClamAV + static triage) analysis.
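
A minimal Python sketch of the reassembly idea follows. It is an assumption-laden illustration, not the analyzer's code: the names `find_embedded_pe` and `looks_like_pe` are hypothetical, fragments are assumed to be double-quoted literals of at least 16 base64 characters, and the PE check covers only the MZ header and the PE signature it points to (not the DOS stub or import checks mentioned above).

```python
import base64
import re
import struct

# Double-quoted runs of base64 alphabet characters (length is an assumption).
STRING_LITERAL_RE = re.compile(r'"([A-Za-z0-9+/=]{16,})"')


def looks_like_pe(blob: bytes) -> bool:
    """True if some 'MZ' header's e_lfanew field points at 'PE\\0\\0'."""
    pos = blob.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(blob):
            (e_lfanew,) = struct.unpack_from("<I", blob, pos + 0x3C)
            pe_off = pos + e_lfanew
            if blob[pe_off:pe_off + 4] == b"PE\x00\x00":
                return True
        pos = blob.find(b"MZ", pos + 1)
    return False


def find_embedded_pe(vba_source: str) -> bool:
    """Join base64-looking literals and decode at each 4-character phase."""
    joined = "".join(STRING_LITERAL_RE.findall(vba_source)).replace("=", "")
    for phase in range(4):
        chunk = joined[phase:]
        chunk = chunk[: len(chunk) - len(chunk) % 4]  # whole quanta only
        if chunk and looks_like_pe(base64.b64decode(chunk, validate=True)):
            return True
    return False
```

Trying all four phases matters because junk or a partial fragment in front of the payload shifts the base64 quantum boundary, and a decode at the wrong phase yields unrelated bytes.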

Encrypted Office package with CFB FAT corruption critical OLE_ENCRYPTED_AND_MALFORMED

Encrypted-package shape co-occurs with FAT-chain corruption — the canonical combined evasion form.

An OLE container that is both password-encrypted at the MS-OFFCRYPTO layer and structurally malformed at the FAT-chain level is the canonical evasion shape used to deliver exploit-carrier Office documents past email and gateway scanners. Each signal alone has some benign-occurrence rate; the combination has effectively no legitimate explanation — Excel opens the file because both its FAT walker and its encrypted-package loader are lenient, and that asymmetric tolerance against strict static scanners is the point.

Equation Editor OLE object critical OLE_EQUATION_EDITOR

Equation Editor OLE CLSID found in the document.

The Microsoft Equation Editor component had critical vulnerabilities (CVE-2017-11882, CVE-2018-0802) that allowed arbitrary code execution. Microsoft later removed the component. Finding its CLSID is a high-signal indicator, but the CLSID alone is related evidence rather than a full exploit match.

Equation Editor command stager — CVE-2017-11882 likely critical CVE_2017_11882_EQUATION_NATIVE_CMD

Equation Native stream has invalid MTEF structure and embedded command-launch bytes.

This rule requires Equation Editor CLSID context, an invalid Equation Native/MTEF header, and process-launch command bytes inside the native stream. That combination is a weaponized Equation Editor exploitation pattern consistent with CVE-2017-11882 while avoiding attribution from CLSID presence alone or from benign embedded equations.

Equation Editor exploitation — CVE-2017-11882 critical CVE_2017_11882

Equation Editor exploit primitive or command-stager shape.

CVE-2017-11882 is the Equation Editor stack buffer overflow most directly identified by an overlong MTEF FONT typeface field copied into a fixed-size stack buffer in EQNEDT32.EXE. The analyzer also emits this CVE when an obfuscated or partially malformed Equation object exposes equivalent high-confidence exploitation evidence, such as an activated Equation 3.0 object carrying command-launch bytes and a remote payload target. The rule still requires exploit structure beyond a bare Equation Editor CLSID.

Equation Editor object carries Ole10Native downloader shellcode critical OLE_EQUATION_OLE10NATIVE_DOWNLOADER

Equation Editor OLE object contains Ole10Native shellcode with download and process APIs.

An embedded OLE object declares the legacy Equation Editor CLSID and its Ole10Native stream contains URLDownloadToFile plus process-launch API strings and a remote URL. This is high-confidence exploit payload evidence. It is not assigned to a specific Equation Editor CVE unless the malformed Equation Native/MTEF primitive also matches.

Excel 4.0 macro hidden in a regular worksheet part critical OOXML_XLM_MACRO_IN_WORKSHEET

Workbook has an Auto_Open defined name and stores XLM download/exec logic inside parts declared as normal worksheets.

An XLM evasion that omits the xl/macrosheets/ part and any xlMacrosheet relationship, storing the Excel 4.0 macro formulas as cells in ordinary worksheet parts so macro-sheet-content-type detectors miss it. Resolved WinAPI download/exec strings (URLDownloadToFile / ShellExecute) sit directly in worksheet cells alongside an Excel_BuiltIn_Auto_Open / _xlnm.Auto_Open trigger. Used by 2021-era builders such as AsHkERE/EZHE.

Excel 5 Laroux/Larou-CV macro virus critical OLE_XLS5_LAROUX_MACRO_VIRUS

Legacy Excel workbook contains Laroux/Larou-CV auto-open replication markers.

Laroux-family Excel 5/95 macro viruses infect workbooks by auto-running legacy VBA and copying macro modules into other workbooks. This rule requires a family marker such as laroux, Larou-CV.xls, or big_dork together with auto_open and workbook/module replication strings, so ordinary legacy Excel files and textual references are not enough to trigger it.

Field QUOTE with ASCII-integer payload critical OOXML_FIELD_QUOTE_ASCII_PAYLOAD

A Word field QUOTE expression contains a decimal-ASCII byte sequence. The decoded payload is emitted at field-update time and is typically used to assemble shell-command text that does not appear literally in the document bytes.

Word's QUOTE field accepts a list of decimal byte values and emits them as text when the field is evaluated. Threat actors use this to defeat content-based filters that look for literal 'cmd'/'powershell' strings in document bytes: the dangerous string only exists after Word evaluates the field. When a SET/REF field chain references the QUOTE output from a DDE field, the resulting command runs on document open (MITRE ATT&CK T1559.002). Severity escalates to CRITICAL when the decoded payload references a known-dangerous executable (cmd, powershell, mshta, etc.) and is MEDIUM otherwise, because the form has no legitimate use case even when no dangerous target is immediately visible.
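
The decoding step can be illustrated with a short Python sketch. The function name `decode_quote_fields`, the four-value minimum, and the three-entry token list are assumptions made for the example; the analyzer's actual executable list is longer ("etc." above) and is not reproduced here.

```python
import re

# QUOTE followed by at least four whitespace-separated decimal values.
QUOTE_FIELD_RE = re.compile(
    r"\bQUOTE\s+((?:\d{1,3}\s+){3,}\d{1,3})", re.IGNORECASE
)

# Illustrative subset only; the real list is an unknown here.
DANGEROUS_TOKENS = ("cmd", "powershell", "mshta")


def decode_quote_fields(field_text: str):
    """Return (decoded_text, severity) for each decimal-ASCII QUOTE run."""
    findings = []
    for match in QUOTE_FIELD_RE.finditer(field_text):
        values = [int(v) for v in match.group(1).split()]
        if any(v > 255 for v in values):
            continue  # not a byte sequence
        decoded = bytes(values).decode("latin-1")
        lowered = decoded.lower()
        dangerous = any(tok in lowered for tok in DANGEROUS_TOKENS)
        findings.append((decoded, "critical" if dangerous else "medium"))
    return findings
```

For example, the field instruction `QUOTE 99 109 100 46 101 120 101` decodes to `cmd.exe`, a string that never appears literally in the document bytes.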

LOLBin reference in VBA critical OLE_VBA_LOLBIN

VBA macro references a Living-off-the-Land binary (certutil, bitsadmin, mshta).

LOLBins are legitimate Windows tools that attackers misuse for malicious purposes — downloading files, executing scripts, or decoding payloads — because they are trusted by security software.

Legacy Excel formula macro virus marker critical OLE_XLS_FORMULA_MACRO_VIRUS

Workbook contains self-identifying legacy Excel formula macro virus strings.

Older Excel malware sometimes used worksheet formulas and hidden workbook content rather than VBA projects or modern XLM macro-sheet structures. This rule is intentionally narrow: it requires explicit formula-macro-virus markers such as XF.Classic/Poppy text in the Workbook stream, so documents that merely contain ordinary formulas are not flagged.

Legacy XLM macro-virus family marker critical OLE_XLM_LEGACY_MACRO_VIRUS

Workbook contains an XLM macro sheet plus legacy macro-virus family strings.

Legacy Excel macro viruses commonly infected workbooks by adding Excel 4.0 macro sheets and replication logic. This rule requires macro-sheet structure plus specific family strings or workbook replication phrases such as XL4Poppy, Normal_MacroVirus, HPDung, or 'Infect Workbook', keeping it narrower than a generic XLM-macro finding.

MSHTML-style external object relationship critical OFFICE_MSHTML_EXTERNAL_OBJECT

OOXML external relationship targets HTML/CAB/MHTML/HTA-style content.

This is an Office MSHTML attack-surface indicator related to CVE-2021-40444-style delivery, including later script-file variants such as WSF, but it does not match the stricter external OLEObject mhtml/scriptlet/CAB gadget pattern used for the CVE_2021_40444 exact rule.

Malformed OLE auto-open stager with embedded ZIP payload critical OLE_RAW_MALFORMED_AUTOOPEN_STAGER

Malformed OLE bytes contain AutoOpen, embedded ZIP/theme content, VBA project metadata, and URL/CMD/Shell staging tokens.

Some exploit-builder Office documents keep a CFB/OLE header but corrupt the directory enough that ordinary OLE/VBA extraction fails, while an embedded ZIP/theme package and raw VBA project strings remain visible. AutoOpen plus URL/CMD/Shell staging tokens in that malformed container is not benign document automation.

Malicious DDE command critical OOXML_DDE_MALICIOUS

A DDE field instruction launches a dangerous system executable (cmd.exe, PowerShell, mshta, etc.).

This document uses DDE to silently execute a system command. The DDE field references a known-dangerous executable such as cmd.exe, powershell.exe, mshta.exe, or a UNC path. When the user opens the document and agrees to 'update links' (or if DDEAUTO is used, without any prompt), the command runs immediately. This is a well-known attack technique (MITRE ATT&CK T1559.002) that bypasses macro security entirely — no macros need to be enabled.

Microsoft PowerPoint malformed record — CVE-2006-0022 critical CVE likely CVE_2006_0022

PowerPoint OLE Pictures stream is malformed and carries a PE-like payload.

CVE-2006-0022 is a Microsoft PowerPoint remote-code-execution vulnerability triggered by a malformed PowerPoint record. Weaponized samples from this era commonly hide executable payload bytes in the Pictures stream while damaging the stream's compound-file chain so tolerant PowerPoint parsing reaches bytes that ordinary OLE stream readers do not expose. This rule requires a PowerPoint OLE container, a large malformed Pictures stream, image-record material, and an embedded PE-like payload to limit false positives.

Microsoft PowerPoint malformed record — CVE-2006-3877 critical CVE likely CVE_2006_3877

PowerPoint OLE numbered Table stream is malformed and carries a PE-like payload.

CVE-2006-3877 is a Microsoft PowerPoint malformed-record memory corruption vulnerability. This rule requires a PowerPoint OLE container, a large numbered *Table stream whose CFB chain cannot be read normally, and an embedded PE-like payload with process injection or Office-resiliency cleanup strings. Those gates avoid tagging ordinary legacy decks that merely contain table streams.

Microsoft PowerPoint mso.dll malformed shape — CVE-2006-3590 critical CVE likely CVE_2006_3590

PowerPoint Pictures stream contains malformed shape-container material and shellcode.

CVE-2006-3590 is the MS06-048 Microsoft PowerPoint mso.dll remote-code-execution vulnerability triggered by a malformed shape container in a PPT file. In observed PPDropper-style decks, the Pictures stream begins with malformed Escher/shape material and carries PEB/API-resolver shellcode or a PE-like payload. The rule requires PowerPoint stream context plus specific shellcode/payload evidence to avoid ordinary picture stream false positives.

Microsoft Word malformed object pointer — CVE-2006-2492 critical CVE likely CVE_2006_2492

Word OLE object pointers are malformed and unreferenced sectors contain decoded shellcode.

CVE-2006-2492 is the MS06-027 Microsoft Word malformed object pointer vulnerability. This rule requires a Word OLE container with malformed CFB directory/object-pointer evidence plus shellcode hidden in unreferenced sectors. One recognized subfamily has an impossible WordDocument declared size and rotate-decoded shellcode that manipulates Word Resiliency/StartupItems registry keys; broader samples require raw PEB/API-resolver shellcode and large OLE slack.

Microsoft Word malformed string — CVE-2007-3899 critical CVE likely CVE_2007_3899

Word FIB points to a malformed DOP/string-table region with exploit payload evidence.

CVE-2007-3899 is the MS07-060 Microsoft Word malformed-string memory-corruption vulnerability. The rule validates Word OLE context, a Word 97-family FIB, an abnormal INT_MAX run in the DOP/string-table area, inflated text counters, and payload or Mdropper.Z campaign markers before assigning the CVE.

Microsoft Word malformed table SPRM — CVE-2006-6456 critical CVE_2006_6456

Word document contains a corrupted table border/colour formatting record — the CVE-2006-6456 memory-corruption shape.

CVE-2006-6456 is a Microsoft Word 2000/2002/2003 and Word Viewer remote-code-execution vulnerability caused by malformed Word data structures. Exploit documents corrupt a table-formatting SPRM cluster, for example replacing a normal sprmTBrc*Cv record with an invalid 0xFF high-byte SPRM immediately after valid table border/color SPRMs.

Microsoft Word record parsing — CVE-2008-2244 critical CVE likely CVE_2008_2244

Word OLE document has malformed-record exploit structure with payload in OLE slack.

CVE-2008-2244 is the MS08-042 Microsoft Word record parsing remote-code-execution vulnerability. Targeted exploit documents from 2008 commonly keep normal-looking WordDocument/table streams and place shellcode or a PE payload in a large unallocated OLE slack region reached after malformed Word record parsing. The rule requires Word stream context, large OLE slack, and concrete payload bytes.

OOXML autoload OLE object target is missing critical OOXML_MISSING_AUTOLOAD_OLEOBJECT

Spreadsheet declares an auto-loaded OLE object, but the referenced embedded OLE part is absent.

Excel is instructed to activate an embedded OLE object through an `oleObject` relationship and worksheet declaration, but the target part is missing from the ZIP. When this co-occurs with autoLoad, VML shape context, or a random-looking ProgID, it is a high-confidence payload-stripped or malformed OLE activation carrier. This is not a specific CVE attribution unless the embedded OLE payload is present and contains a recoverable vulnerable-parser primitive.

Obfuscated VBA Shell command with URL critical OLE_VBA_OBFUSCATED_SHELL_URL

VBA macro builds a Shell command through decoder/string functions and includes a URL.

This rule requires a Shell invocation, decoder or string-building functions such as RC4String, Chr, StrReverse, Replace, Split, or Base64-style decoding, and a URL in the same macro source. That compound pattern is typical of downloader macros: the macro hides the command line, then launches it to retrieve or execute a payload.

Obfuscated XLM Auto_Open execution chain critical OLE_XLM_OBFUSCATED_AUTOEXEC_CHAIN

XLM macro sheet auto-executes an obfuscated formula/RUN chain.

Excel 4.0 macro malware commonly hides its command text in formula arithmetic, reconstructs strings with FORMULA(CHAR(...)), stores intermediate values through SET.VALUE / GET.CELL / GOTO, and transfers execution with RUN(). Seeing that chain behind Auto_Open is a high-confidence malicious pattern even when no VBA project exists.

Ole10Native package archive contains PowerShell downloader LNK critical OFFICE_PACKAGE_ARCHIVE_LNK_DOWNLOADER

OLE Package payload is an archive containing a Windows shortcut that launches PowerShell to download a remote payload.

This detects the common malware-delivery shape where an Office OLE Package displays a ZIP/attachment, but the archive contains an LNK whose command line starts PowerShell and retrieves a remote executable or script. It is user-activated shortcut execution, distinct from CVE-2017-8464's Windows Shell icon-parsing vulnerability.

Ole10Native package archive contains executable member critical OFFICE_PACKAGE_ARCHIVE_RISKY_MEMBER

OLE Package payload is an archive containing a shortcut, script, installer, or other executable-capable member.

Attackers can make the Object Packager UI show a benign-looking ZIP while placing the real execution primitive inside the archive, such as an LNK, JSE/JS/VBS script, HTA, MSI, or executable. The activation is user-driven through the embedded OLE Package object, but the document is still carrying a directly executable payload bundle.

Ole10Native package payload is a download-and-execute script critical OFFICE_PACKAGE_SCRIPT_DROPPER

OLE Package payload contains a script that hosts a shell, fetches a remote resource, and executes it.

An OLE Object Packager payload whose embedded script combines a shell host (PowerShell/WScript/mshta), a network-fetch verb (Invoke-WebRequest/Irm/certutil/http(s) URL), and an execute verb (Start-Process/ShellExecute/-outfile) is a download-and-run dropper. This is a direct user-execution delivery technique (MITRE T1204.002) and is detected on payload content, so it still fires when the package header fields are blanked or shuffled to evade extension-based checks.

Potential Shell call in VBA critical OLE_VBA_SHELL

VBA macro calls Shell() function.

The VBA Shell() function executes an external program. Macro malware often uses it to launch payloads such as cmd.exe or PowerShell scripts.

PowerPoint binary-format RCE payload — CVE-2011-1269 / MS11-036 family critical PPT_BINARY_MEMORY_CORRUPTION_PAYLOAD

Macro-free binary PowerPoint carries a native code payload (embedded PE / process-injection shellcode).

A binary PowerPoint 97-2003 (.ppt) document with no VBA macros that carries an embedded PE and/or process-injection shellcode (PEB+API-hash resolver, WriteProcessMemory/CreateRemoteThread), or an XOR-encoded payload alongside execution-API strings, staged in an oversized binary stream (Pictures, a numbered *Table). Legitimate presentations never embed executables or shellcode; this is the payload half of a PowerPoint memory-corruption exploit. Attributed to the CVE-2011-1269 / MS11-036 family (the same record-overflow delivery is shared with CVE-2010-2572 and CVE-2009-0556, so the exact CVE is not narrowed statically). The malformed-record trigger itself is not used as the signal because the PowerPoint persist-object directory makes naive record walks unreliable.

PowerShell reference in VBA critical OLE_VBA_PS

VBA macro references PowerShell.

PowerShell is one of the most common tools used in the second stage of macro-based attacks — downloading and executing payloads in memory.

Raw OLE macro text shows DNS-driven hidden Shell stager critical OLE_RAW_MACRO_DNS_SHELL_STAGER

Raw OLE streams contain AutoOpen, DNS API use, temp/AppData staging, and hidden Shell execution.

Some malformed or stomped VBA projects expose useful macro text in raw OLE streams even when normal source extraction fails. AutoOpen combined with DnsQuery/DnsRecordListFree, temp or AppData path construction, and Shell vbHide is a high-confidence staging pattern rather than benign document automation.

Raw OLE macro text shows self-replication or security tampering critical OLE_RAW_MACRO_SELF_REPLICATION

Raw OLE streams contain macro text with auto-run, automation, CodeModule modification, and Outlook or macro-security behavior.

Some OLE documents expose malicious macro text as raw strings even when oletools cannot recover a standard VBA project. Auto-run entry points paired with CreateObject, CodeModule AddFromString/InsertLines/DeleteLines, Outlook automation, or macro-security tampering are high-confidence macro-virus behavior.

Spreadsheet DDE link launches a dangerous command critical OOXML_SPREADSHEET_DDE_MALICIOUS

An Excel externalLinks/ddeLink entry launches cmd, PowerShell, mshta, or another dangerous executable.

This workbook uses SpreadsheetML DDE command execution. The payload is stored in xl/externalLinks/externalLink*.xml as ddeService/ddeTopic attributes and may be triggered by a formula such as =[1]!Name. This is separate from Word DDE field abuse and does not require VBA macros.
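
As a rough illustration, the ddeService/ddeTopic attributes can be read from an externalLink part with the standard library. The function name `dangerous_dde_links` and the short service list are hypothetical, and a production scanner would want a hardened XML parser for untrusted input.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, as used by xl/externalLinks/externalLink*.xml.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Illustrative subset of dangerous executables (assumption).
DANGEROUS_SERVICES = ("cmd", "powershell", "mshta")


def dangerous_dde_links(external_link_xml: bytes):
    """Return (ddeService, ddeTopic) pairs that name a dangerous program."""
    root = ET.fromstring(external_link_xml)
    hits = []
    for link in root.iter(NS + "ddeLink"):
        service = link.get("ddeService", "")
        topic = link.get("ddeTopic", "")
        if any(name in service.lower() for name in DANGEROUS_SERVICES):
            hits.append((service, topic))
    return hits
```

In this form the service attribute names the program to start and the topic carries its arguments, so both are worth reporting together.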

URL reconstructed from XLM cell array critical OOXML_XLM_CELL_ARRAY_URL

Payload URL was reconstructed from numeric cell values across the worksheet, not present as a literal string.

XLM downloaders evade literal-bytes URL extraction by storing each character of the URL — or of an embedded HTA that contains the URL — as the numeric value of an individual cell. The macrosheet's formulas read the cells via CHAR()/&-concat and build the URL only at execution time, so the string is never contiguous in the workbook bytes. URLs surfaced here were recovered by walking the BIFF12 record stream of every worksheet and macrosheet part.

URLDownloadToFile in VBA critical OLE_VBA_DOWNLOAD

VBA macro references URLDownloadToFile API.

This API downloads a file from the internet to disk. It is one of the most common functions in macro droppers that fetch second-stage malware from a remote server.

VBA ActiveX event launches decoded Excel4 macro critical OLE_VBA_ACTIVEX_XLM_STAGER

VBA ActiveX/UserForm event decodes worksheet-cell strings and executes them through ExecuteExcel4Macro.

The macro bridges ActiveX/UserForm event activation into Excel 4.0 macro formula execution. The command text is reconstructed from worksheet cells with Mid/Asc/Chr shifting before being passed to ExecuteExcel4Macro, which is a high-confidence macro stager pattern rather than a specific Office parser CVE.

VBA email-worm self-replication (Outlook mass-mailer) critical OLE_VBA_EMAIL_WORM_SELF_REPLICATION

VBA macro drives Outlook to mass-mail itself — creates mail items, harvests recipients, and auto-attaches the carrier.

The macro automates Outlook.Application, programmatically creates mail items, and shows at least two of three independent worm behaviors: harvesting recipient addresses from the MAPI address book or inbox (GAL enumeration, default-folder walk, or scraping message bodies), auto-attaching a file (the carrier) to outgoing messages, and sending the messages programmatically. A benign “email this report” macro mails a single fixed recipient and never enumerates the address book, so the conjunction is the defining behavior of the Melissa / LoveLetter / W97M mass-mailer worm lineage, detected independently of any AV signature.

VBA macro-virus self-replication / AV tampering critical OLE_VBA_MACRO_VIRUS_REPLICATION

VBA macro rewrites VBA project code (self-replication) and/or disables Office macro-virus protection.

The macro programmatically edits VBA project code through the VBE object model — CodeModule/VBComponents/VBProject with InsertLines, DeleteLines, AddFromString, or OrganizerCopy — to copy itself into the global template (Normal.dot) and other open documents, and/or sets Options.VirusProtection = False to silence Word's macro-virus warning. Self-replicating macro code has no benign document use; this is the defining behavior of the W97M macro-virus family (Thus, Melissa, Class, Marker), detected independently of any AV signature.

VBA writes script and launches it through Excel DDE cmd critical OLE_VBA_DDE_CMD_SCRIPT_DROPPER

VBA writes a script-like file and launches it via Excel DDEInitiate with cmd.

The macro creates a .vbe/.vbs/.js/.hta/.bat/.cmd/.ps1 payload on disk and uses Excel DDEInitiate("cmd", ...) to execute it. This is a high-confidence macro execution chain and should be treated as malware even when no Office parser CVE is present.

WScript.Shell usage critical OLE_VBA_WSCRIPT

VBA macro uses WScript.Shell object.

WScript.Shell provides Run and Exec methods that launch commands. Malware creates this COM object to execute system commands or scripts.

XLM Auto_Open environment-evasion close gate critical OLE_XLM_ENVIRONMENT_EVASION_CLOSE

XLM Auto_Open macro runs host-environment checks before showing a fake error and closing.

Malicious Excel 4.0 macros often abort when the workbook appears to be opened in a sandbox, non-Windows host, or reduced UI environment. This rule requires an XLM macro sheet with Auto_Open plus multiple GET.WORKSPACE/GET.WINDOW checks and an ALERT()/CLOSE(FALSE) decoy such as a fake corrupted workbook message. That combination is intentionally narrow and is not normal spreadsheet automation.

XLM payload reassembled from CHAR()/split formulas critical OOXML_XLM_REASSEMBLED_PAYLOAD

WinAPI names, LOLBin commands, or a payload URL were reassembled from per-character CHAR()/string-fragment concatenation inside the macrosheet formulas.

The most evasive Excel 4.0 downloaders never store their payload as a contiguous literal: each WinAPI name, shell command, drop path, or URL is built at runtime by concatenating CHAR(n) calls and one- or two-character string fragments inside the formula token stream (rgce). Literal-bytes and numeric cell-array scanners both miss this. The analyzer parses each formula's rgce, reconstructs the string it builds, and reports it when it resolves to a download/execute kill chain (e.g. URLDownloadToFile, regsvr32, mshta, wmic, a URL). This construct does not occur in benign workbooks.
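
The reconstruction can be sketched at the text level. Note the assumption: the analyzer works on the binary rgce token stream, whereas this hypothetical example (`reassemble`, `is_kill_chain`) operates on a formula already rendered as text, and its marker list is a small illustrative subset.

```python
import re

# Either a CHAR(n) call or a double-quoted string fragment.
TOKEN_RE = re.compile(r'CHAR\(\s*(\d{1,3})\s*\)|"([^"]*)"', re.IGNORECASE)

# Illustrative markers taken from the examples in the rule description.
KILL_CHAIN_MARKERS = (
    "urldownloadtofile", "regsvr32", "mshta", "wmic", "http://", "https://",
)


def reassemble(formula: str) -> str:
    """Concatenate CHAR(n) results and string fragments in source order."""
    parts = []
    for char_code, literal in TOKEN_RE.findall(formula):
        parts.append(chr(int(char_code)) if char_code else literal)
    return "".join(parts)


def is_kill_chain(formula: str) -> bool:
    """True when the rebuilt string contains a download/execute marker."""
    text = reassemble(formula).lower()
    return any(marker in text for marker in KILL_CHAIN_MARKERS)
```

Under these assumptions, `=CHAR(109)&"sh"&CHAR(116)&"a"` rebuilds to `mshta` even though that string is absent from the workbook bytes.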

Access database masquerading as Office document high ACCESS_MASQUERADE_DROPPER

Jet/Access database uses a document extension and contains macro/dropper strings.

The file is a Microsoft Jet/Access database while using a Word/Excel/PowerPoint-style extension, and contains strings associated with VBA execution or payload dropping. This is a masquerade/dropper vector, not a parser CVE.

ActiveX control high OOXML_ACTIVEX

Document contains ActiveX controls.

ActiveX controls are compiled components that can execute native code. They have a long history of exploitation and are a significant risk in Office documents.

BIFF CONTINUE follows a structurally-incompatible record high OLE_BIFF_CONTINUE_ORPHAN

An Excel record-continuation (CONTINUE) block appears with no preceding record to continue — a position abused by several Excel parser CVEs.

Readers that append CONTINUE bodies to the previous record's buffer regardless of compatibility have driven multiple Excel CVEs.

BIFF record body exceeds the 8224-byte spec maximum high OLE_BIFF_RECORD_HUGE

Single non-CONTINUE record body > 8224 bytes (BIFF8 spec maximum).

The legitimate way to ship a payload that big is a CONTINUE chain. An oversized single record is the shape behind several Excel size-parsing bugs.

BIFF record runs past Workbook stream end high OLE_BIFF_RECORD_TRUNCATED

Record's declared body size extends past the stream's last byte.

Excel's record reader can reject the file or, in older versions, copy as many bytes as remain, leaving uninitialised memory in the record buffer. This is a known shape behind several Excel parser CVEs.
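
A record walk of this kind can be sketched as follows. This is a simplified illustration that also covers the oversized-record check from OLE_BIFF_RECORD_HUGE; the function name `scan_biff_records` is hypothetical, and the sketch assumes the Workbook stream has already been extracted and uses the BIFF8 layout of a 2-byte type followed by a 2-byte body size.

```python
import struct

CONTINUE = 0x003C   # BIFF CONTINUE record type
MAX_BODY = 8224     # BIFF8 maximum body size for a single record


def scan_biff_records(stream: bytes):
    """Return (rule_id, offset) findings for huge or truncated records."""
    findings = []
    pos = 0
    while pos + 4 <= len(stream):
        rec_type, size = struct.unpack_from("<HH", stream, pos)
        body_end = pos + 4 + size
        if body_end > len(stream):
            # Declared body runs past the last byte of the stream.
            findings.append(("OLE_BIFF_RECORD_TRUNCATED", pos))
            break
        if rec_type != CONTINUE and size > MAX_BODY:
            findings.append(("OLE_BIFF_RECORD_HUGE", pos))
        pos = body_end
    return findings
```

The walk stops at the first truncated record because every offset after it is no longer trustworthy.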

BIFF stream ends with unclosed BOF substream high OLE_BIFF_BOF_SUBSTREAM_UNCLOSED

A BOF substream reaches the end of the Workbook stream without a matching EOF.

When a substream is left open this way, older or buggy readers may continue parsing with stale substream/parser state.
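
One way to picture the check is a depth counter over BOF and EOF records. The sketch below is an assumption about how such a check could look, not the analyzer's code; `has_unclosed_bof` is a hypothetical name and the stream is assumed to be an already-extracted BIFF8 Workbook stream.

```python
import struct

BOF = 0x0809  # BIFF8 beginning-of-substream record
EOF = 0x000A  # end-of-substream record


def has_unclosed_bof(stream: bytes) -> bool:
    """True if the stream ends while at least one BOF substream is open."""
    depth = 0
    pos = 0
    while pos + 4 <= len(stream):
        rec_type, size = struct.unpack_from("<HH", stream, pos)
        if rec_type == BOF:
            depth += 1
        elif rec_type == EOF and depth > 0:
            depth -= 1
        pos += 4 + size  # skip the 4-byte header and the body
    return depth > 0
```

A counter rather than a boolean is used so that a substream opened inside another one is still paired with its own EOF.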

BIFF workbook contains a defined-name record flood high OLE_BIFF_NAME_RECORD_FLOOD

Workbook contains thousands of BIFF NAME records.

Defined names are formula-bearing BIFF parser inputs. Very large contiguous runs of NAME records are unusual in benign documents and can trigger Excel/BIFF parser stress or corruption bugs in Office and analysis tooling.

CallByName call high OLE_VBA_CALLBYNAME

VBA macro uses CallByName for dynamic method invocation.

CallByName invokes methods dynamically by string name, allowing malware to obfuscate which functions it calls and evade static analysis.

Composite Moniker in RTF OLE object high RTF_COMPOSITE_MONIKER_RELATED

RTF OLE object contains Composite Moniker CLSID without nearby scriptlet payload evidence.

Composite Moniker is the vulnerable primitive associated with CVE-2017-8570, but this rule did not confirm the SCT/scriptlet payload shape. Treat it as related moniker attack-surface evidence rather than a CVE-specific match.

CreateObject call high OLE_VBA_CREATEOBJ

VBA macro calls CreateObject.

CreateObject instantiates COM objects (like WScript.Shell, XMLHTTP, ADODB.Stream) that provide system access. Malware uses these to download files, run commands, or interact with the file system.

DDEAUTO field (auto-execute) high OOXML_DDE_AUTO

A DDEAUTO field instruction was found — it attempts automatic execution or update when the document is opened.

Unlike regular DDE which asks the user to 'update links', DDEAUTO attempts to execute its command automatically when the document opens. Prompts or blocking can still occur depending on Office version, policy, and Protected View state. This technique is catalogued as MITRE ATT&CK T1559.002.

EMF rclBounds has negative width or height high OFFICE_EMF_BOUNDS_NEGATIVE

EMF header's rclBounds rectangle has right < left or bottom < top.

Multiple EMF parser bugs (CVE-2017-0108 / CVE-2017-8553 family) have lived behind unchecked dimension arithmetic on rclBounds. A negative width or height drives the reader into integer-overflow paths.
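
The comparison itself is small. The sketch below assumes the standard EMF header layout (record type and size, then the rclBounds and rclFrame rectangles, then the " EMF" signature at offset 40); the function name `emf_bounds_negative` is hypothetical.

```python
import struct

EMR_HEADER = 1
ENHMETA_SIGNATURE = 0x464D4520  # bytes 20 45 4D 46, i.e. " EMF"


def emf_bounds_negative(blob: bytes) -> bool:
    """True if the EMF header's rclBounds has right < left or bottom < top."""
    if len(blob) < 44:
        return False  # too short to hold the fields read below
    rec_type, _size, left, top, right, bottom = struct.unpack_from(
        "<IIiiii", blob, 0
    )
    (signature,) = struct.unpack_from("<I", blob, 40)
    if rec_type != EMR_HEADER or signature != ENHMETA_SIGNATURE:
        return False  # not an EMF header; nothing to judge
    return right < left or bottom < top
```

The rectangle fields are read as signed 32-bit integers so that negative coordinates compare correctly.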

EMF record extends past blob end or has invalid size high OFFICE_EMF_RECORD_TRUNCATED

EMR record's size field runs past available bytes, is < 8, or is not 4-byte aligned.

EMF records are required to be 4-byte aligned with size >= 8 bytes. Readers that advance by `nSize` without checking misalign their record pointer or read attacker-controlled bytes from the gap.
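
A checked record walk, under the same layout assumptions (4-byte type followed by 4-byte size, size counted from the start of the record), might look like this. The function name `first_bad_emf_record` is hypothetical.

```python
import struct

EMR_EOF = 14  # final record type in an EMF stream


def first_bad_emf_record(blob: bytes):
    """Return the offset of the first invalid record, or None if none found."""
    pos = 0
    while pos + 8 <= len(blob):
        rec_type, size = struct.unpack_from("<II", blob, pos)
        if size < 8 or size % 4 != 0 or pos + size > len(blob):
            return pos  # too small, misaligned, or past the end of the blob
        if rec_type == EMR_EOF:
            return None
        pos += size
    return None
```

Validating `size` before adding it to `pos` is the step that an unchecked reader omits.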

Encrypted Office package with non-block-aligned cipher high OFFICE_ENCRYPTED_PACKAGE_MALFORMED

EncryptedPackage cipher body is not a multiple of 16 bytes, violating the AES block-alignment requirement in [MS-OFFCRYPTO] §2.3.4.4.

The AES-CBC/ECB cipher used by Office Standard Encryption requires the cipher body (after the 8-byte declared-size header) to be a multiple of 16 bytes. Excel itself tolerates the misalignment by truncating to the last full block; most strict decryption tools (including antivirus/EDR scanners that introspect inner content) reject the file outright. This asymmetric tolerance is a deliberate evasion shape: the document opens normally in Office but defeats static inspection that depends on a successful decrypt-and-rescan.
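
The alignment test reduces to one modulus once the 8-byte size header is skipped. A minimal sketch, assuming the EncryptedPackage stream has already been read from the container; both function names are hypothetical.

```python
import struct

SIZE_HEADER_LEN = 8   # little-endian declared plaintext size
AES_BLOCK = 16


def declared_plaintext_size(stream: bytes) -> int:
    """Read the 8-byte declared size that precedes the cipher body."""
    return struct.unpack_from("<Q", stream, 0)[0]


def encrypted_package_misaligned(stream: bytes) -> bool:
    """True if the cipher body is not a whole number of AES blocks."""
    if len(stream) < SIZE_HEADER_LEN:
        return False  # no room for the size header; handled elsewhere
    cipher_len = len(stream) - SIZE_HEADER_LEN
    return cipher_len % AES_BLOCK != 0
```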

Excel 4.0 (XLM) dangerous capability functions high OLE_XLM_DANGEROUS_FN_STATIC

XLM macro sheet references two or more dangerous capability functions (CALL/EXEC/REGISTER/FWRITE/FOPEN).

These functions are the download (CALL into URLDownloadToFile), native-code, process-exec (EXEC), and file-write (FWRITE/FOPEN) primitives of an XLM dropper, and the rule recovers them directly from the BIFF records. Because it does not depend on deobfuscating the full formula chain, it convicts built-in-name and olevba-blind XLM downloaders that otherwise score only as a bare macro sheet.

Excel 4.0 (XLM) macro / Auto_Open high OLE_XLM_AUTOOPEN

OLE workbook contains an Excel 4.0 macro sheet, optionally with Auto_Open/Close.

Excel 4.0 (XLM) macros were a major Office malware vector during 2020-2022 and evaded many VBA-focused controls before Microsoft tightened XLM defaults. An Auto_Open / Auto_Close defined name combined with a macro-sheet sub-stream is the common XLM auto-execution shape used by families such as Emotet and QakBot.

Excel 4.0 (XLM) macro sheet high OOXML_XLM_MACROSHEET

Spreadsheet contains an xl/macrosheets/sheet*.xml part.

A definitive structural indicator of Excel 4.0 macros. XLM is rarely seen in modern legitimate workbooks and was a major Office malware vector during 2020-2022.

External OLE object relationship high OOXML_EXTERNAL_OLE_OBJECT

OOXML oleObject relationship targets an external HTTP(S) URL.

An external oleObject relationship is stronger than a normal hyperlink: Office resolves it through object/OLE update paths and may fetch remote content when the document is opened or updated. This is the relationship shape used by multiple Office remote-object exploitation and delivery chains, so it should not be reduced to generic external-link evidence.

External relationship high OOXML_EXTERNAL_REL

Document references an external target (URL) in its .rels file.

External relationships can be used for remote template injection — the document loads a macro-enabled template from a remote server when opened, bypassing email attachment filters that block macros.

GetObject call high OLE_VBA_GETOBJ

VBA macro calls GetObject.

GetObject can reference running COM objects or create instances from monikers. It is sometimes used as an alternative to CreateObject to evade detection.

Legacy Flash object embedded in Office document high OFFICE_LEGACY_SWF_OBJECT

Office document embeds a ShockwaveFlash object with an old SWF version.

The document contains a ShockwaveFlash ActiveX object and an embedded SWF with a legacy version. This is Flash-in-Office exploit-family evidence, but exact Flash CVE attribution requires SWF tag-level validation.

Legacy WordBasic macro-virus markers high OLE_LEGACY_WORDBASIC_MACRO_VIRUS

Legacy WordBasic auto-execution markers co-occur with macro-virus family or macro-management strings.

Historical Word macro viruses often stored WordBasic macro content in legacy WordDocument structures, with strings such as ToolsMacro, MacroFile$, fileMacro$, globMacro$, Wazzu, or similar family markers. These are not modern VBA projects, so standard VBA extraction can miss them.

MTEF FONT typeface field exceeds 32 bytes high OLE_MTEF_FONT_NAME_OVERLONG

FONT record's NUL-terminated typeface name is longer than the 32-byte spec maximum.

The CVE-2017-11882 primitive overflows this exact field — readers prior to the patched build copied the entire NUL-terminated string into a 32-byte stack buffer. This rule catches the structural shape regardless of the specific exploit byte pattern.

MTEF MATRIX record has implausible dimensions high OLE_MTEF_MATRIX_ROWCOUNT

MATRIX record declares rows or columns > 64.

CVE-2018-0798 abuses the MATRIX rows/cols fields to drive an OOB write. Real equations contain at most a few dozen rows/cols; values above 64 are the structural shape of the exploit family.

MTEF SIZE record has implausibly large value high OLE_MTEF_SIZE_RECORD_ANOMALY

SIZE record declares an explicit point size or delta far beyond normal equation text.

The published MTEF format uses record type 9 for SIZE. CVE-2018-0802 abuses Equation Editor's SIZE-record parser path; legitimate equations do not need 128pt+ explicit sizes or deltas. This structural rule runs only inside recovered Equation Native streams.

OLE DIFAT chain length or pointer is invalid high OLE_HEADER_DIFAT_ANOMALY

DIFAT extension chain loops, points beyond file end, or its declared length disagrees with the first-sector field.

A crafted DIFAT pointing past EOF is the classic shape behind the Word `cb*` family of CVEs. Real writers always emit a linear, in-range DIFAT.

OLE ObjectPool in file named RTF high OLE_OBJECTPOOL_CONTAINER_DISGUISED_RTF

OLE compound document is named with an .rtf extension and contains ObjectPool storage.

ObjectPool is where Word/OLE compound documents keep embedded-object storages. A file whose basename ends in .rtf but is actually an OLE compound container with ObjectPool is an extension/content mismatch that suggests a disguised Word/OLE container and embedded-object attack surface.

OLE appended executable-looking payload high OLE_APPENDED_PAYLOAD

Large high-entropy bytes beyond declared streams contain shellcode or loader markers.

The OLE file has a sizable high-entropy region after the declared major streams, and that appended region contains PE, shellcode, or loader API indicators. This is a payload-carrier heuristic rather than a CVE-specific attribution, and is gated on both entropy and concrete executable markers to limit false positives.

OLE directory tree contains a cycle high OLE_DIR_CYCLE

CFB directory red/black-tree walk visits the same DirID twice.

A spec-compliant compound file's directory tree is acyclic. Cycles are an encoder-impossible shape that hand-edited containers use to force parsers into infinite recursion or to hide named streams from one walk while still resolving them from another.
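
A cycle check of this kind can be sketched as a walk with a visited set. The data model is an assumption made for illustration: `entries` maps each DirID to its (left sibling, right sibling, child) IDs, with 0xFFFFFFFF meaning "no entry", and DirID 0 is the root.

```python
NOSTREAM = 0xFFFFFFFF

def find_repeated_dir_id(entries):
    """Return the first DirID reached twice while walking from the root, else None."""
    seen = set()
    stack = [0]
    while stack:
        dir_id = stack.pop()
        if dir_id == NOSTREAM or dir_id not in entries:
            continue  # empty link or dangling pointer; not a cycle by itself
        if dir_id in seen:
            return dir_id
        seen.add(dir_id)
        stack.extend(entries[dir_id])  # left, right, child
    return None
```

Using an explicit stack rather than recursion keeps the checker itself safe from the unbounded recursion the malformed tree is meant to cause.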

OLE document has large unaccounted-for region high OLE_SLACK_ANOMALY

OLE file bytes greatly exceed the sum of declared stream sizes.

Well-formed Office binary documents pack data into named streams with little slack. When the file is dramatically larger than its declared streams (>40% slack and >16 KB of unaccounted bytes), the extra bytes live in unallocated sectors. Pre-macro-era Word/Excel exploits (e.g. CVE-2010-3333, CVE-2014-1761, CVE-2015-2424) commonly hide XOR-encoded shellcode in this region, reached via a parser pointer-corruption bug in the document structure. The rule is a structural anomaly, not a CVE-specific match.
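
The two thresholds can be combined as follows. This is a sketch under stated assumptions: slack is measured as a fraction of total file size, and header/FAT overhead is not subtracted, which a real implementation would probably account for. The helper name is hypothetical.

```python
def has_slack_anomaly(file_size, stream_sizes, min_ratio=0.40, min_bytes=16 * 1024):
    """True when unaccounted bytes exceed both the ratio and the absolute threshold."""
    unaccounted = file_size - sum(stream_sizes)
    if file_size <= 0 or unaccounted <= 0:
        return False
    return unaccounted > min_bytes and unaccounted / file_size > min_ratio
```

Requiring both conditions keeps small files with proportionally large container overhead from firing the rule.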

OLE metadata lists many Excel 4.0 macro sheets high OLE_XLM_DOCPROPS_MACROSHEET_INVENTORY

OLE workbook metadata lists many MacroN sheet titles with an Excel 4.0 macro-sheet marker.

Encrypted BIFF workbooks can hide XLM formula bodies from static extractors, but the clear DocumentSummaryInformation stream may still expose the workbook sheet inventory. Many MacroN sheet titles plus a BIFF Excel 4.0 macro-sheet marker is a strong XLM malware/evasion signal, especially when FILEPASS encryption is also present.

OLE raw shellcode-like payload high OLE_RAW_SHELLCODE_PAYLOAD

Malformed OLE bytes contain PEB/API-resolver shellcode evidence.

The file-level OLE bytes contain a PEB/API-resolver marker, loader-walk instruction context, and a nearby payload marker such as a NOP sled, MZ bytes, or hash-rotate loop. This is useful for malformed OLE exploit carriers where stream parsing fails, but it is intentionally not a CVE-specific attribution.

OLE sector chain loops or runs past end of FAT high OLE_FAT_CHAIN_LOOP

A stream's sector chain revisits a sector or follows a pointer outside the FAT.

Standard CFB writers never produce loops; readers that follow chains without cycle detection enter an infinite loop or read attacker-controlled bytes after the loop wraps. Encoder-impossible shape behind several pre-2017 Word/Excel container CVEs.
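
Cycle-safe chain following can be sketched like this. The FAT is assumed to be already loaded as a list of 32-bit next-sector values; `walk_chain` is a hypothetical name, and 0xFFFFFFFE is the CFB end-of-chain marker.

```python
ENDOFCHAIN = 0xFFFFFFFE

def walk_chain(fat, start):
    """Return the sector list for a stream, or raise ValueError on a loop or bad pointer."""
    chain, seen = [], set()
    sector = start
    while sector != ENDOFCHAIN:
        if sector >= len(fat):
            raise ValueError(f"sector {sector} is outside the FAT")
        if sector in seen:
            raise ValueError(f"chain revisits sector {sector}")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain
```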

OLE streams share a sector high OLE_FAT_CROSSLINKED

Two different streams' sector chains include the same sector.

Reader divergence shape — depending on which stream is read first, the same bytes are interpreted as different content. Encoder-impossible.

OOXML XML part contains a DOCTYPE declaration high OOXML_XML_DOCTYPE_PRESENT

Any XML part inside an OOXML package contains <!DOCTYPE.

Office never emits DOCTYPE in its XML parts. The presence of one is a structural indicator that the package was authored by hand or by tooling outside the Office writer family — frequently as the staging step for an XXE attempt.

OOXML XML part declares an external entity high OOXML_XML_EXTERNAL_ENTITY

An <!ENTITY ... SYSTEM ...> or PUBLIC declaration was found in an XML part.

Pure XML-external-entity (XXE) shape. Office's own parser configuration historically ignored these, but third-party consumers of the same XML (cloud indexing, preview generators) may resolve them and disclose data.
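
A byte-level scan for external entity declarations might look like the sketch below. It is an illustration only: it assumes UTF-8 or ASCII parts (a UTF-16 part would not match), treats `.xml` and `.rels` entries as XML parts, and uses a hypothetical function name.

```python
import io
import re
import zipfile

# Matches <!ENTITY name SYSTEM ...> and <!ENTITY name PUBLIC ...>, including parameter entities.
ENTITY_RE = re.compile(rb"<!ENTITY\s+[^>]*?\b(?:SYSTEM|PUBLIC)\b")

def parts_with_external_entities(package_bytes: bytes):
    """Return the names of XML parts that declare an external entity."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            if name.lower().endswith((".xml", ".rels")) and ENTITY_RE.search(package.read(name)):
                hits.append(name)
    return hits
```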

OOXML external relationship uses an exotic scheme high OOXML_REL_EXTERNAL_NON_HYPERLINK

External `<Relationship>` of a non-hyperlink type uses an MSDT, search-ms, MHTML, scriptlet, or javascript: scheme.

Office's dispatch path resolves these schemes to non-browser handlers. Structural shape behind CVE-2022-30190 (Follina), CVE-2021-40444, CVE-2024-21413 and the RomCom external-RTF chain.

OOXML internal relationship escapes the package root high OOXML_REL_TARGET_OUTSIDE_PACKAGE

Internal-mode `<Relationship>` Target uses `..` segments that resolve above the package root.

ZIP path-traversal shape. Some Office configurations honour the literal path before normalisation, giving a drop-anywhere primitive.
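
Resolving a Target against its source part while tracking depth is enough to spot the escape. The sketch below is illustrative: `escapes_package_root` is a hypothetical helper, backslashes are treated as separators by assumption, and percent-encoded dot segments are not decoded.

```python
def escapes_package_root(source_part: str, target: str) -> bool:
    """True if `target`, resolved relative to `source_part`, climbs above the package root."""
    target = target.replace("\\", "/")
    if target.startswith("/"):
        segments = []  # absolute part name: start at the root
    else:
        segments = [s for s in source_part.split("/")[:-1] if s]
    for segment in target.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            if not segments:
                return True  # no parent left: the path leaves the package
            segments.pop()
        else:
            segments.append(segment)
    return False
```

The walk is done by hand because library path normalisers typically clamp `..` at the root, which would hide exactly the condition being tested.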

OOXML oleObject relationship points at a non-OLE target high OOXML_REL_TYPE_TARGET_MISMATCH

Relationship typed as `oleObject` resolves to an HTML/CAB/MHT/scriptlet/HTA target.

The CVE-2021-40444 family abuses exactly this shape: the Type drives Office to load the target through the OLE/MHTML dispatch path, but the target bytes are interpreted as a different format. Generalised from named CVEs so future 0-days in the same family surface.

OOXML relationship graph contains a cycle high OOXML_REL_CYCLE

The OPC relationship graph is supposed to be a DAG; a cycle is encoder-impossible.

Containers using cycles try to force resolution loops or to make a part reachable only from inside its own subtree.

Office EPRINT stream contains EMF object high OLE_EPRINT_EMF_OBJECT

ObjectPool EPRINT stream contains EMF data.

An OLE ObjectPool EPRINT stream with EMF data is rare in normal documents and is consistent with Office object-delivery exploit staging when paired with payload anomalies. The rule is related evidence only; it does not prove a malformed graphics record or exact CVE attribution.

Ole10Native UI name and on-disk name disagree on type high OFFICE_PACKAGE_DOUBLE_EXT

OLE Package displayName is benign-looking while fullPath/defFile ends in an executable extension.

The user double-clicks what looks like a document and instead runs a binary. UI-spoofing shape used by package-as-dropper campaigns.
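
The name comparison can be sketched as an extension check on both fields. The extension set and the function name are illustrative assumptions, not the analyzer's actual list.

```python
import ntpath

# Illustrative subset of directly runnable extensions (assumption).
RISKY_EXTENSIONS = {".exe", ".scr", ".com", ".bat", ".cmd", ".js", ".vbs", ".hta", ".lnk", ".jar"}

def display_name_is_spoofed(display_name: str, on_disk_path: str) -> bool:
    """True when the on-disk name is runnable but the displayed name is not."""
    shown = ntpath.splitext(display_name)[1].lower()
    actual = ntpath.splitext(ntpath.basename(on_disk_path))[1].lower()
    return actual in RISKY_EXTENSIONS and shown not in RISKY_EXTENSIONS
```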

Ole10Native package carries executable/script file type high OFFICE_PACKAGE_RISKY_FILE

OLE Package displayName, fullPath, or defFile has an executable/script-capable extension.

Office Package objects are commonly used to embed arbitrary files. When the packaged file is directly runnable, such as EXE, JAR, HTA, script, shortcut, installer, or similar content, the document is carrying a high-risk delivery payload even if the UI name is not spoofed.

Ole10Native package path contains traversal or UNC root high OFFICE_PACKAGE_PATH_TRAVERSAL

OLE Package filename contains `..\` or `\\host\` traversal sequences.

Some Office versions write the dropped file to the path embedded in the Package, giving the attacker a drop-anywhere primitive.

Payload URL assembled from a Chr()/Asc() string expression high OLE_VBA_EXPR_DROPPER_URL

VBA builds its stage-2 download URL char-by-char from string literals + Chr()/Asc()/StrReverse() (no numeric array); URLs recovered.

The macro assembles its stage-2 download URL character by character from string literals concatenated with Chr()/Asc()/StrReverse() results — often nested (Chr(Asc(Chr(Asc("h")))) = "h") and split across the + and & operators, sometimes emitted via Print #n, into a second-stage VBScript/PowerShell file. There is no numeric array to brute-force and the URL is never contiguous on disk, so the literal scan and the array recoverers all miss it (the "adobeacd-update" maldoc family). A bounded VBA value-expression evaluator resolves the expressions and harvests the URL. Self-validating: only a valid host URL that is not already present verbatim in the macro is reported, so a benign macro cannot false-positive.

Payload URL decoded from a Chr() numeric-array loader high OLE_VBA_CHR_ARRAY_DROPPER_URL

VBA builds its stage-2 download URL from a numeric array decoded with Chr() and a linear offset (XMLHTTP/ADODB.Stream dropper); URLs recovered.

The macro stores its stage-2 download URL as a numeric array (Array(250, 262, …)) and decodes it one character at a time with Chr() and a linear offset (e.g. Chr(n - 146)), then drives Microsoft.XMLHTTP.Open "GET", url with ADODB.Stream.SaveToFile and Shell.Application to drop and execute the payload in %TEMP%. This is the VBA-native analogue of the PowerShell char-array loader (no PowerShell, so that path does not see it); the URL is assembled at run time and never contiguous on disk. The analyzer folds the VBA concatenation, treats every run of small numbers as an array, brute-forces the per-element transform, and accepts only a decode that yields a valid host URL — so a benign numeric table cannot false-positive.
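
The brute-force-and-validate step for the linear-offset case can be sketched as below. It is a simplified illustration: it covers only the Chr(n - k) transform, the URL pattern is a rough assumption of what "valid host URL" means, and `decode_chr_array` is a hypothetical name.

```python
import re

# Rough shape of a host URL (assumption; the analyzer's validator is not documented here).
URL_RE = re.compile(r"^https?://[A-Za-z0-9.-]+\.[A-Za-z]{2,}(?::\d+)?(?:/\S*)?$")

def decode_chr_array(values, max_offset=255):
    """Try Chr(n - k) for every offset k; return (k, url) for the first URL-shaped decode."""
    for k in range(max_offset + 1):
        codes = [n - k for n in values]
        if not all(32 <= c < 127 for c in codes):
            continue  # not printable ASCII under this offset
        text = "".join(map(chr, codes))
        if URL_RE.match(text):
            return k, text
    return None
```

With offset 146, the values 250 and 262 from the example above decode to "h" and "t", the start of an http URL. A table of ordinary small integers yields no URL-shaped decode and returns None.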

Payload URL decoded from an encoded PowerShell loader high OLE_VBA_ENCODED_PS_DROPPER_URL

VBA runs a PowerShell stage-2 loader whose download URL is hidden in a numeric char-code array (XOR/+/- decoded at runtime); URLs recovered.

The macro assembles (from string literals scattered across helper functions, padded with dead TypeName/arithmetic junk) a WScript.Shell command that runs a PowerShell stage-2 loader. The download URL is stored as a numeric char-code array decoded at runtime by [char]($_ -bxor k) (or +k / -k) after splitting on obfuscated, case-folded delimiters, so no literal URL scan sees it. The analyzer folds the VBA concatenation, treats every run of small numbers as an array, brute-forces the per-element transform, and accepts only a decode that yields a valid host URL — typically an @-separated fallback list dropped to %TEMP% and executed. Self-validating, so a benign numeric table cannot false-positive.

Payload URL decrypted from a PowerShell SecureString loader high OLE_VBA_SECURESTRING_DROPPER_URL

VBA assembles a PowerShell command from Mid(StrReverse(...)) fragments and AES-decrypts a key-encrypted ConvertTo-SecureString stage-2 WebClient downloader; download URLs recovered.

The Hancitor/Emotet-era maldoc builds a PowerShell command at runtime from dozens of reversed-substring fragments (a per-sample-named Mid(StrReverse(s),a,b) decoder), runs it hidden via WScript.Shell.Run (vbHide), and IEX-decrypts a key-encrypted ConvertTo-SecureString blob (fixed 76492d11… magic + base64 of version|IV|ciphertext, AES-CBC with the inline -Key bytes). The decrypted plaintext is a System.Net.WebClient DownloadFile + Invoke-Item dropper that drops an EXE to %PUBLIC% and runs it, with an @-separated fallback-host list. The download URLs exist only after the reverse-substring assembly AND the AES decrypt, so no literal scan sees them. The analyzer emulates the decoder, decrypts the SecureString, and accepts only valid host URLs — self-validating, so a benign macro cannot false-positive.

Payload URL reassembled from cmd character-index dropper high OLE_VBA_CMD_CHARINDEX_DROPPER_URL

VBA reassembles a download command via a cmd.exe character-index loop and a multi-host PowerShell downloader; payload URLs recovered.

The macro hides its download command behind two reversible layers: the URL-bearing command is concatenated from string literals scattered across many helper functions (the visible code is padded with dead TypeName/arithmetic junk), then rebuilt by a cmd.exe character-index loop (set <var>=<charset> & for %p in (<indices>) do !<var>:~%p,1!) into a PowerShell downloader that tries an @-separated list of fallback hosts. The analyzer folds the VBA concatenation, decodes the character-index stage, and surfaces the recovered hosts as IOCs. Self-validating: only transforms yielding syntactically valid host URLs are reported.
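
The character-index stage is a simple lookup once the charset and index list have been extracted. The sketch assumes both are already parsed out of the command line and that all indices are non-negative (cmd.exe also accepts negative offsets, which are not handled here); `decode_char_index` is a hypothetical name.

```python
def decode_char_index(charset: str, indices):
    """Emulate !var:~i,1! for each index i: take one character of `charset` at offset i."""
    out = []
    for i in indices:
        if not 0 <= i < len(charset):
            raise ValueError(f"index {i} is outside the charset")
        out.append(charset[i])
    return "".join(out)
```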

PowerPoint OffArray-style record stub — CVE-2009-0556 related high PPT_CVE_2009_0556_RELATED

Small embedded PowerPoint stream has sparse OffArray-style records and no normal text atoms.

The PowerPoint Document stream is a small, sparse record stub containing records associated with OffArray-style exploit documents and lacking normal slide text/placeholder atoms. This is reported as related to CVE-2009-0556 until the malformed OffArray field is directly validated.

PowerPoint malformed picture-record payload — CVE-2006-0022 related high PPT_CVE_2006_0022_RELATED

PowerPoint Pictures stream and document shellcode match a CVE-2006-0022-adjacent shape.

This rule covers legacy PowerPoint samples that retain the large Pictures stream, embedded image-record material, MZ-like payload bytes, and compact PEB/API-resolver shellcode in the PowerPoint Document stream, but do not provide enough static evidence for the exact CVE-2006-0022 PE-in-malformed-CFB-chain rule. It is reported as related rather than exact to reduce false CVE attribution.

Remote template injection high OOXML_REMOTE_TEMPLATE

Document loads its template from a remote URL (attachedTemplate / template / frame).

Word can fetch and apply a remote template when the document is opened; macros in that template may execute depending on Office policy, trust state, and Protected View. This is a common remote-template-injection vector used by Hancitor, Emotet, and many phishing campaigns.

URL Moniker in RTF OLE object high RTF_URL_MONIKER_RELATED

RTF OLE object contains URL Moniker evidence without a confirmed remote target.

URL Moniker is the OLE2Link primitive abused by CVE-2017-0199, but the RTF scanner could not decode a remote http/https/ftp target from the embedded object. This is related attack-surface evidence, not proof of CVE-2017-0199 exploitation.

VBA copies the workbook into the Excel XLSTART startup folder high OLE_VBA_XLSTART_PERSISTENCE

VBA saves a copy of the workbook into Application.StartupPath (XLSTART) so it auto-loads on every Excel launch.

Anything in the Excel XLSTART startup folder loads automatically every time Excel starts. A macro that writes a copy of its own workbook into Application.StartupPath is establishing persistence — the hallmark of a resident Excel macro virus (Laroux/"StartUp" lineage) rather than a normal document. Detected by a SaveAs/SaveCopyAs whose target is Application.StartupPath.

VBA hooks the VBE-editor / macro-list keys to evade inspection high OLE_VBA_VBE_KEY_HOOK_EVASION

VBA reroutes Alt+F11 (Visual Basic editor) and/or Alt+F8 (macro list) through Application.OnKey to intercept attempts to view the macro code.

Application.OnKey can rebind keyboard shortcuts. A macro that captures Alt+F11 (the Visual Basic editor) or Alt+F8 (the macro dialog) is intercepting the exact keystrokes an analyst or user would press to inspect the code — frequently to hide or remove the viral module while it is resident. This anti-analysis behaviour is characteristic of resident Excel macro viruses, not legitimate documents.

VBA infects other workbooks via an OnSheetActivate copy hook high OLE_VBA_WORKBOOK_INFECTION_SPREADER

VBA installs an Application.OnSheetActivate handler that copies a macro-bearing sheet into the active workbook, infecting every workbook the user opens.

Resident Excel macro viruses spread by hooking Application.OnSheetActivate (a global event that fires whenever any sheet is activated) and, from that handler, copying the viral sheet into the front of the active workbook. Every workbook the victim opens is silently infected and carries the macro onward. The combination of the OnSheetActivate hook with a sheet-copy into another workbook is the replication stage of such a virus.

VBA p-code auto-exec with execution tokens high OLE_VBA_PCODE_AUTOEXEC_EXEC

Compiled VBA/cache stream pairs an auto-run token WITH a shell/download/object-execution token (the combination, not either alone).

Some malicious Office documents keep executable VBA in compiled p-code or cache streams while source extraction fails or returns empty output. This rule triggers on the COMBINATION of two tokens co-occurring in the same compiled stream: an auto-execution entry point (Auto_Open / Document_Open / Workbook_Open / Auto_Close) AND a shell/download/object-execution token (Shell, CreateObject, GetObject, PowerShell, cmd.exe, URLDownloadToFile, WinHttp, XMLHTTP, ADODB.Stream, ShellExecute, ExecuteExcel4Macro). Neither token on its own is flagged — for example CreateObject is benign by itself; it is the pairing with an auto-run entry point that is the macro-malware indicator, even when decoded source is unavailable.
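
The pairing logic can be sketched as two token searches over the same stream. Token lists are taken from the description above; the case-insensitive byte match and the function name are assumptions made for illustration.

```python
AUTO_EXEC = (b"Auto_Open", b"Document_Open", b"Workbook_Open", b"Auto_Close")
EXEC_TOKENS = (b"Shell", b"CreateObject", b"GetObject", b"PowerShell", b"cmd.exe",
               b"URLDownloadToFile", b"WinHttp", b"XMLHTTP", b"ADODB.Stream",
               b"ShellExecute", b"ExecuteExcel4Macro")

def autoexec_exec_pairing(stream: bytes):
    """Return (auto_tokens, exec_tokens) only when both kinds occur in the same stream."""
    lowered = stream.lower()
    auto = [t.decode() for t in AUTO_EXEC if t.lower() in lowered]
    execs = [t.decode() for t in EXEC_TOKENS if t.lower() in lowered]
    return (auto, execs) if auto and execs else None
```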

VBA project has compiled P-code but empty/missing module source high OLE_VBA_PCODE_NO_SOURCE

_VBA_PROJECT stream is substantive but every module-like sibling source stream is empty or absent.

The canonical 'VBA stomping' shape: the renderer executes the compiled P-code while scanners that read source see nothing. Used by post-2018 Office malware campaigns to evade source-based AV.

Word 6/95 legacy binary with executable payload high WORD6_LEGACY_BINARY_PAYLOAD

Legacy Word binary format carries executable payload markers.

The file starts with a Word 6/95 legacy binary magic and carries embedded executable payload markers. This is MS09-024/CVE-2009-1533-family attack surface evidence, but not an exact CVE attribution without validating the malformed converter record.

Word field-chain (SET/REF) co-located with DDE high OOXML_FIELD_SET_REF_CHAINING

Word SET/REF field variables assemble a hidden DDE command from fragments, so the literal command never appears in the document's raw bytes — a known field-chaining obfuscation.

Word's SET <name> field defines a variable and REF <name> dereferences it. When ≥2 closed SET/REF pairs co-occur with a DDE field, the typical purpose is to assemble the DDE command string at field-update time so the literal cmd/powershell tokens never appear in the raw document bytes. This is the documented field-chaining obfuscation technique (SensePost 2017, MITRE ATT&CK T1559.002) and has no documented benign use.
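
The co-occurrence test can be sketched over a list of field-instruction strings. The sketch assumes the instructions have already been extracted (one string per field, nested fields flattened) and reads "closed pair" as a SET name that is also the target of a REF; both are assumptions, as is the function name.

```python
import re

FIELD_RE = re.compile(r"\s*(SET|REF|DDEAUTO|DDE)\b\s*(\S+)?", re.IGNORECASE)

def has_set_ref_dde_chain(instructions, min_pairs=2):
    """True when a DDE field co-occurs with at least `min_pairs` SET names that are also REF'd."""
    defined, referenced, has_dde = set(), set(), False
    for text in instructions:
        match = FIELD_RE.match(text)
        if not match:
            continue
        keyword, name = match.group(1).upper(), match.group(2)
        if keyword in ("DDE", "DDEAUTO"):
            has_dde = True
        elif name and keyword == "SET":
            defined.add(name)
        elif name and keyword == "REF":
            referenced.add(name)
    return has_dde and len(defined & referenced) >= min_pairs
```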

XLM Auto_Open with dangerous formula APIs high OLE_XLM_DANGEROUS_FN

XLM auto-exec macro uses formula APIs that can run code or write files.

Excel 4.0 macro formulas such as RUN, CALL, EXEC, REGISTER, FOPEN, FWRITE, FORMULA, and HALT can execute programs, write payloads, or control macro flow without VBA. Paired with Auto_Open, this is a strong malware indicator.

XLM macro uses URL shortener high OLE_XLM_URL_SHORTENER

Excel 4.0 macro sheet contains a URL-shortener target.

URL shorteners are legitimate services, but they are high-signal inside Excel 4.0 macro formulas because XLM malware commonly uses shortened links to obscure the payload host and rotate infrastructure after delivery.

altChunk RTF/HTML injection wrapper, payload part missing high OOXML_ALTCHUNK_INJECTION_STUB

A <w:altChunk> wires an aFChunk relationship to an RTF/HTML part that is absent from the package.

The altChunk RTF/HTML-injection wrapper inlines and executes an embedded RTF/HTML part when Word opens the document. When the wiring and content-type are present but the target part is missing, the sample is a payload-stripped builder stub (or had its weaponized chunk removed). The injection wiring to an RTF/HTML type is itself the indicator, independent of whether the payload is currently attached.

cmd.exe reference in VBA high OLE_VBA_CMD

VBA macro references cmd.exe.

Invoking cmd.exe from a macro allows running arbitrary Windows commands, which is a core technique in macro-based malware.

BIFF BOF declares an unknown substream type medium OLE_BIFF_SUBSTREAM_TYPE_INVALID

An Excel sub-stream declares an unknown type (not workbook-globals, sheet, chart, macro, or VB-module), which older Excel may parse with the wrong record layout.

Older Excel falls through to a default handler that may parse the substream with the wrong record-table.

BIFF NAME record declares an overlong name medium OLE_BIFF_NAME_RECORD_OVERLONG

NAME record's character-count (cch) field exceeds the BIFF8 limit of 255.

Older Excel may copy `cch * sizeof(WCHAR)` bytes into a fixed-size buffer.

BIFF record graph is unusually large medium OLE_BIFF_RECORD_COUNT_EXCESSIVE

Workbook contains an unusually large number of BIFF records.

This is not a CVE-specific signature, but malformed or stress-test BIFF graphs have been used to trigger Excel parser bugs and third-party scanner failures. The scanner bounds its walk and reports excessive record graphs explicitly.

BIFF stream has unbalanced BOF/EOF substreams medium OLE_BIFF_BOF_EOF_UNBALANCED

An Excel sub-stream's begin (BOF) and end (EOF) markers are unbalanced — readers that ignore the mismatch can reach attacker-controlled parser state.

Substreams must be properly nested; readers that pop their substream stack regardless of balance reach attacker-controlled state.

CFB header with no readable streams medium OLE_PARSE_EMPTY_STREAMS

File has a valid OLE2/CFB header but olefile exposes zero directory streams.

A non-empty compound document whose directory cannot be read is anomalous. It occurs with truncated/corrupt files and, notably, with content shifted off byte boundaries (e.g. a whole-file nibble shift) to defeat olefile and byte-aligned signatures while the host Office application still recovers the embedded object — a known CVE-2017-11882 evasion.

EMF blob header is malformed or missing signature medium OFFICE_EMF_HEADER_INVALID

First record isn't EMR_HEADER (type 1) with the documented dSignature value.

An EMF blob's first record must be EMR_HEADER with declared size >= 88 and the dSignature field ' EMF' (0x464D4520, stored with a leading space byte) at offset 40. Readers that fail-open on header anomalies have driven multiple EMF parser CVEs.
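
The header checks can be sketched as below. The function name is hypothetical; the signature constant is the little-endian encoding of 0x464D4520, which is the four bytes 20 45 4D 46.

```python
import struct

EMR_HEADER = 1
EMF_SIGNATURE = b" EMF"  # 0x464D4520 little-endian; note the leading space

def emf_header_problems(blob: bytes):
    """Return a list of human-readable header anomalies (empty when the header looks sane)."""
    if len(blob) < 88:
        return ["blob is shorter than the 88-byte minimum header"]
    problems = []
    rec_type, size = struct.unpack_from("<II", blob, 0)
    if rec_type != EMR_HEADER:
        problems.append(f"first record type is {rec_type}, expected 1")
    if size < 88:
        problems.append(f"header declares size {size} < 88")
    if blob[40:44] != EMF_SIGNATURE:
        problems.append("dSignature at offset 40 is not ' EMF'")
    return problems
```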

EMF declares an implausibly large number of records medium OFFICE_EMF_HUGE_RECORD_COUNT

EMF header's nRecords field exceeds 100,000.

Real-world embedded EMF blobs contain at most a few thousand records. Hundreds of thousands is a hand-crafted shape used to stress the renderer's record dispatch table.

EMF record type outside spec range medium OFFICE_EMF_RECORD_TYPE_INVALID

EMR record's type field is outside both the standard range (1..123) and the vendor-extension range (0x4000 and above).

Renderers' dispatch tables typically fall through to a generic handler that may mis-parse the body when an unknown record type is reached.

Embedded OLE object medium OOXML_OLE_OBJECT

Document contains an embedded OLE object.

OLE objects embedded in OOXML documents can contain executables, scripts, or exploit payloads. They warrant inspection.

External workbook data link medium OOXML_EXTERNAL_REL_DATALINK

Workbook references another workbook via an externalLinkPath relationship (cell / dropdown / chart-source data link).

Unlike a remote-template or external-object relationship, an external workbook data link does not load or execute code — Excel prompts before updating links and only pulls cell values. Legitimate macro-enabled business forms routinely reference a source/template workbook this way, so on its own it is a weak signal (and the macro-capability tiering treats it as such). It is still surfaced because a UNC / bare-IP target can leak NetNTLM credentials via forced authentication if the user updates links.

Legacy WordBasic auto-exec macro marker medium OLE_LEGACY_WORDBASIC_AUTOEXEC

A legacy Word 6/95 WordBasic auto-execution marker such as AutoOpen was found.

Older Word documents can contain WordBasic macro execution surface that is not exposed as a modern VBA project. A bare AutoOpen marker is not enough to classify a downloader or CVE, but it tells analysts why old macro malware signatures may fire.

MTEF stream version byte outside valid set medium OLE_MTEF_HEADER_ANOMALY

MTEF version byte at the start of the Equation Native stream is not 2..6.

Encoders never produce version values outside the documented set (2 = Equation Editor 2.x, 3 = Equation Editor 3.0, 4 = MathType 3.x, 5 = MathType 4.x, 6 = MathType 5.x). Other values force readers into default handlers that may parse the body with the wrong record table.

Multiple OLE Package CLSIDs nested in one container medium OFFICE_PACKAGE_NESTED_PACKAGE

Inner payload of an OLE Package contains the OLE Package CLSID itself, or multiple Package CLSIDs in one container.

Russian-doll obfuscation has been used to bypass scanners that only inspect the outermost layer.

OLE stream allocation kind disagrees with size medium OLE_MINISTREAM_OUT_OF_RANGE

Stream below the MiniStream cutoff is allocated in the regular FAT, or vice versa.

The CFB spec requires the MiniFAT path for sub-cutoff streams; Office writers always honour it. A mismatch is the kind of structural quirk that distinguishes hand-edited containers from real-world output.

OLE stream size disagrees with its sector chain medium OLE_STREAM_LEN_MISMATCH

Direntry size field claims more bytes than the FAT-walked sector chain can carry.

Readers that allocate from the size field and copy from the chain read out-of-bounds bytes. A simple structural shape that has appeared in several Office parser memory-corruption bugs.

OOXML XML part contains a non-standard processing instruction medium OOXML_XML_PI_NONSTD

Processing instruction with a target outside the Office allowlist (xml, mso-*).

Some readers dispatch on PI targets; an unexpected target can force a different parsing mode than the renderer.

OOXML XML part has an oversize CDATA section medium OOXML_XML_CDATA_OVERSIZE

A single CDATA section exceeds 1 MB.

Real Office output uses CDATA only for short embedded fragments; large CDATA sections are a smuggling shape — they let an attacker hide a binary payload in plain XML where most XML-aware scanners ignore the contents.

OOXML XML part has excessive element nesting medium OOXML_XML_DEPTH_EXCESSIVE

Element nesting depth exceeds 256 levels.

Real Office output nests at most a few dozen levels deep. Pathological depth is the structural shape behind several billion-laughs / stack-exhaustion DoS bugs.
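
Nesting depth can be measured without building a tree by counting start and end events. The sketch uses the standard-library expat bindings; the function name is hypothetical, and the 256-level threshold would be applied by the caller.

```python
import xml.parsers.expat

def max_element_depth(xml_bytes: bytes) -> int:
    """Return the deepest element nesting level seen while streaming through the part."""
    depth = 0
    deepest = 0

    def on_start(name, attrs):
        nonlocal depth, deepest
        depth += 1
        deepest = max(deepest, depth)

    def on_end(name):
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_bytes, True)
    return deepest
```

A streaming parser is used so that the checker does not recurse once per nesting level.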

OOXML [Content_Types].xml has conflicting type declarations medium OOXML_CONTENT_TYPES_DUPLICATE

Two different Content-Types declared for the same extension or PartName.

Reader divergence — first vs. last definition determines how every part of that extension is parsed.
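
Conflicting declarations can be found by remembering the first content type seen per key. In this sketch, extensions and part names are compared case-insensitively, which is an assumption about how the rule normalises keys; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

def conflicting_content_types(xml_text: str):
    """Return keys ('ext' or 'part', name) that are declared with two different content types."""
    first_seen, conflicts = {}, []
    for element in ET.fromstring(xml_text):
        if element.tag == NS + "Default":
            key = ("ext", element.get("Extension", "").lower())
        elif element.tag == NS + "Override":
            key = ("part", element.get("PartName", "").lower())
        else:
            continue
        content_type = element.get("ContentType")
        if key in first_seen and first_seen[key] != content_type and key not in conflicts:
            conflicts.append(key)
        first_seen.setdefault(key, content_type)
    return conflicts
```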

OOXML hyperlink to URL shortener medium OOXML_URL_SHORTENER_HYPERLINK

Document contains a clickable hyperlink to a URL-shortener service.

URL shorteners are legitimate services, but in Office documents they hide the final landing page from static review. A shortener-only call to action is a common spam and phishing pattern, especially when the document has no other active content.

OOXML internal relationship target is missing medium OOXML_REL_DANGLING

Internal `<Relationship>` Target resolves to a ZIP entry that does not exist in the package.

Some Office paths fail open and fetch a remote alternative; the missing-target shape has been observed in droppers that race the renderer to deliver a payload to that path.

OOXML package contains parts unreachable from the root rels medium OOXML_REL_GRAPH_UNREACHABLE

>5% of parts are not reachable by walking from the root .rels through internal relationships.

Hidden-content shape: scanners that only follow the rel graph never see them, but Office may still dispatch to them via name or content-type.
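
Reachability can be sketched as a graph walk followed by a ratio test. The inputs are assumptions for illustration: `parts` is the set of part names under consideration (the caller decides whether `.rels` parts and `[Content_Types].xml` are counted), `rels` maps a part to its internal targets, and `roots` are the targets of the root `.rels`.

```python
def unreachable_parts(parts, rels, roots):
    """Return the sorted part names that no chain of internal relationships reaches."""
    seen = set()
    stack = list(roots)
    while stack:
        part = stack.pop()
        if part in seen or part not in parts:
            continue
        seen.add(part)
        stack.extend(rels.get(part, ()))
    return sorted(set(parts) - seen)

def exceeds_unreachable_threshold(parts, unreachable, threshold=0.05):
    """True when more than `threshold` of the parts are unreachable."""
    return bool(parts) and len(unreachable) / len(parts) > threshold
```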

OOXML relationship Id collides within one .rels medium OOXML_REL_DUPLICATE_ID

Two `<Relationship>` entries inside the same `.rels` part share an Id.

OPC requires Ids unique within a part. Readers diverge over which definition wins, letting an attacker bind one rel for static analysers and a different one for the live renderer.

Office document is password-encrypted medium OFFICE_ENCRYPTED_PACKAGE

OLE container holds an MS-OFFCRYPTO encrypted package (EncryptedPackage + EncryptionInfo streams).

Password-protected Office documents are stored as an OLE compound document containing an EncryptedPackage stream (AES-encrypted inner OOXML) and an EncryptionInfo stream (key-derivation metadata). Encryption defeats most content-based filtering at email gateways; threat actors use it to deliver macro/exploit-carrier spreadsheets and documents that would otherwise be detected by string-based scanners. Legitimate password-protected business documents do occur, but the bulk of phishing-delivered encrypted Office files are malicious — treat as a context-amplifier rather than a verdict on its own.

Office document signature is cryptographically invalid medium OFFICE_DOC_SIGNATURE_INVALID

A whole-document signature's CMS failed verification.

The document's Authenticode/PKCS#7 signature is present but does not cryptographically verify: either the signer's signature over the signed attributes is invalid, or the signed content digest does not match the signature's encapsulated content. The signature was tampered with, forged, or corrupted — a strong tell when paired with payload signals.

Ole10Native inner payload size exceeds remaining bytes medium OFFICE_PACKAGE_SIZE_MISMATCH

Inner `payloadSize` field declares more bytes than remain in the Ole10Native stream.

Readers that allocate from this field and copy without bounds checking expose an out-of-bounds-read primitive.

Ole10Native outer length disagrees with stream size medium OFFICE_PACKAGE_HEADER_ANOMALY

The leading 4-byte length field of an `\x01Ole10Native` stream does not equal the stream byte count.

Parser-divergence shape: readers that trust the field read different bytes than those that walk to end-of-stream.
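
The length comparison can be sketched in a few lines. The helper name is hypothetical, and the sketch assumes the leading field counts the bytes that follow it (so the 4-byte field itself is excluded); the analyzer may define the comparison differently.

```python
import struct

def ole10native_length_mismatch(stream):
    """Return (declared, actual) when the leading length field disagrees, else None."""
    if len(stream) < 4:
        return None
    (declared,) = struct.unpack_from("<I", stream, 0)
    # Assumption: the field counts the bytes after it, not the field itself.
    actual = len(stream) - 4
    return None if declared == actual else (declared, actual)

consistent = struct.pack("<I", 5) + b"hello"
inflated = struct.pack("<I", 500) + b"hello"
print(ole10native_length_mismatch(consistent))
print(ole10native_length_mismatch(inflated))
```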

Remote image (web beacon / tracking pixel) medium OOXML_IMAGE_BEACON

Document contains an external image relationship targeting an http(s):// URL.

An image relationship with an external http:// or https:// target is fetched by Office when external content is allowed. This can reveal the victim's IP address and timestamp to the attacker's server (tracking beacon). In some Windows Integrated Authentication configurations, the request may also expose NTLM authentication material, but plain HTTP image fetches are not a guaranteed NTLM-leak path. Caveat: documents exported from web-based editors or CMS platforms may include externally-hosted images; verify whether the target URL is a known-legitimate host before escalating.
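
As an illustration of the relationship shape this rule describes, the sketch below lists external http(s) image targets in one `.rels` part. The function name and the `tracker.example` URL are made up for the example; only the relationship namespace and the image relationship type URI come from the OOXML schemas.

```python
import xml.etree.ElementTree as ET

REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"
IMAGE_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"

def remote_image_targets(rels_xml):
    """Return targets of image relationships that are external and use http(s)."""
    hits = []
    for rel in ET.fromstring(rels_xml).iter(REL_TAG):
        target = rel.get("Target", "")
        if (rel.get("Type") == IMAGE_TYPE
                and rel.get("TargetMode") == "External"
                and target.lower().startswith(("http://", "https://"))):
            hits.append(target)
    return hits

sample = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    f'<Relationship Id="rId1" Type="{IMAGE_TYPE}" Target="media/image1.png"/>'
    f'<Relationship Id="rId2" Type="{IMAGE_TYPE}" TargetMode="External" '
    'Target="https://tracker.example/pixel.gif"/>'  # hypothetical beacon URL
    "</Relationships>"
)
print(remote_image_targets(sample))
```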

Spreadsheet DDE link medium OOXML_SPREADSHEET_DDE_LINK

An Excel externalLinks/ddeLink entry was found.

Excel workbooks can store DDE links in xl/externalLinks/externalLink*.xml rather than in Word field instructions. Benign DDE data links exist, but attackers can set ddeService and ddeTopic to execute a shell command when the link is updated.
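
A rough sketch of pulling the `ddeService` and `ddeTopic` attributes out of an externalLink part follows. The function name and sample XML are illustrative assumptions; the element is matched by local name so the sketch does not depend on a particular namespace prefix.

```python
import xml.etree.ElementTree as ET

def dde_links(external_link_xml):
    """Return (ddeService, ddeTopic) for every ddeLink element in the part."""
    links = []
    for el in ET.fromstring(external_link_xml).iter():
        if el.tag.rsplit("}", 1)[-1] == "ddeLink":  # ignore the namespace part of the tag
            links.append((el.get("ddeService"), el.get("ddeTopic")))
    return links

sample = (
    '<externalLink xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<ddeLink ddeService="cmd" ddeTopic="/c calc.exe"><ddeItems/></ddeLink>'
    "</externalLink>"
)
print(dde_links(sample))
```

An analyst would then review the returned service and topic strings, since benign data links and command-launching links share the same element.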

Standalone OOXML relationship XML medium OOXML_STANDALONE_RELS

File is raw OOXML .rels relationship XML rather than a valid OOXML ZIP package.

OOXML relationship files define external content that Office may load from a package. A standalone .rels file with an Office extension is malformed, but when it declares a remote template relationship it is still a strong indicator of remote-template-injection tooling or a stripped payload.

VBA __SRP_ cache stream exceeds 8 MB medium OLE_VBA_PERFORMANCE_CACHE_OVERSIZE

A VBA performance-cache (__SRP_*) stream exceeds 8 MB.

Real-world cached compiled modules are tiny; oversize caches are an allocator-stress shape and have been observed in some Office DoS campaigns.

VBA digital signature is cryptographically invalid medium OLE_VBA_SIGNATURE_INVALID

The VBA project's Authenticode signature failed verification.

A signature blob is present but does not cryptographically verify: either the signer's signature over the signed attributes is invalid, or the signed content digest does not match the signature's encapsulated content. This indicates the signature was tampered with, forged, or corrupted — a strong tell when paired with macro payload signals.

VBA macros present medium OLE_VBA_MACROS

Document contains VBA macro code.

VBA macros can automate tasks but are also the most common delivery mechanism for Office-based malware. Macros can download and execute arbitrary code when enabled by the user.

VBA project in OOXML medium OOXML_VBA

Document contains vbaProject.bin — VBA macros are present.

Same risk as OLE macros. The document can run VBA code when macros are enabled. Check the accompanying VBA keyword findings for details.

AutoOpen macro low OLE_VBA_AUTOOPEN

Macro with AutoOpen trigger found.

AutoOpen runs automatically when a Word document is opened. This is a capability/context marker — it tells you the macro runs without further interaction — not evidence of malice on its own; legitimate macro documents use it routinely. Rated low: a malicious document still convicts via its actual payload signal (Shell/download/LOLBin), so the auto-exec trigger is a corroborating context cue, not a standalone flag.

Auto_Close macro low OLE_VBA_AUTOCLOSE

Macro with Auto_Close trigger found.

Auto_Close runs automatically when an Office document closes. Close-time execution can delay activity past a sandbox, but the trigger itself is a capability/context marker, not standalone evidence of malice; the payload signal is what convicts.

Auto_Open macro low OLE_VBA_AUTO

Macro with Auto_Open trigger found.

Auto_Open is the legacy Excel auto-execute macro. Like AutoOpen in Word, it runs code automatically when the file is opened — a capability/context marker, not standalone evidence of malice.

Call-to-action shape / download button low OOXML_DOWNLOAD_SHAPE

Document drawing contains a call-to-action phrase in a shape or text box.

Shapes with phrases like 'Click Here to Enable Content', 'Download Now', or 'Open Document' are the Office equivalent of the PDF fake-button overlay, tricking users into enabling macros or following a malicious link. Caveat: these phrases appear legitimately in user manuals, training materials, onboarding documents, and any instructional content that guides users through a process. This finding is low-signal; elevate concern only when combined with macros, external relationships, or hidden sheets.

DDE field low OOXML_DDE

A DDE field instruction was found in the document XML. The command does not reference a known-dangerous executable.

DDE (Dynamic Data Exchange) fields link a document to an external data source or application. Benign uses include pulling live data from Excel spreadsheets or databases. However, DDE can be abused to execute arbitrary commands. This particular field does not appear to launch a dangerous program, but you should review the detail to confirm.

Document_Open macro low OLE_VBA_DOCOPEN

Macro with Document_Open event handler found.

Document_Open is the modern equivalent of AutoOpen — it fires when the document opens. A capability/context marker, not standalone evidence of malice; the payload signal is what convicts.

Environ() call low OLE_VBA_ENVIRON

VBA macro uses Environ() to access environment variables.

Environ() is used widely in legitimate macros for locale paths and user temp directories. It does appear in droppers (to find %TEMP% / %APPDATA% for staging payloads) but on its own is too noisy to be more than LOW.

External hyperlinks (summary) low OOXML_EXTERNAL_HYPERLINKS

Document contains one or more external hyperlinks.

Word stores every clickable URL as an external relationship. This rule summarises the count and surfaces the first target so the analyst can review without each hyperlink being scored separately.

Hidden worksheet low OOXML_HIDDEN_SHEET

Excel workbook contains hidden or veryHidden worksheets.

Hidden and 'veryHidden' Excel sheets are commonly used to conceal macro scaffolding, staging data, or intermediate payload construction from the user. Caveat: hidden sheets are routine in legitimate professional Excel workbooks — financial models hide calculation sheets and lookup tables, enterprise templates hide configuration sheets, and many vendor-supplied spreadsheets protect their formulas this way. This finding is low-signal on its own; treat as significant only when combined with VBA macros or external relationships.

Malformed OOXML package with recoverable local headers low OOXML_MALFORMED_ZIP_LOCAL_HEADERS

OOXML ZIP central directory is invalid, but local headers expose Office parts.

Office and tolerant ZIP readers may recover document parts from local file headers even when the central directory is malformed or missing. This is a parser-divergence shape; it is low-signal by itself, but important when the recoverable local parts include VBA projects, ActiveX controls, or XLM macro sheets.
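
To make the recovery path concrete, the sketch below lists entry names straight from local file headers after the central directory has been cut off. It is a simplified illustration (the helper name is hypothetical, and it ignores data descriptors and ZIP64), not the analyzer's recovery logic.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"

def local_header_names(data):
    """Walk local file headers and return the entry names they declare."""
    names, pos = [], data.find(LOCAL_SIG)
    while pos != -1 and pos + 30 <= len(data):
        # The fixed local header is 30 bytes; name and extra lengths sit at offset 26.
        name_len, extra_len = struct.unpack_from("<HH", data, pos + 26)
        name = data[pos + 30 : pos + 30 + name_len]
        names.append(name.decode("utf-8", "replace"))
        pos = data.find(LOCAL_SIG, pos + 30 + name_len + extra_len)
    return names

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<doc/>")
    zf.writestr("word/vbaProject.bin", "stub")
whole = buf.getvalue()
damaged = whole[: whole.index(b"PK\x01\x02")]  # drop the central directory and end record

print(zipfile.is_zipfile(io.BytesIO(damaged)))
print(local_header_names(damaged))
```

The strict reader rejects the damaged bytes while the local-header walk still surfaces `word/vbaProject.bin`, which is the case the rule calls out as important.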

OLE dirents share an unrecognised CLSID low OLE_DIRENT_CLSID_DUPLICATE

Multiple direntries share a non-null, unrecognised CLSID (>= 4 occurrences).

Office host CLSIDs (Excel, Word, PowerPoint, Equation Editor, MathType, Visio) legitimately repeat. Heavy duplication of a less-common CLSID is the shape used by containers trying to hide extra invocation points of the same parser surface from scanners that only inspect the first match.
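
The counting step might look like the sketch below. The allow-list here is a one-entry placeholder (the Word 97-2003 document CLSID), the function name is hypothetical, and the threshold of 4 mirrors the rule summary; the analyzer's real list of recognised CLSIDs is not reproduced.

```python
from collections import Counter

NULL_CLSID = "00000000-0000-0000-0000-000000000000"
# Placeholder allow-list for the example; a real one would cover the Office hosts named above.
KNOWN_CLSIDS = {"00020906-0000-0000-C000-000000000046"}  # Word 97-2003 document

def repeated_unknown_clsids(dirent_clsids, threshold=4):
    """Return {clsid: count} for non-null, unrecognised CLSIDs seen at least `threshold` times."""
    counts = Counter(c.upper() for c in dirent_clsids)
    return {c: n for c, n in counts.items()
            if n >= threshold and c != NULL_CLSID and c not in KNOWN_CLSIDS}

observed = (
    ["11111111-2222-3333-4444-555555555555"] * 4   # made-up CLSID repeated four times
    + ["00020906-0000-0000-c000-000000000046"] * 6  # recognised host, ignored
    + [NULL_CLSID] * 10                             # null CLSIDs, ignored
)
print(repeated_unknown_clsids(observed))
```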

Ole10Native tempPath leaks an AppData or Temp path low OFFICE_PACKAGE_TEMP_PATH_LEAK

Package's tempPath references an `AppData\` or `Temp\` folder of the author's machine.

Weak signal on its own — legitimate drag-and-drop attachments produce this — but a useful axis contributor when paired with other anomalies.

VBA project signed with a self-signed certificate low OLE_VBA_SIGNATURE_SELF_SIGNED

The VBA project signing certificate is self-signed (issuer == subject).

The macro project is signed, but with a self-signed certificate that no certificate authority vouches for. Self-signed VBA signing is the common trick to make a project appear signed/trusted without a real publisher identity. A context cue, not a conviction on its own.

Workbook_Open macro low OLE_VBA_WBOPEN

Macro with Workbook_Open event handler found.

Workbook_Open runs automatically when an Excel workbook opens. It is near-universal in legitimate macro-enabled business spreadsheets, so it is a capability/context marker rather than evidence of malice. Rated low: a malicious workbook still convicts via its actual payload (Shell/download/LOLBin); the auto-exec trigger alone does not.

Office document is digitally signed info OFFICE_DOC_SIGNED

The document/package carries a whole-document digital signature.

Office can seal a whole document — distinct from a VBA-project signature — via an OOXML XML-DSig part (_xmlsignatures/sig*.xml) or a legacy-OLE \x05DigitalSignature PKCS#7 stream. The signer certificate, issuer and validity are shown for context. Presence of a signature is NOT by itself a benign indicator and does not affect the risk score. Legacy-OLE signatures are cryptographically self-consistency checked; OOXML XML-DSig is surfaced for identity but its signature value is not recomputed (no XML canonicalization in the sandbox), so it shows as 'unverified' rather than risking a false 'invalid'.

Office document signed with a self-signed certificate info OFFICE_DOC_SIGNATURE_SELF_SIGNED

The document signing certificate is self-signed (issuer == subject).

The document is sealed with a self-signed certificate that no certificate authority vouches for. Unlike self-signed VBA macro signing (a real attacker trick to clear the macro trust bar), self-signed document sealing is a mainstream-legitimate integrity pattern, so this is informational context (the signature box's 'Self-signed' status) and does not affect the risk score; tampering surfaces as OFFICE_DOC_SIGNATURE_INVALID instead.

Unsupported Office format for VBA extraction info OFFICE_FORMAT_UNSUPPORTED

olevba could not extract VBA macros from the document; VBA source extraction was skipped.

olevba (and its olefile dependency) ran into a parse failure on this specific file — common causes include legacy formats (Excel 4/5 BIFF), encrypted streams, hand-crafted/malformed OLE compound storage, or anti-analysis structures that trip the parser. Format-agnostic byte-level scans still ran, so the verdict is real and re-scanning the same bytes will yield the same outcome — unlike SCAN_INCOMPLETE, this finding does not flag the result as needing retry.

VBA project is digitally signed info OLE_VBA_SIGNED

The VBA macro project carries an Authenticode digital signature.

Office can digitally sign a VBA project so it runs under a 'signed macros only' trust setting. The signer certificate, issuer and the signature's cryptographic validity are shown for context. Presence of a signature is NOT by itself a benign indicator — VBA stomping and signature-stripping mean a signed project can still be hostile — so this finding is informational and does not affect the risk score. It distinguishes the VBA-project signature from a whole-document signature, which is a separate feature.

RTF 16

Equation Editor CLSID critical RTF_EQUATION_EDITOR

Equation Editor OLE CLSID (0002CE02) found in RTF hex data.

This CLSID instantiates the vulnerable Equation Editor component. CVE-2017-11882, CVE-2018-0802, and CVE-2018-0798 are among the most exploited Office vulnerabilities in history, and RTF is the most common delivery format.
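
For orientation, here is one way the CLSID could be located in hex-encoded object data. The full CLSID `0002CE02-0000-0000-C000-000000000046` is the Equation Editor 3.0 class; the sketch assumes the scanner looks for its little-endian on-disk byte order after removing whitespace, which is a guess about the matching strategy rather than a description of it.

```python
import re
import uuid

EQNEDT_CLSID = uuid.UUID("0002CE02-0000-0000-C000-000000000046")
# On-disk CLSIDs store the first three fields little-endian.
NEEDLE = EQNEDT_CLSID.bytes_le.hex()

def has_equation_editor_clsid(rtf_text):
    """True if the CLSID's hex form appears once whitespace is stripped from the text."""
    compact = re.sub(r"\s+", "", rtf_text).lower()
    return NEEDLE in compact

# Hex blob split across lines, as RTF writers and obfuscators commonly do.
sample = r"{\rtf1{\object\objemb{\*\objdata 0105000002000000" + "\n02CE0200000000\n00C000000000000046}}}"
print(NEEDLE)
print(has_equation_editor_clsid(sample))
```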

Equation Editor object class critical RTF_OBJCLASS_EQUATION

Object class name references Equation Editor.

The explicit mention of the Equation Editor class confirms the document is attempting to instantiate the vulnerable component.

PE header in hex data critical RTF_MZ_HEX

MZ header (hex '4D5A') found in RTF hex data.

The presence of a Windows PE executable header (MZ) inside hex-encoded RTF data means an executable is embedded in the document. This is consistent with an embedded dropper payload.

Automatically linked OLE object high RTF_OBJAUTLINK

RTF contains \objautlink — an automatically linked OLE object marker.

The \objautlink control word marks an OLE object as an automatic link. In malicious RTFs this is a useful activation/update surface, especially when paired with \objdata and \objupdate. Legitimate use is uncommon in modern document exchange.

INCLUDETEXT/INCLUDEPICTURE remote URL high RTF_INCLUDE_REMOTE

RTF document uses INCLUDETEXT or INCLUDEPICTURE with an http:// or https:// URL.

RTF \fldinst blocks with INCLUDETEXT or INCLUDEPICTURE and a remote (http:// or https://) target can cause Word to fetch the remote resource when the document is opened, depending on Office version and external-content settings. This is a remote template injection vector: the attacker controls what content is fetched, and can steal NTLM credentials via a UNC redirect or deliver a second-stage payload. Caveat: legitimate RTF documents very rarely include remote http:// field references; this construction has almost no benign use in consumer-produced documents, making the false-positive rate low.

Large hex data blocks high RTF_EXCESSIVE_HEX

RTF contains large blocks of hex-encoded data.

Legitimate RTF files rarely contain very large hex blocks. Excessive hex data usually hides an embedded payload (executable, shellcode, or exploit object) encoded in hexadecimal.

OLE Package CLSID high RTF_PACKAGE_OLE

OLE Package CLSID pattern found alongside object data.

The Package CLSID combined with embedded OLE data suggests the document wraps an arbitrary file (potentially an executable) inside an OLE Package object.

Obfuscated control words high RTF_OBFUSCATION

Many RTF control words appear fragmented or obfuscated.

RTF parsers are tolerant of whitespace in control words. Malware authors insert spaces to break up keywords (like 'o b j d a t a') so that simple string scanners miss them.
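
A simple way to spot the spaced-out form is to build, for each keyword, a pattern that allows whitespace between letters and then keep only matches that actually contain whitespace. The keyword list and function name below are illustrative; the rule's real keyword set and threshold ("many") are not specified here.

```python
import re

KEYWORDS = ("objdata", "objupdate", "objemb")  # illustrative subset

def fragmented_keywords(rtf_text):
    """Return keywords that appear with whitespace inserted between their letters."""
    found = []
    for word in KEYWORDS:
        spaced = re.compile(r"\s*".join(map(re.escape, word)), re.IGNORECASE)
        for m in spaced.finditer(rtf_text):
            if re.search(r"\s", m.group(0)):  # skip the ordinary, unbroken spelling
                found.append(word)
                break
    return found

print(fragmented_keywords("{\\object\\o b j d a t a 0105}"))
print(fragmented_keywords("{\\object\\objdata 0105}"))
```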

PHP IRC bot source embedded in RTF high RTF_PHP_IRC_BOT_SOURCE

RTF text contains PHP IRC bot source code.

This rule looks for a compound source-code pattern: PHP markers, socket connection calls, IRC protocol commands such as JOIN or PRIVMSG, and bot-control strings. The RTF is not necessarily an Office exploit, but it is carrying operational malware source code and should not be treated as a clean document.

Package object class high RTF_OBJCLASS_PACKAGE

OLE Package object found in RTF.

OLE Package objects can wrap arbitrary files (including executables) inside a document. The packaged file can be extracted and run when the user double-clicks the object.

Remote template injection (\*\template to remote URL) high RTF_REMOTE_TEMPLATE

The RTF's \*\template destination is a remote URL/UNC path that Word fetches and loads on open.

RTF template injection (MITRE T1221): the document attaches a remote template via {\*\template <url>}. On open, Word retrieves and loads it, which can deliver a macro/exploit template, a scriptlet/HTA (.html/.hta target), or leak NTLM credentials over a UNC path. Benign RTFs attach only a local template, so a remote target is the injection itself. Obfuscated targets (\uN/\'xx escapes), raw-IP or dynamic-DNS hosts, and active/script extensions escalate it to critical.

\objupdate forces OLE activation high RTF_OBJUPDATE

RTF contains \objupdate — forces automatic OLE object activation.

The \objupdate control word forces Word to immediately instantiate the embedded OLE object when the document is opened, without requiring the user to double-click or interact with the object. This is a near-universal indicator of Equation Editor exploit documents — it ensures the vulnerable EQNEDT32.EXE process is spawned automatically. Legitimate use of \objupdate is extremely rare.

\pFragments control word high RTF_PFRAGMENTS_RELATED

RTF contains \pFragments without the oversized value needed for CVE-2010-3333.

The \pFragments control word is the parser surface abused by CVE-2010-3333, but this finding does not include the oversized numeric argument used by the public stack-overflow trigger.

Embedded OLE object medium RTF_OBJEMB

RTF contains \objemb — an embedded OLE object marker.

The \objemb control word marks an embedded OLE object. Combined with \objdata, it indicates an object is fully embedded in the RTF.

OLE object data medium RTF_OBJDATA

RTF contains \objdata sections with embedded OLE objects.

RTF documents can embed OLE objects via \objdata sections. These objects can carry executables, scripts, or trigger exploits in OLE-handling code.

OlePres presentation stream in RTF OLE object medium RTF_OLEPRES_STREAM

RTF embedded OLE object contains an OlePres presentation stream marker.

OlePres is an OLE presentation stream name. It is relevant to the CVE-2025-21298 attack surface, but the stream name alone is common in embedded OLE objects and does not prove malformed OlePres internals.

HWP 14

Embedded PE executable critical HWP_EMBEDDED_PE

PE executable found inside HWP document.

A Windows executable hidden inside an HWP document is a clear indicator of malware. The document is carrying an executable payload.

PostScript exec command critical HWP_PS_EXEC

PostScript 'exec' operator found in embedded PostScript.

The PostScript 'exec' operator executes a string as PostScript code. In malicious PostScript it can run dynamically decoded payloads.

PostScript runtime hex-to-code execution critical HWP_PS_CVX_EXEC

PostScript hex string converted to executable code and executed at runtime.

The pattern '<HEX...> cvx exec' decodes a PostScript token from a hex literal and executes it. APT-grade HWP exploits use this to stage payloads in pieces: every fragment is reconstructed at run time, so static scanners that only look for plain 'exec' tokens never see the dangerous operator string in the file. A handful of these is a strong indicator that the embedded EPS is a weaponised exploit, not a benign vector graphic.
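
The '<HEX...> cvx exec' shape lends itself to a regular-expression sketch that also decodes each hex literal so the hidden token becomes visible. The pattern and function name are assumptions for illustration; the analyzer's actual matcher may be stricter or looser.

```python
import re

# Hex string literal, optional whitespace inside, followed by "cvx exec".
CVX_EXEC = re.compile(rb"<\s*((?:[0-9A-Fa-f]\s*)+)>\s*cvx\s+exec\b")

def cvx_exec_fragments(ps_bytes):
    """Return the decoded bytes of every hex literal that is fed to 'cvx exec'."""
    out = []
    for m in CVX_EXEC.finditer(ps_bytes):
        hex_digits = re.sub(rb"\s+", b"", m.group(1))
        if len(hex_digits) % 2:
            hex_digits += b"0"  # PostScript treats a missing final digit as 0
        out.append(bytes.fromhex(hex_digits.decode("ascii")))
    return out

print(cvx_exec_fragments(b"<6578 6563> cvx exec"))
print(cvx_exec_fragments(b"(plain text) show"))
```

In the first call the hex literal decodes to the bytes `exec`, showing how the operator name can be absent from the raw file yet present after decoding.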

PostScript system call critical HWP_PS_SYSTEM

PostScript 'system' operator found.

Some PostScript interpreters expose a 'system' operator or equivalent extension that can run operating-system commands. Its presence in embedded PostScript is high-risk and unusual in ordinary documents.

Shell command reference critical HWP_SHELL_CMD

Reference to a shell command (cmd.exe, powershell, etc.) in HWP.

Direct references to system shells inside a document can indicate an attempt to execute commands on the recipient's system.

Embedded PostScript / EPS high HWP_POSTSCRIPT

HWP document contains embedded PostScript or EPS content.

Embedded PostScript/EPS is a common exploit surface in targeted HWP campaigns. PostScript is a full programming language; file and command execution depends on the interpreter and sandbox configuration.

Hex-encoded data in PostScript high HWP_PS_HEXCODE

Many hex escape sequences found in PostScript content.

A high number of hex escape sequences (\xNN) in PostScript suggests the presence of encoded shellcode or binary payloads.

JavaScript in HWP high HWP_JAVASCRIPT

JavaScript references found in HWP document.

JavaScript in an HWP document is unusual and potentially dangerous. It may indicate an attempt to exploit the document viewer.

PostScript file operation high HWP_PS_FILE

PostScript file operation (file/run/deletefile) found.

File operations in PostScript allow reading, writing, or deleting files on the system — capabilities that exploits use to drop payloads.

External URL in HWP medium HWP_URL

External URL(s) found in HWP document content.

URLs in HWP content may be used to download second-stage payloads or connect to command-and-control servers.

PostScript decode filter medium HWP_PS_FILTER

PostScript decode filter (SubFileDecode, ASCIIHexDecode, etc.) found.

Decode filters in PostScript can be used to hide encoded payloads that are decoded at runtime.

Scripts storage medium HWP_SCRIPTS_STREAM

OLE-based HWP contains a Scripts storage section.

A Scripts section in an HWP OLE container may contain executable code that runs when the document is opened.

BinData stream low HWP_BINDATA

OLE-based HWP contains a BinData storage section.

BinData stores binary objects (images, OLE objects). While normal, it can also contain malicious embedded objects.

Compressed sections info HWP_COMPRESSED

Zlib-compressed sections were found and decompressed for analysis.

HWP 5.0+ files store sections compressed with zlib. Decompressing them allows scanning for embedded threats. This is informational.
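
A decompression step along these lines might look like the sketch below. It assumes the section streams are raw DEFLATE data without a zlib header (hence `wbits=-15`), which matches common open-source HWP parsers but is not stated in this reference; the function name is hypothetical.

```python
import zlib

def inflate_section(raw):
    """Decompress one section stream, or return None if it is not valid DEFLATE data."""
    try:
        # Assumption: headerless DEFLATE, so a negative window size is passed.
        return zlib.decompress(raw, -15)
    except zlib.error:
        return None

# Round-trip demo with a headerless DEFLATE stream.
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
packed = compressor.compress(b"section text " * 8) + compressor.flush()
print(inflate_section(packed))
print(inflate_section(b"\xff\xff\xff"))
```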

CVE 88

Acrobat prototype-pollution PoC/exploit pattern — CVE-2026-34621 related critical CVE likely CVE_2026_3461_RELATED

PDF JavaScript combines Acrobat prototype pollution targeting privileged state with an execution or sensitive file-read primitive.

This rule covers CVE-2026-34621-themed PoC/exploit variants that do not carry the exact internal Adobe API markers used by the stricter CVE_2026_34621 rule. It requires a high-signal combination: prototype pollution of privileged state such as __trusted, constructor.prototype.bypass, or __proto__.privileged, plus concrete impact behavior such as app.launchURL(file/cmd/osascript), ActiveXObject('WScript.Shell'), or util.readFileIntoStream(cDIPath). That co-occurrence is not expected in benign PDF forms and keeps the match cheap by using bounded literal and regex checks over raw, decompressed, and bounded base64-decoded JavaScript.

Adobe Acrobat malformed TrueType bitmap font — CVE-2023-26369 critical CVE exact CVE_2023_26369

Embedded TrueType font has malformed EBLC/EBDT bitmap-glyph placement plus the EBSC max-range table trap.

CVE-2023-26369 is an out-of-bounds write in Adobe Acrobat Reader's libCoolType sfac_GetSbitBitmap path. Project Zero documented a font whose EBLC/EBDT compound bitmap glyph metadata places a component beyond the bitmap buffer computed from the glyph metrics, and whose EBSC table record declares offset and length as 0xffffffff to avoid loading in many non-Adobe font parsers. This rule parses the embedded sfnt directory and bitmap tables and fires only when both structural conditions are present.

Adobe Acrobat/Reader privileged API chain — CVE-2026-34621 critical CVE exact CVE_2026_34621

PDF JavaScript uses Acrobat internal share/login APIs, swConn prototype manipulation, and privileged RSS/file-read APIs.

CVE-2026-34621 is an Adobe Acrobat/Reader JavaScript exploit chain reported in actively exploited malicious PDFs. Public analyses describe abuse of internal APIs such as ANFancyAlertImpl, ANShareFile, and SilentDocCenterLogin, prototype/getter manipulation around swConn, and privileged APIs such as RSS.addFeed/removeFeed or util.readFileIntoStream to fingerprint the victim and retrieve staged JavaScript. The scanner matches that combined marker set, including when the JavaScript is hidden inside a long base64 AcroForm value.

Adobe Flash authplay SWF exploit in PDF — CVE-2010-1297 critical CVE likely CVE_2010_1297_FLASH_RICHMEDIA

PDF combines RichMedia Flash activation, a crafted SWF with authplay-era markers, and PDF-side shellcode heap-spray staging.

CVE-2010-1297 is an Adobe Flash/authplay.dll memory-corruption vulnerability exploited in the wild through malicious PDFs containing crafted SWF content plus encoded JavaScript heap-spray stages. This rule requires RichMedia Flash activation, an embedded SWF with ActionScript prototype/AVM-era markers or AES-PHP/authplay variant markers seen in the 2010 exploit family, and PDF-side shellcode staging. Ordinary RichMedia Flash documents and SWFs without the heap-spray stage are not attributed to the CVE.

Adobe Flash/authplay SWF exploit in PDF — CVE-2009-1862 critical CVE likely CVE_2009_1862_FLASH_RICHMEDIA

PDF combines RichMedia Flash activation with a crafted Run_Sploit/HeapSpray SWF or compact AS3 SWF plus PDF-side encoded shellcode.

CVE-2009-1862 is the July 2009 Adobe Flash Player and Adobe Reader/Acrobat authplay.dll vulnerability exploited through crafted SWF content, including SWFs embedded in PDFs. This rule requires RichMedia Flash activation plus an embedded SWF with Run_Sploit, HeapSpray, ByteArray, and spray-marker content, or the compact DoABC/SymbolClass AS3 shape paired with PDF-side encoded shellcode staging, associated with the 2009 authplay exploit shape.

Adobe Reader CoolType SING font exploit — CVE-2010-2883 critical CVE likely CVE_2010_2883

PDF embeds a TrueType/OpenType SING font table together with JavaScript heap-spray shellcode.

CVE-2010-2883 is the Adobe Reader/Acrobat CoolType SING table stack overflow exploited in weaponised PDFs. The rule requires an actual SING table in a decoded sfnt font stream plus heap-spray JavaScript, which keeps it narrower than a plain string match for the word SING.

Adobe Reader Document ID JavaScript overflow — CVE-2018-4901 critical CVE likely CVE_2018_4901

PDF has an overlong trailer /ID and JavaScript dereferences this.docID.

CVE-2018-4901 is an Adobe Acrobat Reader DC stack buffer overflow in Document ID handling. The Talos-described trigger requires an overly large trailer /ID value and JavaScript access to `this.docID`, which causes Reader to re-encode the ID through the vulnerable EScript path.

Adobe Reader JPEG2000 JPX command payload exploit — CVE-2018-4990 critical CVE likely CVE_2018_4990_JPX_EMBEDDED_CMD

PDF embeds a malformed JPX/JPEG2000 image whose JP2 header area contains a command-execution/download payload.

CVE-2018-4990 is an Adobe Acrobat/Reader JPEG2000 parser memory-corruption vulnerability. This rule requires a /JPXDecode stream, malformed JP2 box structure, and a command/download payload embedded inside the JPEG2000 stream body. Plain JPEG2000 images, and malformed JPX images without an execution/download payload, remain covered only by related JPX anomaly rules.

Adobe Reader Launch action command execution — CVE-2010-1240 critical CVE likely CVE_2010_1240

PDF uses /Launch with shell parameters and an embedded/exported payload chain.

CVE-2010-1240 is the Adobe Reader/Acrobat Launch File warning dialog abuse used by malicious PDFs to drop or rename an embedded payload and then start it through a /Launch /Win action. The detector requires a Launch action with command-shell parameters plus embedded-file/export evidence such as exportDataObject/nLaunch:0 or an EmbeddedFiles/EF payload chain.

Adobe Reader LibTIFF XFA image exploit — CVE-2010-0188 critical CVE likely CVE_2010_0188

PDF contains XFA JavaScript that heap-sprays shellcode, builds a TIFF image payload, and assigns it to an XFA image rawValue.

CVE-2010-0188 was widely exploited through XFA JavaScript that generated a malformed TIFF image and assigned it to an image field, causing Adobe Reader/Acrobat to parse the crafted TIFF through the vulnerable LibTIFF path. The rule decodes the common long-hex XFA wrapper and requires the TIFF payload marker, rawValue trigger, and heap-spray/version-selection logic, including split-string character-table wrappers seen in older kits. It is a high-confidence exploit-template match rather than a full TIFF structural validator.

Adobe Reader ToolButton UAF — CVE-2014-0496 critical CVE exact CVE_2014_0496

PDF JavaScript combines app.addToolButton(), app.removeToolButton(), heap-spray arrays, and unescape('%u...') shellcode markers.

CVE-2014-0496 is an Adobe Reader/Acrobat use-after-free vulnerability in affected 10.x and 11.x releases. Public exploit examples use the ToolButton JavaScript API pattern: add a toolbar button, trigger code through cEnable, remove the button, and heap-spray shellcode/ROP data with unescape('%u...') strings and large arrays. That combination is not consistent with a benign interactive form.

Adobe Reader XFA oneOfChild exploit - CVE-2013-0640 critical CVE likely CVE_2013_0640

PDF contains the XFA choiceList/oneOfChild trigger shape associated with CVE-2013-0640.

CVE-2013-0640 is an Adobe Reader/Acrobat XFA memory-corruption vulnerability exploited in the wild in 2013. The rule requires the specific JavaScript/XFA sequence described in public technical analysis: resolve a choiceList, mutate a draw object's keep.previous property to contentArea, and reattach the choiceList through the UI node's oneOfChild property via a timer. Requiring all of these elements avoids flagging ordinary XFA forms that merely contain choice lists or resolveNode() form logic.

Adobe Reader authplay SWF exploit in PDF — CVE-2010-3654 critical CVE likely CVE_2010_3654_FLASH_RICHMEDIA

PDF combines RichMedia Flash activation, AS3 DoABC/SymbolClass SWF code, and PDF-side shellcode heap-spray staging.

CVE-2010-3654 affects Adobe Reader/Acrobat authplay Flash handling when crafted SWF content is embedded in a PDF. This detector requires all three parts of the submitted exploit shape: RichMedia Flash activation, an ActionScript 3 SWF with DoABC/SymbolClass and URLRequest or StagePlayer/ByteArray/loadBytes code, and PDF-side heap-spray or shellcode stage evidence. Those gates avoid labelling ordinary RichMedia Flash documents.

Adobe Reader mailto URI command execution — CVE-2007-5020 critical CVE likely CVE_2007_5020_MAILTO_MSHTA

PDF contains a crafted mailto URI that reaches mshta via path traversal and executes inline script.

CVE-2007-5020 is the Adobe Reader/Acrobat 8.1 mailto URI command-execution issue described by Adobe APSA07-04. The rule requires a mailto URI with traversal or the historical percent-slash shape, an mshta target, and inline JavaScript/WScript.Shell execution markers. Normal mailto links and generic suspicious command paths remain covered only by the lower-confidence URI rule.

C6 Messenger DownloaderActiveX — CVE-2008-2551 critical CVE exact CVE_2008_2551

HTML or PDF-embedded HTML configures C6 Messenger DownloaderActiveX to download and run a file.

CVE-2008-2551 affects the Icona/C6 Messenger DownloaderActiveX control. The rule requires the vulnerable control identity (DownloaderActiveX or CLSID c1b7e532-3ecb-4e9e-bb3a-2951ffe67c61), propDownloadUrl, and propPostDownloadAction=run. This avoids flagging generic ActiveX content.

CAB/HTML external object — CVE-2021-40444 critical CVE exact CVE_2021_40444

OOXML external OLEObject relationship targets HTML/CAB/MSHTML-style content.

The scanner parses .rels entries with TargetMode="External" and assigns this CVE only when the relationship has the stricter OLEObject gadget shape associated with CVE-2021-40444. Broader MSHTML/CAB/MHTML external targets are reported separately as OFFICE_MSHTML_EXTERNAL_OBJECT.

CVE-2010-1297 — Adobe Flash/Reader authplay heap-spray (Flash-in-PDF) critical CVE likely CVE_2010_1297

PDF embeds a Flash SWF (RichMedia) and its de-obfuscated JavaScript heap-sprays to groom memory for the Flash exploit.

CVE-2010-1297 is an Adobe Flash / authplay.dll memory-corruption vulnerability exploited through PDFs that embed a crafted SWF and use JavaScript to heap-spray the process before the Flash object loads. This rule fires when an embedded Flash SWF co-occurs with a JavaScript heap-spray recovered after de-obfuscating the evasions these kits use — space-padded %u tokens, split unescape, and fromCharCode/\u spray builders inside a grooming loop — which defeat the raw %u0c0c/%u9090 heap-spray rules. Benign interactive Flash PDFs (sound players, text widgets) carry no spray and are not attributed.

CVE-2017-0262 related DDE stager critical CVE_2017_0262_DDE_STAGER

Word DDE field downloads and executes the observed CVE-2017-0262 second-stage URL.

This rule is intentionally not an exact exploit-body detector. The local DOCX contains a DDE PowerShell chain that downloads the observed sendmevideo.org/dh2025e/eee.txt stage and executes it with powershell -enc, but it does not contain the EPS/object memory-corruption bytes locally. The CVE label is therefore reported as related campaign-stage evidence.

Collab.collectEmailInfo — CVE-2007-5659 critical CVE exact CVE_2007_5659

PDF JavaScript calls Collab.collectEmailInfo() with a long or heap-sprayed message argument.

CVE-2007-5659 is a buffer overflow in Adobe Reader triggered by a long argument to the Collab.collectEmailInfo() JavaScript API. This was one of the earliest widely-exploited PDF JavaScript vulnerabilities. The rule requires either Collab.collectEmailInfo() with a quoted string at least 128 bytes long, or the decoded exploit-kit shape where JavaScript builds a %u0c0c heap spray and passes that sprayed string through the msg field of Collab.collectEmailInfo(). It also catches older variants that decode shellcode with unescape(), assemble a large version-dependent buffer, and pass that variable through msg, including payloads staged in annotation /Subject entries and recovered before CVE matching. Plain short calls are not flagged. CISA KEV.

Collab.getIcon — CVE-2009-0927 critical CVE exact CVE_2009_0927

PDF JavaScript calls Collab.getIcon() with a long string argument.

CVE-2009-0927 (CVSS 9.3) is a stack-based buffer overflow in Adobe Reader triggered by the Collab.getIcon() JavaScript API with a crafted string argument. The overflow allows arbitrary code execution. The rule requires Collab.getIcon() with a quoted string at least 64 bytes long; plain short calls are not flagged. CISA KEV.
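A length gate of this kind might look like the following sketch. The regular expression and function name are assumptions for illustration; only the 64-byte threshold and the quoted-string requirement come from the rule text.

```python
import re

# Hypothetical pattern: Collab.getIcon( followed by a quoted string literal.
# Group 2 captures the literal body; backslash escapes are kept as written.
_GETICON_LITERAL = re.compile(
    r"""Collab\s*\.\s*getIcon\s*\(\s*(["'])((?:\\.|(?!\1).)*)\1""",
    re.DOTALL,
)


def has_long_geticon_literal(js: str, min_bytes: int = 64) -> bool:
    """True when any Collab.getIcon() call carries a literal of min_bytes or more."""
    for match in _GETICON_LITERAL.finditer(js):
        if len(match.group(2).encode("utf-8")) >= min_bytes:
            return True
    return False
```

A call that passes a variable instead of a literal is not matched by this sketch; recovering such values would need the de-obfuscation stage described elsewhere in this reference.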

Composite Moniker — CVE-2017-8570 critical CVE likely CVE_2017_8570

OLE data contains the Composite Moniker CLSID with nearby scriptlet payload evidence.

CVE-2017-8570 abuses Composite Moniker handling to load scriptlet content. The CVE rule now requires the Composite Moniker CLSID plus nearby SCT/scriptlet/scrobj-style payload evidence. Bare Composite Moniker evidence is reported as RTF_COMPOSITE_MONIKER_RELATED instead.

Doc.printSeps — CVE-2010-4091 critical CVE exact CVE_2010_4091

PDF JavaScript invokes Doc.printSeps() with exploit-shaped arguments.

CVE-2010-4091 is a memory corruption vulnerability triggered by a crafted argument to Doc.printSeps(). The rule matches long quoted strings, large hex constants such as 0xffff, or 10+ digit numeric arguments.

EPS image filter — CVE-2017-0261/0262 critical CVE_2017_0261

Document references EPSIMP32 or contains PostScript/EPS markers.

CVE-2017-0261 and CVE-2017-0262 exploit the EPS (Encapsulated PostScript) import filter (EPSIMP32.FLT) shipped with Microsoft Office. A crafted EPS image inside an Office document triggers a use-after-free (CVE-2017-0261) or a type confusion (CVE-2017-0262) that allows arbitrary code execution. These vulnerabilities were exploited in highly targeted APT campaigns in 2017. The rule matches EPSIMP32 or %!PS-Adobe markers; it does not validate the malformed EPS needed to identify a specific EPS CVE.

Embedded Flash authplay SWF exploit — CVE-2010-1297 likely critical CVE likely CVE_2010_1297_FLASH_EMBEDDED

PDF embeds a crafted authplay-era SWF and pairs it with PDF-side shellcode heap-spray staging.

Some CVE-2010-1297 exploit PDFs carry the malicious SWF in a plain compressed stream rather than exposing a canonical /RichMedia dictionary. This rule requires the same high-signal SWF markers used by the RichMedia variant plus a PDF-side encoded heap-spray stage, avoiding attribution for ordinary embedded Flash content.

Equation Editor Matrix overflow — CVE-2018-0798 critical CVE exact CVE_2018_0798

MTEF Matrix record exploit signature found in Equation Editor OLE data.

CVE-2018-0798 is a stack buffer overflow in the Matrix record (type 0x05) parser of EQNEDT32.EXE (Microsoft Equation Editor). Public writeups describe it as affecting Equation Editor broadly, including builds patched for CVE-2017-11882 and CVE-2018-0802. The public exploit shape uses 0x60 bytes followed by 0x61 padding and a return address (0x0BFB) to hijack execution. It has been widely used by APT groups including Conimes, KeyBoy, Emissary Panda, and Rancor.

Equation Editor OLE1 payload — CVE-2017-11882 related critical CVE_2017_11882_RELATED

RTF contains an activated OLE1 Equation.3 object with large payload-like native data.

The document reaches the legacy Equation Editor parser through an OLE1 Equation.3 object and supplies large high-entropy native data or an embedded PE-like payload. This is a weaponized Equation Editor RCE delivery shape associated with CVE-2017-11882 and CVE-2018-0802, but the scanner did not recover the malformed MTEF FONT/MATRIX record required for exact CVE attribution.

Equation Editor Ole10Native payload — CVE-2017-11882 likely critical CVE likely CVE_2017_11882_EQUATION_OLE10NATIVE

RTF activates a Microsoft Equation 3.0 OLE storage carrying a high-entropy Ole10Native payload.

Normal Equation Editor OLE objects store Equation Native/MTEF data. This rule requires decoded RTF objdata containing a CFB whose Root Entry CLSID is Microsoft Equation 3.0, RTF activation controls such as \objemb and \objupdate, and a high-entropy Ole10Native payload stream. That combination is a weaponized Equation Editor RCE delivery shape consistent with CVE-2017-11882/CVE-2018-0802 while avoiding attribution from the Equation CLSID alone.
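The "high-entropy" condition is commonly measured with Shannon entropy over the stream bytes. The sketch below shows one way to compute it; the 7.2 bits-per-byte threshold and the 256-byte minimum are assumed values for illustration, not the analyzer's documented settings.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, 8.0 for uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def is_high_entropy(data: bytes, threshold: float = 7.2, min_size: int = 256) -> bool:
    """Hypothetical gate: the stream must be large enough and close to random."""
    return len(data) >= min_size and shannon_entropy(data) >= threshold
```

Encrypted or compressed payloads score near 8 bits per byte, while ordinary Equation Native/MTEF data is far more regular, which is why entropy is a useful discriminator for this stream.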

Equation Editor SIZE overflow — CVE-2018-0802 critical CVE likely CVE_2018_0802

MTEF SIZE record contains an exploit-sized explicit point size or delta.

CVE-2018-0802 is an Equation Editor memory-corruption vulnerability in the SIZE-record parsing path. This rule requires a valid Equation Native stream and an implausibly large SIZE value, so it is stronger than Equation Editor activation or CLSID evidence alone.

Excel EXTSST record overflow — CVE-2011-0105 critical CVE exact CVE_2011_0105

Excel's shared-string index table (EXTSST) declares far more entries than the workbook's string count allows — the CVE-2011-0105 memory-corruption shape.

Microsoft Excel (MS11-021) rebuilds the shared-string index from an EXTSST record whose ISSTINF entry count must equal ceil(cstUnique / Dsst) taken from the preceding SST record. An EXTSST declaring far more entries drives a copy with an attacker-controlled length and source — the documented CVE-2011-0105 data-initialization memory-corruption primitive, where the over-declared array doubles as the heap-spray / pointer table — achieving code execution from a crafted .xls.
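The consistency check can be worked through with integer arithmetic. In this sketch the function names and the margin used to decide what counts as "far more" entries are assumptions; only the ceil(cstUnique / Dsst) relationship comes from the description above.

```python
def expected_isstinf_count(cst_unique: int, dsst: int) -> int:
    """ceil(cstUnique / Dsst), computed without floating point."""
    if dsst <= 0:
        raise ValueError("Dsst must be positive")
    return (cst_unique + dsst - 1) // dsst


def extsst_over_declared(declared: int, cst_unique: int, dsst: int,
                         margin: int = 16) -> bool:
    """Hypothetical gate: flag only counts well beyond the expected value."""
    return declared > expected_isstinf_count(cst_unique, dsst) + margin
```

For example, a workbook with 100 unique strings and Dsst = 8 should carry 13 ISSTINF entries; an EXTSST record declaring thousands of entries is inconsistent with its own SST record.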

Excel FEATHEADER record overflow — CVE-2009-3129 critical CVE exact CVE_2009_3129

Excel FEATHEADER record declares an oversized/inconsistent internal length — the CVE-2009-3129 parser-overflow shape (legitimate records are only tens of bytes).

The FEATHEADER (Feature Header) parser in Microsoft Excel 2007/2003 SP1 uses an attacker-controlled cbHdrData length in a memcpy. A FEATHEADER record with an oversized record size or cbHdrData value is the documented exploit primitive, because legitimate FEATHEADER records are only tens of bytes. CVE-2009-3129 was actively exploited in targeted attacks to achieve code execution from crafted .xls files.

Excel HTML/XML WorksheetOptions UAF shape — CVE-2019-1448 critical CVE likely CVE_2019_1448

Excel HTML/XML workbook markup contains unexpected nested content in x:WorksheetOptions.

Cisco Talos describes CVE-2019-1448 as a Microsoft Office Excel HTML/XML parser use-after-free in mso.dll. The reported proof-of-concept uses an Office HTML workbook with ExcelWorkbook/ExcelWorksheets and a malformed x:WorksheetOptions block containing unexpected nested markup.

Excel Index Array exploit — CVE-2008-3005 critical CVE likely CVE_2008_3005

Legacy Excel BIFF8 workbook combines a narrow FORMAT-index cluster with large OLE slack payload staging.

CVE-2008-3005 is the Microsoft Excel Index Array vulnerability from MS08-043. The rule is deliberately narrow: it requires a BIFF8 Workbook/Book stream with the observed low FORMAT-index cluster and a normal XF table, plus a large unallocated OLE sector region where pre-macro-era Excel exploits stage shellcode. Low built-in FORMAT records by themselves occur in clean workbooks, so the OLE slack payload-hiding context is mandatory for CVE attribution.

Excel invalid object access exploit — CVE-2009-0238 critical CVE likely CVE_2009_0238

Excel workbook has repeated malformed drawing-object (OBJ) records using an invalid target value, alongside shellcode/heap-spray context — the CVE-2009-0238 shape.

CVE-2009-0238 is an Excel remote-code-execution vulnerability triggered by a crafted workbook that causes invalid object access. This rule requires repeated malformed BIFF OBJ records whose ftMacro subrecord uses the invalid 0xFFFF target, plus shellcode or heap-spray context. Ordinary form-control OBJ records are not sufficient.

Excel object record corruption exploit — CVE-2009-0557 critical CVE likely CVE_2009_0557

Excel workbook combines abnormal Forms.CommandButton OBJ record IDs with XLM/VBA auto-execution context.

CVE-2009-0557 is the Microsoft Excel Object Record Corruption vulnerability. This detector keys on a narrow workbook structure: multiple Forms.CommandButton OBJ records with abnormal object-id gaps, several XLM macro sheets, and a VBA project. The auto-exec context is required so normal spreadsheets with form controls do not receive CVE attribution.

Follina/MSDT URI — CVE-2022-30190 critical CVE likely CVE_2022_30190

Document contains an ms-msdt: URI consistent with Follina payload delivery.

CVE-2022-30190 (Follina) is a critical Microsoft Support Diagnostic Tool (MSDT) vulnerability, exploited as a zero-day through Office documents, that allows arbitrary code execution without macros. When a specially crafted document is opened, Word fetches a remote HTML file whose ms-msdt: URI invokes MSDT to execute a PowerShell payload. The scanner matches the URI string in Office content or relationships; that is strong evidence, but it does not prove the surrounding HTML/Office load path is live. Actively exploited in the wild since May 2022.

Ghostscript SAFER bypass in HWP/EPS — CVE-2017-8291 critical CVE exact CVE_2017_8291

Embedded PostScript/EPS uses Ghostscript CVE-2017-8291 exploitation primitives.

CVE-2017-8291 is a Ghostscript -dSAFER bypass/type-confusion issue exploited through crafted EPS/PostScript content, including HWP documents that embed EPS in BinData streams. The rule covers both major static exploit shapes: the public .rsdparams plus /OutputFile(%pipe%) command execution path, and the .eqproc type-confusion path commonly hidden in '<HEX> cvx exec' staged HWP/EPS payloads.

HWP/EPS PostScript exploit — CVE-2013-0808 critical CVE exact CVE_2013_0808

Embedded HWP EPS/PostScript matches the CVE-2013-0808 exploit staging shape.

This detector covers the stripped-header HWP/EPS exploit form associated with CVE-2013-0808. It requires a revision-gated PostScript program, a large /D40 hex payload, D41/D42 array heap grooming, repeated recursive /B bind stubs, and both sentinel searches ending in 22222 and 11111. Those gates are deliberately conjunctive so ordinary EPS graphics with large hex image data do not receive CVE attribution.

Hangul HWPX embedded OLE exploit — CVE-2015-6585 critical CVE likely CVE_2015_6585

HWPX BinData embeds a malformed prefixed OLE/CFB chart object with shellcode-style API markers.

CVE-2015-6585 affects Hancom Hangul handling of crafted embedded OLE/chart content. This rule is specific to HWPX packages that carry BinData/*.ole objects outside standard OOXML oleObject relationships and requires the exploit-shaped combination of a 4-byte-prefixed CFB body, VtChart/VtDataGrid object data, executable-memory API strings such as kernel32.dll and VirtualProtect, and shellcode sled bytes.

Malformed Word ActiveX package — CVE-2017-11826 critical CVE likely CVE_2017_11826_ACTIVEX_PACKAGE

RTF embeds a Word.Document.12 package with repeated null-CLSID ActiveX controls and an oversized activeX1.bin CFB payload.

CVE-2017-11826 is a Microsoft Office memory-corruption vulnerability triggered when Office mishandles objects in memory. This detector covers the submitted exploit carrier shape: decoded RTF objdata containing a Word.Document.12 package, many ActiveX XML controls with the invalid {00000000-0000-0000-0000-000000000001} class ID, repeated relationships to one activeX1.bin, and an oversized/highly-compressed CFB payload. The gates distinguish it from ordinary embedded ActiveX controls.

Moniker Link — CVE-2024-21413 critical CVE likely CVE_2024_21413

Document contains a file:///\\ moniker-link target with an exclamation mark.

CVE-2024-21413 (Moniker Link) is a Microsoft Outlook/Office vulnerability where a specially crafted hyperlink using the file:///\\host\share\path!something format can bypass Protected View and trigger NTLM authentication in affected Office/Outlook handling paths. The rule matches the file:///\\host\share...! target shape; it does not prove which host application or preview path will process the link.
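The target shape can be approximated with a single regular expression. The pattern and function name below are assumptions for illustration; the analyzer's actual matcher may be stricter or looser.

```python
import re

# Hypothetical pattern: file:/// + UNC host + share/path + '!' + moniker suffix.
_MONIKER_LINK = re.compile(
    r'file:///\\\\[^\\/\s"<>!]+\\[^\s"<>!]*![^\s"<>]*',
    re.IGNORECASE,
)


def find_moniker_links(text: str) -> list[str]:
    """Return every file:///\\\\host\\share...!suffix target found in text."""
    return _MONIKER_LINK.findall(text)
```

The exclamation mark is the discriminating element: a plain file:///\\host\share link without it is an ordinary UNC hyperlink and is not matched here.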

OLE OlePres zero-click RCE — CVE-2025-21298 critical CVE_2025_21298

RTF embedded OLE object contains malformed OlePres evidence.

CVE-2025-21298 (CVSS 9.8) is a zero-click vulnerability in Windows ole32.dll where the UtOlePresStmToContentsStm function mishandles memory while processing malformed OlePres streams. Public advisories classify it as a use-after-free. Outlook's Preview Pane can trigger the vulnerable parsing path. A bare OlePres stream name is now reported as a generic RTF_OLEPRES_STREAM marker rather than this CVE, because normal embedded OLE objects can contain presentation streams.

OLE/COM security bypass — CVE-2026-21509 critical CVE_2026_21509

Document contains Shell.Explorer.1 CLSID evidence plus OLE activation context.

CVE-2026-21509 is a Microsoft Office security feature bypass (CVSS 7.8) that exploits reliance on untrusted inputs in OLE/COM security decisions. Attackers craft documents with manipulated metadata so the parser incorrectly marks dangerous objects as 'Safe for Initialization,' allowing code execution without security warnings. Actively exploited in the wild. The scanner matches the Shell.Explorer.1 CLSID or ProgID and, for text forms, requires embedding context such as Ole10Native, RTF objdata, or an embedded PE marker; bare text is not enough.

OLE2Link auto-activated remote loader — CVE-2017-0199 / CVE-2017-8759 critical RTF_OLE2LINK_REMOTE_MONIKER_LOADER

RTF OLE2Link object is force-activated with \objupdate and fetches a remote second stage via an INCLUDE field.

This rule covers the field-delivered OLE2Link auto-update attack path shared by CVE-2017-0199 and CVE-2017-8759. An OLE2Link object (Package/OLE2Link CLSID {00000300-0000-0000-C000-000000000046}) is auto-activated on open via \objupdate, with no user interaction, and the remote loader URL is carried in an INCLUDETEXT/INCLUDEPICTURE field rather than an inline URL Moniker, so the moniker-based detector does not fire. The two CVEs share this exact RTF artifact and diverge only in the C2 response: CVE-2017-0199 returns an HTA/scriptlet (OLE URL-moniker / htafile handler), while CVE-2017-8759 returns a SOAP WSDL that the .NET SOAP-moniker parser compiles. With the C2 unreachable, the specific CVE is reported as RELATED; it narrows to a single CVE (LIKELY) only when the moniker URL carries a tell (.txt, !soap, or wsdl for CVE-2017-8759; .hta, .sct, or script: for CVE-2017-0199).
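The narrowing step can be sketched as a small classifier over the URL. How each tell is matched (suffix versus substring) and the handling of conflicting tells are assumptions made for this illustration; the function name is hypothetical.

```python
from typing import Optional


def narrow_ole2link_cve(url: str) -> Optional[str]:
    """Return a single CVE when the URL carries an unambiguous tell, else None."""
    lowered = url.lower()
    path = lowered.split("?", 1)[0]
    # Tells listed for CVE-2017-8759 in the rule text.
    soap_tell = path.endswith(".txt") or "!soap" in lowered or "wsdl" in lowered
    # Tells listed for CVE-2017-0199 in the rule text.
    hta_tell = path.endswith((".hta", ".sct")) or lowered.startswith("script:")
    if soap_tell and not hta_tell:
        return "CVE-2017-8759"
    if hta_tell and not soap_tell:
        return "CVE-2017-0199"
    return None  # no tell, or conflicting tells: stay at RELATED for both
```

Returning None for conflicting tells keeps the result conservative, in line with reporting both CVEs as related when the server response cannot be observed.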

Office malformed EPS — CVE-2015-2545 critical CVE likely CVE_2015_2545

Office document embeds EPS/PostScript with exploit-style dynamic execution or decode-filter markers.

CVE-2015-2545 is a KEV-listed Microsoft Office vulnerability triggered by crafted EPS images. This rule is stricter than a plain EPS-header match: it requires EPS/PostScript content in an Office document plus dynamic PostScript execution or decode-filter markers such as cvx exec, system, deletefile/run, or ASCIIHex/ASCII85/SubFileDecode usage.

Outlook NTLM credential theft — CVE-2023-23397 critical CVE_2023_23397

Outlook .msg contains UNC reminder evidence: exact for ReminderFileParameter, related for raw UNC fallback.

CVE-2023-23397 is a critical Microsoft Outlook privilege escalation / credential theft vulnerability with CVSS 9.8. An attacker sends a specially crafted meeting request, task, or appointment where the PidLidReminderFileParameter property contains a UNC path pointing to an attacker-controlled server. When Outlook processes the item (even before the user opens it), Windows automatically authenticates to the remote server using NTLM, leaking the victim's Net-NTLMv2 hash. This hash can be cracked offline or used in relay attacks for lateral movement across the network. The sandboxed MSG parser matches the ReminderFileParameter property stream and marks this exact when it contains a UNC path; fallback raw-byte UNC evidence remains related.

Outlook composite moniker in img tag — CVE-2024-38021 critical CVE_2024_38021

Document XML/HTML contains an <img> tag with file://...!... moniker URL.

CVE-2024-38021 is a zero-click vulnerability in Microsoft Outlook where composite monikers in <img> tag URLs are processed without setting the BlockMkParseDisplayNameOnCurrentThread flag. This allows code execution through image tag URLs using the file://...!... moniker syntax. The rule matches <img> tags in Office document XML/HTML parts whose source uses that moniker shape; outside Outlook, treat this as related moniker-abuse evidence rather than proof of zero-click exploitation.

PDF.js FontMatrix type confusion — CVE-2024-4367 critical CVE exact CVE_2024_4367

PDF font dictionary contains non-numeric FontMatrix values.

CVE-2024-4367 is a high-severity vulnerability in Mozilla PDF.js (used in Firefox and Thunderbird) where a missing type check in FontFaceObject.getPathGenerator allows arbitrary JavaScript execution. The exploit replaces numeric values in the /FontMatrix array with JavaScript code that gets compiled during glyph rendering. Affects Firefox < 126 and Thunderbird < 115.11.

PowerPoint 95 sound-data length corruption — CVE-2009-1129 critical CVE likely CVE_2009_1129

PowerPoint 95 native file has inconsistent PP7 directory lengths, sound-data marker, and nearby native payload bytes.

CVE-2009-1129 is a PP7X32.DLL PowerPoint 95 importer vulnerability involving inconsistent sound-data record lengths. The detector is restricted to the legacy PowerPoint 95 storage/header layout and requires malformed early PP7 directory lengths, a sound-data record marker, and shellcode-like bytes near the malformed record area.

PowerPoint OLE INF package object — CVE-2014-6352 critical CVE likely CVE_2014_6352

Binary PowerPoint stream contains an embedded .inf object reference with package data.

CVE-2014-6352 is a Windows OLE remote code execution vulnerability exploited in the wild through crafted PowerPoint documents. This rule scans the binary PowerPoint Document stream and requires an .inf package/object reference in System-object context plus embedded package local-header evidence. It avoids broad .inf matching in ordinary OLE metadata.

PowerPoint malformed ClientTextbox — CVE-2009-0556 critical CVE likely CVE_2009_0556

PowerPoint Document contains a malformed EscherClientTextbox with TextHeaderAtom, repeated-byte TextBytesAtom payload, and OutlineTextRefAtom.

CVE-2009-0556 is a Microsoft PowerPoint code execution vulnerability involving malformed textbox/outline text record handling. This rule recovers the exploit shape directly from the PowerPoint Document stream: an EscherClientTextbox containing TextHeaderAtom, a long repeated-byte TextBytesAtom buffer, and OutlineTextRefAtom in the same textbox. Direct recovery is used because weaponized PPT streams often desynchronize full record walking while PowerPoint still reaches the malformed textbox records.

RTF Word ActiveX package — CVE-2015-1641 related critical CVE_2015_1641_ACTIVEX_RELATED

RTF objdata embeds Word.Document.12 packages with many repeated ActiveX controls and oversized activeX1.bin.

CVE-2015-1641 is a Word RTF memory-corruption vulnerability. This detector is intentionally structural: it requires decoded RTF objdata containing a Word.Document.12 package, dozens of repeated ActiveX control XML parts, and a single oversized highly-compressed activeX1.bin payload. It is reported as related evidence for the 2015 Word memory-corruption family because public CVE metadata alone does not support exact attribution.

RTF Word ActiveX package — CVE-2015-1770 related critical CVE_2015_1770_ACTIVEX_RELATED

RTF objdata embeds Word.Document.12 packages with many repeated ActiveX controls and oversized activeX1.bin.

CVE-2015-1770 is an Office uninitialized-memory-use vulnerability. This detector covers the submitted exploit family's RTF-embedded Word.Document.12/ActiveX package shape and is reported as related evidence because the public CVE text does not expose a uniquely validating byte-level primitive.

SOAP Moniker — CVE-2017-8759 critical CVE likely CVE_2017_8759

OLE data contains the SOAP Moniker CLSID.

The scanner matches the SOAP Moniker CLSID bytes {ECABB0C7-7F19-11D2-978E-0000F8757E2A}. This is likely CVE evidence because the CLSID is the vulnerable moniker primitive, but the static rule does not validate a crafted WSDL body.
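CLSID matching depends on the on-disk byte order: COM stores the first three GUID fields little-endian, so the bytes differ from the printed form. The sketch below derives the needle with Python's uuid module; the helper name and the extra textual and hex-encoded checks are assumptions beyond what the rule text states.

```python
import uuid

SOAP_MONIKER_CLSID = uuid.UUID("ECABB0C7-7F19-11D2-978E-0000F8757E2A")


def contains_clsid(data: bytes, clsid: uuid.UUID) -> bool:
    """Look for a CLSID in binary, textual, or hex-encoded (RTF objdata) form."""
    raw = clsid.bytes_le  # binary COM layout: first three fields little-endian
    if raw in data:
        return True
    lowered = data.lower()  # affects ASCII letters only
    return (
        str(clsid).encode("ascii") in lowered      # "ecabb0c7-7f19-..." text form
        or raw.hex().encode("ascii") in lowered    # hex digits as written in objdata
    )
```

For this CLSID the binary needle is c7 b0 ab ec 19 7f d2 11 97 8e 00 00 f8 75 7e 2a, which is why a search for the printed digit order alone would miss embedded OLE data.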

Sandworm OLE Package — CVE-2014-4114 critical CVE likely CVE_2014_4114

OLE Package CLSID found alongside executable file references.

CVE-2014-4114 is a Windows OLE remote code execution vulnerability exploited by the Russian Sandworm threat group in targeted attacks against NATO and Ukraine in 2014. The attack uses OLE Package objects embedded in PowerPoint (PPSX) files to silently drop and execute .inf and .exe files when the document is opened. The rule requires the OLE Package CLSID plus executable file references (.inf, .exe, .dll, .bat, .cmd, .scr, .vbs, .ps1, or .hta), so it is a strong package-dropper indicator but not a full reconstruction of the original Sandworm exploit chain.

Storm-0978/RomCom HTML RCE — CVE-2023-36884 critical CVE likely CVE_2023_36884

OOXML .rels file contains an auto-load relationship Target pointing to a remote .rtf URL.

CVE-2023-36884 (Storm-0978/RomCom) was exploited through specially crafted Office documents that led to remote RTF/MSHTML processing and payload delivery. The scanner now requires a non-hyperlink relationship that Office can auto-load (template/subdocument/frame/OLE/altChunk-style relationship) whose Target is an http:// or https:// URL ending in .rtf. Plain clickable hyperlinks to RTF files are not enough for this CVE rule.
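The relationship gate can be sketched with the standard-library XML parser. The set of relationship kinds treated as auto-loadable below is an assumption chosen to mirror the list in the description, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"

# Assumed auto-load kinds, compared against the last path segment of Type.
AUTOLOAD_KINDS = {"attachedtemplate", "subdocument", "frame", "oleobject", "afchunk"}


def remote_rtf_autoload_targets(rels_xml: str) -> list[str]:
    """Return external http(s) .rtf targets on auto-loadable relationships."""
    hits = []
    for rel in ET.fromstring(rels_xml).iter(REL_TAG):
        if rel.get("TargetMode") != "External":
            continue
        kind = rel.get("Type", "").rsplit("/", 1)[-1].lower()
        if kind not in AUTOLOAD_KINDS:
            continue  # plain hyperlinks and unrelated parts are ignored
        target = rel.get("Target", "")
        url = urlparse(target)
        if url.scheme in ("http", "https") and url.path.lower().endswith(".rtf"):
            hits.append(target)
    return hits
```

Checking the URL path rather than the whole string means a query string after the .rtf name does not hide the extension.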

Type 1 callOtherSubr operand-stack manipulation — CVE-2021-21086 critical CVE exact CVE_2021_21086

Decrypted Type 1 CharString matches the public callOtherSubr stack-pointer manipulation shape.

CVE-2021-21086 is an out-of-bounds write in Adobe Acrobat/Reader's CoolType Type 1 CharString interpreter. The Project Zero/Faraday analysis describes abusing predefined callOtherSubr handling, especially subroutine 18, to move the operand-stack pointer outside the operand stack and write toward the saved return address. The scanner decrypts Type 1 eexec and CharString data and flags repeated not/get/callOtherSubr bytecode sequences matching the public exploit generator.

URL Moniker weaponized URL — CVE-2017-0199 critical CVE exact CVE_2017_0199_WEAPONIZED_URL

URL Moniker OLE link points to an HTA/script/template-style remote loader.

This is the tighter CVE-2017-0199 static shape: URL Moniker / OLE2Link evidence plus a remote URL ending in an executable Office-loader extension such as .hta, .sct, .wsf, .xsl, .mht, or a macro-capable template. Generic remote URL Moniker evidence remains CVE_2017_0199 likely because the returned server content cannot be proven statically.

URL Moniker — CVE-2017-0199 critical CVE likely CVE_2017_0199

URL Moniker OLE link points to a remote loader.

CVE-2017-0199 abuses OLE2Link / URL Moniker handling so Office fetches remote content and processes it based on the response type. The scanner matches embedded URL Moniker structures or URL Moniker CLSID bytes near weaponised remote-loader targets; it does not prove the server-side content type or payload returned at scan time.

UTF-16BE Base URL — CVE-2021-39863 critical CVE exact CVE_2021_39863

PDF catalog uses a UTF-16BE /URI /Base value and JavaScript resolves a relative URL.

CVE-2021-39863 is an Adobe Acrobat/Reader heap-based buffer overflow in document base-URL concatenation. Exodus Intelligence documented that earlier research had associated this primitive with CVE-2021-21017, but Adobe assigned CVE-2021-39863 for the still-vulnerable bug. The rule requires the malformed UTF-16BE Base URL primitive plus submitForm(), app.launchURL(), or app.media.createPlayer() JavaScript, avoiding broad matches on ordinary PDF JavaScript or normal /URI actions.

WordPad Word97 converter exploit — CVE-2008-4841 critical CVE likely CVE_2008_4841

Word 97-era document places shellcode immediately before a malformed converter-facing table-SPRM cluster.

CVE-2008-4841 affects the Windows WordPad Text Converter for Word 97 files, not the normal Microsoft Word parser. The rule detects the converter exploit carrier where native payload bytes sit immediately before a malformed Word table-SPRM cluster. Because that cluster can look like older Word table-SPRM corruption, the analyzer suppresses CVE-2006-6456 when this more specific converter shape is present.

\fonttbl heap overflow — CVE-2023-21716 critical CVE exact CVE_2023_21716

RTF font table with excessive entries — Word heap buffer overflow.

CVE-2023-21716 is a critical heap buffer overflow in Microsoft Word's RTF parser triggered by an RTF file with an abnormally large number of font entries in the \fonttbl group. The rule requires a \fonttbl and counts at least 32768 \fN font entries, matching the public exploit-scale trigger rather than merely large but valid font tables.
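Counting font entries can be sketched by isolating the \fonttbl group and counting \fN control words inside it. The brace-walking helper below is an illustrative assumption; only the 32768 threshold comes from the rule text.

```python
import re

_FONT_ENTRY = re.compile(rb"\\f\d+")


def fonttbl_entry_count(rtf: bytes) -> int:
    """Count \\fN control words inside the first \\fonttbl group."""
    start = rtf.find(rb"\fonttbl")
    if start < 0:
        return 0
    depth, i = 1, start  # \fonttbl sits inside its enclosing group
    while i < len(rtf) and depth > 0:
        ch = rtf[i:i + 1]
        if ch == b"\\":
            i += 2  # skip the control-word lead or an escaped brace
            continue
        if ch == b"{":
            depth += 1
        elif ch == b"}":
            depth -= 1
        i += 1
    return len(_FONT_ENTRY.findall(rtf[start:i]))


def fonttbl_is_exploit_sized(rtf: bytes, threshold: int = 32768) -> bool:
    return fonttbl_entry_count(rtf) >= threshold
```

Restricting the count to the font table group keeps ordinary \fN font switches in the document body from inflating the total.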

\pFragments RTF stack overflow — CVE-2010-3333 critical CVE exact CVE_2010_3333

RTF contains an oversized pFragments value.

CVE-2010-3333 is a stack-based buffer overflow in Microsoft Word 2002, 2003, and 2007 triggered by a crafted pFragments value in an RTF document. The scanner emits this CVE rule when the control-word numeric argument is at least 256 or when the canonical shape-property form {\sn pFragments}{\sv ...} carries an oversized value; bare pFragments without exploit-sized data is reported separately as related RTF evidence.

customUI external link — CVE-2021-42292 critical CVE_2021_42292

customUI ribbon part contains an external relationship target.

The scanner looks in customUI XML/.rels parts for TargetMode="External" relationships. This is related evidence because the customUI external-load surface is present, but public summaries provide limited technical detail and the rule does not prove the full exploit chain.

dataObjects ESObject stale-cache trigger — CVE-2020-9715 critical CVE exact CVE_2020_9715

PDF embeds a file and JavaScript triggers the dataObjects ESObject use-after-free pattern.

CVE-2020-9715 is an Adobe Acrobat/Reader ESObject use-after-free. The PixiePoint/ZDI trigger creates a Data ESObject by accessing this.dataObjects[0].toString(), clears this.dataObjects[0], then uses app.setTimeOut() to run garbage collection before re-accessing the stale cached Data ESObject. The rule requires the embedded-file surface plus the dataObjects toString/null/setTimeOut lifecycle so ordinary attachment-bearing PDFs are not flagged as the CVE.

media.newPlayer — CVE-2009-4324 critical CVE exact CVE_2009_4324

PDF JavaScript calls the media.newPlayer API.

CVE-2009-4324 (CVSS 9.3) is a use-after-free vulnerability in Adobe Reader's multimedia plugin triggered by the media.newPlayer() JavaScript API. It was actively exploited as a zero-day in December 2009 and became one of the most widely exploited PDF vulnerabilities. The newPlayer API is extremely rarely used in legitimate PDF documents — its presence is a useful indicator of an exploit attempt. CISA KEV.

util.printf — CVE-2008-2992 critical CVE exact CVE_2008_2992

PDF JavaScript invokes util.printf() with an oversized format/string argument.

CVE-2008-2992 is a widely-exploited stack buffer overflow in Adobe Reader's util.printf JavaScript implementation. The rule matches util.printf() only when the argument contains a very long format specifier (for example %0000x with 4+ digits) or a quoted string of at least 256 bytes.

ADODB.RecordSet — CVE-2015-0097 high CVE likely CVE_2015_0097

OLE data contains the ADODB.RecordSet CLSID.

The scanner matches the ADODB.RecordSet CLSID bytes. This is likely CVE evidence for CVE-2015-0097-era sandbox-escape documents, but the static rule does not prove the surrounding exploit logic.

Anomalous Equation Editor native stream — CVE-2018-0798 likely high CVE likely CVE_2018_0798_EQUATION_NATIVE_ANOMALY

Embedded Equation Editor OLE data contains malformed, payload-like native stream bytes.

CVE-2018-0798 is an Equation Editor memory-corruption vulnerability in the MTEF Matrix-record parser. Some weaponized Office samples carry malformed Equation native data that is high-entropy or otherwise payload-like but does not preserve the exact public 0x60/0x61 matrix signature. This rule requires an embedded Equation Editor CLSID and an anomalous native/Ole10Native stream, so it is treated as likely CVE-2018-0798-family evidence rather than an exact match.

CoolType/SING font exploit indicator high PDF_COOLTYPE_SING

PDF font data contains SING/CoolType markers inside font content.

Adobe Reader CoolType font parsing has been exploited by document CVEs such as CVE-2010-2883. The rule matches SING markers in font-related PDF data; it is related evidence for the CoolType attack surface, not proof of a specific malformed font CVE.

Equation Editor activation — CVE-2017-11882 related high CVE_2017_11882_ACTIVATION_RELATED

RTF decodes to Equation.3 object activation without a recovered malformed native stream.

The document embeds an Equation.3 object and requests automatic OLE activation with RTF object controls such as \objemb and \objupdate. This reaches the legacy Equation Editor attack surface associated with CVE-2017-11882 and CVE-2018-0802, but does not recover the malformed MTEF/native stream needed for exact or likely attribution.

Exchange P2 FROM header spoofing — CVE-2024-49040 high CVE likely CVE_2024_49040

Raw email From header contains multiple parsed/angle-bracket addresses.

CVE-2024-49040 exploits improper P2 FROM header parsing in Microsoft Exchange Server. By including multiple angle-bracket addresses in the From header (non-RFC-compliant syntax), attackers can make the displayed sender address differ from the actual routing address. The rule inspects the raw From line before parser normalization and fires when it contains multiple address markers or parses as multiple mailboxes; it cannot prove the recipient Exchange server was vulnerable.
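The two conditions, multiple address markers in the raw value or multiple parsed mailboxes, can be sketched with the standard email.utils module. The function name is hypothetical, and counting "<" characters is an assumed stand-in for the analyzer's address-marker logic.

```python
from email.utils import getaddresses


def from_header_is_ambiguous(raw_from_value: str) -> bool:
    """True when the raw From value carries more than one address."""
    # Inspect the raw value first, before any parser normalization.
    angle_markers = raw_from_value.count("<")
    # Then count the mailboxes a standards-based parser recovers.
    mailboxes = [addr for _name, addr in getaddresses([raw_from_value]) if addr]
    return angle_markers > 1 or len(mailboxes) > 1
```

Checking the raw value independently matters because a strict parser may reject or collapse the non-compliant form that the spoofing relies on.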

GoToE/GoToR UNC action — CVE-2018-4993 high CVE exact CVE_2018_4993_GOTOE_UNC

PDF automatic/open action uses GoToE or GoToR with a UNC /F target.

This is the tighter Adobe Reader NTLM credential-leak shape: an /AA or /OpenAction-triggered GoToE/GoToR action whose /F file target is a UNC path. That matches the public CVE-2018-4993 proof-of-concept pattern more closely than the broader UNC-in-action rule.

JBIG2 + active content high PDF_JBIG2_ACTIVE_CONTENT

PDF uses JBIG2Decode/JBIG2 data alongside active content.

JBIG2 plus active content is a high-value parser-exploit indicator for families including CVE-2021-30860 and CVE-2009-0658. The rule matches /JBIG2Decode or JBIG2 signatures plus active content such as JavaScript, XFA, or RichMedia; it does not uniquely identify either CVE.

MSCOMCTL.ListView — CVE-2012-0158 high CVE likely CVE_2012_0158

OLE data contains the MSCOMCTL.ListView CLSID.

The scanner matches the ListView ActiveX CLSID bytes. This identifies the vulnerable control used by CVE-2012-0158 campaigns, but does not parse the crafted control property data needed to prove the overflow.

MSCOMCTL.Toolbar — CVE-2012-0158 / CVE-2012-1856 high CVE likely CVE_2012_1856

OLE data contains the MSCOMCTL.Toolbar CLSID.

The scanner matches the Toolbar ActiveX CLSID bytes. This identifies the vulnerable control used by CVE-2012-0158/1856 campaigns, but does not parse the crafted control property data needed to prove the overflow.

MSScriptControl — CVE-2015-0097 high CVE likely CVE_2015_0097_SC

OLE data contains the MSScriptControl.ScriptControl CLSID.

The scanner matches the MSScriptControl.ScriptControl CLSID bytes. Treat this as related local-zone or scripting-surface evidence rather than a specific proof of CVE-2015-0097, because public Microsoft guidance ties ADODB.RecordSet more directly to that CVE's workaround.

Malformed JPEG2000/JP2 box structure high PDF_JP2_BOX_ANOMALY

Embedded JP2/JPEG2000 data has invalid, oversized, or truncated box sizes.

Malformed JP2 boxes provide stronger evidence than a bare /JPXDecode filter for JPEG2000 parser attack surface, but still do not prove a specific CVE such as CVE-2018-4990. The rule matches JP2 box lengths statically and flags impossible or truncated structures.

OLE2Link remote document — CVE-2017-8759 related high CVE_2017_8759_RELATED

OOXML OLE2Link object fetches a remote Office-looking document.

CVE-2017-8759 campaigns can use an OOXML OLE2Link object to activate a remote document/WSDL stage that contains the SOAP moniker payload. This rule matches the local staging document, not the fetched WSDL body; therefore it is related evidence unless the SOAP/WSDL target is present in the local file.

OOXML OLE2Link remote loader — CVE-2017-0199 related high CVE_2017_0199_RELATED

OOXML linked OLE object auto-loads a remote URL without enough local evidence for an exact CVE.

The document contains an o:OLEObject Type=Link with an external oleObject relationship to a remote URL. This is the OOXML OLE2Link activation shape associated with CVE-2017-0199 delivery, but the local bytes do not expose URL Moniker data or a weaponized remote content type. Treat this as related CVE attack-surface evidence rather than exact attribution.

Suspicious Equation Editor Matrix record — CVE-2018-0798 likely high CVE likely CVE_2018_0798_MTEF_ANOMALY

Equation Editor MTEF Matrix record has an anomalous exploit-like shape.

The scanner found an abnormal MTEF Matrix record inside Equation Editor OLE data, but not the tighter public CVE-2018-0798 byte pattern. Treat this as likely Equation Editor exploit evidence rather than an exact CVE-2018-0798 signature.

Suspicious JBIG2 segment structure high PDF_JBIG2_SEGMENT_ANOMALY

Embedded JBIG2 data contains anomalous segment headers or sizes.

Malformed JBIG2 segment structure is a parser-exploit indicator. Use this as related evidence for JBIG2 decoder CVE families. The rule looks for JBIG2 signatures plus suspicious segment-header shapes, not a validated FORCEDENTRY-style logical circuit.

UNC path in PDF — CVE-2018-4993/CVE-2019-7089 high CVE likely CVE_2018_4993

PDF action target contains a UNC path and the file has action triggers.

CVE-2018-4993 (Adobe Acrobat/Reader) and CVE-2019-7089 allow an attacker to steal NTLM authentication credentials by embedding a UNC path (\\server\share) in a PDF action (JavaScript, GoToR, URI, etc.). When the victim opens the PDF, a vulnerable viewer may resolve the UNC path and Windows can send the user's Net-NTLMv2 hash to the attacker's server. The hash can be cracked offline or used in relay attacks to authenticate as the victim. The rule requires the UNC path to be inside a PDF action target parameter (/F, /URI, /D, or /Target) plus an action keyword such as /JavaScript, /GoToR, /URI, /Launch, /OpenAction, or /AA.
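
A rough sketch of the two-part condition described above: a UNC path inside an action target parameter, plus an action keyword somewhere in the file. The patterns are assumptions for illustration; they only cover literal `( ... )` strings and do not decode hex strings or compressed streams.

```python
import re

# A target key followed by a PDF literal string that begins with a UNC prefix.
# PDF literal strings escape each backslash, so \\server is usually stored as
# four backslash bytes; the unescaped two-backslash form is accepted as well.
UNC_TARGET = re.compile(
    rb"/(?:F|URI|D|Target)\s*\(\s*(?:\\\\\\\\|\\\\)[A-Za-z0-9._-]+"
)
# Action keywords listed in the rule description.
ACTION_KEYWORD = re.compile(rb"/(?:JavaScript|GoToR|URI|Launch|OpenAction|AA)\b")

def pdf_has_unc_action(data: bytes) -> bool:
    """True when a UNC action target and an action keyword are both present."""
    return bool(UNC_TARGET.search(data)) and bool(ACTION_KEYWORD.search(data))
```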

Word OLE security bypass — CVE-2026-21514 high CVE likely CVE_2026_21514

Document contains CVE-2026-21514-style Word/OLE bypass indicators.

CVE-2026-21514 is a Microsoft Word security feature bypass (CVSS 7.8) — an OLE trust bypass. A crafted document disables Word's OLE object/link protection enforcement so an embedded OLE object activates without a Protected View / Enable Content prompt. For OOXML, the rule matches word/settings.xml disabling OLE protection enforcement (oleLinkProtection / objectEmbedProtection with enforcement off) while the document embeds an OLE object — the actual exploit primitive, not the generic Ole10Native+payload dropper shape (which is convicted by the OFFICE_PACKAGE_* / ClamAV rules instead). It also matches the observed RTF-embedded Word package shape where webSettings.xml.rels contains a frame relationship to a local Windows diagnostics XML target, and the altChunk/RTF shape where a hidden \svb hex package contains DrsE2oDoc, graphicFrameDoc, and downRevStg drawing compatibility parts.

\listoverridecount corruption — CVE-2014-1761 high CVE exact CVE_2014_1761

RTF \listoverridecount with abnormally large value.

CVE-2014-1761 is a memory corruption vulnerability in Microsoft Word triggered by an excessively large \listoverridecount value in an RTF document. The vulnerability allows arbitrary code execution and was used in targeted attacks. The rule fires only when \listoverridecount has a numeric value at or above 2048, not on ordinary list metadata.
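
The threshold check can be sketched in a few lines. The 2048 value comes from the description above; the function name and the decision to scan raw bytes without RTF group parsing are assumptions.

```python
import re

# The RTF control word followed by its numeric parameter.
LISTOVERRIDECOUNT = re.compile(rb"\\listoverridecount(\d+)")
THRESHOLD = 2048  # value stated in the rule description

def has_abnormal_listoverridecount(rtf: bytes) -> bool:
    """True if any \\listoverridecount parameter is at or above the threshold."""
    return any(
        int(m.group(1)) >= THRESHOLD for m in LISTOVERRIDECOUNT.finditer(rtf)
    )
```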

getAnnots — CVE-2009-1492 high CVE exact CVE_2009_1492

PDF JavaScript calls getAnnots() with an exploit-shaped argument.

CVE-2009-1492 affects Adobe Reader's annotation handling. The rule fires only when the call has an integer-overflow numeric argument (a many-F hex value or a decimal above the int32 range) or a long stuffed-string argument; plain calls like getAnnots({nPage:0}), used as harmless staging boilerplate in exploit kits, are not flagged.

spell.customDictionaryOpen — CVE-2009-1493 high CVE exact CVE_2009_1493

PDF JavaScript invokes spell.customDictionaryOpen() with a long string argument.

CVE-2009-1493 is a stack buffer overflow in Adobe Reader's spell-check API. The rule requires spell.customDictionaryOpen() with a quoted string at least 128 bytes long; ordinary short dictionary names are not flagged.

PRC/3D content in PDF medium PDF_PRC_3D

PDF contains PRC 3D content markers.

3D parsers in PDF viewers are a recurring exploit surface. PRC content is rare in normal business documents and should be reviewed as related parser-exploit evidence. The rule matches /Subtype /PRC or /PRCStream markers; it does not validate malformed PRC records.

PDF font marker lacks validated CVE exploit shape info PDF_FONT_CVE_NOT_VALIDATED

PDF font data has SING/CoolType markers without the stricter validated CVE exploit shape.

This preserves negative evidence for analyst review: generic font-parser surface was observed, but the scanner did not validate an actual SING font table paired with heap-spray shellcode, so no exact font CVE should be inferred from the marker alone.

Email 51

ClamAV detected malware at linked site critical EMAIL_URL_CLAMAV

A page linked in the email matched a ClamAV malware signature.

The analyzer downloaded the linked page with a browser-like user-agent and scanned it with ClamAV. The page matched a known malware or phishing-kit signature, so treat the link as high-risk.

Dangerous attachment type critical EMAIL_DANGEROUS_ATTACH

An attachment has a file extension that can execute code.

The email contains an attachment with a file type that can execute code (for example .exe, .js, .vbs, .scr, .bat, or .hta). Executable attachments are high-risk in email, including when the sender appears familiar.

Double file extension critical EMAIL_DOUBLE_EXT

An attachment uses a double extension to disguise its type.

The attachment has two file extensions, such as 'invoice.pdf.exe'. If the mail client or operating system hides the final extension, the executable can appear to be a document.
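
A minimal sketch of a double-extension check. Both extension sets are hypothetical and deliberately short; the analyzer's real lists are not given in this reference.

```python
from pathlib import PurePosixPath

# Hypothetical, non-exhaustive lists chosen for illustration.
EXECUTABLE_EXTS = {".exe", ".js", ".vbs", ".scr", ".bat", ".hta"}
DOCUMENT_EXTS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".jpg", ".png", ".txt"}

def has_double_extension(filename: str) -> bool:
    """True when a document-looking extension is followed by an executable one."""
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE_EXTS
        and suffixes[-2] in DOCUMENT_EXTS
    )
```

Requiring the second-to-last suffix to look like a document keeps versioned names such as 'report.v2.pdf' from matching.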

HTML smuggling in email critical EMAIL_HTML_SMUGGLING

Email HTML contains JavaScript patterns for dynamic payload construction.

HTML smuggling is a technique where email HTML contains JavaScript that dynamically constructs and triggers the download of executable payloads using Blob/createObjectURL or base64 decoding. This can bypass email gateway scanners because the payload is assembled in the victim's browser instead of being attached as a normal file.

Hyperlink uses javascript:/vbscript: scheme critical EMAIL_HREF_SCRIPT

An <a> tag's href executes script when clicked.

javascript: and vbscript: links execute script in the rendering context. Many webmail providers strip them, but some clients and previewers may not.

Link text doesn't match destination critical EMAIL_URL_MISMATCH

A hyperlink displays one URL but actually leads to a different domain.

This is a high-risk phishing technique. The email shows a link that looks like it goes to a trusted site (e.g. 'https://yourbank.com'), but clicking it leads to a different website controlled by the attacker. Forwarded or rewritten mail can occasionally produce odd link text, so confirm the real destination when reviewing the finding.

Linked page contains a login form critical EMAIL_URL_LOGIN_PAGE

The email links to a page with a password input field.

The analyzer fetched the linked page and found a password input field. That is a high-risk phishing shape when reached from unsolicited mail, especially if the page uses brand names or external form handlers.

Assessed phishing intent high EMAIL_INTENT

The email matches one or more phishing-intent patterns.

The analyzer examines the full text of the email — subject line, body, HTML structure, attachments, and linked pages — and matches it against known attack patterns (credential harvesting, financial fraud, malware delivery, data exfiltration, account takeover, business email compromise, and extortion). The assessed intent tells you *what the attacker is trying to achieve*, helping you understand the specific risk and take appropriate action.

Body mentions a brand but no link goes to that brand high EMAIL_BRAND_LINK_MISMATCH

Body references a known brand but every outbound link points elsewhere.

The email mentions a familiar brand, but every clickable link points to a different registrable domain. This mismatch is useful impersonation evidence.

Brand name in subdomain of unrelated domain high EMAIL_BRAND_IN_SUBDOMAIN

Sender domain has a known brand label as a subdomain but the registrable domain isn't legitimate.

The sender domain contains a brand string as a subdomain, such as 'microsoft.com.example.tld'. This can make the address look familiar even though the registrable domain is different.

Business email compromise (BEC) language high EMAIL_BEC_PATTERN

Body matches multiple BEC / wire-fraud / gift-card / executive-impersonation patterns.

The rule matches multiple business email compromise patterns, such as 'Are you available?', gift-card requests, wire-transfer instructions, or last-minute vendor bank-detail changes. It fires only when at least two distinct patterns are present.
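
The "at least two distinct patterns" requirement can be illustrated as below. The pattern families and their regular expressions are hypothetical stand-ins; only the two-pattern threshold comes from the description above.

```python
import re

# Hypothetical pattern families; the real list is not given in this reference.
BEC_PATTERNS = {
    "availability": re.compile(r"\bare you available\b", re.I),
    "gift_card": re.compile(r"\bgift\s*cards?\b", re.I),
    "wire_transfer": re.compile(r"\bwire\s+transfer\b", re.I),
    "bank_change": re.compile(
        r"\b(?:new|updated|changed)\s+bank(?:ing)?\s+details\b", re.I
    ),
}

def bec_matches(body: str) -> set:
    """Names of the distinct pattern families found in the body."""
    return {name for name, rx in BEC_PATTERNS.items() if rx.search(body)}

def is_bec(body: str) -> bool:
    """Fire only when at least two distinct families match."""
    return len(bec_matches(body)) >= 2
```

Counting distinct families, not raw hits, means that repeating one phrase many times does not trigger the rule.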

DKIM signing domain differs from sender high EMAIL_DKIM_D_MISMATCH

DKIM-Signature d= tag does not match the From: domain.

DKIM lets any domain sign a message, whatever address appears in From:, so DKIM=pass is not the same as 'the message comes from who it claims'. When the d= tag points at a different domain than From:, DKIM alignment fails; DMARC may still pass if SPF aligns with the From domain. This mismatch is useful evidence when reviewing sender authenticity.
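
A simplified sketch of the d= comparison using the standard-library email parser. It treats a parent or child domain as aligned, which is an assumption that only approximates relaxed alignment; a real check would compare organizational domains using the Public Suffix List.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# The d= tag inside a DKIM-Signature header value.
D_TAG = re.compile(r"(?:^|;)\s*d\s*=\s*([^;\s]+)")

def dkim_d_mismatch(raw_message: str) -> bool:
    """True when signatures exist but no d= tag aligns with the From domain."""
    msg = message_from_string(raw_message)
    from_domain = parseaddr(msg.get("From", ""))[1].rpartition("@")[2].lower()
    signing_domains = [
        m.group(1).lower()
        for sig in msg.get_all("DKIM-Signature") or []
        if (m := D_TAG.search(sig))
    ]
    if not from_domain or not signing_domains:
        return False  # nothing to compare

    def aligned(d: str) -> bool:
        # Assumed approximation: identical, or one is a subdomain of the other.
        return (
            d == from_domain
            or from_domain.endswith("." + d)
            or d.endswith("." + from_domain)
        )

    return not any(aligned(d) for d in signing_domains)
```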

Email authentication failure high EMAIL_AUTH_FAIL

SPF, DKIM, or DMARC authentication failed.

Email authentication protocols verify that the sender is who they claim to be. SPF checks whether the sending server is authorised by the domain owner. DKIM uses a cryptographic signature to verify that the email has not been tampered with. DMARC ties SPF and DKIM together with a policy. When any of these checks fails, the email may be spoofed: sent by someone pretending to be from a domain they don't control.

Freemail impersonating an organisation high EMAIL_FREEMAIL_ORG

Sender claims to be an organisation but uses a free email provider.

The email claims to be from an official organisation (bank, government, tech company) but is sent from a free email service like Gmail, Yahoo, or Outlook.com. Many legitimate organisations use their own corporate domain (e.g. @yourbank.com, not @gmail.com), so a freemail sender is useful impersonation evidence.

HTML file attached high EMAIL_HTML_ATTACHMENT

Attachment is an .htm / .html / .shtml / .xhtml file (or text/html MIME).

HTML attachments open as local browser pages, which can avoid link rewriting and URL reputation checks. They are common in credential phishing, but some business workflows also send HTML reports or exports.

HTML form in email body high EMAIL_HTML_FORM

The email contains an HTML form element.

An HTML form embedded in an email can capture typed information such as passwords or card numbers. Many mail clients block or alter forms, but their presence is still high-risk in unsolicited mail.

Hyperlink uses data: URI high EMAIL_HREF_DATA_URI

An <a> tag's href is a data: URI.

data: URIs are decoded inline by the browser, leaving no remote URL for reputation checks. They can carry phishing pages or small downloads directly inside the link.

IP address in URL high EMAIL_URL_IP

A URL in the email uses an IP address instead of a domain name.

The rule matches links that use an IP address instead of a domain name. Temporary phishing infrastructure often does this, though internal systems and appliances can also use IP-based URLs.

Image-only attachment with phishing-shape language (quishing) high EMAIL_QR_LURE_ATTACHMENT

Email's only attachment(s) are images and the body uses MFA / verification / scan language.

QR-code phishing can hide the target URL inside an image so mail gateways do not see a clickable link. This rule looks for image-only attachments paired with MFA, verification, or scan-language lures.

Internationalised domain name (IDN) in URL high EMAIL_URL_HOMOGRAPH

URL contains a Punycode domain that may be a visual lookalike.

The rule matches Punycode domains, which can represent internationalised domain names. Some phishing sites use lookalike characters such as Cyrillic or Greek letters; legitimate internationalised domains can also use Punycode.

Linked page contains obfuscated JavaScript high EMAIL_URL_OBFUSCATED_JS

The linked page uses JavaScript obfuscation techniques.

The page uses eval(), unescape(), fromCharCode, or similar obfuscation. Heavy JavaScript obfuscation is common in phishing kits and exploit pages; some legitimate sites also minify code, so the exact technique matters.

Linked page impersonates known brand(s) high EMAIL_URL_BRAND_SPOOF

The linked page references well-known brand names.

The downloaded page contains brand names of major services (banks, cloud providers, delivery companies, etc.). Phishing kits replicate the logos, colours, and layout of trusted brands to make the fake page convincing. The presence of brand names on a non-official domain is a useful indicator of impersonation.

Linked page submits data to external server high EMAIL_URL_DATA_EXFIL

The page has both input forms and JavaScript that sends data externally.

The linked page has form fields and JavaScript paths that submit data to another server. In phishing pages this is used to collect credentials or session data; legitimate sites can also use external form processors.

MFA/OTP language with a clickable link high EMAIL_MFA_PHISHING

Email mentions MFA, 2FA, OTP, or verification codes alongside an action link.

The rule matches MFA, OTP, authenticator, or sign-in alert language together with an action link. That combination is common in adversary-in-the-middle phishing, but account-security mail can use similar terms.

Multiple From: headers high EMAIL_MULTIPLE_FROM_HEADERS

Email contains more than one From: header.

RFC 5322 requires exactly one From: header. Multiple From: headers are a header-injection technique: different MTAs and clients can disagree on which one to display and which one to authenticate, so the visible sender can differ from the one that the spam filter and DMARC evaluate.
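
Counting From: headers is straightforward with Python's standard email parser, which keeps duplicate headers. The function name is an assumption.

```python
from email import message_from_string

def has_multiple_from_headers(raw_message: str) -> bool:
    """True when the message carries more than one From: header."""
    msg = message_from_string(raw_message)
    # get_all returns every occurrence, or None when the header is absent.
    return len(msg.get_all("From") or []) > 1
```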

Password-protected archive suspected high EMAIL_PASSWORD_ARCHIVE

Email mentions a password and includes an archive attachment.

The email includes an archive and gives a password in the body. Encrypted archives can prevent mail gateways from inspecting the contents before the recipient extracts them.

Reply-To domain differs from sender high EMAIL_REPLYTO_DIFF

The Reply-To address points to a different domain than the sender.

When you reply to this email, your response will go to a different domain than the apparent sender. Phishing campaigns use this so replies containing sensitive information go somewhere other than the apparent sender.

Reply-To redirects to a freemail account high EMAIL_REPLY_TO_FREEMAIL

Sender uses a corporate domain but Reply-To is on a freemail provider.

Classic business email compromise pattern: spoof a colleague or vendor's address, then quietly redirect any reply to an attacker-controlled freemail inbox (Gmail, Outlook.com). The victim believes they are responding to the real person.

Right-to-left override character (Unicode evasion) high EMAIL_RTL_OVERRIDE

Headers, body, or filename contain U+202E or related bidi-override characters.

RTL-override characters reverse the displayed reading order of subsequent text. They can make filenames like 'invoice‮fdp.exe' display as a different extension in some clients and can also interfere with keyword matching.
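
A small sketch that reports bidirectional control characters and their positions. Which characters count as "related" to U+202E is an assumption here: the set below covers the Unicode embedding, override, and isolate controls.

```python
import unicodedata

# Assumed set: bidi embeddings, overrides, and isolates (U+202A-202E, U+2066-2069).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def find_bidi_controls(text: str) -> list:
    """Return (index, Unicode name) for each bidi control character found."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]
```

Writing the characters as escapes in source code avoids the display-reordering problem the rule is about.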

Sender display name spoofing high EMAIL_FROM_MISMATCH

The display name contains an email address that differs from the actual sender.

Phishers set the display name to a trusted email address (e.g. 'support@yourbank.com') while the real From address is completely different. Most email clients show the display name prominently and hide the actual address, so recipients believe the email is from a trusted source. Always check the actual email address, not just the name shown.

Sender domain looks like a known brand high EMAIL_LOOKALIKE_DOMAIN

Sender registrable domain is one or two character edits from a brand domain.

The analyzer compares the sender's domain to a small list of well-known brand domains using bounded edit distance. Close lookalikes such as 'paypa1.com' or 'micros0ft.com' are useful impersonation evidence.
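
The comparison can be sketched with a plain Levenshtein distance and a two-edit bound. The brand list is a hypothetical two-entry sample, and the analyzer's actual distance metric is not specified in this reference.

```python
from typing import Optional

BRANDS = ["paypal.com", "microsoft.com"]  # hypothetical sample list

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def lookalike_of(domain: str, max_edits: int = 2) -> Optional[str]:
    """Return the brand domain this one imitates, or None."""
    domain = domain.lower()
    for brand in BRANDS:
        if domain == brand:
            return None  # the genuine domain is not a lookalike
        # The length difference is a lower bound on the distance.
        if abs(len(domain) - len(brand)) > max_edits:
            continue
        if edit_distance(domain, brand) <= max_edits:
            return brand
    return None
```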

URL contains userinfo (user@host display spoof) high EMAIL_URL_CREDENTIALS

A URL in the body uses the user:pass@host syntax.

Browsers treat everything before the last '@' in a URL authority as userinfo and route to the host after it. A URL such as 'https://login.bank.com@example.tld/' can therefore display a familiar name while pointing elsewhere.
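
Python's standard URL parser exposes the userinfo and the real host separately, which makes the spoof easy to see. The helper name is an assumption.

```python
from urllib.parse import urlsplit

def url_userinfo(url: str):
    """Return (userinfo name, real host) if the URL has userinfo, else None."""
    parts = urlsplit(url)
    if parts.username is None:
        return None
    return parts.username, parts.hostname
```

For the example above, the familiar-looking 'login.bank.com' comes back as the username and 'example.tld' as the host that is actually contacted.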

no-reply sender with Reply-To override high EMAIL_NOREPLY_WITH_REPLYTO

From: is a no-reply / do-not-reply address but the message has Reply-To.

Legitimate transactional 'no-reply' mailers rarely set Reply-To, because the point of such an address is that replies go nowhere. Phishers spoof a no-reply From and set Reply-To to capture replies (or push the conversation to a different inbox) while the victim still trusts the visible sender.

Body is essentially a single URL medium EMAIL_SHORT_BODY_URL_ONLY

Email body, with URLs removed, is fewer than 80 characters.

The body contains little text once URLs are removed. Minimal 'click here' messages are common in throwaway phishing, though automated notifications can also be brief.

Calendar invite attachment (.ics / text/calendar) medium EMAIL_CALENDAR_INVITE

Email contains a calendar-invite attachment.

Mail clients (especially Outlook) auto-render calendar invites in the preview pane. Links and attachments embedded in the description / location fields become clickable before the recipient ever opens the file. The vector is increasingly used for OAuth-consent phishing and BEC.

Email parse error medium EMAIL_PARSE_ERROR

Failed to parse the email file.

The email could not be parsed properly. It may be corrupt, truncated, or deliberately malformed to evade analysis. Malformed emails can sometimes exploit vulnerabilities in email clients.

HTML-entity-encoded hyperlink medium EMAIL_HREF_OBFUSCATED

An <a> tag's href contains long runs of HTML character entities.

Encoding the URL as decimal or hex character entities ('&#104;&#116;…') preserves the link's meaning in the browser but defeats simple gateway scanners that don't decode entities. It can be used together with shorteners or punycode to compound the obfuscation.

Hidden text in HTML body medium EMAIL_HIDDEN_TEXT

CSS techniques used to hide text from the reader.

The email contains invisible text (font-size: 0, display: none, white text on white background, etc.). Hidden text can be used to confuse spam filters by including 'legitimate' invisible words, or to hide malicious content from casual inspection while it remains functional in the HTML code.

Link to phishing-heavy TLD medium EMAIL_URL_SUSPICIOUS_TLD

Email body links to a domain on a phishing-heavy TLD.

This applies the phishing-heavy TLD list to links in the body. It is most useful when combined with brand language, login prompts, or other phishing signals.

Phishing language detected medium EMAIL_PHISHING_KEYWORDS

The email body contains typical phishing language patterns.

The rule matches common phishing phrases such as account verification, suspension threats, sensitive-information requests, generic greetings, and prize language. These phrases are useful context, not a verdict by themselves.

Re:/Fwd: subject without thread headers medium EMAIL_FAKE_REPLY

Subject begins with Re:/Fwd: but no In-Reply-To or References header is present.

Real replies and forwards often carry In-Reply-To or References headers. A Re:/Fwd: subject without thread headers can indicate a cold message made to look like an existing conversation.
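
A minimal sketch of the check using the standard-library parser. The accepted subject prefixes (Re, Fw, Fwd) are an assumption; localized prefixes are not covered.

```python
import re
from email import message_from_string

# Assumed prefixes: Re:, Fw:, Fwd: (case-insensitive).
REPLY_PREFIX = re.compile(r"^\s*(?:re|fwd?)\s*:", re.I)

def is_fake_reply(raw_message: str) -> bool:
    """True for a reply/forward-style subject with no threading headers."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    has_thread = (
        msg.get("In-Reply-To") is not None or msg.get("References") is not None
    )
    return bool(REPLY_PREFIX.match(subject)) and not has_thread
```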

Return-Path domain mismatch medium EMAIL_RETURN_PATH_MISMATCH

The Return-Path domain differs from the sender domain.

The Return-Path (envelope sender) is the address that receives bounce messages. When it differs from the From address, it may indicate the email was sent through a third-party service or is spoofed. Legitimate organisations typically have matching domains.

Sender domain on phishing-heavy TLD medium EMAIL_SUSPICIOUS_TLD

Sender uses a top-level domain over-represented in phishing.

Some TLDs ('.zip', '.mov', '.top', '.click', '.country', '.work', etc.) are over-represented in phishing and abuse datasets. This is a sender reputation signal, not proof by itself.

URL shortener link detected medium EMAIL_URL_SHORTENER

Email contains a shortened URL that hides the true destination.

URL shorteners (bit.ly, tinyurl.com, etc.) are legitimate services, but phishers abuse them to hide destinations. This finding means the destination is not visible from the email text alone; expand or inspect the URL before trusting it.

Urgency / pressure in subject line medium EMAIL_URGENCY_SUBJECT

The subject line contains urgency or pressure language.

The rule matches urgency phrases such as 'Act now', 'Account suspended', or 'Verify immediately'. Urgency appears in both phishing and real account notices, so this is supporting context rather than a standalone verdict.

Zero-width / invisible Unicode characters medium EMAIL_ZERO_WIDTH_CHARS

Subject, display name, or body contains zero-width Unicode characters.

Zero-width spaces and joiners (U+200B, U+200C, U+200D, U+FEFF) are invisible to readers but can break simple keyword matching. They can also be used as per-recipient markers in text.

Missing Message-ID header low EMAIL_NO_MESSAGEID

The email has no Message-ID header.

Most normal mail systems add a unique Message-ID. A missing Message-ID can mean the message was generated by a script, tool, or unusual mail path.

Tracking pixel detected low EMAIL_TRACKING_PIXEL

Tiny external image likely used for email open tracking.

A tracking pixel is a tiny (usually 1x1) invisible image loaded from an external server. When your email client loads this image, the sender learns that you opened the email, along with your IP address and sometimes your approximate location. Phishers use tracking pixels to confirm which email addresses are active and being read, making you a higher-value target.

Unusually many attachments low EMAIL_MANY_ATTACHMENTS

The email has an unusually large number of attachments.

While not inherently malicious, a large number of attachments is unusual and may indicate an attempt to overwhelm or confuse the recipient.

Limited .msg parsing info EMAIL_MSG_LIMITED

The .msg format requires additional libraries for full analysis.

Outlook .msg files use a proprietary format. Full parsing requires the 'extract-msg' Python library. Without it, only basic text-level scanning is performed. For complete email analysis, save the email in .eml format.

Linked page scan summary info EMAIL_URL_SCAN_SUMMARY

Summary of pages downloaded and scanned from links in the email.

The analyzer used curl with a realistic Chrome browser user-agent to retrieve the web pages linked in the email. Each page was scanned with ClamAV for known malware/phishing signatures, and analysed for phishing indicators such as login forms with password fields, brand impersonation, obfuscated JavaScript, and data exfiltration code. This summary shows what was scanned and whether anything suspicious was found.

Archive 7

Archive bundles a malicious executable critical ARCHIVE_MALICIOUS_EXECUTABLE

A native executable bundled in the archive was identified as malware by ClamAV.

The archive carries an executable payload that ClamAV flagged as malicious. Bundling an executable dropper with documents is a classic phishing / malware-delivery pattern, so the archive is convicted even when its document members are clean.

Corrupt or invalid ZIP archive medium ARCHIVE_CORRUPT

The file has a ZIP signature but could not be opened.

The archive appears to be a ZIP file based on its header bytes, but the internal structure is invalid. It may be corrupt, truncated, or deliberately malformed to exploit vulnerabilities in ZIP parsers.

Encrypted archive — could not decrypt medium ARCHIVE_ENCRYPTED

The archive is password-protected with an unknown password.

The ZIP archive is encrypted and none of the common analysis passwords worked. The files inside could not be extracted or scanned. If you know the password, extract the files manually and upload them individually.

Total decompression limit reached medium ARCHIVE_SIZE_LIMIT

The total decompressed size exceeded the safety limit.

To protect against zip bombs (archives that decompress to enormous sizes), the analyzer caps total decompressed output. Some files in the archive may not have been scanned.
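
One way to enforce such a cap, sketched with the standard zipfile module. The cap value and the function name are assumptions. Each entry is read with a bounded read, so the size declared in the ZIP header is never trusted.

```python
import io
import zipfile

def extract_with_cap(zip_bytes: bytes, max_total: int):
    """Return ({name: data}, limit_reached) without exceeding max_total bytes."""
    extracted, total = {}, 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as fh:
                # Read at most one byte past the remaining budget.
                data = fh.read(max_total - total + 1)
            if total + len(data) > max_total:
                return extracted, True  # remaining entries stay unscanned
            extracted[info.filename] = data
            total += len(data)
    return extracted, False
```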

Archive contains a bundled executable low ARCHIVE_CONTAINS_EXECUTABLE

A native executable (PE/ELF/Mach-O or .exe/.dll/.scr/…) is bundled inside the archive alongside documents.

The document-analysis pipeline does not deep-inspect executable payloads, but a bundled executable in a document-delivery archive is a common dropper / phishing pattern, so it is surfaced for visibility. ClamAV is run on the executable; this finding means ClamAV did not flag it (legitimate software archives also carry executables), so it is informational and does not by itself convict the archive.

Archive entry limit reached info ARCHIVE_LIMIT

Only a limited number of files were scanned from the archive.

To prevent resource exhaustion, the analyzer limits the number of files it extracts and scans from a single archive.

Oversized archive entry skipped info ARCHIVE_LARGE_ENTRY

An entry in the archive exceeds the per-file size limit.

Individual files inside the archive are capped at 50 MB. Files exceeding this limit are skipped to prevent memory exhaustion.

PSD 1

Embedded PE in PSD critical PSD_EMBEDDED_EXE

PE executable found inside Photoshop file.

A Windows executable inside a PSD file indicates the image is being used as a container for malware.

ClamAV 2

ClamAV malware detection critical CLAMAV_DETECTION

ClamAV antivirus identified this file as malware.

ClamAV is an open-source antivirus engine with a database of known malware signatures. A positive detection means this file matches a known malicious pattern catalogued in the ClamAV virus database.

ClamAV scan did not complete info CLAMAV_SCAN_INCOMPLETE

ClamAV invocation failed (timeout, daemon unreachable, or DB missing).

The ClamAV signature pass on this run did not finish — typically because clamd was reloading its signature database (after freshclam) or briefly unreachable. The static heuristics still ran, but the AV signature signal is absent for this scan, so the score may be lower than it would be with a working clamd. Results carrying this marker are not cached, so a later lookup re-runs the analysis.

General 18

PDF/ZIP bundle contains child CVE exploit critical POLYGLOT_PDF_APPENDED_ZIP_CVE_BUNDLE

A ZIP appended after PDF EOF contains a member that matches a CVE rule.

The analyzer scanned the visible PDF and then scanned ZIP bytes appended after %%EOF. A member of that appended archive independently matched a CVE-specific exploit rule, so this finding ties the child CVE back to the parent PDF as a bundled multi-exploit delivery package.

GIF-header file contains embedded PDF body high POLYGLOT_GIF_PDF_FORCEDENTRY_SHAPE

File starts with GIF magic but contains a PDF body at a later offset.

Project Zero's FORCEDENTRY analysis documented a fake-GIF routing trick where image processing accepts a GIF-named/headered file while ImageIO/CoreGraphics identifies and parses embedded PDF/JBIG2 content. This rule flags that wrapper shape so the parent object is not treated as a benign GIF.

Multiple structural anomalies in a single file high SPEC_DIVERGENCE_HIGH

Three or more independent structural-anomaly heuristics fired on this file.

Format parsers tolerate a wide range of small spec violations in benign files. When multiple independent structural-anomaly rules fire together on the same file, it usually means the file was crafted to confuse a specific parser into mis-handling its content — the canonical shape of exploitation against a memory-safety bug. The aggregate signal is more reliable than any single anomaly.

Non-PDF with .pdf extension high PDF_EXTENSION_MISMATCH

File was submitted as .pdf but does not contain a PDF header.

A file using a PDF extension without PDF structure is a masquerade or evasion pattern. It should not be considered a clean PDF simply because of the filename.

PDF with appended ZIP archive high POLYGLOT_PDF_ZIP_APPENDED

ZIP local-file header appears after the last %%EOF in a PDF.

Polyglot files carry valid bytes for two formats simultaneously. A PDF followed by a ZIP local-file header at the tail will open in a PDF reader as the document and in an archive parser as the ZIP. This is a known exploitation primitive used to smuggle payloads past one-format scanners.
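
The byte-level shape can be sketched as a search for a ZIP local-file header after the last %%EOF marker. The function name and the 1024-byte header window are assumptions. The sketch also assumes the appended archive does not itself contain the bytes `%%EOF`, which would move the search start.

```python
ZIP_LOCAL_HEADER = b"PK\x03\x04"
PDF_EOF = b"%%EOF"

def appended_zip_offset(data: bytes) -> int:
    """Offset of the first ZIP local-file header after the last %%EOF, or -1."""
    if b"%PDF-" not in data[:1024]:
        return -1  # not a PDF by this sketch's loose header check
    eof = data.rfind(PDF_EOF)
    if eof == -1:
        return -1
    return data.find(ZIP_LOCAL_HEADER, eof + len(PDF_EOF))
```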

Suspicious extracted artifact high EXTRACTED_FILE_STATIC_TRIAGE

A file carved from inside the sample matched static suspicious-content checks.

The analyzer performs a lightweight triage pass on carved artifacts in addition to ClamAV. It looks for signals such as script obfuscation, long encoded blobs, PowerShell encoded commands, VBA auto-exec with execution terms, and high-entropy packed content. These are not signature matches by themselves, but they are useful evidence when an embedded payload is trying to hide behavior from simple static inspection.

Tag-set historically associated with malicious files high CORPUS_HISTORICALLY_MALICIOUS

This combination of heuristics has fired multiple times before, mostly on malicious files.

The analyzer keeps a record of every prior scan's heuristic combination. When the same combination has been seen at least three times in the last 90 days and at least 80% of those were classified malicious, the corpus prior is a strong indicator that the current file should be treated the same way.
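
The thresholds above (three sightings, 90 days, 80% malicious) can be sketched as a small Python predicate. The record shape, a list of (timestamp, verdict) pairs for one heuristic combination, and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def historically_malicious(
    prior_scans: Iterable[Tuple[datetime, str]],
    now: datetime,
    min_hits: int = 3,
    window_days: int = 90,
    min_ratio: float = 0.8,
) -> bool:
    """Apply the count, recency and malicious-ratio thresholds to prior scans."""
    cutoff = now - timedelta(days=window_days)
    recent = [verdict for seen_at, verdict in prior_scans if seen_at >= cutoff]
    if len(recent) < min_hits:
        return False
    malicious = sum(1 for verdict in recent if verdict == "malicious")
    return malicious / len(recent) >= min_ratio
```

Sightings older than the window are ignored entirely, so a combination that was malicious long ago but has not recurred does not fire.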

Text document carries an embedded PDF body high POLYGLOT_TEXT_PDF

%PDF- magic and %%EOF trailer found inside a text document.

A document classified as HTML, RTF, or script that also contains a fully formed %PDF body (header and trailer) is a polyglot: browsers render the wrapper, PDF readers open the embedded document. This shape is used to deliver PDF exploits while evading file-type-based filtering.

ZIP/OOXML container with non-ZIP prefix bytes high POLYGLOT_ZIP_PREFIXED

Non-empty bytes precede the first ZIP local-file header.

Many ZIP-based parsers (including OOXML readers) locate the central directory by scanning backward from the end of the file, so they successfully open archives that carry arbitrary prefix bytes. Parsers that identify the format from the leading bytes see only the prefix. Mismatched parser behaviour on the same bytes is a polyglot delivery pattern.
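
A rough Python sketch of the prefix check. Python's zipfile module stands in for a tolerant ZIP reader here; the function name is hypothetical and the sketch does not handle a prefix that itself contains the local-file-header magic.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"


def is_prefixed_zip(data: bytes) -> bool:
    """True when bytes precede the first local-file header of a readable ZIP."""
    offset = data.find(ZIP_LOCAL_HEADER)
    if offset <= 0:
        return False  # no header at all, or the archive starts at byte 0
    # Confirm an archive reader still accepts the file despite the prefix.
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a tiny archive in memory and put HTML in front of it.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _zf:
    _zf.writestr("a.txt", "hi")
plain_zip = _buf.getvalue()
prefixed_zip = b"<html>wrapper</html>" + plain_zip
```

With these samples, is_prefixed_zip(plain_zip) is false and is_prefixed_zip(prefixed_zip) is true.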

Rare heuristic combination, previously flagged medium CORPUS_RARE_COMBINATION

An unusual combination of heuristics that has been flagged on at least one prior file.

Files that fire heuristic combinations rarely seen in the corpus are worth a closer look — especially when at least one prior occurrence was suspicious or malicious. This is a weaker signal than the historically-malicious combination, but it surfaces low-volume patterns that signature-style rules would miss.

Rare structural feature combination, previously flagged medium CORPUS_RARE_STRUCTURAL_FEATURE_SET

Normalized parser/payload feature set is rare and has appeared on a flagged file before.

This corpus signal groups exact rule IDs into broader structural features such as font-parser anomalies, object-graph divergence, embedded payloads, and shellcode evidence. It can surface recurring exploit shapes even when the precise signature names differ.

Analysis timed out (partial result) info ANALYSIS_TIMEOUT_PARTIAL

Analysis exceeded the wall-clock timeout; phases that had completed before the timeout are preserved.

Some scans (regex-heavy parsers, large structured documents) can exhaust the per-file wall-clock budget. Unlike SCAN_INCOMPLETE, this finding does not flag the result as needing retry: re-scanning the same bytes will hit the same timeout, so the partial result is cached as-is. Operators investigating individual cases can force a fresh scan via the rescan API.

Embedded URL info EMBEDDED_URL

One or more URLs were extracted from the document bytes but were not attributed to any other specific heuristic.

URL-themed rules attribute URLs they actually evaluated (e.g. PDF link annotations, macro download calls). URLs that appear in the file but were not tied to a specific finding are surfaced here so an analyst can still see and triage them. The per-finding detail names the most likely channel (e.g. 'Macro calls URL', 'PDF link annotation references URL') inferred from the other rules that fired on the same file.

Macro capabilities present but unconfirmed info MACRO_CAPABILITY_UNCORROBORATED

An Office document's VBA exposes execution capabilities (Shell / WScript / CreateObject / auto-exec) but nothing corroborates malicious intent, so the verdict was capped at suspicious rather than malicious.

Capability-presence rules fire on a single keyword and carry full weight, so without this tiering one keyword can reach the malicious threshold and produce false positives on legitimate macro-heavy business documents. By policy a capability-only false positive is more costly than a missed detection, so this tiering is ON by default: 'suspicious' is the FLOOR for a capability-only macro. Any genuine malice signal UPGRADES it back to malicious: obfuscation, a memory-exec primitive, a download+exec chain, an encoded payload, a LOLBin, DDE, an AV hit, a suspicious URL, or VBA-stomping p-code that pairs auto-exec with execution. This note means only capability rules fired and none of those upgrades was present. Low-confidence structural or social-engineering rules (an 'enable macros' instruction, external hyperlinks, a hidden sheet) do not upgrade on their own.

Macro validly signed by an identified publisher info MACRO_VALID_PUBLISHER_SIGNATURE

A capability-only Office macro carries a cryptographically valid, CA-issued (non-self-signed) VBA publisher signature and nothing corroborates malice, so the 'suspicious' floor was lifted to clean.

The macro-capability floor caps an uncorroborated macro at 'suspicious'. A valid, CA-issued publisher signature is a stronger benign signal: it ties the VBA project to an accountable, identity-verified author and proves the signed bytes are intact. When that signature verifies AND the same no-corroborator gate the floor uses passes (no obfuscation / download+exec / loader / AV-ML hit / suspicious URL), the verdict is lifted from suspicious to clean — clearing legitimately-signed add-ins (e.g. the Microsoft Analysis ToolPak). A valid signature is NOT trusted on its own: a validly-CA-signed but weaponised macro always trips a hard corroborator and stays malicious. Self-signed / unverified / invalid signatures never qualify.
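
The tiering for a capability-only macro can be summarised as a three-way decision. The sketch below is a Python illustration under stated assumptions: the function name, the corroborator labels, and the signature status strings are hypothetical, and it presumes capability rules have already fired.

```python
from typing import Collection


def capability_only_macro_verdict(
    corroborators: Collection[str], signature_status: str
) -> str:
    """Resolve the verdict for a macro on which capability rules fired."""
    if corroborators:
        # Any hard corroborator (obfuscation, download+exec, AV hit, ...)
        # wins, even over a valid signature.
        return "malicious"
    if signature_status == "valid_ca_issued":
        # Valid, CA-issued publisher signature and nothing corroborates malice.
        return "clean"
    # Self-signed, unverified, invalid or absent signature: the floor applies.
    return "suspicious"
```

Checking corroborators first is what keeps a validly signed but weaponised macro at malicious.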

PDF appended ZIP child scan incomplete info POLYGLOT_PDF_APPENDED_ZIP_SCAN_INCOMPLETE

A ZIP appended after PDF EOF could not be fully scanned.

ZIP bytes were found after the PDF EOF marker, but the appended archive member scan failed or could not run. Parent PDF findings remain valid, but child payload attribution may be incomplete.

Scan did not complete info SCAN_INCOMPLETE

A scanner failed, timed out, or hit an output/resource cap before analysis completed.

This is an operational failure, not evidence that the file is safe. At least one required parser or worker did not finish, so the result is marked Error and should be retried or reviewed manually.

Unrecognised file format info UNKNOWN_FORMAT

File format was not recognised by the analyzer.

The file does not match any supported document format. Only generic shellcode scanning was performed. The file may still be harmful.

HTML 7

HTA/VBScript DOM-text execution critical HTML_HTA_VBSCRIPT_DOM_EXECUTE

HTA/VBScript document executes code assembled from DOM text.

Malicious HTA attachments often split the real script across HTML text nodes, then use VBScript Execute to run the reconstructed body on load. This hides the payload from simple script-block scanners while preserving automatic execution.

HTML ActiveX/COM object high HTML_ACTIVEX_OBJECT

HTML script instantiates ActiveX or COM objects.

CreateObject and ActiveXObject let script reach Windows automation interfaces such as WScript.Shell, XMLHTTP, and ADODB.Stream. That is rare in benign documents and common in script malware.

HTML Windows scripting object high HTML_WINDOWS_SCRIPTING_OBJECT

HTML references COM objects commonly used for execution or payload download.

Objects such as WScript.Shell, Shell.Application, MSXML2.XMLHTTP, and ADODB.Stream provide command execution and staged-download capability from script.

HTML contains VBScript high HTML_VBSCRIPT

Standalone HTML contains a VBScript script block.

VBScript in local HTML documents is a legacy Windows execution surface. Malicious attachments commonly use it with COM objects to download, drop, or execute payloads.

HTML scripted COM execution high HTML_SCRIPTED_COM_EXECUTION

HTML script dynamically creates objects and invokes execution/open methods.

Dynamic object creation followed by execution-like calls is a staged script malware pattern, especially when hidden inside a file with a document extension.

HTML base64 payload medium HTML_LONG_BASE64_SCRIPT_PAYLOAD

HTML script contains a long base64-like blob.

Long encoded blobs in script are commonly used for HTML smuggling or for staging a second payload while avoiding simple content filters.

HTML obfuscated string builder medium HTML_OBFUSCATED_STRING_BUILDER

HTML script repeatedly builds strings from small fragments.

Heavy string-fragment construction hides object names, commands, URLs, and payloads from static scanners and is common in malicious script.

Machine Learning 1

Nyx PDF Classifier flagged this file high ML_NYX_PDF_MALICIOUS

Gradient-boosted classifier scored the PDF above the suspicious threshold.

The Nyx classifier is a LightGBM model trained on byte-level structural features (keyword counts, filter histograms, entropy, object/stream balance) of hundreds of thousands of malicious and benign PDFs. Severity is graded by score: medium >= 0.25, high >= 0.5, critical >= 0.9. The model is complementary to the rule-based heuristics — it can catch families with no individual indicator that trips an explicit rule but whose overall shape matches the malicious training distribution.
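
The score-to-severity grading can be written as a short lookup. This Python sketch uses only the thresholds stated above; the function name, and returning None below the medium threshold, are assumptions.

```python
from typing import Optional

# Ordered from most to least severe so the first match wins.
SEVERITY_THRESHOLDS = (("critical", 0.9), ("high", 0.5), ("medium", 0.25))


def nyx_severity(score: float) -> Optional[str]:
    """Map a classifier score in [0, 1] to a severity label, or None."""
    for name, threshold in SEVERITY_THRESHOLDS:
        if score >= threshold:
            return name
    return None
```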

Office Macros 9

Dangerous API name reassembled from split string literals critical OLE_VBA_SPLIT_KEYWORD_OBFUSCATION

VBA concatenates short string literals that reassemble a dangerous API/ProgID/LOLBin name (e.g. Scripting.FileSystemObject, WScript.Shell, powershell) appearing in no single literal.

Splitting an API or ProgID name across string concatenation is done only to evade keyword scanning; the rule keys on the reconstructed token rather than on concatenation density, so benign macro-heavy documents (which concatenate even more) are not flagged.
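
A simplified Python sketch of keying on the reconstructed token. The regexes, the three-entry token list, and the function name are illustrative assumptions; the sketch only joins adjacent quoted literals and ignores VBA doubled-quote escapes and concatenation through variables.

```python
import re
from typing import List

DANGEROUS_TOKENS = ("wscript.shell", "scripting.filesystemobject", "powershell")

# A run of two or more string literals joined by & or +.
CONCAT_RUN = re.compile(r'"[^"\n]*"(?:\s*[&+]\s*"[^"\n]*")+')
LITERAL = re.compile(r'"([^"\n]*)"')


def split_keyword_hits(vba_source: str) -> List[str]:
    """Tokens that appear only after joining adjacent string literals."""
    hits = []
    for run in CONCAT_RUN.finditer(vba_source):
        parts = LITERAL.findall(run.group(0))
        joined = "".join(parts).lower()
        for token in DANGEROUS_TOKENS:
            in_single_literal = any(token in part.lower() for part in parts)
            if token in joined and not in_single_literal:
                hits.append(token)
    return hits
```

Because the match is on the rebuilt token, a line such as msg = "Total: " & total & " items" produces no hit however much concatenation the document contains.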

Obfuscated auto-exec VBA loader critical OLE_VBA_OBFUSCATED_AUTOEXEC_LOADER

Auto-exec VBA reconstructs strings with a heavy custom decoder and feeds them to a COM-instantiation or execution sink.

A numeric char-array decoder, repeated hex-string decode, dynamic CreateObject(decoder(...)), or many chained Replace() junk-token reconstructions — combined with an auto-exec entry point and a CreateObject/Shell/exec sink — is the obfuscated-loader shape used to keep indicators out of the macro source.

Raw OLE macro native-memory callback shellcode loader critical OLE_RAW_MACRO_NATIVE_MEMORY_CALLBACK_LOADER

Raw OLE/VBA project text exposes an auto-exec entry plus native memory allocation, process-memory write/copy, and callback/timer execution APIs.

Some malicious Office documents are source-stomped or only partially recovered by normal VBA extraction. The raw project bytes still expose the loader triad: allocate memory, copy payload bytes into it, then execute through a callback/timer primitive such as CreateTimerQueueTimer.

VBA downloads and writes a file to disk critical OLE_VBA_HTTP_DROP_EXEC

VBA reads an HTTP response body and writes it to disk (ADODB.Stream SaveToFile) — a download-drop dropper even when the COM ProgIDs are built dynamically.

Macro downloaders often hide the MSXML/ADODB ProgID in a variable so the URLDownloadToFile / ProgID keyword rules never fire. The .ResponseBody + .SaveToFile chain is a high-confidence indicator independent of those names.

VBA executes content staged in worksheet cells critical OLE_VBA_CELL_GETOBJECT_EXEC

VBA passes a worksheet cell/comment reference to GetObject and drives an Exec/Open/Run sink.

Malware hides the COM moniker and command in cell data (Range().Value / .NoteText) so the macro source carries no literal indicators; the GetObject(cell)+exec shape catches it regardless.

VBA injects an Excel-4 macro CALL to a download/exec API critical OLE_VBA_XLM_CALL_INJECTION

VBA writes Excel-4 (XLM) =CALL() formulas targeting urlmon URLDownloadToFile / Shell32 ShellExecute and runs them.

This VBA-to-XLM bridge downloads and executes a payload while keeping the API names out of normal VBA keyword scanning (the names are split / CHAR-built into cell formulas).

VBA native-memory callback shellcode loader critical OLE_VBA_NATIVE_MEMORY_CALLBACK_LOADER

VBA auto-exec macro combines native memory allocation, process-memory write/copy, and callback/timer execution APIs.

In-memory VBA loaders allocate writable memory, copy a decoded payload into it, then transfer control through a callback or timer API such as CreateTimerQueueTimer. Benign document automation does not need the NtAllocateVirtualMemory/NtWriteVirtualMemory/callback triad.

VBA stages a PowerShell/LOLBin download-and-run command critical OLE_VBA_BITSTRANSFER_DROPPER

VBA assembles a PowerShell/LOLBin download primitive (Start-BitsTransfer, Invoke-WebRequest, Net.WebClient, bitsadmin, certutil) that fetches a remote payload and then executes it.

Downloader macros split the download cmdlet, URL and command keywords with PowerShell backtick (`) / cmd caret (^) escapes and stage the command to a .cmd/.ps1 that they run via Shell.Application.Open / WScript.Shell. The detection de-escapes the source, then requires a download primitive, a URL, and an execution sink (file-write+run or auto-exec+run).
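
The de-escape-then-match idea can be sketched in Python as below. This is a simplification: the execution-sink test is reduced to the presence of a sink ProgID, the token lists and function names are assumptions, and stripping every backtick also removes legitimate PowerShell escapes such as a newline escape.

```python
import re

DOWNLOAD_PRIMITIVES = (
    "start-bitstransfer", "invoke-webrequest", "net.webclient",
    "bitsadmin", "certutil",
)
EXECUTION_SINKS = ("shell.application", "wscript.shell")
URL_RE = re.compile(r"https?://[^\s\"']+", re.IGNORECASE)


def deescape(source: str) -> str:
    """Drop PowerShell backtick and cmd caret escape characters."""
    return source.replace("`", "").replace("^", "")


def looks_like_download_and_run(vba_source: str) -> bool:
    """Require a download primitive, a URL and an execution sink together."""
    text = deescape(vba_source).lower()
    has_download = any(p in text for p in DOWNLOAD_PRIMITIVES)
    has_url = URL_RE.search(text) is not None
    has_sink = any(s in text for s in EXECUTION_SINKS)
    return has_download and has_url and has_sink
```

Requiring all three parts is what separates a dropper from a macro that merely mentions certutil or contains a URL.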

VBA polls global keyboard state (keylogger) high OLE_VBA_KEYLOGGER_SPYWARE

VBA declares/calls a Win32 keystroke-monitoring API (GetAsyncKeyState, SetWindowsHookEx, GetKeyboardState) to capture keystrokes system-wide.

No legitimate document automation polls global key state. These APIs are the core of a VBA keylogger, usually paired with active-window capture and a log file. High-confidence spyware behaviour independent of download/Shell evidence.

Polyglot 2

Executable/encoded overlay appended after image end critical IMAGE_TRAILING_OVERLAY_EXECUTABLE

A valid image is followed by a large appended overlay that is a second container (PE/ZIP/...) or a loader-delimited base64-encoded payload.

Image-stego-loaders ship a real lure image (often a 4K screenshot) with an encoded or raw payload appended after the image's logical end (JPEG EOI / PNG IEND / GIF trailer). The image renders normally while a loader (AutoIt/.NET, AsyncRAT/njRAT/DCRat family) extracts and runs the hidden PE. Benign images carry at most a few KB of structured slack, never a large executable/archive or a delimiter-marked base64 blob, so this overlay shape has no benign use.
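
For PNG carriers, the logical end of the image can be found by walking chunks until IEND. The Python sketch below is illustrative only: the function names are hypothetical, chunk CRCs are not verified, and the size threshold and base64-delimiter checks described above are omitted.

```python
import struct
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
CONTAINER_MAGICS = {b"MZ": "PE", b"PK\x03\x04": "ZIP"}


def png_trailing_overlay(data: bytes) -> Optional[bytes]:
    """Bytes after the IEND chunk, or None if the PNG structure is broken."""
    if not data.startswith(PNG_SIGNATURE):
        return None
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4
        if chunk_type == b"IEND":
            return data[pos:] if pos <= len(data) else None
    return None


def overlay_container(overlay: bytes) -> Optional[str]:
    """Name the container type if the overlay starts with a known magic."""
    for magic, name in CONTAINER_MAGICS.items():
        if overlay.startswith(magic):
            return name
    return None
```

An empty overlay means the file ends at IEND; an overlay for which overlay_container returns "PE" or "ZIP" matches the second-container shape.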

JPEG carrier with embedded encoded executable payload critical JPEG_CARRIER_EMBEDDED_ENCODED_PE

A valid JPEG carries a loader-delimited encoded Windows executable after the JPEG end marker.

The JPEG itself does not exploit an image viewer. Instead, a separate loader reads the rendered lure image as a carrier, locates a marker such as INICIO after the JPEG EOI marker, decodes the hidden MZ/PE payload, and writes or reflectively runs it. Benign JPEG encoders do not append large loader-marked executable payloads after the logical image end.

Script 2

Obfuscated WSH script critical SCRIPT_WSH_OBFUSCATED

Windows Script Host content includes execution or obfuscation indicators.

WSH script combined with repeated string construction, encoded blobs, or shell/COM execution terms is a strong indicator of script malware.

Windows Script Host masquerade high SCRIPT_WSH_MASQUERADE

File contains Windows Script Host code while masquerading as a document.

WScript/CScript content can execute directly on Windows. When it is submitted with a document-like name or extension, it is a common attachment masquerade pattern.

Social Engineering 26

Fake CAPTCHA with command-running instructions critical SE_FAKE_CAPTCHA_CLICKFIX

Document combines fake CAPTCHA language with instructions to paste or run a command.

The rule requires both a CAPTCHA or human-verification frame and a command-running step such as Win+R, Ctrl+V, PowerShell, cmd, mshta, or similar. This is a high-confidence ClickFix pattern rather than a generic CAPTCHA phrase.

Recovery secret / private key request critical SE_SECRET_RECOVERY_LURE

Document requests recovery phrases, private keys, backup codes, or saved passwords.

The rule matches requests for seed phrases, private keys, backup codes, or saved passwords. These are recovery secrets; asking for them inside a document is a high-risk signal.

Advance-fee lottery/parcel scam high SE_ADVANCE_FEE_SCAM_LURE

Document contains lottery/beneficiary, large-value funds, and claim/contact/payment instructions.

The rule requires multiple independent fraud-letter cues: beneficiary or lottery/prize language, large-value draft/funds wording, and claim, contact, payment-bureau, or courier instructions. This is the classic advance-fee scam shape and is stronger than generic prize wording alone.

Brand-impersonation credential phishing lure high SE_BRAND_CREDENTIAL_PHISH

Document impersonates a well-known consumer brand and uses account-security language to drive the reader to a credential-harvesting link.

The rule fires when a brand name (Amazon, PayPal, Apple, Microsoft, a bank, etc.) appears together with account-security / verification lure language ("unusual activity", "account on hold", "verify your account", "restore access") AND at least one corroborator: lure text written with homoglyph letter swaps (capital 'I' for lowercase 'l', e.g. "hoId", "unusuaI" — a deliberate keyword-filter evasion), an action link to an abused app-hosting / redirector service (Google Apps Script /macros/.../exec, *.web.app, *.workers.dev, Google Forms), or a call-to-action link whose host does not match the impersonated brand's own domain. Requiring a corroborator keeps legitimate brand notices — which link to the real domain and spell their words normally — from tripping it. Critical when the link points at an abused redirector or two corroborators are present.
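
The homoglyph corroborator can be illustrated with a small normalisation step: treat a capital 'I' that follows a lowercase letter as 'l', then see which lure phrases appear only after normalisation. This Python sketch is an assumption about one way to do it; the phrase list is taken from the examples above and the function name is hypothetical.

```python
import re
from typing import List

LURE_PHRASES = (
    "unusual activity", "account on hold",
    "verify your account", "restore access",
)

# A capital I directly after a lowercase letter is unlikely in normal prose.
HOMOGLYPH_I = re.compile(r"(?<=[a-z])I")


def homoglyph_lure_hits(text: str) -> List[str]:
    """Lure phrases that match only once I-for-l swaps are undone."""
    raw = text.lower()
    normalised = HOMOGLYPH_I.sub("l", text).lower()
    return [p for p in LURE_PHRASES if p in normalised and p not in raw]
```

A notice that spells its words normally returns an empty list, so this check alone cannot trip on a legitimate brand message.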

Browser extension / update installation lure high SE_BROWSER_INSTALL_LURE

Document tells the user to install a browser extension, plugin, viewer, or update.

The rule matches instructions to install a browser extension, plugin, viewer, or browser update to read the document. This is high-risk in an unsolicited file because fake updates and extensions are common malware delivery paths.

ClickFix social engineering attack high SE_CLICKFIX

Document instructs the user to press Win+R and paste or run a command.

ClickFix lures ask the user to open the Run dialog or a shell and paste a command, usually PowerShell or cmd. The rule looks for those instructions in document text. It is high-risk because the command is supplied by the lure, not because an exploit is present.

Clipboard command execution lure high SE_CLIPBOARD_COMMAND_LURE

Document tells the user to copy or paste clipboard content into a command context.

The rule matches clipboard, copy, paste, or Ctrl+V instructions near Run, PowerShell, cmd, terminal, mshta, regsvr32, or similar execution contexts. That combination is uncommon in ordinary documents and is typical of ClickFix-style social engineering.

Fake CAPTCHA / human verification prompt high SE_FAKE_CAPTCHA

Document displays a fake CAPTCHA or robot-verification prompt.

The rule matches CAPTCHA or human-verification phrases ('verify you are not a robot', 'complete the verification', etc.) in document text. In malicious lures this is typically the framing for a ClickFix-style step that asks the user to paste a command, but the rule fires on the verification language alone.

Fake browser/security check with command step high SE_FAKE_BROWSER_SECURITY_CHECK

Document combines browser/security-check language with instructions to run a command.

The rule looks for browser check, connection verification, or security verification language together with a command-running step. This catches fake browser checks that use the same ClickFix workflow without saying CAPTCHA.

Fake encrypted/secure-document view lure high SE_ENCRYPTED_DOC_LURE

Document claims to be encrypted/protected and steers the reader to a deceptive 'secure view' link.

The secure-document phishing carrier: the page claims the file is encrypted or protected by a secure service ("this document is encrypted using …", "click below to securely view") and the action link uses deceptive infrastructure — a destination host whose first DNS label is literally 'http'/'https' (e.g. 'https.file-transfers.example.com'), or an abused app-hosting/redirector service. Legitimate secure-mail gateways use similar wording but link to their own honest domains, so the lure text alone never fires; the deceptive-link corroborator is required.
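
The scheme-like first label is straightforward to test once the host is parsed out of the link. A minimal Python sketch, with a hypothetical function name; it covers only this corroborator, not the redirector-service list.

```python
from urllib.parse import urlsplit


def has_scheme_like_first_label(url: str) -> bool:
    """True when the host's first DNS label is literally 'http' or 'https'."""
    host = urlsplit(url).hostname or ""
    first_label = host.split(".")[0].lower()
    return first_label in {"http", "https"}
```

For the example host above, has_scheme_like_first_label("https://https.file-transfers.example.com/view") is true, while an ordinary link such as "https://example.com/view" is false.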

Invoice remittance address uses free webmail high SE_INVOICE_FREE_WEBMAIL_REMITTANCE

Invoice/payment document routes remittance contact through a consumer webmail domain.

Legitimate vendor invoices can include payment instructions, but a bank-transfer or remittance workflow that points to generic consumer webmail (for example gmail.com, outlook.com, mail.com, or post.com) is a strong business-email-compromise indicator, especially when the document impersonates a named organisation.

LOLBin token sequence in document text high SE_LOLBIN_RUN_COMMAND

Extracted document text contains a Windows execution tool name within 220 characters of a dangerous flag, command verb, or URL.

The rule matches the name of a script/execution tool (PowerShell, cmd, mshta, rundll32, regsvr32, wscript, cscript, certutil, bitsadmin, curl, wget) within 220 characters of a dangerous flag (-enc, downloadstring, iex, /i:, javascript:, vbscript:) or a URL. This catches two distinct shapes: (1) a visible 'run this' instruction in HTML/PDF/RTF lure bodies, where the matched span really is the command a victim is asked to run; and (2) macro-laden Office files where the macro's own string-pool entries (CreateObject names, action verbs, payload URLs) end up adjacent in the extracted text. The detail field shows the head and tail of the matched span so an analyst can tell which case applies.
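
The proximity match can be sketched in Python as a tool-name regex plus a bounded search window. The token lists come from the description above; treating the 220-character window as applying on both sides of the tool name, and the function name itself, are assumptions.

```python
import re
from typing import List, Tuple

WINDOW = 220
TOOL_RE = re.compile(
    r"\b(?:powershell|cmd|mshta|rundll32|regsvr32|wscript|cscript|"
    r"certutil|bitsadmin|curl|wget)\b",
    re.IGNORECASE,
)
FLAG_RE = re.compile(
    r"-enc|downloadstring|iex|/i:|javascript:|vbscript:|https?://",
    re.IGNORECASE,
)


def lolbin_spans(text: str) -> List[Tuple[str, str]]:
    """(tool, flag-or-URL) pairs where the two sit within WINDOW characters."""
    pairs = []
    for tool in TOOL_RE.finditer(text):
        lo = max(0, tool.start() - WINDOW)
        hi = min(len(text), tool.end() + WINDOW)
        flag = FLAG_RE.search(text, lo, hi)
        if flag:
            pairs.append((tool.group(0).lower(), flag.group(0).lower()))
    return pairs
```

The same function fires on both shapes described above; telling a visible instruction from adjacent macro string-pool entries still requires looking at the matched span.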

MFA / one-time-code harvesting lure high SE_MFA_LURE

Document asks for an MFA, OTP, authenticator, or one-time passcode action.

The rule matches requests for MFA, OTP, authenticator, one-time code, or push approval actions. Documents that ask for these actions should be reviewed carefully, especially when combined with credential or account language.

Password-protected archive handoff high SE_PASSWORD_ARCHIVE_LURE

Document gives password instructions for an archive or attachment.

The rule matches password instructions near archive or attachment language. Encrypted archives are often used to keep gateway scanners from inspecting the payload before the recipient extracts it.

Payment redirection / bank-detail change lure high SE_PAYMENT_REDIRECT_LURE

Document describes new or changed bank, wire, ACH, IBAN, SWIFT, or routing instructions.

The rule matches text about changed bank, wire, ACH, IBAN, SWIFT, or routing details. This is a high-value business email compromise pattern, but it still needs business-context review.

Remote-support code/control handoff high SE_REMOTE_SUPPORT_CODE_LURE

Document asks the user to share a support code or allow remote control.

The rule matches remote-support tools such as AnyDesk, TeamViewer, Quick Assist, ScreenConnect, or Splashtop near requests for a session code, support code, connection ID, or permission to control the machine.

Remote-support tool lure high SE_REMOTE_SUPPORT_LURE

Document instructs the user to install or open remote-support software.

The rule matches instructions to install or open remote-support tools such as AnyDesk, TeamViewer, Quick Assist, ScreenConnect, or Splashtop. This is high-risk in an unsolicited document because it can lead to interactive control of the machine.

Security software disable instruction high SE_SECURITY_BYPASS

Document instructs the user to disable antivirus or security software.

The rule matches instructions to disable antivirus, security tools, or protections before opening content. That request is unusual for ordinary documents and should be treated as high-risk.

Callback phishing phone lure medium SE_CALLBACK_LURE

Document asks the user to call a phone number in a finance or security context.

The rule matches phone-call instructions in finance, renewal, refund, fraud, or account-security contexts. Callback phishing commonly starts this way, but some legitimate notices also use phone numbers.

Cloud document impersonation lure medium SE_CLOUD_DOC_LURE

Document impersonates a cloud file-sharing or collaboration service.

The rule matches cloud-file service names such as SharePoint, OneDrive, Google Drive, Dropbox, Box, Teams, or Microsoft 365 in a verify-to-view or secure-document context. Real sharing workflows can look similar, so use this as supporting evidence.

Document signing service impersonation lure medium SE_DOCUSIGN_LURE

Document impersonates DocuSign, Adobe Sign, or a similar service to lure users.

The rule matches visible references to e-signature services such as DocuSign or Adobe Sign in a signing-request context. This is common in phishing, but real signature workflows can use the same names, so the service reference is supporting evidence rather than proof.

Macro/content-enable lure medium SE_ENABLE_LURE

Document instructs the user to enable macros or editing.

Macro malware often uses fake preview text or security-warning language to push the user toward 'Enable Content' or 'Enable Macros'. This finding means the document contains that kind of instruction; it does not prove the macros are malicious by itself.

QR-code redirect lure medium SE_QR_LURE

Document instructs the user to scan a QR code, likely as an out-of-band phishing vector.

The rule matches instructions to scan a QR code to view, verify, or access a document. QR codes are common in legitimate material, but these access-oriented phrases are useful supporting evidence for QR phishing.

Invoice / payment language low SE_INVOICE_LURE

Document contains invoice or payment language paired with an action instruction.

The rule matches invoice or payment language paired with an action such as open, download, review, or click. Genuine invoices use the same vocabulary, so this finding is mainly useful when paired with macro, link, or attachment indicators.

Urgency / deadline lure low SE_URGENCY_LURE

Document contains urgency or deadline language to pressure the user into acting.

The rule matches deadline and account-pressure phrases such as 'final notice' or 'action required within 24 hours'. This language is common in both phishing and legitimate billing, legal, and account notices, so it is low-signal on its own.

Visual download / call-to-action button lure low SE_DOWNLOAD_BUTTON

Document contains a call-to-action phrase ('Click here to download', etc.).

The rule matches call-to-action text such as 'Download Now' or 'Open Document'. That wording is common in manuals and setup guides, so this is low-signal unless other findings point to a malicious workflow.

Webshell 4

PHP webshell / backdoor source critical WEBSHELL_PHP

The file contains PHP code with the signature of a webshell/backdoor (request input fed to a command/code-exec sink, or a named-shell banner).

A webshell takes attacker input from an HTTP request and runs commands or code on the server (RCE). It is flagged as a malicious hacktool artifact even when carried inside a document or archive (e.g. a c99/Locus7s PHP shell pasted into an RTF): the code does not run from the carrier, but the file IS a webshell. Detection requires a multi-token combination (script tag + request input + exec sink, a decoder/second sink, or a distinctive named-shell banner) so ordinary server code does not trigger a false positive.

ASP webshell / backdoor source high WEBSHELL_ASP

The file contains ASP webshell code (eval/Execute over Request input, or WScript.Shell.Run of request data).

Classic ASP server-side remote-command-execution backdoor source — attacker-supplied Request data is passed to eval/Execute or a shell.

JSP webshell / backdoor source high WEBSHELL_JSP

The file contains JSP webshell code (Runtime.exec / ProcessBuilder driven by a request parameter).

JSP server-side remote-command-execution backdoor source — a request parameter is passed to Runtime.exec / ProcessBuilder to run commands.

Known webshell marker string medium WEBSHELL_MARKER

A distinctive named-webshell marker (c99shell/r57/WSO/b374k/Locus7s/…) was found without surrounding script context.

A named-webshell identifier string appeared but no script body was matched around it (e.g. the marker sits in a binary or document). Surfaced at medium for review as a likely hacktool artifact; the bare 3-char forms (c99/r57) are deliberately excluded to avoid coincidental substring matches in large binaries.