Malware Analysis 101: Safe Setup and First Steps

Every security team eventually faces the same question: what exactly does this file do? A suspicious attachment lands in a phishing report, an EDR alert fires on an unknown binary, or IR triage surfaces a script with an obfuscated payload. The instinct is to submit it to VirusTotal and move on, but detection rates on targeted samples are often poor, and vendor labels like "Trojan.GenericKD.12345" tell you nothing about capability, infrastructure, or intended victim. Malware analysis fills that gap.

At its core, malware analysis is about answering three questions: what does this file do, how does it do it, and what does it communicate with? The discipline ranges from quick triage performed in minutes to deep reverse engineering that takes days. This guide focuses on the entry point: building a safe analysis environment and developing a repeatable workflow for basic static and dynamic analysis — the skills that directly improve incident response quality without requiring a background in assembly language.

The tooling is largely free. The main investment is time and discipline — especially discipline around safety, because the files you will analyze are designed to compromise systems.

Why Learn Malware Analysis

Malware analysis sits at the intersection of several high-value security skills, and even a foundational capability produces significant returns across the incident response lifecycle.

From a career perspective, analysts who can characterize malware behavior are significantly more valuable during active incidents. The ability to quickly determine whether a suspicious file is a dropper, a loader, a stealer, or a lateral movement tool shapes every subsequent response decision — what systems to prioritize, what network traffic to hunt for, what persistence mechanisms to check. That analysis capability is not automatic; it is built through practice.

From a detection perspective, malware analysis generates artifacts that translate directly into detections. Understanding the file system paths, registry keys, mutex names, network indicators, and behavioral signatures of a specific sample lets you build YARA rules, Sigma rules, and EDR detections tuned to that exact threat rather than relying on generic signatures. Defenders who have analyzed the malware they are hunting write dramatically better detections than those who have not.

From a broader intelligence perspective, analysis connects individual samples to campaigns, threat actors, and toolkits. Shared code patterns, infrastructure overlaps, and behavioral similarities link what looks like an isolated incident to a wider operation. That context drives better prioritization and more effective communication with leadership.

Lab Setup

Before touching a malware sample, your environment must be prepared. Analyzing malware on your primary workstation or in a connected corporate environment is not a calculated risk — it is an uncontrolled experiment with unpredictable consequences. The lab environment is not optional infrastructure; it is the foundation that makes everything else safe.

Isolated Virtual Machines

The standard approach is a dedicated analysis VM running on a hypervisor that you control. VMware Workstation Pro, VMware Fusion, and VirtualBox are all viable. The critical requirement is that the VM is isolated from your production network. This means:

Host-only networking by default. Set the VM's network adapter to host-only mode, which creates an internal virtual network between the VM and the host with no external routing. The malware cannot reach the internet, your corporate network, or any system outside the hypervisor. This is the baseline configuration for initial analysis.
No shared folders. Disable VMware Tools or VirtualBox Guest Additions shared folder features. A sample that detects it is in a VM may attempt to escape via shared filesystem access. More practically, if you accidentally execute something destructive, shared folders expose your host filesystem to damage.
Snapshots before every session. Before introducing any sample, take a clean snapshot of the VM in its baseline state. After analysis, revert. Never run multiple samples in the same VM session without reverting between them — artifacts from one sample contaminate the behavioral baseline for the next.
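The revert step is easy to skip under time pressure, so it helps to script it. The following is a minimal sketch, assuming VirtualBox with the VBoxManage CLI on the host's PATH; the VM and snapshot names used in the usage note are hypothetical placeholders, and the equivalent VMware commands differ.

```python
import subprocess


def snapshot_take_cmd(vm: str, name: str) -> list[str]:
    """Build the VBoxManage command that takes a named snapshot."""
    return ["VBoxManage", "snapshot", vm, "take", name]


def snapshot_restore_cmd(vm: str, name: str) -> list[str]:
    """Build the VBoxManage command that restores a named snapshot."""
    return ["VBoxManage", "snapshot", vm, "restore", name]


def poweroff_cmd(vm: str) -> list[str]:
    """Build the VBoxManage command that hard-stops a running VM."""
    return ["VBoxManage", "controlvm", vm, "poweroff"]


def revert_to_baseline(vm: str, name: str, run=subprocess.run) -> None:
    """Power the VM off, then restore the clean snapshot.

    The poweroff may fail harmlessly if the VM is already stopped, so only
    the restore is run with check=True.
    """
    run(poweroff_cmd(vm), check=False)
    run(snapshot_restore_cmd(vm, name), check=True)
```

Calling revert_to_baseline("analysis-win10", "clean-baseline") after each sample returns the VM to its baseline state; the `run` parameter exists only so the function can be exercised without a hypervisor present.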

Network Segmentation

When you need to observe network behavior, a two-VM setup is more controlled than opening the analysis VM to the internet. The analysis VM connects to a second VM running a fake network services host (such as a machine running INetSim or Fakenet-NG), which responds to DNS queries, HTTP requests, and other common protocol interactions with plausible but benign responses. The malware believes it has internet connectivity and executes its network routines; you capture the traffic without actually connecting to live command-and-control infrastructure.

A separate physical machine running on a dedicated VLAN — firewalled from everything else on your network — is preferable for high-confidence detonation of sophisticated or potentially VM-aware samples. For most practitioner-level work, a well-configured VM setup is sufficient.

REMnux and FlareVM

Rather than assembling an analysis environment from scratch, two pre-built distributions are the standard starting point in the community:

REMnux is a Linux distribution purpose-built for malware analysis. It includes hundreds of pre-installed tools for static analysis, network analysis, memory forensics, and document examination. REMnux is particularly strong for analyzing Linux malware, document-based threats, and network artifacts. It runs well as a VM and is maintained by Lenny Zeltser with help from community contributors.
FlareVM is a Windows-based analysis environment maintained by Mandiant. It installs on top of a Windows base image and provides a curated set of Windows malware analysis tools, debuggers, disassemblers, and utilities. Because most enterprise malware targets Windows, FlareVM is often the primary analysis machine, with REMnux serving as the network simulation host in a two-VM configuration.

Start with FlareVM as your Windows analysis VM and REMnux as your network host. Snapshot both in their clean states and you have a functional analysis lab.

Safety First

Safety discipline is not a formality — it is the difference between a controlled analysis and an incident that requires its own incident response. Several practices are non-negotiable:

Never handle samples on your host OS. Transfer samples to the analysis VM using a mechanism that does not risk accidental execution: zip archives with a password (the community convention is "infected"), direct copy through VM disk mounting with the VM powered off, or a controlled file transfer after verifying network isolation. Never double-click a sample to "check what it is" on a non-isolated machine.
Disable Windows Defender and AV before analysis. This sounds counterintuitive, but AV that quarantines your sample mid-analysis corrupts your behavioral baseline. Disable AV within the analysis VM before introducing the sample, perform the analysis, then revert the snapshot. The protection you lose is within an isolated, disposable VM.
Never submit samples directly to VirusTotal without considering sensitivity. VirusTotal shares all uploaded samples with its subscriber base of security vendors and researchers. If the sample is from an ongoing incident at a client organization, contains sensitive data, or may tip off a threat actor that their tool has been identified, use a local YARA scan or hash lookup instead of direct submission. Hash lookups (submitting the SHA-256 rather than the file) reveal detection rates without sharing the binary.
Document everything. Note the sample source, SHA-256 hash, file size, and analysis date before starting. Record every tool you run and every finding you produce. Analysis notes are case records and may be referenced weeks later when additional context emerges.
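Verify network isolation before detonation. Before executing any sample dynamically, confirm that the VM cannot reach external networks. A simple test: attempt to ping an external IP or resolve an external domain from within the VM. If both fail, your isolation is working. If a network simulator such as INetSim or Fakenet-NG is already running, lookups will succeed but should return the simulator's address rather than a real one. Do not skip this check.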
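The isolation check in the last item can be scripted and run inside the analysis VM before each detonation. This is a minimal sketch, not a complete test: it substitutes a TCP connection attempt for ping (ICMP needs raw-socket privileges), and the probe targets 1.1.1.1 and example.com are arbitrary choices made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(name: str) -> bool:
    """Return True if the name resolves to any address."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:
        return False


def isolation_report(connect=can_connect, resolve=can_resolve) -> dict:
    """Probe an external IP and an external domain; isolated means both fail."""
    results = {
        "tcp 1.1.1.1:443": connect("1.1.1.1", 443),
        "dns example.com": resolve("example.com"),
    }
    results["isolated"] = not any(results.values())
    return results
```

Run isolation_report() before the network simulator is started. With INetSim or Fakenet-NG already answering, both probes are expected to succeed against the simulator, so a result of isolated being False is only alarming when no simulator is running.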

Static Analysis Workflow

Static analysis examines the file without executing it. It is fast, safe, and provides the initial context that shapes everything else. Even if you eventually run the sample dynamically, static analysis first gives you hypotheses to validate and indicators to watch for during execution.

File Type Identification

Start by determining what the file actually is, independent of its extension. Malware frequently uses misleading extensions: a .pdf that is actually a PE executable, a .docx that contains a malicious macro, or a .jpg that is a renamed ZIP archive. The file command on Linux/REMnux identifies file type from magic bytes rather than extension. The TrID tool provides more detailed format identification across a broader signature database.
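To make the magic-byte idea concrete, here is a small sketch of the kind of check the file command performs. It covers only a handful of well-known signatures and is no substitute for file or TrID; the table and function names are illustrative.

```python
# Leading byte signatures for a few formats commonly seen in triage.
MAGIC_SIGNATURES = [
    (b"MZ", "DOS/PE executable"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also OOXML such as .docx/.docm/.xlsm)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document (legacy .doc/.xls)"),
    (b"\x7fELF", "ELF executable"),
    (b"\xff\xd8\xff", "JPEG image"),
]


def identify_magic(data: bytes) -> str:
    """Return a format label based on the leading bytes, ignoring any extension."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first bytes of a file and identify its format."""
    with open(path, "rb") as fh:
        return identify_magic(fh.read(16))
```

A file named invoice.pdf whose first two bytes are MZ would be reported as a DOS/PE executable, which is exactly the mismatch worth noting in your analysis record.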

For Windows PE files (executables and DLLs), note the architecture (32-bit or 64-bit), subsystem (GUI vs. console), and compilation timestamp. The timestamp may be forged but is worth recording; a compilation date hours before a phishing campaign is a useful correlation point.
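The three fields mentioned above sit at fixed offsets in the PE headers, so they can be read with the standard library alone. The sketch below follows the documented PE layout (e_lfanew at offset 0x3C, the COFF header after the PE signature, Subsystem at offset 68 of the optional header); it does no validation beyond the two signatures, and in practice pefile or pestudio will give the same values with far better error handling.

```python
import struct
from datetime import datetime, timezone

MACHINES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}
SUBSYSTEMS = {2: "GUI", 3: "console"}


def parse_pe_summary(data: bytes) -> dict:
    """Extract architecture, subsystem, and compile timestamp from PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew: file offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # COFF header starts right after the 4-byte signature.
    machine, _sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    # Optional header follows the 20-byte COFF header; Subsystem is at +68
    # for both PE32 and PE32+.
    (subsystem,) = struct.unpack_from("<H", data, pe_offset + 24 + 68)
    compiled = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return {
        "machine": MACHINES.get(machine, f"unknown (0x{machine:04x})"),
        "subsystem": SUBSYSTEMS.get(subsystem, f"other ({subsystem})"),
        "compiled_utc": compiled.isoformat(),
    }
```

Treat compiled_utc as a claim made by the binary rather than a fact, for the reason given above: the field is trivially forged.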

Hash and Threat Intelligence Lookup

Compute the MD5, SHA-1, and SHA-256 hashes of the sample. Look up the SHA-256 on VirusTotal (by hash, not file), MalwareBazaar, and your internal threat intelligence platform. If the hash is known, you often get immediate context: malware family, campaign associations, and behavioral reports from previous analyses. If it is unknown, that is also meaningful — a new or customized sample warrants more thorough analysis.
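Computing all three hashes in one pass over the file is straightforward with Python's hashlib. The sketch below also records the file size, which is useful for the analysis notes; the function names are illustrative.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256")


def hash_chunks(chunks) -> dict:
    """Feed an iterable of byte chunks to all three hashers in a single pass."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    size = 0
    for chunk in chunks:
        size += len(chunk)
        for hasher in hashers.values():
            hasher.update(chunk)
    record = {name: hasher.hexdigest() for name, hasher in hashers.items()}
    record["size"] = size
    return record


def read_chunks(path: str, chunk_size: int = 1024 * 1024):
    """Yield a file's contents in fixed-size chunks so large samples fit in memory."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk


def hash_file(path: str) -> dict:
    """Return md5, sha1, sha256, and size for the file at path."""
    return hash_chunks(read_chunks(path))
```

The sha256 value from hash_file is the one to paste into VirusTotal or MalwareBazaar searches; hashing reads the sample but never executes it.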

Strings Extraction

The strings command extracts printable character sequences from a binary. Even in packed or obfuscated files, strings often reveals useful artifacts: URLs, domain names, IP addresses, file paths, registry keys, error messages, and hardcoded credentials. On Windows samples, extract both ASCII and UTF-16LE ("Unicode") strings. With GNU strings this takes two passes, because the -e option selects a single encoding: strings -a for ASCII and strings -a -el for 16-bit little-endian. The FLOSS tool from Mandiant extends this by automatically recovering strings hidden by common obfuscation techniques that simple strings extraction misses.

Look specifically for:

HTTP/HTTPS URLs and domain names — potential C2 infrastructure
File system paths, especially temp directories or user profile paths
Windows API function names that appear as strings — malware that resolves API calls dynamically sometimes stores function names in plaintext
Registry key paths — persistence mechanisms frequently involve specific registry locations
Mutex names — malware often creates a named mutex to prevent re-infection; these names are reliable IOCs
Error messages and debug strings — sometimes left in by developers and revealing of function or campaign
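The two extraction passes and a first sort of the results can be sketched in a few lines. This is a simplified stand-in for strings and FLOSS, not a replacement: the regular expressions below are deliberately loose illustrations and will produce both false positives and misses.

```python
import re

# Loose illustrative patterns for a first pass over extracted strings.
IOC_PATTERNS = {
    "url": re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "registry": re.compile(r"\b(?:HKLM|HKCU|HKEY_[A-Z_]+)\\[^\s\"']+", re.IGNORECASE),
}


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable ASCII runs, then printable UTF-16LE runs."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found


def triage_strings(strings) -> dict:
    """Group strings that look like URLs, IPv4 addresses, or registry paths."""
    hits = {kind: [] for kind in IOC_PATTERNS}
    for text in strings:
        for kind, pattern in IOC_PATTERNS.items():
            hits[kind].extend(pattern.findall(text))
    return hits
```

Anything surfaced this way is a hypothesis to confirm during dynamic analysis, and a version number such as 1.2.3.4 will match the IPv4 pattern just as readily as a real address.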

PE Header Analysis

For PE files, the header contains rich metadata about the binary's structure and intended behavior. Tools like pestudio, PE-bear, and pefile (Python library) parse PE headers and present this information in readable form.

Key header fields to examine:

Import Address Table (IAT). The import table lists the Windows API functions the binary links against at load time. API imports are a behavioral fingerprint: a sample importing CreateRemoteThread, WriteProcessMemory, and VirtualAllocEx has process injection capability. A sample importing CryptEncrypt and FindFirstFile has encryption and file enumeration capability. Read the imports as a capability declaration, keeping in mind that functions resolved at runtime through LoadLibrary and GetProcAddress will not appear in the table.
Sections. Standard PE sections include .text (code), .data (initialized data), and .rsrc (resources). Unusual section names, sections with high entropy (indicating compressed or encrypted content), or a .text section that is both writable and executable are packing and obfuscation indicators.
Resources. The resource section can contain embedded executables, scripts, or data that the sample drops or executes. Tools like Resource Hacker and peres (from the pev toolkit) extract and examine embedded resources.
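Reading imports as a capability declaration can be partly automated. The sketch below encodes only the two API combinations named above; the rule table is illustrative and far from complete. The optional loader assumes the third-party pefile library is installed and is imported lazily so the matching logic works without it.

```python
# Capability rules taken from the two examples in the text; extend as needed.
CAPABILITY_RULES = {
    "process injection": ("VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"),
    "file encryption and enumeration": ("CryptEncrypt", "FindFirstFile"),
}


def match_capabilities(imports) -> list[str]:
    """Return the capabilities whose required APIs all appear in the imports."""
    names = set(imports)

    def present(api: str) -> bool:
        # Many APIs are exported with an ANSI (A) or wide (W) suffix.
        return bool({api, api + "A", api + "W"} & names)

    return [
        capability
        for capability, apis in CAPABILITY_RULES.items()
        if all(present(api) for api in apis)
    ]


def list_imports(path: str) -> list[str]:
    """Collect imported function names with pefile (assumed to be installed)."""
    import pefile  # third-party; imported here so the rest runs without it

    pe = pefile.PE(path)
    names = []
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        for imp in entry.imports:
            if imp.name:  # imports by ordinal have no name
                names.append(imp.name.decode("ascii", errors="replace"))
    return names
```

An empty result does not mean the sample is benign: a packed binary, or one that resolves its APIs at runtime, will show few or none of these imports.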

Packing Detection

Many malware samples are packed — compressed, encrypted, or obfuscated — to evade static detection and analysis. Packed binaries typically have very few imports (just enough to unpack themselves), high entropy sections, and limited readable strings. Tools like Detect-It-Easy (DIE) and ExeinfoPE identify common packers by signature. If a sample is packed with a known packer like UPX, you may be able to unpack it automatically and then analyze the unpacked binary. Custom packers require dynamic analysis to capture the unpacked payload from memory.
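The entropy figure that these tools report is Shannon entropy in bits per byte, ranging from 0 (a single repeated value) to 8 (uniformly distributed bytes). A minimal sketch follows; the 7.0 threshold is a commonly used rule of thumb rather than a fixed standard, and is an assumption here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Flag data whose entropy meets the (assumed) packing threshold."""
    return shannon_entropy(data) >= threshold
```

Apply this per section rather than to the whole file: a small high-entropy payload can be diluted by a large low-entropy resource section when the file is measured as one blob.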

Dynamic Analysis Workflow

Dynamic analysis executes the sample in a controlled environment and observes its behavior. It reveals what the malware actually does at runtime — including behavior that was hidden by packing, obfuscation, or encryption that static analysis could not see through. The cost is that you are running live malware, which is why the lab environment and safety practices from earlier sections are prerequisites.

Baseline Before Execution

Before detonating the sample, take a snapshot and establish a behavioral baseline: running processes, active network connections, loaded services, and relevant registry key values. Tools like Autoruns (for persistence locations) and TCPView (for network connections) give you a clean pre-execution reference. This makes post-execution comparison far more efficient.

Process Monitoring

Process Monitor (ProcMon) from Sysinternals is the single most useful dynamic analysis tool for Windows. It captures file system operations, registry operations, network activity, and process/thread events on the system in real time. With ProcMon running and filtered to your sample's process tree, you see exactly which files it creates, which registry keys it modifies, which child processes it spawns, and which network endpoints it contacts. ProcMon does not trace individual system calls or API calls; use a debugger or API monitor for that level of detail. The filter capability is essential: a single malware execution can generate tens of thousands of events, and filtering to the sample's PID and its children makes the output tractable.
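Filtering to a process tree can also be done after the fact on a ProcMon CSV export. The sketch below rests on assumptions about the export format: the default column names (Process Name, PID, Operation, Path, Result, Detail) and a Detail field on Process Create events that begins with the child's PID in the form "PID: 1234, ...". Check both against your own export before relying on it.

```python
import csv
import io
import re

CHILD_PID_RE = re.compile(r"PID:\s*(\d+)")


def load_rows(csv_text: str) -> list[dict]:
    """Parse ProcMon CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def process_tree_pids(rows, root_pid: int) -> set[int]:
    """Return root_pid plus every descendant found via Process Create events."""
    rows = list(rows)
    tree = {root_pid}
    changed = True
    while changed:  # repeat until no new descendants appear
        changed = False
        for row in rows:
            if row["Operation"] != "Process Create":
                continue
            if int(row["PID"]) not in tree:
                continue
            match = CHILD_PID_RE.search(row["Detail"])
            if match and int(match.group(1)) not in tree:
                tree.add(int(match.group(1)))
                changed = True
    return tree


def filter_to_tree(rows, root_pid: int) -> list[dict]:
    """Keep only the events produced by root_pid and its descendants."""
    rows = list(rows)
    tree = process_tree_pids(rows, root_pid)
    return [row for row in rows if int(row["PID"]) in tree]
```

When reading an export from disk, open it with encoding="utf-8-sig" so a leading byte-order mark does not end up in the first column name. PID reuse during a long capture can pull unrelated processes into the tree, so keep detonation captures short.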

Network Traffic Capture

Run Wireshark on the network interface (or on the REMnux host in a two-VM setup) before executing the sample. Network traffic reveals C2 communication patterns, DNS queries, exfiltration attempts, and download behavior. Even if the malware cannot reach live infrastructure (because of your network isolation), it will still attempt connections that Wireshark captures. DNS queries to C2 domains, HTTP POST requests with encoded data, and IRC or custom protocol traffic are all visible in the packet capture.

If you are using Fakenet-NG or INetSim as a network simulation layer, these tools log every connection attempt and serve plausible responses that keep the malware executing its network routines rather than failing and exiting early.

Registry and File System Changes

After execution, compare the current system state against your pre-execution baseline. Autoruns highlights any new persistence mechanisms added to the registry or startup locations. A manual or scripted comparison of the file system against a pre-execution snapshot reveals dropped files, modified binaries, and created directories. Focus particularly on persistence locations: HKCU\Software\Microsoft\Windows\CurrentVersion\Run, the Startup folder, scheduled tasks, and services.
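The scripted file system comparison mentioned above can be as simple as hashing a directory tree before and after detonation. This is a minimal sketch; which directories to snapshot is left to the analyst, and files locked by the operating system are recorded as unreadable rather than skipped.

```python
import hashlib
import os


def snapshot_tree(root: str) -> dict[str, str]:
    """Map every file path under root to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    state[path] = hashlib.sha256(fh.read()).hexdigest()
            except OSError:
                state[path] = "unreadable"
    return state


def diff_snapshots(before: dict, after: dict) -> dict:
    """Report paths that were created, deleted, or modified between snapshots."""
    shared = before.keys() & after.keys()
    return {
        "created": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(p for p in shared if before[p] != after[p]),
    }
```

Take the first snapshot of locations such as the user's temp and AppData directories just before execution, the second a few minutes after, and start with the created list when looking for dropped payloads.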

Essential Tools Reference

A functional analysis environment does not require dozens of specialized tools. The following set covers the majority of initial analysis needs:

pestudio — Windows PE analysis with integrated VirusTotal hash lookups, import analysis, string extraction, and packing detection. The most efficient starting point for any Windows executable.
FLOSS (FLARE Obfuscated String Solver) — Extracts strings from binaries using static analysis techniques that defeat common obfuscation, including stack strings, tight loops, and simple encoding schemes that standard strings misses.
Detect-It-Easy (DIE) — Identifies packers, compilers, and protections applied to PE files using a signature-based approach with an active community maintaining the signatures database.
Process Monitor (ProcMon) — Real-time file system, registry, and process activity monitor. The primary behavioral analysis tool for Windows dynamic analysis.
Process Hacker / System Informer — Advanced task manager that shows process memory maps, loaded DLLs, handles, and network connections in real time. Useful for spotting injected threads and anomalous memory regions during live detonation.
Wireshark — Network protocol analyzer for capturing and dissecting traffic produced during dynamic analysis.
Fakenet-NG — Windows network simulation tool that intercepts network traffic and responds to common protocols (DNS, HTTP, HTTPS, SMTP) with plausible responses. Keeps network-dependent malware executing rather than failing on unreachable infrastructure.
Cuckoo Sandbox / CAPE Sandbox — Automated sandbox platforms that detonate samples in a controlled VM, capture behavioral reports, extract network indicators, and produce structured output. CAPE adds unpacking and config extraction capabilities. Both can be self-hosted for sensitive sample analysis.

Your First Analysis: A Macro-Enabled Document

Document-based malware — particularly Office documents with embedded macros — remains one of the most common initial access vectors. Walking through a macro document analysis illustrates how static and dynamic techniques combine in practice.

Static Examination

Start with file to confirm the format. A modern .docm or .xlsm file is an OOXML ZIP archive; an older .doc with macros is OLE2 compound document format. Use olevba (part of the oletools suite) to extract and examine VBA macro code from Office documents without opening them in Word or Excel:

olevba suspicious.docm

The output displays all extracted VBA modules, flags suspicious patterns (Shell calls, WScript usage, base64 strings, PowerShell invocations), and produces an IOC summary. Examine the macro code for:

AutoOpen or Document_Open subroutines — code that runs automatically when the document is opened
String concatenation used to build commands — a common obfuscation technique to bypass static pattern matching
Shell, WScript.Shell, or PowerShell invocations — indicators that the macro is executing system commands
URL strings or encoded payloads — the macro may be downloading a second-stage payload or decoding an embedded one

If the macro contains base64-encoded content, decode it with CyberChef (the browser-based tool from GCHQ) or from the command line. The decoded content often reveals the next stage: a PowerShell script, a PE executable, or another encoded layer.
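For command-line decoding, a few lines of Python cover the common cases. The sketch below decodes one base64 layer and guesses what it contains; the UTF-16LE check reflects the encoding PowerShell uses for -EncodedCommand arguments, and the zero-byte heuristic and the labels returned are illustrative assumptions rather than a robust classifier.

```python
import base64


def decode_layer(blob: str):
    """Decode one base64 layer and return a (kind, payload) pair."""
    # Macros often split encoded strings across lines; remove all whitespace.
    raw = base64.b64decode("".join(blob.split()), validate=True)
    if raw[:2] == b"MZ":
        return ("pe", raw)  # embedded executable: keep the bytes as-is
    # Heuristic: ASCII-range text in UTF-16LE has a zero in every second byte.
    if len(raw) % 2 == 0 and raw[1::2].count(0) > len(raw) // 4:
        return ("utf16le-text", raw.decode("utf-16-le", errors="replace"))
    return ("utf8-text", raw.decode("utf-8", errors="replace"))
```

If the decoded text itself contains another long base64 string, feed that string back through decode_layer; stacked encodings are common, and each layer should be recorded in your notes as you peel it.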

Dynamic Examination

With ProcMon filtering active, Wireshark capturing on the network interface, and Fakenet-NG running on the network host, open the document in Microsoft Word within the analysis VM. If the macro is set to auto-execute, it will run immediately. If it requires the user to enable macros (the "Enable Content" prompt), click it — you are the analyst, not a victim; you want to see the behavior.

In ProcMon, watch for:

WINWORD.EXE spawning child processes — particularly cmd.exe, powershell.exe, wscript.exe, or mshta.exe. Any process spawned by Word that itself spawns further children is a strong indicator of a multi-stage payload chain.
Files written to temp directories — %TEMP%, %APPDATA%, and C:\ProgramData are common drop locations for second-stage payloads.
Registry modifications in Run keys — persistence written immediately after macro execution indicates the sample is attempting to survive reboot.

In Wireshark, watch for DNS queries immediately after the document opens. Once you discount the background lookups that Windows and Office generate on their own, a newly queried domain is often a C2 domain or a hosting location for the next-stage payload. Note the full query, the response (which Fakenet-NG will have served), and any subsequent HTTP requests. The User-Agent string in HTTP requests sometimes contains version or campaign identifiers hardcoded by the malware author.

After the macro has executed, use Process Hacker to examine the memory of any child processes that were spawned. Injected shellcode or a reflectively loaded DLL will appear as executable memory regions without a backing file on disk — the same pattern that windows.malfind looks for in Volatility.

Next Steps: Deepening Your Analysis Capability

Static and dynamic analysis answer what a sample does behaviorally. Understanding how — the underlying implementation, the specific algorithms, the code structure — requires moving into reverse engineering and code-level analysis. Several natural progressions build on the foundation established here.

Ghidra and IDA Free. Ghidra (from the NSA, open-source) and IDA Free are disassemblers and decompilers that translate binary machine code into readable assembly and pseudo-C. They are the primary tools for understanding the internal logic of a sample: how it decrypts its configuration, how its C2 protocol is structured on the wire, and how it implements evasion techniques. Starting with simple, unobfuscated samples and working toward more complex ones builds the pattern recognition needed for efficient disassembly analysis.
x64dbg for dynamic debugging. While ProcMon captures system-level behavior, x64dbg lets you step through malware execution instruction by instruction, set breakpoints on specific API calls, and inspect memory at any point during execution. This is particularly valuable for unpacking — setting a breakpoint at the OEP (Original Entry Point) where a packer hands off to the unpacked payload lets you dump the decrypted binary from memory for further static analysis.
YARA rule development. The strings, behavioral patterns, and code sequences you find during analysis are the raw material for YARA rules that detect the same malware across your environment. Building YARA rules from your own analysis is significantly more effective than relying on public rule sets, because your rules target the exact variant you have seen. The YARA Rules for Malware Detection guide covers the rule development workflow in detail.
Community resources. MalwareBazaar and ANY.RUN provide public sample repositories and behavioral analysis reports. The Malware Traffic Analysis blog (malware-traffic-analysis.net) publishes regular write-ups of malware samples with network captures and analysis notes. The VX-Underground and Objective-See (Mac-focused) communities produce practitioner-oriented research. Regular engagement with these resources builds familiarity with current threat actor techniques.
Structured training. Certifications like GREM (GIAC Reverse Engineering Malware) and courses from Malware Unicorn, TCM Security, and Zero2Automated provide structured curricula that systematically develop analysis skills. The combination of structured learning and hands-on practice against real samples is the fastest development path.

Malware analysis is a compounding skill. The first few samples feel slow and uncertain; the thirtieth produces pattern recognition that makes analysis dramatically faster. Every sample analyzed is a vocabulary entry — a technique, an IOC pattern, a behavioral signature — that applies to the next one. The investment pays dividends across every security function: detection engineering, incident response, threat intelligence, and red team understanding.

For analysts building out their forensic capabilities alongside malware analysis, the Memory Forensics Fundamentals guide covers the volatile evidence collection and Volatility analysis workflow that complements dynamic malware analysis — particularly for investigating samples that rely on in-memory execution to evade disk-based detection.
