What Is Detection Engineering
Detection engineering is the discipline of designing, building, testing, deploying, and maintaining threat detection logic in a systematic, repeatable, and measurable way. It is to security alerting what software engineering is to programming — the difference between writing code and building reliable software systems. A detection engineer does not simply create a SIEM rule when asked. They hypothesize about threats, research attack techniques, design detection logic, test that logic against both benign and malicious data, deploy it through a controlled process, monitor its performance, and iteratively improve it based on measured outcomes.
The distinction matters because most security operations centers are drowning. They are drowning in alerts generated by rules that were created ad hoc, with no testing methodology, no performance metrics, and no lifecycle management. When a new threat report comes out, someone writes a rule. When an auditor asks about a specific attack vector, someone writes a rule. When a vendor deploys default content, hundreds of rules activate simultaneously. The result is an environment where analysts spend their time triaging false positives rather than investigating real threats, where genuine attack signals are buried in noise, and where the security team cannot articulate what they can and cannot detect.
Detection engineering solves this by applying engineering rigor to the problem. Every detection has a documented purpose, a defined scope, expected performance characteristics, a testing methodology, and an owner responsible for its maintenance. Detections are treated as code — version-controlled, peer-reviewed, tested, and deployed through CI/CD pipelines. The program tracks metrics that quantify its effectiveness and uses those metrics to drive continuous improvement.
The Problem with Traditional Alerting
Before building something better, it is worth understanding exactly what is broken. The traditional approach to SIEM alerting is characterized by several pathologies that compound each other into operational dysfunction.
Alert fatigue is the most visible symptom. Analysts face hundreds or thousands of alerts per day, the vast majority of which are false positives or low-fidelity signals that require manual investigation to resolve. Research consistently shows that when false positive rates exceed 50%, analysts begin to unconsciously deprioritize or skip alerts entirely. They develop mental shortcuts — "this rule always fires on the finance server, ignore it" — that eventually cause them to miss the one time it fires because of an actual compromise.
Detection decay is a subtler but equally damaging problem. A detection rule that was accurate when written becomes less accurate over time as the environment changes. New applications generate log patterns that trigger existing rules. Infrastructure changes alter network baselines. Attackers evolve their techniques to avoid known detection patterns. Without a process for regularly reviewing and tuning detection rules, the overall quality of the detection portfolio degrades steadily, often without anyone noticing until a real incident is missed.
Coverage blindness means the team cannot answer the fundamental question: "What can we detect?" Without systematic coverage mapping, security teams have no visibility into which threat techniques are covered by existing detection logic, which techniques have no coverage at all, and which detections are redundant. They cannot make informed decisions about where to invest their detection development effort because they do not have a map of the current terrain.
Knowledge silos emerge when detection rules exist only as configurations in a SIEM console, documented nowhere, understood by only the person who wrote them. When that person leaves the organization, the institutional knowledge of why a rule was written, what it is intended to catch, what its known limitations are, and how to tune it leaves with them. The remaining team inherits a collection of opaque rules they are afraid to modify or disable.
Detection-as-Code
The foundational practice of a mature detection engineering program is treating detection rules as code. This means storing detection logic in a version control system (Git), writing it in a structured and portable format, subjecting it to peer review, testing it automatically, and deploying it through a CI/CD pipeline. This is not metaphorical — it is literal. Detection rules are code, and they should be managed with the same tools and discipline that software engineering teams apply to application code.
A detection-as-code repository typically contains, for each detection:
- The detection logic itself — written in the query language of your SIEM (KQL for Sentinel, SPL for Splunk, Lucene/EQL for Elastic) or in a portable format like SIGMA that can be compiled to multiple target platforms.
- Metadata — a structured file (YAML or JSON) documenting the detection's purpose, the MITRE ATT&CK techniques it maps to, its severity, expected false positive sources, data source requirements, author, creation date, and modification history.
- Test cases — sample log events (both true positive and true negative examples) that can be used to validate the detection logic automatically. Some teams use tools like Atomic Red Team or custom attack simulation frameworks to generate live test data.
- Runbook — investigation steps for an analyst who receives this alert, including what context to gather, what to escalate, and what constitutes a confirmed true positive versus a false positive.
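The metadata file lends itself to automated checking in CI. Below is a minimal sketch of such a check, assuming JSON metadata; the field names (`attack_techniques`, `false_positive_sources`, and so on) are illustrative, not a standard schema, and should be adapted to whatever your repository actually stores.

```python
import json
import re

# Hypothetical required fields for a detection's metadata document.
REQUIRED_FIELDS = {
    "name", "description", "severity", "attack_techniques",
    "data_sources", "false_positive_sources", "author", "created",
}

def validate_metadata(raw: str) -> list[str]:
    """Return a list of problems found in a JSON metadata document."""
    meta = json.loads(raw)
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - meta.keys())]
    # Every ATT&CK mapping should look like T1003 or T1003.001.
    for tid in meta.get("attack_techniques", []):
        if not re.fullmatch(r"T\d{4}(\.\d{3})?", tid):
            problems.append(f"malformed ATT&CK ID: {tid}")
    return problems
```

A CI job can run this over every metadata file in the repository and fail the pull request if any list comes back non-empty, which keeps documentation from silently drifting out of date.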
The benefits of this approach are substantial. Reproducibility: any team member can understand, modify, and deploy any detection. Auditability: every change is tracked in version control with a clear history of who changed what and why. Collaboration: detection development becomes a team activity with peer review catching errors and improving quality before deployment. Rollback: if a detection change causes problems, reverting to the previous version is a single Git operation. Portability: when you change SIEM platforms (and you will, eventually), detection logic documented in a structured format is far easier to migrate than raw rules extracted from a console.
Building Your First Detections
The most common mistake when starting a detection engineering program is trying to build coverage for everything simultaneously. This is paralyzing and counterproductive. Instead, start with a threat-informed approach anchored in the MITRE ATT&CK framework.
Begin by identifying your organization's most likely threats. What adversary groups target your industry? What are their documented techniques? Use resources like MITRE ATT&CK Groups, threat intelligence reports from your industry ISAC, and your own historical incident data to build a prioritized list of attack techniques. Then map your existing detection coverage against that list using ATT&CK Navigator. The result is a visual heat map showing where you have coverage, where you have gaps, and where your effort should be focused.
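The coverage heat map can be generated programmatically rather than maintained by hand. The sketch below builds a Navigator layer from a simple coverage dictionary; the layer structure follows the ATT&CK Navigator layer JSON format (fields such as `techniqueID` and `score`), but check the Navigator documentation for the layer version your deployment expects, and treat the coverage data here as a placeholder.

```python
import json

# Hypothetical coverage data: technique ID -> number of production detections.
coverage = {"T1003": 2, "T1021": 1, "T1566": 0}

layer = {
    "name": "Detection coverage",
    "domain": "enterprise-attack",
    "techniques": [
        {
            "techniqueID": tid,
            "score": count,
            # Navigator colors cells by score via a gradient; 0 shows as a gap.
            "comment": f"{count} detection(s)",
        }
        for tid, count in coverage.items()
    ],
}

print(json.dumps(layer, indent=2))
```

Exporting the dictionary from your detection-as-code repository's metadata files means the heat map is regenerated on every commit instead of going stale.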
Example: Detecting Credential Dumping (T1003)
Consider the MITRE ATT&CK technique T1003 — OS Credential Dumping. This technique is used by virtually every threat actor during post-exploitation to extract credentials from memory, the SAM database, or the ntds.dit file on domain controllers. Building a detection for this technique illustrates the detection development process.
Data source requirements: You need Windows Security event logs (Event ID 4688 — Process Creation with command-line logging enabled, or Event IDs 4656/4663 for LSASS access), Sysmon logs (Event ID 10 — Process Access targeting lsass.exe, Event ID 1 — Process Creation with hashes), or EDR telemetry providing equivalent visibility into process behavior and LSASS access patterns.
Detection logic options range from simple to sophisticated:
- Process name matching: Alert on execution of known credential dumping tools — `mimikatz.exe`, `procdump.exe` targeting lsass, the `comsvcs.dll` `MiniDump` export. This catches unsophisticated attackers but is trivially bypassed by renaming binaries.
- Command-line pattern matching: Detect suspicious command-line arguments associated with credential dumping — `sekurlsa::logonpasswords`, `lsass.dmp`, `-ma lsass`. Better coverage but still signature-based.
- LSASS access monitoring: Alert when a process opens a handle to lsass.exe with suspicious access rights (PROCESS_VM_READ). This detects the behavior regardless of the tool used, but requires careful tuning to exclude legitimate LSASS access from security products, Windows Defender, and system processes.
- Behavioral analytics: Detect anomalous patterns — a process that has never previously accessed LSASS, a non-system process reading LSASS memory, or credential material appearing in network traffic following LSASS access. This requires baselining and more sophisticated analytics.
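To make the LSASS access monitoring option concrete, here is a sketch of the logic as a filter over parsed Sysmon Event ID 10 records. The field names mirror Sysmon's schema (`SourceImage`, `TargetImage`, `GrantedAccess`), but the allowlist is purely illustrative — in practice it must be built from your own environment's observed legitimate LSASS readers.

```python
PROCESS_VM_READ = 0x0010

# Illustrative allowlist of legitimate LSASS readers; tune for your environment.
ALLOWLIST = {
    r"c:\program files\windows defender\msmpeng.exe",
    r"c:\windows\system32\csrss.exe",
}

def suspicious_lsass_access(event: dict) -> bool:
    """True if a non-allowlisted process opened lsass.exe with VM_READ rights."""
    if not event.get("TargetImage", "").lower().endswith("\\lsass.exe"):
        return False
    granted = int(event.get("GrantedAccess", "0x0"), 16)
    if not granted & PROCESS_VM_READ:
        return False
    return event.get("SourceImage", "").lower() not in ALLOWLIST

# Example event resembling a procdump-style LSASS dump.
event = {
    "SourceImage": r"C:\Users\bob\procdump64.exe",
    "TargetImage": r"C:\Windows\System32\lsass.exe",
    "GrantedAccess": "0x1fffff",
}
```

The same predicate expressed in your SIEM's query language is the production detection; keeping a reference implementation like this alongside it makes the logic unit-testable.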
Testing: Use Atomic Red Team's T1003 test cases to generate true positive events. Execute each test in a controlled lab environment while monitoring your SIEM to confirm the detection fires correctly. Also verify that the detection does not fire during normal operations — run it against a week of production logs to assess the false positive rate before deploying to production.
Detection Quality Metrics
A detection engineering program without metrics is just a SIEM rule collection with better documentation. Metrics are what transform detection development from an art into an engineering practice. Track these metrics consistently and use them to drive decisions about where to invest effort.
- Precision (true positives / total alerts fired): This measures alert quality. A detection with 20% precision means analysts waste time investigating 4 false positives for every real finding. Target a precision rate above 80% for production detections. If precision drops below this threshold, the detection should be tuned, moved to a lower-severity tier, or retired.
- Recall (threats detected / threats that occurred): This measures completeness — are you catching real attacks? Recall is harder to measure because you need to know about threats that were not detected. Purple team exercises, red team operations, and breach-and-attack simulation (BAS) tools provide the controlled adversary activity needed to measure recall.
- Mean Time to Detect (MTTD): The elapsed time from initial compromise to detection alert. Measure this per technique and per detection. An MTTD of 4 hours for lateral movement is very different from an MTTD of 30 days for data staging.
- Mean Time to Respond (MTTR): The elapsed time from alert to containment. While not purely a detection metric, MTTR is heavily influenced by detection quality — a high-fidelity alert with a clear runbook enables faster response than an ambiguous alert that requires extensive investigation.
- ATT&CK coverage percentage: The number of ATT&CK techniques covered by at least one detection, divided by the total number of techniques relevant to your threat model. This is not about achieving 100% — it is about understanding where your gaps are and making informed risk decisions about them.
- Alert volume per analyst per day: A workload metric that directly impacts analyst effectiveness. Research suggests that SOC analysts can effectively investigate 20-30 meaningful alerts per day. If the per-analyst volume exceeds this, either the team is understaffed or the detection portfolio is generating too much noise.
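Several of these metrics fall out of closed-alert records with a few lines of code. The sketch below computes precision and median MTTD from a list of alert dispositions; the record fields (`disposition`, `compromise_time`, `alert_time`) are assumptions about what your case-management system exports, not a standard schema.

```python
from datetime import datetime
from statistics import median

# Hypothetical closed-alert export for one detection.
alerts = [
    {"disposition": "true_positive",
     "compromise_time": datetime(2024, 5, 1, 9, 0),
     "alert_time": datetime(2024, 5, 1, 13, 0)},
    {"disposition": "false_positive",
     "compromise_time": None,
     "alert_time": datetime(2024, 5, 1, 14, 0)},
    {"disposition": "true_positive",
     "compromise_time": datetime(2024, 5, 2, 8, 0),
     "alert_time": datetime(2024, 5, 2, 10, 0)},
]

tps = [a for a in alerts if a["disposition"] == "true_positive"]
precision = len(tps) / len(alerts)

# MTTD as median hours from compromise to alert, across true positives only.
mttd_hours = median(
    (a["alert_time"] - a["compromise_time"]).total_seconds() / 3600 for a in tps
)

print(f"precision={precision:.0%} mttd={mttd_hours:.1f}h")
```

Run per detection and per technique, these numbers are what drive the tune/demote/retire decisions described above.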
The Detection Development Lifecycle
Mature detection engineering follows a structured lifecycle that ensures quality, consistency, and continuous improvement. Each detection progresses through defined stages from initial concept to production operation.
Start with a threat hypothesis: "An attacker who has compromised a user workstation will attempt to escalate privileges by exploiting a misconfigured service." The hypothesis should be grounded in threat intelligence, incident history, or ATT&CK coverage gap analysis.
Study the attack technique in depth. How does it work technically? What artifacts does it leave? What log sources capture those artifacts? What legitimate activity looks similar? Read threat intelligence reports, academic research, tool documentation, and existing detection content from the community.
Write the detection query. Start with high-fidelity, low-volume logic — it is always easier to broaden coverage than to narrow it. Document the data source requirements, expected false positive sources, and the MITRE ATT&CK mapping.
Implement the detection in your SIEM or detection platform. Write the structured metadata file, the runbook, and the test cases. Submit a pull request for peer review.
Validate with unit tests (sample log events), replay tests (run against historical logs), and purple team tests (execute the actual technique in a controlled environment). Measure initial precision and false positive rate.
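The unit-test step can be as simple as plain assertions against recorded sample events. In the sketch below, both the detection predicate and the sample command lines are hypothetical stand-ins for your real detection logic and your stored true-positive and true-negative log samples.

```python
# Illustrative command-line detection for credential dumping indicators.
SUSPICIOUS_PATTERNS = ("sekurlsa::logonpasswords", "-ma lsass", "lsass.dmp")

def detects(command_line: str) -> bool:
    """True if the command line matches a known credential-dumping pattern."""
    cmd = command_line.lower()
    return any(p in cmd for p in SUSPICIOUS_PATTERNS)

# True-positive sample: procdump used to dump LSASS memory.
true_positive = r"procdump64.exe -accepteula -ma lsass.exe c:\temp\out.dmp"
# True-negative sample: benign admin activity that merely mentions lsass.
true_negative = r"powershell.exe -Command Get-Process lsass"

assert detects(true_positive)
assert not detects(true_negative)
```

Wiring these assertions into the CI pipeline means a detection cannot be merged if a logic change breaks its known true-positive cases.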
Deploy to production through CI/CD. Start in "observation mode" (alert logged but not routed to analysts) for a burn-in period to assess real-world performance before promoting to active alerting.
Continuously monitor detection performance metrics. Tune logic to reduce false positives, update for environment changes, and adjust severity based on observed patterns. Schedule periodic reviews — at minimum, every detection should be reviewed quarterly.
Maturity Model
Detection engineering programs mature through predictable stages. Understanding where your organization falls on this spectrum helps set realistic goals and identify the highest-impact improvements for your current state.
Level 1: Reactive
The organization relies on vendor-provided default detection content and creates ad hoc rules in response to specific incidents or audit findings. There is no detection inventory, no metrics, no testing process, and no formal ownership of detection rules. Rules accumulate over time and are rarely tuned or retired. Alert fatigue is endemic. When analysts ask "what can we detect?", no one can answer with confidence.
Level 2: Procedural
Detection rules are documented, and there is a process (even if informal) for creating and deploying new detections. An inventory of active detections exists. Basic metrics are tracked — alert volume, false positive rate for high-volume rules. Someone is responsible for detection content, even if it is not their full-time role. Rules are reviewed occasionally, though not on a regular schedule. The team can identify their highest-volume false positive sources and has begun addressing them.
Level 3: Proactive
Detection-as-code is the standard practice. All detections are stored in version control, reviewed by peers, and deployed through CI/CD. Purple team exercises regularly validate detection coverage and measure recall. ATT&CK coverage mapping informs detection development priorities. Detection quality metrics (precision, recall, MTTD) are tracked per detection and reviewed regularly. A detection development lifecycle is followed consistently. The team proactively builds detections based on threat intelligence and coverage gaps rather than waiting for incidents.
Level 4: Optimized
Detection testing is automated and continuous — new detections are automatically validated against simulated attack data before deployment. Threat intelligence is operationalized directly into the detection pipeline. Detection performance data feeds back into development priorities automatically. The program can quantitatively demonstrate its value — "we detect X% of techniques used by our top threat actors with a median MTTD of Y hours and a precision rate of Z%." Detection engineering is a recognized specialization with dedicated staffing and career development.
Tools and Frameworks
You do not need to build everything from scratch. The detection engineering community has produced a wealth of open-source tools and frameworks that accelerate program development.
- SIGMA: A generic, open, and portable signature format for SIEM systems. SIGMA rules describe detection logic in a YAML format that can be automatically converted (using `sigma-cli` or `pySigma`) into queries for Splunk, Elastic, Microsoft Sentinel, QRadar, and dozens of other platforms. The SigmaHQ repository contains thousands of community-contributed detection rules mapped to MITRE ATT&CK. Starting with SIGMA rules and customizing them for your environment is significantly faster than building from scratch.
- MITRE ATT&CK Navigator: A web-based tool for visualizing ATT&CK coverage. Overlay your detection coverage, your threat actor targeting, and your visibility gaps on the ATT&CK matrix to identify priorities. Export and share coverage maps with stakeholders.
- Atomic Red Team: A library of small, portable tests mapped to MITRE ATT&CK techniques. Each "atomic test" executes a specific attack technique, generating the telemetry your detections should catch. Essential for testing detection logic and measuring recall.
- Elastic Detection Rules and Splunk Security Content: Vendor-curated detection rule repositories that are openly available. Even if you do not use these platforms, the detection logic and documentation provide valuable reference material for building your own rules on other platforms.
- Detection-as-code frameworks: Tools like `detection-rules` (Elastic), `contentctl` (Splunk), and custom frameworks built on SIGMA provide the scaffolding for managing detections as code — linting, validation, testing, and deployment automation.
"The goal of detection engineering is not to write more rules. It is to build a system that reliably converts threat intelligence into operational detection capability, and that can prove it works."
Building a detection engineering program is a multi-year journey, not a single project. Start where you are: if you are at Level 1, focus on inventorying your existing detections and tracking basic metrics. If you are at Level 2, begin adopting detection-as-code practices and conducting your first purple team exercises. Each step forward reduces alert fatigue, improves threat detection, and gives your security team the confidence that comes from knowing — not hoping — that their defenses work.