# SOC Triage Methodology

### 1. Purpose

The purpose of triage is to rapidly determine **severity**, **business impact**, and **required action** for security alerts while maintaining **consistency**, **audit defensibility**, and **analyst sustainability**.

In a security function, triage must:

* Separate real threats from noise efficiently
* Enable confident decision-making without peer validation
* Minimize unnecessary investigation
* Prevent alert fatigue and burnout
* Produce repeatable, defensible documentation

***

### 2. Operating Reality & Constraints

#### Assumptions

* The analyst owns:
  * Initial triage
  * Deeper investigation
  * Escalation and containment decisions
  * Documentation and audit readiness

#### Guiding Principle

> Sustainability is a security control.\
> Triage decisions must be **efficient, evidence-based, and repeatable**.

***

### 3. Analyst Mindset

#### Efficiency & Decision Discipline

* Efficient, not rushed
* Decision-oriented
* Calm under pressure

#### Analytical & Context-Aware

* Curious, not complacent
* Context-driven
* Pattern-oriented
* Skeptical, not paranoid

#### Clear & Consistent Communication

* Consistent documentation
* Clear justification for decisions
* Willingness to pause and gather evidence when uncertain

***

### 4. Anatomy of an Alert (Minimum Required Fields)

Before taking action, capture the following **minimum fields**.

| Field                 | Purpose                                  |
| --------------------- | ---------------------------------------- |
| Alert / Rule Name     | Understand detection category            |
| Detection Logic       | Assess fidelity and intent               |
| Timestamp             | Anchor timeline                          |
| Username              | Identify privilege and baseline behavior |
| Hostname              | Determine asset criticality              |
| Process Name / ID     | Execution context                        |
| Command Line          | Primary indicator of intent              |
| File Path             | Validate legitimacy and LOLBIN (living-off-the-land binary) usage |
| Hash (if available)   | Threat intelligence correlation          |
| Domain                | Potential C2 or exfiltration             |
| Source IP / Port      | Direction and origin                     |
| Destination IP / Port | Data flow and exposure                   |
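
As an illustration, the minimum fields can be modeled as a structured record so nothing is silently skipped during capture. This is a minimal Python sketch; the `AlertRecord` class and its field names are illustrative, not tied to any particular SIEM schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertRecord:
    """Minimum required fields captured before any triage action."""
    rule_name: str                 # Understand detection category
    detection_logic: str           # Assess fidelity and intent
    timestamp: str                 # Anchor timeline (e.g. ISO 8601)
    username: Optional[str] = None
    hostname: Optional[str] = None
    process_name: Optional[str] = None
    command_line: Optional[str] = None
    file_path: Optional[str] = None
    file_hash: Optional[str] = None
    domain: Optional[str] = None
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None

    def missing_core_fields(self) -> list:
        """Core fields needed before a deep dive (per the Stop-Loss Rule)."""
        core = ("username", "hostname", "process_name", "command_line")
        return [name for name in core if getattr(self, name) is None]
```

Optional fields default to `None` so a partially populated alert is still representable, and the gaps stay visible for the stop-loss check.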

***

### 5. Standard Triage Questions (Ask Every Time)

1. Is this normal for the user or host?
2. Does the command, domain, or behavior look suspicious?
3. Have I seen this pattern before?
4. Do I have enough context to justify action?
5. What logs or tools can quickly validate this?
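
These five questions can be encoded as a fixed record so every alert gets the same pass. The sketch below is illustrative; `TriageQuestions` and its field names are assumptions, not an existing tool.

```python
from dataclasses import dataclass

@dataclass
class TriageQuestions:
    """Answers to the five standard triage questions (illustrative names)."""
    normal_for_user_or_host: bool   # Q1: is this normal for the user/host?
    looks_suspicious: bool          # Q2: suspicious command/domain/behavior?
    seen_before: bool               # Q3: known pattern?
    enough_context: bool            # Q4: enough context to justify action?
    validation_sources: list        # Q5: logs/tools that can quickly validate

    def ready_to_decide(self) -> bool:
        # A defensible decision needs context plus at least one way to validate.
        return self.enough_context and bool(self.validation_sources)
```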

***

### 6. End-to-End Triage Workflow

#### Step 1: Review the Alert

**Objective:** Identify what triggered the alert and extract high-signal indicators.

Actions:

* Read the alert carefully (do not skim)
* Capture severity, rule name, detection logic, and trigger artifact
* Identify the “trigger nucleus” (command line, domain, file, or process)
* Run the Standard Triage Questions

**Stop-Loss Rule**

If core fields (user, host, process, command line) are missing:

* Do not deep dive
* Document insufficient telemetry
* Create a tuning or telemetry improvement action item
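
The Stop-Loss Rule is mechanical enough to sketch in code. Assuming alerts arrive as plain dictionaries, the `apply_stop_loss` helper below (a hypothetical name) returns a tuning/telemetry action item instead of permitting a deep dive when core fields are missing.

```python
from typing import Optional

# Core fields per the Stop-Loss Rule; key names are assumptions.
CORE_FIELDS = ("user", "host", "process", "command_line")

def apply_stop_loss(alert: dict) -> Optional[dict]:
    """Return a telemetry action item if core fields are missing, else None."""
    missing = [f for f in CORE_FIELDS if not alert.get(f)]
    if missing:
        return {
            "action": "tuning/telemetry improvement",
            "reason": "insufficient telemetry",
            "missing_fields": missing,
        }
    return None  # Telemetry is complete; proceed with triage.
```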

***

#### Step 2: Gather Context

**Objective:** Add only the context required to make a defensible decision.

Context Sources:

* User role and historical behavior
* Asset role and criticality
* Threat intelligence (only if it changes a decision)
* Relevant logs and prior alerts
* Internal documentation or previous cases

**Efficiency Guardrails**

* Focus on key fields
* Avoid excessive tool switching
* Correlate once, not repeatedly

***

#### Step 3: Make a Decision

Decision options:

* **Escalate / Contain**
* **Investigate Further**
* **Close**

Default behavior:

* If uncertain, choose **Investigate Further** or review with a peer analyst
* Apply a **time-box** to prevent over-investigation
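
One way to enforce the time-box is a simple deadline object that each investigation step checks before proceeding. This is an illustrative sketch; the 15-minute default is an assumption, not a prescribed SOC value.

```python
import time

class TimeBox:
    """Monotonic deadline for a time-boxed investigation (illustrative)."""
    def __init__(self, minutes: float = 15.0):
        self.deadline = time.monotonic() + minutes * 60

    def expired(self) -> bool:
        return time.monotonic() >= self.deadline

def investigate(steps, box: TimeBox) -> list:
    """Run investigation steps until done or the time budget runs out."""
    completed = []
    for step in steps:
        if box.expired():
            break  # Defer: document next steps and re-queue the case.
        completed.append(step())
    return completed
```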

***

### 7. Escalation vs Closure Criteria

#### Escalate / Contain When Any Apply

* Confirmed malicious behavior
* Known attack patterns
* Abnormal behavior for user or host
* Connections to suspicious or malicious IPs/domains
* Unauthorized tooling
* Activity on critical systems
* Persistence or backdoor indicators
* Containment is required

#### Close When All Apply

* Expected business behavior
* Known benign pattern
* Covered by existing allowlist or tuning
* Triggered by internal tools (scanner, admin tooling)
* No indicators of compromise
* Test or lab environment
* Duplicate alert already handled
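
The any/all asymmetry matters: a single escalation criterion is enough to escalate, while closure requires every closure criterion to hold, and anything in between defaults to further investigation. A minimal sketch of that gate, with flag lists standing in for the criteria above:

```python
def decide(escalation_flags: list, closure_flags: list) -> str:
    """Apply the escalate-on-any / close-on-all decision gate."""
    if any(escalation_flags):
        return "Escalate / Contain"
    if closure_flags and all(closure_flags):
        return "Close"
    return "Investigate Further"  # Safe default when evidence is mixed.
```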

***

### 8. Real Threat vs Noise

#### High-Signal Threat Indicators

* Abnormal user or host behavior
* Known hacking tools
* Multi-stage activity (execution → lateral movement → exfiltration)

#### Common Noise Patterns

* Scheduled administrative scripts
* RMM or IT tooling
* Vendor telemetry
* Vulnerability scanning

***

### 9. When You’re Unsure

#### Mandatory Self-Validation Checklist

Before escalating or closing:

* Is this abnormal?
* Does this pose business risk?
* Can I clearly explain why this is suspicious or benign?
* Is my decision evidence-based?
* Would my reasoning hold up to peer review?

#### Evidence Thresholds

When peer validation is unavailable, rely on:

* User and host baselining
* Historical alert review
* Correlation checks
* Time-boxed investigation
* SOC Manager's verification

#### Control

Do not escalate out of fear.\
Escalation must be **evidence-based**, not emotional.

***

### 10. Documentation Standard (Required)

#### Required Structure

**Summary**

* What triggered the alert
* Asset and user involved
* Why it matters

**Actions Taken**

* Logs reviewed
* Tools used
* Enrichment sources
* Evidence references (links/screenshots)

**Findings**

* Key observations
* Suspicious or benign indicators
* Relevant artifacts

**Decision**

* Escalated / Closed / Investigate Further
* Clear justification
* If deferred: next steps and time-box
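
For consistency, the required structure can be rendered from a single helper so every case note carries the same sections in the same order. The `render_case_notes` function below is a hypothetical sketch, not an existing tool.

```python
def render_case_notes(summary: list, actions_taken: list, findings: list,
                      decision: str, justification: str) -> str:
    """Render case notes in the required Markdown structure (illustrative)."""
    def bullets(items):
        return "\n".join(f"* {item}" for item in items)
    return (
        f"**Summary**\n{bullets(summary)}\n\n"
        f"**Actions Taken**\n{bullets(actions_taken)}\n\n"
        f"**Findings**\n{bullets(findings)}\n\n"
        f"**Decision**\n* {decision}\n* Justification: {justification}\n"
    )
```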

#### Documentation Rules

* Be specific
* Be concise
* Be consistent
* Include references

***

### 11. Common Failure Modes & Controls

| Failure Mode                       | Risk                       | Control                    |
| ---------------------------------- | -------------------------- | -------------------------- |
| Escalating without evidence        | Wasted time, loss of trust | Require evidence threshold |
| Over-investigating low-risk alerts | Burnout                    | Time-box investigations    |
| Weak documentation                 | Audit gaps                 | Enforce template           |
| Skipping context                   | False positives            | Mandatory baseline checks  |
| Treating all alerts equally        | Misallocated effort        | Prioritize critical assets |

***

### 12. Managing Alert Fatigue

#### Symptoms

* Skimming alerts
* Escalating “to be safe”
* Closing too quickly
* Burnout

#### Root Causes

* Poorly tuned detections
* Repetitive low-fidelity alerts
* Growing workload without automation
* Pressure to act on every alert

#### Mitigations

* Leverage prior documentation
* Use enrichment selectively
* Maintain a tuning backlog
* Take intentional breaks
* Trust the process

***

### 13. One-Pass Triage Checklist

* [ ] Capture minimum alert fields
* [ ] Answer standard triage questions
* [ ] Baseline user and host
* [ ] Review historical alerts
* [ ] Enrich trigger nucleus only
* [ ] Decide: Escalate / Investigate / Close
* [ ] Document using standard template
* [ ] Create tuning or telemetry action if needed

