Bridging Detection Engineering, SOPs, and Automation

Framework for Cyber Threat Detection and Security Automation

Building upon the foundational insights introduced in our exploration of the Security Automation Development Lifecycle (SADLC) (read here), this article ventures further into the critical interplay between Detection Engineering and Automation/Hyperautomation. The fusion of these components not only optimises operational efficiency but also plays a pivotal role in fortifying our defenses against the evolving landscape of cyber threats. By combining Detection Engineering with Automation Use-cases, we aim to significantly reduce the risk of failures in security automation programs, ensuring a robust, streamlined, and proactive security posture.

Taking a step back to better understand where our process fits, I will refer to an article that Sohan G wrote [link].

There are three main automation levers placed between the first four stages of the Blue Team Funnel; if deployed and maintained, they can help reduce the time spent across the funnel from left to right. They also address alert fatigue in the SOC, one of the most discussed concerns clogging the funnel. Time invested in the early stages of the funnel speeds up the response (detection, mitigation, and remediation) to risks from adversaries, insiders, and other factors alike.

1st automation lever — This sits between the Collection and Detection stages and covers automations in the log management lifecycle. For example, automating the ingestion of new data sources by applying standard correlation/normalisation of fields across the various sources, plus possible log enrichments.
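To make this concrete, here is a minimal sketch of field normalisation at ingestion time, in Python. The source names, vendor field names, and the target schema are illustrative assumptions, not any specific product's log format.

```python
from typing import Any, Dict

# Per-source mapping from vendor-specific field names to a common schema
# (both the sources and the schema here are assumed for illustration).
FIELD_MAPS = {
    "firewall_x": {"src": "source.ip", "dst": "destination.ip", "act": "event.action"},
    "proxy_y": {"clientip": "source.ip", "targetip": "destination.ip", "decision": "event.action"},
}

def normalise(source: str, raw_event: Dict[str, Any]) -> Dict[str, Any]:
    """Rename vendor fields to the common schema and tag provenance."""
    mapping = FIELD_MAPS.get(source, {})
    event = {mapping.get(key, key): value for key, value in raw_event.items()}
    event["event.source"] = source  # simple enrichment: record where it came from
    return event

print(normalise("proxy_y", {"clientip": "10.0.0.5", "decision": "blocked"}))
# -> {'source.ip': '10.0.0.5', 'event.action': 'blocked', 'event.source': 'proxy_y'}
```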

2nd automation lever — This sits between the Detection and Triage stages and covers automations within a detection-as-code CI/CD pipeline: rule sanity checks for the fields, metadata tagging, testing, simulations, and emulations. It also covers automated triage of alerts as part of the response workflow. For example, end-user confirmations, process tree correlations, software installation checks, code signing checks, reputation checks, severity increments, etc.
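For instance, a rule sanity check in such a pipeline might look like the following sketch; the rule structure, required fields, and severity values are assumptions for illustration rather than any particular SIEM's format.

```python
# Run by a CI job against every rule file before it can be merged.
REQUIRED_FIELDS = {"name", "query", "severity", "mitre_technique", "owner"}
VALID_SEVERITIES = {"low", "medium", "high", "critical"}

def check_rule(rule: dict) -> list:
    """Return a list of problems; an empty list means the rule passes."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - rule.keys())]
    if rule.get("severity") not in VALID_SEVERITIES:
        problems.append(f"invalid severity: {rule.get('severity')!r}")
    if not str(rule.get("mitre_technique", "")).startswith("T"):
        problems.append("mitre_technique should be an ATT&CK ID like T1059")
    return problems

rule = {"name": "Suspicious PowerShell", "query": "...", "severity": "hgih",
        "mitre_technique": "T1059.001", "owner": "detection-eng"}
print(check_rule(rule))  # -> ["invalid severity: 'hgih'"]
```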

3rd automation lever — This sits between the Triage and Investigation stages and covers automations for collecting artifacts and supporting information as part of the DFIR workflow. For example, network connections, running processes, host and owner details, websites visited, file transfers (SaaS apps, external devices), UEBA scores, etc.
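A sketch of what such collection could look like follows, with a stand-in client so it runs without a real API; `FakeEDR` and its methods are hypothetical, not a real product's SDK.

```python
from dataclasses import dataclass, field

@dataclass
class TriagePackage:
    """Everything an analyst would otherwise pull by hand for one host."""
    host: str
    artifacts: dict = field(default_factory=dict)

def collect_artifacts(edr, host: str) -> TriagePackage:
    pkg = TriagePackage(host=host)
    pkg.artifacts["connections"] = edr.network_connections(host)  # hypothetical call
    pkg.artifacts["processes"] = edr.running_processes(host)      # hypothetical call
    pkg.artifacts["owner"] = edr.host_owner(host)                 # hypothetical call
    return pkg

class FakeEDR:  # stand-in so the sketch runs end to end
    def network_connections(self, host): return [{"dst": "203.0.113.7", "port": 443}]
    def running_processes(self, host): return [{"pid": 4242, "name": "powershell.exe"}]
    def host_owner(self, host): return {"user": "jdoe", "dept": "finance"}

print(collect_artifacts(FakeEDR(), "WS-0042").artifacts)
```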

In this article, we deep-dive into the 3rd automation lever.

Detection Engineering, IR, and Automation

Detection Engineering, the craft of creating tailored detection rules, stands as the bedrock of robust security postures. These rules, meticulously deployed across Security Information and Event Management (SIEM) systems and directly within security solutions such as Endpoint Detection and Response (EDR), firewalls, and proxies, are indispensable for surpassing the constraints of default configurations. Far from being mere enhancements, custom detections are vital, arming the SOC with the preemptive power to neutralise advanced threats. This process, inherently dynamic, either stems from a specialised Detection Engineering team or integrates within broader SOC or security engineering functions, forming the foundational layer of all security alerts.

Following the establishment of detection rules, the formulation of Operating Procedures outlines the actionable steps for analysts and responders. Ranging from the initial triage of alerts to the intricate orchestration of incident response, these procedures are crucial for effective threat management.

The pinnacle of innovation, however, is achieved through the integration of automation/hyperautomation into these established frameworks. By embedding automation at the initial stages of both detection rule development and the crafting of operating procedures, organizations can forge a seamless link between detection and automated responses. This strategy encompasses everything from straightforward data enrichment to the holistic management of incident resolution, ensuring a direct correspondence that enhances both efficiency and effectiveness.

To evaluate the efficacy of this integrated process, employing threat or attack simulation emerges as the optimal strategy. Such simulations expediently ascertain the reliability of detection rules, the comprehensiveness of operating procedures, and the precision of automated responses.

Regarding metrics, our recommendation is to collect and analyze data at each juncture of the process, thereby facilitating a clear demonstration of ROI for each implemented program:

  • Detection Engineering: Metrics such as the true positive and false positive rates illuminate the effectiveness of detection rules. Additionally, the alignment of these rules with frameworks like MITRE ATT&CK, or their success in countering specific threat profiles targeting your organisation, can be quantitatively assessed (a small sketch of computing such metrics follows this list).

  • Operating Procedures: The efficiency of these procedures can be measured, offering insights into the balance between manual and automated processes within your security operations.

  • Automation and Orchestration: This aspect allows for the straightforward demonstration of how many alerts are processed automatically, the speed of alert processing and response, and the net impact on analyst workload—either through time saved or additional time required for oversight.

  • Threat Simulation: Effectiveness metrics can reveal how many attacks were detected initially, the number of iterations required before deeming a threat simulation successful, and the overall robustness of your security tools against simulated adversaries.
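As promised in the Detection Engineering bullet above, here is a minimal sketch of computing per-rule effectiveness metrics from alert dispositions. The disposition labels, alert records, and the "automated" flag are assumptions about what a case management export might contain.

```python
from collections import Counter

alerts = [  # e.g. exported from the case management tool (illustrative data)
    {"rule": "Suspicious PowerShell", "disposition": "true_positive", "automated": True},
    {"rule": "Suspicious PowerShell", "disposition": "false_positive", "automated": True},
    {"rule": "Impossible Travel", "disposition": "true_positive", "automated": False},
]

def rule_metrics(alerts: list, rule: str) -> dict:
    subset = [a for a in alerts if a["rule"] == rule]
    if not subset:
        return {}
    counts = Counter(a["disposition"] for a in subset)
    return {
        "true_positive_rate": counts["true_positive"] / len(subset),
        "false_positive_rate": counts["false_positive"] / len(subset),
        "automation_coverage": sum(a["automated"] for a in subset) / len(subset),
    }

print(rule_metrics(alerts, "Suspicious PowerShell"))
# -> {'true_positive_rate': 0.5, 'false_positive_rate': 0.5, 'automation_coverage': 1.0}
```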

By adopting a metrics-driven approach, organisations can not only justify their investments in each component of their cybersecurity framework but also continuously refine their strategies for maximum impact.

Incorporating these metrics does more than merely illuminate the efficacy of individual components within the cybersecurity framework; it catalyses the formation of a feedback loop that is instrumental for continuous improvement. By systematically analysing these metrics, organisations can gain invaluable insights into the performance of their detection engineering efforts, the efficiency of their operating procedures, and the effectiveness of their automation and orchestration initiatives. This data-driven feedback loop allows for the iterative refinement of processes, ensuring that each element of the cybersecurity strategy not only meets the current threat landscape but also adapts proactively to future challenges.

This cyclical process of evaluation and enhancement—grounded in the rigorous application of metrics—ensures that the cybersecurity framework remains dynamic, resilient, and aligned with evolving organisational needs and external threats. Ultimately, it transforms the cybersecurity operations from a static defense mechanism into a continuously evolving ecosystem, characterised by its ability to learn from past encounters and anticipate future vulnerabilities.

SADLC Process

Let's talk about how we can bring together Detection Engineering, SecOps procedures, and automation and orchestration. We'll walk through four main steps: preparation, development, testing, and production. This way, we make sure everything in our cybersecurity playbook works well together, from start to finish.

1. Preparation

  • Detection Use-Case Analysis: Begin with a comprehensive analysis of potential detection scenarios, incorporating real-time threat intelligence to inform detection rule creation. Evaluate where detections will be deployed (e.g., SIEM, EDR) and ensure data sufficiency and relevance. This step is critical for identifying how to cover specific malware, techniques, and threat actors. Frameworks like MITRE ATT&CK help you better understand the context and prioritise accordingly (see the prioritisation sketch after this list).

  • Operating Procedure Backlog Creation: Simultaneously, plan for Operating Procedures that will be triggered by detection alerts. Assess if new detections align with existing procedures or necessitate new SOPs for emerging threats.

  • Automation Feasibility Assessment: Evaluate existing automation for potential reuse and check integration capabilities with necessary tools. This ensures readiness for SOP and automation development post-detection use-case analysis. If possible, run preliminary atomic testing to confirm your initial assumptions. Just because something seems like it could be reusable, that doesn’t mean it necessarily applies to the new scenario. If something doesn’t work, note what it would need, so you can set it as a requirement in the Development phase.
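Here is the prioritisation sketch referenced above: one simple way to rank a detection backlog by ATT&CK context. The actor-usage counts and coverage scores are invented placeholders; in practice they would come from threat intelligence and your rule inventory.

```python
backlog = ["T1059.001", "T1566.001", "T1021.001"]

# How many tracked threat actors relevant to you use each technique (assumed).
actor_usage = {"T1059.001": 7, "T1566.001": 9, "T1021.001": 3}

# Existing detection coverage per technique, 0.0 (none) to 1.0 (full) (assumed).
coverage = {"T1059.001": 0.6, "T1566.001": 0.1, "T1021.001": 0.4}

def priority(technique: str) -> float:
    """Higher when heavily used by relevant actors and poorly covered today."""
    return actor_usage.get(technique, 0) * (1 - coverage.get(technique, 0.0))

for technique in sorted(backlog, key=priority, reverse=True):
    print(technique, round(priority(technique), 2))
# -> T1566.001 8.1, T1059.001 2.8, T1021.001 1.8
```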

2. Development

  • Detection Use-Case Development: Craft detection rules based on the preparatory analysis, ensuring they're scalable to handle future growth and complexity. Engage cross-functional teams for a comprehensive approach that takes into consideration the business and compliance impacts.

  • SOP Development: Start drafting SOPs aligned with detection rules, emphasizing collaboration across departments to ensure operational consistency and compliance. This phase should account for scalability and flexibility to adapt to evolving organizational needs.

  • Automation Scripting: Develop automation workflows that enhance and streamline SOP execution. Include orchestration strategies to ensure seamless integration and communication among the various security tools, enhancing the coordinated response capability. Going back to the previous point, incorporate the workflows you flagged for reuse, and keep reusability in mind as you create new workflows, actions, and steps; as the Automation Feasibility Assessment showed, they might come in handy next time around.
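One pattern that keeps steps reusable is registering each one by name and describing a workflow as an ordered list of step names. The sketch below illustrates the idea; the step names and alert shape are assumed for the example.

```python
STEPS = {}

def step(name: str):
    """Decorator that registers a function as a reusable workflow step."""
    def register(fn):
        STEPS[name] = fn
        return fn
    return register

@step("reputation_check")
def reputation_check(alert: dict) -> dict:
    alert["reputation"] = "unknown"  # placeholder for a real threat intel lookup
    return alert

@step("severity_bump")
def severity_bump(alert: dict) -> dict:
    if alert.get("reputation") == "malicious":
        alert["severity"] = "high"
    return alert

def run_workflow(alert: dict, step_names: list) -> dict:
    """A workflow is just an ordered list of registered step names."""
    for name in step_names:
        alert = STEPS[name](alert)
    return alert

print(run_workflow({"severity": "medium"}, ["reputation_check", "severity_bump"]))
```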

3. Testing

  • Detection Use-Case Validation: Test detection rules against both simulated and real-world attack scenarios, using feedback to refine accuracy and reduce false positives. Engage in purple teaming to ensure comprehensive validation.

  • SOP Effectiveness Evaluation: Rigorously test SOPs to ensure comprehensive coverage and actionability. Use feedback from this testing phase to refine procedures, ensuring they are effective and efficient in managing alerts.

  • Automation Workflow Testing: Validate automation scripts, ensuring they process inputs and outputs accurately and integrate smoothly with detection and response processes. Test for scalability and flexibility to adapt to changing threat landscapes and organizational growth.
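A minimal sketch of what such a test could look like for a single enrichment step, exercising both the happy path and a malformed input; the step and alert fields are assumptions for illustration.

```python
def enrich_with_geo(alert: dict) -> dict:
    """Toy enrichment step; a real one would call a geo-IP service."""
    if "source.ip" not in alert:
        raise ValueError("alert missing source.ip")  # fail loudly, not silently
    alert["geo"] = "placeholder"
    return alert

def test_enrich_adds_geo():
    assert enrich_with_geo({"source.ip": "198.51.100.1"})["geo"] is not None

def test_enrich_rejects_malformed_alert():
    try:
        enrich_with_geo({})  # simulates a third party changing its payload
    except ValueError:
        pass
    else:
        raise AssertionError("malformed alert should be rejected")

test_enrich_adds_geo()
test_enrich_rejects_malformed_alert()
print("workflow tests passed")
```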

4. Production

  • Operationalising Detection Use-Cases: Roll out the validated detection rules into the live environment to commence alert generation. It’s crucial to continuously monitor these rules against established Key Performance Indicators (KPIs), such as detection latency and false positive rates, to ensure they operate at peak efficiency.

  • Automation Deployment: Launch the automation workflows, which are designed to work in concert with the detection rules, enhancing the overall responsiveness of the cybersecurity framework. While an individual workflow may not test the limits of your API capabilities, the cumulative effect of multiple, less optimised flows can. Vigilant monitoring and refinement of these workflows are essential, using KPIs such as response times and the efficacy of incident resolution to gauge their performance. Additionally, remain alert to errors arising from unforeseen scenarios, such as changes in data formatting by third-party services, ensuring the system's resilience against external modifications (a retry-with-backoff sketch follows this list).

  • Finalising SOPs: Conclude the development phase of the Standard Operating Procedures (SOPs), implementing them within the operational context. These SOPs should be designed with flexibility in mind, allowing for adjustments informed by ongoing feedback and the insights gained from the continuous improvement process. Metrics should be established to evaluate the effectiveness of these SOPs, focusing on adherence to the procedures and the efficiency of incident resolution times. This approach ensures that the SOPs not only guide immediate response actions but also evolve in alignment with the dynamic landscape of cybersecurity threats and organisational practices.
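Here is the retry-with-backoff sketch referenced in the Automation Deployment point: one common way to make many concurrent workflows degrade gracefully when a shared API limit is hit. `RateLimitError` and the flaky call are illustrative stand-ins for whatever your client library raises on HTTP 429.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for whatever your HTTP client raises on a 429 response."""

def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry a callable with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # jitter avoids many workflows retrying in lockstep
            time.sleep(base_delay * (2 ** attempt) + random.random())

attempts = {"n": 0}
def flaky_api_call():  # succeeds on the third try, for demonstration
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("HTTP 429")
    return {"status": "ok"}

print(with_backoff(flaky_api_call, base_delay=0.01))  # -> {'status': 'ok'}
```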

Continuous Improvement

After deployment, establish a continuous feedback loop to gather data on the effectiveness of each component. Regularly review performance metrics and KPIs to inform iterative improvements across detection rules, SOPs, and automation workflows.

Engage cross-functional teams in this process to ensure alignment with evolving business needs and compliance requirements, thereby fostering a culture of continuous adaptation and improvement in cybersecurity operations.

Create processes that encourage continuous improvement, and announce the changes you make, both to provide visibility (sometimes you might be the third party making unannounced changes to a group you've missed) and to highlight wins.

Benefits

The proposed unified process not only bridges gaps between detection and response but also provides quantifiable metrics for evaluating the performance and ROI of detection use-cases and automation. By minimising manual intervention, organisations can precisely measure the efficiency of their security operations, reduce the incidence of false positives, and ensure that SOC procedures and automations are complementary rather than redundant.

Ultimately, this framework is about maximising the strategic value of every aspect of SOC operations, from alert generation to incident resolution.
