
When Your Detection Fails… But It’s Actually Correct - Building a Fault-Tolerant SOC API

When Your Detection Fails… But It’s Actually Correct

I built a brute-force detection system based on Windows Event ID 4625 (failed logins).

Everything looked correct:

✔ Query worked
✔ Elasticsearch returned data
✔ API responded

But…

📉 My “Top Attacker IPs” endpoint returned: []

At first, I thought my code was broken.

It wasn’t.

The real issue: all logs had "IpAddress": "-"

👉 Meaning: Windows recorded no source network address for these events, which typically points to:

  • Local login attempts

  • No external attacker

  • No real IP to aggregate


Lesson: Security detection is not just about queries; it's about understanding the semantics of logs.

✔ 4625 = failed login
✔ IpAddress "-" = no source address recorded (typically local activity)
✔ No IP = no remote attacker to aggregate
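One way to make that semantic explicit is to exclude placeholder addresses in the query itself, so an empty result means "no remote sources" rather than "broken code". The sketch below only builds the Elasticsearch request body. The field names (`event.code`, `winlog.event_data.IpAddress`) assume a Winlogbeat-style mapping, and the aggregation name `top_ips` and the function name are hypothetical, so adjust them to your index.

```javascript
// Field names assume a Winlogbeat-style mapping; they are not taken from the post.
const EVENT_CODE_FIELD = "event.code";
const SOURCE_IP_FIELD = "winlog.event_data.IpAddress";

// Values that do not identify a remote source.
const NON_REMOTE_IPS = ["-", "127.0.0.1", "::1"];

// Builds a request body for "top source IPs of failed logins (4625)".
function buildTopAttackerQuery(size = 10) {
  return {
    size: 0, // only the aggregation is needed, not the raw hits
    query: {
      bool: {
        filter: [{ term: { [EVENT_CODE_FIELD]: "4625" } }],
        must_not: [{ terms: { [SOURCE_IP_FIELD]: NON_REMOTE_IPS } }],
      },
    },
    aggs: {
      top_ips: { terms: { field: SOURCE_IP_FIELD, size } },
    },
  };
}
```

If this query returns no buckets while a plain count of 4625 events is non-zero, the failures are most likely local rather than missing.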

Fix:

  • Generate real remote login attempts

  • Or adjust detection logic for local anomalies
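The second option can be sketched as follows: when there is no usable IP, group failed logins by the targeted account instead. This is an illustration under stated assumptions, not the detection logic from this project. The event shape `{ ip, user }`, the function name, and the default threshold are all hypothetical.

```javascript
// Hypothetical event shape: { ip: string, user: string }.
function summarizeFailedLogons(events, threshold = 5) {
  const remoteByIp = new Map();
  const localByUser = new Map();

  for (const { ip, user } of events) {
    // Treat missing, placeholder, and loopback addresses as local activity.
    const isLocal = !ip || ip === "-" || ip === "127.0.0.1" || ip === "::1";
    const map = isLocal ? localByUser : remoteByIp;
    const key = isLocal ? (user ?? "unknown") : ip;
    map.set(key, (map.get(key) ?? 0) + 1);
  }

  // Keep only keys at or above the threshold, highest count first.
  const overThreshold = (map) =>
    [...map]
      .filter(([, count]) => count >= threshold)
      .map(([key, count]) => ({ key, count }))
      .sort((a, b) => b.count - a.count);

  return {
    remote: overThreshold(remoteByIp),
    local: overThreshold(localByUser),
  };
}
```

With this split, a burst of local failures against one account still surfaces even though the IP-based list stays empty.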

Key takeaway: Your system is only as smart as the data it sees.


🔥 Building a Fault-Tolerant SOC API

One of the biggest mistakes I made early on:

Returning HTTP 500 whenever Elasticsearch failed.

Bad idea.

In a real SOC system:

👉 Data sources WILL fail
👉 Logs may be delayed
👉 Queries may break

So I redesigned my API to be fault-tolerant:

Instead of: ❌ res.status(500).send(err.message)

I now use: ✅ res.json([])
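A minimal sketch of that pattern as an Express-style handler. `makeTopAttackersHandler` and `search` are hypothetical names. `search` stands in for whatever async function runs the Elasticsearch query and is assumed to resolve to an array of rows. The error is still logged on the server so the failure stays visible to whoever operates the API.

```javascript
// `search` is a hypothetical async function that runs the query
// and resolves to an array such as [{ ip, count }].
function makeTopAttackersHandler(search) {
  return async function topAttackersHandler(req, res) {
    try {
      const rows = await search();
      // Guard against an unexpected shape as well as a thrown error.
      res.json(Array.isArray(rows) ? rows : []);
    } catch (err) {
      // Log server-side; the client receives an empty list instead of a 500.
      console.error("top-attackers query failed:", err.message);
      res.json([]);
    }
  };
}
```

With Express, this could be mounted as `app.get("/api/top-attackers", makeTopAttackersHandler(runQuery))`, where the path and `runQuery` are placeholders.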


Why this matters:

  • Frontend never crashes

  • Dashboard stays responsive

  • System degrades gracefully


Other improvements:

  • Optional chaining (?.) for ES responses

  • Safe fallbacks for aggregations

  • Isolated routes (each route runs its own query inside its handler, with no async calls at module load)
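The first two improvements can be sketched together. The example assumes the response body of a terms aggregation named `top_ips` (a hypothetical name). `extractTopIps` is also a hypothetical helper, not a function from the Elasticsearch client.

```javascript
// Safely turns an Elasticsearch response body into [{ ip, count }].
// Any missing level (no aggregations, no top_ips, no buckets) yields [].
function extractTopIps(response) {
  const buckets = response?.aggregations?.top_ips?.buckets;
  if (!Array.isArray(buckets)) return [];
  return buckets.map((bucket) => ({
    ip: String(bucket?.key ?? ""),
    count: bucket?.doc_count ?? 0,
  }));
}
```

Because every level is optional, a timed-out or partial response produces an empty list instead of a `TypeError` inside the route.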

Lesson: Production systems must survive failures — not expose them.



 
 
 
