When Your Detection Fails… But It’s Actually Correct: Building a Fault-Tolerant SOC API
- Maryam Ziaee
- 2 days ago
When Your Detection Fails… But It’s Actually Correct
I built a brute-force detection system based on Windows Event ID 4625 (failed logins).
Everything looked correct:
✔ Query worked
✔ Elasticsearch returned data
✔ API responded
But…
📉 My “Top Attacker IPs” endpoint returned: []
At first, I thought my code was broken.
It wasn’t.
The real issue: all logs had "IpAddress": "-"
👉 Meaning:
Local login attempts
No external attacker
No real IP to aggregate
Lesson: Security detection is not just about queries; it's about understanding the semantics of logs.
✔ 4625 = failed login
✔ IpAddress "-" = no source address recorded (typically local activity)
✔ No IP = no remote attacker to aggregate
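To make the log semantics concrete, here is a minimal sketch of the aggregation step in plain JavaScript. The flat event shape (`IpAddress`, `TargetUserName`) and the names `hasRemoteIp` and `topAttackerIps` are assumptions for illustration, not the actual pipeline. Treating the loopback addresses as local is also an assumption added here.

```javascript
// Values that carry no usable remote source address.
// "-" is what the 4625 events in this post contained; the empty string
// and loopback addresses are added as an assumption.
const NON_REMOTE = new Set(["-", "", "::1", "127.0.0.1"]);

// Returns true when the event has an address worth aggregating.
function hasRemoteIp(event) {
  const ip = event?.IpAddress;
  return typeof ip === "string" && !NON_REMOTE.has(ip.trim());
}

// Counts failed logins per remote IP, sorted by count (highest first).
function topAttackerIps(events, limit = 10) {
  const counts = new Map();
  for (const event of events) {
    if (!hasRemoteIp(event)) continue; // local attempt: nothing to attribute
    const ip = event.IpAddress.trim();
    counts.set(ip, (counts.get(ip) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([ip, count]) => ({ ip, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

// Hypothetical sample: every event is a local failure.
const localOnly = [
  { IpAddress: "-", TargetUserName: "alice" },
  { IpAddress: "-", TargetUserName: "bob" },
];
console.log(topAttackerIps(localOnly)); // []
```

With input like this, the empty array is the correct answer: the query worked, but there was no remote address to count.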
Fix:
Generate real remote login attempts
Or adjust detection logic for local anomalies
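For the second option, one possible adjustment is to run two aggregations: top source IPs for events that carry a real address, and most-targeted accounts for the local failures. The sketch below only builds Elasticsearch query bodies as plain objects. The field names (`event.code`, `winlog.event_data.IpAddress`, `winlog.event_data.TargetUserName`) follow a Winlogbeat-style layout and are assumptions, as are the function and aggregation names; they would need to match your own index mapping, and keyword-typed fields are assumed.

```javascript
// Assumed field names (Winlogbeat-style); adjust to your mapping.
const FIELD_EVENT_CODE = "event.code";
const FIELD_IP = "winlog.event_data.IpAddress";
const FIELD_USER = "winlog.event_data.TargetUserName";

// Top remote source IPs among failed logins, ignoring the "-" placeholder.
function buildRemoteAttackerQuery(size = 10) {
  return {
    size: 0, // only aggregations are needed, not the raw hits
    query: {
      bool: {
        filter: [{ term: { [FIELD_EVENT_CODE]: "4625" } }],
        must_not: [{ term: { [FIELD_IP]: "-" } }],
      },
    },
    aggs: { top_ips: { terms: { field: FIELD_IP, size } } },
  };
}

// Most-targeted accounts among local failures (IpAddress is "-").
function buildLocalAnomalyQuery(size = 10) {
  return {
    size: 0,
    query: {
      bool: {
        filter: [
          { term: { [FIELD_EVENT_CODE]: "4625" } },
          { term: { [FIELD_IP]: "-" } },
        ],
      },
    },
    aggs: { top_users: { terms: { field: FIELD_USER, size } } },
  };
}

console.log(JSON.stringify(buildRemoteAttackerQuery(5), null, 2));
```

Splitting the two cases keeps the “Top Attacker IPs” view honest while still surfacing repeated local failures against the same account.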
Key takeaway: Your system is only as smart as the data it sees.
🔥 Building a Fault-Tolerant SOC API
One of the biggest mistakes I made early:
Returning HTTP 500 whenever Elasticsearch failed.
Bad idea.
In a real SOC system:
👉 Data sources WILL fail
👉 Logs may be delayed
👉 Queries may break
So I redesigned my API to be fault-tolerant:
Instead of: ❌ res.status(500).send(err.message)
I now use: ✅ res.json([])
Why this matters:
Frontend never crashes
Dashboard stays responsive
System degrades gracefully
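The change described above can be sketched as an Express-style route handler. To keep the example runnable without a cluster, the Elasticsearch call is injected as `searchTopIps`, a hypothetical async function that resolves to an array of `{ ip, count }` objects. The factory name `makeTopAttackersHandler` and the stand-in response object are assumptions as well.

```javascript
// Builds a handler that never lets a data-source failure reach the client
// as a 500. `searchTopIps` is a hypothetical async function wrapping the
// Elasticsearch query; `log` defaults to console.error.
function makeTopAttackersHandler(searchTopIps, log = console.error) {
  return async function topAttackersHandler(req, res) {
    try {
      const rows = await searchTopIps();
      // Guard against an unexpected shape as well as a thrown error.
      res.json(Array.isArray(rows) ? rows : []);
    } catch (err) {
      // Keep the failure visible in server logs, not in the response.
      log("top-attackers query failed:", err?.message ?? err);
      res.json([]);
    }
  };
}

// Demo with a stand-in response object and a failing data source.
const fakeRes = {
  json: (body) => console.log("response:", JSON.stringify(body)),
};
const failingSearch = async () => {
  throw new Error("connection refused");
};
makeTopAttackersHandler(failingSearch)({}, fakeRes); // response: []
```

In an Express app this could be mounted with something like `app.get("/api/top-attackers", makeTopAttackersHandler(searchTopIps))`, where the path is only an example. Logging inside the `catch` means the error is still recorded for whoever operates the API, even though the dashboard only sees an empty list.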
Other improvements:
Optional chaining (?.) for ES responses
Safe fallbacks for aggregations
Isolated routes (no global async calls)
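The first two items can be combined into one small helper. This is a sketch that assumes a terms aggregation named `top_ips` (a hypothetical name) and a helper called `extractTopIps`. It accepts either the bare response body or a response wrapped in a `body` property, since the shape can differ between client versions.

```javascript
// Pulls terms-aggregation buckets out of an Elasticsearch response
// without throwing when any level of the object is missing.
function extractTopIps(response, aggName = "top_ips") {
  // Accept both the bare body and a { body: ... } wrapper.
  const aggs = response?.aggregations ?? response?.body?.aggregations;
  // Safe fallback: a missing aggregation becomes an empty list.
  const buckets = aggs?.[aggName]?.buckets ?? [];
  if (!Array.isArray(buckets)) return [];
  return buckets.map((b) => ({
    ip: b?.key ?? "unknown",
    count: b?.doc_count ?? 0,
  }));
}

console.log(extractTopIps(undefined)); // []
console.log(extractTopIps({ aggregations: {} })); // []
console.log(
  extractTopIps({
    aggregations: {
      top_ips: { buckets: [{ key: "10.0.0.5", doc_count: 42 }] },
    },
  })
); // one entry: ip "10.0.0.5" with count 42
```

Keeping the response parsing in one function means each route only has to deal with a plain array, whatever Elasticsearch returned.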
Lesson: Production systems must survive failures — not expose them.




