When Your Detection Fails… But It’s Actually Correct: Building a Fault-Tolerant SOC API

When Your Detection Fails… But It’s Actually Correct

I built a brute-force detection system based on Windows Event ID 4625 (failed logins).

Everything looked correct:

✔ Query worked
✔ Elasticsearch returned data
✔ API responded

But…

📉 My “Top Attacker IPs” endpoint returned: []

At first, I thought my code was broken.

It wasn’t.

The real issue: all logs had "IpAddress": "-"

👉 Meaning: the failed logins were local activity, so there was no remote IP address to attribute them to.

Lesson: Security detection is not just about queries; it's about understanding the semantics of logs.

✔ 4625 = failed login
✔ IpAddress "-" = local activity
✔ No IP = no attacker

Fix:
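One way to apply this, as a minimal sketch rather than the exact fix used here: exclude the "-" placeholder when ranking attacker IPs, and count those events separately as local failures. The field names `event.code` and `winlog.event_data.IpAddress` and both function names are assumptions for illustration; the real names depend on how the logs are indexed.

```javascript
// Hypothetical field name; adjust to the actual index mapping.
const IP_FIELD = "winlog.event_data.IpAddress";

// Build an Elasticsearch request body that counts failed logins (4625)
// per source IP, skipping events whose IpAddress is the "-" placeholder.
function buildTopAttackerQuery(size = 10) {
  return {
    size: 0,
    query: {
      bool: {
        filter: [{ term: { "event.code": "4625" } }],
        must_not: [{ term: { [IP_FIELD]: "-" } }],
      },
    },
    aggs: { top_ips: { terms: { field: IP_FIELD, size } } },
  };
}

// Same idea applied in memory: split events into remote attempts
// (grouped by IP) and local attempts that carry no usable address.
function summarizeFailedLogins(events) {
  const counts = new Map();
  let localFailures = 0;
  for (const event of events) {
    const ip = event.IpAddress;
    if (!ip || ip === "-") {
      localFailures += 1;
      continue;
    }
    counts.set(ip, (counts.get(ip) || 0) + 1);
  }
  const topIps = [...counts.entries()]
    .map(([ip, count]) => ({ ip, count }))
    .sort((a, b) => b.count - a.count);
  return { topIps, localFailures };
}
```

With this split, an empty `topIps` next to a non-zero `localFailures` reads as "failed logins happened, but none came over the network" instead of "nothing happened".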

Key takeaway: Your system is only as smart as the data it sees.

🔥 Building a Fault-Tolerant SOC API

One of the biggest mistakes I made early:

Returning HTTP 500 whenever Elasticsearch failed.

Bad idea.

In a real SOC system:

👉 Data sources WILL fail
👉 Logs may be delayed
👉 Queries may break

So I redesigned my API to be fault-tolerant:

Instead of: ❌ res.status(500).send(err.message)

I now use: ✅ res.json([])
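As a rough sketch of that pattern, the wrapper below turns any failing data fetch into an empty result while still logging the error on the server. The name `faultTolerant` and the `fetchData` parameter are hypothetical, and the handler only assumes an Express-style `res.json`.

```javascript
// Wrap a data-fetching function so that a failing data source yields
// a fallback payload instead of an HTTP 500 with the raw error message.
function faultTolerant(fetchData, fallback = []) {
  return async function handler(req, res) {
    try {
      const data = await fetchData(req);
      res.json(data);
    } catch (err) {
      // Keep the failure visible to operators, not to API clients.
      console.error("data source failed:", err.message);
      res.json(fallback);
    }
  };
}
```

In an Express app this could be mounted as `app.get("/api/top-attackers", faultTolerant(fetchTopAttackerIps))`, where the route path and `fetchTopAttackerIps` are placeholders for the real Elasticsearch call.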

Why this matters:

Other improvements:

Lesson: Production systems must survive failures — not expose them.
