Are You Reading Your Pen Test Results Wrong?
Instead of reading pen test results as an evaluation of the application’s security, AppSec teams should use the results to evaluate how effective their training, tooling, governance, and processes really are. Mature AppSec teams aren’t asking “what” of pen test results; they’re asking “why” and “how” those findings made it that far in the first place.
The first contact developers have with application security is usually a web app or API penetration test, or at least it was when AppSec was getting started a decade or two ago. Things may have changed, but the pen test is still a rite of passage for many dev teams who want to expose their applications to the wider world. Yet for a milestone this important, one that teams have been clearing for years, many still read the results wrong.
It’s okay to use pen testing as a first round of defect discovery, but it tends to lose its usefulness over repeated engagements. The find-and-fix loop created by pen test retesting shrinks toward nothing as the feature set and code base stabilize and security defects are identified and eliminated. At that point, the cost per identified defect can run into the thousands of dollars: a $10,000 engagement that surfaces only two findings works out to $5,000 per defect.
There are cheaper ways to find vulnerabilities than pen testing. Design reviews and threat modeling can identify security problems before there is a single line of code to compile, let alone run a dynamic scan against. Security policy can drive requirements for teams to proactively select libraries, design patterns, and security features that stop findings from manifesting in applications. Code can be checked by pre-commit hooks and review checklists on its way into the repository, and the whole code base can be scanned by a SAST tool when it’s pulled from the repository at build time. Automated abuse cases can exercise the common attack patterns a pen tester would work through on their checklist and catch the risk before it moves to production.
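To make “automated abuse case” concrete, here is a minimal sketch in Python. The staging host, endpoint, and field names are illustrative assumptions, not anything from a real application. It replays a classic SQL injection probe a pen tester would try and fails the build if the application misbehaves:

```python
# Hypothetical abuse-case test; host, endpoint, and fields are assumptions.
import requests

def test_login_rejects_sql_injection_probe():
    resp = requests.post(
        "https://staging.example.com/login",
        data={"username": "admin' OR '1'='1' --", "password": "x"},
        timeout=10,
    )
    # A tautology in the username must never authenticate anyone,
    # and the error response should not leak database internals.
    assert resp.status_code in (400, 401, 403)
    assert "sql" not in resp.text.lower()
```

Run in CI on every build, a suite like this catches the issue long before a pen tester would.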
For a security-aware dev team doing all that and more, pen testing should be the last line of defense, not the first and only source of defects. Yet this is often not how pen testing is sold and delivered. A third-party vendor with training, tools, and experience can test an application with very little demand on the dev team. It’s far easier to run a pen test than to integrate a SAST tool into the pipeline and tune it, or to decompose the application’s planned design into its component parts and think like a hacker. That low barrier to entry often leaves pen testing as the first, last, and only line of defense in some organizations.
How do you read a pen test “the correct way” then?
It starts by treating every defect as the tip of an iceberg. In my experience, security defects and vulnerabilities travel in packs. Where you find a SQL injection caused by the naive assumption that special characters can’t make a query do untoward things to the data stored in the database, you’ll find cross-site scripting caused by that same naive assumption that special characters can’t make a victim’s browser do untoward things to their session cookies. Where you find a vulnerability hidden behind an admin login panel, you’ll often find another exposed to internal employees, resting on the same naive trust that “only trusted users have access.”
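Here is that shared root cause as a few lines of runnable Python; the table and input are illustrative. The habit is concatenating untrusted input into a command, and it produces SQLi when the target is SQL just as it produces XSS when the target is HTML:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

evil = "nobody' OR '1'='1"

# Naive concatenation assumes special characters in input are harmless.
# The stray quote rewrites the WHERE clause into a tautology, so this
# matches every row even though no user is named "nobody".
leaked = conn.execute(
    f"SELECT name FROM users WHERE name = '{evil}'"
).fetchall()
print(leaked)  # [('alice',)]

# A parameterized query treats the same input strictly as data.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (evil,)
).fetchall()
print(safe)  # []

# Aim that same concatenation habit at HTML output instead of SQL
# and the result is cross-site scripting.
```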
Treat each finding as a bigger problem by following the iceberg below the waterline. There’s a reason the iceberg shows up in so many security services and tooling sales decks: there aren’t many other visual metaphors that say “where you see one, multitudes are hiding” like the old Titanic sinker. When you move from a single reported defect to realizing that entire related families of vulnerabilities may live in your code, you’re reading the pen test results right.
How do you act on this? Root cause analysis. Identify the gaps in your tooling, training, experience, visibility, and processes that let each pen test finding survive in your code all the way through the software development life cycle. A defect as critical as SQLi or XSS can be stopped in many places, starting with security requirements and acceptance criteria that mandate input validation, output encoding, and secure libraries. Just about every decent static code scanner can find these issues, and most dynamic scanners look for them as well. SQLi and XSS cheat sheets exist for QA testers to work through manually, and those same cheat sheets can be turned into automated security test cases that should flunk any update that comes bundled with XSS, as sketched below.
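As a sketch of that last idea, a handful of cheat-sheet payloads can become a parameterized regression test. The endpoint and the payload list are illustrative assumptions; a real suite would load a maintained list such as the OWASP XSS cheat sheet:

```python
# Hypothetical cheat-sheet-driven test; endpoint and payloads are
# illustrative, not a complete or authoritative list.
import pytest
import requests

CHEAT_SHEET_PAYLOADS = [
    "<script>alert(1)</script>",
    '"><img src=x onerror=alert(1)>',
    "<svg onload=alert(1)>",
]

@pytest.mark.parametrize("payload", CHEAT_SHEET_PAYLOADS)
def test_comment_body_is_output_encoded(payload):
    resp = requests.post(
        "https://staging.example.com/comments",
        data={"body": payload},
        timeout=10,
    )
    # If the raw payload comes back verbatim, the update ships with
    # reflected XSS and the build should flunk.
    assert payload not in resp.text
```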
Other issues may need specific prevention: server hardening guidelines to prevent misconfiguration issues, or coding guidelines that mandate proper logging so a forensic investigation after a security incident has something to work from. Design reviews and threat modeling may be needed to highlight business logic flaws that could turn applications into vehicles for fraud by allowing nonsensical transactions, or to identify missing or improperly applied controls, like encryption that can prevent data loss in case of hardware theft.
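As one small example of the logging point, a coding guideline might mandate that every sensitive action emit a who/what/when audit record. A minimal sketch, with assumed function and field names rather than any prescribed standard:

```python
# Illustrative audit logging; the function and fields are assumptions.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(message)s",
)
audit = logging.getLogger("audit")

def change_user_role(actor_id: int, target_id: int, new_role: str) -> None:
    # ... perform the privilege change here ...
    # The guideline: every privilege change leaves a who/what/when trail
    # that a forensic investigation can reconstruct events from.
    audit.info(
        "role_change actor=%s target=%s new_role=%s",
        actor_id, target_id, new_role,
    )
```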
All told, the right way to read a pen test report, once you have a few security capabilities under your belt, is to evaluate how effective those security capabilities are. The first question when evaluating a pen test result shouldn’t be “How do we fix this?” but “How do we catch these earlier?”
The answer to that question may highlight a gap in the tools you think are scanning your code or running unit tests for you, or it may highlight a gap in your governance that needs to be updated, socialized, and trained on.
You should still fix the specific instance in the report, of course. But if that’s where you stop, you’re reading your pen test results wrong.
There's always more to say on a topic like this one, but I've hit my budget for now. Stay secure and never forget the humans.