Ever since I founded North America's first RPA consulting practice, document processing has been a constant thorn in my side. Regardless of the industry, priorities, or systems in use, automation has always depended on the ability to read and understand documents.
I built teams with deep expertise across RPA, OCR, visual tooling, NLP, and machine learning. I was fortunate to work alongside highly capable consultants and clients who genuinely believed digitization and automation could radically improve how their businesses served customers.
To drive meaningful results, we learned how to maximize the reusability of our automation work. That allowed us to automate large process areas spanning dozens of related workflows, often touching ten, twelve, or more systems at once.
But document processing was different.
Like most teams, we approached inputs through a triage lens. We assessed forms using a Pareto-style analysis: how many existed of each type, how standardized they were, how complex they were, and how good the input quality was on arrival.
This worked well early on. We matched tools to complexity—simple OCR for clean inputs, heavier platforms and machine learning for more variable forms—while focusing primarily on the most frequently used documents.
We believed, as many still do, that the 80/20 rule applied here as well: that by addressing the small set of document types responsible for most of the volume, we could capture most of the value. In some industries, that assumption held. In many others, it did not.
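To make the Pareto-style triage concrete, here is a minimal sketch in Python. It assumes all you have is a count of documents per type; the function name `types_needed_for_coverage` and both sets of volumes are hypothetical, invented purely for illustration rather than drawn from any real engagement.

```python
from typing import Dict, List, Tuple


def types_needed_for_coverage(
    volumes: Dict[str, int], target: float
) -> Tuple[List[str], float]:
    """Pick the most frequent document types until `target` share of volume is covered.

    Returns the selected types and the share of total volume they represent.
    """
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("volumes must contain at least one document")

    selected: List[str] = []
    covered = 0
    # Walk document types from highest to lowest volume.
    for doc_type, count in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        if covered / total >= target:
            break
        selected.append(doc_type)
        covered += count
    return selected, covered / total


# Hypothetical, illustrative volumes only.
# A concentrated mix: one document type dominates.
concentrated = {
    "invoice": 850,
    "purchase_order": 90,
    "credit_note": 40,
    "statement": 15,
    "other": 5,
}
# A long-tail mix: 100 types, each with the same small volume.
long_tail = {f"form_{i}": 10 for i in range(100)}

for name, mix in (("concentrated", concentrated), ("long_tail", long_tail)):
    chosen, share = types_needed_for_coverage(mix, target=0.8)
    print(f"{name}: {len(chosen)} of {len(mix)} types cover {share:.0%} of volume")
```

In the concentrated mix, a single document type clears the 80% target. In the flat long-tail mix, 80 of the 100 types are needed to reach the same coverage, which is the situation where the 80/20 assumption stops paying off.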
As we adopted more advanced Intelligent Document Processing platforms, we could finally start addressing more of the long tail of document variation. These systems improved our ability to handle multi-page, variable-length, and less predictable forms.
Yet the same underlying reality remained.
Critical work continued to flow through documents that were never as clean, consistent, or predictable as our process designs assumed. We often needed dozens, sometimes hundreds, of samples before we could confidently achieve straight-through processing (STP).
Early on, I believed what many teams believe: that if we standardized inputs, added enough rules, or trained models on enough examples, STP would steadily climb.
And for a while, it usually did.
And then it stopped.
No matter how much effort went into refining templates, tightening validation rules, or retraining models, STP would plateau. Exception queues would shrink temporarily, then slowly grow back. Manual review crept in around the edges. New document types, vendor changes, regulatory updates, or subtle formatting drift reintroduced work the system could not confidently handle.
What stood out to me was how expensive each incremental improvement became. The automation wasn't failing—but every additional point of STP required more people, more training, more rework, and more ongoing maintenance. While human-in-the-loop validation was still more efficient than manual data entry, high STP remained critical for both digitization and downstream automation to deliver real value.
Over time, it became clear that the ceiling wasn't technical in the way we often framed it. It wasn't a lack of skill or vendor capability. It was a flawed assumption embedded in how automation was designed—that documents would eventually behave.
They don't.
Documents carry ambiguity by nature. They encode meaning through structure, language, layout, and context—and that meaning shifts as business conditions change. The long tail of "almost the same, but not quite" cases is where throughput slows and humans get pulled back in.
That long tail sets the real limit on STP.
Once you see this pattern enough times, you stop asking, "How do we automate this document?" and start asking something more important: what about this document makes it hard for a system to decide without a person stepping in?
When teams answer that honestly, the conversation changes. Automation stops being about squeezing more performance out of brittle rules or narrowly trained models and starts being about how systems interpret uncertainty, context, and variation.
That shift doesn't eliminate complexity overnight. But it does explain why so many automation programs stall—and why the next gains don't come from doing more of the same.
That realization was a primary reason Syncura exists. We wanted an approach to document understanding that didn't depend on templates behaving or models seeing hundreds of examples before becoming useful.
If you've been investing in document automation for years and wondering why STP refuses to move past a certain point, you're not alone.
In my experience, that ceiling isn't accidental. It's structural.
If you have a long-tail challenge, please reach out to us.