ISO 18587 Revision 2026: What the Expanded Standard Means for LLM-Powered Translation
ISO 18587 was written in 2017 for a world of statistical MT and early neural engines. It defined requirements for post-editing machine translation output — a narrow scope built around a specific kind of system producing a specific kind of error. Nine years later, the translation world looks nothing like that. LLMs produce output that's fluent, contextually aware, and sometimes indistinguishable from human work. The original standard simply doesn't cover this reality.
The 2026 revision addresses the gap directly. It expands scope to cover all forms of AI-generated translation, updates competence requirements for post-editors, and connects explicitly to AI governance frameworks like ISO 42001. For organizations using machine-generated translations — which is most of the industry at this point — these changes aren't optional reading.
What Is ISO 18587 and Why It Matters
ISO 18587:2017, titled "Translation services — Post-editing of machine translation output — Requirements," set the baseline for handling MT output. It defined:
- Requirements for post-editing processes: Steps needed to bring MT output to publishable quality
- Competence requirements for post-editors: Skills and knowledge post-editors must have
- Client-service provider agreements: What should be specified before post-editing work begins
The standard matters because it gives organizations a defensible framework for quality assurance with machine translation. In regulated industries — legal, medical, financial — pointing to ISO compliance is often a procurement requirement. Without it, you're just saying "trust us."
What's Changing in the 2026 Revision
Expanded Scope: Beyond Traditional MT
The biggest change: the scope expands from "machine translation output" to "AI-generated translation output." This explicitly covers:
- Translations from large language models (GPT, Claude, Gemini, etc.)
- Translations generated by AI agents in automated workflows
- Hybrid outputs where TM matches are combined with AI-generated completions
- Adaptive MT systems that learn from post-editor corrections in real time
The original standard assumed phrase-based and neural MT — systems with predictable error patterns. LLM output is different: it can be fluent but factually wrong, stylistically polished but terminologically inconsistent, contextually appropriate in one segment and wildly off-register in the next.
Updated Competence Requirements for Post-Editors
The 2017 standard required post-editors to have translation competence, linguistic competence in both languages, and "machine translation literacy." The revision expands that last piece significantly:
| Competence Area | ISO 18587:2017 | ISO 18587:2026 (Revised) |
|---|---|---|
| Translation competence | Required | Required (unchanged) |
| Linguistic competence | Source + target language | Source + target language (unchanged) |
| MT literacy | Understanding of MT output characteristics | Understanding of AI model capabilities, hallucination patterns, confidence calibration |
| AI output evaluation | Not specified | Ability to identify AI-specific errors: hallucinations, style drift, false fluency |
| Prompt awareness | Not applicable | Understanding of how prompts and context influence output quality |
| Tool competence | Basic MT and CAT tools | AI-assisted editing environments, TQA platforms, quality scoring interpretation |
The critical addition is "false fluency" detection — recognizing translations that read perfectly but contain subtle accuracy errors, omissions, or meaning shifts. This is the defining challenge of post-editing LLM output, and the revised standard calls it out explicitly.
AI-Specific Error Taxonomy
The revision adds a supplementary error taxonomy for AI-generated content:
- Hallucination: Content in the translation that has no basis in the source
- Source divergence: Translation reflecting a plausible but incorrect interpretation of the source
- False fluency: Target text that reads naturally but hides inaccuracies behind fluent language
- Context leakage: Information from the prompt, system instructions, or adjacent segments bleeding into the translation
- Register inconsistency: Shifts in formality or tone within a single document
These complement existing MQM error types rather than replacing them. Organizations are expected to track both.
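To make the dual tracking concrete, here's a minimal sketch of how the supplementary categories could sit alongside core MQM dimensions in an annotation schema. The class and field names are illustrative assumptions, not a schema prescribed by the standard:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union

class MQMCategory(Enum):
    # Core MQM dimensions (abbreviated; see the MQM framework for the full set)
    ACCURACY = "accuracy"
    FLUENCY = "fluency"
    TERMINOLOGY = "terminology"
    STYLE = "style"

class AICategory(Enum):
    # Supplementary AI-specific categories from the revision
    HALLUCINATION = "hallucination"
    SOURCE_DIVERGENCE = "source_divergence"
    FALSE_FLUENCY = "false_fluency"
    CONTEXT_LEAKAGE = "context_leakage"
    REGISTER_INCONSISTENCY = "register_inconsistency"

@dataclass
class ErrorAnnotation:
    segment_id: str
    category: Union[MQMCategory, AICategory]  # both taxonomies, tracked side by side
    severity: str                             # e.g. "minor", "major", "critical"
    note: str = ""
```

Keeping the two enums separate makes the "track both" requirement trivial to report on: filter annotations by enum type and you have your AI-specific error counts.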
Connection to ISO 42001 (AI Management Systems)
The revised standard introduces explicit references to ISO 42001:2023, creating a two-layer compliance framework:
- ISO 42001 governs how the organization manages AI systems — risk assessment, governance, transparency, monitoring
- ISO 18587 (revised) governs how AI-generated translation output is specifically handled, evaluated, and quality-assured
For organizations already pursuing ISO 42001 certification (increasingly common in 2026), the revised ISO 18587 provides domain-specific guidance for translation. The two standards are designed to work together.
How TQA Platforms Demonstrate Compliance
One of the most practical aspects of the revision: its emphasis on documented quality evaluation. Organizations must show they systematically evaluate AI-generated translations using structured frameworks. TQA platforms become compliance tools, not just quality tools.
What Auditors Will Look For
Based on the draft revision, auditors will expect:
- Documented evaluation methodology: How translation quality is measured — error categories, severity levels, scoring formulas
- Consistent application: Evidence that the same methodology is applied across projects, languages, and time periods
- AI-specific error tracking: Records showing hallucinations, false fluency, and similar AI-specific error types are tracked separately
- Threshold documentation: Written quality thresholds per content type, with justification
- Trend analysis: Historical data showing quality trends over time
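Taken together, these expectations describe a concrete data shape. Here's a sketch of what a single audit-ready evaluation record might contain; the field names and severity weights are illustrative, not mandated by the standard:

```python
# One audit-ready evaluation record: methodology, AI-specific error
# tracking, the applied threshold, and full attribution.
evaluation_record = {
    "project_id": "web-docs-de-2026-03",
    "language_pair": "en-de",
    "content_type": "technical_documentation",
    "methodology": {
        "framework": "MQM",
        "severity_weights": {"minor": 1, "major": 5, "critical": 10},  # illustrative
    },
    "errors": [
        {"category": "hallucination", "severity": "critical", "segment": 14},
        {"category": "terminology", "severity": "minor", "segment": 27},
    ],
    "score": 91.4,
    "threshold": {"min_score": 90.0, "justification": "client SLA for technical docs"},
    "passed": True,
    "evaluated_at": "2026-03-12T09:41:00Z",  # timestamped ...
    "evaluator": "reviewer_042",             # ... and attributed
}
```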
How KTTC Supports Compliance
KTTC produces exactly the kind of structured evaluation data the revised standard demands:
- MQM-based scoring with full error taxonomy including AI-specific categories provides the documented evaluation methodology
- Project-level configuration of error categories and severity weights ensures consistent application
- API-accessible evaluation history lets auditors review quality records programmatically
- Threshold configuration and pass/fail gating creates the documented thresholds the standard requires
- Dashboard analytics with historical trends directly support the trend analysis requirement
The key advantage: auditability. Every evaluation is timestamped, attributed, and stored. When an auditor asks "how do you evaluate your AI-generated translations?", the answer is a platform with structured data — not a spreadsheet one person maintains.
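As a concrete illustration, pulling a quarter's evaluation history for an auditor could be a single authenticated request. The endpoint, parameters, and response shape below are hypothetical placeholders, not KTTC's documented API; consult the actual API reference:

```python
import requests

# Hypothetical endpoint and response shape -- placeholders, not the real KTTC API.
resp = requests.get(
    "https://kttc.example.com/v1/evaluations",
    params={"from": "2026-01-01", "to": "2026-03-31", "language_pair": "en-de"},
    headers={"Authorization": "Bearer <API_TOKEN>"},
    timeout=30,
)
resp.raise_for_status()

for record in resp.json()["evaluations"]:
    print(record["evaluated_at"], record["score"], record["passed"])
```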
Comparison: ISO 18587 Original vs. Revised
| Aspect | ISO 18587:2017 (Original) | ISO 18587:2026 (Revised) |
|---|---|---|
| Scope | Machine translation output | All AI-generated translation output (MT, LLM, hybrid) |
| MT systems covered | Statistical and neural MT | All AI systems including LLMs and agent-based workflows |
| Post-editor competence | MT literacy | AI literacy including hallucination detection, prompt awareness |
| Error taxonomy | Standard translation errors | Extended with AI-specific categories (hallucination, false fluency, context leakage) |
| Quality evaluation | Required but loosely specified | Structured evaluation with documented methodology mandatory |
| AI governance | Not addressed | Explicit link to ISO 42001 AI management framework |
| Automation level | Assumes human post-editing | Acknowledges automated QA with human oversight |
| Data requirements | Basic project documentation | Historical evaluation data, trend analysis, threshold documentation |
| Client communication | Inform client about MT use | Disclose AI system type, capabilities, known limitations |
| Continuous improvement | Recommended | Required with documented feedback mechanisms |
Practical Checklist for Organizations
Immediate Actions (Q1-Q2 2026)
- Audit current workflows: Document which AI systems generate translations in your pipeline (LLMs, MT engines, hybrid systems)
- Review post-editor competencies: Have your post-editors been trained on AI-specific error types? Can they spot hallucinations and false fluency?
- Implement structured QA: If you're not using MQM-based evaluation, start now (a minimal scoring sketch follows this list). KTTC can serve as your evaluation platform
- Start tracking AI-specific errors: Add hallucination, false fluency, and context leakage to your error taxonomy today
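The arithmetic behind an MQM-style score is simple enough to state in a few lines: sum severity-weighted error penalties, normalize by word count, and subtract from a perfect score. A minimal sketch, assuming a common per-1,000-words normalization and illustrative weights; calibrate both to your own program:

```python
# Illustrative severity weights -- tune these for your own quality program.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_score(errors: list, word_count: int) -> float:
    """MQM-style score: 100 minus severity-weighted penalties per 1,000 words."""
    penalty = sum(SEVERITY_WEIGHTS[e["severity"]] for e in errors)
    return 100.0 - (penalty / word_count) * 1000

errors = [
    {"category": "false_fluency", "severity": "major"},
    {"category": "terminology", "severity": "minor"},
]
print(mqm_score(errors, word_count=1200))  # 95.0
```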
Medium-Term Actions (Q3-Q4 2026)
- Establish quality baselines: Use KTTC to evaluate a representative sample of your AI-generated translations and document baseline scores per language pair and content type
- Define quality thresholds: Set pass/fail thresholds for each content type based on baseline data and business needs (see the configuration sketch after this list)
- Create post-editor training: Develop materials specifically covering AI output — particularly the difference between MT errors and LLM errors
- Evaluate ISO 42001 alignment: If your organization uses AI broadly, consider whether ISO 42001 certification should happen in parallel
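Threshold documentation doesn't need to be elaborate; what matters is a written value per content type with a justification. Here's a sketch of one way to record that, with placeholder numbers you'd replace with your own baseline-derived values:

```python
# Pass/fail thresholds per content type, each with a documented justification.
QUALITY_THRESHOLDS = {
    "legal_contract": {"min_score": 98.0, "justification": "regulated content; near-zero tolerance for accuracy errors"},
    "technical_docs": {"min_score": 92.0, "justification": "Q1 2026 baseline average was 94.1"},
    "marketing_copy": {"min_score": 88.0, "justification": "style is reviewed separately by in-country teams"},
}

def passes(content_type: str, score: float) -> bool:
    return score >= QUALITY_THRESHOLDS[content_type]["min_score"]
```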
Ongoing Actions
- Monitor quality trends: Use KTTC's analytics to track quality over time and catch degradation early (the sketch after this list shows the core check)
- Update error taxonomy: As AI systems evolve, new error patterns will emerge; update your taxonomy to match
- Document everything: Every methodology change, threshold adjustment, and process update should be recorded
- Conduct periodic reviews: Quarterly review of post-editor competencies and methodology effectiveness
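Catching degradation early largely comes down to comparing a recent window of scores against your established baseline. A minimal sketch of that check, assuming you export per-period average scores from your TQA platform:

```python
from statistics import mean

def quality_degrading(scores, window=3, tolerance=2.0):
    """Flag degradation when the recent average drops more than
    `tolerance` points below the historical baseline."""
    if len(scores) <= window:
        return False  # not enough history yet
    baseline = mean(scores[:-window])
    recent = mean(scores[-window:])
    return recent < baseline - tolerance

# Monthly average scores, oldest first
history = [94.5, 94.1, 94.8, 93.9, 94.2, 91.0, 90.5, 90.2]
print(quality_degrading(history))  # True -- the last three months dipped
```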
The Broader Standards Picture
ISO 18587 doesn't stand alone. Here's how it connects to related standards:
| Standard | Relationship to ISO 18587 (Revised) |
|---|---|
| ISO 17100 (Translation services) | Parent standard for human translation; ISO 18587 extends it for AI output |
| ISO 42001 (AI management) | Complementary governance framework; ISO 18587 references it for AI oversight |
| ISO 5060 (Evaluation of translation output) | Provides general guidance on quality evaluation; ISO 18587 requires its application to AI output |
| MQM (Multidimensional Quality Metrics) | Industry-standard error framework; recommended by ISO 18587 as evaluation methodology |
FAQ
Does the revised ISO 18587 apply to all LLM-generated translations?
Yes. The revised scope explicitly covers any translation output generated by an AI system, regardless of the underlying technology. This includes translations from general-purpose LLMs (GPT, Claude, Gemini), specialized translation models, AI agent pipelines, and hybrid systems combining TM with AI generation. If an AI system produced the translation and it'll be published or delivered to a client, the standard applies.
Do we need ISO 42001 certification to comply with the revised ISO 18587?
No. The revised ISO 18587 references ISO 42001 and recommends alignment, but doesn't require certification. That said, organizations handling AI-generated translations at scale will find ISO 42001's governance structures make ISO 18587 compliance much easier. Think of ISO 42001 as the organizational framework and ISO 18587 as the translation-specific implementation.
How does the revised standard handle fully automated translation without human post-editing?
The revision acknowledges that some workflows involve minimal or no human post-editing, especially for low-risk content. In those cases, the standard requires documented automated quality assessment, systematic monitoring of the AI system's quality, and clear escalation criteria for routing content to human review. Platforms like KTTC that provide automated scoring with configurable thresholds directly support this.
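In code, "clear escalation criteria" reduces to a routing decision over the automated score and the error profile. A hedged sketch; the threshold and the rules are illustrative assumptions, not values the standard specifies:

```python
def route(score: float, errors: list, threshold: float = 92.0) -> str:
    """Route fully automated output: publish, or escalate to human review.
    Illustrative rule: any critical or hallucination error always escalates."""
    if any(e["severity"] == "critical" or e["category"] == "hallucination"
           for e in errors):
        return "human_review"
    return "auto_publish" if score >= threshold else "human_review"

print(route(95.3, [{"category": "fluency", "severity": "minor"}]))       # auto_publish
print(route(95.3, [{"category": "hallucination", "severity": "major"}]))  # human_review
```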
What is "false fluency" and why is it a specific concern with LLM output?
False fluency happens when a translation reads perfectly but contains hidden inaccuracies — omitted information, altered meaning, or fabricated details masked by natural-sounding prose. Traditional MT produced obviously awkward output, making errors easy to spot. LLMs produce fluent text by default, so accuracy errors can slip past post-editors who use fluency as their quality signal. The revised standard requires training in false fluency detection and recommends structured evaluation methods like MQM scoring (through platforms such as KTTC) that assess accuracy independently of fluency. This is, in my view, the single most important change in the revision.
