
ISO 18587 Revision 2026: What the Expanded Standard Means for LLM-Powered Translation

alex-chen · 3/16/2026 · 10 min read

Tags: iso-18587 · mtpe · post-editing · translation-standards · iso-17100 · compliance

ISO 18587 was written in 2017 for a world of statistical MT and early neural engines. It defined requirements for post-editing machine translation output — a narrow scope built around a specific kind of system producing a specific kind of error. Nine years later, the translation world looks nothing like that. LLMs produce output that's fluent, contextually aware, and sometimes indistinguishable from human work. The original standard simply doesn't cover this reality.

The 2026 revision addresses the gap directly. It expands scope to cover all forms of AI-generated translation, updates competence requirements for post-editors, and connects explicitly to AI governance frameworks like ISO 42001. For organizations using machine-generated translations — which is most of the industry at this point — these changes aren't optional reading.

What Is ISO 18587 and Why It Matters

ISO 18587:2017, titled "Translation services — Post-editing of machine translation output — Requirements," set the baseline for handling MT output. It defined:

  • Requirements for post-editing processes: Steps needed to bring MT output to publishable quality
  • Competence requirements for post-editors: Skills and knowledge post-editors must have
  • Client-service provider agreements: What should be specified before post-editing work begins

The standard matters because it gives organizations a defensible framework for quality assurance with machine translation. In regulated industries — legal, medical, financial — pointing to ISO compliance is often a procurement requirement. Without it, you're just saying "trust us."

What's Changing in the 2026 Revision

Expanded Scope: Beyond Traditional MT

The biggest change: the scope expands from "machine translation output" to "AI-generated translation output." This explicitly covers:

  • Translations from large language models (GPT, Claude, Gemini, etc.)
  • Translations generated by AI agents in automated workflows
  • Hybrid outputs where TM matches are combined with AI-generated completions
  • Adaptive MT systems that learn from post-editor corrections in real time

The original standard assumed phrase-based and neural MT — systems with predictable error patterns. LLM output is different: it can be fluent but factually wrong, stylistically polished but terminologically inconsistent, contextually appropriate in one segment and wildly off-register in the next.

Updated Competence Requirements for Post-Editors

The 2017 standard required post-editors to have translation competence, linguistic competence in both languages, and "machine translation literacy." The revision expands that last piece significantly:

| Competence Area | ISO 18587:2017 | ISO 18587:2026 (Revised) |
| --- | --- | --- |
| Translation competence | Required | Required (unchanged) |
| Linguistic competence | Source + target language | Source + target language (unchanged) |
| MT literacy | Understanding of MT output characteristics | Understanding of AI model capabilities, hallucination patterns, confidence calibration |
| AI output evaluation | Not specified | Ability to identify AI-specific errors: hallucinations, style drift, false fluency |
| Prompt awareness | Not applicable | Understanding of how prompts and context influence output quality |
| Tool competence | Basic MT and CAT tools | AI-assisted editing environments, TQA platforms, quality scoring interpretation |

The critical addition is "false fluency" detection — recognizing translations that read perfectly but contain subtle accuracy errors, omissions, or meaning shifts. This is the defining challenge of post-editing LLM output, and the revised standard calls it out explicitly.

AI-Specific Error Taxonomy

The revision adds a supplementary error taxonomy for AI-generated content:

  • Hallucination: Content in the translation that has no basis in the source
  • Source divergence: Translation reflecting a plausible but incorrect interpretation of the source
  • False fluency: Target text that reads naturally but hides inaccuracies behind fluent language
  • Context leakage: Information from the prompt, system instructions, or adjacent segments bleeding into the translation
  • Register inconsistency: Shifts in formality or tone within a single document

These complement existing MQM error types rather than replacing them. Organizations are expected to track both.

Connection to ISO 42001 (AI Management Systems)

The revised standard introduces explicit references to ISO 42001:2023, creating a two-layer compliance framework:

  1. ISO 42001 governs how the organization manages AI systems — risk assessment, governance, transparency, monitoring
  2. ISO 18587 (revised) governs how AI-generated translation output is specifically handled, evaluated, and quality-assured

For organizations already pursuing ISO 42001 certification (increasingly common in 2026), the revised ISO 18587 provides domain-specific guidance for translation. The two standards are designed to work together.

How TQA Platforms Demonstrate Compliance

One of the most practical aspects of the revision: its emphasis on documented quality evaluation. Organizations must show they systematically evaluate AI-generated translations using structured frameworks. TQA platforms become compliance tools, not just quality tools.

What Auditors Will Look For

Based on the draft revision, auditors will expect:

  1. Documented evaluation methodology: How translation quality is measured — error categories, severity levels, scoring formulas
  2. Consistent application: Evidence that the same methodology is applied across projects, languages, and time periods
  3. AI-specific error tracking: Records showing hallucinations, false fluency, and similar AI-specific error types are tracked separately
  4. Threshold documentation: Written quality thresholds per content type, with justification
  5. Trend analysis: Historical data showing quality trends over time
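A "documented evaluation methodology" ultimately means you can write the scoring formula down. The sketch below shows one simplified MQM-style penalty model; the severity weights and the per-100-words normalization are illustrative assumptions, and real deployments calibrate both per content type:

```python
# Illustrative severity weights -- not prescribed by the standard
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_score(errors: list[dict], word_count: int) -> float:
    """Quality score on a 0-100 scale: 100 minus penalty per 100 words."""
    penalty = sum(SEVERITY_WEIGHTS[e["severity"]] for e in errors)
    return max(0.0, 100.0 - (penalty / word_count) * 100)

sample_errors = [
    {"category": "hallucination", "severity": "critical"},
    {"category": "terminology", "severity": "minor"},
]
# Penalty of 11 points over a 220-word segment
score = mqm_score(sample_errors, word_count=220)
```

Whatever formula you choose, the audit point is the same: the weights, the normalization, and any changes to them are written down and applied consistently.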

How KTTC Supports Compliance

KTTC produces exactly the kind of structured evaluation data the revised standard demands:

  • MQM-based scoring with full error taxonomy including AI-specific categories provides the documented evaluation methodology
  • Project-level configuration of error categories and severity weights ensures consistent application
  • API-accessible evaluation history lets auditors review quality records programmatically
  • Threshold configuration and pass/fail gating creates the documented thresholds the standard requires
  • Dashboard analytics with historical trends directly support the trend analysis requirement

The key advantage: auditability. Every evaluation is timestamped, attributed, and stored. When an auditor asks "how do you evaluate your AI-generated translations?", the answer is a platform with structured data — not a spreadsheet one person maintains.
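To illustrate what "programmatic audit access" looks like in practice, here is a hypothetical client sketch. The endpoint path, field names, and completeness rule are invented for illustration and are not KTTC's actual API:

```python
import json
from urllib.request import Request, urlopen

def fetch_evaluations(base_url: str, token: str, project: str) -> list[dict]:
    """Pull evaluation records for a project (hypothetical endpoint)."""
    req = Request(
        f"{base_url}/projects/{project}/evaluations",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(req) as resp:
        return json.load(resp)

def audit_record_complete(record: dict) -> bool:
    """An auditable record needs at least a timestamp, an attributed
    evaluator, and a score -- the properties described above."""
    return all(k in record for k in ("timestamp", "evaluator", "score"))
```

The point of the sketch: an auditor's question becomes a query plus a completeness check over every record, rather than a manual review of one spreadsheet.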

Comparison: ISO 18587 Original vs. Revised

| Aspect | ISO 18587:2017 (Original) | ISO 18587:2026 (Revised) |
| --- | --- | --- |
| Scope | Machine translation output | All AI-generated translation output (MT, LLM, hybrid) |
| MT systems covered | Statistical and neural MT | All AI systems including LLMs and agent-based workflows |
| Post-editor competence | MT literacy | AI literacy including hallucination detection, prompt awareness |
| Error taxonomy | Standard translation errors | Extended with AI-specific categories (hallucination, false fluency, context leakage) |
| Quality evaluation | Required but loosely specified | Structured evaluation with documented methodology mandatory |
| AI governance | Not addressed | Explicit link to ISO 42001 AI management framework |
| Automation level | Assumes human post-editing | Acknowledges automated QA with human oversight |
| Data requirements | Basic project documentation | Historical evaluation data, trend analysis, threshold documentation |
| Client communication | Inform client about MT use | Disclose AI system type, capabilities, known limitations |
| Continuous improvement | Recommended | Required with documented feedback mechanisms |

Practical Checklist for Organizations

Immediate Actions (Q1-Q2 2026)

  • Audit current workflows: Document which AI systems generate translations in your pipeline (LLMs, MT engines, hybrid systems)
  • Review post-editor competencies: Have your post-editors been trained on AI-specific error types? Can they spot hallucinations and false fluency?
  • Implement structured QA: If you're not using MQM-based evaluation, start now. KTTC can serve as your evaluation platform
  • Start tracking AI-specific errors: Add hallucination, false fluency, and context leakage to your error taxonomy today

Medium-Term Actions (Q3-Q4 2026)

  • Establish quality baselines: Use KTTC to evaluate a representative sample of your AI-generated translations and document baseline scores per language pair and content type
  • Define quality thresholds: Set pass/fail thresholds for each content type based on baseline data and business needs
  • Create post-editor training: Develop materials specifically covering AI output — particularly the difference between MT errors and LLM errors
  • Evaluate ISO 42001 alignment: If your organization uses AI broadly, consider whether ISO 42001 certification should happen in parallel
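Once baselines exist, thresholds can live as explicit configuration rather than tribal knowledge. The sketch below shows one way to express pass/fail gating per content type; the numbers are hypothetical examples derived from imagined baseline data, not recommendations:

```python
# Hypothetical thresholds per content type -- calibrate from your own baselines
QUALITY_THRESHOLDS = {
    "legal":     {"pass": 98.0, "human_review_below": 99.5},
    "marketing": {"pass": 92.0, "human_review_below": 95.0},
    "support":   {"pass": 85.0, "human_review_below": 90.0},
}

def gate(content_type: str, score: float) -> str:
    """Route a scored translation: fail, send to human review, or pass."""
    t = QUALITY_THRESHOLDS[content_type]
    if score < t["pass"]:
        return "fail"
    if score < t["human_review_below"]:
        return "human_review"
    return "pass"
```

A version-controlled table like this doubles as the "threshold documentation with justification" auditors will ask for: each number can carry a commit message explaining where it came from.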

Ongoing Actions

  • Monitor quality trends: Use KTTC's analytics to track quality over time and catch degradation early
  • Update error taxonomy: As AI systems evolve, new error patterns will emerge; update your taxonomy to match
  • Document everything: Every methodology change, threshold adjustment, and process update should be recorded
  • Conduct periodic reviews: Quarterly review of post-editor competencies and methodology effectiveness

The Broader Standards Picture

ISO 18587 doesn't stand alone. Here's how it connects to related standards:

| Standard | Relationship to ISO 18587 (Revised) |
| --- | --- |
| ISO 17100 (Translation services) | Parent standard for human translation; ISO 18587 extends it for AI output |
| ISO 42001 (AI management) | Complementary governance framework; ISO 18587 references it for AI oversight |
| ISO 5060 (Translation quality) | Defines quality metrics; ISO 18587 requires their application to AI output |
| MQM (Multidimensional Quality Metrics) | Industry-standard error framework; recommended by ISO 18587 as evaluation methodology |

FAQ

Does the revised ISO 18587 apply to all LLM-generated translations?

Yes. The revised scope explicitly covers any translation output generated by an AI system, regardless of the underlying technology. This includes translations from general-purpose LLMs (GPT, Claude, Gemini), specialized translation models, AI agent pipelines, and hybrid systems combining TM with AI generation. If an AI system produced the translation and it'll be published or delivered to a client, the standard applies.

Do we need ISO 42001 certification to comply with the revised ISO 18587?

No. The revised ISO 18587 references ISO 42001 and recommends alignment, but doesn't require certification. That said, organizations handling AI-generated translations at scale will find ISO 42001's governance structures make ISO 18587 compliance much easier. Think of ISO 42001 as the organizational framework and ISO 18587 as the translation-specific implementation.

How does the revised standard handle fully automated translation without human post-editing?

The revision acknowledges that some workflows involve minimal or no human post-editing, especially for low-risk content. In those cases, the standard requires documented automated quality assessment, systematic monitoring of the AI system's quality, and clear escalation criteria for routing content to human review. Platforms like KTTC that provide automated scoring with configurable thresholds directly support this.
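The "clear escalation criteria" requirement can be made concrete as a small routing rule: content carries a risk tier, and only low-risk content with a strong automated score skips human review. The tiers and cutoffs below are illustrative assumptions, not values from the standard:

```python
# Hypothetical escalation rules: risk tier -> minimum automated score
# needed to skip human review (None = always escalate)
ESCALATION_RULES = {
    "low": 90.0,
    "medium": 97.0,
    "high": None,
}

def route(risk: str, auto_score: float) -> str:
    """Route AI-generated output: auto-publish or escalate to a human."""
    cutoff = ESCALATION_RULES[risk]
    if cutoff is None or auto_score < cutoff:
        return "human_review"
    return "auto_publish"
```

Writing the criteria as code (or equivalent configuration) gives you both the documented escalation rules and an enforcement point in the pipeline.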

What is "false fluency" and why is it a specific concern with LLM output?

False fluency happens when a translation reads perfectly but contains hidden inaccuracies — omitted information, altered meaning, or fabricated details masked by natural-sounding prose. Traditional MT produced obviously awkward output, making errors easy to spot. LLMs produce fluent text by default, so accuracy errors can slip past post-editors who use fluency as their quality signal. The revised standard requires training in false fluency detection and recommends structured evaluation methods like MQM scoring (through platforms such as KTTC) that assess accuracy independently of fluency. This is, in my view, the single most important change in the revision.
