
MQM Framework: Complete Guide to Translation Quality Metrics 2025

KTTC Team · 1/2/2025 · 8 min read
Tags: mqm, translation-quality, iso-standards, lqa, quality-assessment

MQM — Multidimensional Quality Metrics — is the industry-standard framework for evaluating translation quality. It grew out of the EU-funded QTLaunchPad project, now sits behind ISO standards, and gives teams a systematic way to assess translations without relying on gut feeling.

Here's what it actually does, how scoring works, and how to put it into practice.

What is MQM?

MQM is a framework for analytic Translation Quality Evaluation (TQE). Where subjective reviews produce vague feedback like "this doesn't sound right," MQM gives you a standardized error typology and scoring system that works consistently across projects, languages, and evaluators.

What makes it useful:

  • Standardized error typology with clearly defined categories
  • Severity levels (Critical, Major, Minor) for each error type
  • Flexible configuration to match project requirements
  • ISO alignment through ISO 5060:2024 and ISO 11669:2024
  • Works for both human and machine translation

Why MQM Matters in 2025

AI translation changed the game, but it also made quality measurement harder. LLMs like GPT-4 and Claude produce translations that sound great. Fluency isn't the problem anymore — accuracy is. And you can't catch accuracy issues by reading the target text alone.

That's where MQM comes in.

Enterprise clients now demand measurable quality standards. The new ISO 5060:2024 standard for translation quality evaluation is built on MQM principles. Agencies use it to compare translator and MT engine performance with actual numbers instead of opinions.

Without a framework like MQM, you're just guessing.

MQM Error Categories Explained

MQM organizes translation errors into a hierarchy. Here are the main categories:

1. Accuracy Errors

Errors related to how faithfully the translation represents the source text.

| Error Type | Description | Example |
| --- | --- | --- |
| Mistranslation | Incorrect meaning transfer | "annual report" translated as "monthly report" |
| Omission | Source content missing in target | A sentence from the source not translated |
| Addition | Extra content not in source | Translator added explanation not in original |
| Untranslated | Source text left in target | Technical term left in English in German text |

2. Fluency Errors

Errors affecting the natural flow of the target text.

| Error Type | Description | Example |
| --- | --- | --- |
| Grammar | Grammatical mistakes | Subject-verb disagreement |
| Spelling | Orthographic errors | "recieve" instead of "receive" |
| Punctuation | Incorrect punctuation | Missing comma in a compound sentence |
| Inconsistency | Inconsistent usage | Same term translated differently |

3. Terminology Errors

Errors related to specialized vocabulary.

| Error Type | Description | Example |
| --- | --- | --- |
| Wrong term | Incorrect terminology used | "computer mouse" translated as "animal mouse" |
| Inconsistent terminology | Same term translated differently | "user interface" and "UI" used interchangeably when client requires one |

4. Style Errors

Errors related to style guide compliance.

| Error Type | Description | Example |
| --- | --- | --- |
| Register | Wrong formality level | Using informal "you" when formal is required |
| Unidiomatic | Awkward but not incorrect | Literal translation that sounds unnatural |

5. Locale Errors

Errors in locale-specific conventions.

| Error Type | Description | Example |
| --- | --- | --- |
| Date format | Wrong date convention | MM/DD/YYYY in European locale |
| Currency | Incorrect currency handling | $ symbol for EUR amounts |
| Measurement | Wrong unit system | Miles instead of kilometers |
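
To make the hierarchy concrete, the five top-level categories can be sketched as a nested lookup table. This is an illustrative, trimmed slice (the full typology has over 100 types), and the identifiers are simplified names, not official MQM IDs:

```python
# Illustrative slice of the MQM typology: top-level category -> subtypes.
# Identifiers are simplified for this sketch, not official MQM IDs.
MQM_TYPOLOGY = {
    "accuracy":    ["mistranslation", "omission", "addition", "untranslated"],
    "fluency":     ["grammar", "spelling", "punctuation", "inconsistency"],
    "terminology": ["wrong_term", "inconsistent_terminology"],
    "style":       ["register", "unidiomatic"],
    "locale":      ["date_format", "currency", "measurement"],
}

def top_category(error_type):
    """Return the top-level category an error type belongs to, or None."""
    for category, subtypes in MQM_TYPOLOGY.items():
        if error_type in subtypes:
            return category
    return None

print(top_category("omission"))  # accuracy
```

Keeping the typology as data rather than hard-coded logic makes it easy to trim or extend per project, which is exactly how custom MQM profiles work in practice.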

MQM Severity Levels

Each error gets a severity level that affects the quality score:

| Severity | Description | Typical Penalty |
| --- | --- | --- |
| Critical | Errors causing serious harm (legal, safety, financial) | 25 points |
| Major | Errors significantly impacting understanding or usability | 5 points |
| Minor | Errors with minimal impact on understanding | 1 point |

How do you decide? Ask yourself:

  • Critical: Would this error cause legal liability, safety risks, or significant financial loss?
  • Major: Does this error prevent understanding or create real confusion?
  • Minor: Is this a noticeable error that doesn't seriously hurt comprehension?

The gap between Major and Critical matters more than most people realize. A mistranslated button label is Major. A mistranslated drug dosage is Critical. The penalty difference — 5 points vs. 25 — reflects that.
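
As a sketch, the triage questions above can be written as a small decision function. The 25/5/1 weights mirror the penalty table; the function and parameter names are illustrative, not part of the MQM specification:

```python
# Common MQM default penalty weights (see the severity table above).
PENALTIES = {"critical": 25, "major": 5, "minor": 1}

def classify_severity(causes_serious_harm: bool, blocks_understanding: bool) -> str:
    """Apply the triage questions in order: serious harm -> critical,
    real confusion -> major, otherwise minor."""
    if causes_serious_harm:
        return "critical"
    if blocks_understanding:
        return "major"
    return "minor"

# A mistranslated drug dosage vs. a mistranslated button label:
print(classify_severity(True, True))    # critical
print(classify_severity(False, True))   # major
```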

MQM Scoring Model

The MQM score is calculated from errors found during evaluation:

MQM Score = 100 - (Total Penalty Points / Word Count × 100) 

Example Calculation

For a 1000-word document with:

  • 2 Major errors (5 × 2 = 10 points)
  • 5 Minor errors (1 × 5 = 5 points)
  • Total penalty: 15 points

MQM Score = 100 - (15 / 1000 × 100) = 98.5
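
The formula and the worked example translate directly into code. This is a minimal sketch assuming the 25/5/1 penalty weights above; real tools usually add per-category weighting on top:

```python
# Common MQM default penalty weights per severity level.
PENALTY_POINTS = {"critical": 25, "major": 5, "minor": 1}

def mqm_score(errors, word_count):
    """MQM Score = 100 - (total penalty points / word count * 100).

    `errors` is an iterable of severity labels, e.g. ["major", "minor"].
    """
    total_penalty = sum(PENALTY_POINTS[e.lower()] for e in errors)
    return 100 - (total_penalty / word_count * 100)

# The 1000-word example above: 2 Major + 5 Minor = 15 penalty points.
errors = ["major"] * 2 + ["minor"] * 5
print(mqm_score(errors, 1000))  # 98.5
```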

Quality Thresholds

Typical industry thresholds:

| Score Range | Quality Level | Action Required |
| --- | --- | --- |
| 99-100 | Excellent | Ready for delivery |
| 95-98 | Good | Minor review recommended |
| 90-94 | Acceptable | Review and corrections needed |
| Below 90 | Poor | Significant rework required |

A score of 93 might look fine on paper, but in practice it means seven penalty points per 100 words — more than one Major error per 100 words if the errors skew Major. For a product manual, that's probably okay. For a pharmaceutical label, it's not.
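
Mapping a score to a quality band is a simple range check. This sketch uses the illustrative cut-offs from the table above; your own thresholds should follow your content type:

```python
def quality_level(score: float) -> str:
    """Map an MQM score to a quality band (illustrative thresholds)."""
    if score >= 99:
        return "excellent"
    if score >= 95:
        return "good"
    if score >= 90:
        return "acceptable"
    return "poor"

print(quality_level(98.5))  # good
print(quality_level(93))    # acceptable
```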

How to Implement MQM in Your Workflow

Step 1: Define Your MQM Profile

Not every project needs all error categories. A marketing campaign cares about style and register; a software UI cares about terminology consistency; a legal contract cares about accuracy above everything else.

Build a custom MQM profile based on content type, target audience, and quality requirements.
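
One common way to encode a profile is as per-category weights, where a missing category is out of scope. The profiles and weights below are hypothetical examples of the priorities described above, not standard values:

```python
# Hypothetical profiles: category -> weight. Higher weight = errors in
# that category cost more; absent categories are out of scope (weight 0).
MQM_PROFILES = {
    "marketing":   {"accuracy": 1.0, "fluency": 1.0, "style": 2.0},
    "software_ui": {"accuracy": 1.0, "terminology": 2.0, "locale": 1.5},
    "legal":       {"accuracy": 3.0, "terminology": 1.5, "fluency": 0.5},
}

def weighted_penalty(profile: dict, category: str, base_penalty: int) -> float:
    """Scale a base severity penalty by the profile's category weight."""
    return base_penalty * profile.get(category, 0.0)

# A Major accuracy error (5 points) costs triple under the legal profile:
print(weighted_penalty(MQM_PROFILES["legal"], "accuracy", 5))  # 15.0
```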

Step 2: Train Your Evaluators

Consistent application requires trained evaluators who understand error category definitions, severity level criteria, and project-specific requirements. Two evaluators looking at the same segment should reach the same conclusion — or at least close to it.

Step 3: Select Sample Size

For statistical validity, evaluate a representative sample:

  • Minimum: 250-500 words per document
  • Recommended: 10-15% of total word count
  • High-stakes: 100% evaluation for critical content
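
The sampling rules above can be expressed as one helper. The 12% rate and 500-word floor chosen here are one reasonable point inside the ranges given, not fixed values:

```python
def sample_words(total_words: int, high_stakes: bool = False) -> int:
    """Pick an evaluation sample size: 100% for high-stakes content,
    otherwise ~12% of the total with a 500-word floor, capped at the
    document length. The 12% / 500 constants are illustrative."""
    if high_stakes:
        return total_words
    return min(total_words, max(500, round(total_words * 0.12)))

print(sample_words(10_000))                   # 1200
print(sample_words(1_000))                    # 500
print(sample_words(5_000, high_stakes=True))  # 5000
```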

Step 4: Document and Analyze Results

Track MQM scores over time to identify patterns in error types, measure translator or MT engine improvement, and provide feedback that translators can actually act on. A score alone isn't useful. "You scored 94" tells someone nothing. "You had 6 terminology inconsistencies in the last batch" does.
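
Turning a raw error log into that kind of actionable feedback is mostly counting. A minimal sketch, assuming the log is a list of `(category, severity)` pairs:

```python
from collections import Counter

def feedback_summary(error_log):
    """Summarize an error log (list of (category, severity) pairs)
    into per-category counts, most frequent first."""
    counts = Counter(category for category, _severity in error_log)
    return [f"{n} {category} error(s)" for category, n in counts.most_common()]

log = [("terminology", "minor")] * 6 + [("grammar", "major")]
print(feedback_summary(log))  # ['6 terminology error(s)', '1 grammar error(s)']
```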

MQM vs Other Quality Frameworks

MQM vs LISA QA Model

| Aspect | MQM | LISA QA |
| --- | --- | --- |
| Error categories | Hierarchical, over 100 types | Fixed categories |
| Customization | Highly flexible | Limited |
| ISO backing | Yes (ISO 5060) | No |
| Industry adoption | Growing standard | Legacy |

LISA QA served the industry well for years, but it wasn't designed for the AI translation era. MQM was.

MQM vs DQF (TAUS)

MQM and TAUS's DQF (Dynamic Quality Framework) harmonized their error typologies rather than competing. DQF now uses the MQM error typology, making the two frameworks complementary.

Tools Supporting MQM

Several translation quality tools support MQM-based evaluation:

  • KTTC - Full MQM support with automated error detection
  • Phrase Quality Assessment - Enterprise MQM implementation
  • TAUS DQF - Industry benchmarking with MQM
  • memoQ - Built-in QA with MQM categories

FAQ

What does MQM stand for?

MQM stands for Multidimensional Quality Metrics. It's a framework for systematically evaluating and measuring translation quality using standardized error categories and severity levels.

Is MQM an ISO standard?

MQM principles are incorporated into ISO 5060:2024 (Translation quality evaluation) and align with ISO 11669:2024 (Translation projects). While MQM itself is a framework, it provides the error typology foundation for these international standards.

How many error categories does MQM have?

The full MQM framework contains over 100 error types organized hierarchically. Most implementations use a subset of 20-40 categories relevant to their specific use case. The main top-level categories are Accuracy, Fluency, Terminology, Style, and Locale conventions.

Can MQM be used for machine translation evaluation?

Yes, MQM is widely used for MT evaluation. The WMT (Workshop on Machine Translation) shared tasks use MQM-based annotations for human evaluation of machine translation systems. MQM helps objectively compare MT outputs from different engines.

What is a good MQM score?

It depends on the content type. Scores above 95 are generally considered publishable quality. 90-95 is acceptable for most purposes. Below 90 typically requires revision. Critical content like legal or medical documents often requires 99+.

MQM gives translation teams a shared language for quality — literally. As AI translation gets better and harder to evaluate by ear, having a structured, numbers-driven approach isn't optional anymore. It's how the industry separates "sounds good" from "is good."

Ready to implement MQM in your workflow? Try KTTC's MQM-based quality assessment and see the difference objective quality metrics can make.
