MQM Framework: Complete Guide to Translation Quality Metrics 2025
MQM — Multidimensional Quality Metrics — is the industry-standard framework for evaluating translation quality. It grew out of the EU-funded QTLaunchPad project, now sits behind ISO standards, and gives teams a systematic way to assess translations without relying on gut feeling.
Here's what it actually does, how scoring works, and how to put it into practice.
What is MQM?
MQM is a framework for analytic Translation Quality Evaluation (TQE). Where subjective reviews produce vague feedback like "this doesn't sound right," MQM gives you a standardized error typology and scoring system that works consistently across projects, languages, and evaluators.
What makes it useful:
- Standardized error typology with clearly defined categories
- Severity levels (Critical, Major, Minor) for each error type
- Flexible configuration to match project requirements
- ISO alignment through ISO 5060:2024 and ISO 11669:2024
- Works for both human and machine translation
Why MQM Matters in 2025
AI translation changed the game, but it also made quality measurement harder. LLMs like GPT-4 and Claude produce translations that sound great. Fluency isn't the problem anymore — accuracy is. And you can't catch accuracy issues by reading the target text alone.
That's where MQM comes in.
Enterprise clients now demand measurable quality standards. The new ISO 5060:2024 standard for translation quality evaluation is built on MQM principles. Agencies use it to compare translator and MT engine performance with actual numbers instead of opinions.
Without a framework like MQM, you're just guessing.
MQM Error Categories Explained
MQM organizes translation errors into a hierarchy. Here are the main categories:
1. Accuracy Errors
Errors related to how faithfully the translation represents the source text.
| Error Type | Description | Example |
|---|---|---|
| Mistranslation | Incorrect meaning transfer | "annual report" translated as "monthly report" |
| Omission | Source content missing in target | A sentence from the source not translated |
| Addition | Extra content not in source | Translator added explanation not in original |
| Untranslated | Source text left in target | Technical term left in English in German text |
2. Fluency Errors
Errors affecting the natural flow of the target text.
| Error Type | Description | Example |
|---|---|---|
| Grammar | Grammatical mistakes | Subject-verb disagreement |
| Spelling | Orthographic errors | "recieve" instead of "receive" |
| Punctuation | Incorrect punctuation | Missing comma in a compound sentence |
| Inconsistency | Internally inconsistent usage | "e-mail" and "email" mixed in one document |
3. Terminology Errors
Errors related to specialized vocabulary.
| Error Type | Description | Example |
|---|---|---|
| Wrong term | Incorrect terminology used | "computer mouse" translated as "animal mouse" |
| Inconsistent terminology | Same term translated differently | "user interface" and "UI" used interchangeably when client requires one |
4. Style Errors
Errors related to style guide compliance.
| Error Type | Description | Example |
|---|---|---|
| Register | Wrong formality level | Using informal "you" when formal is required |
| Unidiomatic | Awkward but not incorrect | Literal translation that sounds unnatural |
5. Locale Errors
Errors in locale-specific conventions.
| Error Type | Description | Example |
|---|---|---|
| Date format | Wrong date convention | MM/DD/YYYY in European locale |
| Currency | Incorrect currency handling | $ symbol for EUR amounts |
| Measurement | Wrong unit system | Miles instead of kilometers |
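To make the hierarchy concrete, here's a minimal sketch of how an error annotation might be modeled in code. The category and subtype names mirror the tables above; the `Annotation` class itself and its field names are illustrative, not part of any official MQM schema.

```python
from dataclasses import dataclass

# Subset of the MQM hierarchy from the tables above (illustrative).
TYPOLOGY = {
    "accuracy": ["mistranslation", "omission", "addition", "untranslated"],
    "fluency": ["grammar", "spelling", "punctuation", "inconsistency"],
    "terminology": ["wrong-term", "inconsistent-terminology"],
    "style": ["register", "unidiomatic"],
    "locale": ["date-format", "currency", "measurement"],
}

@dataclass
class Annotation:
    """One error an evaluator marks on a target-text span."""
    category: str   # top-level category, e.g. "accuracy"
    subtype: str    # leaf error type, e.g. "mistranslation"
    severity: str   # "critical", "major", or "minor"
    span: str       # the offending target text
    note: str = ""  # optional evaluator comment

    def __post_init__(self):
        # Reject annotations that fall outside the configured typology.
        if self.subtype not in TYPOLOGY.get(self.category, []):
            raise ValueError(f"unknown type: {self.category}/{self.subtype}")
```

An annotation like `Annotation("accuracy", "mistranslation", "major", "monthly report")` then carries everything the scoring model below needs.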
MQM Severity Levels
Each error gets a severity level that affects the quality score:
| Severity | Description | Typical Penalty |
|---|---|---|
| Critical | Errors causing serious harm (legal, safety, financial) | 25 points |
| Major | Errors significantly impacting understanding or usability | 5 points |
| Minor | Errors with minimal impact on understanding | 1 point |
How do you decide? Ask yourself:
- Critical: Would this error cause legal liability, safety risks, or significant financial loss?
- Major: Does this error prevent understanding or create real confusion?
- Minor: Is this a noticeable error that doesn't seriously hurt comprehension?
The gap between Major and Critical matters more than most people realize. A mistranslated button label is Major. A mistranslated drug dosage is Critical. The penalty difference — 5 points vs. 25 — reflects that.
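Those decision questions translate into a simple triage helper. This is a sketch of the stated heuristic only; real severity calls still require human judgment about context, and the function name and flags are ours.

```python
# Penalty weights from the severity table above.
PENALTIES = {"critical": 25, "major": 5, "minor": 1}

def triage_severity(causes_serious_harm: bool, blocks_understanding: bool) -> str:
    """Map the two decision questions to a severity label (illustrative)."""
    if causes_serious_harm:      # legal liability, safety risk, financial loss
        return "critical"
    if blocks_understanding:     # prevents understanding or creates confusion
        return "major"
    return "minor"               # noticeable, but comprehension survives
```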
MQM Scoring Model
The MQM score is calculated from errors found during evaluation:
MQM Score = 100 - (Total Penalty Points / Word Count × 100)
Example Calculation
For a 1000-word document with:
- 2 Major errors (5 × 2 = 10 points)
- 5 Minor errors (1 × 5 = 5 points)
- Total penalty: 15 points
MQM Score = 100 - (15 / 1000 × 100) = 98.5
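The formula and the worked example translate directly into a few lines of Python. This is a minimal sketch; the function and variable names are ours, not any tool's API.

```python
PENALTIES = {"critical": 25, "major": 5, "minor": 1}

def mqm_score(error_counts: dict[str, int], word_count: int) -> float:
    """MQM Score = 100 - (total penalty points / word count * 100)."""
    total_penalty = sum(PENALTIES[sev] * n for sev, n in error_counts.items())
    return 100 - (total_penalty / word_count * 100)

# The worked example above: 2 Major and 5 Minor errors in 1000 words.
print(mqm_score({"major": 2, "minor": 5}, word_count=1000))  # 98.5
```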
Quality Thresholds
Typical industry thresholds:
| Score Range | Quality Level | Action Required |
|---|---|---|
| 99-100 | Excellent | Ready for delivery |
| 95-98 | Good | Minor review recommended |
| 90-94 | Acceptable | Review and corrections needed |
| Below 90 | Poor | Significant rework required |
A score of 93 might look fine on paper, but it means 7 penalty points per 100 words, which works out to more than one Major error per 100 words. For a product manual, that's probably okay. For a pharmaceutical label, it's not.
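A small helper can map a raw score onto those bands. The cutoffs mirror the table above, but as the pharmaceutical example shows, the right threshold ultimately depends on content type:

```python
def quality_band(score: float) -> str:
    """Map an MQM score to the typical industry bands above."""
    if score >= 99:
        return "Excellent: ready for delivery"
    if score >= 95:
        return "Good: minor review recommended"
    if score >= 90:
        return "Acceptable: review and corrections needed"
    return "Poor: significant rework required"

print(quality_band(98.5))  # Good: minor review recommended
```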
How to Implement MQM in Your Workflow
Step 1: Define Your MQM Profile
Not every project needs all error categories. A marketing campaign cares about style and register; a software UI cares about terminology consistency; a legal contract cares about accuracy above everything else.
Build a custom MQM profile based on content type, target audience, and quality requirements.
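In code, a profile is just configuration: which categories count, how much they count, and what threshold applies. Here's one hypothetical shape for a software-UI profile; no tool mandates this exact structure.

```python
# Hypothetical MQM profile for a software-UI project.
UI_PROFILE = {
    "categories": ["accuracy", "terminology", "fluency"],  # style/locale unscored
    "category_weights": {"terminology": 2.0},              # terminology counts double
    "severity_penalties": {"critical": 25, "major": 5, "minor": 1},
    "pass_threshold": 95,
}
```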
Step 2: Train Your Evaluators
Consistent application requires trained evaluators who understand error category definitions, severity level criteria, and project-specific requirements. Two evaluators looking at the same segment should reach the same conclusion — or at least close to it.
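One way to verify that two evaluators actually reach the same conclusion is to measure agreement on their severity labels for the same segments. Below is a sketch using Cohen's kappa, a standard chance-corrected agreement statistic; the list-of-labels input format is an assumption.

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two annotators' labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Severity labels from two evaluators on the same six segments.
a = ["major", "minor", "minor", "critical", "minor", "major"]
b = ["major", "minor", "major", "critical", "minor", "major"]
print(round(cohens_kappa(a, b), 2))  # 0.74
```

Kappa values above roughly 0.6 are commonly read as substantial agreement; consistently lower values are a sign your category definitions or severity criteria need recalibration.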
Step 3: Select Sample Size
For statistical validity, evaluate a representative sample (a small helper encoding these rules follows the list):
- Minimum: 250-500 words per document
- Recommended: 10-15% of total word count
- High-stakes: 100% evaluation for critical content
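A minimal sketch, assuming the guidelines above read as a 10% sampling rate with a 500-word floor (the defaults are our reading of those rules, not fixed constants):

```python
def words_to_evaluate(total_words: int, high_stakes: bool = False,
                      rate: float = 0.10, floor: int = 500) -> int:
    """How many words to sample, per the guidelines above."""
    if high_stakes:
        return total_words  # 100% evaluation for critical content
    return min(total_words, max(floor, int(total_words * rate)))

print(words_to_evaluate(20_000))                    # 2000 (10% of total)
print(words_to_evaluate(3_000))                     # 500 (floor applies)
print(words_to_evaluate(8_000, high_stakes=True))   # 8000 (full evaluation)
```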
Step 4: Document and Analyze Results
Track MQM scores over time to identify patterns in error types, measure translator or MT engine improvement, and provide feedback that translators can actually act on. A score alone isn't useful. "You scored 94" tells someone nothing. "You had 6 terminology inconsistencies in the last batch" does.
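Turning scores into that kind of feedback mostly means counting error types. A sketch, assuming annotations carry a category and subtype as in the data model earlier:

```python
from collections import Counter

def error_breakdown(annotations: list[tuple[str, str]]) -> Counter:
    """Count errors per category/subtype across a batch."""
    return Counter(f"{cat}/{sub}" for cat, sub in annotations)

batch = [("terminology", "inconsistent-terminology")] * 6 + [("fluency", "spelling")]
for error_type, count in error_breakdown(batch).most_common():
    print(f"{count}x {error_type}")
# 6x terminology/inconsistent-terminology  <- the actionable feedback
# 1x fluency/spelling
```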
MQM vs Other Quality Frameworks
MQM vs LISA QA Model
| Aspect | MQM | LISA QA |
|---|---|---|
| Error categories | Hierarchical, over 100 types | Fixed categories |
| Customization | Highly flexible | Limited |
| ISO backing | Yes (ISO 5060) | No |
| Industry adoption | Growing standard | Legacy |
LISA QA served the industry well for years, but it wasn't designed for the AI translation era. MQM was.
MQM vs DQF (TAUS)
MQM and DQF (TAUS's Dynamic Quality Framework) harmonized their error typologies. DQF now uses the MQM error typology, making them complementary rather than competing frameworks.
Tools Supporting MQM
Several translation quality tools support MQM-based evaluation:
- KTTC - Full MQM support with automated error detection
- Phrase Quality Assessment - Enterprise MQM implementation
- TAUS DQF - Industry benchmarking with MQM
- memoQ - Built-in QA with MQM categories
FAQ
What does MQM stand for?
MQM stands for Multidimensional Quality Metrics. It's a framework for systematically evaluating and measuring translation quality using standardized error categories and severity levels.
Is MQM an ISO standard?
MQM principles are incorporated into ISO 5060:2024 (Translation quality evaluation) and align with ISO 11669:2024 (Translation projects). While MQM itself is a framework, it provides the error typology foundation for these international standards.
How many error categories does MQM have?
The full MQM framework contains over 100 error types organized hierarchically. Most implementations use a subset of 20-40 categories relevant to their specific use case. The main top-level categories are Accuracy, Fluency, Terminology, Style, and Locale conventions.
Can MQM be used for machine translation evaluation?
Yes, MQM is widely used for MT evaluation. The WMT (Workshop on Machine Translation) shared tasks use MQM-based annotations for human evaluation of machine translation systems. MQM helps objectively compare MT outputs from different engines.
What is a good MQM score?
It depends on the content type. Scores above 95 are generally considered publishable quality. 90-95 is acceptable for most purposes. Below 90 typically requires revision. Critical content like legal or medical documents often requires 99+.
MQM gives translation teams a shared language for quality — literally. As AI translation gets better and harder to evaluate by ear, having a structured, numbers-driven approach isn't optional anymore. It's how the industry separates "sounds good" from "is good."
Ready to implement MQM in your workflow? Try KTTC's MQM-based quality assessment and see the difference objective quality metrics can make.
