A Corpus-Based Study of Simplification in Translated Chinese

Does Simplification Truly Exist in Translated Chinese? A Multilingual Corpus-Based Investigation of Translation Universals

Author: Griroek Gruaythong

11 May 2026, Bangkok – The concept of translation universals, particularly the simplification hypothesis, has occupied a central position in corpus-based translation studies for several decades. Nevertheless, empirical investigations have predominantly focused on English and European language pairs, while multilingual studies involving translated Chinese remain relatively limited. This article examines whether the simplification hypothesis can be universally applied to Chinese fiction translated from six source languages: English, German, Russian, French, Spanish, and Japanese. Drawing upon a self-constructed multilingual corpus consisting of approximately eight million Chinese characters, the study analyses fifteen linguistic indicators across lexical, syntactic, and collocational dimensions. Statistical analyses, including Kruskal–Wallis and Mann–Whitney U tests, alongside Random Forest machine learning classification, reveal that simplification is not uniformly observable in translated Chinese. Although translated texts exhibit lower lexical density and reduced collocational diversity in several cases, many translated corpora display greater lexical variation, longer syntactic structures, and increased structural complexity when compared with original Chinese fiction. Furthermore, significant divergence is observed between translations derived from Indo-European languages and those translated from Japanese, indicating the substantial influence of language typology. The findings suggest that simplification and complexification coexist in translated Chinese and are conditioned by multiple interacting factors, including source-language typology, genre, and feature selection. The study argues that future research on translation universals should move beyond binary validation and instead adopt multifactorial approaches capable of capturing the dynamic interplay between linguistic and contextual variables.

Keywords: translation universals, simplification, complexification, translated Chinese, corpus linguistics, multilingual translation, linguistic typology

Introduction

The notion that translated language possesses distinctive linguistic characteristics separate from non-translated language has long been recognised within translation studies. Early theoretical formulations conceptualised translated language as a “third code” (Frawley, 1984) or “translationese” (Gellerstam, 1986), emphasising the existence of systematic patterns that differentiate translated texts from original writing. Building upon foundational work by Toury (1978), Blum-Kulka and Levenston (1983), and Blum-Kulka (1986), Baker (1993) formally proposed the concept of translation universals, which subsequently became one of the most influential paradigms in corpus-based translation studies.

Among the various hypotheses associated with translation universals, simplification has attracted sustained scholarly attention. Baker (1996) defines simplification as the tendency for translated texts to employ simpler language than comparable original texts. Previous research has identified simplification through features such as reduced lexical density, shorter sentence structures, and decreased syntactic complexity (Laviosa, 1998; Olohan & Baker, 2000). However, empirical findings remain inconsistent, particularly in the context of translated Chinese. While some studies support simplification in translated Chinese fiction (Hu, 2007; Wang & Hu, 2008; Xiao, 2010), others report evidence of complexification, especially at lexical and syntactic levels (Xiao & Dai, 2014; Wu et al., 2023).

One major limitation of existing scholarship concerns its overwhelming focus on English-centred language pairs. Most corpus-based investigations of translation universals have concentrated on English and closely related European languages, leaving multilingual studies involving Chinese comparatively underdeveloped (Xiao & Dai, 2010). Moreover, the effects of genre and translation direction have often been overlooked, despite evidence suggesting that such variables significantly shape translated language patterns (Ke, 2005; Hu et al., 2020).

To address these gaps, the present study investigates Chinese fiction translated from six source languages representing both Indo-European and non-Indo-European linguistic systems. By comparing translated fiction with original Chinese fiction across lexical, syntactic, and collocational dimensions, the study seeks to reassess the universality of the simplification hypothesis within a multilingual framework. Specifically, the research addresses the following questions:

Does the simplification hypothesis apply universally to Chinese translated from multiple source languages?
How does linguistic complexity vary across different language pairs?
What role do language typology and genre play in shaping translated Chinese?

Literature Review

Translation Universals and Simplification

Translation universals refer to recurring linguistic features believed to characterise translated language regardless of the source and target languages involved (Baker, 1993). Simplification, one of the most widely discussed translation universals, holds that translators unconsciously produce texts that are linguistically less complex than originals (Baker, 1996). Laviosa (1998) identified several indicators of simplification in translated English, including lower lexical density, increased reliance on high-frequency vocabulary, and reduced lexical variety.

Research on translated Chinese has produced mixed findings. Hu (2007), Wang and Hu (2008), and Xiao (2010) found that translated Chinese fiction generally exhibits lower lexical density and shorter sentence structures than original Chinese, supporting the simplification hypothesis. Jiang et al. (2021) similarly observed reduced dependency distances in translated Chinese, suggesting lower syntactic complexity.

Nevertheless, other scholars have challenged the universality of simplification. Xiao and Yue (2009) reported that translated Chinese fiction often contains longer sentences than original Chinese fiction, while Xiao and Dai (2014) demonstrated that translated Chinese can exhibit greater grammatical complexity. Mauranen (2000) also identified atypical collocations in translated English, questioning whether simplification can adequately explain translated language behaviour across different contexts.

These contradictory findings suggest that simplification may not function as a universal principle but rather as a context-dependent tendency shaped by genre, source-language influence, and linguistic feature selection.

Language Typology and Source-Language Interference

Toury’s (1995) “law of interference” proposes that source-language structures leave discernible traces in translated texts. Teich’s (2003) notion of “shining through” similarly emphasises the persistence of source-language patterns in translation. Recent corpus-based studies increasingly recognise linguistic typology as a crucial factor influencing translated language.

Cappelle and Loock (2017) demonstrated that typological differences affect the use of phrasal verbs in translated English, while Molés-Cases (2019) found that source-language typology significantly shapes motion-event translation strategies. Hu and Zeng (2017), analysing translated English from twenty source languages, observed systematic typological influences on translated language patterns.

In the context of translated Chinese, Hu and Kübler (2021) examined multilingual news translations and found that lexical simplification coexisted with syntactic complexity, particularly in translations from Japanese and Korean. Chen (2023) similarly demonstrated that Japanese-to-Chinese translations diverged from other language pairs in several linguistic dimensions.

These studies collectively suggest that typological proximity between source and target languages substantially affects translated language behaviour. However, multilingual corpus-based investigations focusing specifically on translated Chinese fiction remain limited.

Research Methodology

Corpus Construction

This study employs the Chinese Fiction Corpus (CNC), a self-constructed multilingual corpus consisting of translated Chinese fiction and original Chinese fiction. The translated component includes texts translated from six source languages: English, German, Russian, French, Spanish, and Japanese. Each sub-corpus contains 200 texts, resulting in a total corpus size of approximately 7.98 million Chinese characters.

All translated texts were drawn from representative literary fiction rendered into Chinese within the past two decades by translators who are native speakers of Chinese. To ensure comparability, texts were standardised into segments of approximately 5,000 Chinese characters and balanced across sub-corpora.
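The segmentation step can be sketched as follows. This is a minimal illustration, assuming that length is counted in characters of the cleaned text and that a trailing fragment shorter than half a segment is discarded. The function name `segment_text`, the `min_tail_ratio` parameter, and the handling of remainders are all assumptions, since the article does not describe how segment boundaries were drawn.

```python
def segment_text(text, size=5000, min_tail_ratio=0.5):
    """Split text into consecutive segments of `size` characters.

    Assumption: a final fragment shorter than size * min_tail_ratio is
    dropped so that every retained segment has a comparable length.
    """
    segments = [text[i:i + size] for i in range(0, len(text), size)]
    if segments and len(segments[-1]) < size * min_tail_ratio:
        segments.pop()
    return segments


if __name__ == "__main__":
    sample = "字" * 12000  # placeholder text, not corpus data
    print([len(s) for s in segment_text(sample)])
```

Fixing the segment length matters because length-sensitive measures such as type-token ratios are only comparable across texts of similar size.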

Linguistic Features

Fifteen linguistic features were extracted across three dimensions: lexical, syntactic, and collocational complexity.

At the lexical level, the analysis included average word length (AWL), standardised type-token ratio (STTR), lexical density (LD), and percentage of four-character words (FCW). Syntactic complexity was measured through indicators such as mean length of sentence (MLS), mean length of clause (MLC), and mean tree depth (MTD). Collocational complexity included measures such as TOTAL_RTTR and UNIQUE_RTTR.
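Three of the lexical measures can be illustrated with a short sketch. It assumes the text has already been word-segmented and part-of-speech tagged by some external tool, that STTR is the mean type-token ratio over consecutive 1,000-token windows, and that content words are nouns, verbs, adjectives, and adverbs. The tag labels in `CONTENT_TAGS`, the window size, and the function names are assumptions made for illustration, not the study's documented settings.

```python
from statistics import mean

# Assumption: hypothetical tag labels for nouns, verbs, adjectives, adverbs.
CONTENT_TAGS = {"n", "v", "a", "d"}


def sttr(tokens, window=1000):
    """Mean type-token ratio over consecutive, non-overlapping windows."""
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(0, len(tokens) - window + 1, window)
    ]
    if not ratios:
        raise ValueError("text is shorter than one window")
    return mean(ratios)


def average_word_length(tokens):
    """Mean number of characters per word token."""
    return sum(len(token) for token in tokens) / len(tokens)


def lexical_density(tagged_tokens):
    """Share of content words among all (word, tag) pairs."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


if __name__ == "__main__":
    tagged = [("我", "r"), ("喜欢", "v"), ("小说", "n"), ("的", "u")]
    words = [word for word, _ in tagged]
    print(average_word_length(words), lexical_density(tagged))
```

Averaging the ratio over fixed windows is what makes STTR less sensitive to text length than a plain type-token ratio.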

Statistical Analysis

Python-based computational tools were employed to extract linguistic features. Because some datasets violated normality assumptions according to Shapiro–Wilk tests, non-parametric Kruskal–Wallis tests were used for group comparisons, followed by Mann–Whitney U post hoc tests.
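The testing sequence described above might look roughly like this with SciPy. The sketch uses made-up lexical-density values rather than study data, compares each translated group only against an `original` baseline, and reports raw p-values, because the article does not say whether p-values were adjusted for multiple comparisons. The group names, sample sizes, and helper names are assumptions.

```python
import random

from scipy.stats import kruskal, mannwhitneyu, shapiro


def compare_groups(groups, baseline="original"):
    """Run normality checks, an omnibus test, and baseline post hoc tests.

    `groups` maps a sub-corpus name to a list of per-text feature values.
    """
    # Shapiro-Wilk p-value per group; small values indicate non-normality.
    normality = {name: shapiro(values).pvalue for name, values in groups.items()}
    # Kruskal-Wallis H test across all groups at once.
    omnibus_p = kruskal(*groups.values()).pvalue
    # Two-sided Mann-Whitney U test of each other group against the baseline.
    post_hoc = {
        name: mannwhitneyu(groups[baseline], values, alternative="two-sided").pvalue
        for name, values in groups.items()
        if name != baseline
    }
    return normality, omnibus_p, post_hoc


def synthetic_groups(seed=0, n=50):
    """Made-up lexical-density values for illustration only (not study data)."""
    rng = random.Random(seed)
    centres = {"original": 0.60, "english": 0.55, "japanese": 0.60}
    return {
        name: [rng.gauss(centre, 0.02) for _ in range(n)]
        for name, centre in centres.items()
    }


if __name__ == "__main__":
    normality, omnibus_p, post_hoc = compare_groups(synthetic_groups())
    print(f"Kruskal-Wallis p = {omnibus_p:.4g}")
    for name, p_value in post_hoc.items():
        print(f"original vs {name}: p = {p_value:.4g}")
```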

Additionally, Random Forest classification was employed to determine whether the selected linguistic features could reliably distinguish translated Chinese according to source language. The model utilised 80% of the data for training and 20% for testing.
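A classification setup of this kind can be sketched with scikit-learn. The example below runs on synthetic data with one informative and one noise feature. The stratified split, the 200 trees, the fixed random seed, and the macro-averaged F1 are assumptions, since the article specifies only the 80/20 split and an overall F1-score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def make_synthetic_features(seed=0, per_class=100):
    """Made-up data: column 0 separates the classes, column 1 is pure noise."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(["english", "german", "japanese"], per_class)
    class_index = np.repeat([0.0, 1.0, 2.0], per_class)
    informative = class_index + rng.normal(0.0, 0.3, size=labels.size)
    noise = rng.normal(0.0, 1.0, size=labels.size)
    return np.column_stack([informative, noise]), labels


def train_and_evaluate(X, y, seed=0):
    """Fit a Random Forest on an 80/20 stratified split and report macro F1."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    macro_f1 = f1_score(y_test, model.predict(X_test), average="macro")
    return model, macro_f1, len(y_test)


if __name__ == "__main__":
    X, y = make_synthetic_features()
    model, macro_f1, n_test = train_and_evaluate(X, y)
    print(f"macro F1 on {n_test} held-out texts: {macro_f1:.2f}")
    print("feature importances:", model.feature_importances_)
```

The `feature_importances_` attribute is one common way to rank features by their contribution to the model, which is the kind of ranking the Results section reports for lexical density.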

Results

Lexical Complexity

The results demonstrate that simplification is not consistently observable at the lexical level. Only Chinese translated from English exhibited a shorter average word length than original Chinese. In contrast, translations from German, Spanish, Russian, French, and Japanese showed longer average word lengths, indicating lexical complexification.

Similarly, STTR values in most translated corpora exceeded those of original Chinese fiction, suggesting greater lexical diversity rather than simplification. However, lexical density remained lower in translated texts overall, supporting previous findings regarding reduced informational density in translated Chinese (Hu, 2007).

Japanese emerged as a particularly distinctive case. Unlike translations from Indo-European languages, Chinese translated from Japanese closely resembled original Chinese in lexical density and several other lexical indicators, suggesting the influence of typological proximity.

Syntactic Complexity

At the syntactic level, translations from Indo-European languages generally displayed greater complexity than original Chinese fiction. Translations from German, English, and Spanish exhibited longer sentence and clause structures, while translations from Japanese consistently showed shorter and simpler syntactic constructions.

Translations from German demonstrated the strongest tendency toward syntactic complexification across multiple indicators, including MLC, MLTU, and MTD. Conversely, translations from Japanese supported the simplification hypothesis across nearly all syntactic measures.

These findings highlight the substantial influence of source-language structure on translated Chinese syntax.

Collocational Complexity

Unlike lexical and syntactic dimensions, collocational analysis largely supported the simplification hypothesis. Original Chinese fiction consistently demonstrated greater collocational diversity and more frequent use of Chinese-specific collocations.

Translated texts relied more heavily on repetitive and homogeneous collocational patterns, particularly in UNIQUE_RTTR and UNIQUE_RATIO measures. These findings suggest that translators may simplify collocational structures due to source-language interference and reduced access to target-language-specific phraseological patterns.

Random Forest Classification

The Random Forest model achieved an overall F1-score of 0.75, demonstrating relatively strong classification performance. Lexical density emerged as the most influential classification feature, followed by NTPS and collocational indicators.

Importantly, traditional simplification indicators such as STTR and MLC contributed comparatively little to classification accuracy. This finding suggests that translation complexity cannot be adequately captured through isolated lexical or syntactic measures alone.

Discussion

The findings challenge the universality of the simplification hypothesis in translated Chinese. Rather than exhibiting uniformly simplified language, translated Chinese demonstrates a complex interaction between simplification and complexification depending on linguistic dimension, source-language typology, and genre.

Translations from Indo-European languages generally displayed greater lexical and syntactic complexity, while Japanese translations aligned more closely with original Chinese. This supports Toury’s (1995) law of interference and Teich’s (2003) “shining through” hypothesis, demonstrating that typological structures significantly shape translated language.

Genre also appears to play a crucial role. Compared with previous research on translated news (Hu & Kübler, 2021), translated fiction exhibits greater tendencies toward lexical and syntactic complexification. Fictional discourse, which often prioritises stylistic nuance and literary expression, may encourage translators to preserve source-language complexity more extensively than informational genres such as news reporting.

Furthermore, the study demonstrates that conclusions regarding simplification depend heavily on feature selection. Coarse-grained indicators such as average word length and sentence length may suggest complexification, whereas finer-grained measures such as lexical density and collocational diversity often support simplification.

Consequently, translation universals should not be understood as absolute linguistic laws. Instead, translated language emerges through the interaction of typology, genre, translator decisions, and target-language conventions.

Conclusion

This study investigated the simplification hypothesis in translated Chinese fiction from a multilingual perspective. Through corpus-based analysis of fifteen linguistic features across lexical, syntactic, and collocational dimensions, the findings reveal that simplification is not universally applicable to translated Chinese.

While translated Chinese often demonstrates reduced lexical density and collocational diversity, it simultaneously exhibits greater lexical variation and syntactic complexity, particularly in translations from Indo-European languages. Japanese translations, by contrast, align more closely with original Chinese, highlighting the importance of typological proximity.

The study contributes to translation studies by demonstrating that simplification and complexification coexist within translated language and are shaped by multiple interacting variables. Future research should therefore adopt multifactorial approaches capable of integrating linguistic, typological, and genre-based variables rather than attempting to confirm or reject translation universals in isolation.

References

Baker, M. (1993). Corpus linguistics and translation studies: Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John Sinclair (pp. 233–250). John Benjamins. https://doi.org/10.1075/z.64.15bak
Baker, M. (1996). Corpus-based translation studies: The challenges that lie ahead. In H. Somers (Ed.), Terminology, LSP and translation studies in language engineering (pp. 175–186). John Benjamins. https://doi.org/10.1075/btl.18.17bak
Blum-Kulka, S. (1986). Shifts of cohesion and coherence in translation. In J. House & S. Blum-Kulka (Eds.), Interlingual and intercultural communication (pp. 17–35). Gunter Narr.
Cappelle, B., & Loock, R. (2017). Typological differences shining through: The case of phrasal verbs in translated English. In G. De Sutter et al. (Eds.), Empirical translation studies (pp. 235–264). De Gruyter Mouton.
Frawley, W. (1984). Prolegomenon to a theory of translation. In W. Frawley (Ed.), Translation: Literary, linguistic and philosophical perspectives (pp. 159–175). Associated University Presses.
Gellerstam, M. (1986). Translationese in Swedish novels translated from English. In L. Wollin & H. Lindquist (Eds.), Translation studies in Scandinavia (pp. 88–95). C.W.K. Gleerup.
Hu, H., & Kübler, S. (2021). Investigating translated Chinese and its variants using machine learning. Natural Language Engineering, 27(3), 339–372. https://doi.org/10.1017/S1351324920000182
Hu, X. (2007). A corpus-based study on the lexical features of Chinese translated fiction. Foreign Language Teaching and Research, 39(3), 214–220.
Laviosa, S. (1998). Core patterns of lexical use in a comparable corpus of English narrative prose. Meta, 43(4), 557–570. https://doi.org/10.7202/003425ar
Mauranen, A. (2000). Strange strings in translated language: A study on corpora. In M. Olohan (Ed.), Intercultural faultlines (pp. 119–141). St. Jerome.
Olohan, M., & Baker, M. (2000). Reporting that in translated English. Across Languages and Cultures, 1(2), 141–158. https://doi.org/10.1556/Acr.1.2000.2.1
Teich, E. (2003). Cross-linguistic variation in system and text. Mouton de Gruyter.
Toury, G. (1995). Descriptive translation studies and beyond. John Benjamins.
Wu, J., Liu, K., Hu, R., & Zhou, W. (2023). A comparative study of the syntactic complexity of translated Chinese and original Chinese. Foreign Language Teaching and Research, 55(2), 264–275.
Xiao, R. (2010). How different is translated Chinese from native Chinese? International Journal of Corpus Linguistics, 15(1), 5–35. https://doi.org/10.1075/ijcl.15.1.01xia
Xiao, R., & Dai, G. (2014). Lexical and grammatical properties of translational Chinese. Corpus Linguistics and Linguistic Theory, 10(1), 11–55. https://doi.org/10.1515/cllt-2013-0016


