CrossAsia DH Lunchtalks Archive

CrossAsia DH Lunchtalks – From Reading to Discovery: AI-Assisted Workflows for East Asian Historical Texts

12 May 2026/in CrossAsia DH Lunchtalks, Events/by CrossAsia

Dear users,

On June 9th at 12:30 pm (CEST), we are pleased to host the fifth session of the CrossAsia DH Lunchtalks 2026. This session will feature a presentation by Dr. Donghyeok Choi titled “From Reading to Discovery: AI-Assisted Workflows for East Asian Historical Texts.” In this talk, Dr. Choi explores how the craft of historical research is changing in the age of AI through several of his ongoing digital humanities projects focused on premodern East Asian texts. The abstract is as follows:

What does it mean to be a historian in the age of AI? AI is not the first technological shift to reshape the discipline. The digital turn quietly transformed how historians work, above all by raising accessibility. A historian today starts a project at a search engine, pulls sources from a digital archive, and turns archive photographs into research data at home. As Ian Milligan puts it, “we are all digital now.” If the digital turn brought accessibility, AI brings something accessibility alone could not: machine reading at the scale of the archive itself. Why scale? Historical research moves through stages: reading, extracting, structuring, analyzing, visualizing, asking new questions. Each stage works for a single document but breaks down at the scale of an archive. The Annals of the Joseon Dynasty hold roughly 384,000 articles across five centuries. Reconstructing the careers of even one generation of officials requires linking and reasoning across more material than a single researcher can manage.

In this talk I draw on several ongoing projects, including a vision-language model fine-tuned for Manchu and an agent-based record-linkage system across the Annals and the Bangmok (civil-examination rosters), to argue that AI does not replace any step in this sequence; it changes the scale at which each becomes possible. The Manchu model does not read more carefully than a Manchu specialist, but it makes an entire archive legible. The linkage system does not match identities more carefully than a historian by hand, but it tracks the same person across sources that no individual could reconcile end to end. Once reading, linkage, and structuring scale up, questions of a different order become askable: not one official’s career, but a generation’s; not one local pattern, but the structure of bureaucratic mobility across five centuries. The historian’s craft is unchanged; what changes is what becomes askable. To be a historian in the age of AI is to treat discovery, when the data itself begins to suggest the questions, as a stage of the craft.

About the speaker:

Dr. Donghyeok Choi is a Postdoctoral Fellow in the Department of History at Hong Kong Baptist University. He holds a Ph.D. from KAIST’s Graduate School of Culture Technology (2024) and a B.A. in History and a B.E. in Computer Science Engineering from Sungkyunkwan University. He applies computational and quantitative methods to East Asian history and builds AI-assisted research infrastructure for the humanities. He previously held a postdoctoral fellowship at the University of Hong Kong.

The lecture will be held in English. If you have any questions, please contact us at ostasienabt@sbb.spk-berlin.de.

The lecture will be streamed and recorded via Webex*. You can join the lecture in your browser without installing any special software. Please click the “To the lecture” button below, follow the link “join via browser,” and enter your name.

To the lecture

You can find the full programme of CrossAsia DH Lunchtalks 2026 here. Further talks will also be announced on our blog as well as on Mastodon and BlueSky.

Yours,

CrossAsia Team

*By participating, you grant the Stiftung Preußischer Kulturbesitz and its subordinate institutions, free of charge, all rights to use pictures and videos taken of you during this lecture. This consent is unrestricted in time and space and covers use in all media, both analogue and digital. It includes image processing and the use of photos in composite illustrations. German law applies.

CrossAsia DH Lunchtalks – Structures of Knowing an Empire: Building Digital Analytical Tools for Chinese Local Gazetteers and Spanish Relaciones Geográficas

24 April 2026/in News, CrossAsia DH Lunchtalks, Events/by CrossAsia

Dear users,

On May 21st at 12:30 pm (CEST), we are pleased to host the fourth session of the CrossAsia DH Lunchtalks 2026. This session will feature a joint presentation by Dr. CHEN Shih-Pei and Dr. Mariana Favila Vázquez, titled “Structures of Knowing an Empire: Building Digital Analytical Tools for Chinese Local Gazetteers and Spanish Relaciones Geográficas.” In this talk, Dr. Chen and Dr. Favila Vázquez will present and compare their digital approaches to analyzing geographical knowledge in early modern China and the Spanish Empire.

How did early modern empires come to know their vast territories, especially the remote regions at their peripheries? In a recent book titled “Knowing an Empire: Early Modern Chinese and Spanish Worlds in Dialogue”, scholars explore how the Spanish and Chinese empires developed comparable ways to gather, organize, and use knowledge about their local worlds. The Spanish Empire compiled the Relaciones Geográficas (lit. “geographical reports”), which surveyed the indigenous peoples, lands, and natural resources of its newly acquired, remote territories. On the Chinese side, officials had been compiling difangzhi 地方志 (local gazetteers) since the 12th century to document the landscapes, people, flora, and fauna of each region within the vast empire.

In this CrossAsia DH Lunchtalk, two authors who contributed to this book will discuss how they each designed digital analytical tools to grasp the overall structure of these two genres, given the size of both corpora and the richness of their content. Shih-Pei Chen will introduce a quantitative analysis based on the section headings of local gazetteers in LoGaRT (Local Gazetteers Research Tools). She argues that the section headings of each local gazetteer are conscious choices made by its compilers about how best to describe and document a region, and should therefore be treated as knowledge categories. In this session, she will show what emerges when the section headings of 4,000 gazetteers are analyzed together: a dynamic structure of “local knowledge” in historical China, jointly defined by imperial guidelines and local officials across geographical regions over 800 years.
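
As a rough illustration of what treating section headings as knowledge categories can mean computationally, the Python sketch below computes, for each century, the share of gazetteers that include a given heading. It assumes each gazetteer has already been reduced to a compilation year and a list of normalized headings; the `heading_shares` function, the record layout, and the three toy records are hypothetical and not taken from LoGaRT. Real headings vary in wording and would need normalization first, which this sketch skips.

```python
from collections import Counter, defaultdict


def century_of(year):
    """1550 -> 16, 1600 -> 16, 1601 -> 17 (centuries counted from year 1)."""
    return (year - 1) // 100 + 1


def heading_shares(gazetteers):
    """For each century, the fraction of gazetteers that include each heading."""
    totals = Counter()
    per_century = defaultdict(Counter)
    for g in gazetteers:
        c = century_of(g["year"])
        totals[c] += 1
        # set(): count a heading once per gazetteer, even if it recurs inside it
        per_century[c].update(set(g["headings"]))
    return {
        c: {h: n / totals[c] for h, n in counts.items()}
        for c, counts in per_century.items()
    }


# Toy, hypothetical records; headings assumed already normalized.
gazetteers = [
    {"year": 1550, "headings": ["物產", "風俗", "藝文"]},
    {"year": 1580, "headings": ["物產", "藝文"]},
    {"year": 1720, "headings": ["物產", "風俗", "風俗"]},
]
shares = heading_shares(gazetteers)
```

Counting each heading at most once per work makes the result read as "what fraction of gazetteers chose this category", which is closer to the compilers' decisions than raw frequencies. Grouping by region as well as century would only change the dictionary key.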

Mariana Favila Vázquez will introduce the case of the sixteenth-century Relaciones Geográficas, a documentary corpus produced in response to a questionnaire of fifty questions circulated in 1577. The questionnaire was commissioned by King Philip II and distributed through the Council of the Indies as part of a broader effort by the Spanish Crown to gather systematic information about its American territories. The instructions and interrogatory were prepared under the direction of the royal cosmographer-chronicler Juan López de Velasco and sent to local authorities in New Spain, who were responsible for compiling the responses.

This session will present a case study based on the information contained in the responses from the former Bishopric of Michoacán, with particular attention to references to inland bodies of water. It will also outline the methodology of Geographical Text Analysis, which enables the creation of digital annotations using historically relevant semantic categories and the linking of identified toponyms to their corresponding geographic coordinates, making it possible to conduct subsequent spatial analyses.
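
The annotate-then-georeference step of Geographical Text Analysis can be sketched in Python as follows: toponyms from a small gazetteer are tagged in a passage, each hit receives a semantic category and coordinates, and great-circle distances between features serve as a simple starting point for spatial analysis. The gazetteer entries, their coordinates, and the function names are hypothetical placeholders, not data from the Relaciones Geográficas or the project's own tooling.

```python
import math
import re

# Hypothetical mini-gazetteer: toponym -> (semantic category, lat, lon).
# Names and coordinates are placeholders, not georeferenced historical data.
GAZETTEER = {
    "Laguna Grande": ("lake", 19.6, -101.6),
    "Río Chico": ("river", 19.8, -101.2),
}


def annotate(text, gazetteer):
    """Return (start, end, toponym, category, lat, lon) for each gazetteer match."""
    # Longest names first so a longer toponym wins over a shorter one it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))
    hits = []
    for m in pattern.finditer(text):
        category, lat, lon = gazetteer[m.group(0)]
        hits.append((m.start(), m.end(), m.group(0), category, lat, lon))
    return hits


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a spherical Earth (mean radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


hits = annotate("Junto a la Laguna Grande corre el Río Chico.", GAZETTEER)
# Pairwise distance between the two annotated features, as input to spatial analysis.
(_, _, _, _, lat_a, lon_a), (_, _, _, _, lat_b, lon_b) = hits
distance = haversine_km(lat_a, lon_a, lat_b, lon_b)
```

Exact string matching is the simplest possible tagger; historical spellings vary, so a real pipeline would need spelling variants or fuzzy matching and a historically informed gazetteer.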

The works featured in this talk can also be found in “Part 2: Structures of Knowing” of Knowing an Empire, which is open access and can be read online at Fulcrum.org.

About the speakers:

Dr. CHEN Shih-Pei is a Senior Research Scholar at the Max Planck Institute for the History of Science (MPIWG) and a specialist in Digital Humanities. She designs digital research methods, tools, and infrastructures that help historians engage with digitized historical materials from new perspectives. She has led the development of several DH projects, including the Local Gazetteers Research Tools (LoGaRT); CHMap, a website hosting open-access historical maps of China (in collaboration with Shanghai Jiao Tong University); and RISE & SHINE, API protocols for the standardized exchange of digital texts among digital tools and content providers. At MPIWG, she is now leading another research project, “Common Knowledge and Its Sources in the Sinosphere, 14th–20th Centuries,” which uses Chinese daily-use encyclopedias to examine how “common knowledge” in Chinese history evolved and diverged from elite and literati genres.

Dr. Mariana Favila Vázquez is an archaeologist and holds an MA and a PhD in Mesoamerican Studies from the National Autonomous University of Mexico (UNAM). Her research focuses on pre-Hispanic and colonial navigation, cultural landscapes, and the use of digital technologies and spatial analysis in historical research. She is the author of Veredas de Mar y Río. Navegación prehispánica y colonial en Los Tuxtlas, Veracruz (UNAM, 2016) and Navegación prehispánica en Mesoamérica (BAR Publishing, 2020), as well as several articles and book chapters. She has held postdoctoral fellowships at Lancaster University and at UNAM’s Institute of Geography. She is currently Associate Professor at the Centre for Research and Advanced Studies in Social Anthropology (CIESAS), Mexico City Unit, in the area of Ethnohistory, where she is developing a project on lacustrine landscapes and digital humanities. She is a member of Mexico’s National System of Researchers (SNII), Level 1.

The lecture will be held in English. If you have any questions, please contact us at ostasienabt@sbb.spk-berlin.de.

To the lecture

You can find the full programme of CrossAsia DH Lunchtalks 2026 here. Further talks will also be announced on our blog as well as on Mastodon and BlueSky.

Yours,

CrossAsia Team

*By participating, you grant the Stiftung Preußischer Kulturbesitz and its subordinate institutions, free of charge, all rights to use pictures and videos taken of you during this lecture. This consent is unrestricted in time and space and covers use in all media, both analogue and digital. It includes image processing and the use of photos in composite illustrations. German law applies.

CrossAsia DH Lunchtalks – Reimagining Humanities Education: Interdisciplinary Cultivation in the Era of Digital Intelligence

20 April 2026/in News, CrossAsia DH Lunchtalks, Events/by CrossAsia

Dear users,

On April 21st at 12:30 pm (CEST), we are pleased to host the third session of the CrossAsia DH Lunchtalks 2026. The talk will be given by Dr. Beibei Zhan and is titled “Reimagining Humanities Education: Interdisciplinary Cultivation in the Era of Digital Intelligence.” Dr. Zhan will share her experience developing a structured pedagogical approach for integrating AI and digital methods into humanities teaching in the era of “Digital Intelligence.”

In the “Digital Intelligence” era, the rapid evolution of AI and Big Data is fundamentally reshaping the production and dissemination of knowledge, necessitating a transition in humanities education from traditional paradigms to an integrated, technology-enhanced ecosystem. This lecture proposes a transformative framework for cultivating humanities students under the “New Liberal Arts” initiative, aiming to bridge the gap between classical erudition and computational science through a Six-Dimensional Structural Model. This model integrates problem-solving, knowledge synthesis, tool literacy, task practice, organizational collaboration, and ethical governance into a cohesive strategy, driving research through authentic socio-cultural inquiries while balancing technical proficiency with rigorous responsibility.

Central to this pedagogical shift are the practical innovations at Yuelu Academy (Hunan University), specifically the “Digital Intelligence Micro-course Cluster” and the “Humanities-AI Seminar”. The Micro-course Cluster operates on a three-tiered conceptual framework: first, establishing General Digital Literacy to foster computational thinking and a critical understanding of AI tools; second, developing Discipline-Specific Core Reflection, where students utilize digital methods such as metadata encoding and text mining to innovate traditional tasks like version tracing and semantic analysis; and third, encouraging Interdisciplinary Frontier Exploration, which empowers students to lead original research in cutting-edge fields such as Linguistic Intelligence, Cultural Visualization, and Digital Geography (GIS). Complementing this structured approach, the Humanities-AI Seminar offers a self-organized, “Human-in-the-loop” community where students, experts, and industry engineers co-create knowledge through real-world case studies, such as utilizing OpenAI APIs for structured knowledge extraction from historical archives. By synthesizing systematic training with open-ended collaborative research, these models demonstrate how humanities students can evolve into versatile scholars capable of navigating and shaping the global digital landscape.

About the speaker:

Dr. Beibei Zhan is an Associate Professor and Director of the Digital Humanities Center at Yuelu Academy, Hunan University, holding dual doctorates in Computer Vision (Kingston University) and Sinology (SOAS University of London). Her research focuses on the intersection of Ming-Qing history, Digital Humanities and Humanistic Intelligence. She currently serves as an Executive Member of the Technical Committee on Computing Applications, China Computer Federation (CCF), and as a Council Representative of the Digital Humanities Development Alliance of China.

The lecture will be held in English. If you have any questions, please contact us at ostasienabt@sbb.spk-berlin.de.

The lecture will be streamed via Webex*. You can join the lecture in your browser without installing any special software. Please click the “To the lecture” button below, follow the link “join via browser,” and enter your name.

To the lecture

You can find the full programme of CrossAsia DH Lunchtalks 2026 here. Further talks will also be announced on our blog as well as on Mastodon and BlueSky.

Yours,

CrossAsia Team

*By participating, you grant the Stiftung Preußischer Kulturbesitz and its subordinate institutions, free of charge, all rights to use pictures and videos taken of you during this lecture. This consent is unrestricted in time and space and covers use in all media, both analogue and digital. It includes image processing and the use of photos in composite illustrations. German law applies.

CrossAsia DH Lunchtalks – Getting the Lines Right: Layout Analysis as the Critical First Step for Tibetan Newspaper HTR

23 March 2026/in News, CrossAsia DH Lunchtalks, Research Data, OCR, Events/by CrossAsia

Dear users,

On March 24th at 12:30 pm (CET), we are pleased to host the second session of the CrossAsia DH Lunchtalks 2026. The talk will be given by Dr. Franz Xaver Erhard and is titled “Getting the Lines Right: Layout Analysis as the Critical First Step for Tibetan Newspaper HTR.” Dr. Erhard will introduce the Divergent Discourses project, as well as TransYolo, a custom Python workflow that addresses the layout-analysis bottleneck in digitizing historical Tibetan newspapers.

Handwritten Text Recognition (HTR) has matured rapidly in recent years, and for many document types, the core recognition task is largely solved. Yet when researchers turn to historical Tibetan newspapers, progress stalls — not because HTR models fail, but because the lines are never correctly identified in the first place. This talk argues that layout analysis, not transcription, is the true bottleneck in Tibetan newspaper digitization, and that no single off-the-shelf tool is adequate for the task.

Tibetan newspapers such as the Tibet Daily (TID) collection present a combination of challenges that expose the limits of general-purpose layout tools: dense multi-column page designs with inconsistent column spacing, mixed scripts (Tibetan, Chinese, Latin), varying typefaces and handwriting styles across issues and periods, and the physical realities of digitized print — page skew, gutter distortion, and uneven illumination. These properties interact in ways that defeat standard segmentation approaches, producing incorrect line detections, boundary bleed-across, and broken reading order — all before a single character is recognized.

Transkribus, the dominant platform for historical HTR in the humanities, offers built-in layout analysis through its field models. These work well for their intended use cases, but Tibetan newspaper material sits well outside that scope: column layouts confuse region assignment, high line density triggers false positives, and the platform’s limited configurability makes targeted correction difficult. The lesson is not that Transkribus falls short, but that specialized material demands specialized solutions.

To meet this need, the talk introduces TransYolo, a custom Python workflow developed within the Divergent Discourses project (AHRC/DFG). TransYolo uses a YOLO model trained specifically on Tibetan newspaper pages to detect text lines, assigns detections to text regions previously detected with Transkribus, reconstructs reading order, and exports Transkribus-compatible PAGE XML. The example shows what becomes possible when layout analysis is treated as a problem in its own right.
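
The region-assignment, reading-order, and export steps described above can be sketched in plain Python. This is not TransYolo's code: the `Box` type, the centroid-in-region rule, the naive column-then-top-to-bottom ordering, and the stripped-down PAGE-style XML skeleton (no `Metadata` element, not schema-validated) are all assumptions made for illustration. In practice the line boxes would come from a YOLO detector and the region boxes from Transkribus.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespace of the 2013-07-15 PAGE schema (assumed here as the export target).
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"


@dataclass
class Box:
    """Axis-aligned box in page pixel coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def centroid(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def points(self):
        # PAGE stores polygons as "x,y x,y ..."; a box becomes its four corners.
        corners = [(self.x0, self.y0), (self.x1, self.y0),
                   (self.x1, self.y1), (self.x0, self.y1)]
        return " ".join(f"{int(x)},{int(y)}" for x, y in corners)


def assign_lines(regions, lines):
    """Attach each line box to the first region containing its centroid."""
    assigned = {i: [] for i in range(len(regions))}
    unassigned = []
    for line in lines:
        cx, cy = line.centroid()
        hits = [i for i, r in enumerate(regions) if r.contains(cx, cy)]
        if hits:
            assigned[hits[0]].append(line)
        else:
            unassigned.append(line)  # left for manual review
    return assigned, unassigned


def reading_order(regions, assigned):
    """Naive order: regions by left edge (columns), then top edge; lines top to bottom."""
    order = sorted(range(len(regions)), key=lambda i: (regions[i].x0, regions[i].y0))
    return [(i, sorted(assigned[i], key=lambda b: b.y0)) for i in order]


def to_page_xml(image_name, width, height, regions, ordered):
    """Serialize a stripped-down PAGE-style skeleton (no Metadata, not validated)."""
    ET.register_namespace("", PAGE_NS)

    def q(tag):
        return f"{{{PAGE_NS}}}{tag}"

    root = ET.Element(q("PcGts"))
    page = ET.SubElement(root, q("Page"), imageFilename=image_name,
                         imageWidth=str(width), imageHeight=str(height))
    group = ET.SubElement(ET.SubElement(page, q("ReadingOrder")),
                          q("OrderedGroup"), id="ro_1")
    for rank, (i, lines) in enumerate(ordered):
        rid = f"r{rank + 1}"
        ET.SubElement(group, q("RegionRefIndexed"), index=str(rank), regionRef=rid)
        region = ET.SubElement(page, q("TextRegion"), id=rid)
        ET.SubElement(region, q("Coords"), points=regions[i].points())
        for j, line in enumerate(lines):
            text_line = ET.SubElement(region, q("TextLine"), id=f"{rid}_l{j + 1}")
            ET.SubElement(text_line, q("Coords"), points=line.points())
    return ET.tostring(root, encoding="unicode")
```

Assigning by centroid keeps the rule simple: a line that straddles a gutter goes to whichever region holds its midpoint, and lines outside every region are returned separately for review rather than silently dropped. Sorting columns by left edge breaks down for headlines that span several columns, which is exactly the kind of layout that makes this material hard.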

About the speaker:

Dr. Franz Xaver Erhard is a Tibetologist specializing in Tibetan literature, biography, and cultural history, with close to a decade of fieldwork experience in Lhasa. He is the Principal Investigator of the DFG/AHRC cooperative project “Divergent Discourses: Processes of Narrative Construction in Tibet, 1955–1962,” which compiles and analyses the first modern corpus of historical Tibetan newspapers using digital humanities methods, including computational tools for text recognition and natural language processing, to trace how divergent narratives emerged and evolved in PRC and exile publications during one of the most consequential periods of Tibetan history.

The lecture will be held in English. If you have any questions, please contact us at ostasienabt@sbb.spk-berlin.de.

The lecture will be streamed and recorded via Webex. You can join the lecture in your browser without installing any special software. Please click the “To the lecture” button below, follow the link “join via browser,” and enter your name.

To the lecture

You can find the full programme of CrossAsia DH Lunchtalks 2026 here. Further talks will also be announced on our blog as well as on Mastodon and BlueSky.

Yours,

CrossAsia Team

CrossAsia DH Lunchtalks – AI for the Humanities: A Case of Manchu OCR

2 February 2026/in News, CrossAsia DH Lunchtalks, Research Data, Newsletter 36, OCR, Events/by CrossAsia

Dear users,

On February 3rd at 12:30 pm (CET), we are pleased to host the first session of the CrossAsia DH Lunchtalks 2026. The talk will be given by Dr. Yan Hon Michael Chung and is titled “AI for the Humanities: A Case of Manchu OCR.” Dr. Chung will introduce the development pipeline for creating an OCR model for Manchu-language documents and share his reflections on applying AI to humanities research.

Manchu, today an endangered language, was once the official language of China’s last imperial dynasty, the Qing (1644–1911). The Qing state produced an enormous corpus of Manchu-language documents, many of which have been digitized and made publicly available by archives and libraries worldwide. Despite this abundance of scanned materials, there is still no reliable, publicly accessible optical character recognition (OCR) system for Manchu, posing a major bottleneck for historical research.

This presentation introduces an end-to-end Manchu OCR system developed by fine-tuning a vision–language model (VLM), and uses it as a case study to reflect on the broader challenges of applying AI to humanities research. It identifies three structural constraints that distinguish humanities-oriented AI development from commercial or industrial settings: the scarcity of labeled training data, the unusually high accuracy requirements demanded by scholarly research, and the limited computational resources available to most humanities scholars.

To address these constraints, the project adopts a small-model, data-centric strategy. The OCR model is trained using a combination of large-scale synthetic data and carefully curated historical samples. Specifically, a LLaMA-3.2-11B Vision model is fine-tuned using approximately 60,000 synthetic Manchu images alongside 20,000 Manchu word images extracted from real Qing-era documents. The resulting model achieves up to 96% accuracy on unseen, real-world scanned Manchu sources.

The OCR pipeline is further enhanced through a custom Manchu word detection and segmentation model, combined with a post-processing large language model for typographical correction. Together, these components form a complete, practical Manchu OCR system built with state-of-the-art vision–language and language models. Beyond presenting technical results, this presentation argues that carefully constrained, accuracy-driven AI systems offer a viable and sustainable path for AI research in the humanities.

About the speaker:

Dr. Michael Chung is an Assistant Professor in Digital Humanities at the Hong Kong University of Science and Technology. Chung received his PhD in history from Emory University in 2025, and his BA and MPhil from the Chinese University of Hong Kong in 2012 and 2016 respectively. Chung’s research centers on the early Qing dynasty, with a focus on the transfer of European artillery technology and the formation of the Hanjun Eight Banners. As a digital humanist, Chung is currently developing a Manchu OCR system based on a fine-tuned vision-language model.

The lecture will be held in English. If you have any questions, please contact us at ostasienabt@sbb.spk-berlin.de.

To the lecture

You can find the full programme of CrossAsia DH Lunchtalks 2026 here. Further talks will also be announced on our blog as well as on Mastodon and BlueSky.

Yours,

CrossAsia Team

CrossAsia DH Lunchtalks Launching in February 2026

16 January 2026/in News, CrossAsia DH Lunchtalks, Research Data, Newsletter 36, OCR, Events/by CrossAsia

Dear colleagues,

We are delighted to announce that the CrossAsia DH Lunchtalks will return in February 2026.

The first DH Lunchtalk series, held between winter 2023 and spring 2024, was warmly received by our community. Building on this success, the CrossAsia team and the Max Planck Institute for the History of Science (MPIWG) went on to co-host the international conference “Charting the European D-SEA: Digital Scholarship in East Asian Studies” in Berlin from 8–12 July 2024, bringing together around 120 participants from 19 countries and regions (read more).

In light of this strong engagement and our ongoing commitment to digital scholarship, we are pleased to relaunch the Lunchtalks as an online forum where scholars can share project updates, present new tools and methods, offer methodological insights, and showcase innovative research in Digital Asian Studies.

Between February and June 2026, the DH Lunchtalks will take place monthly. While the 2023–2024 season focused primarily on training in digital tools and platforms, the upcoming series will feature 60-minute lunchtime talks (including Q&A) by distinguished speakers presenting their latest digital research projects. The currently confirmed programme is as follows:

February 3
Prof. Michael Yan Hon CHUNG (Hong Kong University of Science and Technology)
AI for Endangered Documentary Archives: Manchu OCR
March 24
Dr. Franz Xaver Erhard (Leipzig University)
Getting the Lines Right: Layout Analysis as the Critical First Step for Tibetan Newspaper HTR
April 21
Dr. ZHAN Beibei (Yuelu Academy, Hunan University)
Reimagining Humanities Education: Interdisciplinary Cultivation in the Era of Digital Intelligence
May 21
Dr. CHEN Shih-Pei (Max Planck Institute for the History of Science) & Dr. Mariana Favila Vázquez (CIESAS–Unidad Ciudad de México)
Structures of Knowing an Empire: Building Digital Analytical Tools for Chinese Local Gazetteers and Spanish Relaciones Geográficas
June 9
Dr. CHOI Donghyeok (Hong Kong Baptist University)
From Reading to Discovery: AI-Assisted Workflows for East Asian Historical Texts
June 23
Dr. Rafał Jan Felbur (Heidelberg University)
Born-digital Dictionary of Early Chinese Buddhist Translations

All DH Lunchtalks will take place from 12:30 to 13:30 (CET/CEST) and will be held online via Webex. Further details for each session, including abstracts and access links, will be announced in advance on the CrossAsia blog. The first talk, by Prof. Michael Yan Hon Chung, will be announced shortly.

If you have any questions about the DH Lunchtalks, or if you are interested in proposing a future talk and sharing your own digital research, please contact Dr. Jing Hu at jing.hu@sbb.spk-berlin.de.

We look forward to welcoming many of you to the CrossAsia DH Lunchtalks 2026!

Yours,

CrossAsia Team