CrossAsia and Staatsbibliothek zu Berlin have extensive expertise in (research) data management. Outside East Asia, the Library probably manages the largest collection of Asia-related data worldwide. Having started with the management of bibliographic metadata in a variety of formats, it today also administers a large and constantly growing collection of full-text data from licensed resources. In the future, CrossAsia will become even more active in research data management.
In addition to the bibliographic data for the printed East, Southeast, and Central Asian collections of Staatsbibliothek zu Berlin, CrossAsia has for many years managed a large volume of bibliographic metadata from licensed and relevant free (East) Asia-related databases, at both book and article level, in the various (East) Asian languages or in English. Either we harvest the data via APIs, or providers deliver it to CrossAsia in accordance with the license agreements, in various bibliographic data formats or as part of the full-text XML files. A workflow routine defines the individual steps, such as extraction, optional data enrichment, quality control, transformation, and import into the index and search area. This metadata collection, which now exceeds 100 million bibliographic records, is freely available to all users via CrossAsia Search. The metadata can also be accessed via an API and integrated into other contexts.
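To make the workflow steps more concrete, here is a minimal sketch of such a pipeline in Python. It is purely illustrative: the `Record` fields, the function names, the sample rows and the dict standing in for the search index are all hypothetical assumptions, not CrossAsia's actual implementation or data model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Record:
    # Hypothetical minimal record; real bibliographic records carry many more fields.
    record_id: str
    title: str
    language: Optional[str] = None
    source: str = ""


def extract(rows: Iterable[Dict[str, str]], source: str) -> List[Record]:
    """Extraction: map harvested or delivered provider rows to Record objects."""
    return [
        Record(record_id=row.get("id", ""), title=row.get("title", "").strip(),
               language=row.get("lang"), source=source)
        for row in rows
    ]


def enrich(records: List[Record], default_language: str) -> List[Record]:
    """Optional enrichment: fill in a missing language code."""
    for rec in records:
        if not rec.language:
            rec.language = default_language
    return records


def quality_control(records: List[Record]) -> List[Record]:
    """Quality control: drop records lacking an identifier or a title."""
    return [rec for rec in records if rec.record_id and rec.title]


def transform(records: List[Record]) -> List[Dict[str, str]]:
    """Transformation: flatten records into index documents with source-prefixed ids."""
    return [
        {"id": f"{rec.source}:{rec.record_id}", "title": rec.title,
         "language": rec.language or "", "source": rec.source}
        for rec in records
    ]


def import_into_index(docs: List[Dict[str, str]], index: Dict[str, Dict[str, str]]) -> int:
    """Import: upsert documents by id into a dict standing in for the search index."""
    for doc in docs:
        index[doc["id"]] = doc
    return len(docs)


def run_workflow(rows, source: str, default_language: str, index) -> int:
    records = quality_control(enrich(extract(rows, source), default_language))
    return import_into_index(transform(records), index)


# Made-up sample rows from a hypothetical provider "providerA".
index: Dict[str, Dict[str, str]] = {}
rows = [
    {"id": "001", "title": " 红楼梦 ", "lang": "chi"},
    {"id": "002", "title": "Tokyo Modern"},
    {"id": "003", "title": "   "},  # empty title: removed by quality control
]
count = run_workflow(rows, "providerA", "eng", index)
print(count, sorted(index))  # prints: 2 ['providerA:001', 'providerA:002']
```

Keeping each step as a separate function mirrors the idea of a defined routine: individual steps (for example enrichment) can be swapped or skipped per data source without touching the rest of the chain.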
FULL-TEXT & NON-TEXTUAL DATA
Since 2016, CrossAsia has been building an Integrated Text Repository (CrossAsia ITR). This technical infrastructure is designed to ensure secure and sustainable operation of, and access to, Asia-related resources, independent of individual provider systems. We achieve this by using standard technologies for metadata, APIs (e.g., OAI) and repository frameworks (e.g., Fedora). We plan to assign a persistent identifier (e.g., a DOI) to each digital object in the ITR, which will make the objects addressable and versionable.
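As an illustration of what a standard API such as OAI makes possible on the client side, the sketch below builds OAI-PMH `ListRecords` requests and parses a response, following the public OAI-PMH 2.0 protocol (namespace, `resumptionToken` paging, deleted-record headers). The endpoint `https://repo.example.org/oai` and the sample response are hypothetical placeholders; this is not a description of the CrossAsia ITR's actual endpoint or record identifiers. The sample is parsed offline, so no network access is needed.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; follow-up pages send only the resumptionToken."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (identifiers of non-deleted records, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids: List[str] = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue  # deleted records carry a header but no metadata
        ident = header.find("oai:identifier", OAI_NS)
        if ident is not None and ident.text:
            ids.append(ident.text.strip())
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical, shortened response from a placeholder repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repo.example.org/oai</request>
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:obj-1</identifier>
      <datestamp>2024-01-01</datestamp></header></record>
    <record><header status="deleted"><identifier>oai:repo.example.org:obj-2</identifier>
      <datestamp>2024-01-01</datestamp></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://repo.example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each new token until `parse_list_records` returns `None` as the token.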
To standardise the full-text and image-text data and prepare them for ingest into the ITR according to defined routines, the data are extracted from their original database environments. Wherever possible, the data are stored at a very fine-grained level in the ITR structure so that they can be controlled and persistently referenced. In addition, information about usage rights is added to the metadata and object data in order to safeguard existing rights and prevent improper use. This also makes it possible to integrate the content into current and future CrossAsia services and to provide data for analysis, exploration, enrichment and visualisation in the field of digital science.
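The combination of fine-grained storage and attached rights information can be sketched as follows. This is a hypothetical model only: it assumes page-level objects, an invented identifier scheme (`<title_id>/p<page>`) and two invented rights labels (`"open"` and `"licensed"`); the actual ITR object structure and rights vocabulary may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageObject:
    object_id: str  # hypothetical reference scheme: "<title_id>/p<page number>"
    title_id: str
    page: int
    text: str
    rights: str     # hypothetical rights label: "open" or "licensed"


def split_into_pages(title_id: str, pages: List[str], rights: str) -> List[PageObject]:
    """Store a title as one object per page, each carrying the title's rights label."""
    return [
        PageObject(object_id=f"{title_id}/p{n:04d}", title_id=title_id,
                   page=n, text=text, rights=rights)
        for n, text in enumerate(pages, start=1)
    ]


def accessible(objects: List[PageObject], licensed_user: bool) -> List[PageObject]:
    """Open objects are visible to everyone; licensed ones only to licensed users."""
    return [o for o in objects
            if o.rights == "open" or (licensed_user and o.rights == "licensed")]


objs = (split_into_pages("T123", ["page one", "page two"], "licensed")
        + split_into_pages("T456", ["free page"], "open"))
public_ids = [o.object_id for o in accessible(objs, licensed_user=False)]
```

Because every page-level object carries its own rights label, access checks can be made per object rather than per database, which is what allows content to be reused in other services without losing track of the license terms.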
For the number of titles and pages, as well as the list of collections included in the ITR, please refer to the introduction page of the Fulltext Search. All data stored in the ITR are available via the Fulltext Search and the ITR Explorer.
The full-text and image-text data stored and archived in the ITR are research data, particularly with regard to secure and sustainable storage, accessibility and addressability, and re-use. The Integrated Text Repository was designed so that research data created on the basis of the stored full-text or image-text data (e.g., transcriptions, excerpts, evaluations, annotations) can be stored back in the ITR, with versioning and with links to the source materials and the related media.
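The idea of storing research data back in the repository with links and versioning can be illustrated with a small in-memory model. Everything here is a hypothetical assumption for illustration: the class and method names, the rule that an item stays linked to one source object, and the source identifiers such as `T123/p0002`. It does not describe how the ITR (or Fedora) actually implements versioning.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    source_id: str  # identifier of the full-text or image object it refers to
    body: str       # e.g. a transcription, excerpt or note
    version: int


class ResearchDataStore:
    """In-memory stand-in: every save creates a new version; older ones are kept."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Annotation]] = {}

    def save(self, item_id: str, source_id: str, body: str) -> Annotation:
        history = self._versions.setdefault(item_id, [])
        if history and history[-1].source_id != source_id:
            raise ValueError(f"{item_id} is already linked to {history[-1].source_id}")
        item = Annotation(source_id=source_id, body=body, version=len(history) + 1)
        history.append(item)
        return item

    def latest(self, item_id: str) -> Annotation:
        return self._versions[item_id][-1]

    def history(self, item_id: str) -> List[Annotation]:
        return list(self._versions[item_id])

    def linked_to(self, source_id: str) -> List[str]:
        """Ids of all research data items attached to one source object."""
        return [iid for iid, h in self._versions.items() if h[-1].source_id == source_id]


store = ResearchDataStore()
store.save("ann-1", "T123/p0002", "Transcription draft")
store.save("ann-1", "T123/p0002", "Transcription, corrected")
store.save("ann-2", "T123/p0003", "Excerpt")
```

Keeping the full history instead of overwriting means a citation can point at a specific version, while the link to the source object lets users navigate from a page to every transcription or annotation made on it.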
In addition, FID Asia is developing an information and consulting service for Asia-related research data. On the one hand, we will take into account the still largely unspecific questions that researchers in Asia-related studies have about the use and application of research data, national activities in the context of, e.g., RADAR and DARIAH-DE, and the recommendations of the German Council for Scientific Information Infrastructures (RfII) on research data, research data management, and data curation. On the other hand, we will observe, analyse and communicate national and international research data activities in order to develop specific solutions and recommendations for Asia-related research data. The information and consulting service will cover the following topics:
- Information and consultation on research data management with a special focus on Asia-related studies (data in Asian languages, etc.).
- Development of best practices and quality criteria for Asia-related studies.
- Legal advice.
- Establishment of a cooperation network with selected and approved data centres and service providers relevant to Asia-related studies.
- Support for creating research data using materials available via CrossAsia, in particular via the Meta and Full-text Data Repository Infrastructure (ITR – Integrated Text Repository).
- Central access point to metadata on Asia-related research data via CrossAsia Search.