Chinese Characters in DNA Library

Abstract

In recent years, DNA has attracted widespread attention as a data storage medium because of its high storage density and long-term durability. However, existing DNA storage work has rarely addressed the storage of Chinese characters.

Here, we developed a novel scheme for encoding Chinese characters that is based on their structure and uses next-generation sequencing technology. The scheme adopts the concept of a "representative etymon" and the idea of "splitting" Chinese characters into components, making full use of the structural characteristics of the characters themselves.

To verify the accuracy of our encoding rules, we subjected the synthesized sequences to circularization, hyperbranched rolling circle amplification (HRCA), enzyme digestion, and other treatments, and then sequenced them with MinION. We iteratively adjusted the storage scheme and re-verified it according to the experimental results.

Using this scheme, we encoded 6763 Chinese characters and developed dedicated software for encoding and decoding the stored information. Finally, for experimental verification, we encoded part of the lyrics of the song "You and Me", decoded the stored information, and thereby confirmed the feasibility of our encoding scheme.
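For a sense of scale, consider a naive baseline that gives each of the 6763 characters a fixed-length index written in base 4. This baseline needs 7 nucleotides per character, because 4^6 = 4096 < 6763 ≤ 4^7 = 16384. The sketch below shows only that baseline. It is not the structure-based "representative etymon" and "split" scheme described above. The A/C/G/T digit mapping and the function names `min_length`, `encode_index`, and `decode_index` are hypothetical.

```python
# Hypothetical baseline only: a flat base-4 index per character,
# NOT the structure-based encoding scheme described in this project.
BASES = "ACGT"  # assumed digit mapping: 0->A, 1->C, 2->G, 3->T


def min_length(n_symbols: int) -> int:
    """Smallest length L such that 4**L >= n_symbols."""
    length = 1
    while 4 ** length < n_symbols:
        length += 1
    return length


def encode_index(index: int, length: int) -> str:
    """Write a 0-based character index as a fixed-length base-4 DNA string."""
    if not 0 <= index < 4 ** length:
        raise ValueError("index out of range for this length")
    digits = []
    for _ in range(length):
        index, remainder = divmod(index, 4)
        digits.append(BASES[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode_index(seq: str) -> int:
    """Inverse of encode_index: read a base-4 DNA string back as an integer."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


if __name__ == "__main__":
    L = min_length(6763)
    print(L)                      # 7
    print(encode_index(6762, L))  # CGGCGGG
```

A flat index like this ignores both character structure and biochemical constraints such as homopolymer runs and GC content. Practical DNA storage encodings usually have to address those constraints.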

In summary, our project enables the storage of a large and complex set of Chinese characters in DNA. We also designed a similar scheme for simple monochromatic pictures. Moreover, our design may offer a useful encoding approach for DNA storage of other languages with similar writing systems.