Page 3 - B.Bioinformatics and systems biology
P. 3

A Study on the Improvement of the Cryptographic Alogrithm

  for Identifying Genetic relatives
                      1
         1
  Gi Ju Lee , Minhyeok Jeong , Jong Wha J. Joo 1 *
  1 Department of Computer Science and Engineering, Dongguk University-Seoul, 04620 Seoul, Republic of Korea
                                               ABSTRACT

   With the advent of high-throughput technology, genetic information is often compared to reveal individual
   relationships. Individual genetic information is one of the most important personal information, and efforts are being
   made to compare individual genetic information without compromising privacy. Of these, GenoCrypt identifying
   genetic relatives without compromising privacy. However, it requires a lot of computing resources. In this study, we
   propose a method referred to as AdGenoCrypt while consuming less computing resources.

                                                METHODS

                                                          In this study, to improve the encoding process of Fuzzy encryption, the
                                                          GS(Genome Sketch) is divided into n GS pieces before encrypting as
                                                          SGS(Secure Genome Sketch). GS contains hash information that divides
                                                          Haplotypes into 300 SNP (Single Nucleotide Polymorphism) units, 4625
                                                          pieces. Each piece has 24 bits of data, so out of a total of 16,777,216 binary
                                                          data, the hash value of genetic information represented by individuals is set
                                                          for about 9250.
                                                          And divide these GSs into n pieces again, each slice will save as much data
                                                          as log_2(n) bits, which will reduce the length of the binary data held by the GS
                                                          pieces. In addition, because the absolute figure of Fuzzy encryption is also
                                                          reduced, the computing resources required for SGS decryption are reduced.
                                                          This process can be described as in Fig1. In Fig1 (a) represents the total
                                                          Haplotypes of the individual by chromosome. (b) is evenly divided by 300
                                                          SNPs.
                                                          Each of these segments is set in a specific space out of 16,777,216 spaces in
                                                          GS, as shown in (e). This set is represented by a thin vertical line of Fig1.
                                                          Due to the characteristic of the hash function, the hash value is set in a
                                                          random place in GS regardless of the location of the segment.
                                                          The traditional approach is to perform a Fuzzy encryption on (e) and then
                                                          make a comparison. In this study, we compare the completed GS segment (f)
                                                          by dividing (e) equally again.
                                                RESULTS

















   As a result of performing 10 comparisons by dividing GS from 1 to 64, it was confirmed that the average time
   decreased sharply according to the log scale as shown in Figure 2 according to the number of divisions. When
   dividing by a multiple of 2, considering that the data stored in GS is neatly divided without spaces, it could be
   confirmed that there is no effect when dividing more than 16 equal parts. In addition, it was observed that if the
   number of divisions is up to 64, it takes more time as the number of comparisons increases. In a group of 150 people,
   the time taken to identify all relative relationships between each individual was measured as shown in Figure 3. The
   improved method was able to see a performance improvement of about 10 times compared to the previous method.
          CONCLUSION                         REFERENCES                   ACKNOWLEDGEMENTS

   If the GS is divided and encrypted   [1] Royal CD, Novembre J, Fullerton SM, Goldstein DB, Long   This research was supported by the
                                     JC, Bamshad MJ,Clark AG. 2010. Inferring genetic ancestry:
   and compared without increasing   opportunities, challenges,and implications. Am J Hum Genet   MSIT (Ministry of science and ICT ),
   the deviation of the number of    86: 661–673.                         Korea, under the National Program
                                     [2] Kyriazopoulou-Panagiotopoulou S, Kashef Haghighi D,
   segment distributions that can    Aerni SJ, Sundquist A, Bercovici S, Batzoglou S. 2011.   for Excellence in SW supervised by
                                     Reconstruction of genealogical relationships with applications
   affect the accuracy, it can be seen   to Phase III of HapMap. Bioinformatics 27:i333–i341.  the IITP (institute of Information &
   that it is possible to achieve similar   [3] Homer N, Szelinger S, Redman M,Duggan D,   communications Technology
                                     TembeW,Muehling J, Pearson J,Stephan D, Nelson S, Craig D.
   accuracy as before and save       2008. Resolving individuals contributingtrace amounts of DNA   Planning & Evaluation)”(2016-0-
                                     to highly complex mixtures using high-densitySNP genotyping
   computing resources at the same   microarrays. PLoS Genet 4: e1000167.  00017)
   time.                             [4] Dan He, Nicholas A. Furlotte, Farhad Hormozdiari, Jong
                                     Wha J. Joo, AkshayWadia, Rafail Ostrovsky, Amit Sahai, and
                                     Eleazar Eskin. 2014. Identifying geneticrelatives without
                                     compromising privacy.
   1   2   3   4   5   6