Friday 19 November 2021

Locating Restriction Sites (Rosalind | English)

rosalind


Hi everyone, how are you? This time, I want to discuss about one problem (bioinformatics problem) that exist in rosalind.info's web. The title is "Locating Restriction Sites". For the reference, you can first check out the problem that will be discussed (here). 

Overview 

In this problem, we will given a DNA string (let's call it s) based on FASTA format. Our task is to find the position and the length of the reverse palindrom substring. 

The reverse palindrom substring is a substring that also be a complement of this reverse. For example, "GATC" is a reverse palindrom because it is a complement of its reverse, which is "CTAG". 

To memorize, we already done something like this in a similar problem called "Complementing a Strand of DNA". 

The Code 

This is the code for solving this problem (with java language): 
  1. static void solve() {
  2. HashMap<Character, Character> thisPair = new HashMap<>();
  3. thisPair.put('G', 'C');
  4. thisPair.put('C', 'G');
  5. thisPair.put('A', 'T');
  6. thisPair.put('T', 'A');
  7. Scanner sc = new Scanner(System.in);
  8. sc.next();
  9. String s = "";
  10. while (sc.hasNext()) {
  11. s += sc.next();
  12. }
  13. for (int i = 1; i < s.length(); i++) {
  14. if (thisPair.get(s.charAt(i - 1)) == s.charAt(i)) {
  15. int panjang = 2;
  16. int l = i - 1;
  17. int r = i;
  18. while (--l >= 0 && ++r < s.length() && panjang < 12) {
  19. if (thisPair.get(s.charAt(l)) != s.charAt(r)) break;
  20. panjang += 2;
  21. if (panjang >= 4){
  22. out.println((l + 1) + " " + panjang);
  23. }
  24. }
  25. }
  26. }
  27. }    
  28.     
Code Review

Maybe it's clear enough. Here, we try to find a reverse palindrom manually. To make easier, we can first find 2 adjacent characters to start the searching process (line 14). 

The searching process is, first, checking whether those adjacent characters are complementary (or not); lets call those 2 adjacent characters t. If they do, then we check the actual t length by checking to the right and to the left beside of t whether they are complementary or not. If they do, repeat the process  again until the maximum length (line 18-24). 

Of course, in every single reverse palindrom we found we print its location and its length, as the problem asked. 

Input and Output

In the code above, I used next() function for entering the string-form dataset (line 11). 

I also used another input function called hasNext() (line 10). That function is very useful especially if we need to process an unknown-amount of data, just like a FASTA format data. 

And for the output I used out.println() function (line 22). That function is a modification from System.out.println() function which is very familiar in java. You can see the additional code for that modification (input and output) in my complete code at github

That's it. If you want to ask something, you can write it in the comment section below. I hope this article is useful and see you in the next article! 


Reference :
Source of image 1 :https://www.facebook.com/ProjectRosalind/

No comments:

Post a Comment