Friday 19 November 2021

Perfect Matchings and RNA Secondary Structures (Rosalind | English)



Hi everyone, how are you? This time I want to discuss a bioinformatics problem from the rosalind.info website. The title is "Perfect Matchings and RNA Secondary Structures". For reference, you can first check out the problem that will be discussed (here).

Overview 

In this problem we are given an RNA string in FASTA format; let's call it s. Our task is to determine the total number of its perfect matchings.

A perfect matching is a pairing in which every character in s has its own partner according to the RNA base-pairing rules: 'A' with 'U' and 'C' with 'G'. For example, "ACGU" has a perfect matching, because 'A' can be paired with 'U' and 'C' can be paired with 'G'; "ACGUUU" has no perfect matching, because two 'U' characters are left without a partner.
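To make the definition concrete, here is a minimal sketch that checks whether a string can have a perfect matching at all, simply by comparing the base counts. The class and method names are made up for this illustration and are not part of my solution.

```java
class PerfectMatchingCheck {
    // Returns true when every base can get a partner:
    // equal numbers of 'A' and 'U', and equal numbers of 'C' and 'G'.
    static boolean admitsPerfectMatching(String rna) {
        int a = 0, u = 0, c = 0, g = 0;
        for (char x : rna.toCharArray()) {
            switch (x) {
                case 'A': a++; break;
                case 'U': u++; break;
                case 'C': c++; break;
                case 'G': g++; break;
                default:
                    throw new IllegalArgumentException("Not an RNA base: " + x);
            }
        }
        return a == u && c == g;
    }

    public static void main(String[] args) {
        System.out.println(admitsPerfectMatching("ACGU"));   // true
        System.out.println(admitsPerfectMatching("ACGUUU")); // false
    }
}
```

The two calls in main are the same two examples used in the paragraph above.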



Left: an example of an RNA perfect matching.

The Code 

This is the code for solving this problem in Java (it needs java.math.BigInteger and java.util.Scanner to be imported):
  1. static BigInteger permut (int num) {                 // computes num! = P(num, num)
  2.     BigInteger res = BigInteger.valueOf(1);
  3.     for (int i = 1; i <= num; i++) res = res.multiply(BigInteger.valueOf(i));
  4.     return res;
  5. }
  6. static void solve() {
  7.     Scanner sc = new Scanner(System.in);
  8.     sc.next();                                       // skip the FASTA header (assumes it has no spaces)
  9.     String s = "";
  10.     while (sc.hasNext()) {                           // the sequence may span several lines
  11.         s += sc.next();
  12.     }
  13.     int numA = 0;
  14.     int numC = 0;
  15.     for (char x : s.toCharArray()) {
  16.         if (x == 'A') numA++;
  17.         if (x == 'C') numC++;
  18.     }
  19.     out.println(permut(numA).multiply(permut(numC))); // out is my own output helper, see below
  20. }
The Code Description

First, we count the possible 'A'-'U' matchings and 'C'-'G' matchings using the permutation formula. Because the number of 'A' equals the number of 'U', and the number of 'C' equals the number of 'G', we only need to count the 'A' and 'C' characters (lines 15-18) and compute a permutation for each. The reasoning is that the first 'A' can choose any of the x 'U' characters, the second 'A' any of the remaining x - 1, and so on, which gives x! ways.

After finding the number of permutations for 'A' and for 'C', we multiply the two to get the answer. The formula is P(x, x) * P(y, y), where x is the number of 'A', y is the number of 'C', and P() is the permutation function, so P(x, x) = x! (line 19).

Here I used BigInteger because the result can be too big for the long data type: factorials grow very fast, and 21! already exceeds the largest long value.
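The counting described above can be sketched as a small self-contained program that takes the RNA string directly, without any FASTA reading. The class and method names here are made up for illustration, and "AGCUAGUCAU" is just an example string with 3 'A', 3 'U', 2 'C' and 2 'G'.

```java
import java.math.BigInteger;

class PerfectMatchingCount {
    // n! computed with BigInteger so large inputs do not overflow.
    static BigInteger factorial(int n) {
        BigInteger res = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            res = res.multiply(BigInteger.valueOf(i));
        }
        return res;
    }

    // Assumes the string already has as many 'A' as 'U' and as many 'C' as 'G'.
    static BigInteger countPerfectMatchings(String rna) {
        int numA = 0;
        int numC = 0;
        for (char x : rna.toCharArray()) {
            if (x == 'A') numA++;
            if (x == 'C') numC++;
        }
        return factorial(numA).multiply(factorial(numC));
    }

    public static void main(String[] args) {
        // 3 'A' and 2 'C', so the answer is 3! * 2! = 12
        System.out.println(countPerfectMatchings("AGCUAGUCAU"));
    }
}
```

Note that this sketch does not check that the counts are balanced; it relies on the input being valid.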
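To see why long is risky here, the following sketch computes the same factorial with long and with BigInteger. The class and method names are made up for illustration. The long version silently wraps around once the true value no longer fits, which happens from 21! onward.

```java
import java.math.BigInteger;

class FactorialOverflow {
    // Plain long arithmetic: wraps around silently on overflow.
    static long factorialLong(int n) {
        long res = 1;
        for (int i = 2; i <= n; i++) {
            res *= i;
        }
        return res;
    }

    // Arbitrary-precision arithmetic: never overflows.
    static BigInteger factorialBig(int n) {
        BigInteger res = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            res = res.multiply(BigInteger.valueOf(i));
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(factorialLong(20)); // still fits in a long
        System.out.println(factorialLong(21)); // wrong: the value has wrapped around
        System.out.println(factorialBig(21));  // correct value
    }
}
```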

Input and Output

In the code above, I used the next() function to read the string-form dataset (line 11). A first next() call (line 8) skips the FASTA header, which works as long as the header contains no spaces.

I also used another input function called hasNext() (line 10). It is very useful when we need to process an unknown amount of data, such as a FASTA record whose sequence may be split across several lines.

For the output I used the out.println() function (line 19). It is a modified version of the System.out.println() function, which is very familiar in Java. You can see the additional code for that modification (input and output) in my complete code on GitHub.
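As a small variation on the reading loop, here is a sketch that reads one FASTA record from any Scanner. It differs from my solution in two ways: it skips the header with nextLine(), so a header with spaces would also be handled, and it collects the pieces in a StringBuilder instead of using +=. The class name, the method name and the header "Rosalind_23" are made up for this example.

```java
import java.util.Scanner;

class FastaReader {
    // Reads a single FASTA record: one header line followed by
    // sequence lines, and returns the sequence joined into one string.
    static String readSequence(Scanner sc) {
        if (sc.hasNextLine()) {
            sc.nextLine(); // skip the header line
        }
        StringBuilder sb = new StringBuilder();
        while (sc.hasNext()) { // loop until there is no more input
            sb.append(sc.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A Scanner over a String stands in for System.in here.
        Scanner sc = new Scanner(">Rosalind_23\nAGCUAG\nUCAU\n");
        System.out.println(readSequence(sc)); // AGCUAGUCAU
    }
}
```

To read from the keyboard or a redirected file instead, pass new Scanner(System.in).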
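The exact helper is in the GitHub code. Just to give a rough idea, one common way to build such an out object is a PrintWriter wrapped around a buffered writer, as in the sketch below. The class and method names are made up for illustration, and this is an assumption about the general pattern, not a copy of my helper.

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;

class BufferedOutput {
    // Writes one answer through a buffered PrintWriter and flushes it.
    static void writeAnswer(Writer target, Object answer) {
        PrintWriter out = new PrintWriter(new BufferedWriter(target));
        out.println(answer);
        out.flush(); // buffered output only appears after a flush
    }

    public static void main(String[] args) {
        writeAnswer(new OutputStreamWriter(System.out), 12);
    }
}
```

The important detail with buffered output is the flush() call: without it, the answer may never be printed.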

That's it. If you want to ask something, you can write it in the comment section below. I hope this article is useful and see you in the next article! 


References:
Image source 1: https://www.facebook.com/ProjectRosalind/
Image source 2: https://dodona.ugent.be/nl/activities/1690462122/
