Thursday, 19 May 2022

Introduction to Random Strings (Rosalind | English)

    



Hi everyone, how are you? This time, I want to discuss a problem from rosalind.info called "Introduction to Random Strings". For reference, you can first check out the problem statement (here).

Overview 

In this problem, we are given a DNA string (let's call it s) and an array of numbers between 0 and 1 that represent GC-content values (let's call it A[]). GC-content is the proportion of a DNA string made up of guanine ('G') and cytosine ('C'). Here, each value in A[] is used as the probability of choosing a 'G' or a 'C' when a random DNA string is built. The string s is the sequence whose probability we want to evaluate.

Our task is to determine, for each GC-content value in A[], the probability that a random DNA string built with that GC-content is exactly equal to s. Then, we convert each probability into its common logarithm (log10).
 
The idea is to give each character of s a probability based on the GC-content: 'G' and 'C' each get gc-content / 2, while 'A' and 'T' each get (1 - gc-content) / 2. For example, suppose the GC-content is 0.6 and the string is "ACGA". We multiply the values of all the characters: 'A' = 0.2 (0.4 / 2), 'C' = 0.3 (0.6 / 2), 'G' = 0.3 (0.6 / 2), and 'A' = 0.2 (0.4 / 2). The product is 0.2 * 0.3 * 0.3 * 0.2 = 0.0036, so the answer is log10(0.0036), which is about -2.444.
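To make the example concrete, here is a small Java sketch that multiplies the per-character probabilities of "ACGA" for a GC-content of 0.6 and then takes log10 of the result. The class and method names (RandomStringExample, charProbability, stringProbability) are my own, chosen only for illustration, and the code assumes s contains only 'A', 'C', 'G', and 'T'.

```java
import java.util.Locale;

// Illustrative sketch (hypothetical names): the worked example computed
// character by character.
public class RandomStringExample {

    // Probability of one nucleotide under the given GC-content.
    static double charProbability(char c, double gcContent) {
        if (c == 'G' || c == 'C') {
            return gcContent / 2;       // 'G' and 'C' share the GC-content equally
        }
        return (1 - gcContent) / 2;     // 'A' and 'T' share the remainder equally
    }

    // Probability that a random string with this GC-content equals s exactly:
    // the product of the per-character probabilities.
    static double stringProbability(String s, double gcContent) {
        double p = 1.0;
        for (char c : s.toCharArray()) {
            p *= charProbability(c, gcContent);
        }
        return p;
    }

    public static void main(String[] args) {
        double p = stringProbability("ACGA", 0.6);
        // Locale.US keeps the decimal separator a dot regardless of system settings.
        System.out.printf(Locale.US, "probability = %.4f%n", p);
        System.out.printf(Locale.US, "log10 = %.3f%n", Math.log10(p));
    }
}
```

Running it prints probability = 0.0036 and log10 = -2.444, matching the hand calculation.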

The Code 

Here is my Java code for solving this problem:
  1. public void solve() {
  2.     Scanner sc = new Scanner(System.in);
  3.     String s = sc.next();                  // the DNA string s
  4.     int GC = 0;                            // number of 'G' and 'C' in s
  5.     int AT = 0;                            // number of 'A' and 'T' in s
  6.     for (char c : s.toCharArray()) if (c == 'G' || c == 'C') GC++; else AT++;
  7.     while (sc.hasNext()) {                 // one iteration per value in A[]
  8.         double gcContent = Double.parseDouble(sc.next());
  9.         out.printf("%.3f\n", Math.log10(Math.pow(gcContent / 2, GC) * Math.pow((1 - gcContent) / 2, AT)));
  10.     }
  11. }
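The method above relies on the out helper from my own template, so it will not compile on its own. Below is a self-contained sketch of the same approach. It assumes the input is the string s followed by the values of A[], all separated by whitespace, and it uses System.out in place of out. It prints the results on one line separated by spaces, which is the form of Rosalind's sample output, and formats with Locale.US so the decimal separator is always a dot. The class name RandomStrings and the helper logProbabilities are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Self-contained sketch of the solution (hypothetical names).
// Assumed input: s, then the values of A[], whitespace-separated.
public class RandomStrings {

    // Returns B[k] = log10 of the probability that a random string
    // with GC-content a[k] matches s exactly.
    static double[] logProbabilities(String s, double[] a) {
        int gc = 0; // number of 'G' or 'C' in s
        int at = 0; // everything else is counted as 'A' or 'T'
        for (char c : s.toCharArray()) {
            if (c == 'G' || c == 'C') gc++;
            else at++;
        }
        double[] b = new double[a.length];
        for (int k = 0; k < a.length; k++) {
            b[k] = Math.log10(Math.pow(a[k] / 2, gc) * Math.pow((1 - a[k]) / 2, at));
        }
        return b;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String s = sc.next();
        List<Double> values = new ArrayList<>();
        while (sc.hasNext()) {
            values.add(Double.parseDouble(sc.next()));
        }
        double[] a = new double[values.size()];
        for (int i = 0; i < a.length; i++) a[i] = values.get(i);

        // Join the results with single spaces on one line.
        double[] b = logProbabilities(s, a);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format(Locale.US, "%.3f", b[i]));
        }
        System.out.println(sb);
    }
}
```

For example, with s = ACGATACAA and A[] = 0.129 0.783, it prints -5.737 -7.009.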
The Code Description

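First, we count how many characters of s are 'G' or 'C' (stored in GC) and how many are 'A' or 'T' (stored in AT). These two counts are all we need, because 'G' and 'C' have the same probability, and so do 'A' and 'T'.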
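Line 6 of the code counts every character that is not 'G' or 'C' as 'A'/'T'. If you would rather have the counting step fail loudly on unexpected input, here is a hedged variant. Rejecting other characters is my own assumption about how bad input should be handled, not something the problem requires, and the names NucleotideCounts and countGcAt are hypothetical.

```java
// Sketch (hypothetical names): count 'G'/'C' and 'A'/'T' separately,
// rejecting any character that is not a DNA symbol.
public class NucleotideCounts {

    // Returns {gcCount, atCount}.
    static int[] countGcAt(String s) {
        int gc = 0;
        int at = 0;
        for (char c : s.toCharArray()) {
            switch (c) {
                case 'G':
                case 'C':
                    gc++;
                    break;
                case 'A':
                case 'T':
                    at++;
                    break;
                default:
                    throw new IllegalArgumentException("Not a DNA symbol: " + c);
            }
        }
        return new int[] {gc, at};
    }

    public static void main(String[] args) {
        int[] counts = countGcAt("ACGATACAA");
        System.out.println("GC = " + counts[0] + ", AT = " + counts[1]);
    }
}
```

For "ACGATACAA" this prints GC = 3, AT = 6.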

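Then, we process every GC-content value in A[] using the method discussed above. To shorten the process, instead of multiplying the characters one by one, I raise each probability to the power of its count: Math.log10(Math.pow(gcContent / 2, GC) * Math.pow((1 - gcContent) / 2, AT)), as in line 9 of the code above. This works because each character of a random string is chosen independently, so the probability of s is simply the product of its per-character probabilities. Note that this is not really the binomial distribution formula: there is no binomial coefficient, because we want the probability of one specific string, not the probability of getting a certain number of 'G'/'C' characters in any order.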
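Since log10(x^GC * y^AT) = GC * log10(x) + AT * log10(y), the same value can also be computed as a sum of logarithms. This is an alternative I am suggesting, not the code above, and the class and method names below are hypothetical. For Rosalind's input sizes (s is at most 100 bp) the original form works fine, but for a very long string the product in the first form can underflow to 0.0, and log10(0.0) is -Infinity, whereas the sum form stays finite.

```java
// Sketch (hypothetical names): two equivalent ways to compute log10 of the
// probability of s, given its 'G'/'C' count and 'A'/'T' count.
public class LogForms {

    // Same expression as line 9: the log of the full product.
    static double logOfProduct(int gc, int at, double gcContent) {
        return Math.log10(Math.pow(gcContent / 2, gc) * Math.pow((1 - gcContent) / 2, at));
    }

    // Sum of logs: log10(x^gc * y^at) = gc * log10(x) + at * log10(y).
    static double sumOfLogs(int gc, int at, double gcContent) {
        return gc * Math.log10(gcContent / 2) + at * Math.log10((1 - gcContent) / 2);
    }

    public static void main(String[] args) {
        // Short string (3 'G'/'C', 6 'A'/'T'): both forms agree.
        System.out.println(logOfProduct(3, 6, 0.129));
        System.out.println(sumOfLogs(3, 6, 0.129));

        // Very long string (2000 'A'/'T'): 0.25^2000 underflows to 0.0,
        // so the first form gives -Infinity while the second stays finite.
        System.out.println(logOfProduct(0, 2000, 0.5));
        System.out.println(sumOfLogs(0, 2000, 0.5));
    }
}
```

One caveat: if a GC-content value is exactly 0 or 1 and the matching count is 0, the sum form computes 0 * -Infinity, which is NaN, so guard that case if your data can contain it.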

Input and Output

As you can see, I use the next() function (line 3) to read the string dataset, and next() again (line 8) to read each GC-content value until the input runs out.

For the output, I use the out.printf() function (line 9). It is my own wrapper around System.out.printf(), the standard output function in Java. For more details, you can see the additional input and output code in my complete code on GitHub.

That's it. If you have any questions, feel free to write them in the comment section below. I hope this article is useful, and see you in the next article!


Reference:
Source image 1: https://www.facebook.com/ProjectRosalind/
