Hi everyone, how are you? This time I want to discuss a problem from the rosalind.info website. Its title is "Introduction to Random Strings". For reference, you may want to read the problem statement first (here).
Overview
In this problem we are given a string (let's call it string s) and an array of decimal numbers which contains gc-content values (let's call it array A[]). The gc-content is the probability that a symbol in a DNA string is guanine ('G') or cytosine ('C'). The string s is the DNA sequence whose probability we want to compute.
Our task is to determine, for each gc-content value in A[], the probability that a random string built with that gc-content is exactly s. We then report the base-10 logarithm (log10) of each probability.
The idea is to assign each character of s a probability based on the gc-content: 'G' and 'C' each get gc-content / 2, while 'A' and 'T' each get (1 - gc-content) / 2. For example, suppose the gc-content is 0.6 and we want the probability of the random string "ACGA". We multiply the values of all the characters: 'A' = 0.2 (0.4 / 2), 'C' = 0.3 (0.6 / 2), 'G' = 0.3 (0.6 / 2), and 'A' = 0.2 (0.4 / 2). The product is 0.0036, so the answer is log10(0.0036), which is about -2.444.
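The worked example above can be checked with a small sketch. The class and method names here (PerCharacterProbability, probability) are made up for illustration and are not part of the solution discussed in this post. The sketch multiplies the per-character values one by one and assumes s contains only 'A', 'C', 'G' and 'T'.

```java
public class PerCharacterProbability {
    // Hypothetical helper: probability that a random string generated with
    // the given gc-content is exactly s (s is assumed to contain only ACGT).
    static double probability(String s, double gcContent) {
        double p = 1.0;
        for (char c : s.toCharArray()) {
            if (c == 'G' || c == 'C') {
                p *= gcContent / 2;        // value of 'G' or 'C'
            } else {
                p *= (1 - gcContent) / 2;  // value of 'A' or 'T'
            }
        }
        return p;
    }

    public static void main(String[] args) {
        double p = probability("ACGA", 0.6);
        System.out.println(p);             // approximately 0.0036
        System.out.println(Math.log10(p)); // approximately -2.444
    }
}
```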
The Code
This is the code for solving this problem in Java:
    public void solve() {
        Scanner sc = new Scanner(System.in);
        String s = sc.next(); // the DNA string
        int GC = 0; // number of 'G' and 'C' characters
        int AT = 0; // number of 'A' and 'T' characters
        for (char c : s.toCharArray()) if (c == 'G' || c == 'C') GC++; else AT++;
        while (sc.hasNext()) { // one gc-content value per token
            double gcContent = Double.parseDouble(sc.next());
            out.printf("%.3f\n", Math.log10(Math.pow(gcContent / 2, GC) * Math.pow((1 - gcContent) / 2, AT)));
        }
    }
The Code Description
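First, we count how many characters fall into each group: 'G' together with 'C' (the GC counter), and 'A' together with 'T' (the AT counter).
Then we process each gc-content value in the A[] array using the method discussed above. To shorten the process, I use the formula Math.log10(Math.pow(gcContent / 2, GC) * Math.pow((1 - gcContent) / 2, AT)), as in the code above (line 9). The formula follows from treating the characters as independent: every 'G' or 'C' contributes a factor gcContent / 2 and every 'A' or 'T' contributes a factor (1 - gcContent) / 2. It has the same shape as a term of the binomial distribution formula, but without the binomial coefficient, because the order of the characters in s is fixed.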
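Because log10 turns a product into a sum, the same value can also be computed as GC * log10(gcContent / 2) + AT * log10((1 - gcContent) / 2). The sketch below shows that variant; the class and method names are hypothetical, and it assumes gcContent is strictly between 0 and 1 and that s contains only 'A', 'C', 'G' and 'T'. It is an alternative way to write the formula, not the code used in this post.

```java
import java.util.Locale;

public class LogProbability {
    // Hypothetical helper: log10 of the probability of s for one gc-content,
    // computed as a sum of logarithms instead of the logarithm of a product.
    static double log10Probability(String s, double gcContent) {
        int gc = 0;
        for (char c : s.toCharArray()) {
            if (c == 'G' || c == 'C') gc++;
        }
        int at = s.length() - gc; // assumes every other character is 'A' or 'T'
        return gc * Math.log10(gcContent / 2)
             + at * Math.log10((1 - gcContent) / 2);
    }

    public static void main(String[] args) {
        // Same example as in the overview: "ACGA" with gc-content 0.6.
        System.out.printf(Locale.ROOT, "%.3f%n", log10Probability("ACGA", 0.6)); // prints -2.444
    }
}
```

One reason to prefer the sum is that, for a very long string, the product of many small factors can underflow to 0 in a double, which makes Math.log10 return negative infinity; the sum of logarithms stays finite.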
Input and Output
As you can see, I used the next() method (line 3) to read the string dataset from the input.
For the output I used the out.printf() method (line 9). It is a modified version of System.out.printf(), which is the default in Java. For more details, you can see the additional input and output code in my complete code on GitHub.
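The helper behind out is not shown in this post. As a rough illustration only, one common way to get an out object that supports printf() is a java.io.PrintWriter wrapped around System.out. That choice, the class name RandomStringsIO, and passing the Scanner and PrintWriter as parameters are all assumptions made for this sketch; the real helper code may look different.

```java
import java.io.PrintWriter;
import java.util.Locale;
import java.util.Scanner;

public class RandomStringsIO {
    // Reads the DNA string and the gc-content values from sc and writes one
    // log10 probability per value to out. Taking sc and out as parameters is
    // an illustrative choice, not something taken from the original code.
    static void solve(Scanner sc, PrintWriter out) {
        String s = sc.next();
        int GC = 0;
        int AT = 0;
        for (char c : s.toCharArray()) if (c == 'G' || c == 'C') GC++; else AT++;
        while (sc.hasNext()) {
            double gcContent = Double.parseDouble(sc.next());
            // Locale.ROOT keeps the decimal separator a dot on every machine.
            out.printf(Locale.ROOT, "%.3f%n",
                Math.log10(Math.pow(gcContent / 2, GC) * Math.pow((1 - gcContent) / 2, AT)));
        }
        out.flush(); // a PrintWriter may buffer its output until it is flushed
    }

    public static void main(String[] args) {
        solve(new Scanner(System.in), new PrintWriter(System.out));
    }
}
```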