Hi everyone, how are you? On this occasion, I want to discuss about one of the problems from rosalind.info's web. That title is "Finding a Motif in DNA". For the reference, you can first check out the problem what will be discussed (here).
Overview
In this problem we will be given 2 DNA strings, let's call them s and t (to be clear, string t is a substring of string s). Our job is to find the position of string t in string s; mention all if there are many.
The solving method is quite simple: we just have to match string t throughout string s, and then print the position it if you found one.
The human chromosome image.
The Code
This is the code for solving this problem (with java language):
static void solve() {
char[] ch1 = ("." + ns()).toCharArray();
char[] ch2 = ("." + ns()).toCharArray();
int[] rev = new int[ch2.length];
rev[0] = 1;
rev[1] = 1;
int pvt = 1;
for (int i = 2; i < ch2.length; i++) {
if (ch2[i] != ch2[pvt]) {
if (ch2[i] == ch2[1]) pvt = 2;
else pvt = 1;
}
else {
pvt++;
}
rev[i] = pvt;
}
pvt = 1;
for (int i = 1; i < ch1.length; i++) {
if (ch1[i] != ch2[pvt]) {
pvt = rev[pvt - 1];
if (ch1[i] == ch2[pvt]) pvt++;
else if (ch1[i] == ch2[1]) pvt = 2;
else pvt = 1;
continue;
}
if (pvt == ch2.length - 1) {
out.print((i - (ch2.length - 1) + 1) + " ");
pvt = rev[pvt];
continue;
}
pvt++;
}
}
As you've seen, here I chose to use KMP algorithm (Knuth-Morris-Pratt) for matching between 2 strings. I used KMP algorithm because of its time-consume efficiency. KMP algorithm is very useful, especially for processing the huge amount of data, which is more likely to consume a lot of time.
If you want to learning about KMP algorithm, I have a great video here on Youtube.
Input and Output
In the code above, I used ns() function for entering the string-form data. That function is a extend from one of the system inputs which are available in java, called BufferedReader.
And for the output I used out.print() function. That function is a modification from System.out.print() function which is very familiar in java. You can see the additional code for that modification (input and output) in my complete code at github.
That's it. If you want to ask something, you can write it in the comment section below. I hope this article is useful and see you in the next article!
Reference :
Source of image 1 :https://www.facebook.com/ProjectRosalind/
Source of image 2 :https://en.wikipedia.org/wiki/Alu_element
No comments:
Post a Comment