Hi everyone, how are you? This time, I want to discuss a bioinformatics problem from rosalind.info. The title is "Open Reading Frames". For reference, you can first check out the problem that will be discussed (here).
Overview
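In this problem, we are given a DNA string in FASTA format; let's call it s. The task is to print every distinct protein string that can be translated from an Open Reading Frame of s.
Before going further, we first need to know what an Open Reading Frame is.
In short, an Open Reading Frame (ORF) is a stretch of a DNA or RNA string that starts at a start codon and is read three letters at a time until it reaches a stop codon. Finding all ORFs of a string takes 4 steps.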
First, we transcribe the DNA string into an RNA string by replacing every T with U. For example,
"ATGGCCATGGCGTGA" becomes "AUGGCCAUGGCGUGA".
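Second, we split that RNA string into 3-letter groups (codons). We can start reading at the first, second, or third letter, so there are three ways to do it. Continuing the string above:
"AUGGCCAUGGCGUGA" is split into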
AUG-GCC-AUG-GCG-UGA,
UGG-CCA-UGG-CGU,
and GGC-CAU-GGC-GUG.
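The first two steps can be sketched in Java as follows. This is only an illustration, not the solution code; the class name ReadingFrames and the helpers transcribe and frame are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ReadingFrames {
    // Step 1: transcribe DNA into RNA by replacing every T with U.
    static String transcribe(String dna) {
        return dna.replace('T', 'U');
    }

    // Step 2: cut the string into codons, starting at offset 0, 1, or 2.
    // Trailing letters that do not fill a whole codon are dropped.
    static List<String> frame(String rna, int offset) {
        List<String> codons = new ArrayList<>();
        for (int i = offset; i + 3 <= rna.length(); i += 3) {
            codons.add(rna.substring(i, i + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        String rna = transcribe("ATGGCCATGGCGTGA");
        for (int offset = 0; offset < 3; offset++) {
            // Prints the three groupings listed above, one per line.
            System.out.println(String.join("-", frame(rna, offset)));
        }
    }
}
```

The loop condition i + 3 <= rna.length() is what drops the leftover letters at the end of the second and third frames.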
Third, we take the reverse complement of the string (reverse it, then swap A with U and G with C) and split it into 3-letter groups in the same way. Let's continue;
the reverse complement of AUGGCCAUGGCGUGA is UCACGCCAUGGCCAU, which is split into
UCA-CGC-CAU-GGC-CAU,
CAC-GCC-AUG-GCC,
and ACG-CCA-UGG-CCA.
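To make the third step concrete, here is a small Java sketch. Keep in mind that a reverse complement reverses the string first and only then swaps each base with its partner. The class name ReverseComplementDemo and its helpers are assumptions made for this example, and it works on RNA letters (A, U, G, C) to match the strings above.

```java
class ReverseComplementDemo {
    // Returns the pairing partner of an RNA base.
    static char complement(char base) {
        switch (base) {
            case 'A': return 'U';
            case 'U': return 'A';
            case 'G': return 'C';
            case 'C': return 'G';
            default: throw new IllegalArgumentException("Unexpected base: " + base);
        }
    }

    // Walks the string from the last letter to the first and
    // appends the partner of each base.
    static String reverseComplement(String rna) {
        StringBuilder sb = new StringBuilder(rna.length());
        for (int i = rna.length() - 1; i >= 0; i--) {
            sb.append(complement(rna.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("AUGGCCAUGGCGUGA")); // prints UCACGCCAUGGCCAU
    }
}
```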
Fourth, we translate all of the 3-letter groupings above (6 in total) into protein strings. In the past, we've translated an RNA string into a protein string using the RNA codon table; this step works the same way. As in the RNA translation rule, we start translating at a start codon (AUG) and stop at a stop codon (UAA, UGA, UAG). A start codon that is never followed by a stop codon in its frame does not produce a protein. For example,
the translation of AUG-GCC-AUG-GCG-UGA is MAMA. The second AUG in that frame also starts an ORF, which gives MA.
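The translation step for a single frame might look like the sketch below. It is a simplified illustration, not the solution code: the class name TranslateDemo is made up, and the codon table is deliberately partial, holding only the four codons that occur in the example frame.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TranslateDemo {
    // Partial RNA codon table: only what the example needs (an assumption
    // for brevity; a real solution needs all 64 codons).
    static final Map<String, String> CODONS = new HashMap<>();
    static {
        CODONS.put("AUG", "M");
        CODONS.put("GCC", "A");
        CODONS.put("GCG", "A");
        CODONS.put("UGA", "Stop");
    }

    // Returns every protein that starts at an AUG and ends at a stop codon
    // in the frame beginning at the given offset.
    static List<String> proteins(String rna, int offset) {
        List<String> found = new ArrayList<>();
        List<StringBuilder> open = new ArrayList<>(); // started, not yet stopped
        for (int i = offset; i + 3 <= rna.length(); i += 3) {
            String codon = rna.substring(i, i + 3);
            String amino = CODONS.get(codon);
            if (amino == null) {
                throw new IllegalArgumentException("Codon not in demo table: " + codon);
            }
            if (amino.equals("Stop")) {
                // A stop codon closes every protein that is currently open.
                for (StringBuilder sb : open) found.add(sb.toString());
                open.clear();
                continue;
            }
            if (codon.equals("AUG")) open.add(new StringBuilder());
            for (StringBuilder sb : open) sb.append(amino);
        }
        return found; // proteins still open at the end are discarded
    }

    public static void main(String[] args) {
        System.out.println(proteins("AUGGCCAUGGCGUGA", 0)); // prints [MAMA, MA]
    }
}
```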
For a further understanding of Open Reading Frames, see the reference here on YouTube.
The Code
This is the code for solving this problem (in Java):
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

public class Main {
    // Stand-in for the output helper from the complete code on GitHub.
    static PrintWriter out = new PrintWriter(System.out);

    // Reverses s and swaps every base with its partner from thisPair.
    static String reverseComplement(String s, Map<Character, Character> thisPair) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            sb.append(thisPair.get(s.charAt(i)));
        }
        return sb.toString();
    }

    // Translates the three reading frames of s and prints every protein that
    // has not been printed before ("udah" is Indonesian for "already").
    static void translateFrames(String s, Map<String, String> codonTable, Set<String> udah) {
        for (int j = 0; j < 3; j++) {
            // Proteins that have started but not yet reached a stop codon.
            List<String> res = new ArrayList<>();
            for (int i = j; i + 3 <= s.length(); i += 3) {
                String now = s.substring(i, i + 3);
                if (now.equals("ATG")) {
                    // A start codon opens a new protein.
                    res.add("");
                } else if (now.equals("TAA") || now.equals("TAG") || now.equals("TGA")) {
                    // A stop codon closes every open protein.
                    for (String ss : res) {
                        if (udah.add(ss)) out.println(ss);
                    }
                    res = new ArrayList<>();
                }
                // Extend every open protein with the current amino acid.
                for (int k = 0; k < res.size(); k++) {
                    res.set(k, res.get(k) + codonTable.get(now));
                }
            }
        }
    }

    static void solve() {
        Map<Character, Character> thisPair = new HashMap<>();
        thisPair.put('A', 'T');
        thisPair.put('T', 'A');
        thisPair.put('G', 'C');
        thisPair.put('C', 'G');

        // DNA codon table, so s can be translated without transcribing it first.
        Map<String, String> codonTable = new HashMap<>();
        codonTable.put("ATT", "I");
        codonTable.put("ATC", "I");
        codonTable.put("ATA", "I");
        codonTable.put("ATG", "M");
        codonTable.put("ACT", "T");
        codonTable.put("ACC", "T");
        codonTable.put("ACA", "T");
        codonTable.put("ACG", "T");
        codonTable.put("AAT", "N");
        codonTable.put("AAC", "N");
        codonTable.put("AAA", "K");
        codonTable.put("AAG", "K");
        codonTable.put("AGT", "S");
        codonTable.put("AGC", "S");
        codonTable.put("AGA", "R");
        codonTable.put("AGG", "R");
        codonTable.put("TTT", "F");
        codonTable.put("TTC", "F");
        codonTable.put("TTA", "L");
        codonTable.put("TTG", "L");
        codonTable.put("TCT", "S");
        codonTable.put("TCC", "S");
        codonTable.put("TCA", "S");
        codonTable.put("TCG", "S");
        codonTable.put("TAT", "Y");
        codonTable.put("TAC", "Y");
        codonTable.put("TAA", "Stop");
        codonTable.put("TAG", "Stop");
        codonTable.put("TGT", "C");
        codonTable.put("TGC", "C");
        codonTable.put("TGA", "Stop");
        codonTable.put("TGG", "W");
        codonTable.put("CTT", "L");
        codonTable.put("CTC", "L");
        codonTable.put("CTA", "L");
        codonTable.put("CTG", "L");
        codonTable.put("CCT", "P");
        codonTable.put("CCC", "P");
        codonTable.put("CCA", "P");
        codonTable.put("CCG", "P");
        codonTable.put("CAT", "H");
        codonTable.put("CAC", "H");
        codonTable.put("CAA", "Q");
        codonTable.put("CAG", "Q");
        codonTable.put("CGT", "R");
        codonTable.put("CGC", "R");
        codonTable.put("CGA", "R");
        codonTable.put("CGG", "R");
        codonTable.put("GTT", "V");
        codonTable.put("GTC", "V");
        codonTable.put("GTA", "V");
        codonTable.put("GTG", "V");
        codonTable.put("GCT", "A");
        codonTable.put("GCC", "A");
        codonTable.put("GCA", "A");
        codonTable.put("GCG", "A");
        codonTable.put("GAT", "D");
        codonTable.put("GAC", "D");
        codonTable.put("GAA", "E");
        codonTable.put("GAG", "E");
        codonTable.put("GGT", "G");
        codonTable.put("GGC", "G");
        codonTable.put("GGA", "G");
        codonTable.put("GGG", "G");

        Scanner sc = new Scanner(System.in);
        sc.next(); // skip the FASTA header (">Rosalind_...")
        StringBuilder sb = new StringBuilder();
        while (sc.hasNext()) {
            sb.append(sc.next()); // join the sequence lines into one string
        }
        String s = sb.toString();

        Set<String> udah = new HashSet<>();
        translateFrames(s, codonTable, udah);
        translateFrames(reverseComplement(s, thisPair), codonTable, udah);
    }

    public static void main(String[] args) {
        solve();
        out.flush();
    }
}
First, we store a codon table in a HashMap variable (codonTable). For efficiency, we use a DNA codon table instead of an RNA one; thus, we can translate the DNA into protein strings directly, without transcribing it first.
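If you already have an RNA codon table from an earlier problem, one possible way to get the DNA version is to rewrite U as T in every key. The sketch below assumes such a table exists as a Map; the class name DnaCodons and the method toDnaTable are hypothetical names for this example, and the table in main is only a three-entry sample.

```java
import java.util.HashMap;
import java.util.Map;

class DnaCodons {
    // Builds a DNA-keyed table from an RNA-keyed one by rewriting U as T.
    static Map<String, String> toDnaTable(Map<String, String> rnaTable) {
        Map<String, String> dnaTable = new HashMap<>();
        for (Map.Entry<String, String> e : rnaTable.entrySet()) {
            dnaTable.put(e.getKey().replace('U', 'T'), e.getValue());
        }
        return dnaTable;
    }

    public static void main(String[] args) {
        // Sample entries only; a full table has 64 codons.
        Map<String, String> rnaTable = new HashMap<>();
        rnaTable.put("AUG", "M");
        rnaTable.put("GCC", "A");
        rnaTable.put("UGA", "Stop");

        Map<String, String> dnaTable = toDnaTable(rnaTable);
        System.out.println(dnaTable.get("ATG")); // prints M
    }
}
```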
Then, we split the s string into 3-letter groups for each of the three frames and translate them into proteins (the nested loops over j and i).
Don't forget the reverse complement of s too (the reverseComplement call). Build it and translate it the same way as before.
Input and Output
In the code above, I used the next() function to read the dataset as strings.
I also used another input function called hasNext(). That function is very useful when we need to process an unknown amount of data, such as FASTA-format data.
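Here is a small sketch of that input pattern on its own. It reads from a string instead of System.in so that it can be run directly; the class name FastaReader and the method readSequence are made up for this example, and it assumes a single record whose header contains no spaces, as Rosalind headers like ">Rosalind_99" do.

```java
import java.util.Scanner;

class FastaReader {
    // Reads one FASTA record: skips the header token, then joins every
    // remaining token (the sequence lines) into a single string.
    static String readSequence(Scanner sc) {
        sc.next(); // header, e.g. ">Rosalind_99"
        StringBuilder sb = new StringBuilder();
        while (sc.hasNext()) { // true as long as there is more input
            sb.append(sc.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(">Rosalind_99\nAGCC\nATGT\n");
        System.out.println(readSequence(sc)); // prints AGCCATGT
    }
}
```

Because hasNext() keeps returning true until the input ends, the loop works no matter how many lines the sequence is wrapped over.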
For the output I used the out.println() function. It is a modified version of the System.out.println() function, which is very familiar in Java. You can see the additional input and output code behind it in my complete code on GitHub.
That's it. If you want to ask something, you can write it in the comment section below. I hope this article is useful and see you in the next article!