Monday 8 November 2021

Overlap Graphs (Rosalind | English)

rosalind


Hi everyone, how are you? This time, I want to discuss about one problem (which is bioinformatics problem) that exist in rosalind.info's web. The title is "Overlap Graphs". For the reference, you can first check out the problem that will be discussed (here). 

Overview

In this problem, we will given some strings based on FASTA format that representing the unknown-amount of DNA strains. Our task is to determine the pair for each string. Two strings can be defined as a pair if 3-letter suffix of the string 1 is same as the 3-letter prefix of the string 2. For example, "CGAAC" string can be paired with "AACTT" as the "AAC" substring between those is same; "CGAAC" can be defined as string 1, and "AACTT" as string 2. 

How to solve this problem is quite simple: we just have to pair those strings based on prefix-suffix principle as we just learned in previous. 

The Code 

This is the code for solving this problem (with java language): 
  1. static class Data {
  2. String name = "";
  3. String strand = "";
  4. Data (String name, String strand) {
  5. this.name = name;
  6. this.strand = strand;
  7. }
  8. }
  9. static void tambahCekEkor (String strand, String name, HashMap<String, Vector<String>> cekEkor, Vector<Data> peserta) {
  10. String suffix = strand.substring(strand.length() - 3);
  11. if (cekEkor.get(suffix) == null) {
  12. cekEkor.put(suffix, new Vector<>());
  13. }
  14. cekEkor.get(suffix).add(name);
  15. peserta.add(new Data(name, strand));
  16. }
  17. static void solve() {
  18. Scanner sc = new Scanner(System.in);
  19. HashMap<String, Vector<String>> cekEkor = new HashMap<>();
  20. Vector<Data> peserta = new Vector<>();
  21. String name = sc.next().substring(1);
  22. while (sc.hasNext()) {
  23. boolean last = true;
  24. String strand = "";
  25. while (sc.hasNext()) {
  26. String masuk = sc.next();
  27. if (masuk.charAt(0) == '>') {
  28. tambahCekEkor(strand, name, cekEkor, peserta);
  29. name = masuk.substring(1);
  30. last = false;
  31. break;
  32. }
  33. strand += masuk;
  34. }
  35. if (last) {
  36. tambahCekEkor(strand, name, cekEkor, peserta);
  37. }
  38. }
  39. for (Data x : peserta) {
  40. String preffix = x.strand.substring(0, 3);
  41. if (cekEkor.get(preffix) != null) {
  42. for (String s : cekEkor.get(preffix)) {
  43. if (s != x.name) out.println(s + " " + x.name);
  44. }
  45. }
  46. }
  47. }
  48.   
First, we can make HashMap<String, Vector<String>> variable for saving those string's suffix (line 19). After saving those suffix (line 22-38), we can immediately search those pairs while print the answer at the same time (line 39-46). 

Input and Output

In the code above, I used next() function for entering the string-form dataset (line 22). 

I also used another input function called hasNext() (line 21). That function is very useful especially if we need to process an unknown-amount of data, just like a FASTA format data. 

And for the output I used out.println() function (line 43). That function is a modification from System.out.println() function which is very familiar in java. You can see the additional code for that modification (input and output) in my complete code at github.

That's it. If you want to ask something, you can write it in the comment section below. I hope this article is useful and see you in the next article!


Reference :
Source of image 1 :https://www.facebook.com/ProjectRosalind/

No comments:

Post a Comment