Genomic research relies on computers to process large amounts of genomic data. In order to digitize such data, the genomes have to be sequenced and assembled. Modern sequencing technologies allow fast and inexpensive sequencing.
Sequencing machines produce multiple sequence fragments called reads, which are assembled into contigs and then further into larger pieces called scaffolds. Scaffolding contigs often requires obtaining additional data through lab work, which is both time-consuming and expensive.
The purpose of this thesis is to assess whether contigs can be scaffolded with the aid of previously sequenced related genomes, and whether the use of multiple related genomes can increase the precision of the resulting scaffolds.
A pipeline with a simple, prototypical algorithm was developed to process contigs using information from related genomes. The pipeline produces scaffolds and evaluates them.
Contigs from 4 bacterial sequencing projects were scaffolded with 10 related genomes as guides for each bacterium.
The results showed that using multiple guiding genomes that were closely related to the target genome enabled scaffolds to be produced with few errors.