Data

We will work with FASTA and GenBank, the most common text formats for DNA, RNA and protein sequences in bioinformatics, and we will also review other well-known formats.

Introduction

Clone the project we have prepared, which has everything needed to try out the proposed examples and activities.

shell

git clone https://gitlab.com/xtec/bio/genfiles.git

FASTA Format

FASTA is probably the most widely used file format for sequences, and one of the most common file formats in bioinformatics.

The FASTA file format has its origins in the FASTA software package, a suite of programs for sequence alignment.

The format is simple: a plain text file with one or more entries, each consisting of a definition line (or defline), which starts with a > symbol followed by an identifier that should be unique within the file, and one or more lines of sequence data.
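The entry structure described above can be checked with a short sketch. The file name example.fa, the identifiers seq1 and seq2, and the sequences are hypothetical, chosen only for illustration; the first sequence is deliberately split over two lines to show that one entry may have several sequence lines.

```shell
# Write a small two-entry FASTA file (hypothetical names and sequences).
printf '>seq1 first example entry\nACGTAC\nGTAC\n>seq2 second example entry\nTTGACA\n' > example.fa

# For each entry, print the identifier (first word of the defline)
# and the total length of its sequence lines.
awk '
  /^>/ {                               # a defline starts a new entry
    if (id != "") print id "\t" len    # flush the previous entry
    id = substr($1, 2); len = 0        # drop the leading ">"
    next
  }
  { len += length($0) }                # sequence line: add its length
  END { if (id != "") print id "\t" len }
' example.fa
```

This should report a length of 10 for seq1 and 6 for seq2, since the two sequence lines of seq1 are counted as a single sequence.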

Creating a FASTA file is very easy in any plain text editor, such as Notepad or VSCode.

We can create a unifasta file (a single sequence) called uniseq.fasta with the VSCode editor: just copy the following text and save it.

txt

>Seqüència aminoàcids de prova
MTHCP*MTI*

Or create a multifasta file (more than one sequence) called sequences.fa from the Linux terminal:

shell

echo ">a
ACGCGTACGTGACGACGATCG
>b
ATTTCGCGACTCTGCCTACGCTAC
>c
GGGAAACCTTTTTTT" > sequences.fa

Bingo! We now have a multifasta file :) with sequences a, b and c.
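As a quick check, the number of sequences in a multifasta file can be counted from its deflines. This is a minimal sketch that assumes every entry has exactly one line starting with >; the file is recreated first only so that the snippet runs on its own.

```shell
# Recreate sequences.fa so that this snippet runs on its own.
printf '>a\nACGCGTACGTGACGACGATCG\n>b\nATTTCGCGACTCTGCCTACGCTAC\n>c\nGGGAAACCTTTTTTT\n' > sequences.fa

# Each entry has exactly one defline, so counting the lines that
# start with ">" gives the number of sequences.
grep -c '^>' sequences.fa

# List the identifiers without the leading ">".
grep '^>' sequences.fa | sed 's/^>//'
```

The quotes around '^>' matter: without them the shell would treat > as an output redirection.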

The fundamental requirement is that the file be plain text so that it can be handled with any text processing application or programming language.

Therefore, these files are best handled in text editors like nano, Sublime Text or VSCode.

To view a FASTA file from the command line without editing it, you can use the cat command.

shell

cat uniseq.fasta
>Seqüència aminoàcids de prova
MTHCP*MTI*
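Real FASTA files are often far too long to print in full, so it may be more practical to look at only part of the file. The following sketch uses the standard head and wc commands on the sequences.fa example, which is recreated here only so that the snippet runs on its own.

```shell
# Recreate the example file so that this snippet runs on its own.
printf '>a\nACGCGTACGTGACGACGATCG\n>b\nATTTCGCGACTCTGCCTACGCTAC\n>c\nGGGAAACCTTTTTTT\n' > sequences.fa

# Show only the first two lines (here, the first defline and its sequence).
head -n 2 sequences.fa

# Count the lines in the file without printing its content.
wc -l < sequences.fa
```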
