
DNA/Protein sequence cleaner

In order to properly clean your DNA, RNA or protein sequence, we need to know which alphabet the sequence uses. For instance, "N" will be stripped out if you select a strict (unambiguous) DNA alphabet, while it will remain if you select an IUPAC ambiguous alphabet, where N exists and means "any nucleotide". It will also remain if you select a protein alphabet, where N means asparagine. Any character that does not belong to any DNA, RNA or protein alphabet, such as punctuation, spaces, symbols and numbers, will always be removed.
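The alphabet-dependent rule above can be sketched as a simple character whitelist. This is a minimal illustration, not the application's own code: the names `ALPHABETS` and `clean_sequence` are hypothetical, the letter sets are copied from the alphabet list on this page, and upper-casing the input before filtering is an assumption.

```python
# Hypothetical sketch of alphabet-based cleaning; names are illustrative only.
ALPHABETS = {
    "unambiguous_dna": "GATC",
    "ambiguous_dna": "GATCRYWSMKHBVDN",
    "protein": "ACDEFGHIKLMNPQRSTVWY",
}


def clean_sequence(raw: str, alphabet: str) -> str:
    """Keep only characters that belong to the selected alphabet.

    The input is upper-cased first (an assumption), so 'n' and 'N' are treated
    alike. Spaces, digits, punctuation and letters outside the alphabet are dropped.
    """
    allowed = set(ALPHABETS[alphabet])
    return "".join(ch for ch in raw.upper() if ch in allowed)


raw = "ATG NNN-cga 123"
for name in ALPHABETS:
    print(name, clean_sequence(raw, name))
# unambiguous_dna ATGCGA      (N is not a strict DNA letter, so it is removed)
# ambiguous_dna ATGNNNCGA     (N means "any nucleotide")
# protein ATGNNNCGA           (N is asparagine)
```

The same input gives different results depending only on the chosen alphabet, which is why the page asks for it before cleaning.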

This application supports IUPAC characters.

ALPHABET SELECTION

Please select your sequence alphabet below

Unambiguous DNA, allowed: GATC
Ambiguous DNA, allowed: GATCRYWSMKHBVDN
Unambiguous RNA, allowed: GAUC
Ambiguous RNA, allowed: GAUCRYWSMKHBVDN

Protein, allowed: ACDEFGHIKLMNPQRSTVWY
Protein, Extended, allowed: ACDEFGHIKLMNPQRSTVWYBXZ
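To help decide which alphabet to select, one can check which of the listed alphabets would keep every letter of a sequence. The sketch below is a hypothetical helper, not a feature of this application; the dictionary keys and the function name `alphabets_keeping_all_letters` are illustrative, while the letter sets are the ones listed above.

```python
# Letter sets copied from the alphabet list; the helper itself is hypothetical.
ALPHABETS = {
    "Unambiguous DNA": "GATC",
    "Ambiguous DNA": "GATCRYWSMKHBVDN",
    "Unambiguous RNA": "GAUC",
    "Ambiguous RNA": "GAUCRYWSMKHBVDN",
    "Protein": "ACDEFGHIKLMNPQRSTVWY",
    "Protein, Extended": "ACDEFGHIKLMNPQRSTVWYBXZ",
}


def alphabets_keeping_all_letters(seq: str) -> list[str]:
    """Return the alphabets that would not strip any letter from seq.

    Non-letters (spaces, digits, punctuation) are ignored here because they
    are removed whichever alphabet is chosen.
    """
    letters = {ch for ch in seq.upper() if ch.isalpha()}
    return [name for name, allowed in ALPHABETS.items() if letters <= set(allowed)]


print(alphabets_keeping_all_letters("augg cun"))  # ['Ambiguous RNA']
print(alphabets_keeping_all_letters("MKWVTF"))    # ['Protein', 'Protein, Extended']
```

Note that a plain DNA string such as "GATTACA" also fits both protein alphabets, since G, A, T and C are valid amino-acid letters; the characters alone cannot tell the tool what kind of molecule you pasted.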

OUTPUT OPTIONS

Please select one of the options below for your output

UPPER CASE
lower case

Optional: include numbering and line breaks every n nucleotides/residues (enter 0 to disable this option).

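The output options can be sketched as follows. This is an assumption-laden illustration rather than the application's actual formatter: the function name `format_output` is hypothetical, and the numbering style (the 1-based position of each line's first residue, right-aligned) is a guess at a common convention, not something this page specifies.

```python
def format_output(seq: str, upper: bool = True, width: int = 0) -> str:
    """Hypothetical formatter for an already cleaned sequence.

    upper: True for UPPER CASE, False for lower case.
    width: residues per line; 0 (or less) means no numbering or line breaks.
    Each numbered line is assumed to start with the 1-based position of its
    first residue, right-aligned to the widest position number.
    """
    seq = seq.upper() if upper else seq.lower()
    if width <= 0:
        return seq
    number_width = len(str(len(seq)))
    lines = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        lines.append(f"{start + 1:>{number_width}} {chunk}")
    return "\n".join(lines)


print(format_output("GATTACAGATTACA", upper=False, width=5))
#  1 gatta
#  6 cagat
# 11 taca
```

Keeping case conversion separate from line breaking means the 0 setting simply returns the cleaned sequence on one line.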
INPUT SEQUENCE

Paste your DNA, RNA or protein sequence below in any format
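Pasting "in any format" works because spaces and numbers are always removed, so GenBank-style numbered blocks reduce to the bare sequence. One caveat worth knowing: a pure character filter would also keep letters from a FASTA title line (for example the two A's in ">example header" under a DNA alphabet). The sketch below drops lines that start with ">" or ";" before filtering; this is an assumption for illustration, since the page does not say how the application treats header lines, and `extract_sequence` is a hypothetical name.

```python
def extract_sequence(text: str, allowed: str = "GATC") -> str:
    """Hypothetical parser for pasted text.

    Lines starting with '>' or ';' are assumed to be FASTA headers/comments and
    skipped; on the remaining lines only characters in `allowed` are kept
    (after upper-casing), so GenBank-style position numbers and spaces vanish.
    """
    allowed_set = set(allowed)
    kept = []
    for line in text.splitlines():
        if line.lstrip().startswith((">", ";")):
            continue  # assumed header/comment line, not sequence data
        kept.extend(ch for ch in line.upper() if ch in allowed_set)
    return "".join(kept)


pasted = """>example header
        1 gatcctccat atacaacggt
       21 atctccacct
"""
print(extract_sequence(pasted))  # GATCCTCCATATACAACGGTATCTCCACCT
```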


A web application written in Python by Andrea Cabibbo