NCBI Minute: The Updated ORFfinder

Hello and welcome to today’s NCBI Minute
on our updated ORFfinder, an open reading frame prediction tool. I’m Wayne Matten; with me today is Peter
Cooper, and you can see the closed captioning information on this slide. Go ahead and put your questions in
the question pod as you think of them. Peter may very well be able to answer
those while I’m talking, but we’ll put those in a document, along with any
that we get after I stop talking, and link it under “Materials” on the Webinars and
Courses page. I’ll just go ahead and show you that page. So this is Webinars and Courses. This is today’s webinar, but before
very long it will move over to the archive page, and there you see a link to Materials in
our previous webinars. So those questions and answers will be
there. There are no slides from today.
Eventually there will be a video file in Materials, and we’ll also post a
recording on our YouTube channel. I’ll probably get that up within the next
several days. If you have questions about ORFfinder
after we stop the webinar, feel free to go ahead and email me at [email protected], or you can send general NCBI questions to that info email alias. So this is intended to be a very short
Minute, so let me go ahead and jump over to the ORFfinder page. You can find this
with a web search for “ncbi orffinder”. You may well come upon the old page, but there’s a link there to this page. And the URL is simply the base NCBI homepage URL, forward slash, orffinder, all lowercase. So what’s new in ORFfinder
now? Primarily it’s the interface. It looks more like many of our other
resources; you’ll see that particularly on the results page. But we do now also
have a standalone version. You can download that binary here at this Linux x64 link. And the
standalone version does not have the 50 kb processing limit that the
web version does. The other addition is that you can now send your translated ORF
directly to SmartBLAST, and we still maintain the ability to send it to the
full blastp service as well. Just a quick caveat before I demonstrate.
ORFfinder is not intended to be a gene prediction tool, particularly for
eukaryotes. In many cases it works fine as an ORF finder for bacterial and
archaeal genomes, viral genomes, and also eukaryotic transcripts. I’m not going to demonstrate the
transcripts today, but there’s a link here that demonstrates that for you if you
want to look at it. What I’m going to do today is enter an
accession for an archaeal genomic sequence. There’s that RefSeq accession [NZ_LT158599]. This
sequence is actually close to three million nucleotides long. I could put in
some coordinates here in From and To, but ORFfinder will automatically stop
the processing at about 50 kb, so I’m not going to bother with that. You have a few
options here. I’m going to go ahead and select a slightly larger minimal ORF
length just to clean up my results a bit. You may or may not want to do that,
depending on your goals and your sequences. I know I’m looking at an archaeal
sequence, so I’ll go ahead and select genetic code 11, and I’m going to choose
to include alternative initiation codons. In some cases you may want to use this
“any sense codon” option, particularly if you have
a feeling that your sequence may not begin at the beginning of the
open reading frame. So if you choose any sense codon, you’ll
perhaps be able to walk upstream to a different initiation codon. OK, I’m going to go ahead and check
ignore nested ORFs, again primarily just to clean up my output because this is a
fairly long input sequence. I’ll click the Submit button. And this is running in real time; this is
very fast. Given the conditions I had, it found 115 ORFs, and it reminds me that it
has calculated that on the first 50 kb. I can also see that by sliding over
here in the viewer, where the processing stopped right about 50 kb. OK, so those of you familiar with our
gene records and our genome browsers will recognize this graphical viewer as
our Sequence Viewer. So it has the same controls for zooming and a very similar
list of tools, and I’ll point out a couple of those later that you might
like. And then below that graphical view is a
table that’s sorted initially by length of the ORF. So the longest ORFs are on top, but these
columns are all sortable. So, for example, you might want to see the ORFs in their
order on the input sequence. So I’m going to go down here and select
ORF46; it’s the first longish one. And when I select that, it then gets
highlighted up here with a marker in the Sequence Viewer. Now many of you are used
to using a six-frame translation, and ORFfinder allows you to do that here. You click this
button to add the six-frame translation track, because this is Sequence Viewer. It adds that track, and I’m zoomed out too
far to see any detail here, so let’s go ahead and zoom in on this first large ORF, ORF46. I’ll mouse over that marker name, and
that gives me the option to zoom to the sequence marker, so I’ll click that, and
now I’m zoomed all the way in to the sequence. The shading here represents the
beginning of ORF46, and you can see that in fact it used CTG as an alternative
initiation codon to begin the ORF. Down in the table you can see that this
is in frame +3, so let’s take a look at the six-frame
translation, +3, and we see here is an ATG
and down just a little bit farther another ATG. And so that would
be a good use of BLAST, perhaps, to see if we can get some information about what
other related proteins actually start with. And we’ll do that in just a minute,
but first let’s look at some of the other options we have on this results
page. You’ll notice a button here that’s grayed out at the moment that says
Download Marked Set. I could mark all 115 of these ORFs with this button,
or I can pick and choose ORFs. So how do you mark them? You have one
selected here, you’re looking at the translation here
in the box on the left, and then you can mark the one that you’re viewing. That changes the color in the table,
and it actually puts a little gray box around the ORF up in the viewer. So let’s say I now want to select ORF47. I click on that in the table. Now its
translation is being displayed on the left, and then I can mark that
one as well. So now I have two marked ORFs up in the
Sequence Viewer, and so now my button to download the marked set is active, and I can
download either the FASTA, ASN.1, or feature table format. You’ll
notice that one option here is grayed out, so you cannot select it at
the moment: CDS FASTA. That’s because it is not implemented
yet, but hopefully it will be before too long, and it will allow you to download
the nucleotide FASTA for that coding sequence region. OK, and marking these also allows us to
use BLAST Marked Set. So these two buttons will send the
translations off to the standard blastp page. You can select a database here, but
you get to the full page, where you can select other databases as well, and you
also have all the algorithm parameters you can play with. I’m just going to demonstrate
SmartBLAST. There is no SmartBLAST
button for the marked set, because SmartBLAST will take only one query sequence at a
time. So if you do want to analyze multiple ORFs at once with BLAST, use the
full blastp service. I’m going to go ahead and unmark
ORF47. I have this up here, so I can click
Unmark. Now I have just one of these marked, but I want to BLAST 46, not 47, so I click on that again in the table,
and that loads ORF46 up here. So the question is, which of these methionines
might be the actual one that’s used as the initiation codon? So note those up here, and I’ll go ahead
and click on SmartBLAST. And again this is also running in real time, and it’s
finished. If you’re not familiar with SmartBLAST,
we had a webinar on that back in September, so you can check that out on our YouTube
channel. This did in fact search the nr database, but it uses a faster
algorithm, so that’s why it comes back so quickly. It also searches another database called
landmark, with a much smaller set of sequences, but that’s a detail we don’t
need to worry about today. And we got some names here, which means we probably
got some pretty decent hits. So I’m just going to mouse over that
name and click on Alignment in that pop-up menu. That takes me down the page to
the alignment section. So this in fact is a self hit; this is the
genomic piece that I put in for this particular strain. And we can see here
that this is actually that second methionine, the first ATG methionine. And
then if we scroll down a bit, here’s another species that uses the
same methionine, and if we scroll down to another one, yet another species of the same genus,
it’s also using that one. So that’s some evidence anyway that it uses that first
ATG as the start of that particular coding region. All right, I’m going to go back to
ORFfinder. I have just one more example here. So I’ll go back to the submission page,
and here I’m going to input an accession for a bacteriophage [AF334111], and this is a
much smaller sequence. I’m going to just keep all the defaults
and go ahead and click Submit. It found 21 ORFs this time. And I just want to
point out, if I had checked the box for ignore nested ORFs, I would have seen this one, this one, this
one, and this one. So I would have gotten four ORFs, and it would have ignored all these
smaller ORFs that are completely enclosed by a larger ORF. We’re not
going to do any analysis here. I’m at a bit over 10 minutes already, so
I’m going to wrap this up pretty quickly. I just wanted to point out that, since you’re
in the Sequence Viewer, there are some nice features here. For example, you could
download a PDF version of this graphics view; you might want to get rid of the
label on this ORF first by removing that marker. And if you wanted to send a link
to this view to a colleague, you go over here to the pull-down menu next to the
question mark and click on Link to View. It tells you that this link will be active
for one to two months, and then you can send your colleague that tiny URL there. OK, that’s it for this
quick demonstration. I’ll remind you that if you have any questions you can write to me
or send them in to info. Do we have any questions online, Peter?
Not right now. All right, I will close the webinar then. Be
sure to check out our Courses and Webinars page and our YouTube channel to
find out more about what we’re doing. Thanks, goodbye.
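
The options used in the demonstration (minimal ORF length, alternative initiation codons, and ignore nested ORFs) can be illustrated with a minimal Python sketch. This is not ORFfinder's implementation. The function name `find_orfs`, its parameters, and the coordinate convention are hypothetical. The alternative start codons are the initiation codons listed for NCBI translation table 11, and "nested" is assumed to mean fully enclosed by a longer ORF on either strand and in any frame. The "any sense codon" option is not modeled.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not the ORFfinder API.
STOPS = {"TAA", "TAG", "TGA"}
ATG_ONLY = {"ATG"}
# Initiation codons listed for NCBI translation table 11.
ALT_STARTS = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, starts=ATG_ONLY, min_len=75, ignore_nested=False):
    """Find start-to-stop ORFs in all six frames.

    Returns (strand, frame, start, end) tuples, longest first. Coordinates
    are 0-based and end-exclusive on the forward strand, and each ORF
    includes its stop codon. min_len is in nucleotides.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None  # position of the first start codon seen
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon in starts:
                    orf_start = i
                elif orf_start is not None and codon in STOPS:
                    end = i + 3
                    if end - orf_start >= min_len:
                        if strand == "+":
                            orfs.append((strand, frame + 1, orf_start, end))
                        else:
                            # Map minus-strand positions back to the forward strand.
                            orfs.append((strand, frame + 1, n - end, n - orf_start))
                    orf_start = None
    if ignore_nested:
        # Drop any ORF that lies entirely inside a longer ORF.
        orfs = [
            o for o in orfs
            if not any(
                p[2] <= o[2] and o[3] <= p[3] and (p[3] - p[2]) > (o[3] - o[2])
                for p in orfs
            )
        ]
    return sorted(orfs, key=lambda o: o[3] - o[2], reverse=True)
```

Passing `starts=ALT_STARTS` corresponds to choosing alternative initiation codons, and a larger `min_len` or `ignore_nested=True` trims the list in the same spirit as the cleanup choices made in the demonstration. ORFs that run off the end of the sequence without a stop codon are not reported in this sketch.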

5 thoughts on “NCBI Minute: The Updated ORFfinder”

  1. Hi, I’m an absolute beginner and I have the task of finding out which ORFs actually code for proteins. I can’t find how to do that.

  2. It is the 2018 Oscars and Tom Hanks stands at the podium to announce Best Actor. “Boy,” he says, “we got a real competition this year! Johnny Depp, Leo DiCaprio, George Clooney, anyone could win!” The world holds its breath to see who will win the most prestigious acting award in the world. “Wow!” says Hanks. “I don’t believe it! The winner is Wayne Matten in his NCBI Minute: The Updated ORFfinder!”
