1 | .\" Man page generated from reStructuredText. |
---|
2 | . |
---|
3 | .TH "BEDTOOLS" "1" "November 17, 2013" "2.16.2" "bedtools" |
---|
4 | .SH NAME |
---|
5 | bedtools \- Bedtools Documentation |
---|
6 | . |
---|
7 | .nr rst2man-indent-level 0 |
---|
8 | . |
---|
9 | .de1 rstReportMargin |
---|
10 | \\$1 \\n[an-margin] |
---|
11 | level \\n[rst2man-indent-level] |
---|
12 | level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] |
---|
13 | - |
---|
14 | \\n[rst2man-indent0] |
---|
15 | \\n[rst2man-indent1] |
---|
16 | \\n[rst2man-indent2] |
---|
17 | .. |
---|
18 | .de1 INDENT |
---|
19 | .\" .rstReportMargin pre: |
---|
20 | . RS \\$1 |
---|
21 | . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] |
---|
22 | . nr rst2man-indent-level +1 |
---|
23 | .\" .rstReportMargin post: |
---|
24 | .. |
---|
25 | .de UNINDENT |
---|
26 | . RE |
---|
27 | .\" indent \\n[an-margin] |
---|
28 | .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] |
---|
29 | .nr rst2man-indent-level -1 |
---|
30 | .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] |
---|
31 | .in \\n[rst2man-indent\\n[rst2man-indent-level]]u |
---|
32 | .. |
---|
33 | . |
---|
60 | .sp |
---|
61 | Brief paragraph of the software. |
---|
62 | .SH OVERVIEW |
---|
63 | .SS 1.1 Background |
---|
64 | .sp |
---|
65 | The development of BEDTools was motivated by a need for fast, flexible tools with which to compare large sets of genomic |
---|
66 | features. Answering fundamental research questions with existing tools was either too slow or required modifications to the |
---|
67 | way they reported or computed their results. We were aware of the utilities on the UCSC Genome Browser and Galaxy websites, as |
---|
68 | well as the elegant tools available as part of Jim Kent’s monolithic suite of tools (“Kent source”). However, we found that |
---|
69 | the web\-based tools were too cumbersome when working with large datasets generated by current sequencing technologies. |
---|
70 | Similarly, we found that the Kent source command line tools often required a local installation of the UCSC Genome Browser. |
---|
71 | These limitations, combined with the fact that we often wanted an extra option here or there that wasn’t available with |
---|
72 | existing tools, led us to develop our own from scratch. The initial version of BEDTools was publicly released in the spring of |
---|
73 | 2009. The current version has evolved from our research experiences and those of the scientists using the suite over the last |
---|
74 | year. The BEDTools suite enables one to answer common questions of genomic data in a fast and reliable manner. The fact that |
---|
75 | almost all the utilities accept input from “stdin” allows one to “stream / pipe” several commands together to facilitate more |
---|
76 | complicated analyses. Also, the tools allow fine control over how output is reported. The initial version of BEDTools |
---|
77 | supported solely 6\-column \fI\%BED\fP files. \fIHowever, we have subsequently added support for sequence alignments in\fP \fI\%BAM\fP |
---|
78 | \fIformat, as well as for features in\fP \fI\%GFF\fP, \fI“blocked” BED format, and\fP |
---|
79 | \fI\%VCF\fP \fIformat\fP\&. |
---|
80 | The tools are quite fast and typically finish in a matter of a few seconds, even for large datasets. This manual seeks to describe the behavior and |
---|
81 | available functionality for each BEDTool. Usage examples are scattered throughout the text, and formal examples are |
---|
82 | provided in the last two sections. We hope that this document will give you a sense of the flexibility of |
---|
83 | the toolkit and the types of analyses that are possible with BEDTools. If you have further questions, please join the BEDTools |
---|
84 | discussion group, visit the Usage Examples on the Google Code site (usage, advanced usage), or take a look at the nascent |
---|
85 | “Usage From the Wild” page. |
---|
86 | .SS 1.2 Summary of available tools. |
---|
87 | .sp |
---|
88 | BEDTools supports a wide range of operations for interrogating and manipulating genomic features. The table below summarizes |
---|
89 | the tools available in the suite. |
---|
90 | .TS |
---|
91 | center; |
---|
92 | |l|l|. |
---|
93 | _ |
---|
94 | T{ |
---|
95 | Utility |
---|
96 | T} T{ |
---|
97 | Description |
---|
98 | T} |
---|
99 | _ |
---|
100 | T{ |
---|
101 | \fBintersectBed\fP |
---|
102 | T} T{ |
---|
103 | Returns overlaps between two BED/GFF/VCF files. |
---|
104 | T} |
---|
105 | _ |
---|
106 | T{ |
---|
107 | \fBpairToBed\fP |
---|
108 | T} T{ |
---|
109 | Returns overlaps between a paired\-end BED file and a regular BED/VCF/GFF file. |
---|
110 | T} |
---|
111 | _ |
---|
112 | T{ |
---|
113 | \fBbamToBed\fP |
---|
114 | T} T{ |
---|
115 | Converts BAM alignments to BED6, BED12, or BEDPE format. |
---|
116 | T} |
---|
117 | _ |
---|
118 | T{ |
---|
119 | \fBbedToBam\fP |
---|
120 | T} T{ |
---|
121 | Converts BED/GFF/VCF features to BAM format. |
---|
122 | T} |
---|
123 | _ |
---|
124 | T{ |
---|
125 | \fBbed12ToBed6\fP |
---|
126 | T} T{ |
---|
127 | Converts "blocked" BED12 features to discrete BED6 features. |
---|
128 | T} |
---|
129 | _ |
---|
130 | T{ |
---|
131 | \fBbedToIgv\fP |
---|
132 | T} T{ |
---|
133 | Creates IGV batch scripts for taking multiple snapshots from BED/GFF/VCF features. |
---|
134 | T} |
---|
135 | _ |
---|
136 | T{ |
---|
137 | \fBcoverageBed\fP |
---|
138 | T} T{ |
---|
139 | Summarizes the depth and breadth of coverage of features in one BED versus features (e.g., windows, exons, etc.) defined in another BED/GFF/VCF file. |
---|
140 | T} |
---|
141 | _ |
---|
142 | T{ |
---|
143 | \fBmultiBamCov\fP |
---|
144 | T} T{ |
---|
145 | Counts sequence coverage for multiple position\-sorted BAM files at specific loci defined in a BED/GFF/VCF file. |
---|
146 | T} |
---|
147 | _ |
---|
148 | T{ |
---|
149 | \fBtagBam\fP |
---|
150 | T} T{ |
---|
151 | Annotates a BAM file with custom tag fields based on overlaps with BED/GFF/VCF files. |
---|
152 | T} |
---|
153 | _ |
---|
154 | T{ |
---|
155 | \fBnuclBed\fP |
---|
156 | T} T{ |
---|
157 | Profiles the nucleotide content of intervals in a FASTA file. |
---|
158 | T} |
---|
159 | _ |
---|
160 | T{ |
---|
161 | \fBgenomeCoverageBed\fP |
---|
162 | T} T{ |
---|
163 | Creates either a histogram, BEDGRAPH, or a "per base" report of genome coverage. |
---|
164 | T} |
---|
165 | _ |
---|
166 | T{ |
---|
167 | \fBunionBedGraphs\fP |
---|
168 | T} T{ |
---|
169 | Combines multiple BedGraph files into a single file, allowing coverage/other comparisons between them. |
---|
170 | T} |
---|
171 | _ |
---|
172 | T{ |
---|
173 | \fBannotateBed\fP |
---|
174 | T} T{ |
---|
175 | Annotates one BED/VCF/GFF file with overlaps from many others. |
---|
176 | T} |
---|
177 | _ |
---|
178 | T{ |
---|
179 | \fBgroupBy\fP |
---|
180 | T} T{ |
---|
181 | Deprecated. Now in the filo package. |
---|
182 | T} |
---|
183 | _ |
---|
184 | T{ |
---|
185 | \fBoverlap\fP |
---|
186 | T} T{ |
---|
187 | Returns the number of base pairs of overlap between two features on the same line. |
---|
188 | T} |
---|
189 | _ |
---|
190 | T{ |
---|
191 | \fBpairToPair\fP |
---|
192 | T} T{ |
---|
193 | Returns overlaps between two paired\-end BED files. |
---|
194 | T} |
---|
195 | _ |
---|
196 | T{ |
---|
197 | \fBclosestBed\fP |
---|
198 | T} T{ |
---|
199 | Returns the closest feature to each entry in a BED/GFF/VCF file. |
---|
200 | T} |
---|
201 | _ |
---|
202 | T{ |
---|
203 | \fBsubtractBed\fP |
---|
204 | T} T{ |
---|
205 | Removes the portion of an interval that is overlapped by another feature. |
---|
206 | T} |
---|
207 | _ |
---|
208 | T{ |
---|
209 | \fBwindowBed\fP |
---|
210 | T} T{ |
---|
211 | Returns overlaps between two BED/VCF/GFF files based on a user\-defined window. |
---|
212 | T} |
---|
213 | _ |
---|
214 | T{ |
---|
215 | \fBmergeBed\fP |
---|
216 | T} T{ |
---|
217 | Merges overlapping features into a single feature. |
---|
218 | T} |
---|
219 | _ |
---|
220 | T{ |
---|
221 | \fBcomplementBed\fP |
---|
222 | T} T{ |
---|
223 | Returns all intervals not spanned by the features in a BED/GFF/VCF file. |
---|
224 | T} |
---|
225 | _ |
---|
226 | T{ |
---|
227 | \fBfastaFromBed\fP |
---|
228 | T} T{ |
---|
229 | Creates FASTA sequences based on intervals in a BED/GFF/VCF file. |
---|
230 | T} |
---|
231 | _ |
---|
232 | T{ |
---|
233 | \fBmaskFastaFromBed\fP |
---|
234 | T} T{ |
---|
235 | Masks a FASTA file based on BED coordinates. |
---|
236 | T} |
---|
237 | _ |
---|
238 | T{ |
---|
239 | \fBshuffleBed\fP |
---|
240 | T} T{ |
---|
241 | Randomly permutes the locations of a BED file among a genome. |
---|
242 | T} |
---|
243 | _ |
---|
244 | T{ |
---|
245 | \fBslopBed\fP |
---|
246 | T} T{ |
---|
247 | Adjusts each BED entry by a requested number of base pairs. |
---|
248 | T} |
---|
249 | _ |
---|
250 | T{ |
---|
251 | \fBflankBed\fP |
---|
252 | T} T{ |
---|
253 | Creates flanking intervals for each feature in a BED/GFF/VCF file. |
---|
254 | T} |
---|
255 | _ |
---|
256 | T{ |
---|
257 | \fBsortBed\fP |
---|
258 | T} T{ |
---|
259 | Sorts a BED file by chromosome, then by start position. Other sort orders are also available. |
---|
260 | T} |
---|
261 | _ |
---|
262 | T{ |
---|
263 | \fBlinksBed\fP |
---|
264 | T} T{ |
---|
265 | Creates an HTML file of links to the UCSC or a custom browser. |
---|
266 | T} |
---|
267 | _ |
---|
268 | .TE |
---|
269 | .SS 1.3 Fundamental concepts. |
---|
270 | .SS 1.3.1 What are genome features and how are they represented? |
---|
271 | .sp |
---|
272 | Throughout this manual, we will discuss how to use BEDTools to manipulate, compare and ask questions of genome “features”. Genome features can be functional elements (e.g., genes), genetic polymorphisms (e.g. |
---|
273 | SNPs, INDELs, or structural variants), or other annotations that have been discovered or curated by genome sequencing groups or genome browser groups. In addition, genome features can be custom annotations that |
---|
274 | an individual lab or researcher defines (e.g., my novel gene or variant). |
---|
275 | .sp |
---|
276 | The basic characteristics of a genome feature are the chromosome or scaffold on which the feature “resides”, the base pair on which the |
---|
277 | feature starts (i.e. the “start”), the base pair on which the feature ends (i.e. the “end”), the strand on which the feature exists (i.e. “+” or “\-”), and the name of the feature if one is applicable. |
---|
278 | .sp |
---|
279 | The two most widely used formats for representing genome features are the BED (Browser Extensible Data) and GFF (General Feature Format) formats. BEDTools was originally written to work exclusively with genome features |
---|
280 | described using the BED format, but it has been recently extended to seamlessly work with BED, GFF and VCF files. |
---|
281 | .sp |
---|
282 | Existing annotations for the genomes of many species can be easily downloaded in BED and GFF |
---|
283 | format from the UCSC Genome Browser’s “Table Browser” (\fI\%http://genome.ucsc.edu/cgi-bin/hgTables?command=start\fP) or from the “Bulk Downloads” page (\fI\%http://hgdownload.cse.ucsc.edu/downloads.html\fP). In addition, the |
---|
284 | Ensembl Genome Browser contains annotations in GFF/GTF format for many species (\fI\%http://www.ensembl.org/info/data/ftp/index.html\fP). |
---|
285 | .SS 1.3.2 Overlapping / intersecting features. |
---|
286 | .sp |
---|
287 | Two genome features (henceforth referred to as “features”) are said to overlap or intersect if they share at least one base in common. |
---|
288 | In the figure below, Feature A intersects/overlaps Feature B, but it does not intersect/overlap Feature C. |
---|
289 | .sp |
---|
290 | \fBTODO: place figure here\fP |
---|
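The overlap rule above can be sketched in a few lines of Python. This is only an illustration, not BEDTools source code: the function names `overlap_bp` and `overlaps` and the example coordinates are assumptions, and intervals are taken to be BED\-style (0\-based start, end not included).

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Return the number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def overlaps(a_start, a_end, b_start, b_end):
    """Two features overlap if they share at least one base."""
    return overlap_bp(a_start, a_end, b_start, b_end) > 0


# Hypothetical features on the same chromosome.
feature_a = (100, 200)
feature_b = (150, 250)
feature_c = (300, 400)

print(overlaps(*feature_a, *feature_b))  # A and B share bases
print(overlaps(*feature_a, *feature_c))  # A and C share none
```

Features that merely touch (one ends where the other starts) share no base under this convention, so they are not reported as overlapping in this sketch.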
291 | .SS 1.3.3 Comparing features in file “A” and file “B”. |
---|
292 | .sp |
---|
293 | The previous section briefly introduced a fundamental naming convention used in BEDTools. Specifically, all BEDTools that compare features contained in two distinct files refer to one file as feature set “A” and the other file as feature set “B”. This is mainly in the interest of brevity, but it also has its roots in set theory. |
---|
294 | As an example, if one wanted to look for SNPs (file A) that overlap with exons (file B), one would use intersectBed in the following manner: |
---|
295 | .INDENT 0.0 |
---|
296 | .INDENT 3.5 |
---|
297 | .sp |
---|
298 | .nf |
---|
299 | .ft C |
---|
300 | intersectBed \-a snps.bed \-b exons.bed |
---|
301 | .ft P |
---|
302 | .fi |
---|
303 | .UNINDENT |
---|
304 | .UNINDENT |
---|
305 | .sp |
---|
306 | There are two exceptions to this rule: 1) When the “A” file is in BAM format, the “\-abam” option must be used. For example: |
---|
307 | .INDENT 0.0 |
---|
308 | .INDENT 3.5 |
---|
309 | .sp |
---|
310 | .nf |
---|
311 | .ft C |
---|
312 | intersectBed \-abam alignedReads.bam \-b exons.bed |
---|
313 | .ft P |
---|
314 | .fi |
---|
315 | .UNINDENT |
---|
316 | .UNINDENT |
---|
317 | .sp |
---|
318 | And 2) For tools where only one input feature file is needed, the “\-i” option is used. For example: |
---|
319 | .INDENT 0.0 |
---|
320 | .INDENT 3.5 |
---|
321 | .sp |
---|
322 | .nf |
---|
323 | .ft C |
---|
324 | mergeBed \-i repeats.bed |
---|
325 | .ft P |
---|
326 | .fi |
---|
327 | .UNINDENT |
---|
328 | .UNINDENT |
---|
329 | .SS 1.3.4 BED starts are zero\-based and BED ends are one\-based. |
---|
330 | .sp |
---|
331 | BEDTools users are sometimes confused by the way the start and end of BED features are represented. Specifically, BEDTools uses the UCSC Genome Browser’s internal database convention of making the start position 0\-based and the end position 1\-based (\fI\%http://genome.ucsc.edu/FAQ/FAQtracks#tracks1\fP). |
---|
332 | In other words, BEDTools interprets the “start” column as being 1 base pair higher than what is represented in the file. For example, the following BED feature represents a single base on chromosome 1; namely, the 1st base: |
---|
333 | .INDENT 0.0 |
---|
334 | .INDENT 3.5 |
---|
335 | .sp |
---|
336 | .nf |
---|
337 | .ft C |
---|
338 | chr1 0 1 first_base |
---|
339 | .ft P |
---|
340 | .fi |
---|
341 | .UNINDENT |
---|
342 | .UNINDENT |
---|
343 | .sp |
---|
344 | Why, you might ask? The advantage of storing features this way is that when computing the length of a feature, one must simply subtract the start from the end. Were the start position 1\-based, |
---|
345 | the calculation would be (slightly) more complex (i.e. (end\-start)+1). Thus, storing BED features this way reduces the computational burden. |
---|
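The arithmetic described above can be written out as a short Python sketch. The helper names `bed_length` and `bed_to_one_based` are hypothetical and exist only for illustration; they are not part of BEDTools.

```python
def bed_length(start, end):
    """Length of a BED feature: end minus start, with no +1 correction."""
    return end - start


def bed_to_one_based(start, end):
    """Convert a BED interval to 1-based, inclusive coordinates."""
    return start + 1, end


# The single-base feature from the text: chr1 0 1 first_base
print(bed_length(0, 1))
print(bed_to_one_based(0, 1))
```

With 1\-based, inclusive coordinates the same length would have to be computed as (end \- start) + 1, which is the extra step the BED convention avoids.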
346 | .SS 1.3.5 GFF starts and ends are one\-based. |
---|
347 | .sp |
---|
348 | In contrast, the GFF format uses 1\-based coordinates for both the start and the end positions. BEDTools is aware of this and adjusts the positions accordingly. |
---|
349 | In other words, you don’t need to subtract 1 from the start positions of your GFF features for them to work correctly with BEDTools. |
---|
350 | .SS 1.3.6 VCF coordinates are one\-based. |
---|
351 | .sp |
---|
352 | The VCF format uses 1\-based coordinates. As in GFF, BEDTools is aware of this and adjusts the positions accordingly. |
---|
353 | In other words, you don’t need to subtract 1 from the start positions of your VCF features for them to work correctly with BEDTools. |
---|
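As a rough illustration of the adjustment described for GFF and VCF, the sketch below converts 1\-based coordinates into the 0\-based start / 1\-based end form used for BED. The function names are hypothetical, and treating a VCF record as spanning the length of its REF allele is an assumption made for this example rather than a statement about how BEDTools handles every VCF record.

```python
def gff_to_bed(start, end):
    """GFF start and end are both 1-based; only the start needs shifting."""
    return start - 1, end


def vcf_to_bed(pos, ref):
    """VCF POS is 1-based; assume the feature spans the REF allele."""
    start = pos - 1
    return start, start + len(ref)


print(gff_to_bed(1, 1))        # first base of a chromosome
print(vcf_to_bed(100, "A"))    # a SNP at position 100
print(vcf_to_bed(100, "ACG"))  # a 3 bp reference allele
```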
354 | .SS 1.3.7 File B is loaded into memory (most of the time). |
---|
355 | .sp |
---|
356 | Whenever a BEDTool compares two files of features, the “B” file is loaded into memory. By contrast, the “A” file is processed line by line and compared with the features from B. |
---|
357 | Therefore, to minimize memory usage, one should set the smaller of the two files as the B file. One salient example is the comparison of aligned sequence reads from a |
---|
358 | current DNA sequencer to gene annotations. In this case, the aligned sequence file (in BED format) may have tens of millions of features (the sequence alignments), |
---|
359 | while the gene annotation file will have tens of thousands of features. In this case, it is wise to set the reads as file A and the genes as file B. |
---|
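The memory trade\-off can be pictured with a small Python sketch: file B is held in a dictionary, while file A is consumed one line at a time. This is a simplified illustration under stated assumptions, not the BEDTools implementation: the names `load_b` and `stream_a` are hypothetical, the in\-memory "files" are made up, and the linear scan over B stands in for whatever index the real tools use.

```python
import io
from collections import defaultdict


def load_b(handle):
    """Read every B feature into memory, grouped by chromosome."""
    by_chrom = defaultdict(list)
    for line in handle:
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return by_chrom


def stream_a(handle, b_by_chrom):
    """Yield A features one at a time if they overlap any B feature."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        candidates = b_by_chrom.get(chrom, [])
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            yield fields


# Hypothetical data: a small gene file (B) and a larger read file (A).
genes = io.StringIO("chr1\t100\t200\tgeneA\n")
reads = io.StringIO("chr1\t150\t180\tread1\nchr1\t500\t530\tread2\n")

hits = list(stream_a(reads, load_b(genes)))
print(hits)
```

Only the gene list is ever held in full; each read is discarded as soon as it has been compared, which is why the larger file is the better choice for A.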
360 | .SS 1.3.8 Feature files \fImust\fP be tab\-delimited. |
---|
361 | .sp |
---|
362 | This is rather self\-explanatory. While it is possible to allow BED files to be space\-delimited, we have decided to require tab delimiters for three reasons: |
---|
363 | .INDENT 0.0 |
---|
364 | .IP 1. 3 |
---|
365 | By requiring one delimiter type, the processing time is minimized. |
---|
366 | .IP 2. 3 |
---|
367 | Tab\-delimited files are more amenable to other UNIX utilities. |
---|
368 | .IP 3. 3 |
---|
369 | GFF files can contain spaces within attribute columns. This complicates the use of space\-delimited files as spaces must therefore be treated specially depending on the context. |
---|
370 | .UNINDENT |
---|
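To illustrate why a single delimiter keeps parsing simple, here is a minimal Python sketch that splits a feature line on tabs only. The function name `parse_bed_line` and the sample lines are assumptions made for this example.

```python
def parse_bed_line(line):
    """Split one BED line on tabs; spaces inside a field are preserved."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(
            "expected at least 3 tab-delimited fields, got %d" % len(fields)
        )
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    return chrom, start, end, fields[3:]


# A name containing a space survives intact because only tabs split fields.
print(parse_bed_line("chr1\t100\t200\tmy feature\n"))
```

A space\-delimited line such as "chr1 100 200" yields a single field here and is rejected, rather than being guessed at.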
371 | .SS 1.3.9 All BEDTools allow features to be “piped” via standard input. |
---|
372 | .sp |
---|
373 | In an effort to allow one to combine multiple BEDTools and other UNIX utilities into more complicated “pipelines”, all BEDTools allow features |
---|
374 | to be passed to them via standard input. Only one feature file may be passed to a BEDTool via standard input. |
---|
375 | The convention used by all BEDTools is to set either file A or file B to “stdin” or "\-". For example: |
---|
376 | .INDENT 0.0 |
---|
377 | .INDENT 3.5 |
---|
378 | .sp |
---|
379 | .nf |
---|
380 | .ft C |
---|
381 | cat snps.bed | intersectBed \-a stdin \-b exons.bed |
---|
382 | cat snps.bed | intersectBed \-a \- \-b exons.bed |
---|
383 | .ft P |
---|
384 | .fi |
---|
385 | .UNINDENT |
---|
386 | .UNINDENT |
---|
387 | .sp |
---|
388 | In addition, all BEDTools that simply require one main input file (the \-i file) will assume that input is |
---|
389 | coming from standard input if the \-i parameter is omitted. For example, the following are equivalent: |
---|
390 | .INDENT 0.0 |
---|
391 | .INDENT 3.5 |
---|
392 | .sp |
---|
393 | .nf |
---|
394 | .ft C |
---|
395 | cat snps.bed | sortBed \-i stdin |
---|
396 | cat snps.bed | sortBed |
---|
397 | .ft P |
---|
398 | .fi |
---|
399 | .UNINDENT |
---|
400 | .UNINDENT |
---|
401 | .SS 1.3.10 Most BEDTools write their results to standard output. |
---|
402 | .sp |
---|
403 | To allow one to combine multiple BEDTools and other UNIX utilities into more complicated “pipelines”, |
---|
404 | most BEDTools report their output to standard output, rather than to a named file. If one wants to write the output to a named file, one can use the UNIX “file redirection” symbol “>” to do so. |
---|
405 | Writing to standard output (the default): |
---|
406 | .INDENT 0.0 |
---|
407 | .INDENT 3.5 |
---|
408 | .sp |
---|
409 | .nf |
---|
410 | .ft C |
---|
411 | intersectBed \-a snps.bed \-b exons.bed |
---|
412 | chr1 100100 100101 rs233454 |
---|
413 | chr1 200100 200101 rs446788 |
---|
414 | chr1 300100 300101 rs645678 |
---|
415 | .ft P |
---|
416 | .fi |
---|
417 | .UNINDENT |
---|
418 | .UNINDENT |
---|
419 | .sp |
---|
420 | Writing to a file: |
---|
421 | .INDENT 0.0 |
---|
422 | .INDENT 3.5 |
---|
423 | .sp |
---|
424 | .nf |
---|
425 | .ft C |
---|
426 | intersectBed \-a snps.bed \-b exons.bed > snps.in.exons.bed |
---|
427 | |
---|
428 | cat snps.in.exons.bed |
---|
429 | chr1 100100 100101 rs233454 |
---|
430 | chr1 200100 200101 rs446788 |
---|
431 | chr1 300100 300101 rs645678 |
---|
432 | .ft P |
---|
433 | .fi |
---|
434 | .UNINDENT |
---|
435 | .UNINDENT |
---|
436 | .SS 1.3.11 What is a “genome” file? |
---|
437 | .sp |
---|
438 | Some of the BEDTools (e.g., genomeCoverageBed, complementBed, slopBed) need to know the size of |
---|
439 | the chromosomes for the organism on which your BED files are based. When using the UCSC Genome |
---|
440 | Browser, Ensembl, or Galaxy, you typically indicate which species / genome build you are working with. |
---|
441 | The way you do this for BEDTools is to create a “genome” file, which simply lists the names of the |
---|
442 | chromosomes (or scaffolds, etc.) and their size (in basepairs). |
---|
443 | Genome files must be tab\-delimited and are structured as follows (this is an example for C. elegans): |
---|
444 | .INDENT 0.0 |
---|
445 | .INDENT 3.5 |
---|
446 | .sp |
---|
447 | .nf |
---|
448 | .ft C |
---|
449 | chrI 15072421 |
---|
450 | chrII 15279323 |
---|
451 | \&... |
---|
452 | chrX 17718854 |
---|
453 | chrM 13794 |
---|
454 | .ft P |
---|
455 | .fi |
---|
456 | .UNINDENT |
---|
457 | .UNINDENT |
---|
458 | .sp |
---|
459 | BEDTools includes predefined genome files for human and mouse in the /genomes directory included |
---|
460 | in the BEDTools distribution. Additionally, the “chromInfo” files/tables available from the UCSC |
---|
461 | Genome Browser website are acceptable. For example, one can download the hg19 chromInfo file here: |
---|
462 | \fI\%http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz\fP |
---|
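A genome file in the layout shown above is straightforward to read. The Python sketch below loads one into a dictionary and uses it to keep an extended interval within chromosome bounds; the function names `read_genome` and `clamp` are hypothetical, and the clamping step is only an assumed example of why a tool might need chromosome sizes.

```python
import io


def read_genome(handle):
    """Map chromosome name to size from a tab-delimited genome file."""
    sizes = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        chrom, size = line.split("\t")[:2]
        sizes[chrom] = int(size)
    return sizes


def clamp(chrom, start, end, sizes):
    """Keep an interval within [0, chromosome size]."""
    return max(0, start), min(end, sizes[chrom])


# The C. elegans entries listed in the text.
genome = io.StringIO(
    "chrI\t15072421\nchrII\t15279323\nchrX\t17718854\nchrM\t13794\n"
)
sizes = read_genome(genome)
print(clamp("chrM", -50, 14000, sizes))
```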
463 | .SS 1.3.12 Paired\-end BED files (BEDPE files). |
---|
464 | .sp |
---|
465 | We have defined a new file format (BEDPE) to concisely describe disjoint genome features, such as |
---|
466 | structural variations or paired\-end sequence alignments. We chose to define a new format because the |
---|
467 | existing BED block format (i.e. BED12) does not allow inter\-chromosomal feature definitions. Moreover, |
---|
468 | the BED12 format feels rather bloated when one wants to describe events with only two blocks. |
---|
469 | .SS 1.3.13 Use “\-h” for help with any BEDTool. |
---|
470 | .sp |
---|
471 | Rather straightforward. If you use the “\-h” option with any BEDTool, a full menu of example usage |
---|
472 | and available options (when applicable) will be reported. |
---|
473 | .SS 1.3.14 BED features must not contain negative positions. |
---|
474 | .sp |
---|
475 | BEDTools will typically reject BED features that contain negative positions. In special cases, however, |
---|
476 | BEDPE positions may be set to \-1 to indicate that one or more ends of a BEDPE feature are unaligned. |
---|
477 | .SS 1.3.15 The start position must be <= the end position. |
---|
478 | .sp |
---|
479 | BEDTools will reject BED features where the start position is greater than the end position. |
---|
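The two coordinate rules above (no negative positions, and start no greater than end) amount to a simple check. The following Python sketch is illustrative only; the function name `is_valid_bed_interval` is an assumption, and the BEDPE exception for \-1 is deliberately not modeled.

```python
def is_valid_bed_interval(start, end):
    """Accept only non-negative intervals whose start does not exceed end."""
    return 0 <= start <= end


print(is_valid_bed_interval(100, 200))  # ordinary feature
print(is_valid_bed_interval(200, 100))  # start greater than end
print(is_valid_bed_interval(-1, 100))   # negative position
```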
480 | .SS 1.3.16 Headers are allowed in GFF and BED files |
---|
481 | .sp |
---|
482 | BEDTools will ignore headers at the beginning of BED and GFF files. Valid header lines begin with a |
---|
483 | “#” symbol, the word “track”, or the word “browser”. For example, the following are valid |
---|
484 | headers for BED or GFF files: |
---|
485 | .INDENT 0.0 |
---|
486 | .INDENT 3.5 |
---|
487 | .sp |
---|
488 | .nf |
---|
489 | .ft C |
---|
490 | track name=aligned_read description="Illumina aligned reads" |
---|
491 | chr5 100000 500000 read1 50 + |
---|
492 | chr5 2380000 2386000 read2 60 \- |
---|
493 | |
---|
494 | #This is a fascinating dataset |
---|
495 | chr5 100000 500000 read1 50 + |
---|
496 | chr5 2380000 2386000 read2 60 \- |
---|
497 | |
---|
498 | browser position chr22:1\-20000 |
---|
499 | chr5 100000 500000 read1 50 + |
---|
500 | chr5 2380000 2386000 read2 60 \- |
---|
501 | .ft P |
---|
502 | .fi |
---|
503 | .UNINDENT |
---|
504 | .UNINDENT |
---|
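The header rule can be expressed as a one\-line test. This Python sketch is an approximation written for illustration: the name `is_header` is hypothetical, and matching on a plain prefix is an assumption (a chromosome whose name happened to begin with "track" would be misclassified here).

```python
def is_header(line):
    """True for lines starting with '#', 'track', or 'browser'."""
    return line.startswith(("#", "track", "browser"))


lines = [
    'track name=aligned_read description="Illumina aligned reads"',
    "#This is a fascinating dataset",
    "browser position chr22:1-20000",
    "chr5\t100000\t500000\tread1\t50\t+",
]
features = [line for line in lines if not is_header(line)]
print(features)
```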
505 | .SS 1.3.17 GZIP support: BED, GFF, VCF, and BEDPE files can be “gzipped” |
---|
506 | .sp |
---|
507 | BEDTools will process gzipped BED, GFF, VCF and BEDPE files in the same manner as |
---|
508 | uncompressed files. Gzipped files are auto\-detected thanks to a helpful contribution from Gordon |
---|
509 | Assaf. |
---|
510 | .SS 1.3.18 Support for “split” or “spliced” BAM alignments and “blocked” BED features |
---|
511 | .sp |
---|
512 | As of Version 2.8.0, five BEDTools (\fBintersectBed\fP, \fBcoverageBed\fP, \fBgenomeCoverageBed\fP, |
---|
513 | \fBbamToBed\fP, and \fBbed12ToBed6\fP) can properly handle “split”/“spliced” BAM alignments (i.e., having an |
---|
514 | “N” CIGAR operation) and/or “blocked” BED (aka BED12) features. |
---|
515 | .sp |
---|
516 | \fBintersectBed\fP, \fBcoverageBed\fP, and \fBgenomeCoverageBed\fP will optionally handle “split” BAM and/or |
---|
517 | “blocked” BED by using the \fB\-split\fP option. This will cause intersects or coverage to be computed only |
---|
518 | for the alignment or feature blocks. In contrast, without this option, the intersects/coverage would be |
---|
519 | computed for the entire “span” of the alignment or feature, regardless of the size of the gaps between |
---|
520 | each alignment or feature block. For example, imagine you have an RNA\-seq read that originates from |
---|
521 | the junction of two exons that were spliced together in an mRNA. In the genome, these two exons |
---|
522 | happen to be 3 kb apart. Thus, when the read is aligned to the reference genome, one portion of the |
---|
523 | read will align to the first exon, while another portion of the read will align ca. 3 kb downstream to the |
---|
524 | other exon. The corresponding CIGAR string would be something like (assuming a 76bp read): |
---|
525 | 30M3000N46M. In the genome, this alignment “spans” 3076 bp, yet the nucleotides in the sequencing |
---|
526 | read only “cover” 76 bp. Without the \fB\-split\fP option, coverage or overlaps would be reported for the |
---|
527 | entire 3076bp span of the alignment. However, with the \fB\-split\fP option, coverage or overlaps will only |
---|
528 | be reported for the portions of the read that overlap the exons (i.e. 30bp on one exon, and |
---|
529 | 46bp on the other). |
---|
530 | .sp |
---|
531 | Using the \-split option with bamToBed causes “spliced/split” alignments to be reported in BED12 |
---|
532 | format. Using the \-split option with bed12ToBed6 causes “blocked” BED12 features to be reported in |
---|
533 | BED6 format. |
---|
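The difference between the "span" of a spliced alignment and the blocks it actually covers can be worked through in Python. This is a sketch under stated assumptions, not the BEDTools or BamTools code: the function name `span_and_blocks` is hypothetical, and only CIGAR operations that consume the reference (M, =, X, D, N) are counted.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def span_and_blocks(cigar):
    """Return (reference span, list of aligned block lengths)."""
    span = 0
    blocks = []
    block_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=XD":      # consumes reference inside a block
            block_len += length
            span += length
        elif op == "N":       # skipped region: closes the current block
            if block_len:
                blocks.append(block_len)
            block_len = 0
            span += length
        # I, S, H and P do not consume the reference and are ignored here.
    if block_len:
        blocks.append(block_len)
    return span, blocks


span, blocks = span_and_blocks("30M3000N46M")
print(span, blocks, sum(blocks))
```

For the read described in the text, the span is 3076 bp while the two blocks cover 30 bp and 46 bp, which is the distinction the \-split option acts on.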
534 | .SS 1.3.19 Writing uncompressed BAM output. |
---|
535 | .sp |
---|
536 | When working with a large BAM file using a complex set of tools in a pipe/stream, it is advantageous |
---|
537 | to pass uncompressed BAM output to each downstream program. This minimizes the amount of time |
---|
538 | spent compressing and decompressing output from one program to the next. All BEDTools that create |
---|
539 | BAM output (e.g. \fBintersectBed\fP, \fBwindowBed\fP) will now optionally create uncompressed BAM output |
---|
540 | using the \fB\-ubam\fP option. |
---|
541 | .SS 1.4 Implementation and algorithmic notes. |
---|
542 | .sp |
---|
543 | BEDTools was implemented in C++ and makes extensive use of data structures and fundamental |
---|
544 | algorithms from the Standard Template Library (STL). Many of the core algorithms are based upon the |
---|
545 | genome binning algorithm described in the original UCSC Genome Browser paper (Kent et al, 2002). |
---|
546 | The tools have been designed to inherit core data structures from central source files, thus allowing |
---|
547 | rapid tool development and deployment of improvements and corrections. Support for BAM files is |
---|
548 | made possible through Derek Barnett’s elegant C++ API called BamTools. |
---|
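For readers unfamiliar with genome binning, the Python sketch below shows the general idea using the constants of the standard UCSC scheme as commonly published (smallest bins of 128 kb, each level eight times larger). It is offered as an assumption\-laden illustration: the function name `bin_from_range` is hypothetical, and the constants and details used inside BEDTools itself may differ.

```python
# Offsets of the first bin at each level, from smallest bins to largest.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # smallest bins cover 2**17 = 128 kb
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin that fully contains [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("interval out of range for this binning scheme")


print(bin_from_range(0, 1))       # fits in the first 128 kb bin
print(bin_from_range(0, 131073))  # crosses a 128 kb boundary
```

Features assigned to the same bin, or to bins that enclose one another, are the only candidates that need to be compared for overlap, which avoids scanning every feature on a chromosome.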
549 | .SS 1.5 License and availability. |
---|
550 | .sp |
---|
551 | BEDTools is freely available under the GNU General Public License (Version 2) at: |
---|
552 | \fI\%http://bedtools.googlecode.com\fP |
---|
553 | .SS 1.6 Mailing list. |
---|
554 | .sp |
---|
555 | A discussion group for reporting bugs, asking questions of the developer and of the user community, as |
---|
556 | well as for requesting new features is available at: |
---|
557 | \fI\%http://groups.google.com/group/bedtools-discuss\fP |
---|
558 | .SS 1.7 Contributors. |
---|
559 | .sp |
---|
560 | As open\-source software, BEDTools greatly benefits from contributions made by other developers and |
---|
561 | users of the tools. We encourage and welcome suggestions, contributions and complaints. This is how |
---|
562 | software matures, improves and stays on top of the needs of its user community. The Google Code |
---|
563 | (GC) site maintains a list of individuals who have contributed either source code or useful ideas for |
---|
564 | improving the tools. In the near future, we hope to maintain a source repository on the GC site in |
---|
565 | order to facilitate further contributions. We are currently unable to do so because we use Git for |
---|
566 | version control, which is not yet supported by GC. |
---|
567 | .SH INSTALLATION |
---|
568 | .sp |
---|
569 | BEDTools is intended to run in a "command line" environment on UNIX, LINUX and Apple OS X |
---|
570 | operating systems. Installing BEDTools involves downloading the latest source code archive followed by |
---|
571 | compiling the source code into binaries on your local system. The following commands will install |
---|
572 | BEDTools in a local directory on a *NIX or OS X machine. Note that the \fB"<version>"\fP refers to the |
---|
573 | latest posted version number on \fI\%http://bedtools.googlecode.com/\fP\&. |
---|
574 | .sp |
---|
575 | Note: \fIThe BEDTools "makefiles" use the GCC compiler. One should edit the Makefiles accordingly if |
---|
576 | one wants to use a different compiler.\fP: |
---|
577 | .INDENT 0.0 |
---|
578 | .INDENT 3.5 |
---|
579 | .sp |
---|
580 | .nf |
---|
581 | .ft C |
---|
582 | curl http://bedtools.googlecode.com/files/BEDTools.<version>.tar.gz > BEDTools.tar.gz |
---|
583 | tar \-zxvf BEDTools.tar.gz |
---|
584 | cd BEDTools\-<version> |
---|
585 | make clean |
---|
586 | make all |
---|
587 | ls bin |
---|
588 | .ft P |
---|
589 | .fi |
---|
590 | .UNINDENT |
---|
591 | .UNINDENT |
---|
592 | .sp |
---|
593 | At this point, one should copy the binaries in BEDTools/bin/ to either /usr/local/bin/ or some |
---|
594 | other repository for commonly used UNIX tools in your environment. You will typically require |
---|
595 | administrator (e.g. "root" or "sudo") privileges to copy to /usr/local/bin/. If in doubt, contact your |
---|
596 | system administrator for help. |
---|
597 | .SH QUICK START |
---|
598 | .SS Install BEDTools |
---|
599 | .INDENT 0.0 |
---|
600 | .INDENT 3.5 |
---|
601 | .sp |
---|
602 | .nf |
---|
603 | .ft C |
---|
604 | curl http://bedtools.googlecode.com/files/BEDTools.<version>.tar.gz > BEDTools.tar.gz |
---|
605 | tar \-zxvf BEDTools.tar.gz |
---|
606 | cd BEDTools |
---|
607 | make clean |
---|
608 | make all |
---|
609 | sudo cp bin/* /usr/local/bin/ |
---|
610 | .ft P |
---|
611 | .fi |
---|
612 | .UNINDENT |
---|
613 | .UNINDENT |
---|
614 | .SS Use BEDTools |
---|
615 | .sp |
---|
616 | Below are examples of typical BEDTools usage. \fBAdditional usage examples are described in |
---|
617 | section 6 of this manual.\fP Using the "\-h" option with any BEDTool will report a list of all command |
---|
618 | line options. |
---|
619 | .sp |
---|
620 | A. Report the base\-pair overlap between the features in two BED files. |
---|
621 | .INDENT 0.0 |
---|
622 | .INDENT 3.5 |
---|
623 | .sp |
---|
624 | .nf |
---|
625 | .ft C |
---|
626 | intersectBed \-a reads.bed \-b genes.bed |
---|
627 | .ft P |
---|
628 | .fi |
---|
629 | .UNINDENT |
---|
630 | .UNINDENT |
---|
631 | .sp |
---|
632 | B. Report those entries in A that overlap NO entries in B (like "grep \-v"). |
---|
633 | .INDENT 0.0 |
---|
634 | .INDENT 3.5 |
---|
635 | .sp |
---|
636 | .nf |
---|
637 | .ft C |
---|
638 | intersectBed \-a reads.bed \-b genes.bed \-v |
---|
639 | .ft P |
---|
640 | .fi |
---|
641 | .UNINDENT |
---|
642 | .UNINDENT |
---|
643 | .sp |
---|
644 | C. Read BED A from stdin. Useful for stringing together commands. For example, find genes that overlap LINEs |
---|
645 | but not SINEs. |
---|
646 | .INDENT 0.0 |
---|
647 | .INDENT 3.5 |
---|
648 | .sp |
---|
649 | .nf |
---|
650 | .ft C |
---|
651 | intersectBed \-a genes.bed \-b LINES.bed | intersectBed \-a stdin \-b SINEs.bed \-v |
---|
652 | .ft P |
---|
653 | .fi |
---|
654 | .UNINDENT |
---|
655 | .UNINDENT |
---|
656 | .sp |
---|
657 | D. Find the closest ALU to each gene. |
---|
658 | .INDENT 0.0 |
---|
659 | .INDENT 3.5 |
---|
660 | .sp |
---|
661 | .nf |
---|
662 | .ft C |
---|
663 | closestBed \-a genes.bed \-b ALUs.bed |
---|
664 | .ft P |
---|
665 | .fi |
---|
666 | .UNINDENT |
---|
667 | .UNINDENT |
---|
668 | .sp |
---|
669 | E. Merge overlapping repetitive elements into a single entry, returning the number of entries merged. |
---|
670 | .INDENT 0.0 |
---|
671 | .INDENT 3.5 |
---|
672 | .sp |
---|
673 | .nf |
---|
674 | .ft C |
---|
675 | mergeBed \-i repeatMasker.bed \-n |
---|
676 | .ft P |
---|
677 | .fi |
---|
678 | .UNINDENT |
---|
679 | .UNINDENT |
---|
680 | .sp |
---|
681 | F. Merge nearby repetitive elements into a single entry, so long as they are within 1000 bp of one another. |
---|
682 | .INDENT 0.0 |
---|
683 | .INDENT 3.5 |
---|
684 | .sp |
---|
685 | .nf |
---|
686 | .ft C |
---|
687 | mergeBed \-i repeatMasker.bed \-d 1000 |
---|
688 | .ft P |
---|
689 | .fi |
---|
690 | .UNINDENT |
---|
691 | .UNINDENT |
---|
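The merging described in examples E and F can be pictured with a short Python sketch. This is only an illustration, not the mergeBed implementation: the function name merge_intervals and the max_gap parameter are hypothetical, and the choice to merge features that merely touch (a gap of 0) is an assumption that may not match mergeBed exactly.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge (chrom, start, end) intervals whose gap is at most max_gap bases.

    Returns (chrom, start, end, count) tuples, where count is the number of
    input entries folded into each merged interval.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Extend the current merged interval and count one more input entry.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + 1)
        else:
            merged.append((chrom, start, end, 1))
    return merged


repeats = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 1200, 1300),
    ("chr1", 5000, 5100),
]
print(merge_intervals(repeats))                # overlapping entries only
print(merge_intervals(repeats, max_gap=1000))  # also entries within 1000 bp
```

The first call plays the role of example E (overlaps only, with a count of merged entries); the second plays the role of example F, where a larger allowed gap pulls nearby features into the same entry.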
692 | .SH GENERAL USAGE |
---|
693 | .SS 4.1 Supported file formats |
---|
694 | .SS 4.1.1 BED format |
---|
695 | .sp |
---|
696 | As described on the UCSC Genome Browser website (see link below), the BED format is a concise and |
---|
697 | flexible way to represent genomic features and annotations. The BED format description supports up to |
---|
698 | 12 columns, but only the first 3 are required for the UCSC browser, the Galaxy browser and for |
---|
699 | BEDTools. BEDTools allows one to use the "BED12" format (that is, all 12 fields listed below). |
---|
700 | However, only intersectBed, coverageBed, genomeCoverageBed, and bamToBed will obey the BED12 |
---|
701 | "blocks" when computing overlaps, etc., via the \fB"\-split"\fP option. For all other tools, the last six columns |
---|
702 | are not used for any comparisons by the BEDTools. Instead, they will use the entire span (start to end) |
---|
703 | of the BED12 entry to perform any relevant feature comparisons. The last six columns will be reported |
---|
704 | in the output of all comparisons. |
---|
705 | .sp |
---|
706 | The file description below is modified from: \fI\%http://genome.ucsc.edu/FAQ/FAQformat#format1\fP\&. |
---|
707 | .INDENT 0.0 |
---|
708 | .IP 1. 3 |
---|
709 | \fBchrom\fP \- The name of the chromosome on which the genome feature exists. |
---|
710 | .UNINDENT |
---|
711 | .INDENT 0.0 |
---|
712 | .INDENT 3.5 |
---|
713 | .INDENT 0.0 |
---|
714 | .IP \(bu 2 |
---|
715 | \fIAny string can be used\fP\&. For example, "chr1", "III", "myChrom", "contig1112.23". |
---|
716 | .IP \(bu 2 |
---|
717 | \fIThis column is required\fP\&. |
---|
718 | .UNINDENT |
---|
719 | .UNINDENT |
---|
720 | .UNINDENT |
---|
721 | .INDENT 0.0 |
---|
722 | .IP 2. 3 |
---|
723 | \fBstart\fP \- The zero\-based starting position of the feature in the chromosome. |
---|
724 | .UNINDENT |
---|
725 | .INDENT 0.0 |
---|
726 | .INDENT 3.5 |
---|
727 | .INDENT 0.0 |
---|
728 | .IP \(bu 2 |
---|
729 | \fIThe first base in a chromosome is numbered 0\fP\&. |
---|
730 | .IP \(bu 2 |
---|
731 | \fIThe start position in each BED feature is therefore interpreted to be 1 greater than the start position listed in the feature. For example, start=9, end=20 is interpreted to span bases 10 through 20, inclusive\fP\&. |
---|
732 | .IP \(bu 2 |
---|
733 | \fIThis column is required\fP\&. |
---|
734 | .UNINDENT |
---|
735 | .UNINDENT |
---|
736 | .UNINDENT |
---|
737 | .INDENT 0.0 |
---|
738 | .IP 3. 3 |
---|
739 | \fBend\fP \- The one\-based ending position of the feature in the chromosome. |
---|
740 | .UNINDENT |
---|
741 | .INDENT 0.0 |
---|
742 | .INDENT 3.5 |
---|
743 | .INDENT 0.0 |
---|
744 | .IP \(bu 2 |
---|
745 | \fIThe end position in each BED feature is one\-based. See example above\fP\&. |
---|
746 | .IP \(bu 2 |
---|
747 | \fIThis column is required\fP\&. |
---|
748 | .UNINDENT |
---|
749 | .UNINDENT |
---|
750 | .UNINDENT |
---|
751 | .INDENT 0.0 |
---|
752 | .IP 4. 3 |
---|
753 | \fBname\fP \- Defines the name of the BED feature. |
---|
754 | .UNINDENT |
---|
755 | .INDENT 0.0 |
---|
756 | .INDENT 3.5 |
---|
757 | .INDENT 0.0 |
---|
758 | .IP \(bu 2 |
---|
759 | \fIAny string can be used\fP\&. For example, "LINE", "Exon3", "HWIEAS_0001:3:1:0:266#0/1", or "my_Feature". |
---|
760 | .IP \(bu 2 |
---|
761 | \fIThis column is optional\fP\&. |
---|
762 | .UNINDENT |
---|
763 | .UNINDENT |
---|
764 | .UNINDENT |
---|
765 | .INDENT 0.0 |
---|
766 | .IP 5. 3 |
---|
767 | \fBscore\fP \- The UCSC definition requires that a BED score range from 0 to 1000, inclusive. However, BEDTools allows any string to be stored in this field in order to allow greater flexibility in annotation features. For example, strings allow scientific notation for p\-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser. |
---|
768 | .UNINDENT |
---|
769 | .INDENT 0.0 |
---|
770 | .INDENT 3.5 |
---|
771 | .INDENT 0.0 |
---|
772 | .IP \(bu 2 |
---|
773 | \fIAny string can be used\fP\&. For example, 7.31E\-05 (p\-value), 0.33456 (mean enrichment value), "up", "down", etc. |
---|
774 | .IP \(bu 2 |
---|
775 | \fIThis column is optional\fP\&. |
---|
776 | .UNINDENT |
---|
777 | .UNINDENT |
---|
778 | .UNINDENT |
---|
779 | .INDENT 0.0 |
---|
780 | .IP 6. 3 |
---|
781 | \fBstrand\fP \- Defines the strand \- either \(aq+\(aq or \(aq\-\(aq. |
---|
782 | .UNINDENT |
---|
783 | .INDENT 0.0 |
---|
784 | .INDENT 3.5 |
---|
785 | .INDENT 0.0 |
---|
786 | .IP \(bu 2 |
---|
787 | \fIThis column is optional\fP\&. |
---|
788 | .UNINDENT |
---|
789 | .UNINDENT |
---|
790 | .UNINDENT |
---|
791 | .INDENT 0.0 |
---|
792 | .IP 7. 3 |
---|
793 | \fBthickStart\fP \- The starting position at which the feature is drawn thickly. |
---|
794 | .UNINDENT |
---|
795 | .INDENT 0.0 |
---|
796 | .INDENT 3.5 |
---|
797 | .INDENT 0.0 |
---|
798 | .IP \(bu 2 |
---|
799 | \fIAllowed yet ignored by BEDTools\fP\&. |
---|
800 | .UNINDENT |
---|
801 | .UNINDENT |
---|
802 | .UNINDENT |
---|
803 | .INDENT 0.0 |
---|
804 | .IP 8. 3 |
---|
805 | \fBthickEnd\fP \- The ending position at which the feature is drawn thickly. |
---|
806 | .UNINDENT |
---|
807 | .INDENT 0.0 |
---|
808 | .INDENT 3.5 |
---|
809 | .INDENT 0.0 |
---|
810 | .IP \(bu 2 |
---|
811 | \fIAllowed yet ignored by BEDTools\fP\&. |
---|
812 | .UNINDENT |
---|
813 | .UNINDENT |
---|
814 | .UNINDENT |
---|
815 | .INDENT 0.0 |
---|
816 | .IP 9. 3 |
---|
817 | \fBitemRgb\fP \- An RGB value of the form R,G,B (e.g. 255,0,0). |
---|
818 | .UNINDENT |
---|
819 | .INDENT 0.0 |
---|
820 | .INDENT 3.5 |
---|
821 | .INDENT 0.0 |
---|
822 | .IP \(bu 2 |
---|
823 | \fIAllowed yet ignored by BEDTools\fP\&. |
---|
824 | .UNINDENT |
---|
825 | .UNINDENT |
---|
826 | .UNINDENT |
---|
827 | .INDENT 0.0 |
---|
828 | .IP 10. 3 |
---|
829 | \fBblockCount\fP \- The number of blocks (exons) in the BED line. |
---|
830 | .UNINDENT |
---|
831 | .INDENT 0.0 |
---|
832 | .INDENT 3.5 |
---|
833 | .INDENT 0.0 |
---|
834 | .IP \(bu 2 |
---|
835 | \fIAllowed yet ignored by BEDTools\fP\&. |
---|
836 | .UNINDENT |
---|
837 | .UNINDENT |
---|
838 | .UNINDENT |
---|
839 | .INDENT 0.0 |
---|
840 | .IP 11. 4 |
---|
841 | \fBblockSizes\fP \- A comma\-separated list of the block sizes. |
---|
842 | .UNINDENT |
---|
843 | .INDENT 0.0 |
---|
844 | .INDENT 3.5 |
---|
845 | .INDENT 0.0 |
---|
846 | .IP \(bu 2 |
---|
847 | \fIAllowed yet ignored by BEDTools\fP\&. |
---|
848 | .UNINDENT |
---|
849 | .UNINDENT |
---|
850 | .UNINDENT |
---|
851 | .INDENT 0.0 |
---|
852 | .IP 12. 4 |
---|
853 | \fBblockStarts\fP \- A comma\-separated list of block starts. |
---|
854 | .UNINDENT |
---|
855 | .INDENT 0.0 |
---|
856 | .INDENT 3.5 |
---|
857 | .INDENT 0.0 |
---|
858 | .IP \(bu 2 |
---|
859 | \fIAllowed yet ignored by BEDTools\fP\&. |
---|
860 | .UNINDENT |
---|
861 | .UNINDENT |
---|
862 | .UNINDENT |
---|
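The zero\-based start and one\-based end convention described for columns 2 and 3 can be checked with a small Python sketch. The helper names below are hypothetical and exist only for illustration; they are not part of BEDTools.

```python
def bed_to_one_based(start, end):
    """Return the (first, last) bases covered by a BED interval, 1-based inclusive."""
    return start + 1, end


def bed_length(start, end):
    """Number of bases covered; no +1 is needed because the start is zero-based."""
    return end - start


first, last = bed_to_one_based(9, 20)
print(first, last, bed_length(9, 20))
```

With start=9 and end=20 this reproduces the span of bases 10 through 20 from the example above, which covers 11 bases.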
863 | .sp |
---|
864 | BEDTools requires that all BED input files (and input received from stdin) are \fBtab\-delimited\fP\&. The following types of BED files are supported by BEDTools: |
---|
865 | .INDENT 0.0 |
---|
866 | .IP 1. 3 |
---|
867 | .nf |
---|
868 | \fBBED3\fP: A BED file where each feature is described by \fBchrom\fP, \fBstart\fP, and \fBend\fP\&. |
---|
869 | For example: chr1 11873 14409 |
---|
870 | .fi |
---|
871 | .sp |
---|
872 | .IP 2. 3 |
---|
873 | .nf |
---|
874 | \fBBED4\fP: A BED file where each feature is described by \fBchrom\fP, \fBstart\fP, \fBend\fP, and \fBname\fP\&. |
---|
875 | For example: chr1 11873 14409 uc001aaa.3 |
---|
876 | .fi |
---|
877 | .sp |
---|
878 | .IP 3. 3 |
---|
879 | .nf |
---|
880 | \fBBED5\fP: A BED file where each feature is described by \fBchrom\fP, \fBstart\fP, \fBend\fP, \fBname\fP, and \fBscore\fP\&. |
---|
881 | For example: chr1 11873 14409 uc001aaa.3 0 |
---|
882 | .fi |
---|
883 | .sp |
---|
884 | .IP 4. 3 |
---|
885 | .nf |
---|
886 | \fBBED6\fP: A BED file where each feature is described by \fBchrom\fP, \fBstart\fP, \fBend\fP, \fBname\fP, \fBscore\fP, and \fBstrand\fP\&. |
---|
887 | For example: chr1 11873 14409 uc001aaa.3 0 + |
---|
888 | .fi |
---|
889 | .sp |
---|
890 | .IP 5. 3 |
---|
891 | .nf |
---|
892 | \fBBED12\fP: A BED file where each feature is described by all twelve columns listed above. |
---|
893 | For example: chr1 11873 14409 uc001aaa.3 0 + 11873 |
---|
894 | 11873 0 3 354,109,1189, 0,739,1347, |
---|
895 | .fi |
---|
896 | .sp |
---|
897 | .UNINDENT |
---|
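To make the BED12 "blocks" concrete, the following Python sketch expands the BED12 example above into its individual block intervals. It assumes the UCSC convention that blockStarts are offsets relative to the feature start; the function name bed12_blocks is hypothetical, and this is an illustration rather than the code behind the "\-split" option.

```python
def bed12_blocks(line):
    """Expand one tab-delimited BED12 line into (chrom, start, end) blocks."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-delimited columns")
    chrom, start = fields[0], int(fields[1])
    block_count = int(fields[9])
    # Trailing commas are common in these lists, so drop empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(offsets) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # Each block start is an offset from the feature start (column 2).
    return [(chrom, start + off, start + off + size)
            for off, size in zip(offsets, sizes)]


line = ("chr1\t11873\t14409\tuc001aaa.3\t0\t+\t11873\t11873\t0\t3\t"
        "354,109,1189,\t0,739,1347,")
for block in bed12_blocks(line):
    print(*block, sep="\t")
```

The last block ends at 14409, the end of the whole feature, which is a quick consistency check on the block lists.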
898 | .SS 4.1.2 BEDPE format |
---|
899 | .sp |
---|
900 | We have defined a new file format (BEDPE) in order to concisely describe disjoint genome features, |
---|
901 | such as structural variations or paired\-end sequence alignments. We chose to define a new format |
---|
902 | because the existing "blocked" BED format (a.k.a. BED12) does not allow inter\-chromosomal feature |
---|
903 | definitions. In addition, BED12 only has one strand field, which is insufficient for paired\-end sequence |
---|
904 | alignments, especially when studying structural variation. |
---|
905 | .sp |
---|
906 | The BEDPE format is described below. The description is modified from: \fI\%http://genome.ucsc.edu/FAQ/FAQformat#format1\fP\&. |
---|
907 | .INDENT 0.0 |
---|
908 | .IP 1. 3 |
---|
909 | \fBchrom1\fP \- The name of the chromosome on which the \fBfirst\fP end of the feature exists. |
---|
910 | .UNINDENT |
---|
911 | .INDENT 0.0 |
---|
912 | .INDENT 3.5 |
---|
913 | .INDENT 0.0 |
---|
914 | .IP \(bu 2 |
---|
915 | \fIAny string can be used\fP\&. For example, "chr1", "III", "myChrom", "contig1112.23". |
---|
916 | .IP \(bu 2 |
---|
917 | \fIThis column is required\fP\&. |
---|
918 | .IP \(bu 2 |
---|
919 | \fIUse "." for unknown\fP\&. |
---|
920 | .UNINDENT |
---|
921 | .UNINDENT |
---|
922 | .UNINDENT |
---|
923 | .INDENT 0.0 |
---|
924 | .IP 2. 3 |
---|
925 | \fBstart1\fP \- The zero\-based starting position of the \fBfirst\fP end of the feature on \fBchrom1\fP\&. |
---|
926 | .UNINDENT |
---|
927 | .INDENT 0.0 |
---|
928 | .INDENT 3.5 |
---|
929 | .INDENT 0.0 |
---|
930 | .IP \(bu 2 |
---|
931 | \fIThe first base in a chromosome is numbered 0\fP\&. |
---|
932 | .IP \(bu 2 |
---|
933 | \fIAs with BED format, the start position in each BEDPE feature is therefore interpreted to be 1 greater than the start position listed in the feature. This column is required\fP\&. |
---|
934 | .IP \(bu 2 |
---|
935 | \fIUse \-1 for unknown\fP\&. |
---|
936 | .UNINDENT |
---|
937 | .UNINDENT |
---|
938 | .UNINDENT |
---|
939 | .INDENT 0.0 |
---|
940 | .IP 3. 3 |
---|
941 | \fBend1\fP \- The one\-based ending position of the first end of the feature on \fBchrom1\fP\&. |
---|
942 | .UNINDENT |
---|
943 | .INDENT 0.0 |
---|
944 | .INDENT 3.5 |
---|
945 | .INDENT 0.0 |
---|
946 | .IP \(bu 2 |
---|
947 | \fIThe end position in each BEDPE feature is one\-based\fP\&. |
---|
948 | .IP \(bu 2 |
---|
949 | \fIThis column is required\fP\&. |
---|
950 | .IP \(bu 2 |
---|
951 | \fIUse \-1 for unknown\fP\&. |
---|
952 | .UNINDENT |
---|
953 | .UNINDENT |
---|
954 | .UNINDENT |
---|
955 | .INDENT 0.0 |
---|
956 | .IP 4. 3 |
---|
957 | \fBchrom2\fP \- The name of the chromosome on which the \fBsecond\fP end of the feature exists. |
---|
958 | .UNINDENT |
---|
959 | .INDENT 0.0 |
---|
960 | .INDENT 3.5 |
---|
961 | .INDENT 0.0 |
---|
962 | .IP \(bu 2 |
---|
963 | \fIAny string can be used\fP\&. For example, "chr1", "III", "myChrom", "contig1112.23". |
---|
964 | .IP \(bu 2 |
---|
965 | \fIThis column is required\fP\&. |
---|
966 | .IP \(bu 2 |
---|
967 | \fIUse "." for unknown\fP\&. |
---|
968 | .UNINDENT |
---|
969 | .UNINDENT |
---|
970 | .UNINDENT |
---|
971 | .INDENT 0.0 |
---|
972 | .IP 5. 3 |
---|
973 | \fBstart2\fP \- The zero\-based starting position of the \fBsecond\fP end of the feature on \fBchrom2\fP\&. |
---|
974 | .UNINDENT |
---|
975 | .INDENT 0.0 |
---|
976 | .INDENT 3.5 |
---|
977 | .INDENT 0.0 |
---|
978 | .IP \(bu 2 |
---|
979 | \fIThe first base in a chromosome is numbered 0\fP\&. |
---|
980 | .IP \(bu 2 |
---|
981 | \fIAs with BED format, the start position in each BEDPE feature is therefore interpreted to be 1 greater than the start position listed in the feature. This column is required\fP\&. |
---|
982 | .IP \(bu 2 |
---|
983 | \fIUse \-1 for unknown\fP\&. |
---|
984 | .UNINDENT |
---|
985 | .UNINDENT |
---|
986 | .UNINDENT |
---|
987 | .INDENT 0.0 |
---|
988 | .IP 6. 3 |
---|
989 | \fBend2\fP \- The one\-based ending position of the \fBsecond\fP end of the feature on \fBchrom2\fP\&. |
---|
990 | .UNINDENT |
---|
991 | .INDENT 0.0 |
---|
992 | .INDENT 3.5 |
---|
993 | .INDENT 0.0 |
---|
994 | .IP \(bu 2 |
---|
995 | \fIThe end position in each BEDPE feature is one\-based\fP\&. |
---|
996 | .IP \(bu 2 |
---|
997 | \fIThis column is required\fP\&. |
---|
998 | .IP \(bu 2 |
---|
999 | \fIUse \-1 for unknown\fP\&. |
---|
1000 | .UNINDENT |
---|
1001 | .UNINDENT |
---|
1002 | .UNINDENT |
---|
1003 | .INDENT 0.0 |
---|
1004 | .IP 7. 3 |
---|
1005 | \fBname\fP \- Defines the name of the BEDPE feature. |
---|
1006 | .UNINDENT |
---|
1007 | .INDENT 0.0 |
---|
1008 | .INDENT 3.5 |
---|
1009 | .INDENT 0.0 |
---|
1010 | .IP \(bu 2 |
---|
1011 | \fIAny string can be used\fP\&. For example, "LINE", "Exon3", "HWIEAS_0001:3:1:0:266#0/1", or "my_Feature". |
---|
1012 | .IP \(bu 2 |
---|
1013 | \fIThis column is optional\fP\&. |
---|
1014 | .UNINDENT |
---|
1015 | .UNINDENT |
---|
1016 | .UNINDENT |
---|
1017 | .INDENT 0.0 |
---|
1018 | .IP 8. 3 |
---|
1019 | \fBscore\fP \- The UCSC definition requires that a BED score range from 0 to 1000, inclusive. \fIHowever, BEDTools allows any string to be stored in this field in order to allow greater flexibility in annotation features\fP\&. For example, strings allow scientific notation for p\-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser. |
---|
1020 | .UNINDENT |
---|
1021 | .INDENT 0.0 |
---|
1022 | .INDENT 3.5 |
---|
1023 | .INDENT 0.0 |
---|
1024 | .IP \(bu 2 |
---|
1025 | \fIAny string can be used\fP\&. For example, 7.31E\-05 (p\-value), 0.33456 (mean enrichment value), "up", "down", etc. |
---|
1026 | .IP \(bu 2 |
---|
1027 | \fIThis column is optional\fP\&. |
---|
1028 | .UNINDENT |
---|
1029 | .UNINDENT |
---|
1030 | .UNINDENT |
---|
1031 | .INDENT 0.0 |
---|
1032 | .IP 9. 3 |
---|
1033 | \fBstrand1\fP \- Defines the strand for the first end of the feature. Either \(aq+\(aq or \(aq\-\(aq. |
---|
1034 | .UNINDENT |
---|
1035 | .INDENT 0.0 |
---|
1036 | .INDENT 3.5 |
---|
1037 | .INDENT 0.0 |
---|
1038 | .IP \(bu 2 |
---|
1039 | \fIThis column is optional\fP\&. |
---|
1040 | .IP \(bu 2 |
---|
1041 | \fIUse "." for unknown\fP\&. |
---|
1042 | .UNINDENT |
---|
1043 | .UNINDENT |
---|
1044 | .UNINDENT |
---|
1045 | .INDENT 0.0 |
---|
1046 | .IP 10. 3 |
---|
1047 | \fBstrand2\fP \- Defines the strand for the second end of the feature. Either \(aq+\(aq or \(aq\-\(aq. |
---|
1048 | .UNINDENT |
---|
1049 | .INDENT 0.0 |
---|
1050 | .INDENT 3.5 |
---|
1051 | .INDENT 0.0 |
---|
1052 | .IP \(bu 2 |
---|
1053 | \fIThis column is optional\fP\&. |
---|
1054 | .IP \(bu 2 |
---|
1055 | \fIUse "." for unknown\fP\&. |
---|
1056 | .UNINDENT |
---|
1057 | .UNINDENT |
---|
1058 | .UNINDENT |
---|
1059 | .INDENT 0.0 |
---|
1060 | .IP 11. 4 |
---|
1061 | \fBAny number of additional, user\-defined fields\fP \- BEDTools allows one to add as many additional fields to the normal, 10\-column BEDPE format as necessary. These columns are merely "passed through" \fBpairToBed\fP and \fBpairToPair\fP and are not part of any analysis. One would use these additional columns to add extra information (e.g., edit distance for each end of an alignment, or "deletion", "inversion", etc.) to each BEDPE feature. |
---|
1062 | .UNINDENT |
---|
1063 | .INDENT 0.0 |
---|
1064 | .INDENT 3.5 |
---|
1065 | .INDENT 0.0 |
---|
1066 | .IP \(bu 2 |
---|
1067 | \fIThese additional columns are optional\fP\&. |
---|
1068 | .UNINDENT |
---|
1069 | .UNINDENT |
---|
1070 | .UNINDENT |
---|
1071 | .sp |
---|
1072 | Entries from a typical BEDPE file: |
---|
1073 | .INDENT 0.0 |
---|
1074 | .INDENT 3.5 |
---|
1075 | .sp |
---|
1076 | .nf |
---|
1077 | .ft C |
---|
1078 | chr1 100 200 chr5 5000 5100 bedpe_example1 30 + \- |
---|
1079 | chr9 1000 5000 chr9 3000 3800 bedpe_example2 100 + \- |
---|
1080 | .ft P |
---|
1081 | .fi |
---|
1082 | .UNINDENT |
---|
1083 | .UNINDENT |
---|
1084 | .sp |
---|
1085 | Entries from a BEDPE file with two custom fields added to each record: |
---|
1086 | .INDENT 0.0 |
---|
1087 | .INDENT 3.5 |
---|
1088 | .sp |
---|
1089 | .nf |
---|
1090 | .ft C |
---|
1091 | chr1 10 20 chr5 50 60 a1 30 + \- 0 1 |
---|
1092 | chr9 30 40 chr9 80 90 a2 100 + \- 2 1 |
---|
1093 | .ft P |
---|
1094 | .fi |
---|
1095 | .UNINDENT |
---|
1096 | .UNINDENT |
---|
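A reader for the BEDPE layout described above might look like the following Python sketch. It is a minimal illustration under stated assumptions: the function name parse_bedpe and the returned dictionary keys are hypothetical, unknown values ("." and \-1) are mapped to None, and the score is kept as a string because any string is allowed in that field.

```python
def parse_bedpe(line):
    """Parse one tab-delimited BEDPE line into a dict; extra columns are kept as-is."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 6:
        raise ValueError("BEDPE needs at least 6 tab-delimited columns")

    def pos(value):
        # -1 marks an unknown coordinate.
        n = int(value)
        return None if n == -1 else n

    def text(value):
        # "." marks an unknown chromosome or strand.
        return None if value == "." else value

    return {
        "chrom1": text(f[0]), "start1": pos(f[1]), "end1": pos(f[2]),
        "chrom2": text(f[3]), "start2": pos(f[4]), "end2": pos(f[5]),
        "name": f[6] if len(f) > 6 else None,
        "score": f[7] if len(f) > 7 else None,
        "strand1": text(f[8]) if len(f) > 8 else None,
        "strand2": text(f[9]) if len(f) > 9 else None,
        "extra": f[10:],  # user-defined fields, passed through untouched
    }


rec = parse_bedpe("chr1\t10\t20\tchr5\t50\t60\ta1\t30\t+\t-\t0\t1")
print(rec["chrom2"], rec["start2"], rec["end2"], rec["extra"])
```

The two custom fields from the second example above end up in the "extra" list, mirroring how additional columns are carried along without taking part in any comparison.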
1097 | .SS 4.1.3 GFF format |
---|
1098 | .sp |
---|
1099 | The GFF format is described on the Sanger Institute\(aqs website (\fI\%http://www.sanger.ac.uk/resources/software/gff/spec.html\fP). The GFF description below is modified from the definition at this URL. All nine columns in the GFF format description are required by BEDTools. |
---|
1100 | .INDENT 0.0 |
---|
1101 | .IP 1. 3 |
---|
1102 | \fBseqname\fP \- The name of the sequence (e.g. chromosome) on which the feature exists. |
---|
1103 | .UNINDENT |
---|
1104 | .INDENT 0.0 |
---|
1105 | .INDENT 3.5 |
---|
1106 | .INDENT 0.0 |
---|
1107 | .IP \(bu 2 |
---|
1108 | \fIAny string can be used\fP\&. For example, "chr1", "III", "myChrom", "contig1112.23". |
---|
1109 | .IP \(bu 2 |
---|
1110 | \fIThis column is required\fP\&. |
---|
1111 | .UNINDENT |
---|
1112 | .UNINDENT |
---|
1113 | .UNINDENT |
---|
1114 | .INDENT 0.0 |
---|
1115 | .IP 2. 3 |
---|
1116 | \fBsource\fP \- The source of this feature. This field will normally be used to indicate the program making the prediction, or if it comes from public database annotation, or is experimentally verified, etc. |
---|
1117 | .UNINDENT |
---|
1118 | .INDENT 0.0 |
---|
1119 | .INDENT 3.5 |
---|
1120 | .INDENT 0.0 |
---|
1121 | .IP \(bu 2 |
---|
1122 | \fIThis column is required\fP\&. |
---|
1123 | .UNINDENT |
---|
1124 | .UNINDENT |
---|
1125 | .UNINDENT |
---|
1126 | .INDENT 0.0 |
---|
1127 | .IP 3. 3 |
---|
1128 | \fBfeature\fP \- The feature type name. Equivalent to BED\(aqs \fBname\fP field. |
---|
1129 | .UNINDENT |
---|
1130 | .INDENT 0.0 |
---|
1131 | .INDENT 3.5 |
---|
1132 | .INDENT 0.0 |
---|
1133 | .IP \(bu 2 |
---|
1134 | \fIAny string can be used\fP\&. For example, "exon", etc. |
---|
1135 | .IP \(bu 2 |
---|
1136 | \fIThis column is required\fP\&. |
---|
1137 | .UNINDENT |
---|
1138 | .UNINDENT |
---|
1139 | .UNINDENT |
---|
1140 | .INDENT 0.0 |
---|
1141 | .IP 4. 3 |
---|
1142 | \fBstart\fP \- The one\-based starting position of feature on \fBseqname\fP\&. |
---|
1143 | .UNINDENT |
---|
1144 | .INDENT 0.0 |
---|
1145 | .INDENT 3.5 |
---|
1146 | .INDENT 0.0 |
---|
1147 | .IP \(bu 2 |
---|
1148 | \fIThis column is required\fP\&. |
---|
1149 | .IP \(bu 2 |
---|
1150 | \fIBEDTools accounts for the fact that GFF uses a one\-based start position and BED uses a zero\-based start position\fP\&. |
---|
1151 | .UNINDENT |
---|
1152 | .UNINDENT |
---|
1153 | .UNINDENT |
---|
1154 | .INDENT 0.0 |
---|
1155 | .IP 5. 3 |
---|
1156 | \fBend\fP \- The one\-based ending position of feature on \fBseqname\fP\&. |
---|
1157 | .UNINDENT |
---|
1158 | .INDENT 0.0 |
---|
1159 | .INDENT 3.5 |
---|
1160 | .INDENT 0.0 |
---|
1161 | .IP \(bu 2 |
---|
1162 | \fIThis column is required\fP\&. |
---|
1163 | .UNINDENT |
---|
1164 | .UNINDENT |
---|
1165 | .UNINDENT |
---|
1166 | .INDENT 0.0 |
---|
1167 | .IP 6. 3 |
---|
1168 | \fBscore\fP \- A score assigned to the GFF feature. Like BED format, BEDTools allows any string to be stored in this field in order to allow greater flexibility in annotation features. We note that this differs from the GFF definition in the interest of flexibility. |
---|
1169 | .UNINDENT |
---|
1170 | .INDENT 0.0 |
---|
1171 | .INDENT 3.5 |
---|
1172 | .INDENT 0.0 |
---|
1173 | .IP \(bu 2 |
---|
1174 | \fIThis column is required\fP\&. |
---|
1175 | .UNINDENT |
---|
1176 | .UNINDENT |
---|
1177 | .UNINDENT |
---|
1178 | .INDENT 0.0 |
---|
1179 | .IP 7. 3 |
---|
1180 | \fBstrand\fP \- Defines the strand. Use \(aq+\(aq, \(aq\-\(aq or \(aq.\(aq |
---|
1181 | .UNINDENT |
---|
1182 | .INDENT 0.0 |
---|
1183 | .INDENT 3.5 |
---|
1184 | .INDENT 0.0 |
---|
1185 | .IP \(bu 2 |
---|
1186 | \fIThis column is required\fP\&. |
---|
1187 | .UNINDENT |
---|
1188 | .UNINDENT |
---|
1189 | .UNINDENT |
---|
1190 | .INDENT 0.0 |
---|
1191 | .IP 8. 3 |
---|
1192 | \fBframe\fP \- The frame of the coding sequence. Use \(aq0\(aq, \(aq1\(aq, \(aq2\(aq, or \(aq.\(aq. |
---|
1193 | .UNINDENT |
---|
1194 | .INDENT 0.0 |
---|
1195 | .INDENT 3.5 |
---|
1196 | .INDENT 0.0 |
---|
1197 | .IP \(bu 2 |
---|
1198 | \fIThis column is required\fP\&. |
---|
1199 | .UNINDENT |
---|
1200 | .UNINDENT |
---|
1201 | .UNINDENT |
---|
1202 | .INDENT 0.0 |
---|
1203 | .IP 9. 3 |
---|
1204 | \fBattribute\fP \- Taken from \fI\%http://www.sanger.ac.uk/resources/software/gff/spec.html\fP: From version 2 onwards, the attribute field must have a tag value structure following the syntax used within objects in a .ace file, flattened onto one line by semicolon separators. Tags must be standard identifiers ([A\-Za\-z][A\-Za\-z0\-9_]*). Free text values must be quoted with double quotes. \fINote: all non\-printing characters in such free text value strings (e.g. newlines, tabs, control characters, etc) must be explicitly represented by their C (UNIX) style backslash\-escaped representation (e.g. newlines as \(aq\en\(aq, tabs as \(aq\et\(aq)\fP\&. As in ACEDB, multiple values can follow a specific tag. The aim is to establish consistent use of particular tags, corresponding to an underlying implied ACEDB model if you want to think that way (but acedb is not required). |
---|
1209 | .UNINDENT |
---|
1210 | .INDENT 0.0 |
---|
1211 | .INDENT 3.5 |
---|
1212 | .INDENT 0.0 |
---|
1213 | .IP \(bu 2 |
---|
1214 | \fIThis column is required\fP\&. |
---|
1215 | .UNINDENT |
---|
1216 | .UNINDENT |
---|
1217 | .UNINDENT |
---|
1218 | .sp |
---|
1219 | Entries from an example GFF file: |
---|
1220 | .INDENT 0.0 |
---|
1221 | .INDENT 3.5 |
---|
1222 | .sp |
---|
1223 | .nf |
---|
1224 | .ft C |
---|
1225 | seq1 BLASTX similarity 101 235 87.1 + 0 Target "HBA_HUMAN" 11 55 ; E_value 0.0003 |
---|
1226 | dJ102G20 GD_mRNA coding_exon 7105 7201 . \- 2 Sequence "dJ102G20.C1.1" |
---|
1228 | .ft P |
---|
1229 | .fi |
---|
1230 | .UNINDENT |
---|
1231 | .UNINDENT |
---|
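Because GFF uses a one\-based start while BED uses a zero\-based start, comparing the two requires shifting the GFF start by one. The Python sketch below shows that adjustment; the function name gff_to_bed_interval is hypothetical, and this is only an illustration of the arithmetic, not the conversion code used inside BEDTools.

```python
def gff_to_bed_interval(gff_start, gff_end):
    """Convert a one-based, inclusive GFF interval to BED-style coordinates."""
    if gff_start < 1 or gff_end < gff_start:
        raise ValueError("invalid GFF interval")
    # Only the start shifts; the one-based end is already the BED end.
    return gff_start - 1, gff_end


bed_start, bed_end = gff_to_bed_interval(101, 235)
print(bed_start, bed_end, bed_end - bed_start)
```

For the first example entry above (101 to 235), the BED\-style interval is 100 to 235, and both representations cover the same 135 bases.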
1232 | .SS 4.1.4 Genome files |
---|
1233 | .sp |
---|
1234 | Some of the BEDTools (e.g., genomeCoverageBed, complementBed, slopBed) need to know the size of |
---|
1235 | the chromosomes for the organism on which your BED files are based. When using the UCSC Genome |
---|
1236 | Browser, Ensembl, or Galaxy, you typically indicate which species/genome build you are |
---|
1237 | working with. The way you do this for BEDTools is to create a "genome" file, which simply lists the names of |
---|
1238 | the chromosomes (or scaffolds, etc.) and their sizes (in base pairs). |
---|
1239 | .sp |
---|
1240 | Genome files must be \fBtab\-delimited\fP and are structured as follows (this is an example for \fIC. elegans\fP): |
---|
1241 | .INDENT 0.0 |
---|
1242 | .INDENT 3.5 |
---|
1243 | .sp |
---|
1244 | .nf |
---|
1245 | .ft C |
---|
1246 | chrI 15072421 |
---|
1247 | chrII 15279323 |
---|
1248 | \&... |
---|
1249 | chrX 17718854 |
---|
1250 | chrM 13794 |
---|
1251 | .ft P |
---|
1252 | .fi |
---|
1253 | .UNINDENT |
---|
1254 | .UNINDENT |
---|
1255 | .sp |
---|
1256 | BEDTools includes pre\-defined genome files for human and mouse in the \fB/genomes\fP directory included |
---|
1257 | in the BEDTools distribution. |
---|
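As a rough picture of why chromosome sizes matter, the Python sketch below reads the two\-column layout shown above and uses it to keep an interval inside its chromosome. The names parse_genome and clip are hypothetical, and the clamping rule is an assumption about what a tool that pads or complements intervals would need, not a description of any specific BEDTools program.

```python
def parse_genome(lines):
    """Read tab-delimited 'name<TAB>size' lines into a dict of chromosome sizes."""
    sizes = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, size = line.split("\t")[:2]
        sizes[name] = int(size)
    return sizes


def clip(start, end, chrom, sizes):
    """Clamp an interval to the range [0, chromosome size]."""
    return max(0, start), min(end, sizes[chrom])


genome = parse_genome([
    "chrI\t15072421",
    "chrII\t15279323",
    "chrX\t17718854",
    "chrM\t13794",
])
print(clip(13000, 14794, "chrM", genome))
print(clip(-500, 200, "chrI", genome))
```

Without the sizes, there would be no way to tell that an interval ending at 14794 runs past the end of chrM.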
1258 | .SS 4.1.5 SAM/BAM format |
---|
1259 | .sp |
---|
1260 | The SAM / BAM format is a powerful and widely\-used format for storing sequence alignment data (see |
---|
1261 | \fI\%http://samtools.sourceforge.net/\fP for more details). It has quickly become the standard format to which |
---|
1262 | most DNA sequence alignment programs write their output. Currently, the following BEDTools |
---|
1263 | support input in BAM format: \fIintersectBed, windowBed, coverageBed, genomeCoverageBed, |
---|
1264 | pairToBed, bamToBed\fP\&. Support for the BAM format in BEDTools allows one to (to name a few): |
---|
1265 | compare sequence alignments to annotations, refine alignment datasets, screen for potential mutations |
---|
1266 | and compute aligned sequence coverage. |
---|
1267 | .sp |
---|
1268 | The details of how these tools work with BAM files are addressed in \fBSection 5\fP of this manual. |
---|
1269 | .SS 4.1.6 VCF format |
---|
1270 | .sp |
---|
1271 | The Variant Call Format (VCF) was conceived as part of the 1000 Genomes Project as a standardized |
---|
1272 | means to report genetic variation calls from SNP, INDEL and structural variant detection programs |
---|
1273 | (see \fI\%http://www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0\fP for details). |
---|
1274 | BEDTools now supports the latest version of this format (i.e., Version 4.0). As a result, BEDTools can |
---|
1275 | be used to compare genetic variation calls with other genomic features. |
---|
1276 | .SH THE BEDTOOLS SUITE |
---|
1277 | .sp |
---|
1278 | This section covers the functionality and default / optional usage for each of the available BEDTools. |
---|
1279 | Example "figures" are provided in some cases in an effort to convey the purpose of the tool. The |
---|
1280 | behavior of each available parameter is discussed for each tool in abstract terms. More concrete usage |
---|
1281 | examples are provided in \fBSection 6\fP\&. |
---|
1283 | .SS 5.1 intersect |
---|
1284 | .sp |
---|
1285 | By far, the most common question asked of two sets of genomic features is whether or not any of the |
---|
1286 | features in the two sets "overlap" with one another. This is known as feature intersection. \fBbedtools intersect\fP |
---|
1287 | allows one to screen for overlaps between two sets of genomic features. Moreover, it allows one to have |
---|
1288 | fine control as to how the intersections are reported. \fBbedtools intersect\fP works with both BED/GFF/VCF |
---|
1289 | and BAM files as input. |
---|
1290 | .SS 5.1.1 Usage and option summary |
---|
1291 | .sp |
---|
1292 | \fBUsage\fP: |
---|
1293 | .INDENT 0.0 |
---|
1294 | .INDENT 3.5 |
---|
1295 | .sp |
---|
1296 | .nf |
---|
1297 | .ft C |
---|
1298 | bedtools intersect [OPTIONS] [\-a <BED/GFF/VCF> || \-abam <BAM>] \-b <BED/GFF/VCF> |
---|
1299 | |
---|
1300 | intersectBed [OPTIONS] [\-a <BED/GFF/VCF> || \-abam <BAM>] \-b <BED/GFF/VCF> |
---|
1301 | .ft P |
---|
1302 | .fi |
---|
1303 | .UNINDENT |
---|
1304 | .UNINDENT |
---|
1305 | .TS |
---|
1306 | center; |
---|
1307 | |l|l|. |
---|
1308 | _ |
---|
1309 | T{ |
---|
1310 | Option |
---|
1311 | T} T{ |
---|
1312 | Description |
---|
1313 | T} |
---|
1314 | _ |
---|
1315 | T{ |
---|
1316 | \fB\-a\fP |
---|
1317 | T} T{ |
---|
1318 | BED/GFF/VCF file A. Each feature in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. |
---|
1319 | T} |
---|
1320 | _ |
---|
1321 | T{ |
---|
1322 | \fB\-b\fP |
---|
1323 | T} T{ |
---|
1324 | BED/GFF/VCF file B. Use "stdin" if passing B with a UNIX pipe. |
---|
1325 | T} |
---|
1326 | _ |
---|
1327 | T{ |
---|
1328 | \fB\-abam\fP |
---|
1329 | T} T{ |
---|
1330 | BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. For example: samtools view \-b <BAM> | bedtools intersect \-abam stdin \-b genes.bed |
---|
1331 | T} |
---|
1332 | _ |
---|
1333 | T{ |
---|
1334 | \fB\-ubam\fP |
---|
1335 | T} T{ |
---|
1336 | Write uncompressed BAM output. The default is to write compressed BAM output. |
---|
1337 | T} |
---|
1338 | _ |
---|
1339 | T{ |
---|
1340 | \fB\-bed\fP |
---|
1341 | T} T{ |
---|
1342 | When using BAM input (\-abam), write output as BED. The default is to write output in BAM when using \-abam. For example: bedtools intersect \-abam reads.bam \-b genes.bed \-bed |
---|
1343 | T} |
---|
1344 | _ |
---|
1345 | T{ |
---|
1346 | \fB\-wa\fP |
---|
1347 | T} T{ |
---|
1348 | Write the original entry in A for each overlap. |
---|
1349 | T} |
---|
1350 | _ |
---|
1351 | T{ |
---|
1352 | \fB\-wb\fP |
---|
1353 | T} T{ |
---|
1354 | Write the original entry in B for each overlap. Useful for knowing what A overlaps. Restricted by \-f and \-r. |
---|
1355 | T} |
---|
1356 | _ |
---|
1357 | T{ |
---|
1358 | \fB\-wo\fP |
---|
1359 | T} T{ |
---|
1360 | Write the original A and B entries plus the number of base pairs of overlap between the two features. Only A features with overlap are reported. Restricted by \-f and \-r. |
---|
1361 | T} |
---|
1362 | _ |
---|
1363 | T{ |
---|
1364 | \fB\-wao\fP |
---|
1365 | T} T{ |
---|
1366 | Write the original A and B entries plus the number of base pairs of overlap between the two features. However, A features without overlap are also reported with a NULL B feature and overlap = 0. Restricted by \-f and \-r. |
---|
1367 | T} |
---|
1368 | _ |
---|
1369 | T{ |
---|
1370 | \fB\-u\fP |
---|
1371 | T} T{ |
---|
1372 | Write the original A entry once if any overlaps are found in B. In other words, just report the fact that at least one overlap was found in B. Restricted by \-f and \-r. |
---|
1373 | T} |
---|
1374 | _ |
---|
1375 | T{ |
---|
1376 | \fB\-c\fP |
---|
1377 | T} T{ |
---|
1378 | For each entry in A, report the number of hits in B. Reports 0 for A entries that have no overlap with B. Restricted by \-f and \-r. |
---|
1379 | T} |
---|
1380 | _ |
---|
1381 | T{ |
---|
1382 | \fB\-v\fP |
---|
1383 | T} T{ |
---|
1384 | Only report those entries in A that have no overlap in B. Restricted by \-f and \-r. |
---|
1385 | T} |
---|
1386 | _ |
---|
1387 | T{ |
---|
1388 | \fB\-f\fP |
---|
1389 | T} T{ |
---|
1390 | Minimum overlap required as a fraction of A. Default is 1E\-9 (i.e. 1bp). |
---|
1391 | T} |
---|
1392 | _ |
---|
1393 | T{ |
---|
1394 | \fB\-r\fP |
---|
1395 | T} T{ |
---|
1396 | Require that the fraction of overlap be reciprocal for A and B. In other words, if \-f is 0.90 and \-r is used, this requires that B overlap at least 90% of A and that A also overlaps at least 90% of B. |
---|
1397 | T} |
---|
1398 | _ |
---|
1399 | T{ |
---|
1400 | \fB\-s\fP |
---|
1401 | T} T{ |
---|
1402 | Force "strandedness". That is, only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand. |
---|
1403 | T} |
---|
1404 | _ |
---|
1405 | T{ |
---|
1406 | \fB\-split\fP |
---|
1407 | T} T{ |
---|
1408 | Treat "split" BAM (i.e., having an "N" CIGAR operation) or BED12 entries as distinct BED intervals. |
---|
1409 | T} |
---|
1410 | _ |
---|
1411 | .TE |
---|
1412 | .SS 5.1.2 Default behavior |
---|
1413 | .sp |
---|
1414 | By default, if an overlap is found, \fBbedtools intersect\fP reports the shared interval between the two |
---|
1415 | overlapping features. |
---|
1416 | .sp |
---|
1417 | For example: |
---|
1418 | .INDENT 0.0 |
---|
1419 | .INDENT 3.5 |
---|
1420 | .sp |
---|
1421 | .nf |
---|
1422 | .ft C |
---|
1423 | cat A.bed |
---|
1424 | chr1 10 20 |
---|
1425 | chr1 30 40 |
---|
1426 | |
---|
1427 | cat B.bed |
---|
1428 | chr1 15 20 |
---|
1429 | |
---|
1430 | bedtools intersect \-a A.bed \-b B.bed |
---|
1431 | chr1 15 20 |
---|
1432 | .ft P |
---|
1433 | .fi |
---|
1434 | .UNINDENT |
---|
1435 | .UNINDENT |
---|
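The default report can be sketched as follows. This is only an illustration of the arithmetic, not the bedtools implementation: the shared interval of two features on the same chromosome runs from the larger start to the smaller end. The function name shared_interval and the tuple layout are assumptions made for this sketch, and coordinates are treated as half\-open BED intervals.

```python
def shared_interval(a, b):
    """Return the (chrom, start, end) shared by two BED-like tuples, or None."""
    a_chrom, a_start, a_end = a
    b_chrom, b_start, b_end = b
    if a_chrom != b_chrom:
        return None
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    # Half-open coordinates: the features share bases only if start < end.
    return (a_chrom, start, end) if start < end else None


a_features = [("chr1", 10, 20), ("chr1", 30, 40)]
b_features = [("chr1", 15, 20)]

hits = []
for a in a_features:
    for b in b_features:
        shared = shared_interval(a, b)
        if shared is not None:
            hits.append(shared)

print(hits)  # [('chr1', 15, 20)]
```

The second A feature (chr1 30 40) shares no bases with B, so it contributes nothing to the output.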
1436 | .SS 5.1.3 (\-wa) Reporting the original A feature |
---|
1437 | .sp |
---|
1438 | Instead, one can force \fBbedtools intersect\fP to report the \fIoriginal\fP \fB"A"\fP feature when an overlap is found. As |
---|
1439 | shown below, the entire "A" feature is reported, not just the portion that overlaps with the "B" feature. |
---|
1440 | .sp |
---|
1441 | For example: |
---|
1442 | .INDENT 0.0 |
---|
1443 | .INDENT 3.5 |
---|
1444 | .sp |
---|
1445 | .nf |
---|
1446 | .ft C |
---|
1447 | cat A.bed |
---|
1448 | chr1 10 20 |
---|
1449 | chr1 30 40 |
---|
1450 | |
---|
1451 | cat B.bed |
---|
1452 | chr1 15 20 |
---|
1453 | |
---|
1454 | bedtools intersect \-a A.bed \-b B.bed \-wa |
---|
1455 | chr1 10 20 |
---|
1456 | .ft P |
---|
1457 | .fi |
---|
1458 | .UNINDENT |
---|
1459 | .UNINDENT |
---|
1460 | .SS 5.1.4 (\-wb) Reporting the original B feature |
---|
1461 | .sp |
---|
1462 | Similarly, one can force \fBbedtools intersect\fP to report the \fIoriginal\fP \fB"B"\fP feature when an overlap is found. If |
---|
1463 | just \-wb is used, the overlapping portion of A will be reported followed by the \fIoriginal\fP \fB"B"\fP\&. If both \-wa |
---|
1464 | and \-wb are used, the \fIoriginals\fP of both \fB"A"\fP and \fB"B"\fP will be reported. |
---|
1465 | .sp |
---|
1466 | For example (\-wb alone): |
---|
1469 | .INDENT 0.0 |
---|
1470 | .INDENT 3.5 |
---|
1471 | .sp |
---|
1472 | .nf |
---|
1473 | .ft C |
---|
1474 | cat A.bed |
---|
1475 | chr1 10 20 |
---|
1476 | chr1 30 40 |
---|
1477 | |
---|
1478 | cat B.bed |
---|
1479 | chr1 15 20 |
---|
1480 | |
---|
1481 | bedtools intersect \-a A.bed \-b B.bed \-wb |
---|
1482 | chr1 15 20 chr1 15 20 |
---|
1483 | .ft P |
---|
1484 | .fi |
---|
1485 | .UNINDENT |
---|
1486 | .UNINDENT |
---|
1487 | .sp |
---|
1488 | Now \-wa and \-wb: |
---|
1489 | .INDENT 0.0 |
---|
1490 | .INDENT 3.5 |
---|
1491 | .sp |
---|
1492 | .nf |
---|
1493 | .ft C |
---|
1494 | cat A.bed |
---|
1495 | chr1 10 20 |
---|
1496 | chr1 30 40 |
---|
1497 | |
---|
1498 | cat B.bed |
---|
1499 | chr1 15 20 |
---|
1500 | |
---|
1501 | bedtools intersect \-a A.bed \-b B.bed \-wa \-wb |
---|
1502 | chr1 10 20 chr1 15 20 |
---|
1503 | .ft P |
---|
1504 | .fi |
---|
1505 | .UNINDENT |
---|
1506 | .UNINDENT |
---|
1507 | .SS 5.1.5 (\-u) Reporting the presence of \fIat least one\fP overlapping feature |
---|
1508 | .sp |
---|
1509 | Frequently a feature in "A" will overlap with multiple features in "B". By default, \fBbedtools intersect\fP will |
---|
1510 | report each overlap as a separate output line. However, one may want to simply know that there is at |
---|
1511 | least one overlap (or none). When one uses the \-u option, "A" features that overlap with one or more |
---|
1512 | "B" features are reported once. Those that overlap with no "B" features are not reported at all. |
---|
1513 | .sp |
---|
1514 | For example (\fIwithout\fP \-u): |
---|
1515 | .INDENT 0.0 |
---|
1516 | .INDENT 3.5 |
---|
1517 | .sp |
---|
1518 | .nf |
---|
1519 | .ft C |
---|
1520 | cat A.bed |
---|
1521 | chr1 10 20 |
---|
1522 | chr1 30 40 |
---|
1523 | |
---|
1524 | cat B.bed |
---|
1525 | chr1 15 20 |
---|
1526 | chr1 18 25 |
---|
1527 | |
---|
1528 | bedtools intersect \-a A.bed \-b B.bed \-wa \-wb |
---|
1529 | chr1 10 20 chr1 15 20 |
---|
1530 | chr1 10 20 chr1 18 25 |
---|
1531 | .ft P |
---|
1532 | .fi |
---|
1533 | .UNINDENT |
---|
1534 | .UNINDENT |
---|
1535 | .sp |
---|
1536 | For example (\fIwith\fP \-u): |
---|
1537 | .INDENT 0.0 |
---|
1538 | .INDENT 3.5 |
---|
1539 | .sp |
---|
1540 | .nf |
---|
1541 | .ft C |
---|
1542 | cat A.bed |
---|
1543 | chr1 10 20 |
---|
1544 | chr1 30 40 |
---|
1545 | |
---|
1546 | cat B.bed |
---|
1547 | chr1 15 20 |
---|
1548 | chr1 18 25 |
---|
1549 | |
---|
1550 | bedtools intersect \-a A.bed \-b B.bed \-u |
---|
1551 | chr1 10 20 |
---|
1552 | .ft P |
---|
1553 | .fi |
---|
1554 | .UNINDENT |
---|
1555 | .UNINDENT |
---|
1556 | .SS 5.1.6 (\-c) Reporting the number of overlapping features |
---|
1557 | .sp |
---|
1558 | The \-c option reports a column after each "A" feature indicating the \fInumber\fP (0 or more) of overlapping |
---|
1559 | features found in "B". Therefore, \fIeach feature in A is reported once\fP\&. |
---|
1560 | .sp |
---|
1561 | For example: |
---|
1562 | .INDENT 0.0 |
---|
1563 | .INDENT 3.5 |
---|
1564 | .sp |
---|
1565 | .nf |
---|
1566 | .ft C |
---|
1567 | cat A.bed |
---|
1568 | chr1 10 20 |
---|
1569 | chr1 30 40 |
---|
1570 | |
---|
1571 | cat B.bed |
---|
1572 | chr1 15 20 |
---|
1573 | chr1 18 25 |
---|
1574 | |
---|
1575 | bedtools intersect \-a A.bed \-b B.bed \-c |
---|
1576 | chr1 10 20 2 |
---|
1577 | chr1 30 40 0 |
---|
1578 | .ft P |
---|
1579 | .fi |
---|
1580 | .UNINDENT |
---|
1581 | .UNINDENT |
---|
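The relationship between \-c, \-u and \-v can be sketched with a single per\-feature hit count. This is a simplified model under the assumption of half\-open BED coordinates, not the bedtools implementation; the helper names overlaps and count_hits are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) features share at least one base."""
    return a[0] == b[0] and max(a[1], b[1]) < min(a[2], b[2])


def count_hits(a_features, b_features):
    """Pair each A feature with its number of overlapping B features."""
    return [(a, sum(1 for b in b_features if overlaps(a, b))) for a in a_features]


a_features = [("chr1", 10, 20), ("chr1", 30, 40)]
b_features = [("chr1", 15, 20), ("chr1", 18, 25)]

counts = count_hits(a_features, b_features)
with_c = [a + (n,) for a, n in counts]     # like -c: every A, plus its count
with_u = [a for a, n in counts if n > 0]   # like -u: A once if it has any hit
with_v = [a for a, n in counts if n == 0]  # like -v: A only if it has no hit

print(with_c)  # [('chr1', 10, 20, 2), ('chr1', 30, 40, 0)]
print(with_u)  # [('chr1', 10, 20)]
print(with_v)  # [('chr1', 30, 40)]
```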
1582 | .SS 5.1.7 (\-v) Reporting the absence of any overlapping features |
---|
1583 | .sp |
---|
1584 | There will likely be cases where you\(aqd like to know which "A" features do not overlap with any of the |
---|
1585 | "B" features. Perhaps you\(aqd like to know which SNPs don\(aqt overlap with any gene annotations. The \-v |
---|
1586 | (an homage to "grep \-v") option will only report those "A" features that have no overlaps in "B". |
---|
1587 | .sp |
---|
1588 | For example: |
---|
1589 | .INDENT 0.0 |
---|
1590 | .INDENT 3.5 |
---|
1591 | .sp |
---|
1592 | .nf |
---|
1593 | .ft C |
---|
1594 | cat A.bed |
---|
1595 | chr1 10 20 |
---|
1596 | chr1 30 40 |
---|
1597 | |
---|
1598 | cat B.bed |
---|
1599 | chr1 15 20 |
---|
1600 | |
---|
1601 | bedtools intersect \-a A.bed \-b B.bed \-v |
---|
1602 | chr1 30 40 |
---|
1603 | .ft P |
---|
1604 | .fi |
---|
1605 | .UNINDENT |
---|
1606 | .UNINDENT |
---|
1607 | .SS 5.1.8 (\-f) Requiring a minimal overlap fraction |
---|
1608 | .sp |
---|
1609 | By default, \fBbedtools intersect\fP will report an overlap between A and B so long as at least one base |
---|
1610 | pair overlaps. Yet sometimes you may want to restrict reported overlaps between A and B to cases |
---|
1611 | where the feature in B overlaps at least X% (e.g. 50%) of the A feature. The \-f option does exactly |
---|
1612 | this. |
---|
1613 | .sp |
---|
1614 | For example (note that the second B entry is not reported): |
---|
1615 | .INDENT 0.0 |
---|
1616 | .INDENT 3.5 |
---|
1617 | .sp |
---|
1618 | .nf |
---|
1619 | .ft C |
---|
1620 | cat A.bed |
---|
1621 | chr1 100 200 |
---|
1622 | |
---|
1623 | cat B.bed |
---|
1624 | chr1 130 201 |
---|
1625 | chr1 180 220 |
---|
1626 | |
---|
1627 | bedtools intersect \-a A.bed \-b B.bed \-f 0.50 \-wa \-wb |
---|
1628 | chr1 100 200 chr1 130 201 |
---|
1629 | .ft P |
---|
1630 | .fi |
---|
1631 | .UNINDENT |
---|
1632 | .UNINDENT |
---|
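The arithmetic behind the example above can be sketched as follows, assuming half\-open BED coordinates. The helper names overlap_bp and passes_min_fraction are hypothetical and this is not the bedtools implementation.

```python
def overlap_bp(a, b):
    """Number of bases shared by two (start, end) half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def passes_min_fraction(a, b, min_frac):
    """True if B covers at least min_frac of A (a -f style test)."""
    shared = overlap_bp(a, b)
    return shared > 0 and shared / (a[1] - a[0]) >= min_frac


a = (100, 200)
results = []
for b in [(130, 201), (180, 220)]:
    results.append((b, overlap_bp(a, b), passes_min_fraction(a, b, 0.50)))

print(results)
# [((130, 201), 70, True), ((180, 220), 20, False)]
```

The first B feature covers 70 of the 100 bases of A (70%) and passes; the second covers only 20 bases (20%) and is dropped.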
1633 | .SS 5.1.9 (\-r, combined with \-f) Requiring reciprocal minimal overlap fraction |
---|
1634 | .sp |
---|
1635 | Similarly, you may want to require that a minimal fraction of both the A and the B features is |
---|
1636 | overlapped. For example, if feature A is 1kb and feature B is 1Mb, you might not want to report the |
---|
1637 | overlap as feature A can overlap at most 0.1% of feature B. If one sets \-f to, say, 0.02 and also |
---|
1638 | enables \-r (reciprocal overlap fraction required), this overlap would not be reported. |
---|
1639 | .sp |
---|
1640 | For example (note that the second B entry is not reported): |
---|
1641 | .INDENT 0.0 |
---|
1642 | .INDENT 3.5 |
---|
1643 | .sp |
---|
1644 | .nf |
---|
1645 | .ft C |
---|
1646 | cat A.bed |
---|
1647 | chr1 100 200 |
---|
1648 | |
---|
1649 | cat B.bed |
---|
1650 | chr1 130 201 |
---|
1651 | chr1 130 200000 |
---|
1652 | |
---|
1653 | bedtools intersect \-a A.bed \-b B.bed \-f 0.50 \-r \-wa \-wb |
---|
1654 | chr1 100 200 chr1 130 201 |
---|
1655 | .ft P |
---|
1656 | .fi |
---|
1657 | .UNINDENT |
---|
1658 | .UNINDENT |
---|
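A reciprocal test can be sketched by applying the same fraction to both features. This is an illustration assuming half\-open BED coordinates, with hypothetical helper names, and is not the bedtools implementation.

```python
def overlap_bp(a, b):
    """Number of bases shared by two (start, end) half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def passes_reciprocal(a, b, min_frac):
    """True if the overlap is at least min_frac of A and at least min_frac of B."""
    shared = overlap_bp(a, b)
    if shared == 0:
        return False
    frac_of_a = shared / (a[1] - a[0])
    frac_of_b = shared / (b[1] - b[0])
    return frac_of_a >= min_frac and frac_of_b >= min_frac


a = (100, 200)
# 70 bp overlap: 70% of A and about 98.6% of B.
print(passes_reciprocal(a, (130, 201), 0.50))     # True
# 70 bp overlap: 70% of A but far less than 1% of B.
print(passes_reciprocal(a, (130, 200000), 0.50))  # False

# A 1 kb feature inside a 1 Mb feature covers only 0.1% of the larger one.
print(passes_reciprocal((0, 1000), (0, 1000000), 0.02))  # False
```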
1659 | .SS 5.1.10 (\-s) Enforcing "strandedness" |
---|
1660 | .sp |
---|
1661 | By default, \fBbedtools intersect\fP will report overlaps between features even if the features are on opposite |
---|
1662 | strands. However, if strand information is present in both BED files and the "\-s" option is used, overlaps |
---|
1663 | will only be reported when features are on the same strand. |
---|
1664 | .sp |
---|
1665 | For example (note that the second B entry is not reported): |
---|
1666 | .INDENT 0.0 |
---|
1667 | .INDENT 3.5 |
---|
1668 | .sp |
---|
1669 | .nf |
---|
1670 | .ft C |
---|
1671 | cat A.bed |
---|
1672 | chr1 100 200 a1 100 + |
---|
1673 | |
---|
1674 | cat B.bed |
---|
1675 | chr1 130 201 b1 100 \- |
---|
1676 | chr1 130 201 b2 100 + |
---|
1677 | |
---|
1678 | bedtools intersect \-a A.bed \-b B.bed \-wa \-wb \-s |
---|
1679 | chr1 100 200 a1 100 + chr1 130 201 b2 100 + |
---|
1680 | .ft P |
---|
1681 | .fi |
---|
1682 | .UNINDENT |
---|
1683 | .UNINDENT |
---|
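The same\-strand requirement amounts to one extra comparison on the BED strand column. The sketch below assumes six\-column BED tuples with the strand in the last position; the helper name overlaps_same_strand is hypothetical and this is not the bedtools implementation.

```python
def overlaps_same_strand(a, b):
    """True if two BED6-like tuples overlap and carry the same strand."""
    same_chrom = a[0] == b[0]
    shares_bases = max(a[1], b[1]) < min(a[2], b[2])
    return same_chrom and shares_bases and a[5] == b[5]


a = ("chr1", 100, 200, "a1", 100, "+")
b_features = [
    ("chr1", 130, 201, "b1", 100, "-"),
    ("chr1", 130, 201, "b2", 100, "+"),
]

kept = [b[3] for b in b_features if overlaps_same_strand(a, b)]
print(kept)  # ['b2']
```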
1684 | .SS 5.1.11 (\-abam) Default behavior when using BAM input |
---|
1685 | .sp |
---|
1686 | When comparing alignments in BAM format (\fB\-abam\fP) to features in BED format (\fB\-b\fP), \fBbedtools intersect\fP |
---|
1687 | will, \fBby default\fP, write the output in BAM format. That is, each alignment in the BAM file that meets |
---|
1688 | the user\(aqs criteria will be written (to standard output) in BAM format. This serves as a mechanism to |
---|
1689 | create subsets of BAM alignments that are of biological interest. Note that each alignment (mate) in the BAM |
---|
1690 | file is compared to the BED file on its own. Thus, if only one end of a paired\-end sequence overlaps with a |
---|
1691 | feature in B, then only that end will be written to the BAM output; the other mate of the |
---|
1692 | pair will not be written. One should use \fBpairToBed\fP (Section 5.2) if one wants both BAM alignments |
---|
1693 | of a pair to be written to BAM output. |
---|
1694 | .sp |
---|
1695 | For example: |
---|
1696 | .INDENT 0.0 |
---|
1697 | .INDENT 3.5 |
---|
1698 | .sp |
---|
1699 | .nf |
---|
1700 | .ft C |
---|
1701 | bedtools intersect \-abam reads.unsorted.bam \-b simreps.bed | samtools view \- | head \-3 |
---|
1702 | |
---|
1703 | BERTHA_0001:3:1:15:1362#0 99 chr4 9236904 0 50M = 9242033 5179 AGACGTTAACTTTACACACCTCTGCCAAGGTCCTCATCCTTGTATTGAAG WcTU]b\egcegXgfcbfccbddggVYPWW_\ec\(gadcdabdfW^a^gggfgd XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:19 X1:i:2 XM:i:0 XO:i:0 XG:i:0 MD:Z:50 |
---|
1706 | BERTHA_0001:3:1:16:994#0 83 chr6 114221672 37 25S6M1I11M7S = 114216196 \-5493 GAAAGGCCAGAGTATAGAATAAACACAACAATGTCCAAGGTACACTGTTA gffeaaddddggggggedgcgeggdegggggffcgggggggegdfggfgf XT:A:M NM:i:3 SM:i:37 AM:i:37 XM:i:2 XO:i:1 XG:i:1 MD:Z:6A6T3 |
---|
1710 | BERTHA_0001:3:1:16:594#0 147 chr8 43835330 0 50M = 43830893 \-4487 CTTTGGGAGGGCTTTGTAGCCTATCTGGAAAAAGGAAATATCTTCCCATG U\ee^bgeTdg_Kgcg\(gaggeggg_gggggggggddgdggVg\egWdfgfgff XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:10 X1:i:7 XM:i:2 XO:i:0 XG:i:0 MD:Z:1A2T45 |
---|
1714 | .ft P |
---|
1715 | .fi |
---|
1716 | .UNINDENT |
---|
1717 | .UNINDENT |
---|
1718 | .SS 5.1.12 (\-bed) Output BED format when using BAM input |
---|
1719 | .sp |
---|
1720 | When comparing alignments in BAM format (\fB\-abam\fP) to features in BED format (\fB\-b\fP), \fBbedtools intersect\fP |
---|
1721 | will \fBoptionally\fP write the output in BED format. That is, each alignment in the BAM file is converted |
---|
1722 | to a 6 column BED feature and if overlaps are found (or not) based on the user\(aqs criteria, the BAM |
---|
1723 | alignment will be reported in BED format. The BED "name" field is taken from the QNAME (read name) field in |
---|
1724 | the BAM alignment. If mate information is available, the mate (e.g., "/1" or "/2") field will be |
---|
1725 | appended to the name. The "score" field is the mapping quality score from the BAM alignment. |
---|
1726 | .sp |
---|
1727 | For example: |
---|
1728 | .INDENT 0.0 |
---|
1729 | .INDENT 3.5 |
---|
1730 | .sp |
---|
1731 | .nf |
---|
1732 | .ft C |
---|
1733 | bedtools intersect \-abam reads.unsorted.bam \-b simreps.bed \-bed | head \-20 |
---|
1734 | |
---|
1735 | chr4 9236903 9236953 BERTHA_0001:3:1:15:1362#0/1 0 + |
---|
1736 | chr6 114221671 114221721 BERTHA_0001:3:1:16:994#0/1 37 \- |
---|
1737 | chr8 43835329 43835379 BERTHA_0001:3:1:16:594#0/2 0 \- |
---|
1738 | chr4 49110668 49110718 BERTHA_0001:3:1:31:487#0/1 23 + |
---|
1739 | chr19 27732052 27732102 BERTHA_0001:3:1:32:890#0/2 46 + |
---|
1740 | chr19 27732012 27732062 BERTHA_0001:3:1:45:1135#0/1 37 + |
---|
1741 | chr10 117494252 117494302 BERTHA_0001:3:1:68:627#0/1 37 \- |
---|
1742 | chr19 27731966 27732016 BERTHA_0001:3:1:83:931#0/2 9 + |
---|
1743 | chr8 48660075 48660125 BERTHA_0001:3:1:86:608#0/2 37 \- |
---|
1744 | chr9 34986400 34986450 BERTHA_0001:3:1:113:183#0/2 37 \- |
---|
1745 | chr10 42372771 42372821 BERTHA_0001:3:1:128:1932#0/1 3 \- |
---|
1746 | chr19 27731954 27732004 BERTHA_0001:3:1:130:1402#0/2 0 + |
---|
1747 | chr10 42357337 42357387 BERTHA_0001:3:1:137:868#0/2 9 + |
---|
1748 | chr1 159720631 159720681 BERTHA_0001:3:1:147:380#0/2 37 \- |
---|
1749 | chrX 58230155 58230205 BERTHA_0001:3:1:151:656#0/2 37 \- |
---|
1750 | chr5 142612746 142612796 BERTHA_0001:3:1:152:1893#0/1 37 \- |
---|
1751 | chr9 71795659 71795709 BERTHA_0001:3:1:177:387#0/1 37 + |
---|
1752 | chr1 106240854 106240904 BERTHA_0001:3:1:194:928#0/1 37 \- |
---|
1753 | chr4 74128456 74128506 BERTHA_0001:3:1:221:724#0/1 37 \- |
---|
1754 | chr8 42606164 42606214 BERTHA_0001:3:1:244:962#0/1 37 + |
---|
1755 | .ft P |
---|
1756 | .fi |
---|
1757 | .UNINDENT |
---|
1758 | .UNINDENT |
---|
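The mapping from a SAM/BAM record to a six\-column BED line can be sketched as follows. This is an assumption\-laden illustration rather than the bedtools implementation: the helper name sam_to_bed6 is hypothetical, the end coordinate is derived from the reference\-consuming CIGAR operations (M, D, N, = and X) as defined by the SAM specification, and only the simple 50M records from the listings above are used as checks.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def sam_to_bed6(qname, flag, rname, pos, mapq, cigar):
    """Convert the core SAM fields of one alignment to a BED6-like tuple."""
    start = pos - 1  # SAM POS is 1-based; BED starts are 0-based
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    name = qname
    if flag & 0x40:      # first mate in the pair
        name += "/1"
    elif flag & 0x80:    # second mate in the pair
        name += "/2"
    strand = "-" if flag & 0x10 else "+"  # 0x10 marks a reverse-strand alignment
    return (rname, start, start + ref_len, name, mapq, strand)


first = sam_to_bed6("BERTHA_0001:3:1:15:1362#0", 99, "chr4", 9236904, 0, "50M")
third = sam_to_bed6("BERTHA_0001:3:1:16:594#0", 147, "chr8", 43835330, 0, "50M")
print(first)
print(third)
```

Both tuples agree with the corresponding lines of the BED listing above.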
1759 | .SS 5.1.13 (\-split) Reporting overlaps with spliced alignments or blocked BED features |
---|
1760 | .sp |
---|
1761 | As described in section 1.3.19, bedtools intersect will, by default, screen for overlaps against the entire span |
---|
1762 | of a spliced/split BAM alignment or blocked BED12 feature. When dealing with RNA\-seq reads, for |
---|
1763 | example, one typically wants to only screen for overlaps for the portions of the reads that come from |
---|
1764 | exons (and ignore the interstitial intron sequence). The \fB\-split\fP option restricts the overlap search to |
---|
1765 | those portions. |
---|
1766 | .sp |
---|
1767 | For example, the diagram below illustrates the \fIdefault\fP behavior. The dots represent the |
---|
1768 | "split/spliced" portion of the alignment (i.e., CIGAR "N" operation). In this case, the two exon annotations |
---|
1769 | are reported as overlapping with the "split" BAM alignment, but in addition, a third feature that |
---|
1770 | overlaps the "split" portion of the alignment is also reported. |
---|
1771 | .INDENT 0.0 |
---|
1772 | .INDENT 3.5 |
---|
1773 | .sp |
---|
1774 | .nf |
---|
1775 | .ft C |
---|
1776 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
1777 | |
---|
1778 | Exons \-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\- |
---|
1779 | |
---|
1780 | BED/BAM A ************.......................................**** |
---|
1781 | |
---|
1782 | BED File B ^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^^^ |
---|
1783 | |
---|
1784 | Result =============== ======== ========== |
---|
1785 | .ft P |
---|
1786 | .fi |
---|
1787 | .UNINDENT |
---|
1788 | .UNINDENT |
---|
1789 | .sp |
---|
1790 | In contrast, when using the \fB\-split\fP option, only the exon overlaps are reported. |
---|
1791 | .INDENT 0.0 |
---|
1792 | .INDENT 3.5 |
---|
1793 | .sp |
---|
1794 | .nf |
---|
1795 | .ft C |
---|
1796 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
1797 | |
---|
1798 | Exons \-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\- |
---|
1799 | |
---|
1800 | BED/BAM A ************.......................................**** |
---|
1801 | |
---|
1802 | BED File B ^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^^^ |
---|
1803 | |
---|
1804 | Result =============== ========== |
---|
1805 | .ft P |
---|
1806 | .fi |
---|
1807 | .UNINDENT |
---|
1808 | .UNINDENT |
---|
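The difference between the two diagrams can be sketched with a blocked (BED12\-style) feature, whose block starts are relative to the feature start. The coordinates below are invented for illustration, the helper names are hypothetical, and this is not the bedtools implementation.

```python
def bed12_blocks(chrom_start, block_sizes, block_starts):
    """Expand BED12 block sizes and relative starts into absolute intervals."""
    return [
        (chrom_start + rel, chrom_start + rel + size)
        for size, rel in zip(block_sizes, block_starts)
    ]


def overlaps(x, y):
    """True if two (start, end) half-open intervals share at least one base."""
    return max(x[0], y[0]) < min(x[1], y[1])


def hits(a_start, a_end, block_sizes, block_starts, b_features, split=False):
    """Return the B features hit by A, using blocks only when split is True."""
    if split:
        parts = bed12_blocks(a_start, block_sizes, block_starts)
    else:
        parts = [(a_start, a_end)]  # default: the entire span, gap included
    return [b for b in b_features if any(overlaps(p, b) for p in parts)]


# Hypothetical spliced feature: blocks 100-112 and 151-155 joined by a gap.
a_start, a_end = 100, 155
block_sizes, block_starts = [12, 4], [0, 51]
b_features = [(98, 113), (120, 128), (150, 160)]

default_hits = hits(a_start, a_end, block_sizes, block_starts, b_features)
split_hits = hits(a_start, a_end, block_sizes, block_starts, b_features, split=True)
print(default_hits)  # [(98, 113), (120, 128), (150, 160)]
print(split_hits)    # [(98, 113), (150, 160)]
```

The middle B feature lies entirely within the gap, so it is reported only when the whole span is screened.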
1809 | .SS 5.2 pairToBed |
---|
1810 | .sp |
---|
1811 | \fBpairToBed\fP compares each end of a BEDPE feature or a paired\-end BAM alignment to a feature file in |
---|
1812 | search of overlaps. |
---|
1813 | .sp |
---|
1814 | \fBNOTE: pairToBed requires that the BAM file is sorted/grouped by the read name. This |
---|
1815 | allows pairToBed to extract correct alignment coordinates for each end based on their |
---|
1816 | respective CIGAR strings. It also assumes that the alignments for a given pair come in |
---|
1817 | groups of two. There is not yet a standard method for reporting multiple alignments |
---|
1818 | using BAM. pairToBed will fail if an aligner does not report alignments in pairs.\fP |
---|
1819 | .SS 5.2.1 Usage and option summary |
---|
1820 | .sp |
---|
1821 | \fBUsage:\fP |
---|
1822 | .INDENT 0.0 |
---|
1823 | .INDENT 3.5 |
---|
1824 | .sp |
---|
1825 | .nf |
---|
1826 | .ft C |
---|
1827 | pairToBed [OPTIONS] [\-a <BEDPE> || \-abam <BAM>] \-b <BED/GFF/VCF> |
---|
1828 | .ft P |
---|
1829 | .fi |
---|
1830 | .UNINDENT |
---|
1831 | .UNINDENT |
---|
1832 | .TS |
---|
1833 | center; |
---|
1834 | |l|l|. |
---|
1835 | _ |
---|
1836 | T{ |
---|
1837 | Option |
---|
1838 | T} T{ |
---|
1839 | Description |
---|
1840 | T} |
---|
1841 | _ |
---|
1842 | T{ |
---|
1843 | \fB\-a\fP |
---|
1844 | T} T{ |
---|
1845 | BEDPE file A. Each feature in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. Output will be in BEDPE format. |
---|
1846 | T} |
---|
1847 | _ |
---|
1848 | T{ |
---|
1849 | \fB\-b\fP |
---|
1850 | T} T{ |
---|
1851 | BED file B. Use "stdin" if passing B with a UNIX pipe. |
---|
1852 | T} |
---|
1853 | _ |
---|
1854 | T{ |
---|
1855 | \fB\-abam\fP |
---|
1856 | T} T{ |
---|
1857 | BAM file A. Each end of each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. For example: samtools view \-b <BAM> | pairToBed \-abam stdin \-b genes.bed | samtools view \- |
---|
1858 | T} |
---|
1859 | _ |
---|
1860 | T{ |
---|
1861 | \fB\-ubam\fP |
---|
1862 | T} T{ |
---|
1863 | Write uncompressed BAM output. The default is to write compressed BAM output. |
---|
1864 | T} |
---|
1865 | _ |
---|
1866 | T{ |
---|
1867 | \fB\-bedpe\fP |
---|
1868 | T} T{ |
---|
1869 | When using BAM input (\-abam), write output as BEDPE. The default is to write output in BAM when using \-abam. For example: pairToBed \-abam reads.bam \-b genes.bed \-bedpe |
---|
1870 | T} |
---|
1871 | _ |
---|
1872 | T{ |
---|
1873 | \fB\-ed\fP |
---|
1874 | T} T{ |
---|
1875 | Use BAM total edit distance (NM tag) for BEDPE score. Default for BEDPE is to use the \fIminimum\fP of the two mapping qualities for the pair. When \-ed is used the \fItotal\fP edit distance from the two mates is reported as the score. |
---|
1876 | T} |
---|
1877 | _ |
---|
1878 | T{ |
---|
1879 | \fB\-f\fP |
---|
1880 | T} T{ |
---|
1881 | Minimum overlap required as a fraction of A. Default is 1E\-9 (i.e. 1bp). |
---|
1882 | T} |
---|
1883 | _ |
---|
1884 | T{ |
---|
1885 | \fB\-s\fP |
---|
1886 | T} T{ |
---|
1887 | Force "strandedness". That is, only report hits in B that overlap A on the \fBsame\fP strand. By default, overlaps are reported without respect to strand. |
---|
1888 | T} |
---|
1889 | _ |
---|
1890 | T{ |
---|
1891 | \fB\-type\fP |
---|
1892 | T} T{ |
---|
1893 | Approach to reporting overlaps between BEDPE and BED. |
---|
1894 | .INDENT 0.0 |
---|
1895 | .INDENT 3.5 |
---|
1896 | \fBeither\fP \- Report overlaps if either end of A overlaps B (\fIdefault\fP). |
---|
1897 | .sp |
---|
1898 | \fBneither\fP \- Report A if neither end of A overlaps B. |
---|
1899 | .sp |
---|
1900 | \fBxor\fP \- Report overlaps if one and only one end of A overlaps B. |
---|
1901 | .sp |
---|
1902 | \fBboth\fP \- Report overlaps if both ends of A overlap B. |
---|
1903 | .sp |
---|
1904 | \fBnotboth\fP \- Report overlaps if neither end or one and only one end of A overlaps B. |
---|
1905 | .sp |
---|
1906 | \fBispan\fP \- Report overlaps between [end1, start2] of A and B. Note: If chrom1 <> chrom2, entry is ignored. |
---|
1907 | .sp |
---|
1908 | \fBospan\fP \- Report overlaps between [start1, end2] of A and B. Note: If chrom1 <> chrom2, entry is ignored. |
---|
1909 | .sp |
---|
1910 | \fBnotispan\fP \- Report A if ispan of A doesn\(aqt overlap B. Note: If chrom1 <> chrom2, entry is ignored. |
---|
1911 | .sp |
---|
1912 | \fBnotospan\fP \- Report A if ospan of A doesn\(aqt overlap B. Note: If chrom1 <> chrom2, entry is ignored. |
---|
1913 | .UNINDENT |
---|
1914 | .UNINDENT |
---|
1937 | T} |
---|
1938 | _ |
---|
1939 | .TE |
---|
1940 | .SS 5.2.2 Default behavior |
---|
1941 | .sp |
---|
1942 | By default, a BEDPE / BAM feature will be reported if \fIeither\fP end overlaps a feature in the BED file. |
---|
1943 | In the example below, the left end of the pair overlaps B yet the right end does not. Thus, |
---|
1944 | BEDPE/BAM A is reported since the default is to report A if either end overlaps B. |
---|
1945 | .sp |
---|
1946 | Default: Report A if \fIeither\fP end overlaps B. |
---|
1947 | .INDENT 0.0 |
---|
1948 | .INDENT 3.5 |
---|
1949 | .sp |
---|
1950 | .nf |
---|
1951 | .ft C |
---|
1952 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
1953 | |
---|
1954 | BEDPE/BAM A *****.................................***** |
---|
1955 | |
---|
1956 | BED File B ^^^^^^^^ ^^^^^^ |
---|
1957 | |
---|
1958 | Result =====.................................===== |
---|
1959 | .ft P |
---|
1960 | .fi |
---|
1961 | .UNINDENT |
---|
1962 | .UNINDENT |
---|
1963 | .SS 5.2.3 (\-type) Optional overlap requirements |
---|
1964 | .sp |
---|
1965 | Using the \fB\-type\fP option, \fBpairToBed\fP provides several other overlap requirements for controlling how |
---|
1966 | overlaps between BEDPE/BAM A and BED B are reported. The examples below illustrate how each |
---|
1967 | option behaves. |
---|
1968 | .sp |
---|
1969 | \fB\-type both\fP: Report A only if \fIboth\fP ends overlap B. |
---|
1970 | .INDENT 0.0 |
---|
1971 | .INDENT 3.5 |
---|
1972 | .sp |
---|
1973 | .nf |
---|
1974 | .ft C |
---|
1975 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
1976 | |
---|
1977 | BEDPE/BAM A *****.................................***** |
---|
1978 | |
---|
1979 | BED File B ^^^^^^^^ ^^^^^^ |
---|
1980 | |
---|
1981 | Result |
---|
1982 | |
---|
1983 | |
---|
1984 | |
---|
1985 | BEDPE/BAM A *****.................................***** |
---|
1986 | |
---|
1987 | BED File B ^^^^^^^^ ^^^^^^ |
---|
1988 | |
---|
1989 | Result =====.................................===== |
---|
1990 | .ft P |
---|
1991 | .fi |
---|
1992 | .UNINDENT |
---|
1993 | .UNINDENT |
---|
1994 | .sp |
---|
1995 | \fB\-type neither\fP: Report A only if \fIneither\fP end overlaps B. |
---|
1996 | .INDENT 0.0 |
---|
1997 | .INDENT 3.5 |
---|
1998 | .sp |
---|
1999 | .nf |
---|
2000 | .ft C |
---|
2001 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2002 | |
---|
2003 | BEDPE/BAM A *****.................................***** |
---|
2004 | |
---|
2005 | BED File B ^^^^^^^^ ^^^^^^ |
---|
2006 | |
---|
2007 | Result |
---|
2008 | |
---|
2009 | |
---|
2010 | |
---|
2011 | BEDPE/BAM A *****.................................***** |
---|
2012 | |
---|
2013 | BED File B ^^^^ ^^^^^^ |
---|
2014 | |
---|
2015 | Result =====.................................===== |
---|
2016 | .ft P |
---|
2017 | .fi |
---|
2018 | .UNINDENT |
---|
2019 | .UNINDENT |
---|
2020 | .sp |
---|
2021 | \fB\-type xor\fP: Report A only if \fIone and only one\fP end overlaps B. |
---|
2022 | .INDENT 0.0 |
---|
2023 | .INDENT 3.5 |
---|
2024 | .sp |
---|
2025 | .nf |
---|
2026 | .ft C |
---|
2027 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2028 | |
---|
2029 | BEDPE/BAM A *****.................................***** |
---|
2030 | |
---|
2031 | BED File B ^^^^^^^^ ^^^^^^ |
---|
2032 | |
---|
2033 | Result =====.................................===== |
---|
2034 | |
---|
2035 | |
---|
2036 | |
---|
2037 | BEDPE/BAM A *****.................................***** |
---|
2038 | |
---|
2039 | BED File B ^^^^ ^^^^^^ |
---|
2040 | |
---|
2041 | Result |
---|
2042 | .ft P |
---|
2043 | .fi |
---|
2044 | .UNINDENT |
---|
2045 | .UNINDENT |
---|
2046 | .sp |
---|
2047 | \fB\-type notboth\fP: Report A only if \fIneither end\fP \fBor\fP \fIone and only one\fP end overlaps B. Thus "notboth" |
---|
2048 | includes what would be reported by "neither" and by "xor". |
---|
2049 | .INDENT 0.0 |
---|
2050 | .INDENT 3.5 |
---|
2051 | .sp |
---|
2052 | .nf |
---|
2053 | .ft C |
---|
2054 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2055 | |
---|
2056 | BEDPE/BAM A *****.................................***** |
---|
2057 | |
---|
2058 | BED File B ^^^^^^^^ ^^^^^^ |
---|
2059 | |
---|
2060 | Result =====.................................===== |
---|
2061 | |
---|
2062 | |
---|
2063 | |
---|
2064 | BEDPE/BAM A *****.................................***** |
---|
2065 | |
---|
2066 | BED File B ^^^ ^^^^^^ |
---|
2067 | |
---|
2068 | Result =====.................................===== |
---|
2069 | |
---|
2070 | |
---|
2071 | |
---|
2072 | BEDPE/BAM A *****.................................***** |
---|
2073 | |
---|
2074 | BED File B ^^^^ ^^^^^^ |
---|
2075 | |
---|
2076 | Result |
---|
2077 | .ft P |
---|
2078 | .fi |
---|
2079 | .UNINDENT |
---|
2080 | .UNINDENT |
---|
2081 | .sp |
---|
2082 | \fB\-type ispan\fP: Report A if its "\fIinner span\fP" overlaps B. Applicable only to intra\-chromosomal features. |
---|
2083 | .INDENT 0.0 |
---|
2084 | .INDENT 3.5 |
---|
2085 | .sp |
---|
2086 | .nf |
---|
2087 | .ft C |
---|
2088 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2089 | |
---|
2090 | Inner span |\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-| |
---|
2091 | |
---|
2092 | BEDPE/BAM A *****.................................***** |
---|
2093 | |
---|
2094 | BED File B ^^^^^^^^ |
---|
2095 | |
---|
2096 | Result =====.................................===== |
---|
2097 | |
---|
2098 | |
---|
2099 | |
---|
2100 | BEDPE/BAM A *****.................................***** |
---|
2101 | |
---|
2102 | BED File B ^^^^ |
---|
2103 | |
---|
2104 | Result |
---|
2105 | .ft P |
---|
2106 | .fi |
---|
2107 | .UNINDENT |
---|
2108 | .UNINDENT |
---|
2109 | .sp |
---|
2110 | \fB\-type ospan\fP: Report A if its "\fIouter span\fP" overlaps B. Applicable only to intra\-chromosomal features. |
---|
2111 | .INDENT 0.0 |
---|
2112 | .INDENT 3.5 |
---|
2113 | .sp |
---|
2114 | .nf |
---|
2115 | .ft C |
---|
2116 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2117 | |
---|
2118 | Outer span |\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-| |
---|
2119 | |
---|
2120 | BEDPE/BAM A *****.................................***** |
---|
2121 | |
---|
2122 | BED File B ^^^^^^^^^^^^ |
---|
2123 | |
---|
2124 | Result =====.................................===== |
---|
2125 | |
---|
2126 | |
---|
2127 | |
---|
2128 | BEDPE/BAM A *****.................................***** |
---|
2129 | |
---|
2130 | BED File B ^^^^ |
---|
2131 | |
---|
2132 | Result |
---|
2133 | .ft P |
---|
2134 | .fi |
---|
2135 | .UNINDENT |
---|
2136 | .UNINDENT |
---|
2137 | .sp |
---|
2138 | \fB\-type notispan\fP: Report A only if its "\fIinner span\fP" does not overlap B. Applicable only to intra\-chromosomal |
---|
2139 | features. |
---|
2140 | .INDENT 0.0 |
---|
2141 | .INDENT 3.5 |
---|
2142 | .sp |
---|
2143 | .nf |
---|
2144 | .ft C |
---|
2145 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2146 | |
---|
2147 | Inner span |\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-| |
---|
2148 | |
---|
2149 | BEDPE/BAM A *****.................................***** |
---|
2150 | |
---|
2151 | BED File B ^^^^^^^^ |
---|
2152 | |
---|
2153 | Result |
---|
2154 | |
---|
2155 | |
---|
2156 | |
---|
2157 | BEDPE/BAM A *****.................................***** |
---|
2158 | |
---|
2159 | BED File B ^^^^ |
---|
2160 | |
---|
2161 | Result =====.................................===== |
---|
2162 | .ft P |
---|
2163 | .fi |
---|
2164 | .UNINDENT |
---|
2165 | .UNINDENT |
---|
2166 | .sp |
---|
2167 | \fB\-type notospan\fP: Report A only if its "\fIouter span\fP" does not overlap B. Applicable only to intra\-chromosomal |
---|
2168 | features. |
---|
2169 | .INDENT 0.0 |
---|
2170 | .INDENT 3.5 |
---|
2171 | .sp |
---|
2172 | .nf |
---|
2173 | .ft C |
---|
2174 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2175 | |
---|
2176 | Outer span |\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-| |
---|
2177 | |
---|
2178 | BEDPE/BAM A *****.................................***** |
---|
2179 | |
---|
2180 | BED File B ^^^^^^^^^^^^ |
---|
2181 | |
---|
2182 | Result |
---|
2183 | |
---|
2184 | |
---|
2185 | |
---|
2186 | BEDPE/BAM A *****.................................***** |
---|
2187 | |
---|
2188 | BED File B ^^^^ |
---|
2189 | |
---|
2190 | Result =====.................................===== |
---|
2191 | .ft P |
---|
2192 | .fi |
---|
2193 | .UNINDENT |
---|
2194 | .UNINDENT |
---|
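The \-type rules illustrated above reduce to simple logic on whether each end of the pair overlaps B, plus two derived intervals for the span variants. The sketch below is a simplified model and not the pairToBed implementation; the function names and the coordinates are hypothetical.

```python
def overlaps(x, y):
    """True if two (start, end) half-open intervals share at least one base."""
    return max(x[0], y[0]) < min(x[1], y[1])


def passes_type(overlap_type, end1_hit, end2_hit):
    """Apply the either/neither/xor/both/notboth rule to two per-end results."""
    rules = {
        "either": end1_hit or end2_hit,
        "neither": not (end1_hit or end2_hit),
        "xor": end1_hit != end2_hit,
        "both": end1_hit and end2_hit,
        "notboth": not (end1_hit and end2_hit),
    }
    return rules[overlap_type]


def span(end1, end2, kind):
    """ispan is [end1, start2]; ospan is [start1, end2] (same chromosome only)."""
    (start1, stop1), (start2, stop2) = end1, end2
    return (stop1, start2) if kind == "ispan" else (start1, stop2)


# Hypothetical pair on one chromosome, and a B feature hitting only the left end.
end1, end2 = (100, 105), (140, 145)
b = (102, 110)
hit1, hit2 = overlaps(end1, b), overlaps(end2, b)

report = {t: passes_type(t, hit1, hit2)
          for t in ("either", "neither", "xor", "both", "notboth")}
print(report)

# A B feature between the two ends touches neither end but does hit the inner span.
inner_b = (120, 125)
print(overlaps(span(end1, end2, "ispan"), inner_b))  # True
print(overlaps(span(end1, end2, "ospan"), inner_b))  # True
```

The notispan and notospan variants are the negations of the last two tests.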
2195 | .SS 5.2.4 (\-f) Requiring a minimum overlap fraction |
---|
2196 | .sp |
---|
2197 | By default, \fBpairToBed\fP will report an overlap between A and B so long as at least one base |
---|
2198 | pair overlaps on either end. Yet sometimes you may want to restrict reported overlaps between A |
---|
2199 | and B to cases where the feature in B overlaps at least X% (e.g. 50%) of A. The \fB\-f\fP option does exactly |
---|
2200 | this. The \fB\-f\fP option may also be combined with the \-type option for additional control. For example, |
---|
2201 | combining \fB\-f 0.50\fP with \fB\-type both\fP requires that both ends of A have at least 50% overlap with a |
---|
2202 | feature in B. |
---|
2203 | .sp |
---|
2204 | For example, report A only if at least 50% of one of the two ends is overlapped by B. |
---|
2205 | .INDENT 0.0 |
---|
2206 | .INDENT 3.5 |
---|
2207 | .sp |
---|
2208 | .nf |
---|
2209 | .ft C |
---|
2210 | pairToBed \-a A.bedpe \-b B.bed \-f 0.5 |
---|
2211 | |
---|
2212 | |
---|
2213 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2214 | |
---|
2215 | BEDPE/BAM A *****.................................***** |
---|
2216 | |
---|
2217 | BED File B ^^ ^^^^^^ |
---|
2218 | |
---|
2219 | Result |
---|
2220 | |
---|
2221 | |
---|
2222 | |
---|
2223 | BEDPE/BAM A *****.................................***** |
---|
2224 | |
---|
2225 | BED File B ^^^^ ^^^^^^ |
---|
2226 | |
---|
2227 | Result =====.................................===== |
---|
2228 | .ft P |
---|
2229 | .fi |
---|
2230 | .UNINDENT |
---|
2231 | .UNINDENT |
---|
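The fraction test described above can be sketched in Python as follows. This is only an illustration, not pairToBed's implementation: the helper names (overlap_bp, end_passes) are hypothetical, coordinates are assumed to be BED-style half-open intervals, and the fraction is measured against a single end of A.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def end_passes(a_start, a_end, b_start, b_end, min_frac=1e-9):
    """True if B covers at least min_frac of this end of A (hypothetical helper)."""
    shared = overlap_bp(a_start, a_end, b_start, b_end)
    return shared > 0 and shared / (a_end - a_start) >= min_frac


# One end of A spans 50 bp. B first covers 20 bp of it (40%), then 30 bp (60%).
print(end_passes(100, 150, 130, 200, min_frac=0.5))  # False
print(end_passes(100, 150, 120, 200, min_frac=0.5))  # True
```

With the default min_frac of 1E-9, any overlap of one base pair or more passes, which mirrors the default behavior described in the text.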
2232 | .SS 5.2.5 (\-s)Enforcing "strandedness" |
---|
2233 | .sp |
---|
2234 | By default, \fBpairToBed\fP will report overlaps between features even if the features are on opposing |
---|
2235 | strands. However, if strand information is present in both files and the \fB"\-s"\fP option is used, overlaps will |
---|
2236 | only be reported when features are on the same strand. |
---|
2237 | .sp |
---|
2238 | For example, report A only if one of the two ends is overlapped by B on the same strand. |
---|
2239 | .INDENT 0.0 |
---|
2240 | .INDENT 3.5 |
---|
2241 | .sp |
---|
2242 | .nf |
---|
2243 | .ft C |
---|
2244 | pairToBed \-a A.bedpe \-b B.bed \-s |
---|
2245 | |
---|
2246 | |
---|
2247 | |
---|
2248 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2249 | |
---|
2250 | BEDPE/BAM A >>>>>.................................<<<<< |
---|
2251 | |
---|
2252 | BED File B << >>>>> |
---|
2253 | |
---|
2254 | Result |
---|
2255 | |
---|
2256 | |
---|
2257 | |
---|
2258 | BEDPE/BAM A >>>>>.................................<<<<< |
---|
2259 | |
---|
2260 | BED File B >> >>>>> |
---|
2261 | |
---|
2262 | Result >>>>>.................................<<<<< |
---|
2263 | .ft P |
---|
2264 | .fi |
---|
2265 | .UNINDENT |
---|
2266 | .UNINDENT |
---|
2267 | .SS 5.2.6 (\-abam)Default is to write BAM output when using BAM input |
---|
2268 | .sp |
---|
2269 | When comparing \fIpaired\fP alignments in BAM format (\fB\-abam\fP) to features in BED format (\fB\-b\fP), |
---|
2270 | \fBpairToBed\fP will, by default, write the output in BAM format. That is, each alignment in the BAM |
---|
2271 | file that meets the user\(aqs criteria will be written (to standard output) in BAM format. This serves as a |
---|
2272 | mechanism to create subsets of BAM alignments that are of biological interest, etc. Note that both |
---|
2273 | alignments for each aligned pair will be written to the BAM output. |
---|
2274 | .sp |
---|
2275 | For example: |
---|
2276 | .INDENT 0.0 |
---|
2277 | .INDENT 3.5 |
---|
2278 | .sp |
---|
2279 | .nf |
---|
2280 | .ft C |
---|
2281 | pairToBed \-abam pairedReads.bam \-b simreps.bed | samtools view \- | head \-4 |
---|
2282 | |
---|
2283 | JOBU_0001:3:1:4:1060#0 99 chr10 42387928 29 50M = 42393091 5213 |
---|
2284 | AAAAACGGAATTATCGAATGGAATCGAAGAGAATCTTCGAACGGACCCGA |
---|
2285 | dcgggggfbgfgdgggggggfdfgggcggggfcggcggggggagfgbggc XT:A:R NM:i:5 SM:i:0 AM:i:0 X0:i:3 X1:i:3 |
---|
2286 | XM:i:5 XO:i:0 XG:i:0 MD:Z:0T0C33A5T4T3 |
---|
2287 | JOBU_0001:3:1:4:1060#0 147 chr10 42393091 0 50M = 42387928 \-5213 |
---|
2288 | AAATGGAATCGAATGGAATCAACATCAAATGGAATCAAATGGAATCATTG |
---|
2289 | Kgdcggdecdg\ed\(gaggfcgcggffcgggc^cgfgccgggfc^gcdgg\ebg XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:3 X1:i:13 XM:i:2 XO:i:0 |
---|
2290 | XG:i:0 MD:Z:21T14G13 |
---|
2291 | JOBU_0001:3:1:8:446#0 99 chr10 42388091 9 50M = 42392738 4697 |
---|
2292 | GAATCGACTGGAATCATCATCGGATGGAAATGAATGGAATAATCATCGAA |
---|
2293 | f_Off\(ga]IeYff\(gaffeddcfefcP\(gac_W\e\eR_]_BBBBBBBBBBBBBBBB XT:A:U NM:i:4 SM:i:0 AM:i:0 X0:i:1 X1:i:3 XM:i:4 XO:i:0 XG:i:0 |
---|
2294 | MD:Z:7A22C9C2T6 |
---|
2295 | JOBU_0001:3:1:8:446#0 147 chr10 42392738 9 50M = 42388091 \-4697 |
---|
2296 | TTATCGAATGCAATCGAATGGAATTATCGAATGCAATCGAATAGAATCAT |
---|
2297 | df^ffec_JW[\(gaMWceRec\(ga\(gafee\(gadcecfeeZae\(gac]f^cNeecfccf^ XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:2 X1:i:2 XM:i:1 XO:i:0 XG:i:0 MD:Z:38A11 |
---|
2298 | .ft P |
---|
2299 | .fi |
---|
2300 | .UNINDENT |
---|
2301 | .UNINDENT |
---|
2302 | .SS 5.2.7 (\-bedpe)Output BEDPE format when using BAM input |
---|
2303 | .sp |
---|
2304 | When comparing \fIpaired\fP alignments in BAM format (\fB\-abam\fP) to features in BED format (\fB\-b\fP), |
---|
2305 | \fBpairToBed\fP will optionally write the output in BEDPE format. That is, each alignment in the BAM |
---|
2306 | file is converted to a 10 column BEDPE feature and if overlaps are found (or not) based on the user\(aqs |
---|
2307 | criteria, the BAM alignment will be reported in BEDPE format. The BEDPE "name" field is the |
---|
2308 | QNAME (read name) field of the BAM alignment. The "score" field is the mapping quality score from the |
---|
2309 | BAM alignment. |
---|
2310 | .sp |
---|
2311 | For example: |
---|
2312 | .INDENT 0.0 |
---|
2313 | .INDENT 3.5 |
---|
2314 | .sp |
---|
2315 | .nf |
---|
2316 | .ft C |
---|
2317 | pairToBed \-abam pairedReads.bam \-b simreps.bed \-bedpe | head \-5 |
---|
2318 | chr10 42387927 42387977 chr10 42393090 42393140 |
---|
2319 | JOBU_0001:3:1:4:1060#0 29 + \- |
---|
2320 | chr10 42388090 42388140 chr10 42392737 42392787 |
---|
2321 | JOBU_0001:3:1:8:446#0 9 + \- |
---|
2322 | chr10 42390552 42390602 chr10 42396045 42396095 |
---|
2323 | JOBU_0001:3:1:10:1865#0 9 + \- |
---|
2324 | chrX 139153741 139153791 chrX 139159018 139159068 |
---|
2325 | JOBU_0001:3:1:14:225#0 37 + \- |
---|
2326 | chr4 9236903 9236953 chr4 9242032 9242082 |
---|
2327 | JOBU_0001:3:1:15:1362#0 0 + \- |
---|
2328 | .ft P |
---|
2329 | .fi |
---|
2330 | .UNINDENT |
---|
2331 | .UNINDENT |
---|
2332 | .SS 5.3 pairToPair |
---|
2333 | .sp |
---|
2334 | \fBpairToPair\fP compares two BEDPE files in search of overlaps where each end of a BEDPE feature in A |
---|
2335 | overlaps with the ends of a feature in B. For example, using pairToPair, one could screen for the exact |
---|
2336 | same discordant paired\-end alignment in two files. This could indicate (among other things) that the |
---|
2337 | discordant pair reflects the same structural variation in each file/sample. |
---|
2338 | .SS 5.3.1 Usage and option summary |
---|
2339 | .sp |
---|
2340 | \fBUsage:\fP |
---|
2341 | .INDENT 0.0 |
---|
2342 | .INDENT 3.5 |
---|
2343 | .sp |
---|
2344 | .nf |
---|
2345 | .ft C |
---|
2346 | pairToPair [OPTIONS] \-a <BEDPE> \-b <BEDPE> |
---|
2347 | .ft P |
---|
2348 | .fi |
---|
2349 | .UNINDENT |
---|
2350 | .UNINDENT |
---|
2351 | .TS |
---|
2352 | center; |
---|
2353 | |l|l|. |
---|
2354 | _ |
---|
2355 | T{ |
---|
2356 | Option |
---|
2357 | T} T{ |
---|
2358 | Description |
---|
2359 | T} |
---|
2360 | _ |
---|
2361 | T{ |
---|
2362 | \fB\-a\fP |
---|
2363 | T} T{ |
---|
2364 | BEDPE file A. Each feature in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. |
---|
2365 | T} |
---|
2366 | _ |
---|
2367 | T{ |
---|
2368 | \fB\-b\fP |
---|
2369 | T} T{ |
---|
2370 | BEDPE file B. Use "stdin" if passing B with a UNIX pipe. |
---|
2371 | T} |
---|
2372 | _ |
---|
2373 | T{ |
---|
2374 | \fB\-f\fP |
---|
2375 | T} T{ |
---|
2376 | Minimum overlap required as a fraction of A. Default is 1E\-9 (i.e. 1bp). |
---|
2377 | T} |
---|
2378 | _ |
---|
2379 | T{ |
---|
2380 | \fB\-is\fP |
---|
2381 | T} T{ |
---|
2382 | Ignore strands when searching for overlaps. By default, strand is enforced: when strand information is present in both files, overlaps are only reported between ends on the same strand. |
---|
2383 | T} |
---|
2384 | _ |
---|
2385 | T{ |
---|
2386 | \fB\-type\fP |
---|
2387 | T} T{ |
---|
2388 | .INDENT 0.0 |
---|
2389 | .INDENT 3.5 |
---|
2390 | Approach to reporting overlaps between the two BEDPE files. |
---|
2391 | .UNINDENT |
---|
2392 | .UNINDENT |
---|
2393 | .nf |
---|
2394 | \fBeither\fP Report overlaps if either end of A overlaps B. |
---|
2395 | .fi |
---|
2396 | .sp |
---|
2397 | .INDENT 0.0 |
---|
2398 | .INDENT 3.5 |
---|
2399 | .nf |
---|
2400 | \fBneither\fP Report A if neither end of A overlaps B. |
---|
2401 | .fi |
---|
2402 | .sp |
---|
2403 | .nf |
---|
2404 | \fBboth\fP Report overlaps if both ends of A overlap B. \-\fIDefault behavior.\fP |
---|
2405 | .fi |
---|
2406 | .sp |
---|
2407 | .UNINDENT |
---|
2408 | .UNINDENT |
---|
2409 | T} |
---|
2410 | _ |
---|
2411 | .TE |
---|
2412 | .SS 5.3.2 Default behavior |
---|
2413 | .sp |
---|
2414 | By default, a BEDPE feature from A will be reported if \fIboth\fP ends overlap a feature in the BEDPE B |
---|
2415 | file. If strand information is present for the two BEDPE files, it will be further required that the |
---|
2416 | overlaps on each end be on the same strand. This way, an otherwise overlapping (in terms of genomic |
---|
2417 | locations) F/R alignment will not be matched with a R/R alignment. |
---|
2418 | .sp |
---|
2419 | Default: Report A if \fIboth\fP ends overlap B. |
---|
2420 | .INDENT 0.0 |
---|
2421 | .INDENT 3.5 |
---|
2422 | .sp |
---|
2423 | .nf |
---|
2424 | .ft C |
---|
2425 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2426 | |
---|
2427 | BEDPE/BAM A *****.................................***** |
---|
2428 | |
---|
2429 | BED File B ^^^^^^^^ ^^^^^^ |
---|
2430 | |
---|
2431 | Result =====.................................===== |
---|
2432 | .ft P |
---|
2433 | .fi |
---|
2434 | .UNINDENT |
---|
2435 | .UNINDENT |
---|
2436 | .sp |
---|
2437 | Default when strand information is present in both BEDPE files: Report A if \fIboth\fP ends overlap B \fIon |
---|
2438 | the same strands\fP\&. |
---|
2439 | .INDENT 0.0 |
---|
2440 | .INDENT 3.5 |
---|
2441 | .sp |
---|
2442 | .nf |
---|
2443 | .ft C |
---|
2444 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2445 | |
---|
2446 | BEDPE A >>>>>.................................>>>>> |
---|
2447 | |
---|
2448 | BEDPE B <<<<<.............................>>>>> |
---|
2449 | |
---|
2450 | Result |
---|
2451 | |
---|
2452 | |
---|
2453 | |
---|
2454 | BEDPE A >>>>>.................................>>>>> |
---|
2455 | |
---|
2456 | BEDPE B >>>>>.............................>>>>> |
---|
2457 | |
---|
2458 | Result >>>>>.................................>>>>> |
---|
2459 | .ft P |
---|
2460 | .fi |
---|
2461 | .UNINDENT |
---|
2462 | .UNINDENT |
---|
2463 | .SS 5.3.3 (\-type neither)Optional overlap requirements |
---|
2464 | .sp |
---|
2465 | Using \fB\-type neither\fP, \fBpairToPair\fP will only report A if \fIneither\fP end overlaps a BEDPE |
---|
2466 | feature in B. |
---|
2467 | .sp |
---|
2468 | \fB\-type neither\fP: Report A only if \fIneither\fP end overlaps B. |
---|
2469 | .INDENT 0.0 |
---|
2470 | .INDENT 3.5 |
---|
2471 | .sp |
---|
2472 | .nf |
---|
2473 | .ft C |
---|
2474 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2475 | |
---|
2476 | BEDPE/BAM A *****.................................***** |
---|
2477 | |
---|
2478 | BED File B ^^^^^^^^......................................^^^^^^ |
---|
2479 | |
---|
2480 | Result |
---|
2481 | |
---|
2482 | |
---|
2483 | |
---|
2484 | BEDPE/BAM A *****.................................***** |
---|
2485 | |
---|
2486 | BED File B ^^^^................................................^^^^^^ |
---|
2487 | |
---|
2488 | Result =====.................................===== |
---|
2489 | .ft P |
---|
2490 | .fi |
---|
2491 | .UNINDENT |
---|
2492 | .UNINDENT |
---|
2493 | .SS 5.4 bamToBed |
---|
2494 | .sp |
---|
2495 | \fBbamToBed\fP is a general purpose tool that will convert sequence alignments in BAM format to either |
---|
2496 | BED6, BED12 or BEDPE format. This enables one to convert BAM files for use with all of the other |
---|
2497 | BEDTools. The CIGAR string is used to compute the alignment end coordinate in an "ungapped" |
---|
2498 | fashion. That is, match ("M"), deletion ("D"), and splice ("N") operations are observed when computing |
---|
2499 | alignment ends. |
---|
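The end-coordinate rule above can be sketched in Python as follows. This is a minimal illustration, not bamToBed's code: the function name alignment_end is hypothetical, the start is assumed to be 0-based (BAM POS minus 1), and only the M, D and N operations named in the text are counted. The SAM specification also treats "=" and "X" as reference-consuming; they are left out here to mirror the text.

```python
import re

# Operations named in the text above; each advances the position on the reference.
REFERENCE_OPS = {"M", "D", "N"}


def alignment_end(start, cigar):
    """Return the 0-based, exclusive end coordinate of an alignment.

    start is the 0-based start (BAM POS minus 1); cigar is a string such as "50M".
    """
    end = start
    for length, op in re.findall(r"(\d+)([A-Z=])", cigar):
        if op in REFERENCE_OPS:
            end += int(length)
    return end


print(alignment_end(42387927, "50M"))        # 42387977
print(alignment_end(1000, "20M500N30M"))     # 1550
print(alignment_end(1000, "10S25M2I5D15M"))  # 1045
```

The first call reproduces the chr10 coordinates shown in the pairToBed \-bedpe example (POS 42387928 with a 50M CIGAR); soft clips ("S") and insertions ("I") leave the end unchanged.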
2500 | .SS 5.4.1 Usage and option summary |
---|
2501 | .sp |
---|
2502 | \fBUsage:\fP |
---|
2503 | .INDENT 0.0 |
---|
2504 | .INDENT 3.5 |
---|
2505 | .sp |
---|
2506 | .nf |
---|
2507 | .ft C |
---|
2508 | bamToBed [OPTIONS] \-i <BAM> |
---|
2509 | .ft P |
---|
2510 | .fi |
---|
2511 | .UNINDENT |
---|
2512 | .UNINDENT |
---|
2513 | .TS |
---|
2514 | center; |
---|
2515 | |l|l|. |
---|
2516 | _ |
---|
2517 | T{ |
---|
2518 | Option |
---|
2519 | T} T{ |
---|
2520 | Description |
---|
2521 | T} |
---|
2522 | _ |
---|
2523 | T{ |
---|
2524 | \fB\-bedpe\fP |
---|
2525 | T} T{ |
---|
2526 | .INDENT 0.0 |
---|
2527 | .INDENT 3.5 |
---|
2528 | .INDENT 0.0 |
---|
2529 | .TP |
---|
2530 | .B Write BAM alignments in BEDPE format. Only one alignment from paired\-end reads will be reported. Specifically, if both mates are aligned to the same chromosome, the BAM alignment reported will be the one where the BAM insert size is greater than zero. When the mate alignments are interchromosomal, the lexicographically lower chromosome will be reported first. Lastly, when an end is unmapped, the chromosome and strand will be set to "." and the start and end coordinates will be set to \-1. \fIBy default, this is disabled and the output will be reported in BED format\fP\&. |
---|
2531 | \fBNOTE: When using this option, it is required that the BAM file is sorted/grouped by the read name. This allows bamToBed to extract correct alignment coordinates for each end based on their respective CIGAR strings. It also assumes that the alignments for a given pair come in groups of two. There is not yet a standard method for reporting multiple alignments using BAM. bamToBed will fail if an aligner does not report alignments in pairs\fP\&. |
---|
2532 | .UNINDENT |
---|
2533 | .UNINDENT |
---|
2534 | .UNINDENT |
---|
2535 | .sp |
---|
2536 | BAM files may be piped to bamToBed by specifying "\-i stdin". See example below. |
---|
2537 | T} |
---|
2538 | _ |
---|
2539 | T{ |
---|
2540 | \fB\-bed12\fP |
---|
2541 | T} T{ |
---|
2542 | Write "blocked" BED (a.k.a. BED12) format. This will convert "spliced" BAM alignments (denoted by the "N" CIGAR operation) to BED12. |
---|
2543 | T} |
---|
2544 | _ |
---|
2545 | T{ |
---|
2546 | \fB\-ed\fP |
---|
2547 | T} T{ |
---|
2548 | Use the "edit distance" tag (NM) for the BED score field. Default for BED is to use mapping quality. Default for BEDPE is to use the \fIminimum\fP of the two mapping qualities for the pair. When \-ed is used with \-bedpe, the total edit distance from the two mates is reported. |
---|
2549 | T} |
---|
2550 | _ |
---|
2551 | T{ |
---|
2552 | \fB\-tag\fP |
---|
2553 | T} T{ |
---|
2554 | Use other \fInumeric\fP BAM alignment tag for BED score. Default for BED is to use mapping quality. Disallowed with BEDPE output. |
---|
2555 | T} |
---|
2556 | _ |
---|
2557 | T{ |
---|
2558 | \fB\-color\fP |
---|
2559 | T} T{ |
---|
2560 | An R,G,B string for the color used with BED12 format. Default is (255,0,0). |
---|
2561 | T} |
---|
2562 | _ |
---|
2563 | T{ |
---|
2564 | \fB\-split\fP |
---|
2565 | T} T{ |
---|
2566 | Report each portion of a "split" BAM alignment (i.e., one having an "N" CIGAR operation) as a distinct BED interval. |
---|
2567 | T} |
---|
2568 | _ |
---|
2569 | .TE |
---|
2570 | .sp |
---|
2571 | By default, each alignment in the BAM file is converted to a 6 column BED. The BED "name" field is |
---|
2572 | the QNAME (read name) field of the BAM alignment. If mate information is available, the mate (e.g., |
---|
2573 | "/1" or "/2") field will be appended to the name. The "score" field is the mapping quality score from the |
---|
2574 | BAM alignment, unless the \fB\-ed\fP option is used. |
---|
2575 | .sp |
---|
2576 | Examples: |
---|
2577 | .INDENT 0.0 |
---|
2578 | .INDENT 3.5 |
---|
2579 | .sp |
---|
2580 | .nf |
---|
2581 | .ft C |
---|
2582 | bamToBed \-i reads.bam | head \-3 |
---|
2583 | chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0/1 37 \- |
---|
2584 | chr7 118965072 118965122 TUPAC_0001:3:1:0:1452#0/2 37 + |
---|
2585 | chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0/1 37 \- |
---|
2586 | |
---|
2587 | bamToBed \-i reads.bam \-tag NM | head \-3 |
---|
2588 | chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0/1 1 \- |
---|
2589 | chr7 118965072 118965122 TUPAC_0001:3:1:0:1452#0/2 3 + |
---|
2590 | chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0/1 1 \- |
---|
2591 | |
---|
2592 | bamToBed \-i reads.bam \-bedpe | head \-3 |
---|
2593 | chr7 118965072 118965122 chr7 118970079 118970129 |
---|
2594 | TUPAC_0001:3:1:0:1452#0 37 + \- |
---|
2595 | chr11 46765606 46765656 chr11 46769934 46769984 |
---|
2596 | TUPAC_0001:3:1:0:1472#0 37 + \- |
---|
2597 | chr20 54704674 54704724 chr20 54708987 54709037 |
---|
2598 | TUPAC_0001:3:1:1:1833#0 37 + \- |
---|
2599 | .ft P |
---|
2600 | .fi |
---|
2601 | .UNINDENT |
---|
2602 | .UNINDENT |
---|
2603 | .sp |
---|
2604 | One can easily use samtools and bamToBed together as part of a UNIX pipe. In this example, we will |
---|
2605 | only convert properly\-paired (BAM flag == 0x2) reads to BED format. |
---|
2606 | .INDENT 0.0 |
---|
2607 | .INDENT 3.5 |
---|
2608 | .sp |
---|
2609 | .nf |
---|
2610 | .ft C |
---|
2611 | samtools view \-bf 0x2 reads.bam | bamToBed \-i stdin | head |
---|
2612 | chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0/1 37 \- |
---|
2613 | chr7 118965072 118965122 TUPAC_0001:3:1:0:1452#0/2 37 + |
---|
2614 | chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0/1 37 \- |
---|
2615 | chr11 46765606 46765656 TUPAC_0001:3:1:0:1472#0/2 37 + |
---|
2616 | chr20 54704674 54704724 TUPAC_0001:3:1:1:1833#0/1 37 + |
---|
2617 | chr20 54708987 54709037 TUPAC_0001:3:1:1:1833#0/2 37 \- |
---|
2618 | chrX 9380413 9380463 TUPAC_0001:3:1:1:285#0/1 0 \- |
---|
2619 | chrX 9375861 9375911 TUPAC_0001:3:1:1:285#0/2 0 + |
---|
2620 | chrX 131756978 131757028 TUPAC_0001:3:1:2:523#0/1 37 + |
---|
2621 | chrX 131761790 131761840 TUPAC_0001:3:1:2:523#0/2 37 \- |
---|
2622 | .ft P |
---|
2623 | .fi |
---|
2624 | .UNINDENT |
---|
2625 | .UNINDENT |
---|
2626 | .SS 5.4.2 (\-split)Creating BED12 features from "spliced" BAM entries. |
---|
2627 | .sp |
---|
2628 | bamToBed will, by default, create a BED6 feature that represents the entire span of a spliced/split |
---|
2629 | BAM alignment. However, when using the \fB\-split\fP option, a BED12 feature is reported where BED |
---|
2630 | blocks will be created for each aligned portion of the sequencing read. |
---|
2631 | .INDENT 0.0 |
---|
2632 | .INDENT 3.5 |
---|
2633 | .sp |
---|
2634 | .nf |
---|
2635 | .ft C |
---|
2636 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2637 | |
---|
2638 | Exons *************** ********** |
---|
2639 | |
---|
2640 | BED/BAM A ^^^^^^^^^^^^....................................^^^^ |
---|
2641 | |
---|
2642 | Result =============== ==== |
---|
2643 | .ft P |
---|
2644 | .fi |
---|
2645 | .UNINDENT |
---|
2646 | .UNINDENT |
---|
2647 | .SS 5.5 windowBed |
---|
2648 | .sp |
---|
2649 | Similar to \fBintersectBed\fP, \fBwindowBed\fP searches for overlapping features in A and B. However, |
---|
2650 | \fBwindowBed\fP adds a specified number (1000, by default) of base pairs upstream and downstream of |
---|
2651 | each feature in A. In effect, this allows features in B that are "near" features in A to be detected. |
---|
2652 | .SS 5.5.1 Usage and option summary |
---|
2653 | .sp |
---|
2654 | \fBUsage:\fP |
---|
2655 | .INDENT 0.0 |
---|
2656 | .INDENT 3.5 |
---|
2657 | .sp |
---|
2658 | .nf |
---|
2659 | .ft C |
---|
2660 | windowBed [OPTIONS] \-a <BED/GFF/VCF> \-b <BED/GFF/VCF> |
---|
2661 | .ft P |
---|
2662 | .fi |
---|
2663 | .UNINDENT |
---|
2664 | .UNINDENT |
---|
2665 | .TS |
---|
2666 | center; |
---|
2667 | |l|l|. |
---|
2668 | _ |
---|
2669 | T{ |
---|
2670 | Option |
---|
2671 | T} T{ |
---|
2672 | Description |
---|
2673 | T} |
---|
2674 | _ |
---|
2675 | T{ |
---|
2676 | \fB\-abam\fP |
---|
2677 | T} T{ |
---|
2678 | BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. For example: samtools view \-b <BAM> | windowBed \-abam stdin \-b genes.bed |
---|
2679 | T} |
---|
2680 | _ |
---|
2681 | T{ |
---|
2682 | \fB\-ubam\fP |
---|
2683 | T} T{ |
---|
2684 | Write uncompressed BAM output. The default is to write compressed BAM output. |
---|
2685 | T} |
---|
2686 | _ |
---|
2687 | T{ |
---|
2688 | \fB\-bed\fP |
---|
2689 | T} T{ |
---|
2690 | When using BAM input (\-abam), write output as BED. The default is to write output in BAM when using \-abam. For example: windowBed \-abam reads.bam \-b genes.bed \-bed |
---|
2691 | T} |
---|
2692 | _ |
---|
2693 | T{ |
---|
2694 | \fB\-w\fP |
---|
2695 | T} T{ |
---|
2696 | Base pairs added upstream and downstream of each entry in A when searching for overlaps in B. \fIDefault is 1000 bp\fP\&. |
---|
2697 | T} |
---|
2698 | _ |
---|
2699 | T{ |
---|
2700 | \fB\-l\fP |
---|
2701 | T} T{ |
---|
2702 | Base pairs added upstream (left) of each entry in A when searching for overlaps in B. \fIAllows one to create asymmetrical "windows". Default is 1000bp\fP\&. |
---|
2703 | T} |
---|
2704 | _ |
---|
2705 | T{ |
---|
2706 | \fB\-r\fP |
---|
2707 | T} T{ |
---|
2708 | Base pairs added downstream (right) of each entry in A when searching for overlaps in B. \fIAllows one to create asymmetrical "windows". Default is 1000bp\fP\&. |
---|
2709 | T} |
---|
2710 | _ |
---|
2711 | T{ |
---|
2712 | \fB\-sw\fP |
---|
2713 | T} T{ |
---|
2714 | Define \-l and \-r based on strand. For example, when this option is used, \-l 500 for a negative\-stranded feature will add 500 bp downstream. \fIBy default, this is disabled\fP\&. |
---|
2715 | T} |
---|
2716 | _ |
---|
2717 | T{ |
---|
2718 | \fB\-sm\fP |
---|
2719 | T} T{ |
---|
2720 | Only report hits in B that overlap A on the same strand. \fIBy default, overlaps are reported without respect to strand\fP\&. |
---|
2721 | T} |
---|
2722 | _ |
---|
2723 | T{ |
---|
2724 | \fB\-u\fP |
---|
2725 | T} T{ |
---|
2726 | Write the original A entry once if any overlaps are found in B. In other words, just report the fact that at least one overlap was found in B. |
---|
2727 | T} |
---|
2728 | _ |
---|
2729 | T{ |
---|
2730 | \fB\-c\fP |
---|
2731 | T} T{ |
---|
2732 | For each entry in A, report the number of hits in B within the window. Reports 0 for A entries that have no overlap with B. |
---|
2733 | T} |
---|
2734 | _ |
---|
2735 | .TE |
---|
2736 | .SS 5.5.2 Default behavior |
---|
2737 | .sp |
---|
2738 | By default, \fBwindowBed\fP adds 1000 bp upstream and downstream of each A feature and searches for |
---|
2739 | features in B that overlap this "window". If an overlap is found in B, both the \fIoriginal\fP A feature and the |
---|
2740 | \fIoriginal\fP B feature are reported. For example, in the figure below, feature B1 would be found, but B2 |
---|
2741 | would not. |
---|
2742 | .INDENT 0.0 |
---|
2743 | .INDENT 3.5 |
---|
2744 | .sp |
---|
2745 | .nf |
---|
2746 | .ft C |
---|
2747 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2748 | "window" = 10 |
---|
2749 | BED File A <\-\-\-\-\-\-\-\-\-\-*************\-\-\-\-\-\-\-\-\-\-> |
---|
2750 | |
---|
2751 | BED File B ^^^^^^^^ ^^^^^^ |
---|
2752 | |
---|
2753 | Result ======== |
---|
2754 | .ft P |
---|
2755 | .fi |
---|
2756 | .UNINDENT |
---|
2757 | .UNINDENT |
---|
2758 | .sp |
---|
2759 | For example: |
---|
2760 | .INDENT 0.0 |
---|
2761 | .INDENT 3.5 |
---|
2762 | .sp |
---|
2763 | .nf |
---|
2764 | .ft C |
---|
2765 | cat A.bed |
---|
2766 | chr1 100 200 |
---|
2767 | |
---|
2768 | cat B.bed |
---|
2769 | chr1 500 1000 |
---|
2770 | chr1 1300 2000 |
---|
2771 | |
---|
2772 | windowBed \-a A.bed \-b B.bed |
---|
2773 | chr1 100 200 chr1 500 1000 |
---|
2774 | .ft P |
---|
2775 | .fi |
---|
2776 | .UNINDENT |
---|
2777 | .UNINDENT |
---|
2778 | .SS 5.5.3 (\-w)Defining a custom window size |
---|
2779 | .sp |
---|
2780 | Instead of using the default window size of 1000bp, one can define a custom, \fIsymmetric\fP window around |
---|
2781 | each feature in A using the \fB\-w\fP option. One should specify the window size in base pairs. For example, |
---|
2782 | a window of 5kb should be defined as \fB\-w 5000\fP\&. |
---|
2783 | .sp |
---|
2784 | For example (note that in contrast to the default behavior, the second B entry is reported): |
---|
2785 | .INDENT 0.0 |
---|
2786 | .INDENT 3.5 |
---|
2787 | .sp |
---|
2788 | .nf |
---|
2789 | .ft C |
---|
2790 | cat A.bed |
---|
2791 | chr1 100 200 |
---|
2792 | |
---|
2793 | cat B.bed |
---|
2794 | chr1 500 1000 |
---|
2795 | chr1 1300 2000 |
---|
2796 | |
---|
2797 | windowBed \-a A.bed \-b B.bed \-w 5000 |
---|
2798 | chr1 100 200 chr1 500 1000 |
---|
2799 | chr1 100 200 chr1 1300 2000 |
---|
2800 | .ft P |
---|
2801 | .fi |
---|
2802 | .UNINDENT |
---|
2803 | .UNINDENT |
---|
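The symmetric window search in the two examples above can be sketched in Python as follows. This is an illustration under stated assumptions, not windowBed's implementation: window_hits is a hypothetical name, features are BED-style half-open intervals, and the window is only clamped at the start of the chromosome (chromosome lengths are not modelled).

```python
def window_hits(a, b_features, w=1000):
    """Return the B features overlapping A after padding A by w bp on each side.

    Features are (chrom, start, end) tuples with BED-style half-open coordinates.
    """
    chrom, start, end = a
    win_start = max(0, start - w)  # do not run off the start of the chromosome
    win_end = end + w
    return [b for b in b_features
            if b[0] == chrom and b[1] < win_end and win_start < b[2]]


a = ("chr1", 100, 200)
b = [("chr1", 500, 1000), ("chr1", 1300, 2000)]
print(window_hits(a, b))          # [('chr1', 500, 1000)]
print(window_hits(a, b, w=5000))  # [('chr1', 500, 1000), ('chr1', 1300, 2000)]
```

As in the examples, the original A and B coordinates are what get reported; the padded window is used only for the search.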
2804 | .SS 5.5.4 (\-l and \-r)Defining asymmetric windows |
---|
2805 | .sp |
---|
2806 | One can also define asymmetric windows where a differing number of bases are added upstream and |
---|
2807 | downstream of each feature using the \fB\-l (upstream)\fP and \fB\-r (downstream)\fP options. |
---|
2808 | .sp |
---|
2809 | For example (note the difference between \-l 200 and \-l 300): |
---|
2810 | .INDENT 0.0 |
---|
2811 | .INDENT 3.5 |
---|
2812 | .sp |
---|
2813 | .nf |
---|
2814 | .ft C |
---|
2815 | cat A.bed |
---|
2816 | chr1 1000 2000 |
---|
2817 | |
---|
2818 | cat B.bed |
---|
2819 | chr1 500 800 |
---|
2820 | chr1 10000 20000 |
---|
2821 | |
---|
2822 | windowBed \-a A.bed \-b B.bed \-l 200 \-r 20000 |
---|
2823 | chr1 1000 2000 chr1 10000 20000 |
---|
2824 | |
---|
2825 | windowBed \-a A.bed \-b B.bed \-l 300 \-r 20000 |
---|
2826 | chr1 1000 2000 chr1 500 800 |
---|
2827 | chr1 1000 2000 chr1 10000 20000 |
---|
2828 | .ft P |
---|
2829 | .fi |
---|
2830 | .UNINDENT |
---|
2831 | .UNINDENT |
---|
2832 | .SS 5.5.5 (\-sw)Defining asymmetric windows based on strand |
---|
2833 | .sp |
---|
2834 | Especially when dealing with gene annotations or RNA\-seq experiments, you may want to define |
---|
2835 | asymmetric windows based on "strand". For example, you may want to screen for overlaps that occur |
---|
2836 | within 5000 bp upstream of a gene (e.g. a promoter region) while screening only 1000 bp downstream of |
---|
2837 | the gene. By enabling the \fB\-sw\fP ("stranded" windows) option, the windows are added upstream or |
---|
2838 | downstream according to strand. For example, imagine one specifies \fB\-l 5000 \-r 1000\fP as well as the |
---|
2839 | \fB\-sw\fP option. In this case, forward stranded ("+") features will screen 5000 bp to the \fIleft\fP (that is, \fIlower\fP |
---|
2840 | genomic coordinates) and 1000 bp to the \fIright\fP (that is, \fIhigher\fP genomic coordinates). By contrast, |
---|
2841 | reverse stranded ("\-") features will screen 5000 bp to the \fIright\fP (that is, \fIhigher\fP genomic coordinates) and |
---|
2842 | 1000 bp to the \fIleft\fP (that is, \fIlower\fP genomic coordinates). |
---|
2843 | .sp |
---|
2844 | For example (note how the window is flipped for the reverse stranded feature): |
---|
2845 | .INDENT 0.0 |
---|
2846 | .INDENT 3.5 |
---|
2847 | .sp |
---|
2848 | .nf |
---|
2849 | .ft C |
---|
2850 | cat A.bed |
---|
2851 | chr1 10000 20000 A.forward 1 + |
---|
2852 | chr1 10000 20000 A.reverse 1 \- |
---|
2853 | |
---|
2854 | cat B.bed |
---|
2855 | chr1 1000 8000 B1 |
---|
2856 | chr1 24000 32000 B2 |
---|
2857 | |
---|
2858 | windowBed \-a A.bed \-b B.bed \-l 5000 \-r 1000 \-sw |
---|
2859 | chr1 10000 20000 A.forward 1 + chr1 1000 8000 B1 |
---|
2860 | chr1 10000 20000 A.reverse 1 \- chr1 24000 32000 B2 |
---|
2861 | .ft P |
---|
2862 | .fi |
---|
2863 | .UNINDENT |
---|
2864 | .UNINDENT |
---|
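The strand-aware window in the example above can be sketched in Python as follows. This is a hedged illustration rather than windowBed's code: stranded_window and overlaps are hypothetical helpers, left and right stand in for \-l and \-r, the stranded flag stands in for \-sw, and coordinates are BED-style half-open intervals on one chromosome.

```python
def stranded_window(start, end, strand, left, right, stranded=False):
    """Return the (start, end) of the search window around one A feature.

    left and right play the role of -l and -r; stranded plays the role of -sw.
    """
    if stranded and strand == "-":
        # On the reverse strand "upstream" lies at higher coordinates, so swap.
        left, right = right, left
    return max(0, start - left), end + right


def overlaps(win, b_start, b_end):
    """True if the half-open interval [b_start, b_end) intersects the window."""
    return b_start < win[1] and win[0] < b_end


fwd = stranded_window(10000, 20000, "+", 5000, 1000, stranded=True)
rev = stranded_window(10000, 20000, "-", 5000, 1000, stranded=True)
print(fwd, rev)  # (5000, 21000) (9000, 25000)
print(overlaps(fwd, 1000, 8000), overlaps(fwd, 24000, 32000))  # True False
print(overlaps(rev, 1000, 8000), overlaps(rev, 24000, 32000))  # False True
```

The forward feature therefore pairs with B1 and the reverse feature with B2, matching the output shown above.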
2865 | .SS 5.5.6 (\-sm)Enforcing "strandedness" |
---|
2866 | .sp |
---|
2867 | This option behaves the same as the \-s option for intersectBed while scanning for overlaps within the |
---|
2868 | "window" surrounding A. See the discussion in the intersectBed section for details. |
---|
2869 | .SS 5.5.7 (\-u)Reporting the presence of at least one overlapping feature |
---|
2870 | .sp |
---|
2871 | This option behaves the same as for intersectBed while scanning for overlaps within the "window" |
---|
2872 | surrounding A. See the discussion in the intersectBed section for details. |
---|
2873 | .SS 5.5.8 (\-c)Reporting the number of overlapping features |
---|
2874 | .sp |
---|
2875 | This option behaves the same as for intersectBed while scanning for overlaps within the "window" |
---|
2876 | surrounding A. See the discussion in the intersectBed section for details. |
---|
2877 | .SS 5.5.9 (\-v)Reporting the absence of any overlapping features |
---|
2878 | .sp |
---|
2879 | This option behaves the same as for intersectBed while scanning for overlaps within the "window" |
---|
2880 | surrounding A. See the discussion in the intersectBed section for details. |
---|
2881 | .SS 5.6 closestBed |
---|
2882 | .sp |
---|
2883 | Similar to \fBintersectBed\fP, \fBclosestBed\fP searches for overlapping features in A and B. In the event that |
---|
2884 | no feature in B overlaps the current feature in A, \fBclosestBed\fP will report the \fIclosest\fP (that is, least |
---|
2885 | genomic distance from the start or end of A) feature in B. For example, one might want to find which |
---|
2886 | is the closest gene to a significant GWAS polymorphism. Note that \fBclosestBed\fP will report an |
---|
2887 | overlapping feature as the closest\-\-\-that is, it does not restrict to closest \fInon\-overlapping\fP feature. |
---|
2888 | .SS 5.6.1 Usage and option summary |
---|
2889 | .sp |
---|
2890 | \fBUsage:\fP |
---|
2891 | .INDENT 0.0 |
---|
2892 | .INDENT 3.5 |
---|
2893 | .sp |
---|
2894 | .nf |
---|
2895 | .ft C |
---|
2896 | closestBed [OPTIONS] \-a <BED/GFF/VCF> \-b <BED/GFF/VCF> |
---|
2897 | .ft P |
---|
2898 | .fi |
---|
2899 | .UNINDENT |
---|
2900 | .UNINDENT |
---|
2901 | .TS |
---|
2902 | center; |
---|
2903 | |l|l|. |
---|
2904 | _ |
---|
2905 | T{ |
---|
2906 | Option |
---|
2907 | T} T{ |
---|
2908 | Description |
---|
2909 | T} |
---|
2910 | _ |
---|
2911 | T{ |
---|
2912 | \fB\-s\fP |
---|
2913 | T} T{ |
---|
2914 | Force strandedness. That is, find the closest feature in B that is on the same strand as A. \fIBy default, this is disabled\fP\&. |
---|
2915 | T} |
---|
2916 | _ |
---|
2917 | T{ |
---|
2918 | \fB\-d\fP |
---|
2919 | T} T{ |
---|
2920 | In addition to the closest feature in B, report its distance to A as an extra column. The reported distance for overlapping features will be 0. |
---|
2921 | T} |
---|
2922 | _ |
---|
2923 | T{ |
---|
2924 | \fB\-t\fP |
---|
2925 | T} T{ |
---|
2926 | How ties for closest feature should be handled. This occurs when two features in B have exactly the same overlap with a feature in A. \fIBy default, all such features in B are reported\fP\&. |
---|
2927 | .INDENT 0.0 |
---|
2928 | .INDENT 3.5 |
---|
2929 | Here are the other choices controlling how ties are handled: |
---|
2930 | .sp |
---|
2931 | \fIall\-\fP Report all ties (default). |
---|
2932 | .sp |
---|
2933 | \fIfirst\-\fP Report the first tie that occurred in the B file. |
---|
2934 | .sp |
---|
2935 | \fIlast\-\fP Report the last tie that occurred in the B file. |
---|
2936 | .UNINDENT |
---|
2937 | .UNINDENT |
---|
2938 | T} |
---|
2939 | _ |
---|
2940 | .TE |
---|
2941 | .SS 5.6.2 Default behavior |
---|
2942 | .sp |
---|
2943 | \fBclosestBed\fP first searches for features in B that overlap a feature in A. If overlaps are found, the feature |
---|
2944 | in B that overlaps the highest fraction of A is reported. If no overlaps are found, \fBclosestBed\fP looks for |
---|
2945 | the feature in B that is \fIclosest\fP (that is, least genomic distance to the start or end of A) to A. For |
---|
2946 | example, in the figure below, feature B1 would be reported as the closest feature to A1. |
---|
2947 | .INDENT 0.0 |
---|
2948 | .INDENT 3.5 |
---|
2949 | .sp |
---|
2950 | .nf |
---|
2951 | .ft C |
---|
2952 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
2953 | |
---|
2954 | BED FILE A ************* |
---|
2955 | |
---|
2956 | BED File B ^^^^^^^^ ^^^^^^ |
---|
2957 | |
---|
2958 | Result ====== |
---|
2959 | .ft P |
---|
2960 | .fi |
---|
2961 | .UNINDENT |
---|
2962 | .UNINDENT |
---|
2963 | .sp |
---|
2964 | For example: |
---|
2965 | .INDENT 0.0 |
---|
2966 | .INDENT 3.5 |
---|
2967 | .sp |
---|
2968 | .nf |
---|
2969 | .ft C |
---|
2970 | cat A.bed |
---|
2971 | chr1 100 200 |
---|
2972 | |
---|
2973 | cat B.bed |
---|
2974 | chr1 500 1000 |
---|
2975 | chr1 1300 2000 |
---|
2976 | |
---|
2977 | closestBed \-a A.bed \-b B.bed |
---|
2978 | chr1 100 200 chr1 500 1000 |
---|
2979 | .ft P |
---|
2980 | .fi |
---|
2981 | .UNINDENT |
---|
2982 | .UNINDENT |
---|
2983 | .SS 5.6.3 (\-s)Enforcing "strandedness" |
---|
2984 | .sp |
---|
2985 | This option behaves the same as the \-s option for intersectBed while scanning for the closest |
---|
2986 | (overlapping or not) feature in B. See the discussion in the intersectBed section for details. |
---|
2987 | .SS 5.6.4 (\-t)Controlling how ties for "closest" are broken |
---|
2988 | .sp |
---|
2989 | When there are two or more features in B that overlap the \fIsame fraction\fP of A, \fBclosestBed\fP will, by |
---|
2990 | default, report both features in B. Imagine feature A is a SNP and file B contains genes. It can often |
---|
2991 | occur that two gene annotations (e.g. opposite strands) in B will overlap the SNP. As mentioned, the |
---|
2992 | default behavior is to report both such genes in B. However, the \-t option allows one to optionally |
---|
2993 | choose just the first or last feature (in terms of where it occurred in the input file, not chromosome |
---|
2994 | position) that occurred in B. |
---|
2995 | .sp |
---|
2996 | For example (note the difference between \-t first and \-t last): |
---|
2997 | .INDENT 0.0 |
---|
2998 | .INDENT 3.5 |
---|
2999 | .sp |
---|
3000 | .nf |
---|
3001 | .ft C |
---|
3002 | cat A.bed |
---|
3003 | chr1 100 101 rs1234 |
---|
3004 | |
---|
3005 | cat B.bed |
---|
3006 | chr1 0 1000 geneA 100 + |
---|
3007 | chr1 0 1000 geneB 100 \- |
---|
3008 | |
---|
3009 | closestBed \-a A.bed \-b B.bed |
---|
3010 | chr1 100 101 rs1234 chr1 0 1000 geneA 100 + |
---|
3011 | chr1 100 101 rs1234 chr1 0 1000 geneB 100 \- |
---|
3012 | |
---|
3013 | closestBed \-a A.bed \-b B.bed \-t all |
---|
3014 | chr1 100 101 rs1234 chr1 0 1000 geneA 100 + |
---|
3015 | chr1 100 101 rs1234 chr1 0 1000 geneB 100 \- |
---|
3016 | |
---|
3017 | closestBed \-a A.bed \-b B.bed \-t first |
---|
3018 | chr1 100 101 rs1234 chr1 0 1000 geneA 100 + |
---|
3019 | |
---|
3020 | closestBed \-a A.bed \-b B.bed \-t last |
---|
3021 | chr1 100 101 rs1234 chr1 0 1000 geneB 100 \- |
---|
3022 | .ft P |
---|
3023 | .fi |
---|
3024 | .UNINDENT |
---|
3025 | .UNINDENT |
---|
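The tie-handling choices can be sketched in a few lines of Python. This is only an illustration of the three modes described above, not the closestBed implementation; the function name break_ties and the tuple layout are assumptions made for the example.

```python
def break_ties(candidates, mode="all"):
    """Pick among equally close B features, kept in B-file order.

    `candidates` is a list of tied features in the order they appeared
    in the B file; `mode` mirrors the -t choices (all, first, last).
    """
    if mode == "all":
        return list(candidates)
    if mode == "first":
        return candidates[:1]
    if mode == "last":
        return candidates[-1:]
    raise ValueError(f"unknown tie mode: {mode!r}")


# The two tied genes, listed in the order they occur in B.bed.
ties = [("chr1", 0, 1000, "geneA", 100, "+"),
        ("chr1", 0, 1000, "geneB", 100, "-")]
for mode in ("all", "first", "last"):
    print(mode, [feature[3] for feature in break_ties(ties, mode)])
```

Note that "first" and "last" depend only on file order, so sorting B differently can change which feature is reported.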
3026 | .SS 5.6.5 (\-d)Reporting the distance to the closest feature in base pairs |
---|
3027 | .sp |
---|
3028 | \fBclosestBed\fP will optionally report the distance to the closest feature in the B file using the \fB\-d\fP option. |
---|
3029 | When a feature in B overlaps a feature in A, a distance of 0 is reported. |
---|
3030 | .INDENT 0.0 |
---|
3031 | .INDENT 3.5 |
---|
3032 | .sp |
---|
3033 | .nf |
---|
3034 | .ft C |
---|
3035 | cat A.bed |
---|
3036 | chr1 100 200 |
---|
3037 | chr1 500 600 |
---|
3038 | |
---|
3039 | cat B.bed |
---|
3040 | chr1 500 1000 |
---|
3041 | chr1 1300 2000 |
---|
3042 | |
---|
3043 | closestBed \-a A.bed \-b B.bed \-d |
---|
3044 | chr1 100 200 chr1 500 1000 300 |
---|
3045 | chr1 500 600 chr1 500 1000 0 |
---|
3046 | .ft P |
---|
3047 | .fi |
---|
3048 | .UNINDENT |
---|
3049 | .UNINDENT |
---|
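The distance logic can be sketched in Python as follows. This is a simplified illustration for features on a single chromosome, not the closestBed source. It assumes the convention suggested by the example above, where the gap is the start of the downstream feature minus the end of the upstream feature (so 100\-200 and 500\-1000 are 300 apart); the function names are hypothetical.

```python
def distance(a, b):
    """Gap between two (start, end) intervals; 0 if they overlap."""
    a_start, a_end = a
    b_start, b_end = b
    if b_start >= a_end:      # b lies entirely downstream of a
        return b_start - a_end
    if a_start >= b_end:      # b lies entirely upstream of a
        return a_start - b_end
    return 0                  # the intervals share at least one base


def closest(a, b_features):
    """Return (feature, distance) for the B feature nearest to a."""
    best = min(b_features, key=lambda b: distance(a, b))
    return best, distance(a, best)


b_file = [(500, 1000), (1300, 2000)]
print(closest((100, 200), b_file))
print(closest((500, 600), b_file))
```

Because min() keeps the first minimum it sees, this sketch silently behaves like "\-t first" when several features are equally close.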
3050 | .SS 5.7 subtractBed |
---|
3051 | .sp |
---|
3052 | \fBsubtractBed\fP searches for features in B that overlap A. If an overlapping feature is found in B, the |
---|
3053 | overlapping portion is removed from A and the remaining portion of A is reported. If a feature in B |
---|
3054 | overlaps all of a feature in A, the A feature will not be reported. |
---|
3055 | .SS 5.7.1 Usage and option summary |
---|
3056 | .sp |
---|
3057 | Usage: |
---|
3058 | .INDENT 0.0 |
---|
3059 | .INDENT 3.5 |
---|
3060 | .sp |
---|
3061 | .nf |
---|
3062 | .ft C |
---|
3063 | subtractBed [OPTIONS] \-a <BED/GFF/VCF> \-b <BED/GFF/VCF> |
---|
3064 | .ft P |
---|
3065 | .fi |
---|
3066 | .UNINDENT |
---|
3067 | .UNINDENT |
---|
3068 | .TS |
---|
3069 | center; |
---|
3070 | |l|l|. |
---|
3071 | _ |
---|
3072 | T{ |
---|
3073 | Option |
---|
3074 | T} T{ |
---|
3075 | Description |
---|
3076 | T} |
---|
3077 | _ |
---|
3078 | T{ |
---|
3079 | \fB\-f\fP |
---|
3080 | T} T{ |
---|
3081 | Minimum overlap required as a fraction of A. Default is 1E\-9 (i.e. 1bp). |
---|
3082 | T} |
---|
3083 | _ |
---|
3084 | T{ |
---|
3085 | \fB\-s\fP |
---|
3086 | T} T{ |
---|
3087 | Force strandedness. That is, only subtract features in B that overlap A on the same strand. \fIBy default, this is disabled\fP\&. |
---|
3088 | T} |
---|
3089 | _ |
---|
3090 | .TE |
---|
3091 | .SS 5.7.2 Default behavior |
---|
3092 | .sp |
---|
3093 | Figure: |
---|
3094 | .INDENT 0.0 |
---|
3095 | .INDENT 3.5 |
---|
3096 | .sp |
---|
3097 | .nf |
---|
3098 | .ft C |
---|
3099 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
3100 | |
---|
3101 | BED FILE A ************* ****** |
---|
3102 | |
---|
3103 | BED File B ^^^^^^^^ ^^^^^^^^^^^ |
---|
3104 | |
---|
3105 | Result ========= |
---|
3106 | .ft P |
---|
3107 | .fi |
---|
3108 | .UNINDENT |
---|
3109 | .UNINDENT |
---|
3110 | .sp |
---|
3111 | For example: |
---|
3112 | .INDENT 0.0 |
---|
3113 | .INDENT 3.5 |
---|
3114 | .sp |
---|
3115 | .nf |
---|
3116 | .ft C |
---|
3117 | cat A.bed |
---|
3118 | chr1 100 200 |
---|
3119 | chr1 10 20 |
---|
3120 | |
---|
3121 | cat B.bed |
---|
3122 | chr1 0 30 |
---|
3123 | chr1 180 300 |
---|
3124 | |
---|
3125 | subtractBed \-a A.bed \-b B.bed |
---|
3126 | chr1 100 180 |
---|
3127 | .ft P |
---|
3128 | .fi |
---|
3129 | .UNINDENT |
---|
3130 | .UNINDENT |
---|
3131 | .SS 5.7.3 (\-f)Requiring a minimal overlap fraction before subtracting |
---|
3132 | .sp |
---|
3133 | This option behaves the same as the \-f option for intersectBed. In this case, subtractBed will only |
---|
3134 | subtract an overlap with B if it covers at least the fraction of A defined by \-f. If an overlap is found, |
---|
3135 | but it does not meet the overlap fraction, the original A feature is reported without subtraction. |
---|
3136 | .sp |
---|
3137 | For example: |
---|
3138 | .INDENT 0.0 |
---|
3139 | .INDENT 3.5 |
---|
3140 | .sp |
---|
3141 | .nf |
---|
3142 | .ft C |
---|
3143 | cat A.bed |
---|
3144 | chr1 100 200 |
---|
3145 | |
---|
3146 | cat B.bed |
---|
3147 | chr1 180 300 |
---|
3148 | |
---|
3149 | subtractBed \-a A.bed \-b B.bed \-f 0.10 |
---|
3150 | chr1 100 180 |
---|
3151 | |
---|
3152 | subtractBed \-a A.bed \-b B.bed \-f 0.80 |
---|
3153 | chr1 100 200 |
---|
3154 | .ft P |
---|
3155 | .fi |
---|
3156 | .UNINDENT |
---|
3157 | .UNINDENT |
---|
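The subtraction rule, including the minimum overlap fraction, can be sketched in Python. This is an illustration under simplifying assumptions (one chromosome, no strand handling) and is not taken from the subtractBed source; the function name subtract and the min_fraction parameter are hypothetical stand\-ins for the behavior of \-f.

```python
def subtract(a, b_features, min_fraction=1e-9):
    """Remove from interval a every qualifying overlap with B.

    Intervals are (start, end). An overlap is subtracted only if it
    covers at least min_fraction of a. Returns the remaining pieces.
    """
    a_start, a_end = a
    pieces = [(a_start, a_end)]
    for b_start, b_end in sorted(b_features):
        overlap = min(a_end, b_end) - max(a_start, b_start)
        if overlap <= 0 or overlap / (a_end - a_start) < min_fraction:
            continue  # no overlap, or overlap too small to subtract
        remaining = []
        for start, end in pieces:
            if b_start > start:   # keep what lies left of b
                remaining.append((start, min(end, b_start)))
            if b_end < end:       # keep what lies right of b
                remaining.append((max(start, b_end), end))
        pieces = remaining
    return pieces


print(subtract((100, 200), [(180, 300)], min_fraction=0.10))
print(subtract((100, 200), [(180, 300)], min_fraction=0.80))
```

An A feature that is completely covered by B yields an empty list, which corresponds to the feature not being reported.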
3158 | .SS 5.7.4 (\-s)Enforcing "strandedness" |
---|
3159 | .sp |
---|
3160 | This option behaves the same as the \-s option for intersectBed while scanning for features in B that |
---|
3161 | should be subtracted from A. See the discussion in the intersectBed section for details. |
---|
3162 | .SS 5.8 mergeBed |
---|
3163 | .sp |
---|
3164 | \fBmergeBed\fP combines overlapping or "book\-ended" (that is, one base pair away) features in a feature file |
---|
3165 | into a single feature which spans all of the combined features. |
---|
3166 | .SS 5.8.1 Usage and option summary |
---|
3167 | .sp |
---|
3168 | Usage: |
---|
3169 | .INDENT 0.0 |
---|
3170 | .INDENT 3.5 |
---|
3171 | .sp |
---|
3172 | .nf |
---|
3173 | .ft C |
---|
3174 | mergeBed [OPTIONS] \-i <BED/GFF/VCF> |
---|
3175 | .ft P |
---|
3176 | .fi |
---|
3177 | .UNINDENT |
---|
3178 | .UNINDENT |
---|
3179 | .TS |
---|
3180 | center; |
---|
3181 | |l|l|. |
---|
3182 | _ |
---|
3183 | T{ |
---|
3184 | Option |
---|
3185 | T} T{ |
---|
3186 | Description |
---|
3187 | T} |
---|
3188 | _ |
---|
3189 | T{ |
---|
3190 | \fB\-s\fP |
---|
3191 | T} T{ |
---|
3192 | Force strandedness. That is, only merge features that are on the same strand. \fIBy default, this is disabled\fP\&. |
---|
3193 | T} |
---|
3194 | _ |
---|
3195 | T{ |
---|
3196 | \fB\-n\fP |
---|
3197 | T} T{ |
---|
3198 | Report the number of BED entries that were merged. \fI1 is reported if no merging occurred\fP\&. |
---|
3199 | T} |
---|
3200 | _ |
---|
3201 | T{ |
---|
3202 | \fB\-d\fP |
---|
3203 | T} T{ |
---|
3204 | Maximum distance between features allowed for features to be merged. \fIDefault is 0. That is, overlapping and/or book\-ended features are merged\fP\&. |
---|
3205 | T} |
---|
3206 | _ |
---|
3207 | T{ |
---|
3208 | \fB\-nms\fP |
---|
3209 | T} T{ |
---|
3210 | Report the names of the merged features separated by semicolons. |
---|
3211 | T} |
---|
3212 | _ |
---|
3213 | .TE |
---|
3214 | .SS 5.8.2 Default behavior |
---|
3215 | .sp |
---|
3216 | Figure: |
---|
3217 | .INDENT 0.0 |
---|
3218 | .INDENT 3.5 |
---|
3219 | .sp |
---|
3220 | .nf |
---|
3221 | .ft C |
---|
3222 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
3223 | |
---|
3224 | BED FILE ************* *************** ********************** |
---|
3225 | ******** |
---|
3226 | |
---|
3227 | Result =============================== ====================== |
---|
3228 | .ft P |
---|
3229 | .fi |
---|
3230 | .UNINDENT |
---|
3231 | .UNINDENT |
---|
3232 | .sp |
---|
3233 | For example: |
---|
3234 | .INDENT 0.0 |
---|
3235 | .INDENT 3.5 |
---|
3236 | .sp |
---|
3237 | .nf |
---|
3238 | .ft C |
---|
3239 | cat A.bed |
---|
3240 | chr1 100 200 |
---|
3241 | chr1 180 250 |
---|
3242 | chr1 250 500 |
---|
3243 | chr1 501 1000 |
---|
3244 | |
---|
3245 | mergeBed \-i A.bed |
---|
3246 | chr1 100 500 |
---|
3247 | chr1 501 1000 |
---|
3248 | .ft P |
---|
3249 | .fi |
---|
3250 | .UNINDENT |
---|
3251 | .UNINDENT |
---|
3252 | .SS 5.8.3 (\-s)Enforcing "strandedness" |
---|
3253 | .sp |
---|
3254 | This option behaves the same as the \-s option for intersectBed while scanning for features that should |
---|
3255 | be merged. Only features on the same strand will be merged. See the discussion in the intersectBed |
---|
3256 | section for details. |
---|
3257 | .SS 5.8.4 (\-n)Reporting the number of features that were merged |
---|
3258 | .sp |
---|
3259 | The \-n option will report the number of features that were combined from the original file in order to |
---|
3260 | make the newly merged feature. If a feature in the original file was not merged with any other features, |
---|
3261 | a "1" is reported. |
---|
3262 | .sp |
---|
3263 | For example: |
---|
3264 | .INDENT 0.0 |
---|
3265 | .INDENT 3.5 |
---|
3266 | .sp |
---|
3267 | .nf |
---|
3268 | .ft C |
---|
3269 | cat A.bed |
---|
3270 | chr1 100 200 |
---|
3271 | chr1 180 250 |
---|
3272 | chr1 250 500 |
---|
3273 | chr1 501 1000 |
---|
3274 | |
---|
3275 | mergeBed \-i A.bed \-n |
---|
3276 | chr1 100 500 3 |
---|
3277 | chr1 501 1000 1 |
---|
3278 | .ft P |
---|
3279 | .fi |
---|
3280 | .UNINDENT |
---|
3281 | .UNINDENT |
---|
3282 | .SS 5.8.5 (\-d)Controlling how close two features must be in order to merge |
---|
3283 | .sp |
---|
3284 | By default, only overlapping or book\-ended features are combined into a new feature. However, one can |
---|
3285 | force mergeBed to combine more distant features with the \-d option. For example, were one to set \-d to |
---|
3286 | 1000, any features that overlap or are within 1000 base pairs of one another will be combined. |
---|
3287 | .sp |
---|
3288 | For example: |
---|
3289 | .INDENT 0.0 |
---|
3290 | .INDENT 3.5 |
---|
3291 | .sp |
---|
3292 | .nf |
---|
3293 | .ft C |
---|
3294 | cat A.bed |
---|
3295 | chr1 100 200 |
---|
3296 | chr1 501 1000 |
---|
3297 | |
---|
3298 | mergeBed \-i A.bed |
---|
3299 | chr1 100 200 |
---|
3300 | chr1 501 1000 |
---|
3301 | |
---|
3302 | mergeBed \-i A.bed \-d 1000 |
---|
3303 | chr1 100 1000 |
---|
3304 | .ft P |
---|
3305 | .fi |
---|
3306 | .UNINDENT |
---|
3307 | .UNINDENT |
---|
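The merging rule, with the feature count of \-n and the distance threshold of \-d, can be sketched in Python. This is a minimal illustration for one chromosome without strand handling, not the mergeBed implementation; the function name merge and the max_distance parameter are assumptions made for the example.

```python
def merge(features, max_distance=0):
    """Merge (start, end) intervals on one chromosome.

    Two intervals are combined when the next start is no more than
    max_distance past the current end, so overlapping and book-ended
    intervals merge when max_distance is 0. Returns a list of
    (start, end, count), where count is the number of inputs merged.
    """
    merged = []
    for start, end in sorted(features):
        if merged and start <= merged[-1][1] + max_distance:
            prev_start, prev_end, count = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), count + 1)
        else:
            merged.append((start, end, 1))
    return merged


print(merge([(100, 200), (180, 250), (250, 500), (501, 1000)]))
print(merge([(100, 200), (501, 1000)], max_distance=1000))
```

Sorting first means the sketch does not require sorted input, at the cost of holding all features in memory.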
3308 | .SS 5.8.6 (\-nms)Reporting the names of the features that were merged |
---|
3309 | .sp |
---|
3310 | Occasionally, one might like to know the names of the features that were merged into a new feature. |
---|
3311 | The \-nms option will add an extra column to the mergeBed output which lists (separated by |
---|
3312 | semicolons) the names of the merged features. |
---|
3313 | .sp |
---|
3314 | For example: |
---|
3315 | .INDENT 0.0 |
---|
3316 | .INDENT 3.5 |
---|
3317 | .sp |
---|
3318 | .nf |
---|
3319 | .ft C |
---|
3320 | cat A.bed |
---|
3321 | chr1 100 200 A1 |
---|
3322 | chr1 150 300 A2 |
---|
3323 | chr1 250 500 A3 |
---|
3324 | |
---|
3325 | mergeBed \-i A.bed \-nms |
---|
3326 | chr1 100 500 A1;A2;A3 |
---|
3327 | .ft P |
---|
3328 | .fi |
---|
3329 | .UNINDENT |
---|
3330 | .UNINDENT |
---|
3331 | .SS 5.9 coverageBed |
---|
3332 | .sp |
---|
3333 | \fBcoverageBed\fP computes both the \fIdepth\fP and \fIbreadth\fP of coverage of features in file A across the features |
---|
3334 | in file B. For example, \fBcoverageBed\fP can compute the coverage of sequence alignments (file A) across 1 |
---|
3335 | kilobase (arbitrary) windows (file B) tiling a genome of interest. One advantage that \fBcoverageBed\fP |
---|
3336 | offers is that it not only \fIcounts\fP the number of features that overlap an interval in file B, it also |
---|
3337 | computes the fraction of bases in the B interval that were overlapped by one or more features. Thus, |
---|
3338 | \fBcoverageBed\fP also computes the \fIbreadth\fP of coverage for each interval in B. |
---|
3339 | .SS 5.9.1 Usage and option summary |
---|
3340 | .sp |
---|
3341 | Usage: |
---|
3342 | .INDENT 0.0 |
---|
3343 | .INDENT 3.5 |
---|
3344 | .sp |
---|
3345 | .nf |
---|
3346 | .ft C |
---|
3347 | coverageBed [OPTIONS] \-a <BED/GFF/VCF> \-b <BED/GFF/VCF> |
---|
3348 | .ft P |
---|
3349 | .fi |
---|
3350 | .UNINDENT |
---|
3351 | .UNINDENT |
---|
3352 | .TS |
---|
3353 | center; |
---|
3354 | |l|l|. |
---|
3355 | _ |
---|
3356 | T{ |
---|
3357 | Option |
---|
3358 | T} T{ |
---|
3359 | Description |
---|
3360 | T} |
---|
3361 | _ |
---|
3362 | T{ |
---|
3363 | \fB\-abam\fP |
---|
3364 | T} T{ |
---|
3365 | .INDENT 0.0 |
---|
3366 | .INDENT 3.5 |
---|
3367 | BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. For example: |
---|
3368 | .UNINDENT |
---|
3369 | .UNINDENT |
---|
3370 | .nf |
---|
3371 | samtools view \-b <BAM> | coverageBed \-abam stdin \-b genes.bed |
---|
3372 | .fi |
---|
3373 | T} |
---|
3374 | _ |
---|
3375 | T{ |
---|
3376 | \fB\-s\fP |
---|
3377 | T} T{ |
---|
3378 | Force strandedness. That is, features in A are only counted towards coverage in B if they are on the same strand. \fIBy default, this is disabled and coverage is counted without respect to strand\fP\&. |
---|
3379 | T} |
---|
3380 | _ |
---|
3381 | T{ |
---|
3382 | \fB\-hist\fP |
---|
3383 | T} T{ |
---|
3384 | Report a histogram of coverage for each feature in B as well as a summary histogram for \fIall\fP features in B. |
---|
3385 | .nf |
---|
3386 | Output (tab delimited) after each feature in B: |
---|
3387 | .fi |
---|
3388 | .sp |
---|
3389 | .INDENT 0.0 |
---|
3390 | .INDENT 3.5 |
---|
3391 | .nf |
---|
3392 | 1) depth |
---|
3393 | 2) # bases at depth |
---|
3394 | 3) size of B |
---|
3395 | 4) % of B at depth |
---|
3396 | .fi |
---|
3397 | .sp |
---|
3398 | .UNINDENT |
---|
3399 | .UNINDENT |
---|
3400 | T} |
---|
3401 | _ |
---|
3402 | T{ |
---|
3403 | \fB\-d\fP |
---|
3404 | T} T{ |
---|
3405 | Report the depth at each position in each B feature. Positions reported are one based. Each position and depth follow the complete B feature. |
---|
3406 | T} |
---|
3407 | _ |
---|
3408 | T{ |
---|
3409 | \fB\-split\fP |
---|
3410 | T} T{ |
---|
3411 | Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the blockCount, blockSizes, and blockStarts fields (i.e., columns 10,11,12). |
---|
3412 | T} |
---|
3413 | _ |
---|
3414 | .TE |
---|
3415 | .SS 5.9.2 Default behavior |
---|
3416 | .sp |
---|
3417 | After each interval in B, \fBcoverageBed\fP will report: |
---|
3418 | .INDENT 0.0 |
---|
3419 | .IP 1. 3 |
---|
3420 | The number of features in A that overlapped (by at least one base pair) the B interval. |
---|
3421 | .IP 2. 3 |
---|
3422 | The number of bases in B that had non\-zero coverage from features in A. |
---|
3423 | .IP 3. 3 |
---|
3424 | The length of the entry in B. |
---|
3425 | .IP 4. 3 |
---|
3426 | The fraction of bases in B that had non\-zero coverage from features in A. |
---|
3427 | .UNINDENT |
---|
3428 | .sp |
---|
3429 | Below are the number of features in A (N=...) overlapping B and fraction of bases in B with coverage. |
---|
3430 | .INDENT 0.0 |
---|
3431 | .INDENT 3.5 |
---|
3432 | .sp |
---|
3433 | .nf |
---|
3434 | .ft C |
---|
3435 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
3436 | |
---|
3437 | BED FILE B *************** *************** ****** ************** |
---|
3438 | |
---|
3439 | BED File A ^^^^ ^^^^ ^^ ^^^^^^^^^ ^^^ ^^ ^^^^ |
---|
3440 | ^^^^^^^^ ^^^^^ ^^^^^ ^^ |
---|
3441 | |
---|
3442 | Result [ N=3, 10/15 ] [ N=1, 2/16 ] [N=1,6/6] [N=5, 11/12 ] |
---|
3443 | .ft P |
---|
3444 | .fi |
---|
3445 | .UNINDENT |
---|
3446 | .UNINDENT |
---|
3447 | .sp |
---|
3448 | For example: |
---|
3449 | .INDENT 0.0 |
---|
3450 | .INDENT 3.5 |
---|
3451 | .sp |
---|
3452 | .nf |
---|
3453 | .ft C |
---|
3454 | cat A.bed |
---|
3455 | chr1 10 20 |
---|
3456 | chr1 20 30 |
---|
3457 | chr1 30 40 |
---|
3458 | chr1 100 200 |
---|
3459 | |
---|
3460 | cat B.bed |
---|
3461 | chr1 0 100 |
---|
3462 | chr1 100 200 |
---|
3463 | chr2 0 100 |
---|
3464 | |
---|
3465 | coverageBed \-a A.bed \-b B.bed |
---|
3466 | chr1 0 100 3 30 100 0.3000000 |
---|
3467 | chr1 100 200 1 100 100 1.0000000 |
---|
3468 | chr2 0 100 0 0 100 0.0000000 |
---|
3469 | .ft P |
---|
3470 | .fi |
---|
3471 | .UNINDENT |
---|
3472 | .UNINDENT |
---|
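The four values reported after each B interval can be reproduced with a short Python sketch. This is an illustration only, assuming zero\-based, half\-open BED coordinates and ignoring strand; the function name coverage and the tuple layout are hypothetical and not part of coverageBed.

```python
def coverage(b, a_features):
    """Summarize coverage of interval b by the intervals in a_features.

    Intervals are (chrom, start, end). Returns the tuple
    (overlap_count, covered_bases, b_length, covered_fraction).
    """
    chrom, b_start, b_end = b
    covered = set()
    count = 0
    for a_chrom, a_start, a_end in a_features:
        lo, hi = max(a_start, b_start), min(a_end, b_end)
        if a_chrom == chrom and lo < hi:   # at least one shared base
            count += 1
            covered.update(range(lo, hi))
    length = b_end - b_start
    return count, len(covered), length, len(covered) / length


a_file = [("chr1", 10, 20), ("chr1", 20, 30),
          ("chr1", 30, 40), ("chr1", 100, 200)]
for b in [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]:
    print(b, coverage(b, a_file))
```

Using a set of covered positions keeps the sketch short; it counts each base once no matter how many A features cover it, which is what distinguishes breadth from depth.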
3473 | .SS 5.9.4 (\-s)Calculating coverage by strand |
---|
3474 | .sp |
---|
3475 | Use the "\fB\-s\fP" option if one wants to only count coverage if features in A are on the same strand as the |
---|
3476 | feature / window in B. This is especially useful for RNA\-seq experiments. |
---|
3477 | .sp |
---|
3478 | For example (note the difference in coverage with and without \fB\-s\fP): |
---|
3479 | .INDENT 0.0 |
---|
3480 | .INDENT 3.5 |
---|
3481 | .sp |
---|
3482 | .nf |
---|
3483 | .ft C |
---|
3484 | cat A.bed |
---|
3485 | chr1 10 20 a1 1 \- |
---|
3486 | chr1 20 30 a2 1 \- |
---|
3487 | chr1 30 40 a3 1 \- |
---|
3488 | chr1 100 200 a4 1 + |
---|
3489 | |
---|
3490 | cat B.bed |
---|
3491 | chr1 0 100 b1 1 + |
---|
3492 | chr1 100 200 b2 1 \- |
---|
3493 | chr2 0 100 b3 1 + |
---|
3494 | |
---|
3495 | coverageBed \-a A.bed \-b B.bed |
---|
3496 | chr1 0 100 b1 1 + 3 30 100 0.3000000 |
---|
3497 | chr1 100 200 b2 1 \- 1 100 100 1.0000000 |
---|
3498 | chr2 0 100 b3 1 + 0 0 100 0.0000000 |
---|
3499 | |
---|
3500 | coverageBed \-a A.bed \-b B.bed \-s |
---|
3501 | chr1 0 100 b1 1 + 0 0 100 0.0000000 |
---|
3502 | chr1 100 200 b2 1 \- 0 0 100 0.0000000 |
---|
3503 | chr2 0 100 b3 1 + 0 0 100 0.0000000 |
---|
3504 | .ft P |
---|
3505 | .fi |
---|
3506 | .UNINDENT |
---|
3507 | .UNINDENT |
---|
3508 | .SS 5.9.5 (\-hist)Creating a histogram of coverage for each feature in the B file |
---|
3509 | .sp |
---|
3510 | One should use the "\fB\-hist\fP" option to create, for each interval in B, a histogram of coverage of the |
---|
3511 | features in A across B. |
---|
3512 | .sp |
---|
3513 | In this case, each entire feature in B will be reported, followed by the depth of coverage, the number of |
---|
3514 | bases at that depth, the size of the feature, and the fraction covered. After all of the features in B have |
---|
3515 | been reported, a histogram summarizing the coverage among all features in B will be reported. |
---|
3516 | .INDENT 0.0 |
---|
3517 | .INDENT 3.5 |
---|
3518 | .sp |
---|
3519 | .nf |
---|
3520 | .ft C |
---|
3521 | cat A.bed |
---|
3522 | chr1 10 20 a1 1 \- |
---|
3523 | chr1 20 30 a2 1 \- |
---|
3524 | chr1 30 40 a3 1 \- |
---|
3525 | chr1 100 200 a4 1 + |
---|
3526 | |
---|
3527 | cat B.bed |
---|
3528 | chr1 0 100 b1 1 + |
---|
3529 | chr1 100 200 b2 1 \- |
---|
3530 | chr2 0 100 b3 1 + |
---|
3531 | |
---|
3532 | coverageBed \-a A.bed \-b B.bed \-hist |
---|
3533 | chr1 0 100 b1 1 + 0 70 100 0.7000000 |
---|
3534 | chr1 0 100 b1 1 + 1 30 100 0.3000000 |
---|
3535 | chr1 100 200 b2 1 \- 1 100 100 1.0000000 |
---|
3536 | chr2 0 100 b3 1 + 0 100 100 1.0000000 |
---|
3537 | all 0 170 300 0.5666667 |
---|
3538 | all 1 130 300 0.4333333 |
---|
3539 | .ft P |
---|
3540 | .fi |
---|
3541 | .UNINDENT |
---|
3542 | .UNINDENT |
---|
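The per\-feature and summary histograms can be sketched in Python. This is an illustrative reimplementation for features on one chromosome, not the coverageBed code; the function names depth_histogram and summary_histogram are assumptions made for the example.

```python
from collections import Counter


def depth_histogram(b, a_features):
    """Map depth -> number of bases of b at that depth.

    Intervals are (start, end) on a single chromosome.
    """
    b_start, b_end = b
    depth = [0] * (b_end - b_start)
    for a_start, a_end in a_features:
        for pos in range(max(a_start, b_start), min(a_end, b_end)):
            depth[pos - b_start] += 1
    return dict(Counter(depth))


def summary_histogram(histograms):
    """Add up per-feature histograms into one histogram for all of B."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return dict(total)


a_file = [(10, 20), (20, 30), (30, 40)]
per_feature = [depth_histogram((0, 100), a_file),
               depth_histogram((200, 300), a_file)]
print(per_feature)
print(summary_histogram(per_feature))
```

Dividing each bin by the feature length (or by the total length of B for the summary) gives the fraction reported in the last column.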
3543 | .SS 5.9.6 (\-d)Reporting the per\-base coverage for each feature in the B file |
---|
3544 | .sp |
---|
3545 | One should use the "\fB\-d\fP" option to create, for each interval in B, a detailed list of coverage at each of the |
---|
3546 | positions across each B interval. |
---|
3547 | .sp |
---|
3548 | The output will consist of a line for each one\-based position in each B feature, followed by the coverage |
---|
3549 | detected at that position. |
---|
3550 | .INDENT 0.0 |
---|
3551 | .INDENT 3.5 |
---|
3552 | .sp |
---|
3553 | .nf |
---|
3554 | .ft C |
---|
3555 | cat A.bed |
---|
3556 | chr1 0 5 |
---|
3557 | chr1 3 8 |
---|
3558 | chr1 4 8 |
---|
3559 | chr1 5 9 |
---|
3560 | |
---|
3561 | cat B.bed |
---|
3562 | chr1 0 10 B |
---|
3563 | |
---|
3564 | coverageBed \-a A.bed \-b B.bed \-d |
---|
3565 | chr1 0 10 B 1 1 |
---|
3566 | chr1 0 10 B 2 1 |
---|
3567 | chr1 0 10 B 3 1 |
---|
3568 | chr1 0 10 B 4 2 |
---|
3569 | chr1 0 10 B 5 3 |
---|
3570 | chr1 0 10 B 6 3 |
---|
3571 | chr1 0 10 B 7 3 |
---|
3572 | chr1 0 10 B 8 3 |
---|
3573 | chr1 0 10 B 9 1 |
---|
3574 | chr1 0 10 B 10 0 |
---|
3575 | .ft P |
---|
3576 | .fi |
---|
3577 | .UNINDENT |
---|
3578 | .UNINDENT |
---|
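The per\-base report can be sketched in Python. This is an illustration for one chromosome, assuming that the reported position is the one\-based offset within the B feature; the function name per_base_depth is hypothetical and the code is not taken from coverageBed.

```python
def per_base_depth(b, a_features):
    """Yield (one_based_position, depth) for each base of interval b.

    Intervals are zero-based, half-open (start, end) pairs.
    """
    b_start, b_end = b
    for offset in range(b_end - b_start):
        pos = b_start + offset
        depth = sum(1 for a_start, a_end in a_features
                    if a_start <= pos < a_end)
        yield offset + 1, depth


a_file = [(0, 5), (3, 8), (4, 8), (5, 9)]
for position, depth in per_base_depth((0, 10), a_file):
    print(position, depth)
```

The sketch rescans every A feature for every base, which is fine for a handful of intervals but would be far too slow for real alignment data.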
3579 | .SS 5.9.7 (\-split)Reporting coverage with spliced alignments or blocked BED features |
---|
3580 | .sp |
---|
3581 | As described in section 1.3.19, coverageBed will, by default, screen for overlaps against the entire span |
---|
3582 | of a spliced/split BAM alignment or blocked BED12 feature. When dealing with RNA\-seq reads, for |
---|
3583 | example, one typically wants to only tabulate coverage for the portions of the reads that come from |
---|
3584 | exons (and ignore the interstitial intron sequence). The \fB\-split\fP option allows such coverage to be |
---|
3585 | computed. |
---|
3586 | .SS 5.10 genomeCoverageBed |
---|
3587 | .sp |
---|
3588 | \fBgenomeCoverageBed\fP computes a histogram of feature coverage (e.g., aligned sequences) for a given |
---|
3589 | genome. Optionally, by using the \fB\-d\fP option, it will report the depth of coverage at \fIeach base\fP on each |
---|
3590 | chromosome in the genome file (\fB\-g\fP). |
---|
3591 | .SS 5.10.1 Usage and option summary |
---|
3592 | .sp |
---|
3593 | Usage: |
---|
3594 | .INDENT 0.0 |
---|
3595 | .INDENT 3.5 |
---|
3596 | .sp |
---|
3597 | .nf |
---|
3598 | .ft C |
---|
3599 | genomeCoverageBed [OPTIONS] \-i <BED> \-g <GENOME> |
---|
3600 | .ft P |
---|
3601 | .fi |
---|
3602 | .UNINDENT |
---|
3603 | .UNINDENT |
---|
3604 | .sp |
---|
3605 | NOTE: genomeCoverageBed requires that the input BED file be sorted by |
---|
3606 | chromosome. A simple sort \-k1,1 will suffice. |
---|
3607 | .TS |
---|
3608 | center; |
---|
3609 | |l|l|. |
---|
3610 | _ |
---|
3611 | T{ |
---|
3612 | Option |
---|
3613 | T} T{ |
---|
3614 | Description |
---|
3615 | T} |
---|
3616 | _ |
---|
3617 | T{ |
---|
3618 | \fB\-ibam\fP |
---|
3619 | T} T{ |
---|
3620 | .INDENT 0.0 |
---|
3621 | .INDENT 3.5 |
---|
3622 | BAM file as input for coverage. Each BAM alignment is added to the total coverage for the genome. Use "stdin" if passing it with a UNIX pipe. For example: |
---|
3623 | .UNINDENT |
---|
3624 | .UNINDENT |
---|
3625 | .nf |
---|
3626 | samtools view \-b <BAM> | genomeCoverageBed \-ibam stdin \-g hg18.genome |
---|
3627 | .fi |
---|
3628 | T} |
---|
3629 | _ |
---|
3630 | T{ |
---|
3631 | \fB\-d\fP |
---|
3632 | T} T{ |
---|
3633 | Report the depth at each genome position. \fIDefault behavior is to report a histogram\fP\&. |
---|
3634 | T} |
---|
3635 | _ |
---|
3636 | T{ |
---|
3637 | \fB\-max\fP |
---|
3638 | T} T{ |
---|
3639 | Combine all positions with a depth >= max into a single bin in the histogram. |
---|
3640 | T} |
---|
3641 | _ |
---|
3642 | T{ |
---|
3643 | \fB\-bg\fP |
---|
3644 | T} T{ |
---|
3645 | Report depth in BedGraph format. For details, see: \fI\%http://genome.ucsc.edu/goldenPath/help/bedgraph.html\fP |
---|
3646 | T} |
---|
3647 | _ |
---|
3648 | T{ |
---|
3649 | \fB\-bga\fP |
---|
3650 | T} T{ |
---|
3651 | Report depth in BedGraph format, as above (i.e., \-bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep \-w 0$" to the output. |
---|
3652 | T} |
---|
3653 | _ |
---|
3654 | T{ |
---|
3655 | \fB\-split\fP |
---|
3656 | T} T{ |
---|
3657 | Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the blockCount, blockSizes, and blockStarts fields (i.e., columns 10,11,12). |
---|
3658 | T} |
---|
3659 | _ |
---|
3660 | T{ |
---|
3661 | \fB\-strand\fP |
---|
3662 | T} T{ |
---|
3663 | Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6). |
---|
3664 | T} |
---|
3665 | _ |
---|
3666 | .TE |
---|
3667 | .SS 5.10.2 Default behavior |
---|
3668 | .sp |
---|
3669 | By default, \fBgenomeCoverageBed\fP will compute a histogram of coverage for the genome file provided. |
---|
3670 | The default output format is as follows: |
---|
3671 | 1. chromosome (or entire genome) |
---|
3672 | 2. depth of coverage from features in input file |
---|
3673 | 3. number of bases on chromosome (or genome) with depth equal to column 2. |
---|
3674 | 4. size of chromosome (or entire genome) in base pairs |
---|
3675 | 5. fraction of bases on chromosome (or entire genome) with depth equal to column 2. |
---|
3676 | .sp |
---|
3677 | For example: |
---|
3678 | .INDENT 0.0 |
---|
3679 | .INDENT 3.5 |
---|
3680 | .sp |
---|
3681 | .nf |
---|
3682 | .ft C |
---|
3683 | cat A.bed |
---|
3684 | chr1 10 20 |
---|
3685 | chr1 20 30 |
---|
3686 | chr2 0 500 |
---|
3687 | |
---|
3688 | cat my.genome |
---|
3689 | chr1 1000 |
---|
3690 | chr2 500 |
---|
3691 | |
---|
3692 | genomeCoverageBed \-i A.bed \-g my.genome |
---|
3693 | chr1 0 980 1000 0.98 |
---|
3694 | chr1 1 20 1000 0.02 |
---|
3695 | chr2 1 500 500 1 |
---|
3696 | genome 0 980 1500 0.653333 |
---|
3697 | genome 1 520 1500 0.346667 |
---|
3698 | .ft P |
---|
3699 | .fi |
---|
3700 | .UNINDENT |
---|
3701 | .UNINDENT |
---|
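The default histogram can be reproduced for toy inputs with a short Python sketch. This is only an illustration: it allocates one counter per base, so it is suitable for small genome files like the one above and not for real genomes. The function name genome_histogram and the data layout are assumptions, not part of genomeCoverageBed.

```python
from collections import Counter


def genome_histogram(features, genome):
    """Return {chrom: Counter(depth -> bases)} plus a "genome" total.

    features: list of (chrom, start, end) in BED coordinates.
    genome:   dict mapping chromosome name to its size in base pairs.
    """
    depth = {chrom: [0] * size for chrom, size in genome.items()}
    for chrom, start, end in features:
        for pos in range(start, min(end, genome[chrom])):
            depth[chrom][pos] += 1
    hist = {chrom: Counter(d) for chrom, d in depth.items()}
    hist["genome"] = sum(hist.values(), Counter())
    return hist


features = [("chr1", 10, 20), ("chr1", 20, 30), ("chr2", 0, 500)]
genome = {"chr1": 1000, "chr2": 500}
for name, bins in genome_histogram(features, genome).items():
    size = sum(bins.values())
    for d in sorted(bins):
        print(name, d, bins[d], size, bins[d] / size)
```

The last printed column is the fraction of the chromosome (or genome) at that depth, computed as the bin count divided by the total size.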
3702 | .SS 5.10.3 (\-max)Controlling the histogram\(aqs maximum depth |
---|
3703 | .sp |
---|
3704 | Using the \fB\-max\fP option, \fBgenomeCoverageBed\fP will "lump" all positions in the genome having feature |
---|
3705 | coverage greater than or equal to \fBmax\fP into the \fBmax\fP histogram bin. For example, if one sets \fB\-max\fP |
---|
3706 | equal to 50, the max depth reported in the output will be 50 and all positions with a depth >= 50 will |
---|
3707 | be represented in bin 50. |
---|
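The lumping step can be sketched in Python. This is a minimal illustration operating on a depth histogram stored as a dictionary; the function name cap_histogram and the sample numbers are made up for the example.

```python
def cap_histogram(hist, max_depth):
    """Fold every bin with depth >= max_depth into the max_depth bin.

    hist maps depth -> number of bases at that depth.
    """
    capped = {}
    for depth, bases in hist.items():
        key = min(depth, max_depth)
        capped[key] = capped.get(key, 0) + bases
    return capped


# Hypothetical histogram: depths 50, 75 and 200 all end up in bin 50.
print(cap_histogram({0: 10, 49: 5, 50: 3, 75: 2, 200: 1}, 50))
```

The total number of bases is unchanged by capping; only the resolution of the high\-depth tail is lost.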
3708 | .SS 5.10.4 (\-d)Reporting "per\-base" genome coverage |
---|
3709 | .sp |
---|
3710 | Using the \fB\-d\fP option, \fBgenomeCoverageBed\fP will compute the depth of feature coverage for each base |
---|
3711 | on each chromosome in the genome file provided. |
---|
3712 | .sp |
---|
3713 | The "per\-base" output format is as follows: |
---|
3714 | 1. chromosome |
---|
3715 | 2. chromosome position |
---|
3716 | 3. depth (number) of features overlapping this chromosome position. |
---|
3717 | .sp |
---|
3718 | For example: |
---|
3719 | .INDENT 0.0 |
---|
3720 | .INDENT 3.5 |
---|
3721 | .sp |
---|
3722 | .nf |
---|
3723 | .ft C |
---|
3724 | cat A.bed |
---|
3725 | chr1 10 20 |
---|
3726 | chr1 20 30 |
---|
3727 | chr2 0 500 |
---|
3728 | |
---|
3729 | cat my.genome |
---|
3730 | chr1 1000 |
---|
3731 | chr2 500 |
---|
3732 | |
---|
3733 | genomeCoverageBed \-i A.bed \-g my.genome \-d | head \-15 | tail \-n 10 |
---|
3734 | chr1 6 0 |
---|
3735 | chr1 7 0 |
---|
3736 | chr1 8 0 |
---|
3737 | chr1 9 0 |
---|
3738 | chr1 10 0 |
---|
3739 | chr1 11 1 |
---|
3740 | chr1 12 1 |
---|
3741 | chr1 13 1 |
---|
3742 | chr1 14 1 |
---|
3743 | chr1 15 1 |
---|
3744 | .ft P |
---|
3745 | .fi |
---|
3746 | .UNINDENT |
---|
3747 | .UNINDENT |
---|
3748 | .SS 5.10.5 (\-split)Reporting coverage with spliced alignments or blocked BED features |
---|
3749 | .sp |
---|
3750 | As described in section 1.3.19, genomeCoverageBed will, by default, screen for overlaps against the |
---|
3751 | entire span of a spliced/split BAM alignment or blocked BED12 feature. When dealing with RNA\-seq |
---|
3752 | reads, for example, one typically wants to only screen for overlaps for the portions of the reads that |
---|
3753 | come from exons (and ignore the interstitial intron sequence). The \fB\-split\fP option restricts |
---|
3754 | coverage to those exonic blocks. |
---|
3755 | .sp |
---|
3756 | For additional details, please visit the Usage From The Wild site and have a look at example 5, |
---|
3757 | contributed by Assaf Gordon. |
---|
3758 | .SS 5.11 fastaFromBed |
---|
3759 | .sp |
---|
3760 | \fBfastaFromBed\fP extracts sequences from a FASTA file for each of the intervals defined in a BED file. |
---|
3761 | The headers in the input FASTA file must exactly match the chromosome column in the BED file. |
---|
3762 | .SS 5.11.1 Usage and option summary |
---|
3763 | .sp |
---|
3764 | Usage: |
---|
3765 | .INDENT 0.0 |
---|
3766 | .INDENT 3.5 |
---|
3767 | .sp |
---|
3768 | .nf |
---|
3769 | .ft C |
---|
3770 | fastaFromBed [OPTIONS] \-fi <input FASTA> \-bed <BED/GFF/VCF> \-fo <output FASTA> |
---|
3771 | .ft P |
---|
3772 | .fi |
---|
3773 | .UNINDENT |
---|
3774 | .UNINDENT |
---|
3775 | .TS |
---|
3776 | center; |
---|
3777 | |l|l|. |
---|
3778 | _ |
---|
3779 | T{ |
---|
3780 | Option |
---|
3781 | T} T{ |
---|
3782 | Description |
---|
3783 | T} |
---|
3784 | _ |
---|
3785 | T{ |
---|
3786 | \fB\-name\fP |
---|
3787 | T} T{ |
---|
3788 | Use the "name" column in the BED file for the FASTA headers in the output FASTA file. |
---|
3789 | T} |
---|
3790 | _ |
---|
3791 | T{ |
---|
3792 | \fB\-tab\fP |
---|
3793 | T} T{ |
---|
3794 | Report extracted sequences in a tab\-delimited format instead of in FASTA format. |
---|
3795 | T} |
---|
3796 | _ |
---|
3797 | T{ |
---|
3798 | \fB\-s\fP |
---|
3799 | T} T{ |
---|
3800 | Force strandedness. If the feature occupies the antisense strand, the sequence will be reverse complemented. \fIDefault: strand information is ignored\fP\&. |
---|
3801 | T} |
---|
3802 | _ |
---|
3803 | .TE |
---|
3804 | .SS 5.11.2 Default behavior |
---|
3805 | .sp |
---|
3806 | \fBfastaFromBed\fP will extract the sequence defined by the coordinates in a BED interval and create a |
---|
3807 | new FASTA entry in the output file for each extracted sequence. By default, the FASTA header for each |
---|
3808 | extracted sequence will be formatted as follows: "<chrom>:<start>\-<end>". |
---|
3809 | .sp |
---|
3810 | For example: |
---|
3811 | .INDENT 0.0 |
---|
3812 | .INDENT 3.5 |
---|
3813 | .sp |
---|
3814 | .nf |
---|
3815 | .ft C |
---|
3816 | cat test.fa |
---|
3817 | >chr1 |
---|
3818 | AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG |
---|
3819 | |
---|
3820 | cat test.bed |
---|
3821 | chr1 5 10 |
---|
3822 | |
---|
3823 | fastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out |
---|
3824 | |
---|
3825 | cat test.fa.out |
---|
3826 | >chr1:5\-10 |
---|
3827 | AAACC |
---|
3828 | .ft P |
---|
3829 | .fi |
---|
3830 | .UNINDENT |
---|
3831 | .UNINDENT |
---|
3832 | .SS 5.11.3 Using the BED "name" column as a FASTA header. |
---|
3833 | .sp |
---|
3834 | Using the \fB\-name\fP option, one can set the FASTA header for each extracted sequence to be the "name" |
---|
3835 | column from the BED feature. |
---|
3836 | .sp |
---|
3837 | For example: |
---|
3838 | .INDENT 0.0 |
---|
3839 | .INDENT 3.5 |
---|
3840 | .sp |
---|
3841 | .nf |
---|
3842 | .ft C |
---|
3843 | cat test.fa |
---|
3844 | >chr1 |
---|
3845 | AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG |
---|
3846 | |
---|
3847 | cat test.bed |
---|
3848 | chr1 5 10 myseq |
---|
3849 | |
---|
3850 | fastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out \-name |
---|
3851 | |
---|
3852 | cat test.fa.out |
---|
3853 | >myseq |
---|
3854 | AAACC |
---|
3855 | .ft P |
---|
3856 | .fi |
---|
3857 | .UNINDENT |
---|
3858 | .UNINDENT |
---|
3859 | .SS 5.11.4 Creating a tab\-delimited output file in lieu of FASTA output. |
---|
3860 | .sp |
---|
3861 | Using the \fB\-tab\fP option, the \fB\-fo\fP output file will be tab\-delimited instead of in FASTA format. |
---|
3862 | .sp |
---|
3863 | For example: |
---|
3864 | .INDENT 0.0 |
---|
3865 | .INDENT 3.5 |
---|
3866 | .sp |
---|
3867 | .nf |
---|
3868 | .ft C |
---|
3869 | cat test.fa |
---|
3870 | >chr1 |
---|
3871 | AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG |
---|
3872 | |
---|
3873 | cat test.bed |
---|
3874 | chr1 5 10 myseq |
---|
3875 | |
---|
3876 | fastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out.tab \-name \-tab |
---|
3877 | |
---|
3878 | cat test.fa.out.tab |
---|
3879 | myseq AAACC |
---|
3880 | .ft P |
---|
3881 | .fi |
---|
3882 | .UNINDENT |
---|
3883 | .UNINDENT |
---|
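The header and output conventions described in 5.11.2 through 5.11.4 can be sketched in a few lines of Python. This is an illustration only, not the fastaFromBed implementation: the function names and the shortened toy sequence are assumptions made for the example. It treats BED intervals as 0\-based and half\-open, which is consistent with the examples above.

```python
def extract(seq, start, end):
    """Return the bases in the 0-based, half-open interval [start, end)."""
    return seq[start:end]


def format_record(chrom, start, end, seq, name=None, tab=False):
    """Format one extracted interval as FASTA text or as a tab-delimited line.

    `name` mimics the -name option and `tab` mimics the -tab option.
    """
    header = name if name is not None else f"{chrom}:{start}-{end}"
    bases = extract(seq, start, end)
    if tab:
        return f"{header}\t{bases}\n"
    return f">{header}\n{bases}\n"


# Hypothetical toy sequence: 8 A's, 13 C's, then GCTACT.
toy = "A" * 8 + "C" * 13 + "GCTACT"
print(format_record("chr1", 5, 10, toy), end="")
print(format_record("chr1", 5, 10, toy, name="myseq", tab=True), end="")
```

The first call prints a FASTA record with a "chr1:5\-10" header; the second prints the name and the bases separated by a tab.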
3884 | .SS 5.11.5 (\-s) Forcing the extracted sequence to reflect the requested strand |
---|
3885 | .sp |
---|
3886 | \fBfastaFromBed\fP will extract the sequence in the orientation defined in the strand column when the "\-s" |
---|
3887 | option is used. |
---|
3888 | .sp |
---|
3889 | For example: |
---|
3890 | .INDENT 0.0 |
---|
3891 | .INDENT 3.5 |
---|
3892 | .sp |
---|
3893 | .nf |
---|
3894 | .ft C |
---|
3895 | cat test.fa |
---|
3896 | >chr1 |
---|
3897 | AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG |
---|
3898 | |
---|
3899 | cat test.bed |
---|
3900 | chr1 20 25 forward 1 + |
---|
3901 | chr1 20 25 reverse 1 \- |
---|
3902 | |
---|
3903 | fastaFromBed \-fi test.fa \-bed test.bed \-s \-name \-fo test.fa.out |
---|
3904 | |
---|
3905 | cat test.fa.out |
---|
3906 | >forward |
---|
3907 | CGCTA |
---|
3908 | >reverse |
---|
3909 | TAGCG |
---|
3910 | .ft P |
---|
3911 | .fi |
---|
3912 | .UNINDENT |
---|
3913 | .UNINDENT |
---|
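Reverse complementing is what distinguishes the "reverse" record from the "forward" record above. The sketch below shows the idea in Python; the helper names and the shortened toy sequence are assumptions for illustration, and only the four unambiguous bases are handled.

```python
# Translation table for complementing upper- and lower-case bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_stranded(seq, start, end, strand):
    """Extract [start, end) and reverse complement it for '-' features."""
    bases = seq[start:end]
    return reverse_complement(bases) if strand == "-" else bases


# Hypothetical toy sequence: 8 A's, 13 C's, then GCTACT.
toy = "A" * 8 + "C" * 13 + "GCTACT"
print(extract_stranded(toy, 20, 25, "+"))  # CGCTA
print(extract_stranded(toy, 20, 25, "-"))  # TAGCG
```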
3914 | .SS 5.12 maskFastaFromBed |
---|
3915 | .sp |
---|
3916 | \fBmaskFastaFromBed\fP masks sequences in a FASTA file based on intervals defined in a feature file. The |
---|
3917 | headers in the input FASTA file must exactly match the chromosome column in the feature file. This |
---|
3918 | may be useful for creating your own masked genome file based on custom annotations or for masking all |
---|
3919 | but your target regions when aligning sequence data from a targeted capture experiment. |
---|
3920 | .SS 5.12.1 Usage and option summary |
---|
3921 | .sp |
---|
3922 | Usage: |
---|
3923 | .INDENT 0.0 |
---|
3924 | .INDENT 3.5 |
---|
3925 | .sp |
---|
3926 | .nf |
---|
3927 | .ft C |
---|
3928 | maskFastaFromBed [OPTIONS] \-fi <input FASTA> \-bed <BED/GFF/VCF> \-fo <output FASTA> |
---|
3929 | .ft P |
---|
3930 | .fi |
---|
3931 | .UNINDENT |
---|
3932 | .UNINDENT |
---|
3933 | .sp |
---|
3934 | NOTE: The input and output FASTA files must be different. |
---|
3935 | .TS |
---|
3936 | center; |
---|
3937 | |l|l|. |
---|
3938 | _ |
---|
3939 | T{ |
---|
3940 | Option |
---|
3941 | T} T{ |
---|
3942 | Description |
---|
3943 | T} |
---|
3944 | _ |
---|
3945 | T{ |
---|
3946 | \fB\-soft\fP |
---|
3947 | T} T{ |
---|
3948 | Soft\-mask (that is, convert to lower\-case bases) the FASTA sequence. \fIBy default, hard\-masking (that is, conversion to Ns) is performed\fP\&. |
---|
3949 | T} |
---|
3950 | _ |
---|
3951 | .TE |
---|
3952 | .SS 5.12.2 Default behavior |
---|
3953 | .sp |
---|
3954 | \fBmaskFastaFromBed\fP will mask a FASTA file based on the intervals in a BED file. The newly masked |
---|
3955 | FASTA file is written to the output FASTA file. |
---|
3956 | .sp |
---|
3957 | For example: |
---|
3958 | .INDENT 0.0 |
---|
3959 | .INDENT 3.5 |
---|
3960 | .sp |
---|
3961 | .nf |
---|
3962 | .ft C |
---|
3963 | cat test.fa |
---|
3964 | >chr1 |
---|
3965 | AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG |
---|
3966 | |
---|
3967 | cat test.bed |
---|
3968 | chr1 5 10 |
---|
3969 | |
---|
3970 | maskFastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out |
---|
3971 | |
---|
3972 | cat test.fa.out |
---|
3973 | >chr1 |
---|
3974 | AAAAANNNNNCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG |
---|
3975 | .ft P |
---|
3976 | .fi |
---|
3977 | .UNINDENT |
---|
3978 | .UNINDENT |
---|
3979 | .SS 5.12.3 Soft\-masking the FASTA file. |
---|
3980 | .sp |
---|
3981 | Using the \fB\-soft\fP option, one can optionally "soft\-mask" the FASTA file. |
---|
3982 | .sp |
---|
3983 | For example: |
---|
3984 | .INDENT 0.0 |
---|
3985 | .INDENT 3.5 |
---|
3986 | .sp |
---|
3987 | .nf |
---|
3988 | .ft C |
---|
3989 | cat test.fa |
---|
3990 | >chr1 |
---|
3991 | AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG |
---|
3992 | |
---|
3993 | cat test.bed |
---|
3994 | chr1 5 10 |
---|
3995 | |
---|
3996 | maskFastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out \-soft |
---|
3997 | |
---|
3998 | cat test.fa.out |
---|
3999 | >chr1 |
---|
4000 | AAAAAaaaccCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG |
---|
4001 | .ft P |
---|
4002 | .fi |
---|
4003 | .UNINDENT |
---|
4004 | .UNINDENT |
---|
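The difference between hard\-masking and soft\-masking can be sketched as follows. This is a minimal Python illustration under the assumption of a single 0\-based, half\-open interval; the function name and the short input string are hypothetical and are not part of maskFastaFromBed.

```python
def mask(seq, start, end, soft=False):
    """Mask the interval [start, end) of seq.

    Hard-masking replaces the bases with N; soft-masking lower-cases them.
    """
    region = seq[start:end]
    masked = region.lower() if soft else "N" * len(region)
    return seq[:start] + masked + seq[end:]


print(mask("AAAAAAAACCCC", 5, 10))             # AAAAANNNNNCC
print(mask("AAAAAAAACCCC", 5, 10, soft=True))  # AAAAAaaaccCC
```

Either way the sequence length is unchanged, so downstream coordinates remain valid.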
4005 | .SS 5.13 shuffleBed |
---|
4006 | .sp |
---|
4007 | \fBshuffleBed\fP will randomly permute the genomic locations of a feature file among a genome defined in a |
---|
4008 | genome file. One can also provide an "exclusions" BED/GFF/VCF file that lists regions where you do |
---|
4009 | not want the permuted features to be placed. For example, one might want to prevent features from |
---|
4010 | being placed in known genome gaps. \fBshuffleBed\fP is useful as a \fInull\fP basis against which to test the |
---|
4011 | significance of associations of one feature with another. |
---|
4012 | .SS 5.13.1 Usage and option summary |
---|
4013 | .sp |
---|
4014 | Usage: |
---|
4015 | .INDENT 0.0 |
---|
4016 | .INDENT 3.5 |
---|
4017 | .sp |
---|
4018 | .nf |
---|
4019 | .ft C |
---|
4020 | shuffleBed [OPTIONS] \-i <BED/GFF/VCF> \-g <GENOME> |
---|
4021 | .ft P |
---|
4022 | .fi |
---|
4023 | .UNINDENT |
---|
4024 | .UNINDENT |
---|
4025 | .TS |
---|
4026 | center; |
---|
4027 | |l|l|. |
---|
4028 | _ |
---|
4029 | T{ |
---|
4030 | Option |
---|
4031 | T} T{ |
---|
4032 | Description |
---|
4033 | T} |
---|
4034 | _ |
---|
4035 | T{ |
---|
4036 | \fB\-excl\fP |
---|
4037 | T} T{ |
---|
4038 | A BED file of coordinates in which features from \-i should \fInot\fP be placed (e.g., genome gaps). |
---|
4039 | T} |
---|
4040 | _ |
---|
4041 | T{ |
---|
4042 | \fB\-chrom\fP |
---|
4043 | T} T{ |
---|
4044 | Keep features in \-i on the same chromosome. Solely permute their location on the chromosome. \fIBy default, both the chromosome and position are randomly chosen\fP\&. |
---|
4045 | T} |
---|
4046 | _ |
---|
4047 | T{ |
---|
4048 | \fB\-seed\fP |
---|
4049 | T} T{ |
---|
4050 | Supply an integer seed for the shuffling. This will allow feature shuffling experiments to be recreated exactly as the seed for the pseudo\-random number generation will be constant. \fIBy default, the seed is chosen automatically\fP\&. |
---|
4051 | T} |
---|
4052 | _ |
---|
4053 | .TE |
---|
4054 | .SS 5.13.2 Default behavior |
---|
4055 | .sp |
---|
4056 | By default, \fBshuffleBed\fP will reposition each feature in the input BED file on a random chromosome at a |
---|
4057 | random position. The size and strand of each feature are preserved. |
---|
4058 | .sp |
---|
4059 | For example: |
---|
4060 | .INDENT 0.0 |
---|
4061 | .INDENT 3.5 |
---|
4062 | .sp |
---|
4063 | .nf |
---|
4064 | .ft C |
---|
4065 | cat A.bed |
---|
4066 | chr1 0 100 a1 1 + |
---|
4067 | chr1 0 1000 a2 2 \- |
---|
4068 | |
---|
4069 | cat my.genome |
---|
4070 | chr1 10000 |
---|
4071 | chr2 8000 |
---|
4072 | chr3 5000 |
---|
4073 | chr4 2000 |
---|
4074 | |
---|
4075 | shuffleBed \-i A.bed \-g my.genome |
---|
4076 | chr4 1498 1598 a1 1 + |
---|
4077 | chr3 2156 3156 a2 2 \- |
---|
4078 | .ft P |
---|
4079 | .fi |
---|
4080 | .UNINDENT |
---|
4081 | .UNINDENT |
---|
4082 | .SS 5.13.3 (\-chrom) Requiring that features be shuffled on the same chromosome |
---|
4083 | .sp |
---|
4084 | The "\fB\-chrom\fP" option behaves the same as the default behavior except that features are randomly |
---|
4085 | placed on the same chromosome as defined in the BED file. |
---|
4086 | .sp |
---|
4087 | For example: |
---|
4088 | .INDENT 0.0 |
---|
4089 | .INDENT 3.5 |
---|
4090 | .sp |
---|
4091 | .nf |
---|
4092 | .ft C |
---|
4093 | cat A.bed |
---|
4094 | chr1 0 100 a1 1 + |
---|
4095 | chr1 0 1000 a2 2 \- |
---|
4096 | |
---|
4097 | cat my.genome |
---|
4098 | chr1 10000 |
---|
4099 | chr2 8000 |
---|
4100 | chr3 5000 |
---|
4101 | chr4 2000 |
---|
4102 | |
---|
4103 | shuffleBed \-i A.bed \-g my.genome \-chrom |
---|
4104 | chr1 9560 9660 a1 1 + |
---|
4105 | chr1 7258 8258 a2 2 \- |
---|
4106 | .ft P |
---|
4107 | .fi |
---|
4108 | .UNINDENT |
---|
4109 | .UNINDENT |
---|
4110 | .SS 5.13.4 Excluding certain genome regions from shuffleBed |
---|
4111 | .sp |
---|
4112 | One may want to prevent BED features from being placed in certain regions of the genome. For |
---|
4113 | example, one may want to exclude genome gaps from a permutation experiment. The "\fB\-excl\fP" option |
---|
4114 | defines a BED file of regions that should be excluded. \fBshuffleBed\fP will attempt to permute the |
---|
4115 | locations of all features while adhering to the exclusion rules. However, it will stop looking for an |
---|
4116 | appropriate location if it cannot find a valid spot for a feature after 1,000,000 tries. |
---|
4117 | .sp |
---|
4118 | For example (\fInote that the exclude file excludes all but 100 base pairs of the chromosome\fP): |
---|
4119 | .INDENT 0.0 |
---|
4120 | .INDENT 3.5 |
---|
4121 | .sp |
---|
4122 | .nf |
---|
4123 | .ft C |
---|
4124 | cat A.bed |
---|
4125 | chr1 0 100 a1 1 + |
---|
4126 | chr1 0 1000 a2 2 \- |
---|
4127 | |
---|
4128 | cat my.genome |
---|
4129 | chr1 10000 |
---|
4130 | |
---|
4131 | cat exclude.bed |
---|
4132 | chr1 100 10000 |
---|
4133 | |
---|
4134 | shuffleBed \-i A.bed \-g my.genome \-excl exclude.bed |
---|
4135 | chr1 0 100 a1 1 + |
---|
4136 | Error, line 2: tried 1000000 potential loci for entry, but could not avoid excluded |
---|
4137 | regions. Ignoring entry and moving on. |
---|
4138 | .ft P |
---|
4139 | .fi |
---|
4140 | .UNINDENT |
---|
4141 | .UNINDENT |
---|
4142 | .sp |
---|
4143 | For example (\fInow the exclusion file only excludes the first 100 bases of the chromosome\fP): |
---|
4144 | .INDENT 0.0 |
---|
4145 | .INDENT 3.5 |
---|
4146 | .sp |
---|
4147 | .nf |
---|
4148 | .ft C |
---|
4149 | cat A.bed |
---|
4150 | chr1 0 100 a1 1 + |
---|
4151 | chr1 0 1000 a2 2 \- |
---|
4152 | |
---|
4153 | cat my.genome |
---|
4154 | chr1 10000 |
---|
4155 | |
---|
4156 | cat exclude.bed |
---|
4157 | chr1 0 100 |
---|
4158 | |
---|
4159 | shuffleBed \-i A.bed \-g my.genome \-excl exclude.bed |
---|
4160 | chr1 147 247 a1 1 + |
---|
4161 | chr1 2441 3441 a2 2 \- |
---|
4162 | .ft P |
---|
4163 | .fi |
---|
4164 | .UNINDENT |
---|
4165 | .UNINDENT |
---|
4166 | .SS 5.13.5 Defining a "seed" for the random replacement. |
---|
4167 | .sp |
---|
4168 | \fBshuffleBed\fP uses a pseudo\-random number generator to permute the locations of BED features. |
---|
4169 | Therefore, each run should produce a different result. This can be problematic if one wants to exactly |
---|
4170 | recreate an experiment. By using the "\fB\-seed\fP" option, one can supply a custom integer seed for |
---|
4171 | \fBshuffleBed\fP\&. In turn, each execution of \fBshuffleBed\fP with the same seed and input files should produce |
---|
4172 | identical results. |
---|
4173 | .sp |
---|
4174 | For example: |
---|
4175 | .INDENT 0.0 |
---|
4176 | .INDENT 3.5 |
---|
4177 | .sp |
---|
4178 | .nf |
---|
4179 | .ft C |
---|
4180 | cat A.bed |
---|
4181 | chr1 0 100 a1 1 + |
---|
4182 | chr1 0 1000 a2 2 \- |
---|
4183 | |
---|
4184 | cat my.genome |
---|
4185 | chr1 10000 |
---|
4186 | |
---|
4187 | shuffleBed \-i A.bed \-g my.genome \-seed 927442958 |
---|
4188 | chr1 6177 6277 a1 1 + |
---|
4189 | chr1 8119 9119 a2 2 \- |
---|
4190 | |
---|
4191 | shuffleBed \-i A.bed \-g my.genome \-seed 927442958 |
---|
4192 | chr1 6177 6277 a1 1 + |
---|
4193 | chr1 8119 9119 a2 2 \- |
---|
4194 | |
---|
4195 | \&. . . |
---|
4196 | |
---|
4197 | shuffleBed \-i A.bed \-g my.genome \-seed 927442958 |
---|
4198 | chr1 6177 6277 a1 1 + |
---|
4199 | chr1 8119 9119 a2 2 \- |
---|
4200 | .ft P |
---|
4201 | .fi |
---|
4202 | .UNINDENT |
---|
4203 | .UNINDENT |
---|
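The interplay of seeding, exclusion, and giving up after a fixed number of tries can be sketched in Python. This is not the shuffleBed algorithm or its random number generator: the function names, the uniform choice of chromosome, and the small default for max_tries are all assumptions made for illustration (the text above states that shuffleBed itself tries 1,000,000 times).

```python
import random


def overlaps(start, end, excluded):
    """True if [start, end) intersects any excluded [ex_start, ex_end)."""
    return any(start < ex_end and ex_start < end for ex_start, ex_end in excluded)


def shuffle_feature(length, chrom_sizes, rng, excluded=None, max_tries=1000):
    """Place a feature of `length` bases at a random locus.

    Returns (chrom, start, end), or None if no valid locus is found
    within max_tries attempts.
    """
    excluded = excluded or {}
    chroms = sorted(chrom_sizes)
    for _ in range(max_tries):
        chrom = rng.choice(chroms)
        if chrom_sizes[chrom] < length:
            continue  # feature cannot fit on this chromosome
        start = rng.randint(0, chrom_sizes[chrom] - length)
        end = start + length
        if not overlaps(start, end, excluded.get(chrom, [])):
            return chrom, start, end
    return None


sizes = {"chr1": 10000}
first = shuffle_feature(1000, sizes, random.Random(927442958))
second = shuffle_feature(1000, sizes, random.Random(927442958))
print(first == second)  # True: the same seed gives the same placement
```

Passing a seeded generator in, rather than seeding a global one, keeps each run reproducible without affecting other code that uses random numbers.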
4204 | .SS 5.14 slopBed |
---|
4205 | .sp |
---|
4206 | \fBslopBed\fP will increase the size of each feature in a feature file by a user\-defined number of bases. While |
---|
4207 | something like this could be done with an "\fBawk \(aq{OFS="\et"; print $1,$2\-<slop>,$3+<slop>}\(aq\fP", |
---|
4208 | \fBslopBed\fP will restrict the resizing to the size of the chromosome (i.e. no start < 0 and no end > |
---|
4209 | chromosome size). |
---|
4210 | .SS 5.14.1 Usage and option summary |
---|
4211 | .sp |
---|
4212 | Usage: |
---|
4213 | .INDENT 0.0 |
---|
4214 | .INDENT 3.5 |
---|
4215 | .sp |
---|
4216 | .nf |
---|
4217 | .ft C |
---|
4218 | slopBed [OPTIONS] \-i <BED/GFF/VCF> \-g <GENOME> [\-b or (\-l and \-r)] |
---|
4219 | .ft P |
---|
4220 | .fi |
---|
4221 | .UNINDENT |
---|
4222 | .UNINDENT |
---|
4223 | .TS |
---|
4224 | center; |
---|
4225 | |l|l|. |
---|
4226 | _ |
---|
4227 | T{ |
---|
4228 | Option |
---|
4229 | T} T{ |
---|
4230 | Description |
---|
4231 | T} |
---|
4232 | _ |
---|
4233 | T{ |
---|
4234 | \fB\-b\fP |
---|
4235 | T} T{ |
---|
4236 | Increase the BED/GFF/VCF entry by the same number of base pairs in each direction. \fIInteger\fP\&. |
---|
4237 | T} |
---|
4238 | _ |
---|
4239 | T{ |
---|
4240 | \fB\-l\fP |
---|
4241 | T} T{ |
---|
4242 | The number of base pairs to subtract from the start coordinate. \fIInteger\fP\&. |
---|
4243 | T} |
---|
4244 | _ |
---|
4245 | T{ |
---|
4246 | \fB\-r\fP |
---|
4247 | T} T{ |
---|
4248 | The number of base pairs to add to the end coordinate. \fIInteger\fP\&. |
---|
4249 | T} |
---|
4250 | _ |
---|
4251 | T{ |
---|
4252 | \fB\-s\fP |
---|
4253 | T} T{ |
---|
4254 | Define \-l and \-r based on strand. For example, if used, \-l 500 for a negative\-stranded feature will add 500 bp to the \fIend\fP coordinate. |
---|
4255 | T} |
---|
4256 | _ |
---|
4257 | .TE |
---|
4258 | .SS 5.14.2 Default behavior |
---|
4259 | .sp |
---|
4260 | By default, \fBslopBed\fP will either add a fixed number of bases in each direction (\fB\-b\fP) or an asymmetric |
---|
4261 | number of bases in each direction (\fB\-l\fP and \fB\-r\fP). |
---|
4262 | .sp |
---|
4263 | For example: |
---|
4264 | .INDENT 0.0 |
---|
4265 | .INDENT 3.5 |
---|
4266 | .sp |
---|
4267 | .nf |
---|
4268 | .ft C |
---|
4269 | cat A.bed |
---|
4270 | chr1 5 100 |
---|
4271 | chr1 800 980 |
---|
4272 | |
---|
4273 | cat my.genome |
---|
4274 | chr1 1000 |
---|
4275 | |
---|
4276 | slopBed \-i A.bed \-g my.genome \-b 5 |
---|
4277 | chr1 0 105 |
---|
4278 | chr1 795 985 |
---|
4279 | |
---|
4280 | slopBed \-i A.bed \-g my.genome \-l 2 \-r 3 |
---|
4281 | chr1 3 103 |
---|
4282 | chr1 798 983 |
---|
4283 | .ft P |
---|
4284 | .fi |
---|
4285 | .UNINDENT |
---|
4286 | .UNINDENT |
---|
4287 | .sp |
---|
4288 | However, if the requested number of bases exceeds the boundaries of the chromosome, \fBslopBed\fP will |
---|
4289 | "clip" the feature accordingly. |
---|
4290 | .INDENT 0.0 |
---|
4291 | .INDENT 3.5 |
---|
4292 | .sp |
---|
4293 | .nf |
---|
4294 | .ft C |
---|
4295 | cat A.bed |
---|
4296 | chr1 5 100 |
---|
4297 | chr1 800 980 |
---|
4298 | |
---|
4299 | cat my.genome |
---|
4300 | chr1 1000 |
---|
4301 | |
---|
4302 | slopBed \-i A.bed \-g my.genome \-b 5000 |
---|
4303 | chr1 0 1000 |
---|
4304 | chr1 0 1000 |
---|
4305 | .ft P |
---|
4306 | .fi |
---|
4307 | .UNINDENT |
---|
4308 | .UNINDENT |
---|
4309 | .SS 5.14.3 Resizing features according to strand |
---|
4310 | .sp |
---|
4311 | \fBslopBed\fP will optionally increase the size of a feature based on strand. |
---|
4312 | .sp |
---|
4313 | For example: |
---|
4314 | .INDENT 0.0 |
---|
4315 | .INDENT 3.5 |
---|
4316 | .sp |
---|
4317 | .nf |
---|
4318 | .ft C |
---|
4319 | cat A.bed |
---|
4320 | chr1 100 200 a1 1 + |
---|
4321 | chr1 100 200 a2 2 \- |
---|
4322 | |
---|
4323 | cat my.genome |
---|
4324 | chr1 1000 |
---|
4325 | |
---|
4326 | slopBed \-i A.bed \-g my.genome \-l 50 \-r 80 \-s |
---|
4327 | chr1 50 280 a1 1 + |
---|
4328 | chr1 20 250 a2 2 \- |
---|
4329 | .ft P |
---|
4330 | .fi |
---|
4331 | .UNINDENT |
---|
4332 | .UNINDENT |
---|
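The clipping and strand rules shown in the examples above reduce to a short calculation. The Python sketch below is an illustration only; the function and parameter names are hypothetical, and it assumes that a negative\-strand feature simply swaps the left and right amounts when strand awareness is requested.

```python
def slop(start, end, chrom_size, left, right, strand="+", by_strand=False):
    """Grow [start, end) by `left` and `right` bases, clipped to the chromosome.

    With by_strand=True, `left` and `right` are swapped for '-' features,
    so `left` always extends the feature upstream.
    """
    if by_strand and strand == "-":
        left, right = right, left
    return max(0, start - left), min(chrom_size, end + right)


print(slop(5, 100, 1000, 5, 5))                               # (0, 105)
print(slop(800, 980, 1000, 5000, 5000))                       # (0, 1000)
print(slop(100, 200, 1000, 50, 80, "-", by_strand=True))      # (20, 250)
```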
4333 | .SS 5.15 sortBed |
---|
4334 | .sp |
---|
4335 | \fBsortBed\fP sorts a feature file by chromosome and other criteria. |
---|
4336 | .SS 5.15.1 Usage and option summary |
---|
4337 | .sp |
---|
4338 | Usage: |
---|
4339 | .INDENT 0.0 |
---|
4340 | .INDENT 3.5 |
---|
4341 | .sp |
---|
4342 | .nf |
---|
4343 | .ft C |
---|
4344 | sortBed [OPTIONS] \-i <BED/GFF/VCF> |
---|
4345 | .ft P |
---|
4346 | .fi |
---|
4347 | .UNINDENT |
---|
4348 | .UNINDENT |
---|
4349 | .TS |
---|
4350 | center; |
---|
4351 | |l|l|. |
---|
4352 | _ |
---|
4353 | T{ |
---|
4354 | Option |
---|
4355 | T} T{ |
---|
4356 | Description |
---|
4357 | T} |
---|
4358 | _ |
---|
4359 | T{ |
---|
4360 | \fB\-sizeA\fP |
---|
4361 | T} T{ |
---|
4362 | Sort by feature size in ascending order. |
---|
4363 | T} |
---|
4364 | _ |
---|
4365 | T{ |
---|
4366 | \fB\-sizeD\fP |
---|
4367 | T} T{ |
---|
4368 | Sort by feature size in descending order. |
---|
4369 | T} |
---|
4370 | _ |
---|
4371 | T{ |
---|
4372 | \fB\-chrThenSizeA\fP |
---|
4373 | T} T{ |
---|
4374 | Sort by chromosome, then by feature size (asc). |
---|
4375 | T} |
---|
4376 | _ |
---|
4377 | T{ |
---|
4378 | \fB\-chrThenSizeD\fP |
---|
4379 | T} T{ |
---|
4380 | Sort by chromosome, then by feature size (desc). |
---|
4381 | T} |
---|
4382 | _ |
---|
4383 | T{ |
---|
4384 | \fB\-chrThenScoreA\fP |
---|
4385 | T} T{ |
---|
4386 | Sort by chromosome, then by score (asc). |
---|
4387 | T} |
---|
4388 | _ |
---|
4389 | T{ |
---|
4390 | \fB\-chrThenScoreD\fP |
---|
4391 | T} T{ |
---|
4392 | Sort by chromosome, then by score (desc). |
---|
4393 | T} |
---|
4394 | _ |
---|
4395 | .TE |
---|
4396 | .SS 5.15.2 Default behavior |
---|
4397 | .sp |
---|
4398 | By default, \fBsortBed\fP sorts a BED file by chromosome and then by start position in ascending order. |
---|
4399 | .sp |
---|
4400 | For example: |
---|
4401 | .INDENT 0.0 |
---|
4402 | .INDENT 3.5 |
---|
4403 | .sp |
---|
4404 | .nf |
---|
4405 | .ft C |
---|
4406 | cat A.bed |
---|
4407 | chr1 800 1000 |
---|
4408 | chr1 80 180 |
---|
4409 | chr1 1 10 |
---|
4410 | chr1 750 10000 |
---|
4411 | |
---|
4412 | sortBed \-i A.bed |
---|
4413 | chr1 1 10 |
---|
4414 | chr1 80 180 |
---|
4415 | chr1 750 10000 |
---|
4416 | chr1 800 1000 |
---|
4417 | .ft P |
---|
4418 | .fi |
---|
4419 | .UNINDENT |
---|
4420 | .UNINDENT |
---|
4421 | .SS 5.15.3 Optional sorting behavior |
---|
4422 | .sp |
---|
4423 | \fBsortBed\fP can also sort a BED file by feature size, or by chromosome and then by other criteria. |
---|
4424 | .sp |
---|
4425 | For example, to sort by feature size (in descending order): |
---|
4426 | .INDENT 0.0 |
---|
4427 | .INDENT 3.5 |
---|
4428 | .sp |
---|
4429 | .nf |
---|
4430 | .ft C |
---|
4431 | cat A.bed |
---|
4432 | chr1 800 1000 |
---|
4433 | chr1 80 180 |
---|
4434 | chr1 1 10 |
---|
4435 | chr1 750 10000 |
---|
4436 | |
---|
4437 | sortBed \-i A.bed \-sizeD |
---|
4438 | chr1 750 10000 |
---|
4439 | chr1 800 1000 |
---|
4440 | chr1 80 180 |
---|
4441 | chr1 1 10 |
---|
4442 | .ft P |
---|
4443 | .fi |
---|
4444 | .UNINDENT |
---|
4445 | .UNINDENT |
---|
4446 | .sp |
---|
4447 | \fBDisclaimer:\fP it should be noted that \fBsortBed\fP is merely a convenience utility, as the UNIX sort utility |
---|
4448 | will sort BED files more quickly while using less memory. For example, UNIX sort will sort a BED file |
---|
4449 | by chromosome then by start position in the following manner: |
---|
4450 | .INDENT 0.0 |
---|
4451 | .INDENT 3.5 |
---|
4452 | .sp |
---|
4453 | .nf |
---|
4454 | .ft C |
---|
4455 | sort \-k1,1 \-k2,2n A.bed |
---|
4456 | chr1 1 10 |
---|
4457 | chr1 80 180 |
---|
4458 | chr1 750 10000 |
---|
4459 | chr1 800 1000 |
---|
4460 | .ft P |
---|
4461 | .fi |
---|
4462 | .UNINDENT |
---|
4463 | .UNINDENT |
---|
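The two orderings demonstrated in this section correspond to two sort keys. The Python sketch below is an illustration, not the sortBed implementation; the function names are hypothetical, and chromosome names are compared as plain strings, which is an assumption.

```python
def sort_default(features):
    """Sort (chrom, start, end) tuples by chromosome, then by start."""
    return sorted(features, key=lambda f: (f[0], f[1]))


def sort_size_desc(features):
    """Sort (chrom, start, end) tuples by feature size, largest first."""
    return sorted(features, key=lambda f: f[2] - f[1], reverse=True)


features = [
    ("chr1", 800, 1000),
    ("chr1", 80, 180),
    ("chr1", 1, 10),
    ("chr1", 750, 10000),
]
for chrom, start, end in sort_default(features):
    print(chrom, start, end, sep="\t")
```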
4464 | .SS 5.16 linksBed |
---|
4465 | .sp |
---|
4466 | Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / |
---|
4467 | intervals in a file. This is useful for cases when one wants to manually inspect a large set of |
---|
4468 | annotations or features. |
---|
4469 | .SS 5.16.1 Usage and option summary |
---|
4470 | .sp |
---|
4471 | Usage: |
---|
4472 | .INDENT 0.0 |
---|
4473 | .INDENT 3.5 |
---|
4474 | .sp |
---|
4475 | .nf |
---|
4476 | .ft C |
---|
4477 | linksBed [OPTIONS] \-i <BED/GFF/VCF> > <HTML file> |
---|
4478 | .ft P |
---|
4479 | .fi |
---|
4480 | .UNINDENT |
---|
4481 | .UNINDENT |
---|
4482 | .TS |
---|
4483 | center; |
---|
4484 | |l|l|. |
---|
4485 | _ |
---|
4486 | T{ |
---|
4487 | Option |
---|
4488 | T} T{ |
---|
4489 | Description |
---|
4490 | T} |
---|
4491 | _ |
---|
4492 | T{ |
---|
4493 | \fB\-base\fP |
---|
4494 | T} T{ |
---|
4495 | The "basename" for the UCSC browser. \fIDefault: http://genome.ucsc.edu\fP |
---|
4496 | T} |
---|
4497 | _ |
---|
4498 | T{ |
---|
4499 | \fB\-org\fP |
---|
4500 | T} T{ |
---|
4501 | The organism (e.g. mouse, human). \fIDefault: human\fP |
---|
4502 | T} |
---|
4503 | _ |
---|
4504 | T{ |
---|
4505 | \fB\-db\fP |
---|
4506 | T} T{ |
---|
4507 | The genome build. \fIDefault: hg18\fP |
---|
4508 | T} |
---|
4509 | _ |
---|
4510 | .TE |
---|
4511 | .SS 5.16.2 Default behavior |
---|
4512 | .sp |
---|
4513 | By default, \fBlinksBed\fP creates links to the public UCSC Genome Browser. |
---|
4514 | .sp |
---|
4515 | For example: |
---|
4516 | .INDENT 0.0 |
---|
4517 | .INDENT 3.5 |
---|
4518 | .sp |
---|
4519 | .nf |
---|
4520 | .ft C |
---|
4521 | head genes.bed |
---|
4522 | chr21 9928613 10012791 uc002yip.1 0 \- |
---|
4523 | chr21 9928613 10012791 uc002yiq.1 0 \- |
---|
4524 | chr21 9928613 10012791 uc002yir.1 0 \- |
---|
4525 | chr21 9928613 10012791 uc010gkv.1 0 \- |
---|
4526 | chr21 9928613 10061300 uc002yis.1 0 \- |
---|
4527 | chr21 10042683 10120796 uc002yit.1 0 \- |
---|
4528 | chr21 10042683 10120808 uc002yiu.1 0 \- |
---|
4529 | chr21 10079666 10120808 uc002yiv.1 0 \- |
---|
4530 | chr21 10080031 10081687 uc002yiw.1 0 \- |
---|
4531 | chr21 10081660 10120796 uc002yix.2 0 \- |
---|
4532 | |
---|
4533 | linksBed \-i genes.bed > genes.html |
---|
4534 | .ft P |
---|
4535 | .fi |
---|
4536 | .UNINDENT |
---|
4537 | .UNINDENT |
---|
4538 | .sp |
---|
4539 | When genes.html is opened in a web browser, one should see a page of links, each built from |
---|
4540 | one of the features in genes.bed. |
---|
4541 | .SS 5.16.3 Creating HTML links to a local UCSC Browser installation |
---|
4542 | .sp |
---|
4543 | Optionally, \fBlinksBed\fP will create links to a local copy of the UCSC Genome Browser. |
---|
4544 | .sp |
---|
4545 | For example: |
---|
4546 | .INDENT 0.0 |
---|
4547 | .INDENT 3.5 |
---|
4548 | .sp |
---|
4549 | .nf |
---|
4550 | .ft C |
---|
4551 | head \-3 genes.bed |
---|
4552 | chr21 9928613 10012791 uc002yip.1 0 \- |
---|
4553 | chr21 9928613 10012791 uc002yiq.1 0 \- |
---|
4554 | |
---|
4555 | linksBed \-i genes.bed \-base http://mirror.uni.edu > genes.html |
---|
4556 | .ft P |
---|
4557 | .fi |
---|
4558 | .UNINDENT |
---|
4559 | .UNINDENT |
---|
4560 | .sp |
---|
4561 | One can point the links to the appropriate organism and genome build as well: |
---|
4562 | .INDENT 0.0 |
---|
4563 | .INDENT 3.5 |
---|
4564 | .sp |
---|
4565 | .nf |
---|
4566 | .ft C |
---|
4567 | head \-3 genes.bed |
---|
4568 | chr21 9928613 10012791 uc002yip.1 0 \- |
---|
4569 | chr21 9928613 10012791 uc002yiq.1 0 \- |
---|
4570 | |
---|
4571 | linksBed \-i genes.bed \-base http://mirror.uni.edu \-org mouse \-db mm9 > genes.html |
---|
4572 | .ft P |
---|
4573 | .fi |
---|
4574 | .UNINDENT |
---|
4575 | .UNINDENT |
---|
4576 | .SS 5.17 complementBed |
---|
4577 | .sp |
---|
4578 | \fBcomplementBed\fP returns the intervals in a genome that are not covered by the features in a feature file. An |
---|
4579 | example usage of this tool would be to return the intervals of the genome that are not annotated as a |
---|
4580 | repeat. |
---|
4581 | .SS 5.17.1 Usage and option summary |
---|
4582 | .sp |
---|
4583 | Usage: |
---|
4584 | .INDENT 0.0 |
---|
4585 | .INDENT 3.5 |
---|
4586 | .sp |
---|
4587 | .nf |
---|
4588 | .ft C |
---|
4589 | complementBed [OPTIONS] \-i <BED/GFF/VCF> \-g <GENOME> |
---|
4590 | .ft P |
---|
4591 | .fi |
---|
4592 | .UNINDENT |
---|
4593 | .UNINDENT |
---|
4594 | .sp |
---|
4595 | \fBNo additional options.\fP |
---|
4596 | .SS 5.17.2 Default behavior |
---|
4597 | .sp |
---|
4598 | Figure: |
---|
4599 | .INDENT 0.0 |
---|
4600 | .INDENT 3.5 |
---|
4601 | .sp |
---|
4602 | .nf |
---|
4603 | .ft C |
---|
4604 | Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
---|
4605 | |
---|
4606 | BED FILE A ************* *************** ****************** |
---|
4607 | |
---|
4608 | Result === === ===== ======= |
---|
4609 | .ft P |
---|
4610 | .fi |
---|
4611 | .UNINDENT |
---|
4612 | .UNINDENT |
---|
4613 | .sp |
---|
4614 | For example: |
---|
4615 | .INDENT 0.0 |
---|
4616 | .INDENT 3.5 |
---|
4617 | .sp |
---|
4618 | .nf |
---|
4619 | .ft C |
---|
4620 | cat A.bed |
---|
4621 | chr1 100 200 |
---|
4622 | chr1 400 500 |
---|
4623 | chr1 500 800 |
---|
4624 | |
---|
4625 | cat my.genome |
---|
4626 | chr1 1000 |
---|
4627 | |
---|
4628 | complementBed \-i A.bed \-g my.genome |
---|
4629 | chr1 0 100 |
---|
4630 | chr1 200 400 |
---|
4631 | chr1 800 1000 |
---|
4632 | .ft P |
---|
4633 | .fi |
---|
4634 | .UNINDENT |
---|
4635 | .UNINDENT |
---|
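The complement of a set of intervals on one chromosome can be computed with a single sweep. The Python sketch below illustrates the idea and is not the complementBed implementation; the function name is hypothetical, and intervals are assumed to be 0\-based and half\-open.

```python
def complement(intervals, chrom_size):
    """Return the gaps in [0, chrom_size) not covered by any interval."""
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_size:
        gaps.append((cursor, chrom_size))
    return gaps


print(complement([(100, 200), (400, 500), (500, 800)], 1000))
# [(0, 100), (200, 400), (800, 1000)]
```

Tracking the furthest covered position with max() means overlapping or abutting input intervals do not produce empty or negative gaps.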
4636 | .SS 5.18 bedToBam |
---|
4637 | .sp |
---|
4638 | \fBbedToBam\fP converts features in a feature file to BAM format. This is useful as an efficient means of |
---|
4639 | storing large genome annotations in a compact, indexed format for visualization purposes. |
---|
4640 | .SS 5.18.1 Usage and option summary |
---|
4641 | .sp |
---|
4642 | Usage: |
---|
4643 | .INDENT 0.0 |
---|
4644 | .INDENT 3.5 |
---|
4645 | .sp |
---|
4646 | .nf |
---|
4647 | .ft C |
---|
4648 | bedToBam [OPTIONS] \-i <BED/GFF/VCF> \-g <GENOME> > <BAM> |
---|
4649 | .ft P |
---|
4650 | .fi |
---|
4651 | .UNINDENT |
---|
4652 | .UNINDENT |
---|
4653 | .TS |
---|
4654 | center; |
---|
4655 | |l|l|. |
---|
4656 | _ |
---|
4657 | T{ |
---|
4658 | Option |
---|
4659 | T} T{ |
---|
4660 | Description |
---|
4661 | T} |
---|
4662 | _ |
---|
4663 | T{ |
---|
4664 | \fB\-mapq\fP |
---|
4665 | T} T{ |
---|
4666 | Set a mapping quality (SAM MAPQ field) value for all BED entries. \fIDefault: 255\fP |
---|
4667 | T} |
---|
4668 | _ |
---|
4669 | T{ |
---|
4670 | \fB\-ubam\fP |
---|
4671 | T} T{ |
---|
4672 | Write uncompressed BAM output. The default is to write compressed BAM output. |
---|
4673 | T} |
---|
4674 | _ |
---|
4675 | T{ |
---|
4676 | \fB\-bed12\fP |
---|
4677 | T} T{ |
---|
4678 | Indicate that the input BED file is in BED12 (a.k.a. "blocked" BED) format. In this case, bedToBam will convert blocked BED features (e.g., gene annotations) into "spliced" BAM alignments by creating an appropriate CIGAR string. |
---|
4679 | T} |
---|
4680 | _ |
---|
4681 | .TE |
---|
4682 | .SS 5.18.2 Default behavior |
---|
4683 | .sp |
---|
4684 | The default behavior is to assume that the input file is in unblocked format. For example: |
---|
4685 | .INDENT 0.0 |
---|
4686 | .INDENT 3.5 |
---|
4687 | .sp |
---|
4688 | .nf |
---|
4689 | .ft C |
---|
4690 | head \-5 rmsk.hg18.chr21.bed |
---|
4691 | chr21 9719768 9721892 ALR/Alpha 1004 + |
---|
4692 | chr21 9721905 9725582 ALR/Alpha 1010 + |
---|
4693 | chr21 9725582 9725977 L1PA3 3288 + |
---|
4694 | chr21 9726021 9729309 ALR/Alpha 1051 + |
---|
4695 | chr21 9729320 9729809 L1PA3 3897 \- |
---|
4696 | |
---|
4697 | bedToBam \-i rmsk.hg18.chr21.bed \-g human.hg18.genome > rmsk.hg18.chr21.bam |
---|
4698 | |
---|
4699 | samtools view rmsk.hg18.chr21.bam | head \-5 |
---|
4700 | ALR/Alpha 0 chr21 9719769 255 2124M * 0 0 * * |
---|
4701 | ALR/Alpha 0 chr21 9721906 255 3677M * 0 0 * * |
---|
4702 | L1PA3 0 chr21 9725583 255 395M * 0 0 * * |
---|
4703 | ALR/Alpha 0 chr21 9726022 255 3288M * 0 0 * * |
---|
4704 | L1PA3 16 chr21 9729321 255 489M * 0 0 * * |
---|
4705 | .ft P |
---|
4706 | .fi |
---|
4707 | .UNINDENT |
---|
4708 | .UNINDENT |
---|
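The mapping from an unblocked BED feature to the alignment fields shown by samtools above can be sketched as follows. This Python illustration is not the bedToBam implementation; the function name is hypothetical. It relies on three facts visible in the example: the SAM position is the 0\-based BED start plus one, negative\-strand features get flag 16, and the CIGAR is a single match of the feature's length.

```python
def bed_to_sam_fields(chrom, start, end, name, strand, mapq=255):
    """Return the 11 mandatory SAM fields for one unblocked BED feature."""
    flag = 16 if strand == "-" else 0   # 16 marks a reverse-strand alignment
    pos = start + 1                     # BED is 0-based, SAM is 1-based
    cigar = f"{end - start}M"           # one match spanning the feature
    return [name, flag, chrom, pos, mapq, cigar, "*", 0, 0, "*", "*"]


fields = bed_to_sam_fields("chr21", 9719768, 9721892, "ALR/Alpha", "+")
print("\t".join(str(f) for f in fields))
```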
4709 | .SS 5.18.3 Creating "spliced" BAM entries from "blocked" BED features |
---|
4710 | .sp |
---|
4711 | Optionally, \fBbedToBam\fP will create spliced BAM entries from "blocked" BED features by using the |
---|
4712 | \-bed12 option. This will create CIGAR strings in the BAM output that will be displayed as "spliced" |
---|
4713 | alignments. The image illustrates this behavior, as the top track is a BAM representation (using |
---|
4714 | bedToBam) of a BED file of UCSC genes. |
---|
4715 | .sp |
---|
4716 | For example: |
---|
4717 | .INDENT 0.0 |
---|
4718 | .INDENT 3.5 |
---|
4719 | .sp |
---|
4720 | .nf |
---|
4721 | .ft C |
---|
4722 | bedToBam \-i knownGene.hg18.chr21.bed \-g human.hg18.genome \-bed12 > knownGene.bam |
---|
4723 | |
---|
4724 | samtools view knownGene.bam | head \-2 |
---|
4725 | uc002yip.1 16 chr21 9928614 255 |
---|
4726 | |
---|
4727 | 298M1784N71M1411N93M3963N80M1927N106M3608N81M1769N62M11856N89M98N82M816N61M6910N65M |
---|
4728 | 738N64M146N100M1647N120M6478N162M1485N51M6777N60M9274N54M880N54M1229N54M2377N54M112 |
---|
4729 | 68N58M2666N109M2885N158M * 0 0 * * |
---|
4730 | uc002yiq.1 16 chr21 9928614 255 |
---|
4731 | |
---|
4732 | 298M1784N71M1411N93M3963N80M1927N106M3608N81M1769N62M11856N89M98N82M816N61M6910N65M |
---|
4733 | 738N64M146N100M1647N120M6478N162M1485N51M6777N60M10208N54M1229N54M2377N54M11268N58M |
---|
4734 | 2666N109M2885N158M * 0 0 * * |
---|
4735 | .ft P |
---|
4736 | .fi |
---|
4737 | .UNINDENT |
---|
4738 | .UNINDENT |
---|
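A spliced CIGAR string can be derived from the BED12 block sizes and block starts: each block becomes a match (M) and each gap between consecutive blocks becomes a skipped region (N). The Python sketch below illustrates this; it is not the bedToBam implementation, the function name is hypothetical, and the block values are assumed for illustration (chosen so that the result matches the first operations of the CIGAR string shown above).

```python
def blocks_to_cigar(block_sizes, block_starts):
    """Build a CIGAR string from BED12 blocks.

    block_starts are offsets relative to the feature start, as in BED12.
    """
    ops = []
    prev_end = 0
    for size, start in zip(block_sizes, block_starts):
        if start > prev_end:
            ops.append(f"{start - prev_end}N")  # gap (e.g. an intron)
        ops.append(f"{size}M")                  # block (e.g. an exon)
        prev_end = start + size
    return "".join(ops)


# Hypothetical block sizes and starts for the first three blocks.
print(blocks_to_cigar([298, 71, 93], [0, 2082, 3564]))  # 298M1784N71M1411N93M
```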
4739 | .SS 5.19 overlap |
---|
4740 | .sp |
---|
4741 | \fBoverlap\fP computes the amount of overlap (in the case of positive values) or distance (in the case of |
---|
4742 | negative values) between feature coordinates occurring on the same input line and reports the result at |
---|
4743 | the end of the same line. In this way, it is a useful method for computing custom overlap scores from |
---|
4744 | the output of other BEDTools. |
---|
4745 | .SS 5.19.1 Usage and option summary |
---|
4746 | .sp |
---|
4747 | Usage: |
---|
4748 | .INDENT 0.0 |
---|
4749 | .INDENT 3.5 |
---|
4750 | .sp |
---|
4751 | .nf |
---|
4752 | .ft C |
---|
4753 | overlap [OPTIONS] \-i <input> \-cols s1,e1,s2,e2 |
---|
4754 | .ft P |
---|
4755 | .fi |
---|
4756 | .UNINDENT |
---|
4757 | .UNINDENT |
---|
4758 | .TS |
---|
4759 | center; |
---|
4760 | |l|l|. |
---|
4761 | _ |
---|
4762 | T{ |
---|
4763 | Option |
---|
4764 | T} T{ |
---|
4765 | Description |
---|
4766 | T} |
---|
4767 | _ |
---|
4768 | T{ |
---|
4769 | \fB\-i\fP |
---|
4770 | T} T{ |
---|
4771 | Input file. Use "stdin" for pipes. |
---|
4772 | T} |
---|
4773 | _ |
---|
4774 | T{ |
---|
4775 | \fB\-cols\fP |
---|
4776 | T} T{ |
---|
4777 | Specify the columns (1\-based) for the starts and ends of the features for which you\(aqd like to compute the overlap/distance. The columns must be listed in the following order: \fIstart1,end1,start2,end2\fP |
---|
4778 | T} |
---|
4779 | _ |
---|
4780 | .TE |
---|
4781 | .SS 5.19.2 Default behavior |
---|
4782 | .sp |
---|
4783 | The default behavior is to compute the amount of overlap between the features you specify based on the |
---|
4784 | start and end coordinates. For example: |
---|
4785 | .INDENT 0.0 |
---|
4786 | .INDENT 3.5 |
---|
4787 | .sp |
---|
4788 | .nf |
---|
4789 | .ft C |
---|
4790 | windowBed \-a A.bed \-b B.bed \-w 10 |
---|
4791 | chr1 10 20 A chr1 15 25 B |
---|
4792 | chr1 10 20 C chr1 25 35 D |
---|
4793 | .ft P |
---|
4794 | .fi |
---|
4795 | .UNINDENT |
---|
4796 | .UNINDENT |
---|
4797 | .sp |
---|
4798 | Now let\(aqs say we want to compute the number of base pairs of overlap (or, for features that do not |
---|
4799 | overlap, the distance, reported as a negative value) between the paired features in the output of windowBed. |
---|
4800 | .INDENT 0.0 |
---|
4801 | .INDENT 3.5 |
---|
4802 | .sp |
---|
4803 | .nf |
---|
4804 | .ft C |
---|
4805 | windowBed \-a A.bed \-b B.bed \-w 10 | overlap \-i stdin \-cols 2,3,6,7 |
---|
4806 | chr1 10 20 A chr1 15 25 B 5 |
---|
4807 | chr1 10 20 C chr1 25 35 D \-5 |
---|
4808 | .ft P |
---|
4809 | .fi |
---|
4810 | .UNINDENT |
---|
4811 | .UNINDENT |
---|
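The values appended above are consistent with a simple formula: the smaller of the two ends minus the larger of the two starts. The Python sketch below is a minimal illustration under that assumption, not the tool's source. The helper names are hypothetical, and each input line is represented as a list of fields.

```python
def overlap(start1, end1, start2, end2):
    """Return bases of overlap (positive) or the gap between features (negative)."""
    return min(end1, end2) - max(start1, start2)


def annotate(fields, cols):
    """Append the overlap to one line, given as a list of fields.

    cols holds 1-based column numbers in the order start1,end1,start2,end2.
    """
    s1, e1, s2, e2 = (int(fields[c - 1]) for c in cols)
    return fields + [str(overlap(s1, e1, s2, e2))]


rows = [
    ["chr1", "10", "20", "A", "chr1", "15", "25", "B"],
    ["chr1", "10", "20", "C", "chr1", "25", "35", "D"],
]
for row in rows:
    print(*annotate(row, (2, 3, 6, 7)))
```

The two rows yield 5 and -5, matching the example output.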
4812 | .SS 5.20 bedToIgv |
---|
4813 | .sp |
---|
4814 | \fBbedToIgv\fP creates an IGV (\fI\%http://www.broadinstitute.org/igv/\fP) batch script (see |
---|
4815 | \fI\%http://www.broadinstitute.org/igv/batch\fP for details) such that a "snapshot" will be taken at each feature in a |
---|
4816 | feature file. This is an efficient means of quickly collecting images of primary data at several |
---|
4817 | loci for subsequent screening, etc. |
---|
4818 | .sp |
---|
4819 | \fBNOTE: One must use IGV version 1.5 or higher.\fP |
---|
4820 | .SS 5.20.1 Usage and option summary |
---|
4821 | .sp |
---|
4822 | Usage: |
---|
4823 | .INDENT 0.0 |
---|
4824 | .INDENT 3.5 |
---|
4825 | .sp |
---|
4826 | .nf |
---|
4827 | .ft C |
---|
4828 | bedToIgv [OPTIONS] \-i <BED/GFF/VCF> > <igv.batch> |
---|
4829 | .ft P |
---|
4830 | .fi |
---|
4831 | .UNINDENT |
---|
4832 | .UNINDENT |
---|
4833 | .TS |
---|
4834 | center; |
---|
4835 | |l|l|. |
---|
4836 | _ |
---|
4837 | T{ |
---|
4838 | Option |
---|
4839 | T} T{ |
---|
4840 | Description |
---|
4841 | T} |
---|
4842 | _ |
---|
4843 | T{ |
---|
4844 | \fB\-path\fP |
---|
4845 | T} T{ |
---|
4846 | The full path to which the IGV snapshots should be written. \fIDefault: ./\fP |
---|
4847 | T} |
---|
4848 | _ |
---|
4849 | T{ |
---|
4850 | \fB\-sess\fP |
---|
4851 | T} T{ |
---|
4852 | The full path to an existing IGV session file to be loaded prior to taking snapshots. \fIDefault is for no session to be loaded and the assumption is that you already have IGV open and loaded with your relevant data prior to running the batch script\fP\&. |
---|
4853 | T} |
---|
4854 | _ |
---|
4855 | T{ |
---|
4856 | \fB\-sort\fP |
---|
4857 | T} T{ |
---|
4858 | The type of BAM sorting you would like to apply to each image. \fBValid sorting options\fP: \fIbase, position, strand, quality, sample, and readGroup\fP\&. \fIDefault is to apply no sorting at all\fP\&. |
---|
4859 | T} |
---|
4860 | _ |
---|
4861 | T{ |
---|
4862 | \fB\-clps\fP |
---|
4863 | T} T{ |
---|
4864 | Collapse the aligned reads prior to taking a snapshot. \fIDefault is to not collapse\fP\&. |
---|
4865 | T} |
---|
4866 | _ |
---|
4867 | T{ |
---|
4868 | \fB\-name\fP |
---|
4869 | T} T{ |
---|
4870 | Use the "name" field (column 4) for each image\(aqs filename. \fIDefault is to use the "chr:start\-pos.ext"\fP\&. |
---|
4871 | T} |
---|
4872 | _ |
---|
4873 | T{ |
---|
4874 | \fB\-slop\fP |
---|
4875 | T} T{ |
---|
4876 | Number of flanking base pairs on the left & right of the image. |
---|
4877 | T} |
---|
4878 | _ |
---|
4879 | T{ |
---|
4880 | \fB\-img\fP |
---|
4881 | T} T{ |
---|
4882 | The type of image to be created. \fBValid options\fP: \fIpng, eps, svg\fP\&. \fIDefault is png\fP\&. |
---|
4883 | T} |
---|
4884 | _ |
---|
4885 | .TE |
---|
4886 | .SS 5.20.2 Default behavior |
---|
4887 | .sp |
---|
4888 | For example: |
---|
4889 | .INDENT 0.0 |
---|
4890 | .INDENT 3.5 |
---|
4891 | .sp |
---|
4892 | .nf |
---|
4893 | .ft C |
---|
4894 | bedToIgv \-i data/rmsk.hg18.chr21.bed | head \-9 |
---|
4895 | snapshotDirectory ./ |
---|
4896 | goto chr21:9719768\-9721892 |
---|
4897 | snapshot chr21:9719768\-9721892.png |
---|
4898 | goto chr21:9721905\-9725582 |
---|
4899 | snapshot chr21:9721905\-9725582.png |
---|
4900 | goto chr21:9725582\-9725977 |
---|
4901 | snapshot chr21:9725582\-9725977.png |
---|
4902 | goto chr21:9726021\-9729309 |
---|
4903 | snapshot chr21:9726021\-9729309.png |
---|
4904 | .ft P |
---|
4905 | .fi |
---|
4906 | .UNINDENT |
---|
4907 | .UNINDENT |
---|
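The default output above follows a regular pattern: one snapshotDirectory line, then a goto and a snapshot command per feature. The Python sketch below reproduces just that pattern as an illustration. It is an assumption-laden approximation that ignores the \-sess, \-sort, \-clps, \-name and \-slop options, and the function name is hypothetical.

```python
def igv_batch(features, path="./", img="png"):
    """Return IGV batch-script lines for a list of (chrom, start, end) tuples."""
    lines = [f"snapshotDirectory {path}"]
    for chrom, start, end in features:
        locus = f"{chrom}:{start}-{end}"
        # Navigate to the feature, then save an image named after the locus.
        lines.append(f"goto {locus}")
        lines.append(f"snapshot {locus}.{img}")
    return lines


features = [("chr21", 9719768, 9721892), ("chr21", 9721905, 9725582)]
for line in igv_batch(features):
    print(line)
```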
4908 | .SS 5.20.3 Using a bedToIgv batch script within IGV. |
---|
4909 | .sp |
---|
4910 | Once an IGV batch script has been created with \fBbedToIgv\fP, it is simply a matter of running it from |
---|
4911 | within IGV. |
---|
4912 | .sp |
---|
4913 | For example, first create the batch script: |
---|
4914 | .INDENT 0.0 |
---|
4915 | .INDENT 3.5 |
---|
4916 | .sp |
---|
4917 | .nf |
---|
4918 | .ft C |
---|
4919 | bedToIgv \-i data/rmsk.hg18.chr21.bed > rmsk.igv.batch |
---|
4920 | .ft P |
---|
4921 | .fi |
---|
4922 | .UNINDENT |
---|
4923 | .UNINDENT |
---|
4924 | .sp |
---|
4925 | Then, open and launch the batch script from within IGV. This will immediately cause IGV to begin |
---|
4926 | taking snapshots of your requested regions. |
---|
4927 | .SS 5.21 bed12ToBed6 |
---|
4928 | .sp |
---|
4929 | \fBbed12ToBed6\fP is a convenience tool that converts BED features in BED12 (a.k.a. "blocked" BED |
---|
4930 | features such as genes) to discrete BED6 features. For example, in the case of a gene with six exons, |
---|
4931 | bed12ToBed6 would create six separate BED6 features (i.e., one for each exon). |
---|
4932 | .SS 5.21.1 Usage and option summary |
---|
4933 | .sp |
---|
4934 | Usage: |
---|
4935 | .INDENT 0.0 |
---|
4936 | .INDENT 3.5 |
---|
4937 | .sp |
---|
4938 | .nf |
---|
4939 | .ft C |
---|
4940 | bed12ToBed6 [OPTIONS] \-i <BED12> |
---|
4941 | .ft P |
---|
4942 | .fi |
---|
4943 | .UNINDENT |
---|
4944 | .UNINDENT |
---|
4945 | .TS |
---|
4946 | center; |
---|
4947 | |l|l|. |
---|
4948 | _ |
---|
4949 | T{ |
---|
4950 | Option |
---|
4951 | T} T{ |
---|
4952 | Description |
---|
4953 | T} |
---|
4954 | _ |
---|
4955 | T{ |
---|
4956 | \fB\-i\fP |
---|
4957 | T} T{ |
---|
4958 | The BED12 file that should be split into discrete BED6 features. \fIUse "stdin" when using piped input\fP\&. |
---|
4959 | T} |
---|
4960 | _ |
---|
4961 | .TE |
---|
4962 | .SS 5.21.2 Default behavior |
---|
4963 | .sp |
---|
4964 | For example: |
---|
4965 | .INDENT 0.0 |
---|
4966 | .INDENT 3.5 |
---|
4967 | .sp |
---|
4968 | .nf |
---|
4969 | .ft C |
---|
4970 | head data/knownGene.hg18.chr21.bed | tail \-n 3 |
---|
4971 | chr21 10079666 10120808 uc002yiv.1 0 \- 10081686 10120608 0 4 528,91,101,215, 0,1930,39750,40927, |
---|
4973 | chr21 10080031 10081687 uc002yiw.1 0 \- 10080031 10080031 0 2 200,91, 0,1565, |
---|
4975 | chr21 10081660 10120796 uc002yix.2 0 \- 10081660 10081660 0 3 27,101,223, 0,37756,38913, |
---|
4977 | |
---|
4978 | head data/knownGene.hg18.chr21.bed | tail \-n 3 | bed12ToBed6 \-i stdin |
---|
4979 | chr21 10079666 10080194 uc002yiv.1 0 \- |
---|
4980 | chr21 10081596 10081687 uc002yiv.1 0 \- |
---|
4981 | chr21 10119416 10119517 uc002yiv.1 0 \- |
---|
4982 | chr21 10120593 10120808 uc002yiv.1 0 \- |
---|
4983 | chr21 10080031 10080231 uc002yiw.1 0 \- |
---|
4984 | chr21 10081596 10081687 uc002yiw.1 0 \- |
---|
4985 | chr21 10081660 10081687 uc002yix.2 0 \- |
---|
4986 | chr21 10119416 10119517 uc002yix.2 0 \- |
---|
4987 | chr21 10120573 10120796 uc002yix.2 0 \- |
---|
4988 | .ft P |
---|
4989 | .fi |
---|
4990 | .UNINDENT |
---|
4991 | .UNINDENT |
---|
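Each BED6 line above can be derived from the BED12 record by adding every relative block start (field 12) to the feature start and extending by the matching block size (field 11). The Python sketch below illustrates that arithmetic. It is a simplified stand-in rather than the tool's implementation, and the function name is hypothetical.

```python
def bed12_to_bed6(line):
    """Split one whitespace-delimited BED12 line into BED6 tuples, one per block."""
    f = line.split()
    chrom, start, name, score, strand = f[0], int(f[1]), f[3], f[4], f[5]
    # Fields 11 and 12 are comma-separated lists, often with a trailing comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    return [
        (chrom, start + rel, start + rel + size, name, score, strand)
        for size, rel in zip(sizes, starts)
    ]


gene = ("chr21 10080031 10081687 uc002yiw.1 0 - 10080031 10080031 "
        "0 2 200,91, 0,1565,")
for feature in bed12_to_bed6(gene):
    print(*feature)
```

For uc002yiw.1 this yields the blocks 10080031 to 10080231 and 10081596 to 10081687, as in the example output.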
4992 | .SS 5.22 groupBy |
---|
4993 | .sp |
---|
4994 | \fBgroupBy\fP is a useful tool that mimics the "groupBy" clause in database systems. Given a file or stream |
---|
4995 | that is sorted by the appropriate "grouping columns", groupBy will compute summary statistics on |
---|
4996 | another column in the file or stream. This will work with output from all BEDTools as well as any other |
---|
4997 | tab\-delimited file or stream. |
---|
4998 | .sp |
---|
4999 | \fBNOTE: When using groupBy, the input data must be ordered by the same |
---|
5000 | columns as specified with the \-grp argument. For example, if \-grp is 1,2,3, then the |
---|
5001 | data should be pre\-grouped accordingly. When groupBy detects a change in the |
---|
5002 | group columns, it summarizes all lines in the group that just ended\fP\&. |
---|
5003 | .SS 5.22.1 Usage and option summary |
---|
5004 | .sp |
---|
5005 | Usage: |
---|
5006 | .INDENT 0.0 |
---|
5007 | .INDENT 3.5 |
---|
5008 | .sp |
---|
5009 | .nf |
---|
5010 | .ft C |
---|
5011 | groupBy [OPTIONS] \-i <input> \-opCol <input column> |
---|
5012 | .ft P |
---|
5013 | .fi |
---|
5014 | .UNINDENT |
---|
5015 | .UNINDENT |
---|
5016 | .TS |
---|
5017 | center; |
---|
5018 | |l|l|. |
---|
5019 | _ |
---|
5020 | T{ |
---|
5021 | Option |
---|
5022 | T} T{ |
---|
5023 | Description |
---|
5024 | T} |
---|
5025 | _ |
---|
5026 | T{ |
---|
5027 | \fB\-i\fP |
---|
5028 | T} T{ |
---|
5029 | .INDENT 0.0 |
---|
5030 | .INDENT 3.5 |
---|
5031 | The input file that should be grouped and summarized. \fIUse "stdin" when using piped input\fP\&. |
---|
5032 | .UNINDENT |
---|
5033 | .UNINDENT |
---|
5034 | .sp |
---|
5035 | \fBNote: if \-i is omitted, input is assumed to come from standard input (stdin)\fP |
---|
5036 | T} |
---|
5037 | _ |
---|
5038 | T{ |
---|
5039 | \fB\-g OR \-grp\fP |
---|
5040 | T} T{ |
---|
5041 | Specifies which column(s) (1\-based) should be used to group the input. The columns must be comma\-separated and each column must be explicitly listed. Ranges (e.g. 1\-4) are not yet allowed. \fIDefault: 1,2,3\fP |
---|
5042 | T} |
---|
5043 | _ |
---|
5044 | T{ |
---|
5045 | \fB\-c OR \-opCol\fP |
---|
5046 | T} T{ |
---|
5047 | Specify the column (1\-based) that should be summarized. \fIRequired\fP\&. |
---|
5048 | T} |
---|
5049 | _ |
---|
5050 | T{ |
---|
5051 | \fB\-o OR \-op\fP |
---|
5052 | T} T{ |
---|
5053 | Specify the operation that should be applied to \fBopCol\fP\&. |
---|
5054 | .nf |
---|
5055 | Valid operations: |
---|
5056 | .fi |
---|
5057 | .sp |
---|
5058 | .INDENT 0.0 |
---|
5059 | .INDENT 3.5 |
---|
5060 | .nf |
---|
5061 | \fBsum\fP \- \fInumeric only\fP |
---|
5062 | \fBcount\fP \- \fInumeric or text\fP |
---|
5063 | \fBmin\fP \- \fInumeric only\fP |
---|
5064 | \fBmax\fP \- \fInumeric only\fP |
---|
5065 | \fBmean\fP \- \fInumeric only\fP |
---|
5066 | \fBstdev\fP \- \fInumeric only\fP |
---|
5067 | \fBmedian\fP \- \fInumeric only\fP |
---|
5068 | \fBmode\fP \- \fInumeric or text\fP |
---|
5069 | \fBantimode\fP \- \fInumeric or text\fP |
---|
5070 | \fBcollapse\fP (i.e., print a comma separated list) \- \fInumeric or text\fP |
---|
5071 | \fBfreqasc\fP \- \fIprint a comma separated list of values observed and the number of times they were observed. Reported in ascending order of frequency\fP |
---|
5072 | \fBfreqdesc\fP \- \fIprint a comma separated list of values observed and the number of times they were observed. Reported in descending order of frequency\fP |
---|
5073 | .sp |
---|
5083 | \fIDefault: sum\fP |
---|
5084 | .fi |
---|
5085 | .sp |
---|
5086 | .UNINDENT |
---|
5087 | .UNINDENT |
---|
5088 | T} |
---|
5089 | _ |
---|
5090 | .TE |
---|
5091 | .SS 5.22.2 Default behavior. |
---|
5092 | .sp |
---|
5093 | Let\(aqs imagine we have three incredibly interesting genetic variants that we are studying and we are |
---|
5094 | interested in what annotated repeats these variants overlap. |
---|
5095 | .INDENT 0.0 |
---|
5096 | .INDENT 3.5 |
---|
5097 | .sp |
---|
5098 | .nf |
---|
5099 | .ft C |
---|
5100 | cat variants.bed |
---|
5101 | chr21 9719758 9729320 variant1 |
---|
5102 | chr21 9729310 9757478 variant2 |
---|
5103 | chr21 9795588 9796685 variant3 |
---|
5104 | |
---|
5105 | intersectBed \-a variants.bed \-b repeats.bed \-wa \-wb > variantsToRepeats.bed |
---|
5106 | cat variantsToRepeats.bed |
---|
5107 | chr21 9719758 9729320 variant1 chr21 9719768 9721892 ALR/Alpha 1004 + |
---|
5108 | chr21 9719758 9729320 variant1 chr21 9721905 9725582 ALR/Alpha 1010 + |
---|
5109 | chr21 9719758 9729320 variant1 chr21 9725582 9725977 L1PA3 3288 + |
---|
5110 | chr21 9719758 9729320 variant1 chr21 9726021 9729309 ALR/Alpha 1051 + |
---|
5111 | chr21 9729310 9757478 variant2 chr21 9729320 9729809 L1PA3 3897 \- |
---|
5112 | chr21 9729310 9757478 variant2 chr21 9729809 9730866 L1P1 8367 + |
---|
5113 | chr21 9729310 9757478 variant2 chr21 9730866 9734026 ALR/Alpha 1036 \- |
---|
5114 | chr21 9729310 9757478 variant2 chr21 9734037 9757471 ALR/Alpha 1182 \- |
---|
5115 | chr21 9795588 9796685 variant3 chr21 9795589 9795713 (GAATG)n 308 + |
---|
5116 | chr21 9795588 9796685 variant3 chr21 9795736 9795894 (GAATG)n 683 + |
---|
5117 | chr21 9795588 9796685 variant3 chr21 9795911 9796007 (GAATG)n 345 + |
---|
5118 | chr21 9795588 9796685 variant3 chr21 9796028 9796187 (GAATG)n 756 + |
---|
5119 | chr21 9795588 9796685 variant3 chr21 9796202 9796615 (GAATG)n 891 + |
---|
5120 | chr21 9795588 9796685 variant3 chr21 9796637 9796824 (GAATG)n 621 + |
---|
5121 | .ft P |
---|
5122 | .fi |
---|
5123 | .UNINDENT |
---|
5124 | .UNINDENT |
---|
5125 | .sp |
---|
5126 | We can see that variant1 overlaps with 4 repeats, variant2 with 4 and variant3 with 6. We can use |
---|
5127 | groupBy to summarize the hits for each variant in several useful ways. The default behavior is to |
---|
5128 | compute the \fIsum\fP of the opCol. |
---|
5129 | .INDENT 0.0 |
---|
5130 | .INDENT 3.5 |
---|
5131 | .sp |
---|
5132 | .nf |
---|
5133 | .ft C |
---|
5134 | groupBy \-i variantsToRepeats.bed \-grp 1,2,3 \-opCol 9 |
---|
5135 | chr21 9719758 9729320 6353 |
---|
5136 | chr21 9729310 9757478 14482 |
---|
5137 | chr21 9795588 9796685 3604 |
---|
5138 | .ft P |
---|
5139 | .fi |
---|
5140 | .UNINDENT |
---|
5141 | .UNINDENT |
---|
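The grouping behavior described above can be approximated in a few lines of Python with itertools.groupby, which, like groupBy, only merges adjacent rows that share a key. The sketch below is illustrative only: the function name, the OPS table and the two-column sample rows are assumptions, and only a subset of the operations is covered.

```python
from itertools import groupby
from statistics import mean, median

# A subset of the operations listed in the option table.
OPS = {"sum": sum, "count": len, "min": min, "max": max,
       "mean": mean, "median": median}


def group_by(rows, grp, op_col, op="sum"):
    """Summarize op_col for each run of consecutive rows sharing the grp columns.

    rows are lists of string fields; grp and op_col are 1-based column numbers.
    The input must already be ordered by the grouping columns.
    """
    def key(row):
        return tuple(row[c - 1] for c in grp)

    summary = []
    for group_key, members in groupby(rows, key=key):
        values = [float(row[op_col - 1]) for row in members]
        summary.append(list(group_key) + [OPS[op](values)])
    return summary


# Variant names and repeat scores taken from variantsToRepeats.bed.
rows = [
    ["variant1", "1004"], ["variant1", "1010"],
    ["variant1", "3288"], ["variant1", "1051"],
    ["variant2", "3897"], ["variant2", "8367"],
    ["variant2", "1036"], ["variant2", "1182"],
]
for op in ("sum", "min", "max", "median"):
    print(op, group_by(rows, grp=[1], op_col=2, op=op))
```

If the rows for one variant were not adjacent, this sketch would report that variant more than once, which is why the input must be pre-grouped.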
5142 | .SS 5.22.3 Computing the min and max. |
---|
5143 | .sp |
---|
5144 | Now let\(aqs find the \fImin\fP and \fImax\fP repeat score for each variant. We do this by "grouping" on the variant |
---|
5145 | coordinate columns (i.e. cols. 1, 2 and 3) and asking for the min and max of the repeat score column (i.e. |
---|
5146 | col. 9). |
---|
5147 | .INDENT 0.0 |
---|
5148 | .INDENT 3.5 |
---|
5149 | .sp |
---|
5150 | .nf |
---|
5151 | .ft C |
---|
5152 | groupBy \-i variantsToRepeats.bed \-g 1,2,3 \-c 9 \-o min |
---|
5153 | chr21 9719758 9729320 1004 |
---|
5154 | chr21 9729310 9757478 1036 |
---|
5155 | chr21 9795588 9796685 308 |
---|
5156 | .ft P |
---|
5157 | .fi |
---|
5158 | .UNINDENT |
---|
5159 | .UNINDENT |
---|
5160 | .sp |
---|
5161 | We can also group on just the \fIname\fP column with similar effect. |
---|
5162 | .INDENT 0.0 |
---|
5163 | .INDENT 3.5 |
---|
5164 | .sp |
---|
5165 | .nf |
---|
5166 | .ft C |
---|
5167 | groupBy \-i variantsToRepeats.bed \-grp 4 \-opCol 9 \-op min |
---|
5168 | variant1 1004 |
---|
5169 | variant2 1036 |
---|
5170 | variant3 308 |
---|
5171 | .ft P |
---|
5172 | .fi |
---|
5173 | .UNINDENT |
---|
5174 | .UNINDENT |
---|
5175 | .sp |
---|
5176 | What about the \fImax\fP score? Let\(aqs keep the coordinates and the name of the variants so that we |
---|
5177 | stay in BED format. |
---|
5178 | .INDENT 0.0 |
---|
5179 | .INDENT 3.5 |
---|
5180 | .sp |
---|
5181 | .nf |
---|
5182 | .ft C |
---|
5183 | groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 9 \-op max |
---|
5184 | chr21 9719758 9729320 variant1 3288 |
---|
5185 | chr21 9729310 9757478 variant2 8367 |
---|
5186 | chr21 9795588 9796685 variant3 891 |
---|
5187 | .ft P |
---|
5188 | .fi |
---|
5189 | .UNINDENT |
---|
5190 | .UNINDENT |
---|
5191 | .SS 5.22.4 Computing the mean and median. |
---|
5192 | .sp |
---|
5193 | Now let\(aqs find the \fImean\fP and \fImedian\fP repeat score for each variant. |
---|
5194 | .INDENT 0.0 |
---|
5195 | .INDENT 3.5 |
---|
5196 | .sp |
---|
5197 | .nf |
---|
5198 | .ft C |
---|
5199 | cat variantsToRepeats.bed | groupBy \-g 1,2,3,4 \-c 9 \-o mean |
---|
5200 | chr21 9719758 9729320 variant1 1588.25 |
---|
5201 | chr21 9729310 9757478 variant2 3620.5 |
---|
5202 | chr21 9795588 9796685 variant3 600.6667 |
---|
5203 | |
---|
5204 | groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 9 \-op median |
---|
5205 | chr21 9719758 9729320 variant1 1030.5 |
---|
5206 | chr21 9729310 9757478 variant2 2539.5 |
---|
5207 | chr21 9795588 9796685 variant3 652 |
---|
5208 | .ft P |
---|
5209 | .fi |
---|
5210 | .UNINDENT |
---|
5211 | .UNINDENT |
---|
5212 | .SS 5.22.5 Computing the mode and "antimode". |
---|
5213 | .sp |
---|
5214 | Now let\(aqs find the \fImode\fP (i.e., the most frequent) and \fIantimode\fP (i.e., the least frequent) repeat score for each variant. In this case |
---|
5215 | they are identical, because every score within a group occurs exactly once. |
---|
5216 | .INDENT 0.0 |
---|
5217 | .INDENT 3.5 |
---|
5218 | .sp |
---|
5219 | .nf |
---|
5220 | .ft C |
---|
5221 | groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 9 \-op mode |
---|
5222 | chr21 9719758 9729320 variant1 1004 |
---|
5223 | chr21 9729310 9757478 variant2 1036 |
---|
5224 | chr21 9795588 9796685 variant3 308 |
---|
5225 | |
---|
5226 | groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 9 \-op antimode |
---|
5227 | chr21 9719758 9729320 variant1 1004 |
---|
5228 | chr21 9729310 9757478 variant2 1036 |
---|
5229 | chr21 9795588 9796685 variant3 308 |
---|
5230 | .ft P |
---|
5231 | .fi |
---|
5232 | .UNINDENT |
---|
5233 | .UNINDENT |
---|
5234 | .SS 5.22.6 Computing the count of lines for a given group. |
---|
5235 | .sp |
---|
5236 | For example: |
---|
5237 | .INDENT 0.0 |
---|
5238 | .INDENT 3.5 |
---|
5239 | .sp |
---|
5240 | .nf |
---|
5241 | .ft C |
---|
5242 | groupBy \-i variantsToRepeats.bed \-g 1,2,3,4 \-c 9 \-o count |
---|
5243 | chr21 9719758 9729320 variant1 4 |
---|
5244 | chr21 9729310 9757478 variant2 4 |
---|
5245 | chr21 9795588 9796685 variant3 6 |
---|
5246 | .ft P |
---|
5247 | .fi |
---|
5248 | .UNINDENT |
---|
5249 | .UNINDENT |
---|
5250 | .SS 5.22.7 Collapsing: listing all of the values in the opCol for a given group. |
---|
5251 | .sp |
---|
5252 | Now for something different. What if we wanted all of the names of the repeats listed on the same line |
---|
5253 | as the variants? Use the collapse option. This "denormalizes" things. Now you have a list of all the |
---|
5254 | repeats on a single line. |
---|
5255 | .INDENT 0.0 |
---|
5256 | .INDENT 3.5 |
---|
5257 | .sp |
---|
5258 | .nf |
---|
5259 | .ft C |
---|
5260 | groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 8 \-op collapse |
---|
5261 | chr21 9719758 9729320 variant1 ALR/Alpha,ALR/Alpha,L1PA3,ALR/Alpha, |
---|
5262 | chr21 9729310 9757478 variant2 L1PA3,L1P1,ALR/Alpha,ALR/Alpha, |
---|
5263 | chr21 9795588 9796685 variant3 (GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n, |
---|
5264 | .ft P |
---|
5265 | .fi |
---|
5266 | .UNINDENT |
---|
5267 | .UNINDENT |
---|
5268 | .SS 5.22.8 Computing frequencies: freqasc and freqdesc. |
---|
5269 | .sp |
---|
5270 | The \fIfreqasc\fP and \fIfreqdesc\fP operations list each distinct value observed in the opCol together with the |
---|
5271 | number of times it was observed, in ascending or descending order of frequency, respectively. Here we group on the |
---|
5272 | chromosome column alone and tally the repeat names (col. 8). |
---|
5273 | .INDENT 0.0 |
---|
5274 | .INDENT 3.5 |
---|
5275 | .sp |
---|
5276 | .nf |
---|
5277 | .ft C |
---|
5278 | cat variantsToRepeats.bed | groupBy \-g 1 \-c 8 \-o freqdesc |
---|
5279 | chr21 (GAATG)n:6,ALR/Alpha:5,L1PA3:2,L1P1:1, |
---|
5280 | |
---|
5281 | cat variantsToRepeats.bed | groupBy \-g 1 \-c 8 \-o freqasc |
---|
5282 | chr21 L1P1:1,L1PA3:2,ALR/Alpha:5,(GAATG)n:6, |
---|
5283 | .ft P |
---|
5284 | .fi |
---|
5285 | .UNINDENT |
---|
5286 | .UNINDENT |
---|
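The frequency listings above amount to counting each distinct value and sorting by count. The Python sketch below illustrates this with collections.Counter. The function name is hypothetical, and the ordering of values that share the same count is an assumption, since the example contains no ties.

```python
from collections import Counter


def freq(values, descending=True):
    """Return 'value:count,' pairs ordered by frequency."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=descending)
    return "".join(f"{value}:{count}," for value, count in ordered)


# Repeat names (column 8) from variantsToRepeats.bed, in file order.
names = (["ALR/Alpha", "ALR/Alpha", "L1PA3", "ALR/Alpha"]
         + ["L1PA3", "L1P1", "ALR/Alpha", "ALR/Alpha"]
         + ["(GAATG)n"] * 6)
print(freq(names, descending=True))
print(freq(names, descending=False))
```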
5287 | .SS 5.23 unionBedGraphs |
---|
5288 | .sp |
---|
5289 | \fBunionBedGraphs\fP combines multiple BEDGRAPH files into a single file such that one can directly |
---|
5290 | compare coverage (and other text\-values such as genotypes) across multiple samples. |
---|
5291 | .SS 5.23.1 Usage and option summary |
---|
5292 | .sp |
---|
5293 | Usage: |
---|
5294 | .INDENT 0.0 |
---|
5295 | .INDENT 3.5 |
---|
5296 | .sp |
---|
5297 | .nf |
---|
5298 | .ft C |
---|
5299 | unionBedGraphs [OPTIONS] \-i FILE1 FILE2 FILE3 ... FILEn |
---|
5300 | .ft P |
---|
5301 | .fi |
---|
5302 | .UNINDENT |
---|
5303 | .UNINDENT |
---|
5304 | .TS |
---|
5305 | center; |
---|
5306 | |l|l|. |
---|
5307 | _ |
---|
5308 | T{ |
---|
5309 | Option |
---|
5310 | T} T{ |
---|
5311 | Description |
---|
5312 | T} |
---|
5313 | _ |
---|
5314 | T{ |
---|
5315 | \fB\-header\fP |
---|
5316 | T} T{ |
---|
5317 | Print a header line, consisting of chrom, start, end followed by the names of each input BEDGRAPH file. |
---|
5318 | T} |
---|
5319 | _ |
---|
5320 | T{ |
---|
5321 | \fB\-names\fP |
---|
5322 | T} T{ |
---|
5323 | A list of names (one per file) to describe each file in \-i. These names will be printed in the header line. |
---|
5324 | T} |
---|
5325 | _ |
---|
5326 | T{ |
---|
5327 | \fB\-empty\fP |
---|
5328 | T} T{ |
---|
5329 | Report empty regions (i.e., start/end intervals w/o values in all files). \fIRequires the \(aq\-g FILE\(aq parameter (see below)\fP\&. |
---|
5330 | T} |
---|
5331 | _ |
---|
5332 | T{ |
---|
5333 | \fB\-g\fP |
---|
5334 | T} T{ |
---|
5335 | The genome file to be used to calculate empty regions. |
---|
5336 | T} |
---|
5337 | _ |
---|
5338 | T{ |
---|
5339 | \fB\-filler TEXT\fP |
---|
5340 | T} T{ |
---|
5341 | Use TEXT when representing intervals having no value. Default is \(aq0\(aq, but you can use \(aqN/A\(aq or any other text. |
---|
5342 | T} |
---|
5343 | _ |
---|
5344 | T{ |
---|
5345 | \fB\-examples\fP |
---|
5346 | T} T{ |
---|
5347 | Show detailed usage examples. |
---|
5348 | T} |
---|
5349 | _ |
---|
5350 | .TE |
---|
5351 | .SS 5.23.2 Default behavior |
---|
5352 | .sp |
---|
5353 | For example: |
---|
5354 | .INDENT 0.0 |
---|
5355 | .INDENT 3.5 |
---|
5356 | .sp |
---|
5357 | .nf |
---|
5358 | .ft C |
---|
5359 | cat 1.bg |
---|
5360 | chr1 1000 1500 10 |
---|
5361 | chr1 2000 2100 20 |
---|
5362 | |
---|
5363 | cat 2.bg |
---|
5364 | chr1 900 1600 60 |
---|
5365 | chr1 1700 2050 50 |
---|
5366 | |
---|
5367 | cat 3.bg |
---|
5368 | chr1 1980 2070 80 |
---|
5369 | chr1 2090 2100 20 |
---|
5370 | |
---|
5371 | cat sizes.txt |
---|
5372 | chr1 5000 |
---|
5373 | |
---|
5374 | unionBedGraphs \-i 1.bg 2.bg 3.bg |
---|
5375 | chr1 900 1000 0 60 0 |
---|
5376 | chr1 1000 1500 10 60 0 |
---|
5377 | chr1 1500 1600 0 60 0 |
---|
5378 | chr1 1700 1980 0 50 0 |
---|
5379 | chr1 1980 2000 0 50 80 |
---|
5380 | chr1 2000 2050 20 50 80 |
---|
5381 | chr1 2050 2070 20 0 80 |
---|
5382 | chr1 2070 2090 20 0 0 |
---|
5383 | chr1 2090 2100 20 0 20 |
---|
5384 | .ft P |
---|
5385 | .fi |
---|
5386 | .UNINDENT |
---|
5387 | .UNINDENT |
---|
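The default output above can be understood as cutting the chromosome at every start and end seen in any input, then reporting each file's value (or the filler) for every resulting piece. The Python sketch below illustrates that idea for a single chromosome. It is a simplified approximation, not the tool's algorithm, and the function name and in-memory tuple representation are assumptions.

```python
def union_bedgraphs(tracks, filler="0"):
    """Merge single-chromosome bedGraph tracks into one multi-column table.

    tracks holds one list per input file of (start, end, value) tuples.
    Returns (start, end, [value per file]) rows. Pieces covered by no
    file are skipped, as in the default (no -empty) behavior.
    """
    # Every start and end is a potential breakpoint in the combined output.
    points = sorted({p for track in tracks for s, e, _ in track for p in (s, e)})
    rows = []
    for lo, hi in zip(points, points[1:]):
        # For each file, find the interval (if any) that spans this piece.
        hits = [
            next((v for s, e, v in track if s <= lo and hi <= e), None)
            for track in tracks
        ]
        if any(h is not None for h in hits):
            rows.append((lo, hi, [filler if h is None else str(h) for h in hits]))
    return rows


bg1 = [(1000, 1500, 10), (2000, 2100, 20)]
bg2 = [(900, 1600, 60), (1700, 2050, 50)]
bg3 = [(1980, 2070, 80), (2090, 2100, 20)]
for start, end, values in union_bedgraphs([bg1, bg2, bg3]):
    print("chr1", start, end, *values)
```

Passing a different filler, such as N/A, changes only the placeholder written for files with no value in a given piece.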
5388 | .SS 5.23.3 Add a header line to the output |
---|
5389 | .sp |
---|
5390 | For example: |
---|
5391 | .INDENT 0.0 |
---|
5392 | .INDENT 3.5 |
---|
5393 | .sp |
---|
5394 | .nf |
---|
5395 | .ft C |
---|
5396 | unionBedGraphs \-i 1.bg 2.bg 3.bg \-header |
---|
5397 | chrom start end 1 2 3 |
---|
5398 | chr1 900 1000 0 60 0 |
---|
5399 | chr1 1000 1500 10 60 0 |
---|
5400 | chr1 1500 1600 0 60 0 |
---|
5401 | chr1 1700 1980 0 50 0 |
---|
5402 | chr1 1980 2000 0 50 80 |
---|
5403 | chr1 2000 2050 20 50 80 |
---|
5404 | chr1 2050 2070 20 0 80 |
---|
5405 | chr1 2070 2090 20 0 0 |
---|
5406 | chr1 2090 2100 20 0 20 |
---|
5407 | .ft P |
---|
5408 | .fi |
---|
5409 | .UNINDENT |
---|
5410 | .UNINDENT |
---|
5411 | .SS 5.23.4 Add a header line with custom file names to the output |
---|
5412 | .sp |
---|
5413 | For example: |
---|
5414 | .INDENT 0.0 |
---|
5415 | .INDENT 3.5 |
---|
5416 | .sp |
---|
5417 | .nf |
---|
5418 | .ft C |
---|
5419 | unionBedGraphs \-i 1.bg 2.bg 3.bg \-header \-names WT\-1 WT\-2 KO\-1 |
---|
5420 | chrom start end WT\-1 WT\-2 KO\-1 |
---|
5421 | chr1 900 1000 0 60 0 |
---|
5422 | chr1 1000 1500 10 60 0 |
---|
5423 | chr1 1500 1600 0 60 0 |
---|
5424 | chr1 1700 1980 0 50 0 |
---|
5425 | chr1 1980 2000 0 50 80 |
---|
5426 | chr1 2000 2050 20 50 80 |
---|
5427 | chr1 2050 2070 20 0 80 |
---|
5428 | chr1 2070 2090 20 0 0 |
---|
5429 | chr1 2090 2100 20 0 20 |
---|
5430 | .ft P |
---|
5431 | .fi |
---|
5432 | .UNINDENT |
---|
5433 | .UNINDENT |
---|
5434 | .SS 5.23.5 Include regions that have zero coverage in all BEDGRAPH files. |
---|
5435 | .sp |
---|
5436 | For example: |
---|
5437 | .INDENT 0.0 |
---|
5438 | .INDENT 3.5 |
---|
5439 | .sp |
---|
5440 | .nf |
---|
5441 | .ft C |
---|
5442 | unionBedGraphs \-i 1.bg 2.bg 3.bg \-empty \-g sizes.txt \-header |
---|
5444 | chrom start end 1 2 3 |
---|
5445 | chr1 0 900 0 0 0 |
---|
5446 | chr1 900 1000 0 60 0 |
---|
5447 | chr1 1000 1500 10 60 0 |
---|
5448 | chr1 1500 1600 0 60 0 |
---|
5449 | chr1 1600 1700 0 0 0 |
---|
5450 | chr1 1700 1980 0 50 0 |
---|
5451 | chr1 1980 2000 0 50 80 |
---|
5452 | chr1 2000 2050 20 50 80 |
---|
5453 | chr1 2050 2070 20 0 80 |
---|
5454 | chr1 2070 2090 20 0 0 |
---|
5455 | chr1 2090 2100 20 0 20 |
---|
5456 | chr1 2100 5000 0 0 0 |
---|
5457 | .ft P |
---|
5458 | .fi |
---|
5459 | .UNINDENT |
---|
5460 | .UNINDENT |
---|
5461 | .SS 5.23.6 Use a custom value for missing values. |
---|
5462 | .sp |
---|
5463 | For example: |
---|
5464 | .INDENT 0.0 |
---|
5465 | .INDENT 3.5 |
---|
5466 | .sp |
---|
5467 | .nf |
---|
5468 | .ft C |
---|
5469 | unionBedGraphs \-i 1.bg 2.bg 3.bg \-empty \-g sizes.txt \-header \-filler N/A |
---|
5471 | chrom start end 1 2 3 |
---|
5472 | chr1 0 900 N/A N/A N/A |
---|
5473 | chr1 900 1000 N/A 60 N/A |
---|
5474 | chr1 1000 1500 10 60 N/A |
---|
5475 | chr1 1500 1600 N/A 60 N/A |
---|
5476 | chr1 1600 1700 N/A N/A N/A |
---|
5477 | chr1 1700 1980 N/A 50 N/A |
---|
5478 | chr1 1980 2000 N/A 50 80 |
---|
5479 | chr1 2000 2050 20 50 80 |
---|
5480 | chr1 2050 2070 20 N/A 80 |
---|
5481 | chr1 2070 2090 20 N/A N/A |
---|
5482 | chr1 2090 2100 20 N/A 20 |
---|
5483 | chr1 2100 5000 N/A N/A N/A |
---|
5484 | .ft P |
---|
5485 | .fi |
---|
5486 | .UNINDENT |
---|
5487 | .UNINDENT |
---|
5488 | .SS 5.23.7 Use BEDGRAPH files with non\-numeric values. |
---|
5489 | .sp |
---|
5490 | For example: |
---|
5491 | .INDENT 0.0 |
---|
5492 | .INDENT 3.5 |
---|
5493 | .sp |
---|
5494 | .nf |
---|
5495 | .ft C |
---|
5496 | cat 1.snp.bg |
---|
5497 | chr1 0 1 A/G |
---|
5498 | chr1 5 6 C/T |
---|
5499 | |
---|
5500 | cat 2.snp.bg |
---|
5501 | chr1 0 1 C/C |
---|
5502 | chr1 7 8 T/T |
---|
5503 | |
---|
5504 | cat 3.snp.bg |
---|
5505 | chr1 0 1 A/G |
---|
5506 | chr1 5 6 C/T |
---|
5507 | |
---|
5508 | unionBedGraphs \-i 1.snp.bg 2.snp.bg 3.snp.bg \-filler \-/\- |
---|
5509 | chr1 0 1 A/G C/C A/G |
---|
5510 | chr1 5 6 C/T \-/\- C/T |
---|
5511 | chr1 7 8 \-/\- T/T \-/\- |
---|
5512 | .ft P |
---|
5513 | .fi |
---|
5514 | .UNINDENT |
---|
5515 | .UNINDENT |
---|
5516 | .SS 5.24 annotateBed |
---|
5517 | .sp |
---|
5518 | \fBannotateBed\fP annotates one BED/VCF/GFF file with the coverage and number of overlaps observed |
---|
5519 | from multiple other BED/VCF/GFF files. In this way, it allows one to ask to what degree one feature |
---|
5520 | coincides with multiple other feature types with a single command. |
---|
5521 | .SS 5.24.1 Usage and option summary |
---|
5522 | .sp |
---|
5523 | Usage: |
---|
5524 | .INDENT 0.0 |
---|
5525 | .INDENT 3.5 |
---|
5526 | .sp |
---|
5527 | .nf |
---|
5528 | .ft C |
---|
5529 | annotateBed [OPTIONS] \-i <BED/GFF/VCF> \-files FILE1 FILE2 FILE3 ... FILEn |
---|
5530 | .ft P |
---|
5531 | .fi |
---|
5532 | .UNINDENT |
---|
5533 | .UNINDENT |
---|
5534 | .TS |
---|
5535 | center; |
---|
5536 | |l|l|. |
---|
5537 | _ |
---|
5538 | T{ |
---|
5539 | Option |
---|
5540 | T} T{ |
---|
5541 | Description |
---|
5542 | T} |
---|
5543 | _ |
---|
5544 | T{ |
---|
5545 | \fB\-names\fP |
---|
5546 | T} T{ |
---|
5547 | A list of names (one per file) to describe each file in \-files. These names will be printed as a header line. |
---|
5548 | T} |
---|
5549 | _ |
---|
5550 | T{ |
---|
5551 | \fB\-counts\fP |
---|
5552 | T} T{ |
---|
5553 | Report the count of features in each file that overlap \-i. Default behavior is to report the fraction of \-i covered by each file. |
---|
5554 | T} |
---|
5555 | _ |
---|
5556 | T{ |
---|
5557 | \fB\-both\fP |
---|
5558 | T} T{ |
---|
5559 | Report the count of features followed by the % coverage for each annotation file. Default is to report solely the fraction of \-i covered by each file. |
---|
5560 | T} |
---|
5561 | _ |
---|
5562 | T{ |
---|
5563 | \fB\-s\fP |
---|
5564 | T} T{ |
---|
5565 | Force strandedness. That is, only include hits from the annotation files that overlap the features in \-i on the same strand. By default, hits are included without respect to strand. |
---|
5566 | T} |
---|
5567 | _ |
---|
5568 | .TE |
---|
5569 | .SS 5.24.2 Default behavior \- annotate one file with coverage from others. |
---|
5570 | .sp |
---|
5571 | By default, the fraction of each feature covered by each annotation file is reported after the complete |
---|
5572 | feature in the file to be annotated. |
---|
5573 | .INDENT 0.0 |
---|
5574 | .INDENT 3.5 |
---|
5575 | .sp |
---|
5576 | .nf |
---|
5577 | .ft C |
---|
5578 | cat variants.bed |
---|
5579 | chr1 100 200 nasty 1 \- |
---|
5580 | chr2 500 1000 ugly 2 + |
---|
5581 | chr3 1000 5000 big 3 \- |
---|
5582 | |
---|
5583 | cat genes.bed |
---|
5584 | chr1 150 200 geneA 1 + |
---|
5585 | chr1 175 250 geneB 2 + |
---|
5586 | chr3 0 10000 geneC 3 \- |
---|
5587 | |
---|
5588 | cat conserv.bed |
---|
5589 | chr1 0 10000 cons1 1 + |
---|
5590 | chr2 700 10000 cons2 2 \- |
---|
5591 | chr3 4000 10000 cons3 3 + |
---|
5592 | |
---|
5593 | cat known_var.bed |
---|
5594 | chr1 0 120 known1 \- |
---|
5595 | chr1 150 160 known2 \- |
---|
5596 | chr2 0 10000 known3 + |
---|
5597 | |
---|
5598 | annotateBed \-i variants.bed \-files genes.bed conserv.bed known_var.bed |
---|
5599 | chr1 100 200 nasty 1 \- 0.500000 1.000000 0.300000 |
---|
5600 | chr2 500 1000 ugly 2 + 0.000000 0.600000 1.000000 |
---|
5601 | chr3 1000 5000 big 3 \- 1.000000 0.250000 0.000000 |
---|
5602 | .ft P |
---|
5603 | .fi |
---|
5604 | .UNINDENT |
---|
5605 | .UNINDENT |
---|
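The fractions in the example above can be reproduced by hand. The following is a minimal sketch, not part of bedtools: fraction_covered is a hypothetical helper that assumes half-open BED coordinates and receives only the annotation intervals on the same chromosome as the feature.

```shell
# Hypothetical helper (not a bedtools command): print the fraction of the
# half-open interval [start, end) that is covered by at least one of the
# "<start> <end>" intervals read from stdin.
fraction_covered() {
    awk -v s="$1" -v e="$2" '
        BEGIN { s += 0; e += 0 }
        {
            # Clip the annotation interval to the feature being annotated.
            lo = ($1 + 0 > s) ? $1 + 0 : s
            hi = ($2 + 0 < e) ? $2 + 0 : e
            # Mark every covered base once, so overlapping annotations
            # are not counted twice.
            for (p = lo; p < hi; p++) seen[p] = 1
        }
        END {
            n = 0
            for (p in seen) n++
            printf "%.6f\n", n / (e - s)
        }'
}

# "nasty" (chr1 100 200) against the chr1 entries of genes.bed
printf '150 200\n175 250\n' | fraction_covered 100 200
# "nasty" (chr1 100 200) against the chr1 entries of known_var.bed
printf '0 120\n150 160\n' | fraction_covered 100 200
```

geneA and geneB overlap each other, which is why the covered bases are marked individually rather than summing the two overlaps.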
5606 | .SS 5.24.3 Report the count of hits from the annotation files |
---|
5607 | .sp |
---|
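5608 | With \-counts, the number of features in each annotation file that overlap each feature is reported instead of the fraction covered. |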
---|
5609 | .INDENT 0.0 |
---|
5610 | .INDENT 3.5 |
---|
5611 | .sp |
---|
5612 | .nf |
---|
5613 | .ft C |
---|
5614 | annotateBed \-counts \-i variants.bed \-files genes.bed conserve.bed known_var.bed |
---|
5615 | chr1 100 200 nasty 1 \- 2 1 2 |
---|
5616 | chr2 500 1000 ugly 2 + 0 1 1 |
---|
5617 | chr3 1000 5000 big 3 \- 1 1 0 |
---|
5618 | .ft P |
---|
5619 | .fi |
---|
5620 | .UNINDENT |
---|
5621 | .UNINDENT |
---|
5622 | .SS 5.24.4 Report both the count of hits and the fraction covered from the annotation files |
---|
5623 | .sp |
---|
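5624 | With \-both, each annotation file contributes two columns: the count of overlapping features followed by the fraction covered. |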
---|
5625 | .INDENT 0.0 |
---|
5626 | .INDENT 3.5 |
---|
5627 | .sp |
---|
5628 | .nf |
---|
5629 | .ft C |
---|
5630 | annotateBed \-both \-i variants.bed \-files genes.bed conserve.bed known_var.bed |
---|
5631 | #chr start end name score +/\- cnt1 pct1 cnt2 pct2 cnt3 pct3 |
---|
5632 | chr1 100 200 nasty 1 \- 2 0.500000 1 1.000000 2 0.300000 |
---|
5633 | chr2 500 1000 ugly 2 + 0 0.000000 1 0.600000 1 1.000000 |
---|
5634 | chr3 1000 5000 big 3 \- 1 1.000000 1 0.250000 0 0.000000 |
---|
5635 | .ft P |
---|
5636 | .fi |
---|
5637 | .UNINDENT |
---|
5638 | .UNINDENT |
---|
5639 | .SS 5.24.5 Restrict the reporting to overlaps on the same strand. |
---|
5640 | .sp |
---|
5641 | Note: Compare with the result from 5.24.2. |
---|
5642 | .INDENT 0.0 |
---|
5643 | .INDENT 3.5 |
---|
5644 | .sp |
---|
5645 | .nf |
---|
5646 | .ft C |
---|
5647 | annotateBed \-s \-i variants.bed \-files genes.bed conserve.bed known_var.bed |
---|
5648 | chr1 100 200 nasty 1 \- 0.000000 0.000000 0.000000 |
---|
5649 | chr2 500 1000 ugly 2 + 0.000000 0.000000 0.000000 |
---|
5650 | chr3 1000 5000 big 3 \- 1.000000 0.000000 0.000000 |
---|
5651 | .ft P |
---|
5652 | .fi |
---|
5653 | .UNINDENT |
---|
5654 | .UNINDENT |
---|
5655 | .SH EXAMPLE USAGE |
---|
5656 | .sp |
---|
5657 | Below are several examples of basic BEDTools usage. Example BED files are provided in the |
---|
5658 | /data directory of the BEDTools distribution. |
---|
5659 | .SS 6.1 intersectBed |
---|
5660 | .sp |
---|
5661 | 6.1.1 Report the base\-pair overlap between sequence alignments and genes. |
---|
5662 | .INDENT 0.0 |
---|
5663 | .INDENT 3.5 |
---|
5664 | .sp |
---|
5665 | .nf |
---|
5666 | .ft C |
---|
5667 | intersectBed \-a reads.bed \-b genes.bed |
---|
5668 | .ft P |
---|
5669 | .fi |
---|
5670 | .UNINDENT |
---|
5671 | .UNINDENT |
---|
5672 | .sp |
---|
5673 | 6.1.2 Report whether each alignment overlaps one or more genes. If not, the alignment is not reported. |
---|
5674 | .INDENT 0.0 |
---|
5675 | .INDENT 3.5 |
---|
5676 | .sp |
---|
5677 | .nf |
---|
5678 | .ft C |
---|
5679 | intersectBed \-a reads.bed \-b genes.bed \-u |
---|
5680 | .ft P |
---|
5681 | .fi |
---|
5682 | .UNINDENT |
---|
5683 | .UNINDENT |
---|
5684 | .sp |
---|
5685 | 6.1.3 Report those alignments that overlap NO genes. Like "grep \-v" |
---|
5686 | .INDENT 0.0 |
---|
5687 | .INDENT 3.5 |
---|
5688 | .sp |
---|
5689 | .nf |
---|
5690 | .ft C |
---|
5691 | intersectBed \-a reads.bed \-b genes.bed \-v |
---|
5692 | .ft P |
---|
5693 | .fi |
---|
5694 | .UNINDENT |
---|
5695 | .UNINDENT |
---|
5696 | .sp |
---|
5697 | 6.1.4 Report the number of genes that each alignment overlaps. |
---|
5698 | .INDENT 0.0 |
---|
5699 | .INDENT 3.5 |
---|
5700 | .sp |
---|
5701 | .nf |
---|
5702 | .ft C |
---|
5703 | intersectBed \-a reads.bed \-b genes.bed \-c |
---|
5704 | .ft P |
---|
5705 | .fi |
---|
5706 | .UNINDENT |
---|
5707 | .UNINDENT |
---|
5708 | .sp |
---|
5709 | 6.1.5 Report the entire, original alignment entry for each overlap with a gene. |
---|
5710 | .INDENT 0.0 |
---|
5711 | .INDENT 3.5 |
---|
5712 | .sp |
---|
5713 | .nf |
---|
5714 | .ft C |
---|
5715 | intersectBed \-a reads.bed \-b genes.bed \-wa |
---|
5716 | .ft P |
---|
5717 | .fi |
---|
5718 | .UNINDENT |
---|
5719 | .UNINDENT |
---|
5720 | .sp |
---|
5721 | 6.1.6 Report the entire, original gene entry for each overlap with a gene. |
---|
5722 | .INDENT 0.0 |
---|
5723 | .INDENT 3.5 |
---|
5724 | .sp |
---|
5725 | .nf |
---|
5726 | .ft C |
---|
5727 | intersectBed \-a reads.bed \-b genes.bed \-wb |
---|
5728 | .ft P |
---|
5729 | .fi |
---|
5730 | .UNINDENT |
---|
5731 | .UNINDENT |
---|
5732 | .sp |
---|
5733 | 6.1.7 Report the entire, original alignment and gene entries for each overlap. |
---|
5734 | .INDENT 0.0 |
---|
5735 | .INDENT 3.5 |
---|
5736 | .sp |
---|
5737 | .nf |
---|
5738 | .ft C |
---|
5739 | intersectBed \-a reads.bed \-b genes.bed \-wa \-wb |
---|
5740 | .ft P |
---|
5741 | .fi |
---|
5742 | .UNINDENT |
---|
5743 | .UNINDENT |
---|
5744 | .sp |
---|
5745 | 6.1.8 Only report an overlap with a repeat if it spans at least 50% of the exon. |
---|
5746 | .INDENT 0.0 |
---|
5747 | .INDENT 3.5 |
---|
5748 | .sp |
---|
5749 | .nf |
---|
5750 | .ft C |
---|
5751 | intersectBed \-a exons.bed \-b repeatMasker.bed \-f 0.50 |
---|
5752 | .ft P |
---|
5753 | .fi |
---|
5754 | .UNINDENT |
---|
5755 | .UNINDENT |
---|
5756 | .sp |
---|
5757 | 6.1.9 Only report an overlap if it comprises at least 50% of the structural variant and at least 50% of the segmental duplication. That is, the overlap must be reciprocally at least 50%. |
---|
5758 | .INDENT 0.0 |
---|
5759 | .INDENT 3.5 |
---|
5760 | .sp |
---|
5761 | .nf |
---|
5762 | .ft C |
---|
5763 | intersectBed \-a SV.bed \-b segmentalDups.bed \-f 0.50 \-r |
---|
5764 | .ft P |
---|
5765 | .fi |
---|
5766 | .UNINDENT |
---|
5767 | .UNINDENT |
---|
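The reciprocal test can be written out as plain arithmetic. The sketch below is illustrative only: reciprocal_overlap is a hypothetical helper, not a bedtools command, and it assumes half-open coordinates on the same chromosome.

```shell
# Hypothetical helper: exit 0 when the overlap between [a1, a2) and
# [b1, b2) is at least fraction f of BOTH intervals, else exit 1.
reciprocal_overlap() {
    awk -v a1="$1" -v a2="$2" -v b1="$3" -v b2="$4" -v f="$5" 'BEGIN {
        a1 += 0; a2 += 0; b1 += 0; b2 += 0; f += 0
        lo = (a1 > b1) ? a1 : b1            # start of the shared region
        hi = (a2 < b2) ? a2 : b2            # end of the shared region
        ov = (hi > lo) ? hi - lo : 0        # overlap length in bp
        ok = (ov > 0 && ov >= f * (a2 - a1) && ov >= f * (b2 - b1))
        exit (ok ? 0 : 1)
    }'
}

# 80 bp shared by two 100 bp intervals: passes in both directions.
reciprocal_overlap 100 200 120 220 0.50 && echo "reported"
# 50 bp is half of A (100 bp) but far less than half of B (850 bp).
reciprocal_overlap 100 200 150 1000 0.50 || echo "not reported"
```

The second case would pass a one-sided \-f 0.50 test on A alone. The reciprocal requirement is what rejects it.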
5768 | .sp |
---|
5769 | 6.1.10 Read BED A from stdin. For example, find genes that overlap LINEs but not SINEs. |
---|
5770 | .INDENT 0.0 |
---|
5771 | .INDENT 3.5 |
---|
5772 | .sp |
---|
5773 | .nf |
---|
5774 | .ft C |
---|
5775 | intersectBed \-a genes.bed \-b LINES.bed | intersectBed \-a stdin \-b SINEs.bed \-v |
---|
5776 | .ft P |
---|
5777 | .fi |
---|
5778 | .UNINDENT |
---|
5779 | .UNINDENT |
---|
5780 | .sp |
---|
5781 | 6.1.11 Retain only single\-end BAM alignments that overlap exons. |
---|
5782 | .INDENT 0.0 |
---|
5783 | .INDENT 3.5 |
---|
5784 | .sp |
---|
5785 | .nf |
---|
5786 | .ft C |
---|
5787 | intersectBed \-abam reads.bam \-b exons.bed > reads.touchingExons.bam |
---|
5788 | .ft P |
---|
5789 | .fi |
---|
5790 | .UNINDENT |
---|
5791 | .UNINDENT |
---|
5792 | .sp |
---|
5793 | 6.1.12 Retain only single\-end BAM alignments that do not overlap simple sequence |
---|
5794 | repeats. |
---|
5795 | .INDENT 0.0 |
---|
5796 | .INDENT 3.5 |
---|
5797 | .sp |
---|
5798 | .nf |
---|
5799 | .ft C |
---|
5800 | intersectBed \-abam reads.bam \-b SSRs.bed \-v > reads.noSSRs.bam |
---|
5801 | .ft P |
---|
5802 | .fi |
---|
5803 | .UNINDENT |
---|
5804 | .UNINDENT |
---|
5805 | .SS 6.2 pairToBed |
---|
5806 | .sp |
---|
5807 | 6.2.1 Return all structural variants (in BEDPE format) that overlap with genes on either |
---|
5808 | end. |
---|
5809 | .INDENT 0.0 |
---|
5810 | .INDENT 3.5 |
---|
5811 | .sp |
---|
5812 | .nf |
---|
5813 | .ft C |
---|
5814 | pairToBed \-a sv.bedpe \-b genes > sv.genes |
---|
5815 | .ft P |
---|
5816 | .fi |
---|
5817 | .UNINDENT |
---|
5818 | .UNINDENT |
---|
5819 | .sp |
---|
5820 | 6.2.2 Return all structural variants (in BEDPE format) that overlap with genes on both |
---|
5821 | ends. |
---|
5822 | .INDENT 0.0 |
---|
5823 | .INDENT 3.5 |
---|
5824 | .sp |
---|
5825 | .nf |
---|
5826 | .ft C |
---|
5827 | pairToBed \-a sv.bedpe \-b genes \-type both > sv.genes |
---|
5828 | .ft P |
---|
5829 | .fi |
---|
5830 | .UNINDENT |
---|
5831 | .UNINDENT |
---|
5832 | .sp |
---|
5833 | 6.2.3 Retain only paired\-end BAM alignments where neither end overlaps simple |
---|
5834 | sequence repeats. |
---|
5835 | .INDENT 0.0 |
---|
5836 | .INDENT 3.5 |
---|
5837 | .sp |
---|
5838 | .nf |
---|
5839 | .ft C |
---|
5840 | pairToBed \-abam reads.bam \-b SSRs.bed \-type neither > reads.noSSRs.bam |
---|
5841 | .ft P |
---|
5842 | .fi |
---|
5843 | .UNINDENT |
---|
5844 | .UNINDENT |
---|
5845 | .sp |
---|
5846 | 6.2.4 Retain only paired\-end BAM alignments where both ends overlap segmental |
---|
5847 | duplications. |
---|
5848 | .INDENT 0.0 |
---|
5849 | .INDENT 3.5 |
---|
5850 | .sp |
---|
5851 | .nf |
---|
5852 | .ft C |
---|
5853 | pairToBed \-abam reads.bam \-b segdups.bed \-type both > reads.segdups.bam |
---|
5854 | .ft P |
---|
5855 | .fi |
---|
5856 | .UNINDENT |
---|
5857 | .UNINDENT |
---|
5858 | .sp |
---|
5859 | 6.2.5 Retain only paired\-end BAM alignments where neither or one and only one end |
---|
5860 | overlaps segmental duplications. |
---|
5861 | .INDENT 0.0 |
---|
5862 | .INDENT 3.5 |
---|
5863 | .sp |
---|
5864 | .nf |
---|
5865 | .ft C |
---|
5866 | pairToBed \-abam reads.bam \-b segdups.bed \-type notboth > reads.notbothSegdups.bam |
---|
5867 | .ft P |
---|
5868 | .fi |
---|
5869 | .UNINDENT |
---|
5870 | .UNINDENT |
---|
5871 | .SS 6.3 pairToPair |
---|
5872 | .sp |
---|
5873 | 6.3.1 Find all SVs (in BEDPE format) in sample 1 that are also in sample 2. |
---|
5874 | .INDENT 0.0 |
---|
5875 | .INDENT 3.5 |
---|
5876 | .sp |
---|
5877 | .nf |
---|
5878 | .ft C |
---|
5879 | pairToPair \-a 1.sv.bedpe \-b 2.sv.bedpe | cut \-f 1\-10 > 1.sv.in2.bedpe |
---|
5880 | .ft P |
---|
5881 | .fi |
---|
5882 | .UNINDENT |
---|
5883 | .UNINDENT |
---|
5884 | .sp |
---|
5885 | 6.3.2 Find all SVs (in BEDPE format) in sample 1 that are not in sample 2. |
---|
5886 | .INDENT 0.0 |
---|
5887 | .INDENT 3.5 |
---|
5888 | .sp |
---|
5889 | .nf |
---|
5890 | .ft C |
---|
5891 | pairToPair \-a 1.sv.bedpe \-b 2.sv.bedpe \-type neither | cut \-f 1\-10 > 1.sv.notin2.bedpe |
---|
5892 | .ft P |
---|
5893 | .fi |
---|
5894 | .UNINDENT |
---|
5895 | .UNINDENT |
---|
5898 | .SS 6.4 bamToBed |
---|
5899 | .sp |
---|
5900 | 6.4.1 Convert BAM alignments to BED format. |
---|
5901 | .INDENT 0.0 |
---|
5902 | .INDENT 3.5 |
---|
5903 | .sp |
---|
5904 | .nf |
---|
5905 | .ft C |
---|
5906 | bamToBed \-i reads.bam > reads.bed |
---|
5907 | .ft P |
---|
5908 | .fi |
---|
5909 | .UNINDENT |
---|
5910 | .UNINDENT |
---|
5911 | .sp |
---|
5912 | 6.4.2 Convert BAM alignments to BED format using the BAM edit distance (NM) as the |
---|
5913 | BED "score". |
---|
5914 | .INDENT 0.0 |
---|
5915 | .INDENT 3.5 |
---|
5916 | .sp |
---|
5917 | .nf |
---|
5918 | .ft C |
---|
5919 | bamToBed \-i reads.bam \-ed > reads.bed |
---|
5920 | .ft P |
---|
5921 | .fi |
---|
5922 | .UNINDENT |
---|
5923 | .UNINDENT |
---|
5924 | .sp |
---|
5925 | 6.4.3 Convert BAM alignments to BEDPE format. |
---|
5926 | .INDENT 0.0 |
---|
5927 | .INDENT 3.5 |
---|
5928 | .sp |
---|
5929 | .nf |
---|
5930 | .ft C |
---|
5931 | bamToBed \-i reads.bam \-bedpe > reads.bedpe |
---|
5932 | .ft P |
---|
5933 | .fi |
---|
5934 | .UNINDENT |
---|
5935 | .UNINDENT |
---|
5936 | .SS 6.5 windowBed |
---|
5937 | .sp |
---|
5938 | 6.5.1 Report all genes that are within 10000 bp upstream or downstream of CNVs. |
---|
5939 | .INDENT 0.0 |
---|
5940 | .INDENT 3.5 |
---|
5941 | .sp |
---|
5942 | .nf |
---|
5943 | .ft C |
---|
5944 | windowBed \-a CNVs.bed \-b genes.bed \-w 10000 |
---|
5945 | .ft P |
---|
5946 | .fi |
---|
5947 | .UNINDENT |
---|
5948 | .UNINDENT |
---|
5949 | .sp |
---|
5950 | 6.5.2 Report all genes that are within 10000 bp upstream or 5000 bp downstream of |
---|
5951 | CNVs. |
---|
5952 | .INDENT 0.0 |
---|
5953 | .INDENT 3.5 |
---|
5954 | .sp |
---|
5955 | .nf |
---|
5956 | .ft C |
---|
5957 | windowBed \-a CNVs.bed \-b genes.bed \-l 10000 \-r 5000 |
---|
5958 | .ft P |
---|
5959 | .fi |
---|
5960 | .UNINDENT |
---|
5961 | .UNINDENT |
---|
5962 | .sp |
---|
5963 | 6.5.3 Report all SNPs that are within 5000 bp upstream or 1000 bp downstream of genes. |
---|
5964 | Define upstream and downstream based on strand. |
---|
5965 | .INDENT 0.0 |
---|
5966 | .INDENT 3.5 |
---|
5967 | .sp |
---|
5968 | .nf |
---|
5969 | .ft C |
---|
5970 | windowBed \-a genes.bed \-b snps.bed \-l 5000 \-r 1000 \-sw |
---|
5971 | .ft P |
---|
5972 | .fi |
---|
5973 | .UNINDENT |
---|
5974 | .UNINDENT |
---|
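The effect of strand-aware windows can be sketched as coordinate arithmetic. This is an illustration under stated assumptions, not the windowBed implementation: strand_window is a hypothetical helper, and it assumes that \-sw swaps the \-l and \-r distances for minus-strand features and that the window start is clamped at 0.

```shell
# Hypothetical helper: print the search window "<start> <end>" for a
# feature, given its start, end, strand, and the -l and -r distances.
strand_window() {
    awk -v s="$1" -v e="$2" -v strand="$3" -v l="$4" -v r="$5" 'BEGIN {
        s += 0; e += 0; l += 0; r += 0
        if (strand == "-") {
            # Upstream of a minus-strand gene lies at higher coordinates.
            ws = s - r; we = e + l
        } else {
            ws = s - l; we = e + r
        }
        if (ws < 0) ws = 0      # assumed clamp at the chromosome start
        print ws, we
    }'
}

strand_window 10000 12000 + 5000 1000
strand_window 10000 12000 - 5000 1000
```

For a gene at 10000\-12000, the forward-strand window is 5000\-13000 and the reverse-strand window is 9000\-17000, so the 5000 bp side always faces the gene's upstream end.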
5975 | .SS 6.6 closestBed |
---|
5976 | .sp |
---|
5977 | Note: By default, if there is a tie for closest, all ties will be reported. \fBclosestBed\fP allows overlapping |
---|
5978 | features to be the closest. |
---|
5979 | .sp |
---|
5980 | 6.6.1 Find the closest ALU to each gene. |
---|
5981 | .INDENT 0.0 |
---|
5982 | .INDENT 3.5 |
---|
5983 | .sp |
---|
5984 | .nf |
---|
5985 | .ft C |
---|
5986 | closestBed \-a genes.bed \-b ALUs.bed |
---|
5987 | .ft P |
---|
5988 | .fi |
---|
5989 | .UNINDENT |
---|
5990 | .UNINDENT |
---|
5991 | .sp |
---|
5992 | 6.6.2 Find the closest ALU to each gene, choosing the first ALU in the file if there is a |
---|
5993 | tie. |
---|
5994 | .INDENT 0.0 |
---|
5995 | .INDENT 3.5 |
---|
5996 | .sp |
---|
5997 | .nf |
---|
5998 | .ft C |
---|
5999 | closestBed \-a genes.bed \-b ALUs.bed \-t first |
---|
6000 | .ft P |
---|
6001 | .fi |
---|
6002 | .UNINDENT |
---|
6003 | .UNINDENT |
---|
6004 | .sp |
---|
6005 | 6.6.3 Find the closest ALU to each gene, choosing the last ALU in the file if there is a |
---|
6006 | tie. |
---|
6007 | .INDENT 0.0 |
---|
6008 | .INDENT 3.5 |
---|
6009 | .sp |
---|
6010 | .nf |
---|
6011 | .ft C |
---|
6012 | closestBed \-a genes.bed \-b ALUs.bed \-t last |
---|
6013 | .ft P |
---|
6014 | .fi |
---|
6015 | .UNINDENT |
---|
6016 | .UNINDENT |
---|
6017 | .SS 6.7 subtractBed |
---|
6018 | .sp |
---|
6019 | Note: If a feature in A is entirely "spanned" by any feature in B, it will not be reported. |
---|
6020 | .sp |
---|
6021 | 6.7.1 Remove introns from gene features. Exons will (should) be reported. |
---|
6022 | .INDENT 0.0 |
---|
6023 | .INDENT 3.5 |
---|
6024 | .sp |
---|
6025 | .nf |
---|
6026 | .ft C |
---|
6027 | subtractBed \-a genes.bed \-b introns.bed |
---|
6028 | .ft P |
---|
6029 | .fi |
---|
6030 | .UNINDENT |
---|
6031 | .UNINDENT |
---|
6032 | .SS 6.8 mergeBed |
---|
6033 | .sp |
---|
6034 | 6.8.1 Merge overlapping repetitive elements into a single entry. |
---|
6035 | .INDENT 0.0 |
---|
6036 | .INDENT 3.5 |
---|
6037 | .sp |
---|
6038 | .nf |
---|
6039 | .ft C |
---|
6040 | mergeBed \-i repeatMasker.bed |
---|
6041 | .ft P |
---|
6042 | .fi |
---|
6043 | .UNINDENT |
---|
6044 | .UNINDENT |
---|
6045 | .sp |
---|
6046 | 6.8.2 Merge overlapping repetitive elements into a single entry, returning the number of |
---|
6047 | entries merged. |
---|
6048 | .INDENT 0.0 |
---|
6049 | .INDENT 3.5 |
---|
6050 | .sp |
---|
6051 | .nf |
---|
6052 | .ft C |
---|
6053 | mergeBed \-i repeatMasker.bed \-n |
---|
6054 | .ft P |
---|
6055 | .fi |
---|
6056 | .UNINDENT |
---|
6057 | .UNINDENT |
---|
6058 | .sp |
---|
6059 | 6.8.3 Merge nearby (within 1000 bp) repetitive elements into a single entry. |
---|
6060 | .INDENT 0.0 |
---|
6061 | .INDENT 3.5 |
---|
6062 | .sp |
---|
6063 | .nf |
---|
6064 | .ft C |
---|
6065 | mergeBed \-i repeatMasker.bed \-d 1000 |
---|
6066 | .ft P |
---|
6067 | .fi |
---|
6068 | .UNINDENT |
---|
6069 | .UNINDENT |
---|
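The merging rule with a distance threshold can be sketched in awk. This is a simplified illustration, not the mergeBed implementation: merge_within is a hypothetical helper that assumes three-column input already sorted by chromosome and then by start.

```shell
# Hypothetical helper: merge intervals on the same chromosome whose gap
# is at most d bp. stdin: "<chrom> <start> <end>", sorted by chrom, start.
merge_within() {
    awk -v d="$1" '
        BEGIN { OFS = "\t"; d += 0 }
        # Same chromosome and close enough: extend the current interval.
        NR > 1 && $1 == c && $2 + 0 <= e + d {
            if ($3 + 0 > e) e = $3 + 0
            next
        }
        # Otherwise emit the finished interval and start a new one.
        NR > 1 { print c, s, e }
        { c = $1; s = $2 + 0; e = $3 + 0 }
        END { if (NR > 0) print c, s, e }'
}

printf 'chr1\t100\t200\nchr1\t900\t1000\nchr1\t2500\t2600\n' | merge_within 1000
```

The first two intervals are 700 bp apart and are merged into chr1 100 1000. The third starts 1500 bp past the merged end and is kept separate.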
6070 | .SS 6.9 coverageBed |
---|
6071 | .sp |
---|
6072 | 6.9.1 Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the |
---|
6073 | genome. |
---|
6074 | .INDENT 0.0 |
---|
6075 | .INDENT 3.5 |
---|
6076 | .sp |
---|
6077 | .nf |
---|
6078 | .ft C |
---|
6079 | coverageBed \-a reads.bed \-b windows10kb.bed | head |
---|
6080 | chr1 0 10000 0 10000 0.00 |
---|
6081 | chr1 10001 20000 33 10000 0.21 |
---|
6082 | chr1 20001 30000 42 10000 0.29 |
---|
6083 | chr1 30001 40000 71 10000 0.36 |
---|
6084 | .ft P |
---|
6085 | .fi |
---|
6086 | .UNINDENT |
---|
6087 | .UNINDENT |
---|
6088 | .sp |
---|
6089 | 6.9.2 Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the |
---|
6090 | genome and create a BEDGRAPH of the number of aligned reads in each window for |
---|
6091 | display on the UCSC browser. |
---|
6092 | .INDENT 0.0 |
---|
6093 | .INDENT 3.5 |
---|
6094 | .sp |
---|
6095 | .nf |
---|
6096 | .ft C |
---|
6097 | coverageBed \-a reads.bed \-b windows10kb.bed | cut \-f 1\-4 > windows10kb.cov.bedg |
---|
6098 | .ft P |
---|
6099 | .fi |
---|
6100 | .UNINDENT |
---|
6101 | .UNINDENT |
---|
6102 | .sp |
---|
6103 | 6.9.3 Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the |
---|
6104 | genome and create a BEDGRAPH of the fraction of each window covered by at least |
---|
6105 | one aligned read for display on the UCSC browser. |
---|
6106 | .INDENT 0.0 |
---|
6107 | .INDENT 3.5 |
---|
6108 | .sp |
---|
6109 | .nf |
---|
6110 | .ft C |
---|
6111 | coverageBed \-a reads.bed \-b windows10kb.bed | awk \(aq{OFS="\et"; print $1,$2,$3,$6}\(aq > windows10kb.pctcov.bedg |
---|
6113 | .ft P |
---|
6114 | .fi |
---|
6115 | .UNINDENT |
---|
6116 | .UNINDENT |
---|
6117 | .SS 6.10 complementBed |
---|
6118 | .sp |
---|
6119 | 6.10.1 Report all intervals in the human genome that are not covered by repetitive |
---|
6120 | elements. |
---|
6121 | .INDENT 0.0 |
---|
6122 | .INDENT 3.5 |
---|
6123 | .sp |
---|
6124 | .nf |
---|
6125 | .ft C |
---|
6126 | complementBed \-i repeatMasker.bed \-g hg18.genome |
---|
6127 | .ft P |
---|
6128 | .fi |
---|
6129 | .UNINDENT |
---|
6130 | .UNINDENT |
---|
6131 | .SS 6.11 shuffleBed |
---|
6132 | .sp |
---|
6133 | 6.11.1 Randomly place all discovered variants in the genome. However, prevent them |
---|
6134 | from being placed in known genome gaps. |
---|
6135 | .INDENT 0.0 |
---|
6136 | .INDENT 3.5 |
---|
6137 | .sp |
---|
6138 | .nf |
---|
6139 | .ft C |
---|
6140 | shuffleBed \-i variants.bed \-g hg18.genome \-excl genome_gaps.bed |
---|
6141 | .ft P |
---|
6142 | .fi |
---|
6143 | .UNINDENT |
---|
6144 | .UNINDENT |
---|
6145 | .sp |
---|
6146 | 6.11.2 Randomly place all discovered variants in the genome. However, prevent them |
---|
6147 | from being placed in known genome gaps and require that the variants be randomly |
---|
6148 | placed on the same chromosome. |
---|
6149 | .INDENT 0.0 |
---|
6150 | .INDENT 3.5 |
---|
6151 | .sp |
---|
6152 | .nf |
---|
6153 | .ft C |
---|
6154 | shuffleBed \-i variants.bed \-g hg18.genome \-excl genome_gaps.bed \-chrom |
---|
6155 | .ft P |
---|
6156 | .fi |
---|
6157 | .UNINDENT |
---|
6158 | .UNINDENT |
---|
6159 | .SH ADVANCED USAGE |
---|
6160 | .SS 7.1 Mask all regions in a genome except for targeted capture regions. |
---|
6161 | .sp |
---|
6162 | # Add 500 bp up and downstream of each probe |
---|
6163 | .INDENT 0.0 |
---|
6164 | .INDENT 3.5 |
---|
6165 | .sp |
---|
6166 | .nf |
---|
6167 | .ft C |
---|
6168 | slopBed \-i probes.bed \-b 500 > probes.500bp.bed |
---|
6169 | .ft P |
---|
6170 | .fi |
---|
6171 | .UNINDENT |
---|
6172 | .UNINDENT |
---|
6173 | .sp |
---|
6174 | # Get a BED file of all regions not covered by the probes (+500 bp up/down) |
---|
6175 | .INDENT 0.0 |
---|
6176 | .INDENT 3.5 |
---|
6177 | .sp |
---|
6178 | .nf |
---|
6179 | .ft C |
---|
6180 | complementBed \-i probes.500bp.bed \-g hg18.genome > probes.500bp.complement.bed |
---|
6181 | .ft P |
---|
6182 | .fi |
---|
6183 | .UNINDENT |
---|
6184 | .UNINDENT |
---|
6185 | .sp |
---|
6186 | # Create a masked genome where all bases are masked except for the probes +500bp |
---|
6187 | .INDENT 0.0 |
---|
6188 | .INDENT 3.5 |
---|
6189 | .sp |
---|
6190 | .nf |
---|
6191 | .ft C |
---|
6192 | maskFastaFromBed \-fi hg18.fa \-bed probes.500bp.complement.bed \-fo hg18.probecomplement.masked.fa |
---|
6194 | .ft P |
---|
6195 | .fi |
---|
6196 | .UNINDENT |
---|
6197 | .UNINDENT |
---|
6198 | .SS 7.2 Screening for novel SNPs. |
---|
6199 | .sp |
---|
6200 | # Find all SNPs that are not in dbSnp and not in the latest 1000 genomes calls |
---|
6201 | .INDENT 0.0 |
---|
6202 | .INDENT 3.5 |
---|
6203 | .sp |
---|
6204 | .nf |
---|
6205 | .ft C |
---|
6206 | intersectBed \-a snp.calls.bed \-b dbSnp.bed \-v | intersectBed \-a stdin \-b 1KG.bed \-v > snp.calls.novel.bed |
---|
6208 | .ft P |
---|
6209 | .fi |
---|
6210 | .UNINDENT |
---|
6211 | .UNINDENT |
---|
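6212 | .SS 7.3 Computing the coverage of features that align entirely within an interval. |
---|
6213 | By default, coverageBed counts any feature in A that overlaps B by at least 1 bp. To require that a feature align entirely within B for it to be counted, you can first use intersectBed with the "\-f 1.0" option. |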
---|
6214 | .INDENT 0.0 |
---|
6215 | .INDENT 3.5 |
---|
6216 | .sp |
---|
6217 | .nf |
---|
6218 | .ft C |
---|
6219 | intersectBed \-a features.bed \-b windows.bed \-f 1.0 | coverageBed \-a stdin \-b windows.bed > windows.bed.coverage |
---|
6221 | .ft P |
---|
6222 | .fi |
---|
6223 | .UNINDENT |
---|
6224 | .UNINDENT |
---|
6225 | .SS 7.4 Computing the coverage of BAM alignments on exons. |
---|
6226 | .sp |
---|
6227 | # One can combine SAMtools with BEDtools to compute coverage directly from the BAM |
---|
6228 | data by using bamToBed. |
---|
6229 | .INDENT 0.0 |
---|
6230 | .INDENT 3.5 |
---|
6231 | .sp |
---|
6232 | .nf |
---|
6233 | .ft C |
---|
6234 | bamToBed \-i reads.bam | coverageBed \-a stdin \-b exons.bed > exons.bed.coverage |
---|
6235 | .ft P |
---|
6236 | .fi |
---|
6237 | .UNINDENT |
---|
6238 | .UNINDENT |
---|
6239 | .sp |
---|
6240 | # Take it a step further and require that coverage be from properly\-paired reads. |
---|
6241 | .INDENT 0.0 |
---|
6242 | .INDENT 3.5 |
---|
6243 | .sp |
---|
6244 | .nf |
---|
6245 | .ft C |
---|
6246 | samtools view \-bf 0x2 reads.bam | bamToBed \-i stdin | coverageBed \-a stdin \-b exons.bed > exons.bed.proper.coverage |
---|
6248 | .ft P |
---|
6249 | .fi |
---|
6250 | .UNINDENT |
---|
6251 | .UNINDENT |
---|
6252 | .SS 7.5 Computing coverage separately for each strand. |
---|
6253 | .sp |
---|
6254 | # Use grep to only look at forward strand features (i.e. those that end in "+"). |
---|
6255 | .INDENT 0.0 |
---|
6256 | .INDENT 3.5 |
---|
6257 | .sp |
---|
6258 | .nf |
---|
6259 | .ft C |
---|
6260 | bamToBed \-i reads.bam | grep \e+$ | coverageBed \-a stdin \-b genes.bed > genes.bed.forward.coverage |
---|
6262 | .ft P |
---|
6263 | .fi |
---|
6264 | .UNINDENT |
---|
6265 | .UNINDENT |
---|
6266 | .sp |
---|
6267 | # Use grep to only look at reverse strand features (i.e. those that end in "\-"). |
---|
6268 | .INDENT 0.0 |
---|
6269 | .INDENT 3.5 |
---|
6270 | .sp |
---|
6271 | .nf |
---|
6272 | .ft C |
---|
6273 | bamToBed \-i reads.bam | grep \(aq[\-]$\(aq | coverageBed \-a stdin \-b genes.bed > genes.bed.reverse.coverage |
---|
6275 | .ft P |
---|
6276 | .fi |
---|
6277 | .UNINDENT |
---|
6278 | .UNINDENT |
---|
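The strand filters can be checked on a couple of hand-written BED lines before running them on real data. A small sketch follows. The records and read names are made up for illustration, and the bracket expressions are one way to keep grep from treating a leading "\-" as an option.

```shell
# Two hypothetical BED6 records, one per strand (tab-separated).
sample_reads() {
    printf 'chr1\t100\t150\tread1\t30\t+\n'
    printf 'chr1\t200\t250\tread2\t30\t-\n'
}

# Keep forward-strand records (lines ending in "+"), print the name column.
sample_reads | grep '[+]$' | cut -f 4
# Keep reverse-strand records (lines ending in "-"), print the name column.
sample_reads | grep '[-]$' | cut -f 4
```

This relies on the strand being the last column, as in six-column BED output.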
6279 | .SS 7.6 Find structural variant calls that are private to one sample. |
---|
6280 | .sp |
---|
6281 | # Keep the SVs in sample 1 for which neither end overlaps an SV call in the other samples. |
---|
6282 | .INDENT 0.0 |
---|
6283 | .INDENT 3.5 |
---|
6284 | .sp |
---|
6285 | .nf |
---|
6286 | .ft C |
---|
6287 | pairToPair \-a sample1.sv.bedpe \-b othersamples.sv.bedpe \-type neither > |
---|
6288 | sample1.sv.private.bedpe |
---|
6289 | .ft P |
---|
6290 | .fi |
---|
6291 | .UNINDENT |
---|
6292 | .UNINDENT |
---|
6293 | .SS 7.7 Exclude SV deletions that appear to be ALU insertions in the reference genome. |
---|
6294 | .sp |
---|
6295 | # We\(aqll require that 80% of the inner span of the deletion be overlapped by a |
---|
6296 | recent ALU. |
---|
6297 | .INDENT 0.0 |
---|
6298 | .INDENT 3.5 |
---|
6299 | .sp |
---|
6300 | .nf |
---|
6301 | .ft C |
---|
6302 | pairToBed \-a deletions.sv.bedpe \-b ALUs.recent.bed \-type notispan \-f 0.80 > deletions.notALUsinRef.bedpe |
---|
6304 | .ft P |
---|
6305 | .fi |
---|
6306 | .UNINDENT |
---|
6307 | .UNINDENT |
---|
6308 | .sp |
---|
6309 | Refer to the mailing list. |
---|
6310 | .SH AUTHOR |
---|
6311 | UVa |
---|
6312 | .SH COPYRIGHT |
---|
6313 | 2012 |
---|
6314 | .\" Generated by docutils manpage writer. |
---|
6315 | . |
---|