Ticket #41437: bedtools.1

File bedtools.1, 161.8 KB (added by arjanvandervelde (Arjan van der Velde), 11 years ago)

1	.\" Man page generated from reStructuredText.
2	.
3	.TH "BEDTOOLS" "1" "November 17, 2013" "2.16.2" "bedtools"
4	.SH NAME
5	bedtools \- Bedtools Documentation
6	.
7	.nr rst2man-indent-level 0
8	.
9	.de1 rstReportMargin
10	\\$1 \\n[an-margin]
11	level \\n[rst2man-indent-level]
12	level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
13	-
14	\\n[rst2man-indent0]
15	\\n[rst2man-indent1]
16	\\n[rst2man-indent2]
17	..
18	.de1 INDENT
19	.\" .rstReportMargin pre:
20	. RS \\$1
21	. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
22	. nr rst2man-indent-level +1
23	.\" .rstReportMargin post:
24	..
25	.de UNINDENT
26	. RE
27	.\" indent \\n[an-margin]
28	.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
29	.nr rst2man-indent-level -1
30	.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
31	.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
32	..
33	.
60	.sp
61	Brief paragraph of the software.
62	.SH OVERVIEW
63	.SS 1.1 Background
64	.sp
65	The development of BEDTools was motivated by a need for fast, flexible tools with which to compare large sets of genomic
66	features. Answering fundamental research questions with existing tools was either too slow or required modifications to the
67	way they reported or computed their results. We were aware of the utilities on the UCSC Genome Browser and Galaxy websites, as
68	well as the elegant tools available as part of Jim Kent’s monolithic suite of tools (“Kent source”). However, we found that
69	the web\-based tools were too cumbersome when working with large datasets generated by current sequencing technologies.
70	Similarly, we found that the Kent source command line tools often required a local installation of the UCSC Genome Browser.
71	These limitations, combined with the fact that we often wanted an extra option here or there that wasn’t available with
72	existing tools, led us to develop our own from scratch. The initial version of BEDTools was publicly released in the spring of
73	2009. The current version has evolved from our research experiences and those of the scientists using the suite over the last
74	year. The BEDTools suite enables one to answer common questions of genomic data in a fast and reliable manner. The fact that
75	almost all the utilities accept input from “stdin” allows one to “stream / pipe” several commands together to facilitate more
76	complicated analyses. Also, the tools allow fine control over how output is reported. The initial version of BEDTools
77	supported solely 6\-column \fI\%BED\fP files. However, we have subsequently added support for sequence alignments in \fI\%BAM\fP
78	format, as well as for features in \fI\%GFF\fP, “blocked” BED, and
79	\fI\%VCF\fP formats.
80	The tools are quite fast and typically finish in a matter of a few seconds, even for large datasets. This manual seeks to describe the behavior and
81	available functionality for each BEDTool. Usage examples are scattered throughout the text, and formal examples are
82	provided in the last two sections. We hope that this document will give you a sense of the flexibility of
83	the toolkit and the types of analyses that are possible with BEDTools. If you have further questions, please join the BEDTools
84	discussion group, visit the Usage Examples on the Google Code site (usage, advanced usage), or take a look at the nascent
85	“Usage From the Wild” page.
86	.SS 1.2 Summary of available tools.
87	.sp
88	BEDTools supports a wide range of operations for interrogating and manipulating genomic features. The table below summarizes
89	the tools available in the suite.
90	.TS
91	center;
92	\|l\|l\|.
93	_
94	T{
95	Utility
96	T} T{
97	Description
98	T}
99	_
100	T{
101	\fBintersectBed\fP
102	T} T{
103	Returns overlaps between two BED/GFF/VCF files.
104	T}
105	_
106	T{
107	\fBpairToBed\fP
108	T} T{
109	Returns overlaps between a paired\-end BED file and a regular BED/VCF/GFF file.
110	T}
111	_
112	T{
113	\fBbamToBed\fP
114	T} T{
115	Converts BAM alignments to BED6, BED12, or BEDPE format.
116	T}
117	_
118	T{
119	\fBbedToBam\fP
120	T} T{
121	Converts BED/GFF/VCF features to BAM format.
122	T}
123	_
124	T{
125	\fBbed12ToBed6\fP
126	T} T{
127	Converts "blocked" BED12 features to discrete BED6 features.
128	T}
129	_
130	T{
131	\fBbedToIgv\fP
132	T} T{
133	Creates IGV batch scripts for taking multiple snapshots from BED/GFF/VCF features.
134	T}
135	_
136	T{
137	\fBcoverageBed\fP
138	T} T{
139	Summarizes the depth and breadth of coverage of features in one BED versus features (e.g., windows, exons, etc.) defined in another BED/GFF/VCF file.
140	T}
141	_
142	T{
143	\fBmultiBamCov\fP
144	T} T{
145	Counts sequence coverage for multiple position\-sorted BAM files at specific loci defined in a BED/GFF/VCF file.
146	T}
147	_
148	T{
149	\fBtagBam\fP
150	T} T{
151	Annotates a BAM file with custom tag fields based on overlaps with BED/GFF/VCF files.
152	T}
153	_
154	T{
155	\fBnuclBed\fP
156	T} T{
157	Profiles the nucleotide content of intervals in a FASTA file.
158	T}
159	_
160	T{
161	\fBgenomeCoverageBed\fP
162	T} T{
163	Creates either a histogram, BEDGRAPH, or a "per base" report of genome coverage.
164	T}
165	_
166	T{
167	\fBunionBedGraphs\fP
168	T} T{
169	Combines multiple BedGraph files into a single file, allowing coverage/other comparisons between them.
170	T}
171	_
172	T{
173	\fBannotateBed\fP
174	T} T{
175	Annotates one BED/VCF/GFF file with overlaps from many others.
176	T}
177	_
178	T{
179	\fBgroupBy\fP
180	T} T{
181	Deprecated. Now in the filo package.
182	T}
183	_
184	T{
185	\fBoverlap\fP
186	T} T{
187	Returns the number of base pairs of overlap between two features on the same line.
188	T}
189	_
190	T{
191	\fBpairToPair\fP
192	T} T{
193	Returns overlaps between two paired\-end BED files.
194	T}
195	_
196	T{
197	\fBclosestBed\fP
198	T} T{
199	Returns the closest feature to each entry in a BED/GFF/VCF file.
200	T}
201	_
202	T{
203	\fBsubtractBed\fP
204	T} T{
205	Removes the portion of an interval that is overlapped by another feature.
206	T}
207	_
208	T{
209	\fBwindowBed\fP
210	T} T{
211	Returns overlaps between two BED/VCF/GFF files based on a user\-defined window.
212	T}
213	_
214	T{
215	\fBmergeBed\fP
216	T} T{
217	Merges overlapping features into a single feature.
218	T}
219	_
220	T{
221	\fBcomplementBed\fP
222	T} T{
223	Returns all intervals not spanned by the features in a BED/GFF/VCF file.
224	T}
225	_
226	T{
227	\fBfastaFromBed\fP
228	T} T{
229	Creates FASTA sequences based on intervals in a BED/GFF/VCF file.
230	T}
231	_
232	T{
233	\fBmaskFastaFromBed\fP
234	T} T{
235	Masks a FASTA file based on BED coordinates.
236	T}
237	_
238	T{
239	\fBshuffleBed\fP
240	T} T{
241	Randomly permutes the locations of a BED file among a genome.
242	T}
243	_
244	T{
245	\fBslopBed\fP
246	T} T{
247	Adjusts each BED entry by a requested number of base pairs.
248	T}
249	_
250	T{
251	\fBflankBed\fP
252	T} T{
253	Creates flanking intervals for each feature in a BED/GFF/VCF file.
254	T}
255	_
256	T{
257	\fBsortBed\fP
258	T} T{
259	Sorts a BED file by chrom, then start position. Other sort orders are also available.
260	T}
261	_
262	T{
263	\fBlinksBed\fP
264	T} T{
265	Creates an HTML file of links to the UCSC or a custom browser.
266	T}
267	_
268	.TE
269	.SS 1.3 Fundamental concepts.
270	.SS 1.3.1 What are genome features and how are they represented?
271	.sp
272	Throughout this manual, we will discuss how to use BEDTools to manipulate, compare and ask questions of genome “features”. Genome features can be functional elements (e.g., genes), genetic polymorphisms (e.g.
273	SNPs, INDELs, or structural variants), or other annotations that have been discovered or curated by genome sequencing groups or genome browser groups. In addition, genome features can be custom annotations that
274	an individual lab or researcher defines (e.g., my novel gene or variant).
275	.sp
276	The basic characteristics of a genome feature are the chromosome or scaffold on which the feature “resides”, the base pair on which the
277	feature starts (i.e. the “start”), the base pair on which the feature ends (i.e. the “end”), the strand on which the feature exists (i.e. “+” or “\-”), and the name of the feature if one is applicable.
278	.sp
279	The two most widely used formats for representing genome features are the BED (Browser Extensible Data) and GFF (General Feature Format) formats. BEDTools was originally written to work exclusively with genome features
280	described using the BED format, but it has been recently extended to seamlessly work with BED, GFF and VCF files.
281	.sp
282	Existing annotations for the genomes of many species can be easily downloaded in BED and GFF
283	format from the UCSC Genome Browser’s “Table Browser” (\fI\%http://genome.ucsc.edu/cgi-bin/hgTables?command=start\fP) or from the “Bulk Downloads” page (\fI\%http://hgdownload.cse.ucsc.edu/downloads.html\fP). In addition, the
284	Ensembl Genome Browser contains annotations in GFF/GTF format for many species (\fI\%http://www.ensembl.org/info/data/ftp/index.html\fP).
285	.SS 1.3.2 Overlapping / intersecting features.
286	.sp
287	Two genome features (henceforth referred to as “features”) are said to overlap or intersect if they share at least one base in common.
288	In the figure below, Feature A intersects/overlaps Feature B, but it does not intersect/overlap Feature C.
289	.sp
290	\fBTODO: place figure here\fP
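As a minimal sketch of this definition (not the actual BEDTools implementation), the Python function below tests whether two features on the same chromosome share at least one base. It assumes BED-style coordinates with a zero-based start and a one-based end; the function name and the example coordinates are illustrative only.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Return True if two same-chromosome features share at least one base.

    Coordinates follow the BED convention: zero-based start, one-based end.
    """
    return start_a < end_b and start_b < end_a


# Hypothetical features: A and B share bases, A and C do not.
feature_a = (100, 200)
feature_b = (150, 250)
feature_c = (200, 300)  # starts exactly where A ends, so no base is shared

a_hits_b = overlaps(*feature_a, *feature_b)
a_hits_c = overlaps(*feature_a, *feature_c)
```

With these coordinates, features that merely abut (one ends where the other starts) are not counted as overlapping.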
291	.SS 1.3.3 Comparing features in file “A” and file “B”.
292	.sp
293	The previous section briefly introduced a fundamental naming convention used in BEDTools. Specifically, all BEDTools that compare features contained in two distinct files refer to one file as feature set “A” and the other file as feature set “B”. This is mainly in the interest of brevity, but it also has its roots in set theory.
294	As an example, if one wanted to look for SNPs (file A) that overlap with exons (file B), one would use intersectBed in the following manner:
295	.INDENT 0.0
296	.INDENT 3.5
297	.sp
298	.nf
299	.ft C
300	intersectBed \-a snps.bed \-b exons.bed
301	.ft P
302	.fi
303	.UNINDENT
304	.UNINDENT
305	.sp
306	There are two exceptions to this rule: 1) When the “A” file is in BAM format, the “\-abam” option must be used. For example:
307	.INDENT 0.0
308	.INDENT 3.5
309	.sp
310	.nf
311	.ft C
312	intersectBed \-abam alignedReads.bam \-b exons.bed
313	.ft P
314	.fi
315	.UNINDENT
316	.UNINDENT
317	.sp
318	And 2) For tools where only one input feature file is needed, the “\-i” option is used. For example:
319	.INDENT 0.0
320	.INDENT 3.5
321	.sp
322	.nf
323	.ft C
324	mergeBed \-i repeats.bed
325	.ft P
326	.fi
327	.UNINDENT
328	.UNINDENT
329	.SS 1.3.4 BED starts are zero\-based and BED ends are one\-based.
330	.sp
331	BEDTools users are sometimes confused by the way the start and end of BED features are represented. Specifically, BEDTools uses the UCSC Genome Browser’s internal database convention of making the start position 0\-based and the end position 1\-based (\fI\%http://genome.ucsc.edu/FAQ/FAQtracks#tracks1\fP).
332	In other words, BEDTools interprets the “start” column as being 1 basepair higher than what is represented in the file. For example, the following BED feature represents a single base on chromosome 1; namely, the 1st base:
333	.INDENT 0.0
334	.INDENT 3.5
335	.sp
336	.nf
337	.ft C
338	chr1 0 1 first_base
339	.ft P
340	.fi
341	.UNINDENT
342	.UNINDENT
343	.sp
344	Why, you might ask? The advantage of storing features this way is that when computing the length of a feature, one must simply subtract the start from the end. Were the start position 1\-based,
345	the calculation would be (slightly) more complex (i.e. (end\-start)+1). Thus, storing BED features this way reduces the computational burden.
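The arithmetic above can be sketched in a few lines of Python. This is an illustration under the stated convention (zero-based start, one-based end), and the helper names are hypothetical rather than part of BEDTools.

```python
def bed_length(start, end):
    """Length of a BED feature: no +1 is needed with a zero-based start."""
    return end - start


def bed_to_one_based(start, end):
    """Convert a BED interval to a fully one-based, inclusive interval."""
    return start + 1, end


# The single-base feature from the example above.
chrom, start, end, name = ("chr1", 0, 1, "first_base")
length = bed_length(start, end)
one_based = bed_to_one_based(start, end)
```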
346	.SS 1.3.5 GFF starts and ends are one\-based.
347	.sp
348	In contrast, the GFF format uses 1\-based coordinates for both the start and the end positions. BEDTools is aware of this and adjusts the positions accordingly.
349	In other words, you don’t need to subtract 1 from the start positions of your GFF features for them to work correctly with BEDTools.
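For illustration only, the adjustment can be pictured as subtracting 1 from the GFF start while leaving the end unchanged. The sketch below assumes a standard nine-column, tab-delimited GFF line (start in column 4, end in column 5); the function name and the example record are hypothetical, and this is not BEDTools' own code.

```python
def gff_to_bed_interval(gff_line):
    """Return (chrom, start, end) in BED coordinates for one GFF line."""
    fields = gff_line.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[3]) - 1  # one-based GFF start -> zero-based BED start
    end = int(fields[4])        # one-based inclusive end is unchanged
    return chrom, start, end


# Hypothetical GFF record for an exon.
example = "chr1\tsource\texon\t11874\t14409\t.\t+\t.\tgene_id \"g1\""
interval = gff_to_bed_interval(example)
```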
350	.SS 1.3.6 VCF coordinates are one\-based.
351	.sp
352	The VCF format uses 1\-based coordinates. As in GFF, BEDTools is aware of this and adjusts the positions accordingly.
353	In other words, you don’t need to subtract 1 from the start positions of your VCF features for them to work correctly with BEDTools.
354	.SS 1.3.7 File B is loaded into memory (most of the time).
355	.sp
356	Whenever a BEDTool compares two files of features, the “B” file is loaded into memory. By contrast, the “A” file is processed line by line and compared with the features from B.
357	Therefore to minimize memory usage, one should set the smaller of the two files as the B file. One salient example is the comparison of aligned sequence reads from a
358	current DNA sequencer to gene annotations. In this case, the aligned sequence file (in BED format) may have tens of millions of features (the sequence alignments),
359	while the gene annotation file will have tens of thousands of features. In this case, it is wise to set the reads as file A and the genes as file B.
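The memory trade-off can be sketched as follows. This is a deliberately simplified model, assuming BED3-or-longer tab-delimited input and a plain linear scan per chromosome; BEDTools itself uses a faster lookup structure, and the function names and sample records here are hypothetical.

```python
import io
from collections import defaultdict


def load_b(b_lines):
    """Load every B feature into memory, grouped by chromosome."""
    by_chrom = defaultdict(list)
    for line in b_lines:
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return by_chrom


def stream_a(a_lines, b_index):
    """Process A one line at a time; only the current A feature is held."""
    for line in a_lines:
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        start, end = int(start), int(end)
        candidates = b_index.get(chrom, [])
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            yield line.rstrip("\n")


# Small file B (genes) in memory, large file A (reads) streamed.
genes = io.StringIO("chr1\t100\t200\tgeneA\n")
reads = io.StringIO("chr1\t150\t186\tread1\nchr1\t500\t536\tread2\n")
hits = list(stream_a(reads, load_b(genes)))
```

Memory use grows with the size of B only, which is why the smaller file is the better choice for B.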
360	.SS 1.3.8 Feature files \fImust\fP be tab\-delimited.
361	.sp
362	This is rather self\-explanatory. While it is possible to allow BED files to be space\-delimited, we have decided to require tab delimiters for three reasons:
363	.INDENT 0.0
364	.IP 1. 3
365	By requiring one delimiter type, the processing time is minimized.
366	.IP 2. 3
367	Tab\-delimited files are more amenable to other UNIX utilities.
368	.IP 3. 3
369	GFF files can contain spaces within attribute columns. This complicates the use of space\-delimited files as spaces must therefore be treated specially depending on the context.
370	.UNINDENT
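The third point is easy to demonstrate. In the sketch below (the GFF record is hypothetical), splitting on tabs keeps the attribute column intact, whereas splitting on any whitespace breaks it apart.

```python
def split_fields(line):
    """Split a feature line on tabs only, leaving embedded spaces alone."""
    return line.rstrip("\n").split("\t")


# Hypothetical GFF line whose ninth column contains spaces.
gff_line = (
    "chr1\thavana\tgene\t11869\t14409\t.\t+\t.\t"
    'gene_name "DDX11L1"; note "has spaces"'
)

fields = split_fields(gff_line)  # nine columns, as intended
naive = gff_line.split()         # whitespace split: attribute column is shredded
```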
371	.SS 1.3.9 All BEDTools allow features to be “piped” via standard input.
372	.sp
373	In an effort to allow one to combine multiple BEDTools and other UNIX utilities into more complicated “pipelines”, all BEDTools allow features
374	to be passed to them via standard input. Only one feature file may be passed to a BEDTool via standard input.
375	The convention used by all BEDTools is to set either file A or file B to “stdin” or "\-". For example:
376	.INDENT 0.0
377	.INDENT 3.5
378	.sp
379	.nf
380	.ft C
381	cat snps.bed \| intersectBed \-a stdin \-b exons.bed
382	cat snps.bed \| intersectBed \-a \- \-b exons.bed
383	.ft P
384	.fi
385	.UNINDENT
386	.UNINDENT
387	.sp
388	In addition, all BEDTools that simply require one main input file (the \-i file) will assume that input is
389	coming from standard input if the \-i parameter is omitted. For example, the following are equivalent:
390	.INDENT 0.0
391	.INDENT 3.5
392	.sp
393	.nf
394	.ft C
395	cat snps.bed \| sortBed \-i stdin
396	cat snps.bed \| sortBed
397	.ft P
398	.fi
399	.UNINDENT
400	.UNINDENT
401	.SS 1.3.10 Most BEDTools write their results to standard output.
402	.sp
403	To allow one to combine multiple BEDTools and other UNIX utilities into more complicated “pipelines”,
404	most BEDTools report their output to standard output, rather than to a named file. If one wants to write the output to a named file, one can use the UNIX “file redirection” symbol “>” to do so.
405	Writing to standard output (the default):
406	.INDENT 0.0
407	.INDENT 3.5
408	.sp
409	.nf
410	.ft C
411	intersectBed \-a snps.bed \-b exons.bed
412	chr1 100100 100101 rs233454
413	chr1 200100 200101 rs446788
414	chr1 300100 300101 rs645678
415	.ft P
416	.fi
417	.UNINDENT
418	.UNINDENT
419	.sp
420	Writing to a file:
421	.INDENT 0.0
422	.INDENT 3.5
423	.sp
424	.nf
425	.ft C
426	intersectBed \-a snps.bed \-b exons.bed > snps.in.exons.bed
427
428	cat snps.in.exons.bed
429	chr1 100100 100101 rs233454
430	chr1 200100 200101 rs446788
431	chr1 300100 300101 rs645678
432	.ft P
433	.fi
434	.UNINDENT
435	.UNINDENT
436	.SS 1.3.11 What is a “genome” file?
437	.sp
438	Some of the BEDTools (e.g., genomeCoverageBed, complementBed, slopBed) need to know the size of
439	the chromosomes for the organism on which your BED files are based. When using the UCSC Genome
440	Browser, Ensembl, or Galaxy, you typically indicate which species / genome build you are working with.
441	The way you do this for BEDTools is to create a “genome” file, which simply lists the names of the
442	chromosomes (or scaffolds, etc.) and their size (in basepairs).
443	Genome files must be tab\-delimited and are structured as follows (this is an example for C. elegans):
444	.INDENT 0.0
445	.INDENT 3.5
446	.sp
447	.nf
448	.ft C
449	chrI 15072421
450	chrII 15279323
451	\&...
452	chrX 17718854
453	chrM 13794
454	.ft P
455	.fi
456	.UNINDENT
457	.UNINDENT
458	.sp
459	BEDTools includes predefined genome files for human and mouse in the /genomes directory included
460	in the BEDTools distribution. Additionally, the “chromInfo” files/tables available from the UCSC
461	Genome Browser website are acceptable. For example, one can download the hg19 chromInfo file here:
462	\fI\%http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz\fP
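To make the role of the genome file concrete, here is a small sketch that reads one into a dictionary and uses it to keep an extended interval within chromosome bounds. The function names are hypothetical and the clipping rule is an assumption about what a size-aware tool needs to do, not a description of BEDTools internals.

```python
import io


def read_genome(handle):
    """Parse a tab-delimited genome file into {chromosome: size}."""
    sizes = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, size = line.split("\t")[:2]
        sizes[name] = int(size)
    return sizes


def clip_to_chrom(chrom, start, end, sizes):
    """Clip an interval so it stays within [0, chromosome size]."""
    return max(0, start), min(end, sizes[chrom])


# C. elegans sizes taken from the example above.
genome = read_genome(io.StringIO("chrI\t15072421\nchrM\t13794\n"))
clipped = clip_to_chrom("chrM", 13000 - 1000, 13500 + 1000, genome)
```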
463	.SS 1.3.12 Paired\-end BED files (BEDPE files).
464	.sp
465	We have defined a new file format (BEDPE) to concisely describe disjoint genome features, such as
466	structural variations or paired\-end sequence alignments. We chose to define a new format because the
467	existing BED block format (i.e. BED12) does not allow inter\-chromosomal feature definitions. Moreover,
468	the BED12 format feels rather bloated when one wants to describe events with only two blocks.
469	.SS 1.3.13 Use “\-h” for help with any BEDTool.
470	.sp
471	Rather straightforward. If you use the “\-h” option with any BEDTool, a full menu of example usage
472	and available options (when applicable) will be reported.
473	.SS 1.3.14 BED features must not contain negative positions.
474	.sp
475	BEDTools will typically reject BED features that contain negative positions. In special cases, however,
476	BEDPE positions may be set to \-1 to indicate that one or more ends of a BEDPE feature is unaligned.
477	.SS 1.3.15 The start position must be <= to the end position.
478	.sp
479	BEDTools will reject BED features where the start position is greater than the end position.
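The two rules above (no negative positions, start no greater than end) amount to a simple check. The sketch below is illustrative: the function name and error messages are hypothetical, and it does not model the BEDPE exception for \-1 positions.

```python
def check_bed_interval(start, end):
    """Raise ValueError if a BED interval breaks either coordinate rule."""
    if start < 0 or end < 0:
        raise ValueError("BED positions must not be negative")
    if start > end:
        raise ValueError("BED start must be <= end")
    return True


ok = check_bed_interval(100, 200)
```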
480	.SS 1.3.16 Headers are allowed in GFF and BED files
481	.sp
482	BEDTools will ignore headers at the beginning of BED and GFF files. Valid header lines begin with a
483	“#” symbol, the word “track”, or the word “browser”. For example, the following examples are valid
484	headers for BED or GFF files:
485	.INDENT 0.0
486	.INDENT 3.5
487	.sp
488	.nf
489	.ft C
490	track name=aligned_read description="Illumina aligned reads"
491	chr5 100000 500000 read1 50 +
492	chr5 2380000 2386000 read2 60 \-
493
494	#This is a fascinating dataset
495	chr5 100000 500000 read1 50 +
496	chr5 2380000 2386000 read2 60 \-
497
498	browser position chr22:1\-20000
499	chr5 100000 500000 read1 50 +
500	chr5 2380000 2386000 read2 60 \-
501	.ft P
502	.fi
503	.UNINDENT
504	.UNINDENT
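A header test following the rule stated above might look like the sketch below. It is a simplification (the function name is hypothetical, and a simple prefix match is assumed), not the parser BEDTools actually uses.

```python
def is_header(line):
    """Return True for lines starting with '#', 'track', or 'browser'."""
    return line.startswith(("#", "track", "browser"))


lines = [
    'track name=aligned_read description="Illumina aligned reads"',
    "#This is a fascinating dataset",
    "browser position chr22:1-20000",
    "chr5\t100000\t500000\tread1\t50\t+",
]
features = [line for line in lines if not is_header(line)]
```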
505	.SS 1.3.17 GZIP support: BED, GFF, VCF, and BEDPE files can be “gzipped”
506	.sp
507	BEDTools will process gzipped BED, GFF, VCF and BEDPE files in the same manner as
508	uncompressed files. Gzipped files are auto\-detected thanks to a helpful contribution from Gordon
509	Assaf.
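One common way to auto-detect gzipped input is to inspect the first two bytes for the gzip magic number (0x1f 0x8b). The sketch below assumes that approach for illustration; the source does not say how BEDTools performs its detection, and the function name is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_text(raw):
    """Decode feature data from bytes, decompressing if it looks gzipped."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


plain = b"chr1\t0\t1\tfirst_base\n"
from_plain = read_text(plain)
from_gzip = read_text(gzip.compress(plain))
```

Both calls yield the same text, so downstream parsing does not need to know whether the input was compressed.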
510	.SS 1.3.18 Support for “split” or “spliced” BAM alignments and “blocked” BED features
511	.sp
512	As of Version 2.8.0, five BEDTools (\fBintersectBed\fP, \fBcoverageBed\fP, \fBgenomeCoverageBed\fP,
513	\fBbamToBed\fP, and \fBbed12ToBed6\fP) can properly handle “split”/“spliced” BAM alignments (i.e., having an
514	“N” CIGAR operation) and/or “blocked” BED (aka BED12) features.
515	.sp
516	\fBintersectBed\fP, \fBcoverageBed\fP, and \fBgenomeCoverageBed\fP will optionally handle “split” BAM and/or
517	“blocked” BED by using the \fB\-split\fP option. This will cause intersects or coverage to be computed only
518	for the alignment or feature blocks. In contrast, without this option, the intersects/coverage would be
519	computed for the entire “span” of the alignment or feature, regardless of the size of the gaps between
520	each alignment or feature block. For example, imagine you have an RNA\-seq read that originates from
521	the junction of two exons that were spliced together in an mRNA. In the genome, these two exons
522	happen to be 3 kb apart. Thus, when the read is aligned to the reference genome, one portion of the
523	read will align to the first exon, while another portion of the read will align ca. 3 kb downstream to the
524	other exon. The corresponding CIGAR string would be something like (assuming a 76 bp read):
525	30M3000N46M. In the genome, this alignment “spans” 3076 bp, yet the nucleotides in the sequencing
526	read only “cover” 76 bp. Without the \fB\-split\fP option, coverage or overlaps would be reported for the
527	entire 3076bp span of the alignment. However, with the \fB\-split\fP option, coverage or overlaps will only
528	be reported for the portions of the read that overlap the exons (i.e. 30bp on one exon, and
529	46bp on the other).
530	.sp
531	Using the \-split option with bamToBed causes “spliced/split” alignments to be reported in BED12
532	format. Using the \-split option with bed12ToBed6 causes “blocked” BED12 features to be reported in
533	BED6 format.
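The difference between the span and the covered bases can be worked out directly from the CIGAR string. The following sketch splits an alignment into blocks at each "N" operation; the function name and the alignment start (1000) are hypothetical, and this is an illustration rather than the BamTools or BEDTools code path.

```python
import re


def cigar_blocks(start, cigar):
    """Split an alignment into reference blocks at each 'N' (skip) operation.

    start is the zero-based reference position of the first aligned base.
    """
    blocks = []
    pos = start
    block_start = start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in "MD=X":          # these operations consume reference bases
            pos += length
        elif op == "N":           # skipped region: close the current block
            if pos > block_start:
                blocks.append((block_start, pos))
            pos += length
            block_start = pos
        # I, S, H and P do not advance the reference position
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks


blocks = cigar_blocks(1000, "30M3000N46M")
span = blocks[-1][1] - blocks[0][0]
covered = sum(end - begin for begin, end in blocks)
```

For the example CIGAR this gives two blocks of 30 bp and 46 bp, a span of 3076 bp, and 76 covered bases.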
534	.SS 1.3.19 Writing uncompressed BAM output.
535	.sp
536	When working with a large BAM file using a complex set of tools in a pipe/stream, it is advantageous
537	to pass uncompressed BAM output to each downstream program. This minimizes the amount of time
538	spent compressing and decompressing output from one program to the next. All BEDTools that create
539	BAM output (e.g. \fBintersectBed\fP, \fBwindowBed\fP) will now optionally create uncompressed BAM output
540	using the \fB\-ubam\fP option.
541	.SS 1.4 Implementation and algorithmic notes.
542	.sp
543	BEDTools was implemented in C++ and makes extensive use of data structures and fundamental
544	algorithms from the Standard Template Library (STL). Many of the core algorithms are based upon the
545	genome binning algorithm described in the original UCSC Genome Browser paper (Kent et al, 2002).
546	The tools have been designed to inherit core data structures from central source files, thus allowing
547	rapid tool development and deployment of improvements and corrections. Support for BAM files is
548	made possible through Derek Barnett’s elegant C++ API called BamTools.
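For readers unfamiliar with genome binning, the sketch below shows the standard five-level UCSC scheme, in which the finest bins cover 128 kb and each coarser level covers eight times more. The text says only that the BEDTools algorithms are "based upon" this idea, so the constants and function name here are assumptions taken from the UCSC scheme and may differ from what BEDTools uses.

```python
# Offsets of the first bin at each level, from finest (128 kb) to coarsest.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the finest bins
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin that fully contains [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("interval exceeds the 512 Mb range of this scheme")


small_bin = bin_from_range(0, 1)
straddling_bin = bin_from_range(0, 131073)
```

Features are stored by bin, so an overlap query only needs to examine the handful of bins that can intersect the query interval instead of every feature on the chromosome.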
549	.SS 1.5 License and availability.
550	.sp
551	BEDTools is freely available under the GNU General Public License (Version 2) at:
552	\fI\%http://bedtools.googlecode.com\fP
553	.SS 1.6 Mailing list.
554	.sp
555	A discussion group for reporting bugs, asking questions of the developer and of the user community, as
556	well as for requesting new features is available at:
557	\fI\%http://groups.google.com/group/bedtools-discuss\fP
558	.SS 1.7 Contributors.
559	.sp
560	As open\-source software, BEDTools greatly benefits from contributions made by other developers and
561	users of the tools. We encourage and welcome suggestions, contributions and complaints. This is how
562	software matures, improves and stays on top of the needs of its user community. The Google Code
563	(GC) site maintains a list of individuals who have contributed either source code or useful ideas for
564	improving the tools. In the near future, we hope to maintain a source repository on the GC site in
565	order to facilitate further contributions. We are currently unable to do so because we use Git for
566	version control, which is not yet supported by GC.
567	.SH INSTALLATION
568	.sp
569	BEDTools is intended to run in a "command line" environment on UNIX, LINUX and Apple OS X
570	operating systems. Installing BEDTools involves downloading the latest source code archive followed by
571	compiling the source code into binaries on your local system. The following commands will install
572	BEDTools in a local directory on a *NIX or OS X machine. Note that the \fB"<version>"\fP refers to the
573	latest posted version number on \fI\%http://bedtools.googlecode.com/\fP\&.
574	.sp
575	Note: \fIThe BEDTools "makefiles" use the GCC compiler. One should edit the Makefiles accordingly if
576	one wants to use a different compiler.\fP:
577	.INDENT 0.0
578	.INDENT 3.5
579	.sp
580	.nf
581	.ft C
582	curl http://bedtools.googlecode.com/files/BEDTools.<version>.tar.gz > BEDTools.tar.gz
583	tar \-zxvf BEDTools.tar.gz
584	cd BEDTools\-<version>
585	make clean
586	make all
587	ls bin
588	.ft P
589	.fi
590	.UNINDENT
591	.UNINDENT
592	.sp
593	At this point, one should copy the binaries in BEDTools/bin/ to either /usr/local/bin/ or some
594	other repository for commonly used UNIX tools in your environment. You will typically require
595	administrator (e.g. "root" or "sudo") privileges to copy to /usr/local/bin/. If in doubt, contact your
596	system administrator for help.
597	.SH QUICK START
598	.SS Install BEDTools
599	.INDENT 0.0
600	.INDENT 3.5
601	.sp
602	.nf
603	.ft C
604	curl http://bedtools.googlecode.com/files/BEDTools.<version>.tar.gz > BEDTools.tar.gz
605	tar \-zxvf BEDTools.tar.gz
606	cd BEDTools
607	make clean
608	make all
609	sudo cp bin/* /usr/local/bin/
610	.ft P
611	.fi
612	.UNINDENT
613	.UNINDENT
614	.SS Use BEDTools
615	.sp
616	Below are examples of typical BEDTools usage. \fBAdditional usage examples are described in
617	section 6 of this manual.\fP Using the "\-h" option with any BEDTool will report a list of all command
618	line options.
619	.sp
620	A. Report the base\-pair overlap between the features in two BED files.
621	.INDENT 0.0
622	.INDENT 3.5
623	.sp
624	.nf
625	.ft C
626	intersectBed \-a reads.bed \-b genes.bed
627	.ft P
628	.fi
629	.UNINDENT
630	.UNINDENT
631	.sp
632	B. Report those entries in A that overlap NO entries in B, similar to "grep \-v".
633	.INDENT 0.0
634	.INDENT 3.5
635	.sp
636	.nf
637	.ft C
638	intersectBed \-a reads.bed \-b genes.bed \-v
639	.ft P
640	.fi
641	.UNINDENT
642	.UNINDENT
643	.sp
644	C. Read BED A from stdin. Useful for stringing together commands. For example, find genes that overlap LINEs
645	but not SINEs.
646	.INDENT 0.0
647	.INDENT 3.5
648	.sp
649	.nf
650	.ft C
651	intersectBed \-a genes.bed \-b LINES.bed \| intersectBed \-a stdin \-b SINEs.bed \-v
652	.ft P
653	.fi
654	.UNINDENT
655	.UNINDENT
656	.sp
657	D. Find the closest ALU to each gene.
658	.INDENT 0.0
659	.INDENT 3.5
660	.sp
661	.nf
662	.ft C
663	closestBed \-a genes.bed \-b ALUs.bed
664	.ft P
665	.fi
666	.UNINDENT
667	.UNINDENT
668	.sp
669	E. Merge overlapping repetitive elements into a single entry, returning the number of entries merged.
670	.INDENT 0.0
671	.INDENT 3.5
672	.sp
673	.nf
674	.ft C
675	mergeBed \-i repeatMasker.bed \-n
676	.ft P
677	.fi
678	.UNINDENT
679	.UNINDENT
680	.sp
681	F. Merge nearby repetitive elements into a single entry, so long as they are within 1000 bp of one another.
682	.INDENT 0.0
683	.INDENT 3.5
684	.sp
685	.nf
686	.ft C
687	mergeBed \-i repeatMasker.bed \-d 1000
688	.ft P
689	.fi
690	.UNINDENT
691	.UNINDENT
692	.SH GENERAL USAGE
693	.SS 4.1 Supported file formats
694	.SS 4.1.1 BED format
695	.sp
696	As described on the UCSC Genome Browser website (see link below), the BED format is a concise and
697	flexible way to represent genomic features and annotations. The BED format description supports up to
698	12 columns, but only the first 3 are required for the UCSC browser, the Galaxy browser and for
699	BEDTools. BEDTools allows one to use the "BED12" format (that is, all 12 fields listed below).
700	However, only intersectBed, coverageBed, genomeCoverageBed, and bamToBed will obey the BED12
701	"blocks" when computing overlaps, etc., via the \fB"\-split"\fP option. For all other tools, the last six columns
702	are not used for any comparisons by the BEDTools. Instead, they will use the entire span (start to end)
703	of the BED12 entry to perform any relevant feature comparisons. The last six columns will be reported
704	in the output of all comparisons.
705	.sp
706	The file description below is modified from: \fI\%http://genome.ucsc.edu/FAQ/FAQformat#format1\fP\&.
707	.INDENT 0.0
708	.IP 1. 3
709	\fBchrom\fP \- The name of the chromosome on which the genome feature exists.
710	.UNINDENT
711	.INDENT 0.0
712	.INDENT 3.5
713	.INDENT 0.0
714	.IP \(bu 2
715	\fIAny string can be used\fP\&. For example, "chr1", "III", "myChrom", "contig1112.23".
716	.IP \(bu 2
717	\fIThis column is required\fP\&.
718	.UNINDENT
719	.UNINDENT
720	.UNINDENT
721	.INDENT 0.0
722	.IP 2. 3
723	\fBstart\fP \- The zero\-based starting position of the feature in the chromosome.
724	.UNINDENT
725	.INDENT 0.0
726	.INDENT 3.5
727	.INDENT 0.0
728	.IP \(bu 2
729	\fIThe first base in a chromosome is numbered 0\fP\&.
730	.IP \(bu 2
731	\fIThe start position in each BED feature is therefore interpreted to be 1 greater than the start position listed in the feature. For example, start=9, end=20 is interpreted to span bases 10 through 20, inclusive\fP\&.
732	.IP \(bu 2
733	\fIThis column is required\fP\&.
734	.UNINDENT
735	.UNINDENT
736	.UNINDENT
737	.INDENT 0.0
738	.IP 3. 3
739	\fBend\fP \- The one\-based ending position of the feature in the chromosome.
740	.UNINDENT
741	.INDENT 0.0
742	.INDENT 3.5
743	.INDENT 0.0
744	.IP \(bu 2
745	\fIThe end position in each BED feature is one\-based. See example above\fP\&.
746	.IP \(bu 2
747	\fIThis column is required\fP\&.
748	.UNINDENT
749	.UNINDENT
750	.UNINDENT
751	.INDENT 0.0
752	.IP 4. 3
753	\fBname\fP \- Defines the name of the BED feature.
754	.UNINDENT
755	.INDENT 0.0
756	.INDENT 3.5
757	.INDENT 0.0
758	.IP \(bu 2
759	\fIAny string can be used\fP\&. For example, "LINE", "Exon3", "HWIEAS_0001:3:1:0:266#0/1", or "my_Feature".
760	.IP \(bu 2
761	\fIThis column is optional\fP\&.
762	.UNINDENT
763	.UNINDENT
764	.UNINDENT
765	.INDENT 0.0
766	.IP 5. 3
767	\fBscore\fP \- The UCSC definition requires that a BED score range from 0 to 1000, inclusive. However, BEDTools allows any string to be stored in this field in order to allow greater flexibility in annotation features. For example, strings allow scientific notation for p\-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser.
768	.UNINDENT
769	.INDENT 0.0
770	.INDENT 3.5
771	.INDENT 0.0
772	.IP \(bu 2
773	\fIAny string can be used\fP\&. For example, 7.31E\-05 (p\-value), 0.33456 (mean enrichment value), "up", "down", etc.
774	.IP \(bu 2
775	\fIThis column is optional\fP\&.
776	.UNINDENT
777	.UNINDENT
778	.UNINDENT
779	.INDENT 0.0
780	.IP 6. 3
781	\fBstrand\fP \- Defines the strand \- either \(aq+\(aq or \(aq\-\(aq.
782	.UNINDENT
783	.INDENT 0.0
784	.INDENT 3.5
785	.INDENT 0.0
786	.IP \(bu 2
787	\fIThis column is optional\fP\&.
788	.UNINDENT
789	.UNINDENT
790	.UNINDENT
791	.INDENT 0.0
792	.IP 7. 3
793	\fBthickStart\fP \- The starting position at which the feature is drawn thickly.
794	.UNINDENT
795	.INDENT 0.0
796	.INDENT 3.5
797	.INDENT 0.0
798	.IP \(bu 2
799	\fIAllowed yet ignored by BEDTools\fP\&.
800	.UNINDENT
801	.UNINDENT
802	.UNINDENT
803	.INDENT 0.0
804	.IP 8. 3
805	\fBthickEnd\fP \- The ending position at which the feature is drawn thickly.
806	.UNINDENT
807	.INDENT 0.0
808	.INDENT 3.5
809	.INDENT 0.0
810	.IP \(bu 2
811	\fIAllowed yet ignored by BEDTools\fP\&.
812	.UNINDENT
813	.UNINDENT
814	.UNINDENT
815	.INDENT 0.0
816	.IP 9. 3
817	\fBitemRgb\fP \- An RGB value of the form R,G,B (e.g. 255,0,0).
818	.UNINDENT
819	.INDENT 0.0
820	.INDENT 3.5
821	.INDENT 0.0
822	.IP \(bu 2
823	\fIAllowed yet ignored by BEDTools\fP\&.
824	.UNINDENT
825	.UNINDENT
826	.UNINDENT
827	.INDENT 0.0
828	.IP 10. 3
829	\fBblockCount\fP \- The number of blocks (exons) in the BED line.
830	.UNINDENT
831	.INDENT 0.0
832	.INDENT 3.5
833	.INDENT 0.0
834	.IP \(bu 2
835	\fIAllowed yet ignored by BEDTools\fP\&.
836	.UNINDENT
837	.UNINDENT
838	.UNINDENT
839	.INDENT 0.0
840	.IP 11. 4
841	\fBblockSizes\fP \- A comma\-separated list of the block sizes.
842	.UNINDENT
843	.INDENT 0.0
844	.INDENT 3.5
845	.INDENT 0.0
846	.IP \(bu 2
847	\fIAllowed yet ignored by BEDTools\fP\&.
848	.UNINDENT
849	.UNINDENT
850	.UNINDENT
851	.INDENT 0.0
852	.IP 12. 4
853	\fBblockStarts\fP \- A comma\-separated list of block starts.
854	.UNINDENT
855	.INDENT 0.0
856	.INDENT 3.5
857	.INDENT 0.0
858	.IP \(bu 2
859	\fIAllowed yet ignored by BEDTools\fP\&.
860	.UNINDENT
861	.UNINDENT
862	.UNINDENT
863	.sp
864	BEDTools requires that all BED input files (and input received from stdin) are \fBtab\-delimited\fP\&. The following types of BED files are supported by BEDTools:
865	.INDENT 0.0
866	.IP 1. 3
867	.nf
868	\fBBED3\fP: A BED file where each feature is described by \fBchrom\fP, \fBstart\fP, and \fBend\fP\&.
869	For example: chr1 11873 14409
870	.fi
871	.sp
872	.IP 2. 3
873	.nf
874	\fBBED4\fP: A BED file where each feature is described by \fBchrom\fP, \fBstart\fP, \fBend\fP, and \fBname\fP\&.
875	For example: chr1 11873 14409 uc001aaa.3
876	.fi
877	.sp
878	.IP 3. 3
879	.nf
880	\fBBED5\fP: A BED file where each feature is described by \fBchrom\fP, \fBstart\fP, \fBend\fP, \fBname\fP, and \fBscore\fP\&.
881	For example: chr1 11873 14409 uc001aaa.3 0
882	.fi
883	.sp
884	.IP 4. 3
885	.nf
886	\fBBED6\fP: A BED file where each feature is described by \fBchrom\fP, \fBstart\fP, \fBend\fP, \fBname\fP, \fBscore\fP, and \fBstrand\fP\&.
887	For example: chr1 11873 14409 uc001aaa.3 0 +
888	.fi
889	.sp
890	.IP 5. 3
891	.nf
892	\fBBED12\fP: A BED file where each feature is described by all twelve columns listed above.
893	For example: chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,109,1189, 0,739,1347,
895	.fi
896	.sp
897	.UNINDENT
898	.SS 4.1.2 BEDPE format
899	.sp
900	We have defined a new file format (BEDPE) in order to concisely describe disjoint genome features,
901	such as structural variations or paired\-end sequence alignments. We chose to define a new format
902	because the existing "blocked" BED format (a.k.a. BED12) does not allow inter\-chromosomal feature
903	definitions. In addition, BED12 only has one strand field, which is insufficient for paired\-end sequence
904	alignments, especially when studying structural variation.
905	.sp
906	The BEDPE format is described below. The description is modified from: \fI\%http://genome.ucsc.edu/FAQ/FAQformat#format1\fP\&.
907	.INDENT 0.0
908	.IP 1. 3
909	\fBchrom1\fP \- The name of the chromosome on which the \fBfirst\fP end of the feature exists.
910	.UNINDENT
911	.INDENT 0.0
912	.INDENT 3.5
913	.INDENT 0.0
914	.IP \(bu 2
915	\fIAny string can be used\fP\&. For example, "chr1", "III", "myChrom", "contig1112.23".
916	.IP \(bu 2
917	\fIThis column is required\fP\&.
918	.IP \(bu 2
919	\fIUse "." for unknown\fP\&.
920	.UNINDENT
921	.UNINDENT
922	.UNINDENT
923	.INDENT 0.0
924	.IP 2. 3
925	\fBstart1\fP \- The zero\-based starting position of the \fBfirst\fP end of the feature on \fBchrom1\fP\&.
926	.UNINDENT
927	.INDENT 0.0
928	.INDENT 3.5
929	.INDENT 0.0
930	.IP \(bu 2
931	\fIThe first base in a chromosome is numbered 0\fP\&.
932	.IP \(bu 2
933	\fIAs with BED format, the start position in each BEDPE feature is therefore interpreted to be 1 greater than the start position listed in the feature. This column is required\fP\&.
934	.IP \(bu 2
935	\fIUse \-1 for unknown\fP\&.
936	.UNINDENT
937	.UNINDENT
938	.UNINDENT
939	.INDENT 0.0
940	.IP 3. 3
941	\fBend1\fP \- The one\-based ending position of the first end of the feature on \fBchrom1\fP\&.
942	.UNINDENT
943	.INDENT 0.0
944	.INDENT 3.5
945	.INDENT 0.0
946	.IP \(bu 2
947	\fIThe end position in each BEDPE feature is one\-based\fP\&.
948	.IP \(bu 2
949	\fIThis column is required\fP\&.
950	.IP \(bu 2
951	\fIUse \-1 for unknown\fP\&.
952	.UNINDENT
953	.UNINDENT
954	.UNINDENT
955	.INDENT 0.0
956	.IP 4. 3
957	\fBchrom2\fP \- The name of the chromosome on which the \fBsecond\fP end of the feature exists.
958	.UNINDENT
959	.INDENT 0.0
960	.INDENT 3.5
961	.INDENT 0.0
962	.IP \(bu 2
963	\fIAny string can be used\fP\&. For example, "chr1", "III", "myChrom", "contig1112.23".
964	.IP \(bu 2
965	\fIThis column is required\fP\&.
966	.IP \(bu 2
967	\fIUse "." for unknown\fP\&.
968	.UNINDENT
969	.UNINDENT
970	.UNINDENT
971	.INDENT 0.0
972	.IP 5. 3
973	\fBstart2\fP \- The zero\-based starting position of the \fBsecond\fP end of the feature on \fBchrom2\fP\&.
974	.UNINDENT
975	.INDENT 0.0
976	.INDENT 3.5
977	.INDENT 0.0
978	.IP \(bu 2
979	\fIThe first base in a chromosome is numbered 0\fP\&.
980	.IP \(bu 2
981	\fIAs with BED format, the start position in each BEDPE feature is therefore interpreted to be 1 greater than the start position listed in the feature. This column is required\fP\&.
982	.IP \(bu 2
983	\fIUse \-1 for unknown\fP\&.
984	.UNINDENT
985	.UNINDENT
986	.UNINDENT
987	.INDENT 0.0
988	.IP 6. 3
989	\fBend2\fP \- The one\-based ending position of the \fBsecond\fP end of the feature on \fBchrom2\fP\&.
990	.UNINDENT
991	.INDENT 0.0
992	.INDENT 3.5
993	.INDENT 0.0
994	.IP \(bu 2
995	\fIThe end position in each BEDPE feature is one\-based\fP\&.
996	.IP \(bu 2
997	\fIThis column is required\fP\&.
998	.IP \(bu 2
999	\fIUse \-1 for unknown\fP\&.
1000	.UNINDENT
1001	.UNINDENT
1002	.UNINDENT
1003	.INDENT 0.0
1004	.IP 7. 3
1005	\fBname\fP \- Defines the name of the BEDPE feature.
1006	.UNINDENT
1007	.INDENT 0.0
1008	.INDENT 3.5
1009	.INDENT 0.0
1010	.IP \(bu 2
1011	\fIAny string can be used\fP\&. For example, "LINE", "Exon3", "HWIEAS_0001:3:1:0:266#0/1", or "my_Feature".
1012	.IP \(bu 2
1013	\fIThis column is optional\fP\&.
1014	.UNINDENT
1015	.UNINDENT
1016	.UNINDENT
1017	.INDENT 0.0
1018	.IP 8. 3
1019	\fBscore\fP \- The UCSC definition requires that a BED score range from 0 to 1000, inclusive. \fIHowever, BEDTools allows any string to be stored in this field in order to allow greater flexibility in annotation features\fP\&. For example, strings allow scientific notation for p\-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser.
1020	.UNINDENT
1021	.INDENT 0.0
1022	.INDENT 3.5
1023	.INDENT 0.0
1024	.IP \(bu 2
1025	\fIAny string can be used\fP\&. For example, 7.31E\-05 (p\-value), 0.33456 (mean enrichment value), "up", "down", etc.
1026	.IP \(bu 2
1027	\fIThis column is optional\fP\&.
1028	.UNINDENT
1029	.UNINDENT
1030	.UNINDENT
1031	.INDENT 0.0
1032	.IP 9. 3
1033	\fBstrand1\fP \- Defines the strand for the first end of the feature. Either \(aq+\(aq or \(aq\-\(aq.
1034	.UNINDENT
1035	.INDENT 0.0
1036	.INDENT 3.5
1037	.INDENT 0.0
1038	.IP \(bu 2
1039	\fIThis column is optional\fP\&.
1040	.IP \(bu 2
1041	\fIUse "." for unknown\fP\&.
1042	.UNINDENT
1043	.UNINDENT
1044	.UNINDENT
1045	.INDENT 0.0
1046	.IP 10. 3
1047	\fBstrand2\fP \- Defines the strand for the second end of the feature. Either \(aq+\(aq or \(aq\-\(aq.
1048	.UNINDENT
1049	.INDENT 0.0
1050	.INDENT 3.5
1051	.INDENT 0.0
1052	.IP \(bu 2
1053	\fIThis column is optional\fP\&.
1054	.IP \(bu 2
1055	\fIUse "." for unknown\fP\&.
1056	.UNINDENT
1057	.UNINDENT
1058	.UNINDENT
1059	.INDENT 0.0
1060	.IP 11. 4
1061	\fBAny number of additional, user\-defined fields\fP \- BEDTools allows one to add as many additional fields to the normal, 10\-column BEDPE format as necessary. These columns are merely "passed through" \fBpairToBed\fP and \fBpairToPair\fP and are not part of any analysis. One would use these additional columns to add extra information (e.g., edit distance for each end of an alignment, or "deletion", "inversion", etc.) to each BEDPE feature.
1062	.UNINDENT
1063	.INDENT 0.0
1064	.INDENT 3.5
1065	.INDENT 0.0
1066	.IP \(bu 2
1067	\fIThese additional columns are optional\fP\&.
1068	.UNINDENT
1069	.UNINDENT
1070	.UNINDENT
1071	.sp
1072	Entries from a typical BEDPE file:
1073	.INDENT 0.0
1074	.INDENT 3.5
1075	.sp
1076	.nf
1077	.ft C
1078	chr1 100 200 chr5 5000 5100 bedpe_example1 30 + \-
1079	chr9 1000 5000 chr9 3000 3800 bedpe_example2 100 + \-
1080	.ft P
1081	.fi
1082	.UNINDENT
1083	.UNINDENT
1084	.sp
1085	Entries from a BEDPE file with two custom fields added to each record:
1086	.INDENT 0.0
1087	.INDENT 3.5
1088	.sp
1089	.nf
1090	.ft C
1091	chr1 10 20 chr5 50 60 a1 30 + \- 0 1
1092	chr9 30 40 chr9 80 90 a2 100 + \- 2 1
1093	.ft P
1094	.fi
1095	.UNINDENT
1096	.UNINDENT
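.sp
The BEDPE column layout described above can be sketched as a small Python parser. This is only an illustration and is not part of BEDTools: the names \fBparse_bedpe\fP, \fBREQUIRED\fP and \fBOPTIONAL\fP, and the "extra" key, are hypothetical. The score is kept as a string because any string is allowed in that column.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
TAB = chr(9)  # the tab character that separates BEDPE columns

REQUIRED = ["chrom1", "start1", "end1", "chrom2", "start2", "end2"]
OPTIONAL = ["name", "score", "strand1", "strand2"]


def parse_bedpe(line):
    """Split one tab-delimited BEDPE line into a dictionary."""
    fields = line.rstrip().split(TAB)
    if len(fields) < len(REQUIRED):
        raise ValueError("BEDPE requires at least six columns")
    record = dict(zip(REQUIRED + OPTIONAL, fields))
    for key in ("start1", "end1", "start2", "end2"):
        record[key] = int(record[key])  # -1 marks an unknown position
    # Anything past column 10 is user-defined and simply carried along.
    record["extra"] = fields[10:]
    return record


# Second record from the example with two custom fields.
line = TAB.join("chr9 30 40 chr9 80 90 a2 100 + - 2 1".split())
record = parse_bedpe(line)
print(record["chrom1"], record["start1"], record["end1"], record["extra"])
```
.ft P
.fi
.UNINDENT
.UNINDENT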
1097	.SS 4.1.3 GFF format
1098	.sp
1099	The GFF format is described on the Sanger Institute\(aqs website (\fI\%http://www.sanger.ac.uk/resources/software/gff/spec.html\fP). The GFF description below is modified from the definition at this URL. All nine columns in the GFF format description are required by BEDTools.
1100	.INDENT 0.0
1101	.IP 1. 3
1102	\fBseqname\fP \- The name of the sequence (e.g. chromosome) on which the feature exists.
1103	.UNINDENT
1104	.INDENT 0.0
1105	.INDENT 3.5
1106	.INDENT 0.0
1107	.IP \(bu 2
1108	\fIAny string can be used\fP\&. For example, "chr1", "III", "myChrom", "contig1112.23".
1109	.IP \(bu 2
1110	\fIThis column is required\fP\&.
1111	.UNINDENT
1112	.UNINDENT
1113	.UNINDENT
1114	.INDENT 0.0
1115	.IP 2. 3
1116	\fBsource\fP \- The source of this feature. This field will normally be used to indicate the program making the prediction, or if it comes from public database annotation, or is experimentally verified, etc.
1117	.UNINDENT
1118	.INDENT 0.0
1119	.INDENT 3.5
1120	.INDENT 0.0
1121	.IP \(bu 2
1122	\fIThis column is required\fP\&.
1123	.UNINDENT
1124	.UNINDENT
1125	.UNINDENT
1126	.INDENT 0.0
1127	.IP 3. 3
1128	\fBfeature\fP \- The feature type name. Equivalent to BED\(aqs \fBname\fP field.
1129	.UNINDENT
1130	.INDENT 0.0
1131	.INDENT 3.5
1132	.INDENT 0.0
1133	.IP \(bu 2
1134	\fIAny string can be used\fP\&. For example, "exon", etc.
1135	.IP \(bu 2
1136	\fIThis column is required\fP\&.
1137	.UNINDENT
1138	.UNINDENT
1139	.UNINDENT
1140	.INDENT 0.0
1141	.IP 4. 3
1142	\fBstart\fP \- The one\-based starting position of feature on \fBseqname\fP\&.
1143	.UNINDENT
1144	.INDENT 0.0
1145	.INDENT 3.5
1146	.INDENT 0.0
1147	.IP \(bu 2
1148	\fIThis column is required\fP\&.
1149	.IP \(bu 2
1150	\fIBEDTools accounts for the fact that GFF uses a one\-based start position and BED uses a zero\-based start position\fP\&.
1151	.UNINDENT
1152	.UNINDENT
1153	.UNINDENT
1154	.INDENT 0.0
1155	.IP 5. 3
1156	\fBend\fP \- The one\-based ending position of feature on \fBseqname\fP\&.
1157	.UNINDENT
1158	.INDENT 0.0
1159	.INDENT 3.5
1160	.INDENT 0.0
1161	.IP \(bu 2
1162	\fIThis column is required\fP\&.
1163	.UNINDENT
1164	.UNINDENT
1165	.UNINDENT
1166	.INDENT 0.0
1167	.IP 6. 3
1168	\fBscore\fP \- A score assigned to the GFF feature. Like BED format, BEDTools allows any string to be stored in this field in order to allow greater flexibility in annotation features. We note that this differs from the GFF definition in the interest of flexibility.
1169	.UNINDENT
1170	.INDENT 0.0
1171	.INDENT 3.5
1172	.INDENT 0.0
1173	.IP \(bu 2
1174	\fIThis column is required\fP\&.
1175	.UNINDENT
1176	.UNINDENT
1177	.UNINDENT
1178	.INDENT 0.0
1179	.IP 7. 3
1180	\fBstrand\fP \- Defines the strand. Use \(aq+\(aq, \(aq\-\(aq or \(aq.\(aq
1181	.UNINDENT
1182	.INDENT 0.0
1183	.INDENT 3.5
1184	.INDENT 0.0
1185	.IP \(bu 2
1186	\fIThis column is required\fP\&.
1187	.UNINDENT
1188	.UNINDENT
1189	.UNINDENT
1190	.INDENT 0.0
1191	.IP 8. 3
1192	\fBframe\fP \- The frame of the coding sequence. Use \(aq0\(aq, \(aq1\(aq, \(aq2\(aq, or \(aq.\(aq.
1193	.UNINDENT
1194	.INDENT 0.0
1195	.INDENT 3.5
1196	.INDENT 0.0
1197	.IP \(bu 2
1198	\fIThis column is required\fP\&.
1199	.UNINDENT
1200	.UNINDENT
1201	.UNINDENT
1202	.INDENT 0.0
1203	.IP 9. 3
1204	\fBattribute\fP \- Taken from \fI\%http://www.sanger.ac.uk/resources/software/gff/spec.html\fP: From version 2 onwards, the attribute field must have a tag value structure following the syntax used within objects in a .ace file, flattened onto one line by semicolon separators. Tags must be standard identifiers ([A\-Za\-z][A\-Za\-z0\-9_]*). Free text values must be quoted with double quotes. \fINote: all non\-printing characters in such free text value strings (e.g. newlines, tabs, control characters, etc) must be explicitly represented by their C (UNIX) style backslash\-escaped representation (e.g. newlines as \(aq\en\(aq, tabs as \(aq\et\(aq)\fP\&. As in ACEDB, multiple values can follow a specific tag. The aim is to establish consistent use of particular tags, corresponding to an underlying implied ACEDB model if you want to think that way (but acedb is not required).
1209	.UNINDENT
1210	.INDENT 0.0
1211	.INDENT 3.5
1212	.INDENT 0.0
1213	.IP \(bu 2
1214	\fIThis column is required\fP\&.
1215	.UNINDENT
1216	.UNINDENT
1217	.UNINDENT
1218	.sp
1219	Entries from an example GFF file:
1220	.INDENT 0.0
1221	.INDENT 3.5
1222	.sp
1223	.nf
1224	.ft C
1225	seq1 BLASTX similarity 101 235 87.1 + 0 Target "HBA_HUMAN" 11 55 ; E_value 0.0003
1226	dJ102G20 GD_mRNA coding_exon 7105 7201 . \- 2 Sequence "dJ102G20.C1.1"
1228	.ft P
1229	.fi
1230	.UNINDENT
1231	.UNINDENT
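.sp
The coordinate difference between GFF and BED noted above can be illustrated with a short Python sketch. It only shows the arithmetic and is not taken from the BEDTools source; the function name \fBgff_to_bed_interval\fP is hypothetical.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def gff_to_bed_interval(gff_start, gff_end):
    """Convert a one-based, inclusive GFF interval to BED coordinates.

    BED uses a zero-based start and a one-based end, so only the start
    shifts; the end value is numerically unchanged.
    """
    if gff_start < 1 or gff_end < gff_start:
        raise ValueError("expected 1 <= start <= end")
    return gff_start - 1, gff_end


# The GFF feature covering bases 101..235 maps to BED start 100, end 235.
bed_start, bed_end = gff_to_bed_interval(101, 235)
print(bed_start, bed_end)

# Both representations describe the same number of bases.
print(bed_end - bed_start == 235 - 101 + 1)
```
.ft P
.fi
.UNINDENT
.UNINDENT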
1232	.SS 4.1.4 Genome files
1233	.sp
1234	Some of the BEDTools (e.g., genomeCoverageBed, complementBed, slopBed) need to know the size of
1235	the chromosomes for the organism on which your BED files are based. When using the UCSC Genome
1236	Browser, Ensembl, or Galaxy, you typically indicate which species/genome build you are
1237	working with. The way you do this for BEDTools is to create a "genome" file, which simply lists the names of
1238	the chromosomes (or scaffolds, etc.) and their sizes (in base pairs).
1239	.sp
1240	Genome files must be \fBtab\-delimited\fP and are structured as follows (this is an example for \fIC. elegans\fP):
1241	.INDENT 0.0
1242	.INDENT 3.5
1243	.sp
1244	.nf
1245	.ft C
1246	chrI 15072421
1247	chrII 15279323
1248	\&...
1249	chrX 17718854
1250	chrM 13794
1251	.ft P
1252	.fi
1253	.UNINDENT
1254	.UNINDENT
1255	.sp
1256	BEDTools includes pre\-defined genome files for human and mouse in the \fB/genomes\fP directory included
1257	in the BEDTools distribution.
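.sp
As a rough illustration of the two\-column layout, the following Python sketch reads genome\-file lines into a dictionary. It is not how BEDTools itself loads these files; \fBparse_genome\fP is a hypothetical helper and the sizes are the \fIC. elegans\fP values shown above.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
TAB = chr(9)  # the tab character that separates genome-file columns


def parse_genome(lines):
    """Map each chromosome name to its size in base pairs."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, size = line.split(TAB)
        sizes[name] = int(size)
    return sizes


example = [
    TAB.join(["chrI", "15072421"]),
    TAB.join(["chrM", "13794"]),
]
genome = parse_genome(example)
print(genome["chrM"])
```
.ft P
.fi
.UNINDENT
.UNINDENT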
1258	.SS 4.1.5 SAM/BAM format
1259	.sp
1260	The SAM / BAM format is a powerful and widely\-used format for storing sequence alignment data (see
1261	\fI\%http://samtools.sourceforge.net/\fP for more details). It has quickly become the standard format to which
1262	most DNA sequence alignment programs write their output. Currently, the following BEDTools
1263	support input in BAM format: \fIintersectBed, windowBed, coverageBed, genomeCoverageBed,
1264	pairToBed, bamToBed\fP\&. Support for the BAM format in BEDTools allows one to (to name a few):
1265	compare sequence alignments to annotations, refine alignment datasets, screen for potential mutations
1266	and compute aligned sequence coverage.
1267	.sp
1268	The details of how these tools work with BAM files are addressed in \fBSection 5\fP of this manual.
1269	.SS 4.1.6 VCF format
1270	.sp
1271	The Variant Call Format (VCF) was conceived as part of the 1000 Genomes Project as a standardized
1272	means to report genetic variation calls from SNP, INDEL and structural variant detection programs
1273	(see \fI\%http://www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0\fP for details).
1274	BEDTools now supports the latest version of this format (i.e., Version 4.0). As a result, BEDTools can
1275	be used to compare genetic variation calls with other genomic features.
1276	.SH THE BEDTOOLS SUITE
1277	.sp
1278	This section covers the functionality and default / optional usage for each of the available BEDTools.
1279	Example "figures" are provided in some cases in an effort to convey the purpose of the tool. The
1280	behavior of each available parameter is discussed for each tool in abstract terms. More concrete usage
1281	examples are provided in \fBSection 6\fP\&.
1283	.SS 5.1 intersect
1284	.sp
1285	By far, the most common question asked of two sets of genomic features is whether or not any of the
1286	features in the two sets "overlap" with one another. This is known as feature intersection. \fBbedtools intersect\fP
1287	allows one to screen for overlaps between two sets of genomic features. Moreover, it allows one to have
1288	fine control as to how the intersections are reported. \fBbedtools intersect\fP works with both BED/GFF/VCF
1289	and BAM files as input.
1290	.SS 5.1.1 Usage and option summary
1291	.sp
1292	\fBUsage\fP:
1293	.INDENT 0.0
1294	.INDENT 3.5
1295	.sp
1296	.nf
1297	.ft C
1298	bedtools intersect [OPTIONS] [\-a <BED/GFF/VCF> \|\| \-abam <BAM>] \-b <BED/GFF/VCF>
1299
1300	intersectBed [OPTIONS] [\-a <BED/GFF/VCF> \|\| \-abam <BAM>] \-b <BED/GFF/VCF>
1301	.ft P
1302	.fi
1303	.UNINDENT
1304	.UNINDENT
1305	.TS
1306	center;
1307	\|l\|l\|.
1308	_
1309	T{
1310	Option
1311	T} T{
1312	Description
1313	T}
1314	_
1315	T{
1316	\fB\-a\fP
1317	T} T{
1318	BED/GFF/VCF file A. Each feature in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe.
1319	T}
1320	_
1321	T{
1322	\fB\-b\fP
1323	T} T{
1324	BED/GFF/VCF file B. Use "stdin" if passing B with a UNIX pipe.
1325	T}
1326	_
1327	T{
1328	\fB\-abam\fP
1329	T} T{
1330	BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe: For example: samtools view \-b <BAM> \| bedtools intersect \-abam stdin \-b genes.bed
1331	T}
1332	_
1333	T{
1334	\fB\-ubam\fP
1335	T} T{
1336	Write uncompressed BAM output. The default is to write compressed BAM output.
1337	T}
1338	_
1339	T{
1340	\fB\-bed\fP
1341	T} T{
1342	When using BAM input (\-abam), write output as BED. The default is to write output in BAM when using \-abam. For example: bedtools intersect \-abam reads.bam \-b genes.bed \-bed
1343	T}
1344	_
1345	T{
1346	\fB\-wa\fP
1347	T} T{
1348	Write the original entry in A for each overlap.
1349	T}
1350	_
1351	T{
1352	\fB\-wb\fP
1353	T} T{
1354	Write the original entry in B for each overlap. Useful for knowing what A overlaps. Restricted by \-f and \-r.
1355	T}
1356	_
1357	T{
1358	\fB\-wo\fP
1359	T} T{
1360	Write the original A and B entries plus the number of base pairs of overlap between the two features. Only A features with overlap are reported. Restricted by \-f and \-r.
1361	T}
1362	_
1363	T{
1364	\fB\-wao\fP
1365	T} T{
1366	Write the original A and B entries plus the number of base pairs of overlap between the two features. However, A features w/o overlap are also reported with a NULL B feature and overlap = 0. Restricted by \-f and \-r.
1367	T}
1368	_
1369	T{
1370	\fB\-u\fP
1371	T} T{
1372	Write original A entry once if any overlaps found in B. In other words, just report the fact that at least one overlap was found in B. Restricted by \-f and \-r.
1373	T}
1374	_
1375	T{
1376	\fB\-c\fP
1377	T} T{
1378	For each entry in A, report the number of hits in B. Reports 0 for A entries that have no overlap with B. Restricted by \-f and \-r.
1379	T}
1380	_
1381	T{
1382	\fB\-v\fP
1383	T} T{
1384	Only report those entries in A that have no overlap in B. Restricted by \-f and \-r.
1385	T}
1386	_
1387	T{
1388	\fB\-f\fP
1389	T} T{
1390	Minimum overlap required as a fraction of A. Default is 1E\-9 (i.e. 1bp).
1391	T}
1392	_
1393	T{
1394	\fB\-r\fP
1395	T} T{
1396	Require that the fraction of overlap be reciprocal for A and B. In other words, if \-f is 0.90 and \-r is used, this requires that B overlap at least 90% of A and that A also overlaps at least 90% of B.
1397	T}
1398	_
1399	T{
1400	\fB\-s\fP
1401	T} T{
1402	Force "strandedness". That is, only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand.
1403	T}
1404	_
1405	T{
1406	\fB\-split\fP
1407	T} T{
1408	Treat "split" BAM (i.e., having an "N" CIGAR operation) or BED12 entries as distinct BED intervals.
1409	T}
1410	_
1411	.TE
1412	.SS 5.1.2 Default behavior
1413	.sp
1414	By default, if an overlap is found, \fBbedtools intersect\fP reports the shared interval between the two
1415	overlapping features.
1416	.sp
1417	For example:
1418	.INDENT 0.0
1419	.INDENT 3.5
1420	.sp
1421	.nf
1422	.ft C
1423	cat A.bed
1424	chr1 10 20
1425	chr1 30 40
1426
1427	cat B.bed
1428	chr1 15 20
1429
1430	bedtools intersect \-a A.bed \-b B.bed
1431	chr1 15 20
1432	.ft P
1433	.fi
1434	.UNINDENT
1435	.UNINDENT
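.sp
The default behavior can be mimicked in a few lines of Python. This is a minimal sketch under the BED conventions described in Section 4.1.1 (zero\-based start, one\-based end), not the algorithm BEDTools uses; \fBshared_interval\fP is a hypothetical name and the all\-against\-all loop is only suitable for tiny inputs.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def shared_interval(a, b):
    """Return the overlap of two (chrom, start, end) features, or None."""
    if a[0] != b[0]:
        return None  # features on different chromosomes never overlap
    start = max(a[1], b[1])
    end = min(a[2], b[2])
    if start >= end:
        return None  # no shared bases
    return (a[0], start, end)


A = [("chr1", 10, 20), ("chr1", 30, 40)]
B = [("chr1", 15, 20)]

hits = []
for a in A:
    for b in B:
        shared = shared_interval(a, b)
        if shared is not None:
            hits.append(shared)

for chrom, start, end in hits:
    print(chrom, start, end)
```
.ft P
.fi
.UNINDENT
.UNINDENT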
1436	.SS 5.1.3 (\-wa) Reporting the original A feature
1437	.sp
1438	Instead, one can force \fBbedtools intersect\fP to report the \fIoriginal\fP \fB"A"\fP feature when an overlap is found. As
1439	shown below, the entire "A" feature is reported, not just the portion that overlaps with the "B" feature.
1440	.sp
1441	For example:
1442	.INDENT 0.0
1443	.INDENT 3.5
1444	.sp
1445	.nf
1446	.ft C
1447	cat A.bed
1448	chr1 10 20
1449	chr1 30 40
1450
1451	cat B.bed
1452	chr1 15 20
1453
1454	bedtools intersect \-a A.bed \-b B.bed \-wa
1455	chr1 10 20
1456	.ft P
1457	.fi
1458	.UNINDENT
1459	.UNINDENT
1460	.SS 5.1.4 (\-wb) Reporting the original B feature
1461	.sp
1462	Similarly, one can force \fBbedtools intersect\fP to report the \fIoriginal\fP \fB"B"\fP feature when an overlap is found. If
1463	just \-wb is used, the overlapping portion of A will be reported followed by the \fIoriginal\fP \fB"B"\fP\&. If both \-wa
1464	and \-wb are used, the \fIoriginals\fP of both \fB"A"\fP and \fB"B"\fP will be reported.
1465	.sp
1466	For example (\-wb alone):
1469	.INDENT 0.0
1470	.INDENT 3.5
1471	.sp
1472	.nf
1473	.ft C
1474	cat A.bed
1475	chr1 10 20
1476	chr1 30 40
1477
1478	cat B.bed
1479	chr1 15 20
1480
1481	bedtools intersect \-a A.bed \-b B.bed \-wb
1482	chr1 15 20 chr1 15 20
1483	.ft P
1484	.fi
1485	.UNINDENT
1486	.UNINDENT
1487	.sp
1488	Now \-wa and \-wb:
1489	.INDENT 0.0
1490	.INDENT 3.5
1491	.sp
1492	.nf
1493	.ft C
1494	cat A.bed
1495	chr1 10 20
1496	chr1 30 40
1497
1498	cat B.bed
1499	chr1 15 20
1500
1501	bedtools intersect \-a A.bed \-b B.bed \-wa \-wb
1502	chr1 10 20 chr1 15 20
1503	.ft P
1504	.fi
1505	.UNINDENT
1506	.UNINDENT
1507	.SS 5.1.5 (\-u) Reporting the presence of \fIat least one\fP overlapping feature
1508	.sp
1509	Frequently a feature in "A" will overlap with multiple features in "B". By default, \fBbedtools intersect\fP will
1510	report each overlap as a separate output line. However, one may want to simply know that there is at
1511	least one overlap (or none). When one uses the \-u option, "A" features that overlap with one or more
1512	"B" features are reported once. Those that overlap with no "B" features are not reported at all.
1513	.sp
1514	For example (\fIwithout\fP \-u):
1515	.INDENT 0.0
1516	.INDENT 3.5
1517	.sp
1518	.nf
1519	.ft C
1520	cat A.bed
1521	chr1 10 20
1522	chr1 30 40
1523
1524	cat B.bed
1525	chr1 15 20
1526	chr1 18 25
1527
1528	bedtools intersect \-a A.bed \-b B.bed \-wa \-wb
1529	chr1 10 20 chr1 15 20
1530	chr1 10 20 chr1 18 25
1531	.ft P
1532	.fi
1533	.UNINDENT
1534	.UNINDENT
1535	.sp
1536	For example (\fIwith\fP \-u):
1537	.INDENT 0.0
1538	.INDENT 3.5
1539	.sp
1540	.nf
1541	.ft C
1542	cat A.bed
1543	chr1 10 20
1544	chr1 30 40
1545
1546	cat B.bed
1547	chr1 15 20
1548	chr1 18 25
1549
1550	bedtools intersect \-a A.bed \-b B.bed \-u
1551	chr1 10 20
1552	.ft P
1553	.fi
1554	.UNINDENT
1555	.UNINDENT
1556	.SS 5.1.6 (\-c) Reporting the number of overlapping features
1557	.sp
1558	The \-c option reports a column after each "A" feature indicating the \fInumber\fP (0 or more) of overlapping
1559	features found in "B". Therefore, \fIeach feature in A is reported once\fP\&.
1560	.sp
1561	For example:
1562	.INDENT 0.0
1563	.INDENT 3.5
1564	.sp
1565	.nf
1566	.ft C
1567	cat A.bed
1568	chr1 10 20
1569	chr1 30 40
1570
1571	cat B.bed
1572	chr1 15 20
1573	chr1 18 25
1574
1575	bedtools intersect \-a A.bed \-b B.bed \-c
1576	chr1 10 20 2
1577	chr1 30 40 0
1578	.ft P
1579	.fi
1580	.UNINDENT
1581	.UNINDENT
1582	.SS 5.1.7 (\-v) Reporting the absence of any overlapping features
1583	.sp
1584	There will likely be cases where you\(aqd like to know which "A" features do not overlap with any of the
1585	"B" features. Perhaps you\(aqd like to know which SNPs don\(aqt overlap with any gene annotations. The \-v
1586	(an homage to "grep \-v") option will only report those "A" features that have no overlaps in "B".
1587	.sp
1588	For example:
1589	.INDENT 0.0
1590	.INDENT 3.5
1591	.sp
1592	.nf
1593	.ft C
1594	cat A.bed
1595	chr1 10 20
1596	chr1 30 40
1597
1598	cat B.bed
1599	chr1 15 20
1600
1601	bedtools intersect \-a A.bed \-b B.bed \-v
1602	chr1 30 40
1603	.ft P
1604	.fi
1605	.UNINDENT
1606	.UNINDENT
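.sp
The \-u, \-c and \-v options can be read as three views of the same per\-feature hit count. The Python sketch below illustrates that relationship only; it is not the BEDTools implementation, and \fBoverlaps\fP and \fBcount_hits\fP are hypothetical helper names.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def overlaps(a, b):
    """True if two (chrom, start, end) BED features share at least one base."""
    return a[0] == b[0] and max(a[1], b[1]) < min(a[2], b[2])


def count_hits(a_features, b_features):
    """Pair each A feature with its number of overlapping B features."""
    return [(a, sum(overlaps(a, b) for b in b_features)) for a in a_features]


A = [("chr1", 10, 20), ("chr1", 30, 40)]
B = [("chr1", 15, 20), ("chr1", 18, 25)]

counts = count_hits(A, B)                           # like -c
with_overlap = [a for a, n in counts if n > 0]      # like -u
without_overlap = [a for a, n in counts if n == 0]  # like -v

for (chrom, start, end), n in counts:
    print(chrom, start, end, n)
```
.ft P
.fi
.UNINDENT
.UNINDENT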
1607	.SS 5.1.8 (\-f) Requiring a minimal overlap fraction
1608	.sp
1609	By default, \fBbedtools intersect\fP will report an overlap between A and B so long as at least one base
1610	pair overlaps. Yet sometimes you may want to restrict reported overlaps between A and B to cases
1611	where the feature in B overlaps at least X% (e.g. 50%) of the A feature. The \-f option does exactly
1612	this.
1613	.sp
1614	For example (note that the second B entry is not reported):
1615	.INDENT 0.0
1616	.INDENT 3.5
1617	.sp
1618	.nf
1619	.ft C
1620	cat A.bed
1621	chr1 100 200
1622
1623	cat B.bed
1624	chr1 130 201
1625	chr1 180 220
1626
1627	bedtools intersect \-a A.bed \-b B.bed \-f 0.50 \-wa \-wb
1628	chr1 100 200 chr1 130 201
1629	.ft P
1630	.fi
1631	.UNINDENT
1632	.UNINDENT
1633	.SS 5.1.9 (\-r, combined with \-f) Requiring reciprocal minimal overlap fraction
1634	.sp
1635	Similarly, you may want to require that a minimal fraction of both the A and the B features is
1636	overlapped. For example, if feature A is 1kb and feature B is 1Mb, you might not want to report the
1637	overlap as feature A can overlap at most 0.1% of feature B. If one sets \-f to, say, 0.02, and also
1638	enables the \-r option (reciprocal overlap fraction required), this overlap would not be reported.
1639	.sp
1640	For example (note that the second B entry is not reported):
1641	.INDENT 0.0
1642	.INDENT 3.5
1643	.sp
1644	.nf
1645	.ft C
1646	cat A.bed
1647	chr1 100 200
1648
1649	cat B.bed
1650	chr1 130 201
1651	chr1 130 200000
1652
1653	bedtools intersect \-a A.bed \-b B.bed \-f 0.50 \-r \-wa \-wb
1654	chr1 100 200 chr1 130 201
1655	.ft P
1656	.fi
1657	.UNINDENT
1658	.UNINDENT
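.sp
The arithmetic behind \-f and \-r can be checked with a short Python sketch. It assumes BED coordinates (zero\-based start, one\-based end) and uses the hypothetical helpers \fBoverlap_bp\fP and \fBpasses\fP; it illustrates the rule as described in this manual rather than the BEDTools source.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def overlap_bp(a, b):
    """Number of shared bases between two (chrom, start, end) features."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def passes(a, b, min_frac, reciprocal=False):
    """Check an A/B pair against a minimum overlap fraction.

    min_frac is always measured against A; with reciprocal=True it must
    also hold when measured against B.
    """
    n = overlap_bp(a, b)
    if n == 0:
        return False
    ok = n / (a[2] - a[1]) >= min_frac
    if reciprocal:
        ok = ok and n / (b[2] - b[1]) >= min_frac
    return ok


a = ("chr1", 100, 200)
# 70 of A's 100 bases are covered, so -f 0.50 is satisfied.
print(passes(a, ("chr1", 130, 201), 0.50))
# Only 20 of A's 100 bases are covered.
print(passes(a, ("chr1", 180, 220), 0.50))
# 70% of A is covered, but only a tiny fraction of the long B feature.
print(passes(a, ("chr1", 130, 200000), 0.50, reciprocal=True))
```
.ft P
.fi
.UNINDENT
.UNINDENT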
1659	.SS 5.1.10 (\-s) Enforcing "strandedness"
1660	.sp
1661	By default, \fBbedtools intersect\fP will report overlaps between features even if the features are on opposite
1662	strands. However, if strand information is present in both BED files and the "\-s" option is used, overlaps
1663	will only be reported when features are on the same strand.
1664	.sp
1665	For example (note that the second B entry is not reported):
1666	.INDENT 0.0
1667	.INDENT 3.5
1668	.sp
1669	.nf
1670	.ft C
1671	cat A.bed
1672	chr1 100 200 a1 100 +
1673
1674	cat B.bed
1675	chr1 130 201 b1 100 \-
1676	chr1 130 201 b2 100 +
1677
1678	bedtools intersect \-a A.bed \-b B.bed \-wa \-wb \-s
1679	chr1 100 200 a1 100 + chr1 130 201 b2 100 +
1680	.ft P
1681	.fi
1682	.UNINDENT
1683	.UNINDENT
1684	.SS 5.1.11 (\-abam) Default behavior when using BAM input
1685	.sp
1686	When comparing alignments in BAM format (\fB\-abam\fP) to features in BED format (\fB\-b\fP), \fBbedtools intersect\fP
1687	will, \fBby default\fP, write the output in BAM format. That is, each alignment in the BAM file that meets
1688	the user\(aqs criteria will be written (to standard output) in BAM format. This serves as a mechanism to
1689	create subsets of BAM alignments that are of biological interest, etc. Note that each mate in a BAM
1690	alignment pair is compared to the BED file on its own. Thus, if only one end of a paired\-end sequence overlaps with a
1691	feature in B, then that end will be written to the BAM output. By contrast, the other mate for the
1692	pair will not be written. One should use \fBpairToBed\fP (Section 5.2) if one wants each BAM alignment
1693	for a pair to be written to BAM output.
1694	.sp
1695	For example:
1696	.INDENT 0.0
1697	.INDENT 3.5
1698	.sp
1699	.nf
1700	.ft C
1701	bedtools intersect \-abam reads.unsorted.bam \-b simreps.bed \| samtools view \- \| head \-3
1702
1703	BERTHA_0001:3:1:15:1362#0 99 chr4 9236904 0 50M = 9242033 5179 AGACGTTAACTTTACACACCTCTGCCAAGGTCCTCATCCTTGTATTGAAG WcTU]b\egcegXgfcbfccbddggVYPWW_\ec\(gadcdabdfW^a^gggfgd XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:19 X1:i:2 XM:i:0 XO:i:0 XG:i:0 MD:Z:50
1706	BERTHA_0001:3:1:16:994#0 83 chr6 114221672 37 25S6M1I11M7S = 114216196 \-5493 GAAAGGCCAGAGTATAGAATAAACACAACAATGTCCAAGGTACACTGTTA gffeaaddddggggggedgcgeggdegggggffcgggggggegdfggfgf XT:A:M NM:i:3 SM:i:37 AM:i:37 XM:i:2 XO:i:1 XG:i:1 MD:Z:6A6T3
1710	BERTHA_0001:3:1:16:594#0 147 chr8 43835330 0 50M = 43830893 \-4487 CTTTGGGAGGGCTTTGTAGCCTATCTGGAAAAAGGAAATATCTTCCCATG U\ee^bgeTdg_Kgcg\(gaggeggg_gggggggggddgdggVg\egWdfgfgff XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:10 X1:i:7 XM:i:2 XO:i:0 XG:i:0 MD:Z:1A2T45
1714	.ft P
1715	.fi
1716	.UNINDENT
1717	.UNINDENT
1718	.SS 5.1.12 (\-bed) Output BED format when using BAM input
1719	.sp
1720	When comparing alignments in BAM format (\fB\-abam\fP) to features in BED format (\fB\-b\fP), \fBbedtools intersect\fP
1721	will \fBoptionally\fP write the output in BED format. That is, each alignment in the BAM file is converted
1722	to a 6 column BED feature and if overlaps are found (or not) based on the user\(aqs criteria, the BAM
1723	alignment will be reported in BED format. The BED "name" field is taken from the QNAME (read name) field in
1724	the BAM alignment. If mate information is available, the mate (e.g., "/1" or "/2") field will be
1725	appended to the name. The "score" field is the mapping quality score from the BAM alignment.
1726	.sp
1727	For example:
1728	.INDENT 0.0
1729	.INDENT 3.5
1730	.sp
1731	.nf
1732	.ft C
1733	bedtools intersect \-abam reads.unsorted.bam \-b simreps.bed \-bed \| head \-20
1734
1735	chr4 9236903 9236953 BERTHA_0001:3:1:15:1362#0/1 0 +
1736	chr6 114221671 114221721 BERTHA_0001:3:1:16:994#0/1 37 \-
1737	chr8 43835329 43835379 BERTHA_0001:3:1:16:594#0/2 0 \-
1738	chr4 49110668 49110718 BERTHA_0001:3:1:31:487#0/1 23 +
1739	chr19 27732052 27732102 BERTHA_0001:3:1:32:890#0/2 46 +
1740	chr19 27732012 27732062 BERTHA_0001:3:1:45:1135#0/1 37 +
1741	chr10 117494252 117494302 BERTHA_0001:3:1:68:627#0/1 37 \-
1742	chr19 27731966 27732016 BERTHA_0001:3:1:83:931#0/2 9 +
1743	chr8 48660075 48660125 BERTHA_0001:3:1:86:608#0/2 37 \-
1744	chr9 34986400 34986450 BERTHA_0001:3:1:113:183#0/2 37 \-
1745	chr10 42372771 42372821 BERTHA_0001:3:1:128:1932#0/1 3 \-
1746	chr19 27731954 27732004 BERTHA_0001:3:1:130:1402#0/2 0 +
1747	chr10 42357337 42357387 BERTHA_0001:3:1:137:868#0/2 9 +
1748	chr1 159720631 159720681 BERTHA_0001:3:1:147:380#0/2 37 \-
1749	chrX 58230155 58230205 BERTHA_0001:3:1:151:656#0/2 37 \-
1750	chr5 142612746 142612796 BERTHA_0001:3:1:152:1893#0/1 37 \-
1751	chr9 71795659 71795709 BERTHA_0001:3:1:177:387#0/1 37 +
1752	chr1 106240854 106240904 BERTHA_0001:3:1:194:928#0/1 37 \-
1753	chr4 74128456 74128506 BERTHA_0001:3:1:221:724#0/1 37 \-
1754	chr8 42606164 42606214 BERTHA_0001:3:1:244:962#0/1 37 +
1755	.ft P
1756	.fi
1757	.UNINDENT
1758	.UNINDENT
1759	.SS 5.1.13 (\-split) Reporting overlaps with spliced alignments or blocked BED features
1760	.sp
1761	As described in section 1.3.19, bedtools intersect will, by default, screen for overlaps against the entire span
1762	of a spliced/split BAM alignment or blocked BED12 feature. When dealing with RNA\-seq reads, for
1763	example, one typically wants to only screen for overlaps for the portions of the reads that come from
1764	exons (and ignore the interstitial intron sequence). The \fB\-split\fP option allows for such overlaps to be
1765	performed.
1766	.sp
1767	For example, the diagram below illustrates the \fIdefault\fP behavior. The dots represent the "split/spliced"
1768	portion of the alignment (i.e., the CIGAR "N" operation). In this case, the two exon annotations
1769	are reported as overlapping with the "split" BAM alignment, and a third feature that overlaps only
1770	the "split" portion of the alignment is reported as well.
1771	.INDENT 0.0
1772	.INDENT 3.5
1773	.sp
1774	.nf
1775	.ft C
1776	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1777
1778	Exons \-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-
1779
1780	BED/BAM A **********.......................................**
1781
1782	BED File B ^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^^^
1783
1784	Result =============== ======== ==========
1785	.ft P
1786	.fi
1787	.UNINDENT
1788	.UNINDENT
1789	.sp
1790	In contrast, when using the \fB\-split\fP option, only the exon overlaps are reported.
1791	.INDENT 0.0
1792	.INDENT 3.5
1793	.sp
1794	.nf
1795	.ft C
1796	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1797
1798	Exons \-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-
1799
1800	BED/BAM A **********.......................................**
1801
1802	BED File B ^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^^^
1803
1804	Result =============== ==========
1805	.ft P
1806	.fi
1807	.UNINDENT
1808	.UNINDENT
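.sp
The difference between the two diagrams above can be sketched in a few lines of Python. This is only an illustration of the idea, not the bedtools implementation: the function names, the coordinates, and the representation of an alignment as a list of aligned blocks are all assumptions made for the example.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def hits(blocks, features, split=False):
    """Return the B features that overlap an alignment given as aligned blocks.

    Without split, the whole span (first block start to last block end) is
    screened; with split, only the individual aligned blocks are screened.
    """
    if split:
        targets = blocks
    else:
        targets = [(blocks[0][0], blocks[-1][1])]
    return [f for f in features
            if any(overlaps(s, e, f[0], f[1]) for s, e in targets)]


# Hypothetical spliced read: two aligned blocks separated by an intron.
read_blocks = [(100, 150), (400, 425)]
# Hypothetical B features: exon 1, a feature inside the intron, exon 2.
b_features = [(90, 160), (200, 300), (390, 450)]

print(hits(read_blocks, b_features))              # whole span: all three features
print(hits(read_blocks, b_features, split=True))  # blocks only: the two exons
```

With the whole span, the intronic feature is reported; once only the aligned blocks are screened, it drops out.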
1809	.SS 5.2 pairToBed
1810	.sp
1811	\fBpairToBed\fP compares each end of a BEDPE feature or a paired\-end BAM alignment to a feature file in
1812	search of overlaps.
1813	.sp
1814	\fBNOTE: pairToBed requires that the BAM file is sorted/grouped by the read name. This
1815	allows pairToBed to extract correct alignment coordinates for each end based on their
1816	respective CIGAR strings. It also assumes that the alignments for a given pair come in
1817	groups of two. There is not yet a standard method for reporting multiple alignments
1818	using BAM. pairToBed will fail if an aligner does not report alignments in pairs.\fP
1819	.SS 5.2.1 Usage and option summary
1820	.sp
1821	\fBUsage:\fP
1822	.INDENT 0.0
1823	.INDENT 3.5
1824	.sp
1825	.nf
1826	.ft C
1827	pairToBed [OPTIONS] [\-a <BEDPE> \|\| \-abam <BAM>] \-b <BED/GFF/VCF>
1828	.ft P
1829	.fi
1830	.UNINDENT
1831	.UNINDENT
1832	.TS
1833	center;
1834	\|l\|l\|.
1835	_
1836	T{
1837	Option
1838	T} T{
1839	Description
1840	T}
1841	_
1842	T{
1843	\fB\-a\fP
1844	T} T{
1845	BEDPE file A. Each feature in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. Output will be in BEDPE format.
1846	T}
1847	_
1848	T{
1849	\fB\-b\fP
1850	T} T{
1851	BED file B. Use "stdin" if passing B with a UNIX pipe.
1852	T}
1853	_
1854	T{
1855	\fB\-abam\fP
1856	T} T{
1857	BAM file A. Each end of each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. For example: samtools view \-b <BAM> \| pairToBed \-abam stdin \-b genes.bed \| samtools view \-
1858	T}
1859	_
1860	T{
1861	\fB\-ubam\fP
1862	T} T{
1863	Write uncompressed BAM output. The default is to write compressed BAM output.
1864	T}
1865	_
1866	T{
1867	\fB\-bedpe\fP
1868	T} T{
1869	When using BAM input (\-abam), write output as BEDPE. The default is to write output in BAM when using \-abam. For example: pairToBed \-abam reads.bam \-b genes.bed \-bedpe
1870	T}
1871	_
1872	T{
1873	\fB\-ed\fP
1874	T} T{
1875	Use BAM total edit distance (NM tag) for BEDPE score. Default for BEDPE is to use the \fIminimum\fP of the two mapping qualities for the pair. When \-ed is used the \fItotal\fP edit distance from the two mates is reported as the score.
1876	T}
1877	_
1878	T{
1879	\fB\-f\fP
1880	T} T{
1881	Minimum overlap required as a fraction of A. Default is 1E\-9 (i.e. 1bp).
1882	T}
1883	_
1884	T{
1885	\fB\-s\fP
1886	T} T{
1887	Force "strandedness". That is, only report hits in B that overlap A on the \fBsame\fP strand. By default, overlaps are reported without respect to strand.
1888	T}
1889	_
1890	T{
1891	\fB\-type\fP
1892	T} T{
1893	Approach to reporting overlaps between BEDPE and BED.
1894	.INDENT 0.0
1895	.INDENT 3.5
1896	.INDENT 0.0
1897	.INDENT 3.5
1898	\fBeither\-\fP Report overlaps if either end of A overlaps B.
1899	.INDENT 0.0
1900	.IP \(bu 2
1901	\fIDefault\fP
1902	.UNINDENT
1903	.sp
1904	\fBneither\-\fP Report A if neither end of A overlaps B.
1905	.sp
1906	\fBxor\-\fP Report overlaps if one and only one end of A overlaps B.
1907	.sp
1908	\fBboth\-\fP Report overlaps if both ends of A overlap B.
1909	.sp
1910	\fBnotboth\-\fP Report overlaps if neither end or one and only one end of A overlaps B.
1911	.sp
1912	\fBispan\-\fP Report overlaps between [end1, start2] of A and B.
1913	.INDENT 0.0
1914	.IP \(bu 2
1915	Note: If chrom1 <> chrom2, entry is ignored.
1916	.UNINDENT
1917	.UNINDENT
1918	.UNINDENT
1919	.sp
1920	\fBospan\-\fP Report overlaps between [start1, end2] of A and B.
1921	.INDENT 0.0
1922	.INDENT 3.5
1923	.INDENT 0.0
1924	.IP \(bu 2
1925	Note: If chrom1 <> chrom2, entry is ignored.
1926	.UNINDENT
1927	.sp
1928	\fBnotispan\-\fP Report A if ispan of A doesn\(aqt overlap B.
1929	\- Note: If chrom1 <> chrom2, entry is ignored.
1930	.sp
1931	\fBnotospan\-\fP Report A if ospan of A doesn\(aqt overlap B.
1932	\- Note: If chrom1 <> chrom2, entry is ignored.
1933	.UNINDENT
1934	.UNINDENT
1935	.UNINDENT
1936	.UNINDENT
1937	T}
1938	_
1939	.TE
1940	.SS 5.2.2 Default behavior
1941	.sp
1942	By default, a BEDPE / BAM feature will be reported if \fIeither\fP end overlaps a feature in the BED file.
1943	In the example below, the left end of the pair overlaps B yet the right end does not. Thus, BEDPE/
1944	BAM A is reported since the default is to report A if either end overlaps B.
1945	.sp
1946	Default: Report A if \fIeither\fP end overlaps B.
1947	.INDENT 0.0
1948	.INDENT 3.5
1949	.sp
1950	.nf
1951	.ft C
1952	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1953
1954	BEDPE/BAM A ***.................................***
1955
1956	BED File B ^^^^^^^^ ^^^^^^
1957
1958	Result =====.................................=====
1959	.ft P
1960	.fi
1961	.UNINDENT
1962	.UNINDENT
1963	.SS 5.2.3 (\-type) Optional overlap requirements
1964	.sp
1965	Using the \fB\-type\fP option, \fBpairToBed\fP provides several other overlap requirements for controlling how
1966	overlaps between BEDPE/BAM A and BED B are reported. The examples below illustrate how each
1967	option behaves.
1968	.sp
1969	\fB\-type both\fP: Report A only if \fIboth\fP ends overlap B.
1970	.INDENT 0.0
1971	.INDENT 3.5
1972	.sp
1973	.nf
1974	.ft C
1975	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1976
1977	BEDPE/BAM A ***.................................***
1978
1979	BED File B ^^^^^^^^ ^^^^^^
1980
1981	Result
1982
1983
1984
1985	BEDPE/BAM A ***.................................***
1986
1987	BED File B ^^^^^^^^ ^^^^^^
1988
1989	Result =====.................................=====
1990	.ft P
1991	.fi
1992	.UNINDENT
1993	.UNINDENT
1994	.sp
1995	\fB\-type neither\fP: Report A only if \fIneither\fP end overlaps B.
1996	.INDENT 0.0
1997	.INDENT 3.5
1998	.sp
1999	.nf
2000	.ft C
2001	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2002
2003	BEDPE/BAM A ***.................................***
2004
2005	BED File B ^^^^^^^^ ^^^^^^
2006
2007	Result
2008
2009
2010
2011	BEDPE/BAM A ***.................................***
2012
2013	BED File B ^^^^ ^^^^^^
2014
2015	Result =====.................................=====
2016	.ft P
2017	.fi
2018	.UNINDENT
2019	.UNINDENT
2020	.sp
2021	\fB\-type xor\fP: Report A only if \fIone and only one\fP end overlaps B.
2022	.INDENT 0.0
2023	.INDENT 3.5
2024	.sp
2025	.nf
2026	.ft C
2027	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2028
2029	BEDPE/BAM A ***.................................***
2030
2031	BED File B ^^^^^^^^ ^^^^^^
2032
2033	Result =====.................................=====
2034
2035
2036
2037	BEDPE/BAM A ***.................................***
2038
2039	BED File B ^^^^ ^^^^^^
2040
2041	Result
2042	.ft P
2043	.fi
2044	.UNINDENT
2045	.UNINDENT
2046	.sp
2047	\fB\-type notboth\fP: Report A only if \fIneither end\fP \fBor\fP \fIone and only one\fP end overlaps B. Thus "notboth"
2048	includes what would be reported by "neither" and by "xor".
2049	.INDENT 0.0
2050	.INDENT 3.5
2051	.sp
2052	.nf
2053	.ft C
2054	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2055
2056	BEDPE/BAM A ***.................................***
2057
2058	BED File B ^^^^^^^^ ^^^^^^
2059
2060	Result =====.................................=====
2061
2062
2063
2064	BEDPE/BAM A ***.................................***
2065
2066	BED File B ^^^ ^^^^^^
2067
2068	Result =====.................................=====
2069
2070
2071
2072	BEDPE/BAM A ***.................................***
2073
2074	BED File B ^^^^ ^^^^^^
2075
2076	Result
2077	.ft P
2078	.fi
2079	.UNINDENT
2080	.UNINDENT
2081	.sp
2082	\fB\-type ispan\fP: Report A if its "\fIinner span\fP" overlaps B. Applicable only to intra\-chromosomal features.
2083	.INDENT 0.0
2084	.INDENT 3.5
2085	.sp
2086	.nf
2087	.ft C
2088	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2089
2090	Inner span \|\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\|
2091
2092	BEDPE/BAM A ***.................................***
2093
2094	BED File B ^^^^^^^^
2095
2096	Result =====.................................=====
2097
2098
2099
2100	BEDPE/BAM A ***.................................***
2101
2102	BED File B ^^^^
2103
2104	Result
2105	.ft P
2106	.fi
2107	.UNINDENT
2108	.UNINDENT
2109	.sp
2110	\fB\-type ospan\fP: Report A if its "\fIouter span\fP" overlaps B. Applicable only to intra\-chromosomal features.
2111	.INDENT 0.0
2112	.INDENT 3.5
2113	.sp
2114	.nf
2115	.ft C
2116	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2117
2118	Outer span \|\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\|
2119
2120	BEDPE/BAM A ***.................................***
2121
2122	BED File B ^^^^^^^^^^^^
2123
2124	Result =====.................................=====
2125
2126
2127
2128	BEDPE/BAM A ***.................................***
2129
2130	BED File B ^^^^
2131
2132	Result
2133	.ft P
2134	.fi
2135	.UNINDENT
2136	.UNINDENT
2137	.sp
2138	\fB\-type notispan\fP: Report A only if its "\fIinner span\fP" does not overlap B. Applicable only to intra\-chromosomal
2139	features.
2140	.INDENT 0.0
2141	.INDENT 3.5
2142	.sp
2143	.nf
2144	.ft C
2145	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2146
2147	Inner span \|\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\|
2148
2149	BEDPE/BAM A ***.................................***
2150
2151	BED File B ^^^^^^^^
2152
2153	Result
2154
2155
2156
2157	BEDPE/BAM A ***.................................***
2158
2159	BED File B ^^^^
2160
2161	Result =====.................................=====
2162	.ft P
2163	.fi
2164	.UNINDENT
2165	.UNINDENT
2166	.sp
2167	\fB\-type notospan\fP: Report A only if its "\fIouter span\fP" does not overlap B. Applicable only to intra\-chromosomal
2168	features.
2169	.INDENT 0.0
2170	.INDENT 3.5
2171	.sp
2172	.nf
2173	.ft C
2174	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2175
2176	Outer span \|\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\|
2177
2178	BEDPE/BAM A ***.................................***
2179
2180	BED File B ^^^^^^^^^^^^
2181
2182	Result
2183
2184
2185
2186	BEDPE/BAM A ***.................................***
2187
2188	BED File B ^^^^
2189
2190	Result =====.................................=====
2191	.ft P
2192	.fi
2193	.UNINDENT
2194	.UNINDENT
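.sp
The \-type rules illustrated above can be summarized as a small decision function. The Python sketch below is an illustration under stated assumptions, not the pairToBed source: the function names and the tuple layout are invented for the example, and end 1 of the pair is assumed to lie to the left of end 2 so that the inner span is [end1, start2] and the outer span is [start1, end2].

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def report(pair, b_features, kind="either"):
    """Decide whether a pair is reported under a given -type value.

    pair is (chrom1, start1, end1, chrom2, start2, end2);
    b_features is a list of (chrom, start, end).
    """
    c1, s1, e1, c2, s2, e2 = pair

    def any_hit(chrom, start, end):
        return any(chrom == bc and overlaps(start, end, bs, be)
                   for bc, bs, be in b_features)

    if kind in ("ispan", "ospan", "notispan", "notospan"):
        if c1 != c2:
            return False  # pairs spanning two chromosomes are ignored
        span = (e1, s2) if kind.endswith("ispan") else (s1, e2)
        hit = any_hit(c1, span[0], span[1])
        return not hit if kind.startswith("not") else hit

    n_hits = int(any_hit(c1, s1, e1)) + int(any_hit(c2, s2, e2))
    rules = {"either": n_hits >= 1, "neither": n_hits == 0, "xor": n_hits == 1,
             "both": n_hits == 2, "notboth": n_hits < 2}
    return rules[kind]


pair = ("chr1", 100, 150, "chr1", 500, 550)   # hypothetical pair
left_only = [("chr1", 120, 130)]              # B overlaps the left end only
for kind in ("either", "both", "xor", "notboth", "ispan", "ospan"):
    print(kind, report(pair, left_only, kind))
```

Counting how many ends are hit makes the relationship between the options explicit: "notboth" is simply the union of "neither" and "xor".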
2195	.SS 5.2.4 (\-f) Requiring a minimum overlap fraction
2196	.sp
2197	By default, \fBpairToBed\fP will report an overlap between A and B so long as at least one base
2198	pair overlaps on either end. Yet sometimes you may want to restrict reported overlaps between A
2199	and B to cases where the feature in B overlaps at least X% (e.g. 50%) of A. The \fB\-f\fP option does exactly
2200	this. The \fB\-f\fP option may also be combined with the \fB\-type\fP option for additional control. For example,
2201	combining \fB\-f 0.50\fP with \fB\-type both\fP requires that both ends of A have at least 50% overlap with a
2202	feature in B.
2203	.sp
2204	For example, report A only if at least 50% of one of the two ends is overlapped by B.
2205	.INDENT 0.0
2206	.INDENT 3.5
2207	.sp
2208	.nf
2209	.ft C
2210	pairToBed \-a A.bedpe \-b B.bed \-f 0.5
2211
2212
2213	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2214
2215	BEDPE/BAM A ***.................................***
2216
2217	BED File B ^^ ^^^^^^
2218
2219	Result
2220
2221
2222
2223	BEDPE/BAM A ***.................................***
2224
2225	BED File B ^^^^ ^^^^^^
2226
2227	Result =====.................................=====
2228	.ft P
2229	.fi
2230	.UNINDENT
2231	.UNINDENT
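.sp
The overlap fraction used by \-f can be illustrated with a short Python sketch. It assumes that the fraction is measured against a single end of A and a single feature in B; the function names and coordinates are hypothetical and this is not the pairToBed code itself.

```python
def overlap_fraction(a_start, a_end, b_start, b_end):
    """Fraction of the A interval [a_start, a_end) covered by one B interval."""
    shared = min(a_end, b_end) - max(a_start, b_start)
    return max(0, shared) / (a_end - a_start)


def end_passes(a_interval, b_features, min_fraction=1e-9):
    """True if any single B feature covers at least min_fraction of this end of A."""
    return any(overlap_fraction(a_interval[0], a_interval[1], bs, be) >= min_fraction
               for bs, be in b_features)


left_end = (100, 150)                           # hypothetical 50 bp end of A
print(end_passes(left_end, [(140, 200)], 0.5))  # 10 of 50 bases covered
print(end_passes(left_end, [(120, 200)], 0.5))  # 30 of 50 bases covered
```

The default of 1E\-9 behaves like "at least one base pair": any non\-zero overlap passes, while a zero overlap does not.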
2232	.SS 5.2.5 (\-s) Enforcing "strandedness"
2233	.sp
2234	By default, \fBpairToBed\fP will report overlaps between features even if the features are on opposing
2235	strands. However, if strand information is present in both files and the \fB"\-s"\fP option is used, overlaps will
2236	only be reported when features are on the same strand.
2237	.sp
2238	For example, report A only if one of the two ends is overlapped by a feature in B on the same strand.
2239	.INDENT 0.0
2240	.INDENT 3.5
2241	.sp
2242	.nf
2243	.ft C
2244	pairToBed \-a A.bedpe \-b B.bed \-s
2245
2246
2247
2248	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2249
2250	BEDPE/BAM A >>>>>.................................<<<<<
2251
2252	BED File B << >>>>>
2253
2254	Result
2255
2256
2257
2258	BEDPE/BAM A >>>>>.................................<<<<<
2259
2260	BED File B >> >>>>>
2261
2262	Result >>>>>.................................<<<<<
2263	.ft P
2264	.fi
2265	.UNINDENT
2266	.UNINDENT
2267	.SS 5.2.6 (\-abam) Default is to write BAM output when using BAM input
2268	.sp
2269	When comparing \fIpaired\fP alignments in BAM format (\fB\-abam\fP) to features in BED format (\fB\-b\fP),
2270	\fBpairToBed\fP will, by default, write the output in BAM format. That is, each alignment in the BAM
2271	file that meets the user\(aqs criteria will be written (to standard output) in BAM format. This serves as a
2272	mechanism to create subsets of BAM alignments that are of biological interest, etc. Note that both
2273	alignments for each aligned pair will be written to the BAM output.
2274	.sp
2275	For example:
2276	.INDENT 0.0
2277	.INDENT 3.5
2278	.sp
2279	.nf
2280	.ft C
2281	pairToBed \-abam pairedReads.bam \-b simreps.bed \| samtools view \- \| head \-4
2282
2283	JOBU_0001:3:1:4:1060#0 99 chr10 42387928 29 50M = 42393091 5213 AAAAACGGAATTATCGAATGGAATCGAAGAGAATCTTCGAACGGACCCGA dcgggggfbgfgdgggggggfdfgggcggggfcggcggggggagfgbggc XT:A:R NM:i:5 SM:i:0 AM:i:0 X0:i:3 X1:i:3 XM:i:5 XO:i:0 XG:i:0 MD:Z:0T0C33A5T4T3
2287	JOBU_0001:3:1:4:1060#0 147 chr10 42393091 0 50M = 42387928 \-5213 AAATGGAATCGAATGGAATCAACATCAAATGGAATCAAATGGAATCATTG Kgdcggdecdg\ed\(gaggfcgcggffcgggc^cgfgccgggfc^gcdgg\ebg XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:3 X1:i:13 XM:i:2 XO:i:0 XG:i:0 MD:Z:21T14G13
2291	JOBU_0001:3:1:8:446#0 99 chr10 42388091 9 50M = 42392738 4697 GAATCGACTGGAATCATCATCGGATGGAAATGAATGGAATAATCATCGAA f_Off\(ga]IeYff\(gaffeddcfefcP\(gac_W\e\eR_]_BBBBBBBBBBBBBBBB XT:A:U NM:i:4 SM:i:0 AM:i:0 X0:i:1 X1:i:3 XM:i:4 XO:i:0 XG:i:0 MD:Z:7A22C9C2T6
2295	JOBU_0001:3:1:8:446#0 147 chr10 42392738 9 50M = 42388091 \-4697 TTATCGAATGCAATCGAATGGAATTATCGAATGCAATCGAATAGAATCAT df^ffec_JW[\(gaMWceRec\(ga\(gafee\(gadcecfeeZae\(gac]f^cNeecfccf^ XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:2 X1:i:2 XM:i:1 XO:i:0 XG:i:0 MD:Z:38A11
2298	.ft P
2299	.fi
2300	.UNINDENT
2301	.UNINDENT
2302	.SS 5.2.7 (\-bedpe) Output BEDPE format when using BAM input
2303	.sp
2304	When comparing \fIpaired\fP alignments in BAM format (\fB\-abam\fP) to features in BED format (\fB\-b\fP),
2305	\fBpairToBed\fP will optionally write the output in BEDPE format. That is, each alignment in the BAM
2306	file is converted to a 10 column BEDPE feature and if overlaps are found (or not) based on the user\(aqs
2307	criteria, the BAM alignment will be reported in BEDPE format. The BEDPE "name" field is taken from
2308	the read name (the QNAME field) of the BAM alignment. The "score" field is the mapping quality score from the
2309	BAM alignment.
2310	.sp
2311	For example:
2312	.INDENT 0.0
2313	.INDENT 3.5
2314	.sp
2315	.nf
2316	.ft C
2317	pairToBed \-abam pairedReads.bam \-b simreps.bed \-bedpe \| head \-5
2318	chr10 42387927 42387977 chr10 42393090 42393140 JOBU_0001:3:1:4:1060#0 29 + \-
2320	chr10 42388090 42388140 chr10 42392737 42392787 JOBU_0001:3:1:8:446#0 9 + \-
2322	chr10 42390552 42390602 chr10 42396045 42396095 JOBU_0001:3:1:10:1865#0 9 + \-
2324	chrX 139153741 139153791 chrX 139159018 139159068 JOBU_0001:3:1:14:225#0 37 + \-
2326	chr4 9236903 9236953 chr4 9242032 9242082 JOBU_0001:3:1:15:1362#0 0 + \-
2328	.ft P
2329	.fi
2330	.UNINDENT
2331	.UNINDENT
2332	.SS 5.3 pairToPair
2333	.sp
2334	\fBpairToPair\fP compares two BEDPE files in search of overlaps where each end of a BEDPE feature in A
2335	overlaps with the ends of a feature in B. For example, using pairToPair, one could screen for the exact
2336	same discordant paired\-end alignment in two files. Such a match could indicate (among other things) that the
2337	discordant pair reflects the same structural variation in each file/sample.
2338	.SS 5.3.1 Usage and option summary
2339	.sp
2340	\fBUsage:\fP
2341	.INDENT 0.0
2342	.INDENT 3.5
2343	.sp
2344	.nf
2345	.ft C
2346	pairToPair [OPTIONS] \-a <BEDPE> \-b <BEDPE>
2347	.ft P
2348	.fi
2349	.UNINDENT
2350	.UNINDENT
2351	.TS
2352	center;
2353	\|l\|l\|.
2354	_
2355	T{
2356	Option
2357	T} T{
2358	Description
2359	T}
2360	_
2361	T{
2362	\fB\-a\fP
2363	T} T{
2364	BEDPE file A. Each feature in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe.
2365	T}
2366	_
2367	T{
2368	\fB\-b\fP
2369	T} T{
2370	BEDPE file B. Use "stdin" if passing B with a UNIX pipe.
2371	T}
2372	_
2373	T{
2374	\fB\-f\fP
2375	T} T{
2376	Minimum overlap required as a fraction of A. Default is 1E\-9 (i.e. 1bp).
2377	T}
2378	_
2379	T{
2380	\fB\-is\fP
2381	T} T{
2382	Ignore strands when searching for overlaps. By default, when strand information is present in both files, overlaps on each end must be on the same strand.
2383	T}
2384	_
2385	T{
2386	\fB\-type\fP
2387	T} T{
2388	.INDENT 0.0
2389	.INDENT 3.5
2390	Approach to reporting overlaps between BEDPE A and BEDPE B.
2391	.UNINDENT
2392	.UNINDENT
2393	.nf
2394	\fBeither\fP Report overlaps if either end of A overlaps B.
2395	.fi
2396	.sp
2397	.INDENT 0.0
2398	.INDENT 3.5
2399	.nf
2400	\fBneither\fP Report A if neither end of A overlaps B.
2401	.fi
2402	.sp
2403	.nf
2404	\fBboth\fP Report overlaps if both ends of A overlap B. \-\fIDefault behavior.\fP
2405	.fi
2406	.sp
2407	.UNINDENT
2408	.UNINDENT
2409	T}
2410	_
2411	.TE
2412	.SS 5.3.2 Default behavior
2413	.sp
2414	By default, a BEDPE feature from A will be reported if \fIboth\fP ends overlap a feature in the BEDPE B
2415	file. If strand information is present for the two BEDPE files, it will be further required that the
2416	overlaps on each end be on the same strand. This way, an otherwise overlapping (in terms of genomic
2417	locations) F/R alignment will not be matched with a R/R alignment.
2418	.sp
2419	Default: Report A if \fIboth\fP ends overlap B.
2420	.INDENT 0.0
2421	.INDENT 3.5
2422	.sp
2423	.nf
2424	.ft C
2425	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2426
2427	BEDPE/BAM A ***.................................***
2428
2429	BED File B ^^^^^^^^ ^^^^^^
2430
2431	Result =====.................................=====
2432	.ft P
2433	.fi
2434	.UNINDENT
2435	.UNINDENT
2436	.sp
2437	Default when strand information is present in both BEDPE files: Report A if \fIboth\fP ends overlap B \fIon
2438	the same strands\fP\&.
2439	.INDENT 0.0
2440	.INDENT 3.5
2441	.sp
2442	.nf
2443	.ft C
2444	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2445
2446	BEDPE A >>>>>.................................>>>>>
2447
2448	BEDPE B <<<<<.............................>>>>>
2449
2450	Result
2451
2452
2453
2454	BEDPE A >>>>>.................................>>>>>
2455
2456	BEDPE B >>>>>.............................>>>>>
2457
2458	Result >>>>>.................................>>>>>
2459	.ft P
2460	.fi
2461	.UNINDENT
2462	.UNINDENT
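.sp
The default matching rule described above can be sketched in Python. This is an illustration only: it assumes that end 1 of A is compared with end 1 of B and end 2 with end 2, and the function names, the tuple layout and the ignore_strand parameter are hypothetical rather than taken from the pairToPair source.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def pair_matches(a, b, ignore_strand=False):
    """a and b are (chrom1, start1, end1, strand1, chrom2, start2, end2, strand2)."""
    def end_hit(i):
        same_place = a[i] == b[i] and overlaps(a[i + 1], a[i + 2], b[i + 1], b[i + 2])
        same_strand = ignore_strand or a[i + 3] == b[i + 3]
        return same_place and same_strand

    # Both ends must overlap, each on a matching strand.
    return end_hit(0) and end_hit(4)


a = ("chr1", 100, 150, "+", "chr1", 500, 550, "+")     # hypothetical F/F pair
b_rf = ("chr1", 110, 160, "-", "chr1", 490, 540, "+")  # same places, first strand differs
b_ff = ("chr1", 110, 160, "+", "chr1", 490, 540, "+")  # same places, same strands

print(pair_matches(a, b_rf))  # strands disagree on end 1
print(pair_matches(a, b_ff))  # both ends agree
```

The two B pairs occupy the same genomic locations; only the strand check separates them, which mirrors the two cases in the diagram above.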
2463	.SS 5.3.3 (\-type neither) Optional overlap requirements
2464	.sp
2465	Using \fB\-type neither\fP, \fBpairToPair\fP will only report A if \fIneither\fP end overlaps with a BEDPE
2466	feature in B.
2467	.sp
2468	\fB\-type neither\fP: Report A only if \fIneither\fP end overlaps B.
2469	.INDENT 0.0
2470	.INDENT 3.5
2471	.sp
2472	.nf
2473	.ft C
2474	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2475
2476	BEDPE/BAM A ***.................................***
2477
2478	BED File B ^^^^^^^^......................................^^^^^^
2479
2480	Result
2481
2482
2483
2484	BEDPE/BAM A ***.................................***
2485
2486	BED File B ^^^^................................................^^^^^^
2487
2488	Result =====.................................=====
2489	.ft P
2490	.fi
2491	.UNINDENT
2492	.UNINDENT
2493	.SS 5.4 bamToBed
2494	.sp
2495	\fBbamToBed\fP is a general purpose tool that will convert sequence alignments in BAM format to either
2496	BED6, BED12 or BEDPE format. This enables one to convert BAM files for use with all of the other
2497	BEDTools. The CIGAR string is used to compute the alignment end coordinate in an "ungapped"
2498	fashion. That is, match ("M"), deletion ("D"), and splice ("N") operations are observed when computing
2499	alignment ends.
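.sp
As a rough illustration of the "ungapped" end computation, the Python sketch below walks a CIGAR string and advances the end coordinate only for the operations named above (M, D and N). The function name is hypothetical and the code is not taken from bamToBed; it also assumes a 0\-based start, i.e. the 1\-based BAM position minus one.

```python
import re


def alignment_end(start, cigar):
    """End coordinate (exclusive) of an alignment beginning at 0-based start.

    Only match (M), deletion (D) and splice (N) operations advance the end;
    insertions and clipping do not consume reference bases.
    """
    end = start
    for length, op in re.findall("([0-9]+)([MIDNSHP=X])", cigar):
        if op in "MDN":
            end += int(length)
    return end


# A 50M alignment at 1-based position 42387928 starts at 0-based 42387927.
print(alignment_end(42387927, "50M"))
# Hypothetical alignment with an insertion, a splice and a deletion.
print(alignment_end(100, "10M5I20N10M2D5M"))
```

The first call uses the position and CIGAR string of the first read in the pairToBed BAM example earlier in this document, and the result matches the end coordinate shown for that read in the BEDPE output.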
2500	.SS 5.4.1 Usage and option summary
2501	.sp
2502	\fBUsage:\fP
2503	.INDENT 0.0
2504	.INDENT 3.5
2505	.sp
2506	.nf
2507	.ft C
2508	bamToBed [OPTIONS] \-i <BAM>
2509	.ft P
2510	.fi
2511	.UNINDENT
2512	.UNINDENT
2513	.TS
2514	center;
2515	\|l\|l\|.
2516	_
2517	T{
2518	Option
2519	T} T{
2520	Description
2521	T}
2522	_
2523	T{
2524	\fB\-bedpe\fP
2525	T} T{
2526	.INDENT 0.0
2527	.INDENT 3.5
2528	.INDENT 0.0
2529	.TP
2530	.B Write BAM alignments in BEDPE format. Only one alignment from paired\-end reads will be reported. Specifically, if each mate is aligned to the same chromosome, the BAM alignment reported will be the one where the BAM insert size is greater than zero. When the mate alignments are interchromosomal, the lexicographically lower chromosome will be reported first. Lastly, when an end is unmapped, the chromosome and strand will be set to "." and the start and end coordinates will be set to \-1. \fIBy default, this is disabled and the output will be reported in BED format\fP\&.
2531	\fBNOTE: When using this option, it is required that the BAM file is sorted/grouped by the read name. This allows bamToBed to extract correct alignment coordinates for each end based on their respective CIGAR strings. It also assumes that the alignments for a given pair come in groups of two. There is not yet a standard method for reporting multiple alignments using BAM. bamToBed will fail if an aligner does not report alignments in pairs\fP\&.
2532	.UNINDENT
2533	.UNINDENT
2534	.UNINDENT
2535	.sp
2536	BAM files may be piped to bamToBed by specifying "\-i stdin". See example below.
2537	T}
2538	_
2539	T{
2540	\fB\-bed12\fP
2541	T} T{
2542	Write "blocked" BED (a.k.a. BED12) format. This will convert "spliced" BAM alignments (denoted by the "N" CIGAR operation) to BED12.
2543	T}
2544	_
2545	T{
2546	\fB\-ed\fP
2547	T} T{
2548	Use the "edit distance" tag (NM) for the BED score field. Default for BED is to use mapping quality. Default for BEDPE is to use the \fIminimum\fP of the two mapping qualities for the pair. When \-ed is used with \-bedpe, the total edit distance from the two mates is reported.
2549	T}
2550	_
2551	T{
2552	\fB\-tag\fP
2553	T} T{
2554	Use other \fInumeric\fP BAM alignment tag for BED score. Default for BED is to use mapping quality. Disallowed with BEDPE output.
2555	T}
2556	_
2557	T{
2558	\fB\-color\fP
2559	T} T{
2560	An R,G,B string for the color used with BED12 format. Default is (255,0,0).
2561	T}
2562	_
2563	T{
2564	\fB\-split\fP
2565	T} T{
2566	Report each portion of a "split" BAM alignment (i.e., one having an "N" CIGAR operation) as a distinct BED interval.
2567	T}
2568	_
2569	.TE
2570	.sp
2571	By default, each alignment in the BAM file is converted to a 6 column BED. The BED "name" field is
2572	taken from the read name (the QNAME field) of the BAM alignment. If mate information is available, the mate (e.g.,
2573	"/1" or "/2") field will be appended to the name. The "score" field is the mapping quality score from the
2574	BAM alignment, unless the \fB\-ed\fP option is used.
2575	.sp
2576	Examples:
2577	.INDENT 0.0
2578	.INDENT 3.5
2579	.sp
2580	.nf
2581	.ft C
2582	bamToBed \-i reads.bam \| head \-5
2583	chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0/1 37 \-
2584	chr7 118965072 118965122 TUPAC_0001:3:1:0:1452#0/2 37 +
2585	chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0/1 37 \-
2586
2587	bamToBed \-i reads.bam \-tag NM \| head \-5
2588	chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0/1 1 \-
2589	chr7 118965072 118965122 TUPAC_0001:3:1:0:1452#0/2 3 +
2590	chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0/1 1 \-
2591
2592	bamToBed \-i reads.bam \-bedpe \| head \-3
2593	chr7 118965072 118965122 chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0 37 + \-
2595	chr11 46765606 46765656 chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0 37 + \-
2597	chr20 54704674 54704724 chr20 54708987 54709037 TUPAC_0001:3:1:1:1833#0 37 + \-
2599	.ft P
2600	.fi
2601	.UNINDENT
2602	.UNINDENT
2603	.sp
2604	One can easily use samtools and bamToBed together as part of a UNIX pipe. In this example, we will
2605	only convert properly\-paired (BAM flag == 0x2) reads to BED format.
2606	.INDENT 0.0
2607	.INDENT 3.5
2608	.sp
2609	.nf
2610	.ft C
2611	samtools view \-bf 0x2 reads.bam \| bamToBed \-i stdin \| head
2612	chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0/1 37 \-
2613	chr7 118965072 118965122 TUPAC_0001:3:1:0:1452#0/2 37 +
2614	chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0/1 37 \-
2615	chr11 46765606 46765656 TUPAC_0001:3:1:0:1472#0/2 37 +
2616	chr20 54704674 54704724 TUPAC_0001:3:1:1:1833#0/1 37 +
2617	chr20 54708987 54709037 TUPAC_0001:3:1:1:1833#0/2 37 \-
2618	chrX 9380413 9380463 TUPAC_0001:3:1:1:285#0/1 0 \-
2619	chrX 9375861 9375911 TUPAC_0001:3:1:1:285#0/2 0 +
2620	chrX 131756978 131757028 TUPAC_0001:3:1:2:523#0/1 37 +
2621	chrX 131761790 131761840 TUPAC_0001:3:1:2:523#0/2 37 \-
2622	.ft P
2623	.fi
2624	.UNINDENT
2625	.UNINDENT
2626	.SS 5.4.2 (\-split) Creating BED12 features from "spliced" BAM entries
2627	.sp
2628	bamToBed will, by default, create a BED6 feature that represents the entire span of a spliced/split
2629	BAM alignment. However, when using the \fB\-split\fP option, a BED12 feature is reported where BED
2630	blocks will be created for each aligned portion of the sequencing read.
2631	.INDENT 0.0
2632	.INDENT 3.5
2633	.sp
2634	.nf
2635	.ft C
2636	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2637
2638	Exons ************* ********
2639
2640	BED/BAM A ^^^^^^^^^^^^....................................^^^^
2641
2642	Result =============== ====
2643	.ft P
2644	.fi
2645	.UNINDENT
2646	.UNINDENT
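.sp
How aligned blocks might be derived from a spliced alignment can be sketched as follows. The Python below is a minimal illustration, assuming that blocks are broken only at "N" operations; the function name and the example read are hypothetical, and the code is not the bamToBed implementation.

```python
import re


def aligned_blocks(start, cigar):
    """Split an alignment into (start, end) blocks at each N operation."""
    blocks = []
    block_start = start
    pos = start
    for length, op in re.findall("([0-9]+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "N":
            # Close the current block and skip over the spliced region.
            if pos > block_start:
                blocks.append((block_start, pos))
            pos += n
            block_start = pos
        elif op in "MD":
            pos += n
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks


# Hypothetical spliced read: 30 aligned bases, a 200 bp intron, 20 aligned bases.
blocks = aligned_blocks(1000, "30M200N20M")
block_sizes = ",".join(str(e - s) for s, e in blocks)
block_starts = ",".join(str(s - blocks[0][0]) for s, e in blocks)
print(blocks, block_sizes, block_starts)
```

The block sizes and the block starts (relative to the feature start) correspond to the blockSizes and blockStarts columns of a BED12 record.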
2647	.SS 5.5 windowBed
2648	.sp
2649	Similar to \fBintersectBed\fP, \fBwindowBed\fP searches for overlapping features in A and B. However,
2650	\fBwindowBed\fP adds a specified number (1000, by default) of base pairs upstream and downstream of
2651	each feature in A. In effect, this allows features in B that are "near" features in A to be detected.
2652	.SS 5.5.1 Usage and option summary
2653	.sp
2654	\fBUsage:\fP
2655	.INDENT 0.0
2656	.INDENT 3.5
2657	.sp
2658	.nf
2659	.ft C
2660	windowBed [OPTIONS] \-a <BED/GFF/VCF> \-b <BED/GFF/VCF>
2661	.ft P
2662	.fi
2663	.UNINDENT
2664	.UNINDENT
2665	.TS
2666	center;
2667	\|l\|l\|.
2668	_
2669	T{
2670	Option
2671	T} T{
2672	Description
2673	T}
2674	_
2675	T{
2676	\fB\-abam\fP
2677	T} T{
2678	BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. For example: samtools view \-b <BAM> \| windowBed \-abam stdin \-b genes.bed
2679	T}
2680	_
2681	T{
2682	\fB\-ubam\fP
2683	T} T{
2684	Write uncompressed BAM output. The default is to write compressed BAM output.
2685	T}
2686	_
2687	T{
2688	\fB\-bed\fP
2689	T} T{
2690	When using BAM input (\-abam), write output as BED. The default is to write output in BAM when using \-abam. For example: windowBed \-abam reads.bam \-b genes.bed \-bed
2691	T}
2692	_
2693	T{
2694	\fB\-w\fP
2695	T} T{
2696	Base pairs added upstream and downstream of each entry in A when searching for overlaps in B. \fIDefault is 1000 bp\fP\&.
2697	T}
2698	_
2699	T{
2700	\fB\-l\fP
2701	T} T{
2702	Base pairs added upstream (left) of each entry in A when searching for overlaps in B. \fIAllows one to create asymmetrical "windows". Default is 1000bp\fP\&.
2703	T}
2704	_
2705	T{
2706	\fB\-r\fP
2707	T} T{
2708	Base pairs added downstream (right) of each entry in A when searching for overlaps in B. \fIAllows one to create asymmetrical "windows". Default is 1000bp\fP\&.
2709	T}
2710	_
2711	T{
2712	\fB\-sw\fP
2713	T} T{
2714	Define \-l and \-r based on strand. For example if used, \-l 500 for a negative\-stranded feature will add 500 bp downstream. \fIBy default, this is disabled\fP\&.
2715	T}
2716	_
2717	T{
2718	\fB\-sm\fP
2719	T} T{
2720	Only report hits in B that overlap A on the same strand. \fIBy default, overlaps are reported without respect to strand\fP\&.
2721	T}
2722	_
2723	T{
2724	\fB\-u\fP
2725	T} T{
2726	Write original A entry once if any overlaps found in B. In other words, just report the fact that at least one overlap was found in B.
2727	T}
2728	_
2729	T{
2730	\fB\-c\fP
2731	T} T{
2732	For each entry in A, report the number of hits in B while restricting to \-f. Reports 0 for A entries that have no overlap with B.
2733	T}
2734	_
2735	.TE
2736	.SS 5.5.2 Default behavior
2737	.sp
2738	By default, \fBwindowBed\fP adds 1000 bp upstream and downstream of each A feature and searches for
2739	features in B that overlap this "window". If an overlap is found in B, both the \fIoriginal\fP A feature and the
2740	\fIoriginal\fP B feature are reported. For example, in the figure below, feature B1 would be found, but B2
2741	would not.
2742	.INDENT 0.0
2743	.INDENT 3.5
2744	.sp
2745	.nf
2746	.ft C
2747	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2748	"window" = 10
2749	BED File A <\-\-\-\-\-\-\-\-\-\-*************\-\-\-\-\-\-\-\-\-\->
2750
2751	BED File B ^^^^^^^^ ^^^^^^
2752
2753	Result ========
2754	.ft P
2755	.fi
2756	.UNINDENT
2757	.UNINDENT
2758	.sp
2759	For example:
2760	.INDENT 0.0
2761	.INDENT 3.5
2762	.sp
2763	.nf
2764	.ft C
2765	cat A.bed
2766	chr1 100 200
2767
2768	cat B.bed
2769	chr1 500 1000
2770	chr1 1300 2000
2771
2772	windowBed \-a A.bed \-b B.bed
2773	chr1 100 200 chr1 500 1000
2774	.ft P
2775	.fi
2776	.UNINDENT
2777	.UNINDENT
2778	.SS 5.5.3 (\-w) Defining a custom window size
2779	.sp
2780	Instead of using the default window size of 1000bp, one can define a custom, \fIsymmetric\fP window around
2781	each feature in A using the \fB\-w\fP option. One should specify the window size in base pairs. For example,
2782	a window of 5kb should be defined as \fB\-w 5000\fP\&.
2783	.sp
2784	For example (note that in contrast to the default behavior, the second B entry is reported):
2785	.INDENT 0.0
2786	.INDENT 3.5
2787	.sp
2788	.nf
2789	.ft C
2790	cat A.bed
2791	chr1 100 200
2792
2793	cat B.bed
2794	chr1 500 1000
2795	chr1 1300 2000
2796
2797	windowBed \-a A.bed \-b B.bed \-w 5000
2798	chr1 100 200 chr1 500 1000
2799	chr1 100 200 chr1 1300 2000
2800	.ft P
2801	.fi
2802	.UNINDENT
2803	.UNINDENT
2804	.SS 5.5.4 (\-l and \-r) Defining asymmetric windows
2805	.sp
2806	One can also define asymmetric windows where a differing number of bases are added upstream and
2807	downstream of each feature using the \fB\-l (upstream)\fP and \fB\-r (downstream)\fP options.
2808	.sp
2809	For example (note the difference between \-l 200 and \-l 300):
2810	.INDENT 0.0
2811	.INDENT 3.5
2812	.sp
2813	.nf
2814	.ft C
2815	cat A.bed
2816	chr1 1000 2000
2817
2818	cat B.bed
2819	chr1 500 800
2820	chr1 10000 20000
2821
2822	windowBed \-a A.bed \-b B.bed \-l 200 \-r 20000
2823	chr1 1000 2000 chr1 10000 20000
2824
2825	windowBed \-a A.bed \-b B.bed \-l 300 \-r 20000
2826	chr1 1000 2000 chr1 500 800
2827	chr1 1000 2000 chr1 10000 20000
2828	.ft P
2829	.fi
2830	.UNINDENT
2831	.UNINDENT
2832	.SS 5.5.5 (\-sw) Defining asymmetric windows based on strand
2833	.sp
2834	Especially when dealing with gene annotations or RNA\-seq experiments, you may want to define
2835	asymmetric windows based on "strand". For example, you may want to screen for overlaps that occur
2836	within 5000 bp upstream of a gene (e.g. a promoter region) while screening only 1000 bp downstream of
2837	the gene. By enabling the \fB\-sw\fP ("stranded" windows) option, the windows are added upstream or
2838	downstream according to strand. For example, imagine one specifies \fB\-l 5000 \-r 1000\fP as well as the
2839	\fB\-sw\fP option. In this case, forward stranded ("+") features will screen 5000 bp to the \fIleft\fP (that is, \fIlower\fP
2840	genomic coordinates) and 1000 bp to the \fIright\fP (that is, \fIhigher\fP genomic coordinates). By contrast,
2841	reverse stranded ("\-") features will screen 5000 bp to the \fIright\fP (that is, \fIhigher\fP genomic coordinates) and
2842	1000 bp to the \fIleft\fP (that is, \fIlower\fP genomic coordinates).
2843	.sp
2844	For example (note that the 5000 bp upstream window lies to the left of the forward\-strand feature but to the right of the reverse\-strand feature):
2845	.INDENT 0.0
2846	.INDENT 3.5
2847	.sp
2848	.nf
2849	.ft C
2850	cat A.bed
2851	chr1 10000 20000 A.forward 1 +
2852	chr1 10000 20000 A.reverse 1 \-
2853
2854	cat B.bed
2855	chr1 1000 8000 B1
2856	chr1 24000 32000 B2
2857
2858	windowBed \-a A.bed \-b B.bed \-l 5000 \-r 1000 \-sw
2859	chr1 10000 20000 A.forward 1 + chr1 1000 8000 B1
2860	chr1 10000 20000 A.reverse 1 \- chr1 24000 32000 B2
2861	.ft P
2862	.fi
2863	.UNINDENT
2864	.UNINDENT
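.sp
The window arithmetic described for \fB\-w\fP, \fB\-l\fP, \fB\-r\fP and \fB\-sw\fP can be sketched in a few lines of Python. This is an illustration only, not the \fBwindowBed\fP implementation: the function names (window, overlaps, window_hits) are hypothetical, features are assumed to be 0\-based, half\-open intervals on a single chromosome, and clamping the window start at 0 is an assumption of the sketch.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def window(start, end, strand="+", left=1000, right=1000, stranded=False):
    """Return the half-open search window around one A feature.

    left/right mirror -l/-r (both 1000 by default, like the default -w);
    stranded mirrors -sw. Clamping at 0 is an assumption of this sketch.
    """
    if stranded and strand == "-":
        # On the reverse strand, "upstream" points to higher coordinates.
        left, right = right, left
    return max(0, start - left), end + right


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def window_hits(a, b_features, **opts):
    """Names of the B features that overlap the window around A."""
    w_start, w_end = window(a[0], a[1], a[2], **opts)
    return [name for (s, e, name) in b_features
            if overlaps(w_start, w_end, s, e)]


b = [(1000, 8000, "B1"), (24000, 32000, "B2")]
opts = dict(left=5000, right=1000, stranded=True)
print(window_hits((10000, 20000, "+"), b, **opts))  # ['B1']
print(window_hits((10000, 20000, "-"), b, **opts))  # ['B2']
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
With the A and B features of the stranded example, the forward feature finds only B1 and the reverse feature finds only B2, because swapping the two offsets moves the 5000 bp side of the window to the other end of the feature.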
2865	.SS 5.5.6 (\-sm) Enforcing "strandedness"
2866	.sp
2867	This option behaves the same as the \-s option for intersectBed while scanning for overlaps within the
2868	"window" surrounding A. See the discussion in the intersectBed section for details.
2869	.SS 5.5.7 (\-u) Reporting the presence of at least one overlapping feature
2870	.sp
2871	This option behaves the same as for intersectBed while scanning for overlaps within the "window"
2872	surrounding A. See the discussion in the intersectBed section for details.
2873	.SS 5.5.8 (\-c) Reporting the number of overlapping features
2874	.sp
2875	This option behaves the same as for intersectBed while scanning for overlaps within the "window"
2876	surrounding A. See the discussion in the intersectBed section for details.
2877	.SS 5.5.9 (\-v) Reporting the absence of any overlapping features
2878	.sp
2879	This option behaves the same as for intersectBed while scanning for overlaps within the "window"
2880	surrounding A. See the discussion in the intersectBed section for details.
2881	.SS 5.6 closestBed
2882	.sp
2883	Similar to \fBintersectBed\fP, \fBclosestBed\fP searches for overlapping features in A and B. In the event that
2884	no feature in B overlaps the current feature in A, \fBclosestBed\fP will report the \fIclosest\fP (that is, least
2885	genomic distance from the start or end of A) feature in B. For example, one might want to find which
2886	is the closest gene to a significant GWAS polymorphism. Note that \fBclosestBed\fP will report an
2887	overlapping feature as the closest; that is, it does not restrict to the closest \fInon\-overlapping\fP feature.
2888	.SS 5.6.1 Usage and option summary
2889	.sp
2890	\fBUsage:\fP
2891	.INDENT 0.0
2892	.INDENT 3.5
2893	.sp
2894	.nf
2895	.ft C
2896	closestBed [OPTIONS] \-a <BED/GFF/VCF> \-b <BED/GFF/VCF>
2897	.ft P
2898	.fi
2899	.UNINDENT
2900	.UNINDENT
2901	.TS
2902	center;
2903	\|l\|l\|.
2904	_
2905	T{
2906	Option
2907	T} T{
2908	Description
2909	T}
2910	_
2911	T{
2912	\fB\-s\fP
2913	T} T{
2914	Force strandedness. That is, find the closest feature in B that overlaps A on the same strand. \fIBy default, this is disabled\fP\&.
2915	T}
2916	_
2917	T{
2918	\fB\-d\fP
2919	T} T{
2920	In addition to the closest feature in B, report its distance to A as an extra column. The reported distance for overlapping features will be 0.
2921	T}
2922	_
2923	T{
2924	\fB\-t\fP
2925	T} T{
2926	How ties for closest feature should be handled. This occurs when two features in B have exactly the same overlap with a feature in A. \fIBy default, all such features in B are reported\fP\&.
2927	.INDENT 0.0
2928	.INDENT 3.5
2929	The available choices are:
2930	.sp
2931	\fIall\fP \- Report all ties (default).
2932	.sp
2933	\fIfirst\fP \- Report the first tie that occurred in the B file.
2934	.sp
2935	\fIlast\fP \- Report the last tie that occurred in the B file.
2936	.UNINDENT
2937	.UNINDENT
2938	T}
2939	_
2940	.TE
2941	.SS 5.6.2 Default behavior
2942	.sp
2943	\fBclosestBed\fP first searches for features in B that overlap a feature in A. If overlaps are found, the feature
2944	in B that overlaps the highest fraction of A is reported. If no overlaps are found, \fBclosestBed\fP looks for
2945	the feature in B that is \fIclosest\fP (that is, least genomic distance to the start or end of A) to A. For
2946	example, in the figure below, feature B1 would be reported as the closest feature to A1.
2947	.INDENT 0.0
2948	.INDENT 3.5
2949	.sp
2950	.nf
2951	.ft C
2952	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2953
2954	BED FILE A *************
2955
2956	BED File B ^^^^^^^^ ^^^^^^
2957
2958	Result ======
2959	.ft P
2960	.fi
2961	.UNINDENT
2962	.UNINDENT
2963	.sp
2964	For example:
2965	.INDENT 0.0
2966	.INDENT 3.5
2967	.sp
2968	.nf
2969	.ft C
2970	cat A.bed
2971	chr1 100 200
2972
2973	cat B.bed
2974	chr1 500 1000
2975	chr1 1300 2000
2976
2977	closestBed \-a A.bed \-b B.bed
2978	chr1 100 200 chr1 500 1000
2979	.ft P
2980	.fi
2981	.UNINDENT
2982	.UNINDENT
2983	.SS 5.6.3 (\-s) Enforcing "strandedness"
2984	.sp
2985	This option behaves the same as the \-s option for intersectBed while scanning for the closest
2986	(overlapping or not) feature in B. See the discussion in the intersectBed section for details.
2987	.SS 5.6.4 (\-t) Controlling how ties for "closest" are broken
2988	.sp
2989	When there are two or more features in B that overlap the \fIsame fraction\fP of A, \fBclosestBed\fP will, by
2990	default, report both features in B. Imagine feature A is a SNP and file B contains genes. It can often
2991	occur that two gene annotations (e.g. opposite strands) in B will overlap the SNP. As mentioned, the
2992	default behavior is to report both such genes in B. However, the \-t option allows one to optionally
2993	choose just the first or last feature (in terms of where it occurred in the input file, not chromosome
2994	position) that occurred in B.
2995	.sp
2996	For example (note the difference between \-t all, \-t first, and \-t last):
2997	.INDENT 0.0
2998	.INDENT 3.5
2999	.sp
3000	.nf
3001	.ft C
3002	cat A.bed
3003	chr1 100 101 rs1234
3004
3005	cat B.bed
3006	chr1 0 1000 geneA 100 +
3007	chr1 0 1000 geneB 100 \-
3008
3009	closestBed \-a A.bed \-b B.bed
3010	chr1 100 101 rs1234 chr1 0 1000 geneA 100 +
3011	chr1 100 101 rs1234 chr1 0 1000 geneB 100 \-
3012
3013	closestBed \-a A.bed \-b B.bed \-t all
3014	chr1 100 101 rs1234 chr1 0 1000 geneA 100 +
3015	chr1 100 101 rs1234 chr1 0 1000 geneB 100 \-
3016
3017	closestBed \-a A.bed \-b B.bed \-t first
3018	chr1 100 101 rs1234 chr1 0 1000 geneA 100 +
3019
3020	closestBed \-a A.bed \-b B.bed \-t last
3021	chr1 100 101 rs1234 chr1 0 1000 geneB 100 \-
3022	.ft P
3023	.fi
3024	.UNINDENT
3025	.UNINDENT
3026	.SS 5.6.5 (\-d) Reporting the distance to the closest feature in base pairs
3027	.sp
3028	\fBclosestBed\fP will optionally report the distance to the closest feature in the B file using the \fB\-d\fP option.
3029	When a feature in B overlaps a feature in A, a distance of 0 is reported.
3030	.INDENT 0.0
3031	.INDENT 3.5
3032	.sp
3033	.nf
3034	.ft C
3035	cat A.bed
3036	chr1 100 200
3037	chr1 500 600
3038
3039	cat B.bed
3040	chr1 500 1000
3041	chr1 1300 2000
3042
3043	closestBed \-a A.bed \-b B.bed \-d
3044	chr1 100 200 chr1 500 1000 300
3045	chr1 500 600 chr1 500 1000 0
3046	.ft P
3047	.fi
3048	.UNINDENT
3049	.UNINDENT
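.sp
The distance and tie handling described above can be approximated with the following Python sketch. It is not the \fBclosestBed\fP implementation: the function names are hypothetical, features are assumed to be half\-open (start, end) pairs on one chromosome, and, as a simplification, every overlapping B feature is treated as tied at distance 0 rather than being ranked by the fraction of A it overlaps.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def distance(a, b):
    """Gap between two half-open intervals; 0 if they overlap or touch."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start < b_end and b_start < a_end:
        return 0
    return b_start - a_end if b_start >= a_end else a_start - b_end


def closest(a, b_features):
    """Return (distance, ties), where ties keeps the order of the B file."""
    best = min(distance(a, b) for b in b_features)
    return best, [b for b in b_features if distance(a, b) == best]


b_file = [(500, 1000), (1300, 2000)]
print(closest((100, 200), b_file))  # (300, [(500, 1000)])
print(closest((500, 600), b_file))  # (0, [(500, 1000)])
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In this sketch, the behavior of \fB\-t first\fP and \fB\-t last\fP corresponds to keeping only the first or the last element of the returned list of ties.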
3050	.SS 5.7 subtractBed
3051	.sp
3052	\fBsubtractBed\fP searches for features in B that overlap A. If an overlapping feature is found in B, the
3053	overlapping portion is removed from A and the remaining portion of A is reported. If a feature in B
3054	overlaps all of a feature in A, the A feature will not be reported.
3055	.SS 5.7.1 Usage and option summary
3056	.sp
3057	Usage:
3058	.INDENT 0.0
3059	.INDENT 3.5
3060	.sp
3061	.nf
3062	.ft C
3063	subtractBed [OPTIONS] \-a <BED/GFF/VCF> \-b <BED/GFF/VCF>
3064	.ft P
3065	.fi
3066	.UNINDENT
3067	.UNINDENT
3068	.TS
3069	center;
3070	\|l\|l\|.
3071	_
3072	T{
3073	Option
3074	T} T{
3075	Description
3076	T}
3077	_
3078	T{
3079	\fB\-f\fP
3080	T} T{
3081	Minimum overlap required as a fraction of A. Default is 1E\-9 (i.e. 1bp).
3082	T}
3083	_
3084	T{
3085	\fB\-s\fP
3086	T} T{
3087	Force strandedness. That is, only subtract features in B that overlap A on the same strand. \fIBy default, this is disabled\fP\&.
3088	T}
3089	_
3090	.TE
3091	.SS 5.7.2 Default behavior
3092	.sp
3093	Figure:
3094	.INDENT 0.0
3095	.INDENT 3.5
3096	.sp
3097	.nf
3098	.ft C
3099	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3100
3101	BED FILE A *********** ****
3102
3103	BED File B ^^^^^^^^ ^^^^^^^^^^^
3104
3105	Result =========
3106	.ft P
3107	.fi
3108	.UNINDENT
3109	.UNINDENT
3110	.sp
3111	For example:
3112	.INDENT 0.0
3113	.INDENT 3.5
3114	.sp
3115	.nf
3116	.ft C
3117	cat A.bed
3118	chr1 100 200
3119	chr1 10 20
3120
3121	cat B.bed
3122	chr1 0 30
3123	chr1 180 300
3124
3125	subtractBed \-a A.bed \-b B.bed
3126	chr1 100 180
3127	.ft P
3128	.fi
3129	.UNINDENT
3130	.UNINDENT
3131	.SS 5.7.3 (\-f) Requiring a minimal overlap fraction before subtracting
3132	.sp
3133	This option behaves the same as the \-f option for intersectBed. In this case, subtractBed will only
3134	subtract an overlap with B if it covers at least the fraction of A defined by \-f. If an overlap is found,
3135	but it does not meet the overlap fraction, the original A feature is reported without subtraction.
3136	.sp
3137	For example:
3138	.INDENT 0.0
3139	.INDENT 3.5
3140	.sp
3141	.nf
3142	.ft C
3143	cat A.bed
3144	chr1 100 200
3145
3146	cat B.bed
3147	chr1 180 300
3148
3149	subtractBed \-a A.bed \-b B.bed \-f 0.10
3150	chr1 100 180
3151
3152	subtractBed \-a A.bed \-b B.bed \-f 0.80
3153	chr1 100 200
3154	.ft P
3155	.fi
3156	.UNINDENT
3157	.UNINDENT
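.sp
The subtraction rule, including the \fB\-f\fP threshold, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the \fBsubtractBed\fP implementation: the function name and its min_fraction parameter are hypothetical, and features are assumed to be non\-empty, half\-open (start, end) pairs on one chromosome.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def subtract(a, b_features, min_fraction=1e-9):
    """Remove from A every B overlap that covers at least min_fraction of A.

    Returns the remaining pieces of A; an empty list means A is not reported.
    """
    a_start, a_end = a
    pieces = [(a_start, a_end)]
    for b_start, b_end in b_features:
        overlap = min(a_end, b_end) - max(a_start, b_start)
        if overlap <= 0 or overlap / (a_end - a_start) < min_fraction:
            continue  # no overlap, or too small to satisfy -f
        remaining = []
        for s, e in pieces:
            if s < b_start:  # part of the piece lies left of B
                remaining.append((s, min(e, b_start)))
            if e > b_end:    # part of the piece lies right of B
                remaining.append((max(s, b_end), e))
        pieces = remaining
    return pieces


b_file = [(0, 30), (180, 300)]
print(subtract((100, 200), b_file))                         # [(100, 180)]
print(subtract((10, 20), b_file))                           # []
print(subtract((100, 200), [(180, 300)], min_fraction=0.8)) # [(100, 200)]
```
.ft P
.fi
.UNINDENT
.UNINDENT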
3158	.SS 5.7.4 (\-s) Enforcing "strandedness"
3159	.sp
3160	This option behaves the same as the \-s option for intersectBed while scanning for features in B that
3161	should be subtracted from A. See the discussion in the intersectBed section for details.
3162	.SS 5.8 mergeBed
3163	.sp
3164	\fBmergeBed\fP combines overlapping or "book\-ended" (that is, directly adjacent, where one feature ends at the start coordinate of the next) features in a feature file
3165	into a single feature which spans all of the combined features.
3166	.SS 5.8.1 Usage and option summary
3167	.sp
3168	Usage:
3169	.INDENT 0.0
3170	.INDENT 3.5
3171	.sp
3172	.nf
3173	.ft C
3174	mergeBed [OPTIONS] \-i <BED/GFF/VCF>
3175	.ft P
3176	.fi
3177	.UNINDENT
3178	.UNINDENT
3179	.TS
3180	center;
3181	\|l\|l\|.
3182	_
3183	T{
3184	Option
3185	T} T{
3186	Description
3187	T}
3188	_
3189	T{
3190	\fB\-s\fP
3191	T} T{
3192	Force strandedness. That is, only merge features that are on the same strand. \fIBy default, this is disabled\fP\&.
3193	T}
3194	_
3195	T{
3196	\fB\-n\fP
3197	T} T{
3198	Report the number of BED entries that were merged. \fI1 is reported if no merging occurred\fP\&.
3199	T}
3200	_
3201	T{
3202	\fB\-d\fP
3203	T} T{
3204	Maximum distance between features allowed for features to be merged. \fIDefault is 0. That is, overlapping and/or book\-ended features are merged\fP\&.
3205	T}
3206	_
3207	T{
3208	\fB\-nms\fP
3209	T} T{
3210	Report the names of the merged features separated by semicolons.
3211	T}
3212	_
3213	.TE
3214	.SS 5.8.2 Default behavior
3215	.sp
3216	Figure:
3217	.INDENT 0.0
3218	.INDENT 3.5
3219	.sp
3220	.nf
3221	.ft C
3222	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3223
3224	BED FILE *********** *********** ********************
3225	********
3226
3227	Result =============================== ======================
3228	.ft P
3229	.fi
3230	.UNINDENT
3231	.UNINDENT
3232	.sp
3233	For example:
3234	.INDENT 0.0
3235	.INDENT 3.5
3236	.sp
3237	.nf
3238	.ft C
3239	cat A.bed
3240	chr1 100 200
3241	chr1 180 250
3242	chr1 250 500
3243	chr1 501 1000
3244
3245	mergeBed \-i A.bed
3246	chr1 100 500
3247	chr1 501 1000
3248	.ft P
3249	.fi
3250	.UNINDENT
3251	.UNINDENT
3252	.SS 5.8.3 (\-s) Enforcing "strandedness"
3253	.sp
3254	This option behaves the same as the \-s option for intersectBed while scanning for features that should
3255	be merged. Only features on the same strand will be merged. See the discussion in the intersectBed
3256	section for details.
3257	.SS 5.8.4 (\-n) Reporting the number of features that were merged
3258	.sp
3259	The \-n option will report the number of features that were combined from the original file in order to
3260	make the newly merged feature. If a feature in the original file was not merged with any other features,
3261	a "1" is reported.
3262	.sp
3263	For example:
3264	.INDENT 0.0
3265	.INDENT 3.5
3266	.sp
3267	.nf
3268	.ft C
3269	cat A.bed
3270	chr1 100 200
3271	chr1 180 250
3272	chr1 250 500
3273	chr1 501 1000
3274
3275	mergeBed \-i A.bed \-n
3276	chr1 100 500 3
3277	chr1 501 1000 1
3278	.ft P
3279	.fi
3280	.UNINDENT
3281	.UNINDENT
3282	.SS 5.8.5 (\-d) Controlling how close two features must be in order to merge
3283	.sp
3284	By default, only overlapping or book\-ended features are combined into a new feature. However, one can
3285	force mergeBed to combine more distant features with the \-d option. For example, were one to set \-d to
3286	1000, any features that overlap or are within 1000 base pairs of one another will be combined.
3287	.sp
3288	For example:
3289	.INDENT 0.0
3290	.INDENT 3.5
3291	.sp
3292	.nf
3293	.ft C
3294	cat A.bed
3295	chr1 100 200
3296	chr1 501 1000
3297
3298	mergeBed \-i A.bed
3299	chr1 100 200
3300	chr1 501 1000
3301
3302	mergeBed \-i A.bed \-d 1000
3303	chr1 100 1000
3304	.ft P
3305	.fi
3306	.UNINDENT
3307	.UNINDENT
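.sp
The merging rule, together with the count reported by \fB\-n\fP and the distance set by \fB\-d\fP, can be sketched in Python. This is an illustration rather than the \fBmergeBed\fP implementation: the function name and its max_gap parameter are hypothetical, and all features are assumed to be half\-open (start, end) pairs on the same chromosome.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def merge(features, max_gap=0):
    """Merge (start, end) features; returns (start, end, count) tuples.

    max_gap plays the role of -d: a feature is merged when it starts no
    more than max_gap bases after the end of the current merged feature.
    """
    merged = []
    for start, end in sorted(features):
        if merged and start - merged[-1][1] <= max_gap:
            last_start, last_end, count = merged[-1]
            merged[-1] = (last_start, max(last_end, end), count + 1)
        else:
            merged.append((start, end, 1))
    return merged


a_file = [(100, 200), (180, 250), (250, 500), (501, 1000)]
print(merge(a_file))                                # [(100, 500, 3), (501, 1000, 1)]
print(merge([(100, 200), (501, 1000)], max_gap=1000))  # [(100, 1000, 2)]
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The third element of each tuple is the value that \fB\-n\fP would add as an extra column.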
3308	.SS 5.8.6 (\-nms) Reporting the names of the features that were merged
3309	.sp
3310	Occasionally, one might like to know the names of the features that were merged into a new feature.
3311	The \-nms option will add an extra column to the mergeBed output which lists (separated by
3312	semicolons) the names of the merged features.
3313	.sp
3314	For example:
3315	.INDENT 0.0
3316	.INDENT 3.5
3317	.sp
3318	.nf
3319	.ft C
3320	cat A.bed
3321	chr1 100 200 A1
3322	chr1 150 300 A2
3323	chr1 250 500 A3
3324
3325	mergeBed \-i A.bed \-nms
3326	chr1 100 500 A1;A2;A3
3327	.ft P
3328	.fi
3329	.UNINDENT
3330	.UNINDENT
3331	.SS 5.9 coverageBed
3332	.sp
3333	\fBcoverageBed\fP computes both the \fIdepth\fP and \fIbreadth\fP of coverage of features in file A across the features
3334	in file B. For example, \fBcoverageBed\fP can compute the coverage of sequence alignments (file A) across 1
3335	kilobase (arbitrary) windows (file B) tiling a genome of interest. One advantage that \fBcoverageBed\fP
3336	offers is that it not only \fIcounts\fP the number of features that overlap an interval in file B, it also
3337	computes the fraction of bases in each B interval that were overlapped by one or more features. Thus,
3338	\fBcoverageBed\fP also computes the \fIbreadth\fP of coverage for each interval in B.
3339	.SS 5.9.1 Usage and option summary
3340	.sp
3341	Usage:
3342	.INDENT 0.0
3343	.INDENT 3.5
3344	.sp
3345	.nf
3346	.ft C
3347	coverageBed [OPTIONS] \-a <BED/GFF/VCF> \-b <BED/GFF/VCF>
3348	.ft P
3349	.fi
3350	.UNINDENT
3351	.UNINDENT
3352	.TS
3353	center;
3354	\|l\|l\|.
3355	_
3356	T{
3357	Option
3358	T} T{
3359	Description
3360	T}
3361	_
3362	T{
3363	\fB\-abam\fP
3364	T} T{
3365	.INDENT 0.0
3366	.INDENT 3.5
3367	BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe: For example:
3368	.UNINDENT
3369	.UNINDENT
3370	.nf
3371	samtools view \-b <BAM> \| coverageBed \-abam stdin \-b genes.bed
3372	.fi
3373	T}
3374	_
3375	T{
3376	\fB\-s\fP
3377	T} T{
3378	Force strandedness. That is, features in A are only counted towards coverage in B if they are on the same strand. \fIBy default, this is disabled and coverage is counted without respect to strand\fP\&.
3379	T}
3380	_
3381	T{
3382	\fB\-hist\fP
3383	T} T{
3384	Report a histogram of coverage for each feature in B as well as a summary histogram for _all_ features in B.
3385	.nf
3386	Output (tab delimited) after each feature in B:
3387	.fi
3388	.sp
3389	.INDENT 0.0
3390	.INDENT 3.5
3391	.nf
3392	1) depth
3393	2) # bases at depth
3394	3) size of B
3395	4) % of B at depth
3396	.fi
3397	.sp
3398	.UNINDENT
3399	.UNINDENT
3400	T}
3401	_
3402	T{
3403	\fB\-d\fP
3404	T} T{
3405	Report the depth at each position in each B feature. Positions reported are one based. Each position and depth follow the complete B feature.
3406	T}
3407	_
3408	T{
3409	\fB\-split\fP
3410	T} T{
3411	Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the blockCount, blockSizes, and blockStarts fields (i.e., columns 10, 11, and 12).
3412	T}
3413	_
3414	.TE
3415	.SS 5.9.2 Default behavior
3416	.sp
3417	After each interval in B, \fBcoverageBed\fP will report:
3418	.INDENT 0.0
3419	.IP 1. 3
3420	The number of features in A that overlapped (by at least one base pair) the B interval.
3421	.IP 2. 3
3422	The number of bases in B that had non\-zero coverage from features in A.
3423	.IP 3. 3
3424	The length of the entry in B.
3425	.IP 4. 3
3426	The fraction of bases in B that had non\-zero coverage from features in A.
3427	.UNINDENT
3428	.sp
3429	Below are the number of features in A (N=...) overlapping B and fraction of bases in B with coverage.
3430	.INDENT 0.0
3431	.INDENT 3.5
3432	.sp
3433	.nf
3434	.ft C
3435	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3436
3437	BED FILE B ************* *********** ** ************
3438
3439	BED File A ^^^^ ^^^^ ^^ ^^^^^^^^^ ^^^ ^^ ^^^^
3440	^^^^^^^^ ^^^^^ ^^^^^ ^^
3441
3442	Result [ N=3, 10/15 ] [ N=1, 2/16 ] [N=1,6/6] [N=5, 11/12 ]
3443	.ft P
3444	.fi
3445	.UNINDENT
3446	.UNINDENT
3447	.sp
3448	For example:
3449	.INDENT 0.0
3450	.INDENT 3.5
3451	.sp
3452	.nf
3453	.ft C
3454	cat A.bed
3455	chr1 10 20
3456	chr1 20 30
3457	chr1 30 40
3458	chr1 100 200
3459
3460	cat B.bed
3461	chr1 0 100
3462	chr1 100 200
3463	chr2 0 100
3464
3465	coverageBed \-a A.bed \-b B.bed
3466	chr1 0 100 3 30 100 0.3000000
3467	chr1 100 200 1 100 100 1.0000000
3468	chr2 0 100 0 0 100 0.0000000
3469	.ft P
3470	.fi
3471	.UNINDENT
3472	.UNINDENT
3473	.SS 5.9.4 (\-s) Calculating coverage by strand
3474	.sp
3475	Use the "\fB\-s\fP" option if one wants to only count coverage if features in A are on the same strand as the
3476	feature / window in B. This is especially useful for RNA\-seq experiments.
3477	.sp
3478	For example (note the difference in coverage with and without \fB\-s\fP):
3479	.INDENT 0.0
3480	.INDENT 3.5
3481	.sp
3482	.nf
3483	.ft C
3484	cat A.bed
3485	chr1 10 20 a1 1 \-
3486	chr1 20 30 a2 1 \-
3487	chr1 30 40 a3 1 \-
3488	chr1 100 200 a4 1 +
3489
3490	cat B.bed
3491	chr1 0 100 b1 1 +
3492	chr1 100 200 b2 1 \-
3493	chr2 0 100 b3 1 +
3494
3495	coverageBed \-a A.bed \-b B.bed
3496	chr1 0 100 b1 1 + 3 30 100 0.3000000
3497	chr1 100 200 b2 1 \- 1 100 100 1.0000000
3498	chr2 0 100 b3 1 + 0 0 100 0.0000000
3499
3500	coverageBed \-a A.bed \-b B.bed \-s
3501	chr1 0 100 b1 1 + 0 0 100 0.0000000
3502	chr1 100 200 b2 1 \- 0 0 100 0.0000000
3503	chr2 0 100 b3 1 + 0 0 100 0.0000000
3504	.ft P
3505	.fi
3506	.UNINDENT
3507	.UNINDENT
3508	.SS 5.9.5 (\-hist) Creating a histogram of coverage for each feature in the B file
3509	.sp
3510	One should use the "\fB\-hist\fP" option to create, for each interval in B, a histogram of coverage of the
3511	features in A across B.
3512	.sp
3513	In this case, each entire feature in B will be reported, followed by the depth of coverage, the number of
3514	bases at that depth, the size of the feature, and the fraction covered. After all of the features in B have
3515	been reported, a histogram summarizing the coverage among all features in B will be reported.
3516	.INDENT 0.0
3517	.INDENT 3.5
3518	.sp
3519	.nf
3520	.ft C
3521	cat A.bed
3522	chr1 10 20 a1 1 \-
3523	chr1 20 30 a2 1 \-
3524	chr1 30 40 a3 1 \-
3525	chr1 100 200 a4 1 +
3526
3527	cat B.bed
3528	chr1 0 100 b1 1 +
3529	chr1 100 200 b2 1 \-
3530	chr2 0 100 b3 1 +
3531
3532	coverageBed \-a A.bed \-b B.bed \-hist
3533	chr1 0 100 b1 1 + 0 70 100 0.7000000
3534	chr1 0 100 b1 1 + 1 30 100 0.3000000
3535	chr1 100 200 b2 1 \- 1 100 100 1.0000000
3536	chr2 0 100 b3 1 + 0 100 100 1.0000000
3537	all 0 170 300 0.5666667
3538	all 1 130 300 0.4333333
3539	.ft P
3540	.fi
3541	.UNINDENT
3542	.UNINDENT
3543	.SS 5.9.6 (\-d) Reporting the per\-base coverage for each feature in the B file
3544	.sp
3545	One should use the "\fB\-d\fP" option to create, for each interval in B, a detailed list of coverage at each of the
3546	positions across each B interval.
3547	.sp
3548	The output will consist of a line for each one\-based position in each B feature, followed by the coverage
3549	detected at that position.
3550	.INDENT 0.0
3551	.INDENT 3.5
3552	.sp
3553	.nf
3554	.ft C
3555	cat A.bed
3556	chr1 0 5
3557	chr1 3 8
3558	chr1 4 8
3559	chr1 5 9
3560
3561	cat B.bed
3562	chr1 0 10 B
3563
3564	coverageBed \-a A.bed \-b B.bed \-d
3565	chr1 0 10 B 1 1
3566	chr1 0 10 B 2 1
3567	chr1 0 10 B 3 1
3568	chr1 0 10 B 4 2
3569	chr1 0 10 B 5 3
3570	chr1 0 10 B 6 3
3571	chr1 0 10 B 7 3
3572	chr1 0 10 B 8 3
3573	chr1 0 10 B 9 1
3574	chr1 0 10 B 10 0
3575	.ft P
3576	.fi
3577	.UNINDENT
3578	.UNINDENT
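.sp
The default report, the \fB\-hist\fP histogram and the \fB\-d\fP per\-base depths all derive from the same per\-base depth vector, as the following Python sketch shows. It is a simplified illustration, not the \fBcoverageBed\fP implementation: the function names are hypothetical, strand is ignored, and features are assumed to be half\-open (start, end) pairs on one chromosome.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
from collections import Counter


def depths(b, a_features):
    """Per-base depth across one half-open B interval (start, end)."""
    b_start, b_end = b
    depth = [0] * (b_end - b_start)
    for a_start, a_end in a_features:
        for pos in range(max(a_start, b_start), min(a_end, b_end)):
            depth[pos - b_start] += 1
    return depth


def coverage(b, a_features):
    """Default report: (overlapping A features, covered bases, length, fraction)."""
    b_start, b_end = b
    count = sum(1 for a_start, a_end in a_features
                if a_start < b_end and b_start < a_end)
    depth = depths(b, a_features)
    covered = sum(1 for d in depth if d > 0)
    return count, covered, len(depth), covered / len(depth)


def histogram(b, a_features):
    """Rows in the style of -hist: (depth, bases at depth, length, fraction)."""
    depth = depths(b, a_features)
    return [(d, n, len(depth), n / len(depth))
            for d, n in sorted(Counter(depth).items())]


a_file = [(10, 20), (20, 30), (30, 40), (100, 200)]
print(coverage((0, 100), a_file))   # (3, 30, 100, 0.3)
print(histogram((0, 100), a_file))  # [(0, 70, 100, 0.7), (1, 30, 100, 0.3)]
print(depths((0, 10), [(0, 5), (3, 8), (4, 8), (5, 9)]))
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The last call returns one depth per base of B, in order; \fB\-d\fP reports the same values next to one\-based positions.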
3579	.SS 5.9.7 (\-split) Reporting coverage with spliced alignments or blocked BED features
3580	.sp
3581	As described in section 1.3.19, coverageBed will, by default, screen for overlaps against the entire span
3582	of a spliced/split BAM alignment or blocked BED12 feature. When dealing with RNA\-seq reads, for
3583	example, one typically wants to only tabulate coverage for the portions of the reads that come from
3584	exons (and ignore the interstitial intron sequence). The \fB\-split\fP option allows such coverage to be
3585	computed.
3586	.SS 5.10 genomeCoverageBed
3587	.sp
3588	\fBgenomeCoverageBed\fP computes a histogram of feature coverage (e.g., aligned sequences) for a given
3589	genome. Optionally, by using the \fB\-d\fP option, it will report the depth of coverage at \fIeach base\fP on each
3590	chromosome in the genome file (\fB\-g\fP).
3591	.SS 5.10.1 Usage and option summary
3592	.sp
3593	Usage:
3594	.INDENT 0.0
3595	.INDENT 3.5
3596	.sp
3597	.nf
3598	.ft C
3599	genomeCoverageBed [OPTIONS] \-i <BED> \-g <GENOME>
3600	.ft P
3601	.fi
3602	.UNINDENT
3603	.UNINDENT
3604	.sp
3605	NOTE: genomeCoverageBed requires that the input BED file be sorted by
3606	chromosome. A simple sort \-k1,1 will suffice.
3607	.TS
3608	center;
3609	\|l\|l\|.
3610	_
3611	T{
3612	Option
3613	T} T{
3614	Description
3615	T}
3616	_
3617	T{
3618	\fB\-ibam\fP
3619	T} T{
3620	.INDENT 0.0
3621	.INDENT 3.5
3622	BAM file as input for coverage. Each BAM alignment is added to the total coverage for the genome. Use "stdin" if passing it with a UNIX pipe. For example:
3623	.UNINDENT
3624	.UNINDENT
3625	.nf
3626	samtools view \-b <BAM> \| genomeCoverageBed \-ibam stdin \-g hg18.genome
3627	.fi
3628	T}
3629	_
3630	T{
3631	\fB\-d\fP
3632	T} T{
3633	Report the depth at each genome position. \fIDefault behavior is to report a histogram\fP\&.
3634	T}
3635	_
3636	T{
3637	\fB\-max\fP
3638	T} T{
3639	Combine all positions with a depth >= max into a single bin in the histogram.
3640	T}
3641	_
3642	T{
3643	\fB\-bg\fP
3644	T} T{
3645	Report depth in BedGraph format. For details, see: \fI\%http://genome.ucsc.edu/goldenPath/help/bedgraph.html\fP
3646	T}
3647	_
3648	T{
3649	\fB\-bga\fP
3650	T} T{
3651	Report depth in BedGraph format, as above (i.e., \-bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep \-w 0$" to the output.
3652	T}
3653	_
3654	T{
3655	\fB\-split\fP
3656	T} T{
3657	Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the blockCount, blockSizes, and blockStarts fields (i.e., columns 10, 11, and 12).
3658	T}
3659	_
3660	T{
3661	\fB\-strand\fP
3662	T} T{
3663	Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6).
3664	T}
3665	_
3666	.TE
3667	.SS 5.10.2 Default behavior
3668	.sp
3669	By default, \fBgenomeCoverageBed\fP will compute a histogram of coverage for the genome file provided.
3670	The default output format is as follows:
3671	1. chromosome (or entire genome)
3672	2. depth of coverage from features in input file
3673	3. number of bases on chromosome (or genome) with depth equal to column 2.
3674	4. size of chromosome (or entire genome) in base pairs
3675	5. fraction of bases on chromosome (or entire genome) with depth equal to column 2.
3676	.sp
3677	For example:
3678	.INDENT 0.0
3679	.INDENT 3.5
3680	.sp
3681	.nf
3682	.ft C
3683	cat A.bed
3684	chr1 10 20
3685	chr1 20 30
3686	chr2 0 500
3687
3688	cat my.genome
3689	chr1 1000
3690	chr2 500
3691
3692	genomeCoverageBed \-i A.bed \-g my.genome
3693	chr1 0 980 1000 0.98
3694	chr1 1 20 1000 0.02
3695	chr2 1 500 500 1
3696	genome 0 980 1500 0.653333
3697	genome 1 520 1500 0.346667
3698	.ft P
3699	.fi
3700	.UNINDENT
3701	.UNINDENT
3702	.SS 5.10.3 (\-max) Controlling the histogram\(aqs maximum depth
3703	.sp
3704	Using the \fB\-max\fP option, \fBgenomeCoverageBed\fP will "lump" all positions in the genome having feature
3705	coverage greater than or equal to \fBmax\fP into the \fBmax\fP histogram bin. For example, if one sets \fB\-max\fP
3706	equal to 50, the max depth reported in the output will be 50 and all positions with a depth >= 50 will
3707	be represented in bin 50.
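.sp
The five\-column histogram and the effect of \fB\-max\fP can be reproduced with a short Python sketch. This is only an illustration of the output described here, not the \fBgenomeCoverageBed\fP implementation: the function name and its max_depth parameter are hypothetical, and a per\-base list is used for clarity even though it would be far too slow and memory\-hungry for a real genome.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
from collections import Counter


def genome_histogram(features, genome, max_depth=None):
    """Rows of (chrom, depth, bases at depth, chrom size, fraction).

    features are (chrom, start, end) tuples; genome maps chrom -> size.
    max_depth mimics -max by lumping deeper positions into one bin.
    """
    rows, total = [], Counter()
    for chrom, size in genome.items():
        depth = [0] * size
        for f_chrom, start, end in features:
            if f_chrom == chrom:
                for pos in range(start, min(end, size)):
                    depth[pos] += 1
        if max_depth is not None:
            depth = [min(d, max_depth) for d in depth]
        hist = Counter(depth)
        total.update(hist)
        rows += [(chrom, d, n, size, n / size) for d, n in sorted(hist.items())]
    genome_size = sum(genome.values())
    rows += [("genome", d, n, genome_size, n / genome_size)
             for d, n in sorted(total.items())]
    return rows


a_file = [("chr1", 10, 20), ("chr1", 20, 30), ("chr2", 0, 500)]
for row in genome_histogram(a_file, {"chr1": 1000, "chr2": 500}):
    print(*row, sep="\t")
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For the inputs of the default example this yields the same depth and base counts as shown there (for instance 980 bases at depth 0 and 20 bases at depth 1 on chr1); only the number formatting of the last column differs.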
3708	.SS 5.10.4 (\-d) Reporting "per\-base" genome coverage
3709	.sp
3710	Using the \fB\-d\fP option, \fBgenomeCoverageBed\fP will compute the depth of feature coverage for each base
3711	on each chromosome in the genome file provided.
3712	.sp
3713	The "per\-base" output format is as follows:
3714	1. chromosome
3715	2. chromosome position
3716	3. depth (number) of features overlapping this chromosome position.
3717	.sp
3718	For example:
3719	.INDENT 0.0
3720	.INDENT 3.5
3721	.sp
3722	.nf
3723	.ft C
3724	cat A.bed
3725	chr1 10 20
3726	chr1 20 30
3727	chr2 0 500
3728
3729	cat my.genome
3730	chr1 1000
3731	chr2 500
3732
3733	genomeCoverageBed \-i A.bed \-g my.genome \-d \| head \-15 \| tail \-n 10
3734	chr1 6 0
3735	chr1 7 0
3736	chr1 8 0
3737	chr1 9 0
3738	chr1 10 0
3739	chr1 11 1
3740	chr1 12 1
3741	chr1 13 1
3742	chr1 14 1
3743	chr1 15 1
3744	.ft P
3745	.fi
3746	.UNINDENT
3747	.UNINDENT
3748	.SS 5.10.5 (\-split) Reporting coverage with spliced alignments or blocked BED features
3749	.sp
3750	As described in section 1.3.19, genomeCoverageBed will, by default, screen for overlaps against the
3751	entire span of a spliced/split BAM alignment or blocked BED12 feature. When dealing with RNA\-seq
3752	reads, for example, one typically wants to only screen for overlaps for the portions of the reads that
3753	come from exons (and ignore the interstitial intron sequence). The \fB\-split\fP option allows for such
3754	overlaps to be performed.
3755	.sp
3756	For additional details, please visit the Usage From The Wild site and have a look at example 5,
3757	contributed by Assaf Gordon.
3758	.SS 5.11 fastaFromBed
3759	.sp
3760	\fBfastaFromBed\fP extracts sequences from a FASTA file for each of the intervals defined in a BED file.
3761	The headers in the input FASTA file must exactly match the chromosome column in the BED file.
3762	.SS 5.11.1 Usage and option summary
3763	.sp
3764	Usage:
3765	.INDENT 0.0
3766	.INDENT 3.5
3767	.sp
3768	.nf
3769	.ft C
3770	fastaFromBed [OPTIONS] \-fi <input FASTA> \-bed <BED/GFF/VCF> \-fo <output FASTA>
3771	.ft P
3772	.fi
3773	.UNINDENT
3774	.UNINDENT
3775	.TS
3776	center;
3777	\|l\|l\|.
3778	_
3779	T{
3780	Option
3781	T} T{
3782	Description
3783	T}
3784	_
3785	T{
3786	\fB\-name\fP
3787	T} T{
3788	Use the "name" column in the BED file for the FASTA headers in the output FASTA file.
3789	T}
3790	_
3791	T{
3792	\fB\-tab\fP
3793	T} T{
3794	Report extracted sequences in a tab\-delimited format instead of in FASTA format.
3795	T}
3796	_
3797	T{
3798	\fB\-s\fP
3799	T} T{
3800	Force strandedness. If the feature occupies the antisense strand, the sequence will be reverse complemented. \fIDefault: strand information is ignored\fP\&.
3801	T}
3802	_
3803	.TE
3804	.SS 5.11.2 Default behavior
3805	.sp
3806	\fBfastaFromBed\fP will extract the sequence defined by the coordinates in a BED interval and create a
3807	new FASTA entry in the output file for each extracted sequence. By default, the FASTA header for each
3808	extracted sequence will be formatted as follows: "<chrom>:<start>\-<end>".
3809	.sp
3810	For example:
3811	.INDENT 0.0
3812	.INDENT 3.5
3813	.sp
3814	.nf
3815	.ft C
3816	cat test.fa
3817	>chr1
3818	AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
3819
3820	cat test.bed
3821	chr1 5 10
3822
3823	fastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out
3824
3825	cat test.fa.out
3826	>chr1:5\-10
3827	AAACC
3828	.ft P
3829	.fi
3830	.UNINDENT
3831	.UNINDENT
3832	.SS 5.11.3 Using the BED "name" column as a FASTA header.
3833	.sp
3834	Using the \fB\-name\fP option, one can set the FASTA header for each extracted sequence to be the "name"
3835	column from the BED feature.
3836	.sp
3837	For example:
3838	.INDENT 0.0
3839	.INDENT 3.5
3840	.sp
3841	.nf
3842	.ft C
3843	cat test.fa
3844	>chr1
3845	AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
3846
3847	cat test.bed
3848	chr1 5 10 myseq
3849
3850	fastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out \-name
3851
3852	cat test.fa.out
3853	>myseq
3854	AAACC
3855	.ft P
3856	.fi
3857	.UNINDENT
3858	.UNINDENT
3859	.SS 5.11.4 Creating a tab\-delimited output file in lieu of FASTA output.
3860	.sp
3861	Using the \fB\-tab\fP option, the \fB\-fo\fP output file will be tab\-delimited instead of in FASTA format.
3862	.sp
3863	For example:
3864	.INDENT 0.0
3865	.INDENT 3.5
3866	.sp
3867	.nf
3868	.ft C
3869	cat test.fa
3870	>chr1
3871	AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
3872
3873	cat test.bed
3874	chr1 5 10 myseq
3875
3876	fastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out.tab \-name \-tab
3877
3878	cat test.fa.out.tab
3879	myseq AAACC
3880	.ft P
3881	.fi
3882	.UNINDENT
3883	.UNINDENT
3884	.SS 5.11.5 (\-s) Forcing the extracted sequence to reflect the requested strand
3885	.sp
3886	\fBfastaFromBed\fP will extract the sequence in the orientation defined in the strand column when the "\-s"
3887	option is used.
3888	.sp
3889	For example:
3890	.INDENT 0.0
3891	.INDENT 3.5
3892	.sp
3893	.nf
3894	.ft C
3895	cat test.fa
3896	>chr1
3897	AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
3898
3899	cat test.bed
3900	chr1 20 25 forward 1 +
3901	chr1 20 25 reverse 1 \-
3902
3903	fastaFromBed \-fi test.fa \-bed test.bed \-s \-name \-fo test.fa.out
3904
3905	cat test.fa.out
3906	>forward
3907	CGCTA
3908	>reverse
3909	TAGCG
3910	.ft P
3911	.fi
3912	.UNINDENT
3913	.UNINDENT
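.sp
For a feature on the minus strand, the reported sequence is the reverse complement of the reference bases. The following Python sketch shows that idea under the assumption of plain A/C/G/T input; extract_stranded and COMPLEMENT are hypothetical names, and the sequence is built to resemble test.fa.

```python
# Illustrative sketch only; not the bedtools implementation.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def extract_stranded(seq, start, end, strand):
    """Slice a 0-based, half-open interval; reverse-complement on '-'."""
    sub = seq[start:end]
    if strand == "-":
        sub = sub.translate(COMPLEMENT)[::-1]
    return sub

# Assumed layout resembling test.fa: 8 A, 13 C, GCTACT, 18 G.
chr1 = "A" * 8 + "C" * 13 + "GCTACT" + "G" * 18

forward = extract_stranded(chr1, 20, 25, "+")
reverse = extract_stranded(chr1, 20, 25, "-")
print(forward, reverse)
```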
3914	.SS 5.12 maskFastaFromBed
3915	.sp
3916	\fBmaskFastaFromBed\fP masks sequences in a FASTA file based on intervals defined in a feature file. The
3917	headers in the input FASTA file must exactly match the chromosome column in the feature file. This
3918	may be useful for creating your own masked genome file based on custom annotations or for masking all
3919	but your target regions when aligning sequence data from a targeted capture experiment.
3920	.SS 5.12.1 Usage and option summary
3921	.sp
3922	Usage:
3923	.INDENT 0.0
3924	.INDENT 3.5
3925	.sp
3926	.nf
3927	.ft C
3928	maskFastaFromBed [OPTIONS] \-fi <input FASTA> \-bed <BED/GFF/VCF> \-fo <output FASTA>
3929	.ft P
3930	.fi
3931	.UNINDENT
3932	.UNINDENT
3933	.sp
3934	NOTE: The input and output FASTA files must be different.
3935	.TS
3936	center;
3937	\|l\|l\|.
3938	_
3939	T{
3940	Option
3941	T} T{
3942	Description
3943	T}
3944	_
3945	T{
3946	\fB\-soft\fP
3947	T} T{
3948	Soft\-mask (that is, convert to lower\-case bases) the FASTA sequence. \fIBy default, hard\-masking (that is, conversion to Ns) is performed\fP\&.
3949	T}
3950	_
3951	.TE
3952	.SS 5.12.2 Default behavior
3953	.sp
3954	\fBmaskFastaFromBed\fP will mask a FASTA file based on the intervals in a BED file. The newly masked
3955	FASTA file is written to the output FASTA file.
3956	.sp
3957	For example:
3958	.INDENT 0.0
3959	.INDENT 3.5
3960	.sp
3961	.nf
3962	.ft C
3963	cat test.fa
3964	>chr1
3965	AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
3966
3967	cat test.bed
3968	chr1 5 10
3969
3970	maskFastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out
3971
3972	cat test.fa.out
3973	>chr1
3974	AAAAANNNNNCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
3975	.ft P
3976	.fi
3977	.UNINDENT
3978	.UNINDENT
3979	.SS 5.12.3 Soft\-masking the FASTA file.
3980	.sp
3981	Using the \fB\-soft\fP option, one can optionally "soft\-mask" the FASTA file.
3982	.sp
3983	For example:
3984	.INDENT 0.0
3985	.INDENT 3.5
3986	.sp
3987	.nf
3988	.ft C
3989	cat test.fa
3990	>chr1
3991	AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
3992
3993	cat test.bed
3994	chr1 5 10
3995
3996	maskFastaFromBed \-fi test.fa \-bed test.bed \-fo test.fa.out \-soft
3997
3998	cat test.fa.out
3999	>chr1
4000	AAAAAaaaccCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
4001	.ft P
4002	.fi
4003	.UNINDENT
4004	.UNINDENT
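.sp
The difference between hard\-masking and soft\-masking can be sketched in a few lines of Python. This is an illustration of the two behaviors described above, not the maskFastaFromBed code; the function name mask and the short example sequence are assumptions made for the sketch.

```python
# Illustrative sketch only; not the bedtools implementation.
def mask(seq, start, end, soft=False):
    """Mask a 0-based, half-open interval: lower-case if soft, else Ns."""
    region = seq[start:end]
    masked = region.lower() if soft else "N" * len(region)
    return seq[:start] + masked + seq[end:]

seq = "AAAAAAAACCCC"  # hypothetical 12-base sequence
hard = mask(seq, 5, 10)
soft = mask(seq, 5, 10, soft=True)
print(hard)
print(soft)
```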
4005	.SS 5.13 shuffleBed
4006	.sp
4007	\fBshuffleBed\fP will randomly permute the genomic locations of the features in a feature file within a genome defined in a
4008	genome file. One can also provide an "exclusions" BED/GFF/VCF file that lists regions where you do
4009	not want the permuted features to be placed. For example, one might want to prevent features from
4010	being placed in known genome gaps. \fBshuffleBed\fP is useful as a \fInull\fP basis against which to test the
4011	significance of associations of one feature with another.
4012	.SS 5.13.1 Usage and option summary
4013	.sp
4014	Usage:
4015	.INDENT 0.0
4016	.INDENT 3.5
4017	.sp
4018	.nf
4019	.ft C
4020	shuffleBed [OPTIONS] \-i <BED/GFF/VCF> \-g <GENOME>
4021	.ft P
4022	.fi
4023	.UNINDENT
4024	.UNINDENT
4025	.TS
4026	center;
4027	\|l\|l\|.
4028	_
4029	T{
4030	Option
4031	T} T{
4032	Description
4033	T}
4034	_
4035	T{
4036	\fB\-excl\fP
4037	T} T{
4038	A BED file of coordinates in which features from \-i should \fInot\fP be placed (e.g., genome gaps).
4039	T}
4040	_
4041	T{
4042	\fB\-chrom\fP
4043	T} T{
4044	Keep features in \-i on the same chromosome. Solely permute their location on the chromosome. \fIBy default, both the chromosome and position are randomly chosen\fP\&.
4045	T}
4046	_
4047	T{
4048	\fB\-seed\fP
4049	T} T{
4050	Supply an integer seed for the shuffling. This will allow feature shuffling experiments to be recreated exactly as the seed for the pseudo\-random number generation will be constant. \fIBy default, the seed is chosen automatically\fP\&.
4051	T}
4052	_
4053	.TE
4054	.SS 5.13.2 Default behavior
4055	.sp
4056	By default, \fBshuffleBed\fP will reposition each feature in the input BED file on a random chromosome at a
4057	random position. The size and strand of each feature are preserved.
4058	.sp
4059	For example:
4060	.INDENT 0.0
4061	.INDENT 3.5
4062	.sp
4063	.nf
4064	.ft C
4065	cat A.bed
4066	chr1 0 100 a1 1 +
4067	chr1 0 1000 a2 2 \-
4068
4069	cat my.genome
4070	chr1 10000
4071	chr2 8000
4072	chr3 5000
4073	chr4 2000
4074
4075	shuffleBed \-i A.bed \-g my.genome
4076	chr4 1498 1598 a1 1 +
4077	chr3 2156 3156 a2 2 \-
4078	.ft P
4079	.fi
4080	.UNINDENT
4081	.UNINDENT
4082	.SS 5.13.3 (\-chrom) Requiring that features be shuffled on the same chromosome
4083	.sp
4084	The "\fB\-chrom\fP" option behaves the same as the default behavior except that features are randomly
4085	placed on the same chromosome as defined in the BED file.
4086	.sp
4087	For example:
4088	.INDENT 0.0
4089	.INDENT 3.5
4090	.sp
4091	.nf
4092	.ft C
4093	cat A.bed
4094	chr1 0 100 a1 1 +
4095	chr1 0 1000 a2 2 \-
4096
4097	cat my.genome
4098	chr1 10000
4099	chr2 8000
4100	chr3 5000
4101	chr4 2000
4102
4103	shuffleBed \-i A.bed \-g my.genome \-chrom
4104	chr1 9560 9660 a1 1 +
4105	chr1 7258 8258 a2 2 \-
4106	.ft P
4107	.fi
4108	.UNINDENT
4109	.UNINDENT
4110	.SS 5.13.4 Excluding certain genome regions from shuffleBed
4111	.sp
4112	One may want to prevent BED features from being placed in certain regions of the genome. For
4113	example, one may want to exclude genome gaps from a permutation experiment. The "\fB\-excl\fP" option
4114	defines a BED file of regions that should be excluded. \fBshuffleBed\fP will attempt to permute the
4115	locations of all features while adhering to the exclusion rules. However, it will stop looking for an
4116	appropriate location if it cannot find a valid spot for a feature after 1,000,000 tries.
4117	.sp
4118	For example (\fInote that the exclude file excludes all but 100 base pairs of the chromosome\fP):
4119	.INDENT 0.0
4120	.INDENT 3.5
4121	.sp
4122	.nf
4123	.ft C
4124	cat A.bed
4125	chr1 0 100 a1 1 +
4126	chr1 0 1000 a2 2 \-
4127
4128	cat my.genome
4129	chr1 10000
4130
4131	cat exclude.bed
4132	chr1 100 10000
4133
4134	shuffleBed \-i A.bed \-g my.genome \-excl exclude.bed
4135	chr1 0 100 a1 1 +
4136	Error, line 2: tried 1000000 potential loci for entry, but could not avoid excluded
4137	regions. Ignoring entry and moving on.
4138	.ft P
4139	.fi
4140	.UNINDENT
4141	.UNINDENT
4142	.sp
4143	For example (\fInow the exclusion file only excludes the first 100 bases of the chromosome\fP):
4144	.INDENT 0.0
4145	.INDENT 3.5
4146	.sp
4147	.nf
4148	.ft C
4149	cat A.bed
4150	chr1 0 100 a1 1 +
4151	chr1 0 1000 a2 2 \-
4152
4153	cat my.genome
4154	chr1 10000
4155
4156	cat exclude.bed
4157	chr1 0 100
4158
4159	shuffleBed \-i A.bed \-g my.genome \-excl exclude.bed
4160	chr1 147 247 a1 1 +
4161	chr1 2441 3441 a2 2 \-
4162	.ft P
4163	.fi
4164	.UNINDENT
4165	.UNINDENT
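.sp
The retry behavior described above can be approximated by rejection sampling: draw a random location, reject it if it touches an excluded region, and give up after a fixed number of attempts. The Python sketch below is an assumption about the general approach, not the shuffleBed source; shuffle_feature, overlaps and the reduced max_tries value are hypothetical.

```python
import random

# Illustrative sketch only; not the bedtools implementation.
def overlaps(start, end, excluded):
    """True if [start, end) intersects any excluded half-open interval."""
    return any(start < x_end and x_start < end for x_start, x_end in excluded)

def shuffle_feature(length, chrom_size, excluded, rng, max_tries=1000):
    """Return a random (start, end) that avoids excluded regions, or None."""
    if length > chrom_size:
        return None
    for _ in range(max_tries):
        start = rng.randint(0, chrom_size - length)
        if not overlaps(start, start + length, excluded):
            return start, start + length
    return None  # no valid location found within max_tries attempts

rng = random.Random(1)
# Only the first 100 bases are excluded: placement is easy.
placed = shuffle_feature(100, 10000, [(0, 100)], rng)
# All but the first 100 bases are excluded: a 1000 bp feature cannot fit.
unplaced = shuffle_feature(1000, 10000, [(100, 10000)], rng)
print(placed, unplaced)
```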
4166	.SS 5.13.5 Defining a "seed" for the random replacement.
4167	.sp
4168	\fBshuffleBed\fP uses a pseudo\-random number generator to permute the locations of BED features.
4169	Therefore, each run should produce a different result. This can be problematic if one wants to exactly
4170	recreate an experiment. By using the "\fB\-seed\fP" option, one can supply a custom integer seed for
4171	\fBshuffleBed\fP\&. In turn, each execution of \fBshuffleBed\fP with the same seed and input files should produce
4172	identical results.
4173	.sp
4174	For example (\fInote that each run with the same seed reports the same locations\fP):
4175	.INDENT 0.0
4176	.INDENT 3.5
4177	.sp
4178	.nf
4179	.ft C
4180	cat A.bed
4181	chr1 0 100 a1 1 +
4182	chr1 0 1000 a2 2 \-
4183
4184	cat my.genome
4185	chr1 10000
4186
4187	shuffleBed \-i A.bed \-g my.genome \-seed 927442958
4188	chr1 6177 6277 a1 1 +
4189	chr1 8119 9119 a2 2 \-
4190
4191	shuffleBed \-i A.bed \-g my.genome \-seed 927442958
4192	chr1 6177 6277 a1 1 +
4193	chr1 8119 9119 a2 2 \-
4194
4195	\&. . .
4196
4197	shuffleBed \-i A.bed \-g my.genome \-seed 927442958
4198	chr1 6177 6277 a1 1 +
4199	chr1 8119 9119 a2 2 \-
4200	.ft P
4201	.fi
4202	.UNINDENT
4203	.UNINDENT
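.sp
The effect of a fixed seed can be demonstrated with any pseudo\-random number generator. The Python sketch below uses the standard random module; the positions it produces are not expected to match those reported by shuffleBed, and shuffle_starts is a hypothetical helper.

```python
import random

# Illustrative sketch only; positions will differ from shuffleBed output.
def shuffle_starts(lengths, chrom_size, seed):
    """Draw one random start per feature from a generator with a fixed seed."""
    rng = random.Random(seed)  # same seed, same pseudo-random sequence
    return [rng.randint(0, chrom_size - n) for n in lengths]

first = shuffle_starts([100, 1000], 10000, seed=927442958)
second = shuffle_starts([100, 1000], 10000, seed=927442958)
print(first == second)
```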
4204	.SS 5.14 slopBed
4205	.sp
4206	\fBslopBed\fP will increase the size of each feature in a feature file by a user\-defined number of bases. While
4207	something like this could be done with an "\fBawk \(aq{OFS="\et"; print $1,$2\-<slop>,$3+<slop>}\(aq\fP",
4208	\fBslopBed\fP will restrict the resizing to the size of the chromosome (i.e. no start < 0 and no end >
4209	chromosome size).
4210	.SS 5.14.1 Usage and option summary
4211	.sp
4212	Usage:
4213	.INDENT 0.0
4214	.INDENT 3.5
4215	.sp
4216	.nf
4217	.ft C
4218	slopBed [OPTIONS] \-i <BED/GFF/VCF> \-g <GENOME> [\-b or (\-l and \-r)]
4219	.ft P
4220	.fi
4221	.UNINDENT
4222	.UNINDENT
4223	.TS
4224	center;
4225	\|l\|l\|.
4226	_
4227	T{
4228	Option
4229	T} T{
4230	Description
4231	T}
4232	_
4233	T{
4234	\fB\-b\fP
4235	T} T{
4236	Increase the BED/GFF/VCF entry by the same number of base pairs in each direction. \fIInteger\fP\&.
4237	T}
4238	_
4239	T{
4240	\fB\-l\fP
4241	T} T{
4242	The number of base pairs to subtract from the start coordinate. \fIInteger\fP\&.
4243	T}
4244	_
4245	T{
4246	\fB\-r\fP
4247	T} T{
4248	The number of base pairs to add to the end coordinate. \fIInteger\fP\&.
4249	T}
4250	_
4251	T{
4252	\fB\-s\fP
4253	T} T{
4254	Define \-l and \-r based on strand. For example, if used, \-l 500 for a negative\-stranded feature will add 500 bp to the \fIend\fP coordinate.
4255	T}
4256	_
4257	.TE
4258	.SS 5.14.2 Default behavior
4259	.sp
4260	By default, \fBslopBed\fP will either add a fixed number of bases in each direction (\fB\-b\fP) or an asymmetric
4261	number of bases in each direction (\fB\-l\fP and \fB\-r\fP).
4262	.sp
4263	For example:
4264	.INDENT 0.0
4265	.INDENT 3.5
4266	.sp
4267	.nf
4268	.ft C
4269	cat A.bed
4270	chr1 5 100
4271	chr1 800 980
4272
4273	cat my.genome
4274	chr1 1000
4275
4276	slopBed \-i A.bed \-g my.genome \-b 5
4277	chr1 0 105
4278	chr1 795 985
4279
4280	slopBed \-i A.bed \-g my.genome \-l 2 \-r 3
4281	chr1 3 103
4282	chr1 798 983
4283	.ft P
4284	.fi
4285	.UNINDENT
4286	.UNINDENT
4287	.sp
4288	However, if the requested number of bases exceeds the boundaries of the chromosome, \fBslopBed\fP will
4289	"clip" the feature accordingly.
4290	.INDENT 0.0
4291	.INDENT 3.5
4292	.sp
4293	.nf
4294	.ft C
4295	cat A.bed
4296	chr1 5 100
4297	chr1 800 980
4298
4299	cat my.genome
4300	chr1 1000
4301
4302	slopBed \-i A.bed \-g my.genome \-b 5000
4303	chr1 0 1000
4304	chr1 0 1000
4305	.ft P
4306	.fi
4307	.UNINDENT
4308	.UNINDENT
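.sp
The clipping rule amounts to bounding the new start at 0 and the new end at the chromosome size. A minimal Python sketch, with slop as a hypothetical function name and the coordinates taken from the examples above:

```python
# Illustrative sketch only; not the bedtools implementation.
def slop(start, end, chrom_size, left, right):
    """Extend an interval and clip it to [0, chrom_size]."""
    return max(0, start - left), min(chrom_size, end + right)

print(slop(5, 100, 1000, 5, 5))
print(slop(5, 100, 1000, 2, 3))
print(slop(800, 980, 1000, 5000, 5000))
```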
4309	.SS 5.14.3 Resizing features according to strand
4310	.sp
4311	\fBslopBed\fP will optionally increase the size of a feature based on strand.
4312	.sp
4313	For example:
4314	.INDENT 0.0
4315	.INDENT 3.5
4316	.sp
4317	.nf
4318	.ft C
4319	cat A.bed
4320	chr1 100 200 a1 1 +
4321	chr1 100 200 a2 2 \-
4322
4323	cat my.genome
4324	chr1 1000
4325
4326	slopBed \-i A.bed \-g my.genome \-l 50 \-r 80 \-s
4327	chr1 50 280 a1 1 +
4328	chr1 20 250 a2 2 \-
4329	.ft P
4330	.fi
4331	.UNINDENT
4332	.UNINDENT
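.sp
With \fB\-s\fP, "left" and "right" are interpreted relative to the strand of the feature, so the two amounts are swapped for minus\-strand features. The Python sketch below illustrates this reading of the example above; slop_stranded is a hypothetical name.

```python
# Illustrative sketch only; not the bedtools implementation.
def slop_stranded(start, end, strand, chrom_size, left, right):
    """Extend an interval, swapping left/right for minus-strand features."""
    if strand == "-":
        left, right = right, left
    return max(0, start - left), min(chrom_size, end + right)

plus = slop_stranded(100, 200, "+", 1000, 50, 80)
minus = slop_stranded(100, 200, "-", 1000, 50, 80)
print(plus, minus)
```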
4333	.SS 5.15 sortBed
4334	.sp
4335	\fBsortBed\fP sorts a feature file by chromosome and other criteria.
4336	.SS 5.15.1 Usage and option summary
4337	.sp
4338	Usage:
4339	.INDENT 0.0
4340	.INDENT 3.5
4341	.sp
4342	.nf
4343	.ft C
4344	sortBed [OPTIONS] \-i <BED/GFF/VCF>
4345	.ft P
4346	.fi
4347	.UNINDENT
4348	.UNINDENT
4349	.TS
4350	center;
4351	\|l\|l\|.
4352	_
4353	T{
4354	Option
4355	T} T{
4356	Description
4357	T}
4358	_
4359	T{
4360	\fB\-sizeA\fP
4361	T} T{
4362	Sort by feature size in ascending order.
4363	T}
4364	_
4365	T{
4366	\fB\-sizeD\fP
4367	T} T{
4368	Sort by feature size in descending order.
4369	T}
4370	_
4371	T{
4372	\fB\-chrThenSizeA\fP
4373	T} T{
4374	Sort by chromosome, then by feature size (asc).
4375	T}
4376	_
4377	T{
4378	\fB\-chrThenSizeD\fP
4379	T} T{
4380	Sort by chromosome, then by feature size (desc).
4381	T}
4382	_
4383	T{
4384	\fB\-chrThenScoreA\fP
4385	T} T{
4386	Sort by chromosome, then by score (asc).
4387	T}
4388	_
4389	T{
4390	\fB\-chrThenScoreD\fP
4391	T} T{
4392	Sort by chromosome, then by score (desc).
4393	T}
4394	_
4395	.TE
4396	.SS 5.15.2 Default behavior
4397	.sp
4398	By default, \fBsortBed\fP sorts a BED file by chromosome and then by start position in ascending order.
4399	.sp
4400	For example:
4401	.INDENT 0.0
4402	.INDENT 3.5
4403	.sp
4404	.nf
4405	.ft C
4406	cat A.bed
4407	chr1 800 1000
4408	chr1 80 180
4409	chr1 1 10
4410	chr1 750 10000
4411
4412	sortBed \-i A.bed
4413	chr1 1 10
4414	chr1 80 180
4415	chr1 750 10000
4416	chr1 800 1000
4417	.ft P
4418	.fi
4419	.UNINDENT
4420	.UNINDENT
4421	.SS 5.15.3 Optional sorting behavior
4422	.sp
4423	\fBsortBed\fP can also sort a BED file by feature size or by score, optionally within each chromosome.
4424	.sp
4425	For example, to sort by feature size in descending order:
4426	.INDENT 0.0
4427	.INDENT 3.5
4428	.sp
4429	.nf
4430	.ft C
4431	cat A.bed
4432	chr1 800 1000
4433	chr1 80 180
4434	chr1 1 10
4435	chr1 750 10000
4436
4437	sortBed \-i A.bed \-sizeD
4438	chr1 750 10000
4439	chr1 800 1000
4440	chr1 80 180
4441	chr1 1 10
4442	.ft P
4443	.fi
4444	.UNINDENT
4445	.UNINDENT
4446	.sp
4447	\fBDisclaimer:\fP it should be noted that \fBsortBed\fP is merely a convenience utility, as the UNIX sort utility
4448	will sort BED files more quickly while using less memory. For example, UNIX sort will sort a BED file
4449	by chromosome then by start position in the following manner:
4450	.INDENT 0.0
4451	.INDENT 3.5
4452	.sp
4453	.nf
4454	.ft C
4455	sort \-k1,1 \-k2,2n A.bed
4456	chr1 1 10
4457	chr1 80 180
4458	chr1 750 10000
4459	chr1 800 1000
4460	.ft P
4461	.fi
4462	.UNINDENT
4463	.UNINDENT
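.sp
The sort orders discussed in this section correspond to different sort keys. The Python sketch below expresses two of them on the example features; it is only an illustration of the orderings and makes no claim about how sortBed is implemented.

```python
# Illustrative sketch only; not the bedtools implementation.
features = [
    ("chr1", 800, 1000),
    ("chr1", 80, 180),
    ("chr1", 1, 10),
    ("chr1", 750, 10000),
]

# Default order: chromosome (as text), then start position (as a number).
by_position = sorted(features, key=lambda f: (f[0], f[1]))

# Feature size (end - start) in descending order.
by_size_desc = sorted(features, key=lambda f: f[2] - f[1], reverse=True)

print(by_position)
print(by_size_desc)
```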
4464	.SS 5.16 linksBed
4465	.sp
4466	Creates an HTML file with links to an instance of the UCSC Genome Browser for all features /
4467	intervals in a file. This is useful for cases when one wants to manually inspect a large set of
4468	annotations or features.
4469	.SS 5.16.1 Usage and option summary
4470	.sp
4471	Usage:
4472	.INDENT 0.0
4473	.INDENT 3.5
4474	.sp
4475	.nf
4476	.ft C
4477	linksBed [OPTIONS] \-i <BED/GFF/VCF> > <HTML file>
4478	.ft P
4479	.fi
4480	.UNINDENT
4481	.UNINDENT
4482	.TS
4483	center;
4484	\|l\|l\|.
4485	_
4486	T{
4487	Option
4488	T} T{
4489	Description
4490	T}
4491	_
4492	T{
4493	\fB\-base\fP
4494	T} T{
4495	The "basename" for the UCSC browser. \fIDefault: http://genome.ucsc.edu\fP
4496	T}
4497	_
4498	T{
4499	\fB\-org\fP
4500	T} T{
4501	The organism (e.g. mouse, human). \fIDefault: human\fP
4502	T}
4503	_
4504	T{
4505	\fB\-db\fP
4506	T} T{
4507	The genome build. \fIDefault: hg18\fP
4508	T}
4509	_
4510	.TE
4511	.SS 5.16.2 Default behavior
4512	.sp
4513	By default, \fBlinksBed\fP creates links to the public UCSC Genome Browser.
4514	.sp
4515	For example:
4516	.INDENT 0.0
4517	.INDENT 3.5
4518	.sp
4519	.nf
4520	.ft C
4521	head genes.bed
4522	chr21 9928613 10012791 uc002yip.1 0 \-
4523	chr21 9928613 10012791 uc002yiq.1 0 \-
4524	chr21 9928613 10012791 uc002yir.1 0 \-
4525	chr21 9928613 10012791 uc010gkv.1 0 \-
4526	chr21 9928613 10061300 uc002yis.1 0 \-
4527	chr21 10042683 10120796 uc002yit.1 0 \-
4528	chr21 10042683 10120808 uc002yiu.1 0 \-
4529	chr21 10079666 10120808 uc002yiv.1 0 \-
4530	chr21 10080031 10081687 uc002yiw.1 0 \-
4531	chr21 10081660 10120796 uc002yix.2 0 \-
4532
4533	linksBed \-i genes.bed > genes.html
4534	.ft P
4535	.fi
4536	.UNINDENT
4537	.UNINDENT
4538	.sp
4539	When genes.html is opened in a web browser, one should see something like the following, where each
4540	link on the page is built from the features in genes.bed:
4541	.SS 5.16.3 Creating HTML links to a local UCSC Browser installation
4542	.sp
4543	Optionally, \fBlinksBed\fP will create links to a local copy of the UCSC Genome Browser.
4544	.sp
4545	For example:
4546	.INDENT 0.0
4547	.INDENT 3.5
4548	.sp
4549	.nf
4550	.ft C
4551	head \-2 genes.bed
4552	chr21 9928613 10012791 uc002yip.1 0 \-
4553	chr21 9928613 10012791 uc002yiq.1 0 \-
4554
4555	linksBed \-i genes.bed \-base http://mirror.uni.edu > genes.html
4556	.ft P
4557	.fi
4558	.UNINDENT
4559	.UNINDENT
4560	.sp
4561	One can point the links to the appropriate organism and genome build as well:
4562	.INDENT 0.0
4563	.INDENT 3.5
4564	.sp
4565	.nf
4566	.ft C
4567	head \-2 genes.bed
4568	chr21 9928613 10012791 uc002yip.1 0 \-
4569	chr21 9928613 10012791 uc002yiq.1 0 \-
4570
4571	linksBed \-i genes.bed \-base http://mirror.uni.edu \-org mouse \-db mm9 > genes.html
4572	.ft P
4573	.fi
4574	.UNINDENT
4575	.UNINDENT
4576	.SS 5.17 complementBed
4577	.sp
4578	\fBcomplementBed\fP returns the intervals in a genome that are not covered by the features in a feature file. An
4579	example usage of this tool would be to return the intervals of the genome that are not annotated as a
4580	repeat.
4581	.SS 5.17.1 Usage and option summary
4582	.sp
4583	Usage:
4584	.INDENT 0.0
4585	.INDENT 3.5
4586	.sp
4587	.nf
4588	.ft C
4589	complementBed [OPTIONS] \-i <BED/GFF/VCF> \-g <GENOME>
4590	.ft P
4591	.fi
4592	.UNINDENT
4593	.UNINDENT
4594	.sp
4595	\fBNo additional options.\fP
4596	.SS 5.17.2 Default behavior
4597	.sp
4598	Figure:
4599	.INDENT 0.0
4600	.INDENT 3.5
4601	.sp
4602	.nf
4603	.ft C
4604	Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4605
4606	BED FILE A *********** *********** ****************
4607
4608	Result === === ===== =======
4609	.ft P
4610	.fi
4611	.UNINDENT
4612	.UNINDENT
4613	.sp
4614	For example:
4615	.INDENT 0.0
4616	.INDENT 3.5
4617	.sp
4618	.nf
4619	.ft C
4620	cat A.bed
4621	chr1 100 200
4622	chr1 400 500
4623	chr1 500 800
4624
4625	cat my.genome
4626	chr1 1000
4627
4628	complementBed \-i A.bed \-g my.genome
4629	chr1 0 100
4630	chr1 200 400
4631	chr1 800 1000
4632	.ft P
4633	.fi
4634	.UNINDENT
4635	.UNINDENT
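.sp
Computing a complement on one chromosome can be sketched as a single sweep over the sorted intervals, emitting each uncovered gap. The Python code below is a minimal illustration under that assumption; complement is a hypothetical function name, and the data mirror the example above.

```python
# Illustrative sketch only; not the bedtools implementation.
def complement(intervals, chrom_size):
    """Return the half-open intervals of [0, chrom_size) not covered by input."""
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_size:
        gaps.append((cursor, chrom_size))
    return gaps

result = complement([(100, 200), (400, 500), (500, 800)], 1000)
print(result)
```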
4636	.SS 5.18 bedToBam
4637	.sp
4638	\fBbedToBam\fP converts features in a feature file to BAM format. This is useful as an efficient means of
4639	storing large genome annotations in a compact, indexed format for visualization purposes.
4640	.SS 5.18.1 Usage and option summary
4641	.sp
4642	Usage:
4643	.INDENT 0.0
4644	.INDENT 3.5
4645	.sp
4646	.nf
4647	.ft C
4648	bedToBam [OPTIONS] \-i <BED/GFF/VCF> \-g <GENOME> > <BAM>
4649	.ft P
4650	.fi
4651	.UNINDENT
4652	.UNINDENT
4653	.TS
4654	center;
4655	\|l\|l\|.
4656	_
4657	T{
4658	Option
4659	T} T{
4660	Description
4661	T}
4662	_
4663	T{
4664	\fB\-mapq\fP
4665	T} T{
4666	Set a mapping quality (SAM MAPQ field) value for all BED entries. \fIDefault: 255\fP
4667	T}
4668	_
4669	T{
4670	\fB\-ubam\fP
4671	T} T{
4672	Write uncompressed BAM output. The default is to write compressed BAM output.
4673	T}
4674	_
4675	T{
4676	\fB\-bed12\fP
4677	T} T{
4678	Indicate that the input BED file is in BED12 (a.k.a. "blocked" BED) format. In this case, bedToBam will convert blocked BED features (e.g., gene annotations) into "spliced" BAM alignments by creating an appropriate CIGAR string.
4679	T}
4680	_
4681	.TE
4682	.SS 5.18.2 Default behavior
4683	.sp
4684	The default behavior is to assume that the input file is in unblocked format. For example:
4685	.INDENT 0.0
4686	.INDENT 3.5
4687	.sp
4688	.nf
4689	.ft C
4690	head \-5 rmsk.hg18.chr21.bed
4691	chr21 9719768 9721892 ALR/Alpha 1004 +
4692	chr21 9721905 9725582 ALR/Alpha 1010 +
4693	chr21 9725582 9725977 L1PA3 3288 +
4694	chr21 9726021 9729309 ALR/Alpha 1051 +
4695	chr21 9729320 9729809 L1PA3 3897 \-
4696
4697	bedToBam \-i rmsk.hg18.chr21.bed \-g human.hg18.genome > rmsk.hg18.chr21.bam
4698
4699	samtools view rmsk.hg18.chr21.bam \| head \-5
4700	ALR/Alpha 0 chr21 9719769 255 2124M * 0 0 * *
4701	ALR/Alpha 0 chr21 9721906 255 3677M * 0 0 * *
4702	L1PA3 0 chr21 9725583 255 395M * 0 0 * *
4703	ALR/Alpha 0 chr21 9726022 255 3288M * 0 0 * *
4704	L1PA3 16 chr21 9729321 255 489M * 0 0 * *
4705	.ft P
4706	.fi
4707	.UNINDENT
4708	.UNINDENT
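.sp
The mapping visible in the example above (0\-based BED start to 1\-based SAM position, feature length to an "M" CIGAR operation, minus strand to flag 16) can be sketched as follows. This is a reading of the example output, not the bedToBam source; bed_to_sam_fields is a hypothetical name.

```python
# Illustrative sketch only; not the bedtools implementation.
def bed_to_sam_fields(chrom, start, end, name, strand, mapq=255):
    """Build the 11 mandatory SAM fields for an unblocked BED feature."""
    flag = 16 if strand == "-" else 0  # 16 marks a reverse-strand alignment
    pos = start + 1                    # BED is 0-based, SAM is 1-based
    cigar = f"{end - start}M"
    return [name, flag, chrom, pos, mapq, cigar, "*", 0, 0, "*", "*"]

fields = bed_to_sam_fields("chr21", 9729320, 9729809, "L1PA3", "-")
print(fields)
```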
4709	.SS 5.18.3 Creating "spliced" BAM entries from "blocked" BED features
4710	.sp
4711	Optionally, \fBbedToBam\fP will create spliced BAM entries from "blocked" BED features by using the
4712	\-bed12 option. This will create CIGAR strings in the BAM output that will be displayed as "spliced"
4713	alignments. The image illustrates this behavior, as the top track is a BAM representation (using
4714	bedToBam) of a BED file of UCSC genes.
4715	.sp
4716	For example:
4717	.INDENT 0.0
4718	.INDENT 3.5
4719	.sp
4720	.nf
4721	.ft C
4722	bedToBam \-i knownGene.hg18.chr21.bed \-g human.hg18.genome \-bed12 > knownGene.bam
4723
4724	samtools view knownGene.bam \| head \-2
4725	uc002yip.1 16 chr21 9928614 255
4727	298M1784N71M1411N93M3963N80M1927N106M3608N81M1769N62M11856N89M98N82M816N61M6910N65M
4728	738N64M146N100M1647N120M6478N162M1485N51M6777N60M9274N54M880N54M1229N54M2377N54M112
4729	68N58M2666N109M2885N158M * 0 0 * *
4730	uc002yiq.1 16 chr21 9928614 255
4732	298M1784N71M1411N93M3963N80M1927N106M3608N81M1769N62M11856N89M98N82M816N61M6910N65M
4733	738N64M146N100M1647N120M6478N162M1485N51M6777N60M10208N54M1229N54M2377N54M11268N58M
4734	2666N109M2885N158M * 0 0 * *
4735	.ft P
4736	.fi
4737	.UNINDENT
4738	.UNINDENT
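.sp
A spliced CIGAR string alternates "M" operations for blocks with "N" operations for the gaps between them. The Python sketch below derives such a string from BED12 block sizes and block starts; it is an illustration of the idea, not the bedToBam code, and blocks_to_cigar and the example blocks are hypothetical.

```python
# Illustrative sketch only; not the bedtools implementation.
def blocks_to_cigar(block_sizes, block_starts):
    """Turn BED12 blocks (starts relative to chromStart) into a CIGAR string."""
    ops = []
    prev_end = 0
    for size, start in zip(block_sizes, block_starts):
        if start > prev_end:
            ops.append(f"{start - prev_end}N")  # gap between blocks
        ops.append(f"{size}M")                  # the block itself
        prev_end = start + size
    return "".join(ops)

cigar = blocks_to_cigar([10, 20], [0, 50])
print(cigar)
```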
4739	.SS 5.19 overlap
4740	.sp
4741	\fBoverlap\fP computes the amount of overlap (in the case of positive values) or distance (in the case of
4742	negative values) between feature coordinates occurring on the same input line and reports the result at
4743	the end of the same line. In this way, it is a useful method for computing custom overlap scores from
4744	the output of other BEDTools.
4745	.SS 5.19.1 Usage and option summary
4746	.sp
4747	Usage:
4748	.INDENT 0.0
4749	.INDENT 3.5
4750	.sp
4751	.nf
4752	.ft C
4753	overlap [OPTIONS] \-i <input> \-cols s1,e1,s2,e2
4754	.ft P
4755	.fi
4756	.UNINDENT
4757	.UNINDENT
4758	.TS
4759	center;
4760	\|l\|l\|.
4761	_
4762	T{
4763	Option
4764	T} T{
4765	Description
4766	T}
4767	_
4768	T{
4769	\fB\-i\fP
4770	T} T{
4771	Input file. Use "stdin" for pipes.
4772	T}
4773	_
4774	T{
4775	\fB\-cols\fP
4776	T} T{
4777	Specify the columns (1\-based) for the starts and ends of the features for which you\(aqd like to compute the overlap/distance. The columns must be listed in the following order: \fIstart1,end1,start2,end2\fP
4778	T}
4779	_
4780	.TE
4781	.SS 5.19.2 Default behavior
4782	.sp
4783	The default behavior is to compute the amount of overlap between the features you specify based on the
4784	start and end coordinates. For example:
4785	.INDENT 0.0
4786	.INDENT 3.5
4787	.sp
4788	.nf
4789	.ft C
4790	windowBed \-a A.bed \-b B.bed \-w 10
4791	chr1 10 20 A chr1 15 25 B
4792	chr1 10 20 C chr1 25 35 D
4793	.ft P
4794	.fi
4795	.UNINDENT
4796	.UNINDENT
4797	.sp
4798	Now let\(aqs say we want to compute the number of base pairs of overlap
4799	between the paired features in the output of windowBed.
4800	.INDENT 0.0
4801	.INDENT 3.5
4802	.sp
4803	.nf
4804	.ft C
4805	windowBed \-a A.bed \-b B.bed \-w 10 \| overlap \-i stdin \-cols 2,3,6,7
4806	chr1 10 20 A chr1 15 25 B 5
4807	chr1 10 20 C chr1 25 35 D \-5
4808	.ft P
4809	.fi
4810	.UNINDENT
4811	.UNINDENT
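.sp
The reported value is consistent with taking the smaller end minus the larger start: positive when the features overlap, negative when they are separated. A minimal Python sketch of that formula, with overlap as an illustrative function name:

```python
# Illustrative sketch only; not the bedtools implementation.
def overlap(s1, e1, s2, e2):
    """Bases of overlap (positive) or distance (negative) between intervals."""
    return min(e1, e2) - max(s1, s2)

print(overlap(10, 20, 15, 25))
print(overlap(10, 20, 25, 35))
```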
4812	.SS 5.20 bedToIgv
4813	.sp
4814	\fBbedToIgv\fP creates an IGV (\fI\%http://www.broadinstitute.org/igv/\fP) batch script (see
4815	\fI\%http://www.broadinstitute.org/igv/batch\fP for details) such that a "snapshot" will be taken at each feature in a
4816	feature file. This is useful as an efficient means for quickly collecting images of primary data at several
4817	loci for subsequent screening, etc.
4818	.sp
4819	\fBNOTE: One must use IGV version 1.5 or higher.\fP
4820	.SS 5.20.1 Usage and option summary
4821	.sp
4822	Usage:
4823	.INDENT 0.0
4824	.INDENT 3.5
4825	.sp
4826	.nf
4827	.ft C
4828	bedToIgv [OPTIONS] \-i <BED/GFF/VCF> > <igv.batch>
4829	.ft P
4830	.fi
4831	.UNINDENT
4832	.UNINDENT
4833	.TS
4834	center;
4835	\|l\|l\|.
4836	_
4837	T{
4838	Option
4839	T} T{
4840	Description
4841	T}
4842	_
4843	T{
4844	\fB\-path\fP
4845	T} T{
4846	The full path to which the IGV snapshots should be written. \fIDefault: ./\fP
4847	T}
4848	_
4849	T{
4850	\fB\-sess\fP
4851	T} T{
4852	The full path to an existing IGV session file to be loaded prior to taking snapshots. \fIDefault is for no session to be loaded and the assumption is that you already have IGV open and loaded with your relevant data prior to running the batch script\fP\&.
4853	T}
4854	_
4855	T{
4856	\fB\-sort\fP
4857	T} T{
4858	The type of BAM sorting you would like to apply to each image. \fBValid sorting options\fP: \fIbase, position, strand, quality, sample, and readGroup\fP\&. \fIDefault is to apply no sorting at all\fP\&.
4859	T}
4860	_
4861	T{
4862	\fB\-clps\fP
4863	T} T{
4864	Collapse the aligned reads prior to taking a snapshot. \fIDefault is to not collapse\fP\&.
4865	T}
4866	_
4867	T{
4868	\fB\-name\fP
4869	T} T{
4870	Use the "name" field (column 4) for each image\(aqs filename. \fIDefault is to use the "chr:start\-pos.ext"\fP\&.
4871	T}
4872	_
4873	T{
4874	\fB\-slop\fP
4875	T} T{
4876	Number of flanking base pairs on the left & right of the image.
4877	T}
4878	_
4879	T{
4880	\fB\-img\fP
4881	T} T{
4882	The type of image to be created. \fBValid options\fP: \fIpng, eps, svg\fP\&. \fIDefault is png\fP\&.
4883	T}
4884	_
4885	.TE
4886	.SS 5.20.2 Default behavior
4887	.sp
4888	Figure:
4889	.INDENT 0.0
4890	.INDENT 3.5
4891	.sp
4892	.nf
4893	.ft C
4894	bedToIgv \-i data/rmsk.hg18.chr21.bed \| head \-9
4895	snapshotDirectory ./
4896	goto chr21:9719768\-9721892
4897	snapshot chr21:9719768\-9721892.png
4898	goto chr21:9721905\-9725582
4899	snapshot chr21:9721905\-9725582.png
4900	goto chr21:9725582\-9725977
4901	snapshot chr21:9725582\-9725977.png
4902	goto chr21:9726021\-9729309
4903	snapshot chr21:9726021\-9729309.png
4904	.ft P
4905	.fi
4906	.UNINDENT
4907	.UNINDENT
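.sp
The batch script shown above follows a simple pattern: one snapshotDirectory line, then a goto and a snapshot line per feature. The Python sketch below reproduces that pattern for illustration; it is not the bedToIgv source, and igv_batch and its parameters are hypothetical.

```python
# Illustrative sketch only; not the bedtools implementation.
def igv_batch(features, path="./", img="png"):
    """Return IGV batch-script lines for (chrom, start, end) features."""
    lines = [f"snapshotDirectory {path}"]
    for chrom, start, end in features:
        locus = f"{chrom}:{start}-{end}"
        lines.append(f"goto {locus}")
        lines.append(f"snapshot {locus}.{img}")
    return lines

script = igv_batch([("chr21", 9719768, 9721892)])
for line in script:
    print(line)
```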
4908	.SS 5.20.3 Using a bedToIgv batch script within IGV.
4909	.sp
4910	Once an IGV batch script has been created with \fBbedToIgv\fP, it is simply a matter of running it from
4911	within IGV.
4912	.sp
4913	For example, first create the batch script:
4914	.INDENT 0.0
4915	.INDENT 3.5
4916	.sp
4917	.nf
4918	.ft C
4919	bedToIgv \-i data/rmsk.hg18.chr21.bed > rmsk.igv.batch
4920	.ft P
4921	.fi
4922	.UNINDENT
4923	.UNINDENT
4924	.sp
4925	Then, open and launch the batch script from within IGV. This will immediately cause IGV to begin
4926	taking snapshots of your requested regions.
4927	.SS 5.21 bed12ToBed6
4928	.sp
4929	\fBbed12ToBed6\fP is a convenience tool that converts BED features in BED12 (a.k.a. "blocked" BED
4930	features such as genes) to discrete BED6 features. For example, in the case of a gene with six exons,
4931	bed12ToBed6 would create six separate BED6 features (i.e., one for each exon).
4932	.SS 5.21.1 Usage and option summary
4933	.sp
4934	Usage:
4935	.INDENT 0.0
4936	.INDENT 3.5
4937	.sp
4938	.nf
4939	.ft C
4940	bed12ToBed6 [OPTIONS] \-i <BED12>
4941	.ft P
4942	.fi
4943	.UNINDENT
4944	.UNINDENT
4945	.TS
4946	center;
4947	\|l\|l\|.
4948	_
4949	T{
4950	Option
4951	T} T{
4952	Description
4953	T}
4954	_
4955	T{
4956	\fB\-i\fP
4957	T} T{
4958	The BED12 file that should be split into discrete BED6 features. \fIUse "stdin" when using piped input\fP\&.
4959	T}
4960	_
4961	.TE
4962	.SS 5.21.2 Default behavior
4963	.sp
4964	Figure:
4965	.INDENT 0.0
4966	.INDENT 3.5
4967	.sp
4968	.nf
4969	.ft C
4970	head data/knownGene.hg18.chr21.bed \| tail \-n 3
4971	chr21 10079666 10120808 uc002yiv.1 0 \- 10081686 10120608 0 4 528,91,101,215, 0,1930,39750,40927,
4973	chr21 10080031 10081687 uc002yiw.1 0 \- 10080031 10080031 0 2 200,91, 0,1565,
4975	chr21 10081660 10120796 uc002yix.2 0 \- 10081660 10081660 0 3 27,101,223, 0,37756,38913,
4977
4978	head data/knownGene.hg18.chr21.bed \| tail \-n 3 \| bed12ToBed6 \-i stdin
4979	chr21 10079666 10080194 uc002yiv.1 0 \-
4980	chr21 10081596 10081687 uc002yiv.1 0 \-
4981	chr21 10119416 10119517 uc002yiv.1 0 \-
4982	chr21 10120593 10120808 uc002yiv.1 0 \-
4983	chr21 10080031 10080231 uc002yiw.1 0 \-
4984	chr21 10081596 10081687 uc002yiw.1 0 \-
4985	chr21 10081660 10081687 uc002yix.2 0 \-
4986	chr21 10119416 10119517 uc002yix.2 0 \-
4987	chr21 10120573 10120796 uc002yix.2 0 \-
4988	.ft P
4989	.fi
4990	.UNINDENT
4991	.UNINDENT
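.sp
Each BED6 feature in the output corresponds to one block of the BED12 record: its start is the feature start plus the block start, and its end adds the block size. The Python sketch below illustrates that arithmetic on the second record above; bed12_to_bed6 is a hypothetical name, not the tool\(aqs code.

```python
# Illustrative sketch only; not the bedtools implementation.
def bed12_to_bed6(chrom, start, name, score, strand, block_sizes, block_starts):
    """Split one BED12 record into one BED6 tuple per block."""
    return [
        (chrom, start + offset, start + offset + size, name, score, strand)
        for size, offset in zip(block_sizes, block_starts)
    ]

rows = bed12_to_bed6("chr21", 10080031, "uc002yiw.1", 0, "-", [200, 91], [0, 1565])
for row in rows:
    print(*row)
```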
4992	.SS 5.22 groupBy
4993	.sp
4994	\fBgroupBy\fP is a useful tool that mimics the "groupBy" clause in database systems. Given a file or stream
4995	that is sorted by the appropriate "grouping columns", groupBy will compute summary statistics on
4996	another column in the file or stream. This will work with output from all BEDTools as well as any other
4997	tab\-delimited file or stream.
4998	.sp
4999	\fBNOTE: When using groupBy, the input data must be ordered by the same
5000	columns as specified with the \-grp argument. For example, if \-grp is 1,2,3, then the
5001	data should be pre\-grouped accordingly. When groupBy detects changes in the
5002	group columns it then summarizes all lines with that group\fP\&.
5003	.SS 5.22.1 Usage and option summary
5004	.sp
5005	Usage:
5006	.INDENT 0.0
5007	.INDENT 3.5
5008	.sp
5009	.nf
5010	.ft C
5011	groupBy [OPTIONS] \-i <input> \-opCol <input column>
5012	.ft P
5013	.fi
5014	.UNINDENT
5015	.UNINDENT
5016	.TS
5017	center;
5018	\|l\|l\|.
5019	_
5020	T{
5021	Option
5022	T} T{
5023	Description
5024	T}
5025	_
5026	T{
5027	\fB\-i\fP
5028	T} T{
5029	.INDENT 0.0
5030	.INDENT 3.5
5031	The input file that should be grouped and summarized. \fIUse "stdin" when using piped input\fP\&.
5032	.UNINDENT
5033	.UNINDENT
5034	.sp
5035	\fBNote: if \-i is omitted, input is assumed to come from standard input (stdin)\fP
5036	T}
5037	_
5038	T{
5039	\fB\-g OR \-grp\fP
5040	T} T{
5041	Specifies which column(s) (1\-based) should be used to group the input. The columns must be comma\-separated and each column must be explicitly listed. Ranges (e.g. 1\-4) are not yet allowed. \fIDefault: 1,2,3\fP
5042	T}
5043	_
5044	T{
5045	\fB\-c OR \-opCol\fP
5046	T} T{
5047	Specify the column (1\-based) that should be summarized. \fIRequired\fP\&.
5048	T}
5049	_
5050	T{
5051	\fB\-o OR \-op\fP
5052	T} T{
5053	Specify the operation that should be applied to \fBopCol\fP\&.
5054	.nf
5055	Valid operations:
5056	.fi
5057	.sp
5058	.INDENT 0.0
5059	.INDENT 3.5
5060	.nf
5061	\fBsum\fP \- \fInumeric only\fP
5062	\fBcount\fP \- \fInumeric or text\fP
5063	\fBmin\fP \- \fInumeric only\fP
5064	\fBmax\fP \- \fInumeric only\fP
5065	\fBmean\fP \- \fInumeric only\fP
5066	\fBstdev\fP \- \fInumeric only\fP
5067	\fBmedian\fP \- \fInumeric only\fP
5068	\fBmode\fP \- \fInumeric or text\fP
5069	\fBantimode\fP \- \fInumeric or text\fP
5070	\fBcollapse\fP (i.e., print a comma separated list) \- \fInumeric or text\fP
5071	\fBfreqasc\fP \- \fIprint a comma separated list of values observed and the number of times they were observed. Reported in ascending order of frequency\fP
5072	.fi
5073	.sp
5074	.UNINDENT
5075	.UNINDENT
5076	.nf
5077	\fBfreqdesc\fP \- \fIprint a comma separated list of values observed and the number of times they were observed. Reported in descending order of frequency\fP
5078	.fi
5079	.sp
5080	.INDENT 0.0
5081	.INDENT 3.5
5082	.nf
5083	\fIDefault: sum\fP
5084	.fi
5085	.sp
5086	.UNINDENT
5087	.UNINDENT
5088	T}
5089	_
5090	.TE
5091	.SS 5.22.2 Default behavior.
5092	.sp
5093	Let\(aqs imagine we have three incredibly interesting genetic variants that we are studying and we are
5094	interested in what annotated repeats these variants overlap.
5095	.INDENT 0.0
5096	.INDENT 3.5
5097	.sp
5098	.nf
5099	.ft C
5100	cat variants.bed
5101	chr21 9719758 9729320 variant1
5102	chr21 9729310 9757478 variant2
5103	chr21 9795588 9796685 variant3
5104
5105	intersectBed \-a variants.bed \-b repeats.bed \-wa \-wb > variantsToRepeats.bed
5106	cat variantsToRepeats.bed
5107	chr21 9719758 9729320 variant1 chr21 9719768 9721892 ALR/Alpha 1004 +
5108	chr21 9719758 9729320 variant1 chr21 9721905 9725582 ALR/Alpha 1010 +
5109	chr21 9719758 9729320 variant1 chr21 9725582 9725977 L1PA3 3288 +
5110	chr21 9719758 9729320 variant1 chr21 9726021 9729309 ALR/Alpha 1051 +
5111	chr21 9729310 9757478 variant2 chr21 9729320 9729809 L1PA3 3897 \-
5112	chr21 9729310 9757478 variant2 chr21 9729809 9730866 L1P1 8367 +
5113	chr21 9729310 9757478 variant2 chr21 9730866 9734026 ALR/Alpha 1036 \-
5114	chr21 9729310 9757478 variant2 chr21 9734037 9757471 ALR/Alpha 1182 \-
5115	chr21 9795588 9796685 variant3 chr21 9795589 9795713 (GAATG)n 308 +
5116	chr21 9795588 9796685 variant3 chr21 9795736 9795894 (GAATG)n 683 +
5117	chr21 9795588 9796685 variant3 chr21 9795911 9796007 (GAATG)n 345 +
5118	chr21 9795588 9796685 variant3 chr21 9796028 9796187 (GAATG)n 756 +
5119	chr21 9795588 9796685 variant3 chr21 9796202 9796615 (GAATG)n 891 +
5120	chr21 9795588 9796685 variant3 chr21 9796637 9796824 (GAATG)n 621 +
5121	.ft P
5122	.fi
5123	.UNINDENT
5124	.UNINDENT
5125	.sp
5126	We can see that variant1 overlaps with 4 repeats, variant2 with 4 and variant3 with 6. We can use
5127	groupBy to summarize the hits for each variant in several useful ways. The default behavior is to
5128	compute the \fIsum\fP of the opCol.
5129	.INDENT 0.0
5130	.INDENT 3.5
5131	.sp
5132	.nf
5133	.ft C
5134	groupBy \-i variantsToRepeats.bed \-grp 1,2,3 \-opCol 9
5135	chr21 9719758 9729320 6353
5136	chr21 9729310 9757478 14482
5137	chr21 9795588 9796685 3604
5138	.ft P
5139	.fi
5140	.UNINDENT
5141	.UNINDENT
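To make the default operation concrete, here is a minimal Python sketch of a grouped sum. It is an illustration only, not groupBy's implementation: it assumes the lines of each group are consecutive in the input, and the helper name group_sum is hypothetical. Only columns 1 to 3 and column 9 of variantsToRepeats.bed are reproduced.

```python
from itertools import groupby

# (chrom, start, end, score): columns 1-3 and 9 of variantsToRepeats.bed above.
ROWS = [
    ("chr21", 9719758, 9729320, 1004),
    ("chr21", 9719758, 9729320, 1010),
    ("chr21", 9719758, 9729320, 3288),
    ("chr21", 9719758, 9729320, 1051),
    ("chr21", 9729310, 9757478, 3897),
    ("chr21", 9729310, 9757478, 8367),
    ("chr21", 9729310, 9757478, 1036),
    ("chr21", 9729310, 9757478, 1182),
    ("chr21", 9795588, 9796685, 308),
    ("chr21", 9795588, 9796685, 683),
    ("chr21", 9795588, 9796685, 345),
    ("chr21", 9795588, 9796685, 756),
    ("chr21", 9795588, 9796685, 891),
    ("chr21", 9795588, 9796685, 621),
]

def group_sum(rows):
    """Sum the last field over each run of consecutive rows sharing the first three fields."""
    result = []
    for key, run in groupby(rows, key=lambda r: r[:3]):
        result.append(key + (sum(r[3] for r in run),))
    return result

sums = group_sum(ROWS)
for row in sums:
    print(*row)
```

The three sums agree with the groupBy output shown above (6353, 14482 and 3604).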
5142	.SS 5.22.3 Computing the min and max.
5143	.sp
5144	Now let\(aqs find the \fImin\fP and \fImax\fP repeat score for each variant. We do this by "grouping" on the variant
5145	coordinate columns (i.e. cols. 1, 2 and 3) and asking for the min and max of the repeat score column (i.e.
5146	col. 9).
5147	.INDENT 0.0
5148	.INDENT 3.5
5149	.sp
5150	.nf
5151	.ft C
5152	groupBy \-i variantsToRepeats.bed \-g 1,2,3 \-c 9 \-o min
5153	chr21 9719758 9729320 1004
5154	chr21 9729310 9757478 1036
5155	chr21 9795588 9796685 308
5156	.ft P
5157	.fi
5158	.UNINDENT
5159	.UNINDENT
5160	.sp
5161	We can also group on just the \fIname\fP column with similar effect.
5162	.INDENT 0.0
5163	.INDENT 3.5
5164	.sp
5165	.nf
5166	.ft C
5167	groupBy \-i variantsToRepeats.bed \-grp 4 \-opCol 9 \-op min
5168	variant1 1004
5169	variant2 1036
5170	variant3 308
5171	.ft P
5172	.fi
5173	.UNINDENT
5174	.UNINDENT
5175	.sp
5176	What about the \fImax\fP score? Let\(aqs keep the coordinates and the name of the variants so that we
5177	stay in BED format.
5178	.INDENT 0.0
5179	.INDENT 3.5
5180	.sp
5181	.nf
5182	.ft C
5183	groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 9 \-op max
5184	chr21 9719758 9729320 variant1 3288
5185	chr21 9729310 9757478 variant2 8367
5186	chr21 9795588 9796685 variant3 891
5187	.ft P
5188	.fi
5189	.UNINDENT
5190	.UNINDENT
5191	.SS 5.22.4 Computing the mean and median.
5192	.sp
5193	Now let\(aqs find the \fImean\fP and \fImedian\fP repeat score for each variant.
5194	.INDENT 0.0
5195	.INDENT 3.5
5196	.sp
5197	.nf
5198	.ft C
5199	cat variantsToRepeats.bed \| groupBy \-g 1,2,3,4 \-c 9 \-o mean
5200	chr21 9719758 9729320 variant1 1588.25
5201	chr21 9729310 9757478 variant2 3620.5
5202	chr21 9795588 9796685 variant3 600.6667
5203
5204	groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 9 \-op median
5205	chr21 9719758 9729320 variant1 1030.5
5206	chr21 9729310 9757478 variant2 2539.5
5207	chr21 9795588 9796685 variant3 652
5208	.ft P
5209	.fi
5210	.UNINDENT
5211	.UNINDENT
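As a cross-check of the numbers above, the following Python sketch recomputes the mean and median with the standard statistics module. It is only an illustration under the assumption that the scores have already been grouped by variant; the dictionary SCORES is a hand-copied stand-in for column 9 of variantsToRepeats.bed.

```python
from statistics import mean, median

# Repeat scores (column 9) per variant, copied from variantsToRepeats.bed above.
SCORES = {
    "variant1": [1004, 1010, 3288, 1051],
    "variant2": [3897, 8367, 1036, 1182],
    "variant3": [308, 683, 345, 756, 891, 621],
}

means = {name: mean(values) for name, values in SCORES.items()}
medians = {name: median(values) for name, values in SCORES.items()}

for name in SCORES:
    # Round the mean to four decimals for display.
    print(name, round(means[name], 4), medians[name])
```

With an even number of values, the median is the average of the two middle values, which is why variant1 yields 1030.5 rather than one of the observed scores.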
5212	.SS 5.22.5 Computing the mode and "antimode".
5213	.sp
5214	Now let\(aqs find the \fImode\fP (i.e., the most frequent) and \fIantimode\fP (i.e., the least frequent) repeat score for each variant. In this case
5215	they are identical, because every score occurs exactly once within its group.
5216	.INDENT 0.0
5217	.INDENT 3.5
5218	.sp
5219	.nf
5220	.ft C
5221	groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 9 \-op mode
5222	chr21 9719758 9729320 variant1 1004
5223	chr21 9729310 9757478 variant2 1036
5224	chr21 9795588 9796685 variant3 308
5225
5226	groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 9 \-op antimode
5227	chr21 9719758 9729320 variant1 1004
5228	chr21 9729310 9757478 variant2 1036
5229	chr21 9795588 9796685 variant3 308
5230	.ft P
5231	.fi
5232	.UNINDENT
5233	.UNINDENT
5234	.SS 5.22.6 Computing the count of lines for a given group.
5235	.sp
5236	Figure:
5237	.INDENT 0.0
5238	.INDENT 3.5
5239	.sp
5240	.nf
5241	.ft C
5242	groupBy \-i variantsToRepeats.bed \-g 1,2,3,4 \-c 9 \-o count
5243	chr21 9719758 9729320 variant1 4
5244	chr21 9729310 9757478 variant2 4
5245	chr21 9795588 9796685 variant3 6
5246	.ft P
5247	.fi
5248	.UNINDENT
5249	.UNINDENT
5250	.SS 5.22.7 Collapsing: listing all of the values in the opCol for a given group.
5251	.sp
5252	Now for something different. What if we wanted all of the names of the repeats listed on the same line
5253	as the variants? Use the collapse option. This "denormalizes" things. Now you have a list of all the
5254	repeats on a single line.
5255	.INDENT 0.0
5256	.INDENT 3.5
5257	.sp
5258	.nf
5259	.ft C
5260	groupBy \-i variantsToRepeats.bed \-grp 1,2,3,4 \-opCol 8 \-op collapse
5261	chr21 9719758 9729320 variant1 ALR/Alpha,ALR/Alpha,L1PA3,ALR/Alpha,
5262	chr21 9729310 9757478 variant2 L1PA3,L1P1,ALR/Alpha,ALR/Alpha,
5263	chr21 9795588 9796685 variant3 (GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,
5264	.ft P
5265	.fi
5266	.UNINDENT
5267	.UNINDENT
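The collapse operation can be pictured as a grouped string join. The Python sketch below is a hedged illustration, not the tool's code: it assumes rows of the same group are consecutive, uses a hypothetical helper named collapse, and reproduces only the variant name (column 4) and repeat name (column 8) of variantsToRepeats.bed.

```python
from itertools import groupby

# (variant name, repeat name): columns 4 and 8 of variantsToRepeats.bed above.
HITS = [
    ("variant1", "ALR/Alpha"), ("variant1", "ALR/Alpha"),
    ("variant1", "L1PA3"), ("variant1", "ALR/Alpha"),
    ("variant2", "L1PA3"), ("variant2", "L1P1"),
    ("variant2", "ALR/Alpha"), ("variant2", "ALR/Alpha"),
] + [("variant3", "(GAATG)n")] * 6

def collapse(hits):
    """Join the repeat names of each run of consecutive rows sharing a variant name."""
    return {name: ",".join(repeat for _, repeat in run)
            for name, run in groupby(hits, key=lambda h: h[0])}

collapsed = collapse(HITS)
for name, repeats in collapsed.items():
    print(name, repeats)
```

Unlike the output shown above, this sketch does not emit a trailing comma after the last value.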
5268	.SS 5.22.8 Computing frequencies: freqasc and freqdesc.
5269	.sp
5270	Instead of listing every value, we can tabulate how often each value occurs within a group. The
5271	\fIfreqdesc\fP and \fIfreqasc\fP operations report each distinct value in the opCol together with its count,
5272	ordered by descending or ascending frequency. Here we group on the chromosome and tabulate the repeat names (col. 8).
5273	.INDENT 0.0
5274	.INDENT 3.5
5275	.sp
5276	.nf
5277	.ft C
5278	cat variantsToRepeats.bed \| groupBy \-g 1 \-c 8 \-o freqdesc
5279	chr21 (GAATG)n:6,ALR/Alpha:5,L1PA3:2,L1P1:1,
5280
5281	cat variantsToRepeats.bed \| groupBy \-g 1 \-c 8 \-o freqasc
5282	chr21 L1P1:1,L1PA3:2,ALR/Alpha:5,(GAATG)n:6,
5283	.ft P
5284	.fi
5285	.UNINDENT
5286	.UNINDENT
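The frequency operations amount to counting distinct values and sorting by count. The following Python sketch illustrates that idea with collections.Counter; the helper name freq is hypothetical, the handling of ties is not specified here, and the list REPEATS is a hand-copied version of column 8 for the single chr21 group.

```python
from collections import Counter

# Repeat names (column 8) for the single chr21 group in variantsToRepeats.bed above.
REPEATS = (["ALR/Alpha", "ALR/Alpha", "L1PA3", "ALR/Alpha"]
           + ["L1PA3", "L1P1", "ALR/Alpha", "ALR/Alpha"]
           + ["(GAATG)n"] * 6)

def freq(values, descending):
    """Return comma-separated 'value:count' pairs ordered by count."""
    counts = sorted(Counter(values).items(), key=lambda kv: kv[1], reverse=descending)
    return ",".join(f"{value}:{count}" for value, count in counts)

freqdesc = freq(REPEATS, descending=True)
freqasc = freq(REPEATS, descending=False)
print(freqdesc)
print(freqasc)
```

Apart from the trailing comma, the two strings match the freqdesc and freqasc output shown above.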
5287	.SS 5.23 unionBedGraphs
5288	.sp
5289	\fBunionBedGraphs\fP combines multiple BEDGRAPH files into a single file such that one can directly
5290	compare coverage (and other text\-values such as genotypes) across multiple samples.
5291	.SS 5.23.1 Usage and option summary
5292	.sp
5293	Usage:
5294	.INDENT 0.0
5295	.INDENT 3.5
5296	.sp
5297	.nf
5298	.ft C
5299	unionBedGraphs [OPTIONS] \-i FILE1 FILE2 FILE3 ... FILEn
5300	.ft P
5301	.fi
5302	.UNINDENT
5303	.UNINDENT
5304	.TS
5305	center;
5306	\|l\|l\|.
5307	_
5308	T{
5309	Option
5310	T} T{
5311	Description
5312	T}
5313	_
5314	T{
5315	\fB\-header\fP
5316	T} T{
5317	Print a header line, consisting of chrom, start, end followed by the names of each input BEDGRAPH file.
5318	T}
5319	_
5320	T{
5321	\fB\-names\fP
5322	T} T{
5323	A list of names (one per file) to describe each file in \-i. These names will be printed in the header line.
5324	T}
5325	_
5326	T{
5327	\fB\-empty\fP
5328	T} T{
5329	Report empty regions (i.e., start/end intervals w/o values in all files). \fIRequires the \(aq\-g FILE\(aq parameter (see below)\fP\&.
5330	T}
5331	_
5332	T{
5333	\fB\-g\fP
5334	T} T{
5335	The genome file to be used to calculate empty regions.
5336	T}
5337	_
5338	T{
5339	\fB\-filler TEXT\fP
5340	T} T{
5341	Use TEXT when representing intervals having no value. Default is \(aq0\(aq, but you can use \(aqN/A\(aq or any other text.
5342	T}
5343	_
5344	T{
5345	\fB\-examples\fP
5346	T} T{
5347	Show detailed usage examples.
5348	T}
5349	_
5350	.TE
5351	.SS 5.23.2 Default behavior
5352	.sp
5353	Figure:
5354	.INDENT 0.0
5355	.INDENT 3.5
5356	.sp
5357	.nf
5358	.ft C
5359	cat 1.bg
5360	chr1 1000 1500 10
5361	chr1 2000 2100 20
5362
5363	cat 2.bg
5364	chr1 900 1600 60
5365	chr1 1700 2050 50
5366
5367	cat 3.bg
5368	chr1 1980 2070 80
5369	chr1 2090 2100 20
5370
5371	cat sizes.txt
5372	chr1 5000
5373
5374	unionBedGraphs \-i 1.bg 2.bg 3.bg
5375	chr1 900 1000 0 60 0
5376	chr1 1000 1500 10 60 0
5377	chr1 1500 1600 0 60 0
5378	chr1 1700 1980 0 50 0
5379	chr1 1980 2000 0 50 80
5380	chr1 2000 2050 20 50 80
5381	chr1 2050 2070 20 0 80
5382	chr1 2070 2090 20 0 0
5383	chr1 2090 2100 20 0 20
5384	.ft P
5385	.fi
5386	.UNINDENT
5387	.UNINDENT
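The default output can be understood as cutting the chromosome at every interval boundary found in any input file and then reporting, for each resulting segment, the value from each file. The Python sketch below illustrates this under simplifying assumptions (small in-memory inputs, non-overlapping intervals within each file); the function name union_bedgraphs is hypothetical and this is not how the tool itself is implemented.

```python
# The three input files from the example above, as (chrom, start, end, value) tuples.
BG1 = [("chr1", 1000, 1500, "10"), ("chr1", 2000, 2100, "20")]
BG2 = [("chr1", 900, 1600, "60"), ("chr1", 1700, 2050, "50")]
BG3 = [("chr1", 1980, 2070, "80"), ("chr1", 2090, 2100, "20")]

def union_bedgraphs(files, filler="0"):
    """Split each chromosome at every interval boundary and report one value per file."""
    rows = []
    for chrom in sorted({c for f in files for c, _, _, _ in f}):
        # Every start or end seen in any file is a potential breakpoint.
        points = sorted({p for f in files for c, s, e, _ in f if c == chrom for p in (s, e)})
        for start, end in zip(points, points[1:]):
            hits = []
            for f in files:
                # Value of the interval in this file that contains the segment, if any.
                hit = next((v for c, s, e, v in f
                            if c == chrom and s <= start and end <= e), None)
                hits.append(hit)
            # Segments covered by no file at all are skipped in this sketch.
            if any(h is not None for h in hits):
                rows.append((chrom, start, end,
                             *[filler if h is None else h for h in hits]))
    return rows

union = union_bedgraphs([BG1, BG2, BG3])
for row in union:
    print(*row)
```

The segment 1600 to 1700 is absent because no input file has a value there, which mirrors the default output above.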
5388	.SS 5.23.3 Add a header line to the output
5389	.sp
5390	Figure:
5391	.INDENT 0.0
5392	.INDENT 3.5
5393	.sp
5394	.nf
5395	.ft C
5396	unionBedGraphs \-i 1.bg 2.bg 3.bg \-header
5397	chrom start end 1 2 3
5398	chr1 900 1000 0 60 0
5399	chr1 1000 1500 10 60 0
5400	chr1 1500 1600 0 60 0
5401	chr1 1700 1980 0 50 0
5402	chr1 1980 2000 0 50 80
5403	chr1 2000 2050 20 50 80
5404	chr1 2050 2070 20 0 80
5405	chr1 2070 2090 20 0 0
5406	chr1 2090 2100 20 0 20
5407	.ft P
5408	.fi
5409	.UNINDENT
5410	.UNINDENT
5411	.SS 5.23.4 Add a header line with custom file names to the output
5412	.sp
5413	Figure:
5414	.INDENT 0.0
5415	.INDENT 3.5
5416	.sp
5417	.nf
5418	.ft C
5419	unionBedGraphs \-i 1.bg 2.bg 3.bg \-header \-names WT\-1 WT\-2 KO\-1
5420	chrom start end WT\-1 WT\-2 KO\-1
5421	chr1 900 1000 0 60 0
5422	chr1 1000 1500 10 60 0
5423	chr1 1500 1600 0 60 0
5424	chr1 1700 1980 0 50 0
5425	chr1 1980 2000 0 50 80
5426	chr1 2000 2050 20 50 80
5427	chr1 2050 2070 20 0 80
5428	chr1 2070 2090 20 0 0
5429	chr1 2090 2100 20 0 20
5430	.ft P
5431	.fi
5432	.UNINDENT
5433	.UNINDENT
5434	.SS 5.23.5 Include regions that have zero coverage in all BEDGRAPH files.
5435	.sp
5436	Figure:
5437	.INDENT 0.0
5438	.INDENT 3.5
5439	.sp
5440	.nf
5441	.ft C
5442	unionBedGraphs \-i 1.bg 2.bg 3.bg \-empty \-g sizes.txt \-header
5444	chrom start end 1 2 3
5445	chr1 0 900 0 0 0
5446	chr1 900 1000 0 60 0
5447	chr1 1000 1500 10 60 0
5448	chr1 1500 1600 0 60 0
5449	chr1 1600 1700 0 0 0
5450	chr1 1700 1980 0 50 0
5451	chr1 1980 2000 0 50 80
5452	chr1 2000 2050 20 50 80
5453	chr1 2050 2070 20 0 80
5454	chr1 2070 2090 20 0 0
5455	chr1 2090 2100 20 0 20
5456	chr1 2100 5000 0 0 0
5457	.ft P
5458	.fi
5459	.UNINDENT
5460	.UNINDENT
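Reporting empty regions needs the chromosome sizes, because the tool must know where each chromosome ends. The Python sketch below shows one way to fill the gaps given rows like the default output and a size table like sizes.txt. The function name add_empty_regions is hypothetical, rows are assumed to be sorted by start within each chromosome, and the filler is fixed to "0".

```python
# Default unionBedGraphs output from above: (chrom, start, end, v1, v2, v3).
ROWS = [
    ("chr1", 900, 1000, "0", "60", "0"),
    ("chr1", 1000, 1500, "10", "60", "0"),
    ("chr1", 1500, 1600, "0", "60", "0"),
    ("chr1", 1700, 1980, "0", "50", "0"),
    ("chr1", 1980, 2000, "0", "50", "80"),
    ("chr1", 2000, 2050, "20", "50", "80"),
    ("chr1", 2050, 2070, "20", "0", "80"),
    ("chr1", 2070, 2090, "20", "0", "0"),
    ("chr1", 2090, 2100, "20", "0", "20"),
]
SIZES = {"chr1": 5000}  # contents of sizes.txt

def add_empty_regions(rows, chrom_sizes, n_files):
    """Insert all-zero rows for stretches of each chromosome that no row covers."""
    by_chrom = {}
    for row in rows:
        by_chrom.setdefault(row[0], []).append(row)
    out = []
    for chrom, size in chrom_sizes.items():
        pos = 0  # end of the last region emitted so far
        for row in by_chrom.get(chrom, []):
            if row[1] > pos:
                out.append((chrom, pos, row[1]) + ("0",) * n_files)
            out.append(row)
            pos = row[2]
        if pos < size:
            out.append((chrom, pos, size) + ("0",) * n_files)
    return out

with_empty = add_empty_regions(ROWS, SIZES, 3)
for row in with_empty:
    print(*row)
```

Three rows are added for this example: the leading stretch 0 to 900, the internal gap 1600 to 1700, and the trailing stretch 2100 to 5000.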
5461	.SS 5.23.6 Use a custom value for missing values.
5462	.sp
5463	Figure:
5464	.INDENT 0.0
5465	.INDENT 3.5
5466	.sp
5467	.nf
5468	.ft C
5469	unionBedGraphs \-i 1.bg 2.bg 3.bg \-empty \-g sizes.txt \-header \-filler N/A
5471	chrom start end 1 2 3
5472	chr1 0 900 N/A N/A N/A
5473	chr1 900 1000 N/A 60 N/A
5474	chr1 1000 1500 10 60 N/A
5475	chr1 1500 1600 N/A 60 N/A
5476	chr1 1600 1700 N/A N/A N/A
5477	chr1 1700 1980 N/A 50 N/A
5478	chr1 1980 2000 N/A 50 80
5479	chr1 2000 2050 20 50 80
5480	chr1 2050 2070 20 N/A 80
5481	chr1 2070 2090 20 N/A N/A
5482	chr1 2090 2100 20 N/A 20
5483	chr1 2100 5000 N/A N/A N/A
5484	.ft P
5485	.fi
5486	.UNINDENT
5487	.UNINDENT
5488	.SS 5.23.7 Use BEDGRAPH files with non\-numeric values.
5489	.sp
5490	Figure:
5491	.INDENT 0.0
5492	.INDENT 3.5
5493	.sp
5494	.nf
5495	.ft C
5496	cat 1.snp.bg
5497	chr1 0 1 A/G
5498	chr1 5 6 C/T
5499
5500	cat 2.snp.bg
5501	chr1 0 1 C/C
5502	chr1 7 8 T/T
5503
5504	cat 3.snp.bg
5505	chr1 0 1 A/G
5506	chr1 5 6 C/T
5507
5508	unionBedGraphs \-i 1.snp.bg 2.snp.bg 3.snp.bg \-filler \-/\-
5509	chr1 0 1 A/G C/C A/G
5510	chr1 5 6 C/T \-/\- C/T
5511	chr1 7 8 \-/\- T/T \-/\-
5512	.ft P
5513	.fi
5514	.UNINDENT
5515	.UNINDENT
5516	.SS 5.24 annotateBed
5517	.sp
5518	\fBannotateBed\fP annotates one BED/VCF/GFF file with the coverage and number of overlaps observed
5519	from multiple other BED/VCF/GFF files. In this way, it allows one to ask to what degree one feature
5520	coincides with multiple other feature types with a single command.
5521	.SS 5.24.1 Usage and option summary
5522	.sp
5523	Usage:
5524	.INDENT 0.0
5525	.INDENT 3.5
5526	.sp
5527	.nf
5528	.ft C
5529	annotateBed [OPTIONS] \-i <BED/GFF/VCF> \-files FILE1 FILE2 FILE3 ... FILEn
5530	.ft P
5531	.fi
5532	.UNINDENT
5533	.UNINDENT
5534	.TS
5535	center;
5536	\|l\|l\|.
5537	_
5538	T{
5539	Option
5540	T} T{
5541	Description
5542	T}
5543	_
5544	T{
5545	\fB\-names\fP
5546	T} T{
5547	A list of names (one per file) to describe each file in \-files. These names will be printed as a header line.
5548	T}
5549	_
5550	T{
5551	\fB\-counts\fP
5552	T} T{
5553	Report the count of features in each file that overlap \-i. Default behavior is to report the fraction of \-i covered by each file.
5554	T}
5555	_
5556	T{
5557	\fB\-both\fP
5558	T} T{
5559	Report the count of features followed by the % coverage for each annotation file. Default is to report solely the fraction of \-i covered by each file.
5560	T}
5561	_
5562	T{
5563	\fB\-s\fP
5564	T} T{
5565	Force strandedness. That is, only include hits in A that overlap B on the same strand. By default, hits are included without respect to strand.
5566	T}
5567	_
5568	.TE
5569	.SS 5.24.2 Default behavior \- annotate one file with coverage from others.
5570	.sp
5571	By default, the fraction of each feature covered by each annotation file is reported after the complete
5572	feature in the file to be annotated.
5573	.INDENT 0.0
5574	.INDENT 3.5
5575	.sp
5576	.nf
5577	.ft C
5578	cat variants.bed
5579	chr1 100 200 nasty 1 \-
5580	chr2 500 1000 ugly 2 +
5581	chr3 1000 5000 big 3 \-
5582
5583	cat genes.bed
5584	chr1 150 200 geneA 1 +
5585	chr1 175 250 geneB 2 +
5586	chr3 0 10000 geneC 3 \-
5587
5588	cat conserv.bed
5589	chr1 0 10000 cons1 1 +
5590	chr2 700 10000 cons2 2 \-
5591	chr3 4000 10000 cons3 3 +
5592
5593	cat known_var.bed
5594	chr1 0 120 known1 \-
5595	chr1 150 160 known2 \-
5596	chr2 0 10000 known3 +
5597
5598	annotateBed \-i variants.bed \-files genes.bed conserv.bed known_var.bed
5599	chr1 100 200 nasty 1 \- 0.500000 1.000000 0.300000
5600	chr2 500 1000 ugly 2 + 0.000000 0.600000 1.000000
5601	chr3 1000 5000 big 3 \- 1.000000 0.250000 0.000000
5602	.ft P
5603	.fi
5604	.UNINDENT
5605	.UNINDENT
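The fractions above can be reproduced by clipping each annotation interval to the feature and measuring the union of the clipped pieces, so that overlapping annotations are not counted twice. The Python sketch below illustrates this; it ignores strand, assumes half-open coordinates, and uses a hypothetical helper named covered_fraction.

```python
# Features to annotate and annotation files from the example above: (chrom, start, end).
VARIANTS = [("chr1", 100, 200), ("chr2", 500, 1000), ("chr3", 1000, 5000)]
GENES = [("chr1", 150, 200), ("chr1", 175, 250), ("chr3", 0, 10000)]
CONSERV = [("chr1", 0, 10000), ("chr2", 700, 10000), ("chr3", 4000, 10000)]
KNOWN_VAR = [("chr1", 0, 120), ("chr1", 150, 160), ("chr2", 0, 10000)]

def covered_fraction(feature, annotations):
    """Fraction of the feature's bases covered by at least one annotation interval."""
    chrom, start, end = feature
    # Keep only overlapping annotations, clipped to the feature boundaries.
    clipped = sorted((max(s, start), min(e, end))
                     for c, s, e in annotations
                     if c == chrom and s < end and e > start)
    covered, pos = 0, start
    for s, e in clipped:
        if e > pos:
            # Count only the part not already covered by an earlier interval.
            covered += e - max(s, pos)
            pos = e
    return covered / (end - start)

fractions = [[covered_fraction(v, f) for f in (GENES, CONSERV, KNOWN_VAR)]
             for v in VARIANTS]
for v, row in zip(VARIANTS, fractions):
    print(*v, *[format(x, ".6f") for x in row])
```

For the first variant, geneA and geneB together cover only bases 150 to 200, so the gene fraction is 0.5 even though two genes overlap it.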
5606	.SS 5.24.3 Report the count of hits from the annotation files
5607	.sp
5608	Figure:
5609	.INDENT 0.0
5610	.INDENT 3.5
5611	.sp
5612	.nf
5613	.ft C
5614	annotateBed \-counts \-i variants.bed \-files genes.bed conserv.bed known_var.bed
5615	chr1 100 200 nasty 1 \- 2 1 2
5616	chr2 500 1000 ugly 2 + 0 1 1
5617	chr3 1000 5000 big 3 \- 1 1 0
5618	.ft P
5619	.fi
5620	.UNINDENT
5621	.UNINDENT
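Counting hits is simpler than computing coverage: each annotation interval that shares at least one base with the feature counts once. The Python sketch below is an illustration with a hypothetical helper named count_overlaps; it ignores strand and assumes half-open coordinates.

```python
# Features to annotate and annotation files from the example above: (chrom, start, end).
VARIANTS = [("chr1", 100, 200), ("chr2", 500, 1000), ("chr3", 1000, 5000)]
GENES = [("chr1", 150, 200), ("chr1", 175, 250), ("chr3", 0, 10000)]
CONSERV = [("chr1", 0, 10000), ("chr2", 700, 10000), ("chr3", 4000, 10000)]
KNOWN_VAR = [("chr1", 0, 120), ("chr1", 150, 160), ("chr2", 0, 10000)]

def count_overlaps(feature, annotations):
    """Number of annotation intervals sharing at least one base with the feature."""
    chrom, start, end = feature
    return sum(1 for c, s, e in annotations
               if c == chrom and s < end and e > start)

counts = [[count_overlaps(v, f) for f in (GENES, CONSERV, KNOWN_VAR)]
          for v in VARIANTS]
for v, row in zip(VARIANTS, counts):
    print(*v, *row)
```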
5622	.SS 5.24.4 Report both the count of hits and the fraction covered from the annotation files
5623	.sp
5624	Figure:
5625	.INDENT 0.0
5626	.INDENT 3.5
5627	.sp
5628	.nf
5629	.ft C
5630	annotateBed \-both \-i variants.bed \-files genes.bed conserv.bed known_var.bed
5631	#chr start end name score +/\- cnt1 pct1 cnt2 pct2 cnt3 pct3
5632	chr1 100 200 nasty 1 \- 2 0.500000 1 1.000000 2 0.300000
5633	chr2 500 1000 ugly 2 + 0 0.000000 1 0.600000 1 1.000000
5634	chr3 1000 5000 big 3 \- 1 1.000000 1 0.250000 0 0.000000
5635	.ft P
5636	.fi
5637	.UNINDENT
5638	.UNINDENT
5639	.SS 5.24.5 Restrict the reporting to overlaps on the same strand.
5640	.sp
5641	Note: Compare with the result from 5.24.2
5642	.INDENT 0.0
5643	.INDENT 3.5
5644	.sp
5645	.nf
5646	.ft C
5647	annotateBed \-s \-i variants.bed \-files genes.bed conserv.bed known_var.bed
5648	chr1 100 200 nasty 1 \- 0.000000 0.000000 0.000000
5649	chr2 500 1000 ugly 2 + 0.000000 0.000000 0.000000
5650	chr3 1000 5000 big 3 \- 1.000000 0.000000 0.000000
5651	.ft P
5652	.fi
5653	.UNINDENT
5654	.UNINDENT
5655	.SH EXAMPLE USAGE
5656	.sp
5657	Below are several examples of basic BEDTools usage. Example BED files are provided in the
5658	/data directory of the BEDTools distribution.
5659	.SS 6.1 intersectBed
5660	.sp
5661	6.1.1 Report the base\-pair overlap between sequence alignments and genes.
5662	.INDENT 0.0
5663	.INDENT 3.5
5664	.sp
5665	.nf
5666	.ft C
5667	intersectBed \-a reads.bed \-b genes.bed
5668	.ft P
5669	.fi
5670	.UNINDENT
5671	.UNINDENT
5672	.sp
5673	6.1.2 Report whether each alignment overlaps one or more genes. If not, the alignment is not reported.
5674	.INDENT 0.0
5675	.INDENT 3.5
5676	.sp
5677	.nf
5678	.ft C
5679	intersectBed \-a reads.bed \-b genes.bed \-u
5680	.ft P
5681	.fi
5682	.UNINDENT
5683	.UNINDENT
5684	.sp
5685	6.1.3 Report those alignments that overlap NO genes. Like "grep \-v"
5686	.INDENT 0.0
5687	.INDENT 3.5
5688	.sp
5689	.nf
5690	.ft C
5691	intersectBed \-a reads.bed \-b genes.bed \-v
5692	.ft P
5693	.fi
5694	.UNINDENT
5695	.UNINDENT
5696	.sp
5697	6.1.4 Report the number of genes that each alignment overlaps.
5698	.INDENT 0.0
5699	.INDENT 3.5
5700	.sp
5701	.nf
5702	.ft C
5703	intersectBed \-a reads.bed \-b genes.bed \-c
5704	.ft P
5705	.fi
5706	.UNINDENT
5707	.UNINDENT
5708	.sp
5709	6.1.5 Report the entire, original alignment entry for each overlap with a gene.
5710	.INDENT 0.0
5711	.INDENT 3.5
5712	.sp
5713	.nf
5714	.ft C
5715	intersectBed \-a reads.bed \-b genes.bed \-wa
5716	.ft P
5717	.fi
5718	.UNINDENT
5719	.UNINDENT
5720	.sp
5721	6.1.6 Report the entire, original gene entry for each overlap with a gene.
5722	.INDENT 0.0
5723	.INDENT 3.5
5724	.sp
5725	.nf
5726	.ft C
5727	intersectBed \-a reads.bed \-b genes.bed \-wb
5728	.ft P
5729	.fi
5730	.UNINDENT
5731	.UNINDENT
5732	.sp
5733	6.1.7 Report the entire, original alignment and gene entries for each overlap.
5734	.INDENT 0.0
5735	.INDENT 3.5
5736	.sp
5737	.nf
5738	.ft C
5739	intersectBed \-a reads.bed \-b genes.bed \-wa \-wb
5740	.ft P
5741	.fi
5742	.UNINDENT
5743	.UNINDENT
5744	.sp
5745	6.1.8 Only report an overlap with a repeat if it spans at least 50% of the exon.
5746	.INDENT 0.0
5747	.INDENT 3.5
5748	.sp
5749	.nf
5750	.ft C
5751	intersectBed \-a exons.bed \-b repeatMasker.bed \-f 0.50
5752	.ft P
5753	.fi
5754	.UNINDENT
5755	.UNINDENT
5756	.sp
5757	6.1.9 Only report an overlap if it comprises at least 50% of the structural variant and at least 50% of the segmental duplication. Thus, it is reciprocally at least a 50% overlap.
5758	.INDENT 0.0
5759	.INDENT 3.5
5760	.sp
5761	.nf
5762	.ft C
5763	intersectBed \-a SV.bed \-b segmentalDups.bed \-f 0.50 \-r
5764	.ft P
5765	.fi
5766	.UNINDENT
5767	.UNINDENT
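The difference between a one-sided and a reciprocal overlap fraction is easy to get wrong, so here is a small Python sketch of the test. It is an illustration only: the function name overlaps and the example coordinates are hypothetical, and intervals are assumed to be half-open and on the same chromosome.

```python
def overlaps(a, b, min_frac=0.0, reciprocal=False):
    """Return True if a and b overlap by at least min_frac of a (and of b if reciprocal)."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return False
    if shared / (a[1] - a[0]) < min_frac:
        return False
    if reciprocal and shared / (b[1] - b[0]) < min_frac:
        return False
    return True

sv = (100, 200)            # hypothetical structural variant, 100 bp
long_dup = (150, 400)      # hypothetical duplication, 250 bp, shares 50 bp with sv
similar_dup = (120, 220)   # hypothetical duplication, 100 bp, shares 80 bp with sv

one_sided = overlaps(sv, long_dup, min_frac=0.5)
both_sided = overlaps(sv, long_dup, min_frac=0.5, reciprocal=True)
similar = overlaps(sv, similar_dup, min_frac=0.5, reciprocal=True)
print(one_sided, both_sided, similar)
```

The long duplication passes the one-sided test (50 of 100 bp of the variant) but fails the reciprocal one (50 of 250 bp of the duplication), which is the situation the reciprocal option is meant to exclude.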
5768	.sp
5769	6.1.10 Read BED A from stdin. For example, find genes that overlap LINEs but not SINEs.
5770	.INDENT 0.0
5771	.INDENT 3.5
5772	.sp
5773	.nf
5774	.ft C
5775	intersectBed \-a genes.bed \-b LINES.bed \| intersectBed \-a stdin \-b SINEs.bed \-v
5776	.ft P
5777	.fi
5778	.UNINDENT
5779	.UNINDENT
5780	.sp
5781	6.1.11 Retain only single\-end BAM alignments that overlap exons.
5782	.INDENT 0.0
5783	.INDENT 3.5
5784	.sp
5785	.nf
5786	.ft C
5787	intersectBed \-abam reads.bam \-b exons.bed > reads.touchingExons.bam
5788	.ft P
5789	.fi
5790	.UNINDENT
5791	.UNINDENT
5792	.sp
5793	6.1.12 Retain only single\-end BAM alignments that do not overlap simple sequence
5794	repeats.
5795	.INDENT 0.0
5796	.INDENT 3.5
5797	.sp
5798	.nf
5799	.ft C
5800	intersectBed \-abam reads.bam \-b SSRs.bed \-v > reads.noSSRs.bam
5801	.ft P
5802	.fi
5803	.UNINDENT
5804	.UNINDENT
5805	.SS 6.2 pairToBed
5806	.sp
5807	6.2.1 Return all structural variants (in BEDPE format) that overlap with genes on either
5808	end.
5809	.INDENT 0.0
5810	.INDENT 3.5
5811	.sp
5812	.nf
5813	.ft C
5814	pairToBed \-a sv.bedpe \-b genes > sv.genes
5815	.ft P
5816	.fi
5817	.UNINDENT
5818	.UNINDENT
5819	.sp
5820	6.2.2 Return all structural variants (in BEDPE format) that overlap with genes on both
5821	ends.
5822	.INDENT 0.0
5823	.INDENT 3.5
5824	.sp
5825	.nf
5826	.ft C
5827	pairToBed \-a sv.bedpe \-b genes \-type both > sv.genes
5828	.ft P
5829	.fi
5830	.UNINDENT
5831	.UNINDENT
5832	.sp
5833	6.2.3 Retain only paired\-end BAM alignments where neither end overlaps simple
5834	sequence repeats.
5835	.INDENT 0.0
5836	.INDENT 3.5
5837	.sp
5838	.nf
5839	.ft C
5840	pairToBed \-abam reads.bam \-b SSRs.bed \-type neither > reads.noSSRs.bam
5841	.ft P
5842	.fi
5843	.UNINDENT
5844	.UNINDENT
5845	.sp
5846	6.2.4 Retain only paired\-end BAM alignments where both ends overlap segmental
5847	duplications.
5848	.INDENT 0.0
5849	.INDENT 3.5
5850	.sp
5851	.nf
5852	.ft C
5853	pairToBed \-abam reads.bam \-b segdups.bed \-type both > reads.SSRs.bam
5854	.ft P
5855	.fi
5856	.UNINDENT
5857	.UNINDENT
5858	.sp
5859	6.2.5 Retain only paired\-end BAM alignments where neither or one and only one end
5860	overlaps segmental duplications.
5861	.INDENT 0.0
5862	.INDENT 3.5
5863	.sp
5864	.nf
5865	.ft C
5866	pairToBed \-abam reads.bam \-b segdups.bed \-type notboth > reads.notbothSSRs.bam
5867	.ft P
5868	.fi
5869	.UNINDENT
5870	.UNINDENT
5871	.SS 6.3 pairToPair
5872	.sp
5873	6.3.1 Find all SVs (in BEDPE format) in sample 1 that are also in sample 2.
5874	.INDENT 0.0
5875	.INDENT 3.5
5876	.sp
5877	.nf
5878	.ft C
5879	pairToPair \-a 1.sv.bedpe \-b 2.sv.bedpe \| cut \-f 1\-10 > 1.sv.in2.bedpe
5880	.ft P
5881	.fi
5882	.UNINDENT
5883	.UNINDENT
5884	.sp
5885	6.3.2 Find all SVs (in BEDPE format) in sample 1 that are not in sample 2.
5886	.INDENT 0.0
5887	.INDENT 3.5
5888	.sp
5889	.nf
5890	.ft C
5891	pairToPair \-a 1.sv.bedpe \-b 2.sv.bedpe \-type neither \| cut \-f 1\-10 > 1.sv.notin2.bedpe
5892	.ft P
5893	.fi
5894	.UNINDENT
5895	.UNINDENT
5898	.SS 6.4 bamToBed
5899	.sp
5900	6.4.1 Convert BAM alignments to BED format.
5901	.INDENT 0.0
5902	.INDENT 3.5
5903	.sp
5904	.nf
5905	.ft C
5906	bamToBed \-i reads.bam > reads.bed
5907	.ft P
5908	.fi
5909	.UNINDENT
5910	.UNINDENT
5911	.sp
5912	6.4.2 Convert BAM alignments to BED format using the BAM edit distance (NM) as the
5913	BED "score".
5914	.INDENT 0.0
5915	.INDENT 3.5
5916	.sp
5917	.nf
5918	.ft C
5919	bamToBed \-i reads.bam \-ed > reads.bed
5920	.ft P
5921	.fi
5922	.UNINDENT
5923	.UNINDENT
5924	.sp
5925	6.4.3 Convert BAM alignments to BEDPE format.
5926	.INDENT 0.0
5927	.INDENT 3.5
5928	.sp
5929	.nf
5930	.ft C
5931	bamToBed \-i reads.bam \-bedpe > reads.bedpe
5932	.ft P
5933	.fi
5934	.UNINDENT
5935	.UNINDENT
5936	.SS 6.5 windowBed
5937	.sp
5938	6.5.1 Report all genes that are within 10000 bp upstream or downstream of CNVs.
5939	.INDENT 0.0
5940	.INDENT 3.5
5941	.sp
5942	.nf
5943	.ft C
5944	windowBed \-a CNVs.bed \-b genes.bed \-w 10000
5945	.ft P
5946	.fi
5947	.UNINDENT
5948	.UNINDENT
5949	.sp
5950	6.5.2 Report all genes that are within 10000 bp upstream or 5000 bp downstream of
5951	CNVs.
5952	.INDENT 0.0
5953	.INDENT 3.5
5954	.sp
5955	.nf
5956	.ft C
5957	windowBed \-a CNVs.bed \-b genes.bed \-l 10000 \-r 5000
5958	.ft P
5959	.fi
5960	.UNINDENT
5961	.UNINDENT
5962	.sp
5963	6.5.3 Report all SNPs that are within 5000 bp upstream or 1000 bp downstream of genes.
5964	Define upstream and downstream based on strand.
5965	.INDENT 0.0
5966	.INDENT 3.5
5967	.sp
5968	.nf
5969	.ft C
5970	windowBed \-a genes.bed \-b snps.bed \-l 5000 \-r 1000 \-sw
5971	.ft P
5972	.fi
5973	.UNINDENT
5974	.UNINDENT
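The effect of strand-aware windows can be sketched in a few lines of Python. This reflects one reading of the options used above: the left value extends the start coordinate and the right value extends the end coordinate, and with strand awareness the two are swapped for features on the minus strand so that "upstream" follows the direction of the feature. The function name window and the coordinates are hypothetical.

```python
def window(start, end, strand, left, right, strand_aware=False):
    """Return the search window around a feature, clamped at coordinate 0."""
    if strand_aware and strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        left, right = right, left
    return max(0, start - left), end + right

plus_gene = window(10000, 12000, "+", 5000, 1000, strand_aware=True)
minus_gene = window(10000, 12000, "-", 5000, 1000, strand_aware=True)
unaware = window(2000, 3000, "-", 5000, 1000)
print(plus_gene, minus_gene, unaware)
```

Any feature in the second file that overlaps the returned window would then be reported for that gene.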
5975	.SS 6.6 closestBed
5976	.sp
5977	Note: By default, if there is a tie for closest, all ties will be reported. \fBclosestBed\fP allows overlapping
5978	features to be the closest.
5979	.sp
5980	6.6.1 Find the closest ALU to each gene.
5981	.INDENT 0.0
5982	.INDENT 3.5
5983	.sp
5984	.nf
5985	.ft C
5986	closestBed \-a genes.bed \-b ALUs.bed
5987	.ft P
5988	.fi
5989	.UNINDENT
5990	.UNINDENT
5991	.sp
5992	6.6.2 Find the closest ALU to each gene, choosing the first ALU in the file if there is a
5993	tie.
5994	.INDENT 0.0
5995	.INDENT 3.5
5996	.sp
5997	.nf
5998	.ft C
5999	closestBed \-a genes.bed \-b ALUs.bed \-t first
6000	.ft P
6001	.fi
6002	.UNINDENT
6003	.UNINDENT
6004	.sp
6005	6.6.3 Find the closest ALU to each gene, choosing the last ALU in the file if there is a
6006	tie.
6007	.INDENT 0.0
6008	.INDENT 3.5
6009	.sp
6010	.nf
6011	.ft C
6012	closestBed \-a genes.bed \-b ALUs.bed \-t last
6013	.ft P
6014	.fi
6015	.UNINDENT
6016	.UNINDENT
6017	.SS 6.7 subtractBed
6018	.sp
6019	Note: If a feature in A is entirely "spanned" by any feature in B, it will not be reported.
6020	.sp
6021	6.7.1 Remove introns from gene features. Exons will (should) be reported.
6022	.INDENT 0.0
6023	.INDENT 3.5
6024	.sp
6025	.nf
6026	.ft C
6027	subtractBed \-a genes.bed \-b introns.bed
6028	.ft P
6029	.fi
6030	.UNINDENT
6031	.UNINDENT
6032	.SS 6.8 mergeBed
6033	.sp
6034	6.8.1 Merge overlapping repetitive elements into a single entry.
6035	.INDENT 0.0
6036	.INDENT 3.5
6037	.sp
6038	.nf
6039	.ft C
6040	mergeBed \-i repeatMasker.bed
6041	.ft P
6042	.fi
6043	.UNINDENT
6044	.UNINDENT
6045	.sp
6046	6.8.2 Merge overlapping repetitive elements into a single entry, returning the number of
6047	entries merged.
6048	.INDENT 0.0
6049	.INDENT 3.5
6050	.sp
6051	.nf
6052	.ft C
6053	mergeBed \-i repeatMasker.bed \-n
6054	.ft P
6055	.fi
6056	.UNINDENT
6057	.UNINDENT
6058	.sp
6059	6.8.3 Merge nearby (within 1000 bp) repetitive elements into a single entry.
6060	.INDENT 0.0
6061	.INDENT 3.5
6062	.sp
6063	.nf
6064	.ft C
6065	mergeBed \-i repeatMasker.bed \-d 1000
6066	.ft P
6067	.fi
6068	.UNINDENT
6069	.UNINDENT
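Merging with a distance threshold can be sketched as a single pass over sorted intervals. The Python example below is an illustration for one chromosome only; the function name merge and the coordinates are hypothetical, and the exact boundary handling (here, intervals at most max_gap apart are joined) is an assumption rather than a statement about the tool.

```python
def merge(intervals, max_gap=0):
    """Merge (start, end) intervals that overlap or lie within max_gap of each other."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + max_gap:
            # Extend the current merged interval.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]

repeats = [(100, 200), (150, 300), (1200, 1300), (5000, 5100)]  # hypothetical
overlapping_only = merge(repeats)
within_1kb = merge(repeats, max_gap=1000)
print(overlapping_only)
print(within_1kb)
```

With a gap of 1000, the element at 1200 is absorbed into the first merged interval, while the element at 5000 stays separate.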
6070	.SS 6.9 coverageBed
6071	.sp
6072	6.9.1 Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
6073	genome.
6074	.INDENT 0.0
6075	.INDENT 3.5
6076	.sp
6077	.nf
6078	.ft C
6079	coverageBed \-a reads.bed \-b windows10kb.bed \| head
6080	chr1 0 10000 0 10000 0.00
6081	chr1 10001 20000 33 10000 0.21
6082	chr1 20001 30000 42 10000 0.29
6083	chr1 30001 40000 71 10000 0.36
6084	.ft P
6085	.fi
6086	.UNINDENT
6087	.UNINDENT
6088	.sp
6089	6.9.2 Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
6090	genome and create a BEDGRAPH of the number of aligned reads in each window for
6091	display on the UCSC browser.
6092	.INDENT 0.0
6093	.INDENT 3.5
6094	.sp
6095	.nf
6096	.ft C
6097	coverageBed \-a reads.bed \-b windows10kb.bed \| cut \-f 1\-4 > windows10kb.cov.bedg
6098	.ft P
6099	.fi
6100	.UNINDENT
6101	.UNINDENT
6102	.sp
6103	6.9.3 Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
6104	genome and create a BEDGRAPH of the fraction of each window covered by at least
6105	one aligned read for display on the UCSC browser.
6106	.INDENT 0.0
6107	.INDENT 3.5
6108	.sp
6109	.nf
6110	.ft C
6111	coverageBed \-a reads.bed \-b windows10kb.bed \| awk \(aq{OFS="\et"; print $1,$2,$3,$6}\(aq \e
6112	> windows10kb.pctcov.bedg
6113	.ft P
6114	.fi
6115	.UNINDENT
6116	.UNINDENT
6117	.SS 6.10 complementBed
6118	.sp
6119	6.10.1 Report all intervals in the human genome that are not covered by repetitive
6120	elements.
6121	.INDENT 0.0
6122	.INDENT 3.5
6123	.sp
6124	.nf
6125	.ft C
6126	complementBed \-i repeatMasker.bed \-g hg18.genome
6127	.ft P
6128	.fi
6129	.UNINDENT
6130	.UNINDENT
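The complement of a set of intervals depends on the chromosome length, which is why a genome file is required. The Python sketch below shows the idea for a single chromosome; the function name complement and the coordinates are hypothetical, and intervals are assumed to be half-open.

```python
def complement(intervals, chrom_size):
    """Return the stretches of [0, chrom_size) not covered by any interval."""
    gaps, pos = [], 0
    for start, end in sorted(intervals):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < chrom_size:
        gaps.append((pos, chrom_size))
    return gaps

repeats = [(100, 200), (150, 300), (1000, 1100)]  # hypothetical repeat intervals
uncovered = complement(repeats, 2000)
print(uncovered)
```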
6131	.SS 6.11 shuffleBed
6132	.sp
6133	6.11.1 Randomly place all discovered variants in the genome. However, prevent them
6134	from being placed in known genome gaps.
6135	.INDENT 0.0
6136	.INDENT 3.5
6137	.sp
6138	.nf
6139	.ft C
6140	shuffleBed \-i variants.bed \-g hg18.genome \-excl genome_gaps.bed
6141	.ft P
6142	.fi
6143	.UNINDENT
6144	.UNINDENT
6145	.sp
6146	6.11.2 Randomly place all discovered variants in the genome. However, prevent them
6147	from being placed in known genome gaps and require that the variants be randomly
6148	placed on the same chromosome.
6149	.INDENT 0.0
6150	.INDENT 3.5
6151	.sp
6152	.nf
6153	.ft C
6154	shuffleBed \-i variants.bed \-g hg18.genome \-excl genome_gaps.bed \-chrom
6155	.ft P
6156	.fi
6157	.UNINDENT
6158	.UNINDENT
6159	.SH ADVANCED USAGE
6160	.SS 7.1 Mask all regions in a genome except for targeted capture regions.
6161	.sp
6162	# Add 500 bp up and downstream of each probe
6163	.INDENT 0.0
6164	.INDENT 3.5
6165	.sp
6166	.nf
6167	.ft C
6168	slopBed \-i probes.bed \-b 500 > probes.500bp.bed
6169	.ft P
6170	.fi
6171	.UNINDENT
6172	.UNINDENT
6173	.sp
6174	# Get a BED file of all regions not covered by the probes (+500 bp up/down)
6175	.INDENT 0.0
6176	.INDENT 3.5
6177	.sp
6178	.nf
6179	.ft C
6180	complementBed \-i probes.500bp.bed \-g hg18.genome > probes.500bp.complement.bed
6181	.ft P
6182	.fi
6183	.UNINDENT
6184	.UNINDENT
6185	.sp
6186	# Create a masked genome where all bases are masked except for the probes +500bp
6187	.INDENT 0.0
6188	.INDENT 3.5
6189	.sp
6190	.nf
6191	.ft C
6192	maskFastaFromBed \-fi hg18.fa \-bed probes.500bp.complement.bed \-fo hg18.probecomplement.masked.fa
6194	.ft P
6195	.fi
6196	.UNINDENT
6197	.UNINDENT
6198	.SS 7.2 Screening for novel SNPs.
6199	.sp
6200	# Find all SNPs that are not in dbSnp and not in the latest 1000 genomes calls
6201	.INDENT 0.0
6202	.INDENT 3.5
6203	.sp
6204	.nf
6205	.ft C
6206	intersectBed \-a snp.calls.bed \-b dbSnp.bed \-v \| intersectBed \-a stdin \-b 1KG.bed \e
6207	\-v > snp.calls.novel.bed
6208	.ft P
6209	.fi
6210	.UNINDENT
6211	.UNINDENT
6212	.sp
6213	# To count only features that lie entirely within a window, you can first use intersectBed with the "\-f 1.0" option.
6214	.INDENT 0.0
6215	.INDENT 3.5
6216	.sp
6217	.nf
6218	.ft C
6219	intersectBed \-a features.bed \-b windows.bed \-f 1.0 \| coverageBed \-a stdin \e
6220	\-b windows.bed > windows.bed.coverage
6221	.ft P
6222	.fi
6223	.UNINDENT
6224	.UNINDENT
6225	.SS 7.4 Computing the coverage of BAM alignments on exons.
6226	.sp
6227	# One can combine SAMtools with BEDtools to compute coverage directly from the BAM
6228	data by using bamToBed.
6229	.INDENT 0.0
6230	.INDENT 3.5
6231	.sp
6232	.nf
6233	.ft C
6234	bamToBed \-i reads.bam \| coverageBed \-a stdin \-b exons.bed > exons.bed.coverage
6235	.ft P
6236	.fi
6237	.UNINDENT
6238	.UNINDENT
6239	.sp
6240	# Take it a step further and require that coverage be from properly\-paired reads.
6241	.INDENT 0.0
6242	.INDENT 3.5
6243	.sp
6244	.nf
6245	.ft C
6246	samtools view \-bf 0x2 reads.bam \| bamToBed \-i stdin \| coverageBed \-a stdin \e
6247	\-b exons.bed > exons.bed.proper.coverage
6248	.ft P
6249	.fi
6250	.UNINDENT
6251	.UNINDENT
6252	.SS 7.5 Computing coverage separately for each strand.
6253	.sp
6254	# Use grep to only look at forward strand features (i.e. those that end in "+").
6255	.INDENT 0.0
6256	.INDENT 3.5
6257	.sp
6258	.nf
6259	.ft C
6260	bamToBed \-i reads.bam \| grep \e+$ \| coverageBed \-a stdin \-b genes.bed \e
6261	> genes.bed.forward.coverage
6262	.ft P
6263	.fi
6264	.UNINDENT
6265	.UNINDENT
6266	.sp
6267	# Use grep to only look at reverse strand features (i.e. those that end in "\-").
6268	.INDENT 0.0
6269	.INDENT 3.5
6270	.sp
6271	.nf
6272	.ft C
6273	bamToBed \-i reads.bam \| grep \-e \(aq\-$\(aq \| coverageBed \-a stdin \-b genes.bed \e
6274	> genes.bed.reverse.coverage
6275	.ft P
6276	.fi
6277	.UNINDENT
6278	.UNINDENT
6279	.SS 7.6 Find structural variant calls that are private to one sample.
6280	.sp
6281	# Report the SVs in sample 1 for which neither end overlaps an SV in the other samples.
6282	.INDENT 0.0
6283	.INDENT 3.5
6284	.sp
6285	.nf
6286	.ft C
6287	pairToPair \-a sample1.sv.bedpe \-b othersamples.sv.bedpe \-type neither \e
6288	> sample1.sv.private.bedpe
6289	.ft P
6290	.fi
6291	.UNINDENT
6292	.UNINDENT
6293	.SS 7.7 Exclude SV deletions that appear to be ALU insertions in the reference genome.
6294	.sp
6295	# We\(aqll require that 80% of the inner span of the deletion be overlapped by a
6296	recent ALU.
6297	.INDENT 0.0
6298	.INDENT 3.5
6299	.sp
6300	.nf
6301	.ft C
6302	pairToBed \-a deletions.sv.bedpe \-b ALUs.recent.bed \-type notispan \-f 0.80 \e
6303	> deletions.notALUsinRef.bedpe
6304	.ft P
6305	.fi
6306	.UNINDENT
6307	.UNINDENT
6308	.sp
6309	Refer to the mailing list.
6310	.SH AUTHOR
6311	UVa
6312	.SH COPYRIGHT
6313	2012
6314	.\" Generated by docutils manpage writer.
6315	.
