forked from MikkelSchubert/adapterremoval
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathAdapterRemoval.1
362 lines (362 loc) · 17.9 KB
/
AdapterRemoval.1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
.\" Man page generated from reStructuredText.
.
.TH "ADAPTERREMOVAL" "1" "Jun 23, 2019" "2.3.0" "AdapterRemoval"
.SH NAME
AdapterRemoval \- Fast short-read adapter trimming and processing
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.sp
\fBAdapterRemoval\fP [\fIoptions\fP…] –file1 <\fIfilenames\fP> [–file2 <\fIfilenames\fP>]
.SH DESCRIPTION
.sp
\fBAdapterRemoval\fP removes residual adapter sequences from single\-end (SE) or paired\-end (PE) FASTQ reads, optionally trimming Ns and low qualities bases and/or collapsing overlapping paired\-end mates into one read. Low quality reads are filtered based on the resulting length and the number of ambigious nucleotides (‘N’) present following trimming. These operations may be combined with simultaneous demultiplexing using 5’ barcode sequences. Alternatively, \fBAdapterRemoval\fP may attempt to reconstruct a consensus adapter sequences from paired\-end data, in order to allow the identification of the adapter sequences originally used.
.sp
If you use this program, please cite the paper:
.INDENT 0.0
.INDENT 3.5
Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 12;9(1):88
.sp
\fI\%http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104\-016\-1900\-2\fP
.UNINDENT
.UNINDENT
.sp
For detailed documentation, please see
.INDENT 0.0
.INDENT 3.5
\fI\%http://adapterremoval.readthedocs.io/en/v2.2.3/\fP
.UNINDENT
.UNINDENT
.SH OPTIONS
.INDENT 0.0
.TP
.B \-\-help
Display summary of command\-line options.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-version
Print the version string.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-file1 filename [filenames...]
Read FASTQ reads from one or more files, either uncompressed, bzip2 compressed, or gzip compressed. This contains either the single\-end (SE) reads or, if paired\-end, the mate 1 reads. If running in paired\-end mode, both \fB\-\-file1\fP and \fB\-\-file2\fP must be set. See the primary documentation for a list of supported formats.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-file2 filename [filenames...]
Read one or more FASTQ files containing mate 2 reads for a paired\-end run. If specified, \fB\-\-file1\fP must also be set.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-identify\-adapters
Attempt to build a consensus adapter sequence from fully overlapping pairs of paired\-end reads. The minimum overlap is controlled by \fB\-\-minalignmentlength\fP\&. The result will be compared with the values set using \fB\-\-adapter1\fP and \fB\-\-adapter2\fP\&. No trimming is performed in this mode. Default is off.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-threads n
Maximum number of threads. Defaults to 1.
.UNINDENT
.SS FASTQ options
.INDENT 0.0
.TP
.B \-\-qualitybase base
The Phred quality scores encoding used in input reads \- either ‘64’ for Phred+64 (Illumina 1.3+ and 1.5+) or ‘33’ for Phred+33 (Illumina 1.8+). In addition, the value ‘solexa’ may be used to specify reads with Solexa encoded scores. Default is 33.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-qualitybase\-output base
The base of the quality score for reads written by AdapterRemoval \- either ‘64’ for Phred+64 (i.e., Illumina 1.3+ and 1.5+) or ‘33’ for Phred+33 (Illumina 1.8+). In addition, the value ‘solexa’ may be used to specify reads with Solexa encoded scores. However, note that quality scores are represented using Phred scores internally, and conversion to and from Solexa scores therefore result in a loss of information. The default corresponds to the value given for \fB\-\-qualitybase\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-qualitymax base
Specifies the maximum Phred score expected in input files, and used when writing output files. Possible values are 0 to 93 for Phred+33 encoded files, and 0 to 62 for Phred+64 encoded files. Defaults to 41.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-mate\-separator separator
Character separating the mate number (1 or 2) from the read name in FASTQ records. Defaults to ‘/’.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-interleaved
Enables \fB\-\-interleaved\-input\fP and \fB\-\-interleaved\-output\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-interleaved\-input
If set, input is expected to be a interleaved FASTQ files specified using \fB\-\-file1\fP, in which pairs of reads are written one after the other (e.g. read1/1, read1/2, read2/1, read2/2, etc.).
.UNINDENT
.INDENT 0.0
.TP
.B \-\-interleaved\-ouput
Write paired\-end reads to a single file, interleaving mate 1 and mate 2 reads. By default, this file is named \fBbasename.paired.truncated\fP, but this may be changed using the \fB\-\-output1\fP option.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-combined\-output
Write all reads into the files specified by \fB\-\-output1\fP and \fB\-\-output2\fP\&. The sequences of reads discarded due to quality filters or read merging are replaced with a single ‘N’ with Phred score 0. This option can be combined with \fB\-\-interleaved\-output\fP to write PE reads to a single output file specified with \fB\-\-output1\fP\&.
.UNINDENT
.SS Output file options
.INDENT 0.0
.TP
.B \-\-basename filename
Prefix used for the naming output files, unless these names have been overridden using the corresponding command\-line option (see below).
.UNINDENT
.INDENT 0.0
.TP
.B \-\-settings file
Output file containing information on the parameters used in the run as well as overall statistics on the reads after trimming. Default filename is ‘basename.settings’.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-output1 file
Output file containing trimmed mate1 reads. Default filename is ‘basename.pair1.truncated’ for paired\-end reads, ‘basename.truncated’ for single\-end reads, and ‘basename.paired.truncated’ for interleaved paired\-end reads.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-output2 file
Output file containing trimmed mate 2 reads when \fB\-\-interleaved\-output\fP is not enabled. Default filename is ‘basename.pair2.truncated’ in paired\-end mode.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-singleton file
Output file to which containing paired reads for which the mate has been discarded. Default filename is ‘basename.singleton.truncated’.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-outputcollapsed file
If –collapsed is set, contains overlapping mate\-pairs which have been merged into a single read (PE mode) or reads for which the adapter was identified by a minimum overlap, indicating that the entire template molecule is present. This does not include which have subsequently been trimmed due to low\-quality or ambiguous nucleotides. Default filename is ‘basename.collapsed’
.UNINDENT
.INDENT 0.0
.TP
.B \-\-outputcollapsedtruncated file
Collapsed reads (see –outputcollapsed) which were trimmed due the presence of low\-quality or ambiguous nucleotides. Default filename is ‘basename.collapsed.truncated’.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-discarded file
Contains reads discarded due to the –minlength, –maxlength or –maxns options. Default filename is ‘basename.discarded’.
.UNINDENT
.SS Output compression options
.INDENT 0.0
.TP
.B \-\-gzip
If set, all FASTQ files written by AdapterRemoval will be gzip compressed using the compression level specified using \fB\-\-gzip\-level\fP\&. The extension “.gz” is added to files for which no filename was given on the command\-line. Defaults to off.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-gzip\-level level
Determines the compression level used when gzip’ing FASTQ files. Must be a value in the range 0 to 9, with 0 disabling compression and 9 being the best compression. Defaults to 6.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-bzip2
If set, all FASTQ files written by AdapterRemoval will be bzip2 compressed using the compression level specified using \fB\-\-bzip2\-level\fP\&. The extension “.bz2” is added to files for which no filename was given on the command\-line. Defaults to off.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-bzip2\-level level
Determines the compression level used when bzip2’ing FASTQ files. Must be a value in the range 1 to 9, with 9 being the best compression. Defaults to 9.
.UNINDENT
.SS FASTQ trimming options
.INDENT 0.0
.TP
.B \-\-adapter1 adapter
Adapter sequence expected to be found in mate 1 reads, specified in read direction. For a detailed description of how to provide the appropriate adapter sequences, see the “Adapters” section of the online documentation. Default is AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-adapter2 adapter
Adapter sequence expected to be found in mate 2 reads, specified in read direction. For a detailed description of how to provide the appropriate adapter sequences, see the “Adapters” section of the online documentation. Default is AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-adapter\-list filename
Read one or more adapter sequences from a table. The first two columns (separated by whitespace) of each line in the file are expected to correspond to values passed to –adapter1 and –adapter2. In single\-end mode, only column one is required. Lines starting with ‘#’ are ignored. When multiple rows are found in the table, AdapterRemoval will try each adapter (pair), and select the best aligning adapters for each FASTQ read processed.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-minadapteroverlap length
In single\-end mode, reads are only trimmed if the overlap between read and the adapter is at least X bases long, not counting ambiguous nucleotides (N); this is independent of the \fB\-\-minalignmentlength\fP when using \fB\-\-collapse\fP, allowing a conservative selection of putative complete inserts in single\-end mode, while ensuring that all possible adapter contamination is trimmed. The default is 0.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-mm mismatchrate
The allowed fraction of mismatches allowed in the aligned region. If the value is less than 1, then the value is used directly. If \fB\(ga\-\-mismatchrate\fP is greater than 1, the rate is set to 1 / \fB\-\-mismatchrate\fP\&. The default setting is 3 when trimming adapters, corresponding to a maximum mismatch rate of 1/3, and 10 when using \fB\-\-identify\-adapters\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-shift n
To allow for missing bases in the 5’ end of the read, the program can let the alignment slip \fB\-\-shift\fP bases in the 5’ end. This corresponds to starting the alignment maximum \fB\-\-shift\fP nucleotides into read2 (for paired\-end) or the adapter (for single\-end). The default is 2.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-trim5p n [n]
Trim the 5’ of reads by a fixed amount after removing adapters, but before carrying out quality based trimming. Specify one value to trim mate 1 and mate 2 reads the same amount, or two values separated by a space to trim each mate different amounts. Off by default.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-trim3p n [n]
Trim the 3’ of reads by a fixed amount. See \fB\-\-trim5p\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-trimns
Trim consecutive Ns from the 5’ and 3’ termini. If quality trimming is also enabled (\fB\-\-trimqualities\fP), then stretches of mixed low\-quality bases and/or Ns are trimmed.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-maxns n
Discard reads containing more than \fB\-\-max\fP ambiguous bases (‘N’) after trimming. Default is 1000.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-trimqualities
Trim consecutive stretches of low quality bases (threshold set by \fB\-\-minquality\fP) from the 5’ and 3’ termini. If trimming of Ns is also enabled (\fB\-\-trimns\fP), then stretches of mixed low\-quality bases and Ns are trimmed.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-trimwindows window_size
Trim low quality bases using a sliding window based approach inspired by \fBsickle\fP with the given window size. See the “Window based quality trimming” section of the manual page for a description of this algorithm.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-minquality minimum
Set the threshold for trimming low quality bases using \fB\-\-trimqualities\fP and \fB\-\-trimwindows\fP\&. Default is 2.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-preserve5p
If set, bases at the 5p will not be trimmed by \fB\-\-trimns\fP, \fB\-\-trimqualities\fP, and \fB\-\-trimwindows\fP\&. Collapsed reads will not be quality trimmed when this option is enabled.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-minlength length
Reads shorter than this length are discarded following trimming. Defaults to 15.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-maxlength length
Reads longer than this length are discarded following trimming. Defaults to 4294967295.
.UNINDENT
.SS FASTQ merging options
.INDENT 0.0
.TP
.B \-\-collapse
In paired\-end mode, merge overlapping mates into a single and recalculate the quality scores. In single\-end mode, attempt to identify templates for which the entire sequence is available. In both cases, complete “collapsed” reads are written with a ‘M_’ name prefix, and “collapsed” reads which are trimmed due to quality settings are written with a ‘MT_’ name prefix. The overlap needs to be at least \fB\-\-minalignmentlength\fP nucleotides, with a maximum number of mismatches determined by \fB\-\-mm\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-minalignmentlength length
The minimum overlap between mate 1 and mate 2 before the reads are collapsed into one, when collapsing paired\-end reads, or when attempting to identify complete template sequences in single\-end mode. Default is 11.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-seed seed
When collaping reads at positions where the two reads differ, and the quality of the bases are identical, AdapterRemoval will select a random base. This option specifies the seed used for the random number generator used by AdapterRemoval. This value is also written to the settings file. Note that setting the seed is not reliable in multithreaded mode, since the order of operations is non\-deterministic.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-deterministic
Enable deterministic mode; currently only affects –collapse, different overlapping bases with equal quality are set to N quality 0, instead of being randomly sampled.
.UNINDENT
.SS FASTQ demultiplexing options
.INDENT 0.0
.TP
.B \-\-barcode\-list filename
Perform demultiplxing using table of one or two fixed\-length barcodes for SE or PE reads. The table is expected to contain 2 or 3 columns, the first of which represent the name of a given sample, and the second and third of which represent the mate 1 and (optionally) the mate 2 barcode sequence. For a detailed description, see the “Demultiplexing” section of the online documentation.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-barcode\-mm n
.TP
.B Maximum number of mismatches allowed when counting mismatches in both the mate 1 and the mate 2 barcode for paired reads.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-barcode\-mm\-r1 n
Maximum number of mismatches allowed for the mate 1 barcode; if not set, this value is equal to the \fB\-\-barcode\-mm\fP value; cannot be higher than the \fB\-\-barcode\-mm\fP value.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-barcode\-mm\-r2 n
Maximum number of mismatches allowed for the mate 2 barcode; if not set, this value is equal to the \fB\-\-barcode\-mm\fP value; cannot be higher than the \fB\-\-barcode\-mm\fP value.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-demultiplex\-only
Only carry out demultiplexing using the list of barcodes supplied with –barcode\-list. No other processing is done.
.UNINDENT
.SH WINDOW BASED QUALITY TRIMMING
.sp
As of v2.2.2, AdapterRemoval implements sliding window based approach to quality based base\-trimming inspired by \fBsickle\fP\&. If \fBwindow_size\fP is greater than or equal to 1, that number is used as the window size for all reads. If \fBwindow_size\fP is a number greater than or equal to 0 and less than 1, then that number is multiplied by the length of individual reads to determine the window size. If the window length is zero or is greater than the current read length, then the read length is used instead.
.sp
Reads are trimmed as follows for a given window size:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP 1. 3
The new 5’ is determined by locating the first window where both the average quality and the quality of the first base in the window is greater than \fB\-\-minquality\fP\&.
.IP 2. 3
The new 3’ is located by sliding the first window right, until the average quality becomes less than or equal to \fB\-\-minquality\fP\&. The new 3’ is placed at the last base in that window where the quality is greater than or equal to \fB\-\-minquality\fP\&.
.IP 3. 3
If no 5’ position could be determined, the read is discarded.
.UNINDENT
.UNINDENT
.UNINDENT
.SH EXIT STATUS
.sp
AdapterRemoval exists with status 0 if the program ran succesfully, and with a non\-zero exit code if any errors were encountered. Do not use the output from AdapterRemoval if the program returned a non\-zero exit code!
.SH REPORTING BUGS
.sp
Please report any bugs using the AdapterRemoval issue\-tracker:
.sp
\fI\%https://github.com/MikkelSchubert/adapterremoval/issues\fP
.SH LICENSE
.sp
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
at your option any later version.
.sp
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.sp
You should have received a copy of the GNU General Public License
along with this program. If not, see <\fI\%http://www.gnu.org/licenses/\fP>.
.SH AUTHOR
Mikkel Schubert; Stinus Lindgreen
.SH COPYRIGHT
2017, Mikkel Schubert; Stinus Lindgreen
.\" Generated by docutils manpage writer.
.