pybedtools.contrib.long_range_interaction.tag_bedpe

pybedtools.contrib.long_range_interaction.tag_bedpe(bedpe, queries, verbose=False)

Tag each end of a BEDPE with a set of (possibly many) query BED files.
For example, given a BEDPE of interacting fragments from a Hi-C experiment, identify the contacts between promoters and ChIP-seq peaks. In this case, promoters and ChIP-seq peaks of interest would be provided as BED files.
The strategy is to split the BEDPE into two separate files. Each file is intersected independently with the set of queries. The results are then iterated through in parallel to tie the ends back together. It is this iterator that is returned (see example below).
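The "iterate in parallel" step can be pictured with a small standard-library sketch. This is not the pybedtools implementation; it only illustrates the idea under stated assumptions: each end's intersection output is already available as lists of fields, both outputs are in the same name order, and the name sits in column 4 (index 3). The function name pair_ends and the name_index parameter are hypothetical.

```python
from itertools import groupby


def pair_ends(end1_lines, end2_lines, name_index=3):
    """Yield (label, end1_hits, end2_hits) for each pair name.

    Illustrative only: assumes both inputs are sequences of field lists,
    ordered identically by the BEDPE name, with at least one line per
    pair on each side.
    """
    def by_name(fields):
        return fields[name_index]

    groups1 = groupby(end1_lines, key=by_name)
    groups2 = groupby(end2_lines, key=by_name)

    for (label1, hits1), (label2, hits2) in zip(groups1, groups2):
        if label1 != label2:
            # The two streams have drifted apart, e.g. unsorted input.
            raise ValueError(f"name mismatch: {label1!r} vs {label2!r}")
        yield label1, list(hits1), list(hits2)
```

Grouping consecutive lines by name only works when all lines for a pair are adjacent and appear in the same order in both streams, which is consistent with the requirement that the BEDPE be name-sorted.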
Parameters:
bedpe : str
    BEDPE-format file. Must be name-sorted.
queries : dict
    Dictionary of BED/GFF/GTF/VCF files to use. After splitting the BEDPE, these query files (values in the dictionary) will be passed as the -b arg to bedtools intersect. The keys are passed as the names argument for bedtools intersect.
    Features in each file must have unique names. Use pybedtools.featurefuncs.UniqueID() to help fix this.
    Each file must be BED3 to BED6.
Returns:
Tuple of (iterator, n, extra).

iterator is described below. n is the total number of lines in the BEDPE file, which is useful for calculating percentage complete for downstream work. extra is the number of extra fields found in the BEDPE (also useful for downstream processing).

iterator yields tuples of (label, end1_hits, end2_hits) where label is the name field of one line of the original BEDPE file. end1_hits and end2_hits are each iterators of BED-like lines representing all identified intersections across all query BED files for end1 and end2 for this pair.
Recall that BEDPE format defines a single name and a single score for each pair. For each item in end1_hits, the fields are:

chrom1 start1 end1 name score strand1 [extra fields] query_label fields_from_query_intersecting_end1

where [extra fields] are any additional fields from the original BEDPE, query_label is one of the keys in the queries input dictionary, and the remaining fields in the line are the intersecting line from the corresponding BED file in the queries input dictionary.

Similarly, each item in end2_hits consists of:

chrom2 start2 end2 name score strand2 [extra fields] query_label fields_from_query_intersecting_end2

At least one line is reported for every line in the BEDPE file. If there was no intersection, the standard BEDTools null fields will be shown. In end1_hits and end2_hits, a line will be reported for each hit in each query.
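The field layout described above for end1_hits and end2_hits can be unpacked with a short helper. This is a sketch, not part of pybedtools: the names split_hit and is_null_hit are hypothetical, and each hit is assumed to be available as a plain list of string fields. The null check assumes that BEDTools marks a missing intersection with "." in the query label or in the query's chrom field; verify this against your own output before relying on it.

```python
def split_hit(fields, extra):
    """Split one hit line into its documented parts.

    fields: list of strings for one line from end1_hits or end2_hits.
    extra:  number of extra BEDPE fields, as returned alongside the iterator.
    """
    n_fixed = 6  # chrom, start, end, name, score, strand
    return {
        "end": fields[:n_fixed],
        "extra": fields[n_fixed:n_fixed + extra],
        "query_label": fields[n_fixed + extra],
        "query_fields": fields[n_fixed + extra + 1:],
    }


def is_null_hit(parts):
    """Assumption: a null hit shows '.' as the label or as the query chrom."""
    query = parts["query_fields"]
    return parts["query_label"] == "." or (len(query) > 0 and query[0] == ".")
```

For example, with extra equal to 1, a line whose seventh field is the extra BEDPE column would have its query label in the eighth field and the intersecting query feature in the fields after that.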