rnalib.testdata module
Provides access to test files and can be used to initially build the testdata folder.
- testdata creation:
Rnalib tests use various test data files that can be created by running this Python script. The test_resources dict contained in this module describes the test resources and their origin. Briefly, this script does the following for each configured resource:
Download source file from a public URL or copy from the static_test_files directory
Ensure that the files are sorted by genomic coordinates and are compressed and indexed with bgzip and tabix.
Slice genomic subregions from the files if configured
Copy the result files and corresponding indices to the testdata directory
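The sort, compress, index and slice steps above can be sketched as a plan of external commands. This is only an illustration under stated assumptions, not rnalib's actual implementation: the helper name plan_commands, the file names and the region string are hypothetical, and the sketch only builds the command lines (bedtools sort, bgzip, tabix) without running them.

```python
from pathlib import Path


def plan_commands(src, fmt="gff", region=None):
    """Build (but do not run) commands that sort, compress, index and slice `src`.

    Hypothetical helper for illustration only; rnalib may do this differently.
    """
    src = Path(src)
    sorted_file = src.with_name("sorted." + src.name)
    bgz_file = sorted_file.with_name(sorted_file.name + ".gz")
    commands = [
        # sort by chromosome and start position; bedtools writes to stdout,
        # so the output would have to be redirected to sorted_file
        ["bedtools", "sort", "-i", str(src)],
        # compress with bgzip, which creates <sorted_file>.gz
        ["bgzip", "-f", str(sorted_file)],
        # build a tabix index using the preset for the file format
        ["tabix", "-p", fmt, str(bgz_file)],
    ]
    if region is not None:
        # extract a subregion (including header lines) from the indexed file
        commands.append(["tabix", "-h", str(bgz_file), region])
    return commands


for cmd in plan_commands("annotation.gff", region="chr1:1-100000"):
    print(" ".join(cmd))
```

Returning the commands as argument lists keeps the sketch free of side effects; each list could be passed to subprocess.run once the required tools are installed.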
- testdata access:
Once you have created the testdata folder, you can get the filenames of test resources via the get_resource(<resource_id>) function. If <resource_id> has the form ‘pybedtools::<id>’, this function returns the filename of the respective pybedtools test file. To list the ids of all available test resources, use the list_resources() function.
Examples
>>> get_resource("gencode_gff")
>>> get_resource("pybedtools::hg19.gff")
>>> list_resources()
['gencode_gff','gencode_gtf', ... ,'pybedtools::y.bam']
Note that for testdata creation, the following external tools need to be installed:
samtools
bedtools
htslib (bgzip, tabix)
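Before creating the testdata folder, it may help to check that these tools are on the PATH. The following is a minimal sketch using only the Python standard library; the helper name missing_tools and the REQUIRED_TOOLS tuple are hypothetical and not part of rnalib.

```python
import shutil

# Executables provided by samtools, bedtools and htslib (assumed names)
REQUIRED_TOOLS = ("samtools", "bedtools", "bgzip", "tabix")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of all tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing external tools:", ", ".join(missing))
else:
    print("All required external tools were found.")
```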
- rnalib.testdata.get_resource(k, data_dir: Path = None, conf=None, create_resource=False)[source]
Return the filename of the test resource with the passed key. If the passed key has the form ‘pybedtools::<filename>’, the respective pybedtools test file will be returned.
Examples
>>> get_resource("gencode_gff")
>>> get_resource("pybedtools::snps.bed.gz")
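The prefix convention for resource keys can be illustrated with a small sketch. It shows one way to tell the two kinds of keys apart and is not the code used by get_resource; the helper name split_resource_id and the "rnalib" label for unprefixed keys are assumptions made for this example.

```python
PYBEDTOOLS_PREFIX = "pybedtools::"


def split_resource_id(resource_id):
    """Split a resource key into (source, name).

    Hypothetical helper: keys starting with 'pybedtools::' refer to
    pybedtools test files, all other keys to configured test resources.
    """
    if resource_id.startswith(PYBEDTOOLS_PREFIX):
        return "pybedtools", resource_id[len(PYBEDTOOLS_PREFIX):]
    return "rnalib", resource_id


print(split_resource_id("gencode_gff"))
print(split_resource_id("pybedtools::snps.bed.gz"))
```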
- rnalib.testdata.download_bgzip_slice(config, resource_name, outdir, view_tempdir=False, show_progress=True)[source]
Download the resource with the passed name from the passed config dict. The resource will be downloaded to the testdata directory and will be sorted, compressed and indexed. If the resource is a genomic file (gff, gtf, bed, vcf, fasta, bam), then a subregion can be sliced from the file by passing a list of regions in the ‘regions’ key of the resource dict.
This method requires the following external tools to be installed:
samtools
bedtools
htslib (bgzip, tabix)
Examples
>>> download_bgzip_slice(test_resources, "gencode_gff", "/path/to/testdata")
- rnalib.testdata.create_testdata(out_dir, resources=None, show_data_dir=False, mkdir=True)[source]
Download the test resources configured in the passed dict(s) unless they already exist.
- Parameters:
out_dir (str) – The directory where the test data will be created.
resources (dict, optional) – A dict or a list of dicts configuring the test resources. If None, the test_resources dict will be used.
show_data_dir (bool, optional) – If True, the resulting test data directory will be printed.
mkdir (bool, optional) – If True, the output directory will be created if it does not exist.