updated bedbase and geniml

databio · Jan 29, 2025 · 588613e
1 parent 33b09bd
commit 588613e
Showing 19 changed files with 862 additions and 226 deletions.
diff --git a/docs/bedboss/how-to-configure.md b/docs/bedbase/how-to-configure.md
@@ -1,15 +1,23 @@
 # How to create bedbase config file
 
-### Bedbase config file is yaml file with 4 parts:
-- paths and vector models
-- relational database credentials
-- qdrant credentials
-- server information
-- remote info
-- pephub info
-- s3 credentials
+The BEDbase config file provides the BEDbase server with credentials and paths to the required resources.
 
-### Example:
+### How to create a bedbase config file
+
+There are two ways to create a bedbase config file: </br>
+1. Create a new file and copy the content from the example below. </br>
+2. Use [BEDboss](../bedboss/README.md) command:
+```bash
+bedboss init-config --outfolder path/to/outfolder
+```
+
+### How to check if the config file is correct:
+Use [BEDboss](../bedboss/README.md) command:
+```bash
+bedboss verify-config --config path/to/config.yaml
+```
+
+### Example of the config file:
 ```yaml
 path:
   remote_url_base: http://data.bedbase.org/

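The two commands in this hunk can also be driven from a script. The following is a minimal Python sketch, assuming `bedboss` is installed and on `PATH`; the helper names (`init_config_cmd`, `verify_config_cmd`, `run`) are hypothetical, and only the subcommands and flags shown above are used. How `bedboss verify-config` reports a bad config (exit code versus printed message) is not stated in the docs, so the return code is passed through unchanged.

```python
import shutil
import subprocess


def init_config_cmd(outfolder: str) -> list[str]:
    """Build the argv for `bedboss init-config --outfolder <outfolder>`."""
    return ["bedboss", "init-config", "--outfolder", outfolder]


def verify_config_cmd(config: str) -> list[str]:
    """Build the argv for `bedboss verify-config --config <config>`."""
    return ["bedboss", "verify-config", "--config", config]


def run(cmd: list[str]) -> int:
    """Run the command and return its exit code, or -1 if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return -1
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Paths are the placeholders used in the documentation, not real locations.
    print(run(init_config_cmd("path/to/outfolder")))
    print(run(verify_config_cmd("path/to/config.yaml")))
```

Building the argument list separately from running it keeps the command easy to log or inspect before anything is executed.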
diff --git a/docs/bedboss/README.md b/docs/bedboss/README.md
@@ -13,7 +13,34 @@
 A command-line tool and Python package for managing and processing genomic interval region files and bedsets in BEDbase.
 BEDboss is closely related to BEDbase; nevertheless, it can be used as a standalone tool for calculating statistics, converting files, and verifying the quality of BED files.
 
-### Main components:
+---
+
+## 💿 Installation
+To install `bedboss` use this command: 
+```
+pip install bedboss
+```
+or install the latest version from the GitHub repository:
+```
+pip install git+https://github.com/databio/bedboss.git
+```
+
+---
+
+## 💻 CLI usage:
+Command line documentation is available here: [📑 CLI usage ](./usage.md)
+
+---
+
+## 📑 BEDbase configuration file
+
+To run most of the pipelines, you need to create a BEDbase configuration file.
+
+How to create a BEDbase configuration file is described in the [configuration section](../bedbase/how-to-configure.md).
+
+---
+
+## 🗃️ Main components:
 
 1) **bedmaker** - pipeline to convert various genomic interval file types into BED format and bigBed format. </br>
 2) **bedqc** - quality assessment pipeline of bed files </br>
@@ -28,19 +55,7 @@ they are also available as a python functions, so that user can use them indepen
 
 ---
 
-## Installation
-To install `bedboss` use this command: 
-```
-pip install bedboss
-```
-or install the latest version from the GitHub repository:
-```
-pip install git+https://github.com/databio/bedboss.git
-```
-
----
-
-## BEDboss dependencies
+## 📦 BEDboss dependencies
 Before running any of the pipelines, you need to install the required dependencies.
 
 To check if all dependencies are installed, you can run the following command:
@@ -49,40 +64,39 @@ To check if all dependencies are installed, you can run the following command:
 bedboss check-requirements
 ```
 
-All dependencies can be using this how to documentation: [How to install dependencies](./how-to-install-requirements.md)
 
+To install all R dependencies, you can run the following command:
 
----
-
-## BEDbase configuration file
+```bash
+bedboss install-requirements
+```
 
-To run most of the pipelines, you need to create a BEDbase configuration file.
+Additionally, you may need UCSC tools installed on your system.
+To install UCSC tools, follow the instructions on the [UCSC website](https://genome.ucsc.edu/goldenpath/help/bigBed.html).
 
-How to create a BEDbase configuration file is described in the [configuration section](./how-to-configure.md).
+---
 
 
----
 
-## Pipelines information
+## ℹ️ Short information about the pipelines:
 
-### bedmaker
-bedmaker - pipeline to convert supported file types* into BED format and bigBed format. Currently supported formats:
+### - bedmaker
+Bedmaker converts different interval region set files to BED and bigBed formats and caches them using [Geniml bbclient](../geniml/bbclient/bbclient).
 
+Supported formats are:
 - bedGraph
 - bigBed
 - bigWig
 - wig
 
-### bedqc
-flag bed files for further evaluation to determine whether they should be included in the downstream analysis. 
+### - bedqc
+Evaluates whether bed files are statistically sound and whether they should be included in the downstream analysis. 
 Currently, it flags bed files that are larger than 2G, have over 5 million regions, and/or have a mean region width of less than 10 bp.
 These thresholds can be changed in bedqc function arguments.
 
-### bedstat
-
-pipeline for obtaining statistics about bed files
+### - bedstat
 
-It produces BED file Statistics:
+Pipeline for obtaining statistics about bed files. Statistics include:
 
 - **GC content**. The average GC content of the region set. 
 - **Number of regions**. The total number of regions in the BED file. 
@@ -96,16 +110,32 @@ It produces BED file Statistics:
 - **5' UTR percentage**. The percentage of the regions in the BED file that are annotated as 5'-UTR.
 - **3' UTR percentage**. The percentage of the regions in the BED file that are annotated as 3'-UTR.
 
-### bedbuncher
+### - bedbuncher
 
-Pipeline designed to create **bedsets** (sets of BED files) that will be retrieved from bedbase.
+Pipeline designed to create **bedsets** (collections of BED files) that will be retrieved from bedbase.
 
 Example bedsets:
 
 - Bed files from the AML database.
 - Bed files from the [Excluderanges](https://github.com/dozmorovlab/excluderanges#bedbase-data-download) database.
 - Bed files from the LOLA database [http://lolaweb.databio.org/](http://lolaweb.databio.org/)
 
-Bedbuncher calculates statistics:
-- Bedset statistics (currently means and standard deviations).
+\*This pipeline is available only for bedbase processing and can't be used as a standalone tool.
+
+### - bedclassifier
+
+Pipeline for classifying bed files based on their columns.
+Example output of the bedclassifier is bed_format: `narrowpeak`/`broadpeak`/`bed` and bed_type: `bed3+5`.
+
+### - refgenome_validator
+
+Pipeline for validating the reference genome of the bed files. 
+It is a standalone tool and can be used independently. It tries to validate and predict the reference genome of the bed files
+by comparing the regions in the bed file with the reference genome. It produces a ranking of the reference genomes,
+where 1 is the best match and 4 is the worst match.
+
+
+### - bbuploader  (correct name GEO uploader)
 
+Module for uploading bed files from the GEO database to the BEDbase database and processing them. Data for uploading files 
+are taken from the PEPhub database, where all GEO metadata is stored.
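The bedqc thresholds quoted in the README hunk above (file larger than 2G, more than 5 million regions, mean region width below 10 bp) can be illustrated with a small Python sketch. This is not the bedqc implementation: the function name `qc_flags`, the flag labels, and the reading of "2G" as 2 GiB are assumptions made for illustration. The keyword arguments mirror the README's note that the thresholds can be changed through function arguments.

```python
# Thresholds quoted in the README. Interpreting "2G" as 2 GiB is an assumption.
MAX_FILE_SIZE_BYTES = 2 * 1024**3
MAX_REGION_COUNT = 5_000_000
MIN_MEAN_REGION_WIDTH = 10  # bp


def qc_flags(
    file_size_bytes: int,
    regions: list[tuple[int, int]],
    max_file_size: int = MAX_FILE_SIZE_BYTES,
    max_regions: int = MAX_REGION_COUNT,
    min_mean_width: float = MIN_MEAN_REGION_WIDTH,
) -> list[str]:
    """Return the (hypothetical) labels of every threshold the file violates."""
    flags = []
    if file_size_bytes > max_file_size:
        flags.append("file_size")
    if len(regions) > max_regions:
        flags.append("region_count")
    if regions:
        # Region width is end minus start for each (start, end) pair.
        mean_width = sum(end - start for start, end in regions) / len(regions)
        if mean_width < min_mean_width:
            flags.append("mean_region_width")
    return flags
```

An empty list means the file passes all three checks; any returned label marks the file for further evaluation rather than rejecting it outright.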
diff --git a/docs/bedboss/changelog.md b/docs/bedboss/changelog.md
@@ -2,6 +2,22 @@
 
 This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.
 
+# [0.5.0] - 2025-01-16
+
+## Added
+
+- Added open_chromatin plot back into processing.
+- Added gtrs dependency, that calculates gc content.
+- Added skipper that automatically skips samples in pep that were already processed.
+- Added lite functionality to main functions that allows running uploads without any heavy processing.
+- Added function that will reprocess files, if they were unprocessed in the bedbase.
+- Added function that predicts genome if genome wasn't provided.
+
+## Fixes
+- Important speed improvements.
+- Improved requirements checker.
+- Minor bug fixes.
+
 # [0.4.1] - 2024-09-20
 ## Added
 - Standardization of peps using bedbase bedms schema
@@ -45,4 +61,4 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 ## [0.1.0] - 2024-01-26
 ### Added
-- Initial alpha release
+- Initial alpha release
diff --git a/docs/bedboss/how-to-install-requirements.md b/docs/bedboss/how-to-install-requirements.md
@@ -1,13 +1,13 @@
 # How to install R dependencies
 
-0. Install bedboss
-1. Install R: https://cran.r-project.org/bin/linux/ubuntu/fullREADME.html
-2. Download this script: [installRdeps.R](https://github.com/databio/bedboss/blob/dev/scripts/installRdeps.R)
-3. Install dependencies by running this command in your terminal: ```Rscript installRdeps.R```
-4. Run `bedboss check-requirements` to check if everything was installed correctly.
+Before running any of the pipelines, you need to install the required R dependencies.
 
+To do so, you can run the following command:
+```bash
+bedboss install-requirements
+```
 
-# How to install regionset conversion tools:
+# How to install genomic interval region conversion tools:
 
 - **bedToBigBed**: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed
 - **bigBedToBed**: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigBedToBed

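A quick way to confirm that the conversion tools are reachable is to look them up on `PATH`. The sketch below uses only Python's standard `shutil.which`; the helper name `missing_tools` is hypothetical, and the tool list contains only the two names visible in this hunk, so the full file may list more.

```python
import shutil

# Only the two tools visible in this hunk; the full document may list others.
UCSC_TOOLS = ("bedToBigBed", "bigBedToBed")


def missing_tools(tools: tuple[str, ...] = UCSC_TOOLS) -> list[str]:
    """Return the tool names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing UCSC tools:", ", ".join(missing))
    else:
        print("All listed UCSC tools were found on PATH.")
```

This complements `bedboss check-requirements` rather than replacing it, since it checks only for the presence of the executables.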
diff --git a/docs/bedboss/tutorials/cli/README.md b/docs/bedboss/tutorials/cli/README.md
@@ -0,0 +1,3 @@
+# BEDboss cli
+
+To get information about the BEDboss command line interface, please refer to the [📑 CLI usage ](../../usage.md) documentation.
diff --git a/.../bedboss/tutorials/bedbuncher_tutorial.md → ...s/tutorials/python/bedbuncher_tutorial.md b/.../bedboss/tutorials/bedbuncher_tutorial.md → ...s/tutorials/python/bedbuncher_tutorial.md
@@ -4,7 +4,7 @@ Bedbuncher is used to create bedset of bed files in the bedbase database.
 
 ### 1) Create bedbase config file
 
-How to create config file: [configuration section](../how-to-configure.md).
+How to create config file: [configuration section](../../../bedbase/how-to-configure.md).
 
 
 ### 2) Create pep with bed file record identifiers.

diff --git a/...dboss/tutorials/bedclassifier_tutorial.md → ...utorials/python/bedclassifier_tutorial.md b/...dboss/tutorials/bedclassifier_tutorial.md → ...utorials/python/bedclassifier_tutorial.md
@@ -1 +1,4 @@
+# BED classifier tutorial
+
+
 ### 🚧 Tutorial in progress! Stay tuned for updates. We're working hard to bring you valuable content soon!
diff --git a/docs/bedboss/tutorials/bedindex_tutorial.md → ...oss/tutorials/python/bedindex_tutorial.md b/docs/bedboss/tutorials/bedindex_tutorial.md → ...oss/tutorials/python/bedindex_tutorial.md
@@ -2,7 +2,7 @@
 
 ### 1. Create bedbase config file
 
-How to create a BEDbase configuration file is described in the [configuration section](../how-to-configure.md).
+How to create a BEDbase configuration file is described in the [configuration section](../../../bedbase/how-to-configure.md).
 
 
 ### 2. Run bedboss index

diff --git a/docs/bedboss/tutorials/bedmaker_tutorial.md → ...oss/tutorials/python/bedmaker_tutorial.md b/docs/bedboss/tutorials/bedmaker_tutorial.md → ...oss/tutorials/python/bedmaker_tutorial.md
diff --git a/docs/bedboss/tutorials/bedqc_tutorial.md → ...edboss/tutorials/python/bedqc_tutorial.md b/docs/bedboss/tutorials/bedqc_tutorial.md → ...edboss/tutorials/python/bedqc_tutorial.md
diff --git a/docs/bedboss/tutorials/bedstat_tutorial.md → ...boss/tutorials/python/bedstat_tutorial.md b/docs/bedboss/tutorials/bedstat_tutorial.md → ...boss/tutorials/python/bedstat_tutorial.md
diff --git a/docs/bedboss/tutorials/python/ref_genome_tutorial.md b/docs/bedboss/tutorials/python/ref_genome_tutorial.md
@@ -0,0 +1,3 @@
+# Reference genome validator
+
+### 🚧 Tutorial in progress! Stay tuned for updates. We're working hard to bring you valuable content soon!
diff --git a/docs/bedboss/tutorials/tutorial_all.md → .../bedboss/tutorials/python/tutorial_all.md b/docs/bedboss/tutorials/tutorial_all.md → .../bedboss/tutorials/python/tutorial_all.md
@@ -14,7 +14,7 @@ If requirements are not satisfied, you will see the list of missing packages.
 
 ### Step 2: Create bedconf.yaml file 
 To run bedboss, you need to create a bedconf.yaml file with configuration. 
-Detail instructions are in the [configuration section](../how-to-configure.md).
+Detailed instructions are in the [configuration section](../../../bedbase/how-to-configure.md).
 
 ### Step 3: Run bedboss
 To run bedboss, you need to run the following command:

diff --git a/docs/bedboss/tutorials/tutorial_run_pep.md → ...boss/tutorials/python/tutorial_run_pep.md b/docs/bedboss/tutorials/tutorial_run_pep.md → ...boss/tutorials/python/tutorial_run_pep.md
@@ -15,7 +15,7 @@ If requirements are not satisfied, you will see the list of missing packages.
 
 ### Step 2: Create bedconf.yaml file 
 To run bedboss run-pep, you need to create a bedconf.yaml file with configuration. 
-Detailed instructions are in the [configuration section](../how-to-configure.md).
+Detailed instructions are in the [configuration section](../../../bedbase/how-to-configure.md).
 
 ### Step 3: Create PEP with bed files.
 BEDboss PEP should contain the following fields: sample_name, input_file, input_type, genome.
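The four required fields can be checked before a run with a short Python sketch. It assumes the PEP sample table is a CSV file, which is the usual PEP layout. The helper name `missing_fields` and the example row values (`sample1`, `path/to/sample1.bed`, `bed`, `hg38`) are hypothetical and used only for illustration.

```python
import csv
import io

# Field names listed in the tutorial.
REQUIRED_FIELDS = ("sample_name", "input_file", "input_type", "genome")


def missing_fields(sample_table_csv: str) -> list[str]:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(sample_table_csv))
    header = reader.fieldnames or []
    return [field for field in REQUIRED_FIELDS if field not in header]


# Illustrative sample table; the row values are placeholders.
example = (
    "sample_name,input_file,input_type,genome\n"
    "sample1,path/to/sample1.bed,bed,hg38\n"
)

if __name__ == "__main__":
    print(missing_fields(example))
```

To check a real project, read the sample table file into a string and pass it to `missing_fields`.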