updated bedbase and geniml
khoroshevskyi committed Jan 29, 2025
1 parent 33b09bd commit 588613e
Showing 19 changed files with 862 additions and 226 deletions.
@@ -1,15 +1,23 @@
# How to create bedbase config file

### Bedbase config file is yaml file with 4 parts:
- paths and vector models
- relational database credentials
- qdrant credentials
- server information
- remote info
- pephub info
- s3 credentials
The BEDbase config file is a YAML file that provides the BEDbase server with credentials and paths to the required resources.

### Example:
### How to create a bedbase config file

There are two ways to create a bedbase config file: </br>
1. Create a new file and copy the content from the example below. </br>
2. Use the [BEDboss](../bedboss/README.md) command:
```bash
bedboss init-config --outfolder path/to/outfolder
```

### How to check if the config file is correct:
Use the [BEDboss](../bedboss/README.md) command:
```bash
bedboss verify-config --config path/to/config.yaml
```

### Example of the config file:
```yaml
path:
  remote_url_base: http://data.bedbase.org/
98 changes: 64 additions & 34 deletions docs/bedboss/README.md
@@ -13,7 +13,34 @@
A command-line tool and Python package for managing and processing genomic interval region files and bedsets in BEDbase.
BEDboss is closely related to BEDbase; nevertheless, it can be used as a standalone tool for calculating statistics, converting files, and verifying the quality of BED files.

### Main components:
---

## 💿 Installation
To install `bedboss`, use this command:
```
pip install bedboss
```
or install the latest version from the GitHub repository:
```
pip install git+https://github.com/databio/bedboss.git
```

---

## 💻 CLI usage:
Command line documentation is available here: [📑 CLI usage](./usage.md)

---

## 📑 BEDbase configuration file

To run most of the pipelines, you need to create a BEDbase configuration file.

How to create a BEDbase configuration file is described in the [configuration section](../bedbase/how-to-configure.md).

---

## 🗃️ Main components:

1) **bedmaker** - pipeline to convert various genomic interval file types into BED format and bigBed format. </br>
2) **bedqc** - quality assessment pipeline of bed files </br>
@@ -28,19 +55,7 @@ they are also available as a python functions, so that user can use them indepen

---

## Installation
To install `bedboss` use this command:
```
pip install bedboss
```
or install the latest version from the GitHub repository:
```
pip install git+https://github.com/databio/bedboss.git
```

---

## BEDboss dependencies
## 📦 BEDboss dependencies
Before running any of the pipelines, you need to install the required dependencies.

To check if all dependencies are installed, you can run the following command:
@@ -49,40 +64,39 @@ To check if all dependencies are installed, you can run the following command:
bedboss check-requirements
```

All dependencies can be using this how to documentation: [How to install dependencies](./how-to-install-requirements.md)

To install all R dependencies, you can run the following command:

---

## BEDbase configuration file
```bash
bedboss install-requirements
```

To run most of the pipelines, you need to create a BEDbase configuration file.
Some pipelines may also require UCSC tools to be installed on your system.
To install them, follow the instructions on the [UCSC website](https://genome.ucsc.edu/goldenpath/help/bigBed.html).
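`bedboss check-requirements` is the supported way to verify dependencies. As a quick standalone check, the minimal sketch below uses only the Python standard library (`shutil.which`) to report which UCSC binaries are missing from your `PATH`. The tool list (`bedToBigBed`, `bigBedToBed`) is an illustrative assumption taken from the conversion tools named in the requirements documentation, not the full set bedboss may need.

```python
import shutil

# Illustrative subset of UCSC binaries; extend as needed.
UCSC_TOOLS = ["bedToBigBed", "bigBedToBed"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(UCSC_TOOLS)
    if missing:
        print("Missing UCSC tools:", ", ".join(missing))
    else:
        print("All listed UCSC tools are on PATH.")
```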

How to create a BEDbase configuration file is described in the [configuration section](./how-to-configure.md).
---


---

## Pipelines information
## ℹ️ Short information about the pipelines:

### bedmaker
bedmaker - pipeline to convert supported file types* into BED format and bigBed format. Currently supported formats:
### - bedmaker
Bedmaker converts various genomic interval region set files to BED and bigBed formats and caches them using [Geniml bbclient](../geniml/bbclient/bbclient).

Supported formats are:
- bedGraph
- bigBed
- bigWig
- wig

### bedqc
flag bed files for further evaluation to determine whether they should be included in the downstream analysis.
### - bedqc
Evaluates whether BED files are statistically sound and whether they should be included in downstream analysis.
Currently, it flags BED files that are larger than 2 GB, have over 5 million regions, and/or have a mean region width of less than 10 bp.
These thresholds can be changed through the bedqc function arguments.
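To make the three thresholds concrete, here is a minimal sketch of the flagging logic. It is not bedqc's implementation: the `QcThresholds` and `qc_flags` names are hypothetical, "2G" is assumed to mean 2 GiB, and the parsing assumes plain whitespace-separated BED lines. Keeping the line iteration separate from file I/O makes the check easy to test on in-memory data.

```python
import os
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class QcThresholds:
    # Hypothetical names; defaults mirror the limits described above.
    max_file_size: int = 2 * 1024**3  # "2G", assumed to mean 2 GiB
    max_region_count: int = 5_000_000
    min_mean_width: float = 10.0  # base pairs


def qc_flags(lines: Iterable[str], file_size: int,
             thresholds: QcThresholds = QcThresholds()) -> list[str]:
    """Return the reasons a BED file would be flagged; an empty list means it passes."""
    flags = []
    if file_size > thresholds.max_file_size:
        flags.append("file too large")
    count = 0
    total_width = 0
    for line in lines:
        # Skip blank, comment, and UCSC track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        count += 1
        total_width += int(fields[2]) - int(fields[1])
    if count > thresholds.max_region_count:
        flags.append("too many regions")
    if count and total_width / count < thresholds.min_mean_width:
        flags.append("mean region width too small")
    return flags


def qc_flags_for_file(path: str, thresholds: QcThresholds = QcThresholds()) -> list[str]:
    """Apply qc_flags to a BED file on disk."""
    with open(path) as handle:
        return qc_flags(handle, os.path.getsize(path), thresholds)
```

Overriding a threshold is a matter of passing, for example, `QcThresholds(min_mean_width=5)`.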

### bedstat

pipeline for obtaining statistics about bed files
### - bedstat

It produces BED file Statistics:
Pipeline for obtaining statistics about bed files. Statistics include:

- **GC content**. The average GC content of the region set.
- **Number of regions**. The total number of regions in the BED file.
@@ -96,16 +110,32 @@ It produces BED file Statistics:
- **5' UTR percentage**. The percentage of the regions in the BED file that are annotated as 5'-UTR.
- **3' UTR percentage**. The percentage of the regions in the BED file that are annotated as 3'-UTR.
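The exact definitions bedstat uses are not shown here. As an illustration only, the sketch below computes the number of regions, the mean region width, and an average GC content. It assumes "average GC content" means the mean of per-region GC fractions, and it uses an in-memory dict of sequences as a toy stand-in for a reference FASTA; `region_stats` is a hypothetical name.

```python
from statistics import mean


def region_stats(regions, genome):
    """Compute a few simple statistics for a region set.

    regions: list of (chrom, start, end) tuples using BED's 0-based, half-open coordinates.
    genome:  dict of chromosome name -> sequence string (toy stand-in for a reference FASTA).
    """
    gc_fractions = []
    for chrom, start, end in regions:
        seq = genome[chrom][start:end].upper()
        # An empty slice means a zero-width region or one beyond the sequence end.
        if seq:
            gc_fractions.append((seq.count("G") + seq.count("C")) / len(seq))
    return {
        "number_of_regions": len(regions),
        "mean_region_width": mean(end - start for _, start, end in regions),
        # Assumption: "average GC content" = mean of per-region GC fractions.
        "gc_content": mean(gc_fractions),
    }


if __name__ == "__main__":
    toy_genome = {"chr1": "GGCCAATTGCGC"}
    regions = [("chr1", 0, 4), ("chr1", 4, 8), ("chr1", 8, 12)]
    print(region_stats(regions, toy_genome))
```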

### bedbuncher
### - bedbuncher

Pipeline designed to create **bedsets** (sets of BED files) that will be retrieved from bedbase.
Pipeline designed to create **bedsets** (collections of BED files) that will be retrieved from bedbase.

Example bedsets:

- Bed files from the AML database.
- Bed files from the [Excluderanges](https://github.com/dozmorovlab/excluderanges#bedbase-data-download) database.
- Bed files from the LOLA database [http://lolaweb.databio.org/](http://lolaweb.databio.org/)

Bedbuncher calculates statistics:
- Bedset statistics (currently means and standard deviations).
\*This pipeline is available only for BEDbase processing and can't be used as a standalone tool.
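Bedbuncher reports bedset statistics as means and standard deviations. A minimal sketch of that kind of aggregation, assuming each BED file's statistics are available as a dict of numbers, is shown below. The `bedset_stats` name is hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption; bedbuncher may compute it differently.

```python
from statistics import mean, stdev


def bedset_stats(file_stats):
    """Aggregate per-file statistics into bedset-level means and standard deviations.

    file_stats: non-empty list of dicts, one per BED file, mapping statistic name -> number.
    Only statistics reported for every file are aggregated.
    """
    shared = set.intersection(*(set(stats) for stats in file_stats))
    summary = {}
    for name in sorted(shared):
        values = [stats[name] for stats in file_stats]
        summary[name] = {
            "mean": mean(values),
            # Sample standard deviation is undefined for one value, so report 0.0 instead.
            "sd": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary
```

Restricting to statistics shared by every file avoids mixing means computed over different subsets of the bedset.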

### - bedclassifier

Pipeline for classifying bed files based on their columns.
Example output of bedclassifier: bed_format `narrowpeak`/`broadpeak`/`bed` and bed_type `bed3+5`.
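bedclassifier's actual rules are not described here. As a rough illustration of classifying a file by its columns, the hypothetical helper below labels a single BED row as `bedN+M`: N counts the leading columns that match the standard BED column types it checks (chrom/start/end, name, score 0 to 1000, strand), and M counts everything after them. It is a simplified stand-in, not bedclassifier's algorithm.

```python
def classify_bed_type(row: list[str]) -> str:
    """Label one BED row as 'bedN+M' (hypothetical heuristic, not bedclassifier's rules).

    N counts leading columns matching the standard BED column types checked here;
    M counts all remaining columns.
    """
    if len(row) < 3 or not (row[1].isdigit() and row[2].isdigit()):
        raise ValueError("a BED row needs chrom, start and end columns")
    # Checks for standard BED columns 4-6: name, score (0-1000) and strand.
    checks = [
        lambda value: value != "",
        lambda value: value.isdigit() and int(value) <= 1000,
        lambda value: value in {"+", "-", "."},
    ]
    n = 3
    for value, is_valid in zip(row[3:], checks):
        if not is_valid(value):
            break
        n += 1
    return f"bed{n}+{len(row) - n}"
```

For example, a narrowPeak row (chrom, start, end, name, score, strand plus four peak columns) is labeled `bed6+4`, which matches the standard description of narrowPeak as BED6+4.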

### - refgenome_validator

Pipeline for validating the reference genome of the bed files.
It is a standalone tool and can be used independently. It validates and predicts the reference genome of BED files
by comparing the regions in a BED file with the reference genomes. It produces a ranking of the reference genomes,
where 1 is the best match and 4 is the worst match.
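The ranking method refgenome_validator uses is not described here, so the sketch below swaps in a simple stand-in to show the general idea: each candidate genome is scored by the fraction of regions that lie on one of its chromosomes and within that chromosome's length, and genomes are then ranked with 1 as the best match. The function name, genome names, and chromosome sizes are hypothetical.

```python
def rank_genomes(regions, chrom_sizes_by_genome):
    """Rank candidate genomes by how many regions fit within their chromosome sizes.

    regions: non-empty list of (chrom, start, end) tuples.
    chrom_sizes_by_genome: dict of genome name -> {chrom name: chrom length}.
    Returns a list of (rank, genome, fraction_compatible); rank 1 is the best match.
    """
    scores = {}
    for genome, sizes in chrom_sizes_by_genome.items():
        compatible = sum(
            1 for chrom, start, end in regions
            if chrom in sizes and 0 <= start < end <= sizes[chrom]
        )
        scores[genome] = compatible / len(regions)
    # Higher fraction ranks first; ties keep input order (sorted is stable).
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, genome, score) for rank, (genome, score) in enumerate(ordered, start=1)]
```

A real validator would likely weigh more evidence than chromosome names and bounds, but this captures the "compare regions against each genome, then rank" structure.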


### - bbuploader (also known as GEO uploader)

Module for uploading BED files from the GEO database to the BEDbase database and processing them. The data for uploading files
is taken from the PEPhub database, where all GEO metadata is stored.
18 changes: 17 additions & 1 deletion docs/bedboss/changelog.md
@@ -2,6 +2,22 @@

This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.

# [0.5.0] - 2025-01-16

## Added

- Added open_chromatin plot back into processing.
- Added the gtrs dependency, which calculates GC content.
- Added a skipper that automatically skips PEP samples that were already processed.
- Added lite mode to the main functions, which allows uploading without any heavy processing.
- Added a function that reprocesses files that were left unprocessed in BEDbase.
- Added a function that predicts the genome if no genome was provided.

## Fixes
- Important speed improvements.
- Improved requirements checker.
- Minor bug fixes.

# [0.4.1] - 2024-09-20
## Added
- Standardization of PEPs using the BEDbase bedms schema
@@ -45,4 +61,4 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

## [0.1.0] - 2024-01-26
### Added
- Initial alpha release
- Initial alpha release
12 changes: 6 additions & 6 deletions docs/bedboss/how-to-install-requirements.md
@@ -1,13 +1,13 @@
# How to install R dependencies

0. Install bedboss
1. Install R: https://cran.r-project.org/bin/linux/ubuntu/fullREADME.html
2. Download this script: [installRdeps.R](https://github.com/databio/bedboss/blob/dev/scripts/installRdeps.R)
3. Install dependencies by running this command in your terminal: ```Rscript installRdeps.R```
4. Run `bedboss check-requirements` to check if everything was installed correctly.
Before running any of the pipelines, you need to install the required R dependencies.

To do so, you can run the following command:
```bash
bedboss install-requirements
```

# How to install regionset conversion tools:
# How to install genomic interval region conversion tools:

- **bedToBigBed**: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed
- **bigBedToBed**: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigBedToBed
3 changes: 3 additions & 0 deletions docs/bedboss/tutorials/cli/README.md
@@ -0,0 +1,3 @@
# BEDboss CLI

To get information about the BEDboss command-line interface, please refer to the [📑 CLI usage](../../usage.md) documentation.
@@ -4,7 +4,7 @@ Bedbuncher is used to create bedset of bed files in the bedbase database.

### 1) Create bedbase config file

How to create config file: [configuration section](../how-to-configure.md).
How to create a config file: [configuration section](../../../bedbase/how-to-configure.md).


### 2) Create pep with bed file record identifiers.
@@ -1 +1,4 @@
# BED classifier tutorial


### 🚧 Tutorial in progress! Stay tuned for updates. We're working hard to bring you valuable content soon!
@@ -2,7 +2,7 @@

### 1. Create bedbase config file

How to create a BEDbase configuration file is described in the [configuration section](../how-to-configure.md).
How to create a BEDbase configuration file is described in the [configuration section](../../../bedbase/how-to-configure.md).


### 2. Run bedboss index
File renamed without changes.
File renamed without changes.
File renamed without changes.
3 changes: 3 additions & 0 deletions docs/bedboss/tutorials/python/ref_genome_tutorial.md
@@ -0,0 +1,3 @@
# Reference genome validator

### 🚧 Tutorial in progress! Stay tuned for updates. We're working hard to bring you valuable content soon!
@@ -14,7 +14,7 @@ If requirements are not satisfied, you will see the list of missing packages.

### Step 2: Create bedconf.yaml file
To run bedboss, you need to create a bedconf.yaml file with configuration.
Detail instructions are in the [configuration section](../how-to-configure.md).
Detailed instructions are in the [configuration section](../../../bedbase/how-to-configure.md).

### Step 3: Run bedboss
To run bedboss, run the following command:
@@ -15,7 +15,7 @@ If requirements are not satisfied, you will see the list of missing packages.

### Step 2: Create bedconf.yaml file
To run bedboss run-pep, you need to create a bedconf.yaml file with configuration.
Detailed instructions are in the [configuration section](../how-to-configure.md).
Detailed instructions are in the [configuration section](../../../bedbase/how-to-configure.md).

### Step 3: Create PEP with bed files.
A BEDboss PEP should contain the following fields: sample_name, input_file, input_type, genome.
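As an illustration, the minimal sketch below writes a sample table with these four fields as CSV using only the standard library, and fails fast if a sample is missing one of them. The `write_sample_table` helper is hypothetical, and the example values (file path, `input_type` value, genome) are placeholders; check the BEDboss documentation for the accepted `input_type` and genome values.

```python
import csv
import io

# Fields a BEDboss PEP sample table should contain, per the list above.
REQUIRED_COLUMNS = ["sample_name", "input_file", "input_type", "genome"]


def write_sample_table(samples, handle):
    """Write samples as a CSV sample table, failing fast if a required field is missing or empty."""
    for sample in samples:
        missing = [col for col in REQUIRED_COLUMNS if not sample.get(col)]
        if missing:
            raise ValueError(f"sample {sample.get('sample_name', '?')!r} is missing {missing}")
    # Keep any extra columns after the required ones instead of dropping them.
    extra = sorted({key for sample in samples for key in sample} - set(REQUIRED_COLUMNS))
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS + extra)
    writer.writeheader()
    writer.writerows(samples)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_sample_table(
        [{"sample_name": "sample1", "input_file": "data/sample1.bed.gz",
          "input_type": "bed", "genome": "hg38"}],
        buffer,
    )
    print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO` buffer, open it with `newline=""`, as the `csv` module documentation recommends.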