Help:Coombs

From CECS wiki
Ganglia page
http://coombs.cs.ucf.edu/ganglia
Sponsors and contact information
Research focus
bioinformatics
Operating System
Rocks linux / CentOS

Hardware

Totals:

  • 6 computation nodes
    • 3T ram
    • 212 cpu cores
    • 2 gpus
  • 2 storage nodes
    • 210T disk storage


Hardware specs breakdown:

  • 5 cpu nodes
    • 4x (512G, 32 cores, 8T scratch)
      • cpu model: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
    • 1x (1000G, 64 cores, 8T scratch)
      • cpu model: Intel(R) Xeon(R) Gold 6338N CPU @ 2.20GHz
  • 1 gpu node
    • 256G ram
    • 2x nVidia Turing
    • 20 cpu cores
    • cpu model: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
  • 2 storage nodes:
    • (16 cores, 256G) Disk space: 144T using raid 6
    • (56 cores, 500G) Disk space: 66T using raid 6
  • 10G network backbone


large memory jobs

Several users have complained about large memory jobs not being scheduled in a timely way. If you anticipate needing to run a large memory job (>100G but <500G), please talk to Steve about experimenting with large memory reservations. Two weeks' notice would be helpful to make sure the reservation can be placed before other long-running jobs absorb the resources.

slurm options

DO NOT use --nodes or --tasks-per-node slurm options!!!

Instead, use --cpus-per-task and, if you need more than 8G per core, --mem-per-cpu or --mem.
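
As a minimal sketch of a batch script that follows this rule: the job name, output file name, resource values and the my_program command below are placeholders chosen for illustration, not cluster defaults.

```shell
#!/bin/bash
# Hypothetical job name and output file; %j expands to the job id.
#SBATCH --job-name=example
#SBATCH --output=example-%j.out
# Request cores per task instead of using --nodes or --tasks-per-node.
#SBATCH --cpus-per-task=8
# Only needed when the job requires more than 8G per core.
#SBATCH --mem-per-cpu=12G

# slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# default to 1 so the script also runs outside of slurm.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "using ${threads} threads"

# Replace with the real command; my_program and its flag are placeholders.
# my_program --threads "${threads}" input.fa
```

Submitting this with sbatch lets the program's thread count follow the --cpus-per-task request, so the two values do not drift apart.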

Some codes may be restartable. This would allow your code to automatically restart in the event of a system crash. Please contact us if you want help making this work.

Temporary options: Options specific to the coombs cluster are:

--qos short
Designates a job as needing less than 5 hours to complete, allowing it to exceed other normal job limits
--qos shortp
Like short, but the job time is limited to 3 hours; after the limit, the job can continue as preemptable
-C zswap
Select a node with experimental larger virtual memory capability for extra large datasets
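
The options above might be passed to sbatch as sketched here. The script name job.sh, the core count and the explicit --time value are assumptions for illustration. The run helper is a hypothetical dry-run wrapper that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Dry-run helper (hypothetical): prints each command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Short job; the one hour limit is an assumed value well under the short limit.
run sbatch --qos short --time=01:00:00 --cpus-per-task=4 job.sh

# Short job that may continue as preemptable after its limit.
run sbatch --qos shortp --cpus-per-task=4 job.sh

# Job that asks for a node with the experimental zswap feature (-C is --constraint).
run sbatch -C zswap --cpus-per-task=4 job.sh
```

Running the sketch with DRY_RUN=0 on the cluster would perform the actual submissions.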

For a list of other common slurm options, see Help:Slurm#Slurm_options.

For a list of all slurm options, see the sbatch and srun man pages.

resource restrictions

To try to provide fair distribution of resources between users, slurm fairshare prioritization is enabled. Also, for jobs of unlimited length, users are restricted to using no more than 3 compute nodes at a time.

If you have many short jobs, you can exceed the 3 node limit with the option --qos short, which enforces a maximum time limit of 2 hours per job.
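
One possible way to submit many short jobs is a slurm job array combined with --qos short. This is a sketch under assumptions: the array range, the one hour time limit and the sample_N.fa input naming are hypothetical, and a job array is only one option for this kind of workload.

```shell
#!/bin/bash
#SBATCH --qos=short
# Assumed time limit, chosen to stay within the short limit.
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=1
# Three array elements, numbered 1 to 3 (hypothetical range).
#SBATCH --array=1-3

# slurm sets SLURM_ARRAY_TASK_ID for each array element;
# default to 1 so the script also runs outside of slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical input naming scheme: sample_1.fa, sample_2.fa, ...
input="sample_${task_id}.fa"
echo "task ${task_id} would process ${input}"
```

Each array element is scheduled as its own short job, so the per-job time limit applies to each element separately.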

Shared data

NCBI blast databases
/share/projects/DB/
Downloader: update_blastdb (dbname...)
Partial download; please request updates and additional data sets as necessary.
snake
/home/snake
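
For the NCBI blast databases entry, a minimal sketch of how the shared directory might be used with BLAST+. The database name nt and the file names in the commented blastn line are assumptions, since only a partial set of databases is downloaded.

```shell
#!/bin/bash
# Shared BLAST database directory listed above; BLAST+ reads BLASTDB
# to locate databases given by name.
BLASTDB=/share/projects/DB/
export BLASTDB

# List the databases that are actually present, when BLAST+ and the
# directory are available on this machine.
if command -v blastdbcmd >/dev/null 2>&1 && [ -d "${BLASTDB}" ]; then
    blastdbcmd -list "${BLASTDB}"
else
    echo "blastdbcmd or ${BLASTDB} not available on this machine"
fi

# Example search under slurm; "nt", query.fa and hits.txt are placeholders.
# srun --cpus-per-task=4 blastn -db nt -query query.fa -num_threads 4 -out hits.txt
```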

Software

Where indicated, these packages can be added to the environment with the module load command. Note that these lists may not be complete. Email us if you can't find your software package.

Most dependent modules will be automatically loaded. Modules listed in () are optional and must be separately loaded if features they provide are needed.

Software listed as (OS) in the modules column does not have a module as it is installed as part of the operating system.

Please note that computationally intensive and memory intensive software should not be run on the head node. Attempting to do so may produce errors. Please use slurm (srun or sbatch) to run nontrivial software that requires resources.

Software from past operating systems (Ubuntu 16) may no longer work. If you need software marked N, 16, or possibly even 18, let us know so we can check for updates! The listings for obsolete software are retained for reference.

installed software

name version module dependent modules
20 anaconda multiple anaconda3 or miniconda
20 Abyss genome assembler 2.2.4-1 (OS:abyss)
20 BBmap 38.79+dfsg-1 (OS:bbmap)
N beagle git 6/23/2017 beagle/gpu beagle/cpu (cuda) (java)
20 beagle 5.1-191125+dfsg-1 (OS:beagle)
20 bcftools 1.10-3 (OS:samtools)
20 BEAST2, beast2-mcmc 2.6.0+dfsg-1 (OS:beast2-mcmc,doc)
20 Bedtools 2.27.1+dfsg-4ubuntu1 (OS)
16 biobloom biobloom
16 bismark v0.20.0 bismark bowtie2 (samtools)
BLAST+ 2.7.1 blast+
20 Blast+ 2.9.0-2 (OS:ncbi-blast+)
20 Bowtie 1.2.3+dfsg-4build1 (OS:bowtie)
20 Bowtie2 2.3.5.1-6build1 (OS:bowtie2)
16 BPP 3.3a bpp
20 busco 5.4.4 miniconda: busco
20 bwa 0.7.17 (OS:bwa)
20 Canu 1.9+dfsg-1build1 (OS:canu)
20 CD-HIT 4.8.1-2build1 (OS:cd-hit)
16 CheckM checkm
20 circlator 1.5.6-1 (OS:circlator)
20 cufflinks 2.2.1+dfsg.1 (OS:cufflinks)
20 Cutadapt 2.8-2build1 (OS:cutadapt)
20 dadi 2.2.0 (see below)
16 DEMIC 1.0.2 demic
16 DSK git 2018-mar-19 dsk
20 NIH edirect ? edirect
16 FastStringGraph 0.10.13 faststringgraph
16 FragGeneScan 1.30 fraggenescan
18 fastsimcoal2 fsc27 (4/2022) fsc
16 GATK 4.0.2.1 no module: /share/apps/gatk-4.0.2.1
20 tigr-glimmer 3.02b (OS:tigr-glimmer)
20 gnuplot 5.2.8 (OS:gnuplot)
16 GRiD 1.3 anaconda3
20 HiCExplorer 3.7.2 anaconda3/2022.05
20 hifiasm (https://github.com/chhylp123/hifiasm) 0.18.4 hifiasm
20 HISAT2 2.1.0 (OS:hisat2) (replaces tophat)
20 HMMER 3.3+dfsg2-1 (OS:hmmer)
20 HMMER2 2.3.2+dfsg-6 (OS:hmmer2)
20 python-htseq 0.11.2-2build1 (OS:python3-htseq)
20 IDBA 1.1.3 (OS:idba)
16 iRep (OS/python3)
18 Jellyfish 2.2.9 single
20 Jellyfish 2.3.0-4build1 (OS:jellyfish)
* julia 1.0, 1.5 julia
20 Kallisto 0.46.1+dfsg-2build1 (OS:kallisto)
16 MaSuRCA 3.3.0 MaSuRCA (restartable!)
18 MEGAN 6.21.12 megan6
18 MetaMaps Dec 2020 metamaps
18 metaQuast (UNTESTED) 5.0.2 quast
18 migrate 5.0.3 migrate
20 minimap2 2.24 minimap2/2.24
20 MIPhy 1.1.2 anaconda3/2022.05 (command: miphy.py )
20 Mothur 1.42.1-1build1 (OS:mothur)
20 Mr. Bayes 3.2.6+dfsg (OS:mrbayes) (cuda?)
20 freebayes 1.3.2-2 (OS:freebayes)
20 MUMmer 3.23~dfsg (OS:mummer)
20 nanopolish 0.11.3-1build1 (OS:nanopolish)
16 ncbi-genome-downloader (python) head, dtn
18 Nextflow 21.04.3 nextflow
20 PAML 4.9j+dfsg-1 (OS:paml)
18 pbsim2 2020-dec-4 git pbsim2 (apt pbsim?)
PEAR git 4/20/2017 missing ?
20 picard 3.0.0 picard3 (java 17)
20 plink 1.07-6 (OS:plink)
20 Porechop 0.2.4 (OS)
16 pplacer v1.1.alpha17 pplacer
18, 20 prodigal 2.6.3 (OS:prodigal)
20 python3 3.8.2 (OS:python3)
20? qiime 1.8.0+dfsg-4ubuntu1 (OS? ampliconnoise q2templates sortmerna )
18, 20 qiime (multiple) (see below)
18 R 3.6.3 (OS/CRAN)
18 R 4.0.3 R
18 RAxML / exelixis-lab 8.2.11 (OS:raxml)
20 RAxML / exelixis-lab 8.2.12 (OS:raxml)
20 Salmon 8.2.12+dfsg-4 (OS:salmon)
20 racon 1.4.10-1build1 (OS:racon)
20 SequelTools git May 18, 2021 sequeltools
16 rMats 4.0.2 rmats (samtools)
20 RSEM 1.3.3+dfsg-1 (OS:rsem)
18 shapeit4 git 7-oct-2020 single
20 shapeit4 4.1+dfsg-1build1 (OS:shapeit4)
20 rna-star 2.7.3a+dfsg-1build2 (OS:rna-star)
20 Samtools 1.10-3 (OS:samtools,bcftools)
18 Samtools, bcftools, htslib 1.11, 1.15 samtools
20 Singularity renamed to apptainer 1.0.3 (OS)
18 SLiM 3.5 slim
14 SOAPdenovo-trans 1.02
20 Soapdenovo2 genome assembler 241+dfsg (OS:soapdenovo2)
16 SolexaQA v3.1.7.1 single
20 SPAdes 3.13.1+dfsg-2build2 (OS:spades)
20 stacks 2.41+dfsg-1build3 (OS:stacks)
20 stacks 2.63 stacks
16 STAR 2.7.3a single
20 stringtie 2.1.1+ds-2 (OS:stringtie)
N PrichardLab structure software 2.3.4 structure
16 SVDQuartets and PAUP 4a159 single:paup
20 subread 2.0.0+dfsg-1 (OS:subread)
16 taxonkit 0.5.0 single
20 transrate-tools 1.0.0 (OS:transrate-tools)
20 TRF 4.09.1 (2020) trf
20 Trinity 2.6.6+dfsg-6build2 (OS:trinityrnaseq)
20 Trimmomatic 0.39+dfsg-1 (OS:trimmomatic)
N Tophat replaced by hisat2 / OBSOLETE
20 tourmaline 2023.5 tourmaline (note: uses snakemake miniforge3 qiime2-2023.5a)
20 velvet 1.2.10+dfsg1 (OS:velvet)
20 vsearch 2.14.1-3build1 (OS:vsearch)
16 wgsim 24-jan-2019 single
20 dwgsim 0.1.12-3 (OS:dwgsim)
? ELPH 1.0.1 (was in glimmer)
20 ELPH 1.0.1-4build1 (OS:elph)

Python packages installed

Note: this lists requested packages only; additional packages are also installed.

Package Python notes
TensorFlow 3 GPU node only
Keras 3 GPU node only
biopython 3
pandas 3
Ipyrad 3 (openmpi)
ete3 3
bowtie 3
ncbi-genome-download 2 head, dtn
scikit-learn 3
virtualenv 3

Python virtual environments

Some python packages have been installed using conda in python virtual environments. To use these, load the corresponding python module (module load ...) and then activate the environment (source activate ...).

Package module venv
busco 5.4.4 miniconda busco
dadi 2.2.0 miniconda dadi
feems (untested) miniconda feems
qiime2-2020.11 anaconda3/97 qiime2-2020.11
qiime2-2021.4 anaconda3/97 qiime2-2021.4
qiime2 2021.4.0 miniconda qiime2-2021.4
qiime2-2023.2 miniconda qiime2-2023.2
qiime2-2023.5 and RESCRIPt miniforge3 qiime2-2023.5 (may be broken)
qiime2-2023.5a and tourmaline qiime2-2023.5a or tourmaline

The environments in a particular conda install can be listed with conda env list.

NOTE: If you get a segfault while running source activate ... stop running this on the head node! The source activate needs to be run under slurm as part of your job, or it will run out of resources.
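
A minimal sketch of a batch script that activates an environment under slurm rather than on the head node. The miniconda module and busco environment are taken from the table above; the core count and output file name are assumptions, and the actual busco command is left as a placeholder.

```shell
#!/bin/bash
# Assumed resources and output file name; %j expands to the job id.
#SBATCH --cpus-per-task=4
#SBATCH --output=busco-%j.out

# Module and environment names from the table above.
conda_module=miniconda
conda_env=busco

# The module command only exists on the cluster, so guard against
# running this sketch elsewhere.
if command -v module >/dev/null 2>&1; then
    module load "${conda_module}"
    source activate "${conda_env}"
    # Show which environments this conda install provides.
    conda env list
    # Placeholder for the real work, e.g. a busco command line.
else
    echo "module command not found; submit this script with sbatch on the cluster"
fi
```

Because the activation happens inside the job, it uses the resources allocated by slurm instead of those of the head node.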

Software not installed

These may be installed in the future or may have problems (as noted).

Shared python environment

A venv based on the anaconda3/2024.02-1 module was created and can be loaded with module load mgenomics. It was built with:

 srun -c 6 --pty conda create -n mgenomics -c conda-forge -c bioconda -c defaults prokka roary snp-sites iqtree quast fastqc unicycler snippy trimmomatic multiqc

failed or delayed installs

Name status notes
Racon build errors: malformed cmake older version successfully installed
Pilon java
ballgown R bioconductor package
fastqc java
MetaGene no source, 32bit binary only use web version
GeneMark licensing issues
Usearch licensing issues replaced with vsearch
ASTRAL strange run env -- ask for help
RevBayes
Migrate (2015)
BinPacker (doc) (download) buggy build scripts vcflib  
SignalP license issues use web version
PyMOL license issues

Selected python packages installed

Python 3 (anaconda3):

  • pyqt

Special notes

beagle
both cpu and gpu versions are available; the gpu version may work on machines without gpus as well
beast2
This can be used with or without beagle -- beagle should make it faster; it has not been tested with beagle.
mrbayes
alternate versions are available (mb-gpu mb-mpi mb-serial); this supports checkpointing, but sge configuration is required; ask if you have an extra long running job and want to try checkpointing, mpi, or cuda/gpu.
rsem
Additional functionality is available if the bowtie or opt-python modules are loaded.

Name

The coombs cluster was named after both