Requesting an account

Request an account by registering at the Genome Center computing page.

Logging on

Use ssh or mosh to log on to the cluster head node at barbera.genomecenter.ucdavis.edu

First, modify .ssh/config on your computer to contain:

GSSAPIAuthentication=yes
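If you prefer not to enable GSSAPI for every host, a Host block limits it to the cluster (a minimal sketch to put in the same .ssh/config):

Host barbera.genomecenter.ucdavis.edu
    GSSAPIAuthentication yes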

Then, at the Linux/Unix command line on your computer, enter:

kinit -l 14d  jmaloof@GENOMECENTER.UCDAVIS.EDU #14 day max; change the username to yours...

Once you log on to Barbera with mosh or ssh, you will find that you do not have permission for your home directory. To authorize yourself:

kinit -l14d #only needs to be done once every 14 days
aklog # enables your home directory.  If you get a message about not being authorized, then do kinit first.
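
To see whether your current Kerberos ticket is still valid (and when it expires), use klist:

klist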

Working directory

Your home directory has little storage space. For analyses, please use the Maloof Lab share:

cd /share/malooflab

Please create your own directory within malooflab.
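
For example (the directory name here is a placeholder; use your own name):

cd /share/malooflab
mkdir -p your_username
cd your_username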

Shared files

Please keep genome fasta files, etc. at:

cd /share/malooflab/ref_genomes

Using the cluster

Most analyses are done by submitting a batch script to the queue. Do not run analyses from the head node!!

modules

You will need to load modules that contain the programs you want to use. You can see what is available with:

module avail
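
Then load what you need; the STAR version shown here is just the one used in the example script below:

module load star/2.5.2b   # load a specific version
module list               # show currently loaded modules
module unload star/2.5.2b # unload when finished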

Slurm Script

You need to create or edit a Slurm script to submit commands for processing. These scripts can either be simple (a single job) or an array job.

single job

The script below runs STAR on one fastq file at a time, using 16 CPUs:

  #!/bin/bash
  #SBATCH --partition=production # partition to submit to
  #SBATCH --job-name=Sol150_Star_Run # Job name
  #SBATCH --nodes=1 # single node, anything more than 1 will not run
  #SBATCH --ntasks=16 # equivalent to cpus
  #SBATCH --mem=100000 # in MB, memory pool all cores, default is 2GB per cpu
  #SBATCH --time=1-00:00:00  # expected time of completion in days, hours, minutes, seconds, default 1-day
  #SBATCH --output=Sol150_Star_Run_single.out # STDOUT
  #SBATCH --error=Sol150_Star_Run_single.err # STDERR
  #SBATCH --mail-user=jnmaloof@ucdavis.edu #
  #SBATCH --mail-type=ALL #
  # This will be run once for a single process
  /bin/hostname

  start=`date +%s`

  # Load STAR Module 2.5.2b

  module load star/2.5.2b

  # Change directory

  cd /share/malooflab/Julin/Solanum/Sol150

  #files=`ls fastq`
  files=`ls -1 fastq_cat | head -n 1` # head -n 1 limits the run to the first fastq file only; remove it to process all files
  for f in $files
  do
        fbase=`basename $f .fastq.gz`
        mkdir -p STAR/${fbase}.STARout
        STAR \
        --genomeDir /share/malooflab/ref_genomes/S_lycopersicum/SL3.00_STAR_REF \
        --readFilesIn /share/malooflab/Julin/Solanum/Sol150/fastq_cat/${f} \
        --quantMode TranscriptomeSAM GeneCounts \
        --twopassMode Basic \
        --alignIntronMax 10000 \
        --runThreadN 16 \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix ./STAR/${fbase}.STARout/${fbase}_ \
        --outReadsUnmapped Fastx \
        --outSAMattrRGline ID:${fbase} \
        --readFilesCommand zcat

  done

  end=`date +%s`
  runtime=$((end-start))
  echo $runtime seconds to completion

array job

This script creates a separate job for each fastq file:

#!/bin/bash
#SBATCH --partition=production # partition to submit to
#SBATCH --job-name=Brapa_Kallisto # Job name
#SBATCH --array=0-63 #for this script adjust to match number of fastq files
#SBATCH --nodes=1 # single node, anything more than 1 will not run
#SBATCH --ntasks=01 # equivalent to cpus, stick to around 20 max on gc64, or gc128 nodes
#SBATCH --mem=4000 # in MB, memory pool all cores, default is 2GB per cpu
#SBATCH --time=0-01:00:00  # expected time of completion in hours, minutes, seconds, default 1-day
#SBATCH --output=Kallisto_%A_%a.out # STDOUT
#SBATCH --error=Kallisto_%A_%a.err # STDERR
#SBATCH --mail-user=jnmaloof@ucdavis.edu #
#SBATCH --mail-type=ALL #

# This will be run once for a single process

/bin/hostname

start=`date +%s`

# Load Kallisto

module load kallisto

# Change directory

cd /share/malooflab/Julin/Brapa_microbes/20180202-samples/

# Identify each array run

echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

# create an array of file names:
filelist=($(ls 20180202-data/raw-fastq/*/*gz))

# now pick the file that corresponds to the current array
# note that for this script the number of arrays should equal the number of files
f=${filelist[${SLURM_ARRAY_TASK_ID}]}

# trim off directory info and file extensions:
outdir=$(basename $f .fastq.gz)
echo "file stem: " $outdir

kallisto quant \
    --index /share/malooflab/ref_genomes/B_rapa/V3.0/B_rapa_CDS_V3.0_k31_kallisto_index   \
    --output-dir 20180202-data/kallisto_outV3.0/$outdir \
    --plaintext \
    --single \
    -l 250 \
    -s 40 \
    $f

end=`date +%s`
runtime=$((end-start))
echo $runtime seconds to completion
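
The --array=0-63 range in the header must match the number of fastq files; SLURM_ARRAY_TASK_ID starts at 0, so the range should run from 0 to the file count minus 1. A quick way to check the count, assuming the same file layout used in the script:

ls 20180202-data/raw-fastq/*/*gz | wc -l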

submitting your script

sbatch script.slurm
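
sbatch replies with the ID of the newly queued job (output along the lines of "Submitted batch job 123456"). If you need to cancel a submitted job, scancel takes that ID (the number here is hypothetical):

scancel 123456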

checking on your job status

squeue -u jmaloof #change to your username

interactive session

If you need to install packages (e.g. for R), compile programs, or move large files (e.g. with sftp), you should start an interactive session. Log on to Barbera first, and then from Barbera:

screen
srun -p production -N 1 -n 1 --time=0-04 --mem=4000 --pty /bin/bash
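
screen keeps the interactive session alive if your connection drops; detaching and reattaching is standard screen behavior, not specific to Barbera. Detach with Ctrl-a d, then:

screen -r   # reattach to the detached screen session
exit        # when finished, end the interactive shell to release the allocation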