(This needs to be expanded, just getting a few thoughts down)
[[Advice on setting up your computer for common bioinformatics tasks]]

## Sequencing Database
[[sequencing_database_and_sra_submission]]

## Organizing your project
* __READ [How to organize a bioinformatics project](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424#pcbi-1000424-g001)__
* __READ [Best Practices for Scientific Computing](http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745)__

## Atom (Julin)
link to suggested packages

## Github (Julin)
link to info and to lab repos

## Genome Center Clusters

* Maloof lab info on [[genome center cluster]]
* [Genome center tutorials on UCD cluster usage](https://github.com/ucdavis-bioinformatics-training/A-Primer-on-Using-the-Bioinformatics-Core-Administrated-Servers-and-Cluster-s-)

## R (Ruijuan)

### Tips and Tricks

* You can access Google Sheets directly from R using the googlesheets library. See the [CRAN](https://cran.r-project.org/web/packages/googlesheets/index.html) or [github](https://github.com/jennybc/googlesheets) repositories for more info. Follow the basic usage vignette to get started.

## Pipelines
RNAseq (Illumina platform)

* [BIS180L](http://jnmaloof.github.io/BIS180L_web/labs/) (the lab Julin created for his undergraduate teaching) has most of the information/guidelines on each step. 

* There is also a [paper describing best practices for RNAseq data analysis](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8).

Step-by-step instructions:

1) Quality check: check the quality of the raw [fastq](https://en.wikipedia.org/wiki/FASTQ_format) data using [FastQC](https://biof-edu.colorado.edu/videos/dowell-short-read-class/day-4/fastqc-manual) to look for poor sequencing quality and/or adapter contamination.

2) Trimming: ALWAYS trim raw data! [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic) is a good tool for quality and adapter trimming. Make sure you trim with the same adapter sequences that were used to make your library.

3) Quality check: Check the trimmed data using FastQC.
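To make the quality checks in steps 1 and 3 more concrete, here is a minimal sketch of how the quality line of a fastq record encodes per-base scores. It assumes the Phred+33 encoding and four-line records; the function names are made up for illustration. It is not a replacement for FastQC, only a way to see what FastQC is summarizing.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one fastq quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality_line.strip()]
    return sum(scores) / len(scores)


def read_fastq_qualities(lines):
    """Yield (read_id, quality_string) from fastq lines, 4 lines per record."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # record = [@id, sequence, +, quality]
            yield record[0][1:], record[3]
            record = []


# Two toy reads: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
example = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "ACGT\n", "+\n", "!!II\n",
]
means = {rid: mean_phred(q) for rid, q in read_fastq_qualities(example)}
print(means)
```

Reads with a low mean score, like read2 here, are the ones quality trimming is meant to clean up.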
 
4) Mapping to reference:

* 4.a) If you are mapping to a reference genome sequence, use the gapped aligner [STAR](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf).

Here is how we usually use STAR: 


*  First, index the reference genome with a gff3 annotation file:


          STAR --runMode genomeGenerate --genomeDir ref_genome_dir/ --genomeFastaFiles ref.fa --sjdbGTFfile ref.gff3 --runThreadN 6 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon CDS


Depending on the format of your gff3 file, you may need to modify it a bit.
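One thing worth checking before indexing: the STAR command above uses CDS features and the Parent attribute, so CDS lines without a Parent attribute are a likely source of trouble. The following is a rough sketch of such a check, assuming a standard nine-column, tab-separated gff3 file; the function name and the toy lines are hypothetical.

```python
def cds_lines_missing_parent(gff3_lines):
    """Return 1-based line numbers of CDS features lacking a Parent attribute."""
    missing = []
    for number, line in enumerate(gff3_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue  # only CDS features are of interest here
        attributes = dict(
            item.strip().split("=", 1)
            for item in fields[8].split(";")
            if "=" in item
        )
        if "Parent" not in attributes:
            missing.append(number)
    return missing


# Toy annotation: the second CDS line has no Parent attribute.
example = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=gene1\n",
    "chr1\tsrc\tCDS\t1\t300\t.\t+\t0\tID=cds1;Parent=mRNA1\n",
    "chr1\tsrc\tCDS\t400\t900\t.\t+\t0\tID=cds2\n",
]
print(cds_lines_missing_parent(example))
```

With a real file you could pass an open file handle instead of the list, e.g. `cds_lines_missing_parent(open("ref.gff3"))`.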


* Then map (a paired-end data example):

          STAR --genomeDir ref_genome_dir/ --readFilesIn 1.fq 2.fq --outSAMtype BAM SortedByCoordinate --sjdbGTFfile ref.gff3 --quantMode TranscriptomeSAM GeneCounts --twopassMode Basic --alignIntronMax 15000 --outFilterIntronMotifs RemoveNoncanonical --runThreadN 6 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon CDS --outReadsUnmapped fastx
 
 
* 4.b) If you are mapping to reference CDS, use [kallisto](https://pachterlab.github.io/kallisto/manual). 

* Index the reference CDS:

          kallisto index -k 19 -i ref.cds.19.kai ref.cds.fa  

* Map (a single-end data example):

          kallisto quant --single --plaintext -l 250 -s 50 -t 12 -i ref.cds.19.kai -o out.dir file.fq
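With many libraries it can help to generate the kallisto command for each fastq file rather than typing it by hand. Below is a minimal sketch that only reuses the flags from the command above. The output directory layout (`kallisto_out/<sample>`) and the way the sample name is taken from the file name are assumptions for illustration. Note that `-l` and `-s` are the estimated mean and standard deviation of the fragment length, which kallisto requires for single-end data, so adjust them to your library.

```python
from pathlib import Path


def kallisto_quant_command(fastq, index="ref.cds.19.kai", out_root="kallisto_out"):
    """Build the argument list for a single-end kallisto quant run."""
    # Hypothetical naming rule: sample name = file name up to the first dot.
    sample = Path(fastq).name.split(".")[0]
    return [
        "kallisto", "quant", "--single", "--plaintext",
        "-l", "250", "-s", "50", "-t", "12",
        "-i", index,
        "-o", str(Path(out_root) / sample),
        fastq,
    ]


fastq_files = ["sampleA.fq", "sampleB.fq"]  # hypothetical file names
commands = [kallisto_quant_command(fq) for fq in fastq_files]
for cmd in commands:
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True), after `import subprocess`
```

Printing the commands first makes it easy to inspect them before running anything on the cluster.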

5) Alignment checking and visualization: check the mapping rate after mapping. A very low mapping rate (<75%) to the reference genome requires troubleshooting. Use [IGV](http://software.broadinstitute.org/software/igv/) to visualize alignment results. You can also use [Picard](https://broadinstitute.github.io/picard/) CollectInsertSizeMetrics to determine the library type and check the insert size.
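For STAR, the mapping rate can be read from the Log.final.out file written for each run. The sketch below pulls out the "Uniquely mapped reads %" line and compares it to the 75% threshold mentioned in step 5. Treating the uniquely mapped percentage as "the" mapping rate is an assumption, as are the function names and the toy log lines.

```python
def uniquely_mapped_percent(log_lines):
    """Extract 'Uniquely mapped reads %' from STAR Log.final.out lines."""
    for line in log_lines:
        if "|" not in line:
            continue  # section headers have no 'key | value' layout
        key, value = line.split("|", 1)
        if key.strip() == "Uniquely mapped reads %":
            return float(value.strip().rstrip("%"))
    raise ValueError("no 'Uniquely mapped reads %' line found")


def needs_troubleshooting(percent, threshold=75.0):
    """True if the mapping rate is below the threshold (75% by default)."""
    return percent < threshold


# Toy excerpt in the 'key | value' layout of Log.final.out.
example_log = [
    "                          Number of input reads |\t1000000\n",
    "                        Uniquely mapped reads % |\t68.50%\n",
]
rate = uniquely_mapped_percent(example_log)
print(rate, needs_troubleshooting(rate))
```

For a real run, pass the open log file, e.g. `uniquely_mapped_percent(open("Log.final.out"))`.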

6) Read count file generation: make a master tsv file of raw read counts from the mapping results of each library. kallisto and STAR give you read count files directly, but sometimes you need to generate read counts from your BAM alignments using [Rsubread](https://www.bioconductor.org/packages/release/bioc/manuals/Rsubread/man/Rsubread.pdf) or [HTSeq](http://www-huber.embl.de/HTSeq/doc/overview.html).
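As an illustration of building the master table, here is a sketch that merges STAR ReadsPerGene.out.tab files (produced by `--quantMode GeneCounts`) into one gene-by-library table. It assumes unstranded libraries and therefore takes the second column; stranded libraries would need the third or fourth column instead. The library names and toy counts are made up.

```python
import csv
import io


def read_star_counts(handle, column=1):
    """Read gene counts from one ReadsPerGene.out.tab file.

    column=1 is the unstranded count; rows starting with N_ are summary rows.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[column])
    return counts


def merge_counts(per_library):
    """Combine {library: {gene: count}} into rows of a master table."""
    genes = sorted(set().union(*per_library.values()))
    libraries = list(per_library)
    rows = [["gene_id"] + libraries]
    for gene in genes:
        rows.append([gene] + [per_library[lib].get(gene, 0) for lib in libraries])
    return rows


# Toy files standing in for two libraries.
lib_a = io.StringIO("N_unmapped\t5\t5\t5\ngeneA\t10\t6\t4\ngeneB\t0\t0\t0\n")
lib_b = io.StringIO("N_unmapped\t2\t2\t2\ngeneA\t3\t1\t2\ngeneB\t7\t7\t0\n")
per_library = {"libA": read_star_counts(lib_a), "libB": read_star_counts(lib_b)}
rows = merge_counts(per_library)
for row in rows:
    print("\t".join(str(x) for x in row))
```

The printed table can be redirected to a tsv file and imported into R for the next step.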

7) Expression analysis: import the tsv file into R for differential expression analysis. [edgeR](https://bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) is the tool we prefer. Refer to [BIS180L](http://jnmaloof.github.io/BIS180L_web/2016/05/19/RNAseq-edgeR/) and the [edgeR](https://bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) user manual for guidelines on differential expression analysis.

* Tips on expression analysis:

* a) Libraries with very low read counts compared to the other libraries should be excluded. We used a read count of 1,000,000 as the threshold to keep a library.

* b) Use plotMDS() to check how your samples cluster. Biological replicates are expected to cluster together; if they do not, troubleshoot.

* c) Check for batch effects. If there is one, include it in the model; see the edgeR user manual for how to handle batch effects.
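Tip a) can be sketched as a simple filter on the total read count per library. This is only an illustration: it assumes the counts are already loaded as a dictionary per library and that "read count" means the sum of counts over all genes. The names and numbers are made up.

```python
def keep_libraries(per_library, min_total=1_000_000):
    """Return (kept library names, total counts) using a minimum total count."""
    totals = {lib: sum(counts.values()) for lib, counts in per_library.items()}
    kept = [lib for lib, total in totals.items() if total >= min_total]
    return kept, totals


# Toy data: libB falls below the 1,000,000 read threshold.
per_library = {
    "libA": {"geneA": 1_500_000, "geneB": 1_000_000},
    "libB": {"geneA": 300_000, "geneB": 100_000},
}
kept, totals = keep_libraries(per_library)
print(kept, totals)
```

Checking the totals before dropping anything is useful, since a library just under the threshold may still be worth a closer look.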

8) GO and promoter motif enrichment analysis: use the goseq package for GO enrichment analysis. Julin also wrote a function for promoter motif enrichment analysis; see [BIS180L](http://jnmaloof.github.io/BIS180L_web/2016/05/24/RNAseq-Annotation/).

9) Network/coexpression analysis: use SOM (R package [kohonen](https://cran.r-project.org/web/packages/kohonen/kohonen.pdf)) or [WGCNA](https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/) for coexpression analysis on data that has been variance-stabilizing transformed with [DESeq](https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq/inst/doc/DESeq.pdf).

SNP calling

transcriptome assembly

QTL Analysis

mixed model tutorials

Bayesian advice

## Programs and settings
ImageJ https://imagej.nih.gov/ij/ 

LeafJ https://bitbucket.org/jnmaloof/leafj

## Tips on screen command
For the screen command in general, see https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/

Start screen:

          screen

Resume a detached screen session:

          screen -r

Access an attached screen session after the connection dropped:

          screen -D -r ****

To kill a screen session (see "how to kill screen", http://stackoverflow.com/questions/1509677/kill-detached-screen-session):

          screen -X -S [session # you want to kill] kill

You can also kill a detached session that is not responding from within the screen session:

1) Type "screen -list" to identify the (detached) screen session, e.g. "There are screens on: 20751.Melvin_Peter_V42 (Detached)". Here "20751.Melvin_Peter_V42" is your session id.

2) Attach to the detached screen session, e.g. "screen -r 20751.Melvin_Peter_V42".

3) Once connected to the session, which might or might not respond, press "Ctrl + a" (nothing will change in your window), then type ":quit" (a colon [:] followed by quit).

That's it; your remote screen session will now be terminated.