Preparatory reading for BIN6002

  1. De novo assembly of NGS data
  2. Linux basics
    1. Main Linux commands
    2. Linux command cheat sheet
    3. Linux tutorials and videos

Literature used for oral presentations 12/13 June 2014

RNA-sequence
------------
Souhila:
1. Transcriptome: Connecting the Genome to Gene Function. Jill U. Adams.
Nature Education 2008

2. The amazing complexity of the human transcriptome
European Journal of Human Genetics (2005)

3. Pre-mRNA splicing and human disease. Nuno André Faustino and
Thomas A. Cooper. Genes Dev. 2003

Articles used to explain why the transcriptome is studied

4. Brazma et al. One-stop shop for microarray data.
Nature (2000)

5. The beginning of the end for microarrays?
Nature Methods (2008)
Explanation of the microarray technique

6. RNA-Seq: a revolutionary tool for transcriptomics Zhong Wang, Mark
Gerstein, and Michael Snyder Nat Rev Genet. 2009

7. RNA Sequencing: Platform Selection, Experimental Design, and Data
Interpretation Nucleic Acid Ther 2012

8. Principles of transcriptome analysis and gene expression
quantification: an RNA-seq tutorial. Jochen B. W. Wolf
Explanation of the RNA-seq technique

9. RNA-seq and microarray complement each other in transcriptome
profiling. Sunitha Kogenaru, Qing Yan, Yinping Guo and Nian Wang

10. Illumina RNA-Seq data comparison with gene expression microarray

11. RNA-seq: an assessment of technical reproducibility and comparison
with gene expression arrays Marioni, J.C., Mason, C.E., Mane, S.M.,
Stephens, M. & Gilad, Y. Genome Res. (2008)
Comparison between the two techniques

12. RNA-Seq and expression microarray highlight different aspects of the
fetal amniotic fluid transcriptome. Zwemer LM, Hui L, Wick HC,
Bianchi DW. Prenat Diagn (May 2014)

13. Comparison of RNA-Seq and Microarray in Transcriptome Profiling of
Activated T Cells. Shanrong Zhao, Wai-Ping Fung-Leung, Anton
Bittner, Karen Ngo, Xuejun Liu. PLoS ONE, January 2014
Practical examples of the comparison between the two techniques

14. An efficient rRNA removal method for RNA sequencing in GC-rich
bacteria
Peano et al. BioMed Central (2013)

15. Ribo-Zero Gold Kit: improved RNA-seq results after removal of
cytoplasmic and mitochondrial ribosomal RNA 2011 Nature
Articles showing how to remove rRNA

16. Wolf JB. Principles of transcriptome analysis and gene expression
quantification: an RNA-seq tutorial. Mol Ecol Resour. 2013 Jul;13(4):559-72.

David:
Wolf JB. Principles of transcriptome analysis and gene expression
quantification: an RNA-seq tutorial. Mol Ecol Resour. 2013 Jul;13(4):559-72.
A good introduction to all the steps of an RNA-seq experiment.

Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format
for sequences with quality scores, and the Solexa/Illumina FASTQ variants.
Nucleic Acids Res. 2010 Apr;38(6):1767-71.
Explanation of the FASTQ format.

Andrews, S. FastQC A Quality Control tool for High Throughput Sequence Data. 
Babraham Bioinformatics, 2014-06-06. Web. 2014-06-09. 
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
No paper available, but this tool computes basic quality metrics on read sets.

Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina
sequence data. Bioinformatics. 2014 Apr 28.
A method to clip adaptors and trim reads according to their quality level.

Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with 
RNA-Seq. Bioinformatics. 2009 May 1;25(9):1105-11.
First article describing the TopHat spliced-read aligner for RNA-seq.

Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2:
accurate alignment of transcriptomes in the presence of insertions, deletions and
gene fusions. Genome Biol. 2013 Apr 25;14(4):R36.
Describes the improvements to the original TopHat.

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P,
Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner.
Bioinformatics. 2013 Jan 1;29(1):15-21.
Introduces the STAR aligner, another alignment program that uses a different strategy.

Engström PG, Steijger T, Sipos B, Grant GR, Kahles A; RGASP Consortium, Rätsch
G, Goldman N, Hubbard TJ, Harrow J, Guigó R, Bertone P. Systematic evaluation of 
spliced alignment programs for RNA-seq data. Nat Methods. 2013
Dec;10(12):1185-91.
A recent comparison of many RNA-seq alignment programs.

DeLuca DS, Levin JZ, Sivachenko A, Fennell T, Nazaire MD, Williams C, Reich M,
Winckler W, Getz G. RNA-SeQC: RNA-seq metrics for quality control and process
optimization. Bioinformatics. 2012 Jun 1;28(11):1530-2.
A tool that provides various metrics to assess the quality of an RNA-seq dataset.

Parts of the following articles also have some interesting elements on RNA-seq assembly:
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat
Methods. 2012 Mar 4;9(4):357-9.
Basic aligner used by TopHat. 

Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for
transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011
Jun;8(6):469-77.
Discusses different approaches to read alignment in RNA-seq.

Vijay N, Poelstra JW, Künstner A, Wolf JB. Challenges and strategies in
transcriptome assembly and differential gene expression quantification. A
comprehensive in silico assessment of RNA-seq experiments. Mol Ecol. 2013
Feb;22(3):620-34.
Discusses alignment on a simulated dataset and explores the de novo assembly approach with Trinity.
This will not be a main focus of the presentation, but is worth mentioning.

Elise:
Wolf JB. Principles of transcriptome analysis and gene expression quantification: an RNA-seq tutorial. 
Mol Ecol Resour. 2013 Jul;13(4):559-72.
This article provides practical guidance for the many steps involved in a typical RNA-seq workflow.


Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. 
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform 
switching during cell differentiation. Nat Biotechnol. 2010 May;28(5):511-5.
This protocol describes in detail how to use Cufflinks to compare gene and transcript expression 
under two or more conditions.

Sigurgeirsson B, Emanuelsson O, Lundeberg J. Sequencing degraded RNA addressed by 3' tag counting. 
PLoS One. 2014 Mar 14;9(3):e91851.
This article describes the effects of the RNA integrity number on gene coverage, false positives in 
differential expression and the quantification of duplicate reads.

Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM, Qian JQ,
Griffith M, Raymond A, Thiessen N, Cezard T, Butterfield YS, Newsome R, Chan SK, She R, Varhol R,
Kamoh B, Prabhu AL, Tam A, Zhao Y, Moore RA, Hirst M, Marra MA, Jones SJ, Hoodless PA, Birol I.
De novo assembly and analysis of RNA-seq data. Nat Methods. 2010 Nov;7(11):909-12.

Vijay N, Poelstra JW, Künstner A, Wolf JB. Challenges and strategies in
transcriptome assembly and differential gene expression quantification. A
comprehensive in silico assessment of RNA-seq experiments. Mol Ecol. 2013
Feb;22(3):620-34.
This article describes how features of the transcriptome, technological processing and 
bioinformatic workflow impact transcriptome quality and inference of differential gene expression (DE).

Anders S, Huber W. Differential expression analysis for sequence count data.
Genome Biol. 2010;11(10):R106.
This article explains the statistical method used in the DESeq package.

Soneson C, Delorenzi M. A comparison of methods for differential expression
analysis of RNA-seq data. BMC Bioinformatics. 2013 Mar 9;14:91.
This article evaluates different read-count-based RNA-seq analysis methods.

Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L.
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform
switching during cell differentiation. Nat Biotechnol. 2010 May;28(5):511-5.
This paper describes a method for estimating the expression of each reconstructed isoform.



Variants
--------
Nielsen, R., Paul, J. S., Albrechtsen, A. & Song, Y. S.
Genotype and SNP calling from next-generation sequencing data. Nature Reviews Genetics 12, 443-51 (2011).

Altmann, A. et al. A beginners guide to SNP calling from high-throughput DNA-sequencing data.
Human Genetics (2012). doi:10.1007/s00439-012-1213-z

Ramaswami G, Zhang R, Piskol R, Keegan LP, Deng P, O'Connell MA, Li JB.
Identifying RNA editing sites using RNA sequencing data alone. Nat Methods. 2013 Feb;10(2):128-32.

Yu, X. & Sun, S. Comparing a few SNP calling algorithms using low-coverage sequencing data.
BMC Bioinformatics 14, 274 (2013).

Cheng, A. Y., Teo, Y.-Y. Y. & Ong, R. T. 
Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals.
Bioinformatics (Oxford, England) (2014). doi:10.1093/bioinformatics/btu067



Proteomics
----------
Melissa:
Teixeira, M.C. et al, Teaching Expression Proteomics: 
From the Wet-Lab to the Laptop, Biochemistry and Molecular Biology, Vol. 37, No 5, pp. 279-286 (2009)
This paper describes the methodology for studying protein expression and the possible clinical
implications of proteomics.

Reschiglian, P.  et al, Flow field-flow fractionation: 
A pre-analytical method for proteomics, Journal of Proteomics, pp. 265-276 (2008)
This article presents a new high-performance separation method for samples that are
difficult to isolate or purify with other separation approaches.

Qian, P-Y. et al, Proteomics: 
Challenges, Techniques and Possibilities to Overcome Biological Sample Complexity, 
Human Genomics and Proteomics, pp. 1-22 (2009)
This paper presents the field of proteomics and its biological applications in detail.
The authors review the different techniques in use, with their characteristics and limitations.

Wilhelm, M. et al, Mass-spectrometry-based draft of the human proteome, 
Nature, Vol. 509, pp. 582-587 (2014)
Bantscheff's team presents a new real-time database program.
It is much more complete than its predecessors.

Heck, J.R. et al, Development and application of proteomics technologies in Saccharomyces cerevisiae, 
TRENDS in Biotechnology, Vol. 23, No 12, pp. 598-604 (2005)
This paper shows an example application: the analysis of protein expression levels in
S. cerevisiae under different nutrient conditions. The techniques used are
2D electrophoresis and liquid chromatography coupled to mass spectrometry.

Wilkins, M.R. et al, Progress with Proteome Projects: Why all Proteins Expressed by a Genome Should be 
Identified and How To Do It, 
Biotechnology and Genetic Engineering Reviews, Vol. 13, Issue 1, pp. 19-50 (1996)
This article explains 2D electrophoresis of the proteome, the detection methods,
and what a proteome database is, and it names the different
databases that exist.

Wilkins, M.R. et al, Bioinformatics meets proteomics- Bridging the gap between mass spectrometry 
data analysis and cell biology, 
The Journal of Bioinformatics and Computational Biology, Vol. 1, No 1, pp. 183-200 (2003)
This article discusses gene distribution, defines mass spectrometry, and covers
the analysis of proteins that have undergone post-translational modifications
and the ability to detect them in proteomics.

Matt:
1. Altaf-Ul-Amin, M., Shinbo, Y., Mihara, K., Kurokawa, K. & Kanaya, S. Development and implementation 
of an algorithm for detection of protein complexes in large interaction networks. 
BMC Bioinformatics 7, 207 (2006). 

2. Arifuzzaman, M. et al. Large-scale identification of protein-protein interaction of Escherichia coli K-12. 
Genome Res. 16, 686-91 (2006).

3. Butland, G. et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. 
Nature 433, 531-537 (2005).

4. Gasteiger, E. ExPASy: the proteomics server for in-depth protein knowledge and analysis. 
Nucleic Acids Res. 31, 3784-3788 (2003).

5. Hu, J. C., Karp, P. D., Keseler, I. M., Krummenacker, M. & Siegele, D. A.
What we can learn about Escherichia coli through application of Gene Ontology. 
Trends Microbiol. 17, 269-78 (2009). 

6. Jansen, R., Greenbaum, D. & Gerstein, M. Relating whole-genome expression data with protein-protein interactions. 
Genome Res. 12, 37-46 (2002).

7. Karp, P. D. et al. Multidimensional annotation of the Escherichia coli K-12 genome. 
Nucleic Acids Res. 35, 7577-90 (2007).

8. Keseler, I. M. et al. EcoCyc: a comprehensive view of Escherichia coli biology. 
Nucleic Acids Res. 37, D464-70 (2009).

9. Keseler, I. M. et al. EcoCyc: a comprehensive database of Escherichia coli biology. 
Nucleic Acids Res. 39, D583-90 (2011).

10. Lee, D., Redfern, O. & Orengo, C. Predicting protein function from sequence and structure. 
Nat. Rev. Mol. Cell Biol. 8, 995-1005 (2007).

11. Marcotte, E. M. Detecting Protein Function and Protein-Protein Interactions from Genome Sequences. 
Science 285, 751-753 (1999).

12. Sali, A., Glaeser, R., Earnest, T. & Baumeister, W. From words to literature in structural proteomics.
Nature 422, 216-25 (2003).

13. Shin, D. H. et al. Structure-based inference of molecular functions of proteins of unknown function 
from Berkeley Structural Genomics Center. 
J. Struct. Funct. Genomics 8, 99-105 (2007). 

14. Sousa, P. M. F. et al. Supramolecular organizations in the aerobic respiratory chain of Escherichia coli. 
Biochimie 93, 418-25 (2011).

15. Daoud, R., Levy, E. & Lang, B. F. Mitochondrial metabolic and housekeeping proteins are organized in 
soluble, multifunctional supercomplexes. 44 (2014). (Unpublished work in progress)
