It’s been a while exploring lots of resources to understand how eQTL works and this is to summarize what I have found (or understood) and not to repeat the whole thing to save my time/memory.
Data prep: TCGA RNA-seq offers gene and isoform data which are processed using the rsem program. Here my interest is in utilizing isoform data. A couple of files are provided: 1)*.isoforms.results 2)*.isoforms.normalized_results
To perform eQTL, I began with the *.isoforms.results files that have two columns –raw_count and scaled_estimate: raw_count is equivalent to raw count except it includes uncertain mapping count (which would not change much the reads count); scaled_estimate is similar to FPKM or TPM values which are already processed.
step 1: using a nearZeroVar function offered by caret, screen transcripts
nzv <- nearZeroVar(PRAD_T) ## remove near zeros; row: samples, column: genes
filteredPRAD_T <- PRAD_T[, -nzv]
step 2: prepare normalized counts by:
cds = newCountDataSet(filteredPRAD_T, conditions=surv1$race); # input: row(gene), column(sample)
cds = estimateSizeFactors(cds)
cts = counts(cds, normalized=TRUE) ## expression data prepared
Prepare SNP array by combining them and get gene location by running biomart in r and SNP location from a local SNP reference file.
Mapping using matrix eQTL
This requires a predefined format for each file (gene expression, SNP, Covariates (e.g., race, age, gender), gene location, SNP location).
To accelerate a large matrix computation, openBLAS is a must.
 Dillies MA et al., A comprehensive evaluation of normalization methods for Illumina high-throughput RNAsequencing data analysis, Brief Bioinform. 2013; 14(6):671-83 PMID:22988256