Valerie Welch

CMPS244 Project Write Up


Introduction:

In the past few years, DNA microarrays have moved biology into the post-genomic era. Already, microarrays have been used to evaluate the expression of thousands of open reading frames in the human genome across a wide variety of conditions (1,2). These conditions can be disease states, such as cancer, or different tissue types. Understanding how and when the information stored in the genome is used will allow us to understand the fundamental differences between tissues and between healthy and disease states.

Before microarrays, biologists were lucky to study more than one gene at a time. Studying a single gene gave high confidence in the result, provided the data could be repeated, but it gave no information about the cellular context in which that gene's expression pattern was changing. With microarrays, changes in one gene can be correlated with changes in thousands of other genes, leading to a much clearer picture of the pathways that regulate the cell. Examining all of this data, however, proves to be a logistical nightmare.

In general, the task of analyzing microarray data falls on both biologists and computer scientists. The data must be normalized to reduce feature-to-feature variation before it can support convincing conclusions. Ideally, the data should be analyzed in multiple ways, by a wide variety of programs. The World Wide Web has therefore become a forum for sharing microarray data between scientists.

My focus in the Ares Lab is to build a microarray that can analyze patterns of human pre-mRNA splicing. Splicing is an essential step in the transmission of information from the genome into a form that can be read and translated into protein. Further, humans and other higher eukaryotes use alternative splicing to generate protein diversity. Alternative splicing often leads to proteins of antagonistic function being encoded by the same gene. In this paper, I will focus on one example, the fas gene. This gene undergoes a complex pattern of alternative splicing that generates two classes of proteins (3). One class consists of transmembrane proteins that signal the cell to die when they bind their ligand, fasL. The other class consists of soluble proteins that are exported out of the cell. The soluble isoforms of fas bind the available fasL and prevent the signalling events that should kill the cell. Many cancer cells produce soluble fas and are therefore resistant to the signals that tell them to die.

Many microarrays that are designed for gene expression studies completely ignore the role of splicing. Since alternative splicing can involve only tiny changes in the mRNA, cDNA microarrays may not be able to distinguish between mRNA isoforms that code for proteins with different functions. It is therefore difficult to correlate gene expression with protein function. Oligonucleotide microarrays, by contrast, can be used to detect splicing, because short oligonucleotides are sensitive to even slight changes in mRNA sequence. The microarray that I have designed harnesses the power of oligonucleotide arrays to look at the splicing and alternative splicing of 63 human genes in parallel.


Methods:

Building the array:

Features for the splicing-sensitive oligonucleotide microarray were extracted from cDNA-genomic alignments. These alignments were obtained through the UCSC Human Genome Browser (4). Each exon in the alignment was identified and numbered. Alternative splicing was identified by comparing multiple cDNA alignments to the genome and/or through literature searches that described the alternatively spliced isoforms. Each splice site was examined by hand to determine its validity, and then 40-mer oligonucleotides were designed that spanned the splice junction, with 20 nucleotides contributed from each exon. For each gene, a constitutive exon oligo and a constitutive splice junction oligo were made to determine the expression of all the isoforms of the gene.
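The junction oligo design described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the array: the function name junction_oligo and the two exon sequences are made up, and the sketch returns the sequence in the same sense as the spliced mRNA, which is an assumption.

```python
def junction_oligo(upstream_exon: str, downstream_exon: str, flank: int = 20) -> str:
    """Return an oligo spanning an exon-exon junction.

    Takes the last `flank` nucleotides of the upstream exon and the first
    `flank` nucleotides of the downstream exon, giving a 2 * flank-mer.
    """
    if len(upstream_exon) < flank or len(downstream_exon) < flank:
        # Very small exons cannot contribute a full flank; these need
        # to be designed by hand.
        raise ValueError("exon shorter than the requested flank")
    return upstream_exon[-flank:] + downstream_exon[:flank]


# Made-up exon sequences, for illustration only.
exon_a = "ATGCTGGGCATCTGGACCCTCCTACCTCTGGTTCTTACG"
exon_b = "TCTGTTGCTAGATTATCGTCCAAAAGTGTTAATGCC"

oligo = junction_oligo(exon_a, exon_b)
print(oligo, len(oligo))
```

Raising an error for exons shorter than the flank, rather than silently producing a shorter oligo, keeps short exons visible as cases that need manual attention.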

The oligos were spotted on SurModics glass slides with four-fold oversampling.

Creating a searchable database:

The database currently consists of oligo information and will soon include some preliminary microarray data. To make my database searchable, I wrote a short Perl script that searches for a particular gene and returns a list of the oligos designed to describe the splicing of that gene. Once I had a script that could search and extract information from my database, I changed it so that it could be called as a CGI script from a web page and would return the requested information in HTML. Currently, the only oligo information the program returns is the melting temperature; however, the database itself also contains the oligo sequences. The search results page also contains a figure that describes the numbering system I have used for each exon. The images were screen-captured from the UCSC Human Genome Browser and then labeled in Adobe Photoshop. Some images contain comments describing discrepancies between the alignments and the actual splicing of the gene. At the bottom of the results page, there is a link to the Human Genome Browser that takes the user directly to the gene searched for in my database. From the Browser, users have access to massive amounts of information, including protein predictions, ESTs, and links to GeneCards information.
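The search-and-report step can be sketched as follows. The actual script was written in Perl and is not reproduced here; this Python version is only a minimal illustration of the same idea. The record layout, the oligo names, the sequences, and the melting temperatures are all invented placeholders, and find_oligos and render_html are hypothetical names.

```python
from html import escape

# Placeholder records: names, sequences, and Tm values are invented
# for illustration and are not taken from the real database.
OLIGOS = [
    {"gene": "fas", "oligo": "fas_e1-e2", "sequence": "ACGT" * 10, "tm": 70.0},
    {"gene": "fas", "oligo": "fas_e2-e3", "sequence": "TGCA" * 10, "tm": 71.5},
    {"gene": "p73", "oligo": "p73_e1-e2", "sequence": "GATC" * 10, "tm": 72.0},
]


def find_oligos(gene, records=OLIGOS):
    """Return every record whose gene name matches, ignoring case and spaces."""
    query = gene.strip().lower()
    return [r for r in records if r["gene"].lower() == query]


def render_html(gene, hits):
    """Format the hits as an HTML table of oligo name and melting temperature."""
    if not hits:
        return f"<p>No oligos found for {escape(gene)}.</p>"
    rows = "\n".join(
        f"<tr><td>{escape(r['oligo'])}</td><td>{r['tm']:.1f}</td></tr>"
        for r in hits
    )
    return (
        f"<h2>Oligos for {escape(gene)}</h2>\n"
        "<table>\n<tr><th>Oligo</th><th>Tm (C)</th></tr>\n"
        f"{rows}\n</table>"
    )


print(render_html("fas", find_oligos("FAS")))
```

Escaping the query before echoing it back into the page matters for any script that is reachable from a web form, since the gene name is user input.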

To try out my database, click here and type fas in the query box.

As you can see, the splicing of the fas gene is quite complex. When you click on the link to the Human Genome Browser, you will run into a common problem in biology. I have named all my features fas; however, the protein label is TNFRSF6, meaning Tumor Necrosis Factor Receptor Superfamily Member 6. Most of the cDNAs that align to this locus carry different names, e.g. CD-95 or Apo-1. Therefore, the link from my results page to the Browser requests the locus to which a fas cDNA aligns.
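One way to soften this naming problem is to resolve synonyms to a single feature name before searching. The sketch below is not part of the current database. It is only an illustration that uses the names mentioned above, and ALIASES, normalize, build_alias_index, and resolve are hypothetical.

```python
# Synonyms mentioned in the text, keyed by the feature name used on the array.
ALIASES = {"fas": ["TNFRSF6", "CD-95", "Apo-1"]}


def normalize(name):
    """Lower-case a name and drop spaces and hyphens, so 'CD-95' matches 'cd95'."""
    return name.strip().lower().replace("-", "").replace(" ", "")


def build_alias_index(aliases):
    """Map every normalized synonym (and the feature name itself) to the feature name."""
    index = {}
    for canonical, names in aliases.items():
        for name in [canonical, *names]:
            index[normalize(name)] = canonical
    return index


ALIAS_INDEX = build_alias_index(ALIASES)


def resolve(name):
    """Return the array feature name for a query, or None if it is unknown."""
    return ALIAS_INDEX.get(normalize(name))


for query in ["TNFRSF6", "cd95", "APO-1", "brca1"]:
    print(query, "->", resolve(query))
```

A query such as CD95 would then find the fas oligos, while an unknown name returns None rather than a guess.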


Discussion:

Although this microarray is still being optimized, usable data should be obtained within the next few weeks. The advantage of building a web-based searchable database for this preliminary microarray data is that it makes the data accessible for analysis and introduces the microarray to potential collaborators. The ultimate goal of my project is to understand how changes in mRNA splicing can lead to cancer.

Oligo design has been the most time-consuming aspect of this project. Since oligonucleotides are so sensitive to changes in mRNA, they must be identical to the correct splice junction to function properly. The Human Genome Browser alignments are often precise and define the edges of exons very well. However, small exons (< 50 nucleotides) are often missed by the alignment algorithm. The Browser output for the gene brca1 is a classic example of this, and a tricky one for anyone interested in alternative splicing. There are three small exons that do not align very well. All three are present in the cDNAs presented by the Browser; however, on a cursory look at the Browser output, two of the exons appear to be alternatively spliced. Another frequent problem is the lack of sequences in GenBank. The gene p73 has been published as having 5 different isoforms (5); however, only one sequence is available in GenBank. Often this means that genes that are alternatively spliced do not appear so in the Browser, even though the information is already known.

This project has been my attempt to display my nomenclature for gene name, exon number, and splice junction sequence clearly, so that anyone interested in the data generated by my microarray knows exactly what the data represents at the sequence level. One goal that I have not yet met with this project is to include references to the literature for the splicing of each gene. Once I have added these references to my database, they will be linked to their PubMed abstracts.


References:

1. Shoemaker DD. et al. Nature 2001 Feb 15; 409(6822):922-7.

2. Hedenfalk I. et al. N Engl J Med 2001 Feb 22; 344(8):601-2.

3. Cascino I. et al. J Immunol 1996 Jan 1; 156(1):13-7.

4. Jim Kent's Human Genome Browser

5. Zaika A. et al. Cancer Research 1999 Jul 1; 59(13):3257-63.