a) Name of collaborating institution
Bioinformatics Research Centre (BIRC)
Nanyang Technological University, Singapore
b) Name of collaborator
Associate Professor Lin Feng
Division of Information Systems, School of Computer Engineering
Nanyang Technological University, Singapore
c) Nanyang Technological University, Singapore has a formal university-wide exchange agreement with The University of Melbourne, Australia.
d) Period of study
6 months from 7th of October 2005 to 23rd of April 2006.
e) Nature of collaboration
A long-term collaboration is being initiated through this laboratory placement.
f) Outline of research work intended
Large volumes of research articles have been published on the molecular pathways of the hormones that regulate mouse lactation. However, there has been no attempt to consolidate this knowledge into a usable framework. This thesis aims to initiate such a framework by building a model of the endocrine stimulation of growth and development of mouse mammary epithelial cells, leading to lactation. To do so, the project will use information extracted from the literature and establish the set of tools needed to build the model.
The two main tools needed are a system for the handling and analysis of abstracts from PubMed, and a system for the modeling and simulation of the mouse lactation model. The former manages the storage and analysis of published abstracts to yield information, such as endocrine signal transduction pathways and the genetic responses they trigger, that forms the basis of the lactation model. The latter will serve as a platform for managing the model built from the extracted information and will provide facilities for simulating it.
The project focuses on two aspects. Firstly, using a set of 5000 genes [1], it seeks to understand how these genes are related to each other in terms of gene-protein and protein-protein interactions. Secondly, it seeks to explore how endocrine stimulation by insulin, prolactin and glucocorticoid is linked to the same set of 5000 genes.
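As a rough illustration of the kind of analysis the abstract pipeline performs, the sketch below flags pairs of gene or protein names that are mentioned in the same sentence of an abstract. It assumes a simple sentence-level co-occurrence heuristic with a hand-supplied name list; this is not the pipeline's actual method, and the function names, example sentences and name list are hypothetical, invented for illustration only.

```python
import itertools
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurring_pairs(abstracts, names):
    """Count, for each pair of names, the sentences that mention both.

    Assumption (hypothetical heuristic): a co-mention within one sentence is
    treated only as a candidate interaction for later review, not as evidence.
    """
    # Whole-word, case-insensitive pattern for each name.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in names
    }
    counts = Counter()
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            found = sorted(n for n, p in patterns.items() if p.search(sentence))
            # Every unordered pair of names found in this sentence.
            for pair in itertools.combinations(found, 2):
                counts[pair] += 1
    return counts


# Invented example text, not taken from real abstracts.
abstracts = [
    "Prolactin activates STAT5 in mammary epithelial cells. "
    "STAT5 then binds the beta-casein promoter.",
    "Insulin signalling through IRS1 enhances prolactin-induced STAT5 activity.",
]
names = ["prolactin", "STAT5", "beta-casein", "insulin", "IRS1"]
pair_counts = cooccurring_pairs(abstracts, names)
```

Pairs with the highest counts, such as STAT5 and prolactin here, would then be candidates for closer inspection. A real pipeline would need proper sentence tokenisation and synonym handling, which is where tools such as NLTK or GATE could come in.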
Current Progress
So far, the project has focused on establishing the tools needed to build the model. Two separate systems have been created over the last 9 months. The first is Mosirium, an object-oriented modeling and simulation tool; its architecture and an overview of its workings were published as a poster at APBC 2005 [2]. The second is a pipeline for the handling and analysis of published PubMed abstracts to extract information on gene-protein and protein-protein interactions. Both systems are written in the Python programming language using open-source libraries.
With both systems in place, the project currently focuses on the analysis of abstracts for intracellular protein-protein interactions following stimulation of mammary epithelial cells by insulin (i.e. mapping the insulin signal transduction pathways). This analysis will then be extended to the prolactin and glucocorticoid signal transduction pathways.
Outline of Research Project Intended
The proposed laboratory placement in BIRC will focus on two outcomes. The first is an analysis of the literature to examine the relationships between the genes. A process for the handling and analysis of abstracts is currently being established; however, it needs further development, and the synergistic use of text-mining tools can be investigated with the assistance of the computing expertise in BIRC (BioWare project). The text-mining tools to be investigated in this part of the project include NLTK [3], GATE [4] and ConceptNet [5].
The second outcome is an examination of running Mosirium on computer clusters. Mosirium is currently a standalone system and cannot use any clustering facilities that may be available to speed up simulation. Cluster-enabling Mosirium would therefore be a major improvement to the modeling and simulation system. This can be approached in two ways: high-level or low-level clustering. High-level clustering, also known as application-level clustering, is usually implemented as network-enabled programs using mechanisms such as Java Remote Method Invocation (RMI) or CORBA. Low-level clustering, on the other hand, operates closer to the system layer, either by interfacing directly with the operating system's kernel, as OpenMosix [6] does, or through message-passing libraries such as MPI or PVM. Implementation methods will be studied in existing cluster-enabled simulators such as E-Cell 3 [7]. At the same time, the advantages and disadvantages of high- and low-level clustering with respect to Mosirium will be evaluated. This will be followed by a proof-of-concept implementation that enables Mosirium to take advantage of a cluster.
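To make the idea of high-level (application-level) clustering concrete, the sketch below farms independent simulation runs out to worker nodes over a remote procedure call. It swaps in Python's standard-library XML-RPC modules as an analogue of Java RMI or CORBA; it is not how Mosirium or E-Cell 3 actually distribute work. The simulation function (`simulate_logistic`, a simple logistic-growth model) and the helpers `start_worker` and `run_on_workers` are hypothetical placeholders. For a self-contained demonstration, both workers run as threads in one process; on a real cluster each would run on its own node.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def simulate_logistic(rate, capacity, x0, dt, steps):
    """Hypothetical stand-in for one simulation run: forward-Euler
    integration of dx/dt = rate * x * (1 - x / capacity)."""
    x = x0
    for _ in range(steps):
        x += dt * rate * x * (1.0 - x / capacity)
    return x


def start_worker(host="127.0.0.1", port=0):
    """Start an XML-RPC worker node in a background thread.
    Port 0 lets the operating system pick any free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(simulate_logistic, "simulate")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def _call(url, params):
    # A fresh proxy per call, so proxies are never shared between threads.
    with ServerProxy(url) as proxy:
        return proxy.simulate(*params)


def run_on_workers(urls, parameter_sets):
    """Send independent runs to workers round-robin, dispatching concurrently.
    Results are returned in the same order as parameter_sets."""
    targets = [urls[i % len(urls)] for i in range(len(parameter_sets))]
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(_call, targets, parameter_sets))
```

In use, one would start each worker with `start_worker()`, build URLs of the form `http://127.0.0.1:<port>/` from each server's `server_address`, pass a list of parameter tuples to `run_on_workers`, and finally call `shutdown()` and `server_close()` on each worker. This style only parallelises whole, independent runs such as parameter sweeps. Splitting a single simulation across nodes would need finer-grained coordination, which is where the low-level approaches become relevant.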
Estimated Timeline of Work
An estimated monthly timeline of progress is as follows:
October: Initial analysis of abstracts for interactions among the 5000 genes and their gene products. Understanding the clustering mechanism in E-Cell 3 with the assistance of Kouichi Takahashi (architect of E-Cell 3).
November: Examining possible synergistic use of BioWare and the pipeline. Examining the pros and cons of high- and low-level clustering on Mosirium.
December: Examining and testing synergistic use of NLTK with the pipeline. Studying the work of Mr. Qi YuTao (final-year PhD candidate in BIRC) on clustering and, with assistance from the 'HPC' group, examining how it can be applied to Mosirium.
January: Examining and testing synergistic use of GATE and ConceptNet with the pipeline. Prototyping the clustering mechanism of Mosirium.
February: Testing the clustering prototype. Attending APBC 2006 and meeting the authors of E-Cell 3 at Keio University.
March: Trial modeling of the interactions in the gene set with assistance from researchers in the 'in silico modeling' group in BIRC.
April: Wrapping up work and exploring possibilities for further collaboration.