After my confirmation, with Andrew as my second co-supervisor, he got me a presentation slot in the Language Technology Seminar Series, organized by Steven Bird’s group in the Department of Computer Science and Software Engineering (CSSE). My presentation was “Functional modeling of mouse lactation using published abstracts and text mining.” Steven Bird is one of the big names in computational linguistics; he is the editor of several computational linguistics journals, as well as one of the authors of NLTK. Hence, having his group’s validation of my work was really crucial at that point in time. My work would be deeply involved in extracting protein-protein interactions from the literature and potentially mining for novel interactions. Extraction of existing protein-protein interactions can be evaluated using a known corpus – a set of text where all the protein-protein interactions are known. I could then evaluate my system based on how many interactions it missed and how many interactions it got wrong. The main concern then was how to validate novel protein-protein interactions. The general consensus from Steven Bird’s group was that my work seemed interesting and that the only way to evaluate novel hypotheses was by biological experimentation or theoretical evaluation – is there a biological reason for each potentially novel hypothesis?
By then, I was working on a line of research that contradicted Andrew’s school of thought, and that put me in direct confrontation with him – something that dominated the entire second year of my PhD.
As I was reading papers on biomedical text mining, it was clear to me that there were 2 schools of thought, and I subsequently wrote the literature review to examine them. On one hand, there was a school that considered biomedical text to be a specialized form of literature that cannot be adequately processed with generic natural language processing tools. Hence, a number of specialized tools were developed to process biomedical text. I called this school the “specialists”. On the other hand, some considered biomedical text to be not so domain-specific that it requires a completely new set of tools to process it. Perhaps generic tools can be used to a large extent, with specialized tools developed only where generic tools have been shown to fail, such as in recognizing gene and protein names. I called this school the “generalists”.
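As an illustration only (this is not code from the thesis), the corpus-based evaluation described above can be sketched in a few lines of Python. The protein names, the `pair` helper and the `evaluate` function are all hypothetical, and the sketch assumes that interactions are undirected, so that "P1 interacts with P2" and "P2 interacts with P1" count as the same interaction.

```python
def pair(a, b):
    """Represent an undirected interaction as an order-free pair (an assumption)."""
    return frozenset((a, b))


def evaluate(predicted, gold):
    """Compare extracted interactions against a corpus where all are known."""
    predicted, gold = set(predicted), set(gold)
    correct = predicted & gold   # interactions the system found correctly
    missed = gold - predicted    # known interactions the system missed
    wrong = predicted - gold     # extracted interactions not in the corpus
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    total = precision + recall
    # F-score: the harmonic mean of precision and recall
    f_score = 2 * precision * recall / total if total else 0.0
    return {
        "missed": missed,
        "wrong": wrong,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical gold-standard corpus and system output, for illustration only.
gold = {pair("P1", "P2"), pair("P1", "P3"), pair("P2", "P4"), pair("P3", "P4")}
predicted = {pair("P2", "P1"), pair("P1", "P3"), pair("P4", "P5")}

result = evaluate(predicted, gold)
print(len(result["missed"]), "missed;", len(result["wrong"]), "wrong")
print("precision", round(result["precision"], 3), "recall", round(result["recall"], 3))
```

Note that this kind of scoring only works for interactions already recorded in the corpus. A genuinely novel interaction would be counted as "wrong" here, which is exactly why novel hypotheses need a different kind of validation.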
Andrew was a specialist while I am a generalist.
Instinctively, I felt that although biomedical text has a lot of domain-specific language constructs, and is certainly not the type of generic writing, such as newspapers or emails, for which generic text processing tools were initially developed, it is not as specialized as legal text or poetry. Moreover, I had read papers that employed generic text processing tools on legal text. However, Andrew did not buy my argument. Instead, he expected me to follow his line of thought. He wanted me to replicate the work of Textpresso{1}, which is a specialized tool, and work from there. There is nothing wrong with Textpresso. In fact, I find Textpresso to be a highly motivating piece of work and I learnt a lot from it. Looking back, there may have been instances where I held my beliefs too strongly and failed to appreciate Andrew’s arguments. Nevertheless, back then, I found this train of thought and action to be contradictory to my first-held principle of a PhD candidature – to develop my own philosophy.
As a result, there was a massive disagreement between Andrew and myself, to the point where he told me that he did not think my work, or the path that I was pursuing, deserved a PhD and that I would fail my PhD regardless. As noted by Kevin, I was quite shaken. I took it that Andrew felt that nothing I did could deserve a PhD, and I certainly did not take it well. Andrew had even taken one of my early chapters to be reviewed by a staff member of Steven Bird’s group. That chapter was essentially a rejected piece of work from the Fourth Asia-Pacific Bioinformatics Conference. By then, I recognized that it was an absolutely terrible piece of work. Of course, the review comments from Steven Bird’s group were not kind, and they became part of Andrew’s armory. The general comment was (as extracted from my logbook kept during that period) “Overall this portion of the work has distinct lack of scientific rigour in a number of dimensions. There is no extant abstract experimental design, which could be reviewed and verified, hence improving the overall direction of research. Many other works are cited either completely out of context; interpreted considerably different to the original published findings; lack evidence of peer review; or all three of the above. There is little evidence of coherent understanding [of] previous work particularly in language technology, and much assumption based on scant, if any, empirical evidence. Little appreciation is shown for the need to systematically evaluate each and every step in the workflow either against gold standard or internally for consistency via cross validation.”
Looking at it today, it is still pretty damning. Nevertheless, there was a way out – since the main concern appeared to be scientific rigour, Andrew suggested in front of both Kevin and Christophe that I should write up my literature review, rewrite the terrible chapter, and submit both to 4 of his suggested external experts for review of the scientific rigour of my work. I saw that there was no other way out and I gladly accepted this challenge. After working on it for almost 5 weeks, I had my literature review and first experimental chapter written up – about 40 thousand words in all – and sent them off to the 4 experts.
About 2 weeks later, on the second-last day of the CRC conference of 2006, which was held in Seaworld in Sydney, we had some relaxation time before the conference dinner and I took a walk along the beach. The sea was calm and the air was tranquil. I went into an internet café to check my email, which cost me 4 dollars for an hour. That was the most important 4 dollars I spent that year – I received an email from Thomas Rindflesch, one of the 4 external experts designated by Andrew Lonie, which said,
Maurice,
In general I think your dissertation demonstrates scientific rigor regarding natural language processing for biology. Although it is a matter of style, I think it would be good to discuss the contribution of your work at the beginning of the introduction, rather than at the end of the review of the literature. I would also recommend that at this point you expand the discussion just a bit regarding the use of a generic system. You need to emphasize the significance of your contribution. As for the system itself, you need more detail about finding SVO. This is crucial in supporting the accuracy of protein-protein interactions. For example, it's not clear whether the S, V, and O have to be contiguous. Whether they are or not has a significant effect on accuracy of results. You may want to look at two of my papers related to you work. I've attached a copy of the first one; the other is readily available through PubMed. Good luck on your dissertation.
-Tom Rindflesch
In fact, that was the only reply that I received out of the 4. With this, I won the battle against Andrew Lonie. I told Kevin that I really did not need a co-supervisor who did not think I should pass my PhD. It was clear that I wanted him out of my advisory committee at that point in time. I even went to the extent of telling Kevin and Christophe that I had drawn a line with regard to Andrew’s supervisory contributions towards my thesis and that I did not want anything else to do with him. Andrew recognized that his designated external expert was on my side and acknowledged it when Christophe and I approached him to sign the second-year confirmation papers. He then went to talk to David MacMillian, then Head of Zoology, and got himself out of my advisory committee.
At the end of it, I acknowledged Andrew’s contribution as a supervisor in my PhD thesis as “I extend many thanks to Andrew and Feng for your constructive criticisms and valuable suggestions to point me in a correct direction.” Despite all the unhappiness, I must say that Andrew used his much-needed tough love to steer me away from a train wreck, even though it brought some collateral damage to him at that point in time{2}. I deeply apologize for my terrible attitude towards him back then. I also gave him a copy of my final bound thesis to show him that I had done it, and I hoped that he would be proud.
I further sent my 2 chapters to 8 other experts; only 1 replied: Professor Jonathan Wren, an associate editor of Bioinformatics.
Maurice:
Interesting work, and it seems promising. I think you need to benchmark it on some more datasets. Try to get ahold of some of the BioCreative and KDD Cup datasets - they're good for benchmarking protein-protein interactions. You also need a discussion on context. Some interactions are context-specific. For example, insulin increases glucose concentration in cells. Insulin decreases glucose concentration in the bloodstream. So it depends upon your perspective.
Also, it will be very valuable and informative to benchmark it on a large dataset - millions of abstracts. Small datasets are nice & neat and all the rage, but researchers get excited when the possibility arises that some system could possibly be applied to massive datasets with reasonable accuracy. Achieving 90% precision & 80% recall sounds impressive, but if it's only from evaluating 50 abstracts, it's not. So I think you need to perform a few scale tests to see how scale affects F-score.
Good luck!
That was dated 12/10/06. Jonathan's comments helped me to get that chapter accepted and presented a year later at the Second IAPR Workshop on Pattern Recognition in Bioinformatics (PRIB 2007). It was published in Lecture Notes in Computer Science, volume 4774 – my first manuscript published from my PhD work. Needless to say, I felt very good about it.
Other than the long drawn-out incident with Andrew, the rest of my year went rather smoothly. In fact, I was not in Melbourne for half of my second year – I did an “overseas” attachment at the Bioinformatics Research Centre (BIRC) at Nanyang Technological University, Singapore. I will describe this part of my year in the next chapter.
It was also during this year that Joly finished her honours degree and continued on to her PhD in Kevin’s lab – working on milk and stomach development – co-supervised by Kevin and Mary. But by that time, I was no longer an experimental biologist but a bioinformaticist. Even then, I had the tendency to pop into the lab once a day or so, just to make conversation with Sonia and the rest. It only occurred to me much later, some five years after I had left Melbourne, that the underlying psychological need behind walking into the lab for conversations was for the biological lab itself. I could meet the same people in the cafeteria or in the pantry, but it was the setting of a biological lab that was endearing – looking at rows of labeled blue-cap bottles, jars of autoclaved microfuge tubes and boxes of micropipette tips on the bench – and that gave me mental comfort.
Something I learnt about this time in my life is that when someone asks for a big favour, it does not always mean that the person should be helped. Asking does not mean deserving. I recall an incident in which a person learnt of my lab attachment stint at Robb de Iongh’s lab back in my undergraduate days and sought Joly’s and my help to recommend him for a similar stint at Kevin’s lab. I did have a short conversation with Joly before recommending him to Kevin, who very kindly accepted him in the lab. However, it turned out that his timing was inconsistent – as I would put it, he appeared as and when he liked. As a result, it was difficult to rely on him being around in the lab, and when he did appear, he had to be given something to do. To put it simply, he was more of a disruption than an assistance. Granted that he did not need to be there for the purpose of his degree, he had also failed to recognize that by recommending him, we were putting our credibility on the line. In fact, I was even kindly asked to try not to recommend people unless I knew what I was doing. The words were kind but the message was clear. I was disappointed with his attitude, though I did not say anything to him back then. After this instance, I was very hesitant to recommend anyone else. My bar rose tremendously and remains high to this day, and I do have to thank this ex-friend for such an important lesson in helping others. If asked for a reference, I will consider supplying one, but I will be very cautious about putting my own credibility on the line to get something done for another person. Asking for help is always a first step, but it has to be underpinned by being deserving; many who ask do not deserve. It was probably about 2 years later that I told this ex-friend that he has to follow through when being helped and explained this incident, which had dented some of Joly’s and my credibility, and he said that I had judged him. Yes, I did and, till now, I do not feel guilty about it.
That was the last time I ever conversed with him.
Another person who came into my life and was destined to play an important role in it was Phil Au. As mentioned before, I had known Phil during my honours year, but it was during this time that I got to know him much better. Phil did his honours project under Mary, in the same batch as Joly. Phil went on to do his PhD under Mary and Lynne Selwood, a professorial fellow in Zoology, working on marsupial oocyte development, mainly on the oocytes of dunnarts. Phil has an interesting history. Even though he has a Chinese name, he is Vietnamese by heritage. He spent his childhood in France, which is why he can speak some French, before migrating to Australia. One thing about Phil’s project was that he was scheduled to take care of the dunnarts during some weekends, and I tended to drop into the office to do my work on weekends as well before meeting Edwin and gang for dinner. We also shared the same postgraduate office. Hence, more often than not, we had coffee somewhere along Lygon Street before or after he was done with his animal husbandry tasks. Gradually, we became good friends and he became my morning tea/coffee mate at Blue Zone. Half the time, we had serious scientific discussions. The other half, we were making fun of the people we saw as we were having our coffee or tea – like the “headless chicken”, a guy who walked rather briskly, looking lost and turning his head rapidly like a chicken; or “Sow, the mother pig”, a guy who was always dressed in a blazer for class and who, in the gym, would take a book and walk on the treadmill for 20 minutes. We planned to convocate together, which we did.
As part of CRC’s education programme, both Joly and I were funded to attend a communication training workshop organized by Naill Byrne. The purpose was to train scientists to present their ideas in public. I recall it being expensive but worth it. In two days, we were taught how to talk in radio and television interviews. For the workshop, Naill had engaged notable radio and TV presenters to give us tips. There was even time allocated for each of us to be interviewed, recorded, and commented on by the presenters. It was 2 days of fun and learning. The main thing that I got out of it was the importance of soundbites – to be quoted in any media, you have to be able to present your key ideas clearly and succinctly in 8 to 10 seconds without blabbering. This was exactly what I used when doing a phone interview with Geoff Maslen, who was writing a piece for The Australian Financial Review on postgraduate research in Australia.
My featured article in The Australian Financial Review (Monday, 8 May 2006) by Geoff Maslen.