Archive for March, 2008

Book Review: Web Database Applications with PHP and MySQL

Web Database Applications with PHP and MySQL, 2nd Edition, was written by Hugh E. Williams and David Lane and published by O’Reilly in May, 2004.

I purchased this book in 2005 when I was doing some consulting for a microarray company in Massachusetts, where I was adapting the BioArray Software Environment (BASE 1.2), which was written in PHP, to their process. I had set the book aside until recently, when I was motivated to review PHP so I could modify the WordPress theme that I use for this blog.

PHP is the programming language usually referred to by the P in the acronym LAMP, which stands for Linux, Apache, MySQL, and PHP. PHP was designed from the beginning to work closely with the Apache httpd web server and with the MySQL database management system. PHP code is easily embedded into HTML, and this makes it easy for relative novices to use a three-tier architecture for their web sites.

I found Web Database Applications with PHP and MySQL an excellent introduction to PHP and MySQL for someone who is skilled at programming using another language. The introductory material on PHP was just enough to get me started, and I quickly learned to refer to the PHP documentation (which is also excellent) when I needed enlightenment. The book also includes several chapters that give a solid introduction to MySQL 4.1.

The last five chapters of the book are devoted to a complete working example of an online wine store. All source code is available at the authors’ web site, http://www.webdatabasebook.com/. This is an invaluable resource that can serve as the basis for many other PHP projects.

In my opinion, this book is not appropriate for someone who is learning to program or use databases for the first time. Good alternatives might be Learning PHP & MySQL, 2nd Edition, by Michele E. Davis and Jon A. Phillips (published August, 2007, by O’Reilly), and Learning SQL, by Alan Beaulieu (published August, 2005, by O’Reilly). (I read Learning SQL recently, and I recommend it highly. Watch for a review at a later date.)

Since the book was published nearly four years ago, some of the material is dated. In 2004, PHP 5.0 was still in beta; PHP has reached 5.2 by now, and support for PHP 4 is about to end. Similarly, the book recommends using MySQL 4.0 or 4.1, whereas today MySQL 5.0 is very stable. The book provides an excellent set of appendices that explain how to install and configure Apache httpd, MySQL, and PHP for Linux, Windows, and Mac OS X. Again, these instructions are now dated, but they should still provide a useful guide.

I found the index somewhat frustrating to use; some items I was interested in are not found there. These include:

  • @, the error control operator
  • & and =&, operators used for working with references to variables
  • reference, the term referring to a variable that is not passed by value
  • define, the function used to create constants
  • magic constants, such as __FILE__, __LINE__, __FUNCTION__, __CLASS__, and __METHOD__
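
For readers unfamiliar with the items listed above, here is a minimal sketch of each one in use. It is not taken from the book; the names MAX_ITEMS, add_one, where_am_i, $count, and $alias, and the file path, are hypothetical and chosen only for illustration.

```php
<?php
// Illustrative sketch only: every name and path here is hypothetical.

// define() creates a constant at run time.
define('MAX_ITEMS', 10);

// An & in the parameter list means the argument is passed by reference,
// so the function changes the caller's variable.
function add_one(&$n)
{
    $n = $n + 1;
}

$count = 1;
add_one($count);        // $count is now 2

// =& makes two names refer to the same variable.
$alias =& $count;
$alias = 5;             // $count is now 5 as well

// @ suppresses the warning that file() raises for a missing file;
// the call still returns false.
$lines = @file('/no/such/file.txt');

// Magic constants are filled in by PHP itself.
function where_am_i()
{
    return __FUNCTION__ . ' at line ' . __LINE__ . ' of ' . basename(__FILE__);
}

echo MAX_ITEMS, "\n";
echo $count, "\n";
var_dump($lines);
echo where_am_i(), "\n";
```

__CLASS__ and __METHOD__ work the same way as __FUNCTION__, but they are only meaningful inside a class.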

I don’t use PHP in my daily work; rather, I use Perl. (Python is the next language I plan to learn, but that’s another story.) I like how PHP has combined arrays and hashes into a single array type. I like PHP’s Boolean values, true and false. And I like how easy it is to embed PHP into HTML; it makes creating a web page dynamically more intuitive than Perl CGI does. But I dislike the inconsistent and ugly naming of many standard functions (for example, strtoupper, the PHP equivalent of Perl’s uc), and this seems to be a common complaint about PHP. I also don’t like the large number of global variables and constants that PHP uses.

There is a BioPHP project (PHP for Bioinformatics), at http://biophp.org/, but it is not as well developed as the BioPerl or BioPython projects. The only large bioinformatics project that I’m aware of that uses PHP is the BioArray Software Environment (BASE) 1.2. Version 2.0 of BASE has been completely rewritten in Java.

March 23 2008 | Bioinformatics and Books and Computing | Comments Off

Genome of the Ectomycorrhizal Fungus Laccaria bicolor

Martin et al. published a paper, “The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis”, in the 06 March 2008 issue of Nature. The paper (PubMed record) is accompanied by a News and Views article, “Fungal symbiosis unearthed”, by Dan Cullen of the USDA Forest Products Laboratory in Madison, Wisconsin. (A subscription to Nature is required to obtain electronic copies of the paper and the article.)

Fungi and plants enter a symbiotic relationship in which the mycorrhizae of fungi join with the roots of plants, allowing an exchange of nutrients between the organisms. Plants transfer simple carbohydrates, the products of photosynthesis, to the fungi, and they receive in return water and minerals, including ammonia and phosphate ions.

The researchers identified approximately 21,000 genes in the 65 million base pair genome; about 70% of the predicted genes have significant similarity to genes in the sequence databases. As might be expected, the number of predicted membrane-bound transporter proteins is large.

The authors of the paper and Dr. Cullen in his commentary noted that it was expected that the L. bicolor genome would contain genes encoding enzymes for breaking down cellulose and lignin. It was surprising, therefore, that only one gene encoding an endoglucanase with a cellulose-binding domain was identified, and no genes encoding enzymes for breaking down cellulose were found. In contrast, a large number of genes predicted to encode lipases and proteinases were identified, suggesting that L. bicolor derives nutrients from the degradation of bacteria and invertebrates. Martin et al. write:

These observations suggest that the inventory of L. bicolor plant cell wall-degrading enzymes underwent massive gene loss as a result of its adaptation to a symbiotic lifestyle, and that this species is now unable to use many plant cell wall polysaccharides as a carbon source, including those found in soil and leaf litter.

Having now in hand this fungal genome and the genome of the black cottonwood, Populus trichocarpa, which enters into ectomycorrhizal symbiosis, the authors predict it will be much easier to identify the genes that mediate the symbiotic relationship.

March 18 2008 | Biology | Comments Off

The Personal Genome (2)

On 04 March 2008 I wrote a post about the personal genome. In the days since that post, I have expanded my reading about this topic, and I have found two blogs that cover personal genomes/personalized medicine very well.

The first is Gene Sherpas, a blog by Steve Murphy, M.D., about “personalized medicine and you.”

To usher in the new paradigm of personalized medicine we will need to travel a perilous path. Much like the route through the Himalayas it has punished the naive and self-reliant. That is why I have dedicated my life to being a Gene Sherpa. What is a gene sherpa? The Sherpa speaks the language of the trail, he/she knows short cuts and dangerous paths to avoid. This blog is for those wishing to take the journey and those wishing to become Gene Sherpas. Interested? Email me…

Dr. Murphy also writes:

I am the founder of a Personalized Medicine practice (likely the first private practice of its kind). In addition I am the Clinical Genetics Fellow at Yale University until 2010. Now not under contract and that’s why I am posting and running my practice. I also am developing a modern medical genetics curriculum for residents and other physicians. On this blog I am educating the public and hopefully some physicians about the field of genetics and personalized medicine.

The Gene Sherpas blog contains many informative and provocative posts, including a recent post about twin studies and how identical twins aren’t actually so identical when DNA methylation and copy number variation are taken into account.

From Gene Sherpas, I learned about Misha Angrist’s GenomeBoy blog. Dr. Angrist earned a Ph.D. in Genetics at Case Western Reserve University, and he now writes about personal genomics. Dr. Angrist writes:

I work as the Science Editor for the Duke University Institute for Genome Sciences & Policy (although this site and its content are my own). In 2007 I became the fourth subject in Harvard geneticist George Church’s Personal Genome Project. As the PGP moves forward, I am chronicling the dawn of personal genomics, that is, people obtaining their genomic information for whatever reason(s) and figuring out what to do with it. I am interested in the relevant technologies and especially the attendant privacy and other ethical/legal/social issues.

Dr. Angrist posts frequently about his participation in the Personal Genome Project and about the rapid technological advances that will make personal genomes easy to obtain in the near future.

March 17 2008 | Biology | 1 Comment »

Understanding Evolutionary Trees

Today I read “Understanding Evolutionary Trees”, by T. Ryan Gregory, the author of the Genomicron blog. Dr. Gregory mentioned his paper in his blog post, “Evolutionary Trees for Darwin Day”, and the paper appears in the new journal Evolution: Education and Outreach, which is free online through 2008.

Nimravid posted a summary, which was graciously acknowledged today by Dr. Gregory, and that was how I learned about the paper. Nimravid’s summary is good (as are all of Nimravid’s posts), but I recommend reading the entire paper, which is very readable and well-presented. I found that I already had a pretty clear idea of how to interpret evolutionary trees, but I learned the differences among the terms dendrogram, cladogram, and phylogram.

Dr. Gregory makes an important point in his paper:

[I]t is impossible to know with certainty that any given phylogeny is historically accurate. As a result, any reconstructed phylogenetic tree is a hypothesis about relationships and patterns of branching and thus is subject to further testing and revision with the analysis of additional data.

In my mind, this is why evolution is a science. An evolutionary tree is a model of the evolutionary relationships among organisms. A good model is consistent with existing observations, and it provides a hypothesis that can be tested as additional data are gathered. If the model is inconsistent with the new data, the model is revised and subjected to testing with even more data.

March 16 2008 | Biology | Comments Off

More Free Rice

Six days ago, I discovered freerice.com.

My wife, who has a bigger vocabulary than I have, achieved a perfect vocabulary level of 55 and sent me a screen shot to prove it:

[Screen shot: Perfect vocabulary score at freerice.com]

One secret to success is to set up the options to save your score so that you don’t have to start over each time. Another secret to success is to be persistent, because you’ll learn the words you need to achieve a high score.

I haven’t had the time recently to attempt a perfect score, but eventually I’ll get around to trying.

[5-Jul-2008] Five more levels were added in May, 2008, so now the top possible score is 60.

March 15 2008 | Internet | Comments Off

Free Rice With Cyanobacteria

I learned about the Free Rice web site today. The object is to “learn free vocabulary and give free rice.”

The premise is simple. The site presents a word and four possible meanings, and you click on the meaning you think is correct. With each correct guess, you earn 20 grains of rice that the site owner donates to the United Nations World Food Programme. Advertisers pay to display ads on the site, so they are the ones who ultimately pay for the donated rice. The game is fun and addictive, and it has the benefit that you can learn some new words.

As a scientist, I have a pretty large vocabulary, so I can reach a vocabulary score of 48. My wife, who edits medical textbooks and consequently has an enormous vocabulary, routinely reaches 53. A score of 55 is the highest possible.

I have learned some new words, some useful and some not. When someone mentions they wore a rebato trimmed with vair, now I know what they’re talking about.

But one of the words hit my hot button. The word was nostoc, and the required answer was blue-green alga. This annoys me because it is like calling a dolphin a fish, a mushroom a plant, or water an element. The answer is incorrect.

What were once called blue-green algae are properly called cyanobacteria. The distinction is important, because cyanobacteria are prokaryotic organisms and algae are eukaryotic organisms.

While I’m ranting: It’s one alga, many algae; one bacterium, many bacteria.

March 09 2008 | Biology and Internet | Comments Off

Re: Notes to a Young Computational Biologist

This is a story of how clueless I can be, but how sometimes, given a sufficient number of opportunities, I can become clueful again.

On 13 March 2007, Bosco Ho wrote a post entitled “Notes to a Young Computational Biologist” on his Trapped in the USA blog. I don’t remember how I happened to come across this post the first time, because I wasn’t reading blogs systematically then, but something about this topic clicked in me, and on 25 March I posted a long comment about some things I thought Dr. Ho had omitted.

Time passed, and in October or November, I received an email from a recruiter who wanted to interest me in a bioinformatics or programming job on the West Coast. I couldn’t figure out where she had heard of me, except that she mentioned in the email that she had seen my name on the boscoh.com web site. This mystified me, because I had forgotten all about the events in March. I did a little poking around on the web site, but I couldn’t find my own name. So I concluded that she was completely mistaken—that she actually wanted to recruit Bosco Ho but had sent me the email in error—and I decided not to respond.

The benefit of this apparent error was that I learned (for what I thought was the first time) about Dr. Ho’s site, which is full of great writing and useful information. I now work with quite a few scientists who are experts at protein structure, but this is a new field for me because I was trained as a DNA jockey (you know—molecular biology, cloning, sequencing, and all that). Trapped in the USA provided me with additional background reading on protein structure.

At the beginning of this year, I decided it was time to start another web site, so I registered sphaerula.com. One of the things my hosting provider offers is an installation of WordPress, and that’s how I got started blogging. Since mid-February, I have been immersed in reading biology and bioinformatics blogs and writing posts of my own. I’ve discovered many wonderful blogs, and I discovered Trapped in the USA for the third time.

On weekends, I have been methodically reading blogs from beginning to end. (This is going to take me a long time as I follow the branches from the blogrolls.) When I began reading Trapped in the USA, I rediscovered Dr. Ho’s “Notes to a Young Computational Biologist” post, and I rediscovered my comments.

So in the course of a little more than 11 months, I’ve made a discovery, a rediscovery, and a rerediscovery. The difference is that now I am fully engaged in blogging and in reading blogs, and this time the discovery will stick with me. I know now that Dr. Ho is a widely respected blogger in the bioinformatics blogging community, and I won’t forget it this time.

By the way, if you happen to be that recruiter, I apologize for not responding. I love the West Coast, having grown up in Oregon and earned my Ph.D. at Berkeley. But I have a good job, I love Boston, and I’m not inclined to move. But I’m happy to recommend a talented computational biologist who is named Bosco Ho.

March 06 2008 | Bioinformatics | 1 Comment »
