Tuesday, May 08, 2012

Design Philosophies of Developer Tools

Clojure hacker Stuart Sierra wrote an insightful piece on the design philosophies of developer tools. His conclusions are paraphrased here:

Git is an elegantly designed system of many small programs, linked by the shared repository data structure.

Maven plugins, by contrast, are not designed to be composed and the APIs are underdocumented. With Maven, plugins fit into the common framework of the engine, which strikes me as maybe a more difficult proposition than many small programs working on a common data structure.

Of the anarchic situation with Ruby package management (Gems, Bundler, and RVM) Sierra says, “layers of indirection make debugging harder,” and “The speed of development comes with its own cost.”

Principles

  • Plan for integration
  • Rigorously specify the boundaries and extension points of your system

...and I really like this idea:

  • The filesystem is the universal integration point
  • Fork/exec is the universal plugin architecture

Wednesday, May 02, 2012

Phenotypic constraints drive the architecture of biological networks

Biological networks are often compared to random networks in terms of properties like degree distribution, clustering, robustness and over-representation of network motifs. In a talk this morning, Areejit Samal, a Postdoctoral Fellow in the Price lab at ISB, proposed a new null model based on Markov Chain Monte Carlo (MCMC) sampling to generate realistic benchmark ensembles for metabolic networks and gene regulatory networks. Based on this improved background model, the conclusion is that “phenotypic constraints drive the architecture of biological networks”, which is also the title of the talk.

Applied to the metabolic model of E. coli, the sampling technique involves two steps. In the swap step, a reaction is removed from the network and a new reaction is drawn randomly from the KEGG database to replace it. The proposed new network is then tested by flux-balance analysis. If the new network is viable, it is accepted. If not, it is rejected. So, only networks that are biochemically plausible are sampled.

The technique has implications for systems and synthetic biology as well as evolutionary theory. By approximating the space of all possible networks that might lead to a viable functioning organism, we can better understand what properties these networks have, and maybe better design new networks. By changing the acceptance criteria to enforce viability in a second environment, the algorithm nicely models the evolutionary emergence of modularity.

Samal also showed the same general algorithm applied to the gene regulatory network controlling flowering in Arabidopsis.

I especially appreciated seeing a really cool application of Markov Chain Monte Carlo sampling, a topic covered only a couple weeks back in the Probabilistic Graphical Models class.

Links

Maximum expected utility decision rules

Week 6 of the Daphne Koller's Probabilistic Graphical Models class looks at Decision Theory, which integrates the concept of utility functions from economics into our models. I'm getting flashbacks of Dr. Welsh's Econ 101 in Schwab Auditorium.

In the Influence network below, we're making a decision about whether to found a company. Our success is determined by market conditions, which we can't observe. But, we can observe the results of a survey, which will give us some information on which to make our decision. So, how would you find the optimal decision rule?

The factor fm represents the probability that market conditions (var 1) are good (3), fair (2) or poor (1).

fm.var =  1
fm.card =  3
fm.val = [0.5 0.3 0.2]
PrintFactor(fm)
1 
1 0.500000
2 0.300000
3 0.200000

The factor fsm represents the results of a survey (var 2) given the underlying market conditions (var 1).

fsm.var = [2 1]
fsm.card = [3 3]
fsm.val = [0.6 0.3 0.1 0.3 0.4 0.3 0.1 0.4 0.5]
PrintFactor(fsm)
2 1 
1 1 0.600000
2 1 0.300000
3 1 0.100000
1 2 0.300000
2 2 0.400000
3 2 0.300000
1 3 0.100000
2 3 0.400000
3 3 0.500000

The factor fufm represents the utility function, given the market conditions (var 1) and the decision to found (var 3) a company, yes (2) or no (1).

fufm.var = [3 1]
fufm.card = [2 3]
fufm.val = [0 -7 0 5 0 20]
PrintFactor(fufm)
3 1 
1 1 0.000000
2 1 -7.000000
1 2 0.000000
2 2 5.000000
1 3 0.000000
2 3 20.000000

Compute mu of F, S by taking the factor product of factors representing the market, survey and utility function, then summing out over all possible market conditions (var 1).

PrintFactor(FactorMarginalization(FactorProduct(FactorProduct(fm, fsm), fufm), [1]))
2 3 
1 1 0.000000
2 1 0.000000
3 1 0.000000
1 2 -1.250000
2 2 1.150000
3 2 2.100000

We can build our decision rule by walking down all possible survey results (var 2) and selecting the decision to found or not which maximizes expected utility. See the red circles at the bottom of the diagram.

We can make a decision factor, which depends on the survey (var 2). We'll fill in dummy values, for now.

fd.var = [3 2]
fd.card = [2 3]
fd.val = ones(1,prod(fd.card))

Now, we can use the code from the programming assignment to compute the optimal decision rule and the expected utility it yields.

marketI.RandomFactors = [fm fsm]
marketI.UtilityFactors = [fufm]
marketI.DecisionFactors = [fd]
[meu optdr] = OptimizeMEU(marketI)
meu =  3.2500
optdr =
  scalar structure containing the fields:
    var  =       3   2
    card =       2   3
    val  =       1   0   0   1   0   1