Research Article: Neuroscience

Odor Coding by a Mammalian Receptor Repertoire

Science Signaling  03 Mar 2009:
Vol. 2, Issue 60, pp. ra9
DOI: 10.1126/scisignal.2000016

Abstract

Deciphering olfactory encoding requires a thorough description of the ligands that activate each odorant receptor (OR). In mammalian systems, however, ligands are known for fewer than 50 of more than 1400 human and mouse ORs, greatly limiting our understanding of olfactory coding. We performed high-throughput screening of 93 odorants against 464 ORs expressed in heterologous cells and identified agonists for 52 mouse and 10 human ORs. We used the resulting interaction profiles to develop a predictive model relating physicochemical odorant properties, OR sequences, and their interactions. Our results provide a basis for translating odorants into receptor neuron responses and for unraveling mammalian odor coding.

Introduction

Odorant receptors (ORs) in the olfactory sensory neurons of the nasal epithelium translate odorants into neural signals. Each OR is thought to be specialized to recognize physicochemical features, such as functional groups or molecular size, of odorant molecules; these features are then translated into a neural signal that in turn leads to an olfactory perception. The physicochemical features of an odorant, therefore, are a key determinant of the olfactory percept. The rules governing the translation of molecule into percept, however, remain largely unknown. To develop a theory that predicts olfactory percept from molecular structure, we must first be able to predict OR activation from molecular structure.

Mammalian ORs are heterotrimeric guanine nucleotide–binding protein (G protein)–coupled receptors localized to the cell-surface membrane at the tips of olfactory sensory neuron dendrites (1). They constitute a multigene family; there are ∼1035 and 387 putatively functional ORs in mice and humans, respectively (1–4), making it one of the largest gene families in the mammalian genome (5–7). The overall sequences of mammalian ORs are diverse, with amino acid sequence similarity between different ORs ranging from less than 40% to more than 90%. On the basis of phylogenetic analysis, OR family members can be divided into two subclasses, class I and class II (4, 8). Most currently identified fish ORs belong to class I [but see (9)], whereas amphibians and mammals express ORs of both classes (8). The dolphin Stenella coeruleoalba has both class I and class II ORs, but the sequenced class II ORs are all pseudogenes. The preferential expression of class I ORs in marine mammals and fish suggests that water-soluble odorants may be the preferred agonists of class I but not class II ORs (10).

Elucidation of the fundamental properties that enable olfactory encoding, such as determination of the general similarity between two odorants or two ORs, requires investigation of a wide panel of diverse ORs with a large number of chemically diverse odorants in a consistent assay. One of the roadblocks to studying receptor responses directly has been the inability to express ORs in heterologous systems suitable for high-throughput screening or mutational analysis. Agonists have been identified for fewer than 50 mammalian receptors in heterologous systems; moreover, these ORs were tested with various assay systems and odorant sets (11–24), greatly complicating any attempts to unify their analysis. Previously, using a set of molecules that induce the expression of mouse and human ORs in heterologous cells, we developed a system to determine odorant activation profiles for ORs (17, 21, 25). Here, we leveraged this high-throughput system to identify agonists for a large number of mammalian ORs. We tested the responses of 219 mouse and 245 human receptors to a panel of 93 odorants, identifying 340 receptor-ligand interactions, with 62 ORs responding to at least one agonist. This study is the first to examine the interaction of a large number of mammalian ORs with a diverse set of odorants. We used these findings to address several distinct issues for which the lack of functional data has impeded progress, such as quantification of odorant similarity, quantification of receptor similarity, examination of the differences between class I and class II ORs, examination of the differences between human and mouse ORs, and development of a model to predict odorant-receptor activation.

Results

High-throughput screening of mouse and human ORs

We generated libraries of mouse and human ORs that represent a large fraction of the total mouse and human OR families (table S1). Our mouse OR library comprises 219 mouse ORs that represent more than 21% of the total 1035 mouse OR genes and includes at least one member of 217 of the 228 mouse OR subfamilies defined by Zhang and Firestein (4). Our human OR library comprises 245 ORs that represent 63% of the 387 human OR genes. Although many human OR genes are polymorphic (11, 21, 26), we used a single variant of each receptor in the screen (table S1).

We stimulated the entire OR library with one of 8 odorant mixtures drawn from 93 odorants chosen to represent diverse functional groups, sizes, and structures (table S2). We applied each mixture at five different concentrations: 121 human ORs (49.4%) and 169 mouse ORs (77.2%) showed a response to at least one mixture at one concentration. We then applied the 93 odorants individually at 100 µM to the 290 mixture-responsive ORs. Of these, 27 human ORs (11.0%) and 102 mouse ORs (46.6%) showed a significant response (P < 0.05, uncorrected for multiple comparisons) to at least 1 of 67 odorants relative to a no-odor control. We then constructed dose-response curves for every combination of 129 receptors and 67 odorants that showed a response significantly above baseline in the previous step. In this more stringent test, we identified 52 mouse and 10 human ORs that responded to one or more of 63 odorants (Fig. 1, figs. S1 to S3, and table S3), representing 23.7% of the original mouse library and 4% of the original human library. The additional 4 odorants and 67 receptors did not elicit or show a significant response in this follow-up. Although our positive responses resulted in discovery of a large number of OR agonists, the failure of a specific OR to respond to any of the tested odorants may reflect a failure of the OR to function in our assay rather than a lack of sensitivity to the tested odorant. In further analyses, we only looked at results for receptors that responded to at least one of the tested odorants and were therefore unequivocally functional in our assay.

Fig. 1

EC50 values for 62 ORs and 63 odorants. Class I receptors are shown in green, and class II receptors in purple. Human ORs have a gray background. Odorant and receptor order were determined independently by cluster analysis with the receptor response data such that the most similar odorants (and receptors) are next to each other on their respective axes. A second version of this figure, with odorant order determined by functional group and receptor order determined by number of agonists is included for comparison (fig. S10).

Bias in odorant and receptor representation

One problem in the field of olfaction research is the lack of an agreed-upon metric to organize odorants; for example, there is currently no agreed-upon metric to quantify odorant similarity (27). Without such a metric, it is difficult to choose a random sample of odorants that fairly represents all odorants, because all odorants have discrete molecular structures that differ in such physicochemical properties as molecular weight, carbon number, functional groups, and hydrophobicity. A bias in odorant sampling may give a misleading estimate of a receptor’s preferred physicochemical features, as well as lead to faulty extrapolation of odorant similarity. Because of the large number of physicochemical properties thought to determine odorant similarity, we use a broad descriptor set to show the bias in our chosen odorant set and then use principal component analysis (PCA) to simplify the organization of these odorants. PCA is a method for transforming a number of possibly correlated variables into a smaller number of uncorrelated variables.

To examine the bias in our odorant set, we constructed a 1664-dimensional (1664D) space in which each dimension represents a physicochemical property. We plotted 2683 commercially available odorants (table S4) in the resulting odorant space. A 2D projection of this 1664D odorant space is shown in Fig. 2A. This is an incomplete representation of our bias in odor selection, because our choice of 2683 odorants is likely to itself be a biased representation of odorant space, but represents an intermediate between the limitations of our testing capability and the ideal case of sampling all odorous molecules.
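The projection of a high-dimensional descriptor space onto its leading principal component can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: the function names are hypothetical, the descriptors are only mean-centered (the article does not state whether they were also scaled), and power iteration stands in for whatever PCA routine was actually used.

```python
import math

def center_columns(rows):
    """Subtract the column mean from every descriptor (column)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def first_principal_component(rows, n_iter=500):
    """Return (direction, scores, explained_fraction) for PC 1.

    Power iteration on X^T X, where X is the mean-centered data matrix.
    Assumes the start vector is not orthogonal to PC 1.
    """
    x = center_columns(rows)
    n, d = len(x), len(x[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        scores = [sum(r[j] * v[j] for j in range(d)) for r in x]            # X v
        w = [sum(scores[i] * x[i][j] for i in range(n)) for j in range(d)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in x]
    total = sum(c * c for r in x for c in r)
    explained = sum(s * s for s in scores) / total if total else 0.0
    return v, scores, explained

# Toy "odorants" described by two perfectly correlated descriptors.
toy = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores, explained = first_principal_component(toy)
```

In this toy case a single component captures all of the variance because the two descriptors are collinear; with real descriptor sets the explained fraction of PC 1 is much lower.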

To examine the bias in our resulting OR cohort, we mapped the distance between 1425 human and mouse ORs (2) (table S1). Using the Jukes-Cantor method (28), we constructed a 1425 × 1425 distance matrix and visualized the matrix in 2D space with PCA (Fig. 2B and fig. S5).
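For readers unfamiliar with the Jukes-Cantor correction, the sketch below shows how a pairwise distance matrix could be built from aligned sequences. It is an illustration under stated assumptions: the function names are hypothetical, gapped positions are simply skipped, and the number of character states defaults to 20 (amino acids), which the article does not specify; use `n_states=4` for nucleotide alignments.

```python
import math

def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing sites among aligned positions without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p, n_states=20):
    """Jukes-Cantor corrected distance: d = -b * ln(1 - p / b), b = (k - 1) / k."""
    b = (n_states - 1) / n_states
    if p >= b:
        return math.inf  # sequences are saturated; distance is undefined
    return -b * math.log(1.0 - p / b)

def distance_matrix(seqs, n_states=20):
    """Symmetric matrix of Jukes-Cantor distances for aligned sequences."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jukes_cantor(p_distance(seqs[i], seqs[j]), n_states)
    return d

# Three made-up aligned fragments (not real OR sequences).
toy_alignment = ["MAYDRY", "MAYDRF", "MS-DKF"]
matrix = distance_matrix(toy_alignment)
```

The correction inflates the observed fraction of differing sites to account for multiple substitutions at the same position, so corrected distances are always at least as large as the raw p-distance.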

Fig. 2

Bias in odorant and receptor sampling. (A) Odorant space. A total of 1664 chemical descriptors were calculated for 2683 odorants. The odorants were projected onto a 2D space made of the first and second principal components. Odorants used in the mixture-screening phase are colored in magenta; odorants found to bind at least one receptor are shown in green. Black crosses represent untested odorants. (B) Receptor space. The Jukes-Cantor algorithm was used to calculate a distance matrix for 1425 intact ORs. The matrix was visualized in two dimensions with PCA. Receptors used in the mixture-screening phase are colored in magenta; receptors deorphaned in this study are shown in green. Black crosses represent untested receptors.

The first principal component (PC 1) of odorant space correlates with molecular size (29). Fig. 2A indicates that, although we have fairly dense coverage of the middle of PC 1, some caution is warranted with regard to extrapolating to very large and very small odorants. Our criterion of including at least one member of 217 of the 228 mouse receptor subfamilies ensured a broad coverage in receptor space based on full-sequence similarity.

Physicochemical odorant properties predict functional data

Early studies focused on small sets of odorant features, such as carbon chain length and functional group, to explain response variability in the olfactory system. More recent studies have used more quantitative approaches with broader sets of physicochemical descriptors. In one such study, physicochemical descriptors of odorants predicted ∼35% of the variation in perceived odorant pleasantness (29). This finding suggests that physicochemical descriptors might be useful for predicting earlier stages of olfactory perception. Indeed, two recent studies showed that physicochemical descriptors explain 43 to 72% of the variance in receptor neuron responses (30) and, in a meta-analysis, that a set of 32 physicochemical descriptors explained an average of 48% of the variance in neural responses for various olfactory data sets (31). Here we examined how well several proposed metrics predict our functional data (Fig. 3A).

Fig. 3

Distance in odorant space predicts similarity in receptor response. (A) Testing various odorant-similarity metrics against the functional data. (B) The difference in the receptor response profile is correlated with the distance between the two odorants calculated with 18 optimized descriptors (r = 0.79, P < 0.001). Enantiomeric pairs are circled. The absence of any completely noncorrelated odorant pairs is due to the fact that all pairs of odorants have at least one receptor in common that was not activated by either odorant. (C) Eighteen physicochemical descriptors that explain more than 62% of the variance in our data set. Definitions of how the descriptors are calculated can be found in the Handbook of Molecular Descriptors (65). (D) The top 10 most similar odorant pairs according to our assay. The second column is the Pearson correlation coefficient between the two odorant-response vectors represented in Fig. 1.

In our data set, carbon number alone described a small portion of the variance in receptor response (r = 0.07, P < 0.04). Functional group descriptors (r = 0.42, P < 0.0001) explained nearly as much variance as all 1664 tested descriptors (r = 0.43, P < 0.0001). Haddad et al. (31) proposed a set of 32 descriptors, a subset of the 1664 used here, that described a large portion of the variance in a meta-analysis of eight olfactory data sets. These descriptors outperformed carbon number, the set of functional group descriptors, and the full set of descriptors (r = 0.59, P < 0.0001). Using a meta-analysis including data from different model organisms, different measurement techniques, and different levels of the olfactory system, Haddad et al. created an optimized set of 32 descriptors. Using a greedy optimization algorithm with a leave-10-out cross-validation scheme, we similarly optimized the descriptor set, but only on data from our assay system. On average, this technique explained 60% of the variance in the left-out data set (r = 0.77). To verify that this result was not due to chance, we randomly shuffled each descriptor vector to create a new set of randomized descriptors from the same distribution. As expected, real descriptors significantly outperformed shuffled descriptors [shuffled r = 0.55, t(23) = 9.09, P < 0.0001]. Having validated this technique, we applied the algorithm to the entire data set. To reduce overfitting, we chose descriptors that explained the most variance averaged over 10 divisions of the data set, resulting in a set of 18 descriptors that explain more than 62% of the variance in our data set (r = 0.79, P < 0.0001) (Fig. 3, B and C). That is, much of the variation in OR responses can be explained by a fairly small set of physicochemical descriptors.
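The idea of greedily choosing descriptors so that distance in descriptor space tracks differences in receptor response can be sketched as below. This is a simplified illustration, not the procedure used in the study: the function names are hypothetical, Euclidean distance is assumed for both descriptor space and response profiles, and the cross-validation and shuffling controls described above are omitted.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def pairwise_distances(rows, cols=None):
    """Euclidean distance for every pair of rows, restricted to `cols`."""
    cols = list(range(len(rows[0]))) if cols is None else list(cols)
    return [
        math.sqrt(sum((rows[i][c] - rows[j][c]) ** 2 for c in cols))
        for i, j in combinations(range(len(rows)), 2)
    ]

def greedy_select(descriptors, responses, max_features=None):
    """Forward selection: add the descriptor that most improves r, stop when none does."""
    target = pairwise_distances(responses)
    n_desc = len(descriptors[0])
    limit = max_features or n_desc
    chosen, best_r = [], -1.0
    while len(chosen) < limit:
        scored = [
            (pearson(pairwise_distances(descriptors, chosen + [c]), target), c)
            for c in range(n_desc) if c not in chosen
        ]
        if not scored:
            break
        r, c = max(scored)
        if r <= best_r:
            break
        chosen.append(c)
        best_r = r
    return chosen, best_r

# Toy data: 5 odorants x 3 descriptors; only descriptor 0 drives the responses.
toy_descriptors = [[0, 5, 1], [1, 2, 1], [2, 9, 1], [3, 1, 1], [4, 7, 1]]
toy_responses = [[0, 0], [1, 2], [2, 4], [3, 6], [4, 8]]
selected, r = greedy_select(toy_descriptors, toy_responses)
```

A greedy search of this kind can overfit, which is why the selection in the text is validated on left-out data and compared against shuffled descriptors.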

Receptor sequence predicts functional data

OR genes are classified into families and subfamilies based on sequence alignment of the full-length proteins. Because of the paucity of functional data regarding these receptors, it is unclear if these divisions correspond to functional variability [but see (19)]. That is, the assumption that full-length protein sequence variability predicts functional variability has not been tested on a comprehensive set of receptors. Here we test this assumption by examining how well the properties of amino acid residues predict our functional data (Fig. 4A).

Fig. 4

Distance in receptor space predicts similarity in responses to odorants. (A) Testing various receptor-similarity metrics against the functional data shows that our optimized descriptors predict functional responses better than full-sequence similarity or similarity at previously suggested residues (34). (B) Differences in the odorant response profiles of two receptors are correlated with distances between the same receptors, calculated with 16 optimized descriptors (r = 0.73, P < 0.001). Each point represents one pair of receptors. The absence of any completely noncorrelated receptor pairs is due to the fact that all pairs of receptors have at least one odorant in common which fails to activate both receptors. (C) Snake plot of a typical OR in which amino acid residues with ligand-specificity-determining properties are highlighted. Residue properties selected by the greedy optimization algorithm are indicated by color. Amino acid positions conserved in at least 90% of the 1425 receptors are labeled with their single-letter amino acid code. Abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; and Y, Tyr.

As with odorants, we constructed a set of descriptors for ORs. Using a multiple alignment of 1425 intact mouse and human ORs (table S1), we calculated amino acid properties (polarity, composition, and volume) as defined by Grantham (32) for 327 amino acid residues common to at least 10% of the ORs. The entire set of descriptors explained only 7% of the variance (r = 0.28, P < 0.0001). Recently, Man et al. (33) proposed that comparison of predicted ligand-binding residues might predict functional variation more accurately than full-length comparisons. Restricting the set of descriptors to the 66 properties of 22 predicted binding site residues (34), however, did not improve prediction of functional variation (r = 0.17, P < 0.0001). We optimized the descriptor set, using a greedy optimization algorithm with a leave-10-out cross-validation scheme. On average, this technique explained 40% of the variance in the left-out data set (r = 0.63). To verify that this result was not due to chance, we randomly shuffled each descriptor vector to create a new set of randomized descriptors from the same distribution. As expected, real descriptors significantly outperformed shuffled descriptors [r = 0.18, t(12) = 8.53, P < 0.0001]. Having validated this technique, we applied the algorithm to the entire data set. To reduce overfitting, we chose descriptors that explained the most variance averaged over 10 divisions of the data set, resulting in a set of 16 descriptors that explain more than 53% of the variance in our data set (r = 0.73, P < 0.0001) (Fig. 4, B and C, and table S4).

Breadth of tuning

In color vision, three broadly tuned receptors sense the entire visible range of wavelengths (35). In audition, ∼3500 narrowly tuned cochlear hair cells sense the audible spectrum of frequencies (36). In olfaction, it is not clear if receptors are broadly or narrowly tuned. Given the aforementioned lack of an agreed-upon metric for measuring odorant similarity, this question has traditionally been answered in terms of a receptor’s number of agonists. These values (number of agonists) are listed for five receptors in Fig. 5A. On the second line of Fig. 5A, we incorporated the sensitivity of the receptor to each odorant to create a tuning curve (for sensitivity-ordered tuning curves for all receptors, see fig. S6). Note that the x axis is ordered to place the most sensitive odorant in the center and is different for each receptor. A metric used to define an OR as “broadly” or “narrowly” tuned should, however, take into account not only the number of agonists to which it responds, but also the similarity of those agonists to each other. On the third line of Fig. 5A, we ordered the x axis to reflect odorant similarity; that is, the x axis represents the odorant’s position along the first principal component of Haddad et al.’s 32D odorant space (31) (for 1D tuning curves based on the first principal component for all receptors, see fig. S7). This first principal component only describes 19.4% of the variance in odorant space. Using more principal components, as in the fourth line of Fig. 5A, describes more of the variance in odorant space, but more than three principal components are difficult to display in a figure (for 2D tuning curves based on the first two principal components for all receptors, see fig. S8). The radius of a circle enclosing all of the agonists, as in the fourth line of Fig. 5A, gives us a measure of tuning breadth that, unlike the figure, can be scaled to high-dimensional space. The radius of a hypersphere enclosing all five receptors’ agonists in Haddad et al.’s 32D odorant space is listed on the fifth line of Fig. 5A, and for all receptors, in Fig. 5B. The results reveal that the mammalian ORs vary along a continuum of tuning breadths. That is, some receptors are broadly tuned, responding to a large number of odorants that occupy a large area of odorant space (are structurally dissimilar), others are more narrowly tuned, and some respond to only a small number of closely related odorants.
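A hypersphere-radius measure of tuning breadth can be approximated in any number of dimensions. The sketch below is one possible way to do this and is an assumption on our part: the article does not state how the enclosing hypersphere was computed, the function name is hypothetical, and the Badoiu-Clarkson iteration used here yields an approximate (slightly overestimated) minimum radius.

```python
import math

def enclosing_ball(points, n_iter=2000):
    """Approximate the smallest ball enclosing `points`.

    Badoiu-Clarkson iteration: repeatedly step the center toward the
    farthest point with a shrinking step size of 1 / (k + 1).
    """
    center = list(points[0])
    for k in range(1, n_iter + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (k + 1) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius

# Toy agonist coordinates in a 3D "odorant space" (made-up values).
narrow = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.1, 0.1, 0.0)]
broad = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.0, 3.0, 1.0)]
_, narrow_radius = enclosing_ball(narrow)
_, broad_radius = enclosing_ball(broad)
```

A receptor with a single agonist has radius 0 under this measure, so the radius captures the spread of agonists in odorant space rather than their number.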

Fig. 5

Breadth of tuning in odorant space. (A) Table of assorted breadth of tuning representations. The sensitivity-ordered tuning curve displays the 63 tested odorants on the x axis ordered according to their EC50 for the given receptor. Odorants that activated the receptor at the lowest concentrations are placed near the center of the distribution, whereas those that did not elicit a response are placed at the edges of the distribution. The order of odorants is thus different for different receptors. The 1D tuning curves are stem-plot versions of the sensitivity-ordered tuning curves with the x axis representing the value of the odorant along the first principal component of Haddad et al.’s odorant space (31). The 2D odorant space figure retains the x axis of the 1D odorant space figure, but plots the value of the odorant along the second principal component of Haddad et al.’s odorant space on the y axis, thus forming a 2D projection of odorant space. Plotted in gray are 2683 odorants. The odorants activating the receptor are plotted in red and a circle circumscribing the odorants is plotted in blue. The final row is the radius of a hypersphere that encloses all of the receptor’s agonists in the 32D odorant space. (B) A histogram of the hypersphere radius measure for all 62 receptors. For comparison, a hypersphere enclosing 2683 odorants (table S5) has a radius of 26; a hypersphere enclosing the 93 odorants in our test set has a radius of 14; a hypersphere enclosing the 63 odorants that activated at least one receptor has a radius of 12.

Receptor response to enantiomers

Enantiomers (stereoisomers that are nonsuperimposable mirror images of each other) impose an interesting constraint on olfactory theories because, even though they have many identical properties, mammals can discriminate between the members of some pairs (37–40). We examined four pairs of enantiomers: 2-phenylbutyric acid, carvone, fenchone, and camphor. The (+) and (−) enantiomers of carvone activate overlapping but distinct sets of olfactory neurons in mice (41). Consistent with these previous studies, we found that the response of an OR to one enantiomer is highly correlated with its response to the other enantiomer (r = 0.85, P < 0.0001). We did, however, find a receptor basis for the ability to perceptually distinguish between three of the four enantiomeric pairs: for example, MOR107-1 is activated only by the (−) enantiomer of fenchone and MOR271-1 is activated only by the (+) enantiomer of fenchone. Although at some doses, MOR2-1 responded more strongly to the (+) enantiomer of 2-phenylbutyric acid, no receptors responded to only one enantiomer of 2-phenylbutyric acid (Fig. 6). Enantiomers were the most similar odorant pairs in terms of physicochemical descriptors and among the hardest to discriminate in our data set (Fig. 3, B and D).

Fig. 6

Responses of olfactory receptors to enantiomeric pairs. All receptors that responded to at least one member of the four tested enantiomeric pairs are plotted. The black line represents the unit slope line. Points that fall above the line represent receptors more sensitive to the (+) enantiomer, points below the line represent receptors more sensitive to the (−) enantiomer. Error bars represent standard error.

Functional comparison of class I and class II ORs

Class I and class II ORs did not differ in number of agonists (P = 0.11, Mann-Whitney U test) (Fig. 7A), breadth of odor tuning in odorant space (P = 0.21, Mann-Whitney U test) (Fig. 7B), or sensitivity to odorant concentration (P = 0.99, Mann-Whitney U test) (Fig. 7C). However, we were able to differentiate class I OR agonists from class II OR agonists with a machine-learning algorithm trained on the physicochemical descriptors of the agonists (sensitivity index d ′ = 1.98, P < 0.0001).
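The sensitivity index d′ summarizes how well a classifier separates two categories. The sketch below shows the standard signal-detection definition, d′ = z(hit rate) − z(false-alarm rate); how the authors derived d′ from their classifier output is not described here, so the function names, the mapping of "hit" to a correctly labeled class I agonist, and the log-linear correction for extreme rates are all assumptions.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false-alarm rate); rates must lie strictly in (0, 1)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def d_prime_from_counts(hits, misses, false_alarms, correct_rejections):
    """d' from a 2x2 confusion table.

    A log-linear correction (add 0.5 to each count, 1 to each total)
    keeps both rates away from 0 and 1, where z is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return d_prime(hit_rate, fa_rate)

# Hypothetical confusion table: 40 of 50 class I agonists labeled class I,
# 10 of 50 class II agonists mislabeled as class I.
example = d_prime_from_counts(hits=40, misses=10, false_alarms=10, correct_rejections=40)
```

A d′ of 0 means the classifier performs at chance, and larger values indicate better separation of the two agonist groups.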

Fig. 7

Receptor comparisons by classification. Breadth of tuning did not differ between class I and class II receptors by either (A) number of agonists or (B) coverage of odorant space. (C) Sensitivity did not differ between class I and class II receptors. Breadth of tuning did not differ between human and mouse receptors by either (D) number of agonists or (E) distance in odorant space. (F) Human receptors were significantly more sensitive to odorants than were mouse receptors (P < 0.008). N.S., not significant.

We then asked what molecular features best differentiated class I from class II agonists. The top 10 descriptors are shown in Table 1 (top). As predicted (10), agonists for class I ORs are significantly more hydrophilic than agonists for class II ORs (median class I hydrophilic factor = −0.2440, mean class II hydrophilic factor = −0.8020, P < 0.001, Mann-Whitney U test after Bonferroni correction for 1664 descriptors) and have a higher topological polar surface area (TPSA) (median class I TPSA = 37.3, mean class II TPSA = 17.1, P < 0.001, Mann-Whitney U test after Bonferroni correction for 1664 descriptors).

Table 1 The top 10 physicochemical descriptors for distinguishing between agonists for (A) class I and class II receptors or (B) human and mouse receptors. Definitions of how the descriptors are calculated can be found in the Handbook of Molecular Descriptors (65). WHIM, weighted holistic invariant molecular.

Functional comparison of human and mouse receptors

Humans have fewer than 400 ORs with an intact open reading frame, whereas mice have more than 1000 (4, 8, 42–44). One hypothesis to account for this difference in OR number is that humans may have preferentially retained broadly tuned receptors to retain the ability to detect most odorants at the expense of sensitivity to low-concentration odorants (45). We did not find a significant difference between mouse and human receptors in the number of agonists (P = 0.25, Mann-Whitney U test) (Fig. 7D) or breadth of tuning in odorant space (P = 0.14, Mann-Whitney U test) (Fig. 7E). On average, however, human ORs were significantly more sensitive than mouse receptors (P < 0.008, Mann-Whitney U test) (Fig. 7F). Although this finding may not generalize to the overall OR population due to the large discrepancy in the relative percentages of human and mouse ORs uncovered by our screening process, it provides preliminary evidence contradicting the hypothesis that mouse ORs are more sensitive than human ORs to low-concentration odorants.

Using a machine-learning algorithm trained on the physicochemical descriptors of the odorants as above, we were able to differentiate odorants that activated human ORs from odorants that activated mouse ORs (d ′ = 0.62, P < 0.013). The 10 molecular features that best differentiate odorants activating human receptors from odorants activating mouse receptors are shown in Table 1 (bottom).

Predicting odorant-receptor interactions

Except for a few examples (12, 13, 19, 46–48), our knowledge of what makes an OR-ligand interaction effective is limited. Here, we have greatly increased the number of ORs with known agonists. Moreover, all 62 receptors were tested with the same 63 odorants in a consistent assay, allowing us to look for general rules that govern the interaction between odorant and OR.

We used logistic regression to differentiate odorant-OR combinations that result in a response from odorant-OR combinations that fail to elicit a response. Our initial work suggested that a subset of physicochemical odorant descriptors and amino acid properties predict variation in odorant-receptor activation. Here we combined both physicochemical descriptors and amino acid properties in an attempt to predict the interaction of previously untested odorants and previously untested ORs.

We used a leave-10-out cross validation procedure to validate our model, in each round selecting a set of descriptors that best predicted the training data and applying those descriptors to the test data. We selected the values to leave out by using two different methods. First, we reserved 10% of the receptors, simulating a situation in which an investigator is searching for the response of previously unidentified ORs to odorants that activate a known OR. Our model predicted the response of the “novel” ORs to the 63 odorants [area under the receiver operating characteristic (ROC) curve (AUC) = 0.59, P < 0.0001, Mann-Whitney U test] (Fig. 8A). Second, we reserved 10% of the odorants, simulating a situation in which an investigator is searching for the response of ORs with known agonists to untested odorants. Our model predicted the response of all 62 ORs to the “novel” odorants (AUC = 0.64, P < 0.0001, Mann-Whitney U test) (Fig. 8B).
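The two hold-out schemes and the AUC statistic can be illustrated with a short sketch. The names below are hypothetical and the code is not the authors' implementation: `grouped_folds` holds out whole receptors (or whole odorants) so that no held-out item appears in training, and `auc` uses the identity between the area under the ROC curve and the normalized Mann-Whitney U statistic.

```python
def auc(pos_scores, neg_scores):
    """AUC as the normalized Mann-Whitney U statistic.

    Equals the probability that a randomly chosen responsive pair
    receives a higher score than a randomly chosen nonresponsive pair
    (ties count one half).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def grouped_folds(group_ids, n_folds=10):
    """Yield (train_idx, test_idx), holding out roughly 1/n_folds of the groups per fold."""
    groups = sorted(set(group_ids))
    for k in range(n_folds):
        held_out = set(groups[k::n_folds])
        test_idx = [i for i, g in enumerate(group_ids) if g in held_out]
        train_idx = [i for i, g in enumerate(group_ids) if g not in held_out]
        yield train_idx, test_idx

# Toy odorant-receptor pairs labeled by receptor (made-up names and scores).
receptor_of_pair = ["OR-a", "OR-a", "OR-b", "OR-c", "OR-c"]
folds = list(grouped_folds(receptor_of_pair, n_folds=3))
toy_auc = auc(pos_scores=[0.8, 0.3], neg_scores=[0.5, 0.1])
```

Passing odorant identifiers instead of receptor identifiers as `group_ids` gives the second scheme, in which the model is evaluated on odorants it has never seen.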

Fig. 8

ROC curves for our ligand-receptor interaction classifier. (A) Validation of our model when predicting the response of previously unidentified receptors to tested odorants. The area under the curve indicates that our model successfully predicts if a tested odorant is an agonist of a previously unidentified receptor 59% of the time. (B) Validation of our model when predicting the response of tested receptors to previously unidentified odorants. The area under the curve indicates that our model successfully predicts if a novel odorant is an agonist of a tested receptor 64% of the time.

Discussion

Starting with more than 450 mouse and human ORs, we identified agonists for 52 mouse and 10 human ORs. These data were used to confirm the utility of a multidimensional metric for odorant similarity (31), develop a multidimensional metric for receptor similarity, quantify the breadth of receptor tuning in odorant space, identify ORs capable of discriminating three enantiomeric odorant pairs, distinguish activation profiles of class I and class II ORs, distinguish activation profiles of human and mouse ORs, and develop a model to predict odorant-receptor activation.

Our in vitro assay lacks many components of an in vivo olfactory system, including odorant binding proteins, a mucosal layer, intracellular molecules, and sniffing behaviors. In addition, although some odorants inhibit ORs (14, 22, 47, 49–51), our assay is not designed to detect inhibitory responses. In spite of these shortcomings, previous results from in vitro systems predict human olfactory perception (11, 21), suggesting that in vitro assays are relevant to the intact olfactory system.

Despite starting with similar numbers of human and mouse ORs, we identified agonists for more than five times as many mouse ORs as human ORs. This may indicate, as previously suggested (52, 53), that mice have even more functional ORs relative to humans than the genome sequences predict. This could reflect a large fraction of nonfunctional genes among the intact human ORs (54) or a large fraction of nonfunctional variants among the human OR clones we used for screening. Alternatively, this difference in mouse and human OR response to agonist may be due to bias in odorant choice or technical problems in the heterologous system specific to the human ORs.

We identified 18 physicochemical odorant descriptors that predict our functional data. Odorant similarity calculated using these descriptors is highly correlated with similarity calculated using Haddad et al.’s previously identified 32 descriptors (r = 0.77, P < 0.0001) across a set of more than 2500 odorants. Our descriptors outperform Haddad et al.’s descriptors for our data, but it is unlikely that they are universally better. Although there is some universality in odorant similarity measures across various studies (31), there are also large differences between studies, which would suggest that optimization to a specific organism and technique would outperform a general set of descriptors.

We identified properties of 16 OR amino acid residues that predict our functional data. Of these 16 residues, 12 occur in predicted transmembrane domains, three occur in predicted extracellular domains and one occurs in a predicted intracellular domain. This is consistent with functional evidence and computational predictions suggesting that the binding pocket of olfactory receptors is formed by the transmembrane domains (34, 48, 55, 56); however, these residues do not necessarily correspond to the binding site of the ORs, because amino acid residues far from the binding pocket may affect ligand specificity by changing the global conformation of the folded protein. The properties of the residues predicted by Man et al. to play a role in ligand binding (34) did not predict our data (see the comparison in fig. S10), perhaps because several of Man et al.’s putative orthologs do not share similar odorant specificities or because the amino acid sites predicted to play a role in ligand binding by Man et al. (34) may be more important for a set of odorants that was not well represented in our data set. With more functional data, we expect the set of ligand-specificity–determining residues to change somewhat and improve.

Enantiomers have similar physicochemical properties, but mammals can discriminate between members of some enantiomeric pairs (37–40). The (+) and (−) enantiomers of carvone activate overlapping but distinct sets of olfactory neurons in mice (41). Consistent with these previous studies, we find that some ORs can distinguish between enantiomers, although many cannot. We have identified candidate ORs capable of supplying sufficient information for the discrimination of three enantiomeric pairs. These ORs are ideal for future structure-function analysis to understand how ORs discriminate between physicochemically similar compounds.

With two exceptions, class I ORs are expressed only in the anterior-dorsal-most zone of the olfactory epithelium in mice (57). In agreement with the theory that the olfactory mucosa serves as a chromatographic separator of odorants, with ORs responding to fast-sorbing hydrophilic compounds expressed early in the airstream and ORs responding to slow-sorbing hydrophobic compounds expressed at locations farther along in the airstream (58, 59), we found that class I ORs prefer polar (and therefore hydrophilic) compounds. This suggests that the location of receptor expression along the mucosa may be related to the ligands that bind the receptor.

Humans have ∼387 potentially functional ORs (that is, ORs with an intact open reading frame); mice have ∼1035 (2). Despite this huge difference in OR number, behavioral studies have failed to show a clear distinction between rodents and primates in odorant detection threshold (60). This may be due to a lack of data, or it may be due to compensatory mechanisms such as a shortened nose or more computation at later neural stages (61). In our data set, we found, unexpectedly, that human ORs are more sensitive than mouse ORs. Although our results may fail to generalize to the entire OR repertoire due to the small absolute number of human ORs in our data set and the bias in relative number of human and mouse ORs that were activated in our screen, our predictions provide a tentative framework for identifying odorants that are more likely to activate either human or mouse ORs. This narrows the field of compounds to test in behavioral studies searching for differences between mouse and human perception. Understanding perceptual differences between species, in turn, will give insight into the evolutionary pressures driving changes in olfaction and aid the translation of research from one species to another.

In addition to identifying active agonists for 62 ORs, we developed a model to predict the interaction of ORs and their ligands. Why is our model better at predicting a tested OR’s response to a previously untested odorant than a previously untested OR’s response to a tested odorant? This is likely because of the relative quality of information we have about odorant similarity compared to receptor similarity. More detailed modeling of the OR that includes 3D information about the protein structure and binding pocket will undoubtedly improve the ability to predict the activity of previously untested ORs and improve our initial model relating molecular structure to OR response.

Materials and Methods

Cloning mouse and human ORs

Mouse (219) and human (245) ORs were cloned with sequence information from The Olfactory Receptor Database (http://senselab.med.yale.edu/senselab/ORDB/default.asp) (3). We adopted the nomenclature proposed by the D. Lancet group for the human ORs (8) and by the S. Firestein group for the mouse ORs (4). OR open reading frames were amplified from genomic DNA with the use of proofreading KOD DNA polymerase (Toyobo/Novagen) and subcloned into pCI expression vectors (Promega) containing the first 20 residues of human rhodopsin (Rho tag). The sequences of the cloned receptors were verified by sequencing (3100 Genetic Analyzer, ABI Biosystems).

Immunocytochemistry and fluorescence-activated cell sorting analysis

For live cell-surface staining, the mouse monoclonal antibody to rhodopsin, 4D2 (gift from R. Molday) (62), and Cy3-conjugated donkey antibody against mouse immunoglobulin G (IgG) (Jackson Immunologicals) were used. For fluorescence-activated cell sorting (FACS) analysis we used phycoerythrin (PE)-conjugated anti-mouse IgG (Jackson Immunologicals). To monitor the transfection efficiency, we cotransfected green fluorescent protein (GFP) with the OR. We quantified the intensity of receptor cell surface expression as the ratio of PE and GFP expression. We added 7-amino-actinomycin D (Calbiochem) before flow cytometry to mark dead cells that were excluded from the analysis. We normalized the gate and mean of PE-positive cells by GFP values.

Luciferase assay

We used the Dual-Glo Luciferase Assay System (Promega) for the luciferase assay as previously described (17). OR activation leads to an increase in intracellular cyclic adenosine 3′,5′-monophosphate (cAMP); we used cAMP response element (CRE)-luciferase (Stratagene) to measure this change. Renilla luciferase driven by a constitutively active simian virus 40 (SV40) promoter (pRL-SV40; Promega) served as an internal control for cell viability and transfection efficiency. We plated Hana3A cells on poly-D-lysine–coated 96-well plates (BioCoat; Becton Dickinson). We transfected ORs with Rho tag in a Hana3A cell line along with RTP1S (25), CRE-luciferase, and pRL-SV40 with Lipofectamine2000 (Invitrogen). For each 96-well plate, we transfected 1 μg of CRE-luciferase, 1 μg of pRL-SV40, 5 μg of OR plasmid, and 1 μg of RTP1S. Approximately 24 hours after transfection, we replaced the medium with CD293 chemically defined medium (Gibco) and then incubated the plate for 30 min at 37°C. We then replaced the medium with 25 μl of odorant solution diluted in CD293 and incubated the plate for 4 hours at 37°C and 5% CO2. We followed the manufacturer’s protocols for measuring luciferase and Renilla luciferase activities. We measured luminescence with a Wallac Victor 1420 plate reader (Perkin-Elmer). First, we divided all firefly luminescence values by the Renilla luciferase activity to control for transfection efficiency in a given well. We calculated normalized luciferase activity with the formula $(L_N - L_{\min})/(L_{\max} - L_{\min})$, where $L_N$ is the luminescence of firefly luciferase in response to the odorant, $L_{\min}$ is the minimum luciferase value on a plate or set of plates, and $L_{\max}$ is the maximum luciferase value on a plate or set of plates. We analyzed the data with Microsoft Excel and GraphPad Prism 4.
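
The two normalization steps described above can be illustrated with a short sketch. It assumes that per-well firefly and Renilla readings are available as plain lists; the function name and arguments are hypothetical and are not part of the original analysis, which was carried out in Excel and Prism.

```python
def normalize_plate(firefly, renilla):
    """Return normalized luciferase activity for each well.

    firefly, renilla: equal-length lists of raw luminescence readings
    for the wells of one plate (or one set of plates).
    """
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same length")
    # Step 1: divide by Renilla activity to control for transfection efficiency.
    ratios = [f / r for f, r in zip(firefly, renilla)]
    # Step 2: rescale with (L_N - L_min) / (L_max - L_min).
    l_min, l_max = min(ratios), max(ratios)
    if l_max == l_min:
        raise ValueError("all wells have the same value; cannot rescale")
    return [(x - l_min) / (l_max - l_min) for x in ratios]
```

With this scaling, the least active well on the plate maps to 0 and the most active well maps to 1.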

Screening procedure

We stimulated the entire OR library with eight separate odorant mixtures formed from 93 odorants (fig. S4). We applied the mixtures at 100 µM and eliminated all ORs that did not show activity (the ratio of CRE-luciferase to Renilla luciferase was less than 0.1 above baseline). We then applied the mixtures at five different doses (1 µM, 10 µM, 100 µM, 300 µM, and 1 mM) and eliminated all ORs that did not show dose-dependent activity to any of the eight mixtures (statistical significance was not assessed for this screening stage), leaving 121 human receptors (49.4%) and 169 mouse receptors (77.2%). We then took the 290 receptors and performed a comparison between the 93 individual odorants at a 100 µM dose and a no-odor control. Each comparison was performed in triplicate; statistical significance was assessed by t test (uncorrected for multiple comparisons). In addition, we confirmed the consistency of our experimental conditions with two positive controls: MOR203-1 with nonanoic acid and MOR32-1 with nonanoic acid. Twenty-seven human ORs (11.0%) and 102 mouse ORs (46.6%) showed a significant response to at least one of 67 odorants relative to a no-odor control. We then constructed dose-response curves ranging from 10 nM to 3 mM for each combination of 129 receptors and 67 odorants. On each plate we used a single odorant to avoid cross-contamination, and each OR-odorant dose was tested at least three times. We fit the data to a sigmoidal curve. We counted an odorant-receptor pair as a significant activation if both the normalized activity at 100 µM was significantly different from the baseline activity (with a t test), and the standard deviation of the fitted log median effective concentration (EC50) was less than 0.5 log units. We confirmed that the raw CRE-luciferase curve and the normalized (Luc/RL) curve EC50 values did not differ by more than 1 log step. 
As a result, we identified 52 mouse (23.7%) and 10 human (4.1%) ORs that showed a significant dose-dependent response to one or more of 63 odorants (Fig. 1). Four odorants failed to activate any OR in this final stage.
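
The acceptance criteria for an odorant-receptor pair can be written out as a small sketch. The text does not give the exact form of the sigmoidal curve or the significance threshold of the t test, so the four-parameter logistic form, the `hill` parameter, and `alpha=0.05` below are assumptions, and all function names are hypothetical.

```python
def sigmoid_response(log_conc, bottom, top, log_ec50, hill=1.0):
    """Four-parameter logistic dose-response curve (assumed form).

    log_conc and log_ec50 are log10 molar concentrations.
    """
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - log_conc) * hill))


def is_significant_activation(p_value_100um, log_ec50_sd, alpha=0.05, max_sd=0.5):
    """Both criteria must hold: a significant response at 100 uM
    (alpha is an assumed threshold) and a fitted log EC50 whose
    standard deviation is below 0.5 log units."""
    return p_value_100um < alpha and log_ec50_sd < max_sd


def ec50_estimates_agree(log_ec50_raw, log_ec50_norm, max_diff=1.0):
    """Check that raw and Renilla-normalized log EC50 values differ
    by no more than 1 log step."""
    return abs(log_ec50_raw - log_ec50_norm) <= max_diff
```

In this parameterization the response at the EC50 lies halfway between `bottom` and `top`, regardless of the Hill slope.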

Physicochemical descriptors

We obtained molecular structure files for each odorant from PubChem (http://pubchem.ncbi.nlm.nih.gov/search/) and input these structures into the Virtual Computational Chemistry Laboratory (http://www.vcclab.org) (63). There, we used CORINA (64) to obtain 3D coordinates and Dragon (Talete) to compute 1664 physicochemical descriptors (65).

Receptor descriptors

Using the MUSCLE algorithm (66) in Seaview (67) with manual adjustment for conserved domains (table S1), we aligned 1425 OR amino acid sequences from Niimura and Nei (2) and 464 OR amino acid sequences from our OR library. We eliminated all sites that were gaps in more than 90% of the 1425 ORs, leaving 327 amino acids. Our set of 981 descriptors consisted of the polarity, composition, and volume of these 327 residues, as defined by Grantham (32). Differences between disease alleles and wild-type alleles computed with these properties are on average greater than those observed between putatively neutral polymorphic alleles, suggesting that these properties are functionally relevant (68).
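
As an illustration of how the receptor descriptors could be assembled, the sketch below filters alignment columns by gap fraction and then concatenates three properties per retained site. It is a minimal example under stated assumptions: the gap character, the function names, and the handling of gaps or unknown residues (emitted as `None`) are hypothetical, and the property table is supplied by the caller rather than reproduced from Grantham (32).

```python
def keep_columns(reference_alignment, max_gap_fraction=0.9, gap="-"):
    """Return indices of alignment columns to keep.

    A column is dropped when it is a gap in more than max_gap_fraction
    of the reference sequences (all sequences have the same length).
    """
    n_seqs = len(reference_alignment)
    length = len(reference_alignment[0])
    kept = []
    for col in range(length):
        gaps = sum(1 for seq in reference_alignment if seq[col] == gap)
        if gaps / n_seqs <= max_gap_fraction:
            kept.append(col)
    return kept


def receptor_descriptors(aligned_seq, kept, properties):
    """Build a flat descriptor vector: three values per retained site.

    properties: dict mapping a residue letter to a
    (polarity, composition, volume) tuple supplied by the caller.
    Gaps and unknown residues yield None (an assumption of this sketch).
    """
    vector = []
    for col in kept:
        vector.extend(properties.get(aligned_seq[col], (None, None, None)))
    return vector
```

With 327 retained sites, a vector built this way has 3 × 327 = 981 entries, matching the number of receptor descriptors reported above.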

Correlations between descriptors and responses

Each descriptor was z-scored across 2683 odorants (table S4) or all 1425 receptors (table S1). We eliminated all odorants for which we had fewer than three responsive receptors. For all remaining odorant pairs, we calculated the Pearson correlation between EC50 vectors and the Euclidean distance between descriptor vectors. We then measured the Pearson correlation between these two sets of distances.
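
The comparison of response similarity with descriptor distance can be sketched as follows. This is an illustrative example only: it assumes that the Pearson correlation between EC50 vectors is converted to a distance as 1 − r so that both quantities are distances, that z-scoring uses the sample standard deviation, and that the data are held in dictionaries keyed by odorant name. All names are hypothetical.

```python
import math
from itertools import combinations


def zscore(values):
    """Z-score a list of values using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def descriptor_response_correlation(responses, descriptors):
    """Correlate response distances with descriptor distances.

    responses: dict odorant -> EC50 vector across receptors.
    descriptors: dict odorant -> z-scored descriptor vector.
    """
    response_dist, descriptor_dist = [], []
    for a, b in combinations(sorted(responses), 2):
        # 1 - r turns the correlation into a distance (assumption).
        response_dist.append(1.0 - pearson(responses[a], responses[b]))
        descriptor_dist.append(euclidean(descriptors[a], descriptors[b]))
    return pearson(response_dist, descriptor_dist)
```

A value near 1 indicates that odorants close together in descriptor space also evoke similar response profiles.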

Optimizing descriptors

Testing all possible combinations of the descriptors is an intractable task, so we used a greedy optimization algorithm, as in (31), to determine the best set. In this method, we begin with an empty set of descriptors. We then combine each descriptor with the previous set of descriptors and compute correlation values for all of these candidate sets of descriptors. To reduce overfitting, we divide the data randomly into 10 subsets and compute correlation values for each leave-one-out subset. The final correlation coefficient for each set of descriptors is the average of the correlation coefficients for each subset. The set with the best correlation coefficient then becomes the new set of descriptors and the process is repeated until the correlation coefficient increases by less than 0.004 in three consecutive iterations.
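
The greedy procedure can be sketched generically. In this minimal example, `score_fn` stands in for the cross-validated correlation (for instance, the mean correlation over the 10 leave-one-subset-out folds) and is supplied by the caller; the function name and the choice to return the set as it stood before the final run of low-gain additions are assumptions of the sketch, not details given in the text.

```python
def greedy_select(candidates, score_fn, min_gain=0.004, patience=3):
    """Greedy forward selection of descriptors.

    candidates: iterable of descriptor identifiers.
    score_fn: callable taking a list of descriptors and returning a score.
    Stops after `patience` consecutive iterations that each improve the
    score by less than `min_gain`, or when no candidates remain.
    Returns (selected, score) as of the last iteration with a gain of at
    least `min_gain`.
    """
    remaining = list(candidates)
    selected = []
    best_score = None
    low_gain_streak = 0
    checkpoint = ([], None)
    while remaining and low_gain_streak < patience:
        # Score every candidate set formed by adding one more descriptor.
        scored = [(score_fn(selected + [d]), d) for d in remaining]
        new_score, chosen = max(scored, key=lambda pair: pair[0])
        # The first descriptor is always accepted (no previous score).
        gain = float("inf") if best_score is None else new_score - best_score
        selected.append(chosen)
        remaining.remove(chosen)
        best_score = new_score
        if gain < min_gain:
            low_gain_streak += 1
        else:
            low_gain_streak = 0
            checkpoint = (list(selected), new_score)
    return checkpoint
```

Each iteration evaluates one candidate set per remaining descriptor, so the cost grows roughly with the number of descriptors times the number of iterations rather than with the number of possible subsets.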

Using a leave-10%-out cross-validation scheme, we validated this method on each data set. That is, we optimized the descriptors, using 90% of the data, and then tested the descriptors on the remaining 10% of the data. We repeated this division 10 times such that all subdivisions were test sets and reported the average performance over all test sets. We then verified that this optimization does not work as well for randomly shuffled vectors. We used the same descriptor values as in the actual vectors, but shuffled each descriptor independently so that any given object had a random set of descriptor values. We created 30 sets of these objects with shuffled descriptor values. For each of these 30 sets, we optimized descriptors for 90% of the data and tested the descriptors on the remaining 10% of the data. We reported the average of all 30 values. We compared the performance of the real optimized descriptors to the shuffled optimized descriptors with a two-sample t test with unequal variance.
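
The shuffled control and the unequal-variance comparison can be illustrated with a short sketch. It assumes the descriptor data are held as a list of rows (one per object) and uses a seeded generator for reproducibility; the function names are hypothetical, and only the t statistic is computed (not the P value).

```python
import math
import random
from statistics import mean, variance


def shuffle_descriptors(matrix, rng):
    """Return a copy of matrix with each column permuted independently.

    matrix: list of rows (objects), each a list of descriptor values.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        rng.shuffle(column)  # permute this descriptor across objects
        columns.append(column)
    return [[columns[j][i] for j in range(n_cols)] for i in range(n_rows)]


def make_shuffled_sets(matrix, n_sets=30, seed=0):
    """Create n_sets independently shuffled copies of the descriptor matrix."""
    rng = random.Random(seed)
    return [shuffle_descriptors(matrix, rng) for _ in range(n_sets)]


def welch_t(sample_a, sample_b):
    """Two-sample t statistic with unequal variances (Welch's form)."""
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se
```

Shuffling each column separately preserves the distribution of every descriptor while breaking its association with any particular object.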

Breadth of tuning

We define a receptor’s breadth of tuning as the radius of a hypersphere, centered on the center of mass of all of the receptor’s agonists and enclosing all of the receptor’s agonists in Haddad et al.’s (31) odorant space. For reference, a hypersphere enclosing 2683 odorants (table S4) has a radius of 26; a hypersphere enclosing the 93 odorants in our test set has a radius of 14; a hypersphere enclosing the 63 odorants that activated at least one receptor has a radius of 12.
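
Under this definition, the breadth of tuning reduces to the largest distance from the centroid of the agonists to any single agonist. The sketch below assumes that each agonist is given as a coordinate vector in odorant space and that all agonists carry equal weight in the center of mass; the function name is hypothetical.

```python
import math


def breadth_of_tuning(agonists):
    """Radius of the centroid-centered hypersphere enclosing all agonists.

    agonists: non-empty list of equal-length coordinate vectors.
    """
    n = len(agonists)
    dim = len(agonists[0])
    # Unweighted center of mass of the agonists.
    centroid = [sum(p[k] for p in agonists) / n for k in range(dim)]
    # The enclosing radius is the distance to the farthest agonist.
    return max(
        math.sqrt(sum((p[k] - centroid[k]) ** 2 for k in range(dim)))
        for p in agonists
    )
```

A receptor with a single known agonist has a breadth of zero by this measure, so the value is most informative for receptors with several agonists.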

Machine-learning algorithm

We used the support vector machine (SVM) functions in the Bioinformatics Toolbox of MATLAB (MathWorks) to classify the odorants in our data set. To estimate the discrimination ability of our classifier for class I/class II discriminations and human/mouse discriminations, we jackknifed the data set. That is, we trained the classifier on all but one instance and then tested the classifier on that instance. We repeated this n times, where n was the total number of instances. We used signal detection theory (69) to compute the sensitivity index (d′), which is the separation of the means of the two distributions (class I and class II agonists, or mouse and human agonists) in units of standard deviation. We confirmed that the d′ was significantly different from zero according to Marascuilo’s test (70). We used the rankfeatures function in the MATLAB Bioinformatics Toolbox to determine the physicochemical properties that best predict membership in a class. We applied a cross-correlation weighting value of 0.7 to reduce the number of highly correlated properties.
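
The jackknife procedure and the sensitivity index can be sketched in a classifier-agnostic way. This is a minimal illustration, not the MATLAB SVM workflow: `train_fn` is a hypothetical caller-supplied function that returns a predictor, and d′ is computed from hit and false-alarm rates under the standard equal-variance Gaussian assumption, which the text does not spell out.

```python
from statistics import NormalDist


def jackknife_predictions(instances, labels, train_fn):
    """Leave-one-out: train on all but one instance, predict that instance.

    train_fn(train_x, train_y) must return a callable predict(x) -> label.
    """
    predictions = []
    for i in range(len(instances)):
        train_x = instances[:i] + instances[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predict = train_fn(train_x, train_y)
        predictions.append(predict(instances[i]))
    return predictions


def d_prime(labels, predictions, positive):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates of exactly 0 or 1 make the inverse normal CDF undefined and
    raise an error here; real data may need a correction for that case.
    """
    pairs = list(zip(labels, predictions))
    n_pos = sum(1 for y, _ in pairs if y == positive)
    n_neg = len(pairs) - n_pos
    hits = sum(1 for y, p in pairs if y == positive and p == positive)
    false_alarms = sum(1 for y, p in pairs if y != positive and p == positive)
    z = NormalDist().inv_cdf
    return z(hits / n_pos) - z(false_alarms / n_neg)
```

A d′ of zero corresponds to chance-level classification, and larger values indicate better separation of the two classes.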

Predicting odorant-OR interactions

Each of the 3906 tested odorant-OR interactions was represented by a vector of 1664 physicochemical descriptors and 981 receptor descriptors. We then used a cross-validation procedure to determine if a subset of these vectors could predict if the odorant-OR combination resulted in activation (that is, all colored squares in Fig. 1). We used a 10-fold validation procedure in two different ways. In the first method, we divided all 62 rows of odorant-OR interactions into 10 sets (leave receptor out). In the second method, we divided all 63 columns of odorant interactions into 10 sets (leave odorant out). After choosing the test set for a round, we used the greedy optimization method described above to calculate optimized descriptor sets for the other 9 sets (training data). After this selection of attributes, we then performed logistic regression in JMP v6 (SAS Institute) on the training set and used the resulting coefficients to predict the probability of activation for the test set. We repeated this method 10 times for each of the two selection processes, rotating the test set with each iteration, such that we evaluated the entire data set, using a model that was not trained on its respective test set. We generated ROC curves according to (71). We determined statistical significance with a Mann-Whitney U test comparing the distribution of predicted odds for interactions resulting in activation to the distribution of predicted odds for interactions resulting in no activation.
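
The fold assignment and the link between the Mann-Whitney U statistic and the area under the ROC curve can be illustrated as follows. The sketch assumes that whole receptors (rows) or whole odorants (columns) are assigned to folds at random, which the text does not state explicitly; the function names are hypothetical, and the logistic regression step itself is not reproduced.

```python
import random


def assign_folds(keys, n_folds=10, seed=0):
    """Randomly split keys (receptor or odorant identifiers) into folds.

    Passing the 62 receptors gives leave-receptor-out folds; passing the
    63 odorants gives leave-odorant-out folds.
    """
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def mann_whitney_u(active, inactive):
    """U statistic: count of (active, inactive) pairs in which the active
    interaction has the higher predicted value; ties count one half."""
    u = 0.0
    for a in active:
        for b in inactive:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def roc_auc(active, inactive):
    """Area under the ROC curve, equal to U / (n_active * n_inactive)."""
    return mann_whitney_u(active, inactive) / (len(active) * len(inactive))
```

Because each receptor or odorant belongs to exactly one fold, every interaction is predicted by a model that never saw that receptor (or odorant) during training.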

Acknowledgments

This work was supported by grants from the NIH and Human Frontier Science Program and by a National Research Service Award postdoctoral fellowship to J.D.M. We thank I. Davison, R. Haddad, N. Sobel, H. Lapid, and M. Caron, as well as N. Pillai from the Duke Statistical Education and Consulting Center for helpful discussions; D. Marchuk and R. Valdivia for sharing equipment; A. Toyama and M. Kubota for expert technical assistance; J.-T. Chi for initial analysis; and M. Cook and O. Awonuga for FACS analysis. H.S. performed functional assays and preliminary analysis. Q.C. created the OR libraries. H.Z. optimized the assay conditions. H.M. supervised the project, interpreted data, and wrote the paper. J.D.M. analyzed and interpreted data and wrote the paper. All authors commented on the manuscript.

Supplementary Materials

www.sciencesignaling.org/cgi/content/full/2/60/ra9/DC1

Fig. S1. Dose-response curves of all 340 odorant/receptor interactions showing significant receptor activation.

Fig. S2. Odorant clustering based on receptor response.

Fig. S3. Receptor clustering based on response to odorants.

Fig. S4. Outline of the screening procedure.

Fig. S5. A phylogenetic tree of all 464 receptors in the screening library, as well as 1425 intact mouse and human ORs.

Fig. S6. Sensitivity-ordered tuning curves.

Fig. S7. One-dimensional tuning curves.

Fig. S8. Two-dimensional tuning plots.

Fig. S9. Snake plot comparison of ligand-specificity-determining residues.

Fig. S10. EC50 values for 62 odorant receptors and 63 odorants.

Table S1. Multiple alignment of all 464 receptors in the screening library, as well as 1425 intact mouse and human ORs.

Table S2. Odorants used to screen the receptor libraries.

Table S3. The numerical EC50 values (log M) displayed in Fig. 1 in Microsoft Excel format.

Table S4. CAS registry numbers for 2683 odorants used to estimate the size of odorant space.

Table S5. Sixteen amino acid property descriptors that explain more than 53% of the variance in our data set.

References and Notes
