Automated Scoring of Crystallization Trials
Abstract
Robotics and parallel techniques for protein production and crystallization have become commonplace among structural genomics initiatives, whose contributions amount to 73% of newly solved structures each year. Despite the strides made in increasing physical experimental throughput, finding the few crystals among potentially thousands of crystallization experiments remains a task for humans. A number of methods have been proposed for this task [6, 2, 5, 4, 7], with varying degrees of success. While automating the analysis stage of a pipeline may seem a straightforward matter of recognizing lines and textures indicative of crystals, building an automated analyzer proves challenging in practice for two reasons. First, computer vision is still a relatively young field: although detecting ubiquitous, structured objects such as human faces is widely considered a well-studied problem, detecting non-uniform objects such as crystals remains open and problem-specific. Second, the needle-in-a-haystack nature of finding just a few harvestable crystals among potentially thousands of trials requires a low false negative rate, traded off against a tolerable false positive rate. Our work addresses these two challenges through a scoring-based system: machine learning algorithms assign each trial a score, a real-valued likelihood of containing crystalline material, and specialists then review the trials in rank order to select candidates for diffraction analysis. The proposed scheme bears a passing resemblance to previous work [2]; however, those authors do not explicitly describe a ranking-centric system. We therefore focus exclusively on a scoring framework and treat each data set as a separate experimental attempt to crystallize a distinct protein.
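The rank-order review described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation: the function names are hypothetical, `labels` marks trials that yielded a structure, tied scores keep their original order, and effort savings are measured against the expected rank of the first hit under a random review order, (n + 1)/(k + 1) for k hits among n trials, because the abstract does not name its baseline. The fixed-fraction cutoff mirrors the top-25% rule reported in the results.

```python
import math
from typing import Sequence


def ranked_indices(scores: Sequence[float]) -> list[int]:
    """Trial indices ordered from highest to lowest score.

    Python's sort stays stable with reverse=True, so tied scores keep their
    original order (an assumed tie-breaking rule, not from the abstract).
    """
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def inspections_to_first_hit(scores: Sequence[float], labels: Sequence[bool]) -> int:
    """1-based rank of the first positive trial when reviewed in score order."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    for rank, idx in enumerate(ranked_indices(scores), start=1):
        if labels[idx]:
            return rank
    raise ValueError("set contains no positive trial")


def expected_random_first_hit(n: int, k: int) -> float:
    """Expected rank of the first of k positives among n trials in random order."""
    return (n + 1) / (k + 1)


def effort_savings(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Fractional reduction in inspections versus a random review order.

    The random-order baseline is an assumption of this sketch.
    """
    k = sum(bool(x) for x in labels)
    baseline = expected_random_first_hit(len(labels), k)
    return 1.0 - inspections_to_first_hit(scores, labels) / baseline


def captured_by_cutoff(scores: Sequence[float], labels: Sequence[bool],
                       fraction: float = 0.25) -> bool:
    """True if the top `fraction` of ranked trials (rounded up) holds a positive."""
    keep = math.ceil(fraction * len(scores))
    return any(labels[i] for i in ranked_indices(scores)[:keep])
```

For scores `[0.9, 0.1, 0.8, 0.3]` with only the third trial positive, the reviewer reaches the hit at rank 2, against an expected rank of 2.5 under random order, so `effort_savings` returns about 0.2.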
The trained algorithm scores square image subregions of 127 × 127 pixels; the score for an entire image is the maximum over all subregion scores, as in Figure 1. Like previous work [5, 4], this approach eschews global heuristics in favor of accurate local classifiers. Feature extraction relies on Gabor wavelet responses to detect edges and textures [5]. Orientation histogram statistics are also calculated, in place of the gray-level co-occurrence matrices used in [6, 4]. To learn from the extracted features, we use the alternating decision tree variant of boosting [3, 1]. Viewed as a black-box learning algorithm, boosting has the same input-output interface as support vector machines (SVM) [5], linear discriminant analysis (LDA) [4], and neural networks [6]. We choose boosting over these techniques for its ability to automatically combine many marginally discriminative features into a single, highly accurate ensemble classifier. The choice seems timely in light of recent work on ensemble classification [4, 7], in which the authors merge the outputs of two techniques into a single classification; we view boosting as a principled, theoretically justified next step along these lines. We report scoring results on 319,112 crystallization trial images, constituting the data sets of all 150 structures solved by the Joint Center for Structural Genomics during 2006-2007. Our system achieves a mean ROC-AUC of 0.918, averaged over the per-set scores. Simulations indicate an expected 94% savings in human effort when searching, in rank order, for the first instance in each set that yielded an X-ray crystal structure. Alternatively, a hypothetical cutoff that accepts the top 25% of ranked trial images in each set and rejects the rest would have captured at least one structure-yielding instance for 143 of the 150 sets.
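The whole-image score described above is the maximum over square-subregion scores. A minimal sketch, assuming NumPy and a grayscale image array, follows. `score_patch` stands in for the trained boosted classifier applied to Gabor and orientation-histogram features, and `contrast_score` is a hypothetical placeholder for it; the stride and border handling are also assumptions, since the abstract gives only the 127 × 127 window size.

```python
from typing import Callable

import numpy as np

PATCH = 127  # subregion side length in pixels, as given in the abstract


def window_origins(length: int, patch: int = PATCH, stride: int = PATCH) -> list[int]:
    """Top-left offsets along one axis so that the windows cover [0, length).

    The default stride (non-overlapping tiles) is an assumption.
    """
    if length < patch:
        raise ValueError("image is smaller than one subregion")
    starts = list(range(0, length - patch + 1, stride))
    # Add a final window flush with the far border if the stride skipped it.
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def image_score(image: np.ndarray,
                score_patch: Callable[[np.ndarray], float],
                patch: int = PATCH,
                stride: int = PATCH) -> float:
    """Whole-image score: the maximum score over all square subregions."""
    rows = window_origins(image.shape[0], patch, stride)
    cols = window_origins(image.shape[1], patch, stride)
    return max(score_patch(image[r:r + patch, c:c + patch])
               for r in rows for c in cols)


def contrast_score(patch: np.ndarray) -> float:
    """Hypothetical placeholder for the trained classifier: pixel std. dev."""
    return float(patch.std())


if __name__ == "__main__":
    img = np.zeros((300, 300))
    img[200:210, 250:260] = 255.0  # a small bright feature near one corner
    print(image_score(img, contrast_score))
```

Because the image score is a maximum, one strongly crystal-like subregion is enough to raise an image's rank, which is the point of favoring accurate local classifiers over global heuristics.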
These results suggest that computer-assisted analysis can augment existing image-based crystallization systems without requiring modifications to them; ultimately, such analysis may provide full annotation of crystallization trials, enhancing our ability to record crystallization results and derive optimal crystallization conditions for specific proteins.
Similar resources
Semi-quantitative segmental perfusion scoring in myocardial perfusion SPECT: visual vs. automated analysis
Introduction: It is recommended that the physician apply at least a semi-quantitative segmental scoring system in myocardial perfusion SPECT. We aimed to assess the agreement between automated semi-quantitative analysis using QPS (quantitative Perfusion SPECT) software and visual approach for calculation of summed stress score (SSS), summed rest score (SRS) and summed difference score (SDS). ...
A procedure for setting up high-throughput nanolitre crystallization experiments. Crystallization workflow for initial screening, automated storage, imaging and optimization.
Crystallization trials at the Division of Structural Biology in Oxford are now almost exclusively carried out using a high-throughput workflow implemented in the Oxford Protein Production Facility. Initial crystallization screening is based on nanolitre-scale sitting-drop vapour-diffusion experiments (typically 100 nl of protein plus 100 nl of reservoir solution per droplet) which use standard ...
Evaluation of automated and semi-automated scoring of polysomnographic recordings from a clinical trial using zolpidem in the treatment of insomnia.
OBJECTIVE To evaluate the performance of 2 automated systems, Morpheus and Somnolyzer24X7, with various levels of human review/editing, in scoring polysomnographic (PSG) recordings from a clinical trial using zolpidem in a model of transient insomnia. METHODS 164 all-night PSG recordings from 82 subjects collected during 2 nights of sleep, one under placebo and one under zolpidem (10 mg) trea...
Quantitive evaluation of macromolecular crystallization experiments using 1,8-ANS fluorescence.
Modern X-ray structure analysis and advances in high-throughput robotics have allowed a significant increase in the number of conditions screened for a given sample volume. An efficient evaluation of the increased amount of crystallization trials in order to identify successful experiments is now urgently required. A novel approach is presented for the visualization of crystallization experimen...
A robotic system for crystallizing membrane and soluble proteins in lipidic mesophases.
A high-throughput robotic system has been developed for crystallizing membrane proteins using lipidic mesophases. It incorporates commercially available components and is relatively inexpensive. The crystallization robot uses standard automated liquid-handlers and a specially built device for accurately and reproducibly delivering nanolitre volumes of highly viscous protein/lipid mesophases. Un...
Publication date: 2007