A Framework for Efficient Text Analytics through Automatic Configuration and Customization of Scientific Workflows

نویسندگان

  • Matheus Hauder
  • Yolanda Gil
  • Yan Liu
چکیده

Data analytics involves choosing between many different algorithms and experimenting with possible combinations of those algorithms. Existing approaches however do not support scientists with the laborious tasks of exploring the design space of computational experiments. We have developed a framework to assist scientists with data analysis tasks in particular machine learning and data mining. It takes advantage of the unique capabilities of the Wings workflow system to reason about semantic constraints. We show how the framework can rule out invalid workflows and help scientists to explore the design space. We demonstrate our system in the domain of text analytics, and outline the benefits of our approach.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dynamic configuration and collaborative scheduling in supply chains based on scalable multi-agent architecture

Due to diversified and frequently changing demands from customers, technological advances and global competition, manufacturers rely on collaboration with their business partners to share costs, risks and expertise. How to take advantage of advancement of technologies to effectively support operations and create competitive advantage is critical for manufacturers to survive. To respond to these...

متن کامل

Exploring Text Classification for Messy Data: An Industry Use Case for Domain-Specific Analytics

Industrial enterprise data present classification problems which are different from those problems typically discussed in the scientific community – with larger amounts of classes and with domain-specific, often unstructured data. We address one such problem through an analytics environment which makes use of domain-specific knowledge. Companies are beginning to use analytics on large amounts o...

متن کامل

An Efficient Framework for Accurate Arterial Input Selection in DSC-MRI of Glioma Brain Tumors

Introduction: Automatic arterial input function (AIF) selection has an essential role in quantification of cerebral perfusion parameters. The purpose of this study is to develop an optimal automatic method for AIF determination in dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) of glioma brain tumors by using a new preprocessing method.Material and Methods: For this study, ...

متن کامل

Kronos: a workflow assembler for genome analytics and informatics

Background The field of next-generation sequencing informatics has matured to a point where algorithmic advances in sequence alignment and individual feature detection methods have stabilized. Practical and robust implementation of complex analytical workflows (where such tools are structured into "best practices" for automated analysis of next-generation sequencing datasets) still requires sig...

متن کامل

Processing biological literature with customizable Web services supporting interoperable formats

Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011