Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples

Abstract

Due to the expansion of the internet, we encounter various types of big data, such as web documents and sensing data. Compared to traditional small data such as experimental samples, big data provide more chances to find hidden and novel patterns through analysis using statistics and machine learning algorithms. However, as the use of big data increases, problems also occur. One of them is the zero-inflated problem in structured data preprocessed from big data: most count values are zeros because a specific word is found in only some documents. In particular, since most patent data are in the form of text documents, they are affected by the zero-inflated problem. To solve this problem, we propose the generation of synthetic samples using statistical inference and a tree structure. Using patent document and simulation data, we verify the performance and validity of our proposed method. In this paper, we focus on keyword analysis, just like other ...
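The zero-inflated problem described in the abstract can be seen in a small document-keyword count matrix. The sketch below is only an illustration of the problem, not the proposed synthetic-sample method; the four toy "patent" texts and the helper names `document_term_matrix` and `zero_fraction` are assumptions made up for this example.

```python
from collections import Counter

# Toy documents, invented for illustration only (not real patent data).
docs = [
    "battery electrode lithium battery",
    "antenna signal wireless",
    "lithium cell charging circuit",
    "image sensor pixel signal",
]


def document_term_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] counts word j in document i."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


def zero_fraction(matrix):
    """Share of cells in the count matrix that are exactly zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


vocabulary, matrix = document_term_matrix(docs)
print(f"{len(vocabulary)} keywords, zero share = {zero_fraction(matrix):.2f}")
```

Even with four short documents, most cells are zero because each keyword appears in only one or two documents; with thousands of documents and keywords the share of zeros grows much larger.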

Similar resources

Analysis of Zero Inflated Longitudinal Data Using Proc Nlmixed

Commonly used parametric models may lead to erroneous inference when analyzing count or continuous data with excess of zeroes. For non-clustered data, the most commonly used models to address the issue of excess zeroes are zero inflated Poisson (ZIP), zero inflated negative binomial (ZINB), hurdle Poisson (HP) and hurdle negative binomial (HNB). Our goal is to expand these for modeling longitud...

Semiparametric analysis of zero-inflated count data.

Medical and public health research often involves the analysis of count data that exhibit a substantially large proportion of zeros, such as the number of heart attacks and the number of days of missed primary activities in a given period. A zero-inflated Poisson regression model, which hypothesizes a two-point heterogeneity in the population characterized by a binary random effect, is generally...

Semiparametric analysis of longitudinal zero-inflated count data

Background: The instrumental activities of daily living (IADLs) are an important index of physical functioning in older adult studies. These count outcomes with a large proportion of zeros are often collected in longitudinal studies. Data were from the Hispanic Established Population for Epidemiological Study of the Elderly (HEPESE), a four wave (seven years) longitudinal study of community-dwelli...

Mediation analysis for count and zero-inflated count data.

Different conventional and causal approaches have been proposed for mediation analysis to better understand the mechanism of a treatment. Count and zero-inflated count data occur in biomedicine, economics, and social sciences. This paper considers mediation analysis for count and zero-inflated count data under the potential outcome framework with nonlinear models. When there are post-treatment ...

Fitting Zero-Inflated Count Data Models by Using PROC GENMOD

Count data sometimes exhibit a greater proportion of zero counts than is consistent with the data having been generated by a simple Poisson or negative binomial process. For example, a preponderance of zero counts have been observed in data that record the number of automobile accidents per driver, the number of criminal acts per person, the number of derogatory credit reports per person, the n...

Journal

Journal title: Future Internet

Year: 2022

ISSN: 1999-5903

DOI: https://doi.org/10.3390/fi14070211