Tags
A tag is a keyword or label that categorizes your question with other, similar questions. Using the right tags makes it easier for others to find and answer your question.
A situation in which there is a strong linear relationship among predictor variables, so that their correlation matrix becomes (almost) singular. This "ill-conditioning" makes it hard to determine the unique rol…
1241 questions
Usually "normalization" means re-expressing univariate data to make values lie within a specified range.
1240 questions
A stochastic process with the property that the future is conditionally independent of the past, given the present.
1239 questions
An instrument used to collect a sample from a population. Surveying often refers to sampling of human populations and is primarily done by administering questionnaires or interviewing individuals.
1228 questions
A proportion is the fraction of some total that is of a particular kind, either (i) as a count of one type of thing out of a total count, or (ii) as a component of a continuous variable.
1223 questions
Structural Equation Modeling is a multivariate technique. It is based on formulating a set of linear relations between variables, some of which may be latent, and estimating the whole system, typicall…
1222 questions
Use this tag only for regression models in which the response is a nonlinear function of the parameters. Do not use this tag for nonlinear data transformation; use [data-transformation] for that inste…
1214 questions
Data with categorical values that can be ordered by magnitude, but the exact distance (spacing) between categories is undefined or unknown.
1205 questions
Goodness of fit tests indicate whether or not it is reasonable to assume that a random sample comes from a specific distribution.
1199 questions
Convergence generally means that a sequence of a certain sample quantity approaches a constant as the sample size tends to infinity. Convergence is also a property of an iterative algorithm to stabili…
1196 questions
In biostatistics, fixed-effects may mean population-average effects. In econometrics, fixed-effects may represent the observed quantities in terms of explanatory variables that are treated as if the q…
1195 questions
Non-constant variance along some continuum in a random process, or variance that differs between discrete groups.
1193 questions
A function used to quantify the difference between observed data and predicted values according to a model. Minimization of loss functions is a way to estimate the parameters of the model.
1187 questions
Data mining uses methods from artificial intelligence in a database context to discover previously unknown patterns. As such, the methods are usually unsupervised. It is closely related but not identi…
1174 questions
The coefficient of determination, usually symbolized by $R^2$, is the proportion of the total response variance explained by a regression model. Can also be used for various pseudo R-squared proposed…
1160 questions
A $k\times k$ matrix of covariances between all pairs of $k$ random variables. It is also called the variance-covariance matrix or simply the covariance matrix.
1159 questions
The error of an estimate or prediction is its deviation from the true value, which may be unobservable (e.g., regression parameters), or observable (e.g., future realizations). Use the [error-message]…
1158 questions
Natural Language Processing is a set of techniques from linguistics, artificial intelligence, machine learning and statistics that aim at processing and understanding human languages.
1148 questions
On evaluating models, either in-sample or out-of-sample.
1140 questions
The difference between the expected value of a parameter estimator and the true value of the parameter. Do NOT use this tag to refer to the [bias-term] / [bias-node] (i.e., the [intercept]).
1119 questions
Techniques for reducing a large number of variables or dimensions spanned by data to a smaller number of dimensions while preserving as much information about the data as possible. Prominent methods i…
1117 questions
Gaussian processes refer to stochastic processes whose realization consists of normally distributed random variables, with the additional property that any finite collection of these random variables …
1096 questions
Generalized Linear Mixed (effects) Models are typically used for modeling non-independent, non-normal data (e.g., longitudinal binary data).
1093 questions
Generalized additive models (GAMs) are regressions that estimate nonlinear patterns in data. This tag should not be used with the `glm` tag unless the question explicitly deals with comparison of the …
1090 questions
A set of dynamic strategies by which an algorithm can learn the structure of an environment online by adaptively taking actions associated with different rewards so as to maximize the rewards earned.
1077 questions
Many statistical methods assume data or a model's residuals are normally distributed. Use this tag for questions about the assumption and testing of normality, or about normality as a *property*. Use [n…
1076 questions
Data organized into discrete categories or *classes* may present problems for certain analyses if the number of observations ($n$) belonging to each class is not constant across classes. Classes with …
1074 questions
AIC stands for the Akaike Information Criterion, which is one technique used to select the best model from a class of models using a penalized likelihood. A smaller AIC implies a better model.
1054 questions
k-means is a method to partition data into clusters by finding a specified number of means, k, such that when data are assigned to the cluster with the nearest mean, the within-cluster sum of squares is minimized.
1052 questions
In Bayesian statistics, the term 'posterior' refers to the probability distribution of a parameter conditioned on the observed data.
1036 questions
A sequence of objects or individuals collected from a larger (possibly infinite) population or process.
1031 questions
Refers to the conditions under which a statistical procedure yields valid estimates and/or inference. E.g., many statistical techniques require the assumption that the data are randomly sampled in some…
1027 questions
In Bayesian statistics a prior distribution formalizes information or knowledge (often subjective), available before a sample is seen, in the form of a probability distribution. A distribution with la…
1021 questions
The autoregressive (AR) model is a stochastic process modelling time series, which specifies the value of the series linearly in terms of the previous values.
1003 questions
Poisson regression is one of a number of regression models for dependent variables that are counts (non-negative integers). A more general model is negative binomial regression. Both have numerous var…
1002 questions
Modeling error (especially sampling error) instead of replicable and informative relationships among variables improves model fit statistics, but reduces parsimony, and worsens explanatory and predict…
1000 questions
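As a rough illustration of the last excerpt (fitting the error itself improves in-sample fit), here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data-generating line y = 2x, the noise level, and the hypothetical helper names `fit_line`, `memorize`, `mse`, and `simulate`. The "memorizer" is a deliberately overfit model that returns the stored training response for the nearest training point, so its training error is zero by construction, while the straight line leaves some training error because it does not chase the noise.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def memorize(xs, ys):
    """Deliberately overfit 'model': return the training y of the nearest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate(n=50, noise_sd=1.0, seed=0):
    """Fit both models on one noisy sample and score them on a second one.

    Assumed setup (not from the tag excerpt): true relationship y = 2x,
    Gaussian noise, and the same x values reused with fresh noise for testing.
    """
    rng = random.Random(seed)
    xs = [i / n for i in range(n)]  # distinct, evenly spaced x values
    y_train = [2.0 * x + rng.gauss(0.0, noise_sd) for x in xs]
    y_test = [2.0 * x + rng.gauss(0.0, noise_sd) for x in xs]

    models = {"line": fit_line(xs, y_train), "memorizer": memorize(xs, y_train)}
    return {
        name: {"train": mse(m, xs, y_train), "test": mse(m, xs, y_test)}
        for name, m in models.items()
    }


if __name__ == "__main__":
    for name, scores in simulate().items():
        print(name, scores)
```

The memorizer always wins on the training data. On the fresh sample it will typically do worse than the line, because the noise it stored does not replicate, though the exact numbers depend on the seed and the assumed noise level.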