Tags - Cross Validated

multicollinearity

A situation in which there are strong linear relationships among predictor variables, so that their correlation matrix becomes (almost) singular. This ill-conditioning makes it hard to determine the unique rol…

1241 questions

5 asked this month, 52 this year

normalization

Usually "normalization" means re-expressing univariate data to make values lie within a specified range.

1240 questions

26 asked this year
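As a minimal sketch of the range-rescaling sense of "normalization" in the excerpt above, the following Python maps univariate values linearly onto a chosen interval. The function name `min_max_normalize` and its parameters are hypothetical, chosen only for illustration; other meanings of the word (such as standardizing to unit variance) are not shown.

```python
def min_max_normalize(values, lo=0.0, hi=1.0):
    """Linearly rescale values so they lie within [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All values identical: the range is zero, so map everything to lo.
        return [lo for _ in values]
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]


scaled = min_max_normalize([2.0, 4.0, 6.0, 10.0])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```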

markov-process

A stochastic process with the property that the future is conditionally independent of the past, given the present.

1239 questions

29 asked this year

survey

An instrument used to collect a sample from a population. Surveying often refers to sampling of human populations and is primarily done by administering questionnaires or interviewing individuals.

1228 questions

49 asked this year

proportion

A proportion is the fraction of some total that is of a particular kind, either (i) as a count of one type of thing out of a total count, or (ii) as a component of a continuous variable.

1223 questions

44 asked this year

structural-equation-modeling

Structural Equation Modeling is a multivariate technique. It is based on formulating a set of linear relations between variables, some of which may be latent, and estimating the whole system, typicall…

1222 questions

6 asked this month, 94 this year

nonlinear-regression

Use this tag only for regression models in which the response is a nonlinear function of the parameters. Do not use this tag for nonlinear data transformation; use [data-transformation] for that inste…

1214 questions

40 asked this year

ordinal-data

Data with categorical values that can be ordered by magnitude, but the exact distance (spacing) between categories is undefined or unknown.

1205 questions

50 asked this year

goodness-of-fit

Goodness of fit tests indicate whether or not it is reasonable to assume that a random sample comes from a specific distribution.

1199 questions

35 asked this year

convergence

Convergence generally means that a sequence of a certain sample quantity approaches a constant as the sample size tends to infinity. Convergence is also a property of an iterative algorithm to stabili…

1196 questions

45 asked this year

fixed-effects-model

In biostatistics, fixed-effects may mean population-average effects. In econometrics, fixed-effects may represent the observed quantities in terms of explanatory variables that are treated as if the q…

1195 questions

71 asked this year

heteroscedasticity

Non-constant variance along some continuum in a random process, or varying between discrete groups.

1193 questions

37 asked this year

loss-functions

A function used to quantify the difference between observed data and predicted values according to a model. Minimization of loss functions is a way to estimate the parameters of the model.

1187 questions

44 asked this year
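To illustrate the idea that minimizing a loss estimates a model's parameters, here is a small sketch using squared-error loss, which is one common choice among many. The one-parameter "constant prediction" model and the grid of candidate values are assumptions made for the example; under squared-error loss the best constant turns out to be the sample mean.

```python
def squared_error_loss(observed, predicted):
    """Mean squared difference between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)


observed = [1.0, 2.0, 3.0, 6.0]

# Hypothetical one-parameter model: predict the same constant c for every point.
candidates = [c / 10 for c in range(0, 61)]  # 0.0, 0.1, ..., 6.0
best_c = min(
    candidates,
    key=lambda c: squared_error_loss(observed, [c] * len(observed)),
)
print(best_c)  # 3.0, which is the sample mean of `observed`
```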

data-mining

Data mining uses methods from artificial intelligence in a database context to discover previously unknown patterns. As such, the methods are usually unsupervised. It is closely related but not identi…

1174 questions

1 asked this year

r-squared

The coefficient of determination, usually symbolized by $R^2$, is the proportion of the total response variance explained by a regression model. Can also be used for various pseudo R-squared proposed…

1160 questions

47 asked this year
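A minimal sketch of the usual computation, $R^2 = 1 - SS_{res}/SS_{tot}$, is given below. The function name and the fitted values are hypothetical; the interpretation as "proportion of variance explained" holds for least-squares linear regression with an intercept, and pseudo R-squared measures are not covered.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]   # hypothetical fitted values
print(r_squared(y, y_hat))     # approximately 0.8
```

Predicting the mean of `y` for every observation gives a value of 0, and a perfect fit gives 1.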

covariance-matrix

A $k\times k$ matrix of covariances between all pairs of $k$ random variables. It is also called the variance-covariance matrix or simply the covariance matrix.

1159 questions

48 asked this year

error

The error of an estimate or prediction is its deviation from the true value, which may be unobservable (e.g., regression parameters), or observable (e.g., future realizations). Use the [error-message]…

1158 questions

30 asked this year

natural-language

Natural Language Processing is a set of techniques from linguistics, artificial intelligence, machine learning and statistics that aim at processing and understanding human languages.

1148 questions

21 asked this year

model-evaluation

Questions about evaluating models, either in-sample or out-of-sample.

1140 questions

6 asked this month, 56 this year

bias

The difference between the expected value of a parameter estimator and the true value of the parameter. Do NOT use this tag to refer to the [bias-term] / [bias-node] (i.e., the [intercept]).

1119 questions

39 asked this year
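The definition can be checked exactly on a toy case. The sketch below, which assumes a two-valued population with equally likely values and samples of size 2 drawn with replacement, enumerates every possible sample to obtain the expected value of the divide-by-$n$ variance estimator and subtracts the true variance. All names are hypothetical.

```python
from itertools import product


def variance_divide_by_n(values):
    """Mean squared deviation from the mean, using divisor n (not n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


population = [0.0, 1.0]                          # each value equally likely
true_variance = variance_divide_by_n(population)  # 0.25 for this population

n = 2
samples = list(product(population, repeat=n))    # all equally likely samples
expected_estimate = sum(variance_divide_by_n(s) for s in samples) / len(samples)

bias = expected_estimate - true_variance
print(bias)  # -0.125, i.e. -true_variance / n
```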

dimensionality-reduction

Techniques for reducing a large number of variables or dimensions spanned by data to a smaller number of dimensions while preserving as much information about the data as possible. Prominent methods i…

1117 questions

33 asked this year

gaussian-process

Gaussian processes refer to stochastic processes whose realization consists of normally distributed random variables, with the additional property that any finite collection of these random variables …

1096 questions

32 asked this year

glmm

Generalized Linear Mixed (effects) Models are typically used for modeling non-independent non-normal data (e.g., longitudinal binary data).

1093 questions

7 asked this month, 73 this year

generalized-additive-model

Generalized additive models (GAMs) are regressions that estimate nonlinear patterns in data. This tag should not be used with the `glm` tag unless the question explicitly deals with comparison of the …

1090 questions

5 asked this month, 116 this year

reinforcement-learning

A set of dynamic strategies by which an algorithm can learn the structure of an environment online by adaptively taking actions associated with different rewards so as to maximize the rewards earned.

1077 questions

26 asked this year

normality-assumption

Many statistical methods assume data or a model's residuals are normally distributed. Use this tag for questions about the assumption and testing of normality, or about normality as a *property*. Use [n…

1076 questions

32 asked this year

unbalanced-classes

Data organized into discrete categories or *classes* may present problems for certain analyses if the number of observations ($n$) belonging to each class is not constant across classes. Classes with …

1074 questions

38 asked this year

aic

AIC stands for the Akaike Information Criterion, which is one technique used to select the best model from a class of models using a penalized likelihood. A smaller AIC implies a better model.

1054 questions

36 asked this year
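For reference, the criterion is $\mathrm{AIC} = 2k - 2\ln\hat{L}$, where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood. The sketch below applies it to two candidate models; the model names, log-likelihoods, and parameter counts are invented purely for illustration.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 * ln(L_hat)."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of parameters).
candidates = {
    "model_a": (-120.5, 3),
    "model_b": (-118.0, 6),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # smaller AIC is preferred

print(scores)  # {'model_a': 247.0, 'model_b': 248.0}
print(best)    # model_a
```

Here `model_b` fits slightly better but pays a larger penalty for its extra parameters, so `model_a` has the smaller AIC.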

k-means

k-means is a method to partition data into clusters by finding a specified number of means, k, such that when data are assigned to the cluster with the nearest mean, the within-cluster sum of squares is minimized.

1052 questions

19 asked this year
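One common heuristic for this problem is Lloyd's algorithm, which alternates between assigning points to the nearest mean and recomputing each mean. The sketch below is a plain-Python illustration under that assumption; the function names are hypothetical, the starting means are supplied by the caller, and the procedure is only guaranteed to reach a local minimum of the within-cluster sum of squares.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, means):
    """Label each point with the index of its nearest mean."""
    return [min(range(len(means)), key=lambda j: sq_dist(p, means[j])) for p in points]


def update(points, labels, means):
    """Recompute each mean as the centroid of the points assigned to it."""
    new_means = []
    for j, old in enumerate(means):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_means.append(old)  # keep the mean of an empty cluster unchanged
        else:
            new_means.append(
                tuple(sum(p[d] for p in members) / len(members) for d in range(len(old)))
            )
    return new_means


def k_means(points, initial_means, max_iter=100):
    """Lloyd's algorithm: alternate update and assignment until labels stop changing."""
    means = list(initial_means)
    labels = assign(points, means)
    for _ in range(max_iter):
        means = update(points, labels, means)
        new_labels = assign(points, means)
        if new_labels == labels:
            break
        labels = new_labels
    return means, labels


def wcss(points, labels, means):
    """Within-cluster sum of squares for a given clustering."""
    return sum(sq_dist(p, means[lab]) for p, lab in zip(points, labels))


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
means, labels = k_means(points, initial_means=[(0.0, 0.0), (10.0, 0.0)])
print(means)                        # [(0.0, 1.0), (10.0, 1.0)]
print(labels)                       # [0, 0, 1, 1]
print(wcss(points, labels, means))  # 4.0
```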

posterior

In Bayesian statistics, the term 'posterior' refers to the probability distribution of a parameter conditioned on the observed data.

1036 questions

38 asked this year
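A simple case where the posterior has a closed form is the conjugate Beta-binomial model: a Beta($a$, $b$) prior on a success probability, combined with $s$ successes in $n$ binomial trials, gives a Beta($a + s$, $b + n - s$) posterior. The sketch below assumes that model; the function name and the data are hypothetical.

```python
def beta_binomial_posterior(prior_a, prior_b, successes, trials):
    """Return the Beta posterior parameters after observing binomial data."""
    failures = trials - successes
    return prior_a + successes, prior_b + failures


# Uniform Beta(1, 1) prior, then 7 successes observed in 10 trials.
post_a, post_b = beta_binomial_posterior(1.0, 1.0, successes=7, trials=10)
posterior_mean = post_a / (post_a + post_b)

print(post_a, post_b)  # 8.0 4.0
print(posterior_mean)  # approximately 0.667
```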

sample

A sequence of objects or individuals collected from a larger (possibly infinite) population or process.

1031 questions

29 asked this year

assumptions

Refers to the conditions under which a statistical procedure yields valid estimates and/or inference. E.g., many statistical techniques require the assumption that the data are randomly sampled in some…

1027 questions

37 asked this year

prior

In Bayesian statistics, a prior distribution formalizes information or knowledge (often subjective) available before a sample is seen, in the form of a probability distribution. A distribution with la…

1021 questions

30 asked this year

autoregressive

The autoregressive (AR) model is a stochastic process modelling time series, which specifies the value of the series linearly in terms of the previous values.

1003 questions

29 asked this year
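In symbols, an AR($p$) model sets $x_t = c + \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + \varepsilon_t$. The sketch below computes successive values from that recursion; the function name, the coefficients, and the starting values are hypothetical, and the noise term is set to zero so the output is deterministic.

```python
def ar_next_value(coefficients, history, constant=0.0, noise=0.0):
    """Next value of an AR(p) series: c + sum(phi_i * x_{t-i}) + noise."""
    p = len(coefficients)
    recent = history[-p:][::-1]  # most recent first: x_{t-1}, x_{t-2}, ...
    return constant + sum(phi * x for phi, x in zip(coefficients, recent)) + noise


# AR(2) with phi_1 = 0.5, phi_2 = 0.25, starting from two known values.
series = [1.0, 2.0]
for _ in range(3):
    series.append(ar_next_value([0.5, 0.25], series))

print(series)  # [1.0, 2.0, 1.25, 1.125, 0.875]
```

In a simulation, `noise` would typically be drawn at each step from a zero-mean distribution, for example with `random.gauss(0.0, sigma)`.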

poisson-regression

Poisson regression is one of a number of regression models for dependent variables that are counts (non-negative integers). A more general model is negative binomial regression. Both have numerous var…

1002 questions

48 asked this year

overfitting

Modeling error (especially sampling error) instead of replicable and informative relationships among variables improves model fit statistics, but reduces parsimony, and worsens explanatory and predict…

1000 questions

29 asked this year