Overview of methods of statistical analysis of data

The object of research in applied statistics is statistical data obtained from observations or experiments. Statistical data consist of a set of objects (observations, cases) and the attributes (variables) that characterize them. For example, the objects of research may be the countries of the world, and the attributes may be the geographical and economic indicators characterizing them: the continent; the elevation above sea level; the average annual temperature; the country's place in a quality-of-life ranking; the share of GDP per capita; public spending on health care, education, and the army; average life expectancy; the shares of unemployment and illiteracy; the quality-of-life index; etc.

Variables are quantities that can take on different values as a result of measurement.

Independent variables are those whose values can be changed by the researcher during the experiment; dependent variables are those whose values can only be measured.

Variables can be measured on different scales. Scales are distinguished by how informative they are. The following types of scales are considered, listed in order of increasing informativeness: nominal, ordinal, interval, ratio, and absolute. The scales also differ in the number of permitted mathematical operations. The "poorest" scale is the nominal one, since no arithmetic operation is defined on it; the "richest" is the absolute one.

Measurement on the nominal (classification) scale means determining whether an object (observation) belongs to one class or another. For example: gender, branch of military service, profession, continent, etc. On this scale you can only count the number of objects in each class, that is, the frequency and the relative frequency.
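The counting described above can be sketched in a few lines of Python. This is only an illustration: the list of continents and the helper name frequency_table are made up for the example and are not part of the original text.

```python
from collections import Counter

# Hypothetical sample: the continent of each observed country (a nominal variable).
continents = ["Europe", "Asia", "Europe", "Africa", "Asia", "Europe"]

def frequency_table(values):
    """Return {category: (frequency, relative frequency)} for nominal data."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}

table = frequency_table(continents)
for category, (freq, rel) in sorted(table.items()):
    print(f"{category}: frequency={freq}, relative frequency={rel:.2f}")
```

Note that the categories are only compared for equality; no ordering or arithmetic on the labels themselves is used.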

Measurement on an ordinal (rank) scale, in addition to determining class membership, allows you to order the observations by comparing them with each other in some respect. However, this scale does not determine the distance between classes, only which of two observations is preferable. Therefore, ordinal experimental data, even if represented by numbers, cannot be treated as numbers, and arithmetic operations cannot be performed on them. On this scale, in addition to counting the frequency of an object, one can calculate its rank. Examples of variables measured on an ordinal scale are: student grades, competition prizes, military ranks, a country's place in a quality-of-life ranking, etc. Nominal and ordinal variables are sometimes called categorical, or grouping, variables, because they allow the objects of research to be divided into subgroups.
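As a rough illustration of ranking, the sketch below assigns ranks to ordinal codes, giving tied observations the mean of the positions they occupy (a common convention, assumed here rather than taken from the text). The grade codes and the function name average_ranks are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

# Hypothetical ordinal codes: 1 = lowest grade, 5 = highest grade.
grades = [3, 5, 4, 5, 2]
ranks = average_ranks(grades)

# Any order-preserving recoding of the labels leaves the ranks unchanged,
# which is why only the order, not the numeric value, carries information.
recoded_ranks = average_ranks([g * g for g in grades])
print(ranks, recoded_ranks)
```

The ranks depend only on comparisons between observations, so squaring the codes (or any other monotone relabeling) yields the same result.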

When measured on an interval scale, the observations can be ordered so precisely that the distance between any two of them is known. The interval scale is unique up to linear transformations (y = ax + b). This means that the scale has an arbitrary reference point, a conditional zero. Examples of variables measured on an interval scale: temperature, time, and elevation. On variables measured on this scale you can perform the operation of determining the distance between observations. Distances are full-fledged numbers, and any arithmetic operations can be performed on them.
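A small numerical sketch may help here. It uses the Celsius-to-Fahrenheit conversion as one instance of y = ax + b; the three temperature readings are invented for the example. Ratios of distances survive the change of scale, while ratios of the raw values do not.

```python
def to_fahrenheit(celsius):
    """Linear change of scale y = a*x + b with a = 9/5 and b = 32."""
    return celsius * 9 / 5 + 32

# Hypothetical temperature readings in degrees Celsius.
c = [10.0, 20.0, 40.0]
f = [to_fahrenheit(x) for x in c]

# Ratio of two distances between observations: identical in both units.
diff_ratio_c = (c[2] - c[1]) / (c[1] - c[0])
diff_ratio_f = (f[2] - f[1]) / (f[1] - f[0])

# Ratio of the values themselves: depends on where the conditional zero sits.
value_ratio_c = c[1] / c[0]
value_ratio_f = f[1] / f[0]

print(diff_ratio_c, diff_ratio_f)    # equal
print(value_ratio_c, value_ratio_f)  # different
```

This is why a statement such as "20 degrees is twice as warm as 10 degrees" is not meaningful on an interval scale, whereas "the second gap is twice the first" is.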

A ratio scale is similar to an interval scale, but it is unique only up to transformations of the form y = ax. This means that the scale has a fixed reference point, an absolute zero, but an arbitrary unit of measurement. Examples of variables measured on a ratio scale are: length, weight, electric current, amount of money, society's spending on health care, education, and the military, average life expectancy, etc. Measurements on this scale are full-fledged numbers, and any arithmetic operations can be performed on them.
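The effect of a transformation y = ax can be checked with a short sketch. The lengths and the helper name rescale are assumptions made for illustration; the conversion factor 100 corresponds to going from metres to centimetres.

```python
def rescale(values, a):
    """Change of unit y = a*x; the absolute zero stays at zero."""
    return [a * x for x in values]

# Hypothetical lengths in metres, converted to centimetres.
lengths_m = [2.0, 4.0, 5.0]
lengths_cm = rescale(lengths_m, 100.0)

# Ratios of the values themselves do not depend on the unit.
ratio_m = lengths_m[1] / lengths_m[0]
ratio_cm = lengths_cm[1] / lengths_cm[0]

# Zero is mapped to zero under any change of unit.
zero_cm = rescale([0.0], 100.0)[0]

print(ratio_m, ratio_cm, zero_cm)
```

Because the zero point is fixed, a statement such as "twice as long" keeps its meaning in any unit of measurement.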

An absolute scale has both an absolute zero and an absolute unit of measurement. An example of an absolute scale is the number line. This scale is dimensionless, so measurements on it can be used as an exponent or as the base of a logarithm. Examples of measurements on an absolute scale: the share of unemployment, the share of illiterates, the quality-of-life index, etc.

Most statistical methods belong to parametric statistics, which is based on the assumption that the random vector of variables follows some multivariate distribution, usually a normal one or one that can be transformed to normal. If this assumption is not confirmed, nonparametric methods of mathematical statistics should be used.


Correlation analysis. Variables (random variables) can be connected by a functional relationship, in which one of them is defined as a function of the other. But there can also be a different kind of relationship, in which one variable responds to a change in the other by changing its distribution law. Such a relationship is called stochastic. It arises when common random factors affect both variables. As a measure of the relationship between the variables