Knowing what the discrete distribution is in Big Data statistics is extremely important, since this type of distribution is one of the two broadest and most widely used families in a statistical study of big data.
The other type is the continuous distribution, which, in contrast, covers the variables that cannot be handled by a discrete one. In sum, each of these two families is made up of more specific distributions suited to particular kinds of data analysis.
This knowledge belongs to statistical studies based on probability, one of the branches of statistics. Next, we present what the discrete distribution is in Big Data statistics.
What is the discrete distribution in Big Data statistics?
The discrete distribution in Big Data statistics refers to a probability model that describes how likely each value of a random variable is to occur. This knowledge belongs to the probability branch of big data statistics.
In addition, the discrete distribution in Big Data statistics works with countable values, typically whole numbers, that can be studied through a probabilistic analysis. As for its name, "discrete" alludes to the fact that the variable can only take separate, countable values, in contrast to a continuous variable, which can take any value within a range.
Within the discrete distribution in Big Data statistics there is a variety of more specific distributions, among them the Bernoulli, binomial and Poisson distributions. This post also covers the exponential distribution, which is continuous but closely tied to the Poisson distribution. Next, we share some specifics of each of them:
Bernoulli distribution
It is a discrete distribution that can take two values, one with probability p and the other with probability 1 - p. It is used to describe events that have only two possible outcomes, such as Yes/No, 1/0, or Heads/Tails.
For example: studying the result of tossing a coin once.
For a Bernoulli variable with success probability p, the mean is p and the variance is p (1 - p); the proportion of successes in the observed sample is the usual estimator of p.
The Bernoulli distribution is a special case of the binomial distribution with n = 1. You can simulate a Bernoulli distribution from a uniform one on [0, 1) simply by checking whether the uniform value falls below a threshold equal to the success probability p of the Bernoulli distribution.
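The simulation from a uniform distribution can be sketched in Python as follows. This is only an illustrative sketch: the function name bernoulli_sample, the fair-coin value p = 0.5, the seed and the sample size are assumptions made for the example, not part of the original post.

```python
import random
from statistics import fmean, pvariance


def bernoulli_sample(p, rng):
    """Return 1 with probability p and 0 otherwise, using one uniform draw."""
    # rng.random() is uniform on [0, 1), so it falls below p with probability p.
    return 1 if rng.random() < p else 0


rng = random.Random(42)   # fixed seed (assumed) so the run is reproducible
p = 0.5                   # fair coin, assumed for the example
draws = [bernoulli_sample(p, rng) for _ in range(100_000)]

mean_hat = fmean(draws)      # sample mean, should be close to p
var_hat = pvariance(draws)   # sample variance, should be close to p * (1 - p)
print(mean_hat, var_hat)
```

With a large sample, the two printed values should land near 0.5 and 0.25, the theoretical mean and variance of a fair coin toss.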
Poisson distribution
It arises as the limit of a binomial distribution as n -> ∞ and p -> 0 while λ = n p remains constant.
This distribution expresses the probability that a given number of events will occur in a fixed interval of time (or space), provided the events occur at a constant average rate and are independent (they do not depend on when the last event occurred).
For example: counting the number of calls that a mobile phone antenna receives in a certain time slot.
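For a Poisson variable with rate λ, the mean and the variance are both equal to λ, so the average number of events observed per interval is the usual estimator of λ.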
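The link between the binomial and Poisson distributions, and the fact that the Poisson mean and variance both equal λ, can be checked numerically. The sketch below uses only the Python standard library; the values λ = 3 and n = 10,000 and the helper names binomial_pmf and poisson_pmf are assumptions chosen for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 3.0      # assumed average number of events per interval
n = 10_000     # large number of trials
p = lam / n    # small success probability, so that n * p = lam

# With large n and small p, the two columns are nearly identical.
for k in range(6):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, lam))

# Mean and variance computed from the Poisson probabilities.
# The sum is truncated at k = 59; the remaining tail is negligible for lam = 3.
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in range(60))
print(mean, variance)
```

Both the mean and the variance come out very close to 3, matching λ.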
Binomial distribution
The binomial distribution is a generalization of the Bernoulli distribution to n independent events, each of which has two possible Yes/No outcomes, with probability p of Yes.
For example: tossing three coins and finding the probability that exactly two come up heads.
To do this, you must take into account the variables that define the distribution:
p – probability of success of an individual case.
n – number of total events that you want to measure.
k – number of events in which YES has come up.
For a binomial variable with parameters n and p, the mean is n p and the variance is n p (1 - p).
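The three-coin example can be worked out with the variables p, n and k defined above. The following Python sketch is illustrative only; the helper name binomial_pmf is an assumption, and the coins are assumed to be fair (p = 0.5).

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n = 3      # three coins tossed
p = 0.5    # probability of heads for each coin (fair coin assumed)
k = 2      # number of heads we are interested in

prob_two_heads = binomial_pmf(k, n, p)
mean = n * p                  # expected number of heads
variance = n * p * (1 - p)    # variance of the number of heads

print(prob_two_heads)  # 0.375
print(mean, variance)  # 1.5 0.75
```

There are three ways to choose which two coins show heads, and each of those arrangements has probability 0.5 * 0.5 * 0.5 = 0.125, which gives 0.375.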
Exponential distribution
Unlike the previous distributions, the exponential distribution is continuous. It describes the time that elapses between two consecutive events whose counts follow a Poisson distribution. That is, given a process that continuously and independently produces events at a constant rate, the time between two events follows an exponential distribution.
For example: studying the time between two consecutive calls that arrive at a mobile phone antenna during a certain time slot.
Finally, for an exponential variable with rate λ, the mean is 1/λ and the variance is 1/λ², so the reciprocal of the average observed waiting time is the usual estimator of λ.
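The waiting times between calls can be simulated with the Python standard library, whose random.expovariate function draws exponentially distributed values. This is a sketch under assumed values: a rate of 2 calls per minute, an arbitrary seed and a sample of 100,000 waiting times.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(7)   # fixed seed (assumed) so the run is reproducible
rate = 2.0               # assumed rate: 2 calls per minute on average

# Each value is the simulated time, in minutes, between two consecutive calls.
gaps = [rng.expovariate(rate) for _ in range(100_000)]

mean_hat = fmean(gaps)      # should be close to 1 / rate
var_hat = pvariance(gaps)   # should be close to 1 / rate**2
rate_hat = 1 / mean_hat     # estimate of the rate from the data

print(mean_hat, var_hat, rate_hat)
```

With this rate, the sample mean should be close to 0.5 minutes and the sample variance close to 0.25.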
How to continue learning Big Data?
In this post, we have introduced you to what the discrete distribution is in Big Data statistics. Even so, keep in mind that there are different types of distributions depending on the goal and intended purpose of the data analysis, so it is worth knowing them in order to apply the most appropriate one in each case.
To continue your development and become an expert in managing Big Data, we present our Bootcamp Full Stack Big Data, Artificial Intelligence & Big Data. It will give you a global vision of the Big Data world, and you will explore, both in theory and in practice, how to identify the most appropriate data management alternatives for big data studies, such as artificial intelligence methods, machine learning, statistics and database systems. Sign up and become a professional data scientist in less than a year!