The cumulative distribution function is one of the fundamental tools you need to learn for a solid grounding in statistics. Statistical tools like this one let you judge whether certain variables are related, or whether several groups of data can be considered different or the same.
In fact, as part of your work as a data scientist, a good statistical analysis will give you a series of answers that can complement a later, much more complex model.
Given this importance, you can count on many functions and tools that increase the certainty of the results obtained. In this post, we explain what the cumulative distribution function is and how it works in statistics for Big Data management.
Cumulative distribution function in statistics
The cumulative distribution function (CDF) in statistics tells you the probability of obtaining a value less than or equal to a given threshold x for a given random variable X.
This function is defined by the following formula:
$$F_X(x) = P(X \le x)$$
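For a discrete variable, such as age counted in whole years, the same probability can be written as a sum over every value at or below the threshold. The expression below is a sketch of that standard form; the symbols x_i (the possible values) and F_X (the cumulative distribution function of X) are notation introduced here for illustration, not taken from the original post.

```latex
F_X(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i)
```

With population counts per age, each P(X = x_i) is simply the count for that age divided by the total population.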
How does the cumulative distribution function work in statistics?
The best way to understand the cumulative distribution function in statistics is through a practical example. Suppose you choose a person at random in Spain: what is the probability that they are 64 years old or younger?
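# Assumes a data frame `population` is already loaded, with a "Total" column
# (number of people) and an "edad" column (age in years)
f_population <- population[, c("Total", "edad")]
# Share of the whole population represented by each age
f_population$Ratio <- f_population$Total / sum(f_population$Total)
# Keep only the rows for ages up to and including 64
f_population_subset <- subset(f_population, edad <= 64)
paste0("The probability that a person chosen at random is 64 years old or younger is ", round(sum(f_population_subset$Ratio), 3) * 100, "%")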
Thus, the probability that a person chosen at random in Spain is 64 years old or younger is calculated to be 80.2%.
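Since the `population` data frame used in this post is not shown, here is a minimal self-contained sketch of the same calculation. The data frame `toy_population` and its counts are invented purely for illustration (they are not real census figures), and the column is named `age` here for readability.

```r
# Hypothetical counts per age (illustrative only, not real data)
toy_population <- data.frame(
  age   = c(10, 30, 50, 64, 70, 90),
  Total = c(150, 250, 250, 100, 200, 50)
)

# Share of the whole population represented by each age
toy_population$Ratio <- toy_population$Total / sum(toy_population$Total)

# CDF evaluated at 64: add up the shares of every age <= 64
p_le_64 <- sum(toy_population$Ratio[toy_population$age <= 64])

print(paste0("P(age <= 64) = ", round(p_le_64 * 100, 1), "%"))
```

With these invented counts, 750 of the 1000 people are 64 or younger, so the probability is 0.75.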
f_population <- population[, c("Total", "edad")]
f_population$Ratio <- f_population$Total / sum(f_population$Total)
# The running total of the ratios is the CDF (rows are assumed to be sorted by age)
plot(cumsum(f_population$Ratio))
Finally, the code above plots the graph of the cumulative distribution function in statistics for this specific example.
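As a rough illustration of what such a graph looks like, the sketch below draws a CDF as a step curve with base R graphics. The data frame `toy_ages` and its counts are hypothetical, made up only so the example runs on its own; the dashed line marks the 64-year threshold from the example.

```r
# Hypothetical counts for ages 0 to 99 (illustrative only, not real data)
toy_ages <- data.frame(
  age   = 0:99,
  Total = rep(c(400, 500, 600, 450, 200), each = 20)
)

# Share of the population at each age, already sorted by age
toy_ages$Ratio <- toy_ages$Total / sum(toy_ages$Total)

# Running total of the shares: the CDF evaluated at each age
cdf <- cumsum(toy_ages$Ratio)

# Step plot of the CDF, with a dashed vertical line at age 64
plot(toy_ages$age, cdf, type = "s",
     xlab = "Age", ylab = "P(age <= x)",
     main = "Cumulative distribution function (toy data)")
abline(v = 64, lty = 2)
```

The curve never decreases and reaches 1 at the highest age, which is a quick sanity check for any cumulative distribution built this way.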
Learn more about Big Data and cumulative distribution
Now that you know what the cumulative distribution function is and how it works in statistics for Big Data management, we hope you can use it when processing your data. However, keep in mind that statistics offers many more functions that may fit your Big Data study even better, so we also advise you to keep learning about Big Data and its tools.
If you don't know how to take the next step, we offer you our Full Stack Big Data, Artificial Intelligence & Machine Learning Bootcamp. Through it, you will learn about the systems, languages, and tools most used to handle Big Data effectively and with agility. In short, you will work through each module both theoretically and practically. Request information and start now!