The coefficient of determination R² is one of the statistics you compute after fitting a model in your big data processing. Knowing how to read it helps with the main purpose of statistics: understanding the variables and the relationships between them.
Calculations of this kind make results more precise and easier to interpret, which highlights the value of the information. In this post, we explain what the coefficient of determination R² is and how it works in statistics for the management of big data.
Coefficient of determination R² in statistics
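The coefficient of determination R² measures how well the model's predictions follow the observed data. It is calculated with the following formula:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
In which:
$SS_{res}$ is the sum of the squares of the residuals, where $y_i$ are the observed values and $\hat{y}_i$ the values estimated by the model:
$$SS_{res} = \sum_i (y_i - \hat{y}_i)^2$$
Besides:
$SS_{tot}$ is proportional to the variance of Y, where $\bar{y}$ is the mean of Y:
$$SS_{tot} = \sum_i (y_i - \bar{y})^2$$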
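As a worked illustration of the formula, the following sketch computes R² step by step in R. The vectors `y` and `y_hat` are hypothetical toy values chosen only so that the arithmetic is easy to follow; they do not come from any model in this post.

```r
# Hypothetical toy data, chosen only for illustration
y     <- c(3, 5, 7, 9, 11)          # observed values
y_hat <- c(2.8, 5.1, 7.2, 8.9, 11)  # values estimated by some model

ss_res <- sum((y - y_hat)^2)    # sum of the squares of the residuals
ss_tot <- sum((y - mean(y))^2)  # sum of squares around the mean of y
r_squared <- 1 - ss_res / ss_tot

print(c(SS_res = ss_res, SS_tot = ss_tot, R2 = r_squared))
```

Here the residuals are 0.2, -0.1, -0.2, 0.1 and 0, so the residual sum of squares is 0.10, the total sum of squares is 40, and R² is 1 - 0.10/40 = 0.9975.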
In this way, the closer R² is to 1, the better the prediction follows the actual data.
It also answers the question: how much better is my model than one that always returns the average value?
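The comparison with a model that always returns the average can be checked directly. The sketch below uses simulated data (the seed, sample size and coefficients are arbitrary assumptions) and a small helper `r2`, which is a hypothetical name introduced here, not a base R function.

```r
set.seed(1)                  # arbitrary seed, only for reproducibility
x <- rnorm(50)
y <- 3 * x + 1 + rnorm(50)   # simulated data, assumed for this sketch

# Hypothetical helper: R² of a vector of predictions against observations
r2 <- function(y, y_hat) 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)

baseline  <- rep(mean(y), length(y))      # model that always returns the average
fitted_lm <- fitted(lm(y ~ x))            # least squares fit with intercept
worse     <- rep(mean(y) + 5, length(y))  # constant prediction far from the average

r2_baseline <- r2(y, baseline)   # 0: no better than the average
r2_lm       <- r2(y, fitted_lm)  # between 0 and 1
r2_worse    <- r2(y, worse)      # negative: worse than the average

print(c(baseline = r2_baseline, lm = r2_lm, worse = r2_worse))
```

A value of 0 means the model does no better than the average, and a negative value means it does worse, which is why values close to 1 are read as a good fit.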
# Y: observed values; est_Y: values estimated by the model
Rsq <- 1 - sum((Y - est_Y)^2) / sum((Y - mean(Y))^2)
print(paste("The coefficient of determination is:", Rsq))
[1] "The coefficient of determination is: 0.985188061001936"
summary(model)
Next, so that you can delve deeper into how the coefficient of determination R² works, here is another example:
options(repr.plot.height=4, repr.plot.width=6)
n <- 40
xn <- rnorm(n, sd=1)
yn <- xn*2 + rnorm(n, mean=2, sd=1)
data <- data.frame(Y=yn, X=xn)
# Linear model without intercept (the "+0" term forces the line through the origin)
model <- lm(Y ~ X + 0, data=data)
plot(xn, yn, col="blue")
abline(c(0, model$coefficients), col="red")
summary(model)$r.squared
0.493914310299537
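One detail worth knowing about this example: for a model fitted without an intercept, R's `summary` documentation states that the total sum of squares is taken around zero instead of around the mean of Y. The sketch below reproduces the same simulation (with an arbitrary seed, which is an assumption added here for reproducibility) and compares both versions against the value reported by `summary`.

```r
set.seed(42)  # arbitrary seed, not part of the original example
n  <- 40
xn <- rnorm(n, sd = 1)
yn <- xn * 2 + rnorm(n, mean = 2, sd = 1)
data  <- data.frame(Y = yn, X = xn)
model <- lm(Y ~ X + 0, data = data)  # no intercept

rss <- sum(residuals(model)^2)
r2_centered   <- 1 - rss / sum((yn - mean(yn))^2)  # total sum of squares around the mean
r2_uncentered <- 1 - rss / sum(yn^2)               # total sum of squares around zero
r2_summary    <- summary(model)$r.squared

print(c(centered = r2_centered, uncentered = r2_uncentered, summary = r2_summary))
```

The value from `summary` matches the uncentered version, so for models without an intercept it should not be read as a comparison against the model that always returns the average.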
Plotted, the result is a scatter plot of the data (blue) with the fitted line through the origin (red).
summary(model)
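library(ggplot2)
options(repr.plot.height=2, repr.plot.width=6)
# Covariance matrix of the coefficient: residual variance times (X'X)^-1.
# The model has a single parameter (no intercept), so the residual degrees of freedom are n - 1.
vcov_matrix <- as.numeric(t(model$residuals) %*% model$residuals / df.residual(model)) * solve(t(xn) %*% xn)
# mydt (a t density taking df, mn and sd) and df are assumed to be defined earlier; they are not shown here
ggplot(data=data, aes(x=X)) +
  stat_function(fun=mydt, args=list(df=df, mn=model$coefficients[1], sd=sqrt(diag(vcov_matrix))[1]), color="#2222BB") +
  # Vertical lines at the bounds of the 95% confidence interval of the coefficient
  geom_vline(xintercept=qt(0.975, df)*sqrt(diag(vcov_matrix))[1] + model$coefficients[1]) +
  geom_vline(xintercept=qt(0.025, df)*sqrt(diag(vcov_matrix))[1] + model$coefficients[1]) +
  xlim(-20, 20)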
# 95% confidence interval of the model coefficient
cnf_int <- confint(model)
cnf_int
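The interval returned by `confint` can also be reproduced by hand from the coefficient, its standard error and the quantiles of the t distribution. This is a minimal sketch on freshly simulated data (the seed is an arbitrary assumption), not the exact objects used above.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n  <- 40
xn <- rnorm(n, sd = 1)
yn <- xn * 2 + rnorm(n, mean = 2, sd = 1)
model <- lm(yn ~ xn + 0)  # single coefficient, no intercept

df_res <- df.residual(model)                  # n - 1 for a one-parameter model
sigma2 <- sum(residuals(model)^2) / df_res    # estimated residual variance
var_beta <- sigma2 / sum(xn^2)                # sigma^2 * (X'X)^-1 for a single column
se   <- sqrt(var_beta)
beta <- unname(coef(model)[1])

# 95% interval: coefficient plus and minus the t quantile times the standard error
manual_ci <- beta + qt(c(0.025, 0.975), df_res) * se

print(manual_ci)
print(confint(model))
```

Both printed intervals should agree, and `var_beta` matches the single entry of `vcov(model)`.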
options(repr.plot.height=4, repr.plot.width=6)
# Sort by X so that the predictions can be drawn from left to right
data <- data[order(data$X),]
pred <- predict(model, data, interval="confidence")
est_Y <- pred[,"fit"]
plot(xn, yn, col="blue")
points(data$X, est_Y, col="red")
#lines(data$X,pred[,"fit"],col="red")
#lines(data$X,pred[,"lwr"],col="black")
#lines(data$X,pred[,"upr"],col="black")
#abline(c(0,cnf_int[2]),col="gray")
#abline(c(0,cnf_int[1]), col="gray")
Finally, we encourage you to keep practicing the calculation of the coefficient of determination R² with more practical examples.
Learn more about Big Data
Through this post, you have learned what the coefficient of determination R² is in statistics for Big Data. However, gaining experience requires continued practice. If you are not sure how to get started, we bring you the best option!
Our Full Stack Big Data, Artificial Intelligence & Machine Learning Bootcamp will prepare you and test your skills with the main tools developed for big data processing over the course of its eleven modules. You will also have the support of Big Data experts who will guide you through both theory and practice. Don't wait any longer, sign up and start now!