The coefficient of determination R² is one of the calculations you run after fitting a model in your big data processing. Knowing it helps with the main purpose of statistics: understanding variables and the relationships between them.
Calculations of this kind make the analysis easier and the results more precise, which brings out the value of the information. For this reason, in this post we explain what the coefficient of determination R² is and how it works in statistics for big data management.
Coefficient of determination R² in statistics
The coefficient of determination R² measures how well the predictions of a model follow the observed data. It is calculated with the following formula:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
In which:
$SS_{res}$ is the sum of the squares of the residuals:
$$SS_{res} = \sum_i (y_i - \hat{y}_i)^2$$
$SS_{tot}$ is proportional to the variance of Y:
$$SS_{tot} = \sum_i (y_i - \bar{y})^2$$
The closer R² is to 1, the better the predictions follow the actual data.
It also answers the question: how much better is my model than one that always returns the mean value? A value of 0 means the model does no better than that baseline.
Rsq <- 1 - sum((Y - est_Y)^2) / sum((Y - mean(Y))^2)
print(paste("The coefficient of determination is:", Rsq))
[1] "The coefficient of determination is: 0.985188061001936"
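The snippet above assumes that `Y` and `est_Y` already exist. As a self-contained illustration, the following sketch simulates data, fits a linear model with an intercept, and computes R² by hand. The simulated data, the seed and the variable names are assumptions made for this example, not part of the original analysis.

```r
# Minimal sketch on simulated data; all names here are illustrative.
set.seed(42)
n <- 40
x <- rnorm(n)
y <- 2 * x + rnorm(n, mean = 2, sd = 1)

fit <- lm(y ~ x)                 # linear model with intercept
est_y <- fitted(fit)             # predictions on the training data

ss_res <- sum((y - est_y)^2)     # sum of squared residuals
ss_tot <- sum((y - mean(y))^2)   # total sum of squares around the mean
r2_manual <- 1 - ss_res / ss_tot

# Baseline that always predicts the mean: its residuals equal the total
# sum of squares, so its R² is 0.
r2_baseline <- 1 - sum((y - mean(y))^2) / ss_tot

print(c(manual = r2_manual,
        lm = summary(fit)$r.squared,
        baseline = r2_baseline))
```

For a model with an intercept, the hand-computed value should agree with `summary(fit)$r.squared`, and the mean-only baseline scores 0.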
Next, so that you can delve deeper into how the coefficient of determination R² works, we present another example:
options(repr.plot.height=4,repr.plot.width=6)
n<-40
xn<-rnorm(n,sd=1)
yn<-xn*2+rnorm(n,mean=2,sd=1)
data<-data.frame(Y=yn,X=xn)
# Linear model without intercept (the +0 term forces the line through the origin)
model=lm(data, formula=Y~X+0)
plot(xn,yn,col="blue")
abline(c(0,model$coefficients),col="red")
summary(model)$r.squared
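One detail worth checking: the model above is fitted without an intercept (`+0`). In that case `summary()` in R reports R² relative to a baseline that always predicts 0 rather than the mean, so the denominator is the uncentered sum of squares. The sketch below reproduces both versions. The seed and the names `r2_uncentered` and `r2_centered` are assumptions for this illustration.

```r
# Sketch: R² for a no-intercept model, centered vs. uncentered denominator.
set.seed(1)
n <- 40
xn <- rnorm(n, sd = 1)
yn <- xn * 2 + rnorm(n, mean = 2, sd = 1)
model <- lm(yn ~ xn + 0)

rss <- sum(residuals(model)^2)                    # sum of squared residuals
r2_uncentered <- 1 - rss / sum(yn^2)              # baseline: always predict 0
r2_centered <- 1 - rss / sum((yn - mean(yn))^2)   # baseline: always predict the mean

print(c(summary = summary(model)$r.squared,
        uncentered = r2_uncentered,
        centered = r2_centered))
```

The value from `summary(model)$r.squared` should match the uncentered version, while the centered formula gives a lower number for the same fit.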
Plotted, this gives the following scatter plot:
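library(ggplot2)
options(repr.plot.height=2,repr.plot.width=6)
# Covariance matrix of the coefficient: residual variance times (X'X)^-1.
# The model has a single coefficient (no intercept), so the residual degrees of freedom are n - 1.
vcov_matrix<-as.numeric(t(model$residuals)%*%model$residuals/(length(model$residuals)-1))*solve(t(xn)%*%xn)
# mydt and df must be defined beforehand; the remaining plot layers are omitted
ggplot(data=data, aes(x=x)) +
stat_function(fun=mydt,args = list(df = df,mn=model$coefficients[1],sd=sqrt(diag(vcov_matrix))[1]),color="#2222BB")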
# Confidence interval for the model coefficient
cnf_int<-confint(model)
cnf_int
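The confidence interval for the slope can also be reproduced by hand from the coefficient, its standard error, and the Student's t quantile. This is a minimal sketch on freshly simulated data; the seed and the names `ci_manual` and `ci_confint` are assumptions for illustration only.

```r
# Sketch: 95% confidence interval of the slope, by hand and with confint().
set.seed(1)
n <- 40
xn <- rnorm(n, sd = 1)
yn <- xn * 2 + rnorm(n, mean = 2, sd = 1)
model <- lm(yn ~ xn + 0)

beta <- coef(model)[["xn"]]         # estimated slope
se <- sqrt(vcov(model)[1, 1])       # standard error of the slope
df_res <- df.residual(model)        # n - 1: one coefficient, no intercept

ci_manual <- beta + c(-1, 1) * qt(0.975, df_res) * se
ci_confint <- confint(model, level = 0.95)

print(ci_manual)
print(ci_confint)
```

Both approaches should return the same lower and upper bounds, since `confint()` uses the same t quantile and standard error for `lm` models.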
options(repr.plot.height=4,repr.plot.width=6)
# Sort by X so that predictions can be drawn in order
data<-data[order(data$X),]
pred<-predict(model, data, interval="confidence")
est_Y <- pred[,"fit"]
# Observed points in blue, fitted values in red
plot(xn,yn,col="blue")
points(data$X,est_Y,col="red")
#lines(data$X,pred[,"fit"],col="red")
#lines(data$X,pred[,"lwr"],col="black")
#lines(data$X,pred[,"upr"],col="black")
#abline(c(0,cnf_int[2]),col="gray")
#abline(c(0,cnf_int[1]), col="gray")
Finally, we encourage you to keep practicing the calculation of the coefficient of determination R² through more practical examples.
