What skills do you need to become a Data Scientist? This Big Data and Machine Learning professional is in charge of managing data and analyzing it, but in a very particular way.
According to the definition of Josh Wills, Director of Data Engineering at Slack: "Data Scientist (n): Person who is better at statistics than any developer and better at programming than any statistician." But reducing it to that would be too simple. Data science must be much more than that to have become one of the most promising professions of today and, if the trend continues, of the future.
So, how do you become a data scientist? If you want to immerse yourself in the world of Big Data, here is everything you need to know.
What is a data scientist?
According to Jose Antonio Guerrero, considered the best data scientist in Spain and named the best in the world in 2013 by Kaggle, the social network of data scientists, a data scientist is:
A person with foundations in mathematics, statistics and optimization methods, with knowledge of programming languages, who also has practical experience in analyzing real data and developing predictive models.
That description already lists the knowledge a data scientist needs. Before exploring it in depth, we are going to differentiate between Big Data and Machine Learning through real cases, to understand their importance and use and what to study to become a data scientist.
The importance of Big Data and Machine Learning
Throughout the world, trillions of pieces of data of different kinds are generated. In fact, every minute of a normal day the following happens:
Blogs generate nearly 2 million new entries. More than 5 million queries are made on Google. WhatsApp sends 25 million messages. The number of emails sent exceeds 100 million.
Searching through this thicket of unclassified data and extracting valuable information is what is known as Big Data.
And that is only public data. We must also add the data generated in the private sector, especially in consumer goods, banking, transportation and energy. The volumes of data handled on the internet are astronomical.
All of this happens every minute. Hence, several automated applications, such as those of Facebook or Netflix, rely heavily on this field.
Facebook analyzes your likes and the keywords in your comments to show you similar content (see the Trump case). Amazon suggests products depending on your searches and proposes new products based on your purchases. Netflix recommends movies and series based on what you have already watched, along with the type of profile it assigns to you based on your browsing. Within a few years, we will be able to buy autonomous cars, which make decisions based on both stored and real-time data.
Machine Learning is the creation of algorithms or machines that can make decisions autonomously and that, depending on the result of each decision, reinforce their own learning.
But data science, as we know it, does not end here. Today we are far from conquering the stars. Aeronautical technology has not evolved enough for us to book round-trip tickets to the Moon, and driving a remote-controlled robot through the sands of Mars in search of information is hardly the calling card NASA should have. Now, imagine a machine with continuous learning and decision-making capabilities learning about the topology of other planets and sending us the information. The fields to which data science can be applied are clearly enormous.
However, progress is still slow for several reasons. In my opinion, the main one is the scarcity of these professional profiles. With the explosion of the internet, the exponential growth of computing power, the digitization of all types of objects (cars, refrigerators, watches, bracelets) and the falling cost of storage, Big Data and Machine Learning are in need of professionals.
Unfortunately, the skills these professionals must have are not easy to find on the market, and supply cannot satisfy a demand that, since 2014, has been growing 33% a year.
How to be a data scientist?
To become a data scientist, you must take the following into account.
4 skills or requirements you need
The 4 skills required to be a data scientist are:
1. Mathematics
As in most professions, two cases must be distinguished: what you need to know to be able to work as a Data Scientist, which you may use only occasionally and with the help of information from the internet, and what you will really need to apply in your day-to-day work with judgment and fluency. Keep in mind that a data scientist should know the basics, not be a mathematician.
2. Data analysis
This is the real skill a data scientist must have, and the one that makes them most valuable. Much of the software and tooling used in Big Data and Machine Learning does most of the mathematical calculations for you, but no tool can do the analysis for you.
80% of a Data Scientist's work is data preparation and visualization. It is the most important skill, so you must be very solid at every stage of data analysis: exploration, cleaning, model construction and presentation of results.
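The four stages just named (exploration, cleaning, model construction and presentation of results) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the `raw_rows` data, the `spend` and `sales` fields and every function name are hypothetical, and a real project would normally rely on dedicated libraries and far more data.

```python
from statistics import mean

# Hypothetical raw records (advertising spend vs. sales), invented for illustration.
raw_rows = [
    {"spend": "10", "sales": "25"},
    {"spend": "20", "sales": "45"},
    {"spend": "", "sales": "30"},      # missing spend
    {"spend": "30", "sales": "65"},
    {"spend": "20", "sales": "45"},    # exact duplicate of an earlier row
    {"spend": "40", "sales": "n/a"},   # unparseable sales value
]


def to_float(text):
    """Return the value as a float, or None when it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def explore(rows):
    """Exploration: count the rows and how many contain an invalid field."""
    invalid = sum(
        1 for row in rows
        if to_float(row["spend"]) is None or to_float(row["sales"]) is None
    )
    return {"total": len(rows), "invalid": invalid}


def clean(rows):
    """Cleaning: drop rows with invalid values and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        pair = (to_float(row["spend"]), to_float(row["sales"]))
        if None in pair or pair in seen:
            continue
        seen.add(pair)
        cleaned.append(pair)
    return cleaned


def fit_line(points):
    """Model construction: least-squares fit of sales = slope * spend + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Presentation of results: a short, readable summary.
summary = explore(raw_rows)
points = clean(raw_rows)
slope, intercept = fit_line(points)
print(f"{summary['invalid']} of {summary['total']} rows had invalid values")
print(f"{len(points)} rows kept after cleaning")
print(f"Fitted model: sales = {slope:.2f} * spend + {intercept:.2f}")
```

Even in this toy case, most of the code deals with exploring and cleaning the data rather than with the model itself, which matches the proportion described above.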
3. Programming languages and tools
What else does it take to be a Data Scientist? You must also know programming languages. Within Big Data and Machine Learning there is a multitude of languages, frameworks and tools: Spark, Hadoop, Cloudera, Scala…
The more technologies you know how to use, the greater your value as a data scientist and the better your performance across different companies. Above all, though, there are three basics you should know: SQL, R and Python.
SQL
68% of data scientists use SQL to work with relational databases, so learning it is a must when studying data science. It is true that non-relational databases must also be used, because data is often unstructured. But thanks to its speed, performance and low cost (it can run with few resources), SQL is one of the technologies that cannot be missing from the Data Scientist's skills kit.
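As a small illustration of the kind of SQL a data scientist writes, the sketch below runs an aggregate query against an in-memory SQLite database through Python's built-in `sqlite3` module. The `purchases` table, its columns, its rows and the `spend_by_category` helper are all hypothetical, chosen only to show a typical `GROUP BY` summary.

```python
import sqlite3

# In-memory database, so the example needs no server and few resources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, category TEXT, amount REAL)")

# Hypothetical sample rows, invented for illustration.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("ana", "books", 12.5),
        ("ana", "music", 8.0),
        ("luis", "books", 30.0),
        ("luis", "books", 10.0),
    ],
)


def spend_by_category(connection):
    """Return (category, number of orders, total amount), highest total first."""
    query = """
        SELECT category, COUNT(*) AS orders, SUM(amount) AS total
        FROM purchases
        GROUP BY category
        ORDER BY total DESC
    """
    return connection.execute(query).fetchall()


for category, orders, total in spend_by_category(conn):
    print(f"{category}: {orders} orders, {total:.2f} in total")
```

The same query would work largely unchanged on most relational databases, which is part of why SQL remains such a common baseline skill.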
R
52% of these professionals use R in their regular work. This is largely because for years it has been the statistical language par excellence: habit combines with the solidity of the frameworks and tools that have been built up over time. But little by little the paradigm is changing and, despite R's great weight in data science, its growth has stagnated. It is the language of those who come to this sector from mathematics and other scientific branches.
Python
To study data science, you must know Python. It is becoming R's nemesis, and 51% of data scientists already use it. With its clear and easy syntax, many newcomers opt for this language, as do those who come from computer engineering backgrounds. Another key point is that it is not limited to statistics and can be used for other purposes. Despite its growth, it still has a lot of work to do in developing its whole infrastructure, which is little by little becoming more competitive thanks to initiatives like SciPy.
4. Business intelligence
You may have wondered how to become a Data Scientist if you already know languages, data analysis and other skills.
As we have seen, a data scientist must combine skills from different worlds, and one more field is added to them: business vision. You must have the ability and knowledge to interpret and detect trends in your area and translate these discoveries into actions that impact the business, create new opportunities, or communicate your findings in order to promote changes within the company, the product, the processes or the services. There is no point in applying complex Machine Learning algorithms to objectives that have no value for the company.
This is where the data scientist must use knowledge to impact results and play an important role in deciding the direction a company can take in terms of innovation.
Employment situation: actively seeking a Data Scientist
There is a gap in the labor market for this professional. It is explained by skills that are hard to combine, a big impact on the business and the fact that the role arrives on the back of a boom. Because the exponential nature of technology has lowered costs and made Big Data accessible to all types of companies, demand has multiplied, while the slower training of data scientists cannot satisfy the market.
In other words: in a very short time the need has arisen to incorporate a professional who barely existed in these sectors. And that is why today it has become one of the most valued and best paid profiles within companies, with average salaries in Spain above €50,000 and which can reach up to €90,000.
How to become a data scientist?
Are you determined to boost your career in data? How to become a data scientist is one of the questions beginners ask most often, followed by what it takes to be one.