With Machine Learning and Big Data being hot topics in today’s tech world, a few questions often come to a beginner’s mind:
Why Data Science?
What programming languages should I learn?
Where do I start?
If you enjoy solving problems and want to use data to substantiate your findings, Data Science is the field for you. We all deal with data in some form on a day-to-day basis. Let’s say you are reviewing your annual credit card statement, which groups your spending into categories across the year. You can quickly spot spending trends in each category across several months and identify where you need to cut back. If we can extract value from the data we come across and use it to make informed decisions, we have already given ourselves a head start. Of course, you need the right tools and skills to do this efficiently.
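To make the credit card example concrete, here is a minimal sketch of that kind of analysis: summing spending per category per month and flagging categories where spending went up. The transaction rows and the helper names `monthly_totals` and `rising_categories` are hypothetical, invented purely for illustration, and "rising" is assumed to mean the last month's total exceeds the first month's.

```python
from collections import defaultdict

# Hypothetical statement rows: (month, category, amount). Illustrative values only.
transactions = [
    ("2023-01", "Dining", 120.0),
    ("2023-01", "Groceries", 300.0),
    ("2023-02", "Dining", 180.0),
    ("2023-02", "Groceries", 290.0),
    ("2023-03", "Dining", 240.0),
    ("2023-03", "Groceries", 280.0),
]


def monthly_totals(rows):
    """Sum spending per category per month: {category: {month: total}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for month, category, amount in rows:
        totals[category][month] += amount
    return totals


def rising_categories(totals):
    """Return categories whose latest monthly total exceeds the earliest one (assumed definition)."""
    rising = []
    for category, by_month in totals.items():
        months = sorted(by_month)  # "YYYY-MM" strings sort chronologically
        if len(months) >= 2 and by_month[months[-1]] > by_month[months[0]]:
            rising.append(category)
    return sorted(rising)


totals = monthly_totals(transactions)
for category in sorted(totals):
    trend = ", ".join(f"{m}: {totals[category][m]:.0f}" for m in sorted(totals[category]))
    print(f"{category}: {trend}")
print("Spending rose in:", rising_categories(totals))
```

With these toy numbers, only Dining is flagged, which is exactly the sort of quick insight the statement example describes.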
Let’s look at which tools will help you on your journey to becoming a Data Scientist. One is R, an open-source programming language used primarily for statistical analysis. R is well known for its visualization and data manipulation capabilities, and it is very popular among academic scientists and researchers. Although R is widely used, there are areas where Python and other tools may be better suited. For example, you may need to work with very large datasets, and in such cases Python is often the better choice. Clearly, no single programming language is perfect for every data problem. It is up to the Data Scientist to identify and use the best tools for the job.
So, now you may be wondering where to start. I am not an R-pro, but I am definitely pro-R, so my answer may be a little biased. Yes, R should be your starting point, for the following reason. Data analysis is the first step toward solving any Data Science problem, and R, with its vast number of libraries, is a great tool for data analysis. As you may already know, data wrangling is a very time-consuming process, and R offers several packages to tidy up your data and get it into a format a model can consume. So, learn R well enough that you can comfortably build models and fine-tune them.
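To show what "tidying" typically means, here is a small sketch that reshapes a "wide" table (one column per month) into a "long" table with one observation per row, skipping blank or non-numeric cells. In R this is the kind of job the tidyr package's `pivot_longer()` handles; the sketch uses plain Python only because the post mentions it as an alternative. The table contents and the `to_tidy` helper are hypothetical.

```python
# Hypothetical "wide" table: one row per category, one column per month.
# Blank and "n/a" cells stand in for the messy values real data often has.
wide = [
    {"category": "Dining", "Jan": "120", "Feb": "", "Mar": "240"},
    {"category": "Groceries", "Jan": "300", "Feb": "290", "Mar": "n/a"},
]


def to_tidy(rows, id_col="category"):
    """Reshape wide rows into long records: one (id, month, amount) per row."""
    tidy = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue
            try:
                amount = float(value)
            except ValueError:
                continue  # skip blank or non-numeric cells instead of guessing a value
            tidy.append({id_col: row[id_col], "month": col, "amount": amount})
    return tidy


for record in to_tidy(wide):
    print(record)
```

Each printed record is one observation, which is the shape most modeling functions expect as input.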
And the best way to learn a new tool or language is to use it! Download any public dataset, such as the famous Titanic dataset, and start exploring. The Titanic dataset was the first one I played with on Kaggle, a platform where Data Scientists learn and compete. Kaggle lets you explore datasets and build and score models, giving you a complete overview of the process. Do watch this video on Data Analysis, which is a great way to get started with both R and Data Science.