The One Important Course You Don’t Want to Miss as a To-Be Data Scientist, Part II

Picking up where we left off

In my previous post, I covered the first half of the Johns Hopkins Data Science Specialization and the one course you don’t want to miss if you are learning data science on your own. Now for the second half of the specialization. Again, I’ll discuss each course: some of the pros and cons, what I learned, and how difficult it was. If a course is particularly noteworthy, I’ll explain why. At the end, I’ll discuss whether the specialization was worthwhile. From this point on, every course has a project, and I recommend starting each one at the beginning of the class and working on it as you go through the course.

Statistical Inference

My Take: Don’t Miss This if You’re New to Data Analysis

Statistical inference is the process of drawing conclusions from data, along with the statistical methods for doing so. The course consists of video lectures, 4 quizzes, and a final project. You’ll be introduced to the basics of probability and statistical modeling. There are a lot of concepts and videos to get through in this course. Feel free to go back and rewatch the videos on key concepts like expected values, variance, confidence intervals, and p-values. The videos will take up the majority of your time in this course. Plan to set aside about 2 – 4 hours a week. For me the difficulty wasn’t all that high, at about a 3, but I was already familiar with the concepts being covered. If you’re truly new to statistics, this could be a very challenging course, and I would rate it at a 6 or 7. This course is more than worth the price of admission though. I would rate the value of what you will learn and its use in everyday analysis at 5 thumbs up on the Worth It Meter!!

Regression Models

My Take: Definitely Worth the Money

Regression models are a key tool in modeling data. You will use them in the majority of your analyses. Don’t worry too much about the math portion of the class. The instructors keep it to the bare minimum needed to explain the ideas behind the principles you will be learning. You’ll learn key concepts like linear regression, logistic regression, multivariate regression, and others. This course is taught using video lectures, quizzes (4 of them, one every week), and a project due at the end of the course. The course will take between 2 – 4 hours a week. With the math portion minimized, this course isn’t too difficult. I’d rate it about 1 – 2. If you’re not familiar with regression models, though, the score will be a bit higher (3 – 4). This course is valuable if you are learning to become a data scientist. I’d rate it at 3 thumbs up on the Worth It Meter!

Practical Machine Learning

My Take: This Course Will Be Worth It

Next up is one of the most crucial pieces of your toolkit as a data scientist: machine learning. Practical Machine Learning teaches you the basic concepts, techniques, and tools you will be using. Each week of the class had a quiz due, and there was a project at the end. I would suggest starting on the project as soon as possible. Personally, I had a lot of issues processing the data on my laptop using a random forest technique. I spent a number of hours waiting for the process to finish when really my computer had frozen up and stopped processing. You’ll need about 2 – 6 hours a week, especially if you run into computational issues like I did. When you complete the final project, you’ll have to submit a set of 20 predictions, and if your model works correctly, you should get all 20 right! Overall, I’d rate this course’s difficulty at a 5 for someone who is unfamiliar with machine learning concepts. If you are more familiar, then the score would be much lower (2 – 3). Machine learning is a key concept, and this course is a great introduction to it. I’d give the course 4 thumbs up out of 5 on the Worth It Meter!

Developing Data Products

My Take: This Course Will Be Worth It

This is the course where you will bring everything together and learn various ways to publish the analyses that you create. You’ll be signing up for accounts on sites like Shinyapps.io and RPubs. I enjoyed learning how to create apps in Shiny, as they can be very interactive for end users. It’s not a difficult course. I liked that you could quickly create a variety of different outputs using the tools that you pick up from this class. There are 3 quizzes and 3 assignments for this course. Each of the assignments helps to reinforce the new tools you’ll learn from the course. They are very open-ended assignments though. You’ll be given a few criteria to meet, but you can do an analysis on any set of data you like. You’ll spend 3 – 6 hours a week on this course, but that could be considerably less (1 – 3) if you are already familiar with tools like Shiny and RPubs. The difficulty is 1 even if you aren’t experienced with the new tools.

Data Science Capstone

My Take: Skip if you don’t have the time or money

The capstone course of the “Data Science” specialization brings everything together in an extended 7-week course that features video lectures, readings, 3 quizzes, and, most importantly, a final project. The beginning consists of a couple of videos and quizzes. After that, the majority of the course focuses on your final project. They estimate that you’ll need between 4 – 9 hours a week for this course, but I would leave a little more time (6 – 12 hours) if you’re still not comfortable with R at this point. Overall, the course isn’t difficult. By this point, I had a good enough understanding of R and the data science process that I felt comfortable handling the project. I’d rate it between 1 and 3 for difficulty. It really just depends on how well you grasp R at this point.

Program Review

Overall, the program was great. There were numerous changes over the course of the program. When I first started taking courses, they were only offered monthly. By the end, the courses had weekly starts. While this made it more convenient to start classes, I feel that the forums suffered because of it. More offerings led to smaller course groupings, which meant fewer people to interact with while you were taking the course. The content was phenomenal. Each course contains a few hours of video lectures, and they are comprehensive for an introduction to data science and R. The pacing and checkpoints in the courses worked well. Having some type of quiz or assignment each week really helps you stay engaged with the learning process and build strong habits.

At $479, the specialization is worth its cost. Each course has a book, which you can download for free (or pay for, with a portion going to the author) and keep after the program ends. You also keep anything you developed, which you can reference at any point later for your own uses. The projects towards the end become more open-ended, which makes them great opportunities to analyze data that interests you.

What did I learn?

Through the specialization, I became more comfortable with using R for analysis. Projects that took me hours at the beginning would now take me 30 minutes. It was great to have some lengthy video lectures that cover key statistical concepts. If you are new to data analysis and data science, I can’t recommend the R Programming and Statistical Inference courses enough. Those two courses alone will help form a strong foundation for your statistical analysis going forward. I enjoyed “Developing Data Products” considerably, as it was great to see a number of different tools that I could use to present my analysis in different shapes and forms. In the consulting space, being able to design interactive reports and graphs is a great way to increase engagement with the key concepts or insights that your analysis has found. You’ll come away from the program with a lot of R packages to explore on your own as well. “ggplot2” is a package that was used a few times, but realistically someone could easily do a whole course on using its functions properly.

Until next week, #statheads!