Statistics-Watistics
Alright, this post is a bit of a side trip on our learning journey. I’ve always believed that having a strong grip on statistics is super important when diving into machine learning. A lot of popular ML algorithms, like linear regression, are basically borrowed from stats. If you’re more into analytics than hardcore ML, knowing your statistics becomes even more critical. The tools and concepts you pick up in stats are insanely useful across all sorts of fields—whether it’s physics, engineering, social sciences, medical research, or even testing new drugs. Whatever buzzwords you’re into—Data Science, AI, Machine Learning, Deep Learning, Analytics, Data-Driven Decisions, Data Modeling—you name it, stats is at the heart of it all. Trust me, getting comfy with statistics will make everything else click into place.
But how so?
Well, there are two ways to look at data science:
Computational View: Here, data is seen as a huge sequence of numbers that need to be crunched by fast algorithms. Think of things like approximate nearest neighbors, low-dimensional embeddings, spectral methods, and distributed optimization.
Statistical View: In this view, data comes from a random process. The goal is to figure out how this process works so we can make predictions or understand what influences it.
Statistics is all about understanding the process that generates data. This process has two parts: one part is predictable and makes sense to us, and the other part is just pure randomness. The aim of statistics is to dig into this process, explain as much of it as possible, and keep peeling off the predictable structure until all that's left is true, unpredictable randomness.
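To make that concrete, here's a tiny simulation sketch. Everything in it is my own made-up assumption for illustration (the straight-line process, the slope 2.0, the intercept 1.0, the noise level 0.5, and the function names); none of it comes from the course. The idea: generate data as "predictable part plus noise", fit a line, and check that what's left over looks like the noise we put in.

```python
import random
import statistics


def simulate(n, slope=2.0, intercept=1.0, noise_sd=0.5, seed=0):
    """Hypothetical data-generating process: y = intercept + slope * x + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares with a single predictor."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


xs, ys = simulate(n=5000)
slope_hat, intercept_hat = fit_line(xs, ys)

# Residuals are what remains once the predictable part is explained away.
residuals = [y - (intercept_hat + slope_hat * x) for x, y in zip(xs, ys)]
residual_sd = statistics.stdev(residuals)

print(f"estimated slope:     {slope_hat:.3f}")
print(f"estimated intercept: {intercept_hat:.3f}")
print(f"residual std dev:    {residual_sd:.3f}")
```

If the fit has captured the predictable part, the estimated slope and intercept should land close to the values used in the simulation, and the spread of the residuals should be close to the noise level we injected.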
Therefore, having a reliable resource to understand what this field has to offer is crucial. This is why I want to talk about the course MITx 18.6501x: Fundamentals of Statistics by Professor Philippe Rigollet. It is offered by MIT through edX. The course content is free for anyone to access, but if you want to complete it and get a certificate, there is a fee. It is one hell of a course to take if you want to solidify your foundations in your data science journey.
There are several advantages to accessing this course through edX, just like other courses on the platform. You get access to all videos, transcripts, and slides used in class, along with online support for any questions about the material. Additionally, there's an active learning community where you can ask questions and get help with assignments and quizzes, often through helpful hints. The community itself is warm and welcoming, acting as a learning support group and keeping you motivated throughout the journey.
The course itself isn't a walk in the park—it's quite tough and demands a lot of rigor and discipline to complete, as the professor himself admits in his initial lectures. However, if you really put in the effort and push through with determination, the learnings are incredibly rewarding. Finishing this course has been one of the most satisfying experiences I've had in a long time.
So what do you get out of the course?
The course starts right off the bat by emphasizing the importance of the Central Limit Theorem (CLT) and the Law of Large Numbers (LLN), highlighting their foundational significance. One of the key concepts introduced early on is the three modes of convergence of a sequence of random variables to a limiting random variable:
- Almost sure convergence
- Convergence in probability
- Convergence in distribution
The course also covers estimation methods, including:
- Maximum Likelihood Estimation
- Method of Moments
- M-Estimation
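The ideas listed above are easier to believe once you watch them happen, so here's a rough simulation sketch. It is my own illustration, not course material: the Exponential distribution, the rate 2.0, the sample sizes, the seed, and all the names are assumptions I picked. It checks three things: the sample mean settling near the true mean (LLN), standardized sample means behaving like a standard normal (CLT), and two estimators of the rate, the maximum likelihood estimator and a method-of-moments estimator built from the second moment.

```python
import math
import random
import statistics

RATE = 2.0          # assumed "true" rate of a hypothetical Exponential process
MU = 1.0 / RATE     # true mean of Exponential(RATE)
SIGMA = 1.0 / RATE  # true standard deviation of Exponential(RATE)
rng = random.Random(42)


def draw(n):
    """Draw n independent Exponential(RATE) samples."""
    return [rng.expovariate(RATE) for _ in range(n)]


# Law of Large Numbers: the sample mean gets close to MU as n grows.
lln_errors = {n: abs(statistics.fmean(draw(n)) - MU) for n in (10, 1_000, 100_000)}

# Central Limit Theorem: sqrt(n) * (sample mean - MU) / SIGMA is roughly N(0, 1).
N, REPS = 200, 2_000
z_scores = [
    math.sqrt(N) * (statistics.fmean(draw(N)) - MU) / SIGMA
    for _ in range(REPS)
]
z_mean = statistics.fmean(z_scores)
z_sd = statistics.stdev(z_scores)
coverage = sum(abs(z) <= 1.96 for z in z_scores) / REPS  # about 0.95 for a true N(0, 1)

# Estimation: two estimators of the rate from the same data.
data = draw(100_000)
# MLE for the Exponential rate: 1 / sample mean.
# (For this model, matching the first moment gives the same formula.)
rate_mle = 1.0 / statistics.fmean(data)
# Method of moments using the second moment: E[X^2] = 2 / RATE^2.
rate_mom2 = math.sqrt(2.0 / statistics.fmean([x * x for x in data]))

for n, err in lln_errors.items():
    print(f"n={n:>7}: |sample mean - true mean| = {err:.5f}")
print(f"z-scores: mean={z_mean:.3f}, sd={z_sd:.3f}, share within 1.96: {coverage:.3f}")
print(f"rate estimates: MLE={rate_mle:.4f}, second-moment MoM={rate_mom2:.4f}")
```

One thing worth noticing: the MLE can also be read as an M-estimator, since it is the value that minimizes the negative log-likelihood of the sample. The simulation only illustrates convergence in distribution and the "getting close" flavor of the LLN; it does not distinguish almost sure convergence from convergence in probability, which needs the theory rather than a plot.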