Mathematics is central to data science: its concepts help identify patterns and design algorithms. An understanding of statistics and probability theory is key to implementing such algorithms.
Beyond the basics of calculus, linear algebra, and probability, there is a certain kind of mathematical thinking that comes up pretty often when you’re trying to understand data. It involves quantifying something you want to measure, then understanding how the quantification works in mathematical terms. The interesting part is not usually doing the math, but figuring out what math to do.
Most of the mathematics required for data science lies within the realms of statistics and algebra.
Statistics, in particular, is at the very foundation of data science: it is the collection of tools that helps us separate significance from randomness. Algebra is quite often at the heart of the analysis itself. Further quantitative skills build intuition, which is essential in analytics.
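One concrete way to "separate significance from randomness" is a permutation test: shuffle the group labels many times and see how often chance alone produces a difference as large as the one observed. The sketch below is only an illustration. The two groups are made-up numbers, and `permutation_test` is a hypothetical helper written for this example, not a function from any of the libraries mentioned here.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Hypothetical helper for illustration; assumes the two samples are
    exchangeable under the null hypothesis of "no difference".
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements for two groups (assumption, not real data).
group_a = [12.1, 11.8, 12.5, 12.9, 12.3, 12.7]
group_b = [10.2, 10.9, 10.5, 11.1, 10.4, 10.8]

p_value = permutation_test(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means random relabelling rarely reproduces the observed gap, so the difference is unlikely to be noise. Two identical groups give a p-value of 1.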
A data scientist should know one or more of these topics:
- Linear algebra
- Discrete math
- Differential equations
- Theory of statistics
- Numerical analysis: numerical linear algebra and quadrature
- Abstract algebra
- Number theory
- Real analysis
- Complex analysis
- Intermediate analysis
- Probability and Statistics
- Matrix Theory
- Calculus
- Set theory
Here are some useful resources to improve your math skills and data science expertise:
Must-Read Books
1) The Elements of Statistical Learning (Springer Series).
2) Introduction to Linear Algebra by Gilbert Strang.
3) Naked Statistics by Charles Wheelan.
4) An Introduction to Statistical Learning: with Applications in R.
5) Pattern Recognition and Machine Learning by Christopher M. Bishop.
6) Pattern Classification (A Wiley-Interscience publication).
7) Introduction to Bayesian Statistics.
Must-Know Algorithms for Data Scientists
Algorithm | Library | Tutorial |
--- | --- | --- |
Principal Component Analysis(PCA)/SVD | https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.svd.html http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html | https://arxiv.org/pdf/1404.1100.pdf |
Least Squares and Polynomial Fitting | https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.polyfit.html | https://lagunita.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/linear_regression.pdf |
Constrained Linear Regression | http://scikit-learn.org/stable/modules/linear_model.html | https://www.youtube.com/watch?v=5asL5Eq2x0A https://www.youtube.com/watch?v=jbwSCwoT51M |
K-means Clustering | http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html | https://www.youtube.com/watch?v=hDmNF9JG3lo https://www.datascience.com/blog/k-means-clustering |
Logistic Regression | http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html | https://www.youtube.com/watch?v=-la3q9d7AKQ |
SVM (Support Vector Machines) | http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html | https://www.youtube.com/watch?v=eHsErlPJWUU |
Feedforward Neural Networks | http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html https://github.com/keras-team/keras/blob/master/examples/reuters_mlp_relu_vs_selu.py | http://www.deeplearningbook.org/contents/mlp.html http://www.deeplearningbook.org/contents/autoencoders.html http://www.deeplearningbook.org/contents/representation.html |
Convolutional Neural Networks (Convnets) | https://developer.nvidia.com/digits https://github.com/kuangliu/torchcv https://github.com/chainer/chainercv https://keras.io/applications/ | http://cs231n.github.io/ https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/ |
Recurrent Neural Networks (RNNs) | https://github.com/tensorflow/models https://github.com/wabyking/TextClassificationBenchmark http://opennmt.net/ | http://cs224d.stanford.edu/ http://www.wildml.com/category/neural-networks/recurrent-neural-networks/ http://colah.github.io/posts/2015-08-Understanding-LSTMs/ |
Conditional Random Fields (CRFs) | https://sklearn-crfsuite.readthedocs.io/en/latest/ | http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/ https://www.youtube.com/watch?v=GF3iSJkgPbA |
Decision Trees | http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html http://xgboost.readthedocs.io/en/latest/ https://catboost.yandex/ | http://xgboost.readthedocs.io/en/latest/model.html https://arxiv.org/abs/1511.05741 https://arxiv.org/abs/1407.7502 http://education.parrotprediction.teachable.com/p/practical-xgboost-in-python |
Temporal-Difference (TD) Algorithms | https://github.com/keras-rl/keras-rl https://github.com/tensorflow/minigo | https://web2.qatar.cmu.edu/~gdicaro/15381/additional/SuttonBarto-RL-5Nov17.pdf https://www.youtube.com/watch?v=2pWv7GOvuf0 |
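
To show how the linear algebra from the topic list turns up in the first two rows of the table, here is a minimal sketch of a least-squares line fit and of PCA computed through the SVD. It rests on a few assumptions: the data is synthetic (generated with a fixed seed), the variable names are arbitrary, and `numpy.linalg.svd` is used in place of the `scipy.linalg.svd` routine linked above so that NumPy is the only dependency.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Least squares: fit y = slope * x + intercept to noisy synthetic data ---
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix: one column for x, one column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])
coef_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# polyfit solves the same problem; coefficients come highest degree first.
coef_polyfit = np.polyfit(x, y, deg=1)

print("lstsq   [slope, intercept]:", coef_lstsq)
print("polyfit [slope, intercept]:", coef_polyfit)

# --- PCA via SVD on synthetic 3-D data with unequal spread per axis ---
scales = np.diag([3.0, 1.0, 0.2])
X = rng.normal(size=(200, 3)) @ scales

# PCA requires centered data.
Xc = X - X.mean(axis=0)

# Rows of Vt are the principal directions; s holds the singular values
# in descending order.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Singular values relate to the variance along each principal direction.
explained_variance = s**2 / (len(Xc) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project onto the first two principal components.
scores = Xc @ Vt[:2].T

print("explained variance ratio:", explained_ratio)
print("projected shape:", scores.shape)
```

The two fits should agree up to floating-point error, and the explained variances match the eigenvalues of the sample covariance matrix, which is the connection between PCA and the SVD. In practice the scikit-learn `PCA` class linked in the table wraps the same idea.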