Subjects data scientists need to know:
- data processing (dataset prep for model/analysis)
  - outlier removal techniques (distribution-based, Grubbs' test), normalization (Box-Cox transformation), standardization, correlation between features, dimensionality reduction (PCA, linear discriminant analysis, non-negative matrix factorization)
- modeling (building/optimizing/performance eval)
  - different types of regression and their mathematical differences, regularization, bias-variance tradeoff, linear and logistic regression, naive Bayes, random forest (information gain / impurity score: Gini, entropy), SVM, KNN, k-means clustering, performance eval (precision, recall, MSE, R-squared, ROC-AUC, accuracy, log-loss, benchmarking), clustering algorithms, anomaly detection models (autoencoders, PCA, isolation forest)
- math: linear algebra, multivariable calculus, statistics & probability (can include Bayesian statistics for more math-specific DS roles)
- experimental design (A/B testing or synthetic controls)
  - sampling techniques, identification of bias in data, hypothesis testing, confidence intervals, marketing use cases, model evaluation
- SQL: many companies still have interview rounds on SQL/data querying that involve window functions, rolling averages, complex joins, and so on
- algorithms; LeetCode Mediums/Hards should be expected in any tech interview, especially in more math-heavy/quant roles
- data structures and big-O notation: unlikely to be asked for DS roles, but required if the role is more software-engineering focused
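
As a rough illustration of the window-function and rolling-average questions mentioned in the SQL item above, here is a minimal sketch that runs a 3-row rolling average through Python's built-in `sqlite3` module. The `daily_sales` table, its columns, and the sample values are all invented for illustration, and the query assumes the bundled SQLite is version 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Hypothetical daily sales data, invented purely for illustration.
rows = [
    ("2024-01-01", 10.0),
    ("2024-01-02", 20.0),
    ("2024-01-03", 30.0),
    ("2024-01-04", 40.0),
    ("2024-01-05", 50.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)

# AVG(...) OVER (...) is a window function: it averages over a frame of rows
# around the current row (here the current row plus the two rows before it)
# without collapsing the result to one row the way GROUP BY would.
query = """
SELECT
    day,
    amount,
    AVG(amount) OVER (
        ORDER BY day
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_avg_3
FROM daily_sales
ORDER BY day
"""

result = conn.execute(query).fetchall()
for day, amount, rolling_avg in result:
    print(day, amount, rolling_avg)

conn.close()
```

The first two rows average over fewer than three values because the frame is cut off at the start of the table. Whether those partial windows should be kept or filtered out is a common follow-up question.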
<aside>
💡 What you want to start learning first depends on the role you’re applying for. For example, roles within marketing analytics will involve more case studies and experimental design while R&D roles might involve more ML math and take-homes.
The main DS interview book (Ace the Data Science Interview) follows this 7-chapter order: probability, statistics, machine learning, SQL & DB design, coding, product sense, and case studies.
</aside>
DS Resources Links:
SQL cheat sheet.pdf