Data Science Cheat Sheet For Beginners

28 Aug 2022

Introduction

If you are a Data Scientist, you’re well aware of the numerous SQL statements, Excel formulas, functions, and algorithms used in your profession. You have probably mastered the ones you use often, but sometimes you need to jump into a project that demands different applications or unfamiliar tools in your preferred programming language.

This is a specially curated list of Data Science cheat sheets. These resources will make your work easier and help you become a better Data Scientist. Read on for quick references covering Machine Learning, SQL, linear algebra, Excel, and more.

Machine Learning

Machine Learning is changing our society, and Data Scientists are propelling that transformation. Machine Learning is used in automated systems, Facebook algorithms, and search engine results. However, a significant amount of programming goes into constructing the Machine Learning models that customers interact with daily. It all starts with massive datasets and a lot of creative code.

This Machine Learning algorithms cheat sheet will be invaluable for Data Scientists who specialize in Machine Learning and for analysts who are preparing to enter this booming domain.

Supervised Learning Algorithm Cheat sheet

Supervised Learning

Supervised learning algorithms learn patterns from previously labeled data by mapping inputs to outputs, and then apply those patterns to predict outcomes for unseen data. Supervised learning models can be either regression models, which predict a continuous variable, or classification models, which predict a binary or multi-class variable.

Here we cover two types of supervised learning models:

Linear models
Tree-based models

Linear models

The output of a linear model is a linear combination of the input features. In this part, we will discuss the most commonly used linear models in machine learning:

Algorithm	Description	Applications
Linear Regression	An approach for modeling a linear relationship between inputs and a numeric output variable.	Stock price forecasting; Housing price forecasting; Customer lifetime value prediction
Logistic Regression	An algorithm that models a linear relationship between inputs and a categorical output (1 or 0).	Credit risk score prediction; Customer churn forecasting
Ridge Regression	A member of the regression family that penalizes features with low predictive power by shrinking their coefficients closer to zero. It can be used for classification and regression.	Automobile predictive maintenance; Sales revenue forecasting
Lasso Regression	A member of the regression family that penalizes features with low predictive power by shrinking their coefficients all the way to zero. It can be used for classification and regression.	Housing price forecasting; Clinical outcome prediction using health data
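To make the table concrete, here is a minimal sketch of linear regression and ridge regression for a single input feature, written in plain Python. The function names and the toy data are assumptions made for illustration only; in practice you would normally reach for a library rather than this hand-rolled version.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y is approximated by slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y divided by the variation of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_simple_ridge(xs, ys, alpha):
    """Same fit with an L2 penalty (alpha) on the slope; the intercept is not penalized."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The penalty enlarges the denominator, which shrinks the slope toward zero.
    slope = sxy / (sxx + alpha)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: house size (x) against price (y).
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

ols_slope, ols_intercept = fit_simple_linear_regression(sizes, prices)
ridge_slope, ridge_intercept = fit_simple_ridge(sizes, prices, alpha=5.0)
```

On this toy data the ridge slope comes out smaller than the ordinary least squares slope, which is the shrinking behavior described in the table.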

Tree-based models

Tree-based models use a set of “if-then” rules, organized as decision trees, to make predictions. In this part, we will go through some of the most commonly used tree-based models in machine learning.

Algorithm	Description	Applications
Decision Tree	Decision Tree models apply decision rules to features to produce predictions. They can be used for classification and regression.	Customer churn forecasting; Disease prediction; Credit score modeling
Random Forests	An ensemble learning method that combines the output of several decision trees.	Credit score modeling; Housing price forecasting
Gradient Boosting Regression	Uses boosting to build a predictive model from an ensemble of weak learners.	Car emission forecasting; Ride-hailing fare estimation
XGBoost	An efficient and flexible implementation of gradient boosting. It can be used for both classification and regression problems.	Churn prediction; Insurance claims processing
LightGBM Regressor	A gradient boosting framework designed to be more efficient than existing approaches.	Flight time prediction for airlines; Cholesterol level prediction from health data
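The “if-then” idea behind tree-based models can be sketched with the smallest possible tree, a one-split "decision stump". The code below is an illustrative assumption, not how any particular library builds trees: it tries each midpoint between sorted feature values and keeps the threshold that misclassifies the fewest examples. The function names and churn data are hypothetical.

```python
def fit_stump(values, labels):
    """Find a threshold t for the rule 'if value > t then 1 else 0' with the fewest errors.

    Returns (None, len(values) + 1) when all values are identical, since no split exists.
    """
    best_threshold, best_errors = None, len(values) + 1
    candidates = sorted(set(values))
    # Candidate thresholds are the midpoints between neighbouring distinct values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        errors = sum((v > threshold) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def predict_stump(threshold, value):
    """Apply the learned if-then rule to one value."""
    return 1 if value > threshold else 0


# Hypothetical data: support tickets filed per customer, and whether they churned (1) or not (0).
tickets = [1, 2, 3, 8, 9, 10]
churned = [0, 0, 0, 1, 1, 1]

threshold, errors = fit_stump(tickets, churned)
prediction_for_seven = predict_stump(threshold, 7)
```

A full decision tree repeats this kind of split recursively on each resulting group, and ensembles such as random forests or gradient boosting combine many such trees.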

Unsupervised Learning Algorithm cheat sheet

Unsupervised learning is concerned with identifying broad patterns in unlabeled data. This kind of pattern discovery is generalizable and used for a wide range of objects. Clustering methods learn how to group similar data points together, and association algorithms discover rules that link items which frequently occur together.

Clustering models

Algorithm	Description	Applications
K-Means	The most widely used approach; it derives K clusters based on Euclidean distances.	Recommendation systems; Customer segmentation
Hierarchical Clustering	A bottom-up methodology in which each data point starts as its own cluster, and the two nearest clusters are repeatedly merged.	Fraud detection; Similarity-based document clustering
Gaussian Mixture Models	A probabilistic approach for representing normally distributed clusters in a dataset.	Recommendation systems; Customer segmentation
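As a rough illustration of the K-Means row, the sketch below alternates between assigning each point to its nearest centroid (by Euclidean distance) and moving each centroid to the mean of its members. Seeding the centroids with the first k points is a simplifying assumption made here to keep the example deterministic; real implementations usually seed more carefully. The function name and sample points are hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points (tuples of floats) into k groups; returns (centroids, clusters)."""
    # Assumption: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2D data with two visually obvious groups.
sample_points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(sample_points, k=2)
```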

Association

Algorithm	Description	Applications
Apriori Algorithm	A rule-based technique that finds the frequent itemsets in a given dataset using prior knowledge of frequent itemset properties.	Recommendation engines; Promotion optimization
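The "prior knowledge" the Apriori algorithm relies on is that every subset of a frequent itemset must itself be frequent, so larger candidates can be pruned early. The following is a minimal sketch of the frequent-itemset step only (it does not derive association rules); the function name, basket data, and `min_support` value are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets into candidates one item larger ...
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # ... and prune any candidate that has an infrequent subset.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

Here the three-item candidate is never counted, because its subset of milk and butter appears in only one basket.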

SQL

Data Scientists worldwide use SQL to organize data into tables and work with different datasets. SQL is often used to extract the data needed for a specific study, after which Python and its many specialized modules handle the more demanding analysis.

As a Data Scientist, you will utilize the following SQL commands and functions:

Basic SQL cheat Sheet

Important keywords

Keyword	Description
SELECT	States which columns to query.
FROM	Declares which table/view to select from.
WHERE	Filters rows by a condition.
=	Compares a value to a given input.
LIKE	Used with the WHERE clause to match a specific pattern in a column.
GROUP BY	Groups rows that share the same values.
HAVING	Specifies that only groups whose aggregate values match the given conditions should be returned.
INNER JOIN	Gives all rows where a record in one table matches a record in the other table.
LEFT JOIN	Gives all rows from the left table, with matching rows from the right table.
RIGHT JOIN	Gives all rows from the right table, with matching rows from the left table.
FULL OUTER JOIN	Gives all rows from both the left and the right table, matched where possible.
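The difference between INNER JOIN and LEFT JOIN is easiest to see on a small example. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `club` table and all sample rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (student TEXT, subject TEXT);
    CREATE TABLE club (student TEXT, club_name TEXT);
    INSERT INTO class VALUES ('Alex', 'Math'), ('Bea', 'Physics'), ('Cal', 'Math');
    INSERT INTO club VALUES ('Alex', 'Chess'), ('Cal', 'Robotics');
""")

# INNER JOIN keeps only students that appear in both tables.
inner_rows = conn.execute("""
    SELECT class.student, club.club_name
    FROM class
    INNER JOIN club ON class.student = club.student
    ORDER BY class.student
""").fetchall()

# LEFT JOIN keeps every student from class; a missing club shows up as None (SQL NULL).
left_rows = conn.execute("""
    SELECT class.student, club.club_name
    FROM class
    LEFT JOIN club ON class.student = club.student
    ORDER BY class.student
""").fetchall()

conn.close()
```

Bea has no row in `club`, so she is dropped by the INNER JOIN but kept by the LEFT JOIN with an empty club name.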

Aggregate functions

Function	Description
COUNT	Gives the number of rows in a table or group
SUM	Adds up the values of the group
AVG	Gives the average of the values of the group
MIN	Gives the smallest value of the group
MAX	Gives the largest value of the group

Querying data

SQL	Description
SELECT student FROM class	Select data in column student from a table named class
SELECT * FROM class	Select all rows and columns from a table class
SELECT student FROM class WHERE student = 'Alex'	Select data in column student from a table class where student = 'Alex'
SELECT student FROM class ORDER BY student [ASC|DESC]	Select data in column student from a table class and order by student (ascending by default, or descending with DESC)
SELECT student FROM class ORDER BY student LIMIT n OFFSET offset	Select data in column student from a table class, skip offset rows, and return the next n rows
SELECT student, aggregate(subject) FROM class GROUP BY student	Group the rows of table class by student and apply an aggregate function to subject for each group
SELECT student, aggregate(subject) FROM class GROUP BY student HAVING condition	Group the rows of table class by student, apply an aggregate function to subject, and filter the groups using the HAVING condition
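The query patterns above can be tried out with Python's built-in `sqlite3` module. This is a small sketch on an in-memory database; the sample rows are made up, and `COUNT` stands in for the generic `aggregate` placeholder used in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (student TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?)",
    [
        ("Alex", "Math"), ("Alex", "Physics"),
        ("Bea", "Math"),
        ("Cal", "Math"), ("Cal", "Art"), ("Cal", "Physics"),
    ],
)

# WHERE with a bound parameter, plus ORDER BY.
alex_rows = conn.execute(
    "SELECT student, subject FROM class WHERE student = ? ORDER BY subject",
    ("Alex",),
).fetchall()

# GROUP BY with an aggregate function: number of subjects per student.
subject_counts = conn.execute(
    "SELECT student, COUNT(subject) FROM class GROUP BY student ORDER BY student"
).fetchall()

# HAVING filters the groups after aggregation.
busy_students = conn.execute(
    "SELECT student, COUNT(subject) FROM class "
    "GROUP BY student HAVING COUNT(subject) >= 2 ORDER BY student"
).fetchall()

# LIMIT n OFFSET offset: skip the first student, then return the next two.
page = conn.execute(
    "SELECT DISTINCT student FROM class ORDER BY student LIMIT 2 OFFSET 1"
).fetchall()

conn.close()
```

Note that WHERE filters individual rows before grouping, while HAVING filters whole groups after the aggregate has been computed.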

Data modification

SQL	Description
INSERT INTO class(columnlist) VALUES(list_value)	Insert one row into a table class
INSERT INTO class(columnlist) VALUES (list_value), (list_value), …	Insert several rows into a table class
INSERT INTO class(columnlist) SELECT columnlist FROM subject	Insert rows from subject into a table class
UPDATE class SET student = new_value	Set a new value in the column student for all rows of table class
UPDATE class SET student = new_value, father_name = new_value WHERE condition	Update values in columns student and father_name in table class for rows that meet the condition
DELETE FROM class	Delete all rows from a table class
DELETE FROM class WHERE condition	Delete the rows from table class that meet a certain condition
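A short sketch of the modification statements, again using Python's built-in `sqlite3` module with an in-memory database. The names and values in the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (student TEXT, father_name TEXT)")

# Insert a single row, then several rows at once.
conn.execute(
    "INSERT INTO class (student, father_name) VALUES (?, ?)", ("Alex", "Sam")
)
conn.executemany(
    "INSERT INTO class (student, father_name) VALUES (?, ?)",
    [("Bea", "Tom"), ("Cal", "Raj")],
)

# UPDATE with a WHERE clause changes only the matching rows.
conn.execute(
    "UPDATE class SET father_name = ? WHERE student = ?", ("Samuel", "Alex")
)

# DELETE with a WHERE clause removes only the matching rows.
conn.execute("DELETE FROM class WHERE student = ?", ("Bea",))

remaining = conn.execute(
    "SELECT student, father_name FROM class ORDER BY student"
).fetchall()
conn.close()
```

Leaving out the WHERE clause on UPDATE or DELETE affects every row in the table, so it is worth double-checking the condition before running either statement.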

Math

Data Science is a demanding discipline that requires solid mathematics. Depending on your field of study, you may need to use calculus, linear algebra, and statistics regularly. To progress in the discipline, Data Scientists must understand these ideas thoroughly and know how they apply in various contexts.

Math cheat sheets are tools that let Data Science students and experts quickly find a certain equation or double-check their work.

Even for competent Data Scientists, many of these equations can get hazy if not used daily. This is your quick-reference linear algebra cheat sheet, containing basic terminology that Data Scientists might need.

Cheat Sheet for Linear Algebra

Notation

TERM	NOTATION
vector	v, denoted by a lowercase letter with an arrow above
scalar	any real number, e.g. 2, 1, ⅓ or π
matrix	A, denoted by a capital letter; an m × n matrix
m × n	m rows by n columns
basis vectors	i, j and k, each written with a hat (^) above
mapping	T: R^m → R^n, a transformation from m-dimensional space to n-dimensional space
determinant	scalar; the factor by which a matrix scales area (2D) or volume (3D)
cross product	vector perpendicular to the plane of two vectors in three dimensions; its length equals the area of the parallelogram they span
dot product	scalar; measures how much one vector points in the direction of another
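The last three rows of the notation table can be checked numerically. Below is a minimal plain-Python sketch; the helper names are assumptions for illustration, and vectors are represented as simple tuples.

```python
def dot(u, v):
    """Dot product: a scalar, the sum of the element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3D vectors: a vector perpendicular to both inputs."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of two rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Standard basis vectors i and j in three dimensions.
i_hat = (1, 0, 0)
j_hat = (0, 1, 0)

k_hat = cross(i_hat, j_hat)           # perpendicular to both i and j
i_dot_j = dot(i_hat, j_hat)           # perpendicular vectors have dot product 0
area_scale = det2([[2, 0], [0, 3]])   # stretches x by 2 and y by 3
```

The determinant of the example matrix is 6: a unit square is mapped to a 2 by 3 rectangle, so areas grow by a factor of 6.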

Data Science Resources

If you’re just starting your career in Data Science or are still studying to become a Data Scientist, you need to brush up on essential terminology and Excel functions. This cheat sheet gives important shortcuts, commands, and paste-able formulas that will save you time.

Excel cheat sheet

Function	Shortcut
Add Current Date	ctrl+;
Add Current Time	shift+ctrl+;
Edit Cell Comment	shift+F2
Show Active Cell	ctrl+backspace
Add Column	alt+I, C
Add Row	alt+I, R
Fill Down	ctrl+D
Fill Right	ctrl+R
Save Workbook	shift+alt+F2
Add Chart	Alt+F1
Move to Last	ctrl+END

Excel cell reference cheat sheet

Formulas require cell references. The type of cell reference determines how a formula is applied when it is copied from one cell to another.

Relative Cell Reference	=A2+B2
Absolute Cell Reference	=$A$1

Excel date and time cheat sheet

Function	Syntax	Description
DATE	DATE(year, month, day)	returns a date given the year, month, and day.
DATEDIF	DATEDIF(start_date, end_date, unit)	calculates the time between two given dates, in the given unit.
DAY	DAY(serial_number)	returns the day of the month of a date (integer from 1 to 31).
EDATE	EDATE(start_date, months)	adds a number of months to a start date.
EOMONTH	EOMONTH(start_date, months)	same as EDATE, but returns the last day of the resulting month.
NOW	NOW()	returns the serial number of the current date and time.
TODAY	TODAY()	returns the serial number of the current date.
YEAR	YEAR(serial_number)	returns the year of a date.
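If you move between Excel and Python, it can help to see rough Python analogues of a few of these functions. The sketch below uses only the standard library; the helper names `edate` and `eomonth` are assumptions chosen to mirror the Excel names, and they return Python `date` objects rather than Excel serial numbers.

```python
import calendar
from datetime import date


def edate(start, months):
    """Shift a date by whole months, clamping the day to the end of the target month."""
    month_index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def eomonth(start, months):
    """Like edate, but always return the last day of the resulting month."""
    shifted = edate(start, months)
    last_day = calendar.monthrange(shifted.year, shifted.month)[1]
    return date(shifted.year, shifted.month, last_day)


start = date(2022, 1, 31)            # analogue of DATE(2022, 1, 31)
one_month_later = edate(start, 1)    # February has no 31st, so the day is clamped
month_end = eomonth(date(2022, 8, 28), 0)
days_between = (date(2022, 8, 28) - date(2022, 1, 1)).days  # roughly DATEDIF with unit "d"
day_of_month = start.day             # analogue of DAY
year_of_date = start.year            # analogue of YEAR
```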

Conclusion

The cheat sheets recommended in this article are a narrowed-down list of the best. They will keep you covered in your projects and help you brush up on your skills.

It’s critical to keep up with innovations in this fast-changing digital industry, no matter where you are on your Data Science journey. Every aspect of your profession is prone to change and progress with time. Data analysis programming languages, tools, and procedures keep improving and becoming more robust. This constant progress is one of the things that makes this profession so appealing.

Learning is a never-ending process. So, continue learning and advance professionally. Enroll in the latest online programs and webinars on big data, deep learning, Machine Learning, or Artificial Intelligence if you want to dive further into a specific field of Data Science.
