Aspiring Data Scientist | Enthusiastic ML practitioner | Fellow at IIT Kanpur | Drama Lover | Subscribe https://www.youtube.com/channel/UCqq_T7ktsZO62k7CaibgQvA

Data visualization plays a very important role in data mining. Many data scientists spend much of their time exploring data through visualization. To speed up this process, we need well-organized documentation of all the plots.

Even plentiful resources can't be turned into valuable goods without planning and architecture. I hope this article gives you a good architecture of all the plots and their documentation.

*Introduction*

*Know your Data*

*Distribution Plots*

a. Dist-Plot

b. Joint Plot

c. Pair Plot

d. Rug Plot

*Categorical Plots*

a. Bar Plot

b. Count Plot

c. Box Plot

d. Violin Plot

*Advanced Plots*

a. Strip…

I am a final-year undergraduate at the Indian Institute of Technology, Kanpur, in the Department of **Mechanical Engineering**, with a minor in the Department of **Industrial Engineering and Management**.

You may find it interesting how, coming from a core engineering field, I landed a job as a Data Scientist.

In the campus placement season (Dec 2020), I got placed as a Data Scientist at HiLabs. HiLabs has a healthcare-focused AI solution that automatically detects data errors without human intervention. It is a combination of Big Data, AI, and medical cosmologies.

The story behind how I landed as a Data Scientist…

Linear Regression is one of the simplest machine learning algorithms. Its interpretability and ease of training make it a natural first step in Machine Learning. Being relatively uncomplicated, Linear Regression is also a foundation for understanding more complex algorithms.

To learn what linear regression is, how we train it, how we obtain the best-fit line, how we interpret it, and how we assess the accuracy of the fit, you may visit the following article.
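As a minimal sketch of what "training" and "accuracy of fit" can mean here, the snippet below fits a line by ordinary least squares and computes R². The helper names `fit_line` and `r_squared` and the data are made up for illustration; they are not taken from the linked article.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: share of the variance in ys explained by the line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data that lies exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
fit_quality = r_squared(xs, ys, slope, intercept)
```

On real data the points will not lie exactly on a line, so R² falls below 1; how far below is one common way to judge the fit.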

After understanding the basic intuition of Linear regression, certain concepts make it more fascinating and more fun. These also provide a deep…

A time series comprises four major components: a trend, a seasonal component, a cyclic component, and a stochastic (random) component.

You can recap the basics of a time series in my following article.

We extract all these components and analyze them to get information from a time series. There are lots of standard methods to extract the components from a time series.
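One of the standard methods is a centered moving average, sketched below under simplifying assumptions: the series is additive, it contains only a linear trend and a weekly seasonal pattern (no cyclic or random part), and the window is odd. The data and the helper name `centered_moving_average` are hypothetical.

```python
def centered_moving_average(series, window):
    """Average each point with its neighbours; window must be odd.

    Returns a list the same length as series, with None where the
    window would run past either end.
    """
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd windows")
    half = window // 2
    out = [None] * len(series)
    for i in range(half, len(series) - half):
        out[i] = sum(series[i - half:i + half + 1]) / window
    return out


# Hypothetical daily series: linear trend plus a weekly pattern that sums to zero.
period = 7
weekly_pattern = [3, 1, -2, -4, -1, 0, 3]
trend = [0.5 * t for t in range(28)]
seasonal = [weekly_pattern[t % period] for t in range(28)]
series = [tr + s for tr, s in zip(trend, seasonal)]

# Averaging over one full period cancels the seasonal part and leaves the trend.
trend_estimate = centered_moving_average(series, period)

# Subtracting the trend estimate leaves the seasonal part (plus noise, if any).
detrended = [
    None if est is None else y - est
    for y, est in zip(series, trend_estimate)
]
```

With noisy real data the recovered trend is only an estimate, and the first and last few points have no estimate at all because the window does not fit.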

But these components may or may not all be present in a given time series. Therefore, before estimating them, we first need to check whether they exist. …

Suppose you want to solve a predictive modeling problem, and you start collecting data for it. You cannot know in advance exactly which features you need or how much data is enough. Hence, you aim for the upper limit and collect all possible features and observations.

Eventually, you realize that you have collected a large amount of data, and that the extra features are increasing both noise and time.

**Noise**: There may be some features that the model finds irrelevant. They just add noise to the model.

**Time**: The time I am talking about is computational time. For…

Ever since we learned that data contains trends and that we can extract knowledge from it, we have been collecting it. In some instances, we tried to find trends in data that covered only a short span of time, and so we found no trend with respect to time.

But now, after decades of data collection, we can find at least some patterns with respect to time, and studying them is called **Time Series analysis**.

A series of observations recorded sequentially over a period of time, i.e. a collection of observations recorded along with their timestamps, is called a time series.
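In code, this definition amounts to pairing each observation with the time it was recorded and keeping the pairs in chronological order. The sketch below uses made-up daily readings and only the standard library.

```python
from datetime import date, timedelta

# Hypothetical daily temperature readings; the values are made up.
start = date(2021, 1, 1)
readings = [21.5, 22.1, 19.8, 20.4]

# Pair each observation with its timestamp, in the order it was recorded.
time_series = [
    (start + timedelta(days=i), value) for i, value in enumerate(readings)
]

# The ordering by time is what distinguishes a time series from a plain sample.
timestamps = [stamp for stamp, _ in time_series]
is_chronological = timestamps == sorted(timestamps)
```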

The world we see today has automated data collection tools, database systems, the World Wide Web, and a computerized society. This has resulted in explosive growth in data, from terabytes to petabytes.

We are drowning in the ocean of data but starving for knowledge.

Huge velocity, volume, and variety of data are what our new age has given us. Cheaper technology, mobile computing, social networking, and cloud computing have brought on this data storm.

These are the reasons why conventional methods are fading away and we need novel methods like **Data mining** to process the new era of…

The first question that comes to my mind is: why is probability even necessary for learning machine learning and data science? After some web searching, I came to some important conclusions about why probability is vital.

Probability comes up repeatedly in predictive settings. Looking at a few of them will help us understand why probability is indispensable.

**Classification Problem:** A classification problem requires us to predict the probability that an input example belongs to a particular class. Whether the task is image classification or object detection, we predict the probability of the input belonging to each class.

**Models based on Probability…**
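Returning to the classification case above: one common way to turn a model's raw class scores into per-class probabilities is the softmax function. The excerpt does not name a method, so softmax is an assumption here, and the scores and class names are made up.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes.
class_names = ["cat", "dog", "bird"]
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the one with the highest probability.
predicted = class_names[probs.index(max(probs))]
```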

Most data analysis problems start with understanding the data. It is the most crucial and complicated step. This step also affects the further decisions that we make in a predictive modeling problem, one of which is what algorithm we are going to choose for a problem.

In this article, we will see a complete, thorough guide for such a problem.

Content

- Reading Data
- Variable Identification
- Univariate analysis
- Bivariate analysis
- Missing values: types and analysis
- Outlier treatment
- Variable Transformation

Reading the data means getting answers to the following questions:

- What is the shape of my data?
- How many features does…
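Questions like these can be answered with a few lines of code. The sketch below is only an illustration: it uses the standard-library `csv` module on made-up inline data, whereas in practice a dataframe library is a more common choice.

```python
import csv
import io

# Hypothetical CSV contents standing in for a real file on disk.
raw = io.StringIO(
    "age,city,income\n"
    "34,Kanpur,52000\n"
    "28,Delhi,\n"
    "45,Mumbai,61000\n"
)

rows = list(csv.DictReader(raw))
features = list(rows[0].keys())

# Shape of the data: (number of observations, number of features).
shape = (len(rows), len(features))

# Count empty cells per feature as a first look at missing values.
missing = {
    name: sum(1 for row in rows if row[name] == "") for name in features
}
```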

The new era of machine learning and artificial intelligence is the deep learning era. It offers not only very high accuracy but also has a huge hunger for data. With neural nets, functions of much greater complexity can be fitted to given data points.

But a few very specific details make working with neural networks more remarkable and insightful.

Let us assume that we have trained a huge neural network. For simplicity, the constant term is zero and the activation function is identity.
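Under these two simplifications, each layer is just a matrix-vector product, so the whole network reduces to a product of its weight matrices. The sketch below checks this on a tiny network; the 2 -> 3 -> 2 layer sizes and the weights are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def forward(layers, x):
    """Forward pass with zero bias and identity activation: repeated matvec."""
    for weights in layers:
        x = matvec(weights, x)
    return x


# Hypothetical weights for a 2 -> 3 -> 2 network.
w1 = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5]]  # 3 x 2
w2 = [[0.5, 1.0, 0.0], [2.0, -1.0, 1.0]]    # 2 x 3
x = [1.0, 2.0]

layer_by_layer = forward([w1, w2], x)

# The same output from a single matrix: the product of the layers, last layer first.
collapsed = matvec(matmul(w2, w1), x)
```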