Data Analyst Role:-
The steps required in an analytics project:-
1. Define a goal/Business Understanding
2. Getting Data/ Understanding Data
3. Cleaning Data/Data Preparations
4. Exploring Data/Getting Insights
5. Deploying Machine Learning/Iterating
6. Validating
7. Visualizing and Presenting
Data cleaning is the process of repairing or removing incorrect, corrupted, improperly formatted, duplicate, or incomplete data from a dataset.
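A minimal sketch of what data cleaning looks like in practice, using hypothetical records in plain Python: fixing formatting, dropping duplicates, and removing incomplete rows.

```python
# Hypothetical raw records with formatting, duplicate, and missing-value problems.
raw = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": ""},        # incomplete: missing age
    {"name": " Alice ", "age": "34"},  # redundant: duplicate row
    {"name": "Carol", "age": "29 "},
]

cleaned, seen = [], set()
for row in raw:
    name = row["name"].strip()   # repair improper formatting
    age = row["age"].strip()
    if not age:                  # remove incomplete rows
        continue
    key = (name, age)
    if key in seen:              # remove redundant duplicates
        continue
    seen.add(key)
    cleaned.append({"name": name, "age": int(age)})

print(cleaned)  # [{'name': 'Alice', 'age': 34}, {'name': 'Carol', 'age': 29}]
```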
The dendrogram is a tree-shaped structure that we use to represent the hierarchy of clusters in hierarchical clustering.
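A toy sketch of the agglomerative (bottom-up) clustering that a dendrogram visualizes, using single linkage on made-up 1-D points; the recorded merge heights are exactly what the dendrogram draws.

```python
# Single-linkage agglomerative clustering on toy 1-D points.
points = [1.0, 1.5, 5.0, 5.2, 9.0]
clusters = [[p] for p in points]
merges = []  # (cluster_a, cluster_b, merge_distance) at each step

while len(clusters) > 1:
    # Find the pair of clusters with the smallest single-linkage distance.
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    d, i, j = best
    merges.append((clusters[i][:], clusters[j][:], d))
    clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
    del clusters[j]

for a, b, d in merges:
    print(f"merged {a} + {b} at height {d:.1f}")
```

Merges happen in order of increasing distance, which is why a dendrogram's branch heights always grow from the leaves upward.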
If Users A and B both prefer Item X, and User B also prefers Item Y, the system may suggest Item Y to User A.
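This idea (user-based collaborative filtering) can be sketched in a few lines of Python with hypothetical preference data; the `recommend` helper and the sample users are illustrative, not a real library API.

```python
# Hypothetical user -> liked-items preference data.
prefs = {
    "A": {"X"},
    "B": {"X", "Y"},
    "C": {"Z"},
}

def recommend(user, prefs):
    """Suggest items liked by users who share at least one item with `user`."""
    mine = prefs[user]
    suggestions = set()
    for other, items in prefs.items():
        if other != user and mine & items:  # overlapping taste
            suggestions |= items - mine     # items the user hasn't rated yet
    return suggestions

print(recommend("A", prefs))  # {'Y'}
```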
A time series is an ordered sequence of observations with regard to time periods. In other words, a time series is a sequential grouping of data based on the time of occurrence.
A time series dataset is a collection of measurements taken at successive points over a period of time, with time acting as the independent variable and the target as the dependent variable.
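As a small illustration, here is a hypothetical monthly series smoothed with a 3-period moving average, one of the simplest time-series operations:

```python
# Hypothetical monthly sales figures, ordered by time of occurrence.
series = [10, 12, 13, 12, 15, 16, 18]

def moving_average(values, window):
    """Average each consecutive `window`-sized slice of the series."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

print(moving_average(series, 3))
```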
Imputation techniques for the following types of data are:-
1. Numerical Variable :- Mean, Median, Mode, End-of-Tail, Arbitrary-Value Imputation
2. Categorical Variable :- Frequent-Category Imputation, Adding a "Missing" Category
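A sketch of the most common of these techniques in plain Python, on made-up data: mean/median imputation for a numerical variable, and frequent-category plus a "Missing" label for a categorical one.

```python
from statistics import mean, median, mode

ages = [25, 30, None, 40, None, 35]          # numerical variable with gaps
colors = ["red", "blue", None, "red", None]  # categorical variable with gaps

# Numerical: fill gaps with the mean (or median) of the known values.
known = [a for a in ages if a is not None]
ages_mean = [a if a is not None else mean(known) for a in ages]
ages_median = [a if a is not None else median(known) for a in ages]

# Categorical: fill with the most frequent category, or add a "Missing" label.
known_colors = [c for c in colors if c is not None]
colors_freq = [c if c is not None else mode(known_colors) for c in colors]
colors_flag = [c if c is not None else "Missing" for c in colors]

print(ages_mean)
print(colors_freq)
print(colors_flag)
```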
Big data tools: Hadoop, Hive, Pig, Flume, Mahout, Sqoop
Data analysis tools: Tableau, RapidMiner, OpenRefine, KNIME, Google Search Term, Solver, NodeXL, io, Wolfram Alpha
Feature Selection: This step is concerned with choosing features from the set of available ones. Sometimes there are a great many features, and we have to make an intelligent decision about which to use before moving forward with our machine learning endeavor.
Algorithm: This is a critical step because the algorithm we choose will have a significant impact on the entire machine learning process. You have the option of using either linear or nonlinear algorithms; common choices include Support Vector Machines, Decision Trees, Naive Bayes, and K-Means Clustering.
Training: This is the most significant aspect of machine learning and where it differs from traditional programming. The training is based on the data we have and includes additional real-world experiences. With each subsequent training phase, the machine improves and becomes wiser, allowing it to make better judgments.
Evaluation: In this stage, we review the machine's decisions to see whether or not they are appropriate. Several metrics are involved in this process, and we must examine each of them closely to determine the efficacy of the entire machine learning endeavor.
Optimization: This is the process of enhancing the performance of the machine learning process via various optimization approaches. Optimization is one of the most important components, as it can substantially increase the algorithm's performance. The best aspect of optimization strategies is that machine learning not only consumes optimization approaches but also generates new optimization ideas.
Testing: Various tests are performed here, some on previously unseen sets of test cases. The data is divided into two sets, training and test, and different kinds of tests are available.
Data modeling is the initial phase in the creation of a database. Data modeling is the process of developing a conceptual model based on the relationships between distinct data models. The procedure entails progressing from the conceptual stage to the logical model and finally to the physical schema. It entails a methodical approach to using data modeling approaches.
The process of creating a database is known as Database Design. The database design generates an output that is a comprehensive database data model. Database design, strictly speaking, contains the full logical model of a database, but it can also include physical design options and storage characteristics.
Recall measures "how many of the real true samples did we label as true."
Precision is defined as “how many of all the samples we categorized as true are truly true.”
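Both definitions can be made concrete with a toy set of actual and predicted labels (invented for illustration):

```python
# Toy binary labels: 1 = true, 0 = false.
actual    = [1, 1, 1, 0, 0, 1, 1]
predicted = [1, 0, 1, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # of everything we labeled true, how much is truly true
recall    = tp / (tp + fn)  # of the real true samples, how many we labeled true
print(precision, recall)
```

Here the model labels four samples as true, three of them correctly (precision 0.75), but catches only three of the five actual positives (recall 0.6).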
In a nutshell, the distinctions are as follows:
The purpose of the Training Set is to fit the parameters, such as weights.
The purpose of the Test Set is to evaluate the model’s performance, namely its predictive power and generalization.
The Validation Set is used to tune settings such as hyperparameters.
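The three-way split above can be sketched in a few lines; the 60/20/20 proportions here are a common but illustrative choice, not a fixed rule.

```python
import random

random.seed(0)                # fixed seed so the shuffle is reproducible
data = list(range(100))       # stand-in for 100 labeled samples
random.shuffle(data)          # shuffle before splitting to avoid ordering bias

train = data[:60]    # fit the parameters (e.g. weights)
val   = data[60:80]  # tune settings such as hyperparameters
test  = data[80:]    # final estimate of predictive power / generalization

print(len(train), len(val), len(test))  # 60 20 20
```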
Sensitivity is easy to calculate: Sensitivity = True Positives / Positives in the Actual Dependent Variable
True positives are Positive occurrences that have been appropriately identified as Positives.
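Plugging invented counts into the formula above makes the calculation concrete (sensitivity is the same quantity as recall):

```python
# Toy confusion counts, chosen for illustration.
true_positives  = 40   # positives correctly identified as positive
false_negatives = 10   # actual positives that we missed
actual_positives = true_positives + false_negatives

sensitivity = true_positives / actual_positives
print(sensitivity)  # 0.8
```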