COURSE OBJECTIVES:
To understand the techniques and processes of data science
To apply descriptive data analytics
To visualize data for various applications
To understand inferential data analytics
To analyze and build predictive models from data
UNIT I INTRODUCTION TO DATA SCIENCE
Need for data science – benefits and uses – facets of data – data science process – setting the
research goal – retrieving data – cleansing, integrating, and transforming data – exploratory data
analysis – build the models – presenting and building applications.
UNIT II DESCRIPTIVE ANALYTICS
Frequency distributions – Outliers – interpreting distributions – graphs – averages – describing
variability – interquartile range – variability for qualitative and ranked data – Normal distributions – z
scores – correlation – scatter plots – regression – regression line – least squares regression line –
standard error of estimate – interpretation of r² – multiple regression equations – regression toward
the mean.
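As an illustration of several Unit II topics (z scores, correlation, the least squares regression line, and r²), here is a minimal Python sketch using only the standard library. The data values and function names are hypothetical and chosen only for illustration; z scores here use the population standard deviation, which is one of several conventions.

```python
import math

def mean(values):
    return sum(values) / len(values)

def z_scores(values):
    """Standardize values using the population standard deviation."""
    m = mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]

def least_squares_line(x, y):
    """Return (slope, intercept, r) for the least squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sp_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products
    ss_x = sum((a - mx) ** 2 for a in x)                    # sum of squares for x
    ss_y = sum((b - my) ** 2 for b in y)                    # sum of squares for y
    slope = sp_xy / ss_x
    intercept = my - slope * mx
    r = sp_xy / math.sqrt(ss_x * ss_y)
    return slope, intercept, r

# Hypothetical data: hours studied vs. quiz score
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept, r = least_squares_line(hours, scores)
z = z_scores(hours)
print(f"predicted score = {intercept:.2f} + {slope:.2f} * hours, r^2 = {r ** 2:.2f}")
```

Squaring r gives the proportion of variance in the scores that is predictable from hours, which is the quantity the "interpretation of r²" topic refers to.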
UNIT III INFERENTIAL STATISTICS
Populations – samples – random sampling – sampling distribution – standard error of the mean –
Hypothesis testing – z-test – z-test procedure – decision rule – calculations – decisions –
interpretations – one-tailed and two-tailed tests – Estimation – point estimate – confidence interval
– level of confidence – effect of sample size.
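The z-test procedure and confidence interval listed in Unit III can be sketched as follows. This is a minimal example assuming a known population standard deviation and a two-tailed test; the numbers and function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed one-sample z-test; returns (z, p_value, reject_null)."""
    standard_error = pop_sd / math.sqrt(n)      # standard error of the mean
    z = (sample_mean - pop_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha          # decision rule: reject if p < alpha

def confidence_interval(sample_mean, pop_sd, n, level=0.95):
    """Confidence interval for the population mean with known pop_sd."""
    standard_error = pop_sd / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_crit * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical numbers: sample of 100 with mean 533, tested against mean 500
z, p, reject = z_test(sample_mean=533, pop_mean=500, pop_sd=110, n=100)
low, high = confidence_interval(533, 110, 100)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Because the standard error shrinks with the square root of n, rerunning `confidence_interval` with a larger `n` narrows the interval, which illustrates the "effect of sample size" topic.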
UNIT IV ANALYSIS OF VARIANCE
t-test for one sample – sampling distribution of t – t-test procedure – t-test for two independent
samples – p-value – statistical significance – t-test for two related samples. F-test – ANOVA –
Two-factor experiments – three F-tests – two-factor ANOVA – Introduction to chi-square tests.
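For the Unit IV test statistics, a rough sketch of the one-sample t ratio and a one-factor F ratio is given below. It is a simplified illustration: it computes only the statistics and degrees of freedom, since the standard library has no t or F distribution (critical values or p-values would come from a table or a library such as SciPy), and it uses one-factor rather than two-factor ANOVA. The data and function names are hypothetical.

```python
import math

def one_sample_t(sample, pop_mean):
    """Return (t, df) for a one-sample t-test."""
    n = len(sample)
    m = sum(sample) / n
    # Sample variance uses n - 1 in the denominator
    s2 = sum((v - m) ** 2 for v in sample) / (n - 1)
    standard_error = math.sqrt(s2 / n)
    return (m - pop_mean) / standard_error, n - 1

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-factor ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical data
t, df = one_sample_t([4, 6, 8, 10, 12], pop_mean=5)
f_ratio, df_b, df_w = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"t({df}) = {t:.3f}")
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```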
UNIT V PREDICTIVE ANALYTICS
Linear least squares – implementation – goodness of fit – testing a linear model – weighted
resampling. Regression using StatsModels – multiple regression – nonlinear relationships – logistic
regression – estimating parameters – Time series analysis – moving averages – missing values –
serial correlation – autocorrelation. Introduction to survival analysis.
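Two of the time series topics in Unit V, moving averages and serial correlation, can be sketched in plain Python as below. The series is hypothetical, and the autocorrelation shown is one common estimator (deviations from the overall mean, divided by the total sum of squares); other definitions, such as the Pearson correlation of the series with its shifted copy, give slightly different values.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def autocorrelation(series, lag=1):
    """Sample autocorrelation at the given lag, using the overall mean."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    numer = sum((series[i] - m) * (series[i + lag] - m) for i in range(n - lag))
    return numer / denom

# Hypothetical daily prices
prices = [10, 12, 14, 16, 18, 20]
smoothed = moving_average(prices, window=3)
r1 = autocorrelation(prices, lag=1)
print(smoothed, r1)
```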
AD3491 Fundamentals of data science analysis
UNIT-1
- Benefits, uses, and process of data science
- Cleansing, integrating, and transforming data
- Data analysis, building applications
UNIT-2
- Correlation, scatter plots, regression, least squares regression line
- Normal distributions and standard (z) scores
UNIT-3
- Random sampling, sampling distribution, standard error of the mean
- z-test procedure, decision rule
UNIT-4
- Two-factor ANOVA, introduction to chi-square tests, experiments
- Sampling distribution of t, t-test procedure, three F-tests
UNIT-5
- Weighted resampling, regression using StatsModels
- Serial correlation, autocorrelation, TOTA