COURSE OBJECTIVES:
To understand the data science fundamentals and process.
To learn to describe the data for the data science process.
To learn to describe the relationship between data.
To utilize the Python libraries for Data Wrangling.
To present and interpret data using visualization libraries in Python.
UNIT I INTRODUCTION
Data Science: Benefits and uses – Facets of data – Data Science Process: Overview – Defining
research goals – Retrieving data – Data preparation – Exploratory data analysis – Building the model –
Presenting findings and building applications – Data Mining – Data Warehousing – Basic statistical
descriptions of data
UNIT II DESCRIBING DATA
Types of Data – Types of Variables – Describing Data with Tables and Graphs – Describing Data
with Averages – Describing Variability – Normal Distributions and Standard (z) Scores
UNIT III DESCRIBING RELATIONSHIPS
Correlation – Scatter plots – correlation coefficient for quantitative data – computational formula for
correlation coefficient – Regression – regression line – least squares regression line – Standard
error of estimate – interpretation of r² – multiple regression equations – regression toward the mean
UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING
Basics of NumPy arrays – aggregations – computations on arrays – comparisons, masks, boolean
logic – fancy indexing – structured arrays – Data manipulation with Pandas – data indexing and
selection – operating on data – missing data – Hierarchical indexing – combining datasets –
aggregation and grouping – pivot tables
UNIT V DATA VISUALIZATION
Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots –
Histograms – legends – colors – subplots – text and annotation – customization – three-dimensional
plotting – Geographic Data with Basemap – Visualization with Seaborn
UNIT I
1. Exploratory data analysis
2. Data preparation
3. Data mining and data warehousing
4. Data science process
UNIT II
1. Describing data with tables
[Frequency distributions for quantitative data, constructing a frequency distribution, outliers, relative and cumulative frequency distributions, frequency distributions for qualitative data]
2. Graphs for quantitative data
3. Problems
[Standard deviation, interquartile range, z scores]
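The problem topics above (standard deviation, interquartile range, z scores) can be sketched with Python's standard `statistics` module. This is a minimal illustration, not a prescribed solution method: the `scores` data set is made up, the population form of the standard deviation is assumed, and the "inclusive" quartile method is only one of several conventions, so a textbook may give slightly different quartiles.

```python
import statistics

# Hypothetical data set, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean and population standard deviation (divides by N, not N - 1).
mean = statistics.mean(scores)
std_dev = statistics.pstdev(scores)

# Quartiles: with n=4 the function returns the three cut points [Q1, Q2, Q3].
# The "inclusive" method is an assumption; other conventions exist.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

# z score: distance of each value from the mean in standard deviation units.
z_scores = [(x - mean) / std_dev for x in scores]

print("mean:", mean)
print("standard deviation:", std_dev)
print("interquartile range:", iqr)
print("z scores:", z_scores)
```

For a sample rather than a population, `statistics.stdev` (which divides by n - 1) would be the usual substitute for `statistics.pstdev`.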
UNIT III
- Interpretation of r²
- Standard Error of estimate (problem)
- Regression and Least Square regression (problem)
- Coefficient of correlation (problem)
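The Unit III problem types can be worked through in a few lines of plain Python. The sketch below uses made-up paired data (`x`, `y`) and assumes the usual sum-of-squares definitions: r = SP_xy / sqrt(SS_x · SS_y), slope b = SP_xy / SS_x, intercept a = mean(y) - b · mean(x), and a standard error of estimate that divides the residual sum of squares by n - 2. Some textbooks use a different divisor, so check the convention in use.

```python
import math

# Hypothetical paired observations, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and sum of cross-products of the deviations.
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation coefficient and proportion of explained variance.
r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# Least squares regression line: predicted y = a + b * x.
b = sp_xy / ss_x
a = mean_y - b * mean_x

# Standard error of estimate from the residuals (divisor n - 2 assumed).
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
std_error_estimate = math.sqrt(residual_ss / (n - 2))

print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
print(f"regression line: y' = {a:.2f} + {b:.2f}x")
print(f"standard error of estimate = {std_error_estimate:.4f}")
```

Here r² is the proportion of the variance in `y` that is predictable from `x`, which is the usual starting point for its interpretation.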
UNIT IV
- NumPy arrays
- Comparisons, masks, Boolean arrays
- Structured arrays, Hierarchical indexing
- Data Manipulation with pandas
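As a brief illustration of two of the Unit IV topics, the sketch below shows a Boolean mask and a structured array in NumPy. It assumes NumPy is installed; the rainfall values and the names, ages, and weights are invented for the example.

```python
import numpy as np

# Hypothetical daily rainfall in millimetres, invented for illustration only.
rainfall = np.array([0.0, 12.5, 3.2, 0.0, 25.1, 7.8])

# A comparison produces a Boolean array (a mask) of the same shape.
wet = rainfall > 0
heavy = rainfall > 10

# Summing a mask counts the True entries.
n_wet = np.sum(wet)

# Masks combine with & (and), | (or), ~ (not) and can index the array.
light_wet = rainfall[wet & ~heavy]

# A structured array stores records with named, typed fields.
people = np.zeros(3, dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")])
people["name"] = ["Alice", "Bob", "Cathy"]
people["age"] = [25, 45, 37]
people["weight"] = [55.0, 85.5, 68.0]

# Masks work on structured arrays too: names of people under 40.
under_40 = people[people["age"] < 40]["name"]

print("wet days:", n_wet)
print("light rain on wet days:", light_wet)
print("under 40:", under_40)
```

Note that the bitwise operators `&`, `|`, and `~` are used for element-wise logic on arrays; the keywords `and`, `or`, and `not` act on whole objects and raise an error for multi-element arrays.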
UNIT V
- Line plot and Scatter plots
- Geographic data with Basemap – Visualization with Seaborn
- Density and Contour plots
- Histogram – legend – three-dimensional plot