CS3352-FOUNDATIONS OF DATA SCIENCE IMPORTANT QUESTIONS

COURSE OBJECTIVES:
 To understand the data science fundamentals and process.
 To learn to describe the data for the data science process.
 To learn to describe the relationship between data.
 To utilize the Python libraries for Data Wrangling.
 To present and interpret data using visualization libraries in Python.


UNIT I INTRODUCTION
Data Science: Benefits and uses – Facets of data – Data Science Process: Overview – Defining
research goals – Retrieving data – Data preparation – Exploratory data analysis – Building the model –
Presenting findings and building applications – Data Mining – Data Warehousing – Basic statistical
descriptions of data


UNIT II DESCRIBING DATA
Types of Data – Types of Variables – Describing Data with Tables and Graphs – Describing Data
with Averages – Describing Variability – Normal Distributions and Standard (z) Scores


UNIT III DESCRIBING RELATIONSHIPS
Correlation – Scatter plots – Correlation coefficient for quantitative data – Computational formula for
correlation coefficient – Regression – Regression line – Least squares regression line – Standard
error of estimate – Interpretation of r² – Multiple regression equations – Regression towards the mean


UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING

Basics of NumPy arrays – Aggregations – Computations on arrays – Comparisons, masks, Boolean
logic – Fancy indexing – Structured arrays – Data manipulation with Pandas – Data indexing and
selection – Operating on data – Missing data – Hierarchical indexing – Combining datasets –
Aggregation and grouping – Pivot tables


UNIT V DATA VISUALIZATION
Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots –
Histograms – legends – colors – subplots – text and annotation – customization – three dimensional
plotting – Geographic Data with Basemap – Visualization with Seaborn

UNIT I

  1. Exploratory data analysis
  2. Data preparation
  3. Data mining and data warehousing
  4. Data science process

UNIT II

  1. Describing data with tables
     [Frequency distributions for quantitative data, constructing frequency distributions, outliers, relative and cumulative frequency distributions, frequency distributions for qualitative data]
  2. Graphs for quantitative data
  3. Problems
     [Standard deviation, interquartile range, z scores]
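
As a practice aid for the Unit II problem types listed above, here is a minimal sketch using only Python's standard `statistics` module. The data values are hypothetical, and the quartile method (`"inclusive"`) is one of several conventions; a textbook may define quartiles differently and give a slightly different interquartile range.

```python
import statistics

# Hypothetical sample of scores (not from the course material).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)

# Population standard deviation (divides by n).
# Use statistics.stdev for the sample version (divides by n - 1).
pop_sd = statistics.pstdev(scores)

# Quartiles under the "inclusive" convention; other conventions exist.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# z score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / pop_sd for x in scores]

print("mean:", mean)
print("population SD:", pop_sd)
print("IQR:", iqr)
print("z scores:", z_scores)
```

Working one such problem by hand and then comparing against the script is a quick way to catch arithmetic slips.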

UNIT III

  1. Interpretation of r²
  2. Standard Error of estimate (problem)
  3. Regression and Least Square regression (problem)
  4. Coefficient of correlation (problem)
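
The Unit III problem types above can be checked with a short script. This is a minimal sketch on hypothetical paired data, using sums of squares for the correlation coefficient and the least squares line. The standard error of estimate here uses the `n - 2` denominator, which is a common textbook form; confirm which form your course expects.

```python
import math

# Hypothetical paired observations (not from the course material).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and sum of cross-products about the means.
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation coefficient and its square.
r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# Least squares regression line: predicted y = a + b * x.
b = sp_xy / ss_x
a = mean_y - b * mean_x

# Standard error of estimate (assumed form with n - 2 in the denominator).
se_estimate = math.sqrt(ss_y * (1 - r_squared) / (n - 2))

print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
print(f"regression line: y = {a:.4f} + {b:.4f} x")
print(f"standard error of estimate = {se_estimate:.4f}")
```

Here `r_squared` is the proportion of the variability in `y` that is predictable from `x`, which is the interpretation asked for in question 1.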

UNIT IV

  1. NumPy arrays
  2. Comparisons, masks, Boolean arrays
  3. Structured arrays, hierarchical indexing
  4. Data Manipulation with pandas

UNIT V

  1. Line plot and Scatter plots
  2. Geographic data with Basemap, visualization with Seaborn
  3. Density and Contour plots
  4. Histograms, legends, three-dimensional plotting
