Definition

Data Science is the scientific method applied to data: it extracts insights and solves business problems, converting data into value. It combines statistics, computer science and domain-specific business knowledge to analyze and interpret data.

Data Scientists use a range of techniques to find patterns, make predictions and provide information that supports decision-making.

Use of Machine Learning in Data Science

Machine Learning allows systems to learn and improve from experience without being explicitly programmed.

Types of ML

Supervised Learning

Models trained with labeled data to predict results.

  • Linear regression
  • Decision trees
  • Neural Networks
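
As a minimal sketch of supervised learning, the following fits a linear regression by ordinary least squares using only the standard library. The labeled pairs are hypothetical values chosen for illustration, and the function name fit_linear_regression is an assumption, not something from a specific library.

```python
import statistics

def fit_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Slope = covariance(x, y) / variance(x), using unnormalized sums.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled data: each input x comes with a known result y.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_linear_regression(xs, ys)
prediction = slope * 5 + intercept  # predict the result for an unseen input
print(slope, intercept, prediction)
```

The labels (ys) are what make this supervised: the model is fitted against known results and then used to predict an unseen one.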

Unsupervised Learning

Models that find patterns in unlabeled data.

  • Clustering
  • Principal Component Analysis
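
To illustrate clustering, here is a rough sketch of k-means on one-dimensional data, written with the standard library only. The points, the starting centroids and the function name kmeans_1d are all assumptions made for the example; a real project would more likely use a library implementation.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Very small k-means for 1-D data with caller-supplied starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: no target values, only observations.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 6.0])
print(centroids)
print(clusters)
```

No labels are supplied at any point; the two groups emerge only from the distances between the observations.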

Reinforcement Learning

Models that learn to make decisions through trial and error.

  • Q-Learning
  • Deep Q-Networks
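
The trial-and-error idea can be sketched with tabular Q-Learning on a toy environment. Everything about the environment below (four states in a row, a reward of 1 for reaching the last one) and the hyperparameter values are assumptions chosen to keep the example small.

```python
import random

N_STATES = 4      # states 0..3; state 3 is the goal (hypothetical toy environment)
ACTIONS = [0, 1]  # 0 = move left, 1 = move right

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial: explore at random sometimes, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Error: move the estimate toward reward + discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should prefer moving right in every non-goal state, since that is the shortest path to the reward.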

What is Data Science?

Data Science is the intersection of three disciplines:

  • Computer Science
  • Math
  • Business Experience

Different Types of Analysis

Descriptive Analytics

Answers: What is happening? It depends on accurate data collection.

Diagnostic Analytics

Answers: Why did something happen? It involves drilling down to the root cause of a problem.
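
The difference between the two kinds of analysis can be shown on a small example. The sales records below are hypothetical: the descriptive step summarizes what happened per month, and the diagnostic step drills down by region to look for the cause.

```python
from collections import defaultdict

# Hypothetical sales records: (month, region, sales). Illustrative values only.
records = [
    ("Jan", "North", 100),
    ("Jan", "South", 120),
    ("Feb", "North", 105),
    ("Feb", "South", 60),
]

# Descriptive: what is happening? Total sales per month.
total_by_month = defaultdict(int)
for month, region, sales in records:
    total_by_month[month] += sales

# Diagnostic: why did it happen? Drill down into the change per region.
sales_by_region = defaultdict(dict)
for month, region, sales in records:
    sales_by_region[region][month] = sales

change_by_region = {
    region: months["Feb"] - months["Jan"]
    for region, months in sales_by_region.items()
}
largest_drop = min(change_by_region, key=change_by_region.get)

print(dict(total_by_month))
print(change_by_region)
print(largest_drop)
```

Here the monthly totals show that sales fell, while the regional breakdown points to a single region as the source of the drop.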

What is Data?

Data are raw values, without context or interpretation. By themselves, these data points are meaningless.

On the other hand, Information is the result of processing and organizing those data so they become useful, e.g. calculating the average monthly sales.

They’re easy to differentiate: in marketing, a data point could be “500 clicks on an ad campaign”, while information would be “The latest ad campaign generated 10% more clicks than the previous one”.
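
A short sketch of turning data into information, using hypothetical click counts and sales figures (the source does not give the previous campaign's clicks, so the numbers here are assumptions chosen to produce a 10% increase):

```python
import statistics

# Data: raw values with no context (hypothetical numbers for illustration).
clicks_previous_campaign = 500
clicks_latest_campaign = 550
monthly_sales = [1200, 1500, 1350]

# Information: the same values processed into something interpretable.
percent_change = (
    (clicks_latest_campaign - clicks_previous_campaign)
    / clicks_previous_campaign
    * 100
)
average_monthly_sales = statistics.mean(monthly_sales)

print(f"The latest campaign generated {percent_change:.0f}% more clicks than the previous one")
print(f"Average monthly sales: {average_monthly_sales}")
```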

Data Science Process

  • Obtain: Gather data from relevant sources
  • Scrub: Clean data into formats that the machine understands
  • Explore: Find significant patterns and trends using statistical methods
  • Model: Construct models to predict and forecast
  • Interpret: Put the results to good use
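
The five steps above can be sketched end to end as one small function per step. The inline CSV, the column names ad_spend and sales, and the choice of a linear model are all assumptions made for illustration; real pipelines are usually far more involved.

```python
import csv
import io
import statistics

# Hypothetical raw source; one row has a missing ad_spend value.
RAW_CSV = """ad_spend,sales
10,25
20,45
,50
30,65
40,85
"""

def obtain(text):
    """Obtain: read rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def scrub(rows):
    """Scrub: drop incomplete rows and convert strings to numbers."""
    return [
        (float(row["ad_spend"]), float(row["sales"]))
        for row in rows
        if row["ad_spend"] and row["sales"]
    ]

def explore(pairs):
    """Explore: compute simple summary statistics."""
    xs, ys = zip(*pairs)
    return {"mean_ad_spend": statistics.mean(xs), "mean_sales": statistics.mean(ys)}

def model(pairs):
    """Model: fit sales = slope * ad_spend + intercept by least squares."""
    xs, ys = zip(*pairs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def interpret(slope, intercept, ad_spend):
    """Interpret: turn the model into a forecast someone can act on."""
    return intercept + slope * ad_spend

rows = obtain(RAW_CSV)
pairs = scrub(rows)
summary = explore(pairs)
slope, intercept = model(pairs)
forecast = interpret(slope, intercept, ad_spend=50)
print(summary, slope, intercept, forecast)
```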

Obtaining, scrubbing and exploring data is commonly estimated to take around 80% of the time in a project.

Types of Data

Structured Data

It’s both tabular and standardized, e.g. a table in a relational database.

Unstructured Data

It’s neither tabular nor standardized, e.g. free text, images or audio.

Semi-structured Data

It’s not tabular but it is standardized, e.g. JSON or XML documents.
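
As an illustration of moving between these types, the sketch below takes semi-structured JSON records, where fields vary from record to record, and flattens them into structured rows with a fixed set of columns. The records and field names are hypothetical.

```python
import json

# Semi-structured: follows a standard format (JSON) but records differ in shape.
semi_structured = """
[
  {"id": 1, "name": "Ana", "tags": ["new"]},
  {"id": 2, "name": "Luis", "contact": {"email": "luis@example.com"}}
]
"""

records = json.loads(semi_structured)

# Structured: every row has the same columns; missing values become None.
columns = ("id", "name", "email")
rows = [
    (record["id"], record["name"], record.get("contact", {}).get("email"))
    for record in records
]

print(columns)
for row in rows:
    print(row)
```

Fields that do not fit the chosen columns (such as tags here) are dropped, which is one of the trade-offs of forcing semi-structured data into a table.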

Data Science Tools

Python

Python is used with libraries like pandas and NumPy.

pandas is a core library for data manipulation and is part of the Data Science workflow.
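
A brief sketch of what data manipulation with pandas can look like, assuming pandas and NumPy are installed. The DataFrame contents and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales table with one missing value.
df = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South"],
        "sales": [100.0, 120.0, 105.0, np.nan],
    }
)

# Drop rows where the sales value is missing.
clean = df.dropna(subset=["sales"])

# Aggregate: average sales per region.
mean_by_region = clean.groupby("region")["sales"].mean()
print(mean_by_region)
```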

Data Sources

See Data Acquisition.

Data Processing

See Data Processing.

Data Exploring

See Data Exploring.

Workflows

See Workflows.

Modeling

See Modeling.

Interpretation

See Data Interpretation.