Pular para o conteúdo principal

(A) Data Science in Practice with Python


The top trending in Twitter or other social network is the term “data science”. But ...
  • What’s the data science? 
  • How do real companies use data science to make products, services and operations better? 
  • How does it work? 
  • What does the data science lifecycle look like? 
This is the buzzword at the moment. A lot of people ask me about it. Are many questions. I’ll try answer all of these questions through of some samples.

Sample 1 - Regression

WHAT IS A REGRESSION? This is the better definition what I found [Source: Wikipedia] - Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.
HOW DOES IT WORK? Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However this can lead to illusions or false relationships, so caution is advisable; for example, correlation does not imply causation.

Sample 2 - Recommender System

WHAT IS A RECOMMENDER SYSTEM? A model that filters information to present users with a curated subset of options they’re likely to find appealing.
HOW DOES IT WORK? Generally via a collaborative approach (considering user’s previous behavior) or content based approach (based on discrete assigned characteristics).

Sample 3 - Credit Scoring

WHAT IS CREDIT SCORING? A model that determines an applicant’s creditworthiness for a mortgage, loan or credit card.
HOW DOES IT WORK? A set of decision management rules evaluates how likely an applicant is to repay debts.

Sample 4 - Dynamic Pricing

WHAT IS DYNAMIC PRICING? Modeling price as a function of supply, demand, competitor pricing and exogenous factors.
HOW DOES IT WORK? Generalized linear models and classification trees are popular techniques for estimating the “right” price to maximize expected revenue.

Sample 5 - Customer Churn

WHAT IS CUSTOMER CHURN? Predicting which customers are going to abandon a product or service.
HOW DOES IT WORK? Data scientists may consider using support vector machines, random forest or k-nearest-neighbors algorithms.

Sample 6 - Fraud Detection

WHAT IS FRAUD DETECTION? Detecting and preventing fraudulent financial transactions from being processed.
HOW DOES IT WORK? Fraud detection is a binary classification problem: “is this transaction legitimate or not?”

This post will be divided in 5 parts. In each one I’ll explain the machine learning techniques mentioned above. This is the first post and I’ll show you how work the sample 1: regression. But, first let’s start with the question (below):

- What’s the data science?

In my previous post “Tucson Best Buy Analysis”, you can know more about it. There I explain “what is” and show you some examples of the day-by-day of a Data Scientist.

Being so, let’s talk about sample 1: regression. To explain this let’s start with a simple problem, I could say “classic problem”, predicting house prices in Russia. A Kaggle's challenge what was closed on 06/29/2017. The goal is to predict the median value of a house in a particular area. As usual, we have some training data, where the answer is known to us. To our study and a better comprehension about this topic I created a notebook on Jupyter tool. To access the notebook, please click here.

Have fun! See you on next post ...

Comentários

Postagens mais visitadas deste blog

(T) Como indexar uma tabela Fato - (Best Practice)

A base de qualquer projeto de bi é ter um bom dw/data mart. Podemos falar em modelagem star-schema durante dias, sem falar nas variações do snowflake, mas o objetivo principal deste artigo é apontar algumas negligências que tenho percebido no tratamento da tabela fato. Tabela esta que é o principal pilar da casa que reside um modelo star-schema.

Ouço muitas vezes os clientes reclamando do desempenho das consultas enviadas contra o seu dw/data mart, ou do tempo de resposta das análises solicitadas ao bi. Isto é realmente inaceitável, não só numa perspectiva de implantação do projeto, mas também de desempenho da entrega das informações.

Como eu mencionei anteriormente o meu objetivo neste artigo, é alertar sobre a importância da indexação da tabela fato: o que deveria ser, porque é necessário, porque chaves compostas são boas e más, e porque você deveria se preocupar com isso.

Então, vejamos:

|a| Indexação padrão (default):
De forma rápida, todas as colunas de chave estrangeira (FK) devem …

(A) Data Science in Practice with Python - Sample 2

In this post I'll explain what is a recommender system, how work it and show you some code examples. In my previous post I did a quick introduction:

Sample 2 - Recommender System

WHAT IS A RECOMMENDER SYSTEM? A model that filters information to present users with a curated subset of options they’re likely to find appealing.
HOW DOES IT WORK? Generally via a collaborative approach (considering user’s previous behavior) or content based approach (based on discrete assigned characteristics).

Now I'll get into in some concepts very important about recommender systems.

Recommender System in Details:

We can say that the goal of a recommender system is to make product or service recommendations to people. Of course, these recommendations should be for products or services they’re more likely to want buy or consume.

Recommender systems are active information filtering systems which personalize the information coming to a user based on his interests, relevance of the information etc.…

(T) Verificando a consistência de um repositório ou um Business Model no Oracle BI