Pular para o conteúdo principal

(T) NumPy in Action

The NumPy module provides effecient and convenient handling of large numerical arrays in Python. This module is used by many other libraries and projects and in this sense is a "base" technology. Let's look at some quick examples.

NumPy objects are of type ndarray. There are different ways of creating then. We can create an ndarray by:

  • Converting a Python list
  • Using a library function that returns a populated vector
  • Reading data from a file directly into a NumPy object
The listing that follows shows five different ways to create NumPy objects. First we create one by converting a Python list. Then we show two different factory routines that generate equally spaced grid points. These routines differ in how they interpret the provided boundary values: one routine includes both boundary values, and the other includes one and excludes the other. Next we create a vector filled with zeros and set each element in a loop. Finally, we read data from a text file. (I am showing only the simplest or default cases here—all these routines have many more options that can be used to influence their behavior.)

In the end, all five vectors contain identical data. You should observe that the values in the Python list used to initialize vec1 are floating-point values and that we specified the type desired for the vector elements explicitly when using the arange() function to create vec2. Now that we have created these objects, we can operate with them (see the next listing). One of the major conveniences provided by NumPy is that we can operate with NumPy objects as if they were atomic data types: we can add, subtract, and multiply them (and so forth) without the need for explicit loops. Avoiding explicit loops makes our code clearer. It also makes it faster.

All operations are performed element by element: if we add two vectors, then the corresponding elements from each vector are combined to give the element in the resulting vector. In other words, the compact expression vec1 + vec2 for v1 in the listing is equivalent to the explicit loop construction used to calculate v2. This is true even for multiplication: vec1 * vec2 will result in a vector in which the corresponding elements of both operands have been multiplied element by element. (If you want a true vector or “dot” product, you must use the dot() function instead.) Obviously, this requires that all operands have the same number of elements!

Now we shall demonstrate two further convenience features that in the NumPy documentation are referred to as broadcasting and ufuncs (short for “universal functions”). The term “broadcasting” in this context has nothing to do with messaging. Instead, it means that if you try to combine two arguments of different shapes, then the smaller one will be extended (“cast broader”) to match the larger one. This is especially useful when combining scalars with vectors: the scalar is expanded to a vector of appropriate size and whose elements all have the value given by the scalar; then the operation proceeds, element by element, as before. The term “ufunc” refers to a scalar function that can be applied to a NumPy object. The function is applied, element by element, to all entries in the NumPy object, and the result is a new NumPy object with the same shape as the original one.


Postagens mais visitadas deste blog

(A) Data Science in Practice with Python

The top trending in Twitter or other social network is the term “data science”. But ...
What’s the data science? How do real companies use data science to make products, services and operations better? How does it work? What does the data science lifecycle look like?  This is the buzzword at the moment. A lot of people ask me about it. Are many questions. I’ll try answer all of these questions through of some samples.

Sample 1 - Regression

WHAT IS A REGRESSION? This is the better definition what I found [Source: Wikipedia] - Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.
HOW DOES IT WORK? Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variable…

(A) Tucson Best Buy Analysis

“Data! Data! Data!” he cried impatiently.  “I can’t make bricks without clay.” —Arthur Conan Doyle
The Ascendance of Data

We live in a world that’s drowning in data. Websites track every user’s every click. Your smartphone is building up a record of your location and speed every second of every day. “Quantified selfers” wear pedometers-on-steroids that are ever recording their heart rates, movement habits, diet, and sleep patterns. Smart cars collect driving habits, smart homes collect living habits, and smart marketers collect purchasing habits. The Internet itself represents a huge graph of knowledge that contains (among other things) an enormous cross-referenced encyclopedia; domain-specific databases about movies, music, sports results, pinball machines, memes, and cocktails; and too many government statistics (some of them nearly true!) from too many governments to wrap your head around.
Buried in these data are answers to countless questions that no one’s ever thought to ask. In…

(T) Como indexar uma tabela Fato - (Best Practice)

A base de qualquer projeto de bi é ter um bom dw/data mart. Podemos falar em modelagem star-schema durante dias, sem falar nas variações do snowflake, mas o objetivo principal deste artigo é apontar algumas negligências que tenho percebido no tratamento da tabela fato. Tabela esta que é o principal pilar da casa que reside um modelo star-schema.

Ouço muitas vezes os clientes reclamando do desempenho das consultas enviadas contra o seu dw/data mart, ou do tempo de resposta das análises solicitadas ao bi. Isto é realmente inaceitável, não só numa perspectiva de implantação do projeto, mas também de desempenho da entrega das informações.

Como eu mencionei anteriormente o meu objetivo neste artigo, é alertar sobre a importância da indexação da tabela fato: o que deveria ser, porque é necessário, porque chaves compostas são boas e más, e porque você deveria se preocupar com isso.

Então, vejamos:

|a| Indexação padrão (default):
De forma rápida, todas as colunas de chave estrangeira (FK) devem …