Bq_ml Models

BQML is enables users to create and execute machine learning models in BigQuery using SQL queries.

In this lab, I will create a model that predicts whether a visitor will make a transaction.

Lab 2: Predict Visitor Purchases with a Classification Model in BigQuery ML

BigQuery is Google’s fully managed, NoOps, low cost analytics database. With BigQuery you can query terabytes and terabytes of data without having any infrastructure to manage or needing a database administrator
In this lab, you use a special ecommerce dataset that has millions of Google Analytics records for the Google Merchandise Store loaded into BigQuery. You use this data to create a classification (logistic regression) model in BigQuery ML that predicts customers’ purchasing habits.
Analyzing the results, you can see that (11873 / 741721) = 1.6% of total visitors will return and purchase from the website
Question: What are some of the reasons a typical ecommerce customer will browse but not buy until a later visit?
Answer: In the world of online marketing, identifying and marketing to these future customers based on the characteristics of their first visit will increase conversion rates and reduce the outflow to competitor sites.
Now you will create a Machine Learning model in BigQuery to predict whether or not a new user is likely to purchase in the future.
Since you are bucketing visitors into “will buy in future” or “won’t buy in future”, use logistic_reg in a classification model.
Task 2. Identify an objective: create a Machine Learning model in BigQuery to predict whether or not a new user is likely to purchase in the future.
Question: Which two fields are known after a visitor’s first session?
Question: Which field isn’t known until later in the future?
Which model type should you choose?

Model	Model type	Label data type	Example
Forecasting	linear_reg	Numeric value (int or float)	Forecast sales figures for next year
Classification	logistic_reg	0 or 1 for binary classification	Classify an email as spam or not spam
Clustering	kmeans	unsupervised learning for exploration	?

Lab 3:
- Task 2. Identify an objective: Maybe predict the price of a cab ride in New York City
- Task 4. Have to create a BigQuery dataset to store models.
- Below is a powerful SQL query to validate a data before training:
- ```
  SELECT
    COUNT(fare_amount) AS num_fares,
    MIN(fare_amount) AS low_fare,
    MAX(fare_amount) AS high_fare,
    AVG(fare_amount) AS avg_fare,
    STDDEV(fare_amount) AS stddev
  FROM `nyc-tlc.yellow.trips`
  WHERE trip_distance > 0 AND fare_amount BETWEEN 6 and 200
```
- After evaluating your model you get a RMSE of 9.47. Since we took the Root of the Mean Squared Error (RMSE) the 9.47 error can be evaluated in the same units as the total_fare so it’s +/-$$$9.47.

Lab 4: Bracketology with Google Machine Learning

We need “label” column as training to create a new ML model

  CREATE OR REPLACE MODEL `bracketology.ncaa_model`
  OPTIONS ( model_type='logistic_reg') AS
  # create a row for the winning team
  SELECT
  # features
  'win' AS label, # our label
  ...
  UNION ALL
  SELECT
  'loss' AS label, # our label
  ...

Take out the WEIGHTS by sql

 SELECT category_weights
 FROM ML.WEIGHTS(MODEL `bracketology.ncaa_model`)
 WHERE processed_input = 'seed'

OR ```sql SELECT * FROM ML.WEIGHTS(MODEL bracketology.ncaa_model_updated) ORDER BY ABS(weight) DESC

 - Evaluate a model:
 - ```sql
 SELECT *
 FROM ML.EVALUATE(MODEL   `bracketology.ncaa_model`)

Prediction:

 CREATE OR REPLACE TABLE `bracketology.predictions` AS (
 SELECT * FROM ML.PREDICT(MODEL `bracketology.ncaa_model`,
  (SELECT * FROM `data-to-insights.ncaa.2018_tournament_results`)
 ))

Check insights of a prediction table

 SELECT * FROM `bracketology.predictions`
 WHERE predicted_label <> label
 (or WHERE model.prob > .8 AND predicted_label <> predictions.label)

Create a magic description in SQL

  SELECT
  CONCAT(school_ncaa, " was predicted to ",IF(predicted_label="loss","lose","win")," ",CAST(ROUND(p.prob,2)*100 AS STRING), "% but ", IF(n.label="loss","lost","won")) AS narrative1,
  CONCAT(opponent_school_ncaa, " (", opponent_seed, ") was ",CAST(ROUND(ROUND(p.prob,2)*100,2) AS STRING),"% predicted to upset ", school_ncaa, " (", seed, ") and did!") AS narrative2,

SOFTWARE ENGINEER

Bq_ml Models

Share on: