Skills obtained: write SQL queries, query public tables, load sample data into BigQuery, troubleshoot common syntax errors with the query validator in BigQuery, and create reports in Looker Studio by connecting to BigQuery data.
- Lab 1: Introduction to SQL for BigQuery and Cloud SQL
- For advanced practice: weather data in BigQuery
- Tasks 1, 2, 3: basic SQL keywords (small example below)
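A minimal sketch of the basic keywords (SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT), assuming the lab's public london_bicycles dataset; the table and the 20-minute filter are assumptions, not the lab's exact query. The SQL can also be pasted into the BigQuery console editor.

```bash
bq query --use_legacy_sql=false '
SELECT
  start_station_name,                                  -- SELECT: pick columns
  COUNT(*) AS num_rides                                -- aggregate per group
FROM
  `bigquery-public-data.london_bicycles.cycle_hire`    -- FROM: the table to read
WHERE
  duration >= 1200                                     -- WHERE: rides of 20+ minutes
GROUP BY
  start_station_name                                   -- GROUP BY: one row per station
ORDER BY
  num_rides DESC                                       -- ORDER BY: busiest first
LIMIT 5                                                -- LIMIT: keep the top 5
'
```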
- Task 4. Working with Cloud SQL
- Cloud SQL is a fully-managed database service that makes it easy to set up, maintain, manage, and administer your relational PostgreSQL and MySQL databases in the cloud
- Navigation menu => Cloud Storage > Buckets > create a bucket
- Upload files to the bucket “london-riding-bikes” (a bucket is not a database; it is more like a cloud folder)
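The same bucket steps from Cloud Shell, as a hedged sketch (the location and the CSV file names are placeholders, not the lab's actual values; bucket names must be globally unique):

```bash
# Create the bucket named in the note above (location is an assumption).
gsutil mb -l us-central1 gs://london-riding-bikes

# Upload the lab's CSV files; these file names are placeholders.
gsutil cp start_station_data.csv end_station_data.csv gs://london-riding-bikes/

# Confirm the upload.
gsutil ls gs://london-riding-bikes/
```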
- Task 5. Create a Cloud SQL instance
- Navigation menu > SQL => create a “my-demo” instance of Cloud SQL
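The equivalent of this console step from Cloud Shell, as a hedged sketch (the lab creates the instance through the console; the database version, region, tier, and password here are illustrative assumptions):

```bash
# Create a MySQL Cloud SQL instance named "my-demo".
gcloud sql instances create my-demo \
  --database-version=MYSQL_8_0 \
  --region=us-central1 \
  --tier=db-n1-standard-1 \
  --root-password=change-me
```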
- Task 6. New queries in Cloud SQL
- Enable Cloud Shell > authenticate, connect Cloud Shell to the Cloud SQL instance “my-demo” as the root user > create a new database “bike” and tables “london1” & “london2” > load the data from the bucket above into the tables > check them with a few standard queries run as usual inside Cloud Shell.
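A hedged sketch of that flow (the lab performs the CSV import through the console; the table schema and the file name below are placeholders, not the lab's actual values):

```bash
# Connect to the instance as root (prompts for the root password).
gcloud sql connect my-demo --user=root

# Inside the mysql prompt: create the database and a table, then exit.
# The column list is only a guess at the schema.
#   CREATE DATABASE bike;
#   USE bike;
#   CREATE TABLE london1 (start_station_name VARCHAR(255), num INT);
#   exit

# Back in Cloud Shell: import a CSV from the bucket into the table.
# "start_station_data.csv" is a placeholder file name.
gcloud sql import csv my-demo gs://london-riding-bikes/start_station_data.csv \
  --database=bike --table=london1

# Sanity check, from the mysql prompt again:
#   SELECT COUNT(*) FROM london1;
```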
- Lab 3: BigQuery: Qwik Start - Command Line
- use Cloud Shell to execute commands starting with “bq” to invoke the BigQuery command line tool.
- bq help query
- bq query 'SELECT …' is the pattern. For example:
      bq query --use_legacy_sql=false 'SELECT * FROM `bigquery-public-data`.samples.shakespeare'
- bq ls => list existing datasets.
- bq ls bigquery-public-data: => list datasets in project “bigquery-public-data”
- bq mk babynames => create a new dataset.
- ls => list existing files in the project.
- bq load babynames.names2010 yob2010.txt name:string,gender:string,count:integer => create a new table “names2010”, load data from the file “yob2010.txt” onto it with a schema.
- bq ls babynames => list tables in a dataset
- bq show babynames.names2010 => show the schema of a table
- bq query "SELECT name,count FROM babynames.names2010 WHERE gender = 'F' ORDER BY count DESC LIMIT 5" is an example of a BigQuery query run from Cloud Shell.
- bq rm -r babynames => delete a dataset inside BQ.
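A consolidated sketch of the babynames flow above. The download URL for yob2010.txt is an assumption (the public Social Security names archive); the lab may stage the file differently.

```bash
# Fetch and extract the 2010 names file (source URL is an assumption).
curl -sO https://www.ssa.gov/oact/babynames/names.zip
unzip -o names.zip yob2010.txt

bq mk babynames                                   # create the dataset
bq load babynames.names2010 yob2010.txt \
  name:string,gender:string,count:integer         # load the file with a schema
bq query "SELECT name, count FROM babynames.names2010 WHERE gender = 'F' ORDER BY count DESC LIMIT 5"
```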
- Lab 4: Explore an Ecommerce Dataset with SQL in BigQuery
- Task 1. Pin the lab project in BigQuery
- add the data-to-insights project: Click + Add > Star a project by name, then set the name to data-to-insights. Click STAR.
- There are many queries inside…
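One illustrative query of the kind the lab runs, assuming its data-to-insights.ecommerce.web_analytics table (the table and column names are taken from that lab, but treat them as assumptions):

```bash
# How many unique visitors are in the ecommerce web analytics data?
bq query --use_legacy_sql=false '
SELECT COUNT(DISTINCT fullVisitorId) AS unique_visitors
FROM `data-to-insights.ecommerce.web_analytics`
'
```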
- Lab 5 (Challenge Lab): Derive Insights from BigQuery Data
- Goal: use queries to answer questions related to the COVID-19 pandemic.
- Q1: What was the total count of confirmed cases on April 20, 2020?
    # What was the total count of confirmed cases on April 20, 2020?
    SELECT
      SUM(cumulative_confirmed) AS total_cases_worldwide
    FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
    WHERE date = '2020-04-20'
- Q2: How many states in the US had more than 300 deaths on April 20, 2020?
    # standardSQL
    # How many states in the US had more than 300 deaths on April 20, 2020?
    WITH list_US_states_over_300_deaths AS (
      SELECT
        subregion1_name AS state,
        SUM(cumulative_deceased) AS total_deaths
      FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
      WHERE country_name = 'United States of America'
        AND subregion1_name IS NOT NULL
        AND date = '2020-04-20'
      GROUP BY state
      HAVING total_deaths > 300
      ORDER BY total_deaths DESC
    )
    SELECT COUNT(*) AS count_of_states
    FROM list_US_states_over_300_deaths;
- Q3 (not yet): List all the states in the United States of America that had more than 2500 confirmed cases on April 20, 2020.
    # standardSQL
    SELECT
      subregion1_name AS state,
      SUM(cumulative_confirmed) AS total_confirmed_cases
    FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
    WHERE country_code = 'US'
      AND subregion1_name IS NOT NULL
      AND date = '2020-04-20'
    GROUP BY state
    HAVING total_confirmed_cases > 2500
    ORDER BY total_confirmed_cases DESC
- Q4: What was the case-fatality ratio in Italy for the month of April 2020?
    # standardSQL
    # What was the case-fatality ratio in Italy for the month of April 2020?
    SELECT
      SUM(cumulative_confirmed) AS total_confirmed_cases,
      SUM(cumulative_deceased) AS total_deaths,
      (SUM(cumulative_deceased) / SUM(cumulative_confirmed)) * 100 AS case_fatality_ratio
    FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
    WHERE country_name = 'Italy'
      AND date BETWEEN '2020-04-01' AND '2020-04-30'
- Q5 (not yet): On what day did the total number of deaths cross 14000 in Italy?
    # standardSQL
    # On what day did the total number of deaths cross 14000 in Italy?
    SELECT date
    FROM (
      SELECT
        date,
        SUM(cumulative_deceased) AS total_deaths
      FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
      WHERE country_name = 'Italy'
      GROUP BY date
      HAVING total_deaths > 14000
    )
    ORDER BY date ASC
    LIMIT 1
- Q6: On how many days in India between Feb 21, 2020 and March 14, 2020 were there zero increases in the number of confirmed cases?
    # standardSQL
    # Number of days in India between Feb 21, 2020 and March 14, 2020 when there
    # were zero increases in the number of confirmed cases
    WITH india_cases_by_date AS (
      SELECT
        date,
        SUM(cumulative_confirmed) AS cases
      FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
      WHERE country_name = "India"
        AND date BETWEEN '2020-02-21' AND '2020-03-14'
      GROUP BY date
      ORDER BY date ASC
    ),
    india_previous_day_comparison AS (
      SELECT
        date,
        cases,
        LAG(cases) OVER (ORDER BY date) AS previous_day,
        cases - LAG(cases) OVER (ORDER BY date) AS net_new_cases
      FROM india_cases_by_date
      ORDER BY date ASC
    )
    SELECT COUNT(date) AS days_with_zero_net_new_cases
    FROM india_previous_day_comparison
    WHERE net_new_cases = 0
- Q7 (not yet): Write a query to find the dates on which confirmed cases increased by more than 15% compared to the previous day (indicating a doubling rate of ~7 days) in the US, between March 22, 2020 and April 20, 2020.
    # standardSQL
    # Percentage increase in confirmed cases in the US compared to the previous
    # day, for dates between March 22, 2020 and April 20, 2020
    WITH us_cases_by_date AS (
      SELECT
        date,
        SUM(cumulative_confirmed) AS cases
      FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
      WHERE country_code = "US"
        AND date BETWEEN '2020-03-22' AND '2020-04-20'
      GROUP BY date
      ORDER BY date ASC
    ),
    us_previous_day_comparison AS (
      SELECT
        date AS Date,
        cases AS Confirmed_Cases_On_Day,
        LAG(cases) OVER (ORDER BY date) AS Confirmed_Cases_Previous_Day,
        CASE
          WHEN LAG(cases) OVER (ORDER BY date) IS NULL THEN 0
          ELSE (cases - LAG(cases) OVER (ORDER BY date)) * 100
               / LAG(cases) OVER (ORDER BY date)
        END AS Percentage_Increase_In_Cases
      FROM us_cases_by_date
      ORDER BY date ASC
    )
    SELECT *
    FROM us_previous_day_comparison
    WHERE Percentage_Increase_In_Cases > 10
- Q8: Build a query to list the recovery rates of countries, arranged in descending order (limit to 20), up to the date May 10, 2020.
    SELECT *
    FROM (
      SELECT
        country_name AS country,
        SUM(cumulative_recovered) AS recovered_cases,
        SUM(cumulative_confirmed) AS confirmed_cases,
        SUM(cumulative_recovered) / SUM(cumulative_confirmed) AS recovery_rate
      FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
      WHERE date = '2020-05-10'
      GROUP BY country_name
    )
    WHERE confirmed_cases > 50000
    ORDER BY recovery_rate DESC
    LIMIT 20;
- Q9 (not yet): The following query tries to calculate the CDGR (Cumulative Daily Growth Rate) for France on June 20, 2020, counting from the day the first case was reported. The first case was reported on Jan 24, 2020.
    WITH france_cases AS (
      SELECT
        date,
        SUM(cumulative_confirmed) AS total_cases
      FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
      WHERE country_name = "France"
        AND date IN ('2020-01-24', '2020-05-25')
      GROUP BY date
      ORDER BY date ASC
    ),
    summary AS (
      SELECT
        date,
        total_cases AS first_day_cases,
        LEAD(total_cases) OVER (ORDER BY date) AS last_day_cases,
        DATE_DIFF(LEAD(date) OVER (ORDER BY date), date, day) AS days_diff
      FROM france_cases
    )
    SELECT
      first_day_cases,
      last_day_cases,
      days_diff,
      POWER(last_day_cases / first_day_cases, 1 / days_diff) - 1 AS CDGR
    FROM summary
    WHERE first_day_cases > 0 AND days_diff > 0;
- Q10: Create a Looker Studio report