Data exploration
Loading and inspecting the data
After visiting Michigan and learning that wine grapes can be grown (and that wine can be made!) in such a cold place, you decide that you would like to start a vineyard there. You've seen the vineyards and know that, although it is possible to grow wine grapes there, it is sometimes too cold. You wonder whether, because of climate change, Michigan might soon have a warmer, more suitable climate for growing grapes.
You know that Europe has a long history of growing grapes, and you wonder if they kept records that might indicate how grapes respond to changes in temperature. You find a study that has compiled numerous records of grape harvest dates for more than four centuries and also a database of temperature anomalies in Europe dating back to 1655.
Using the provided dataset, grape_harvest.csv, you're going to explore how the European grape harvest date changes with respect to temperature across centuries of data.
To get started, import pandas in the cell below:
Then, read in grape_harvest.csv as a pandas dataframe using the pd.read_csv() function.
# Read in the grape harvest data here
# Put grape_harvest.csv in the same directory you are running this .ipynb from
# If in a different directory, you will need to specify the path to the file
# Alternatively, you can read in the data from GitHub using the following url:
# https://raw.githubusercontent.com/DanChitwood/PlantsAndPython/master/grape_harvest.csv
Answer
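One possible way to load the data is sketched below. The variable name `data` is an assumption, and so is the comma-separated layout with a header row. To keep the sketch runnable without the file, it reads a small in-memory stand-in containing the first five rows shown in this notebook; with the real file you would pass the filename or the GitHub URL to pd.read_csv() instead.

```python
import io

import pandas as pd

# Stand-in for grape_harvest.csv (assumed comma-separated with a header row).
# With the real file: data = pd.read_csv("grape_harvest.csv")
sample_csv = io.StringIO(
    "year,region,harvest,anomaly\n"
    "1700,alsace,42.9,-0.91\n"
    "1701,alsace,35.9,-0.76\n"
    "1702,alsace,45.0,-1.40\n"
    "1703,alsace,49.4,-1.21\n"
    "1704,alsace,30.4,-0.44\n"
)

# read_csv accepts a path, a URL, or a file-like object such as StringIO
data = pd.read_csv(sample_csv)
print(type(data))
```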
Now, write some code to inspect the properties of the data and then answer the following questions:
Use a pandas function to look at the first five lines of data:
Answer
| | year | region | harvest | anomaly |
|---|---|---|---|---|
| 0 | 1700 | alsace | 42.9 | -0.91 |
| 1 | 1701 | alsace | 35.9 | -0.76 |
| 2 | 1702 | alsace | 45.0 | -1.40 |
| 3 | 1703 | alsace | 49.4 | -1.21 |
| 4 | 1704 | alsace | 30.4 | -0.44 |
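The first five rows shown above are what DataFrame.head() returns by default. A minimal sketch follows, using a small illustrative dataframe built from rows printed in this notebook rather than the full dataset; the name `data` is an assumption.

```python
import pandas as pd

# Illustrative stand-in built from rows shown in this notebook (not the full file)
data = pd.DataFrame({
    "year": [1700, 1701, 1702, 1703, 1704, 1873],
    "region": ["alsace"] * 5 + ["vendee_poitou_charente"],
    "harvest": [42.9, 35.9, 45.0, 49.4, 30.4, 32.0],
    "anomaly": [-0.91, -0.76, -1.40, -1.21, -0.44, 0.06],
})

# head() returns the first 5 rows unless another n is given, e.g. data.head(10)
first_rows = data.head()
print(first_rows)
```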
Use a pandas function to look at the last five lines of data:
Answer
| | year | region | harvest | anomaly |
|---|---|---|---|---|
| 4727 | 1873 | vendee_poitou_charente | 32.0 | 0.06 |
| 4728 | 1874 | vendee_poitou_charente | 2.0 | -0.22 |
| 4729 | 1875 | vendee_poitou_charente | 29.0 | -1.02 |
| 4730 | 1876 | vendee_poitou_charente | 32.0 | -0.55 |
| 4731 | 1877 | vendee_poitou_charente | 34.0 | -0.56 |
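The counterpart for the end of the dataframe is DataFrame.tail(). The sketch below again uses an illustrative stand-in built from rows printed in this notebook, so its index runs from 0 to 5 rather than up to 4731; the name `data` is an assumption.

```python
import pandas as pd

# Illustrative stand-in: one early row plus the last five rows shown above
data = pd.DataFrame({
    "year": [1704, 1873, 1874, 1875, 1876, 1877],
    "region": ["alsace"] + ["vendee_poitou_charente"] * 5,
    "harvest": [30.4, 32.0, 2.0, 29.0, 32.0, 34.0],
    "anomaly": [-0.44, 0.06, -0.22, -1.02, -0.55, -0.56],
})

# tail() returns the last 5 rows unless another n is given
last_rows = data.tail()
print(last_rows)
```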
Use a pandas function to look at summary statistics (like the count, min, max, and mean) for columns with continuous data:
Answer
| | year | harvest | anomaly |
|---|---|---|---|
| count | 4732.000000 | 4732.000000 | 4732.000000 |
| mean | 1832.835376 | 33.959510 | -0.337811 |
| std | 91.713152 | 11.807714 | 0.675309 |
| min | 1655.000000 | -13.000000 | -2.470000 |
| 25% | 1762.000000 | 25.900000 | -0.750000 |
| 50% | 1834.500000 | 34.000000 | -0.280000 |
| 75% | 1903.000000 | 42.600000 | 0.060000 |
| max | 2007.000000 | 75.000000 | 1.460000 |
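A summary table of this shape comes from DataFrame.describe(), which by default reports only the numeric columns, which is why region is absent. The sketch below runs it on an illustrative five-row stand-in, so its numbers will differ from the full-dataset values above; the name `data` is an assumption.

```python
import pandas as pd

# Illustrative stand-in built from rows shown in this notebook (not the full file)
data = pd.DataFrame({
    "year": [1700, 1701, 1702, 1703, 1704],
    "region": ["alsace"] * 5,
    "harvest": [42.9, 35.9, 45.0, 49.4, 30.4],
    "anomaly": [-0.91, -0.76, -1.40, -1.21, -0.44],
})

# describe() gives count, mean, std, min, quartiles, and max per numeric column
summary = data.describe()
print(summary)
```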
Use a pandas function to retrieve the names of the columns.
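Strictly speaking, the column names are available as an attribute rather than a function: DataFrame.columns. A minimal sketch on an illustrative two-row stand-in follows; the name `data` is an assumption.

```python
import pandas as pd

# Illustrative stand-in with the same four columns shown in this notebook
data = pd.DataFrame({
    "year": [1700, 1701],
    "region": ["alsace", "alsace"],
    "harvest": [42.9, 35.9],
    "anomaly": [-0.91, -0.76],
})

# .columns is an Index object; list() converts it to plain Python strings
column_names = list(data.columns)
print(column_names)
```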
For one of the columns that is a categorical variable, use a function to list all the levels for that variable.
Answer
array(['alsace', 'auvergne', 'auxerre_avalon', 'beaujolais_maconnais',
'bordeaux', 'burgundy', 'champagne_1', 'champagne_2',
'gaillac_south_west', 'germany', 'high_loire_valley',
'ile_de_france', 'jura', 'languedoc', 'low_loire_valley',
'luxembourg', 'maritime_alps', 'northern_italy',
'northern_lorraine', 'northern_rhone_valley', 'savoie',
'southern_lorraine', 'southern_rhone_valley', 'spain',
'switzerland_leman_lake', 'various_south_east',
'vendee_poitou_charente'], dtype=object)
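An array of levels like the one above is what Series.unique() returns for the region column. It lists values in order of first appearance rather than sorting them. The sketch uses an illustrative stand-in with only two regions; the name `data` is an assumption.

```python
import pandas as pd

# Illustrative stand-in built from rows shown in this notebook (not the full file)
data = pd.DataFrame({
    "year": [1700, 1701, 1702, 1876, 1877],
    "region": ["alsace"] * 3 + ["vendee_poitou_charente"] * 2,
    "harvest": [42.9, 35.9, 45.0, 32.0, 34.0],
    "anomaly": [-0.91, -0.76, -1.40, -0.55, -0.56],
})

# unique() returns each distinct value once, in order of first appearance
levels = data["region"].unique()
print(levels)

# nunique() gives just the number of distinct levels
n_levels = data["region"].nunique()
print(n_levels)
```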
For the categorical variable, also use a function to determine how many rows there are representing each level.
Answer
switzerland_leman_lake 353
burgundy 350
southern_rhone_valley 333
jura 306
ile_de_france 302
bordeaux 274
alsace 262
languedoc 233
spain 231
low_loire_valley 203
champagne_2 183
germany 165
northern_italy 156
maritime_alps 136
auxerre_avalon 128
northern_lorraine 127
northern_rhone_valley 126
savoie 123
southern_lorraine 109
luxembourg 107
high_loire_valley 92
various_south_east 82
champagne_1 81
auvergne 80
vendee_poitou_charente 75
beaujolais_maconnais 73
gaillac_south_west 42
Name: region, dtype: int64
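Per-level row counts like these come from Series.value_counts(), which sorts from most to least frequent. The sketch below uses an illustrative stand-in with two regions; the name `data` is an assumption.

```python
import pandas as pd

# Illustrative stand-in built from rows shown in this notebook (not the full file)
data = pd.DataFrame({
    "year": [1700, 1701, 1702, 1876, 1877],
    "region": ["alsace"] * 3 + ["vendee_poitou_charente"] * 2,
    "harvest": [42.9, 35.9, 45.0, 32.0, 34.0],
    "anomaly": [-0.91, -0.76, -1.40, -0.55, -0.56],
})

# value_counts() counts the rows for each level, most frequent first
counts = data["region"].value_counts()
print(counts)
```

In pandas 2.0 and later the printed footer reads `Name: count` instead of `Name: region`; the counts themselves are the same.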
How many rows are in this dataset?
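The row count can be read from len() or from the first element of DataFrame.shape. On the full dataset this should agree with the count of 4732 reported in the summary statistics; the sketch below runs on an illustrative five-row stand-in, and the name `data` is an assumption.

```python
import pandas as pd

# Illustrative stand-in built from rows shown in this notebook (not the full file)
data = pd.DataFrame({
    "year": [1700, 1701, 1702, 1703, 1704],
    "region": ["alsace"] * 5,
    "harvest": [42.9, 35.9, 45.0, 49.4, 30.4],
    "anomaly": [-0.91, -0.76, -1.40, -1.21, -0.44],
})

# len() counts rows; shape is a (rows, columns) tuple
n_rows = len(data)
shape_rows, shape_cols = data.shape
print(n_rows, shape_rows, shape_cols)
```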
Congratulations on reading in the data and exploring its structure! In the next activity, we will be exploring the relationship between grape harvest dates and climate!