Customisable Tours Dashboard

I was chatting with a fellow business owner at my regular coworking space. He runs a company that does brewery tours around Melbourne and was doing a lot of exciting things to streamline his business processes.

One thing he mentioned was that he still offered only a handful of standard tour options (par for the course for tour operators), even though, as he explored his market, he had been gathering a whole collection of different destinations and activity suppliers.

He wondered whether there might be some way to let a potential customer create a customised, bespoke tour, on the spot, on his website. Or some way for him to discover potentially "golden" tours that he was not currently offering.

He admitted that this was probably a bit of a pipedream and that there were more immediate short-term goals that he was focusing on. But it got me thinking...

From my NSW Fuel Prices dashboard project, I had a fair understanding of geospatial analysis. I also really love combinatorics-style problems. Plus, this guy didn't know about Tableau, so I figured it would be a good opportunity to demonstrate my skillset to him. So I did a little investigating... and came up with this.

I also got his permission to share this neat little proof of concept with all my hordes of followers (hello, all 5 of you).

Feel free to view this on Tableau Public as well.

What follows is my guided walkthrough of the process I took to create this.

Part 1. Track down some dummy data

The base ingredient for this project is a whole bunch of destinations. I've had some experience with the VCGLR website in the past, which is where all the Victorian gambling and liquor licences are made public (and updated monthly).

There is a particular download with the various licences presented with their lat-longs, so I figured it was the best starting point.

Here is what the Excel file looks like.

Rather than downloading this file manually, I use the httr package to bring it down in code.

library(tidyverse)
library(httr)

url <- "https://www.vcglr.vic.gov.au/sites/default/files/current_victorian_licences_by_location_august_2019.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
downloaded_data <- readxl::read_excel(tf, col_names = FALSE, col_types = "text")

vcglr_data <- downloaded_data %>%
  slice(-c(1:2)) %>%
  select(-`...7`, -`...13`, -`...14`) %>%
  set_names(as.character(unlist(.[1,]))) %>%
  slice(-1) %>%
  # filter out some invalid records 
  # see the full code in the gist (later)
  distinct(Latitude, Longitude, .keep_all = TRUE) %>%
  mutate(`Licence Num` = as.integer(`Licence Num`),
         Latitude = as.numeric(Latitude),
         Longitude = as.numeric(Longitude))

saveRDS(vcglr_data, "rds files/vcglr_data.rds")

Part 2. Select a sample of destinations

Imported and cleaned, the VCGLR download yields 19,427 unique destinations. I figured it unlikely that the business owner had that many destinations; I would also need to limit the number in order for the process to be manageable (as you will see in just a minute).

As expected, most of the sites were concentrated around the CBD and inner suburbs. In order to get a nice mixture of locations, I stratified my sample by distance into near, medium and far groups, excluded the far group, and sampled 15 from each of the near and medium groups to get my initial sample of 30 destinations.
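The stratified sampling step can be sketched in R. Everything here is illustrative: the synthetic stand-in for the VCGLR data, the CBD coordinates, the 10 km / 40 km band cut-offs and the crude planar distance are my assumptions, not the original code (which lives in the gist).

```r
library(tidyverse)

set.seed(42)  # for reproducibility

# Hypothetical stand-in for the cleaned VCGLR data: random sites around Melbourne
vcglr_sites <- tibble(
  site_id   = 1:2000,
  Latitude  = runif(2000, -38.3, -37.5),
  Longitude = runif(2000, 144.5, 145.5)
)

cbd <- c(lat = -37.8110, lon = 144.9640)  # roughly The Commons QV

# A crude planar distance (1 degree ~ 111 km) is plenty for stratifying
sample_of_sites <- vcglr_sites %>%
  mutate(dist_km = sqrt(((Latitude - cbd["lat"]) * 111)^2 +
                          ((Longitude - cbd["lon"]) * 111 * cos(cbd["lat"] * pi / 180))^2),
         band = cut(dist_km,
                    breaks = c(0, 10, 40, Inf),
                    labels = c("near", "medium", "far"))) %>%
  filter(band != "far") %>%      # drop the far group entirely
  group_by(band) %>%
  slice_sample(n = 15) %>%       # 15 from each remaining band
  ungroup()
```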

I would also need to determine some start and end locations, so I decided on the place where I currently work: The Commons QV in Melbourne's CBD. Admittedly, in the future I could incorporate different start and end positions but, for the time being, I wanted to keep things simple.

Part 3. Determine possible combinations of destinations

Given our 30 destinations, we needed to generate a variety of possible tours. The number of tours grows exponentially with the number of stops in each tour (and this risks taking up all my computer's 8 GB of memory):

1 stop = 30 tours
2 stops = 900 (30^2) tours
3 stops = 27,000 (30^3) tours
4 stops = 810,000 (30^4) tours
5 stops = 24,300,000 (30^5) tours

This only becomes a problem as we extend beyond these parameters.

for (i in 1:4) {

number_of_stops <- i

all_combos <- as_tibble(
  expand.grid(rep(list(sample_of_sites), number_of_stops))) %>% 
  mutate(tour_id = row_number())
  # ... (see the full code in the gist, later)

We also remove any tours that visit the same destination more than once.

  # ...
  group_by(tour_id) %>% 
  filter(n_distinct(value) == number_of_stops)
  # ...

Beyond these event-space parameters, probably our only option would be to randomly sample n stops from our vector of sites (without replacement), remembering to remove any duplicate tours that came through.
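A minimal sketch of that random-sampling fallback, assuming a hypothetical vector of 30 site names, 6 stops per tour and 10,000 candidate draws; the names and counts are mine, not from the original pipeline.

```r
library(tidyverse)

set.seed(1)

sample_of_sites <- paste0("site_", 1:30)  # stand-in for the 30 sampled destinations
number_of_stops <- 6
n_candidates    <- 10000

# Each candidate tour is an ordered draw of stops, without replacement
random_tours <- replicate(n_candidates,
                          sample(sample_of_sites, number_of_stops),
                          simplify = FALSE)

# Drop duplicate tours (same stops in the same order)
random_tours <- unique(random_tours)

# Back into the long, one-row-per-stop shape the earlier pipeline used
tours_long <- tibble(tour_id = seq_along(random_tours),
                     stops   = random_tours) %>%
  unnest_longer(stops, indices_to = "stop_number")
```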

There might be clever things we could do to ensure that potentially "golden" tours were not overlooked, but we don't have to go down this challenging path at this stage.

Part 4. Determining the viable tours

Tours must be limited to a realistic timeframe. To determine the time any tour takes, we would have to make an estimate for the ideal amount of time to be spent at each destination, as well as the travel times between each of them.

An ideal time for each destination can be generated randomly. At this point, I also added some (also random) destination cost and popularity scores (the end user would be optimising for these features later on).
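Generating those random destination features might look something like this; the value ranges (`ideal_minutes`, `cost_dollars`, `popularity`) are invented for illustration.

```r
library(tidyverse)

set.seed(2)

site_features <- tibble(site_id = 1:30) %>%
  mutate(ideal_minutes = sample(c(30, 45, 60, 90), n(), replace = TRUE),  # dwell time per stop
         cost_dollars  = round(runif(n(), 10, 80)),                       # made-up entry cost
         popularity    = sample(1:10, n(), replace = TRUE))               # made-up score
```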

Determining the transit time for each tour was complex.

My initial thinking was simply to calculate the distance between each destination and the next one using the haversine formula. However, this required plenty of redundant calculations, as various destination pairs occurred in multiple tours.

A better way was to pre-calculate the distances between every possible pair in a separate data frame, and then simply join these distances onto each pair instance observed in the tours.

The total travel time could be estimated by dividing the total distance by an average travel speed (say, 70 km/h). Combined with a sum of the ideal time spent at each destination, this yielded each tour's total duration. However, this method resulted in tour paths that were probably not 100% feasible:
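The approach above can be sketched as follows: a hand-rolled haversine function, a pre-computed table of pair distances, and a join onto each tour leg with the ~70 km/h estimate. The three sites and the `tour_legs` table are placeholders, not the project's real data.

```r
library(tidyverse)

# Haversine great-circle distance in km
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical destination coordinates
sites <- tibble(site = c("A", "B", "C"),
                lat  = c(-37.81, -37.75, -37.90),
                lon  = c(144.96, 145.00, 144.90))

# Pre-compute the distance for every ordered pair once...
pair_distances <- crossing(from = sites$site, to = sites$site) %>%
  filter(from != to) %>%
  left_join(sites, by = c("from" = "site")) %>%
  left_join(sites, by = c("to" = "site"), suffix = c("_from", "_to")) %>%
  mutate(dist_km = haversine_km(lat_from, lon_from, lat_to, lon_to))

# ...then join onto each leg of each tour and estimate transit time at 70 km/h
tour_legs <- tibble(tour_id = 1, from = c("A", "B"), to = c("B", "C"))

tour_times <- tour_legs %>%
  left_join(select(pair_distances, from, to, dist_km), by = c("from", "to")) %>%
  group_by(tour_id) %>%
  summarise(travel_minutes = sum(dist_km) / 70 * 60)
```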

Instead, it seemed more realistic to query the Google Maps API for specific "driving" travel times (at some specified or default time of day), and to use these instead.

library(gmapsdistance)

unique_trips %>%
  mutate(travel_time = map2_dbl(`1`, `2`,
                                ~gmapsdistance(origin = .x,
                                               destination = .y,
                                               mode = "driving") %>%
                                  first())) %>%  # first element of the result is the time
  # append the reverse direction for each pair
  bind_rows(rename(., `2` = `1`,
                   `1` = `2`))

My next step was to filter out those tours that covered the same destinations in a different order but took longer to do so. This is certainly an optional step, as visiting certain destinations in a certain order could have a more positive impact beyond simply shaving a few minutes off the total transit time.
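That filter can be sketched by keying each tour on its sorted set of destinations and keeping only the quickest ordering; the toy `tours` table below is my own illustration.

```r
library(tidyverse)

# Hypothetical tour summaries: id, ordered stops, total duration
tours <- tibble(
  tour_id = 1:3,
  stops = list(c("A", "B", "C"),
               c("C", "A", "B"),   # same destinations, faster order
               c("A", "B", "D")),
  duration_mins = c(240, 210, 200)
)

# Key each tour by its *set* of destinations, then keep the quickest ordering
fastest_per_set <- tours %>%
  mutate(dest_set = map_chr(stops, ~paste(sort(.x), collapse = "|"))) %>%
  group_by(dest_set) %>%
  slice_min(duration_mins, n = 1) %>%
  ungroup()
```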

For example, having a gin or whisky distillery stop right before a horse-riding or paintball destination (depending on how wild your party is) is going to be a lot more preferable in one order than the other.

We might also need to include some logic around opening hours, as some activities (such as dinner or breakfast destinations, or sunset horse-riding) may be dependent upon this.


Part 5. Tableau time

I utilised a self-join, connecting each destination in a tour with the next one.

I then utilised the MAKEPOINT and MAKELINE functions to add each tour's path to a simple map.
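The calculated fields probably looked something like the following; the `Next Latitude` / `Next Longitude` field names (coming from the self-join) are my assumptions about the data source, not the project's actual field names.

```
// Point for each destination
MAKEPOINT([Latitude], [Longitude])

// Line from a destination to the next stop in the tour
MAKELINE(
    MAKEPOINT([Latitude], [Longitude]),
    MAKEPOINT([Next Latitude], [Next Longitude])
)
```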

I wasn't liking all those straight lines, so I decided to head back to R and download the actual travel paths (again utilising the Google Maps API).

library(googleway)

unique_trip_paths %>% 
  rename(`1` = location,
         `2` = location1) %>% 
  mutate(some_list_col = map2(`1`, `2`, ~google_directions(origin = .x,
                                                           destination = .y,
                                                           mode = "driving") %>% 
                                # drill down to the steps of the first leg
                                .$routes %>% 
                                .$legs %>% 
                                .[[1]] %>% 
                                .$steps %>% 
                                .[[1]]))

I then returned to Tableau and added these in.

Including the specific paths between destinations significantly expanded our Tableau data source, as between any two destinations there could be any number of legs. I was concerned about the dashboard's response time. With this addition, including tours with more than 3 stops generated too much lag when filtering via the histograms/filters.

This is the same tour as the one further above. It certainly looks to have a pretty high transit time percentage.

Part 6. Dashboard useability

The overall goal of this dashboard is for the user to be able to narrow down to a specific collection of tours that best suits their requirements. That is why presenting every single possible tour was not really helpful. For this reason, I added two data source filters.

The effect of this was a substantial reduction in the number of tours presented and also the near complete removal of any lag.

This didn't quite convey the sheer amount of processing involved in generating this now relatively simple dashboard (as opposed to the dashboard presented at the very top of this post), but it was actually a much more helpful dashboard for the end user, as their investigation of possible tours is no longer anywhere near as overwhelming.

I also allowed use of the set filter actions on the histograms (on the right). This allows the user to drag-select to further filter the feasible tour set, by these features.

The filters along the bottom allow the user to refine their tours by number of destinations and by specific type inclusions and exclusions. In this example, we are still simply using the original VCGLR licence types.


Part 7. Export the result

The next task was delivering the end result: a refined list of appropriate tours. While I could have added a second sheet to the dashboard presenting a review of each tour's specifics, I was pleasantly surprised that the default Export Crosstab to Excel yielded a satisfactory summary document:

Sheet 1 contained the summary statistics for each tour.
Sheet 2 presented the various destinations and the order that they were included in each of the filtered tours.

And so the proof of concept was done. For me, this dashboard was an opportunity to bring an interesting concept to life, based entirely on dummy data that I was able to pull from the web or generate randomly.

This project contains numerous techniques that would have application for businesses involved with logistics. If you have something in mind and would like to work together to whip up a quick prototype, please feel free to get in touch.

Thanks for reading.


If you would like to follow along, the gist of my R code is available below:

Custom Tours R processing

To save yourself some time in querying the Google maps API, the relevant rds files are available here.


If you would like to recreate the Tableau dashboard, from scratch, you can download the csv files, here.