Geopath Observed Mobile Device Routed Trip Frequency

Understanding out-of-home audience impressions is valuable, but the delivery and effectiveness of a campaign are also largely driven by reach and frequency. In the past, reach and frequency were estimated from surveys, complicated models, or random-duplication assumptions. With a thorough understanding of population movement across all possible pedestrian and vehicular links in a geography, it is possible to deliver granular reach and frequency metrics based on observed device movement.

This document explains the process of capturing observed reach and frequency measures for any given:

  1. Campaign (inventory display(s) and time period)

  2. Target market

  3. Target audience profile(s)

Reach & Frequency Development Process

Panel Identification & Panel Characteristics

The process starts with identifying a set of consistent devices that form a panel throughout the course of the campaign period. A device in the panel must be visible in at least 40 separate hours per week for every week of the campaign period. Because of this high visibility, these are the devices whose activity we are confident in and whose movement represents the general behavior of their cohort. Devices must also live within the target market.
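The qualification rule above can be sketched as a simple filter over per-device visible hours. A minimal illustration (the sighting structure and function name are hypothetical, not the production pipeline):

```python
from collections import defaultdict

def qualify_panel(sightings, campaign_weeks, min_hours_per_week=40):
    """Keep devices visible in >= min_hours_per_week distinct hours
    for EVERY week of the campaign period.

    sightings: iterable of (device_id, week_index, hour_of_week) tuples,
    where hour_of_week identifies a distinct hour bucket (0-167).
    """
    # Distinct visible hours per (device, week).
    hours = defaultdict(set)
    for device, week, hour in sightings:
        hours[(device, week)].add(hour)

    devices = {d for d, _ in hours}
    panel = set()
    for d in devices:
        # A single sub-threshold week disqualifies the device.
        if all(len(hours.get((d, w), ())) >= min_hours_per_week
               for w in campaign_weeks):
            panel.add(d)
    return panel
```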

The figure below shows the nationwide device panel size used for the analysis as a function of campaign start date and duration. Even though panel size decreases as campaign length increases, a consistent panel over even 12 weeks typically represents around 2% of the US 16+ population and can be confidently extrapolated to the US population. In addition, as we curate and improve our data sources, our sampling rates have been increasing over time, and we expect this trend to continue.

Figure 1. Panel size by campaign start (color) and duration (X axis)

Trip Identification

Once the panel is curated and devices are tagged with the probabilities of belonging to each audience profile of interest, we identify all the trips made by each panel device throughout the course of the campaign. The Trips product documentation explains how we build trips from raw mobile phone GPS sightings. In short, mobile GPS data are noisy, at times sparse, and generally inconsistent in sighting density over time. It can very well be the case that a device sitting at home keeps generating sightings, while a moving device generates only a few. Our trip-making process confidently isolates the episodes in which a device starts and stops moving, filtering out the times a device is stationary. One key concept in the procedure is that a device must leave its origin resolution-9 H3 cell flower, as demonstrated in Figure 3 (i.e., travel more than roughly a quarter of a mile), for a trip to be initiated. In other words, a device located in the purple cell must travel at least as far as the yellow cells for a trip to be initiated. Our data show that post-COVID, each device has made on average about 2 routable trips per day (ignoring short trips like walking the dog).
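The trip-initiation rule can be approximated without an H3 library by testing whether a device has moved beyond the roughly quarter-mile radius of its origin cell flower. A hedged sketch (the fixed distance threshold stands in for the actual resolution-9 H3 neighborhood test):

```python
import math

QUARTER_MILE_M = 402.0  # rough stand-in for leaving the res-9 H3 flower

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def trip_initiated(origin, sightings, threshold_m=QUARTER_MILE_M):
    """A trip starts once any sighting falls outside the origin neighborhood."""
    return any(haversine_m(*origin, lat, lon) >= threshold_m
               for lat, lon in sightings)
```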

Figure 3. Device origin and trip route

As mentioned, we monitor every individual trip made by any panel device over the campaign period. For the frequency analysis, we only consider trips shorter than 150 miles (local trips); this way we avoid attempting to route air travel and other long-distance trips. Each trip marks the start and end (time and location) of a significant movement for a device. Knowing the locations of the trip ends (home, work, and other), we can mark each trip with a purpose such as commute or errand. In addition, knowing the pings each device drops throughout a trip, we have an approximate route it took toward its destination. This information is later used to snap the trip to the transportation network.
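The local-trip filter and purpose tagging described above might look like the following sketch (the purpose labels and trip structure are illustrative assumptions, not the production schema):

```python
LOCAL_TRIP_MAX_MILES = 150  # exclude air travel and other long-distance trips

def label_trip(trip):
    """Tag a local trip with a simple purpose from its endpoint types.

    trip: dict with 'miles', 'origin_type', 'dest_type', where the
    endpoint types are 'home', 'work', or 'other'.
    Returns a purpose string, or None when the trip is not local.
    """
    if trip["miles"] >= LOCAL_TRIP_MAX_MILES:
        return None  # not considered for the frequency analysis
    ends = {trip["origin_type"], trip["dest_type"]}
    if ends == {"home", "work"}:
        return "commute"
    if "home" in ends:
        return "errand"
    return "other"
```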

Network Map Matching

As each trip comes with a set of latitude/longitude/timestamp sightings during travel, we snap each trip to the transportation network using the Open Source Routing Machine (OSRM) and its map-matching service. This yields the most probable route taken by the device and gives a remarkably close approximation of which road segment a device was traveling on at any given time during its trip. We use the latest HERE map for our transportation network. With this, we can conceptually observe the gaps between sightings while the device is in motion. Before the sightings are run through the map-matching algorithm, two steps remove noise from the original sightings:

  • Remove consecutive sightings that have a point-to-point speed of more than 45 meters per second (~100 mph)

    • This will make sure that there are no sudden jumps in the latitudes and longitudes that are fed to the router.

  • Remove consecutive sightings that are less than 100 m apart

    • This will make sure that the device is not being routed while it is temporarily stationary (e.g., sitting at a red light or stop sign) but still generating sightings.
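The two noise-removal steps above can be sketched as a single pass over time-ordered sightings (a simplified stand-in for the production filter):

```python
import math

MAX_SPEED_MPS = 45.0   # ~100 mph; drop implausible point-to-point jumps
MIN_SPACING_M = 100.0  # drop near-stationary repeats (red lights, stop signs)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def clean_sightings(sightings):
    """sightings: time-ordered list of (lat, lon, t_seconds).
    Returns the filtered list that would be fed to the map matcher."""
    kept = []
    for lat, lon, t in sightings:
        if not kept:
            kept.append((lat, lon, t))
            continue
        plat, plon, pt = kept[-1]
        d = haversine_m(plat, plon, lat, lon)
        dt = t - pt
        if dt > 0 and d / dt > MAX_SPEED_MPS:
            continue  # sudden jump in latitude/longitude
        if d < MIN_SPACING_M:
            continue  # temporarily stationary but still pinging
        kept.append((lat, lon, t))
    return kept
```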

The result from the map-matching algorithm tells us at what time the device was traveling on each roadway segment. At an aggregate level, it essentially turns noisy, sparse (latitude, longitude, time) series data (Figure 4, left) into snapped locations on the transportation network (Figure 4, right).

Traffic Count Validation

We continuously validate the results of our routed trips in Los Angeles County at an aggregate level against Caltrans Performance Measurement System (PeMS) traffic counts. Caltrans publicly releases traffic count data at 5-minute intervals (the most granular level) from numerous count stations across the state of California. Matching PeMS count stations by their coordinates to HERE map roadways (or more precisely, "links"), we study how our results compare with PeMS at an aggregate level.

PeMS Station Mapping to HERE Map Roadway

For this purpose, we assign one roadway to each PeMS station with the following criteria: 

  • Consider only main-line stations, ignoring on/off-ramps, high-occupancy vehicle (HOV) lanes, and toll lanes

    • These will cover over 50% of all stations in CA

  • Consider only stations whose counts are 100% observed, not imputed or modeled

  • The distance between the station and the roadway must be less than 5 meters

    • The roadway must be the closest to the station among all roads within 5 meters of the station

  • The general direction of the road (north, east, west, or south) has to match that of the station.
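The matching criteria above can be sketched as follows, assuming distances from the station to nearby links and the link directions have already been computed (the data structures are hypothetical):

```python
def match_station(station, link_directions, max_dist_m=5.0):
    """Pick the HERE link for a PeMS main-line station.

    station: dict with
      'dist_m_by_link': link_id -> distance from station in meters
      'direction': general direction, one of 'N', 'S', 'E', 'W'
    link_directions: link_id -> general direction of the link.

    Returns the closest link within max_dist_m whose direction matches
    the station's, or None when no link qualifies (a mismatch).
    """
    candidates = [
        (dist, link_id)
        for link_id, dist in station["dist_m_by_link"].items()
        if dist < max_dist_m
        and link_directions[link_id] == station["direction"]
    ]
    return min(candidates)[1] if candidates else None
```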

However, even with such strict matching criteria, there will be a handful of mismatches as shown in the next sections. Figure 5 shows the geographic distribution of approximately 1500 stations and their matched roadways in Los Angeles County.

Hourly Traffic Patterns

We continuously validate traffic flows calculated from the routed person trips of selected panel devices against actual vehicle counts from PeMS data. Analyzing one month of data (2021-04-26 to 2021-05-23), Figure 6 compares hourly average flow patterns from PeMS vehicle flow data (blue) with upper and lower limits based on Motionworks-routed person trips. The upper limit is calculated from the person trips directly, representing 1 person per vehicle (red). The lower limit applies 2017 NHTS national vehicle occupancy data to the person trips, representing the highest reasonable occupancy (orange). The results show PeMS flows (blue) falling well within these limits.
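The bounds can be illustrated with a small sketch. The 1.67 persons-per-vehicle figure below is an approximate national average from the 2017 NHTS, used here only for illustration; the actual comparison may apply occupancy differently (e.g., by trip purpose):

```python
# Assumption: approximate 2017 NHTS national average vehicle occupancy.
NHTS_2017_OCCUPANCY = 1.67

def vehicle_flow_bounds(person_flow):
    """Convert a person-trip flow into plausible vehicle-flow bounds.

    Upper bound: every person drives alone (1 person per vehicle).
    Lower bound: every vehicle carries the NHTS average occupancy.
    """
    upper = person_flow
    lower = person_flow / NHTS_2017_OCCUPANCY
    return lower, upper

def within_bounds(pems_vehicle_flow, person_flow):
    """True when the observed PeMS count falls inside the bounds."""
    lower, upper = vehicle_flow_bounds(person_flow)
    return lower <= pems_vehicle_flow <= upper
```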

Average Daily Flow Validation

In addition, Figure 7 shows the same flow rates (upper limit, persons; lower limit, NHTS) compared against PeMS daily vehicle flows over a month. The bounds represent the reasonable limits for each road segment and day according to the person trips, and the x-axis shows PeMS vehicle counts for the same segment and day. The dashed diagonal line shows a perfect one-to-one relationship. Note that we expect a positive correlation here, not a perfect fit. As long as the one-to-one relationship falls within the bounds, which is the case for the majority of segment/days, the person trips and counts are validated.

Once we are confident that the routed person trips are in line with the traffic counts and represent total travel well, we move on to analyzing the passes of these trips through the viewsheds and then calculating reach and frequency statistics.

Viewshed and (Automated) Links Creation

Alongside building geospatial viewsheds for each face in the campaign, Motionworks also curates a set of directional roadway links (automated and published) that not only pass through the viewshed but are where the face is consumable and within the driver's field of view. In addition, each face/link pair gets a Visibility Adjustment Index (VAI) that defines the percentage of Opportunity to See (OTS) that turns into a Likelihood to See (LTS) impression. This concept is shown later when comparing circulation versus impression frequency distributions. Note that link assignments to faces can also be provided by the client; in that case, the process of building viewsheds and automatically assigning links can be skipped.

Viewshed Passes

Once the faces, their viewsheds, and their assigned links are settled, we intersect the routed trips with these links to identify which devices were exposed to any of the faces in a campaign and how often. Note that here we treat the VAI as the probability that a pass through a viewshed/link resulted in an impression.
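Treating the VAI as a pass-level probability can be sketched in two ways: summing VAIs for an expected impression count, or drawing a Bernoulli trial per pass for a realized count. A simplified illustration, not the production method:

```python
import random

def expected_impressions(passes, vai_by_face_link):
    """Expected LTS impressions for one device.

    passes: list of (face_id, link_id) viewshed/link passes.
    vai_by_face_link: (face_id, link_id) -> VAI in [0, 1], the
    probability that an OTS pass becomes an LTS impression.
    """
    return sum(vai_by_face_link[p] for p in passes)

def sample_impressions(passes, vai_by_face_link, seed=0):
    """Draw a realized impression count: each pass is a Bernoulli trial."""
    rng = random.Random(seed)
    return sum(rng.random() < vai_by_face_link[p] for p in passes)
```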

Reach and Frequency Aggregation 

At this step, all device level exposures and frequencies are summarized and aggregated to reach percentage and frequency distributions such as:

  • Percentage reached

    • From a target audience profile

    • In a market

    • By a campaign

    • Over a certain period of time

    • That have seen (LTS) or had the opportunity to see (OTS) any ad in the campaign one or more times (LTS/OTS frequency distribution)
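The aggregation step above can be sketched as turning per-device impression counts into a cumulative frequency distribution. This is an unweighted illustration; the production process extrapolates panel devices to the population with weights:

```python
from collections import Counter

def reach_and_frequency(device_impressions, panel_size, max_freq=10):
    """Summarize device-level exposures into reach and frequency.

    device_impressions: device_id -> impression count over the campaign.
    panel_size: number of devices in the (target) panel.
    Returns (percent reached 1+, {n: percent reached at least n times}).
    """
    counts = Counter()
    for n in device_impressions.values():
        if n >= 1:
            counts[min(n, max_freq)] += 1  # cap the tail at max_freq+
    cumulative = {}
    for n in range(1, max_freq + 1):
        reached = sum(c for k, c in counts.items() if k >= n)
        cumulative[n] = 100.0 * reached / panel_size
    return cumulative[1], cumulative
```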

Roadside Reach and Frequency in Action

To demonstrate, in this section we go over the process of calculating nationwide reach and frequency measures for specific roadside media types by GRP over 4 weeks.

For the purpose of this study, we created numerous random packages for each media type (presented in Figure 8) and GRP level. Each package of a given media type and GRP level was uniformly distributed across all counties in the US. We do this by knowing the GRP an average face of a specific media type delivers in each county and the number of faces in the county; this determines each county's maximum potential by media type. So, as we build inventory packages that grow in GRP, each county grows until it maxes out while other counties with remaining potential continue growing. This process yields a truly nationwide package that delivers impressions across the country rather than concentrated in one market.
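The county-filling logic described above resembles a capped uniform allocation: spread GRP evenly, cap each county at its maximum potential, and reallocate the remainder among counties with room. A hedged sketch of that idea (function and variable names are illustrative):

```python
def build_package(county_max_grp, target_grp):
    """Distribute weekly GRP uniformly across counties, capping each
    county at its maximum potential and reallocating the remainder.

    county_max_grp: county -> max GRP the county's inventory can deliver.
    target_grp: average GRP per county the package should deliver.
    Returns county -> assigned GRP.
    """
    assigned = {c: 0.0 for c in county_max_grp}
    remaining = target_grp * len(county_max_grp)  # total GRP to place
    open_counties = set(county_max_grp)
    while remaining > 1e-9 and open_counties:
        share = remaining / len(open_counties)
        remaining = 0.0
        for c in list(open_counties):
            room = county_max_grp[c] - assigned[c]
            take = min(share, room)
            assigned[c] += take
            remaining += share - take  # leftover goes to counties with room
            if assigned[c] >= county_max_grp[c] - 1e-9:
                open_counties.discard(c)  # county maxed out
    return assigned
```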

Now, having the packages identified, we can study their nationwide reach by frequency.

The resulting curves are presented in Figure 8 as the average nationwide percentage reached over 4 weeks by a package of a certain media type delivering a specific weekly GRP. For instance, a uniform package consisting of only poster inventory that delivers 500 weekly GRP reaches ~58% of the US population at least once over 4 weeks and ~13% at least 11 times.

Sampling Rate Discussion

When discussing products generated from mobile device data, it is critical to understand the impact of sampling rate on the final results. Below we present reach curves by media type and weekly TRP over 1, 4, and 12 weeks in Los Angeles (LA) County. All faces are within the county, and the reach curves show the percentage reached of the total LA County population (~10M). Figure 9 shows that even a small sample of the population yields a close approximation of the reached population. For instance, the movement of as few as 300 devices out of the 10M LA County population (0.003%) results in the same reach curve as that of 30,000 devices (0.3%). The results show that below a 0.003% sampling rate (fewer than 50 samples), reach curves start to break down.
