R/F Taskforce Meeting 20220314
Janna - We still need to look into the edge cases, against base audiences
Ali - As we run more packages, we can improve the model. We have to decide on a path, and continuously work on improving that as we go. Do we need to adjust the sensitivity?
How do we deal with a new media type?
There has to be another training step, then model creation, and then a release
We need to have a list of the steps and go through them
Dylan - a new media type shouldn’t be a driver of R/F itself
Dylan - do we logically agree with the model? Is there something in there that shouldn’t be?
“Media Features”, other than the visibility score, shouldn’t have anything to do with R/F
Location matters, and the number of locations matters
Ali - how do we deal with Visibility Adjustment?
Dylan - the process of the regression doesn’t make sense right now
Dylan - we need to also build the model off of packages with random distribution
Matthew - create packages where there is extreme spread, and use those for the model
Extremity score?
Janna - going to work on adding spread to modeled Reach
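A possible way to make the “extremity score” concrete is the mean pairwise distance between a package’s units. The sketch below is illustrative only; the metric choice, the (lat, lon) input format, and the function names are assumptions, not a taskforce decision.

```python
# Hypothetical "extremity" (spread) score for a package: mean pairwise
# great-circle distance between its units' locations. Metric choice and
# input format are assumptions for illustration.
from itertools import combinations
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def extremity_score(unit_locations):
    """Mean pairwise distance (km) across all units in a package."""
    pairs = list(combinations(unit_locations, 2))
    if not pairs:
        return 0.0
    return sum(haversine_km(a, b) for a, b in pairs) / len(pairs)
```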
Add market area, and market population
Ali - separate models for different types of market?
Dylan - what goals are practical goals for what market?
Are there certain clusterings of inventory by place types/market types, etc?
Ali - What is the acceptable % error here?
Dylan - +/- 5%
It needs to be really close at lower levels.
Ali - if you’re more than 5% off from observed, how do we bucket that?
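A minimal sketch of how the bucketing could work, assuming percent error is measured against observed reach and using the +/- 5% threshold above; the bands beyond 5% and the labels are placeholder assumptions.

```python
# Bucket a package by percent error of modeled vs. observed reach.
# Assumes observed_reach > 0; bucket boundaries beyond 5% are illustrative.
def error_bucket(modeled_reach, observed_reach):
    pct_error = abs(modeled_reach - observed_reach) / observed_reach * 100
    if pct_error <= 5:
        return "within_5pct"
    elif pct_error <= 10:
        return "5_to_10pct"
    return "over_10pct"
```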
Dylan - where are we on the correlation table?
Ali - Device-level (footprint) correlation was added to the model, but it was perfectly explainable by how far apart the units of a package are. It was dropped to keep model complexity and compilation time down.
It also required a new set of deliverables.
We looked at it; it didn’t help, so it was dropped.
Dylan - I feel it’s really important, because the external application must be sensitive to the unique footprint of the inventories.
Ali - if two units are close to each other, they’ll have very similar footprints.
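For context, one way a footprint correlation between two units could be computed, assuming each unit’s footprint is represented as an aligned vector of device counts per geography (e.g., per census block) and using Pearson correlation; both the representation and the statistic are assumptions, not necessarily what was in the model.

```python
# Pearson correlation between two units' footprint vectors (e.g., device
# counts per census block, in the same block order). Representation and
# statistic are assumptions for illustration.
import numpy as np

def footprint_correlation(unit_a_counts, unit_b_counts):
    a = np.asarray(unit_a_counts, dtype=float)
    b = np.asarray(unit_b_counts, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])
```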
Ali - Deadline- and roadmap-wise:
Should we position this as an evolving model that will change?
Janna - we are confident in the bulk of the numbers. Some cases still need more work to address. We will be improving over time.
Dylan - I feel like we are missing something. It’s not just a model; there is a process as well. The underlying data need to go into a process instead of just an aggregate model.
We are aggregating dimensions of the underlying data, and that is going into the model. It’s a modeling approach vs. a calculation approach. There is no process for handling the underlying data.
Dylan - we are calculating Reach at the macro level.
Can we take another stab at looking at the census block footprints?
Model at a smaller geography, like Census Tract or County, and then roll it up. Some of the noise might be offset as we roll up.
Much better correlation when looking at the counties that make up a DMA and then rolling them up to the DMA.
More granular-level model.
Dylan - could we compute reach for each county and then roll it up?
Janna - the county model is performing worse than the DMA model
Dylan - they may offset one another
There are only 5 counties in the US that are bisected by DMAs
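A rough sketch of the county-to-DMA roll-up being discussed, assuming each person is attributed to one home county (so reached counts sum without double counting) and each county maps to a single DMA; this is illustrative, not the modeling team’s implementation.

```python
# Roll county-level reach predictions up to the DMA level.
# Assumptions: one home county per person; one DMA per county.
from collections import defaultdict

def roll_up_to_dma(county_results, county_to_dma):
    """county_results: {county_fips: (reached_persons, population)}
    county_to_dma: {county_fips: dma_code}
    Returns {dma_code: reach_pct}."""
    reached = defaultdict(float)
    pop = defaultdict(float)
    for fips, (r, p) in county_results.items():
        dma = county_to_dma[fips]
        reached[dma] += r
        pop[dma] += p
    return {dma: 100.0 * reached[dma] / pop[dma] for dma in pop}
```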
Ali - can you send me a couple examples where the model is doing poorly? Use the extreme cases. Why is the prediction off?
Action Plan - from here
Janna - identify extreme cases for observed
Ali to look into raw data
Need to define “clustered” and “random” packages (a rough sketch of possible definitions is at the end of these notes)
Janna will use use cases/input from users for testing packages
Repurpose old script Ali has for randomly compiling packages
We are going to take a hybrid approach
Timing-wise - we have people expecting Transit data in the system on Monday 3/21. Brian R. will provide packages.
Dylan - maybe we have one model, but it calculates at the county level so we can roll it up to larger markets.
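Rough, illustrative definitions of “random” vs. “clustered” packages for the testing work in the action plan above; the selection logic, the size parameter, and the use of simple lat/lon distance are assumptions, not agreed definitions.

```python
# Sketch: compile a "random" package by uniform sampling, and a "clustered"
# package by picking a seed unit and taking its nearest neighbors. The
# inventory format (dicts with "lat"/"lon") is an assumption.
import random

def random_package(inventory, size):
    """Uniformly sample units from the full inventory."""
    return random.sample(inventory, size)

def clustered_package(inventory, size):
    """Pick a random seed unit, then take its nearest units."""
    seed = random.choice(inventory)

    def dist_sq(unit):
        return (unit["lat"] - seed["lat"]) ** 2 + (unit["lon"] - seed["lon"]) ** 2

    return sorted(inventory, key=dist_sq)[:size]
```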