
PremierBromanov · 08-20-2020, 12:50 PM

Hello and welcome to another high quality FHN article but unironically this time. Here we're going to explore the idea of increasing the amount of TPE rewarded for predictions, namely PrimeTime. It's not necessarily a suggestion, just a thought experiment to see what it might look like. Big thanks to @hotdog for providing me with the data.

Why?

One of the more exciting tasks in the SHL is making predictions. It's like gambling, but without losing (only falling behind). This comes in many forms, be it mock drafts, season predictions, PrimeTime, or 3on3, among others. But PrimeTime has one small problem: you only get 0.5 TPE for each correct answer across its 3 predictions, plus 1 free TPE for trying, and half points round up. What this means is that you are very likely to get 2 TPE from predicting either 1 or 2 games correctly and very unlikely to get 3 TPE. It's not so much about the likelihood as it is feeling like you haven't been rewarded for a correct prediction. If you predict two, you are rewarded as if you had predicted only one.

Now, this makes sense in other predictions, if we want to limit how much TPE is gained overall. For instance, a mock draft could easily generate 20 TPE for more astute users, widening the gap between dedicated and casual users. And that, I think, is one of the things the half-TPE-per-correct-answer rule is seeking to address: To limit the gap between a dedicated user and a new/casual user. Now, this is to say nothing of equipment training, which is a pretty hefty chunk of change and can put hundreds of TPE between users that are rich and users that are poor (namely, new users). Take it from me, I never bought much equipment and I'm about 200 TPE behind our leaders. This is more due to the fact that equipment didn't look useful to me in the SimonT era, except for gloves and maybe sticks. But folks buying up the best gloves, skates, sticks, and shoulder pads got a big bump. I digress, since this isn't an issue anymore. My point is this: We limit the gap in predictions, but not in equipment. Someone with hours and hours worth of media can easily pull ahead through their bank, but users who put time into predictions cannot. It's simply not worth the time.

And so, I'm proposing that we increase predictions TPE to 1 per correct prediction, and we'll explore what that looks like in the data.

PrimeTime

PrimeTime is currently a task that asks you to predict the winner of 3 separate games. Each correct answer gets you 0.5 TPE. Since the choices are binary, we can think of each of these like a coin flip: you have two choices and only one is correct, just like a true or false question. In practice, though, these questions cannot simply be reduced to a mere coin flip, as some predictions have obvious answers, some answers are based on common consensus, and some outcomes are unpredictable. So in order to accurately measure the outcomes, we need the raw data, and we can compare this to a coin flip to see how our proposed method of 1 TPE per correct prediction might look.

First, let's look at the distribution of TPE earned. Specifically, let's look at the distribution of a coin flip. See chart below. The old method of TPE allows for 3 possible outcomes: 1 TPE, 2 TPE, and 3 TPE. That is, in order to get 1, you need zero correct answers. We can represent the possible outcomes of a coin flip with 0s and 1s, like binary. So, in order to get 1 TPE, you need 3 misses, or the set [0,0,0]. The probability that all 3 coins land on 0 is 12.5%, or 1 divided by 2 divided by 2 divided by 2, a 1/8 chance. That 8 comes from the total number of outcomes possible, which can be simplified as 2 to the power of the number of coin flips. The 2 comes from the number of possible outcomes for each flip. It can help to think of this as a branching tree. Start with two lines going down like so /\, then at the end of each of those, repeat such that you have a tree with 4 leaves. That's 2 coin flips. Add one more, and each of those 4 leaves doubles, so you get 8. But enough of a math lesson!

[Image: Bd55TZp.png]

So what we've got here is the distribution of outcomes based on raw mathematics of probability.

12.5% chance to get 1 TPE [0,0,0]
75% chance to get 2 TPE [1,0,0], [0,1,0], [0,0,1], [1,1,0], [0,1,1], [1,0,1]
12.5% chance to get 3 TPE [1,1,1]
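
For anyone who wants to check the list above, here is a minimal sketch that walks every branch of that coin flip tree. The function names (`current_tpe`, `tpe_distribution`) are made up for illustration, and the rounding rule is my reading of the current system (1.5 counts as 2, 2.5 counts as 3).

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def current_tpe(correct):
    """1 participation TPE plus 0.5 per correct pick, halves rounded up."""
    # 0 correct -> 1, 1 correct -> 1.5 -> 2, 2 correct -> 2, 3 correct -> 2.5 -> 3
    return 1 + (correct + 1) // 2


def tpe_distribution(tpe_rule, games=3):
    """Exact probability of each payout when every game is a fair coin flip."""
    outcomes = list(product([0, 1], repeat=games))  # all 2**games sequences
    counts = Counter(tpe_rule(sum(outcome)) for outcome in outcomes)
    return {tpe: Fraction(n, len(outcomes)) for tpe, n in sorted(counts.items())}


if __name__ == "__main__":
    for tpe, p in tpe_distribution(current_tpe).items():
        print(f"{tpe} TPE: {float(p):.1%}")
```

Enumerating all 8 sequences rather than simulating keeps the percentages exact, so they can be compared directly with the list.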

The proposed method would be 1 TPE per correct answer plus 1 for participating.

This is represented by 3 coin flips as well, but you'll notice we have 4 values compared to the 3 from before. This is because under the current method, two of the values collapse into one: 1.5 rounds up to 2, so 1.5 and 2 represent the same number (2) and are thus combined into the 75% you see above.

In reality, the probabilities of 3 coin flips are always the same; we've just done different math on each chart. Both are shifted to the right by the 1 participation TPE, and the coin flips have been modified from a d2 with values 1,2 to values 0,1. See this calculation for reference: https://anydice.com/program/1d5ea

So under normal conditions, with 3 coin flips, there are 4 possible outcomes. 0, 1, 2, and 3. That is, the number of "heads". We've, again, shifted this to include the 1 TPE we get.
12.5% chance to get 1 TPE [0,0,0]
37.5% chance to get 2 TPE [1,0,0], [0,1,0], [0,0,1]
37.5% chance to get 3 TPE [1,1,0], [0,1,1], [1,0,1]
12.5% chance to get 4 TPE [1,1,1]
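
The same four percentages fall out of counting how many of the 8 sequences have exactly k correct picks. A short sketch, assuming fair coin flips; `proposed_distribution` is just an illustrative name:

```python
from fractions import Fraction
from math import comb


def proposed_distribution(games=3):
    """P(payout) when each correct pick is worth 1 TPE, plus 1 for participating."""
    total = 2 ** games  # number of equally likely win/loss sequences
    # comb(games, k) counts the sequences with exactly k correct picks
    return {1 + k: Fraction(comb(games, k), total) for k in range(games + 1)}


for tpe, p in proposed_distribution().items():
    print(f"{tpe} TPE: {float(p):.1%}")
```

Because nothing gets rounded under the proposed rule, each count of correct picks keeps its own payout, which is why there are 4 values instead of 3.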

So, now we understand (hopefully) the distribution of values and how that changes. But what really matters here? We can say "Okay, I see far fewer people will get 2 TPE, but more will get 3 and 4". But what does this mean per user? What's the average? Well, when looking at these, we cannot simply average 1, 2, and 3. I mean, mathematically, we can, because for coin flips it gives the same answer. But later, this will be important: we need to look at the weighted average of each of these distributions, because in real life our users do not align with the distribution of coin flips AT ALL. With dice (or coins), you can get the average result by adding up the sides and averaging them. For a D6, it's 3.5.

For a d2 with values 0 and 1, it's 0.5. For 3d2 under the current system, we can add up the possible values in our range and average them, since here it gives the same answer as the weighted average: (1 + 2 + 3) / 3 = 2. That's a weighted average of 2 TPE under the current system. When we consider the proposed system, the weighted average increases to 2.5 TPE. Don't round this up like a claim, this is raw TPE. That's an increase of 25%. So let's consider that when we talk about changing the system, there will now be 25% more TPE distributed to users from PrimeTime tasks. Is this significant? Is it worth it? I think so, but let's continue.
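
To make the weighted average concrete, here is a small sketch that multiplies each payout by its coin flip probability and sums the results. The dictionaries just restate the two lists above; `weighted_average` is an illustrative helper, not anything official.

```python
from fractions import Fraction

# Coin flip distributions from the lists above: payout -> probability
current = {1: Fraction(1, 8), 2: Fraction(6, 8), 3: Fraction(1, 8)}
proposed = {1: Fraction(1, 8), 2: Fraction(3, 8), 3: Fraction(3, 8), 4: Fraction(1, 8)}


def weighted_average(dist):
    """Expected payout: each payout multiplied by its probability, summed."""
    return sum(tpe * p for tpe, p in dist.items())


current_avg = weighted_average(current)
proposed_avg = weighted_average(proposed)
increase = (proposed_avg - current_avg) / current_avg

print(float(current_avg), float(proposed_avg), f"{float(increase):.0%}")
```

This should print 2.0, 2.5 and 25%, matching the figures in the paragraph above.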

Real Data

Coin flips are all well and good, but they are not how human beings operate. Additionally, we only have 5 sets of data to look at. If we did a coin flip test for each user and only did that test 5 times, we might see data that conflicts with the mathematical data, because probability describes long-run tendencies, not the outcome of a handful of trials. If I flip a coin twice, it's most likely not going to land heads, heads. But if it does, even 3 times in a row, that's not wrong, it's just rare!
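
To get a feel for how much 5 samples can wobble, here is a rough simulation sketch. It assumes 40 users per PrimeTime (200 results split over 5) and that every user flips their own independent coins. Neither is how real PrimeTime works, since everyone picks from the same games, so treat it purely as an illustration of small-sample noise. The seed is arbitrary.

```python
import random


def simulate_primetime(users, rng, games=3):
    """Average payout for one PrimeTime if every user flipped fair coins."""
    total = 0
    for _ in range(users):
        correct = sum(rng.random() < 0.5 for _ in range(games))
        total += 1 + (correct + 1) // 2  # current rule, halves rounded up
    return total / users


rng = random.Random(54)  # arbitrary seed so the run is repeatable
averages = [simulate_primetime(users=40, rng=rng) for _ in range(5)]
print([round(a, 2) for a in averages])
```

The five averages will hover around 2 without landing on it exactly, which is the point: a handful of trials rarely matches the math perfectly.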

So anyway, here are the results of the 5 PrimeTimes from season 54. We're using pie charts here instead of line charts because it helps us see the results of a single set of data, rather than comparing percentages like a line chart can do. Please note the number next to the pie charts indicating which PrimeTime it was and the last one showing the combined efforts.

[Image: aWcBVL9.png]

A couple of observations based on the eye test. There seems to be a negligible correlation between the number of users getting 3 and the number of users getting 1. That is, there usually appears to be a similar number of users getting everything right, regardless of how many users fail to get any right. Such is not always the case: we can see in #4 that no one was able to guess all 3 and many people failed to guess any. This is the result of an upset, we can assume.

But all in all, our distribution is WAY different than we expect!

41.0% of users get the participation TPE and nothing more
54.4% of users get one or two correct
and 4.7% of users get all three correct!

So, what's the weighted average? Remember, we can't simply average the possible results (1, 2, 3); that would give us 2, and we know that's not the case with real data. The weighted average is the same as averaging every individual result, so rather than 1, 2, 3, we average the full set of data (that is, 200 users' TPE results). This weighted average comes to 1.62 TPE per user per PrimeTime. That's about 20% less than randomly flipping a coin!
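
As a rough cross-check, the weighted average can also be approximated from the rounded pie chart shares above. This sketch is not the calculation used for the 1.62 figure, which came from the raw per-user results; it lands close to that number (about 1.64) but not exactly on it.

```python
# Combined shares from the pie charts above: payout -> share of users (percent)
real_shares = {1: 41.0, 2: 54.4, 3: 4.7}

total_share = sum(real_shares.values())  # 100.1 because the shares are rounded
real_avg = sum(tpe * share for tpe, share in real_shares.items()) / total_share

coin_flip_avg = 2.0
reported_avg = 1.62  # figure quoted in the post, from the raw per-user results
shortfall = (coin_flip_avg - reported_avg) / coin_flip_avg

print(f"average from rounded shares: {real_avg:.2f}")
print(f"reported average vs coin flip: {shortfall:.0%} lower")
```

Dividing by the sum of the shares instead of 100 keeps the rounding in the percentages from skewing the result.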

What this means is: Users are either bad at predicting and/or hockey simulations are unpredictable.

So what does this mean in combination with our coin flips? Well, we have a real distribution of TPE that is much much different.

[Image: PAPvmVe.png]

Bonus chart!
[Image: CVQ4J1A.png]

Fewer users get 2 and 3 TPE, and far more get only 1. But how is this useful to our proposed TPE method?

Well, we need to take the average between the coin flip and the real data, or rather calculate the median (the halfway point) between those values.

The current median is thus (CFC means coin flip current, versus CFP which means coin flip proposed)

[Image: R6oZo03.png]

Using that median, we can calculate a new median for our proposed TPE method. The weighted average of the current median is 1.81 TPE! That's an increase of about 11% over the real data.

[Image: 3ByuGq4.png]

The PROPOSED method has a weighted average of 2.07! Still 0.43 less than a coin flip (2.5), but an increase of about 27% versus the real data! That's pretty close to that 25% we saw with the coin flips.
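
Here is a quick sanity check of the arithmetic behind those numbers. It assumes the "median" is simply the halfway point between the coin flip average and the real average, which is my reading because it reproduces the 1.81. The 2.07 is taken as given from the chart rather than derived, and all inputs are the rounded figures quoted above.

```python
coin_flip_current = 2.0   # CFC weighted average
coin_flip_proposed = 2.5  # CFP weighted average
real_current = 1.62       # real S54 data, current rule
median_proposed = 2.07    # taken as given from the post

# Assumption: the "median" is the halfway point between coin flip and real data
median_current = (coin_flip_current + real_current) / 2

gain_current = median_current / real_current - 1
gain_proposed = median_proposed / real_current - 1
gap_to_coin_flip = coin_flip_proposed - median_proposed

print(f"{median_current:.2f} {gain_current:.1%} {gain_proposed:.1%} {gap_to_coin_flip:.2f}")
```

With rounded inputs the two gains come out slightly under 12% and 28%, in line with the "about 11%" and "about 27%" quoted above once rounding of the underlying averages is allowed for.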

Let's compare

[Image: GwUnsru.png]

The blue hues represent the 0.5 TPE per prediction, while the red hues represent the proposed method. The solid lines are the calculated medians. We can see that, compared to the proposed coin flip, far more people still get no TPE beyond participation. In fact, this might not actually change at all, since the same number of people would still, in theory, only get 1 TPE. Such are the limitations of the median. And this is where our graphs can break down, since we do not have the raw data for how many predictions each user got right, only their end TPE. But I don't blame the data. Hotdog worked hard to get it to me, so I won't look a gift horse in the mouth.

Rather, this helps us calculate the average TPE better, since we need to extrapolate the data, rather than just calculating the TPE based on the number of correct predictions. So, tugging our distributions from the coin flip to the real data, we get this new solid red line!

So that's what the proposed system looks like. I hope you enjoyed this shallow dive into some of the grading data we have.

Additionally, we can look at the Playoff Bracket as a different example. This is far more complex than a coin flip, but I've calculated the average TPE gained from the S54 playoff bracket to be 2.99 (not counting participation TPE of 2). If we increase each correct answer to be worth 1 TPE, our new average is 5.49 TPE (again not counting 2 participation TPE). A pretty big difference there. Do we want that? Do we want to shift the distribution of TPE away from SHL dollars and more into tasks? I'd be for it, personally. It takes far less time to make a prediction and it being worth more encourages users to get to know the league. When so much TPE comes from your bank, it rewards those who really engage with the site. This is great, because we definitely want users engaged with the site. But we also want this to be a home for casual users, or for users to be allowed to spend time with their family and friends without feeling like they're falling behind.

I am hesitant to add TPE onto the max possible TPE earned for a user every season, because this necessarily makes history complicated, even more so than usual. It would increase the number of users who hit 2k, and our best users would continue to leave people in the dust. That is why I spoke of moving TPE from training/coaching to predictions, rather than tacking it on. But that, my friends, is a discussion for another time.

2478 words + graphs + data crunch.

***hotdog*** · 08-20-2020, 01:51 PM

Nice number crunching, bromanov, this is awesome!

When I was discussing things with HO about the removal of the PT cap, they were strongly in favor of systems that keep the available & average TPE roughly the same as it was before when the cap was in place. This resulted in shifting some things around in the offseason before S54, and this primetime format is new for S55.

Primetime itself was actually a new task I introduced in S50, and I had even loftier goals for it when I was proposing it (more games and, in turn, more TPE) but my discussions with HO made it clear that they prefer to keep available TPE around the same level as previous seasons. I was excited to get this interactive task into the pipeline, and I was ok with it being less TPE than I originally envisioned. Now there's 1 participation TPE for it, too, which came during the no-more-cap reshuffling.

From my understanding (this part is no longer my domain lol), having different tiers of equipment and training is the preferred way of seeing some separation between the elite earners and the rest of the field, so this is a feature, not a bug.

TLDR TPE is fun (and I'm not personally opposed to increasing it) but HO generally likes to keep the amount of available TPE roughly the same as in recent history, and the current system is a result of my attempts to honor that wish :)

PremierBromanov · 08-20-2020, 02:01 PM

Yeah, not changing the overall TPE makes sense to me. But I could see a good discussion on that, really challenging why we would or wouldn't want to do that.