<02-03-2023> 1 days ago
reading time 13 minutes
text is more expensive than you think
wizards of the coast, a billion (with a b) dollar company, creates a pretty interesting game involving cardboard and magic. a game which i have enjoyed despite the insane f2p economy.
a billion dollars is beyond an absurd amount of money. a billion of anything is not something human beings can remotely visualize. if you wanted to try though, you could check out printing money or attempt to spend 1 billion of bill gates' insane 100 billion fortune, created by the utterly amazing Neal Agarwal.
hint:
try 250 million big macs currently priced at 3.99 USD...
looking through a few sources we can gather (ha) that there is something like 7 million average users a month.
thinking about that for a second, 7 million users is just how many people play the game in a month. we can realistically, safely assume there are something like 20 - 40 million accounts.
(source: ¯\(ツ)/¯)
in the same way that you don't design a bridge to hold 'like about that much...' you generally want to overestimate by quite a bit.
depending on the needs of the application, the context, the phases of the moon, you want to give different kinds of margins.
for example, a machine that burns cookies 1% of the time is perhaps a little less severe than a machine that explodes kittens at the same rate.
so a healthy overestimate, going from 7 million monthly users to a total of 80 million users, is probably a safe margin for the scale currently being operated at once we factor in yearly growth.
i caught a fish 80 million users about thiiiiiiiiiiiiis big.
the max profile picture size for twitter is 2mb. so if we added the ability for all users to have a custom image, just how much data is that?
oh i don't know...
160 terabytes
aws tells us that storing a gb costs about 0.021 USD a month, or about 2 cents and 2 british american ha'penny
so for nearly double the median salary of a single mom in ohio our users can have profile pictures.
and that is just 2 megabytes, 2 * 10^6 (2,000,000) bytes.
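the profile picture numbers above can be reproduced with a few lines of arithmetic. this is a rough sketch, not a billing calculator: it uses decimal units (1 gb = 10^9 bytes) and assumes the aws price is a flat per-gb monthly rate, which is my reading of the figure and not something the pricing page is quoted on here. the variable names are made up for illustration.

```python
# back-of-the-envelope storage cost for 2 MB profile pictures
USERS = 80_000_000               # the overestimated account count from the text
MAX_PICTURE_BYTES = 2 * 10**6    # 2 MB per picture, decimal units
USD_PER_GB_MONTH = 0.021         # assumption: flat monthly rate per GB

total_bytes = USERS * MAX_PICTURE_BYTES
total_tb = total_bytes / 10**12  # bytes -> terabytes
total_gb = total_bytes / 10**9   # bytes -> gigabytes

monthly_usd = total_gb * USD_PER_GB_MONTH
yearly_usd = monthly_usd * 12

print(f"{total_tb:.0f} TB")                  # 160 TB
print(f"{monthly_usd:,.0f} USD per month")   # 3,360 USD per month
print(f"{yearly_usd:,.0f} USD per year")     # 40,320 USD per year
```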
now that we have the needed context we can get to work. in other words we can finally get to:
to what do we owe a text file? and to what does a text file owe to us? well to start they're usually very small. like teeny tiny.
here is my friend Xe's seemingly favorite letter, h.
2 bytes (on my text encoding)! wow that is small.
the amount of space any given character takes up depends on the encoding scheme in use. utf-8 for example is a variable-width encoding that uses 1-4 bytes per character.
this is why the ascii alphabet can be represented with one byte (8 bits) per character but more complex characters require multiple bytes.
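a quick way to see the variable width for yourself is to encode a few characters and count the bytes. this is just a sketch using python's built-in `str.encode`; the helper name `utf8_size` is made up. the trailing newline case is only my guess at where a 2 byte "h" file might come from, since most editors add one when saving.

```python
def utf8_size(text: str) -> int:
    """Return how many bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

samples = [
    "h",            # plain ascii letter
    "h\n",          # the same letter plus a trailing newline
    "\u00e9",       # é, latin small letter e with acute
    "\u30c4",       # ツ, katakana tu
    "\U0001F0CF",   # 🃏, playing card black joker
]

for text in samples:
    print(repr(text), utf8_size(text), "bytes")

# 'h' 1 bytes
# 'h\n' 2 bytes
# 'é' 2 bytes
# 'ツ' 3 bytes
# '🃏' 4 bytes
```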
now a magic the gathering deck file looks like this when exported from some tool like mtggoldfish
3 Birds of Paradise [MB1]
3 Blooming Marsh [KLR]
1 Boseiju, Who Endures <extended> [NEO] (F)
4 Chord of Calling <magic 30> [DMU]
1 Dryad Arbor [V12]
4 Eldritch Evolution [MB1]
1 Essence Warden [MB1]
1 Forest <254> [THB]
2 Geralf's Messenger [DKA] (F)
4 Grist, the Hunger Tide
4 Ignoble Hierarch
1 Misty Rainforest [SLU]
1 Nurturing Peatland [MH1] (F)
3 Overgrown Tomb <planeswalker stamp> [GRN] (F)
1 Snow-Covered Forest <285> [KHM] (F)
4 Strangleroot Geist [DKA] (F)
1 Swamp <252> [THB]
2 Twilight Mire [A25] (F)
1 Urborg, Tomb of Yawgmoth [PRM-UMA]
4 Verdant Catacombs [SLU]
4 Wall of Roots <magic 30> [DMU]
1 Yavimaya, Cradle of Growth
4 Yawgmoth, Thran Physician [MH1] (F)
4 Young Wolf [DKA] (F)
1 Zulaport Cutthroat [C20]
1 Crime/Punishment [DIS] (F)
2 Endurance
3 Force of Vigor [MH1] (F)
1 Kataki, War's Wage [MD1]
1 Magus of the Moon [IMA]
2 Necromentia <prerelease> [M21] (F)
1 Outland Liberator <showcase> [MID] (F)
1 Plague Engineer [MH1] (F)
2 Thoughtseize [AKR]
1 Thrun, the Last Troll [MB1]
this file is only 500 times the size of the h file at a whopping... 1006 bytes.
this is actually a pretty good case, since most of the cards are 4-ofs (the max number of copies in a deck). you could have something much worse, as MTGA supports multiple formats.
take for example this 100 card singleton deck.
1 Akroma's Will <418038> [PRM]
1 Arcane Denial [A25]
1 Arcane Signet [DMC]
1 Arid Mesa [MH2]
1 Ash Barrens <retro> [BRC]
1 Assassin's Trophy [GRN]
1 Badlands [VMA]
1 Bayou [VMA]
1 Beast Within [NPH]
1 Birds of Paradise <208> [PRM]
1 Bloodstained Mire [KTK]
1 Boreas Charger [PZ2]
1 Brainstorm [VMA]
1 Cavern of Souls [AVR]
1 Chromatic Lantern [RTR]
1 City of Brass [8ED]
1 Coat of Arms [8ED]
1 Command Tower [PZ1]
1 Crested Sunmare [HOU]
1 Cultivate <Black is Magic> [SLD]
1 Curse of Opulence [PZ2]
1 Demonic Tutor [VMA]
1 Distant Melody [MOR]
1 Dryad Arbor [FUT]
1 Dryad of the Ilysian Grove [THB]
1 Elesh Norn, Grand Cenobite [MM2]
1 Emiel the Blessed [2X2]
1 Exotic Orchard <retro> [BRC]
1 Fact or Fiction <retro> [DMR]
1 Farseek [M13]
1 Felidar Retreat [ZNR]
1 Fellwar Stone <retro> [BRC]
1 Flooded Strand [KTK]
1 Forbidden Orchard [CHK]
1 Forest <254> [THB]
1 Gemstone Mine [TSB]
1 Generous Gift [MH1]
1 Genesis Wave [SOM]
1 Good-Fortune Unicorn [MH1]
1 Grand Coliseum [VMA]
1 Green Sun's Zenith [MBS]
1 Harrow <198> [PRM] (F)
1 Helm of the Host <retro> [BRR]
1 Heraldic Banner [ELD]
1 Heroic Intervention [AER]
1 Imperial Seal <20> [PRM]
1 Island <251> [THB]
1 Keeper of the Accord [CMR]
1 Kodama's Reach [MMA] (F)
1 Loyal Unicorn [PZ2]
1 Mana Confluence [JOU]
1 Mana Crypt [EMA]
1 Mana Vault [VMA]
1 Marsh Flats [ZEN]
1 Mirari's Wake [MH2]
1 Mirror Entity [LRW]
1 Misty Rainforest [ZEN]
1 Mountain <253> [THB]
1 Nightmare Moon [PTG] (F)
1 Opaline Unicorn [THS]
1 Path to Exile <160397> [PRM]
1 Plains <250> [THB]
1 Plateau [VMA]
1 Plated Pegasus [TSP] (F)
1 Polluted Delta [KTK]
1 Ponder <Black is Magic> [SLD]
1 Pongify [2XM]
1 Preordain [M11]
1 Princess Twilight Sparkle [PTG] (F)
1 Rapid Hybridization [GTC]
1 Rarity [PTG] (F)
1 Reflecting Pool [SHM]
1 Relentless Assault [10E]
1 Return of the Wildspeaker [ELD]
1 Rhythm of the Wild [RNA]
1 Savannah [VMA]
1 Scalding Tarn [MH2]
1 Scrubland [VMA]
1 Shard Convergence [CON]
1 Skullclamp [DST]
1 Smothering Tithe [RNA]
1 Sol Ring <Black is Magic> [SLD]
1 Survival of the Fittest [VMA]
1 Swamp <252> [THB]
1 Swords to Plowshares <retro> [BRC]
1 Taiga [VMA]
1 The Great Henge [ELD]
1 The Immortal Sun [RIX]
1 Tropical Island [VMA]
1 Tundra [VMA] (F)
1 Underground Sea [VMA]
1 Verdant Catacombs [ZEN]
1 Vivien's Arkbow [WAR]
1 Volcanic Island [VMA]
1 Vryn Wingmare [M21] (F)
1 Wayfarer's Bauble [MM2] (F)
1 Windswept Heath [KTK]
1 Wooded Foothills [KTK]
1 Workhorse [EX]
1 Worldly Tutor [MI]
now this file is 2446 bytes! although this is probably the average worst case.
it could be worse with custom arts for every card in the deck file, or with a singleton deck built around cards with long names such as the oh so fun to say asmoranomardicadaistinaculdicar! (as-more-an-uh-mar-dih-cuh-dye-stin-uh-cuhl-dih-car).
the deck limit is currently ~100, and there are 7 million users, but we're overestimating and assuming something like 80 million. with the worst case scenario being 2500 bytes (2.5kb) per deck, doing some math we can see that takes up...
20 terabytes
using our handy dandy graph from earlier we do some computations and...
over some 12 year period that only costs us 60,000 USD.
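the deck storage figure works out the same way as the profile pictures did. a rough sketch, assuming decimal units and that the 0.021 USD per gb price is a flat monthly rate (my reading, and the only one that lands on roughly 60,000 USD over 12 years). the names are made up for illustration.

```python
# back-of-the-envelope storage cost for plain text deck files
DECKS_PER_USER = 100             # the ~100 deck limit
USERS = 80_000_000               # overestimated account count
WORST_CASE_DECK_BYTES = 2_500    # 2.5 kB per deck file
USD_PER_GB_MONTH = 0.021         # assumption: flat monthly rate per GB
YEARS = 12

total_bytes = DECKS_PER_USER * USERS * WORST_CASE_DECK_BYTES
total_tb = total_bytes / 10**12  # bytes -> terabytes
total_gb = total_bytes / 10**9   # bytes -> gigabytes

monthly_usd = total_gb * USD_PER_GB_MONTH
total_usd = monthly_usd * 12 * YEARS

print(f"{total_tb:.0f} TB")                             # 20 TB
print(f"{monthly_usd:,.0f} USD per month")              # 420 USD per month
print(f"{total_usd:,.0f} USD over {YEARS} years")       # 60,480 USD over 12 years
```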
that's not too bad but i think we can do much better.
firstly we need to understand this problem is a little more complex than just deck files.
magic arena, being a f2p economy, incentivizes players to spend money by having alternative arts for each card. there are a few ways to handle this.
what if we just compressed the text through an encoding scheme?
turning a card such as
we partially encode some data into the file format itself: a missing count simply means 1.
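the "no number means 1" rule can be sketched on its own, without committing to the rest of the token layout. everything here is hypothetical: the function names, the `*` separator, and the placeholder token bodies are mine, not the actual format. the separator is there because set codes like 10E and 8ED start with a digit, so a bare leading number would be ambiguous.

```python
COUNT_SEPARATOR = "*"  # hypothetical: keeps a count apart from set codes like 10E

def encode_count(count: int, body: str) -> str:
    """Prefix the token body with its count, omitting the count when it is 1."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return body
    return f"{count}{COUNT_SEPARATOR}{body}"

def decode_count(token: str) -> tuple[int, str]:
    """Split a token into (count, body); a token without a count means 1."""
    head, sep, rest = token.partition(COUNT_SEPARATOR)
    if sep and head.isascii() and head.isdigit():
        return int(head), rest
    return 1, token

# "DMU-Ch" and "10E-Pe" are placeholder bodies, not real encoded cards
print(encode_count(4, "DMU-Ch"))   # 4*DMU-Ch
print(encode_count(1, "10E-Pe"))   # 10E-Pe
print(decode_count("4*DMU-Ch"))    # (4, 'DMU-Ch')
print(decode_count("10E-Pe"))      # (1, '10E-Pe')
```

a 60 card deck of 4-ofs pays one extra byte per entry for the separator, while a 100 card singleton deck pays nothing for counts at all.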
to handle name collisions within the same set we could embed an index, pregenerate a lookup table, or just keep adding letters until the name is unique. an example of this happening is 10E, where both Peek and Persuasion are cards in the set.
example solution output
both have benefits and drawbacks but variadic numeric indexing is likely the most efficient approach with this encoding scheme to solve collisions.
this often adds only 2 bytes of overhead per collision, so it is negligible in my prototyping (to be read as: i'm not implementing this for now but acknowledging it)
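to make the two collision strategies concrete, here is a small sketch of numeric indexing next to "keep adding letters". it assumes card names get shortened to a 2 letter prefix inside their set, which is my guess at the scheme and not a confirmed detail. the function names are hypothetical and the name list is just three 10E cards mentioned in this post, not the full set.

```python
from collections import defaultdict

def numeric_index_codes(names, prefix_len=2):
    """Fixed-length prefix, with a numeric index added only to colliding names."""
    groups = defaultdict(list)
    for name in sorted(names):          # sorted so the indexes are stable
        groups[name[:prefix_len]].append(name)
    codes = {}
    for prefix, members in groups.items():
        if len(members) == 1:
            codes[members[0]] = prefix  # no collision, no overhead
        else:
            for index, name in enumerate(members):
                codes[name] = f"{prefix}{index}"
    return codes

def extended_prefix_codes(names, prefix_len=2):
    """Keep adding letters until no other name shares the prefix."""
    codes = {}
    for name in names:
        length = prefix_len
        while length < len(name) and sum(
            other.startswith(name[:length]) for other in names
        ) > 1:
            length += 1
        codes[name] = name[:length]
    return codes

names = ["Peek", "Persuasion", "Relentless Assault"]
print(numeric_index_codes(names))
# {'Peek': 'Pe0', 'Persuasion': 'Pe1', 'Relentless Assault': 'Re'}
print(extended_prefix_codes(names))
# {'Peek': 'Pee', 'Persuasion': 'Per', 'Relentless Assault': 'Re'}
```

in this tiny example both approaches cost one extra byte per colliding card. the index grows only with the number of cards sharing a prefix, while the extended prefix can get long when names share many leading letters.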
just how much does this encoding reduce our sizes?
in the best worst case scenario we can reduce it substantially. but what about something more average?
even in this more realistic example we still get a substantial reduction of roughly 4 - 5 times.
applying this to our data model:
some validation will need to be done, along with maintenance on the compendium of card names, which adds to cost, but that is not within the scope of this solution.