💰 DATA BILLIONAIRES

Creators and tech platforms are upset about AI firms’ scraping their content to train their chatbots

July 18, 2023

Culture, tech, luxury, travel, and media news, emailed twice a week

The “Eastern & Oriental Express” is returning to the rails with 15 carriages, including two restaurant cars, one piano bar car and an open-air observation car, departing from Singapore and traveling to Malaysia (Belmond)

🔼 Wheat prices. Russia axed a UN-brokered grain deal with Ukraine »»

🔽 The price of Ford’s electric pickup. The EV wars are heating up »»

💬 “Too soon b.” Jack Dorsey snubbed Zuck’s Threads follow request »»

🛫 Tokyo’s best boutique hotels »»

👗 Adidas turned Homer Simpson’s bush meme into Stan Smiths »»

💎 Super luxe, overnight train journeys are having a moment. Asia’s most luxurious train is back, with an all new route »»

Built from the ground up from board-formed concrete and hand-trowelled plaster, a new LA residence from Masastudio and Kelly Wearstler boasts a pool pavilion, guest house and landscaped garden (The Ingalls)

Is Hyundai’s new “not-quite-a-Land Rover” SUV cool or not? »»

Why some restaurant companies are starting to act more like marketing companies »»

British Airways opened a Whispering Angel rosé bar in their Heathrow T5 lounge »»

Paris is running tests on the Seine to create a Summer Olympics opening ceremony to remember in 2024 »»

An emotional David Beckham introduced Lionel Messi to Miami »» Flashback: this newsletter broke down why Messi’s there to begin with »»

Mesh flats are officially everywhere—here are 16 to shop now »»

AI’s “tech tsunami” is coming for call center workers »»

America's young workers are headed toward a career calamity »»

Welcome to the Carlos Alcaraz Era »»

TV and movie sets used to be messy. What happened? »»

This Los Angeles bolthole by Masastudio and Kelly Wearstler is a carefully considered gem »»

Tesla’s Texas factory (finally) produced the first Cybertruck. Mass production is set for next year »»

Birkenstock loyalists say these Amazon sandals feel like 'walking on a marshmallow' — and they're US$25 »»

“We built it, we trained it, but we don’t know what it’s doing.” The scientists who build AI can’t tell you how it works »»

Check out Europe’s fastest train »»

Clean these things yourself when you check into a hotel room »»

Wix’s new tool can create entire websites from prompts »»

This is the No. 1 best thing about living in the US, according to expats »»

AI firms scrape every bit of content they can find to help train their systems. Some creators are starting to sue

Vast reservoirs of publicly available data have already been ingested by generative AI systems, and more data is added to this pool every second of every day. Data is the new oil.

“I’LL TAKE THAT, THANK YOU VERY MUCH”

AI firms are “scraping” artists’ and writers’ content from every corner of the web —and said artists and writers are upset. Some are even suing.

Their concerns:

IP (intellectual property): People who earn a living by creating original content typically own the rights to their output. These folks see data scraping as a form of plagiarism/copyright infringement. To them, AI firms’ using content without permission or compensation is a violation of IP rights.

Loss of control: Once works —whether they’re written, visual, audio, or anything else— are scraped and ingested by an AI firm to train its models, the original creators totally lose control over how those works are used. This just feels wrong to many people. Of particular concern: AI misinterpreting or misrepresenting the work. However, the chances of this are very low. As most people know by now, AI remixes its inputs into new outputs, most times unrecognizably so.

Devaluation of work: Many people think that the widespread use of AI to generate new content may thoroughly devalue human-created work. Though this newsletter doesn’t agree with that line of thought, the argument goes like this: If AI can produce similar content at scale, that will drive down demand for human writers and artists, which will in turn make it way, way more difficult for creatives to earn a living from their craft.

In my opinion, the jury’s still out on this. I think the ubiquity of AI content will actually increase the value of the human written word, much like there’s still a market for Michelin star meals as well as for protein shakes —and for Loro Piana shoes and clothes as well as for Shein.

Authenticity and attribution: This one’s emotional. Creative output often reflects a deeply personal perspective, or some sort of belief, style, or voice. When an AI scrapes and uses this data, it can reproduce the style without any regard whatsoever for the original creator, leaving them feeling “robbed” —as well as creating issues around both authenticity and attribution.

In short, creators are worried about the potential exploitation of their work, the erosion of its value, and the emotional implications of AI firms scraping their content to make new stuff.

HAVE IT

BUT. Some people (like me) are totally okay with AI firms using their written output (like this newsletter). Here’s my rationale:

My vision is for FATE v FUTURE the newsletter to always be open and free for anyone who wants it. That includes generative AI firms training their next models.

It’s not that I see these emails contributing to the advancement of AI tech, per se. But happily opening up the 300 or so issues of FATE v FUTURE that I’ve written to date to be part of some future AI dataset does make me feel like I’m helping fine-tune a system that is beneficial to society as a whole, in my own tiny way.

It stands to reason that other creators might agree.

“BEDROOM RICK RUBINS”

I’ve written a lot about the democratization of creativity that AI systems have brought. I especially love the idea that anyone with a creative impulse can basically bring anything they can imagine to life nowadays.

Some of that stuff may even be good!

Generative AI allows hundreds of millions of people, if not billions, to make creative output on their own, regardless of their training or skillset.

It bears repeating: some of those creations might even be good!

Seen this way, some existing creative pros might see the idea of their data being used to let others express themselves creatively as a net positive, especially if that new output fuels even more creative expression onwards.

(On that same note, open source advocates —individuals with strong beliefs that software, including its source code, should remain free for all to use and modify for the “greater good” of technological advancement— may also be of the opinion that written and visual data, especially when it's publicly available to begin with, should be freely used to contribute to technological progress.)

ADMIT ONE

Look around: data is currency. How do you think you “pay” for all the “free” software you use, like Google, Gmail, Instagram, TikTok, Spotify, YouTube, Facebook, LinkedIn, Twitter, Pinterest, and many, many, many more?

If you’re ok with that exchange (trading your data for free email, free content, and free social media services), you might —just might— feel like releasing the rights to your historical output to become part of an AI's learning process is simply the price we all have to pay to enjoy the fruits of AI’s labor.

In other words, if you like using ChatGPT for free, you may well be cool with the idea that it was trained, in part, on your past Tweets.

YOU CAN’T BEAT EVOLUTION

Industries and economies evolve.

The Industrial Revolution gave rise to resource-driven economies. (That’s just fancy economics-speak for a situation where oil, precious metals, and other physical resources are highly coveted.)

And while we’re still very much in that era, we're simultaneously witnessing the explosion of a data-driven economy too.

Stay with me.

A PRECIOUS RESOURCE

As I’ve written, the world’s most precious resource today is data. This fun analogy explains how:

Raw data is just like crude oil: it is worth something, but it’s also difficult to derive immediate utility from it. See, raw data —like crude oil— needs to be processed, refined, and synthesized into a usable form.

When raw data is converted into structured information, its value rises. This is just like how crude oil becomes more useful, and more expensive, once it is refined into petrol or diesel.

But the analogy doesn't stop there.

$$$

When structured information is further analyzed and interpreted, it becomes insights, which are more valuable still.

Think of data-based insights as the huge range of end products that are in some way, shape, or form derived from petroleum: an insanely long list that extends from literally anything made of plastic to fertilizers, perfume, toothpaste, glasses, cosmetics, and much, much, much more.

Each of those things serves a specific and, in some cases, extremely high-value purpose. They’re also worth much, much more than the original crude oil that went into them.

WAIT, WEREN’T WE TALKING ABOUT AI?

Yes, and data scraping too —and, specifically, about how upset many creative people are about their data being scraped to train new generative AI models.

But here’s the thing: that ship has already sailed.

Vast reservoirs of publicly available data have already been ingested by generative AI systems, and more data is added to this pool every second of every day.

On top of that, as I’ve shared, many AI firms pay real people to interact with their AI all day long, every day. These human “interactors” constantly feed generative AI systems fresh insights, perspectives, and language —keeping the training wheels turning.

WHAT NOW?

In our view, high-quality, human-made data (that is, words, music, and pictures) will remain highly valued.

Can you or your business generate, collect, or otherwise harness and package up this kind of data?
Or, can you create a community, or otherwise encourage user interaction, around this kind of data?

If the answer is yes to either one of these questions, and if you can keep both of those elements within your own digital ecosystem, then in our view, you stand to benefit from this shifting landscape.

If you run a business, start treating your data, however narrow or niche, as a potential revenue stream. Every customer interaction can be mined for information, and that information can be distilled to understand trends and derive actionable insights.

Big tech firms are investing billions in AI, and making moves to gate their data to protect it from competitors, because they understand this dynamic.

Now you do too.

More:

“Not for machines to harvest” »»

Sarah Silverman is suing OpenAI and Meta for copyright infringement »»

Written by Jon Kallus. Any feedback? Simply reply. Like this? Share it!
