Web scraping is the process of gathering and extracting data from websites through automated software scripts or bots. It involves making HTTP requests to download web page data, then processing that data to extract and structure the information. Web scraping is commonly used to collect large volumes of data from various online sources for analytics, research, reporting, and other applications. As TikTok has grown in popularity, interest in scraping data from the platform has grown with it.
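The two steps described above, downloading a page and then extracting structured records from it, can be illustrated with a minimal Python sketch. To keep it self-contained, it skips the HTTP request and parses an inline HTML string with the standard library's `html.parser`. The markup, the `item`/`title`/`likes` class names, and the `extract_items` helper are all hypothetical and do not reflect how TikTok's pages are actually structured.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded page. A real scraper would
# first obtain this text from an HTTP response.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="title">First post</span><span class="likes">120</span></li>
  <li class="item"><span class="title">Second post</span><span class="likes">45</span></li>
</ul>
"""


class ItemParser(HTMLParser):
    """Turns each <li class="item"> into a dict of its title and like count."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # name of the field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "item":
            self.records.append({})  # start a new record
        elif tag == "span" and attrs.get("class") in ("title", "likes"):
            self._field = attrs["class"]

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Only keep text that sits inside a recognized <span> within a record.
        if self._field and self.records:
            value = data.strip()
            self.records[-1][self._field] = int(value) if self._field == "likes" else value


def extract_items(html):
    """Hypothetical helper: parse an HTML string into a list of structured records."""
    parser = ItemParser()
    parser.feed(html)
    parser.close()
    return parser.records


print(extract_items(SAMPLE_HTML))
# [{'title': 'First post', 'likes': 120}, {'title': 'Second post', 'likes': 45}]
```

In a real scraper, `SAMPLE_HTML` would be replaced by the body of an HTTP response; the parsing step stays the same.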
TikTok is a video-sharing social networking app that had over 1 billion monthly active users globally as of 2022. Its popularity, especially among Gen Z, has skyrocketed in recent years. This has led third-party developers and researchers to seek access to TikTok’s data at scale for analysis. However, TikTok’s official APIs offer only limited access to its data, and that access requires approval. As a result, some have turned to web scraping to gather insights from TikTok.
This raises questions about the legality and ethics of scraping TikTok. Web scraping is not inherently illegal under U.S. law, but rules covering unauthorized computer access, terms-of-service violations, copyright, and the misuse of personal data may still apply. This article examines the key legal considerations around scraping TikTok data.
Background on TikTok
TikTok is a popular short-form video sharing app that launched in the United States in August 2018 after merging with Musical.ly. Its parent company, ByteDance, first launched the app in China in September 2016 under the name Douyin.
Since its launch, TikTok has seen massive growth in users. As of January 2023, TikTok had over 1.5 billion monthly active users worldwide, including over 100 million in the US alone, according to Statista and Ecwid.
TikTok’s core functionality is sharing short videos, often accompanied by music. Users can create and upload their own videos as well as view, like, comment on, and share others’ videos. The app’s algorithm tailors the content it recommends to each user based on signals such as that user’s past interactions.
TikTok collects various types of data from its users, including profile information, user-generated content, interests and online behavior, social connections, and device and technical data, as outlined in its privacy policy.
TikTok’s Terms of Service
TikTok’s Terms of Service contain several provisions that prohibit scraping or unauthorized collection of data from the platform. Per TikTok’s Terms of Service, “users shall not:
(i) crawl, scrape, copy, modify, interfere with or collect, by automated means or otherwise, any content or information from the Services, including the user accounts of other users; or
(ii) breach, circumvent, remove, disable, decrypt or otherwise interfere with any technological protection measures used by TikTok or any TikTok Content Providers to protect content or the Services”
Violating TikTok’s Terms of Service can lead to penalties including suspension or termination of your TikTok account. Separately, in civil lawsuits over unauthorized scraping, defendants have faced monetary judgments and injunctions barring them from further scraping and from using the scraped data.
Copyright Law Considerations
Copyright in the videos uploaded to TikTok generally belongs to the creators who made them (and to the rights holders of any music they use), while TikTok receives a license to that content under its Terms of Service and owns its own platform elements, such as effects and filters. Under U.S. copyright law, scraping or copying this content without permission could constitute copyright infringement.
The Digital Millennium Copyright Act (DMCA) separately prohibits circumventing technological protection measures that control access to copyrighted works, so tools that bypass TikTok’s technical safeguards could raise anti-circumvention concerns in addition to breaching its Terms of Service. The Computer Fraud and Abuse Act (CFAA) also makes it illegal to access a computer without authorization or in a way that exceeds authorized access.
Overall, while limited manual downloads may carry little risk, systematically scraping TikTok content without permission risks copyright infringement claims. Whether a given use is defensible depends on factors such as how much is copied, whether the use is transformative, and its effect on the market for the original works. Automated scraping intended to copy content at scale would likely require a license from the relevant rights holders.
Sources: The Copyright Defense Against Web Scraping, Grabbing Data From the Web? Our Copyright Guide Outlines …
Data Protection Regulations
Privacy regulations like the EU’s General Data Protection Regulation (GDPR) protect personal data and constrain scraping activities. Under the GDPR, collecting names, email addresses, and other information that identifies a person, including by scraping, requires a lawful basis such as consent or legitimate interests, and scraping such data without one is generally prohibited (1). The regulation gives people in the EU more control over their personal data and restricts how companies collect, use, and share it. Companies that scrape personal data of people in the EU without a lawful basis can face steep fines (2): for the most serious violations, up to €20 million or 4% of worldwide annual turnover, whichever is higher.
The GDPR does not outright ban web scraping. It applies only to personal data, so scraping information that does not relate to an identifiable person falls outside its scope. However, scrapers must take care not to collect personal information without a lawful basis, even when that information is publicly visible (3). Ultimately, legality depends on what data is scraped, how it is obtained, and how it is used. Scrapers collecting data about users in the EU must comply with the GDPR’s data protection rules.
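One common way to put this into practice is data minimization: keep only the fields a project actually needs and replace direct identifiers before storing anything. The sketch below is illustrative only. The field names (`username`, `caption`, `like_count`, `email`) and the `minimize_record` helper are hypothetical, and it uses a keyed hash (HMAC-SHA256) to pseudonymize usernames. Pseudonymized data is still personal data under the GDPR (Recital 26), so this reduces risk but does not by itself make collection lawful.

```python
import hashlib
import hmac

# Hypothetical fields a project might genuinely need; everything else is dropped.
ALLOWED_FIELDS = {"username", "caption", "like_count"}


def minimize_record(record, secret_key):
    """Keep only allowed fields and replace the username with an HMAC-SHA256 digest.

    This is pseudonymization, not anonymization: anyone holding secret_key can
    recompute the digest for a known username and re-link the record.
    """
    kept = {key: value for key, value in record.items() if key in ALLOWED_FIELDS}
    if "username" in kept:
        digest = hmac.new(secret_key, kept["username"].encode("utf-8"), hashlib.sha256)
        kept["username"] = digest.hexdigest()
    return kept


raw = {
    "username": "example_user",   # hypothetical sample data
    "email": "user@example.com",  # not needed, so it is discarded
    "caption": "hello",
    "like_count": 7,
}
key = b"keep-this-key-apart-from-the-data"  # hypothetical; store it separately
print(minimize_record(raw, key))
```

Because the same key always produces the same digest, one user's records can still be linked for analysis. Keeping the key separate from the data is what limits re-identification.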
(1) https://www.linkedin.com/pulse/data-scraping-vs-gdpr-jorge-garcia-herrero
(2) https://scrapfly.io/is-web-scraping-legal
(3) https://www.octoparse.com/blog/gdpr-compliance-in-web-scraping
Ethical Concerns
When scraping data from platforms like TikTok, it’s important to consider the ethical implications, especially around user privacy and potential data misuse. TikTok users have not consented to having their data scraped, so scraping may infringe on their privacy rights. There are also concerns that scraped TikTok data could be used to profile or target users without their knowledge.
According to an article on web scraping ethics from Towards Data Science (https://towardsdatascience.com/ethics-in-web-scraping-b96b18136f01), it’s important to carefully evaluate the scraping use case and minimize harm, especially around personal data. Scrapers should only extract publicly available data and avoid infringing on privacy. There are also concerns that scraped data could be used to manipulate users or spread misinformation.
Overall, ethical scraping requires respecting user consent and minimizing data collection. Scrapers should carefully evaluate their use case, protect private information, and be transparent about how data is used.
Exceptions and Defenses
Some exceptions and defenses may make web scraping legal in certain cases. Two of the main ones are fair use and the public availability of the data.
Fair use is a legal doctrine that permits limited use of copyrighted material without permission for purposes such as criticism, commentary, news reporting, teaching, scholarship, or research. Scraping done for one of these purposes that uses the material in a transformative way, rather than merely copying it, may qualify as fair use, although courts weigh all four statutory fair use factors case by case (Source).
Scraping or collecting data that is publicly available is another potential defense. If the website owner has not taken steps to restrict access to the data and has chosen to make it public, scraping that data is less likely to be illegal, because there is a lower expectation of privacy (Source).
In addition, scraping or crawling for non-commercial purposes such as research, journalism, or analysis may be more protected than scraping for commercial gain. Small-scale projects undertaken purely for learning may also face less scrutiny (Source).
Past Legal Cases
There have been several relevant lawsuits regarding the legality of web scraping:
In hiQ Labs, Inc. v. LinkedIn Corp., hiQ Labs scraped public LinkedIn profile data to provide analytics services. LinkedIn sent a cease-and-desist letter claiming violations of the Computer Fraud and Abuse Act (CFAA) and its terms of service. The Ninth Circuit upheld a preliminary injunction in hiQ’s favor, holding that scraping publicly available data likely does not constitute access “without authorization” under the CFAA. However, the district court later found that hiQ had breached LinkedIn’s User Agreement, and the case settled in 2022 with hiQ agreeing to a permanent injunction. The case thus narrowed CFAA claims against scrapers of public data while showing that terms-of-service claims can still succeed. [1]
By contrast, in Craigslist Inc. v. 3Taps Inc., Craigslist sued 3Taps for scraping its public listings. The court held that after Craigslist sent a cease-and-desist letter and blocked 3Taps’ IP addresses, continued scraping could constitute access “without authorization” under the CFAA, even though the listings were publicly visible. The case settled in 2015, with 3Taps agreeing to stop scraping Craigslist. [2]
LinkedIn filed another scraping lawsuit against data analytics company 100ms.ai in 2021. The case is ongoing, but 100ms.ai argues their activities fall under fair use protections. The outcome may further define protections for scrapers. [3]
Best Practices for Responsible Data Collection from TikTok
When scraping data from any website or platform, it’s important to follow ethical guidelines and legal requirements for responsible data collection. Here are some best practices to keep in mind:
Get consent and be transparent. Make your intentions clear and get permission before scraping large amounts of data. Avoid deception or trying to “hide” your scraping activities.
Follow the Terms of Service. Review TikTok’s ToS and API documentation closely and stay within the prescribed limits. Don’t engage in prohibited activities.
Limit collection and storage. Only gather the minimum data needed for your purposes. Don’t keep copies indefinitely; set a retention period and delete data once it is no longer needed.
Use data responsibly. Don’t share private data or use it for unethical purposes like harassment or discrimination.
Consider using TikTok’s API as an alternative to scraping. This provides official access to certain public data in a regulated manner. However, API access is limited and still requires following TikTok’s policies.
When in doubt, consult with legal counsel to evaluate if your specific data collection plans are permitted and ethical.
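Staying within a site’s stated crawling limits can be partly automated. The following Python sketch uses the standard library’s `urllib.robotparser` to check a robots.txt policy before each request and to honor any `Crawl-delay` between requests. The robots.txt rules, the `example-research-bot` user agent, the URLs, and the helper functions are all hypothetical. Complying with robots.txt does not replace following TikTok’s Terms of Service, which prohibit scraping regardless of what robots.txt allows.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. Against a real site you would call
# set_url() with the site's robots.txt URL and then read() instead.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-research-bot"  # hypothetical name that identifies the bot honestly


def build_policy(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    policy = RobotFileParser()
    policy.parse(robots_txt.splitlines())
    return policy


def allowed_urls(policy, urls):
    """Keep only the URLs that robots.txt permits for USER_AGENT."""
    return [url for url in urls if policy.can_fetch(USER_AGENT, url)]


def polite_fetch(policy, urls, fetch):
    """Call fetch(url) on each permitted URL, pausing Crawl-delay seconds between calls."""
    delay = policy.crawl_delay(USER_AGENT) or 1  # assume 1 second if no Crawl-delay is set
    results = []
    for i, url in enumerate(allowed_urls(policy, urls)):
        if i > 0:
            time.sleep(delay)
        results.append(fetch(url))
    return results


policy = build_policy(ROBOTS_TXT)
print(allowed_urls(policy, ["https://example.com/public/page", "https://example.com/private/data"]))
# ['https://example.com/public/page']
```

A `True` result from `can_fetch` only means robots.txt does not disallow the path; it says nothing about whether the platform’s Terms of Service or applicable law permit the collection.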
Conclusion
In conclusion, while web scraping is a common practice used in data science and analysis, there are important legal and ethical considerations when it comes to scraping data from platforms like TikTok.
The legality typically depends on factors like a platform’s Terms of Service, data protection laws, copyright law, and whether the scraping causes harm. TikTok’s Terms prohibit scraping without permission, but defenses like fair use may apply to copyright claims in some cases.
However, even if legally defensible, scraping TikTok raises ethical concerns around privacy, informed consent, and commercial use of people’s data. The platform’s young userbase also merits extra caution with their information.
Ultimately, consult legal counsel before any scraping project. Consider limiting collection, anonymizing data, and notifying users – especially minors. Weigh public benefit vs. potential harm. While scraping offers research opportunities, proceed ethically and responsibly.