TikTok is a popular social media app that allows users to create and share short videos. Since launching in 2016, TikTok has grown to over 1 billion monthly active users worldwide. TikTok’s massive user base and algorithmic video recommendations make it an enticing platform for brands and marketers.
Web scraping involves using bots or scripts to automatically extract data from websites. Scraping tools can copy text, images, videos, and other website content. Brands may want to scrape TikTok data to analyze trends, track competitors, or gain insights about TikTok’s users and platform.
However, scraping raises legal and ethical questions. TikTok’s terms prohibit unauthorized automation and data collection. Violating terms can result in litigation or bans. Laws like copyright and computer fraud statutes may also apply. This article will analyze the legality of scraping TikTok data.
What is Web Scraping?
Web scraping refers to the automated extraction of data from websites using software tools such as Python scripts or bots. According to ParseHub, web scraping involves collecting publicly available information from websites and exporting it into a format that is more useful for analysis.
As explained by GeeksforGeeks, web scrapers can extract data from HTML pages, API endpoints, JavaScript files, images, and other online sources. The scraped data is often unstructured and must be cleaned and organized programmatically. Web scrapers follow links to crawl websites in an automated fashion to extract large volumes of data.
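To make the extraction step concrete, here is a minimal sketch using only Python's standard library. It parses a hard-coded HTML string rather than a live site, and the names `LinkExtractor`, `extract_links`, and `SAMPLE_HTML` are hypothetical, chosen for illustration. Real scrapers typically add fetching, error handling, and data cleaning on top of a step like this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links a crawler would follow.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative paths against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the given HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; no network request is made.
SAMPLE_HTML = """
<html><body>
  <h1>Example page</h1>
  <a href="/about">About</a>
  <a href="https://example.com/contact">Contact</a>
  <a>No link here</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML, "https://example.com/"))
```

A crawler would feed each extracted link back into the same routine, which is how scrapers move from page to page.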
TikTok’s Terms of Service
TikTok’s Terms of Service contain several relevant sections regarding scraping data from the platform:
Section 2 outlines that users must comply with TikTok’s API Terms of Service when accessing TikTok’s APIs or using automated tools to crawl, scrape, or collect data from the platform. This implies that scraping data without using the formal APIs may violate the terms.
Section 4 prohibits reverse engineering, decompiling, or disassembling the app/API. This could apply to scraping techniques that rely on reverse engineering.
Section 5 bars activities like data mining, data scraping, data extraction, and data harvesting from the platform. This clearly prohibits most forms of scraping TikTok data.
However, TikTok’s position on scraping public data that doesn’t require logging in is somewhat unclear. The terms focus on protecting TikTok’s exclusive rights and technical measures, implying that public data may be handled differently.
Overall, scraping private user data or circumventing TikTok’s technical barriers is likely a violation of the terms. Scraping public data appears to be in more of a legal gray area according to the terms.
Copyright Law
Copyright law protects original works of authorship and gives the copyright holder exclusive rights to reproduce, distribute, publicly display, and create derivative works from their original content. This applies to websites and their content. Scraping a website without permission could be considered copyright infringement if you reproduce or distribute substantial portions of the site’s content (Associated Press v. Meltwater, 2013).
However, copyright law generally allows for “fair use” of copyrighted material for purposes like criticism, commentary, news reporting, teaching, scholarship, or research. Courts use a four-factor test to evaluate if scraping qualifies as fair use, considering the purpose of the use, nature of the work, amount copied, and commercial impact on the original work (Authors Guild v. Google, 2015). Scraping modest amounts of data for noncommercial research purposes may be considered fair use, while large-scale scraping that harms the commercial viability of the original site is less likely to qualify.
To avoid copyright issues, it’s best to only scrape data you have permission to use, scrape minimal portions of content, or use scraping for “transformative” purposes that add new meaning and don’t merely replicate the original work.
Computer Fraud and Abuse Act
The Computer Fraud and Abuse Act (CFAA) is a federal law that prohibits unauthorized access to computers and networks. Some companies have tried to use the CFAA to sue scrapers for unauthorized access to their websites. However, recent court rulings have made it difficult to successfully bring CFAA claims against web scrapers.
In hiQ v. LinkedIn, the Ninth Circuit ruled that scraping public profiles on LinkedIn did not violate the CFAA because the information was not protected by any authentication system. As long as the data is public and no circumvention of technical barriers is required, web scraping likely does not violate the CFAA according to this precedent.
The Supreme Court’s 2021 Van Buren v. United States decision also narrowed the scope of the CFAA by focusing on unauthorized access rather than use restrictions. This makes it more difficult for companies to claim that violating Terms of Service constitutes a CFAA violation.
Based on these rulings, web scraping public data is unlikely to violate the CFAA as long as no hacking or circumvention of access controls is involved. However, scraping non-public data or protected information may still risk CFAA liability.
Recent Legal Cases
There have been several high-profile legal cases involving web scraping that help establish precedent on its legality. One of the most notable is hiQ v. LinkedIn. In 2017, LinkedIn sent hiQ a cease-and-desist letter demanding that it stop scraping LinkedIn user data. HiQ sought a preliminary injunction to prevent LinkedIn from blocking its access.
The district court ruled in favor of hiQ, stating that the Computer Fraud and Abuse Act (CFAA) does not prohibit accessing publicly viewable data on websites. The court acknowledged that hiQ’s scraping could hypothetically cause harm to LinkedIn, but there was no evidence it had done so. On appeal, the Ninth Circuit affirmed the lower court’s decision, holding that scraping publicly accessible data likely does not violate the CFAA.
This landmark ruling set a precedent that scraping public profiles does not violate the CFAA, even if it violates the target website’s terms of service. As long as the data being scraped is public and accessible without authentication, web scraping does not constitute “hacking” under the CFAA. The hiQ v. LinkedIn case was a big win for proponents of web scraping.
Ethical Considerations
When scraping any site, it’s important to consider the ethics and potential privacy implications of collecting user data without consent. Some commonly cited best practices for ethical web scraping include:
- Use a public API when available, and avoid scraping altogether if the data is accessible through the API
- Be transparent by identifying yourself properly in HTTP request headers
- Scrape gently on smaller sites to avoid overloading servers
- When in doubt about what’s allowed, directly ask the website owner for permission
Overall, web scrapers should be mindful of a site’s terms of service, limitations, and content ownership. Striking a balance between gathering data and respecting websites will lead to the most ethical practices.
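As an illustration of identifying yourself in HTTP request headers, the sketch below builds a request with Python's standard `urllib.request` module. The bot name, URL, and email address are placeholders, and `build_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import urllib.request

# Placeholder identity: replace with your own project name and contact details.
CONTACT_EMAIL = "contact@example.com"
USER_AGENT = "example-research-bot/0.1 (+https://example.com/bot-info)"


def build_request(url, user_agent=USER_AGENT, contact=CONTACT_EMAIL):
    """Build a request that tells the site operator who is asking and how to reach them."""
    headers = {
        "User-Agent": user_agent,  # names the client instead of imitating a browser
        "From": contact,           # standard HTTP header carrying a contact address
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    request = build_request("https://example.com/public-page")
    print(request.header_items())
```

A descriptive User-Agent gives a site owner a way to contact you or block you cleanly, rather than having to guess who is behind the traffic.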
Scraping Best Practices
When scraping data from sites like TikTok, it’s important to follow responsible scraping practices. According to Responsible Web Scraping: An Ethical Way of Data Gathering, scrapers should:
- Respect robots.txt files and site terms of service
- Limit request frequency to avoid overloading servers
- Scrape data sparingly and only what is needed
- Avoid republishing large copied portions of content
- Use scraped data responsibly and protect user privacy
- Be transparent about scraping activities when required
As discussed in Best Practices for Web Crawling and Scraping, it’s important for scrapers to be ethical and minimize their impact on websites. Responsible scraping involves respect, moderation and transparency.
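The first two practices in the list above can be sketched with Python's standard library. This is a minimal illustration, not a complete scraper: it parses a made-up robots.txt from a string instead of fetching one, and `load_rules`, `Throttle`, `allowed_urls`, and the bot name are hypothetical names introduced here. Passing a robots.txt check does not by itself mean a site's terms of service permit scraping.

```python
import time
import urllib.robotparser

USER_AGENT = "example-research-bot"  # hypothetical bot name

# Made-up robots.txt content used in place of a live fetch.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a rule set that can answer can_fetch()."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(rules, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if rules.can_fetch(user_agent, url)]


class Throttle:
    """Enforces a minimum number of seconds between consecutive requests."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        # Sleep only for the part of the interval that has not yet elapsed.
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    # Fall back to one second if the site declares no crawl delay.
    throttle = Throttle(rules.crawl_delay(USER_AGENT) or 1)
    candidates = ["https://example.com/public/a", "https://example.com/private/b"]
    for url in allowed_urls(rules, candidates):
        throttle.wait()
        print("would fetch", url)
```

Calling `throttle.wait()` before every request keeps the pace at or below the site's declared crawl delay, however quickly the surrounding loop runs.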
Alternatives to Scraping
While scraping can provide access to large amounts of online data, there are legal alternatives that should be considered before attempting to scrape websites without permission (https://hackernoon.com/alternatives-to-web-scraping-with-python). Here are some options:
Use official APIs – Many major platforms like Twitter, YouTube, Reddit, etc. offer official APIs that allow access to certain types of public data in a structured format. These APIs often have rate limits but provide a legal means of collecting data.
Licensed data – Companies such as Gnip and DataSift have offered licensed streams of data from sites like Twitter and Facebook. While this isn’t free, it provides legal access to large datasets. There are also platforms like Quandl that aggregate publicly available data.
Public datasets – Many governments, companies, and other organizations publicly release datasets that can be legally used. Sources like Kaggle, Amazon Web Services Public Datasets, and Data.gov are places to find curated public datasets.
FOIA requests – The Freedom of Information Act (FOIA) can be used to request data from branches of the US federal government. While a somewhat slow process, FOIA requests can unlock data that is otherwise hard to access.
Conclusion
In summary, scraping TikTok data is legally risky and likely violates TikTok’s Terms of Service. Courts have not definitively ruled on how copyright law applies to scraping, and although recent rulings make Computer Fraud and Abuse Act claims over public data harder to sustain, TikTok could still sue for breach of its terms or bring CFAA claims where access controls are circumvented. The safest approach is to avoid scraping TikTok altogether or obtain express permission first. If you do decide to scrape, make sure to follow ethical practices, limit the scope and frequency, and properly credit TikTok.
The legality of web scraping remains a gray area that continues to evolve with new legislation and court rulings. However, TikTok has shown a willingness to aggressively protect its platform from scraping. Proceed with caution, consult qualified legal counsel, and stay up-to-date on the latest developments around the legality of scraping TikTok data.