How I Sniffed Xiaohongshu's Collection API in 90 Seconds - and Why CORS Made Me Rewrite the Whole Approach

This post details a practical approach to scraping data from a platform with a hostile API, specifically focusing on extracting user posts related to AI technologies from Xiaohongshu. The author faced several challenges and developed an efficient solution using Playwright for browser automation and response listening. Here are the key takeaways:

Challenges Faced

Cookie vs Login State: Initially, it was unclear whether the presence of a cookie actually meant the session was logged in.
Endpoint Discovery: Multiple endpoint guesses led to 404 errors, indicating that guessing wasn't effective.
Signed Requests: The API required signed requests which were difficult to replicate manually.

Solution Approach

Login Verification:
- Used an identity API call to verify login state instead of relying on cookies.
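A minimal sketch of that check, assuming the identity endpoint returns JSON with a user object. The field names `data`, `user_id`, and `guest` are hypothetical placeholders, not taken from the article; substitute whatever the real response contains.

```python
import json


def parse_login_state(status: int, body: str) -> bool:
    """Decide login state from an identity API reply, not from cookies.

    The response shape used here ("data", "user_id", "guest") is an
    assumption made for illustration only.
    """
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    user = payload.get("data")
    if not isinstance(user, dict):
        return False
    # A cookie can exist for an anonymous visitor, so require a real
    # user id and reject anything flagged as a guest session.
    return bool(user.get("user_id")) and not user.get("guest", False)
```

The point of the design is that the decision depends on what the server says about the session, so a stale or anonymous cookie cannot be mistaken for a login.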
Browser Sniffing:
- Instead of guessing endpoints, the author used browser automation (Playwright) to observe actual network requests made by the browser when interacting with the platform.
Response Listener:
- Listened for specific API responses (collect/page) triggered by user actions like scrolling.
- Captured paginated JSON data without needing to replicate complex signing schemes.
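The listener idea can be sketched with Playwright's Python sync API roughly as follows. Only the `collect/page` fragment comes from the summary above; the function names, scroll distance, wait time, and the assumption that the browser session is already logged in are illustrative choices, not the author's code.

```python
from typing import Any, Callable, List

TARGET_FRAGMENT = "collect/page"  # URL fragment named in the summary


def make_collector(sink: List[Any],
                   fragment: str = TARGET_FRAGMENT) -> Callable[[Any], None]:
    """Build a response handler that stores matching JSON bodies in sink."""
    def on_response(response: Any) -> None:
        # Ignore everything except successful replies from the target API.
        if fragment not in response.url or response.status != 200:
            return
        try:
            sink.append(response.json())
        except Exception:
            # Body was not JSON or was no longer available; skip it.
            pass
    return on_response


def capture_pages(start_url: str, scrolls: int = 5) -> List[Any]:
    """Open the page, scroll, and return every captured JSON page.

    Assumes the session is already authenticated (hypothetical setup);
    the browser signs its own requests, so no signing logic is needed.
    """
    from playwright.sync_api import sync_playwright  # imported lazily

    pages: List[Any] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.on("response", make_collector(pages))
        page.goto(start_url)
        for _ in range(scrolls):
            page.mouse.wheel(0, 4000)    # scrolling triggers the next page
            page.wait_for_timeout(1500)  # give the request time to finish
        browser.close()
    return pages
```

Because the handler only reads responses the page itself requested, the signed request headers never have to be reproduced by hand.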
Idle Detection:

Read the full article at DEV Community
