
Why finding a GitHub user's email is harder than you'd think


The article describes a multi-step method for resolving GitHub usernames to real-world email addresses:

Identify Relevant Data: Determine which parts of the vast GitHub activity logs (stored as monthly archives) are likely to contain data relevant to a specific user.
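A minimal Python sketch of this first step follows. It assumes the archives are keyed by calendar month as "YYYY-MM" (a hypothetical naming scheme) and that the user's activity window is already known, for example from the account's `created_at` date in GitHub's REST API. Archives outside that window are never opened. `months_to_scan` is a hypothetical helper name.

```python
from datetime import date
from typing import List


def months_to_scan(first_active: date, last_active: date) -> List[str]:
    """Return "YYYY-MM" keys for every monthly archive in the window (inclusive).

    Assumption (hypothetical scheme): archives are partitioned by calendar month
    and keyed as "YYYY-MM"; months outside the user's activity window are skipped.
    """
    if first_active > last_active:
        return []
    keys = []
    year, month = first_active.year, first_active.month
    # Walk month by month until we pass the last active month.
    while (year, month) <= (last_active.year, last_active.month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return keys


print(months_to_scan(date(2023, 11, 20), date(2024, 2, 3)))
# ['2023-11', '2023-12', '2024-01', '2024-02']
```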
Data Ingestion and Processing:
- Stream and decompress large volumes of JSON data from these archives.
- Maintain a comprehensive database of candidate email local parts (the part before '@'), indexed by their SHA-1 hashes, so that hashed, obfuscated emails found in the logs can be reversed.
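The two ingestion tasks can be sketched in Python as shown here. The sketch assumes each archive is gzip-compressed newline-delimited JSON and that an obfuscated email is stored as the hex SHA-1 digest of the local part followed by `@domain`; both are assumptions, and the function names are hypothetical. The reverse lookup is a precomputed dictionary: every candidate local part is hashed once, after which reversing an email is a single dictionary lookup.

```python
import gzip
import hashlib
import json
from typing import Dict, Iterable, Iterator, Optional


def iter_events(path: str) -> Iterator[dict]:
    """Stream one JSON event per line from a gzip archive without loading it all.

    Assumption: archives are gzip-compressed, newline-delimited JSON.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def build_local_part_table(candidates: Iterable[str]) -> Dict[str, str]:
    """Map the hex SHA-1 digest of each candidate local part back to its plaintext."""
    return {hashlib.sha1(c.encode("utf-8")).hexdigest(): c for c in candidates}


def reverse_email(obfuscated: str, table: Dict[str, str]) -> Optional[str]:
    """Recover "local@domain" if the hashed local part is in the table.

    Assumption: emails appear as "<sha1 hex of local part>@<domain>".
    Returns None when there is no "@" or the digest is unknown.
    """
    digest, sep, domain = obfuscated.partition("@")
    if not sep:
        return None
    local = table.get(digest.lower())
    return f"{local}@{domain}" if local is not None else None


table = build_local_part_table(["octocat", "jane.doe"])
hidden = hashlib.sha1(b"octocat").hexdigest() + "@example.com"
print(reverse_email(hidden, table))  # octocat@example.com
```

A table like this only recovers local parts that appear in the candidate list, so its coverage depends entirely on how comprehensive that list is.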
Identity Correlation:
- Create an index that correlates GitHub usernames with author names across all historical data to establish consistent identity patterns.
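One possible shape for the identity index is sketched below in Python. It assumes events follow GitHub's `PushEvent` layout (`actor.login` plus `payload.commits[].author.name`); the helper names are hypothetical. Counting author names per login makes the most consistently used name easy to pick out.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional


def build_identity_index(events: Iterable[dict]) -> Dict[str, Counter]:
    """Count how often each commit author name appears under each GitHub login.

    Assumption: events use the PushEvent shape, i.e. event["actor"]["login"]
    and event["payload"]["commits"][i]["author"]["name"].
    """
    index: Dict[str, Counter] = defaultdict(Counter)
    for event in events:
        if event.get("type") != "PushEvent":
            continue
        login = event.get("actor", {}).get("login")
        if not login:
            continue
        for commit in event.get("payload", {}).get("commits", []):
            name = commit.get("author", {}).get("name")
            if name:
                # Logins are case-insensitive on GitHub, so normalize the key.
                index[login.lower()][name] += 1
    return index


def most_consistent_name(index: Dict[str, Counter], login: str) -> Optional[str]:
    """Return the author name seen most often for a login, or None if unseen."""
    names = index.get(login.lower())
    return names.most_common(1)[0][0] if names else None


events = [
    {"type": "PushEvent", "actor": {"login": "octocat"},
     "payload": {"commits": [{"author": {"name": "The Octocat"}},
                             {"author": {"name": "The Octocat"}},
                             {"author": {"name": "bot"}}]}},
    {"type": "WatchEvent", "actor": {"login": "octocat"}, "payload": {}},
]
idx = build_identity_index(events)
print(most_consistent_name(idx, "OctoCat"))  # The Octocat
```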
Deduplication and Validation:
- Ensure that each resolved email address is unique and not already associated with another user.
- Verify the validity of the email addresses by attempting to send a test message (SMTP validation).
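Both checks can be sketched in Python with the standard library. The deduplication step drops any email claimed by more than one login. For validation, this sketch swaps in the common RCPT TO probe, which asks the mail server whether it would accept the recipient and then quits before any message body is sent; the summary does not say which variant the article uses. The MX host is assumed to be looked up separately, and the helper names and default sender address are hypothetical.

```python
import smtplib
from typing import Dict, Iterable, Set, Tuple


def deduplicate(resolved: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only emails that resolve to exactly one GitHub login.

    `resolved` yields (login, email) pairs. Both are lowercased before
    comparison; an email claimed by two different logins is dropped.
    """
    owners: Dict[str, Set[str]] = {}
    for login, email in resolved:
        owners.setdefault(email.lower(), set()).add(login.lower())
    return {email: next(iter(logins)) for email, logins in owners.items() if len(logins) == 1}


def smtp_accepts(email: str, mx_host: str, sender: str = "probe@example.com") -> bool:
    """Return True if the mail server answers RCPT TO with a 2xx code.

    Assumption: `mx_host` was already resolved from DNS. The connection sends
    EHLO, MAIL FROM and RCPT TO, then QUIT on exit, without sending DATA.
    """
    try:
        with smtplib.SMTP(mx_host, 25, timeout=10) as server:
            server.ehlo()
            server.mail(sender)
            code, _ = server.rcpt(email)
            return 200 <= code < 300
    except (smtplib.SMTPException, OSError):
        return False


print(deduplicate([("octocat", "a@example.com"),
                   ("octocat", "b@example.com"),
                   ("hubot", "b@example.com")]))
# {'a@example.com': 'octocat'}
```

A 2xx reply is not proof that the mailbox exists: catch-all servers accept every recipient, and some servers refuse probes from unknown senders.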
API Endpoint:
- Provide an API endpoint (https://peopledb.io/api/v1/people?github_login=octocat) that returns all relevant information about a GitHub username, including linked LinkedIn profiles and email addresses.
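A minimal client for the endpoint, using only the Python standard library, could look like this. The summary documents neither authentication nor the response schema, so this sketch assumes an unauthenticated GET that returns JSON and hands the decoded body back unchanged; `people_url` and `fetch_person` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

PEOPLEDB_ENDPOINT = "https://peopledb.io/api/v1/people"


def people_url(github_login: str) -> str:
    """Build the query URL, URL-encoding the login as the github_login parameter."""
    return f"{PEOPLEDB_ENDPOINT}?{urllib.parse.urlencode({'github_login': github_login})}"


def fetch_person(github_login: str, timeout: float = 10.0) -> Any:
    """GET the endpoint and return the decoded JSON body as-is.

    Assumptions: no authentication is required and the body is UTF-8 JSON;
    the response fields are not documented in the source.
    """
    req = urllib.request.Request(people_url(github_login),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(people_url("octocat"))
# https://peopledb.io/api/v1/people?github_login=octocat
```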

Read the full article at DEV Community
