Amazon has a secret workaround to scrape Microsoft's GitHub for AI model training data, leaked memo shows
To create powerful AI models, you need mountains of good data. Amazon is going to great lengths to collect this type of valuable information.

The company recently told employees to sign up for Microsoft's GitHub software development platform and share their accounts so Amazon can scrape data from GitHub more quickly, Business Insider has learned. This is a key step in Amazon's efforts to train its upcoming in-house AI model.

In an internal memo shared with employees last month, Amazon's Artificial General Intelligence Group wrote that it needs "quantitative and qualitative metadata from GitHub" for AI training purposes.

But there's a problem. A single GitHub account can only make 5,000 data-collection requests per hour. There are more than 150 million public data repositories on GitHub, so these account limitations mean scraping all this information would take too long, according to the memo.
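
The scale of that bottleneck can be roughed out with some back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes one request per repository, which the memo does not state, and the function name and the 100-account scenario are hypothetical.

```python
import math

def hours_to_scrape(num_repos, requests_per_hour, num_accounts=1, requests_per_repo=1):
    """Estimate hours needed to touch every repository once.

    requests_per_repo=1 is an assumption for illustration; real metadata
    collection could need several requests per repository.
    """
    total_requests = num_repos * requests_per_repo
    hourly_capacity = requests_per_hour * num_accounts
    return math.ceil(total_requests / hourly_capacity)

# Figures reported in the article: 150 million public repositories,
# 5,000 requests per hour per account.
single_account_hours = hours_to_scrape(150_000_000, 5_000)
print(single_account_hours)        # hours with one account
print(single_account_hours / 24)   # the same figure in days

# Hypothetical scenario: 100 accounts working in parallel.
print(hours_to_scrape(150_000_000, 5_000, num_accounts=100))
```

Under these assumptions a single account would need about 30,000 hours, or roughly 1,250 days, while spreading the work over many accounts cuts the time proportionally.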