[ale] 100 million Facebook pages leaked on torrent site

Jim Lynch ale_nospam at fayettedigital.com
Sun Aug 1 12:47:09 EDT 2010


On 08/01/2010 11:15 AM, Michael B. Trausch wrote:
> On Fri, 2010-07-30 at 11:55 -0400, Jim Philips wrote:
>    
>> I saw a report today that major corporations are already downloading
>> the file through BitTorrent. A free goldmine of information for them!
>>
> I have already downloaded it myself, just to take a look at what's
> actually in the whole thing.
>
> There is a *lot* of data, mostly names, but also URLs to profile pages
> for each of those names.  It's about 17GB worth of data, enough to burn
> to a BD-R for storage.  It's not indexed, just plain-text, along with
> counts for various names which could be used to determine popularity, as
> an example.
>
> I can see some of this data taking the place of 1930 Census Data in
> terms of storage of proper names, such that businesses that use the aid
> of data to parse free-form documents would benefit.
>
> Here are the ten most listed first names (with frequency of occurrence):
>
>   977014 michael
>   963693 john
>   924816 david
>   819879 chris
>   640957 mike
>   602088 james
>   584438 mark
>   515686 jason
>   503658 robert
>   484403 jessica
>
> And the ten most listed last names (also with frequency of occurrence):
>
>   913465 smith
>   571819 johnson
>   512312 jones
>   503266 williams
>   471390 brown
>   386764 lee
>   360010 khan
>   355639 singh
>   343220 kumar
>   324972 miller
>
> I guess "Michael Smith" would be the most generic name possible if you
> look at those numbers. :-)
>
>    
Hm, "Stranger in a Strange Land"
<http://en.wikipedia.org/wiki/Stranger_in_a_Strange_Land> comes to mind.


