A statistical analysis of the quality of the data on the Ellis Island website

By Edward L. Rosenbaum, © 2001

People have posted many messages to the JewishGen newsgroup asking why they have had no success in finding their ancestors in the Ellis Island Database. Regardless of the tools used, some people seem to have just swum over from Europe. To determine how hard the data is to search, I examined a large number of records that contained the ethnicity 'Hebrew'. I looked at the data using the spellings from the Ellis Island Database, and at the Daitch-Mokotoff soundex code equivalents of those spellings.

What I discovered (and it should be no surprise to those who have spent many hours searching for long-lost ancestors) is that there is no consistency in the spelling of names and places. The spellings are VERY inconsistent from record to record, and the data entry process introduced additional errors. Searching by the spelling you know for the surname and/or town may not find everyone.

The moral of this story is that you need to keep looking for your ancestors, and be very creative in the spellings that you try. Sometimes a narrow search is best, and sometimes a very broad search is best. It all depends.
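As a rough illustration of narrow versus broad searching, the sketch below compares one known spelling against a small list of candidate spellings. It is not how the Ellis Island Database or any of the search tools work. The candidate list is invented for the example, and the similarity measure (Python's standard difflib module) is an assumption chosen only because it is readily available; it is not a soundex.

```python
from difflib import get_close_matches

# Hypothetical list of spellings, invented for this example; a real
# search would run against the database, not a local list.
candidates = ["Rosenbaum", "Rozenbaum", "Rosenbom", "Rosenberg", "Katz"]


def search(known_spelling, names, cutoff):
    """Return the names whose similarity to known_spelling is at least cutoff.

    cutoff runs from 0.0 (match anything) to 1.0 (exact match only).
    Results are ordered from most to least similar.
    """
    return get_close_matches(known_spelling, names, n=len(names), cutoff=cutoff)


# A narrow search keeps only near-identical spellings.
narrow = search("Rosenbaum", candidates, cutoff=0.85)

# A broad search accepts looser matches, at the cost of more false hits.
broad = search("Rosenbaum", candidates, cutoff=0.6)

print("narrow:", narrow)
print("broad: ", broad)
```

Lowering the cutoff plays the role of a broader search: it picks up more variant spellings, but it also starts returning different surnames that merely look similar.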

Statistical findings for Surnames

Of the records examined, 23.14% of the surnames were unique.

Of these unique surnames,

The most common surnames are "Katz", "Lewin", and "Goldberg".

Some names were totally unreadable when entered into the Ellis Island Database, and were therefore entered with … (three dots). 1.28% of the unique surnames had … (three dots) somewhere in the word.
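For readers who want to run similar counts on their own extract of records, here is a minimal sketch of how percentages of this kind could be computed. It is not the program used for this analysis. It assumes that "unique" means the number of distinct spellings divided by the number of records examined, and that an unreadable entry is marked with either the single character … or three periods. The sample list is invented.

```python
from collections import Counter

# Assumption: unreadable text appears as "…" or as three periods.
UNREADABLE_MARKERS = ("…", "...")


def has_unreadable_marker(value):
    """True if the value contains a marker for unreadable text."""
    return any(marker in value for marker in UNREADABLE_MARKERS)


def summarize(values):
    """Summarize a list of surnames (or residences) as simple percentages."""
    counts = Counter(v.strip() for v in values)
    total = sum(counts.values())   # number of records examined
    distinct = len(counts)         # number of distinct spellings
    with_marker = sum(1 for v in counts if has_unreadable_marker(v))
    return {
        "records": total,
        "distinct": distinct,
        "pct_distinct": 100.0 * distinct / total,
        "pct_distinct_with_marker": 100.0 * with_marker / distinct,
        "most_common": counts.most_common(3),
    }


# Invented sample data, for illustration only.
sample = ["Katz", "Katz", "Lewin", "Goldberg", "Gold...rg", "Katz", "Lewin", "K…z"]
stats = summarize(sample)
for key, value in stats.items():
    print(key, value)
```

The same function can be applied to a list of residences without any changes.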

Statistical findings for Daitch-Mokotoff soundex codes of the surnames

When examining the surnames by Daitch-Mokotoff soundex, I found that

Statistical findings for Residences

Of the records examined, 24.98% of the residences were unique.

Of these unique residences,

The most common towns were "London, England", "London", and "Russia".

Some towns were totally unreadable when entered into the Ellis Island Database, and were therefore entered with … (three dots). 0.49% of the residences consisted only of … (three dots), and 0.66% of the unique residences had … (three dots) somewhere in the word.

Statistical findings for Daitch-Mokotoff soundex codes of the residences

When examining the residences by Daitch-Mokotoff soundex, I found that

Tools for searching the Ellis Island Database

For more information, see the JewishGen FAQ