Evolution of unisex first names in the US

 

Data

https://www.ssa.gov/oact/babynames/limits.html

The data comes from the United States Social Security Administration (SSA), which gathers each year the first names of all babies born in the U.S. The data officially starts in 1880, but the numbers reported for the early years are heavily underestimated. For example, 2.7 million live births were reported in 1900, but our first-name dataset only contains 590 thousand births for that year.

From 1920 onward, the dataset is much closer to the real number of births, so the analysis below only uses data from 1920 on. Note also that rare names (given fewer than 5 times in a year) do not appear in the dataset, for confidentiality reasons. That is why the dataset total and the total number of live births will never match exactly.
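As a rough sketch of the loading step, the snippet below parses one year of data. It assumes each yearly file holds plain `name,sex,count` lines; the function name `parse_year_file`, the constant `MIN_YEAR` and the inline sample are illustrative choices, and only the 1974 Jaime counts come from this post.

```python
import csv
import io

MIN_YEAR = 1920  # assumption: years before 1920 are dropped, as explained above


def parse_year_file(text, year):
    """Turn the text of one yearly file into (year, name, sex, count) tuples."""
    rows = []
    if year < MIN_YEAR:
        return rows  # under-reported period, ignored in this sketch
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, sex, count = row
        rows.append((year, name, sex, int(count)))
    return rows


# Tiny inline sample standing in for a real yearly file.
sample_1974 = "Jaime,M,1092\nJaime,F,259\n"
records = parse_year_file(sample_1974, 1974)
```

With real data, the same function would simply be called once per yearly file and the resulting lists concatenated.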

Analysis

Many analyses can be made with this kind of data, but we will focus on unisex names, i.e. names that are commonly given to boys and girls in comparable proportions.

The data transformation is simple: for each year and each first name, we count the number of times it was given to boys and to girls. We can then compute the boys/girls ratio for this first name in this year.
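The counting step can be sketched in a few lines of plain Python. This is only an illustration, not necessarily the code used for the graphs: the `(year, name, sex, count)` record layout and the function name `boys_girls_ratio` are assumptions, and the sample rows are the 1974 Jaime counts quoted further down.

```python
from collections import defaultdict

# Illustrative records in (year, name, sex, count) form.
records = [
    (1974, "Jaime", "M", 1092),
    (1974, "Jaime", "F", 259),
]


def boys_girls_ratio(records):
    """Return {(year, name): (boys, girls, ratio)}.

    The ratio is None when no girl was given the name that year,
    since the division would not be defined.
    """
    counts = defaultdict(lambda: {"M": 0, "F": 0})
    for year, name, sex, count in records:
        counts[(year, name)][sex] += count

    result = {}
    for key, c in counts.items():
        ratio = c["M"] / c["F"] if c["F"] else None
        result[key] = (c["M"], c["F"], ratio)
    return result


ratios = boys_girls_ratio(records)
boys, girls, ratio = ratios[(1974, "Jaime")]
print(f"{boys} boys, {girls} girls, ratio {ratio:.2f}")
```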

Most first names are given mostly to one gender; few are unisex, and even fewer show a changing trend over time. Only first names given to at least 30K boys and 30K girls in total were kept.
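The 30K filter described above could look like the sketch below. The record layout `(year, name, sex, count)`, the function name `unisex_candidates` and the sample names and counts are all made up for illustration; only the threshold comes from the text.

```python
from collections import defaultdict


def unisex_candidates(records, min_per_sex=30_000):
    """Names whose all-years totals reach min_per_sex for boys AND for girls."""
    totals = defaultdict(lambda: {"M": 0, "F": 0})
    for _year, name, sex, count in records:
        totals[name][sex] += count
    return sorted(
        name
        for name, t in totals.items()
        if t["M"] >= min_per_sex and t["F"] >= min_per_sex
    )


# Hypothetical totals, spread over two years, just to exercise the filter.
demo_records = [
    (1990, "NameA", "M", 20_000), (1991, "NameA", "M", 15_000),
    (1990, "NameA", "F", 31_000),
    (1990, "NameB", "M", 90_000), (1990, "NameB", "F", 1_200),
]
kept = unisex_candidates(demo_records)
```

Here `NameA` passes (35K boys, 31K girls) while `NameB` is dropped because too few girls were given that name.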

How to read the graph:

There is a value for each year indicating the total number of babies given this first name that year. The vertical position of this value depends on the boys/girls ratio for the name. The higher in the graph (and the further into the blue region), the bigger the boys-to-girls ratio. It can go well above 10 times more boys than girls. Conversely, the further into the pink area, the more the ratio favors girls.

Each ratio also comes with a confidence interval (the orange vertical box) that shows where the true ratio probably lies (in a year with only a few hundred babies with this name, the ratio can be heavily influenced by sampling error).
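The post does not say how the interval is computed, so the following is only one plausible way to get such a box: a 95% Wilson score interval on the share of boys, converted back to a boys/girls ratio. The choice of method, the function name `ratio_interval` and the default `z=1.96` are assumptions.

```python
import math


def ratio_interval(boys, girls, z=1.96):
    """Approximate confidence interval for the boys/girls ratio.

    Computes a Wilson score interval for p = boys / (boys + girls),
    then maps each bound q to the ratio q / (1 - q).
    """
    n = boys + girls
    if n == 0:
        raise ValueError("no births for this name and year")
    p = boys / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom

    def to_ratio(q):
        return q / (1 - q) if q < 1 else math.inf

    return to_ratio(centre - half), to_ratio(centre + half)


# 1974 Jaime counts from the text, and the same proportions with 10x fewer babies.
low, high = ratio_interval(1092, 259)
low_small, high_small = ratio_interval(109, 26)
print(f"large sample: {low:.2f} to {high:.2f}")
print(f"small sample: {low_small:.2f} to {high_small:.2f}")
```

The second call illustrates the point about sampling error: with ten times fewer babies and nearly the same proportions, the interval gets noticeably wider.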

The following example is for Jaime. From the 1940s to 1975, roughly between 100 and 1000 babies were given that name each year, with 4 to 10 times more boys than girls. For example, in 1974 there were 1092 boys named Jaime and 259 girls (hence a ratio of around 4).

But then, suddenly in 1976, the number of Jaimes exploded and the ratio completely flipped: 9240 babies were named Jaime that year and 84% of them were girls, quite a turnaround!













