This project shows methods of mapping 1940 surname data for Nova Scotia, based on voter lists. The names were mapped in ArcMap to create a hard-copy map and an ArcGIS Online application, while a CSV database was loaded onto CartoDB for use with a Leaflet template, allowing users to query surnames and see their distribution. The idea was inspired by an interactive map of Irish surname data. The data for this project was transcribed from Ancestry.ca.
2. TOPICS
Historical Context of Nova Scotia (where, who, etc)
Data Preparation (Ancestry Microfilms, CSV organization)
Using CartoDB to store Surname Data in CSV format
Using a Leaflet template to query and present data for
individual surnames
Using ArcMap to present cartographic data
Prepare data within ArcMap to use with AGOL and Poster
3. OVERVIEW
This project consisted of genealogical research to
create a database for three Nova Scotia counties.
The goal was to map the distribution of surnames
and their frequencies within particular
communities.
This will be beneficial to those who are interested in
finding possible home communities for their ancestors
(it is possible that students' great-grandparents are in
this database)
May benefit historical groups looking to analyze
temporal change in distribution of common surnames.
4. HISTORICAL CONTEXT
Historical groups and trends affecting this study area:
English (New England Planters, Yorkshiremen in Cumberland
Co., Loyalists, etc.)
Scottish (Highlanders mostly within Pictou Co.): 1700s –
1800s
Ulster-Scots (Colchester Co.), After 1760.
French-Swiss Huguenots (Northumberland Shore within all
three counties). Via Lunenburg, 1770s.
5. DATA PREP - USING VOTER LISTS
Only three counties could be processed due to time restrictions.
Census lists are not legally attainable after 1921.
Voter lists are readily available from 1935 onwards on Ancestry.ca
Provided in microfilm format using Optical Character Recognition
(OCR) technology, which is indexed for users to query data.
The data can often be incorrect:
Surnames with Mac as a prefix are often incorrectly indexed.
Incorrect identification of letters such as c and e, s and z, etc. (Dclancy vs. Delaney,
Mackensic vs. MacKenzie, Eraser vs. Fraser, with many other examples).
Several missing counties.
Limitations: No record of Natives, no voters under 21.
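One possible way to flag likely OCR misreads is to compare each indexed surname against a list of known surnames and suggest the closest match. The sketch below uses Python's standard difflib module; it is not part of the original workflow, and the KNOWN_SURNAMES list and the 0.6 similarity cutoff are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical reference list; a real one would come from a verified surname index.
KNOWN_SURNAMES = ["MacKenzie", "Fraser", "Delaney"]


def suggest_surname(ocr_text, known=KNOWN_SURNAMES, cutoff=0.6):
    """Return the known surname closest to an OCR reading, or None if nothing is similar."""
    # Compare in lower case so that "Mackensic" can match "MacKenzie".
    lookup = {name.lower(): name for name in known}
    matches = get_close_matches(ocr_text.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


for reading in ["Dclancy", "Mackensic", "Eraser"]:
    print(reading, "->", suggest_surname(reading))
```

A suggestion like this would still need a manual check against the microfilm image, since two genuine surnames can also be very similar.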
6. CREATING A DATABASE
An xlsx was used originally, which was converted to a csv so
that it could be uploaded to CartoDB
An Excel formula was used to return the last name of each
individual
=IF(ISERROR(FIND(" ",A2)),A2,RIGHT(A2,LEN(A2)-FIND("~",SUBSTITUTE(A2," ","~",LEN(A2)-LEN(SUBSTITUTE(A2," ",""))))))
Problematic names: names with Jr, two words (ex.: Van
Buskirk).
Finalized fields: PERSONID, Raw_Name (which was later
excluded to conserve space), LASTNAME, LOCATION, COUNTY,
LATITUDE, LONGITUDE.
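The last-name formula can also be sketched in Python, which makes the problematic cases easier to handle. In the sketch below, last_word mirrors the spreadsheet logic (text after the final space), while surname is a hypothetical extension that drops suffixes and keeps two-word surnames. The SUFFIXES and PREFIXES lists are assumptions for illustration, not lists used in the project.

```python
# Assumed lists for illustration; extend them to suit the data.
SUFFIXES = {"jr", "jr.", "sr", "sr."}
PREFIXES = {"van", "de", "le"}


def last_word(raw_name):
    """Mirror the spreadsheet formula: return the text after the final space."""
    return raw_name.rsplit(" ", 1)[-1]


def surname(raw_name):
    """Like last_word, but skip suffixes such as Jr and keep prefixes such as Van."""
    words = raw_name.split()
    # Drop trailing suffixes so "John Fraser Jr" yields "Fraser".
    while len(words) > 1 and words[-1].lower() in SUFFIXES:
        words.pop()
    # Keep a recognized prefix with the final word, e.g. "Van Buskirk".
    if len(words) >= 2 and words[-2].lower() in PREFIXES:
        return " ".join(words[-2:])
    return words[-1] if words else ""


print(last_word("John Van Buskirk"))  # the formula's behaviour
print(surname("John Van Buskirk"))    # the extended behaviour
```

A prefix list like this can misfire when a given name happens to match a prefix, so the results would still need to be reviewed by hand.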
7. HOW DOES THE VOTER LIST DATA APPEAR?
The data is incorrectly transcribed as Miss Oeorgle Alkens (Miss George Aikens
is correct).
8. CARTODB
CartoDB is an online mapping service, which allows users to
upload a CSV, Esri shapefile, among other files.
Users can store 5 MB of data, use 5 unique tables, and have
up to 10,000 map views per month for free.
Users can manipulate the map by using SQL queries to display
desired data.
CartoCSS language is editable to help style data, while users
can modify an info window, and use a visualization wizard to
alter the display properties.
Problem: A slight cost is necessary for larger datasets, which
will likely be encountered with province-wide genealogical data.
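As a rough illustration of the SQL queries mentioned above, the sketch below builds a request URL for a surname lookup. It assumes the CartoDB SQL API endpoint pattern (account.cartodb.com/api/v2/sql), a hypothetical account name and table name, and lower-case column names (lastname, county) after import; none of these come from the project itself. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode


def surname_query_url(account, table, surname, county=None):
    """Build a SQL API URL selecting one surname, optionally limited to a county."""

    def quote(value):
        # Double any single quotes so names such as O'Brien stay inside the SQL string.
        return "'" + value.replace("'", "''") + "'"

    sql = f"SELECT * FROM {table} WHERE lower(lastname) = lower({quote(surname)})"
    if county:
        sql += f" AND lower(county) = lower({quote(county)})"
    # format=GeoJSON asks for point features that a web map can draw directly.
    params = urlencode({"q": sql, "format": "GeoJSON"})
    return f"https://{account}.cartodb.com/api/v2/sql?{params}"


# "example" and "surnames_1940" are placeholder names, not the project's own.
print(surname_query_url("example", "surnames_1940", "Fraser", "Pictou"))
```

The table name is inserted into the SQL text as-is, so it should come from the application's own configuration rather than from user input.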
10. LEAFLET AND CARTODB
Open-source JavaScript library, which is very lightweight (only about 33 KB).
Can be used for several mapping sources, including Esri and Google.
Uses HTML5 and CSS3.
For this project, a name search is used to select any record with the
specified surname, along with a dropdown to filter the results by county.
A JavaScript function was added, which queried the CSV in CartoDB based
on locations within a specific county.
The spiderfy option allows users to click on a location at a suitable level,
with all of the symbols spreading outward from the central point.
Note: A thank-you to Ed Symons for helping demonstrate a Leaflet template.
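The spiderfy idea can be illustrated with a little geometry: markers that share one location are moved onto a circle around it so that each can be clicked. The sketch below is a simplified illustration in planar screen units, not the algorithm of the Leaflet plugin itself; the function name and the radius value are assumptions.

```python
import math


def spiderfy_positions(center, count, radius):
    """Spread `count` markers evenly on a circle of `radius` around `center`."""
    if count <= 0:
        return []
    cx, cy = center
    step = 2 * math.pi / count  # equal angle between neighbouring markers
    return [
        (cx + radius * math.cos(i * step), cy + radius * math.sin(i * step))
        for i in range(count)
    ]


# Four records at the same community, spread 10 units from the shared point.
for x, y in spiderfy_positions((0.0, 0.0), 4, 10.0):
    print(round(x, 2), round(y, 2))
```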
11. LEAFLET AND CARTODB
Fraser was the 2nd most commonly occurring name within the three
counties in 1940, with over 1200 records. However, Fraser was the
most commonly occurring name in Pictou County (over 1000).
13. PROCESSING THE DATA IN ARCMAP: LOCATION
The original xlsx was used.
A Locations feature class was created by referencing two
feature classes from the nscccods2 SDE geodatabase: places
and roads. This feature class was used largely for gathering
coordinates to use for the CartoDB CSV.
Points were placed near the community or between two
communities if the electoral district consisted of both
communities.
Estimations were made, based on looking at addresses for
individuals, if a community could not be found.
14. PROCESSING THE DATA IN ARCMAP: PART 2
As mentioned, the Locations feature class was best used as a
reference for coordinates.
The spreadsheet was spatially represented in ArcMap by
displaying XY fields as Latitude and Longitude, and was saved
as a feature class.
The Frequency tool was run to count the occurrences of every
surname within each community, leaving 15,000 individual
records for the 135 communities.
The surnames were ordered by both frequency and community,
meaning that the top five values could easily be selected and
exported into a new feature class.
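The frequency and top-five steps can be sketched outside ArcMap as well. The example below counts surnames per community and keeps the five most frequent for each. It is a minimal sketch using Python's standard library; the sample records are invented for illustration and are not project data.

```python
from collections import Counter, defaultdict


def top_surnames(records, n=5):
    """Count surnames per community and return the n most frequent for each."""
    counts = defaultdict(Counter)
    for location, lastname in records:
        counts[location][lastname] += 1
    # most_common sorts by descending frequency, like ordering the Frequency output.
    return {location: counter.most_common(n) for location, counter in counts.items()}


# Invented sample rows of (LOCATION, LASTNAME).
sample = [
    ("Community A", "Fraser"), ("Community A", "Fraser"), ("Community A", "Fraser"),
    ("Community A", "MacKenzie"), ("Community A", "MacKenzie"),
    ("Community A", "Delaney"),
    ("Community B", "Aikens"),
]
print(top_surnames(sample, n=2))
```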
16. USING ET GEO WIZARDS TO STYLIZE SURNAME
DATA
Without further processing, the new feature class would have displayed all
five surnames for a location at a single point.
The Build Thiessen Polygon Surface tool was used to create a polygon that
contained the community/location coordinate.
Helped determine placement of the random points and allowed the
cartographer to refer to a spatial region that a community had influence
over.
Random Points in Polygons tool could be used to generate five points
per polygon, creating a total of 675 new points. The ID field had to be
multiplied by 10 to create 5 points for each.
Both fields were exported and converted to a csv with FME Quick Translator
to reorder by location and be assigned a uniqueID.
These tables were brought back into ArcMap, and joined based on the
matching ObjectIDs.
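The combined effect of the Thiessen polygon and random point steps can be approximated without building polygons, because a point lies inside a community's Thiessen polygon exactly when that community is its nearest one. The sketch below uses this shortcut with rejection sampling. It is not the ET GeoWizards method, and the community names, planar coordinates, and bounding box are invented for illustration.

```python
import random
from math import hypot


def nearest_site(point, sites):
    """Return the name of the site closest to the point (its Thiessen cell owner)."""
    x, y = point
    return min(sites, key=lambda name: hypot(x - sites[name][0], y - sites[name][1]))


def random_points_per_site(sites, bbox, per_site=5, seed=0, max_tries=100_000):
    """Draw random points in bbox until every site's cell holds per_site points."""
    rng = random.Random(seed)
    min_x, min_y, max_x, max_y = bbox
    points = {name: [] for name in sites}
    tries = 0
    while any(len(found) < per_site for found in points.values()):
        tries += 1
        if tries > max_tries:
            raise RuntimeError("Could not fill every cell; check bbox or max_tries.")
        candidate = (rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
        owner = nearest_site(candidate, sites)
        if len(points[owner]) < per_site:  # discard extras for cells already full
            points[owner].append(candidate)
    return points


# Invented planar coordinates for three communities.
sites = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (5.0, 8.0)}
result = random_points_per_site(sites, bbox=(-2.0, -2.0, 12.0, 10.0))
print({name: len(found) for name, found in result.items()})
```

With 135 communities and five points each, the same approach would produce the 675 points described above. Distances here are planar, so real coordinates should be projected first.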
17. HOW DOES THIS LOOK?
Result of Thiessen Polygon and Random Point Tools. The communities are
symbolized with the larger symbols, while the random points are blue.
18. LABELING THE FEATURES
Three new label classes were created to indicate the frequency
of surnames within a community:
Greater than 50 within a community, 20 to 50 within a
community, and less than 20 within a community.
Proportional labelling symbols were applied.
An effort was made to only have labels on-land and keep them
a reasonable distance from county boundaries.
Labeling was difficult for some coastal communities, as the
font was represented as being larger than the geographical
area.
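The three label classes amount to a simple threshold rule on the frequency value. A minimal sketch follows, assuming that the boundary values 20 and 50 both fall in the middle class, as the wording above suggests; the function name and class labels are chosen for illustration.

```python
def label_class(frequency):
    """Assign a surname frequency to one of the three label classes."""
    if frequency > 50:
        return "greater than 50"
    if frequency >= 20:
        return "20 to 50"
    return "less than 20"


for value in (75, 50, 20, 19):
    print(value, "->", label_class(value))
```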
19. POINTS EDITED FOR LABELING
In this example, the labeling is very basic and has no settings applied.
The labels have been moved to enhance visibility, but more work will be needed.
The points largely follow the Thiessen polygons.
21. EXCEPTION…
Due to the high population density, large number of communities,
and very common surnames, the Thiessen Polygon strategy does not
work well for Central Pictou Co.
22. ANNOTATIONS
Annotations allow the data to be easily
placed in any location. Users can move
annotations in an edit session.
For this dataset, there may be many
unplaced labels. These labels can still
be drawn and edited in the same style
as the other annotations.
Individual annotations can be edited to
appear however the user would like
them to appear.
Label classes can also have their
original settings overridden if, for some
reason, the user wants to change
them.
Overlapping labels can be used with
annotations.
23. OTHER CARTOGRAPHIC ADDITIONS
DEM
Shapefile with dissolved counties to symbolize
coastline (copied layer 3 times, different outline
widths).
Community names
Labeled counties, water bodies
25. USING AGOL TO SHOW SURNAME FREQUENCIES
The inspiration for this map was Mapping the Emerald Isle: a
geo-genealogy of Irish Surnames found here
http://storymaps.esri.com/stories/ireland/
ArcGIS Online can be used to display label text, but the labels
must be converted to a shapefile first.
The data can be loaded into and used within the Find, Edit,
Filter downloadable application, but this application has a more
limited functionality than the Irish Surname story map.
26. STEPS NEEDED TO CREATE AGOL-FRIENDLY
LABELS: 1
Labels converted to annotations, containing limited information.
The annotations and the original top five surname feature class were joined
to link the feature ID of the annotation with the original object ID of a
feature.
The Feature Outline Masks tool was used to create unique features.
Coordinate system: WGS 1984 Web Mercator Auxiliary Sphere. This will
allow the features to draw correctly.
The margin around the feature must be zero.
Mask kind is set to exact; the mask will only include the annotation.
A mask was only created for placed features; this is useful for
lower-scale zoom levels.
All features were transferred, which will allow the user to rejoin the
original feature class to these new features based on the Object ID and
the Feature ID.
27. STEPS NEEDED TO CREATE AGOL-FRIENDLY
LABELS: 2
The Top 5 Surnames feature class was joined to the annotations, using the
FeatureID (annotation) and the ObjectID (Top5Names).
The Top 5 Surnames feature was joined to the output of the Feature Outline
Masks tool (using the FeatureID and ObjectID, again).
By doing so, features in AGOL will have meaningful attributes, such as
community, county, and frequency.
Each feature will need to be created four times, which will account for four
different zoom levels (based on the ArcGIS - Google Maps – Bing online
zoom levels): 1:144,448, 1:288,895, 1:577,791, and 1:1,155,581.
Minimum and Maximum zoom levels need to be set at each step.
This map shows how the features look when two feature
classes are not controlled by zoom settings.
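The four scales listed above follow the standard web map tiling scheme, in which the scale denominator halves with each zoom level. The sketch below reproduces them, assuming the commonly published level 0 scale of 1:591,657,527.591555 (at 96 DPI) for the ArcGIS Online, Google Maps, and Bing scheme. The level numbers 9 to 12 are derived from that assumption rather than stated in the slides.

```python
# Assumed scale denominator at zoom level 0 for the common web tiling scheme.
LEVEL_0_SCALE = 591657527.591555


def scale_for_level(level):
    """Return the map scale denominator for a web map zoom level."""
    return LEVEL_0_SCALE / (2 ** level)


for level in (12, 11, 10, 9):
    print(f"level {level}: 1:{round(scale_for_level(level)):,}")
```

Setting each layer's minimum and maximum scale to neighbouring values in this series keeps only one of the four copies visible at a time.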
28. PUBLISHING TO AGOL
Create a new document with only the four shapefiles that will
be used on AGOL.
Publish as a feature service.
Turn off tiled mapping.
Mark as exception: Layer does not have a feature template set,
and Map is being published with data copied to the server
using data frame full extent. These will not affect the output.
29. USING A CONFIGURABLE APP FROM AGOL
The Find, Edit, Filter application from Esri allows
users to easily query and select data.
This application can be downloaded and configured
to suit the user’s needs.
The web map created from the Surname feature
service was used.
The filter and edit features were not useful for this
assignment (changing the data is not desirable).
The 144K layer was used.
The LASTNAME field was used in the find field
section.
The result fields are the LASTNAME, Location,
Frequency, and County.
A zoom level of 13 was applied for selected features.
30. AGOL APPLICATION IN ACTION
https://cogsnscc.maps.arcgis.com/apps/Solutions/s2.html?appid=611860d390a7423a900f09e7036cb06e
31. CONCLUSIONS AND LIMITATIONS
Useful for family identification
There are several possibilities for this data, as long as
there is a geography
Programming knowledge may come in handy to further
the research for this project
Ancestry: Missing counties, time-consuming, incorrect
transcriptions
Possible add-in: ethnic origin symbolization (would
require extensive research for each name)
Leaflet/CartoDB: cost-effective options for genealogy
groups who cannot afford ArcGIS