This document discusses using NoSQL databases such as MongoDB for real estate data applications. Real estate data comes from a variety of sources, including MLS listings, public records, and third-party providers, and has both structured and unstructured components. MongoDB allows flexible storage of this heterogeneous data and provides the horizontal scalability and geo queries these applications need. Zulloo uses MongoDB and Meteor for its real estate data platform to provide common data storage and access for both the web and machine learning teams. The team plans to explore graph databases and GPU databases going forward.
4. What is NoSQL?
NoSQL encompasses a wide variety of different database technologies that were developed in response to the demands presented in
building modern applications:
● Developers are working with applications that create massive volumes of new, rapidly changing data types — structured, semi-
structured, unstructured and polymorphic data.
● Long gone is the twelve-to-eighteen month waterfall development cycle. Now small teams work in agile sprints, iterating quickly
and pushing code every week or two, some even multiple times every day.
● Applications that once served a finite audience are now delivered as services that must be always-on, accessible from many
different devices and scaled globally to millions of users.
● Organizations are now turning to scale-out architectures using open source software, commodity servers and cloud computing
instead of large monolithic servers and storage infrastructure.
5. Business Application
● Personalization
● Profile Management
● Real-Time Big Data
● Content Management
● Catalog
● Customer 360° View
● Mobile Applications
● Internet of Things
● Digital Communications
● Fraud Detection
7. Real Estate Data
● Multiple Listing Service (MLS)
○ History of sales and current properties on the market
○ Property descriptions
○ Structured data (literally just a big spreadsheet)
○ However, fields change over time
○ Different locations have different MLS providers and thus different formats
● Public Records
○ A lot of missing data
○ A huge variety of data (demographics, crime rates, etc.)
● 3rd party providers
○ School reviews
○ Proximity to POIs
○ Same as public records but cleaned up
8. Real Estate Data
● Taken together, MLS, public records, and 3rd party providers yield heterogeneous data
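To make the heterogeneity concrete, here is a minimal Python sketch of mapping records from two MLS feeds onto common field names before storing them as documents. The feed names, field names, and sample values are all hypothetical and not taken from any real MLS provider; the point is only that unknown fields can be kept and missing fields left out, which a document store tolerates.

```python
from typing import Any

# Two hypothetical MLS feeds describing listings with different field names.
FEED_A = {"ListPrice": 450000, "Beds": 3, "Baths": 2, "Remarks": "Sunny corner lot"}
FEED_B = {"price_usd": "389,900", "bedrooms": 2, "description": "Near downtown", "HOA": 120}

# Per-provider mapping from source field name to a common field name (assumed).
FIELD_MAPS = {
    "feed_a": {"ListPrice": "price", "Beds": "bedrooms",
               "Baths": "bathrooms", "Remarks": "description"},
    "feed_b": {"price_usd": "price", "bedrooms": "bedrooms",
               "description": "description"},
}


def normalize(record: dict[str, Any], provider: str) -> dict[str, Any]:
    """Map a provider-specific record onto common field names.

    Fields without a mapping are kept under "raw" rather than dropped.
    Fields the provider does not supply are simply absent from the result.
    """
    mapping = FIELD_MAPS[provider]
    doc: dict[str, Any] = {"provider": provider, "raw": {}}
    for key, value in record.items():
        if key in mapping:
            doc[mapping[key]] = value
        else:
            doc["raw"][key] = value
    # Prices may arrive as strings such as "389,900"; coerce them to int.
    if isinstance(doc.get("price"), str):
        doc["price"] = int(doc["price"].replace(",", ""))
    return doc


doc_a = normalize(FEED_A, "feed_a")
doc_b = normalize(FEED_B, "feed_b")
```

Both `doc_a` and `doc_b` could sit in the same collection even though `doc_b` has no `bathrooms` field and carries an extra unmapped field under `raw`.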
9. AVMs
● An automated valuation model (AVM) is a service that provides real estate property valuations using mathematical modelling combined with a database
● A typical AVM uses hedonic regression, meaning the property value is decomposed into contributions from characteristics such as:
○ number of bedrooms, bathrooms
○ size of lot
○ distance to the city center, schools, etc.
○ etc.
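A hedonic regression can be sketched as an ordinary least-squares fit of price against property characteristics. The example below is a minimal illustration in pure Python, not a production AVM: the data is synthetic (generated from price = 50 + 30·bedrooms + 12·lot − 4·distance, in thousands), and the feature choice and units are assumptions made for the example.

```python
def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0, *row] for row in rows]  # prepend an intercept column
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefs, row):
    """Price as intercept plus the sum of per-feature contributions."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Synthetic listings: (bedrooms, lot size in 1000 sq ft, distance to center in km).
ROWS = [
    (2, 3.0, 10.0),
    (3, 4.0, 8.0),
    (3, 5.5, 12.0),
    (4, 6.0, 5.0),
    (4, 7.5, 15.0),
    (5, 8.0, 3.0),
]
# Synthetic prices in thousands, noise-free by construction.
PRICES = [106.0, 156.0, 158.0, 222.0, 200.0, 284.0]

coefs = fit_ols(ROWS, PRICES)  # [intercept, per bedroom, per 1000 sq ft, per km]
estimate = predict(coefs, (3, 5.0, 10.0))
```

Each fitted coefficient reads as the implicit price of one unit of that characteristic, which is the decomposition the bullet above describes. Real data would be noisy and would have missing features, so a real model would need imputation and validation on top of this.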
10. Zulloo Approach
● Real Estate data:
○ Data is a combination of structured and unstructured
○ Missing features
○ Geo-specific
● Our goals
○ Common storage for web-development
and data science teams
○ Horizontal scalability
○ Geo queries
[Diagram: 3rd party providers, MLS, public records]
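As an illustration of the geo queries goal, the sketch below builds a MongoDB `$near` filter for listings around a point. The `location` field name and the sample listings are hypothetical. `$near` with `$geometry` and `$maxDistance` (in meters) requires a `2dsphere` index on the field. The haversine helper is only a local stand-in to show what the server-side query does; it is not how MongoDB computes distances internally.

```python
import math


def near_query(lng: float, lat: float, max_meters: float) -> dict:
    """Build a MongoDB $near filter on a GeoJSON point field.

    Assumes a hypothetical `location` field with a 2dsphere index.
    """
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lng, lat]},
                "$maxDistance": max_meters,
            }
        }
    }


def haversine_m(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """Approximate great-circle distance in meters on a spherical Earth."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def within(docs: list, lng: float, lat: float, max_meters: float) -> list:
    """Local stand-in for the server-side query: filter, then sort nearest first."""
    hits = []
    for doc in docs:
        dlng, dlat = doc["location"]["coordinates"]
        dist = haversine_m(lng, lat, dlng, dlat)
        if dist <= max_meters:
            hits.append((dist, doc))
    return [doc for _, doc in sorted(hits, key=lambda h: h[0])]


# Hypothetical listings stored with GeoJSON points ([longitude, latitude]).
LISTINGS = [
    {"_id": "far", "location": {"type": "Point", "coordinates": [0.0, 0.1]}},
    {"_id": "mid", "location": {"type": "Point", "coordinates": [0.0, 0.01]}},
    {"_id": "close", "location": {"type": "Point", "coordinates": [0.0, 0.001]}},
]

query = near_query(0.0, 0.0, 2000)
nearby_ids = [doc["_id"] for doc in within(LISTINGS, 0.0, 0.0, 2000)]
```

With pymongo, the filter would typically be passed to `collection.find(query)` after creating the index with `collection.create_index([("location", "2dsphere")])`; the collection itself is assumed here.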
11. Things to consider
● As a startup
○ Availability of free support
○ Roadmap
● Speed considerations
○ Comparison: PostgreSQL vs MongoDB
15. Next big thing
● Graph databases
○ RE data is just a huge graph anyway
● GPU databases
○ Speed(!!!) but only SQL so far :(
● ApolloStack/GraphQL/etc
○ Clean API between backends and frontends
○ Less time spent digging through API documentation