SQL Server – How to handle NULL data
Duncan Greaves MSc MCSE
Postgraduate Researcher
The Problem with NULL
 There is a problem with NULL that has persisted since the relational model was proposed in the 1970s.
 “The simple scientific fact is that an SQL table that contains a null isn’t a relation; thus, relational theory doesn’t apply, and all bets are off.” C. J. Date (2014).
 The presence of NULLs in a database ‘breaks’ the two-valued Boolean logic of the relational model on which SQL databases rely.
 In ‘real world’ applications of data structures, NULLs are often unavoidable.
 NULLs confuse users, and designers and DBAs hate them.
 Users need to be aware of the design and query compromises that NULLs force on them.
Three Valued Logic
 The SQL language is based on the relational model and its two-valued (TRUE/FALSE) logic.
 Adding NULL values to a database breaks the TRUE/FALSE logic implicit in the model and leads to three truth values: TRUE, FALSE and UNKNOWN.
 At best this increases complexity, requiring horizontally decomposed WHERE clauses, workaround syntax and inference.
 At worst it leads to incomplete information, returned error codes, interoperability problems and interpretation problems.
 It messes up Reporting, ETL, Business Intelligence and Data Science initiatives.
 It reduces confidence in results. This applies to ALL systems, not just databases.
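A minimal sketch of three-valued logic in practice, assuming SQL Server with ANSI_NULLS ON (the usual connection default). A CASE expression, like a WHERE clause, only takes the THEN branch when its condition is TRUE, so an UNKNOWN result falls through to ELSE. The column aliases are illustrative only.

```sql
-- Assumes SQL Server (T-SQL); ANSI_NULLS ON gives ISO-standard NULL comparisons.
SET ANSI_NULLS ON;

SELECT
    CASE WHEN NULL = NULL        THEN 'TRUE' ELSE 'not TRUE' END AS NullEqualsNull,    -- UNKNOWN
    CASE WHEN NOT (NULL = NULL)  THEN 'TRUE' ELSE 'not TRUE' END AS NotNullEqualsNull, -- NOT UNKNOWN is still UNKNOWN
    CASE WHEN 1 = 1 OR  NULL = 1 THEN 'TRUE' ELSE 'not TRUE' END AS TrueOrUnknown,     -- TRUE
    CASE WHEN 1 = 1 AND NULL = 1 THEN 'TRUE' ELSE 'not TRUE' END AS TrueAndUnknown,    -- UNKNOWN
    CASE WHEN 1 = 0 AND NULL = 1 THEN 'TRUE' ELSE 'not TRUE' END AS FalseAndUnknown;   -- FALSE
```

Only TrueOrUnknown reports 'TRUE': a single UNKNOWN operand is enough to stop most conditions from being TRUE.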
Domain Knowledge
 SQL databases model a domain, so the designer needs to define what the domain encompasses by setting its boundaries and identifying its components and relationships.
 Almost by definition, the designer will have incomplete information about what is relevant, especially when implementing new systems.
 NULL is stored as a flag, so it is not part of any particular domain or type; query complexity is introduced when we make assumptions about what a NULL means.
NULL is not a value: it is not zero and it is not an empty string; it is unknown.
NULL Data Scenarios
 Existence – The attribute does not exist in the domain, or our understanding of the domain is wrong. E.g. eye colour for a car.
 Missing – The information was not given when the row was created. E.g. a customer may decline to give their age.
 Not Yet – The data is contingent upon an unknown future event. E.g. termination date or date of death.
 Does not apply – The attribute is not applicable to this instance of a record. E.g. hair colour for bald people, number of pregnancies for male patients.
 Placeholders – We know that a piece of data exists, but not what it is. NULL also acts as a placeholder in the subtotal and grand-total rows produced by CUBE or ROLLUP queries.
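In CUBE and ROLLUP output a placeholder NULL can collide with a genuinely missing value. A minimal sketch, using a hypothetical temp table #Sales, of how the GROUPING() function tells the two apart:

```sql
-- Hypothetical temp table; the NULL Region in the third row is genuinely missing data.
IF OBJECT_ID('tempdb..#Sales') IS NOT NULL DROP TABLE #Sales;
CREATE TABLE #Sales (Region varchar(10) NULL, Amount int NOT NULL);
INSERT INTO #Sales (Region, Amount)
VALUES ('North', 10), ('South', 20), (NULL, 5);

-- ROLLUP adds a grand-total row whose Region is a NULL placeholder.
-- GROUPING(Region) = 1 marks that placeholder; the data row with a
-- missing Region has GROUPING(Region) = 0.
SELECT Region,
       GROUPING(Region) AS IsRollupRow,
       CASE WHEN GROUPING(Region) = 1 THEN 'All regions'
            ELSE ISNULL(Region, 'Unknown') END AS RegionLabel,
       SUM(Amount) AS Total
FROM #Sales
GROUP BY ROLLUP (Region)
ORDER BY IsRollupRow, Region;
```

Labelling the rows explicitly keeps "all regions" (total 35) distinct from "region unknown" (total 5) in any report built on the query.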
Handling NULL in Queries
 NULLIF
 Syntax: NULLIF(expression1, expression2)
 Returns NULL if both expressions are equal; otherwise returns the first expression.
 ISNULL to replace a NULL with a specified value
 Syntax: ISNULL(check_expression, replacement_value)
 Returns replacement_value when check_expression is NULL. replacement_value must be implicitly convertible to the data type of check_expression, and the result has the data type of check_expression.
 COALESCE to return the first non-NULL expression
 Syntax: COALESCE(exp1, exp2, … expn)
 Accepts multiple input expressions.
 Returns the data type of the expression with the highest data type precedence.
 May be marginally slower than ISNULL, because it is expanded into a CASE expression.
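A short sketch of the three functions side by side, assuming SQL Server; the variables @ShortCode and @Divisor are hypothetical stand-ins for nullable columns. The ISNULL/COALESCE pair shows the data-type rule above in action: ISNULL silently truncates the replacement to the type of its first argument.

```sql
SET ANSI_NULLS ON;

-- Hypothetical variables standing in for nullable columns.
DECLARE @ShortCode varchar(3) = NULL;
DECLARE @Divisor   int        = 0;

SELECT
    NULLIF(10, 10)                    AS NullifEqual,     -- NULL: the arguments are equal
    NULLIF(10, 5)                     AS NullifDifferent, -- 10: the first argument
    100 / NULLIF(@Divisor, 0)         AS SafeDivide,      -- NULL instead of a divide-by-zero error
    ISNULL(@ShortCode, 'Unknown')     AS IsnullResult,    -- 'Unk': result is varchar(3), like @ShortCode
    COALESCE(@ShortCode, 'Unknown')   AS CoalesceResult,  -- 'Unknown': result type is varchar(7)
    COALESCE(NULL, @ShortCode, 'n/a') AS FirstNonNull;    -- 'n/a': first non-NULL argument
```

The NULLIF(divisor, 0) pattern is a common idiom for turning a divide-by-zero error into a NULL result that can then be handled explicitly.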
Handling NULL in WHERE Clauses
 WHERE clauses use three-valued logic (TRUE, FALSE, UNKNOWN). UNKNOWN is a logical outcome and is not the same as NULL. Only rows for which the condition is TRUE are returned; rows that evaluate to FALSE or UNKNOWN are discarded.
 To test for NULL we have to use the IS NULL and IS NOT NULL operators in the WHERE clause, not the = operator.
 IS NULL
 SELECT * FROM Customers WHERE CustName IS NULL
 IS NOT NULL
 SELECT * FROM Customers WHERE CustName IS NOT NULL
 Use horizontal decomposition to add other conditionals
 SELECT * FROM Customers WHERE CustName IS NULL OR CustName = 'Bob'
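A runnable sketch of why = NULL fails and why horizontal decomposition is needed, assuming SQL Server with ANSI_NULLS ON. #Customers is a hypothetical temp table modelled on the Customers examples; the key case is the <> filter, which silently drops the NULL row.

```sql
SET ANSI_NULLS ON;

-- Hypothetical temp table modelled on the Customers examples.
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
CREATE TABLE #Customers (CustID int NOT NULL PRIMARY KEY, CustName varchar(50) NULL);
INSERT INTO #Customers (CustID, CustName)
VALUES (1, 'Bob'), (2, 'Alice'), (3, NULL);

-- No rows: CustName = NULL evaluates to UNKNOWN for every row.
SELECT * FROM #Customers WHERE CustName = NULL;

-- Customer 3 only.
SELECT * FROM #Customers WHERE CustName IS NULL;

-- Customer 2 only: NULL <> 'Bob' is UNKNOWN, not TRUE, so customer 3 is lost.
SELECT * FROM #Customers WHERE CustName <> 'Bob';

-- Horizontal decomposition: an explicit IS NULL branch brings customer 3 back.
SELECT * FROM #Customers WHERE CustName <> 'Bob' OR CustName IS NULL;
```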
Environment and Aggregate Settings
 ANSI_NULLS environment setting.
 The setting in effect when a stored procedure or user-defined function is created or altered is saved with it and used every time it runs.
 This option specifies how comparisons with NULL behave. When it is ON, a comparison such as column = NULL or column <> NULL evaluates to UNKNOWN, so WHERE column = NULL returns zero rows even if NULLs are present. When it is OFF, = NULL returns the rows that contain NULL and <> NULL returns the rows that do not.
 Keep it at the default value of ON; setting it OFF is deprecated.
 “Null value is eliminated by an aggregate or other SET operation”
 This warning is raised when an aggregate such as SUM, AVG, MIN, MAX or COUNT(column) skips NULLs in its input.
 The result may therefore be based on incomplete information.
 Use ISNULL or COALESCE to substitute an explicit value where that is meaningful.
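A sketch of the aggregate behaviour described above, using a hypothetical temp table #Orders. COUNT(*) counts rows, while COUNT(column) and AVG skip the NULL (and, with ANSI_WARNINGS ON, raise the warning). Substituting a value with ISNULL changes the answer, so it is only appropriate when that value really is meaningful.

```sql
SET ANSI_NULLS ON;
SET ANSI_WARNINGS ON;  -- the "Null value is eliminated..." warning is only raised when this is ON

-- Hypothetical temp table with one missing discount.
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
CREATE TABLE #Orders (OrderID int NOT NULL, Discount decimal(5,2) NULL);
INSERT INTO #Orders (OrderID, Discount)
VALUES (1, 10.00), (2, NULL), (3, 20.00);

SELECT COUNT(*)                 AS AllRows,          -- 3: counts rows, no warning
       COUNT(Discount)          AS RowsWithDiscount, -- 2: the NULL is skipped
       AVG(Discount)            AS AvgOfKnownValues, -- 15: averaged over 2 rows, not 3
       AVG(ISNULL(Discount, 0)) AS AvgNullAsZero     -- 10: NULL deliberately treated as 0
FROM #Orders;

-- Arithmetic on a NULL simply yields NULL (order 2 here), without any warning.
SELECT OrderID, Discount * 2 AS DoubleDiscount
FROM #Orders;
```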
Table Design Guidelines
 Design integrity into your tables.
 Use NOT NULL and CHECK() constraints where possible.
 Do not choose a column as the primary key if there is ANY possibility that its value could be NULL; SQL Server does not allow NULLs in primary key columns.
 Avoid nullable columns in FOREIGN KEY relationships.
 Consider moving optional attributes or relationships into separate tables to get around this.
 Use default field values where appropriate.
 Bear in mind the arithmetic consequences of using 0 or -99 as defaults: they are included in SUM, AVG and other calculations.
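A minimal table design sketch applying these guidelines, using a hypothetical #Employees temp table. It also shows a subtle point: a CHECK constraint only rejects rows where it evaluates to FALSE, so a NULL in a checked column passes the check as UNKNOWN.

```sql
-- Hypothetical employee table illustrating NOT NULL, CHECK and DEFAULT.
IF OBJECT_ID('tempdb..#Employees') IS NOT NULL DROP TABLE #Employees;
CREATE TABLE #Employees (
    EmployeeID      int         NOT NULL PRIMARY KEY,               -- primary key columns can never be NULL
    FamilyName      varchar(50) NOT NULL CHECK (LEN(FamilyName) > 0), -- rejects '' as well as NULL
    HireDate        date        NOT NULL DEFAULT CAST(SYSDATETIME() AS date), -- default instead of NULL
    TerminationDate date        NULL,                               -- "Not Yet": legitimately unknown
    -- A CHECK constraint only rejects rows where it evaluates to FALSE.
    -- With a NULL TerminationDate it evaluates to UNKNOWN, so the row is accepted.
    CHECK (TerminationDate >= HireDate)
);

-- Accepted: HireDate takes its default and TerminationDate stays NULL.
INSERT INTO #Employees (EmployeeID, FamilyName) VALUES (1, 'Smith');

-- Rejected with error 547 (CHECK constraint conflict) if uncommented:
-- INSERT INTO #Employees (EmployeeID, FamilyName, HireDate, TerminationDate)
-- VALUES (2, 'Jones', '2020-01-01', '2019-12-31');
```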
App Design Guidelines
Take steps to stop host programs from passing NULL values into the database.
 Initialise variables
 Use defaults and appropriate auto-filling of values
 Deduce values where possible
 Track missing data using companion codes
 Determine the impact of missing data
 Validate data and prevent audit difficulties
 Use consistent data types and nullability across apps
 NULL is not the string “NULL”
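A sketch of two of these points together, using a hypothetical #Contacts table. The PhoneMissingCode column and its codes ('D' declined, 'N' not yet collected, 'A' not applicable) are an illustrative assumption of what a companion code might look like. The third row shows why the string 'NULL' written by a host program is a real value that neither IS NULL nor the constraints can catch.

```sql
SET ANSI_NULLS ON;

IF OBJECT_ID('tempdb..#Contacts') IS NOT NULL DROP TABLE #Contacts;
CREATE TABLE #Contacts (
    ContactID        int         NOT NULL PRIMARY KEY,
    Phone            varchar(20) NULL,
    -- Hypothetical companion code recording WHY Phone is missing.
    PhoneMissingCode char(1)     NULL CHECK (PhoneMissingCode IN ('D', 'N', 'A')),
    -- A missing phone must have a reason code, and a known phone must not.
    CHECK ((Phone IS NULL AND PhoneMissingCode IS NOT NULL)
        OR (Phone IS NOT NULL AND PhoneMissingCode IS NULL))
);
INSERT INTO #Contacts (ContactID, Phone, PhoneMissingCode)
VALUES (1, '555-0100', NULL),
       (2, NULL,       'D'),
       (3, 'NULL',     NULL);   -- a host program wrote the text 'NULL'

-- Contact 2 only: the string 'NULL' in contact 3 is a real value.
SELECT * FROM #Contacts WHERE Phone IS NULL;

-- Contact 3 only: text that merely looks like NULL.
SELECT * FROM #Contacts WHERE Phone = 'NULL';
```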
ETL Guidelines
 Where multiple fields may contain NULL, consider using a check
code field to indicate where records need attention.
 Check the NULL status of each field (for example with CASE WHEN Field IS NULL THEN 1 ELSE 0 END) and build a count of the number of fields that fail validation.
 Use this as part of the data cleansing process.
 Use it in ETL and scrubbing tables.
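A minimal sketch of the check-code idea on a hypothetical staging table #StagingCustomer, with a hypothetical CheckCode column holding the number of NULL fields per row. Each column is tested with IS NULL inside a CASE expression; substituting values with ISNULL(Field, 0) instead would make a genuine 0 indistinguishable from a missing value.

```sql
IF OBJECT_ID('tempdb..#StagingCustomer') IS NOT NULL DROP TABLE #StagingCustomer;
CREATE TABLE #StagingCustomer (
    RowID     int          NOT NULL,
    CustName  varchar(50)  NULL,
    Email     varchar(100) NULL,
    Age       int          NULL,
    CheckCode int          NULL      -- hypothetical: number of fields failing validation
);
INSERT INTO #StagingCustomer (RowID, CustName, Email, Age)
VALUES (1, 'Bob', 'bob@example.com', 42),
       (2, NULL,  'x@example.com',   NULL),
       (3, NULL,  NULL,              NULL);

-- Count the NULL fields in each row and store the total in CheckCode.
UPDATE #StagingCustomer
SET CheckCode = CASE WHEN CustName IS NULL THEN 1 ELSE 0 END
              + CASE WHEN Email    IS NULL THEN 1 ELSE 0 END
              + CASE WHEN Age      IS NULL THEN 1 ELSE 0 END;

-- Rows that need attention during cleansing, worst first.
SELECT RowID, CheckCode
FROM #StagingCustomer
WHERE CheckCode > 0
ORDER BY CheckCode DESC;
```

In a real ETL flow the same expression could be computed while loading the staging table rather than in a separate UPDATE.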
Master Data Management
Master data management is about catching bad information before it enters the database and cleaning up what is already there.
 MDS
 Master Data Services is included with SQL Server (Enterprise and Developer editions).
 Allows Models, Entities, Attributes, Rules and Versions to be defined and implemented.
 Includes an Excel add-in that allows power users or analysts to define models and rules.
 DQS
 Data Quality Services helps ensure domain validity and knowledge-driven data quality.
 Good for data correction, enrichment, standardization, and de-duplication.
 Other third party applications available
 Master Data Maestro, etc.
Performance Considerations
Time spent designing appropriate data quality controls will reduce the cost of maintaining the database, because NULLs:
 Can hinder index use, for example when a nullable column is wrapped in ISNULL() or COALESCE() inside a WHERE clause.
 Can increase query retrieval and search times.
 Increase SQL code complexity.
 Decrease confidence in the information gained from the database.
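One well-known way NULL handling slows index use is a non-sargable predicate. A sketch using a hypothetical #Invoices table: wrapping the indexed column in ISNULL typically forces a scan, while a plain IS NULL predicate can use an index seek. On a table this small the optimizer may scan either way, so compare actual execution plans on realistic data volumes.

```sql
-- Hypothetical invoices table with an index on a nullable column.
IF OBJECT_ID('tempdb..#Invoices') IS NOT NULL DROP TABLE #Invoices;
CREATE TABLE #Invoices (InvoiceID int NOT NULL PRIMARY KEY, PaidDate date NULL);
CREATE INDEX IX_Invoices_PaidDate ON #Invoices (PaidDate);
INSERT INTO #Invoices (InvoiceID, PaidDate)
VALUES (1, NULL), (2, '2024-01-15'), (3, NULL);

-- Not sargable: the column is hidden inside ISNULL, so the optimizer
-- typically has to evaluate the function for every row (an index scan).
SELECT InvoiceID FROM #Invoices
WHERE ISNULL(PaidDate, '1900-01-01') = '1900-01-01';

-- Sargable: a plain IS NULL predicate on the column can use an index seek.
-- Same result here, assuming no invoice is genuinely paid on 1900-01-01.
SELECT InvoiceID FROM #Invoices
WHERE PaidDate IS NULL;
```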
Data Science
NULLs may provide the catalyst for data science work to discover why data is missing, or what data or domain knowledge we are lacking.
 NULL indicates that a value is not known, or that data is missing or incomplete.
 May point to missing entities or uncaptured events.
 May skew the results of data tools that disregard NULL values.
 Known knowns, known unknowns and unknown unknowns: data discovery and knowledge begin by examining what is unexplained.
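A simple profiling query is often the first step in that examination. A sketch on a hypothetical #Survey table: it reports the share of missing values per column and shows how AVG, which disregards NULLs, describes only the respondents who answered. If higher or lower earners are more likely to decline, that average is skewed.

```sql
-- Hypothetical survey data with missing ages and incomes.
IF OBJECT_ID('tempdb..#Survey') IS NOT NULL DROP TABLE #Survey;
CREATE TABLE #Survey (RespondentID int NOT NULL, Age int NULL, Income int NULL);
INSERT INTO #Survey (RespondentID, Age, Income)
VALUES (1, 25, 30000), (2, 35, NULL), (3, NULL, NULL), (4, 45, 60000);

-- Share of missing values per column: a first clue to where data
-- or domain knowledge is lacking.
SELECT
    COUNT(*)                                      AS Respondents,      -- 4
    100.0 * (COUNT(*) - COUNT(Age))    / COUNT(*) AS PctAgeMissing,    -- 25 percent
    100.0 * (COUNT(*) - COUNT(Income)) / COUNT(*) AS PctIncomeMissing, -- 50 percent
    AVG(Income)                                   AS AvgKnownIncome    -- 45000, from only 2 of 4 respondents
FROM #Survey;
```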
Summary
NULL values have always been present in SQL databases, but they degrade the quality of the information that can be obtained from the data source.
NULLs can have an adverse effect on downstream systems, in particular Reporting, BI, Predictive Analytics and Machine Learning, which rely on the integrity of the data.
Reduce the impact on your information by:
 Managing the quality of data going in
 Designing tables with integrity constraints
 Designing apps to validate the input
 Designing queries to ensure correct results are returned
Use NULLs as clues to pick up where domain knowledge is lacking.
Further Reading
MS Handling NULL Values: https://technet.microsoft.com/en-us/library/ms191504(v=sql.105).aspx
https://www.simple-talk.com/sql/t-sql-programming/how-to-get-nulls-horribly-wrong-in-sql-server/
greavesd@uni.coventry.ac.uk
@duncan_greaves
InformationWithInsight.com
