A complete revisiting of the corporate data architecture and its respective best practices is in order because of cloud computing and big changes in computing technology and software development. In some cases, a complete inversion has occurred (in the best way) to solve a particular problem. To be competitive, organizations need to take advantage of these new ways of doing things. Massive data and information is out there if we can just grasp it. Below are some principles and practices on how we can better deal with data going forward.
Seven best practices for revolutionizing your data
Store First, Analyze Later: Disk is cheap. We can’t always predict what data will be important later. Store first and ask questions later. With scalable infrastructure and today’s hardware economics, it’s okay if a piece of data turns out to never be used. The schema flexibility of NoSQL technology facilitates this. For example, with a customer document, adding additional fields of information at a later date is easy even if they were not envisioned initially.
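As a minimal sketch of the customer-document example, the Python below uses plain dictionaries as stand-ins for documents in a schema-flexible store. The field names (`loyalty_tier`, `referral_source`) and the in-memory `customers` list are illustrative assumptions, not part of any particular database's API.

```python
import json

# Hypothetical in-memory "collection"; a real document store would persist these.
customers = []

# The shape of a customer document as first envisioned.
customers.append({"id": 1, "name": "Ada Example", "email": "ada@example.com"})

# Later, a new document carries fields nobody planned for.
# No schema change or migration step is required to store it.
customers.append({
    "id": 2,
    "name": "Bo Example",
    "email": "bo@example.com",
    "loyalty_tier": "gold",           # assumed field, added later
    "referral_source": "newsletter",  # assumed field, added later
})


def loyalty_tier(doc):
    """Read a late-added field, tolerating older documents that lack it."""
    return doc.get("loyalty_tier", "none")


tiers = [loyalty_tier(c) for c in customers]
serialized = json.dumps(customers)  # both document shapes serialize the same way
```

Readers of the data supply a default for fields that older documents do not have, which is the price paid for skipping the up-front schema.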
Default to Real-time: Historically, data processing and analysis have been done via batch processing. We defaulted to batch processing because it’s computationally efficient; however, given Moore’s law and the passage of time, we now have much more power at our disposal. We can afford to do more work to get real-time answers instead of answers tomorrow. NoSQL and fast storage technologies (such as solid state disk) make real-time possible. Your organization should deliver recommendations, personalization and business metrics immediately. Default to real-time and go to batch only when necessary.
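One way to picture the difference is a business metric that is updated as each event arrives rather than recomputed overnight. The sketch below is an assumption-laden illustration: `RunningMetric`, `batch_mean` and the order values are invented for this example.

```python
class RunningMetric:
    """Keeps a count, total and mean current as each event arrives."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        """Fold one new event into the metric and return the fresh mean."""
        self.count += 1
        self.total += value
        return self.mean

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


def batch_mean(values):
    """The batch alternative: wait for all the data, then compute once."""
    return sum(values) / len(values) if values else 0.0


orders = [20.0, 35.0, 50.0]  # made-up order values
metric = RunningMetric()
live_means = [metric.update(v) for v in orders]  # an answer after every event
```

Both paths arrive at the same final number; the incremental version simply has an answer available after every event instead of only at the end of the run.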
Structure Shouldn’t Hold You Back: It’s easy to store basic stock information – for example (ticker, high, low, close) – in any database. What about a complete derivative security? How do we store that in the database, especially given that new securities are invented all the time? A legal contract’s terms? How do we store polymorphic information or data we weren’t aware of a priori? Historically a few methods have been most common: the relational database for data with very precise structuring; completely unstructured data (“BLOBs”); and things in the middle, such as spreadsheets. The latter two formats are mostly useless for integration into your applications, yet the volume of such data is massive. With the rise of dynamic document-oriented data models (using JSON), semi-structured, complex structured, and polymorphic data can be stored, accessed and organized just as efficiently as the more rigidly structured data that has been in databases traditionally.
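To make the polymorphism point concrete, here is a small sketch in which a simple stock record and a derivative with nested terms sit side by side as JSON documents. The `type` discriminator field, the option's fields and the `describe` helper are assumptions chosen for illustration; the ticker and prices are made up.

```python
import json

# Two differently shaped documents in the same hypothetical collection.
securities = [
    {"type": "stock", "ticker": "XYZ", "high": 12.5, "low": 11.8, "close": 12.1},
    {
        "type": "option",
        "underlying": "XYZ",
        "strike": 13.0,
        "expiry": "2030-01-18",
        "terms": {"style": "european", "right": "call"},  # nested structure
    },
]


def describe(doc):
    """Dispatch on the document's own 'type' field instead of a fixed schema."""
    kind = doc.get("type")
    if kind == "stock":
        return f"{doc['ticker']} closed at {doc['close']}"
    if kind == "option":
        terms = doc["terms"]
        return f"{terms['style']} {terms['right']} on {doc['underlying']} at {doc['strike']}"
    return f"unknown security type: {kind}"


# Both shapes survive a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(securities))
summaries = [describe(d) for d in round_tripped]
```

A newly invented security type would be stored as one more document shape; only the code that interprets it needs to learn about the new fields.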
Agility Is Key: The software development world has moved from classic “waterfall” software development lifecycles to more agile, or iterative, methodologies (for example, Scrum). These methods’ rapid iteration allows organizations to deliver features and enhancements to end users quickly and effectively. To work this way, we need new tools that are agile-compatible. Version control, continuous integration and programming languages have adapted already. We need similar adaptation by the database if we want to make software development nimble and productive. NoSQL technologies facilitate iteration in the data model much the same way as you iterate with your code.
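One common way to iterate on a document data model alongside the code is to tag documents with a version and upgrade older ones when they are read. This is a sketch of that general pattern, not something the article prescribes; the `schema_version` field and the name-splitting change are hypothetical.

```python
def upgrade(doc):
    """Bring an older document up to the current shape when it is read."""
    doc = dict(doc)  # work on a copy so the caller's document is untouched
    version = doc.get("schema_version", 1)
    if version < 2:
        # Hypothetical iteration 2: split the single "name" field in two.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        doc["schema_version"] = 2
    return doc


old = {"name": "Ada Example"}  # written by an earlier release
new = {"schema_version": 2, "first_name": "Bo", "last_name": "Example"}
upgraded = [upgrade(d) for d in (old, new)]
```

Old and new documents can coexist in the same store, so a data model change can ship in the same iteration as the code that needs it.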
One Size Doesn’t Fit All: One-size-fits-all is over. Use multiple database technologies as part of your standard enterprise technology platform. You won’t want dozens – that would be far too complex – but more than one is optimal. A good model for the future is to have three primary tools: an RDBMS, a relational data warehouse and a NoSQL database. For each project or sub-problem, use whichever tool is best. Augment with niche tools (e.g., a time series database) for special cases. The above approach is highly compatible with service-oriented architectures, which you should be using. Monolithic hub-and-spoke architectures lead to late projects and unchangeable systems. Instead, build web services with each one potentially having its own database or data mart behind it.
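The service-per-datastore idea can be sketched as two small services that each own their storage and expose only methods to callers. Everything here is illustrative: `OrderService` and `ProfileService` are hypothetical names, in-memory SQLite stands in for a relational database, and a dictionary stands in for a document store.

```python
import sqlite3


class OrderService:
    """Hypothetical service with its own relational store (in-memory SQLite)."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
        )

    def place_order(self, customer_id, total):
        cur = self._db.execute(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            (customer_id, total),
        )
        self._db.commit()
        return cur.lastrowid

    def total_for(self, customer_id):
        row = self._db.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        return row[0]


class ProfileService:
    """Hypothetical service with its own document-style store (a dict here)."""

    def __init__(self):
        self._docs = {}

    def save(self, customer_id, profile):
        self._docs[customer_id] = dict(profile)

    def get(self, customer_id):
        return self._docs.get(customer_id, {})


orders = OrderService()
profiles = ProfileService()
profiles.save(7, {"name": "Ada Example", "interests": ["databases"]})
orders.place_order(7, 40.0)
orders.place_order(7, 2.5)
```

Callers never touch either store directly, so each service is free to pick the database that suits its sub-problem and to change it later.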
Go Commodity: The rise of commodity hardware as a viable production platform has made it possible to deploy multi-node systems quickly. Newer database technologies are designed with commodity servers in mind. Companies are moving away from “big iron” servers and embracing this approach. Adopting a commodity server deployment model reduces dependency on proprietary mechanisms and often avoids vendor lock-in. Find the sweet spot on the price-performance curve and buy servers of that size. Don’t buy $1k servers; you’ll have too many to manage (or even plug in!). But don’t go too big either. Many organizations are standardizing on $10k commodity Xeon (or AMD) based servers with gigabit Ethernet.
Use Solid State Drives – a Lot: Traditional spinning disks have increased in capacity and data transfer rates by a factor of one thousand, yet the random I/O times have barely budged over a decade. If you are doing any random I/O at all, you should use SSDs instead. Commodity SATA-style SSDs can work surprisingly well. Be sure to mirror them – they still fail even though there are no moving parts (except electrons!). Reserve 20%+ of the disk’s space as un-partitioned to give the drive room to optimize random writes and avoid excess “write amplification”.
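The 20% reservation is simple arithmetic, sketched here as a helper. The function name `partition_plan` and the 500 GB drive are assumptions for the example; only the 20% figure comes from the text.

```python
def partition_plan(capacity_gb, reserve_percent=20):
    """Split a drive into (partitioned_gb, unpartitioned_gb).

    reserve_percent is the share left un-partitioned so the drive has
    spare room to optimize random writes.
    """
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    reserved = capacity_gb * reserve_percent / 100
    return capacity_gb - reserved, reserved


# Hypothetical 500 GB SSD with the suggested 20% left un-partitioned.
usable_gb, reserved_gb = partition_plan(500)
```

Since the advice is "20%+", passing a larger `reserve_percent` models a more conservative layout.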
For sequential I/O, stick to spinning disks. Thus, use spinning disks for Hadoop batch processing and for backups. Some have predicted that eventually 99% of data may be stored on spinning disks yet 99% of accesses will be happening on SSDs. With spinning disks being the main place for backups, that is conceivable.
Source: forbes.com
Recommended by:
Jon Cohn, CTO, VP IT Architecture
https://www.linkedin.com/in/jonacohn
joncohn@comcast.net