Presentation to the Open Data Charter Implementation Working Group meeting, November 2019 on ways to manage the barriers to open government data release.
No matter where in the world you are, there are common challenges, or barriers, to people being comfortable with releasing open data. This presentation is about how to manage the major challenges to releasing open data.
1. Open Data Challenges
Paul Stone, NZ Open Government Data Programme,
Data Leadership and Capability, Stats NZ.
ODC Implementation Working Group, November 2019
2. Quality
Some data is better than no data…
1. Good context
• Purpose of collection
• Method of collection
• Known strengths and weaknesses
2. Opportunity for feedback
3. Creative Commons Licence
Image: Quality by Nick Youngson CC BY-SA 3.0 Alpha Stock Images
3. Accuracy
• Context – inform how data is collected and for what purpose.
• Be upfront about limitations in the data or risk of human error
• Create a feedback loop - invite data users to contribute and help lift the accuracy of the data
4. Creative Commons Licence – not liable for quality
THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE LICENSED MATERIAL… THIS INCLUDES… FITNESS FOR A PARTICULAR PURPOSE…
5. Ignorant misuse
Context, context, context.
• Describe the collection method and purpose
• Describe well the variables in the data and what they mean
• Don’t assume everyone is ignorant.
6. Malicious misuse
Don’t let one bad potential use prevent the opportunity for many good uses.
Further clause on liability in CC licence.
7. Creative Commons Licence – not liable for reuse (misuse)
…IN NO EVENT WILL THE LICENSOR BE LIABLE TO YOU ON …DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR USE OF THE LICENSED MATERIAL…
8. Reputation
• Transparency builds trust
• Good context reduces misunderstanding
• An opportunity to contribute to data quality is an opportunity to engage and build relationships
All good reasons to release the data – with good context.
9. Lack of resources
• Open by design
• Prioritise
o Alignment to organisational strategy
o Demand
• Don’t re-invent the wheel
• Start small
10. Lack of technology
• Keep it simple
• Start small
• Every agency has a website
• Most agencies have API capability but just don’t use it
12. Culture
Probably the biggest challenge. Shifting the mindset of the whole organisation to open by default.
Challenges are manageable, not insurmountable.
Key messages – like “we are custodians of a public data asset”