We'll consider standard and less standard approaches to NER and name resolution. We'll also see how similarity queries can be used for name resolution and how the results depend on the type of similarity.
Oleksiy Shashlyuk "Named-entity recognition and name resolution using similarity queries"
1.
2. About the company
PitchBook is all about investments. The company's main goal is to provide comprehensive information about who (a company or an individual) invested money in which assets, when, how, and on what terms, and how those assets were used to earn dividends.
From the very beginning PitchBook focused on private equity (PE) investments, then gradually expanded to venture capital (VC / startups), while partially covering the financials of public companies.
PitchBook tracks all possible investment activity (deals, funds) and provides a user-friendly view that lets clients search, analyze, and export data in the way most convenient for them.
4. Problem statement
● 20,000+ news items per day
● 8,000,000+ companies in the database
● 20+ tracked events (Revenue, EBITDA, Public offering, etc.)
● 5+ name types (FORMAL, FAMILIAR, FORMER, LEGAL, PARENT, etc.)
6. Pretrained models
● 16 languages
● 3 models for most languages (small, medium, large)
● 18 entity types
● trained on the OntoNotes 5 corpus
● visualization is available
7. Pretrained models
● 5 languages
● 3 models (3, 4, and 7 classes)
● trained on CoNLL 2003
● requires Java to be installed
8. Pretrained models
● 8 languages
● 2 models (4 and 8 classes)
● trained on CoNLL 2003 and OntoNotes
● built on top of PyTorch
9. Training your own NER model
NER is usually cast as token-level classification: each token is assigned to one of several possible classes.
BIOES scheme. A prefix marking the token's position within the entity span is added to the entity label (for example, ORG for organizations):
1) B – beginning, the first token of the entity span
2) I – inside, tokens in the middle of the entity span
3) E – ending, the last token of the entity span
4) S – single, the entity consists of a single token.
Tokens that do not belong to any entity get the label O (outside).
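The BIOES scheme can be sketched in a few lines of Python. This is only an illustration: the function name `bioes_tags`, the `(start, end, label)` span format with an exclusive end, and the example sentence are assumptions, not something taken from the slides.

```python
def bioes_tags(n_tokens, spans):
    """Return one BIOES tag per token.

    spans: non-overlapping (start, end, label) triples, `end` exclusive
    (a hypothetical input format chosen for this sketch).
    """
    tags = ["O"] * n_tokens  # tokens outside any entity stay "O"
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"      # single-token entity
        else:
            tags[start] = f"B-{label}"      # first token of the span
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"      # tokens in the middle
            tags[end - 1] = f"E-{label}"    # last token of the span
    return tags


tokens = ["PitchBook", "tracks", "London", "Stock", "Exchange", "Group", "deals"]
spans = [(0, 1, "ORG"), (2, 6, "ORG")]
tags = bioes_tags(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A token classifier is then trained to predict exactly these per-token tags.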
10. Training your own NER model
Adding features:
1) word[:n]
2) word[-n:]
3) is_upper
4) is_lower
5) is_camelcase
6) postag (part of speech)
7) lemma
8) stem
9) n_grams
10) ...
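A few of the surface features listed above can be computed with the standard library alone. The sketch below is an assumption-laden illustration: the function names, the default `n=3`, the camel-case regex, and the choice of character n-grams are all ours. The part-of-speech tag, lemma, and stem need an external NLP library and are left out.

```python
import re

# Assumed definition of camel case: two or more capitalized chunks, e.g. "PitchBook".
CAMEL_CASE = re.compile(r"[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+")


def char_ngrams(word, n=3):
    """All character n-grams of the word (empty list if the word is shorter than n)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def token_features(word, n=3):
    """Surface features for a single token."""
    return {
        "prefix": word[:n],               # word[:n]
        "suffix": word[-n:],              # word[-n:]
        "is_upper": word.isupper(),
        "is_lower": word.islower(),
        "is_camelcase": bool(CAMEL_CASE.fullmatch(word)),
        "n_grams": char_ngrams(word, n),
    }


features = token_features("PitchBook")
print(features)
```

In practice the features of neighbouring tokens are usually added as well, so that the classifier sees some context around each word.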
14. Entity resolution (entity linking)
Similarity queries (using gensim library)
gensim is a library for topic modelling
Core concepts:
1) Document
2) Corpus
3) Vector
4) Model
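The four concepts can be illustrated without gensim itself. The sketch below uses only the standard library and is not gensim's API: the tiny corpus, the smoothed TF-IDF weighting, and all function names are assumptions made for illustration. A document is a string, the corpus is a list of documents, a vector is a sparse term-to-weight mapping, and the model is the transformation from token counts to TF-IDF weights.

```python
import math
from collections import Counter

# Corpus: a collection of documents (here, short keyword phrases).
corpus = ["car dealership", "used car dealership", "truck beds liners"]
docs = [doc.split() for doc in corpus]

# Model: TF-IDF, fitted on document frequencies from the corpus.
n_docs = len(docs)
df = Counter(term for doc in docs for term in set(doc))


def tfidf(tokens):
    """Vector: sparse {term: weight}; terms unseen in the corpus are dropped."""
    counts = Counter(t for t in tokens if t in df)
    return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, c in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


index = [tfidf(doc) for doc in docs]


def query(text):
    """Similarity query: rank every corpus document against the query text."""
    q = tfidf(text.split())
    return sorted(((cosine(q, vec), doc) for vec, doc in zip(index, corpus)),
                  reverse=True)


ranked = query("new car dealership")
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

For name resolution, the corpus would hold the known company names and the query would be a mention extracted by NER.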
21. Entity resolution (entity linking)
Word Mover’s Distance
1. Obama speaks to the media in Illinois
2. The president greets the press in Chicago
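These two sentences are a common illustration of why Word Mover's Distance is useful: they say nearly the same thing but share no content words, so a bag-of-words cosine similarity is zero. WMD instead measures how far the word embeddings of one sentence have to "travel" to match those of the other. The sketch below only demonstrates the zero-overlap part; the stop-word list is a minimal assumption chosen for this example.

```python
import math
from collections import Counter

STOPWORDS = {"to", "the", "in"}  # tiny illustrative list, not a real stop-word set


def bow(sentence):
    """Bag-of-words counts over lower-cased, stop-word-filtered tokens."""
    return Counter(w for w in sentence.lower().split() if w not in STOPWORDS)


def cosine(u, v):
    dot = sum(c * v[t] for t, c in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


s1 = bow("Obama speaks to the media in Illinois")
s2 = bow("The president greets the press in Chicago")

overlap = set(s1) & set(s2)
score = cosine(s1, s2)
print(overlap, score)
```

With word embeddings, pairs such as "media"/"press" or "Illinois"/"Chicago" lie close together, which is what lets WMD recognise the two sentences as similar.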
22. Entity resolution (entity linking)
Query: 000 squares foot car dealership
Similar keywords by Cosine similarity:
000 squares foot car dealership, 17000 squares foot, 16987 squares foot, squares foot, 9631 squares foot complex, buildable squares foot, squares foot facility, 50599 squares foot stores, 820000 squares foot facility, car dealership, car dealership service, used car dealership, 2 car dealership, new car dealership, squares, 78000 squares foot data center, foot, dealership information, dealership service, dealership
Similar keywords by Soft Cosine similarity:
000 squares foot car dealership, beds die cutter, beds liners, hot wedges welder, indoor heated beds, truck beds liners, double beds rooms, double beds occupancy, brass beds restoration, hot rolled billets, hot rolled angles, retractable truck beds, fluid beds dryers, beds shaker, forged grinding balls, medical beds wedge, beds springs, bed-spring distributor, beds surfaces cleaners, hot rolled rings
Similar keywords by WMD similarity:
000 squares foot car dealership, toyota car dealership, bmw car dealership, honda car dealership, ford car dealership, chevrolet car dealership, franchised car dealership, citroen car dealership, rain repellent car washing, car shuttle train loaders, car carpet & upholstery detailing, car dealership showroom, citroen car dealership franchise, car seating dealership, car washing brushes, convertible car dealer, waterless car washing, hopper car vibrators, pre-owned cars dealership, rearing trunk dealership plaques
23. Entity resolution (entity linking)
                  Cosine similarity    Soft Cosine similarity    WMD similarity
Inference time    1.23 s               158 ms                    59 ms
Query time        1.03 ms              24 ms                     100 ms