At BlaBlaCar we have built a streaming platform to gain fast insight into how our services are used. I will show how BlaBlaCar built an automated streaming analysis of access logs to improve security and gain fine-grained knowledge of platform usage.
Pierre Villard - BlaBlaCar
https://dataxday.fr
8. Use cases
Security
- Identify crawling of our website
- Identify attempted hacks
Product simplification
- Statistics on the usage of all our endpoints
- Simplify migrations
API
- Help our partners use our API correctly
@DataXDay
13. Flink
- Standalone mode
- Apply regex on free-format text
- Build the schema dynamically and push to Schema Registry & Kafka
- Containerized (Rkt/Fleet, k8s migration ongoing)
- Serialize into Avro format
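The regex and schema-building steps can be sketched as follows. This is a minimal Python illustration, not the Flink job itself: the log pattern, the sample line, the `INT_FIELDS` set, and the helper names `parse_line` and `build_schema` are all assumptions, since the slides do not show the actual log format or code. Avro serialization and the push to the Schema Registry are left out.

```python
import json
import re

# Hypothetical pattern for a common-log-style line; the real log format
# and regex used at BlaBlaCar are not shown in the slides.
ACCESS_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

# Assumed set of fields stored as Avro ints; everything else is a string.
INT_FIELDS = {"status", "size"}


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    record = match.groupdict()
    for name in INT_FIELDS:
        record[name] = int(record[name])
    return record


def build_schema(record, name="accesslog"):
    """Derive an Avro record schema (as a dict) from a parsed record."""
    fields = [
        {"name": key, "type": "int" if isinstance(value, int) else "string"}
        for key, value in record.items()
    ]
    return {"type": "record", "name": name, "fields": fields}


# Made-up sample line, for illustration only.
line = '203.0.113.7 - - [12/May/2018:10:15:32 +0200] "GET /api/v2/trips HTTP/1.1" 200 512'
record = parse_line(line)
schema_json = json.dumps(build_schema(record))
```

Deriving the schema from the named groups means a new capture group in the regex shows up as a new schema field without a separate schema file to maintain.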
16. Schema Registry
→ A Kafka message consists of a key and a value
→ The Schema Registry holds two schemas per topic
- One for the key
- One for the value
→ A schema is represented as JSON
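Under the Schema Registry's default naming strategy, the two schemas of a topic are stored under the subjects `<topic>-key` and `<topic>-value`. A small sketch of that mapping, where the helper name `subjects_for_topic` is hypothetical and the topic name `accesslogs_avro` is inferred from the subject names shown in the API slides:

```python
def subjects_for_topic(topic):
    """Return the key and value subject names under the default naming strategy."""
    return {"key": f"{topic}-key", "value": f"{topic}-value"}


subjects = subjects_for_topic("accesslogs_avro")
```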
17. Schema Registry - API
GET : http://schema-registry/subjects
["accesslogs_avro-value","accesslogs_avro-key"]
GET : http://schema-registry/subjects/accesslogs_avro-value/versions
[1,2,3,4]
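The two calls above could be wrapped in a few lines of Python. This is a sketch using only the standard library; `http://schema-registry` is the placeholder host from the slides, and the function names and the injectable `get` parameter are assumptions made so the logic can be exercised without a running registry.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://schema-registry"  # placeholder host taken from the slides


def http_get_json(url):
    """Fetch a URL and decode its JSON body (needs a reachable registry)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def list_subjects(get=http_get_json):
    """GET /subjects: names of all registered subjects."""
    return get(f"{BASE_URL}/subjects")


def list_versions(subject, get=http_get_json):
    """GET /subjects/<subject>/versions: version numbers for one subject."""
    return get(f"{BASE_URL}/subjects/{subject}/versions")


def latest_version(subject, get=http_get_json):
    """Highest registered version number for a subject."""
    return max(list_versions(subject, get))
```

Passing a stub as `get` (for example a dictionary lookup keyed by URL) lets the functions be tested offline; with the responses shown on the slide, `latest_version("accesslogs_avro-value", ...)` would be 4.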
18. Schema Registry - API
GET : http://schema-registry/subjects/accesslogs_avro-value/versions/4
{"subject":"accesslogs_avro-value","version":4,"id":181,"schema":"
{"type":"record","name":"accesslog","fields":
[
{"name":"fieldname1","type":"string"},
{"name":"fieldname2","type":"int"}
]
}
"}
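The response is laid out above for readability. In the raw HTTP body, the `schema` value is itself a JSON string with escaped quotes, so a client has to decode twice. A minimal sketch of that, where the helper name `field_types` is hypothetical and the sample body is rebuilt from the slide's values:

```python
import json

# Response body shaped like the one on the slide. The inner json.dumps
# reproduces the escaped-string form of the "schema" value.
response_body = json.dumps({
    "subject": "accesslogs_avro-value",
    "version": 4,
    "id": 181,
    "schema": json.dumps({
        "type": "record",
        "name": "accesslog",
        "fields": [
            {"name": "fieldname1", "type": "string"},
            {"name": "fieldname2", "type": "int"},
        ],
    }),
})


def field_types(body):
    """Map field name to Avro type from a schema-version response body."""
    envelope = json.loads(body)              # first decode: the envelope
    schema = json.loads(envelope["schema"])  # second decode: the Avro schema
    return {field["name"]: field["type"] for field in schema["fields"]}


types = field_types(response_body)
```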
23. Concrete example
- Help identify crawlers
- Fast identification of bugs and bad behavior in a new release
- Detect improper API usage
- Help us integrate new features