Abstract
A search engine? What for?
Elasticsearch: a simple, complete, high-performance solution
What if we indexed Twitter?
THE NEED
Facets
Demo
Architecture Community
Performance of LIKE % queries
ELASTICSEARCH
Elasticsearch
A search engine for the NoSQL generation
Built on the Apache Lucene standard
Hides Java/Lucene complexity behind standard HTTP / RESTful / JSON services
Usable from any technology
Adds the cloud layer that Lucene lacks
It is an engine, not a graphical interface!
Key points
Simple! In a few minutes (zero configuration), you have a complete engine ready to receive your documents for indexing and to run searches.
Efficient! Just start Elasticsearch nodes to benefit immediately from replication and load balancing.
Powerful! Built on Lucene, it parallelizes Lucene's processing to deliver acceptable response times (generally under 100 ms).
Complete! Many features: analysis and facets, percolation, rivers, plugins, ...
A tweet
    {
      "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
      "created_at": "2012-04-06T20:45:36.000Z",
      "source": "Twitter for iPad",
      "truncated": false,
      "retweet_count": 0,
      "hashtag": [
        { "text": "elasticsearch", "start": 27, "end": 40 },
        { "text": "devoxxfr", "start": 47, "end": 55 }
      ],
      "user": {
        "id": 51172224,
        "name": "David Pilato",
        "screen_name": "dadoonet",
        "location": "France",
        "description": "Soft Architect, Project Manager, Senior Developper.\r\nAt this time, enjoying NoSQL world : CouchDB, ElasticSearch.\r\nDeeJay 4 times a year, just for fun !"
      }
    }
Type: groups documents of the same type
Index: logical storage space for documents whose types are functionally related
Documents
    curl -XPUT http://localhost:9200/twitter/tweet/1
    curl -XGET http://localhost:9200/twitter/tweet/1
    curl -XDELETE http://localhost:9200/twitter/tweet/1
Search
    curl -XGET http://localhost:9200/twitter/tweet/_search
    curl -XGET http://localhost:9200/twitter/_search
    curl -XGET http://localhost:9200/_search

    curl -XGET http://localhost:9200/twitter/_status
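The URLs above follow a regular pattern: /{index}/{type}/{id} for a document, and _search appended at the type, index, or cluster level. The sketch below only builds those URLs; the helper names document_url and search_url are made up for this illustration and are not part of any Elasticsearch client.

```python
from typing import Optional

BASE_URL = "http://localhost:9200"  # host and port as used on the slide


def document_url(index: str, doc_type: str, doc_id: str) -> str:
    """URL of a single document: /{index}/{type}/{id}."""
    return f"{BASE_URL}/{index}/{doc_type}/{doc_id}"


def search_url(index: Optional[str] = None, doc_type: Optional[str] = None) -> str:
    """URL of _search, scoped to a type, to an index, or to the whole cluster."""
    if doc_type is not None and index is None:
        raise ValueError("a type can only be given together with an index")
    parts = [part for part in (index, doc_type) if part is not None]
    return "/".join([BASE_URL, *parts, "_search"])


print(document_url("twitter", "tweet", "1"))
print(search_url("twitter", "tweet"))
print(search_url("twitter"))
print(search_url())
```

The narrower the scope passed to search_url, the fewer documents the search has to consider.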
Let's index a document
    $ curl -XPUT localhost:9200/twitter/tweet/1 -d '
    {
      "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
      "created_at": "2012-04-06T20:45:36.000Z",
      "source": "Twitter for iPad",
      "truncated": false,
      "retweet_count": 0,
      "hashtag": [
        { "text": "elasticsearch", "start": 27, "end": 40 },
        { "text": "devoxxfr", "start": 47, "end": 55 }
      ],
      "user": {
        "id": 51172224,
        "name": "David Pilato",
        "screen_name": "dadoonet",
        "location": "France",
        "description": "Soft Architect, Project Manager, Senior Developper.\r\nAt this time, enjoying NoSQL world : CouchDB, ElasticSearch.\r\nDeeJay 4 times a year, just for fun !"
      }
    }'
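For readers who prefer a programming language to curl, here is a minimal sketch of the same PUT using only the Python standard library. The function name build_index_request and the shortened tweet are assumptions made for this example, and the Content-Type header is added as a precaution rather than taken from the slide. The request is built but not sent, so the snippet runs without a node.

```python
import json
import urllib.request

# Shortened version of the tweet shown on the slide.
tweet = {
    "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
    "source": "Twitter for iPad",
    "retweet_count": 0,
    "user": {"screen_name": "dadoonet"},
}


def build_index_request(index, doc_type, doc_id, document):
    """Prepare PUT /{index}/{type}/{id} with the document as JSON body."""
    return urllib.request.Request(
        url=f"http://localhost:9200/{index}/{doc_type}/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_index_request("twitter", "tweet", "1", tweet)
print(request.get_method(), request.full_url)

# Sending it requires a running Elasticsearch node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```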
Let's search for a document
    $ curl "localhost:9200/twitter/tweet/_search?q=elasticsearch"
    {
      "took" : 24,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
      "hits" : {
        "total" : 1,
        "max_score" : 0.227,
        "hits" : [ {
          "_index" : "twitter",
          "_type" : "tweet",
          "_id" : "1",
          "_score" : 0.227,
          "_source" : {
            "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
            "created_at": "2012-04-06T20:45:36.000Z",
            "source": "Twitter for iPad",
            [...]
          }
        } ]
      }
    }
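A client usually only needs a few fields from that response: the document ids, their scores, and the stored source. The sketch below parses a shortened copy of the response shown on the slide; summarize_hits is a hypothetical helper written for this example.

```python
import json

# Shortened copy of the search response from the slide.
response_text = """
{
  "took": 24,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.227,
    "hits": [
      {"_index": "twitter", "_type": "tweet", "_id": "1", "_score": 0.227,
       "_source": {"text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
                   "source": "Twitter for iPad"}}
    ]
  }
}
"""


def summarize_hits(raw):
    """Return (id, score, text) for every hit, in the order sent by the server."""
    response = json.loads(raw)
    return [
        (hit["_id"], hit["_score"], hit["_source"]["text"])
        for hit in response["hits"]["hits"]
    ]


for doc_id, score, text in summarize_hits(response_text):
    print(doc_id, score, text)
```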
Relevance
$ curl "localhost:9200/twitter/tweet/_search?q=elasticsearch&explain=true"
Data collection
[Animated diagram (six frames): documents ("Doc") and the data store.]
Rivers
CouchDB River
MongoDB River
Wikipedia River
Twitter River
RabbitMQ River
RSS River
Dick Rivers
The power of facets! Make your data speak by looking at it from different facets!
Facets

ID  Username         Date        Hashtags
1   dadoonet         2012-04-18  1
2   devoxxfr         2012-04-18  5
3   elasticsearchfr  2012-04-18  2
4   dadoonet         2012-04-18  2
5   devoxxfr         2012-04-18  6
6   elasticsearchfr  2012-04-19  3
7   dadoonet         2012-04-19  3
8   devoxxfr         2012-04-19  7
9   elasticsearchfr  2012-04-20  4

Some tweets
Facet "Term"

Username         Count
dadoonet         3
devoxxfr         3
elasticsearchfr  3

Response:
    "facets" : {
      "users" : {
        "_type" : "terms",
        "missing" : 0,
        "total" : 9,
        "other" : 0,
        "terms" : [
          { "term" : "dadoonet", "count" : 3 },
          { "term" : "devoxxfr", "count" : 3 },
          { "term" : "elasticsearchfr", "count" : 3 }
        ]
      }
    }
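Conceptually, a "Term" facet is a count of documents per distinct value of a field. The sketch below recomputes the counts for the nine sample tweets in plain Python; it illustrates the idea only and says nothing about how Elasticsearch computes it internally. The function name terms_facet is an assumption for this example.

```python
from collections import Counter

# Username column of the nine sample tweets, in ID order.
USERNAMES = ["dadoonet", "devoxxfr", "elasticsearchfr"] * 3


def terms_facet(values):
    """Count documents per term, shaped like the facet response on the slide."""
    present = [value for value in values if value is not None]
    counts = Counter(present)
    # Highest count first, ties broken alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {
        "_type": "terms",
        "missing": len(values) - len(present),
        "total": len(present),
        "other": 0,  # no term is cut off in this sketch
        "terms": [{"term": term, "count": count} for term, count in ordered],
    }


print(terms_facet(USERNAMES))
```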
Facet "Date histogram"

By month
Date     Count
2012-04  9

By day
Date        Count
2012-04-18  5
2012-04-19  3
2012-04-20  1

Request:
    "facets" : {
      "perday" : {
        "date_histogram" : {
          "field" : "date",
          "interval" : "day"
        }
      }
    }

Response:
    "facets" : {
      "perday" : {
        "_type" : "date_histogram",
        "entries": [
          { "time": 1334700000000, "count": 5 },
          { "time": 1334786400000, "count": 3 },
          { "time": 1334872800000, "count": 1 }
        ]
      }
    }
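A date histogram groups documents into time buckets and counts each bucket. The sketch below reproduces the "by month" and "by day" counts from the sample tweets by truncating ISO dates; the real response carries epoch timestamps in milliseconds instead of date strings, which this illustration skips. The function name date_histogram is an assumption for this example.

```python
from collections import Counter

# Date column of the nine sample tweets, in ID order.
DATES = ["2012-04-18"] * 5 + ["2012-04-19"] * 3 + ["2012-04-20"]

# Number of leading characters of an ISO date that identify each interval.
KEY_LENGTH = {"month": 7, "day": 10}


def date_histogram(dates, interval):
    """Count documents per bucket, returned as (bucket, count) sorted by date."""
    length = KEY_LENGTH[interval]
    counts = Counter(date[:length] for date in dates)
    return sorted(counts.items())


print(date_histogram(DATES, "month"))
print(date_histogram(DATES, "day"))
```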
Facet "Ranges"

Hashtags    Count  Min  Max  Total
below 3     3      1    2    5
3 to 5      3      3    4    10
5 and over  3      5    7    18

Request:
    "facets" : {
      "hashtags" : {
        "range" : {
          "field" : "hashtags",
          "ranges" : [
            { "to" : 3 },
            { "from" : 3, "to" : 5 },
            { "from" : 5 }
          ]
        }
      }
    }

Response:
    "facets" : {
      "hashtags" : {
        "_type" : "range",
        "ranges" : [
          { "to": 3, "count": 3, "min": 1, "max": 2, "total": 5, "mean": 1.667 },
          { "from": 3, "to": 5, "count": 3, "min": 3, "max": 4, "total": 10, "mean": 3.333 },
          { "from": 5, "count": 3, "min": 5, "max": 7, "total": 18, "mean": 6 }
        ]
      }
    }
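The numbers in the range facet can be checked by hand. The sketch below buckets the Hashtags column of the sample tweets with "from" inclusive and "to" exclusive, which is the only reading consistent with the slide's figures (the value 3 falls in the middle range, the value 5 in the last one). The function name range_facet is an assumption for this example.

```python
# Hashtags column of the nine sample tweets, in ID order.
HASHTAGS = [1, 5, 2, 2, 6, 3, 3, 7, 4]
RANGES = [{"to": 3}, {"from": 3, "to": 5}, {"from": 5}]


def range_facet(values, ranges):
    """Compute count, min, max, total and mean for each range."""
    results = []
    for bounds in ranges:
        low = bounds.get("from", float("-inf"))   # inclusive lower bound
        high = bounds.get("to", float("inf"))     # exclusive upper bound
        bucket = [value for value in values if low <= value < high]
        entry = dict(bounds, count=len(bucket))
        if bucket:
            entry.update(
                min=min(bucket),
                max=max(bucket),
                total=sum(bucket),
                mean=round(sum(bucket) / len(bucket), 3),
            )
        results.append(entry)
    return results


for entry in range_facet(HASHTAGS, RANGES):
    print(entry)
```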
[Screenshots: search pages annotated with the facets they use (Term, Ranges, Date histogram), together with the search criteria and the results.]
DEMONSTRATION
Demonstration: architecture
[Diagram: Chrome, Twitter River, Twitter Streaming API]
    $ curl -XPUT localhost:9200/_river/twitter/_meta -d '
    {
      "type" : "twitter",
      "twitter" : {
        "user" : "twitter_user",
        "password" : "twitter_password",
        "filter" : { "tracks" : ["devoxxfr"] }
      }
    }'
ARCHITECTURE
Glossary
Node: an Elasticsearch instance (~ a machine?)
Cluster: a set of nodes
Shard (partition): splits an index into several parts across which the documents are distributed
Replica (replication): a copy of a shard, kept as one or more copies across the cluster
Primary shard: the shard elected as "primary" across the cluster. This is where Lucene does the indexing. There is only one per shard in the whole cluster.
Secondary shard: the shards that store the replicas of the primary shards.
Let's create an index
    $ curl -XPUT localhost:9200/twitter -d '{
      "index" : {
        "number_of_shards" : 2,
        "number_of_replicas" : 1
      }
    }'
[Diagram: Node 2 with Shard 0 and Shard 1]
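With number_of_shards set to 2 and number_of_replicas set to 1, the index has 2 x (1 + 1) = 4 shard copies to place. The sketch below is a toy round-robin placement that never puts two copies of the same shard on one node. It is an illustration written for this text, not Elasticsearch's allocation algorithm, and the name allocate and the node names are assumptions.

```python
def allocate(number_of_shards, number_of_replicas, nodes):
    """Spread every shard copy over the nodes, one copy of a shard per node at most."""
    copies_per_shard = 1 + number_of_replicas
    if copies_per_shard > len(nodes):
        raise ValueError("not enough nodes to keep the copies of a shard apart")
    layout = {node: [] for node in nodes}
    slot = 0
    for shard in range(number_of_shards):
        for copy in range(copies_per_shard):
            role = "primary" if copy == 0 else "replica"
            # Consecutive slots land on distinct nodes because
            # copies_per_shard <= len(nodes).
            layout[nodes[slot % len(nodes)]].append((shard, role))
            slot += 1
    return layout


print(allocate(2, 1, ["node1", "node2"]))
print(allocate(2, 1, ["node1", "node2", "node3", "node4"]))
```

With two nodes each node holds two copies; with four nodes each node holds exactly one, which is why adding nodes spreads the load.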
Dynamic reallocation
[Animated diagram (three frames): as Node 3 and then Node 4 join the cluster, the copies of Shard 0 and Shard 1 are redistributed across the nodes.]
Tuning means finding the right balance between the number of nodes, shards and replicas!
Let's index a document
[Animated diagram: a curl client sends Doc 1 to the cluster (Nodes 1 to 4, holding the copies of Shard 0 and Shard 1). Doc 1 is written to Shard 0 on Node 1, then copied to Shard 0 on Node 3.]
Let's index a second document
[Animated diagram: the same client sends Doc 2, which ends up in Shard 1 and in its replica.]
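The frames above can be sketched as two steps: pick a shard from the document id, write to the primary copy, then copy to the replicas. Elasticsearch does derive the shard from a hash of the routing value modulo the number of primary shards, but the hash used here (zlib.crc32) is a stand-in chosen for this sketch, so the shard numbers it prints will not match a real cluster. All function names are assumptions.

```python
import zlib

NUMBER_OF_SHARDS = 2  # as in the index created earlier


def shard_for(doc_id, number_of_shards=NUMBER_OF_SHARDS):
    """Pick a shard from the document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def new_cluster(number_of_shards=NUMBER_OF_SHARDS, number_of_replicas=1):
    """One primary copy and number_of_replicas replica copies per shard."""
    return {
        shard: {"primary": {}, "replicas": [{} for _ in range(number_of_replicas)]}
        for shard in range(number_of_shards)
    }


def index_document(cluster, doc_id, document):
    """Write to the primary copy first, then copy to every replica."""
    shard = shard_for(doc_id, len(cluster))
    cluster[shard]["primary"][doc_id] = document
    for replica in cluster[shard]["replicas"]:
        replica[doc_id] = document
    return shard


cluster = new_cluster()
for doc_id in ("1", "2"):
    shard = index_document(cluster, doc_id, {"text": "#elasticsearch"})
    print("Doc", doc_id, "-> shard", shard)
```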
Let's search
    $ curl "localhost:9200/twitter/_search?q=elasticsearch"
[Animated diagram (three frames): the curl client's query is executed on the shards, and Doc 1 and Doc 2 are returned to the client.]
    {
      "took" : 24,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
      "hits" : {
        "total" : 2,
        "max_score" : 0.227,
        "hits" : [ {
          "_index" : "twitter",
          "_type" : "tweet",
          "_id" : "1",
          "_score" : 0.227,
          "_source" : { ... }
        }, {
          "_index" : "twitter",
          "_type" : "tweet",
          "_id" : "2",
          "_score" : 0.152,
          "_source" : { ... }
        } ]
      }
    }
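A search over a sharded index is a scatter-gather: each shard returns its own hits, and the results are merged by descending score. The sketch below shows only the merge step, using the two scores from the response; which document lives in which shard is an assumption made for the example, as is the name merge_hits.

```python
# Hits returned by each shard (shard assignment assumed for the example).
SHARD_HITS = {
    0: [{"_id": "1", "_score": 0.227}],
    1: [{"_id": "2", "_score": 0.152}],
}


def merge_hits(hits_per_shard, size=10):
    """Merge per-shard hits into one list ordered by descending score."""
    merged = [hit for hits in hits_per_shard.values() for hit in hits]
    merged.sort(key=lambda hit: hit["_score"], reverse=True)
    return {
        "hits": {
            "total": len(merged),
            "max_score": merged[0]["_score"] if merged else None,
            "hits": merged[:size],
        }
    }


result = merge_hits(SHARD_HITS)
print([hit["_id"] for hit in result["hits"]["hits"]])
```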
Let's search again
    $ curl "localhost:9200/twitter/_search?q=elasticsearch"
[Animated diagram (four frames): the same query is sent again, and Doc 1 and Doc 2 are returned to the client.]
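One reason to run the same search twice is to show that any copy of a shard, primary or replica, can answer it. The slides do not spell out the selection rule, so the sketch below assumes a simple round-robin between copies; the class name ShardCopies and the node names are hypothetical.

```python
from itertools import cycle


class ShardCopies:
    """Hands out the copies of one shard in turn (assumed round-robin)."""

    def __init__(self, nodes_holding_a_copy):
        self._turn = cycle(nodes_holding_a_copy)

    def pick(self):
        return next(self._turn)


# Node placement assumed for the example.
shard0 = ShardCopies(["node1", "node3"])
shard1 = ShardCopies(["node2", "node4"])

for attempt in (1, 2):
    print("search", attempt, "->", shard0.pick(), "and", shard1.pick())
```

Under this assumption, the second search is served by the copies that sat idle during the first one, so replicas add read capacity as well as redundancy.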
Elasticsearch: the community
Join the movement!
@ElasticsearchFR
QUESTIONS?