Commit 011d40a

Merge pull request #3835 from ClickHouse/observability_docs
Observability docs
2 parents 2ea00e1 + 08116d3 commit 011d40a

127 files changed, +5438 -64 lines changed

docs/getting-started/index.md

Lines changed: 30 additions & 0 deletions
@@ -21,3 +21,33 @@ functions in ClickHouse. The sample datasets include:
 
 <!-- The following table is automatically generated at build time
 by https://github.com/ClickHouse/clickhouse-docs/blob/main/scripts/autogenerate-table-of-contents.sh -->
+| Page | Description |
+|-----|-----|
+| [New York Taxi Data](/getting-started/example-datasets/nyc-taxi) | Data for billions of taxi and for-hire vehicle (Uber, Lyft, etc.) trips originating in New York City since 2009 |
+| [Terabyte Click Logs from Criteo](/getting-started/example-datasets/criteo) | A terabyte of Click Logs from Criteo |
+| [WikiStat](/getting-started/example-datasets/wikistat) | Explore the WikiStat dataset containing 0.5 trillion records. |
+| [TPC-DS (2012)](/getting-started/example-datasets/tpcds) | The TPC-DS benchmark data set and queries. |
+| [Recipes Dataset](/getting-started/example-datasets/recipes) | The RecipeNLG dataset, containing 2.2 million recipes |
+| [COVID-19 Open-Data](/getting-started/example-datasets/covid19) | COVID-19 Open-Data is a large, open-source database of COVID-19 epidemiological data and related factors like demographics, economics, and government responses |
+| [NOAA Global Historical Climatology Network](/getting-started/example-datasets/noaa) | 2.5 billion rows of climate data for the last 120 yrs |
+| [GitHub Events Dataset](/getting-started/example-datasets/github-events) | Dataset containing all events on GitHub from 2011 to Dec 6 2020, with a size of 3.1 billion records. |
+| [Amazon Customer Review](/getting-started/example-datasets/amazon-reviews) | Over 150M customer reviews of Amazon products |
+| [Brown University Benchmark](/getting-started/example-datasets/brown-benchmark) | A new analytical benchmark for machine-generated log data |
+| [Writing Queries in ClickHouse using GitHub Data](/getting-started/example-datasets/github) | Dataset containing all of the commits and changes for the ClickHouse repository |
+| [Analyzing Stack Overflow data with ClickHouse](/getting-started/example-datasets/stackoverflow) | Analyzing Stack Overflow data with ClickHouse |
+| [AMPLab Big Data Benchmark](/getting-started/example-datasets/amplab-benchmark) | A benchmark dataset used for comparing the performance of data warehousing solutions. |
+| [New York Public Library "What's on the Menu?" Dataset](/getting-started/example-datasets/menus) | Dataset containing 1.3 million records of historical data on the menus of hotels, restaurants and cafes with the dishes along with their prices. |
+| [Laion-400M dataset](/getting-started/example-datasets/laion-400m-dataset) | Dataset containing 400 million images with English image captions |
+| [Star Schema Benchmark (SSB, 2009)](/getting-started/example-datasets/star-schema) | The Star Schema Benchmark (SSB) data set and queries |
+| [The UK property prices dataset](/getting-started/example-datasets/uk-price-paid) | Learn how to use projections to improve the performance of queries that you run frequently using the UK property dataset, which contains data about prices paid for real-estate property in England and Wales |
+| [Reddit comments dataset](/getting-started/example-datasets/reddit-comments) | Dataset containing publicly available comments on Reddit from December 2005 to March 2023 with over 14B rows of data in JSON format |
+| [OnTime](/getting-started/example-datasets/ontime) | Dataset containing the on-time performance of airline flights |
+| [Taiwan Historical Weather Datasets](/getting-started/example-datasets/tw-weather) | 131 million rows of weather observation data for the last 128 yrs |
+| [Crowdsourced air traffic data from The OpenSky Network 2020](/getting-started/example-datasets/opensky) | The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. |
+| [NYPD Complaint Data](/getting-started/example-datasets/nypd_complaint_data) | Ingest and query Tab Separated Value data in 5 steps |
+| [TPC-H (1999)](/getting-started/example-datasets/tpch) | The TPC-H benchmark data set and queries. |
+| [Foursquare places](/getting-started/example-datasets/foursquare-places) | Dataset with over 100 million records containing information about places on a map, such as shops, restaurants, parks, playgrounds, and monuments. |
+| [YouTube dataset of dislikes](/getting-started/example-datasets/youtube-dislikes) | A collection of dislikes of YouTube videos. |
+| [Geo Data using the Cell Tower Dataset](/getting-started/example-datasets/cell-towers) | Learn how to load OpenCelliD data into ClickHouse, connect Apache Superset to ClickHouse and build a dashboard based on data |
+| [Environmental Sensors Data](/getting-started/example-datasets/environmental-sensors) | Over 20 billion records of data from Sensor.Community, a contributors-driven global sensor network that creates Open Environmental Data. |
+| [Anonymized Web Analytics](/getting-started/example-datasets/metrica) | Dataset consisting of two tables containing anonymized web analytics data with hits and visits |
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+---
+title: 'Demo Application'
+description: 'Demo application for observability'
+slug: /observability/demo-application
+keywords: ['observability', 'logs', 'traces', 'metrics', 'OpenTelemetry', 'Grafana', 'OTel']
+---
+
+The OpenTelemetry project includes a [demo application](https://opentelemetry.io/docs/demo/). A maintained fork of this application with ClickHouse as a data source for logs and traces can be found [here](https://github.com/ClickHouse/opentelemetry-demo). The [official demo instructions](https://opentelemetry.io/docs/demo/docker-deployment/) can be followed to deploy this demo with Docker. In addition to the [existing components](https://opentelemetry.io/docs/demo/collector-data-flow-dashboard/), an instance of ClickHouse will be deployed and used for the storage of logs and traces.

docs/use-cases/observability/grafana.md renamed to docs/use-cases/observability/build-your-own/grafana.md

Lines changed: 3 additions & 3 deletions
@@ -23,11 +23,11 @@ import Image from '@theme/IdealImage';
 Grafana represents the preferred visualization tool for Observability data in ClickHouse. This is achieved using the official ClickHouse plugin for Grafana. Users can follow the installation instructions found [here](/integrations/grafana).
 
 V4 of the plugin makes logs and traces a first-class citizen in a new query builder experience. This minimizes the need for SREs to write SQL queries and simplifies SQL-based Observability, moving the needle forward for this emerging paradigm.
-Part of this has been placing Open Telemetry (OTel) at the core of the plugin, as we believe this will be the foundation of SQL-based Observability over the coming years and how data will be collected.
+Part of this has been placing OpenTelemetry (OTel) at the core of the plugin, as we believe this will be the foundation of SQL-based Observability over the coming years and how data will be collected.
 
-## Open Telemetry Integration {#open-telemetry-integration}
+## OpenTelemetry Integration {#open-telemetry-integration}
 
-On configuring a Clickhouse datasource in Grafana, the plugin allows the users to specify a default database and table for logs and traces and whether these tables conform to the OTel schema. This allows the plugin to return the columns required for correct log and trace rendering in Grafana. If you've made changes to the default OTel schema and prefer to use your own column names, these can be specified. Usage of the default OTel column names for columns such as time (Timestamp), log level (SeverityText), or message body (Body) means no changes need to be made.
+On configuring a ClickHouse datasource in Grafana, the plugin allows the user to specify a default database and table for logs and traces and whether these tables conform to the OTel schema. This allows the plugin to return the columns required for correct log and trace rendering in Grafana. If you've made changes to the default OTel schema and prefer to use your own column names, these can be specified. Usage of the default OTel column names for columns such as time (`Timestamp`), log level (`SeverityText`), or message body (`Body`) means no changes need to be made.
 
 :::note HTTP or Native
 Users can connect Grafana to ClickHouse over either the HTTP or Native protocol. The latter offers marginal performance advantages which are unlikely to be appreciable in the aggregation queries issued by Grafana users. Conversely, the HTTP protocol is typically simpler for users to proxy and introspect.
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+slug: /use-cases/observability/build-your-own
+title: 'Build Your Own Observability Stack'
+pagination_prev: null
+pagination_next: null
+description: 'Landing page for building your own observability stack'
+---
+
+This guide helps you build a custom observability stack using ClickHouse as the foundation. Learn how to design, implement, and optimize your observability solution for logs, metrics, and traces, with practical examples and best practices.
+
+| Page                                                        | Description                                                                                                                                                                   |
+|-------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [Introduction](/use-cases/observability/introduction)            | This guide is designed for users looking to build their own observability solution using ClickHouse, focusing on logs and traces.                                             |
+| [Schema design](/use-cases/observability/schema-design)          | Learn why users are recommended to create their own schema for logs and traces, along with some best practices for doing so.                                                  |
+| [Managing data](/observability/managing-data)          | Deployments of ClickHouse for observability invariably involve large datasets, which need to be managed. ClickHouse offers features to assist with data management.           |
+| [Integrating OpenTelemetry](/observability/integrating-opentelemetry) | Collecting and exporting logs and traces using OpenTelemetry with ClickHouse.                                                           |
+| [Using Visualization Tools](/observability/grafana)    | Learn how to use observability visualization tools for ClickHouse, including HyperDX and Grafana.                                       |
+| [Demo Application](/observability/demo-application)    | Explore the OpenTelemetry demo application forked to work with ClickHouse for logs and traces.                                          |
