
Commit 994b333

fhennig and razvan authored
Add descriptions and fix formatting (#97)
* Add descriptions and fix formatting
* Add descriptions and fix formatting
* fix formatting and HBase spelling
* Update docs/modules/demos/pages/hbase-hdfs-load-cycling-data.adoc
* Update docs/modules/demos/pages/hbase-hdfs-load-cycling-data.adoc
* Update docs/modules/demos/pages/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc
* Update docs/modules/demos/pages/nifi-kafka-druid-earthquake-data.adoc
* Update docs/modules/demos/pages/spark-k8s-anomaly-detection-taxi-data.adoc

Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
1 parent bffebdd commit 994b333

14 files changed: +368 / -393 lines

docs/modules/demos/pages/airflow-scheduled-job.adoc

Lines changed: 4 additions & 2 deletions
@@ -1,5 +1,6 @@
 = airflow-scheduled-job
 :page-aliases: stable@stackablectl::demos/airflow-scheduled-job.adoc
+:description: This demo installs Airflow with Postgres and Redis on Kubernetes, showcasing DAG scheduling, job runs, and status verification via the Airflow UI.

 Install this demo on an existing Kubernetes cluster:

@@ -102,9 +103,10 @@ Click on the `run_every_minute` box in the centre of the page and then select `L

 [WARNING]
 ====
-In this demo, the logs are not available when the KubernetesExecutor is deployed. See the https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html#managing-dags-and-logs[Airflow Documentation] for more details.
+In this demo, the logs are not available when the KubernetesExecutor is deployed.
+See the https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html#managing-dags-and-logs[Airflow Documentation] for more details.

-If you are interested in persisting the logs, please take a look at the xref:logging.adoc[] demo.
+If you are interested in persisting the logs, take a look at the xref:logging.adoc[] demo.
 ====

 image::airflow-scheduled-job/airflow_9.png[]
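
The install command itself sits just below this hunk and is not shown in the diff; going by the pattern on the other demo pages in this commit, it is presumably the matching `stackablectl demo install` call. A minimal sketch, with the demo name assumed from the page title:

[source,console]
----
# Sketch only: the demo name is assumed from the page title; the actual command block is outside this diff.
$ stackablectl demo install airflow-scheduled-job
----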

docs/modules/demos/pages/data-lakehouse-iceberg-trino-spark.adoc

Lines changed: 102 additions & 105 deletions
Large diffs are not rendered by default.

docs/modules/demos/pages/end-to-end-security.adoc

Lines changed: 2 additions & 3 deletions
@@ -1,6 +1,6 @@
 = end-to-end-security
-
 :k8s-cpu: https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu
+:description: This demo showcases end-to-end security in Stackable Data Platform with OPA, featuring row/column access control, OIDC, Kerberos, and flexible group policies.

 This is a demo to showcase what can be done with Open Policy Agent around authorization in the Stackable Data Platform.
 It covers the following aspects of security:
@@ -55,8 +55,7 @@ You can see the deployed products and their relationship in the following diagra

 image::end-to-end-security/overview.png[Architectural overview]

-Please note the different types of arrows used to connect the technologies in here, which symbolize
-how authentication happens along that route and if impersonation is used for queries executed.
+Note the different types of arrows used to connect the technologies in here, which symbolize how authentication happens along that route and if impersonation is used for queries executed.

 The Trino schema (with schemas, tables and views) is shown below.

docs/modules/demos/pages/hbase-hdfs-load-cycling-data.adoc

Lines changed: 22 additions & 24 deletions
@@ -1,5 +1,6 @@
 = hbase-hdfs-cycling-data
 :page-aliases: stable@stackablectl::demos/hbase-hdfs-load-cycling-data.adoc
+:description: Load cyclist data from HDFS to HBase on Kubernetes using Stackable's demo. Install, copy data, create HFiles, and query efficiently.

 :kaggle: https://www.kaggle.com/datasets/timgid/cyclistic-dataset-google-certificate-capstone?select=Divvy_Trips_2020_Q1.csv
 :k8s-cpu: https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu
@@ -14,10 +15,7 @@ Install this demo on an existing Kubernetes cluster:
 $ stackablectl demo install hbase-hdfs-load-cycling-data
 ----

-[WARNING]
-====
-This demo should not be run alongside other demos.
-====
+WARNING: This demo should not be run alongside other demos.

 [#system-requirements]
 == System requirements
@@ -34,11 +32,11 @@ This demo will

 * Install the required Stackable operators.
 * Spin up the following data products:
-** *Hbase:* An open source distributed, scalable, big data store. This demo uses it to store the
+** *HBase:* An open source distributed, scalable, big data store. This demo uses it to store the
 {kaggle}[cyclist dataset] and enable access.
-** *HDFS:* A distributed file system used to intermediately store the dataset before importing it into Hbase
+** *HDFS:* A distributed file system used to intermediately store the dataset before importing it into HBase
 * Use {distcp}[distcp] to copy a {kaggle}[cyclist dataset] from an S3 bucket into HDFS.
-* Create HFiles, a File format for hbase consisting of sorted key/value pairs. Both keys and values are byte arrays.
+* Create HFiles, a File format for hBase consisting of sorted key/value pairs. Both keys and values are byte arrays.
 * Load Hfiles into an existing table via the `Importtsv` utility, which will load data in `TSV` or `CSV` format into
 HBase.
 * Query data via the `hbase` shell, which is an interactive shell to execute commands on the created table
@@ -86,10 +84,9 @@ This demo will run two jobs to automatically load data.

 === distcp-cycling-data

-{distcp}[DistCp] (distributed copy) is used for large inter/intra-cluster copying. It uses MapReduce to effect its
-distribution, error handling, recovery, and reporting. It expands a list of files and directories into input to map
-tasks, each of which will copy a partition of the files specified in the source list. Therefore, the first Job uses
-DistCp to copy data from a S3 bucket into HDFS. Below, you'll see parts from the logs.
+{distcp}[DistCp] (distributed copy) efficiently transfers large amounts of data from one location to another.
+Therefore, the first Job uses DistCp to copy data from a S3 bucket into HDFS.
+Below, you'll see parts from the logs.

 [source]
 ----
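
For readers who have not used the tool, the copy step described above boils down to a single `hadoop distcp <source> <target>` invocation. A minimal sketch with placeholder paths; the demo runs this inside a Kubernetes Job rather than by hand, and the real paths are defined there:

[source,console]
----
# Sketch with placeholder paths; the demo's Job defines the actual S3 source and HDFS target.
$ hadoop distcp s3a://<source-bucket>/cycling-tripdata/ hdfs:///data/raw
----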
@@ -110,11 +107,12 @@ Copying s3a://public-backup-nyc-tlc/cycling-tripdata/demo-cycling-tripdata.csv.g

 The second Job consists of 2 steps.

-First, we use `org.apache.hadoop.hbase.mapreduce.ImportTsv` (see {importtsv}[ImportTsv Docs]) to create a table and
-Hfiles. Hfile is an Hbase dedicated file format which is performance optimized for hbase. It stores meta-information
-about the data and thus increases the performance of hbase. When connecting to the hbase master, opening a hbase shell
-and executing `list`, you will see the created table. However, it'll contain 0 rows at this point. You can connect to
-the shell via:
+First, we use `org.apache.hadoop.hbase.mapreduce.ImportTsv` (see {importtsv}[ImportTsv Docs]) to create a table and Hfiles.
+Hfile is an HBase dedicated file format which is performance optimized for HBase.
+It stores meta-information about the data and thus increases the performance of HBase.
+When connecting to the HBase master, opening a HBase shell and executing `list`, you will see the created table.
+However, it'll contain 0 rows at this point.
+You can connect to the shell via:

 [source,console]
 ----
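
The exact connect command is in the source block that follows this hunk and is not part of the diff. For orientation, getting a shell against the master pod typically looks roughly like this; the pod name is taken from the port-forward tip further down the page, and `hbase` being on the container's PATH is an assumption:

[source,console]
----
# Illustrative only: pod name from the port-forward tip below; `hbase` on the PATH is assumed.
$ kubectl exec -it hbase-master-default-0 -- hbase shell
# then, inside the shell, `list` shows the created table (e.g. cycling-tripdata)
----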
@@ -135,7 +133,7 @@ cycling-tripdata
 ----

 Secondly, we'll use `org.apache.hadoop.hbase.tool.LoadIncrementalHFiles` (see {bulkload}[bulk load docs]) to import
-the Hfiles into the table and ingest rows.
+the Hfiles into the table and ingest rows.

 Now we will see how many rows are in the `cycling-tripdata` table:

@@ -162,7 +160,7 @@ Took 13.4666 seconds

 == Inspecting the Table

-You can now use the table and the data. You can use all available hbase shell commands.
+You can now use the table and the data. You can use all available HBase shell commands.

 [source,sql]
 ----
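
The commands in that `[source,sql]` block are likewise outside the diff. As a hedged illustration of the kind of HBase shell commands the section refers to (table name taken from the page):

[source,console]
----
# Illustrative HBase shell commands for inspecting the table created by the demo.
count 'cycling-tripdata'
describe 'cycling-tripdata'
scan 'cycling-tripdata', { LIMIT => 5 }
----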
@@ -190,15 +188,15 @@ COLUMN FAMILIES DESCRIPTION
 {NAME => 'started_at', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
 ----

-== Accessing the Hbase web interface
+== Accessing the HBase web interface

 [TIP]
 ====
 Run `stackablectl stacklet list` to get the address of the _ui-http_ endpoint.
-If the UI is unavailable, please do a port-forward `kubectl port-forward hbase-master-default-0 16010`.
+If the UI is unavailable, do a port-forward `kubectl port-forward hbase-master-default-0 16010`.
 ====

-The Hbase web UI will give you information on the status and metrics of your Hbase cluster. See below for the start page.
+The HBase web UI will give you information on the status and metrics of your HBase cluster. See below for the start page.

 image::hbase-hdfs-load-cycling-data/hbase-ui-start-page.png[]

@@ -208,8 +206,7 @@ image::hbase-hdfs-load-cycling-data/hbase-table-ui.png[]

 == Accessing the HDFS web interface

-You can also see HDFS details via a UI by running `stackablectl stacklet list` and following the link next to one of
-the namenodes.
+You can also see HDFS details via a UI by running `stackablectl stacklet list` and following the link next to one of the namenodes.

 Below you will see the overview of your HDFS cluster.

@@ -223,7 +220,8 @@ You can also browse the file system by clicking on the `Utilities` tab and selec

 image::hbase-hdfs-load-cycling-data/hdfs-data.png[]

-Navigate in the file system to the folder `data` and then the `raw` folder. Here you can find the raw data from the distcp job.
+Navigate in the file system to the folder `data` and then the `raw` folder.
+Here you can find the raw data from the distcp job.

 image::hbase-hdfs-load-cycling-data/hdfs-data-raw.png[]
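
Besides the web UI, the same `data/raw` folder can be listed from inside a namenode pod. A sketch only; the pod name follows the usual Stackable naming pattern and is an assumption, as is `hdfs` being on the container's PATH:

[source,console]
----
# Sketch only: pod name and in-container CLI availability are assumptions.
$ kubectl exec -it hdfs-namenode-default-0 -- hdfs dfs -ls /data/raw
----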

docs/modules/demos/pages/index.adoc

Lines changed: 17 additions & 20 deletions
@@ -1,33 +1,30 @@
 = Demos
 :page-aliases: stable@stackablectl::demos/index.adoc
+:description: Explore Stackable demos showcasing data platform architectures. Includes external components for evaluation.

-The pages below this section guide you on how to use the demos provided by Stackable. To install a demo please follow
-the xref:management:stackablectl:quickstart.adoc[quickstart guide] or have a look at the
-xref:management:stackablectl:commands/demo.adoc[demo command]. We currently offer the following list of demos:
+The pages in this section guide you on how to use the demos provided by Stackable.
+To install a demo follow the xref:management:stackablectl:quickstart.adoc[quickstart guide] or have a look at the xref:management:stackablectl:commands/demo.adoc[demo command].
+These are the available demos:

 include::partial$demos.adoc[]

 [IMPORTANT]
 .External Components in these demos
 ====
-These demos are provided by Stackable as showcases to demonstrate potential architectures that could be built with the
-Stackable Data Platform. As such they may include components that are not supported by Stackable as part of our
-commercial offering.
+These demos are provided by Stackable as showcases to demonstrate potential architectures that could be built with the Stackable Data Platform.
+As such they may include components that are not supported by Stackable as part of our commercial offering.

-If you are evaluating one or more of these demos with the intention of purchasing a subscription, please make sure to
-double-check the list of supported operators, anything that is not mentioned on there is not part of our commercial
-offering.
+If you are evaluating one or more of these demos with the intention of purchasing a subscription, make sure to double-check the list of supported operators; anything that is not mentioned on there is not part of our commercial offering.

-Below you can find a list of components that are currently contained in one or more of the demos for reference, if
-something is missing from this list and also not mentioned on our operators list, then this component is not supported:
+Below you can find a list of components that are currently contained in one or more of the demos for reference, if something is missing from this list and also not mentioned on our operators list, then this component is not supported:

-- Grafana
-- JupyterHub
-- MinIO
-- OpenLDAP
-- OpenSearch
-- OpenSearch Dashboards
-- PostgreSQL
-- Prometheus
-- Redis
+* Grafana
+* JupyterHub
+* MinIO
+* OpenLDAP
+* OpenSearch
+* OpenSearch Dashboards
+* PostgreSQL
+* Prometheus
+* Redis
 ====
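
For the demo command referenced in the index page above, listing and installing demos with `stackablectl` looks roughly like this; see the linked quickstart and command reference for the authoritative usage:

[source,console]
----
# Sketch; consult the xref'd demo command docs for exact flags and output.
$ stackablectl demo list
$ stackablectl demo install <demo-name>
----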
