
Commit 7baba0d

Author: Felix Hennig
Message: fix formatting and HBase spelling
Parent: 6cddc91

File tree: 1 file changed (+21, -21 lines)

docs/modules/demos/pages/hbase-hdfs-load-cycling-data.adoc

Lines changed: 21 additions & 21 deletions
@@ -15,10 +15,7 @@ Install this demo on an existing Kubernetes cluster:
 $ stackablectl demo install hbase-hdfs-load-cycling-data
 ----
 
-[WARNING]
-====
-This demo should not be run alongside other demos.
-====
+WARNING: This demo should not be run alongside other demos.
 
 [#system-requirements]
 == System requirements
@@ -35,11 +32,11 @@ This demo will
 
 * Install the required Stackable operators.
 * Spin up the following data products:
-** *Hbase:* An open source distributed, scalable, big data store. This demo uses it to store the
+** *HBase:* An open source distributed, scalable, big data store. This demo uses it to store the
 {kaggle}[cyclist dataset] and enable access.
-** *HDFS:* A distributed file system used to intermediately store the dataset before importing it into Hbase
+** *HDFS:* A distributed file system used to intermediately store the dataset before importing it into HBase
 * Use {distcp}[distcp] to copy a {kaggle}[cyclist dataset] from an S3 bucket into HDFS.
-* Create HFiles, a File format for hbase consisting of sorted key/value pairs. Both keys and values are byte arrays.
+* Create HFiles, a File format for hBase consisting of sorted key/value pairs. Both keys and values are byte arrays.
 * Load Hfiles into an existing table via the `Importtsv` utility, which will load data in `TSV` or `CSV` format into
 HBase.
 * Query data via the `hbase` shell, which is an interactive shell to execute commands on the created table
@@ -87,10 +84,11 @@ This demo will run two jobs to automatically load data.
 
 === distcp-cycling-data
 
-{distcp}[DistCp] (distributed copy) is used for large inter/intra-cluster copying. It uses MapReduce to effect its
-distribution, error handling, recovery, and reporting. It expands a list of files and directories into input to map
-tasks, each of which will copy a partition of the files specified in the source list. Therefore, the first Job uses
-DistCp to copy data from a S3 bucket into HDFS. Below, you'll see parts from the logs.
+{distcp}[DistCp] (distributed copy) is used for large inter/intra-cluster copying.
+It uses MapReduce to effect its distribution, error handling, recovery, and reporting.
+It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list.
+Therefore, the first Job uses DistCp to copy data from a S3 bucket into HDFS.
+Below, you'll see parts from the logs.
 
 [source]
 ----
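
For orientation, a manual DistCp invocation of roughly this shape would perform the same S3-to-HDFS copy; the bucket path comes from the demo's log output, while the HDFS target directory (`/data/raw`) is an assumption based on the folder browsed later on this page:

[source,console]
----
# Hypothetical manual equivalent of the distcp-cycling-data job.
# Only the S3 source appears in the demo output; the HDFS destination path is assumed.
$ hadoop distcp \
    s3a://public-backup-nyc-tlc/cycling-tripdata/demo-cycling-tripdata.csv.gz \
    hdfs:///data/raw
----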
@@ -111,11 +109,12 @@ Copying s3a://public-backup-nyc-tlc/cycling-tripdata/demo-cycling-tripdata.csv.g
 
 The second Job consists of 2 steps.
 
-First, we use `org.apache.hadoop.hbase.mapreduce.ImportTsv` (see {importtsv}[ImportTsv Docs]) to create a table and
-Hfiles. Hfile is an Hbase dedicated file format which is performance optimized for hbase. It stores meta-information
-about the data and thus increases the performance of hbase. When connecting to the hbase master, opening a hbase shell
-and executing `list`, you will see the created table. However, it'll contain 0 rows at this point. You can connect to
-the shell via:
+First, we use `org.apache.hadoop.hbase.mapreduce.ImportTsv` (see {importtsv}[ImportTsv Docs]) to create a table and Hfiles.
+Hfile is an HBase dedicated file format which is performance optimized for HBase.
+It stores meta-information about the data and thus increases the performance of HBase.
+When connecting to the HBase master, opening a HBase shell and executing `list`, you will see the created table.
+However, it'll contain 0 rows at this point.
+You can connect to the shell via:
 
 [source,console]
 ----
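
As a rough sketch of what the first step of this Job does, an ImportTsv run that writes HFiles instead of inserting rows directly might look like the following; the separator, column mapping, output directory and table name are illustrative assumptions rather than the demo's exact arguments:

[source,console]
----
# Illustrative ImportTsv invocation; column mapping, paths and table name are assumed.
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.separator=',' \
    -Dimporttsv.columns=HBASE_ROW_KEY,started_at:started_at \
    -Dimporttsv.bulk.output=hdfs:///data/hfile \
    cycling-tripdata hdfs:///data/raw
----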
@@ -163,7 +162,7 @@ Took 13.4666 seconds
 
 == Inspecting the Table
 
-You can now use the table and the data. You can use all available hbase shell commands.
+You can now use the table and the data. You can use all available HBase shell commands.
 
 [source,sql]
 ----
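
To give an idea of what such an inspection can look like, a short shell session along these lines lists the table, counts its rows and shows a few of them; the table name `cycling-tripdata` and the `started_at` column family are assumptions taken from the dataset name and the table description shown further down:

[source,console]
----
# Example HBase shell commands; table and column family names are assumed.
list
count 'cycling-tripdata'
scan 'cycling-tripdata', {COLUMNS => ['started_at'], LIMIT => 5}
----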
@@ -191,15 +190,15 @@ COLUMN FAMILIES DESCRIPTION
 {NAME => 'started_at', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
 ----
 
-== Accessing the Hbase web interface
+== Accessing the HBase web interface
 
 [TIP]
 ====
 Run `stackablectl stacklet list` to get the address of the _ui-http_ endpoint.
 If the UI is unavailable, do a port-forward `kubectl port-forward hbase-master-default-0 16010`.
 ====
 
-The Hbase web UI will give you information on the status and metrics of your Hbase cluster. See below for the start page.
+The HBase web UI will give you information on the status and metrics of your HBase cluster. See below for the start page.
 
 image::hbase-hdfs-load-cycling-data/hbase-ui-start-page.png[]
 
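If the _ui-http_ endpoint is not reachable and you use the port-forward from the tip above, the UI becomes available on localhost (16010 is the default HBase master info port):

[source,console]
----
# Forward the HBase master UI port locally, then open it in a browser.
$ kubectl port-forward hbase-master-default-0 16010
# http://localhost:16010
----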
@@ -209,7 +208,7 @@ image::hbase-hdfs-load-cycling-data/hbase-table-ui.png[]
 
 == Accessing the HDFS web interface
 
-You can also see HDFS details via a UI by running `stackablectl stacklet list` and following the link next to one of the namenodes.
+You can also see HDFS details via a UI by running `stackablectl stacklet list` and following the link next to one of the namenodes.
 
 Below you will see the overview of your HDFS cluster.
 
@@ -223,7 +222,8 @@ You can also browse the file system by clicking on the `Utilities` tab and selec
 
 image::hbase-hdfs-load-cycling-data/hdfs-data.png[]
 
-Navigate in the file system to the folder `data` and then the `raw` folder. Here you can find the raw data from the distcp job.
+Navigate in the file system to the folder `data` and then the `raw` folder.
+Here you can find the raw data from the distcp job.
 
 image::hbase-hdfs-load-cycling-data/hdfs-data-raw.png[]