How to extract measurements and/or units from texts?


Issue

Situation

Given are some titles containing measurements and units in various combinations. I want to extract only the measurements whose unit is m, together with that unit.

1 Kabel0,3m
2 Kabel,0,3 m m 
3 Kabelx,0.3 m
4 Kabel 1m,
5 Kabel 1 m/
6 Kabel 1 HW-Y 2.0 m LAN/LAN RJ45/2xRJ45 Homeway f.2 unabh.Datennetz-Anwend.blau
7 Rundleitung 0,24 mm 2/ 250 m,8p   
8 Televes TV/RF-Empfängeranschlußkabel 10, 0 m weiss

Best try

(?P<match>(?P<value>\d+(?:\.|,|)\s*\d*)\s*(?<unit>m))

https://regex101.com/r/5yH4GN/1

I am still struggling with line 7: how can I exclude the mm?

Expected result

I hope somebody can give me a hint to get closer to a solution.

match    value  unit
0,3m     0,3    m
0,3 m    0,3    m
0.3 m    0.3    m
1m       1      m
1 m      1      m
2.0 m    2.0    m
250 m    250    m
10, 0 m  10, 0  m

Solution

For the unit m, make the whole decimal part optional with \d+(?:[.,]\s*\d+)? so that, whenever a dot or comma is present, the digits after it are required rather than optional.

Put the dot and comma in a character class [.,] and add a word boundary \b after the m, so that, for example, the first m of mm is not matched.

(?P<match>(?P<value>\d+(?:[.,]\s*\d+)?)\s*(?<unit>m\b))
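
The pattern above mixes the (?P<name>...) and (?<name>...) spellings for named groups, which PCRE accepts. As a minimal sketch, assuming Python's re module is used instead, the unit group has to be written as (?P<unit>...) because re does not accept the other spelling. The names PATTERN, TITLES and extract_metres are illustrative and not part of the original answer.

```python
import re

# Same pattern as above, with the unit group spelled (?P<unit>...) because
# Python's re module does not accept the (?<unit>...) form.
PATTERN = re.compile(
    r"(?P<match>(?P<value>\d+(?:[.,]\s*\d+)?)\s*(?P<unit>m\b))"
)

# Sample titles from the question (leading list numbers omitted).
TITLES = [
    "Kabel0,3m",
    "Kabel,0,3 m m",
    "Kabelx,0.3 m",
    "Kabel 1m,",
    "Kabel 1 m/",
    "Kabel 1 HW-Y 2.0 m LAN/LAN RJ45/2xRJ45 Homeway f.2 unabh.Datennetz-Anwend.blau",
    "Rundleitung 0,24 mm 2/ 250 m,8p",
    "Televes TV/RF-Empfängeranschlußkabel 10, 0 m weiss",
]


def extract_metres(title):
    """Return a (match, value, unit) tuple for every metre measurement in title."""
    return [
        (m.group("match"), m.group("value"), m.group("unit"))
        for m in PATTERN.finditer(title)
    ]


for title in TITLES:
    print(extract_metres(title))
```

For line 7 this yields only 250 m: after 0,24 the m is followed by another m, so \b does not hold there and the mm is skipped.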

Regex demo

Answered By – The fourth bird

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
