Discussion:
Dealing with renamed source packages during CVE triaging
Brian May
2018-06-08 07:29:38 UTC
Right now, it seems that all scripts that hammer at those files do so
with their own ad-hoc parsing code. Is that the recommended way of
chopping those files up? Or is there a better parsing library out there?
It sounds like we really could do with a good parsing library. Maybe one
that supports making changes too.

I could make a start on this.
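Here is a minimal sketch of what such a shared parser could look like. It assumes a simplified layout: one header line per CVE at column 0, followed by tab-indented annotation lines. It also accepts a `CVE-YYYY-XXXX` placeholder header, on the assumption that the file uses that form for issues without an ID. `Entry` and `parse_cve_list` are hypothetical names, not an existing security-tracker API.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# ASSUMED simplified layout of data/CVE/list (not a full grammar):
#   CVE-YYYY-NNNN (optional description)     <- header at column 0
#   <TAB>- package version (bug #NNNN)       <- tab-indented annotations
HEADER_RE = re.compile(r"^(CVE-\d{4}-(?:\d{4,}|XXXX))(?:\s+(.*))?$")


@dataclass
class Entry:
    cve: str                     # e.g. "CVE-2018-1234"
    description: Optional[str]   # rest of the header line, kept verbatim
    annotations: List[str] = field(default_factory=list)


def parse_cve_list(text: str) -> List[Entry]:
    """Parse CVE/list-style text into Entry objects, failing loudly on
    lines that fit neither the header nor the annotation shape."""
    entries: List[Entry] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue                          # tolerate blank lines
        match = HEADER_RE.match(line)
        if match:
            entries.append(Entry(match.group(1), match.group(2)))
        elif line.startswith("\t") and entries:
            entries[-1].annotations.append(line.strip())
        else:
            raise ValueError("line %d: unexpected content %r" % (lineno, line))
    return entries
```

Failing on unexpected lines, rather than skipping them, is a deliberate choice: a shared parser is only useful if scripts can trust it saw the whole file.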

Obligatory XKCD:
https://xkcd.com/927/
--
Brian May <***@debian.org>
Holger Levsen
2018-06-08 01:22:02 UTC
Sorry for resurrecting this old thread

No!

I very much appreciate it when people keep issues in the back of their minds
and keep thinking about them and keep reminding us "others" until they are
solved properly!

Thank you.

:)
--
cheers,
Holger (and SCNR too)
Antoine Beaupré
2018-06-08 13:44:20 UTC
Post by Brian May
Right now, it seems that all scripts that hammer at those files do so
with their own ad-hoc parsing code. Is that the recommended way of
chopping those files up? Or is there a better parsing library out there?
It sounds like we really could do with a good parsing library. Maybe one
that supports making changes too.
I could make a start on this.
As I mentioned in the other thread, I am uncertain where to go from
here. Some scripts use JSON, others parse the files by hand... I also
found out yesterday after writing this that there is *already* a parsing
library in the security tracker. It can parse {CVE,DSA,DLA}/list files
and lives in lib/python/bugs.py, but it's somewhat coupled with the
SQLite database - I'm not sure it's usable standalone.

But yeah, maybe clarifying all this stuff would help, for sure... I
would recommend not writing yet another library from scratch however, as
we probably have a dozen such parsers already and it's confusing enough
as it is. ;)

a.
--
The trouble with the great human family is that everyone wants to be
its father.
- Mafalda
Antoine Beaupré
2018-06-08 17:51:52 UTC
I've finalized a prototype during my research on this problem, which I
have detailed on GitLab, as it's really code that should be merged. It
would also benefit from wider attention considering it affects more than
LTS now. Anyways, the MR is here:

https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/4

Comments are welcome there or here.

For what it's worth, I reused Lamby's crude parser because I wanted to
get the prototype out the door. I am also uncertain that a full parser
can create the CVE/list file as is reliably without introducing
inconsistent diffs...
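One way around the inconsistent-diff problem is a lossless approach. Each entry is kept as its raw lines, and only the lines being changed are touched, so writing the file back reproduces untouched entries byte for byte. The sketch assumes entries start at non-indented lines and continue on tab-indented ones. `split_raw`, `join_raw`, `add_annotation` and `changed_lines` are hypothetical helpers, not existing tracker code.

```python
import difflib
from typing import List


def split_raw(text: str) -> List[List[str]]:
    """Split text into entries of raw lines (line endings kept).
    A non-indented, non-blank line starts a new entry; everything else
    continues the current one. Nothing is normalized."""
    chunks: List[List[str]] = []
    for line in text.splitlines(keepends=True):
        if chunks and (line.startswith("\t") or not line.strip()):
            chunks[-1].append(line)
        else:
            chunks.append([line])
    return chunks


def join_raw(chunks: List[List[str]]) -> str:
    return "".join(line for chunk in chunks for line in chunk)


def add_annotation(chunks: List[List[str]], cve: str, annotation: str) -> None:
    """Append one tab-indented annotation to the entry whose header is `cve`."""
    for chunk in chunks:
        words = chunk[0].split()
        if words and words[0] == cve:
            if not chunk[-1].endswith("\n"):
                chunk[-1] += "\n"        # file did not end with a newline
            chunk.append("\t" + annotation + "\n")
            return
    raise KeyError(cve)


def changed_lines(old: str, new: str) -> List[str]:
    """Return only the added/removed lines of a unified diff."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [l for l in diff
            if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
```

Using `changed_lines` on the output of such a tool gives a quick regression check: an edit to one entry should produce exactly the lines you meant to add or remove.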

I also drifted into the core datastructures of the security tracker, and
wondered if it would be better to split up our large CVE/list file now
that we're using git. I had mixed results. For those interested, it is
documented here:

https://salsa.debian.org/security-tracker-team/security-tracker/issues/2
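To get a feel for what a split involves, here is a minimal sketch that groups entries by the year in their CVE ID. That is only one possible granularity. It assumes every entry starts with a `CVE-YYYY-` header at column 0, and `group_by_year` is a hypothetical helper. The last check in the test shows one cost of this approach: if entries of different years are interleaved, concatenating the pieces does not reproduce the original file order.

```python
import re
from typing import Dict, List

YEAR_RE = re.compile(r"^CVE-(\d{4})-")


def group_by_year(text: str) -> Dict[str, str]:
    """Group raw lines by the year of the entry they belong to.
    Insertion order is kept (dicts are ordered in Python 3.7+)."""
    groups: Dict[str, List[str]] = {}
    year = None
    for line in text.splitlines(keepends=True):
        match = YEAR_RE.match(line)
        if match:
            year = match.group(1)        # a new entry header
        if year is None:
            raise ValueError("content before the first CVE header: %r" % line)
        groups.setdefault(year, []).append(line)
    return {y: "".join(lines) for y, lines in groups.items()}
```

Each value could then be written to its own file (for example a hypothetical data/CVE/2018).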

Cheers!

a.
--
If it's important for you, you'll find a way.
If it's not, you'll find an excuse.
- Unknown
Brian May
2018-06-12 07:40:34 UTC
Post by Antoine Beaupré
I've finalized a prototype during my research on this problem, which I
have detailed on GitLab, as it's really code that should be merged. It
would also benefit from wider attention considering it affects more than
https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/4
Comments are welcome there or here.
For what it's worth, I reused Lamby's crude parser because I wanted to
get the prototype out the door. I am also uncertain that a full parser
can create the CVE/list file as is reliably without introducing
inconsistent diffs...
I also drifted into the core datastructures of the security tracker, and
wondered if it would be better to split up our large CVE/list file now
that we're using git. I had mixed results. For those interested, it is
https://salsa.debian.org/security-tracker-team/security-tracker/issues/2
So if I understand correctly, the parts that aren't done yet are:

1. Tagging with <removed>/<unfixed> instead of <undetermined>.
2. Not processing old entries that we don't care about anymore.
3. Resolve general issue regarding CVE/list, and if it should be split up.

For these:

1. We need to be able to determine whether the package still exists in a
given distribution. This information is not available from the
security-tracker database; we would need to get it using online JSON
calls, for each and every package we look at, which is likely to be very
slow, although incremental processing might help (?).

2. For incremental updates, coming up with a definition of old entries
that is easy to check seems to be the stumbling point here, particularly
as entries in CVE/list can be created out of order, and old CVEs might
still be very relevant.

Maybe we need to create/update a list of all CVEs we have processed
before? Would this work, or is there some problem I haven't thought of?

Ideally for this to work properly we would also need to ensure that it
updates all entries in one run, as one run would be all we get, not
multiple runs as can be the case now.

3. I have not noticed git operations being slow, but then again I don't
often update this file. As a potential compromise, maybe instead of one
file per CVE, one file per year?
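The "list of all CVEs we have processed before" idea from point 2 could look roughly like the sketch below. The state file (plain text, one CVE ID per line), the `process` callback and all function names are hypothetical; no such file exists in the repository. The state is only written after every new entry has been handled, which fits the one-run requirement: if processing fails halfway, the next run starts from the same set.

```python
import os
from typing import Callable, Iterable, List, Set


def load_processed(path: str) -> Set[str]:
    """Read the hypothetical state file: one CVE ID per line."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def save_processed(path: str, processed: Set[str]) -> None:
    """Write sorted IDs to a temporary file, then swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.writelines(cve + "\n" for cve in sorted(processed))
    os.replace(tmp, path)


def run_incremental(all_cves: Iterable[str], state_path: str,
                    process: Callable[[str], None]) -> List[str]:
    """Process every CVE not seen before, then record them all at once.
    If `process` raises, the state file is left untouched."""
    processed = load_processed(state_path)
    new = [cve for cve in all_cves if cve not in processed]
    for cve in new:
        process(cve)
    save_processed(state_path, processed | set(new))
    return new
```

Keying on the CVE ID rather than position in the file sidesteps the out-of-order problem, at the cost of one more file to keep in sync.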
--
Brian May <***@debian.org>
Moritz Muehlenhoff
2018-06-12 17:34:26 UTC
Post by Brian May
1. Tagging with <removed>/<unfixed> instead of <undetermined>.
None of those can be automated. The basic point of <undetermined> is that
we lack data to make a proper assessment.

The correct way to handle these is to triage
https://security-tracker.debian.org/tracker/status/undetermined by contacting
e.g. upstream developers or the reporters of the vulnerability and then amend
CVE/list with the necessary information, i.e. either converting them to
<unfixed> if it has been confirmed to be an issue or to <not-affected>.
Post by Brian May
3. Resolve general issue regarding CVE/list, and if it should be split up.
That has been proposed and nacked several times before. There's simply
no practical reason for it. It would add multiple complications (starting
with the MITRE sync, syncing with external parties, changes to the tracker)
for no measurable gain. Quite the contrary; it's extremely useful to have
20 years of vulnerability data easily available in a single emacs buffer.

Cheers,
Moritz
Brian May
2018-06-13 07:19:40 UTC
Post by Moritz Muehlenhoff
Post by Brian May
1. Tagging with <removed>/<unfixed> instead of <undetermined>.
None of those can be automated. The basic point of <undetermined> is that
we lack data to make a proper assessment.
The correct way to handle these is to triage
https://security-tracker.debian.org/tracker/status/undetermined by contacting
e.g. upstream developers or the reporters of the vulnerability and then amend
CVE/list with the necessary information, i.e. either converting them to
<unfixed> if it has been confirmed to be an issue or to
<not-affected>.
"as I said in the mailing list discussion, I don't like the usage of the
undetermined tag... we use it to hide stuff we can't investigate under
the carpet, I would much prefer that we put it as <removed> directly
when it's the case, or <unfixed> otherwise."

Having said that, not sure I personally understand this concern. It
would simplify things if we could just use <undetermined>.
Post by Moritz Muehlenhoff
Post by Brian May
3. Resolve general issue regarding CVE/list, and if it should be split up.
That has been proposed and nacked several times before. There's simply
no practical reason for it. It would add multiple complications (starting
with the MITRE sync, syncing with external parties, changes to the tracker)
for no measurable gain. Quite the contrary; it's extremely useful to have
20 years of vulnerability data easily available in a single emacs buffer.
The concerns (from reading the PR) were that:

* git can't cope efficiently with such large files.
* emacs can't cope efficiently with such large files.

In any case, possibly better to leave feedback on the pull request:

https://salsa.debian.org/security-tracker-team/security-tracker/issues/2
--
Brian May <***@debian.org>
Brian May
2018-06-13 07:28:41 UTC
s/pull request/issue/

Sorry for any confusion.
--
Brian May <***@debian.org>
Moritz Muehlenhoff
2018-06-13 17:25:14 UTC
Post by Brian May
"as I said in the mailing list discussion, I don't like the usage of the
undetermined tag... we use it to hide stuff we can't investigate under
the carpet, I would much prefer that we put it as <removed> directly
when it's the case, or <unfixed> otherwise."
Of course, those can be resolved; it just needs someone to do the analysis work.
Switching to some other tags (and incorrect ones!) doesn't change anything.

Cheers,
Moritz
Brian May
2018-06-15 06:34:14 UTC
Post by Moritz Muehlenhoff
Post by Brian May
"as I said in the mailing list discussion, I don't like the usage of the
undetermined tag... we use it to hide stuff we can't investigate under
the carpet, I would much prefer that we put it as <removed> directly
when it's the case, or <unfixed> otherwise."
Of course, those can be resolved; it just needs someone to do the analysis work.
Switching to some other tags (and incorrect ones!) doesn't change anything.
Seems like this is a moot point anyway, as from the comments you left in
the pull request, you don't like this approach of automatically adding
entries in data/CVE/list. Fair enough.

So we could write a script, let's say:
bin/list-potential-packages-affected-by-code-copies

That generates a report of all packages that we need to check. I assume
we would need some way of marking packages that we have checked and
found to be not affected, so we can get a list of packages that need
immediate attention and don't repeatedly check the same package multiple
times. How should we do this? Maybe another file in the security tracker
repository?

Would anybody object to this approach?
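A rough sketch of the report such a script might produce follows. It assumes a simplified layout for the embedded-code-copies file: library name at column 0, tab-indented `- package` lines for the embedders, `#` comments. The real file may differ. The `checked` set stands in for the proposed record of packages already checked and found not affected, which does not exist yet. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_code_copies(text: str) -> Dict[str, List[str]]:
    """Map each embedded library to the source packages that embed it.
    ASSUMED layout (simplified, not the verified file format):
        library
        <TAB>- embedding-package (free-form comment)
    """
    copies: Dict[str, List[str]] = {}
    library = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue                                  # blank or comment line
        if not line[0].isspace():
            library = line.split()[0]                 # new library section
            copies.setdefault(library, [])
        elif library is not None and line.strip().startswith("- "):
            words = line.strip()[2:].split()
            if words:
                copies[library].append(words[0])
    return copies


def report(affected: Iterable[Tuple[str, str]],
           copies: Dict[str, List[str]],
           checked: Set[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """For each (CVE, source package) pair, list (CVE, library, embedder)
    rows not already marked as checked-and-not-affected."""
    rows = []
    for cve, package in affected:
        for embedder in copies.get(package, []):
            if (cve, embedder) not in checked:
                rows.append((cve, package, embedder))
    return rows
```

Keying `checked` on (CVE, package) rather than on the package alone means a package cleared for one CVE still shows up for the next one in the same library.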
--
Brian May <***@debian.org>
Brian May
2018-06-15 07:21:55 UTC
Post by Brian May
bin/list-potential-packages-affected-by-code-copies
In investigating the possibility of this, I noticed the scripts in
lib/python/sectracker use legacy Python coding standards.

I have updated these files on my local box to work with Python 3, but
am refraining from pushing for now, because of the possibility I might break
something important.

Is Python 2 compatibility still required?
--
Brian May <***@debian.org>
Moritz Muehlenhoff
2018-06-15 08:23:15 UTC
Post by Brian May
Post by Brian May
bin/list-potential-packages-affected-by-code-copies
In investigating the possibility of this, I noticed the scripts in
lib/python/sectracker use legacy Python coding standards.
I have updated these files on my local box to work with Python 3, but
am refraining from pushing for now, because of the possibility I might break
something important.
When the Debian Security Tracker was created, Python 3 didn't even exist
yet :-)

Feel free to make a pull request, I don't think we have a specific dependency
on Python 2 modules anywhere. But it might take a bit to get reviewed/deployed
as it's not a high priority issue.

Cheers,
Moritz
Salvatore Bonaccorso
2018-06-17 06:23:37 UTC
Hi,
Post by Moritz Muehlenhoff
Post by Brian May
Post by Brian May
bin/list-potential-packages-affected-by-code-copies
In investigating the possibility of this, I noticed the scripts in
lib/python/sectracker use legacy Python coding standards.
I have updated these files on my local box to work with Python 3, but
am refraining from pushing for now, because of the possibility I might break
something important.
When the Debian Security Tracker was created, Python 3 didn't even exist
yet :-)
Feel free to make a pull request, I don't think we have a specific dependency
on Python 2 modules anywhere. But it might take a bit to get reviewed/deployed
as it's not a high priority issue.
To be kept in mind: whatever change is proposed for the code part of
the security tracker potentially needs to be able to run on the
security-tracker host soriano (running on stretch), preferably without
introducing new dependencies if they are not needed. Merge/pull requests
for those parts are preferred.

Regards,
Salvatore
Brian May
2018-06-17 07:56:50 UTC
Post by Salvatore Bonaccorso
Post by Moritz Muehlenhoff
Feel free to make a pull request, I don't think we have a specific dependency
on Python 2 modules anywhere. But it might take a bit to get reviewed/deployed
as it's not a high priority issue.
To be kept in mind: whatever change is proposed for the code part of
the security tracker potentially needs to be able to run on the
security-tracker host soriano (running on stretch), preferably without
introducing new dependencies if they are not needed. Merge/pull requests
for those parts are preferred.
I will look at making a pull request tomorrow. The changes should be
reasonably straightforward syntax changes (e.g. use "!=" instead of
"<>" for the does not equal operator), work with Python3 in stretch, and
not require any additional dependencies (I think it only depends on
Python3).

Perhaps the most intrusive change is deleting the py file with the
definition of namedtuple; it is not needed now that Python has the
collections module with a built-in namedtuple.
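The two kinds of change mentioned above look like this in practice. `PackageAnnotation` and `needs_fix` are made-up names for illustration. `collections.namedtuple` is the standard-library version, available since Python 2.6, which is why a locally bundled copy can go.

```python
from collections import namedtuple

# Hypothetical record type, for illustration only.
PackageAnnotation = namedtuple("PackageAnnotation", ["release", "package", "kind"])


def needs_fix(ann):
    # Python 2 also accepted `ann.kind <> "not-affected"`; in Python 3
    # `<>` is a syntax error and `!=` is the only inequality operator.
    return ann.kind != "not-affected"


unstable = PackageAnnotation(release=None, package="foo", kind="unfixed")
# _replace returns a modified copy; namedtuples themselves are immutable.
stretch = unstable._replace(release="stretch", kind="not-affected")
```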
--
Brian May <***@debian.org>
Brian May
2018-06-18 07:54:04 UTC
Post by Brian May
I will look at making a pull request tomorrow. The changes should be
reasonably straightforward syntax changes (e.g. use "!=" instead of
"<>" for the does not equal operator), work with Python3 in stretch, and
not require any additional dependencies (I think it only depends on
Python3).
Python3 support:

https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/7

This one implements
bin/list-potential-packages-affected-by-code-copies:

https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/8

At present I have written this one to work with Python 2.7 and
Python 3.6, but it won't work with Python 3.6 until the other pull
request is merged.
--
Brian May <***@debian.org>
Moritz Muehlenhoff
2018-06-15 08:27:45 UTC
Post by Brian May
Post by Moritz Muehlenhoff
Post by Brian May
"as I said in the mailing list discussion, I don't like the usage of the
undetermined tag... we use it to hide stuff we can't investigate under
the carpet, I would much prefer that we put it as <removed> directly
when it's the case, or <unfixed> otherwise."
Of course, those can be resolved; it just needs someone to do the analysis work.
Switching to some other tags (and incorrect ones!) doesn't change anything.
Seems like this is a moot point anyway, as from the comments you left in
the pull request, you don't like this approach of automatically adding
entries in data/CVE/list. Fair enough.
bin/list-potential-packages-affected-by-code-copies
You're mixing two things; my comment above refers to <undetermined>, those
are one-off investigations and don't need any particular tooling.
Post by Brian May
That generates a report of all packages that we need to check. I assume
we would need some way of marking packages that we have checked and
found to be not affected, so we can get a list of packages that need
immediate attention and don't repeatedly check the same package multiple
times. How should we do this? Maybe another file in the security tracker
repository?
Maybe start with the script initially and see whether it's useful as an
approach in general. State tracking can be discussed/added later.

Lots of the false positives will result from crappy/outdated entries
in embedded-code-copies, so fixing those up will drastically reduce
false positives.

Cheers,
Moritz
Antoine Beaupré
2018-06-20 17:42:11 UTC
[...]
Post by Moritz Muehlenhoff
Post by Brian May
That generates a report of all packages that we need to check. I assume
we would need some way of marking packages that we have checked and
found to be not affected, so we can get a list of packages that need
immediate attention and don't repeatedly check the same package multiple
times. How should we do this? Maybe another file in the security tracker
repository?
Maybe start with the script initially and see whether it's useful as an
approach in general. State tracking can be discussed/added later.
Maybe the same principle applies as with the approach I considered. We
could have a --stop argument that would consider entries up to a certain
CVE number and ignore the rest of the file.
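A sketch of how a `--stop` option could work, assuming newer entries sit at the top of CVE/list so that reading can simply end at a given ID. The stop entry itself is excluded here, which is an arbitrary choice. The option name comes from the paragraph above; `headers_until` and `main` are hypothetical. Since entries are not necessarily created in order, a cut-off like this can miss late additions that land below the stop point.

```python
import argparse
import re
from typing import Iterable, Iterator, List, Optional

HEADER_RE = re.compile(r"^CVE-\d{4}-(?:\d{4,}|XXXX)\b")


def headers_until(lines: Iterable[str], stop: Optional[str]) -> Iterator[str]:
    """Yield CVE IDs from the top of the file until the `stop` ID is seen
    (the stop entry itself is not yielded). With stop=None, yield all."""
    for line in lines:
        match = HEADER_RE.match(line)
        if not match:
            continue                      # annotation or other line
        cve = match.group(0)
        if stop is not None and cve == stop:
            return
        yield cve


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(
        description="List CVE entries from the top of a CVE/list file")
    parser.add_argument("listfile")
    parser.add_argument("--stop", metavar="CVE-ID",
                        help="stop before the entry with this ID")
    args = parser.parse_args(argv)
    with open(args.listfile) as f:
        for cve in headers_until(f, args.stop):
            print(cve)
```

A bin/ wrapper would call `main()` with no arguments so that argparse reads the real command line.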
Post by Moritz Muehlenhoff
Lots of the false positives will result from crappy/outdated entries
in embedded-code-copies, so fixing those up will drastically reduce
false positives.
If the embedded-code-copies file is used more systematically, with a
semi-automated script, in the triaging process, we'll be more inclined
to keep it up to date, so I think it would actually help with
that too...

bam: do you want me to start working on that script or were you working
on this already?

Thanks for the feedback,

A.
--
They pour a poor honey over their rotten words and talk to you of shortage
And on your hunger, on your friends, they whet their appetite
- Richard Desjardins, La maison est ouverte
Brian May
2018-06-21 06:54:45 UTC
Post by Antoine Beaupré
bam: do you want me to start working on that script or were you working
on this already?
See
https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/8

I personally find this easier to understand as we use the existing CVE
list parser, although I have not considered how to write changes (as
this wasn't a requirement when I wrote this).
--
Brian May <***@debian.org>
Brian May
2018-06-13 07:38:07 UTC
Post by Antoine Beaupré
https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/4
Comments are welcome there or here.
Current comments on merge request, copied and pasted here, as I think
relevant for the discussion here:

Moritz Muehlenhoff (@jmm, Owner) commented 4 days ago:
Strong nack, the data quality of embedded code copies isn't useful for
this. When you've verified a certain package to be affected, add it
manually (with references), but don't dump lots of unactionable data
into the tracker.

Brian May (@bam, Developer) commented 2 minutes ago:
@jmm The problem, I
believe, is how do we keep track of packages that might be affected but
aren't listed in the security tracker? Do we maybe need to keep track of
this information outside the security tracker?
--
Brian May <***@debian.org>